October 29, 2006
Analytics Guesses Connection Speed?
Discovered something interesting tonight as I turned on and tested a new site -- Google Analytics appears to classify traffic coming in via AOL as dialup traffic. In this case, the AOL traffic was identifiable as my test surfing, which I was doing across a broadband connection using the Mac and Windows AOL clients.
This supports speculation in the Analytics group that the stats service does not use any sort of empirical testing, but instead looks at the ISP or IP range and attempts to make an educated guess based on what an IP database returns.
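To make the speculation concrete, here is a minimal sketch of what an IP-range lookup of that kind might look like. Everything in it is an assumption for illustration: the table, the function name `guess_connection_speed`, and the ranges themselves (reserved documentation networks, not real ISP allocations). It is not how Analytics is known to work.

```python
import ipaddress

# Hypothetical lookup table. The networks below are reserved
# documentation ranges standing in for real ISP allocations.
IP_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Dialup"),       # stand-in for an ISP tagged as dialup
    (ipaddress.ip_network("198.51.100.0/24"), "Cable/DSL"),  # stand-in for a broadband ISP
    (ipaddress.ip_network("203.0.113.0/24"), "Corporate"),   # stand-in for a company-owned block
]


def guess_connection_speed(ip):
    """Return a connection-speed label based only on who owns the IP range."""
    addr = ipaddress.ip_address(ip)
    for network, label in IP_RANGES:
        if addr in network:
            return label
    # No matching range: nothing is measured, so nothing can be said.
    return "Unknown"


if __name__ == "__main__":
    # A broadband user behind the "dialup" ISP is still labeled Dialup.
    print(guess_connection_speed("192.0.2.17"))
    print(guess_connection_speed("8.8.8.8"))
```

A lookup like this never observes the actual connection, which would explain broadband AOL sessions being reported as dialup.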
Unconfirmed speculation is all the group has had to go on; Google's Help Center states that the Analytics measurement is of "The visitor's connection speed, as determined by the visitor's internet connection as determined by the browser". This is a bit ambiguous, and afaict no one from Google has appeared on the group to clarify the questioning posts.
I had always assumed an IP database existed, because of the Analytics connection speed categories ('Cable/DSL', 'Corporate', 'Dialup' and 'Unknown'), the Cable/DSL and Corporate segmentation could only come from an IP owners database. Still, it's a bit disappointing to learn that the Connection Speed reports are apparently inferred and not actually measured.
If I had a rich media site with a significant amount of AOL traffic, I'd want to know just how much of that traffic is broadband vs. dialup; looks like in such a case Analytics wouldn't be able to help me.
Posted by Lewis Francis at October 29, 2006 9:44 PM
Seems like a very difficult thing to actually determine. From the user's browser all the way to a web site there can be a multitude of transport media involved, with a multitude of bottlenecks that would mask the true performance metrics needed to ascertain whether or not a user has a broadband connection. But clearly an IP database or ISP identification isn't really going to help either. Even if it does accurately identify a *supposed* broadband connection, it doesn't attempt to discern at all whether or not the connection is performing like a broadband connection (like a small business LAN sharing a connection, reducing the effective speed to dialup). Not to mention the problem of keeping those databases up to date. Does EarthLink, for example, have a range of IP addresses on a certain subnet reserved for dialup users and another range for broadband users? Over time I'd think the broadband pool would grow while the dialup would shrink, and if the IP database being used isn't updated it'd report broadband users as being dialup users.
The best solution would probably be to identify the bandwidth at each part of the network at the protocol level, like a new TCP/IP wrapper that includes the performance potential at each stage of a packet's journey. Then the remote host can examine the data and see what the connection really looks like. Then again, maybe this is already done - network protocols aren't really my specialty.
Yeah, you are right, bandwidth detection is a tough thing to pull off, but with enough samples you can come up with something useful.
The problem with using a test payload is it adds an additional page load burden. In the above audit we 'volunteered' a random sampling of visitors to bear this burden for the sake of science. ;)
Here is an interesting article on optimizing performance I snagged off of Digg last week; about halfway down he talks about doing something similar and using log analysis to get the numbers: http://www.die.net/musings/page_load_time/
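For the test-payload idea, a rough sketch of the arithmetic might look like the following. The function names, the sample rate, and the 128 kbps cutoff are all assumptions picked for illustration, not figures from the audit; the median is used so that a few odd samples don't skew the result.

```python
import random
import statistics


def should_sample(sample_rate, rng=random):
    """Decide whether this visitor gets 'volunteered' for the test payload."""
    return rng.random() < sample_rate


def estimate_kbps(payload_bytes, elapsed_seconds):
    """Convert a timed payload download into kilobits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return payload_bytes * 8 / elapsed_seconds / 1000


def classify(kbps, dialup_ceiling_kbps=128):
    """Label a measured rate; the cutoff is an assumed value, not a standard."""
    return "Dialup" if kbps <= dialup_ceiling_kbps else "Broadband"


def summarize(samples_kbps):
    """Classify a batch of measurements by their median rate."""
    return classify(statistics.median(samples_kbps))


if __name__ == "__main__":
    # A 50,000-byte payload taking 8 seconds works out to 50 kbps.
    print(estimate_kbps(50_000, 8.0), classify(estimate_kbps(50_000, 8.0)))
    # The same payload in half a second works out to 800 kbps.
    print(estimate_kbps(50_000, 0.5), classify(estimate_kbps(50_000, 0.5)))
```

With a `sample_rate` of, say, 0.01, only about one visitor in a hundred would bear the extra page-load burden.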