It’s certainly well known that Cloudflare’s analytics (or any request-based analytics, for that matter) tend to report larger numbers than JavaScript-based analytics. The de facto explanations make sense: some users navigate away from a page before it has fully loaded (that is, before the JavaScript has executed), and other users keep JavaScript disabled. There are more.

But these standard answers can’t really explain the size of the discrepancy I tend to see between the two sets of numbers. Take these stats as an example:

Google Analytics

Pageviews: 143,646
Unique Visitors: 36,091

Cloudflare Analytics

Pageviews: 425,785
Unique Visitors: 111,921
[Screenshots (2013-04-28): the Google Analytics and Cloudflare Analytics figures above]
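Putting the two sets of figures side by side as ratios makes the size of the gap concrete. This short Python sketch only divides the numbers quoted above; it assumes nothing about why they differ.

```python
# Figures quoted above, as reported by each tool.
ga = {"pageviews": 143_646, "uniques": 36_091}
cf = {"pageviews": 425_785, "uniques": 111_921}

def ratio(a: int, b: int) -> float:
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)

# How many times larger Cloudflare's count is than Google Analytics'.
pageview_ratio = ratio(cf["pageviews"], ga["pageviews"])
unique_ratio = ratio(cf["uniques"], ga["uniques"])

# Pageviews per unique visitor, as each tool reports it.
ga_pages_per_visitor = ratio(ga["pageviews"], ga["uniques"])
cf_pages_per_visitor = ratio(cf["pageviews"], cf["uniques"])

print(f"Cloudflare/GA pageviews: {pageview_ratio}x")   # 2.96x
print(f"Cloudflare/GA uniques:   {unique_ratio}x")     # 3.1x
print(f"Pages per visitor (GA, CF): {ga_pages_per_visitor}, {cf_pages_per_visitor}")  # 3.98, 3.8
```

Both ratios land near 3x, and pages per visitor is similar in both tools (about 3.98 versus 3.80). That suggests the extra traffic shows up mostly as additional visitors, not as extra requests from the same visitors.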

If we are to believe these numbers, then either:

  1. Most people browse the web with cURL spoofed to look like Chrome, Safari, or Firefox
  2. There is more at work than the standard explanations account for

In a comment I just left on “Hey CloudFlare, What’s Wrong with these Numbers?”, I offered one suggestion:

It’s possible that Cloudflare logs every request it passes to the server, *including* responses with non-2XX status codes. For example, my organization uses the root domain and 301-redirects any www. subdomain traffic to the corresponding root URL. On CF, this might be counted as 2 separate requests.

On the other hand, I’m not sure this would explain the discrepancy between the 36k monthly *uniques* on GA and the 112k on CF. Any thoughts?
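To make the redirect idea and its limitation concrete, here is a toy Python model. It assumes, hypothetically, that a request-based tool counts every request (including the 301 from www. to the root domain) and treats each client IP as a unique visitor; the log entries, hostnames, and addresses are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    ip: str      # client address, a stand-in for a "unique visitor"
    host: str
    path: str
    status: int

# Hypothetical log: two visitors type the www. address, get a 301 to the
# root domain, and then load the page there normally (200).
log = [
    Request("203.0.113.5", "www.example.org", "/", 301),
    Request("203.0.113.5", "example.org", "/", 200),
    Request("198.51.100.7", "www.example.org", "/about", 301),
    Request("198.51.100.7", "example.org", "/about", 200),
]

def count_requests(entries, only_2xx=False):
    """Count requests, optionally keeping only 2XX responses."""
    return sum(1 for r in entries if not only_2xx or 200 <= r.status < 300)

def count_unique_ips(entries):
    """Count distinct client IPs, regardless of status code."""
    return len({r.ip for r in entries})

print("All requests:", count_requests(log))                    # 4
print("2XX requests only:", count_requests(log, only_2xx=True))  # 2
print("Unique IPs:", count_unique_ips(log))                    # 2
```

Counting the redirects doubles the request total (4 versus 2), but the number of distinct visitors stays at 2. That is why this explanation could cover inflated pageviews but not inflated uniques.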

But as you can see, I couldn’t even get through a comment without finding a hole in my own argument. So here’s another suggestion:

Prefetching and Prerendering.

There’s a relevant standard (the rel="prefetch" link type in HTML), and Chrome, Firefox, Safari, and even IE support it to varying degrees. While I haven’t researched the implementation subtleties of the different browsers, Google Chrome apparently can “predict network actions” and might prerender pages linked from anywhere, including search results.

This is what piqued my interest. If I look at queries in the SEO section of Google Analytics and select only those with an average position of 10.0 or better, I see 73,599 impressions with only 8,188 clicks. That’s 65,411 opportunities for the browser to prefetch or prerender our site from first-page Google results alone, and that is likely an underestimate: it doesn’t count the many queries with extremely high impressions and click-through rates that don’t average a first-page position. Take the name of the organization itself, for example:

[Screenshot (2013-04-29): Google Analytics SEO query data for the organization’s name]
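The first-page count above is just impressions minus clicks, summed over queries whose average position is 10.0 or better. A minimal Python sketch, assuming the SEO query report has been exported as rows of (query, impressions, clicks, average position); the class, function, and sample rows are hypothetical.

```python
from typing import NamedTuple

class QueryRow(NamedTuple):
    query: str
    impressions: int
    clicks: int
    avg_position: float  # lower is better; 1.0 is the top result

def unclicked_first_page(rows, max_position=10.0):
    """Sum impressions and clicks over rows averaging on page one and
    return (impressions, clicks, impressions - clicks)."""
    first_page = [r for r in rows if r.avg_position <= max_position]
    impressions = sum(r.impressions for r in first_page)
    clicks = sum(r.clicks for r in first_page)
    return impressions, clicks, impressions - clicks

# Hypothetical sample rows, for illustration only.
rows = [
    QueryRow("example query a", 1_200, 150, 2.3),
    QueryRow("example query b", 800, 40, 7.9),
    QueryRow("example query c", 5_000, 900, 12.4),  # excluded: averages past page one
]

print(unclicked_first_page(rows))  # (2000, 190, 1810)
```

Applied to the report described above, the same subtraction gives 73,599 − 8,188 = 65,411. Rows like "example query c" drop out even when they draw many impressions, which is why the total is likely an underestimate.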

If Google Chrome is as smart as I think it is, there are many prerender requests to our server attributable to this keyword alone. And with Chrome responsible for nearly 32% of our traffic (and other browsers likely also contributing to the effect), this is a very likely source of this discrepancy.

I’m sure there are even more factors at play, but this is one to look out for, if that’s even possible: while Firefox sends a header when prefetching (X-moz: prefetch), Chrome doesn’t.
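For Firefox, at least, prefetch requests can be flagged by that header. A minimal Python sketch, assuming request headers are available as a dictionary (for example, parsed from server logs); the function name and sample requests are hypothetical. It cannot catch Chrome prerenders, which carry no such marker.

```python
from collections.abc import Mapping

def is_firefox_prefetch(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries Firefox's "X-moz: prefetch" header.

    HTTP header names are case-insensitive, so names are compared in
    lower case; the value is compared case-insensitively as well.
    """
    for name, value in headers.items():
        if name.lower() == "x-moz" and value.strip().lower() == "prefetch":
            return True
    return False

# Hypothetical requests as they might appear in a log.
requests = [
    {"User-Agent": "Mozilla/5.0 (Firefox)", "X-moz": "prefetch"},
    {"User-Agent": "Mozilla/5.0 (Firefox)"},
    # A Chrome prerender would look like an ordinary request such as this one.
    {"User-Agent": "Mozilla/5.0 (Chrome)"},
]

# Drop the flagged Firefox prefetches before counting pageviews.
real_views = [r for r in requests if not is_firefox_prefetch(r)]
print(len(real_views))  # 2
```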

If you’ve made it this far, let me know if I’ve missed something!