Googlean logic. Are Google’s counts faked?
January 30th, 2005

In a post on Technologies du Langage, Jean Véronis analyzes the approximations in Google’s handling of Boolean logic.

Today Jean adds in a new comment that “it shows that the number of results returned by Google is mathematically impossible. It could be a bug (a major one, though), or that they are inflating the index size for marketing reasons. I don’t know, but there is something fishy”.

“I’ve found much more, and much more disturbing. The counts themselves are flawed in a major way, even if you don’t use any “advanced” (or not so advanced) search capabilities”.

In any case, I would not recommend professional uses of Google’s counts (such as “Google linguistics”). Yahoo! seems more reliable, or are they simply cleverer?

Thank you, Jean!

Comments
1 - Kevin Carey

Search engine result counts are estimates. The basic algorithm is to count the number of matches in the first X% of the index, then infer the match count based on the size of the index. Therefore, as the index size grows, so will the error in match counts.

Other search engines may have more accurate counts simply because they have smaller indexes.

You are right in principle. However, the estimates are reasonable at Yahoo, and their index is (according to my estimates) around 5 billion pages, which is not so different from Google’s (which claims 8 billion). In addition, Google had the same problems when its index size was 4 billion (before Nov. 2004).
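
The extrapolation Kevin describes in comment 1 can be sketched in a few lines. This is only an illustration of the general idea (count matches in the first X% of the index, then scale up by the index size), not a description of how Google or Yahoo actually compute their counts. The function name, the list-based "index" and the predicate are all hypothetical.

```python
def estimate_match_count(index, matches, sample_fraction):
    """Estimate how many documents match by scanning only a prefix of the index.

    index           -- a list standing in for the search index (hypothetical)
    matches         -- predicate returning True when a document matches the query
    sample_fraction -- share of the index to scan, 0 < sample_fraction <= 1
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")

    # Scan at least one document so the division below is always defined.
    sample_size = max(1, int(len(index) * sample_fraction))
    hits = sum(1 for doc in index[:sample_size] if matches(doc))

    # Scale the hits seen in the prefix up to the size of the whole index.
    return round(hits * len(index) / sample_size)


# Toy index of 1000 "documents" numbered 0..999.
index = list(range(1000))

# Matches spread evenly through the index: the estimate is exact.
print(estimate_match_count(index, lambda d: d % 10 == 0, 0.1))  # 100 (true count: 100)

# Matches clustered at the front of the index: the estimate is ten times too high.
print(estimate_match_count(index, lambda d: d < 50, 0.1))       # 500 (true count: 50)
```

The second call shows why such counts can be far off: the scaling step assumes the scanned prefix is representative of the whole index. Any error in the prefix is multiplied by the same factor as the hits, so for a fixed prefix size the absolute error grows with the index.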