Ever noticed? Have a closer look at Google counts



The other day, Anne Pier Salverda made me aware of the following strange coincidence. While googling for “two women were having dinner” yields only a handful of hits (3 when I ran the search today), the search for “two men were having dinner” yields tens of thousands of hits (about 220,000 when I ran it). Let’s not ask why Anne Pier was searching for these strings to begin with ;), but it was curious, so I looked for more.

This does not seem to be driven by highly repeated mentions of only a few stories. Nor does it seem to be gender-specific: searches for “two x were having dinner”, where x is one of “boys”, “girls”, or “guys”, yield no hits or only a few.

So, is this really just a weird coincidence of this particular string “two men were having dinner”? Curiously, the same pattern in the present progressive yields the same asymmetry: “two men are having dinner” yielded about 185,000 hits, whereas the other searches (“women”, “girls”, “boys”, “guys”) yielded no hits or only a handful. The same asymmetry also holds for “two x have dinner”. Huh?!?

Then I had a closer look at these thousands of hits. You can do it yourself for any of the searches (provided that Google hasn’t changed this already): as soon as you click to see any of the hits beyond page 1, Google suddenly claims that there are only about 40-50 hits (depending on the particular “Two men HAVE dinner” string used). These few hits in turn seem to come from only a few news sources. So, actually, all that seems to be going on is that there are a few more hits for the “men” examples, which is perfectly consistent with the fact that the internet (surprisingly) seems to talk more about “men” than “women” (a 10:7 ratio when I last looked).
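To put a number on how far off the first-page estimate is, here is a minimal Python sketch. The function name `inflation_factor` is made up for illustration, and 45 is just an assumed midpoint of the "about 40-50" hits reported above, so treat the result as a rough order of magnitude rather than a measurement.

```python
def inflation_factor(first_page_estimate: int, paged_count: int) -> float:
    """Return how many times larger the first-page estimate is than the
    count shown once you page past the first page of results."""
    if paged_count <= 0:
        raise ValueError("paged_count must be positive")
    return first_page_estimate / paged_count


# 220,000 is the first-page estimate reported in the post for
# "two men were having dinner"; 45 is an ASSUMED midpoint of the
# "about 40-50" hits Google reports after paging forward.
factor = inflation_factor(220_000, 45)
print(f"First-page estimate is roughly {factor:,.0f}x the paged count")
```

The point of the sketch is only that the two counts differ by several orders of magnitude, so any frequency comparison based on the first-page figure would be dominated by this artifact.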

Great. Have you ever noticed anything comparable? Is this a bug? I just checked bing.com and they seem to return the right count. What a sad day.


2 thoughts on “Ever noticed? Have a closer look at Google counts”

    Daniel Kaufman said:
    January 11, 2010 at 12:13 am

    Hi Florian,
    This strange phenomenon has been around from the beginning it seems. I had to throw out a bunch of frequency data when I discovered it. It’s worse than you state actually, because you have to go to the very last page of the search results to see the correct number of hits. But at least there is a way to get the right number.
    In any case, all who use frequency data from Google, beware!
    -dan


      tiflo said:
      January 11, 2010 at 12:23 am

      Ha, good to know that I am not insane. The correct number showed up for me whenever I advanced to the page that would have had more hits than there actually were. For the searches I did, this was the second page (out of hundreds). But, yes, it’s a nasty problem. Does it come up with non-multi-string searches as well?

