Lots of zeros? Be careful with your chi-square (exact or not) and the like



If you’re running chi-square tests on categorical data and your table has lots of cells with very low (or even zero) counts, be careful how you interpret the result. There’s a nice article by Andrew Gelman on this topic, in which he shows that the many low counts can make it harder to detect the signal (that is, a significant deviation from the expected values in one part of the table). Put differently, you might have a significant pattern but fail to detect it. I don’t think this is much of a problem for most of the tests we conduct, since contingency tables in psycholinguistic and linguistic research are usually rather small: I can’t recall the last time I saw anything larger than a 3×4 table or the like. From what I understand of Gelman’s post, the problem he points out becomes more serious the larger the table is.
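To make the dilution effect concrete, here is a minimal sketch in plain Python (no statistics library assumed). It computes Pearson's chi-square test of independence for a small 2×2 table with a clear association, and then for the same table padded with 30 hypothetical sparse columns that each hold a single observation. The helper names `chi2_sf` and `pearson_chi2` and all counts are made up for illustration; this is not Gelman's own example. One caveat: the sparse columns have expected counts of 0.5, far below the usual threshold of 5, so the p-value for the padded table is itself only approximate.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 2 has the closed form exp(-x/2); df = 1 uses erfc(sqrt(x/2)).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1))
        k += 2
    return q


def pearson_chi2(table):
    """Pearson chi-square test of independence; table is a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence. An all-zero row or column gives 0 here,
            # which makes the statistic undefined, so such rows/columns must be dropped first.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2 x 2 core with a clear association.
core = [[30, 10],
        [15, 25]]

# The same core plus 30 hypothetical sparse columns with one observation each,
# alternating between the two rows so that the row totals stay equal.
sparse = [[1, 0], [0, 1]] * 15
wide = [core[0] + [col[0] for col in sparse],
        core[1] + [col[1] for col in sparse]]

for name, table in [("core 2x2", core), ("core + sparse", wide)]:
    stat, df, p = pearson_chi2(table)
    print(f"{name}: X2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The association in the first two columns is identical in both tables, and each sparse column adds only about 1 to the statistic. The degrees of freedom, however, jump from 1 to 31, so the p-value moves from well below .01 to above .05: the same pattern is there, but the omnibus test no longer detects it.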


3 thoughts on “Lots of zeros? Be careful with your chi-square (exact or not) and the like”

    Brandon Loudermilk said:
    November 30, 2011 at 8:50 pm

    I had always heard you need a minimum of 5 per cell.
    http://frank.mtsu.edu/~dkfuller/notes302/chisquare.pdf


      tiflo said:
      November 30, 2011 at 9:11 pm

      Hi Brandon,

      What you are referring to is that the chi-square test is known to be biased if the expected cell count (not the observed count) is lower than 5. This is a rule of thumb. The problem described in my blog post is a separate issue that goes beyond this one.

      HTH,
      Florian
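      The difference between observed and expected counts can be checked directly. Below is a minimal sketch in plain Python; the helper names `expected_counts` and `low_expected_cells` and both tables are hypothetical. It computes the expected count of each cell under independence (row total × column total / grand total) and flags cells whose expected count falls below the rule-of-thumb threshold of 5.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def low_expected_cells(table, threshold=5.0):
    """Return (row, column) indices of cells whose expected count is below threshold."""
    expected = expected_counts(table)
    return [(i, j)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < threshold]


# Observed counts of 0 and 2, yet every expected count is at least 5: no cell is flagged.
print(low_expected_cells([[0, 10], [12, 2]]))

# A hypothetical table with a rare second and third column: those cells are flagged.
print(low_expected_cells([[12, 0, 3], [20, 2, 1]]))
```

      In the first table the small observed counts are not a problem for the rule of thumb; in the second, the rare columns are.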

