Lots of zeros? Be careful with your chi-square (exact or not) and the like


If you’re running chi-square tests to analyze categorical data and you have lots of cells with very low counts (or even zeros), be careful in how you interpret the result. There’s a nice article by Andrew Gelman on this topic, where he shows that all the low counts can make it harder to detect the signal (that is, a significant deviation from the expected values in part of the table). Put differently, you might have a real pattern but fail to detect it. I don’t think it’s much of a problem for most of the tests we conduct, since contingency tables in psycholinguistic and linguistic research are usually rather small. I can’t recall the last time I saw anything larger than a 3×4 or the like. From what I understand of Gelman’s post, the problem he points out becomes more serious the larger the table is.
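To make the intuition concrete, here is a minimal sketch in plain Python. It is not Gelman’s analysis, just an illustration with made-up counts: the function `pearson_chi_square` and both tables are hypothetical, and the code assumes no row or column sums to zero. It computes the Pearson chi-square statistic and its degrees of freedom for a small table with a clear association, and then for the same table padded with six sparse columns.

```python
def pearson_chi_square(table):
    """Return (statistic, df, expected) for a 2-D table of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Made-up counts: a clear association in a 2x2 table.
small = [[30, 10],
         [10, 30]]

# The same two columns plus six sparse columns (made-up counts of 0 or 1).
large = [[30, 10, 1, 0, 1, 0, 1, 0],
         [10, 30, 0, 1, 0, 1, 0, 1]]

for name, table in [("small", small), ("large", large)]:
    stat, df, _ = pearson_chi_square(table)
    print(f"{name}: chi-square = {stat:.2f}, df = {df}")
# small: chi-square = 20.00, df = 1
# large: chi-square = 26.00, df = 7
```

In this toy example the six sparse columns add 6 to the statistic and 6 to the degrees of freedom, which is about what pure noise would contribute. The signal is unchanged, but it is now judged against a reference distribution with 7 degrees of freedom instead of 1, so the same pattern looks less extreme.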

3 thoughts on “Lots of zeros? Be careful with your chi-square (exact or not) and the like”

    Brandon Loudermilk said:
    November 30, 2011 at 8:50 pm

    I had always heard you need a minimum of 5 per cell.

    Click to access chisquare.pdf

      tiflo said:
      November 30, 2011 at 9:11 pm

      Hi Brandon,

      what you are referring to is the rule of thumb that the chi-square approximation becomes unreliable when the expected cell count (not the observed count) is lower than 5. The problem described in my blog post goes beyond that issue.

      HTH,
      Florian
