If you’re running chi-square tests on categorical data and many of your cells have very low counts (or even zeros), be careful in how you interpret the result. There’s a nice article by Andrew Gelman on this topic, in which he shows that a large number of low-count cells can make it harder to detect the signal (that is, a significant deviation from the expected values in one part of the table). Put differently, you might have a real pattern in the data but fail to detect it. I don’t think this is much of a problem for most of the tests we conduct, since contingency tables in psycholinguistic and linguistic research are usually rather small. I can’t recall the last time I saw anything larger than a 3×4 or the like. From what I understand of Gelman’s post, the problem he points out becomes more serious the larger the table is.
17 Nov 11
Lots of zeros? Be careful with your chi-square (exact or not) and the like
I had always heard you need a minimum of 5 per cell.
http://frank.mtsu.edu/~dkfuller/notes302/chisquare.pdf
Hi Brandon,
what you are referring to is that the chi-square test is known to be biased if the expected cell count (not the actual count) is lower than 5. This is a rule of thumb. The problem described in my blog post is a separate issue: it holds even when that rule is satisfied.
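To make the expected-versus-actual distinction concrete, here is a minimal Python sketch. The table values and the function names `expected_counts` and `cells_below` are made up for illustration, and the threshold of 5 is just the rule of thumb mentioned above. Expected counts under independence are computed as row total × column total / grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def cells_below(table, threshold=5.0):
    """Return (row, column, expected) for every cell whose EXPECTED count is below threshold."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


# Hypothetical 2x3 table of observed counts (illustrative values only).
observed = [
    [12, 0, 3],
    [9, 1, 30],
]

for i, j, e in cells_below(observed):
    print(f"cell ({i}, {j}): observed {observed[i][j]}, expected {e:.2f}")
```

In this made-up table, the cell with an observed count of 3 is not flagged, because its expected count is well above 5. Only the two cells in the sparse middle column are flagged, since that column's total is so small.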
HTH,
Florian
Got it, thanks for the clarification. ~b