Jaeger (2008), J Memory Language, 59, 434-446 (ANOVA)



Since I get asked every now and then for the R code I promised in my 2008 JML paper on mixed logit models, I have posted it here. If you find this code useful, please consider citing the paper:

  • Jaeger, T. Florian. "Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models." Journal of Memory and Language 59.4 (2008): 434-446.

Please note, however, that the data analyzed in that paper are not mine: you need to acquire them from Inbal Arnon, who conducted the study. With Inbal's permission, here is the data file I used:

  • Data from the comprehension component of Study 2 from Arnon, Inbal. "Rethinking child difficulty: The effect of NP type on children's processing of relative clauses in Hebrew." Journal of Child Language 37.1 (2010): 27-57.

If you try to work your way through my paper, you may also find the following wiki pages from our lab with readings and more code helpful:

http://wiki.bcs.rochester.edu/HlpLab/StatsCourses/

As a quick introduction, you may find useful the talks on the conceptual background, common issues, and solutions for ordinary and multilevel regression models that some colleagues (Dale Barr, Roger Levy, Harald Baayen, Victor Kuperman, Austin Frank) and I gave at a workshop at the 2009 CUNY sentence processing conference. The talk slides are all linked from the schedule on that page. You'll find detailed walk-throughs, R code, and conceptual overviews.

I'd appreciate it if you left a comment here in case this was useful; it helps us see what we should be posting. Cheers.

3 thoughts on "Jaeger (2008), J Memory Language, 59, 434-446 (ANOVA)"

    Seth VW said:
    December 3, 2014 at 12:58 am

    Thanks for the link!


    tiflo responded:
    August 31, 2018 at 12:39 pm

    As several readers have pointed out, the R syntax for some of the calls in the attached script has changed over the years. There have also been changes in the algorithms that the lme4 library uses to fit logistic mixed-effects models, and in its convergence checks. This has two consequences. First, some of the parameter estimates are no longer exactly the same as those reported in the paper (but they are very close, and the patterns of significance are the same). Second, the model with the maximal by-subject random-effects structure now results in a non-convergence warning (lme4 version 1.1-17).
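    For readers who hit that warning: a common first remedy in current lme4 is to switch glmer's optimizer via glmerControl. This is not what the paper did (the option did not exist then); the optimizer choice and iteration cap below are illustrative, and the commented-out refit assumes the i.compr data frame from the script further down. A minimal sketch:

    ```r
    library(lme4)

    # Request the bobyqa optimizer with a higher iteration cap;
    # this often resolves convergence warnings for maximal models.
    ctrl <- glmerControl(optimizer = "bobyqa",
                         optCtrl = list(maxfun = 2e5))

    # Refit the maximal model with this control object, e.g.:
    # i.ml.F1 <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child),
    #                  data = i.compr, family = binomial, control = ctrl)
    ```

    If the warning persists across several optimizers, that is usually a sign that the random-effects structure is too rich for the data, and simplifying it (as the script below does) is the more honest fix.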

    And in case it is of interest, here is the slightly updated code. I’ve tried to make minimal edits, so as to preserve the full historical shame I feel for this code ;).

    ##########################################
    # running logit mixed model
    # import Inbal's data
    i <- data.frame(read.delim("~/Downloads/inbal.tab"))

    # select comprehension data only
    i.compr <- subset(i, modality == 1 & Correct != "#NULL!" & !is.na(Extraction) & !is.na(NPType))

    # defining some variable values
    i.compr$Correct <- as.factor(as.character(i.compr$Correct))
    i.compr$RCtype<- as.factor(ifelse(i.compr$Extraction == 1, "subject RC", "object RC"))
    i.compr$NPtype <- as.factor(ifelse(i.compr$NPType == 1, "lexical", "pronoun"))
    i.compr$Condition <- paste(i.compr$RCtype, i.compr$NPtype)

    library(lme4)
    library(lattice)  # for trellis.device() and xyplot() below
    i.L <- lmList(Correct ~ Extraction * NPType | child, data = i.compr)

    trellis.device(color=F)
    xyplot(Correct ~ Extraction | child,
    data=i.compr,
    main="% correct answers",
    # ylim=c(5,7),
    panel=function(x, y){
    panel.xyplot(x, y)
    # panel.loess(x, y, span=1)
    panel.lmline(x, y, lty=2)
    }
    )

    contrasts(i.compr$RCtype) = cbind("Subject" = c(0,1))
    contrasts(i.compr$NPtype) = cbind("Pronoun" = c(0,1))
    i.ml.F1 <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1)
    i.ml.F1.reduced <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.reduced)
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)

    i.compr$ncRCtype <- scale(as.numeric(ifelse(i.compr$Extraction == 1, 1, -1)), scale=F)
    i.compr$ncNPtype <- scale(as.numeric(ifelse(i.compr$NPType == 1, -1, 1)), scale=F)
    i.compr$ncInt <- (i.compr$ncRCtype - mean(i.compr$ncRCtype)) *
    (i.compr$ncNPtype - mean(i.compr$ncNPtype))

    contrasts(i.compr$RCtype) = cbind("Subject vs. object" = c(-1,1))
    contrasts(i.compr$NPtype) = cbind("Pronoun vs. noun" = c(-1,1))
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 + NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)

    par(mar=c(4.5,2.2,0.2,2), cex.lab=1.5, cex=1.2)
    graphics::hist(unlist(ranef(i.ml.F1.reduced))[1:24], main="", xlab="log-odds", ylab="N")
    graphics::hist(unlist(ranef(i.ml.F1.reduced))[25:48], main="", xlab="log-odds", ylab="N")
    graphics::hist(unlist(ranef(i.ml.F1.final)), main="", xlab="log-odds", ylab="N")

    i.ml.F2 <- glmer(Correct ~ Extraction * NPType + (1 | itemby4), data = i.compr, family="binomial")
    summary(i.ml.F2)

    i.ml.F12 <- glmer(Correct ~ RCtype * NPtype + (1 | child) + (1 | itemby4), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F12)

    i.ml.F12 <- glmer(Correct ~ ncRCtype + ncNPtype + ncInt + (1 + RCtype | child) + (1 + RCtype | itemby4), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F12)


    tiflo responded:
    August 31, 2018 at 12:40 pm

    And thanks to Chris Pike for making me aware of these issues.


Questions? Thoughts?