Jaeger (2008), J Memory Language, 59, 434-446 (ANOVA)



Since I get asked every now and then for the R code I promised in my 2008 JML paper on mixed logit models, I have posted it here. If you find this code useful, please consider citing the paper:

  • Jaeger, T. Florian. "Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models." Journal of Memory and Language 59.4 (2008): 434-446.

Please note, however, that the data analyzed in that paper are not mine: you need to acquire them from Inbal Arnon, who conducted the study. With Inbal's permission, here is the data file I used:

  • Data from the comprehension component of Study 2 from Arnon, Inbal. "Rethinking child difficulty: The effect of NP type on children's processing of relative clauses in Hebrew." Journal of Child Language 37.1 (2010): 27-57.

If you try to work your way through my paper, you may also find the following wiki pages from our lab with readings and more code helpful:

http://wiki.bcs.rochester.edu/HlpLab/StatsCourses/

As a quick introduction, you may find useful the talks on the conceptual background, common issues, and solutions for ordinary and multilevel regression models that some colleagues (Dale Barr, Roger Levy, Harald Baayen, Victor Kuperman, Austin Frank) and I gave at a workshop at the 2009 CUNY sentence processing conference. The talk slides are all linked from the schedule on that page. You'll find detailed walk-throughs, R code, and conceptual overviews.

I'd appreciate it if you left a comment here in case this was useful; it helps us see what we should be posting. Cheers.

3 thoughts on "Jaeger (2008), J Memory Language, 59, 434-446 (ANOVA)"

    Seth VW said:
    December 3, 2014 at 12:58 am

    Thanks for the link!


    tiflo responded:
    August 31, 2018 at 12:39 pm

    As several readers have pointed out, the R syntax for some of the calls in the attached script has changed over the years. There have also been changes in the algorithms that the lme4 library uses to fit logistic mixed-effects models, and in its convergence checks. This has two consequences. First, some of the parameter estimates are no longer exactly the same as those reported in the paper (but they are very close, and the patterns of significance are the same). Second, the model with the maximal by-subject random-effects structure now results in a non-convergence warning (lme4 version 1.1-17).
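    For readers who hit that warning: a common first remedy in current lme4 is to switch glmer's optimizer via glmerControl. This is not what the paper did (the option did not exist then); the optimizer choice and iteration cap below are illustrative, and the commented-out refit assumes the i.compr data frame from the script further down. A minimal sketch:

    ```r
    library(lme4)

    # Request the bobyqa optimizer with a higher iteration cap;
    # this often resolves convergence warnings for maximal models.
    ctrl <- glmerControl(optimizer = "bobyqa",
                         optCtrl = list(maxfun = 2e5))

    # Refit the maximal model with this control object, e.g.:
    # i.ml.F1 <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child),
    #                  data = i.compr, family = binomial, control = ctrl)
    ```

    If the warning persists across several optimizers, that is usually a sign that the random-effects structure is too rich for the data, and simplifying it (as the script below does) is the more honest fix.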

    And in case it is of interest, here is the slightly updated code. I’ve tried to make minimal edits, so as to preserve the full historical shame I feel for this code ;).

    ##########################################
    # running logit mixed model
    # import Inbal's data
    i <- data.frame(read.delim("~/Downloads/inbal.tab"))

    # select comprehension data only
    i.compr <- subset(i, modality == 1 & Correct != "#NULL!" & !is.na(Extraction) & !is.na(NPType))

    # defining some variable values
    i.compr$Correct <- as.factor(as.character(i.compr$Correct))
    i.compr$RCtype<- as.factor(ifelse(i.compr$Extraction == 1, "subject RC", "object RC"))
    i.compr$NPtype <- as.factor(ifelse(i.compr$NPType == 1, "lexical", "pronoun"))
    i.compr$Condition <- paste(i.compr$RCtype, i.compr$NPtype)

    library(lme4)
    library(lattice)  # for trellis.device() and xyplot() below
    i.L <- lmList(Correct ~ Extraction * NPType | child, data = i.compr)

    trellis.device(color=F)
    xyplot(Correct ~ Extraction | child,
    data=i.compr,
    main="% correct answers",
    # ylim=c(5,7),
    panel=function(x, y){
    panel.xyplot(x, y)
    # panel.loess(x, y, span=1)
    panel.lmline(x, y, lty=2)
    }
    )

    contrasts(i.compr$RCtype) = cbind("Subject" = c(0,1))
    contrasts(i.compr$NPtype) = cbind("Pronoun" = c(0,1))
    i.ml.F1 <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1)
    i.ml.F1.reduced <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.reduced)
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)

    i.compr$ncRCtype <- scale(as.numeric(ifelse(i.compr$Extraction == 1, 1, -1)), scale=F)
    i.compr$ncNPtype <- scale(as.numeric(ifelse(i.compr$NPType == 1, -1, 1)), scale=F)
    i.compr$ncInt <- (i.compr$ncRCtype - mean(i.compr$ncRCtype)) *
    (i.compr$ncNPtype - mean(i.compr$ncNPtype))

    contrasts(i.compr$RCtype) = cbind("Subject vs. object" = c(-1,1))
    contrasts(i.compr$NPtype) = cbind("Pronoun vs. noun" = c(-1,1))
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 + RCtype * NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)
    i.ml.F1.final <- glmer(Correct ~ RCtype * NPtype + (1 + NPtype | child), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F1.final)

    par(mar=c(4.5,2.2,0.2,2), cex.lab=1.5, cex=1.2)
    graphics::hist(unlist(ranef(i.ml.F1.reduced))[1:24], main="", xlab="log-odds", ylab="N")
    graphics::hist(unlist(ranef(i.ml.F1.reduced))[25:48], main="", xlab="log-odds", ylab="N")
    graphics::hist(unlist(ranef(i.ml.F1.final)), main="", xlab="log-odds", ylab="N")

    i.ml.F2 <- glmer(Correct ~ Extraction * NPType + (1 | itemby4), data = i.compr, family="binomial")
    summary(i.ml.F2)

    i.ml.F12 <- glmer(Correct ~ RCtype * NPtype + (1 | child) + (1 | itemby4), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F12)

    i.ml.F12 <- glmer(Correct ~ ncRCtype + ncNPtype + ncInt + (1 + RCtype | child) + (1 + RCtype | itemby4), data = i.compr, family="binomial", nAGQ=1) # nAGQ=1 is the Laplace approximation (formerly method="Laplace")
    summary(i.ml.F12)


    tiflo responded:
    August 31, 2018 at 12:40 pm

    And thanks to Chris Pike for making me aware of these issues.


Questions? Thoughts?