R-code for visual model summaries: linear mixed models

Posted on Updated on


Here is some code to summarize the coefficients of a linear mixed model that produces nice graphs like the following one (well, the curved arrows were added in powerpoint): [click to see a larger version]

An example slide of a linear mixed model summary

But first some background about the example model:

  • I gave a talk at AMLaP 2007 in Turku Finland on Rational Speakers: speakers help processing when it is most necessary. The talk presents an self-paced reading experiments based on object-extracted relative clause stimuli taken from the Wall Street Journal. They were sampled in a way that half of them contained a relativizer (“that”) and half did not. Then I created a matching stimulus for each of the randomly sampled relative clauses: if the original had “that”, the match didn’t have “that”, and vice versa. The goal of the experiment was to see whether “that” facilitates comprehension (as e.g. Race and MacDonald, 2003 have shown) and – more crucial to my research question – whether “that” was only present in the original stimuli when otherwise the relative clause would have been really hard to process (but not, if it already was easy to process). The results showed just that. In a second step I then wanted to know what properties of the stimuli made comprehending them easy or hard. For that I used a linear mixed model analysis regressing multiple potential effects against the (log) reading per word in the subject region of the relative clause. The predictors (see graph) included measures of the length of the subject region and measures of the information density at the onset of the relative clause. One of the results was the first evidence that constituent predictability (the predictability of a relative clause) influences reading time outside of an ambiguity over and above the influence of word predictability. Furthermore, this effect interacts with “that”-presence: “that” generally helps comprehension, but apparently it does so more, if the relative clause is unexpected (see also Race & MacDonald, 2003 for evidence that is compatible with this hypothesis).

Awesomeness. So, here is the R-code that printed those nice plots. Of course, you have to substitute your own model, etc. And one note of caution about collinearity: printing model coefficients (be it the numbers or the graph) is kind of meaningless, if you aren’t sure that the standard errors (and hence the confidence intervals) are not inflated because of collinearity in the model (to see whether your model contains collinearity, you can use the vif() or kappa() commands in R, or look at the correlation table for fixed effects that comes with the print output for linear mixed models). How to deal with collinearity is another matter (and another post).



# an example linear mixed model. we'll call it mls
mls <- lmer(log(RTsubj) ~
   R + C + RC +
   r_Ppreceding +
   r_CndP_preceding +
   r_Pfollowing +
   R_x_Ppreceding +
   r_R_x_CndP +
   r_R_x_Pfollowing +
   Lw +
   r_Ll +
   (1|SubjectID) +
   (1|ItemID),
   data=d
)

# we cannot trust the standard error estimates of the mixed model (it's anti-conservative)
# so we use Harald Baayen's function to calculate p-values and confidence intervals based
# on corrected estimates of the standard errors using MCMC sampling (see Baayen, Davidson, & Bates, 2007)
pls <- pvals.fnc(mls, nsim=20000)

# plot all coefficient values plus error bars (I'll exclude the intercept because it's of no significance here)
# store the coefficient (mean), the lower, and the higher confidence limit
# for all coefficients except for the intercept (hence the [-1] below).

c <- as.vector(as.numeric(pls[["fixed"]]$MCMCmean))[-1]
lc <- as.vector(as.numeric(pls[["fixed"]]$HPD95lower))[-1]
hc <- as.vector(as.numeric(pls[["fixed"]]$HPD95upper))[-1]
n <-  mls@cnames[[".fixed"]][-1]

# set some graphics parameters (font size, etc.)
par(mar=c(6.5,4.5,2.5,.1), cex= 1.3, cex.main=1.5, cex.lab=.9, cex.axis=.7)

# the actual plot
barplot(c,
   axisnames = F,
   ylim= c(-.1, .2),
   main= "Effects on RC subject region",
   ylab= "coefficient estimates",
   col=c(rep("#BBEE00",3), rep("#CCCCCC",3), rep("#DDFF88",3), rep("#CCCCCC",2)),
   las = 2, border=NA)

axis(side=1, at= seq(0.7,12.8,1.2),
   labels= c("REL present",
      "Original",
      "REL present\nx Original",
      "log P(w-1)",
      "log P(RC)",
      "log P(w+1)",
      "REL present\nx log P(w-1)",
      "REL present\nx log P(RC)",
      "REL present\nx log P(w+1)",
      "RC subject\nlength (w)",
      "RC subject\nlength (g)"
   ),
   tick=F,
   cex.axis=1,
   las=2)

# Add the error bars, based on the confidence intervals
errbar(seq(0.7,12.8,1.2), c, hc, lc, lwd=2, add=TRUE)
Advertisements

Questions? Thoughts?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s