The ‘softer kind’ of tutorial on linear mixed effect regression

Posted on

I was recently pointed to this nice and very accessible tutorial by Bodo Winter (at UC Merced) on linear mixed effects regression and how to run these models in R. If you don’t have much (or any) background in this type of model, I recommend pairing it with a good conceptual introduction to these models, like Gelman and Hill (2007), and perhaps some slides from our LSA 2013 tutorial.

There are a few things I’d like to add to Bodo’s suggestions regarding how to report your results:

  1. Be clear about how you coded the variables, since this changes the interpretation of the coefficients (the betas that are often reported). E.g., state whether you sum- or treatment-coded your factors, and whether you centered or standardized continuous predictors, etc. As part of this, also be clear about the direction of the coding. For example, state that you “sum-coded gender as female (1) vs. male (-1)”. Alternatively, report your results in a way that clearly states the directionality (e.g., “Gender=male, beta = XXX”).
  2. Please also report whether collinearity was an issue, e.g., by reporting the highest correlation among the fixed effects.
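To make the first point concrete, here’s a minimal sketch in R of the coding choices mentioned above (the data frame and variable names are made up for illustration):

```r
# Hypothetical example: a two-level factor and a continuous predictor.
d <- data.frame(gender    = factor(c("female", "male", "female", "male")),
                frequency = c(10, 200, 35, 80))

# Sum-code gender as female (1) vs. male (-1)
# (levels are alphabetical, so female is level 1):
contrasts(d$gender) <- contr.sum(2)

# R's default is treatment coding (contr.treatment), where the first
# level serves as the reference level.

# Center a continuous predictor:
d$frequency.c <- d$frequency - mean(d$frequency)

# Or standardize it (center and scale):
d$frequency.z <- scale(d$frequency)
```

Under sum coding, the intercept is interpreted as the grand mean (rather than the mean of the reference level), which is why stating the coding scheme matters for interpreting the betas.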

Happy reading.


Updated slides on GLM, GLMM, plyr, etc. available

Posted on

Some of you asked for the slides to the mixed effects regression class I taught at the 2013 LSA Summer Institute in Ann Arbor, MI. The class covered Generalized Linear Models, Generalized Linear Mixed Models, extensions beyond the linear model, simulation-based approaches to assessing the validity (or power) of your analysis, data summarization and visualization, and reporting of results. The class included slides from Maureen Gillespie, Dave Kleinschmidt, and Judith Degen (see above link). Dave even came by Ann Arbor and gave his lecture on the awesome power of plyr (and reshape etc.), which I recommend. You might also just browse through the slides to get an idea of some new libraries (such as stargazer for quick and nice-looking LaTeX tables). There’s also a small example to work through for time series analysis (for beginners).

Almost all slides were created in knitr and LaTeX (very conveniently integrated into RStudio — I know some purists hate it, but c’mon), so that the code on the slides is the code that generated the output on the slides. Feedback welcome.



Workshop announcement (Tuebingen): Advances in Visual Methods for Linguistics

Posted on

This workshop on data visualization might be of interest to a lot of you. I wish I could just hop over the pond.

  • Date: 24-Sept-2014 – 26-Sept-2014
  • Location: Tuebingen, Germany
  • Contact Person: Fabian Tomaschek (contact@avml-meeting.com)
  • Web Site: http://avml-meeting.com
  • Call Deadlines: 21 March / 18 April

The AVML meeting offers a meeting place for linguists from all fields who are interested in elaborating and improving their data visualization skills and methods. The meeting consists of a one-day hands-on workshop […]

Is my analysis problematic? A simulation-based example

Posted on Updated on

This post is in reply to a recent question on the ling-R-lang mailing list by Meredith Tamminga. Meredith was wondering whether an analysis she had in mind for her project was circular, that is, whether the analysis itself would produce the pattern of results predicted by the hypothesis she was interested in testing. I felt her question (described below in more detail) was an interesting example that might best be answered with some simulations. Reasoning through an analysis can, of course, help a lot in understanding (or better, as in Meredith’s case, anticipating) problems with the interpretation of the results. Not infrequently, however, I find that intuition fails or isn’t sufficiently conclusive. In those cases, simulations can be a powerful tool for understanding your analysis. So, I decided to give it a go and use this as an example of how one might approach this type of question.

Figure 1: Results of 16 simulated priming experiments with a robust priming effect (see title for the true relative frequency of each variant in the population). For explanation see text below.
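As a rough illustration of the general approach (a hypothetical setup, not the actual analysis discussed in this post): simulate many experiments under a known “truth,” run the intended analysis on each, and check whether it recovers — or spuriously produces — the pattern predicted by the hypothesis of interest.

```r
# Hypothetical simulation sketch: 16 'experiments', each drawing binary
# responses with a known true relative frequency of one variant.
set.seed(1)
n.trials <- 200
true.p   <- 0.6   # assumed true relative frequency of one variant

simulate.experiment <- function() {
  responses <- rbinom(n.trials, size = 1, prob = true.p)
  mean(responses)   # observed relative frequency in this simulated run
}

estimates <- replicate(16, simulate.experiment())
summary(estimates)  # how much do 16 'experiments' scatter around 0.6?
```

In a real application, `simulate.experiment()` would also run the planned analysis on the simulated data, so you can see whether the analysis behaves sensibly when the truth is known.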


Going full Bayesian with mixed effects regression models

Posted on Updated on

Thanks to some recently developed tools, it’s becoming very convenient to do full Bayesian inference for generalized linear mixed-effects models. First, Andrew Gelman et al. have developed Stan, a general-purpose sampler (like BUGS/JAGS) with a nice R interface which samples from models with correlated parameters much more efficiently than BUGS/JAGS. Second, Richard McElreath has written glmer2stan, an R package that essentially provides a drop-in replacement for the lmer command that runs Stan on a generalized linear mixed-effects model specified with a lme4-style model formula.

This means that, in many cases, you can simply replace calls to (g)lmer() with calls to glmer2stan():

library(lme4)
library(glmer2stan)

glmer.fit <- glmer(accuracy ~ (1|item) + (1+condition|subject) + condition,
                   data=data, family='binomial')
stan.fit <- glmer2stan(accuracy ~ (1|item) + (1+condition|subject) + condition,
                       data=data, family='binomial')

There’s the added benefit that you get a sample from the full, joint posterior distribution of the model parameters.
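For instance, assuming the object returned by glmer2stan() is a stanfit object (the parameter labels depend on how glmer2stan names them, so check before using them), you can inspect the posterior samples directly:

```r
# Pull out the posterior samples from the fitted model
# (stan.fit as obtained from glmer2stan above).
library(rstan)
post <- extract(stan.fit)
names(post)  # see how the fixed and random effects are labeled

# With the samples in hand, questions like "what is the posterior
# probability that a coefficient is positive?" become one-liners, e.g.:
# mean(post$some_coefficient > 0)   # 'some_coefficient' is a placeholder
```

This kind of direct probability statement is exactly what the full joint posterior buys you over a point estimate and standard error.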

Read on for more about the advantage of this approach and how to use it.

Using plyr to get intimate with your data

Posted on Updated on

I gave a short tutorial [pdf slides] at the LSA summer institute on one of my favorite R packages: plyr (another brilliant Hadley Wickham creation). This package provides a set of very nice and semantically clean functions for exploring and manipulating data. The basic process that these functions carry out is to split data up in some way, do something to each piece, and then combine the results from each piece back together again.

One of the most common tasks that I use this for is to do some analysis to data from each subject in an experiment, and collect the results in a data frame. For instance, to calculate the mean and variance of each subject’s reaction time, you could use:

library(plyr)

ddply(my.data, "subject.number", function(d) {
  return(data.frame(mean.RT=mean(d$RT), var.RT=var(d$RT)))
})
Plyr also provides a whole host of convenience functions. For instance, you could accomplish the same thing using a one-liner:

ddply(my.data, "subject.number", summarise, mean.RT=mean(RT), var.RT=var(RT))

There are lots more examples (as well as more background on functional programming in general and the other use cases for plyr) in the slides [pdf] (knitr source is here, too).

Knit from the command line

Posted on

Knitr is a great way to combine document markup (Latex, Markdown, HTML, etc.) with R code for data analysis and visualization. It pulls out the chunks of R code, runs them, and re-inserts the results into the document source (usually a .tex file), which can then be compiled as usual. Normally you would call it from an R console (or use something like RStudio), but what if you want to call it from the command line, like latex?  Here’s a little shell script that I use to automate the knitting of .Rnw files (combining R and Latex): knit.sh.

It calls knit() inside R, then runs pdflatex on the resulting file. It is very simple to use (you must, of course, have the knitr package installed in R):
knit.sh awesomefile.Rnw

This would produce awesomefile.pdf (as well as the intermediate file awesomefile.tex, and the extracted R commands, awesomefile.R). You might even rename the script as knit and put it somewhere on your search path (maybe /usr/local/bin/) to be even more fancy.
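For the curious, the core of such a wrapper might look roughly like the sketch below. This assumes only what is described above — that the script chains knit() and pdflatex — so the actual knit.sh may well differ in its details:

```shell
#!/bin/sh
# Sketch of a knit.sh-style wrapper: run knit() in R on the .Rnw file,
# then pdflatex on the resulting .tex file.
knit_rnw() {
  rnw="$1"               # e.g. awesomefile.Rnw
  base="${rnw%.Rnw}"     # strip the extension, e.g. awesomefile
  Rscript -e "library(knitr); knit('$rnw')" &&
  pdflatex "$base.tex"
}
```

Saved as knit.sh with a final line invoking knit_rnw "$1", this reproduces the usage shown above.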