Thanks to some recently developed tools, it’s becoming very convenient to do full Bayesian inference for generalized linear mixed-effects models. First, Andrew Gelman et al. have developed Stan, a general-purpose sampler (like BUGS/JAGS) with a nice R interface which samples from models with correlated parameters much more efficiently than BUGS/JAGS. Second, Richard McElreath has written glmer2stan, an R package that essentially provides a drop-in replacement for the lmer command that runs Stan on a generalized linear mixed-effects model specified with a lme4-style model formula.
This means that, in many cases, you can simply replace calls to (g)lmer() with calls to glmer2stan():

```r
library(lme4)
lmer.fit <- glmer(accuracy ~ (1|item) + (1+condition|subject) + condition,
                  data=data, family='binomial')
summary(lmer.fit)

library(glmer2stan)
library(rstan)
stan.fit <- glmer2stan(accuracy ~ (1|item) + (1+condition|subject) + condition,
                       data=data, family='binomial')
stanmer(stan.fit)
```
There’s the added benefit that you get a sample from the full, joint posterior distribution of the model parameters.
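For instance, you can pull those samples out directly with rstan's extract() function (a minimal sketch, assuming stan.fit is the fitted model from above; the coefficient name beta_condition is hypothetical — inspect the fitted object to see what names glmer2stan actually assigns):

```r
library(rstan)
post <- extract(stan.fit)   # named list of posterior draws, one entry per parameter
str(post)                   # inspect which parameters are available

# hypothetical name: posterior mean and 95% interval for the condition coefficient
mean(post$beta_condition)
quantile(post$beta_condition, c(0.025, 0.975))
```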
If you’d rather call Stan yourself, glmer2stan() also acts as a “Stan model compiler”, and can return a list with the inputs required to compile and sample from the model using Stan:
```r
stan.input <- glmer2stan(accuracy ~ (1|item) + (1+condition|subject) + condition,
                         data=data, family='binomial', sample=FALSE)
stan.output <- with(stan.input, stan(model_code=model, pars=pars, data=data))
```
If you go this route, you’ll have more control over the Stan end of things: you can re-use compiled models on different data sets, or even go in and edit the actual Stan model code to tweak or extend it. The tradeoff is that you lose some of the benefits of running things through the glmer2stan wrapper, like the stanmer() and varef() helper functions that translate the Stan output into something more like lmer output.
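Re-using a compiled model is done via the fit argument of rstan's stan() function, which skips the slow C++ compilation step (a sketch, assuming stan.input2 came from a second glmer2stan(..., sample=FALSE) call on the new data set):

```r
# compile and sample once...
stan.output <- with(stan.input, stan(model_code=model, pars=pars, data=data))

# ...then sample from the same compiled model on a different data set
stan.output2 <- stan(fit=stan.output, data=stan.input2$data,
                     pars=stan.input2$pars)
```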
Now, why should you care? Having samples from the posterior distributions of all the model parameters means that you can easily construct Bayesian confidence intervals for, say, the correlation between a random slope and intercept, whereas lme4 returns only a point estimate of the random-effect correlations. It also makes it simple, both technically and conceptually, to construct Bayesian confidence intervals for the fixed-effects coefficients (further automated by stanmer()), which makes assessing the statistical significance of fixed effects in linear mixed-effects models quite a bit simpler.
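As a sketch of the first point, an interval for the slope–intercept correlation can be computed from the posterior draws of the random-effect covariance matrix (the name Sigma_subject is hypothetical; check the fitted object for the name glmer2stan actually uses):

```r
post <- extract(stan.fit)

# suppose Sigma_subject holds draws of the 2x2 by-subject covariance matrix
Sigma <- post$Sigma_subject
rho <- Sigma[, 1, 2] / sqrt(Sigma[, 1, 1] * Sigma[, 2, 2])

# 95% Bayesian confidence interval for the slope-intercept correlation
quantile(rho, c(0.025, 0.975))
```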
Neither rstan nor glmer2stan is available on CRAN, so you can’t simply install.packages() them, but they’re still easy to install. To install rstan and its dependencies:
```r
install.packages('inline')
install.packages('Rcpp')
options(repos = c(getOption("repos"),
                  rstan = "http://wiki.rstan-repo.googlecode.com/git/"))
install.packages('rstan', type = 'source')
```
This will take a little while and produce a lot of gibberish in the R console (things need to be compiled). Next, to install glmer2stan:
```r
options(repos = c(getOption('repos'), glmer2stan = "http://xcelab.net/R"))
install.packages('glmer2stan', type = 'source')
```
I’ve noticed just a couple of wrinkles so far:
- For the binomial link function, you have to manually convert your response variable (‘accuracy’ here) to a numeric variable with values of 0 or 1. lmer takes care of this for you (if accuracy is a factor).
- On a related note, you also need to manually convert factors specifying levels of random effects into integer indices (e.g. 1 for the first subject, 2 for the second, etc.), like so:

```r
data <- transform(data, subject.index = as.integer(as.factor(subject)))
stan.fit <- glmer2stan(accuracy ~ 1 + (1 | subject.index),
                       data=data, family='binomial')
```
- Because Stan works by compiling the model code into C++ before sampling, fitting even a moderately complex model can take noticeably longer than lmer. However, most of this time goes to compilation (rather than actual sampling), which doesn’t scale with data set size. I’ve only used this on one relatively small data set thus far, but word on the street is that Stan performs very well at reasonable scale (even without GLMM-specific optimization).
- glmer2stan is under active development, so it might not be the most stable or polished thing. See the GitHub repository for more information and some examples.
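The response-variable conversion from the first wrinkle above can be done with something like this (a sketch, assuming accuracy is a factor whose second level codes a correct response — check your factor's level order with levels() first):

```r
# map the factor's two levels to 0/1 (first level -> 0, second level -> 1)
data <- transform(data, accuracy = as.integer(as.factor(accuracy)) - 1L)
```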