Going full Bayesian with mixed effects regression models

Posted on Updated on

Thanks to some recently developed tools, it’s becoming very convenient to do full Bayesian inference for generalized linear mixed-effects models. First, Andrew Gelman et al. have developed Stan, a general-purpose sampler (like BUGS/JAGS) with a nice R interface which samples from models with correlated parameters much more efficiently than BUGS/JAGS. Second, Richard McElreath has written glmer2stan, an R package that essentially provides a drop-in replacement for the lmer command that runs Stan on a generalized linear mixed-effects model specified with a lme4-style model formula.

This means that, in many cases, you simply simply replace calls to (g)lmer() with calls to glmer2stan():

lmer.fit <- glmer(accuracy ~ (1|item) + (1+condition|subject) + condition, 
                  data=data, family='binomial')
stan.fit <- glmer2stan(accuracy ~ (1|item) + (1+condition|subject) + condition, 
                       data=data, family='binomial')

There’s the added benefit that you get a sample from the full, joint posterior distribution of the model parameters

If you’d rather call stan() yourself, glmer2stan() also acts as a “Stan model compiler”, and can return a list with the input parameter required to compile and sample from the model using Stan:

stan.input <- glmer2stan(accuracy ~ (1|item) + (1+condition|subject) + condition, 
                         data=data, family='binomial', sample=FALSE)
stan.output <- with(stan.input, stan(model_code=model, pars=pars, data=data))

If you go this route, you’ll have more control over the Stan end of things: you can re-use compiled models on different data sets, or even go in and edit the actual Stan model code to tweak or extend it. The tradeoff is that you lose some of the benefits of running things with the glmer2stan wrapper, like the stanmer() and varef() helper functions that translate the Stan output into something more like lmer output.

Now, why should you care? Having samples from the posterior distributions of all the model parameters means that you can easily construct Bayesian confidence intervals for, say, the correlation between a random slope and intercept, unlike with lme4 which just returns a point estimate of the random effect correlations. It also makes it simple—both technically and conceptually—to construct Bayesian confidence intervals for fixed effects coefficients (further automated by stanmer(), which makes the issue of assessing statistical significance of fixed effects in linear mixed-effects models quite a bit simpler.

Neither rstan or glmer2stan is available on CRAN, so you can’t simply install.packages(), but it’s easy to install them nevertheless. To install rstan and its dependencies:

options(repos = c(getOption("repos"), rstan = "http://wiki.rstan-repo.googlecode.com/git/"))
install.packages('rstan', type = 'source')

This will take a little while and produce a lot of gibberish in the R console (things need to be compiled). Next, to install glmer2stan:

options(repos=c(getOption('repos'), glmer2stan="http://xcelab.net/R"))
install.packages('glmer2stan', type='source')

I’ve noticed just a couple of wrinkles so far:

  1. For the binomial link function, you have to manually convert your response variable (‘accuracy’, here) to be a numeric variable with values of 0 or 1. lmer takes care of this for you (if accuracy is a factor).
  2. On a related note, you also need to manually convert factors specifying levels of random effects into level indices (e.g. 1 for the first subject, 2 for the second, etc.), like so:
    data <- transform(data, subject.index=as.integer(as.factor(subject)))
    stan.fit <- glmer2stan(accuracy ~ 1 + (1 | subject.index), data=data, family='binomial'
  3. Because Stan works by compiling the model code into c++ before sampling, the time required to fit even a moderately complex model can be noticeably longer than for lmer. However, most of this comes from the compilation time (rather than actually sampling), which doesn’t scale with data set size. I’ve only used this on one, relatively small data set thus far, but word on the street is that Stan performs very well at reasonable scale (even without glmm-specific optimization).
  4. glmer2stan is under active development, so might not be the most stable or polished thing. See the GitHub repository for more information and some examples.

14 thoughts on “Going full Bayesian with mixed effects regression models

    Noah Motion said:
    December 13, 2013 at 10:19 pm

    It is (in my opinion) a minor point, but generally speaking, they’re not called “confidence intervals” in Bayesian data analysis. I’m not sure if there’s a standard name, but I’ve seen “credible interval” and, when estimating the interval in the appropriate way, the “highest density interval” or HDI.


      tiflo said:
      December 14, 2013 at 5:43 am

      Hi Noah,

      Absolutely. I think Dave was trying to keep things both succinct and accessible to non-Bayesians.



        Shravan Vasishth said:
        December 14, 2013 at 6:29 am

        Hi guys,

        In my opinion it is worth working within the RStan and/or JAGS (in R) language to define your models, instead of using a front-end for compiling and running these models. This helps you understand better what you are fitting. I think that one thing that has happened with the convenient lmer function is that we end up producing a lot of garbage in published work. We don’t really understand what the underlying model is. By we I mean many of us. The convenience leads to confusion. Of course, you have to pay the price of learning a bit more about the theory behind these models, but it seems to me it’s worth it. I just got done teaching bayesian methods to grad students in Linguistics at Potsdam. The big result for me is that it can be done.

        Liked by 2 people

          tiflo said:
          December 14, 2013 at 5:26 pm

          Hi Shravan,

          I agree that it’s worth learning Bayesian statistics and to directly work with RStan, JAGS, or alike. Dave Knill occasionally teaches such a class in our department and (I am sure you know) there’s a great new book by John Kruschke “Doing Bayesian Data Analysis”. If you don’t mind, do you want to post a link to your course materials here?

          I disagree, however, that one should expect everyone to delve as deep as possible into statistics (perhaps you didn’t mean that, but I read your post as saying that). Interfaces like lmer in R (or, say, SPSS, brrr) are necessary and, while they come at a cost for the field (and dedicated reviewers!), they are overall helpful. Requiring everyone to only use statistics they fully understand, in my view is equivalent to only allowing researchers to only use an eye-tracker, MEG, ERP, etc. when they exactly understand the inner workings of it. It’s invaluable to have researchers who meet that criterion in the field, but we wouldn’t be able to do anything if only everyone would have to be like that.

          I also have grown increasingly convinced that most harm (in terms of spurious results or null results) originates in bad methodological and statistical choices that are basic in nature, rather than specific to the chosen method of analysis. For example, adding and removing covariates without thinking about inflates Type I error rates, running some more subjects without thinking about the same issue (see Simmons et al 2011 in PsychScience for a great demonstration of the harm this can do), the failure to think of the nature of the scale when interpreting supposed interactions (the literature is full of likely spurious results that depend on the assumption that the effects are linear in raw reaction times), etc.

          Anyway, I think you know that I like you wish that we’d all be more careful in interpreting our analyses and spend more time understanding them, but I think all we can do is to learn it ourselves, try to teach those who are interested in it, and try to help others during reviews? I don’t see that interfaces like lmer make things worse. I hope that having an additional interface that lets you access the distribution of parameters over the mixed model will be a helpful starter for at least some. I’m pretty sure that some users will be much too curious to just use the interface and find its restrictions too limiting.



    Marc said:
    December 15, 2013 at 10:48 am

    You can also use R-INLA package: http://www.r-inla.org/


    seisen2013 said:
    December 16, 2013 at 3:34 am

    Reblogged this on Experimental Linguistics in the Field and commented:
    Interesting new statistics tools


    Shravan Vasishth said:
    December 17, 2013 at 4:19 am

    Hi Florian and everyone else,

    My course website is: http://www.ling.uni-potsdam.de/~vasishth/advanceddataanalysis.html

    I just posted some Stan and JAGS linear mixed models there. I’ll post some more complex ones soon.

    I will be teaching this course every winter semester, and will be posting a major extension of the lecture notes soon (by early Jan). I will also be teaching an intro course this coming summer on frequentist methods, the lecture notes for that will be up in early April (a revision of my old lecture notes, which are online).

    You wrote: “Requiring everyone to only use statistics they fully understand, in my view is equivalent to only allowing researchers to only use an eye-tracker, MEG, ERP, etc. when they exactly understand the inner workings of it. It’s invaluable to have researchers who meet that criterion in the field, but we wouldn’t be able to do anything if only everyone would have to be like that.”

    I don’t know much about things like MEG. But for eye tracking I would actually require that the user knows quite a bit more than just the fact that the tracker tracks the eye while you read or view a scene.

    ” I don’t see that interfaces like lmer make things worse.”

    For people like you, definitely not. For those using lmer or aov to make a binary decision (reject null or accept null) after typing lmer(…) or aov(…) and hitting enter, it can indeed be very very harmful. My lecture note (linked above) have an example.

    I don’t think people need to become statisticians. But there is a halfway point we have to be able to reach before fitting such models. That’s what I’m trying to achieve at Potsdam, and my conclusion is that it can be done. You don’t have to know how to analytically compute a conditional distribution given a joint distribution, but you need to know conceptually how it works. You have to have a feeling for conjugate analyses, even if you can’t derive one analytically yourself (I don’t mean you, Florian ;).


      tiflo said:
      December 17, 2013 at 7:16 am

      well, because you clarified that at the end, I did approve this post ;). Thanks for the links. They will be really helpful.


    Shravan Vasishth said:
    December 17, 2013 at 6:22 am

    I also wanted to say that the reason I don’t like Kruschke’s book much is that it wastes a lot of space trying to attack the frequentist approach and to demonstrate the advantage of the bayesian approach. What’s wrong with that is frequentist methods have their place (e.g., when you have lots of data, your posterior distribution is going to be dominated by the likelihood anyway; bayes comes into its own when you have little data and a lot of prior beliefs). I don’t think we need to attack frequentist methods in order to appreciate the beauty of bayesian methods. We don’t need to create a new generation of bayesian fanatics.

    In my opinion, the absolute best book on bayes out there is the one written by Scott Lynch. He understood what the essential ideas are that need to be communicated, and he knows how to communicate them. Books like Gelman et al’s 2014 BDA book and dozens of other books written by statisticians are addressed basically to themselves and people who know as much as them. Gelman et al 2014 and other books are worth reading, but only after you already know quite a bit.

    Scott Lynch’s book actually tries to explain things to a newcomer. Calculus is needed for his book, but you can get by without doing a single derivation as long as you understand that probability for us just the area under a curve.

    Here is Scott’s book: http://www.amazon.com/Introduction-Statistics-Estimation-Scientists-Behavioral/dp/038771264X

    A close second to Lynch’s book is the one by Lunn et al., unfortunately written for WinBUGS users, and getting minimum working examples going can be a pain (I will post self-contained code eventually, on my course home page).

    Here is Lunn’s et al’s book: http://www.crcpress.com/product/isbn/9781584888499


      tiflo said:
      December 17, 2013 at 7:23 am


      I really like the Scott Lynch book, too. It’s a great, conceptual and absolutely accessible introduction to many of the critical Bayesian concepts. The chapters on sampling and the stepwise introduction to different methods (Gibbs, Metropolis-Hastings) is really well done. I think reading Lynch along with Gelman and Hill 2007 works well for people with little to no background. Both come with R examples, code, and Gelman and Hill 2007 comes with its own R library. Thanks for posting the links.

      I’ve only read the first few chapters of Kruschke’s book (when someone living at my place ordered it, sneaky me), but I really enjoyed them. It struck me as a playful, but clear, approach with many examples, code, etc. I’d say something for someone who want to spend the time to develop clear intuitions about Bayesian data analysis (perhaps more so than Lynch, which is really good at targeting a bit more of a conceptual level).



    Shravan Vasishth said:
    December 17, 2013 at 4:35 pm

    Guten Morgen Florian,

    The Kruschke book is pretty accessible, but one problem with it (at least the version I had bought) is the large chunks of code htat go on and on and on. For a book that cost, I think, 140 Euros or something crazy like that, I would have preferred to see some content rather than code that is downloadable.

    I also learnt a lot from Gelman and Hill, it’s a great book. But Gelman’s code is very messy (try to get the radon example working—you have to search all over the place to find all the files that must be run before the code works). Let’s see what happens in the next, Stan edition! I’m looking forward to seeing whether his R files will run out of the box. He told me he can’t be bothered to fix his code, it takes too much time. This is also why he doesn’t release code that goes with his papers. which is strange, given how hard he is on people who don’t release their data and/or code: http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics1.pdf

    The response to Gelman is worth reading by the way:

    But I’m rambling. I guess I’m in a bad mood these days, what with all this christmas cheer and all that.


      tiflo said:
      December 17, 2013 at 11:02 pm

      well, remind me to never catch you on a grumpy day! I had trouble using code out of the Gelman book (as you described), but it still was hugely helpful because if its accessibility and it’s very good at conveying the intuition behind mixed models.

      Cool exchange between Gelman and Blackman, btw!


    Philip Hofmeister said:
    January 31, 2014 at 4:50 pm

    To at least partly speak to Shravan’s point, I’ve been playing around with glmer2stan() and there a couple of things a naive user wouldn’t understand without digging into the code (or at least checking the defaults). Most importantly, the *default* for the variance components is flat priors. I’m a little unsure why this was made the default, as there seem to be good reasons to choice weakly informative priors: http://andrewgelman.com/2013/07/16/priors/. Similarly, the default is to run one chain, 1000 iterations, with a warmup of 100. This is easily changed, of course.

    One other thing I’ve learned that may be of use to people who use this: for now, it is better to use Terminal rather than the GUI to run models. It tends to crash in the default R GUI or halt the display, but runs fine in the Terminal, particularly when multiple chains are run.

    I consider this program primarily a learning tool. I find it useful to extract the model code, e.g., fit@stanmodel, to see how it produces code for the equivalent of an lmer formula. It’s been really useful to see how things are defined and various options that stan allows without having to dig through hundreds of pages of dry documentation.


    Phillip said:
    June 15, 2014 at 4:57 pm

    There’s a new R package blme that also extends lme4 to a Bayesian setting (http://cran.r-project.org/web/packages/blme/), but without using Stan, BUGS or JAGS (or at least doesn’t depend on the usual R packages for those languages).


Questions? Thoughts?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s