more on old and new lme4

(This is another guest post by Klinton Bicknell.)

This is an update to my previous blog post, in which I observed that post-version-1.0 versions of the lme4 package yielded worse model fits than the old pre-version-1.0 versions for typical psycholinguistic datasets, and I gave instructions for installing the legacy lme4.0 package. As I mentioned there, however, lme4 is under active development. The short version of this update is that the latest post-version-1.0 releases of lme4 now yield models that are just as good as, and often better than, those from lme4.0! This seems to be due to the use of a new optimizer, better convergence checking, and probably other improvements too. Thus, installing lme4.0 now seems useful only in special situations involving old code that expects the internals of the models to look a certain way. Life is once again easier thanks to the furious work of the lme4 development team!

[update: Since lme4 1.1-7 binaries are now on CRAN, this paragraph is obsolete.] One minor (short-lived) snag is that the current version of lme4 on CRAN (1.1-6) is overzealous about convergence warnings, displaying them in many cases where models have in fact converged properly. This will be fixed in 1.1-7 (more info here). To avoid them for now, the easiest thing to do is probably to install the current development version of lme4 1.1-7 from github like so:

library("devtools"); install_github("lme4/lme4")

Read on if you want to hear more details about my comparisons of the versions.

I compared the latest lme4 (1.1-7 from github) to lme4.0 on a number of continuous and discrete datasets produced from an eyetracking-while-reading experiment. There were 3 regions of analysis, with 4 continuous and 3 binary dependent measures for each region, yielding a total of 12 continuous and 9 binary datasets. Each dataset included three independent variables, and I fit models with both relatively minimal random effects and maximal random effects (random slopes for every independent variable, by subject and item, and all correlations between these; sketched below), giving 24 continuous and 18 binary models. Each dataset also had a lot of missing data (e.g., cases where a word was not fixated). I think this is reasonably representative of a range of psycholinguistic datasets. I fit all these models with both versions of lme4, and performed the same tests under both R 3.0 and R 3.1, all on a Mac.
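
For concreteness, the two random-effects structures looked roughly like the formulas below (the variable names y, x1, x2, x3, subj, item, and df are placeholders rather than the actual variables from my experiment, and the minimal structure is illustrated here as intercepts-only):

## relatively minimal: by-subject and by-item random intercepts only
lmer(y ~ x1 + x2 + x3 + (1 | subj) + (1 | item), data = df)

## maximal: random slopes for every predictor, by subject and by item,
## with all correlations between them estimated
lmer(y ~ x1 + x2 + x3 + (1 + x1 + x2 + x3 | subj) + (1 + x1 + x2 + x3 | item),
     data = df)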

The main quantitative comparison between models was log likelihood (the REML log likelihood for continuous models), where higher likelihood means a better model. In addition, I looked at convergence warnings, which are another important part of model-fitting software, telling you which models can be trusted. A model fit that didn’t give a convergence warning means that lme4 thinks it found the actual maximum log likelihood. A model fit that did give a convergence warning means that lme4 will return the best model it found (and its associated likelihood), but thinks that it’s not the truly optimal fit. The results are relatively different for continuous and binary dependent measures, so I’ll describe them separately.
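
To make that concrete, here is a toy sketch of the kind of comparison involved, using lme4’s built-in sleepstudy data rather than my eyetracking data: fit the same model twice (here with two different optimizers, though comparing fits from lme4 and lme4.0 works the same way) and compare the log likelihoods, keeping an eye out for any convergence warnings printed at fit time.

library(lme4)

fit_default <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fit_bobyqa  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                    control = lmerControl(optimizer = "bobyqa"))

## higher (REML) log likelihood = better fit; any convergence warnings
## appear when the models above are fit
logLik(fit_default)
logLik(fit_bobyqa)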

Continuous / linear.

For continuous variables (linear models), old and new lme4 yielded extremely similar models. In cases where both models converged, or both failed to converge (by failure to converge, I mean showing any kind of convergence warning), new lme4 often returned a slightly better model (between 0.1 and 0.4 points of log likelihood). There were just two cases (out of the 24 models) where only one of the two versions of lme4 gave a convergence warning. In both of these, it was the new lme4 that gave the warning, while the old lme4.0 appeared to converge happily. In one case, the two models seemed to return identical solutions, perhaps suggesting better convergence checking in the new lme4. In the other case, the new lme4 found a much *better* solution than the old (by a full 1.3 points of log likelihood), demonstrating that the old lme4.0 had certainly not found the true optimum (despite not returning a convergence warning). In sum, it seems that for continuous dependent measures, the new lme4 is just better across the board: it fits better models, and seems more reliable in terms of assessing convergence.

Binary / logistic.

The story for binary dependent measures (logistic models) is more complicated.

For one, it depends on the version of R. Specifically, in R 3.1, old lme4.0 appears to just be broken: it very often fails to converge (which is a known problem), and when it fails to converge it returns terrible models that are 30 to 40 points away from the solution returned by new lme4. Even when it returns no convergence warning, however, its solutions are still often 5 to 10 points of log likelihood worse than the models returned by the new lme4. Bottom line here: don’t use lme4.0 in R 3.1 (at least for now): either switch back to R 3.0 or use the new lme4.

In R 3.0, in which lme4.0 is not broken, the story is a little different. For models where the old lme4.0 converged, the new lme4 usually returns a virtually identical model, though often while giving a convergence warning, again, possibly pointing to convergence testing differences. In a few cases, the new lme4 returned a substantially better model (by 1.3-1.5 points of log likelihood), providing further evidence that the old lme4.0 was mistakenly confident in its convergence. So this again seems like a win for the new lme4, except for the fact that it is often reporting convergence failure. But it turns out that there’s a solution for that too! Specifically, Ben Bolker has suggested that switching the optimizer to “bobyqa” for logistic models seems to generally improve the model fits and convergence. So I repeated all these tests again using bobyqa in the new lme4: it then converged in almost every case, with models that were either identical to those found by old lme4.0 or better. To use that optimizer, all it takes is adding a control argument to the glmer call. E.g.,

glmer(y ~ x + (x|subj), data=df, family="binomial",
      control = glmerControl(optimizer="bobyqa"))

So, for logistic models, new lme4 with the bobyqa optimizer seems strictly better than lme4.0.

Conclusion.

In sum, it seems from these datasets that the new lme4 (at least version 1.1-7 from github) is now better than the old lme4.0 (especially if you use R 3.1, where lme4.0 is broken). The new lme4 finds better models for linear models out of the box, finds identical or better solutions for logistic models if you use the “bobyqa” optimizer, and is very comparable to the old lme4.0 for logistic models with the default optimizer, though it reports convergence failures more often. (Of course, these are correlated datasets, and it’s certainly conceivable that the situation is different with other datasets.) Again, fantastic job, lme4 development team!

15 thoughts on “more on old and new lme4”

    tiflo said:
    June 24, 2014 at 12:52 pm

    Hey Klinton, this is incredibly useful. Thank you!

    Michael Becker said:
    June 24, 2014 at 2:03 pm

    Thank you for the update. Question: how do you know whether a model converged or not, independent of the presence/absence of a warning message?

      klintonbicknell responded:
      June 24, 2014 at 2:06 pm

      In general, you don’t. I took lme4 as saying that a model converged properly if it didn’t give any convergence warnings. That said, even if lme4 doesn’t give any warnings, you can still infer that a model didn’t actually converge properly if the other version of lme4 returned a fit with a higher log likelihood.

        Michael Becker said:
        June 24, 2014 at 2:19 pm

        Thanks! I will try this with a bunch of datasets, and report if anything interesting emerges.

    Michael Becker said:
    June 24, 2014 at 8:46 pm

    Hey, any ideas on why installing lme4 from github might not work? I am getting “sh: make: command not found” and
    “ERROR: compilation failed for package ‘lme4’”. I have Fortran and Xcode. I tried with R 3.0.3 and with R 3.1.0.

    Ben Bolker said:
    June 24, 2014 at 9:00 pm

    maybe https://groups.google.com/forum/#!topic/packrat-discuss/GZphJBPjnEc ? I sent lme4 1.1-7 to CRAN today, so hopefully you won’t have to install from github …

      klintonbicknell responded:
      June 24, 2014 at 9:07 pm

      fantastic! I will update this post as soon as it’s available.

        klintonbicknell responded:
        July 10, 2014 at 2:13 pm

        And now that the 1.1-7 binaries are on CRAN for all platforms, I updated the post!

      Michael Becker said:
      June 25, 2014 at 8:57 pm

      It did work, eventually 🙂 The missing part for me was Xcode’s command line tools.

    Morgan Sonderegger said:
    June 27, 2014 at 3:35 pm

    Thanks Klinton — this is great to know.

    Beth Stelle said:
    August 8, 2014 at 2:59 pm

    I was wondering if you have suggestions for alternatives to pvals.fnc, which no longer works since mcmcsamp wasn’t implemented in the new lme4. I’ve seen a couple of options described here
    https://github.com/lme4/lme4/blob/master/man/pvalues.Rd
    and here
    https://github.com/lme4/lme4/blob/master/man/drop1.merMod.Rd
    Are any of these being used by linguistics researchers?
    Thanks in advance for your advice!

      klintonbicknell responded:
      August 8, 2014 at 3:08 pm

      Yes, the two most common ways I’ve seen people get p values are:

      a) the z or t values (for the latter, either assuming infinite degrees of freedom, or using some df approximation, such as the Satterthwaite approximation, as implemented, e.g., in the lmerTest package)

      or

      b) a likelihood ratio test comparing nested models, one with the predictor of interest and one without. The simplest way to do this is to use the drop1 function.

      Finally, although I don’t see it used much in language research, if your model is simple enough, the best way to get p values is via parametric bootstrap, as implemented, e.g., in lme4’s bootMer function. But this is very computationally intensive even for medium-sized models (which is probably why I haven’t seen it used much).
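
      Here’s a minimal sketch of options (a) and (b), plus the parametric bootstrap, using lme4’s built-in sleepstudy data as a stand-in for a real psycholinguistic dataset (fitting under lmerTest instead of plain lme4 would add Satterthwaite-based p values to the summary):

      library(lme4)

      ## fit with ML (REML = FALSE) so the likelihood ratio test below is valid
      m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)

      ## (a) t values from the fixed-effects table (plain lme4 reports no p values)
      coef(summary(m))

      ## (b) likelihood ratio test, dropping each fixed effect in turn
      drop1(m, test = "Chisq")

      ## (c) parametric bootstrap confidence intervals (uses bootMer internally)
      confint(m, method = "boot", nsim = 200)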

        Beth Stelle said:
        August 8, 2014 at 3:28 pm

        Thanks for your speedy reply, this is very helpful!

        Kingdom of Tides said:
        October 21, 2014 at 6:13 am

        How does one get t-values in the new lme4? Incidentally, what are those values I get for the fixed effects in the model output? I am really sorry about my blatant ignorance…

          klintonbicknell responded:
          October 27, 2014 at 3:08 pm

          you have to use the summary function (e.g., summary(my_lmer_model)) now.
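
          For example (a minimal sketch, assuming my_lmer_model is a model fit with lmer), the fixed-effects table with its t values can be pulled out directly:

          coef(summary(my_lmer_model))  # columns: Estimate, Std. Error, t value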
