(This is another guest post by Klinton Bicknell.)
This is an update to my previous blog post, in which I observed that post-version-1.0 releases of the lme4 package yielded worse model fits than the old pre-version-1.0 versions for typical psycholinguistic datasets, and gave instructions for installing the legacy lme4.0 package. As I mentioned there, however, lme4 is under active development. The short version of this update is that the latest post-version-1.0 releases of lme4 now yield models that are just as good as, and often better than, lme4.0! This seems to be due to the use of a new optimizer, better convergence checking, and probably other improvements as well. Thus, installing lme4.0 now seems useful only in special situations involving old code that expects the internals of the models to look a certain way. Life is once again easier thanks to the furious work of the lme4 development team!
[update: Since lme4 1.1-7 binaries are now on CRAN, this paragraph is obsolete.]
One minor (short-lived) snag is that the current version of lme4 on CRAN (1.1-6) is overzealous in displaying convergence warnings, showing them in many cases where models have in fact converged properly. This will be fixed in 1.1-7 (more info here). To avoid the spurious warnings for now, the easiest thing to do is probably to install the current development version of lme4 1.1-7 from github.
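The installation snippet itself didn't survive here; a minimal sketch of one common approach at the time, assuming you have the devtools package and a working build toolchain installed, would be:

```r
# Sketch: install the development version of lme4 from its github
# repository. Assumes devtools is already installed.
library(devtools)
install_github("lme4/lme4")
```

(Once 1.1-7 binaries reached CRAN, a plain install.packages("lme4") sufficed, as the update note above says.)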
Read on if you want to hear more details about my comparisons of the versions.
I compared the latest lme4 (1.1-7 from github) to lme4.0 on a number of continuous and discrete datasets produced from an eyetracking-while-reading experiment. There were 3 regions of analysis, with 4 continuous dependent measures and 3 binary dependent measures for each region, yielding a total of 12 continuous and 9 binary datasets. Each dataset included three independent variables, and I fit models with both relatively minimal random effects and maximal random effects (random slopes for every independent variable, by subject and by item, with all correlations between them), for 24 continuous and 18 binary models in total. Each dataset also has a lot of missing data (e.g., cases where a word was not fixated). I think this is reasonably representative of a range of psycholinguistic datasets. I fit all these models with both versions of lme4, and repeated the whole comparison on both R 3.0 and R 3.1, all on a Mac.
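To make the two random-effects structures concrete, here is a minimal sketch on simulated toy data; the variable names (x1, x2, rt, subj, item) are hypothetical stand-ins, not the actual variables from the experiment:

```r
library(lme4)

# Simulated toy data standing in for one continuous dataset;
# all names here are illustrative.
set.seed(1)
df <- expand.grid(subj = factor(1:20), item = factor(1:12))
df$x1 <- rnorm(nrow(df))
df$x2 <- rnorm(nrow(df))
df$rt <- 300 + 10 * df$x1 + rnorm(nrow(df), sd = 50)

# "Minimal" random effects: by-subject and by-item intercepts only.
m_min <- lmer(rt ~ x1 + x2 + (1 | subj) + (1 | item), data = df)

# "Maximal" random effects: random slopes for every predictor, by
# subject and by item, with all correlations estimated.
m_max <- lmer(rt ~ x1 + x2 + (1 + x1 + x2 | subj) + (1 + x1 + x2 | item),
              data = df)
```

The maximal model estimates many more variance and correlation parameters, which is exactly where the two lme4 versions' optimizers and convergence checks can diverge.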
The main quantitative comparison between the models was in terms of log likelihood (the REML criterion for continuous models), where higher likelihood means a better model. In addition, I looked at convergence warnings, which are another important part of model fitting software, telling you which model fits can be trusted. A model fit without a convergence warning means that lme4 thinks it found the actual maximum of the log likelihood. A model fit with a convergence warning means that lme4 will still return the best model it found (and its associated likelihood), but thinks that this is not the truly optimal fit. The results are quite different for continuous and binary dependent measures, so I'll describe them separately.
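As a concrete illustration of the comparison metric (using lme4's built-in sleepstudy data rather than the datasets from this post), one can fit the same model under two optimizers and compare logLik values, where higher is better:

```r
library(lme4)

# Fit the same model with the default optimizer and with bobyqa
# (ML fits, so the log likelihoods are directly comparable).
m_default <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
                  REML = FALSE)
m_bobyqa  <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
                  REML = FALSE,
                  control = lmerControl(optimizer = "bobyqa"))

# Higher log likelihood = better fit; on this easy dataset the two
# optimizers should land on essentially the same optimum.
c(default = as.numeric(logLik(m_default)),
  bobyqa  = as.numeric(logLik(m_bobyqa)))
```

Differences of a few tenths of a point, as reported below, mean the two fits are nearly at the same optimum; differences of 1+ points mean one optimizer genuinely stopped short.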
Continuous / linear.
For continuous variables (linear models), old and new lme4 yielded extremely similar models. In cases where both models converged, or both failed to converge (by failure to converge, I mean showing any kind of convergence warning), new lme4 often returned a slightly better model (between 0.1 and 0.4 points of log likelihood). There were just two cases (out of the 24 models) where only one of the two versions of lme4 gave a convergence warning. In both of these, it was the new lme4 that gave the warning, while the old lme4.0 appeared to converge happily. In one case, the two models seemed to return identical solutions, perhaps suggesting better convergence checking in the new lme4. In the other case, the new lme4 found a much *better* solution than the old (by a full 1.3 points of log likelihood), demonstrating that the old lme4.0 had certainly not found the true optimum (despite not returning a convergence warning). In sum, it seems that for continuous dependent measures, the new lme4 is just better across the board: it fits better models, and seems more reliable in terms of assessing convergence.
Binary / logistic.
The story for binary dependent measures (logistic models) is more complicated.
For one, it depends on the version of R. Specifically, in R 3.1, old lme4.0 appears to simply be broken: it very often fails to converge (a known problem), and when it fails, it returns terrible models that are 30 to 40 points of log likelihood away from the solution returned by the new lme4. Even when it returns no convergence warning, its solutions are still often 5 to 10 points of log likelihood worse than the models returned by the new lme4. Bottom line here: don't use lme4.0 in R 3.1 (at least for now); either switch back to R 3.0 or use the new lme4.
In R 3.0, where lme4.0 is not broken, the story is a little different. For models where the old lme4.0 converged, the new lme4 usually returns a virtually identical model, though often while giving a convergence warning (again, possibly pointing to differences in convergence testing). In a few cases, the new lme4 returned a substantially better model (by 1.3-1.5 points of log likelihood), providing further evidence that the old lme4.0 was mistakenly confident in its convergence. So this again seems like a win for the new lme4, except that it often reports convergence failure. But it turns out there's a solution for that too: Ben Bolker has suggested that switching the optimizer to "bobyqa" for logistic models generally improves both the model fits and convergence. So I repeated all these tests using bobyqa in the new lme4: it then converged in almost every case, with models that were either identical to those found by the old lme4.0 or better. To use that optimizer, all it takes is adding a control argument to the glmer call. E.g.,
glmer(y ~ x + (x|subj), data=df, family="binomial", control = glmerControl(optimizer="bobyqa"))
So, for logistic models, new lme4 with the bobyqa optimizer seems strictly better than lme4.0.
In sum, it seems from these datasets that the new lme4 (at least version 1.1-7 from github) is now better than the old lme4.0 (especially if you use R 3.1, where lme4.0 is broken). The new lme4 finds better fits for linear models out of the box; finds identical or better solutions for logistic models if you use the "bobyqa" optimizer; and is very comparable to the old lme4.0 for logistic models with the default optimizer, though it reports convergence failures more often. (Of course, these are correlated datasets, and it's certainly conceivable that the situation is different with other datasets.) Again, fantastic job, lme4 development team!