(This is another guest post by Klinton Bicknell.)

This is an update to my previous blog post, in which I observed that post-version-1.0 versions of the lme4 package yielded worse model fits than old pre-version-1.0 versions for typical psycholinguistic datasets, and gave instructions for installing the legacy lme4.0 package. As I mentioned there, however, lme4 is under active development. The short version of this update is that the latest post-version-1.0 versions of lme4 now seem to yield models that are just as good as, and often better than, those from lme4.0! This seems to be due to the use of a new optimizer, better convergence checking, and probably other things too. Thus, installing lme4.0 now seems useful only in special situations involving old code that expects the internals of the models to look a certain way. Life is once again easier thanks to the furious work of the lme4 development team!

[update: Since lme4 1.1-7 binaries are now on CRAN, this paragraph is obsolete.] One minor (short-lived) snag is that the current version of lme4 on CRAN (1.1-6) is overzealous in displaying convergence warnings, and displays them inappropriately in many cases where models have in fact converged properly. This will be fixed in 1.1-7 (more info here). To avoid them for now, the easiest thing to do is probably to install the current development version of lme4 1.1-7 from github like so:

library("devtools"); install_github("lme4/lme4")
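If you are not sure which version you currently have, a quick check along the following lines may help. This is only a sketch: `needs_update` is a helper name made up for this example, and the cutoff of 1.1-7 is taken from the paragraph above.

```r
# Hypothetical helper (not part of lme4): TRUE if the given version string
# is older than the minimum version we want.
needs_update <- function(installed, minimum = "1.1-7") {
  package_version(installed) < package_version(minimum)
}

# Only query the installed version if lme4 is actually available.
if (requireNamespace("lme4", quietly = TRUE)) {
  installed <- as.character(packageVersion("lme4"))
  message("Installed lme4 version: ", installed)
  if (needs_update(installed)) {
    message("Older than 1.1-7: convergence warnings may be overzealous.")
  }
}
```

Calling `needs_update("1.1-6")` flags the CRAN version mentioned above as one to update.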

Read on if you want to hear more details about my comparisons of the versions.

The ‘softer kind’ of tutorial on linear mixed effect regression

I recently was pointed to this nice and very accessible tutorial on linear mixed effects regression models and how to run them in R by Bodo Winter (at UC Merced). If you don’t have much or any background in this type of model, I recommend you pair it with a good conceptual introduction to these models like Gelman and Hill 2007 and perhaps some slides from our LSA 2013 tutorial.

There are a few things I’d like to add to Bodo’s suggestions regarding how to report your results:

  1. be clear how you coded the variables since this does change the interpretation of the coefficients (the betas that are often reported). E.g. say whether you sum- or treatment-coded your factors, whether you centered or standardized continuous predictors etc. As part of this, also be clear about the direction of the coding. For example, state that you “sum-coded gender as female (1) vs. male (-1)”. Alternatively, report your results in a way that clearly states the directionality (e.g., “Gender=male, beta = XXX”).
  2. please also report whether collinearity was an issue. E.g., report the highest fixed effect correlations.
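To make these two points concrete, here is a minimal sketch on made-up data. The variable names (`gender`, `age`, `rt`) and all numbers are invented for illustration, and I use a plain `lm` so that the example runs without any extra packages; for a model fit with lme4, the same `cov2cor(vcov(...))` idea should apply to the fixed effects.

```r
set.seed(1)
n <- 200

# Made-up data: one two-level factor and one continuous predictor.
d <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = rnorm(n, mean = 35, sd = 10)
)
d$rt <- 500 + 20 * (d$gender == "male") + 2 * d$age + rnorm(n, sd = 30)

# Sum-code gender as female (1) vs. male (-1), and name the contrast so the
# direction of the coding shows up in the coefficient name.
sum_coding <- contr.sum(2)
dimnames(sum_coding) <- list(levels(d$gender), "female_vs_male")
contrasts(d$gender) <- sum_coding

# Center the continuous predictor.
d$age_c <- d$age - mean(d$age)

fit <- lm(rt ~ gender + age_c, data = d)

# Correlations between the coefficient estimates; report the largest
# absolute off-diagonal value as a rough collinearity check.
fe_cor  <- cov2cor(vcov(fit))
max_cor <- max(abs(fe_cor[upper.tri(fe_cor)]))

print(contrasts(d$gender))
print(round(fe_cor, 3))
```

Printing the contrast matrix alongside the results is a cheap way to double-check that the direction of the coding is the one you report.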

old and new lme4

(This is a guest post by Klinton Bicknell.)

update 2014-06-24: Using lme4.0 probably isn’t necessary anymore. See post here.

The lme4 package’s major 1.0 release was back in August. I and others have noticed that for typical psycholinguistic datasets, the new >=1.0 versions of lme4 often yield models with substantially poorer fits to the data than the old pre-1.0 versions (sometimes worse by many points of log likelihood), which suggests that the new lme4 isn’t as reliably converging to the actual maximum likelihood (or REML) solution. Since unconverged models yield misleading inferences about model parameters, it’s useful to be able to fit models using the old pre-1.0 lme4.
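One rough way to look for this kind of problem is to fit the same model twice and compare log likelihoods. The sketch below is not the comparison described in this post (that one was between lme4 and lme4.0); as a stand-in, it refits one model with two of the optimizers available in lme4 >= 1.0. `loglik_gap` is a helper name made up for this example, and the `sleepstudy` data ships with lme4.

```r
# Hypothetical helper: difference in log likelihood between two fits of the
# same model. A clearly non-zero gap suggests that at least one of the fits
# did not reach the maximum.
loglik_gap <- function(fit_a, fit_b) {
  as.numeric(logLik(fit_a)) - as.numeric(logLik(fit_b))
}

# Only run the lme4 part if the package is installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  data("sleepstudy", package = "lme4")
  fit_bobyqa <- lme4::lmer(
    Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE,
    control = lme4::lmerControl(optimizer = "bobyqa")
  )
  fit_nm <- lme4::lmer(
    Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE,
    control = lme4::lmerControl(optimizer = "Nelder_Mead")
  )
  print(loglik_gap(fit_bobyqa, fit_nm))
}
```

If both versions are installed side by side, the same helper can be applied to one fit from each package.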

Happily, the lme4 developers have created a new package (named “lme4.0”), which is a bugfix-only version of the old pre-1.0 lme4. This allows for the installation of both old and new versions of lme4 side-by-side. As of this posting, lme4.0 is not yet on CRAN, but is installable by performing the following steps: Read the rest of this entry »

Ways of plotting map data in R (and python)

Thanks to Scott Jackson, Daniel Ezra Johnson, David Morris, Michael Shvartzman, and Nathanial Smith for the recommendations and pointers to the packages mentioned below.

  • R:
    • The maps, mapsextra, and maptools packages provide data and tools to plot world, US, and a variety of regional maps (see also mapproj and mapdata). This, combined with ggplot2, is also what we used in Jaeger et al. (2011, 2012) to plot distributions over world maps. Here’s an example from ggplot2 with maps.
    Example use of ggplot2 combined with the maps package (similar to the graphs created for Jaeger et al., 2011, 2012).
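
For those who want a starting point, here is a minimal sketch of a world map drawn with ggplot2 and the maps package. It is not the code behind the figures in Jaeger et al.; it assumes both packages are installed, and the colors are arbitrary choices.

```r
library(ggplot2)

# map_data() converts the outlines from the maps package into a data frame
# with one row per polygon vertex (columns include long, lat and group).
world <- map_data("world")

p <- ggplot(world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white") +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude")

print(p)
```

Data points (e.g., one per language) could then be added as a further layer with their own longitude and latitude columns.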

Updated slides on GLM, GLMM, plyr, etc. available

Some of you asked for the slides to the Mixed effect regression class I taught at the 2013 LSA Summer Institute in Ann Arbor, MI. The class covered Generalized Linear Models, Generalized Linear Mixed Models, extensions beyond the linear model, simulation-based approaches to assessing the validity (or power) of your analysis, data summarization and visualization, and reporting of results. The class included slides from Maureen Gillespie, Dave Kleinschmidt, and Judith Degen (see above link). Dave even came to Ann Arbor and gave his lecture on the awesome power of plyr (and reshape etc.), which I recommend. You might also just browse through the slides to get an idea of some new libraries (such as stargazer for quick and nice-looking LaTeX tables). There’s also a small example to work through for time series analysis (for beginners).

Almost all slides were created in knitr and LaTeX (very conveniently integrated into RStudio; I know some purists hate it, but c’mon), so that the code on the slides is the code that generated the output on the slides. Feedback welcome.



Workshop announcement (Tuebingen): Advances in Visual Methods for Linguistics

This workshop on data visualization might be of interest to a lot of you. I wish I could just hop over the pond.

  • Date: 24-Sept-2014 – 26-Sept-2014
  • Location: Tuebingen, Germany
  • Contact Person: Fabian Tomaschek
  • Web Site:
  • Call Deadlines: 21 March / 18 April

The AVML-meeting offers a meeting place for all linguists from all fields who are interested in elaborating and improving their data visualization skills and methods. The meeting consists of a one-day hands-on workshop Read the rest of this entry »

Is my analysis problematic? A simulation-based example

This post is in reply to a recent question on ling-R-lang by Meredith Tamminga. Meredith was wondering whether an analysis she had in mind for her project was circular, causing the pattern of results predicted by the hypothesis that she was interested in testing. I felt her question (described below in more detail) was an interesting example that might best be answered with some simulations. Reasoning through an analysis can, of course, help a lot in understanding (or better, as in Meredith’s case, anticipating) problems with the interpretation of the results. Not all too infrequently, however, I find that intuition fails or isn’t sufficiently conclusive. In those cases, simulations can be a powerful tool in understanding your analysis. So, I decided to give it a go and use this as an example of how one might approach this type of question.

Figure 1: Results of 16 simulated priming experiments with a robust priming effect (see title for the true relative frequency of each variant in the population). For explanation see text below.
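
To give a flavor of the general approach, here is a minimal sketch of simulating 16 priming experiments and checking what the analysis returns with and without a true effect. This is not the simulation from the full post: the design (a binary prime and target, a logistic model) and all numbers, such as 500 trials, a base rate of 0.3 and an effect of 1 log-odds, are assumptions made for illustration.

```r
set.seed(2014)

# Simulate one experiment and return the estimated priming effect (in
# log-odds). Illustrative design: the prime is variant A with probability
# p_variant, and a variant-A prime raises the log-odds of a variant-A target
# by priming_effect.
simulate_experiment <- function(n_trials, p_variant, priming_effect) {
  prime    <- rbinom(n_trials, size = 1, prob = p_variant)
  p_target <- plogis(qlogis(p_variant) + priming_effect * prime)
  target   <- rbinom(n_trials, size = 1, prob = p_target)
  fit <- glm(target ~ prime, family = binomial)
  coef(fit)[["prime"]]
}

n_experiments <- 16

# Same analysis, once with a true priming effect and once without.
with_effect <- replicate(n_experiments, simulate_experiment(500, 0.3, 1))
no_effect   <- replicate(n_experiments, simulate_experiment(500, 0.3, 0))

print(summary(with_effect))
print(summary(no_effect))
```

If an analysis were circular, it would tend to return the predicted pattern even when the simulated effect is zero, so the comparison between the two sets of estimates is the part to look at.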
