R code

Ways of plotting map data in R (and python)

Posted on Updated on

Thanks to Scott Jackson, Daniel Ezra Johnson, David Morris, Michael Shvartzman, and Nathanial Smith for the recommendations and pointers to the packages mentioned below.

  • R:
    • The maps, mapsextra, and maptools packages provide data and tools to plot world, US, and a variety of regional maps (see also mapproj and mapdata). This, combined with ggplot2 is also what we used in Jaeger et al., (2011, 2012) to plot distributions over world maps. Here’s an example from ggplot2 with maps.
    Example of using ggplot2 combined with the maps package.
    Example use of ggplot2 combined with the maps package (similar to the graphs created for Jaeger et al., 2011, 2012).

Updated slides on GLM, GLMM, plyr, etc. available

Posted on

Some of you asked for the slides to the Mixed effect regression class I taught at the 2013 LSA Summer Institute in Ann Arbor, MI. The class covered some Generalized Linear Model, Generalized Linear Mixed Models, extensions beyond the linear model, simulation-based approaches to assessing the validity (or power) of your analysis, data summarization and visualization, and reporting of results. The class included slides from Maureen Gillespie, Dave Kleinschmidt, and Judith Degen (see above link). Dave even came by to Ann Arbor and gave his lecture on the awesome power of plyr (and reshape etc.), which I recommend. You might also just browse through them to get an idea of some new libraries (such as Stargazer for quick and nice looking latex tables). There’s also a small example to work through for time series analysis (for beginners).

Almost all slides were created in knitr and latex (very conveniently integrated into RStudio — I know some purists hate it, but comm’on), so that the code on the slides is the code that generated the output on the slides. Feedback welcome.



Is my analysis problematic? A simulation-based example

Posted on Updated on

This post is in reply to a recent question on in ling-R-lang by Meredith Tamminga. Meredith was wondering whether an analysis she had in mind for her project was circular, causing the pattern of results predicted by the hypothesis that she was interested in testing. I felt her question (described below in more detail) was an interesting example that might best be answered with some simulations. Reasoning through an analysis can, of course, help a lot in understanding (or better, as in Meredith’s case, anticipating) problems with the interpretation of the results. Not all too infrequently, however, I find that intuition fails or isn’t sufficiently conclusive. In those cases, simulations can be a powerful tool in understanding your analysis. So, I decided to give it a go and use this as an example of how one might approach this type of question.

Results of 16 simulated priming experiments with a robust priming effect (see title for the true relative frequency of each variant in the population).
Figure 1: Results of 16 simulated priming experiments with a robust priming effect (see title for the true relative frequency of each variant in the population). For explanation see text below.

Read the rest of this entry »

New R library for multilevel modeling

Posted on

This might be of interest to many of you. MLwiN, a software package for multilevel modeling developed at Bristol that includes functions beyond those present in, e.g., lmer, now has an interface for R (kinda like WinBugs, etc.), so that you can continue to use R while taking advantage of the powerful tools in MLwiN. The package is called R2MLwiN. For more details, see below.

Dear all,
We are pleased to announce a new R package, R2MLwiN (Zhang et al. 2012)
that allows R users access to the functionality within MLwiN directly from
within the R package. This package has been developed as part of the e-STAT
ESRC digital social research programme grant along with the Stat-JR package.
See <http://www.bristol.ac.uk/cmm/software/r2mlwin/> for more details
including examples taken from the book MCMC Estimation in MLwiN.

Feedback gratefully received by either me or Zhengzheng Zhang (Z.Zhang@bristol.ac.uk).

Best wishes,
  Bill Browne. 

Correlation plot matrices using the ellipse library

Posted on Updated on

My new favorite library is the ellipse library. It includes functions for creating ellipses from various objects. It has a function, plotcorr() to create a correlation matrix where each correlation is represented with an ellipse approximating the shape of a bivariate normal distribution with the same correlation. While the function itself works well, I wanted a bit more redundancy in my plots and modified the code. I kept (most of) the main features provided by the function and I’ve included a few: the ability to plot ellipses and correlation values on the same plot, the ability to manipulate what is placed along the diagonal and the rounding behavior of the numbers plotted. Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, -1 to 0 to 1, with University of Rochester approved red to white to blue.

First the function code:

Read the rest of this entry »

Creating spaghetti plots of eye-tracking data in R

Posted on Updated on

I’ve been working on consolidating all the different R functions I’ve written over the years for plotting my eye-tracking data and creating just one amazing super-function (based on the ggplot2 package) that can do it all. Here’s a first attempt that anybody with the right kind of dataset should be able to use to create plots like the ones below (generated from fake data. The R code that generates the data is included at the end of the post). If you find this code helpful, please consider acknowledging it via the following URL in your paper/presentation to spread the word:


Left: Empirical means with error bars indicating standard error for four experimental conditions. Contrast presence is coded in color, adjective type in line type. The first vertical line indicates adjective onset, the second ones indicate mean noun onset in each contrast condition. Right: Smoothed model estimates of proportions in each condition, with ribbons indicating 95% confidence intervals. Data from different subjects is plotted in different panels.

Read the rest of this entry »

New R resource for ordinary and multilevel regression modeling

Posted on

Here’ s what I received from the Center of Multilevel Modeling at Bristol (I haven’t checked it out yet; registration seems to be free but required):

The Centre for Multilevel Modelling is very pleased to announce the addition of
R practicals to our free on-line multilevel modelling course. These give
detailed instructions of how to carry out a range of analyses in R, starting
from multiple regression and progressing through to multilevel modelling of
continuous and binary data using the lmer and glmer functions.

MLwiN and Stata versions of these practicals are already available.
You will need to log on or register onto the course to view these


R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson

Posted on Updated on

Below I am sharing the R code for our paper on the serial founder effect:
This paper is a commentary on Atkinson’s 2011 Science article on the serial founder model (see also this interview with ScienceNews, in which parts of our comment in Linguistic Typology and follow-up work are summarized). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and to evaluate these models, using Atkinson’s data as an example.We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.
Finally, we discuss limits to the statistical analysis due to data sparseness. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes (for the effect) are included in the model. We also present simulations that show that the Type I error rate (false rejections) of the approach taken in Atkinson is many times higher than conventionally accepted (i.e. above .2 when .05 is the conventionally accepted rate of Type errors).
The scripts presented below are not intended to allow full replication of our analyses (they lack annotation and we are not allowed to share the WALS data employed by Atkinson on this site anyway). However, there are many plots and tests in the paper that might be useful for typologists or other users of mixed models. For that reason, I am for now posting the raw code. Please comment below if you have questions and we will try to provide additional annotation for the scripts as needed and as time permits. If you find (parts of the) script(s) useful, please consider citing our article in Linguistic Typology.

More on random slopes and what it means if your effect is not longer significant after the inclusion of random slopes

Posted on Updated on

I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to updated the procedure I described at http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. As always, comments are welcome. What I am writing below are just suggestions.

[...] an insignificant effect in an (1 + factor|subj) model means that, after controlling for random by-subject variation in the slope/effect of factor, you find no (by-convention-significant) evidence for the effect. Like you suggest, this is due to the fact that there is between-subject variability in the slope that is sufficiently large to let us call into question the hypothesis that the ‘overall’ slope is significantly different from zero.

[...] So, what’s the rule of thumb here? If you run any of the standard simple designs (2×2, 2×3, 2x2x2,etc.) and you have the psychologist’s luxury of plenty of data (24+item, 24+ subject [...]), the full random effect structure is something you should entertain as your starting point. That’s in Clark’s spirit. That’s what F1 and F2 were meant for. [...] All of these approaches do not just capture random intercept differences by subject and item. They also aim to capture random slope differences.

[...] here’s what I’d recommend during tutorials now because it often saves time for psycholinguistic data. I am only writing down the random effects but, of course, I am assuming there are fixed effects, too, and that your design factors will remain in the model. Let’s look at a 2×2 design: Read the rest of this entry »

Mixed model’s and Simpson’s paradox

Posted on

For a paper I am currently working on, I started to think about Simpson’s paradox, which wikipedia succinctly defines as

“a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science [...]“

The wikipedia page also gives a nice visual illustration. Here’s my own version of it. The plot shows 15 groups, each with 20 data points. The groups happen to order along the x-axis (“Pseudo distance from origin”) in a way that suggests a negative trend of the Pseudo distance from origin against the outcome (“Pseudo normalized phonological diversity”). However, this trend does not hold within groups. As a matter of fact, in this particular sample, most groups show the opposite of the global trend (10 out of 15 within-group slopes are clearly positive). If this data set is analyzed by an ordinary linear regression (which does not have access to the grouping structure), the result will be a significant negative slope for the Pseudo distance from origin. So, I got curious: what about linear mixed models?

Read the rest of this entry »

Diagnosing collinearity in mixed models from lme4

Posted on Updated on

I’ve just uploaded files containing some useful functions to a public git repository. You can see the files directly without worrying about git at all by visiting regression-utils.R (direct download) and mer-utils.R (direct download). Read the rest of this entry »

R code for LaTeX tables of lmer model effects

Posted on Updated on

Here’s some R code that outputs text on the console that you can copy-paste into a .tex file and creates nice LaTeX tables of fixed effects of lmer models (only works for family=”binomial”). Effects <.05 will appear in bold. The following code produces the table pasted below. It assumes the model mod.all. prednames creates a mapping from predictor names in the model to predictor names you want to appear in the table. Note that for the TeX to work you need to include \usepackage{booktabs} in the preamble.
Read the rest of this entry »

Annotated example analysis using mixed models

Posted on Updated on

Jessica Nelson (Learning Research and Development Center, University of Pittsburgh) uploaded a step-by-step example analysis using mixed models to her blog. Each step is nicely annotated and Jessica also discusses some common problems she encountered while trying to analyze her data using mixed models. I think this is a nice example for anyone trying to learn to use mixed models. It goes through all/most of the steps outlined in Victor Kuperman and my WOMM tutorial (click on the graph to see it full size):

Tutorial on Regression and Mixed Models at Penn State

Posted on Updated on

Last week (02/3-5/10), I had the pleasure to give the inaugural CLS Graduate Student Young Scientist Colloquium (“An information theoretic perspective on language production”) at the Center for Language Science at Penn State (State College).

I also gave two 3h-lectures on regression and mixed models. The slides for Day 1 introduce linear regression, generalized linear models, and generalized linear mixed models.  I am using example analyses of real psycholinguistic data sets from Harald Baayen’s languageR library (freely available through the free stats package R). The slides for Day 2 go through problems and solutions for regression models. For more information have a look at the online lectures available via the HLP lab wiki. I’ve uploaded the pdf slides and an R script. There also might be a pod cast available at some point. Feedback welcome. I’ll be giving a similar workshop at McGill in May, so watch for more materials.

I had an intensive and fun visit, meeting with researchers from Psychology, Communication and Disorders, Linguistics, Spanish, German, etc.  I learned a lot about bilingualism (not only though)  and a bit about anticipatory motor planning. So thanks to everyone there who helped to organize the visit, especially Jorge Valdes and Jee Sook Park. And thanks to Judith Kroll for the awesome cake (see below). Goes without saying that it was a pleasure meeting the unofficial mayor of State College, too ;). See you all at CUNY! Read the rest of this entry »

Blog on ggplot2

Posted on Updated on

If you haven’t already, check out this nice R blog with lots of code for good ggplot2 and lattice figures.