Workshop announcement (Tuebingen): Advances in Visual Methods for Linguistics

This workshop on data visualization might be of interest to a lot of you. I wish I could just hop over the pond.

  • Date: 24-Sept-2014 – 26-Sept-2014
  • Location: Tuebingen, Germany
  • Contact Person: Fabian Tomaschek
  • Web Site:
  • Call Deadlines: 21 March / 18 April

The AVML meeting offers a meeting place for linguists from all fields who are interested in developing and improving their data visualization skills and methods. The meeting consists of a one-day hands-on workshop

Read the rest of this entry »

Correlation plot matrices using the ellipse library

My new favorite library is the ellipse library. It includes functions for creating ellipses from various objects. It has a function, plotcorr(), that creates a correlation matrix plot in which each correlation is represented by an ellipse approximating the shape of a bivariate normal distribution with the same correlation. While the function itself works well, I wanted a bit more redundancy in my plots and modified the code. I kept (most of) the main features provided by the function and added a few: the ability to plot ellipses and correlation values on the same plot, the ability to control what is placed along the diagonal, and control over the rounding of the plotted numbers. Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, from -1 through 0 to 1, with University of Rochester approved red to white to blue.
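To make the idea of a "correlation ellipse" concrete, here is a minimal sketch of how such an ellipse and its color could be computed. It is written in plain Python only to stay self-contained; it is not the modified plotcorr() code from this post. The function names, the parametrization via acos(r), and the use of pure red and pure blue as the end colors are all assumptions made for illustration.

```python
import math

def correlation_ellipse(r, n_points=100):
    """Return (x, y) points on an ellipse whose shape reflects correlation r.

    Assumed parametrization: x = cos(t + d/2), y = cos(t - d/2) with
    d = acos(r). These points satisfy x^2 - 2*r*x*y + y^2 = 1 - r^2,
    i.e. they lie on a contour of a standard bivariate normal density
    with correlation r.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    d = math.acos(r)
    points = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        points.append((math.cos(t + d / 2.0), math.cos(t - d / 2.0)))
    return points

def correlation_color(r):
    """Map r in [-1, 1] to an RGB triple: red (-1), white (0), blue (1).

    Pure red and pure blue are placeholders, not the official
    University of Rochester colors.
    """
    if r < 0:
        return (1.0, 1.0 + r, 1.0 + r)
    return (1.0 - r, 1.0 - r, 1.0)
```

With r = 0 the points trace a circle, and as r approaches 1 or -1 the ellipse collapses toward a diagonal line, which is what makes the plot readable at a glance.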

First the function code:

Read the rest of this entry »

Blog on ggplot2

If you haven’t already, check out this nice R blog with lots of code for good ggplot2 and lattice figures.

One slide on developing a regression model with interpretable coefficients

While Victor Kuperman and I are preparing our slides for WOMM, I’ve been thinking about how to visualize the process from input variables to a full model. Even though it involves many steps that hugely depend on the type of regression model, which in turn depends on the type of outcome (dependent) variable, there are a number of steps that one always needs to go through if we want interpretable coefficient estimates (as well as unbiased standard error estimates for those coefficients).


Read the rest of this entry »

Visualizing the quality of a glmer(family="binomial") model

Ah, while I am at it, I may as well put this plot up, too. The code needs to be updated, but let me know if you think this could be useful. It’s very similar to the calibrate() plots from Harrell’s Design library, except that it works for lmer() models from Doug Bates’ lme4 library.

The plot below is from a model of complementizer that-mentioning (a type of syntactic reduction as in I believe (that) it is time to go to bed). The model uses 26 parameters to predict speakers’ choice between complement clauses with and without that. These include predictors modeling accessibility, fluency, etc. at the complement clause onset, overall domain complexity, the potential for ambiguity avoidance, predictability of the complement clause, syntactic persistence effects, social effects, individual speaker differences, etc.

Visualization of model fit: predicted probability vs. observed proportions of complementizer "that"

Mean predicted probabilities vs. observed proportions of that. The data are divided into 20 bins based on 0.05 intervals of predicted values from 0 to 1. The number of observed data points in each bin is expressed as a multiple of the minimum bin size. The data rug at the top of the plot visualizes the distribution of the predicted values. See Jaeger (almost-submitted, Figure 2).
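The binning described in the caption can be sketched as follows. This is a minimal illustration in plain Python, not the R code behind the plot; the function name, the dictionary keys, and the choice to skip empty bins are assumptions.

```python
def calibration_bins(predicted, observed, n_bins=20):
    """Summarize model calibration in equal-width bins of predicted probability.

    predicted: fitted probabilities in [0, 1]
    observed:  0/1 outcomes, same length as predicted
    Returns one dict per non-empty bin with the mean predicted probability,
    the observed proportion of 1s, the bin count, and the count expressed
    as a multiple of the smallest non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # A prediction of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    sizes = [len(b) for b in bins if b]
    if not sizes:
        return []
    min_size = min(sizes)

    summary = []
    for idx, b in enumerate(bins):
        if not b:
            continue  # assumption: empty bins are simply not plotted
        n = len(b)
        summary.append({
            "bin": idx,
            "mean_predicted": sum(p for p, _ in b) / n,
            "observed": sum(y for _, y in b) / n,
            "n": n,
            "relative_size": n / min_size,
        })
    return summary
```

For a well-calibrated model, plotting "observed" against "mean_predicted" should give points close to the diagonal, with "relative_size" available for scaling or labeling each point.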

Plotting effects for glmer(, family="binomial") models

UPDATE 12/15/10: Bug fix. Thanks to Christian Pietsch.

UPDATE 10/31/10: Some further updates and bug fixes. The code below is the updated one.

UPDATE 05/20/10: I’ve updated the code with a couple of extensions (both linear and binomial models should now work; the plot now uses ggplot2) and minor fixes (the code didn’t work if the model only had one fixed effect predictor). I also want to be clear that the dashed lines in the plots aren’t confidence intervals. They are multiples of the standard error of the effect.

Here’s a new function for plotting the effect of predictors in multilevel logit models fitted in R using lmer() from the lme4 package. It’s based on code by Austin Frank and I also borrowed from Harald Baayen’s plotLMER.fnc() (package languageR). First a cool pic:

Predicted effect of speechrate on complementizer-mentioning

These plots show the distribution of the predictor (x-axis) against the predicted values (based on the entire model, y-axis) using hexbinplot() from the package hexbin. On top of that, you see the model prediction for the selected predictor along with dashed lines at multiples of the standard error of the effect (not confidence intervals, as noted in the update above). Note that the predictor is given in its original form (here speech rate) although it was entered into the model as the centered log-transformed speech rate. The plot takes that into account. Of course, you can configure things.
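As a rough illustration of what "given in its original form" involves, the sketch below maps a predictor value on its original scale to the centered log scale, computes the prediction on the logit scale, and transforms it back to a probability. It is plain Python, not the plotting function from this post. All names and parameters are hypothetical, and the band (slope standard error times the distance from the predictor mean, scaled by k) is only one assumed reading of "multiples of the standard error of the effect".

```python
import math

def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_on_original_scale(x, intercept, slope, slope_se, mean_log_x, k=2.0):
    """Predicted probability (with a +/- k*SE band) for a raw predictor value x.

    Assumes the model saw z = log(x) - mean_log_x (centered log transform)
    and that all other predictors are held at zero (their centered means).
    Returns (lower, estimate, upper) on the probability scale.
    """
    z = math.log(x) - mean_log_x          # same transform the model was fit on
    eta = intercept + slope * z           # prediction on the logit scale
    half_width = k * slope_se * abs(z)    # assumed band: k standard errors of the effect
    return (inv_logit(eta - half_width),
            inv_logit(eta),
            inv_logit(eta + half_width))
```

Evaluating this function over a grid of raw x values gives a curve that can be drawn directly on an axis in the original units, even though the model itself only ever saw the transformed predictor.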

Read the rest of this entry »

R-code for visual model summaries: linear mixed models

Here is some code that summarizes the coefficients of a linear mixed model and produces nice graphs like the following one (well, the curved arrows were added in PowerPoint):

An example slide of a linear mixed model summary

But first some background about the example model:

Read the rest of this entry »