Month: January 2009
Ah, while I am at it, I may as well put this plot up, too. The code needs to be updated, but let me know if you think it could be useful. It’s very similar to the calibrate() plots from Frank Harrell’s Design library, except that it works for lmer() models from Doug Bates’ lme4 library.
The plot below is from a model of complementizer that-mentioning (a type of syntactic reduction, as in I believe (that) it is time to go to bed). The model uses 26 parameters to predict speakers’ choice between complement clauses with and without that. These include predictors modeling accessibility, fluency, etc. at the complement clause onset, overall domain complexity, the potential for ambiguity avoidance, predictability of the complement clause, syntactic persistence effects, social effects, and individual speaker differences.
Mean predicted probabilities vs. observed proportions of that. The data are divided into 20 bins based on 0.05 intervals of predicted values from 0 to 1. The number of observed data points in each bin is expressed as a multiple of the minimum bin size. The data rug at the top of the plot visualizes the distribution of the predicted values. See Jaeger (almost-submitted, Figure 2).
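The binning behind this plot is simple enough to sketch. Below is a minimal, hypothetical illustration (written in Python for concreteness; the actual function discussed in this post is R code, and the names here are made up): predictions are grouped into 0.05-wide bins, and each non-empty bin contributes its mean predicted probability, the observed proportion of that, and its size.

```python
# Hypothetical sketch of the calibration binning: group predicted
# probabilities into 0.05-wide bins and compare, per bin, the mean
# prediction to the observed proportion of the outcome.

def calibration_bins(predicted, observed, width=0.05):
    """Return (mean prediction, observed proportion, count) per non-empty bin."""
    n_bins = int(round(1 / width))
    bins = {}
    for p, y in zip(predicted, observed):
        i = min(int(p / width), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins.setdefault(i, []).append((p, y))
    out = []
    for i in sorted(bins):
        pairs = bins[i]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        obs_prop = sum(y for _, y in pairs) / len(pairs)
        out.append((mean_pred, obs_prop, len(pairs)))
    return out
```

A well-calibrated model yields bins whose observed proportions track the mean predictions, i.e. points near the diagonal of the plot.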
UPDATE 12/15/10: Bug fix. Thanks to Christian Pietsch.
UPDATE 10/31/10: Some further updates and bug fixes. The code below is the updated one.
UPDATE 05/20/10: I’ve updated the code with a couple of extensions (both linear and binomial models should now work; the plot now uses ggplot2) and minor fixes (the code didn’t work if the model only had one fixed effect predictor). I also wanted to be clear that the dashed lines in the plots aren’t confidence intervals. They are multiples of the standard error of the effect.
Here’s a new function for plotting the effect of predictors in multilevel logit models fitted in R using lmer() from the lme4 package. It’s based on code by Austin Frank, and I also borrowed from Harald Baayen’s plotLMER.fnc() (package languageR). First, a cool pic:
These plots show the distribution of the predictor (x-axis) against the predicted values (based on the entire model; y-axis) using hexbinplot() from the package hexbin. On top of that, you see the model prediction for the selected predictor along with confidence intervals. Note that the predictor is given in its original form (here, speech rate) although it was entered into the model as centered log-transformed speech rate. The plot takes this into account. Of course, you can configure things.
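The back-transformation involved is straightforward. Here is a hypothetical sketch (in Python, for illustration; the real code is R, and the function names are made up) of moving between the model’s centered log scale and the original speech rate scale:

```python
import math

# A predictor entered into the model as a centered log transform,
#   x' = log(x) - mean(log(x)),
# can be displayed on its original scale by inverting the transform:
#   x = exp(x' + mean(log(x))).

def center_log(x, log_mean):
    """Original scale -> model scale."""
    return math.log(x) - log_mean

def uncenter_log(x_model, log_mean):
    """Model scale -> original scale (e.g., for axis labels)."""
    return math.exp(x_model + log_mean)

# Example: speech rates, centered on the log scale.
rates = [2.0, 4.0, 8.0]
log_mean = sum(math.log(r) for r in rates) / len(rates)
centered = [center_log(r, log_mean) for r in rates]
recovered = [uncenter_log(c, log_mean) for c in centered]
```

Model predictions are computed on the centered log scale, while the axis tick positions are mapped back through the inverse transform, which is why the plot can show raw speech rate.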
Hal Tily gave an interesting talk about how processing factors influencing synchronic word order variation ultimately lead to diachronic word order change, focusing on the SOV-SVO variation in Old English. Locality-based processing constraints lead speakers to prefer orders that minimize the distance between syntactically dependent elements (Hawkins 2004, Gibson 2000). All things being equal, then, SVO structures will be preferable to SOV structures, and this preference will be more pronounced as the weight of the object increases. With no additional pressure making SOV structures preferable in circumstances where weight does not play much of a role, SVO structures will become more frequent (and in fact they do).

The next question is how the synchronic production preference eventually leads to diachronic change. Hal suggests that language learners induce structure based on their input and do not correct for processing factors such as weight. A simulation shows that VO order will become dominant (and effectively grammaticalized) in a situation where language users select utterances in a way that is sensitive to weight, but where language learners essentially estimate a regression without a term for weight.

Of course, questions remain: why was there SOV in Old English to begin with, if there is an overarching preference for SVO? Why aren’t all languages SVO? Maybe SOV structures had the same sorts of processing advantages (anti-locality effects) that modern verb-final languages seem to have, and these gradually dried up as the English case system disappeared… In any case, I thought this study nicely illustrated a way of linking processing phenomena and language change, and it raises some interesting questions.
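The simulation idea can be caricatured in a few lines. The following is a toy sketch (my own, not Hal’s actual model; Python, with made-up parameters): each generation, speakers produce SVO with a probability that increases with object weight, but the learner fits only an intercept, i.e. a regression with no weight term, and that intercept seeds the next generation. The weight-driven production bias gets folded into the baseline, and SVO drifts toward dominance.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def next_intercept(a, b, weights):
    # Speakers produce SVO with probability sigmoid(a + b * weight);
    # the learner estimates only an intercept (no weight term), i.e.
    # the logit of the overall SVO proportion in its input.
    p_svo = sum(sigmoid(a + b * w) for w in weights) / len(weights)
    return logit(p_svo)

random.seed(0)
weights = [random.expovariate(1.0) for _ in range(5000)]  # object weights >= 0

a = -2.0             # start strongly SOV-preferring
start = sigmoid(a)   # roughly 0.12 SVO initially
for _ in range(30):  # 30 generations of production + learning
    a = next_intercept(a, b=1.0, weights=weights)
end = sigmoid(a)     # SVO now dominant
```

Because weight only ever pushes toward SVO and the learner cannot attribute that push to weight, the intercept rises every generation, which is the grammaticalization dynamic the talk described.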
Two new resources have recently become available that may be of interest to the NLP and Psycholinguistics communities. First, the New York Times has released “The New York Times Annotated Corpus”. It’s available through the LDC. It’s been marked up with tags for people, places, topics, and organizations. 650,000 of the 1.8 million articles (36%) contain human-written summaries. The LDC listing can be found here. A nice write-up of the release is at the NYT Open Blog.
Unfortunately, I was only able to attend the first day of the LSA meeting this year, but it was good being there (ran into lots of interesting folks and saw a couple of good talks). Ting gave his presentation on Constant Entropy Rate in Mandarin Chinese and he was a real pro ;).