# class/tutorial

### Presentation at CNS symposium on “Prediction, adaptation and plasticity of language processing in the adult brain”

Earlier this week, **Dave Kleinschmidt** and I gave a presentation as part of a mini-symposium at the annual meeting of the Cognitive Neuroscience Society (CNS) on “Prediction, adaptation and plasticity of language processing in the adult brain”, organized by **Gina Kuperberg**. For this symposium we were tasked with addressing the following questions:

- What is prediction and why do we predict?
- What is adaptation and why do we adapt?
- How do prediction and adaptation relate?

Although we address these questions in the context of language processing, most of our points are pretty general. We aimed to provide intuitions about the notions of distribution, prediction, distributional/statistical learning, and adaptation. We walked through examples of belief-updating, intentionally keeping our presentation math-free. Perhaps some of the slides are of interest to some of you, so I have attached them below. A more in-depth treatment of these questions is also provided in Kleinschmidt & Jaeger (under review, available on request).
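For readers who prefer to see belief-updating spelled out, here is a minimal sketch in Python. It is not taken from the slides (which were math-free). It assumes, purely for illustration, a listener who tracks the probability of a binary outcome with a Beta prior; the function names and the observation sequence are made up.

```python
from fractions import Fraction

def update_beliefs(prior_a, prior_b, observations):
    """Beta-Bernoulli updating: return the (a, b) pseudo-counts after each observation."""
    a, b = prior_a, prior_b
    history = []
    for obs in observations:
        if obs:
            a += 1  # one more observed occurrence of the outcome
        else:
            b += 1  # one more observed non-occurrence
        history.append((a, b))
    return history

def predicted_probability(a, b):
    """Posterior mean of a Beta(a, b): the predicted probability of the outcome next time."""
    return Fraction(a, a + b)

# Hypothetical input: 1 = the outcome occurred, 0 = it did not.
observations = [1, 1, 0, 1, 1, 1]
history = update_beliefs(1, 1, observations)  # Beta(1, 1) is a flat prior
predictions = [predicted_probability(a, b) for a, b in history]

for obs, (a, b), p in zip(observations, history, predictions):
    print(f"saw {obs}: beliefs Beta({a}, {b}), prediction for next outcome = {float(p):.2f}")
```

In this toy setup, prediction is reading off the current beliefs, and adaptation is the shift in those beliefs (and hence in the predictions) as observations come in.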

Comments welcome. (Sorry: some of the slides look strange after importing them, and all the animations got lost, but I think they are all readable.)

It was great to see these notions discussed and related to ERP, MEG, and fMRI research in the three other presentations of the symposium by **Matt Davis**, **Kara Federmeier and Eddy Wlotko**, and **Gina Kuperberg**. You can read their abstracts by following the link to the symposium I included above.

### Updated slides on GLM, GLMM, plyr, etc. available

Some of you asked for the slides from the **mixed-effects regression class** I taught at the **2013 LSA Summer Institute in Ann Arbor, MI**. The class covered Generalized Linear Models, Generalized Linear Mixed Models, extensions beyond the linear model, simulation-based approaches to assessing the validity (or power) of your analysis, data summarization and visualization, and reporting of results. The class included slides from Maureen Gillespie, Dave Kleinschmidt, and Judith Degen (see above link). Dave even came to Ann Arbor and gave his lecture on the awesome power of plyr (and reshape etc.), which I recommend. You might also just browse through the slides to get an idea of some new libraries (such as stargazer for quick and nice-looking LaTeX tables). There’s also a small example to work through for time series analysis (for beginners).

Almost all slides were created in knitr and LaTeX (very conveniently integrated into RStudio; I know some purists hate it, but c’mon), so that the code on the slides is the code that generated the output on the slides. Feedback welcome.

### HLP lab will be at the LSA 2013 summer institute

Come join us in Ann Arbor, MI for the 2013 Summer Institute of the Linguistic Society of America. You can follow the institute on Facebook.

Victor Ferreira and I will be organizing a workshop on *How the brain accommodates variability in linguistic representations* (more on that soonish). I will be teaching a class on regression and mixed models, and I am sure a bunch of other folks from the lab will be there, too.

### Creating spaghetti plots of eye-tracking data in R

I’ve been working on consolidating all the different R functions I’ve written over the years for plotting my eye-tracking data and creating just one amazing super-function (based on the ggplot2 package) that can do it all. Here’s a first attempt that anybody with the right kind of dataset should be able to use to create plots like the ones below (generated from fake data; the R code that generates the data is included at the end of the post). If you find this code helpful, please consider acknowledging it via the following URL in your paper/presentation to spread the word:

https://hlplab.wordpress.com/2012/02/27/creating-spaghetti-plots-of-eye-tracking-data-in-r/

### New R resource for ordinary and multilevel regression modeling

Here’s what I received from the Centre for Multilevel Modelling at Bristol (I haven’t checked it out yet; registration seems to be free but required):

The Centre for Multilevel Modelling is very pleased to announce the addition of R practicals to our free on-line multilevel modelling course. These give detailed instructions of how to carry out a range of analyses in R, starting from multiple regression and progressing through to multilevel modelling of continuous and binary data using the lmer and glmer functions. MLwiN and Stata versions of these practicals are already available. You will need to log on or register onto the course to view these practicals. Read More... http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13

### LSA 2011 class on Computational Psycholinguistics

Due to popular demand 😉 – you can find the *Computational Psycholinguistics* class Roger Levy and I are currently teaching at the LSA 2011 institute at Boulder mirrored here.

### LSA 2011 at Boulder: Yeah!

Woohooo. Roger Levy and I will be teaching a **class on** *Computational Psycholinguistics* at the LSA’s 2011 Linguistic Institute, to be held July 5th to August 5th next year in Boulder, CO. The class description should be available through their website soon, but here are some snippets from our proposal: Read the rest of this entry »

### Coming up: Mini-WoMM in Montreal

If you are in the Montreal area, consider joining us for a **Workshop on Ordinary and Multilevel Models** to be held 5/3-4 at McGill and organized by Michael Wagner, Aparna Nadig, and Kris Onishi. The workshop will include the usual intros to linear regression, linear mixed models, logistic regression, and mixed logit models. We will also discuss common issues in regression modeling and their solutions. Additionally, we will have a couple of special area lectures/tutorials:

- Maureen Gillespie (Northeastern) will talk about different ways to code your variables and how that relates to the specific hypotheses you’re testing.
- Peter Graff (MIT) will give a tutorial on logistic regression, specifically to test linguistic theories. In all likelihood, he will also sing. Which relates to the previous post, because he likes to sing about OT.

So, join us! I think there also will be a party =). Below is the full invitation (some details may change). Read the rest of this entry »

### Tutorial on Regression and Mixed Models at Penn State

Last week (02/3-5/10), I had the pleasure of giving the inaugural *CLS Graduate Student Young Scientist Colloquium* (“An information theoretic perspective on language production”) at the Center for Language Science at Penn State (State College).

I also gave two 3h-lectures on regression and mixed models. The slides for Day 1 introduce linear regression, generalized linear models, and generalized linear mixed models. I am using example analyses of real psycholinguistic data sets from Harald Baayen’s *languageR* library (freely available through the free stats package *R*). The slides for Day 2 go through problems and solutions for regression models. For more information, have a look at the online lectures available via the HLP lab wiki. I’ve uploaded the pdf slides and an R script. There also might be a podcast available at some point. Feedback welcome. I’ll be giving a similar workshop at McGill in May, so watch for more materials.

I had an intensive and fun visit, meeting with researchers from Psychology, Communication and Disorders, Linguistics, Spanish, German, etc. I learned a lot about bilingualism (not only though) and a bit about anticipatory motor planning. So thanks to everyone there who helped to organize the visit, especially Jorge Valdes and Jee Sook Park. And thanks to Judith Kroll for the awesome cake (see below). Goes without saying that it was a pleasure meeting the unofficial mayor of State College, too ;). See you all at CUNY! Read the rest of this entry »

### LSA09-125: Psycholinguistics and Syntactic Corpora

The LSA Summer Institute is almost over and it has been a lot of fun so far. I didn’t get to see nearly as many talks and classes as I had hoped to, but instead there were tons of interesting conversations, new ideas, and just nice moments hanging out in the sun.

Brief update: It couldn’t have gone any other way: I missed my flight. That happens every time I try to leave the Bay Area. I am so used to it, I am not even trying to be on time anymore ;). Ah well, it gives me a chance to enjoy a cappuccino in my favorite SF cafe (Ritual Roasters) and even to attend Dan’s party (yippie!). Oh, and to upload some random pictures from the classroom. Yeah, pretty dark, I know. If you have better pictures, can you send them to me so I can upload them? Also, here are some pics from our office hours at Caffe Strada (thanks to Judith and Alex for a great job!):

LSA125-ers — thanks for an enjoyable class, for all the questions, and I hope you keep enjoying your projects (or, if nothing else, now know for certain that you really really never want to work with corpora ;). Send us an update about your papers as they progress.

To everyone else out there: If you’re interested in the use of syntactic corpora to investigate language production, you may find our **LSA125 class** webpage useful (see especially the links and information on the corpus pages, but also the slides). If you use material from this page, please let us know. Thanks to Judith, we now have a nicely documented version of the TGrep2 Database Tools, which we have dubbed **TDT lite**. Alex and Judith have also prepared example projects.

TDT *lite* allows you to combine the output of TGrep2 searches on syntactic corpora into a nice tab-delimited database that can be imported into R, Excel, or the stats program of your choice. While it doesn’t give you the full flexibility of scripting things yourself, it makes it considerably easier to start your own corpus-based project. We’re in the process of polishing things up for distribution (thanks to all the brave members of our class who helped us to understand which parts still need further improvement!). So, if something like that might be of interest to you, let us know whether you would like further information. We hope to have a beta release by the end of August.
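To give a rough idea of what “combining search output into a tab-delimited database” means, here is a small Python sketch. It is not TDT lite and does not reproduce its file formats or options. It simply assumes, hypothetically, that each search yields lines of the form `item ID<TAB>value`, and that the results should end up in one table with one row per item and one column per search. All names and data are made up.

```python
import csv
import io

def combine_search_outputs(outputs):
    """Merge per-search outputs (column name -> text of 'item_id<TAB>value' lines)
    into a dictionary of rows keyed by item ID."""
    rows = {}
    for column, text in outputs.items():
        for line in text.strip().splitlines():
            item_id, value = line.split("\t", 1)
            rows.setdefault(item_id, {})[column] = value
    return rows

def to_tab_delimited(rows, columns, missing="NA"):
    """Render the merged rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Item_ID"] + columns)
    for item_id in sorted(rows):
        writer.writerow([item_id] + [rows[item_id].get(c, missing) for c in columns])
    return buffer.getvalue()

# Hypothetical results of two separate corpus searches over the same sentences.
outputs = {
    "Complementizer": "s1\tthat\ns2\tnone\n",
    "Verb": "s1\tthink\ns2\tsay\ns3\tknow\n",
}
table = to_tab_delimited(combine_search_outputs(outputs), ["Complementizer", "Verb"])
print(table)
```

Items that a given search did not match are filled with `NA`, which R reads as a missing value by default when the resulting file is imported with `read.delim`.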

### Multilevel model tutorial at Haskins lab

Austin Frank and I just gave a 2×3-hour workshop on multilevel models at Haskins Lab (thanks to Tine Mooshammer for organizing!). We had a great audience with pretty diverse backgrounds (ranging from longitudinal studies on nutrition, through speech research and clinical studies, to psycholinguistics and fMRI research), which made for lots of interesting conversations on topics I don’t usually get to think about. Thanks to everyone attending =). We had a great time.

We may post the recordings once we receive them, if it turns out they may be useful. But for now, here are many of the slides we used, a substantial subset of which were created by Roger Levy (UC San Diego) and/or in collaboration with Victor Kuperman (Stanford University) for WOMM’09 at the CUNY Sentence Processing Conference, as indicated on the slides. No guarantees for the R-code and please do not distribute (rather: refer to this page) and ask before citing.

- **Conceptual intro to Generalized Linear Models and Generalized Linear Mixed** (a.k.a. multilevel) models (based on WOMM’09 slides by Roger Levy with minimal changes by Florian Jaeger)
- **Common issues in ordinary and multilevel regression modeling** (based on Florian Jaeger & Victor Kuperman’s WOMM’09 slides plus some additional materials)
- **Additional issues: multiple post-hoc comparisons and random effect ex/in-clusion** (Austin Frank)

Questions and comments welcome, preferably using the comment box at the bottom of this page. R-related questions should be sent to the very friendly email support list for language researchers using R (see R-lang link in the navigation bar to the right).

### Jaeger (2008), J Memory Language, 59, 434-446 (ANOVA)

Since I get asked for the R code I promised in my 2008 JML paper on mixed logit models every now and then, I have posted it here. **If you find this code useful, please consider citing the Jaeger (2008) paper:**

- Jaeger, T. Florian (2008). Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. *Journal of Memory and Language 59*, 434-446.

Please note, however, that **the data analyzed in that paper is not mine and you need to acquire it from Inbal Arnon**, who conducted the study. With Inbal’s permission, here’s the data file I used:

- Data from the comprehension component of Study 2 from Arnon, Inbal. “Rethinking child difficulty: The effect of NP type on children’s processing of relative clauses in Hebrew.” *Journal of Child Language* 37.01 (2010): 27-57.

If you try to work your way through my paper, you may also find the following wiki pages from our lab with readings and more code helpful:

http://wiki.bcs.rochester.edu/HlpLab/StatsCourses/

As a quick intro, you may find the talks from a recent workshop useful. Some colleagues (Dale Barr, Roger Levy, Harald Baayen, Victor Kuperman, Austin Frank) and I gave it at the CUNY sentence processing conference 2009, and it covered the conceptual background, common issues, and solutions for ordinary and multilevel regression models. The talk slides are all linked to the schedule on that page. You’ll find detailed walk-throughs, R code, and conceptual overviews.

I’d appreciate it if you left a comment here in case this was useful. It helps us to see what we should be posting. Cheers.

### Great new article about random speaker effects in sociolinguistic data analysis

Heya. I just wanted to bring the following nice article by Daniel Ezra Johnson to everyone’s attention:

Getting off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis,

Daniel Ezra Johnson, University of York,

*Language and Linguistics Compass* 3/1 (2008): 359-383, doi: 10.1111/j.1749-818X.2008.00108.x

PDF: http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1749-818X.2008.00108.x

The article addresses the need for random speaker effect modeling in sociolinguistic data analysis and why researchers should switch from the GoldVarb standard to mixed effect models. The paper also describes an implementation available in R (Rbrul) that affords both ordinary and multilevel regression modeling and is capable of formatting output in ways that follow either standard regression conventions or the Varbrul standard, which is more common in sociolinguistics and variationist work. I think the paper is really well written and provides some compelling arguments to use the more advanced mixed effect models. Spread the word. There are still plenty of people out there who are hesitant to leave GoldVarb behind despite the obvious shortcoming that it does not support random effects.

### Pre-CUNY Workshop on “Good practices in ordinary and multilevel regression models”?

Out of recent conversations with a whole bunch of folks (e.g. John Trueswell, Jennifer Arnold, Elsi Kaiser, Matt Traxler, Mike Tanenhaus, Jim Magnuson, and more), we came up with the idea of possibly holding a workshop on “good practices in ordinary and multilevel regression models” [working title ;)] for researchers working on psycholinguistics/the psychology of language just a day before CUNY 2009 (to be held 03/26-28 at UC Davis), so 03/25 in Davis. This is just a baby of a thought at this point, but if you’re interested, I’ve summarized some thoughts below and I’d appreciate your feedback (just leave a comment below and I will receive it).

### Mini-tutorial on regression and mixed (linear & logit) models in R

This summer, Austin Frank and I organized a tutorial on regression and mixed models, consisting of six 3h sessions. It is posted on our HLP lab wiki and consists of reading suggestions and commented R scripts that we went through in class. Among the topics (also listed for each session on the wiki) are:

- linear & logistic regression
- linear & logit mixed/multilevel/hierarchical models
- model evaluation (residuals, outliers, distributions)
- collinearity tests and dealing with collinearity
- coding of variables (contrasts)
- visualization

We used both Baayen’s 2008 textbook *Analyzing Linguistic Data: A Practical Introduction to Statistics using R* (available online) and Gelman and Hill’s 2007 book on *Data Analysis using Regression and Multilevel/Hierarchical Models*, both of which we can recommend (they also complement each other nicely). If you have questions about this class or suggestions for improvement, please send us an email or leave a comment on this page (we’ll get notified).
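To illustrate why the coding of variables (contrasts) shows up in the topic list above, here is a minimal Python sketch. It is not from the tutorial scripts, which are in R. The reaction times and condition labels are invented, and the regression is a plain one-predictor least-squares fit rather than a mixed model. It compares treatment (0/1) coding with centered ±0.5 coding (a scaled version of sum coding) for the same two-level factor.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Ordinary least squares with a single predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up reaction times (ms) for a balanced two-condition design.
conditions = ["A", "A", "A", "B", "B", "B"]
rt = [400.0, 420.0, 410.0, 460.0, 480.0, 470.0]

# Treatment coding: A is the reference level (0), B is 1.
treatment = [0.0 if c == "A" else 1.0 for c in conditions]
# Centered coding: A = -0.5, B = +0.5.
centered = [-0.5 if c == "A" else 0.5 for c in conditions]

t_intercept, t_slope = fit_simple_regression(treatment, rt)
c_intercept, c_slope = fit_simple_regression(centered, rt)

print(f"treatment coding: intercept = {t_intercept:.1f}, slope = {t_slope:.1f}")
print(f"centered coding:  intercept = {c_intercept:.1f}, slope = {c_slope:.1f}")
```

With these numbers the slope is 60 under both codings (the difference between the two condition means), but the intercept is 410 (the mean of condition A) under treatment coding and 440 (the grand mean) under centered coding. The same data thus answer a different question about the intercept depending on the coding, which matters even more once interactions enter the model.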