Posts Tagged ‘mixed models

07
Nov
11

The serial founder hypothesis and word order universals

Check out this article in ScienceNews summarizing commentaries on two recent language studies in Science (Atkinson, 2011: ) and Nature (Dunn et al., 2011). Each of the studies has received a lot of attention and they are the subject of two special issues in press for Linguistic Typology, to which HLP Lab contributed on three articles. I will add a link to the special issue(s) once it comes out. Continue reading ‘The serial founder hypothesis and word order universals’

27
Jul
11

New R resource for ordinary and multilevel regression modeling

Here’ s what I received from the Center of Multilevel Modeling at Bristol (I haven’t checked it out yet; registration seems to be free but required):

The Centre for Multilevel Modelling is very pleased to announce the addition of
R practicals to our free on-line multilevel modelling course. These give
detailed instructions of how to carry out a range of analyses in R, starting
from multiple regression and progressing through to multilevel modelling of
continuous and binary data using the lmer and glmer functions.

MLwiN and Stata versions of these practicals are already available.
You will need to log on or register onto the course to view these
practicals.

Read More...
http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13
13
Jul
11

R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson

Below I am sharing the R code for our paper on the serial founder effect:
This paper is a commentary on Atkinson’s 2011 Science article on the serial founder model (see also this interview with ScienceNews, in which parts of our comment in Linguistic Typology and follow-up work are summarized). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and to evaluate these models, using Atkinson’s data as an example.We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.
Finally, we discuss limits to the statistical analysis due to data sparseness. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes (for the effect) are included in the model. We also present simulations that show that the Type I error rate (false rejections) of the approach taken in Atkinson is many times higher than conventionally accepted (i.e. above .2 when .05 is the conventionally accepted rate of Type errors).
The scripts presented below are not intended to allow full replication of our analyses (they lack annotation and we are not allowed to share the WALS data employed by Atkinson on this site anyway). However, there are many plots and tests in the paper that might be useful for typologists or other users of mixed models. For that reason, I am for now posting the raw code. Please comment below if you have questions and we will try to provide additional annotation for the scripts as needed and as time permits. If you find (parts of the) script(s) useful, please consider citing our article in Linguistic Typology.
25
Jun
11

More on random slopes and what it means if your effect is not longer significant after the inclusion of random slopes

I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to updated the procedure I described at http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. As always, comments are welcome. What I am writing below are just suggestions.

[...] an insignificant effect in an (1 + factor|subj) model means that, after controlling for random by-subject variation in the slope/effect of factor, you find no (by-convention-significant) evidence for the effect. Like you suggest, this is due to the fact that there is between-subject variability in the slope that is sufficiently large to let us call into question the hypothesis that the ‘overall’ slope is significantly different from zero.

[...] So, what’s the rule of thumb here? If you run any of the standard simple designs (2×2, 2×3, 2x2x2,etc.) and you have the psychologist’s luxury of plenty of data (24+item, 24+ subject [...]), the full random effect structure is something you should entertain as your starting point. That’s in Clark’s spirit. That’s what F1 and F2 were meant for. [...] All of these approaches do not just capture random intercept differences by subject and item. They also aim to capture random slope differences.

[...] here’s what I’d recommend during tutorials now because it often saves time for psycholinguistic data. I am only writing down the random effects but, of course, I am assuming there are fixed effects, too, and that your design factors will remain in the model. Let’s look at a 2×2 design: Continue reading ‘More on random slopes and what it means if your effect is not longer significant after the inclusion of random slopes’

31
May
11

Two interesting papers on mixed models

While searching for something else, I just came across two papers that should be of interest to folks working with mixed models.

  • Schielzeth, H. and Forstmeier, W. 2009. Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology Volume 20, Issue 2, 416-420.  I have seen the same point being made in several papers under review and at a recent CUNY (e.g. Doug Roland’s 2009? CUNY poster). On the one hand, it should be absolutely clear that random intercepts alone are often insufficient to account for violations of independence (this is a point, I make every time I am teaching a tutorial). On the other hand, I have reviewed quite a number of papers, where this mistake was made. So, here you go. Black on white. The moral is (once again) that no statistical procedure does what you think it should do if you don’t use it the way it was intended to.
  • The second paper takes on a more advanced issue, but one that is becoming more and more relevant. How can we test whether a random effect is essentially non-necessary – i.e. that it has a variance of 0? Currently, most people conduct model comparison (following Baayen, Davidson and Bates, 2008).  But this approach is not recommended (and neither do Baayen et al recommend it) if we want to test whether all random effects can be completely removed from the model (cf. the very useful R FAQ list, which states “do not compare lmer models with the corresponding lm fits, or glmer/glm; the log-likelihoods [...] include different additive terms”). This issue is taken on in Scheipl, F., Grevena, S. and Küchenhoff, H. 2008. Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Computational Statistics & Data Analysis.Volume 52, Issue 7, 3283-3299. They present power comparisons of various tests.
31
May
11

Mixed model’s and Simpson’s paradox

For a paper I am currently working on, I started to think about Simpson’s paradox, which wikipedia succinctly defines as

“a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science [...]“

The wikipedia page also gives a nice visual illustration. Here’s my own version of it. The plot shows 15 groups, each with 20 data points. The groups happen to order along the x-axis (“Pseudo distance from origin”) in a way that suggests a negative trend of the Pseudo distance from origin against the outcome (“Pseudo normalized phonological diversity”). However, this trend does not hold within groups. As a matter of fact, in this particular sample, most groups show the opposite of the global trend (10 out of 15 within-group slopes are clearly positive). If this data set is analyzed by an ordinary linear regression (which does not have access to the grouping structure), the result will be a significant negative slope for the Pseudo distance from origin. So, I got curious: what about linear mixed models?

Continue reading ‘Mixed model’s and Simpson’s paradox’

24
Feb
11

Diagnosing collinearity in mixed models from lme4

I’ve just uploaded files containing some useful functions to a public git repository. You can see the files directly without worrying about git at all by visiting regression-utils.R (direct download) and mer-utils.R (direct download). Continue reading ‘Diagnosing collinearity in mixed models from lme4′

16
Apr
10

Annotated example analysis using mixed models

Jessica Nelson (Learning Research and Development Center, University of Pittsburgh) uploaded a step-by-step example analysis using mixed models to her blog. Each step is nicely annotated and Jessica also discusses some common problems she encountered while trying to analyze her data using mixed models. I think this is a nice example for anyone trying to learn to use mixed models. It goes through all/most of the steps outlined in Victor Kuperman and my WOMM tutorial (click on the graph to see it full size):

16
Feb
10

Tutorial on Regression and Mixed Models at Penn State

Last week (02/3-5/10), I had the pleasure to give the inaugural CLS Graduate Student Young Scientist Colloquium (“An information theoretic perspective on language production”) at the Center for Language Science at Penn State (State College).

I also gave two 3h-lectures on regression and mixed models. The slides for Day 1 introduce linear regression, generalized linear models, and generalized linear mixed models.  I am using example analyses of real psycholinguistic data sets from Harald Baayen’s languageR library (freely available through the free stats package R). The slides for Day 2 go through problems and solutions for regression models. For more information have a look at the online lectures available via the HLP lab wiki. I’ve uploaded the pdf slides and an R script. There also might be a pod cast available at some point. Feedback welcome. I’ll be giving a similar workshop at McGill in May, so watch for more materials.

I had an intensive and fun visit, meeting with researchers from Psychology, Communication and Disorders, Linguistics, Spanish, German, etc.  I learned a lot about bilingualism (not only though)  and a bit about anticipatory motor planning. So thanks to everyone there who helped to organize the visit, especially Jorge Valdes and Jee Sook Park. And thanks to Judith Kroll for the awesome cake (see below). Goes without saying that it was a pleasure meeting the unofficial mayor of State College, too ;) . See you all at CUNY! Continue reading ‘Tutorial on Regression and Mixed Models at Penn State’

21
Dec
09

Some thoughts on the sensitivity of mixed models to (two types of) outliers

Outliers are a nasty problem for both ANOVA or regression analyses, though at least some folks  consider them more of a problem for the latter type of analysis. So, I thought I post some of my thoughts on a recent discussion about outliers that took place at CUNY 2009. Hopefully, some of you will react and enlighten me/us (maybe there are some data, some simulations out there that may speak to the issues I mention below?). I first summarize a case where one outlier out of 400 apparently drove the result of a regression analysis, but wouldn’t have done so if the researchers had used ANO(C)OVA. After that I’ll have some simulation data for you on  another type of “outlier” (I am not even sure whether outlier is the right word): the case where a few levels of a group-level predictor may be driving the result. That is, how do we make sure that our results aren’t just due to item-specific or subject-specific properties. Continue reading ‘Some thoughts on the sensitivity of mixed models to (two types of) outliers’

14
May
09

Random effect: Should I stay or should I go?

One of the more common questions I get about mixed models is whether there are any standards regarding the removal of random effects from the model. When should a random effect be included in the model? This was also one of the questions we had hope to answer for our field (psycholinguistics) in the pre-CUNY Workshop on Ordinary and Multilevel Models (WOMM), but I don’t think we got anywhere close to a “standard” (see Harald Baayen’s presentation on understanding random effect correlations though for a very insightful discussion).

That being said, I find most of us would probably agree on a set of rules of thumb, at least for factorial analyses of balanced data: Continue reading ‘Random effect: Should I stay or should I go?’

03
May
09

Multilevel model tutorial at Haskins lab

Austin Frank and I just gave a 2×3 hours workshop on multilevel models at Haskins Lab (thanks to Tine Mooshammer for organizing!). We had a great audience with a pretty diverse background (ranging from longitudinal studies on nutrition, over speech researchers, clinical studies, and psycholinguists, to fMRI researchers), which made for lots of interesting conversations on topics I don’t usually get to think about. Thanks to everyone attending =). We had a great time.

We may post the recordings once we receive them, if it turns out they may be useful. But for now, here are many of the slides we used, a substantial subset of which were created by Roger Levy (UC San Diego) and/or in collaboration with Victor Kuperman (Stanford University) for WOMM’09 at the CUNY Sentence Processing Conference, as indicated on the slides. No guarantees for the R-code and please do not distribute (rather: refer to this page) and ask before citing.

Questions and comments welcome, preferably using the comment box at the bottom of this page. R related questions should be send to the very friendly email support list for language researchers using R (see R-lang link in the navigation bar to the right).

12
Jan
09

In eigener Sache: LSA 2009 meeting

Unfortunately, I was only able to attend the first day of the LSA meeting this year, but it was good being there (ran into lots of interesting folks and saw a couple of good talks). Ting gave his presentation on Constant Entropy Rate in Mandarin Chinese and he was a real pro ;) . Continue reading ‘In eigener Sache: LSA 2009 meeting’

09
Sep
08

Mini-tutorial on regression and mixed (linear & logit) models in R

This summer, Austin Frank and I organized a six 3h-session tutorial on regression and mixed models. It is posted on our HLP lab wiki and consists out of reading suggestions and commented R scripts that we went through in class. Among the topics (also listed for each session on the wiki) are:

  • linear & logistics regression
  • linear & logit mixed/multilevel/hierarchical models
  • model evaluation (residuals, outliers, distributions)
  • collinearity tests and dealing with collinearity
  • coding of variables (contrasts)
  • visualization

We used both Baayen’s 2008 textbook Analyzing Linguistic Data: A Practical Introduction to Statistics using R (available online) and Gelman and Hill’s 2007 book on Data Analysis using Regression and Multilevel/Hierarchical Models, both of which we can recommend (they also complement each other nicely). If you have questions about this class or you have suggestions for improvement, please send us an email or leave a comment to this page (we’ll get notified).




Blog Stats

  • 117,920 hits

 

June 2012
M T W T F S S
« May    
 123
45678910
11121314151617
18192021222324
252627282930  

Categories

RSS Language Log


Follow

Get every new post delivered to your Inbox.