Posts Tagged ‘data analysis’

13 Jul 2011

R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson

Below I am sharing the R code for our paper on the serial founder effect:
This paper is a commentary on Atkinson’s 2011 Science article on the serial founder model (see also this interview with ScienceNews, in which parts of our comment in Linguistic Typology and follow-up work are summarized). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and evaluate these models, using Atkinson’s data as an example. We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.
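To give a feel for what "an exponential decay function over migration distances" can mean, here is a minimal Python sketch (the posted scripts themselves are in R). It assumes weights of the form exp(-d / scale) and turns them into a per-language areal covariate: the decay-weighted mean of a typological value over all other languages. The `scale` parameter, the weighted-mean construction, the function names, and the toy numbers are all hypothetical illustrations, not the exact formulation used in the paper.

```python
import math


def decay_weights(distances, scale):
    """Turn a matrix of pairwise distances (e.g. migration distances in km)
    into weights exp(-d / scale). Nearby languages get weights near 1,
    distant ones approach 0. The `scale` parameter is an assumption."""
    return [[math.exp(-d / scale) for d in row] for row in distances]


def areal_predictor(distances, values, scale):
    """Hypothetical areal covariate: for each language i, the decay-weighted
    mean of `values` over all other languages j != i."""
    w = decay_weights(distances, scale)
    n = len(values)
    result = []
    for i in range(n):
        num = sum(w[i][j] * values[j] for j in range(n) if j != i)
        den = sum(w[i][j] for j in range(n) if j != i)
        result.append(num / den)
    return result


# Toy example with invented numbers: languages A and B are 100 km apart,
# C is 1000 km from both.
dist_km = [[0, 100, 1000],
           [100, 0, 1000],
           [1000, 1000, 0]]
values = [20.0, 30.0, 40.0]
print(areal_predictor(dist_km, values, scale=500.0))
```

In this sketch A's areal value is pulled toward its close neighbour B rather than toward the distant C, while C, being equally far from A and B, gets their plain average. The resulting covariate could then be entered as a fixed effect alongside crossed random effects for family and area; whether and how the paper does this should be checked against the published scripts.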
Finally, we discuss the limits that data sparseness places on the statistical analysis. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes (for the effect) are included in the model. We also present simulations showing that the Type I error rate (false rejections of the null hypothesis) of the approach taken in Atkinson is many times higher than conventionally accepted (i.e. above .2, when the conventionally accepted rate of Type I errors is .05).
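Why can ignoring family structure inflate the Type I error rate? The following hypothetical Python simulation illustrates the general mechanism; it is not our simulation code and not a reconstruction of Atkinson's model. It assumes 20 families of 10 languages, a predictor that varies only between families, a random by-family intercept with standard deviation `tau`, and no true effect of the predictor. Each dataset is then analysed with an ordinary regression that treats all languages as independent, using a normal approximation to the t distribution for the p-value.

```python
import random
from statistics import NormalDist


def ols_slope_pvalue(x, y):
    """Two-sided p-value for the OLS slope of y on x, treating all rows as
    independent. Uses a normal approximation to the t distribution."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(slope / se)))


def type1_rate(n_fam=20, per_fam=10, tau=1.0, sigma=1.0,
               n_sim=500, alpha=0.05, seed=1):
    """Share of simulated datasets in which a null effect is declared
    significant. All settings are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x, y = [], []
        for _fam in range(n_fam):
            u = rng.gauss(0, 1)    # family-level predictor value
            b = rng.gauss(0, tau)  # random by-family intercept
            for _lang in range(per_fam):
                x.append(u)
                y.append(b + rng.gauss(0, sigma))  # no effect of x on y
        if ols_slope_pvalue(x, y) < alpha:
            hits += 1
    return hits / n_sim


print(type1_rate())          # family clustering ignored by the analysis
print(type1_rate(tau=0.0))   # no clustering: rate should be close to .05
```

With `tau=0.0` the languages are genuinely independent and the rejection rate stays near the nominal .05. With family-level variation, the ordinary regression behaves as if it had 200 independent observations when there is far less independent information, and the rejection rate rises far above .05. A mixed model with a random by-family intercept (e.g. fitted with lme4 in R) is the standard remedy.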
The scripts presented below are not intended to allow full replication of our analyses (they lack annotation, and in any case we are not allowed to share the WALS data employed by Atkinson on this site). However, the paper contains many plots and tests that might be useful for typologists or other users of mixed models. For that reason, I am posting the raw code for now. Please comment below if you have questions, and we will try to provide additional annotation for the scripts as needed and as time permits. If you find (parts of the) script(s) useful, please consider citing our article in Linguistic Typology.
26 Feb 2011

Bayesian Data Analysis, p-values, and more: What do we need?

Some of you might find this open letter by John Kruschke (Indiana University) interesting. He makes a passionate case for abandoning traditional “20th century” data analysis in favor of Bayesian approaches.

10 Mar 2009

Great new article about random speaker effects in sociolinguistic data analysis

Heya. I just wanted to bring the following nice article by Daniel Ezra Johnson to everyone’s attention:

Getting off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis,
Daniel Ezra Johnson, University of York,
Language and Linguistics Compass 3/1 (2008): 359-383, doi: 10.1111/j.1749-818X.2008.00108.x
PDF: http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1749-818X.2008.00108.x

The article addresses the need to model random speaker effects in sociolinguistic data analysis and argues that researchers should switch from the GoldVarb standard to mixed effect models. The paper also describes an implementation in R (Rbrul) that affords both ordinary and multilevel regression modeling and can format its output either according to standard regression conventions or according to the Varbrul standard, which is more common in sociolinguistic and variationist work. I think the paper is really well written and provides compelling arguments for using the more advanced mixed effect models. Spread the word. There are still plenty of people out there who are hesitant to leave GoldVarb behind despite its obvious shortcoming: it does not support random effects.
