typology

HLP Lab is looking for graduate researchers

The Human Language Processing (HLP/Jaeger) Lab in the Department of Brain and Cognitive Sciences at the University of Rochester is looking for PhD researchers to join the lab. Admission is through the PhD program in Brain and Cognitive Sciences, which offers a full five-year scholarship. International applications are welcome.

The serial founder hypothesis and word order universals

Check out this article in ScienceNews summarizing commentaries on two recent language studies, in Science (Atkinson, 2011) and Nature (Dunn et al., 2011). Both studies have received a lot of attention, and they are the subject of two special issues of Linguistic Typology (in press), to which HLP Lab contributed three articles. I will add a link to the special issue(s) once they come out.

R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson

Below I am sharing the R code for our paper on the serial founder effect:
This paper is a commentary on Atkinson’s 2011 Science article on the serial founder model (see also this interview with ScienceNews, which summarizes parts of our comment in Linguistic Typology and of our follow-up work). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and evaluate these models, using Atkinson’s data as an example. We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.
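To make the idea of an exponential decay over distance concrete, here is a minimal Python sketch. It is not taken from our scripts (which are in R), and it assumes a symmetric matrix of pairwise migration distances in kilometres plus a hypothetical `half_distance_km` parameter, the distance at which a neighbour's weight drops to one half; the parametrization in the paper may differ. One way such weights can enter a model is as a distance-weighted average of the *other* languages' values, used as an areal covariate. That is an illustrative choice, not necessarily how the paper uses the decay function.

```python
import math


def decay_weight(distance_km, half_distance_km):
    """Exponential decay weight: 1 at distance 0, 0.5 at half_distance_km.

    half_distance_km is a hypothetical tuning parameter for this sketch.
    """
    return math.exp(-math.log(2) * distance_km / half_distance_km)


def areal_covariate(values, distances, half_distance_km):
    """Distance-weighted average of the other languages' feature values.

    values:    one numeric feature value per language
    distances: symmetric matrix (list of lists) of migration distances in km
    Returns one areal covariate value per language.
    """
    n = len(values)
    result = []
    for i in range(n):
        weighted_sum = 0.0
        total_weight = 0.0
        for j in range(n):
            if j == i:
                continue  # a language does not count as its own neighbour
            w = decay_weight(distances[i][j], half_distance_km)
            weighted_sum += w * values[j]
            total_weight += w
        result.append(weighted_sum / total_weight)
    return result


if __name__ == "__main__":
    # Three made-up languages and made-up distances, for illustration only.
    values = [10.0, 20.0, 40.0]
    distances = [[0, 1000, 5000], [1000, 0, 4000], [5000, 4000, 0]]
    print(areal_covariate(values, distances, half_distance_km=1000))
```

With these made-up numbers, language 0's covariate is about 21.18: its neighbour at 1,000 km gets weight 0.5, while the one at 5,000 km gets only 1/32, so the nearby language dominates.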
Finally, we discuss limits to the statistical analysis that are due to data sparseness. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes for the effect are included in the model. We also present simulations showing that the Type I error rate (false rejections of the null hypothesis) of the approach taken in Atkinson (2011) is more than four times the conventionally accepted rate (above .2, compared to the conventional .05).
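The inflated Type I error rate is easy to reproduce in a toy setting. The sketch below is neither the simulation from the paper nor part of our R scripts; it is a minimal Python illustration that assumes languages nested in families, a predictor that is constant within each family (a hypothetical stand-in for something like distance from an origin), and an outcome with family-level random intercepts but no true effect of the predictor. All parameter names and values are made up. It compares an ordinary regression that treats every language as independent with a regression on family means. The latter is only a crude stand-in for modeling the family grouping, not the mixed model used in the paper. P-values use a normal approximation to the t distribution.

```python
import math
import random


def ols_slope_pvalue(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to t)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(slope / se) / math.sqrt(2))


def simulate_type1(n_families=20, langs_per_family=10, family_sd=1.0,
                   noise_sd=1.0, n_sims=1000, alpha=0.05, seed=1):
    """Rejection rates under a true null: naive test vs. family-means test.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    naive_rejections = 0
    family_rejections = 0
    for _ in range(n_sims):
        x, y, fam_x, fam_y = [], [], [], []
        for _ in range(n_families):
            fx = rng.gauss(0.0, 1.0)        # predictor, shared by the family
            fu = rng.gauss(0.0, family_sd)  # family random intercept
            ys = [fu + rng.gauss(0.0, noise_sd) for _ in range(langs_per_family)]
            x.extend([fx] * langs_per_family)
            y.extend(ys)
            fam_x.append(fx)
            fam_y.append(sum(ys) / langs_per_family)
        # Naive: every language treated as an independent data point.
        if ols_slope_pvalue(x, y) < alpha:
            naive_rejections += 1
        # Family means: one data point per family.
        if ols_slope_pvalue(fam_x, fam_y) < alpha:
            family_rejections += 1
    return naive_rejections / n_sims, family_rejections / n_sims


if __name__ == "__main__":
    naive, by_family = simulate_type1()
    print(f"naive Type I error rate:        {naive:.3f}")
    print(f"family-means Type I error rate: {by_family:.3f}")
```

With these made-up settings the naive rate comes out far above .05, while the family-means rate stays close to it. The gap grows with the number of languages per family and with the share of variance that is due to families.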
The scripts presented below are not intended to allow full replication of our analyses: they lack annotation, and we are not allowed to share the WALS data employed by Atkinson on this site anyway. However, the paper contains many plots and tests that might be useful for typologists or other users of mixed models. For that reason, I am posting the raw code for now. Please comment below if you have questions, and we will try to provide additional annotation for the scripts as needed and as time permits. If you find (parts of) the script(s) useful, please consider citing our article in Linguistic Typology.

Two nice resources to find the language of your choice

I was just reading Haspelmath’s post on the CyberLingBlog in reply to a summary of a recent talk by Newmeyer. Most of you probably know the World Atlas of Language Structures (WALS), which allows you to browse languages and linguistic properties, view their distributions on beautiful maps, and read nice introductory articles on many typological features. It’s very well structured and gives you references for each language, too. Here’s a link to a page on a specific language, Polish.

There is another database that I didn’t know about, which lets you browse or search a (so far rather small) set of languages for morpho-syntactic properties: Syntactic Structures of the World’s Languages. Properties are defined in a pragmatic and manageable way. For example, SVO is defined as allowing that order in a “neutral” context. The definition also makes clear that SVO can be “yes” even when other word order features are “yes”, too.

It seems that you can even contribute to this database by entering your own data (though maybe you need to apply?), including examples with glosses. Looks interesting. The usual caveats apply, but it’s great that someone is trying! Apart from this project, I remember only one similar project, which someone at the University of Vienna started while I was an undergrad, but as far as I remember it never reached critical mass.

If anybody knows of similar databases out there, feel free to post them below. Or even better: contribute to the CyberLingBlog. Everybody is invited.