Since more and more folks are running web-based experiments (typically, via Amazon’s Mechanical Turk or other platforms), I thought I’d put together a little sampling of demo experiments. We’ll keep updating this periodically, so feel free to subscribe to the RSS feed. Note that not all of the paradigms listed below have been developed by HLP Lab members (for credits, see below). We might release some of our paradigms for use by others soon. If you’re interested, please leave a comment below and subscribe to this page. This is the easiest way for us to make sure to keep you in the loop. Thank you for understanding. Continue reading ‘Some examples of web-based experiments’
Author Archive for Florian Jaeger
The reproducibility project
Thanks to Anne Pier Salverda, who made me aware of this project to replicate all studies in certain psych journals, including APA journals that publish psycholinguistic work, such as JEP:LMC. This might be a fine April Fools’ joke slightly delayed, but it sure is a great idea! In a similar study, researchers apparently found that 6 out of 53 cancer studies replicated (see linked article).
And while we are at it, here’s an article that, if followed, is guaranteed to increase the proportion of replications (whereas power, effect sizes, lower p-values, family-wise error corrections, min-F and all the other favorites out there are pretty much guaranteed to not do the job). Simmons et al. (2011), published in Psychological Science, show what we should all know but what is all too often forgotten or belittled: lax criteria in excluding data, adding additional subjects, transforming data, and adding or removing covariates inflate the Type I error rate (in combination easily up to over 80% false positives for p<.05!!!). Enjoy.
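To make one of these practices concrete, here is a minimal simulation sketch of "adding additional subjects": test once at a fixed sample size, versus test, and if the result is not significant, add more subjects and test again. This is not the simulation from Simmons et al.; the sample sizes (20 per group, plus 10), the known-variance z-test, and all function names are assumptions chosen to keep the example short and free of external libraries. Since there is no true effect, every significant result is a false positive.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known sd."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_sims, peek, rng, n_start=20, n_extra=10, alpha=0.05):
    """Share of simulated null experiments that end up 'significant'.

    Both groups are drawn from the same N(0, 1) population, so the true
    effect is zero. With peek=True, a non-significant first test is
    followed by collecting n_extra more subjects per group and retesting.
    """
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        p = z_test_p(a, b)
        if peek and p >= alpha:
            a += [rng.gauss(0, 1) for _ in range(n_extra)]
            b += [rng.gauss(0, 1) for _ in range(n_extra)]
            p = z_test_p(a, b)
        hits += p < alpha
    return hits / n_sims


rng = random.Random(1)  # fixed seed so the run is repeatable
fixed_rate = false_positive_rate(10000, peek=False, rng=rng)
peek_rate = false_positive_rate(10000, peek=True, rng=rng)
print(f"fixed n:       {fixed_rate:.3f}")
print(f"test, add, retest: {peek_rate:.3f}")
```

The fixed-n rate should land near the nominal .05, and the rate with one extra look should come out noticeably higher. That is only one of the practices listed above; stacking several of them compounds the problem.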
The NSF/SBE released its executive summary of 252 short white papers on the future of the social, behavioral, and economic sciences. Among other things, the report identifies four focus areas (population change; sources of disparities; communication, language, and linguistics; and technology, new media, and social networks) and three properties of future research (data-intensive, multidisciplinary, and collaborative). But read for yourself. The report summarizes what the community (authors that submitted white papers) had to say about what works well and what needs to be improved in terms of the processes that are currently employed by the NSF to distribute its funding. On p. 24 and onward, you can read a summary of the many, many linguistic white papers that seem to have been submitted (see p. 39 for a summary of which disciplines the white papers came from). On p. 29 and onward, the report lays out possible scenarios as to how the NSF might change in order to get to the outlined vision.
This might be of interest to some of you: Google Scholar now allows you to correct links or citations to your work. It also provides a complete summary of all your citations, by article, by year, etc. It’s a functionality similar to academia.edu, but it lets you remove wrong links to your work (e.g. to old prepublished manuscripts).
The interface is rather convenient since it allows you to import all references from scholar, which is almost 95% correct. Overall, it’s actually much more convenient than academia.edu (though I’d say it serves a slightly different purpose). It also generates a list of all your co-authors and other schnick-schnack. Check it out. Sweet.
Continue reading ‘Google scholar now provides detailed citation report’
If you’re running chi-squares to analyze categorical data and you have lots of cells with very low counts (or even 0), be careful in how you interpret the result. There’s a nice article by Andrew Gelman on this topic, where he shows that the problem is that all the low counts can make it harder to detect the signal (and hence a significant deviation from the expected values for a part of the table). Put differently, you might have a significant pattern, but not detect it. I don’t think it’s so much a problem for most of the tests we conduct since contingency tables in psycholinguistic and linguistic research are usually rather small. I can’t recall the last time that I saw anything larger than a 3×4 or the like. From what I understand from Gelman’s post, it would seem that the problem he points out becomes more serious the larger the table is.
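Here is a toy illustration of how I read that point, in plain Python. It is not Gelman’s example: the counts are made up, and the function name `pearson_chi_square` is just a label for this sketch. The idea is to take a 2×2 table with a clear signal and pad it with eight sparse columns that fit the null exactly. The test statistic stays the same, but the degrees of freedom go from 1 to 9, and with them the bar for significance (the two critical values are taken from standard chi-square tables).

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Critical values of the chi-square distribution at alpha = .05,
# copied from standard tables for the two df values used below.
CRITICAL_05 = {1: 3.841, 9: 16.919}

# Made-up 2x2 table with a real difference between the rows.
small = [[26, 14], [14, 26]]
# Same signal, padded with eight low-count columns that match the null.
padded = [row + [1] * 8 for row in small]

for name, table in (("small", small), ("padded", padded)):
    stat, df = pearson_chi_square(table)
    significant = stat > CRITICAL_05[df]
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, significant at .05: {significant}")
```

In this constructed case, the pattern in the first two columns is detected in the small table but not in the padded one, even though nothing about those two columns has changed.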
This might be of interest to folks, in case you haven’t seen it. First, there’s RAPID and EAGER. RAPID is a mechanism for research that requires fast funding decisions (e.g. b/c the first language with only one phoneme was just discovered but its last speaker is just about to enter into a vow of silence). EAGERs are “Early-concept Grants for Exploratory Research” for exploratory work – i.e. high risk research with a high potential for high pay-off. One important property of both mechanisms is that submissions do not have to be sent out for external review, which should substantially shorten the time until you hear back from NSF.
Second, there is now a new type of proposal that is specifically aimed at interdisciplinary work that would not usually be funded by any of the existing NSF panels alone – CREATIV: Creative Research Awards for Transformative Interdisciplinary Ventures.
Note that all three of these funding types allow no re-submission.
Check out this article in ScienceNews summarizing commentaries on two recent language studies in Science (Atkinson, 2011) and Nature (Dunn et al., 2011). Each of the studies has received a lot of attention and they are the subject of two special issues in press for Linguistic Typology, to which HLP Lab contributed three articles. I will add a link to the special issue(s) once they come out. Continue reading ‘The serial founder hypothesis and word order universals’
And while I am at it, let me post three more papers that are interesting for anyone interested in uniform information density and, more generally, theories of communicatively efficient language production (though most of you may already know these papers):
- They call it speech information rate, but it’s essentially the same: Pellegrino, F., Coupé, C., and Marsico, E. 2011. A cross-linguistic perspective on speech information rate. Language 87(3), 539-558.
- Maurits, L., Perfors, A., and Navarro, D. 2010. Why are some word orders more common than others? A uniform information density account. NIPS.
- S.T. Piantadosi, H. Tily, and E. Gibson. 2011. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526.
UID and text generation
Ah, just when I thought it couldn’t get any better: Uniform Information Density has been applied to text generation. Have a look at this paper (thanks, Raja, for forwarding it):
- Rajakrishnan Rajkumar and Michael White. 2011. Linguistically Motivated Complementizer Choice in Surface Realization. In Proc. of the EMNLP-11 Workshop on Using Corpora in NLG. (bib)
Better late than never: Congratulations to Dave Kleinschmidt for winning the “Student Talk Prize” at the 2011 meeting of Architecture and Mechanisms of Language Processing in Paris, France. If you want to learn more about Dave’s work on A Bayesian belief updating model of phonetic recalibration and selective adaptation, either have a look at this AMLaP abstract or read Dave’s short ACL paper on some of the findings presented at the 2011 Cognitive Modeling and Computational Linguistics workshop in Portland, Oregon (here’s a link to the full proceedings).
If you’re interested in this line of work, you might also enjoy reading Morgan Sonderegger and Alan Yu’s 2010 CogSci paper on A rational account of perceptual compensation for coarticulation, which we learned about recently.
Here’s what I received from the Centre for Multilevel Modelling at Bristol (I haven’t checked it out yet; registration seems to be free but required):
The Centre for Multilevel Modelling is very pleased to announce the addition of R practicals to our free on-line multilevel modelling course. These give detailed instructions of how to carry out a range of analyses in R, starting from multiple regression and progressing through to multilevel modelling of continuous and binary data using the lmer and glmer functions. MLwiN and Stata versions of these practicals are already available. You will need to log on or register onto the course to view these practicals. Read More... http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13
Due to popular demand, you can find the Computational Psycholinguistics class Roger Levy and I are currently teaching at the LSA 2011 institute at Boulder mirrored here.
- Jaeger, Graff, Croft, and Pontillo. 2011. Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson. Linguistic Typology 15(2), 281–319. [if you're not subscribed to Linguistic Typology, check out this pre-final draft or contact me for an offprint].
I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to update the procedure I described at http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. As always, comments are welcome. What I am writing below are just suggestions.
[...] an insignificant effect in a (1 + factor|subj) model means that, after controlling for random by-subject variation in the slope/effect of factor, you find no (by-convention-significant) evidence for the effect. Like you suggest, this is due to the fact that there is between-subject variability in the slope that is sufficiently large to let us call into question the hypothesis that the ‘overall’ slope is significantly different from zero.
[...] So, what’s the rule of thumb here? If you run any of the standard simple designs (2×2, 2×3, 2×2×2, etc.) and you have the psychologist’s luxury of plenty of data (24+ items, 24+ subjects [...]), the full random effect structure is something you should entertain as your starting point. That’s in Clark’s spirit. That’s what F1 and F2 were meant for. [...] All of these approaches do not just capture random intercept differences by subject and item. They also aim to capture random slope differences.
[...] here’s what I’d recommend during tutorials now because it often saves time for psycholinguistic data. I am only writing down the random effects but, of course, I am assuming there are fixed effects, too, and that your design factors will remain in the model. Let’s look at a 2×2 design: Continue reading ‘More on random slopes and what it means if your effect is not longer significant after the inclusion of random slopes’
While searching for something else, I just came across two papers that should be of interest to folks working with mixed models.
- Schielzeth, H. and Forstmeier, W. 2009. Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology Volume 20, Issue 2, 416-420. I have seen the same point being made in several papers under review and at a recent CUNY (e.g. Doug Roland’s 2009? CUNY poster). On the one hand, it should be absolutely clear that random intercepts alone are often insufficient to account for violations of independence (this is a point I make every time I am teaching a tutorial). On the other hand, I have reviewed quite a number of papers where this mistake was made. So, here you go. Black on white. The moral is (once again) that no statistical procedure does what you think it should do if you don’t use it the way it was intended to be used.
- The second paper takes on a more advanced issue, but one that is becoming more and more relevant. How can we test whether a random effect is essentially unnecessary, i.e. that it has a variance of 0? Currently, most people conduct model comparison (following Baayen, Davidson and Bates, 2008). But this approach is not recommended (and neither do Baayen et al. recommend it) if we want to test whether all random effects can be completely removed from the model (cf. the very useful R FAQ list, which states “do not compare lmer models with the corresponding lm fits, or glmer/glm; the log-likelihoods [...] include different additive terms”). This issue is taken on in Scheipl, F., Greven, S. and Küchenhoff, H. 2008. Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Computational Statistics & Data Analysis. Volume 52, Issue 7, 3283-3299. They present power comparisons of various tests.
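For the first paper’s point, here is a rough simulation sketch in plain Python. It does not fit any mixed model and it is not the analysis from Schielzeth and Forstmeier. It only contrasts two simple tests on data where subjects differ in their slope but the average slope is zero: a pooled test that treats all observations as independent (a crude stand-in for an analysis that ignores by-subject slope variability), and a by-subject test on each subject’s condition difference (in the spirit of F1). All numbers (24 subjects, 10 observations per condition, standard deviations of 1) and all names are assumptions made for illustration; the critical values 1.96 and 2.069 (t with 23 df) come from standard tables.

```python
import random
from statistics import fmean, stdev, variance


def simulate_once(rng, n_subj=24, n_obs=10, slope_sd=1.0, noise_sd=1.0):
    """Simulate one experiment with zero average effect but varying slopes.

    Returns (pooled_significant, by_subject_significant).
    """
    all_a, all_b, diffs = [], [], []
    for _ in range(n_subj):
        slope = rng.gauss(0, slope_sd)  # this subject's own effect
        a = [0.5 * slope + rng.gauss(0, noise_sd) for _ in range(n_obs)]
        b = [-0.5 * slope + rng.gauss(0, noise_sd) for _ in range(n_obs)]
        all_a += a
        all_b += b
        diffs.append(fmean(a) - fmean(b))

    # Pooled test: every observation counted as independent.
    se_pooled = (variance(all_a) / len(all_a) + variance(all_b) / len(all_b)) ** 0.5
    t_pooled = (fmean(all_a) - fmean(all_b)) / se_pooled
    pooled_sig = abs(t_pooled) > 1.96  # large-sample critical value

    # By-subject test: one difference score per subject, one-sample t-test.
    t_subj = fmean(diffs) / (stdev(diffs) / n_subj ** 0.5)
    subj_sig = abs(t_subj) > 2.069  # t critical value for 23 df, alpha = .05

    return pooled_sig, subj_sig


rng = random.Random(7)  # fixed seed so the run is repeatable
n_sims = 2000
results = [simulate_once(rng) for _ in range(n_sims)]
pooled_rate = sum(r[0] for r in results) / n_sims
by_subject_rate = sum(r[1] for r in results) / n_sims
print(f"pooled test false-positive rate:     {pooled_rate:.3f}")
print(f"by-subject test false-positive rate: {by_subject_rate:.3f}")
```

Under these made-up settings, the pooled test should reject far more often than the nominal 5%, while the by-subject test should stay close to it. The gap depends entirely on how large the slope variability is relative to the noise, so the exact numbers mean little beyond this toy setup.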