Due to popular demand
– you can find the Computational Psycholinguistics class Roger Levy and I are currently teaching at the LSA 2011 institute at Boulder mirrored here.
Posts Tagged ‘psycholinguistics’
CogSci 2011 papers uploaded
In case there’s interest, have a look at the papers to be presented at this year’s Cognitive Science meeting in Boston (July 20th-23rd). HLP lab will be represented by two talks and four posters. The two talks will present work employing artificial language learning to address questions about typological generalizations:
- Masha Fedzechkina (BCS, University of Rochester) will present evidence that language learners are biased to reduce the uncertainty in the mapping from form to meaning. Her work compares the acquisition of miniature languages with and without case-marking, asking to what extent learners regularize or even fix variable word orders in these two types of languages (Fedzechkina, Jaeger, & Newport, 2011). Together with other recent work (e.g. by Newport, by Culbertson), this work provides evidence that language learners deviate from the input provided to them in a predictable manner. In this case, we designed the experiment to directly test the functionalist claim that language learners are biased towards acquiring languages that support communication (cf. Bates and MacWhinney’s early work).
- Hal Tily (BCS, MIT) will present work employing a novel web-based artificial language learning paradigm, in which hundreds of participants can be run within a matter of a few days. Using this paradigm, we first replicated and extended a well-known study on determiner learning (Hudson Kam and Newport, 2004) and then investigated to what extent cross-linguistically observed quantitative patterns in argument and determiner order are replicated by language learners. We discuss how this paradigm will facilitate further tests of typological generalizations (Tily, Frank, & Jaeger, 2011).
At this year’s CUNY Sentence Processing Conference, Emily Bender and Jennifer Arnold presented a Festschrift celebrating Thomas Wasow. Here’s what the publisher’s site (CSLI) says (picture taken from the publisher’s website, which is hopefully ok; see the book for copyrights):
This book is a collection of papers on language processing, usage, and grammar, written in honor of Tom Wasow to commemorate his career on the occasion of his 65th birthday. Tom is a professor of linguistics and philosophy. But more accurately, he is a renaissance academic, having done work that connects with many different disciplines, including formal linguistics, sociolinguistics, historical linguistics, psycholinguistics, computational linguistics, and philosophy. Appropriately, this book reflects the diversity of Tom’s research and interests, including topics from multiple branches of linguistics and human information processing. These papers are written with minimal background assumed, so they can be used as teaching materials for beginning scholars. As such, this volume is a tribute to what is perhaps Tom’s most lasting contribution to the field—the mentorship and inspiration he provided to his students and collaborators, many of whom have contributed to this volume.
The book contains introductory and overview articles on a variety of topics in cognitive science from Emily M. Bender, Dan Flickinger, Stephan Oepen, Ash Asudeh, Peter Sells, Amy Perfors, James Paul Gee, John R. Rickford, T. Florian Jaeger, Jennifer E. Arnold, Harry J. Tily, Neal Snider, John A. Hawkins, and Susanne Riehemann.
Ohio
Judith Degen, Masha Fedzechkina and I just came back from Ohio State’s linguistics department, where we had a great time presenting and discussing our work. Masha gave her first talk ever, presenting her work within the artificial language learning paradigm on functional biases on acquisition (an extension of her LSA poster, soon to be posted here). Judith gave a wonderful guest lecture for Shari Speer’s introduction to psycholinguistics. She talked about scalar implicature and her work with Mike Tanenhaus on this topic. Since even I got it (and I am well-known to be pragmatically challenged), I can highly recommend her slides on scalar implicature processing (beware, it’s a monster file: click and go grab a coffee).
Thanks to everyone there for great and insightful conversations and for organizing this. I was particularly excited to hear about potential applications of Uniform Information Density to natural language generation (please keep me posted!). Oh, and extra big thanks to Judith Tonhauser and her fat white cat.
Thanks to Zach Warren for pointing me to this cool tool by Google: the ngram viewer.
And just as a demonstration of how cool this is: watch how the OCP rocks that-mention in complement clauses to the verb believe. Of course, we would have to check for other complement clause embedding verbs and for the a priori probability of this vs. that determiner and pronoun uses. And no, that will be my paper. So don’t you dare write it. If you are into OCP effects on optional function word use (e.g. because they might be taken to argue for phonological effects on grammatical encoding), see the references below.
And here are some more verbs for those complement clause lovers out there:
Continue reading ‘Watch the OCP rock Google’s ngram viewer’
You are ever so cordially invited to attend the following awesome-to-be workshop at the LSA 2011:
Empirically Examining Parsimony and Redundancy
in Usage-Based Models
Organized Session at 2011 Linguistic Society of America Annual Meeting
Schedule
Please see http://www.lsadc.org/info/preliminary-program-2011.cfm#saturday-afternoon (#50)
Main Session:
When: Saturday, 1/08, 2-3:30pm (1.5 jam-packed hours of mindless fun)
Where: Grand Ballroom 4, Wyndham Grand Pittsburgh Downtown Hotel, Pittsburgh, PA
Poster Session
When: Sunday, 1/09, 9-12am (the journey continues)
Where: Grand Ballroom Foyer, Wyndham Grand Pittsburgh Downtown Hotel, Pittsburgh, PA
Participants
R. Harald Baayen (University of Alberta)
Joan Bresnan (Stanford University)
Walter Daelemans (University of Antwerp)
Bruce Derwing (University of Alberta)
Daniel Gildea (University of Rochester)
Matthew Goldrick (Northwestern University)
Peter Hendrix (University of Alberta)
Gerard Kempen (Max Planck Institute)
Victor Kuperman (McMaster University)
Yongeun Lee (Chung Ang University)
Gary Libben (University of Calgary)
Marco Marelli (University of Alberta)
Petar Milin (University of Alberta)
Timothy John O’Donnell (Harvard University)
Gabriel Recchia (Indiana University)
Antoine Tremblay (IWK Health Center)
Benjamin V. Tucker (University of Alberta)
Antal van den Bosch (Tilburg University/University of Antwerp)
Chris Westbury (University of Alberta)
Organizers
Neal Snider (Nuance Communications, Inc.)
Daniel Wiechmann (Friedrich-Schiller-Universität Jena)
Elma Kerz (RWTH-Universität Aachen)
T. Florian Jaeger (University of Rochester)
Description
Recent years have seen a growing interest in usage-based (UB) theories of language, which assume that language use plays a causal role in the development of linguistic systems over historical time. A central assumption of the UB-framework is the idea that shapes of grammars are closely connected to principles of human cognitive processing (Bybee 2006, Givon 1991, Hawkins 2004). UB-accounts strongly gravitate towards sign- or construction-based theories of language, viz. theories that are committed to the belief that linguistic knowledge is best conceived of as an assembly of symbolic structures (e.g. Goldberg 2006, Langacker 2008, Sag et al. 2003). These constructionist accounts share (1) the postulation of a single representational format for all linguistic knowledge and (2) a strong commitment to the psychological plausibility of mechanisms for the learning, storage, and retrieval of linguistic units. They do, however, exhibit a considerable degree of variation with respect to their architectural and mechanistic details (cf. Croft & Cruse 2004).
Continue reading ‘Special session at the LSA meeting in da’Burgh’
Comment on our article in LLC
The Language and Linguistics Compass (LLC) has launched a new joint project with LinguistList, so that they now showcase selected LLC articles on LinguistList. You get free access to these articles while they are showcased and can comment on them in a moderated forum.
If you have time, have a look at and possibly add to the discussion forum on Jaeger & Norcliffe. 2009. The Cross-linguistic Study of Sentence Production. Language and Linguistics Compass, 3(4), 866 – 88. It was just posted, so it’s probably still scarily empty.
LSA 2011 at Boulder: Yeah!
Woohooo. Roger Levy and I will be teaching a class on Computational Psycholinguistics at the 2011 LSA’s Linguistics Institute to be held July 5th-August 5th next year in Boulder, CO. The class description should be available through their website soon, but here are some snippets from our proposal:
Continue reading ‘LSA 2011 at Boulder: Yeah!’
And here is one more poster from CUNY. This one is work by Robin Melnick at Stanford together with Tom Wasow. Robin ran forced-choice and 100-point-preference norming experiments on that-mentioning in relative and complement clauses to investigate the extent to which the factors that affect processing correlate with the factors affecting acceptability judgments. Going beyond previous work, he directly correlates the effect sizes of individual predictors in the processing and acceptability models. All experiments were run both in the lab and over the web using Mechanical Turk.
And here is one more poster on Yucatec, following Lindsay’s example. This is work by Lis Norcliffe, who just graduated from Stanford and joined the MPI in Nijmegen. Her thesis work is on the (possibly resumptive) morphology discussed in this poster and the experiments were part of that thesis, too. You’ll find effects of definiteness and dependency length, which we investigated since they (in our view) provide evidence that this morphological reduction alternation is affected by both a preference for uniform information density and a preference for dependency minimization. Feedback welcome.
Good news! We’ve analyzed the previously mentioned experiment on animacy and word order in Yucatec. We coded animacy of the Agent and Patient referents (human, animal, inanimate), transitivity (transitive, intransitive) and voice (active, passive, other) of the verb. We also coded the definiteness of the Agent and Patient referents (definite, indefinite).
Overall, Agent-Verb-Patient word order was strongly preferred (see Table 1). Moreover, human subjects were more likely to appear earlier in the sentence (ps<0.0001, interaction n.s., N=597), which is predicted by direct accessibility accounts. Human agents and patients were more likely to be described as definite (ps<0.0002), and definite NPs showed a tendency to be mentioned earlier (agent: p<0.0001; patient: n.s., interaction p<0.0001). Still, the effect of animacy held independently (ps<0.002; interaction n.s.). The agent animacy effect was somewhat mediated by an effect on transitivity (whether participants described an event as, e.g., an apple hitting a man or an apple falling on a man): inanimate agents were less often described transitively (p<0.0001; no patient effects). The agent animacy effect remained significant even for transitive sentences (p<0.004; no interaction, N=502). In terms of the effects of voice, human agents correlated with the use of active voice (p<0.0001), and human patients correlated with the use of passive voice, though not as strongly (p<0.03, N=604).
| Word order | Total | Active | Passive | Other |
| Agent-Verb-Patient | 440 | 427 | 7 | 6 |
| Patient-Verb-Agent | 63 | 2 | 61 | 0 |
| Other | 28 | 20 | 7 | 1 |
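To make the pattern in Table 1 easier to read off, here is a minimal Python sketch that turns the counts into proportions. The counts are copied from the table; the variable names are just for illustration, and this only redoes the arithmetic on the table rows (the tests reported above use different Ns and are not reproduced here).

```python
# Counts copied from Table 1 (rows: word order; columns: voice).
table = {
    "Agent-Verb-Patient": {"Active": 427, "Passive": 7, "Other": 6},
    "Patient-Verb-Agent": {"Active": 2, "Passive": 61, "Other": 0},
    "Other": {"Active": 20, "Passive": 7, "Other": 1},
}

# Row totals should match the "Total" column of the table.
row_totals = {order: sum(by_voice.values()) for order, by_voice in table.items()}
grand_total = sum(row_totals.values())

for order, total in row_totals.items():
    share = total / grand_total                  # share of all tabulated sentences
    passive = table[order]["Passive"] / total    # share of passives within this order
    print(f"{order}: {share:.1%} of sentences, {passive:.1%} passive")
```

Read this way, the table shows both how dominant Agent-Verb-Patient order is and how closely Patient-Verb-Agent order goes together with passive morphology.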
What does this mean? In Yucatec, the passive voice is encoded by verbal morphology, and it neither presupposes nor precludes a word order change. When a patient was human, sentences were more likely to be in the passive voice. Moreover, human patients were more likely to be mentioned earlier. So, with human patients we’ve seen both the use of passive voice morphology and earlier mention.
Benjamin Van Durme and Austin Frank (who still doesn’t have a webpage) have been doing some neat comparisons of web-based estimates of language experience vs. traditional data sources. This work is part of a project funded by the University of Rochester’s Provost Award for Multidisciplinary Research. Since I really like the results, I am gonna use some lazy time to blog about my favorites.
We found that web-based probability estimates can be used to investigate probability-sensitive human behavior. We used databases of word naming, picture naming, and lexical decision tasks, as well as a database of word durations derived from the Switchboard corpus of spontaneous speech. Comparing Google Web 1T 5-gram counts vs. CELEX (spoken and written), BNC (spoken and written), and Switchboard counts, we estimated word frequencies and compared models using these different frequency estimates against the different types of probability sensitive language behaviors mentioned above (word naming RTs, etc.).
I find this encouraging, as web data, unlike traditionally used data sources, is cheap and readily available for many languages, thereby facilitating cross-linguistic work on probability-sensitive human language processing. Additionally, we found at least preliminary evidence that simple principal component analysis over the various frequency estimates leads to better correlation against human language behavior.

CELEX (written), Google Ngram, and their 1st principal component fitted against probability-sensitive human language behavior
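For readers who want to see what taking the first principal component over several frequency estimates involves, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated, the three "corpora" and the latencies are hypothetical, and this is not the analysis code behind the results above.

```python
import numpy as np

def first_principal_component(X):
    """Return each row's score on the first principal component of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each column
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def abs_corr(a, b):
    """Absolute Pearson correlation (the sign of a PC is arbitrary)."""
    return abs(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n_words = 500
latent = rng.normal(size=n_words)                 # hypothetical "true" log frequency

# Three hypothetical corpus estimates: the latent value plus independent noise.
estimates = np.column_stack(
    [latent + rng.normal(scale=s, size=n_words) for s in (0.5, 0.7, 0.9)]
)

# Hypothetical naming latencies: faster for more frequent words, plus noise.
rt = 600 - 40 * latent + rng.normal(scale=30, size=n_words)

pc1 = first_principal_component(estimates)

for name, column in zip(("corpus_a", "corpus_b", "corpus_c"), estimates.T):
    print(name, round(abs_corr(column, rt), 3))
print("pc1", round(abs_corr(pc1, rt), 3))
```

The idea is that each corpus gives a noisy view of the same underlying frequency, and the first component pools what the estimates share. Whether that helps on real data is an empirical question.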
Continue reading ‘How good is the web as an approximation of language experience?’
The LSA Summer Institute is almost over and it has been a lot of fun so far. I didn’t get to see nearly as many talks and classes as I had hoped to, but instead there were tons of interesting conversations, new ideas, and just nice moments hanging out in the sun.
Brief update: It couldn’t have been different: I missed my flight. That happens every time I try to leave the Bay area. I am so used to it, I am not even trying to be on time anymore. Ah well, it gives me a chance to enjoy a cappuccino in my favorite SF Cafe (Ritual Roasters) and even to attend Dan’s party (yippie!). Oh, and to upload some random pictures from the classroom. Yeah, pretty dark, I know. If you have better pictures, can you send them to me so I can upload them? Also, here are some pics from our office hours at Caffe Strada (thanks to Judith and Alex for a great job!):
- Random class room shot (2)
- Random class room shot
- Late night “office hours” at Jupiter’s
- Michi smiling with TGrep2 at his command (almost!)
- Judith and Alex working hard to spread the word of Switchboard
- Judith (at hour 2 of 6)
- hmm, probably at Jupiter’s again
LSA125-ers, thanks for an enjoyable class, for all the questions, and I hope you keep enjoying your projects (or, if nothing else, now know for certain that you really really never want to work with corpora). Send us an update about your papers as they progress.
To everyone else out there: If you’re interested in the use of syntactic corpora to investigate language production, you may find our LSA125 class webpage useful (see especially the links and information on the corpus pages, but also the slides). If you use material from this page, please let us know. Thanks to Judith, we now have a nicely documented version of the TGrep2 Database Tools, which we have dubbed TDTlite. Alex and Judith have also prepared example projects. TDTlite allows you to combine the output of TGrep2 searches on syntactic corpora into a nice tab-delimited database that can be imported into R, Excel, or the stats program of your choice. While it doesn’t give you the full flexibility of scripting things yourself, it makes it considerably easier to start your own corpus-based project. We’re in the process of polishing things up for distribution (thanks to all the brave members of our class who helped us to understand which parts still need further improvement!). So, if something like that might be of interest to you, let us know whether you would like further information. We hope to have a beta release by the end of August.
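If you have never worked with a tab-delimited database, here is a minimal Python sketch of reading one and tabulating a column. The column names and rows are made up for illustration and are not the actual TDTlite output format; with a real file you would pass an opened file handle instead of the inline sample.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a tab-delimited database file: one row per
# corpus match, one column per coded variable (all names are invented).
sample = (
    "Item_ID\tComplementizer\tVerb\n"
    "1\tthat\tbelieve\n"
    "2\tnone\tthink\n"
    "3\tthat\tthink\n"
)

def load_rows(handle):
    """Read a tab-delimited table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = load_rows(io.StringIO(sample))

# Tabulate how often each value of one coded variable occurs.
counts = Counter(row["Complementizer"] for row in rows)
print(counts)
```

The same file would load into R or Excel as an ordinary tab-separated table.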
As some of you know, we’ve been planning to study certain aspects of language production in Mayan for some time now. Well, planning has been followed by flying, and now we (Elisabeth Norcliffe, Stanford University, and I) are here and ready to run our first studies!
Continue reading ‘Getting started in Mexico: Contacts & Pilots on Mayan’
CogSci08 – here we come
I still don’t know how I managed to not ever have been to CogSci before, but this year it will happen. Thanks to Austin Frank (BCS, UofR), Carlos Gomez Gallo (CS, UofR), and Neal Snider (Ling, Stanford University), a bunch of us will be presenting at CogSci08 in Washington, D.C. in July. I will upload the papers soon.









