OK, we may have named it in haste, but maybe it’s still useful: we’ve created an email list for announcing stats and machine learning workshops of interest to language researchers, psycholinguists, etc. in the Upstate area (Rochester, Cornell, Buffalo, etc.). Feel free to join; it’s very low traffic: http://groups.google.com/group/statistics-northeast
We recently ran into a problem getting Lingalyzer, the analysis program from Doug Rohde’s Linger, to work on MacOS X. The problem occurs at least on 10.5 and up, but could well occur on earlier versions as well. Lingalyzer depends on a statistics suite called |stat, which is where the actual problem lies.
When you run the lingalyzer script, it dies with the error "warning: this program uses gets(), which is unsafe." We were initially confused, because Lingalyzer is written in Tcl, which has its own gets command, and lingalyzer uses it quite a bit. But the problem was actually the |stat programs that it was calling, which use the C gets() function. The C gets() function is well known for being a buffer overflow risk, because it has no way of knowing how large the destination buffer is. GCC warns you sternly not to use it, but MacOS X goes so far as to trap calls to it and refuse to execute the offending program.
It turns out that there is a relatively easy solution, namely replacing all calls to gets() with calls to fgets(). Wherever in the source code you see:
while (gets (line))
replace it with:
while (fgets (line, sizeof(line), stdin))
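One caveat with the one-line replacement: unlike gets(), fgets() keeps the trailing newline in the buffer, and sizeof(line) only gives the buffer size when line is a real array rather than a pointer. Whether the |stat programs care about the extra newline is something I have not checked for every call site. As a minimal sketch, here is a small wrapper that behaves more like gets() while still being bounded. The name read_line is made up for illustration and is not part of |stat:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * read_line: hypothetical helper (not part of |stat).
 * Reads one line from fp into buf, storing at most size - 1 characters,
 * and strips the trailing newline that fgets() keeps but gets() discards.
 * Returns buf on success, or NULL at end of input or on a read error.
 */
char *read_line(char *buf, size_t size, FILE *fp)
{
    assert(buf != NULL && size > 0);

    if (fgets(buf, (int)size, fp) == NULL)
        return NULL;

    /* strcspn returns the index of the first '\n', or the string
       length if there is none, so this is safe either way. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With this in place, a loop such as while (read_line (line, sizeof(line), stdin)) reads the same way as the old gets() loop. The one difference is that a line longer than the buffer is split across several reads instead of overflowing.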
I have a patch that can be applied to the |stat source code that replaces all of them, as well as adding the CFLAGS to the makefile to build a Universal Binary. However, the license for |stat appears to prohibit redistributing modified versions of the code, and a patch might run afoul of that. If you ask nicely I can email it to you though. The license also prohibits even local modifications for any purpose other than making it run on your system, so if MacOS didn’t terminate programs with gets() with extreme prejudice, then even the changes I made would be in violation. Weird.
What to do when you need an intuitive measure of model quality for your logit (logistic) model? The problem is that logit models don’t have a nice measure like the R-square of linear models, which has a super intuitive interpretation. However, several pseudo R-square measures have been suggested, and some are more commonly used than others (e.g. Nagelkerke R2). In R, some model-fitting procedures for ordinary logistic regression provide the Nagelkerke R-square as part of the standard output (e.g. lrm in Harrell’s Design package). However, no such measure is provided by the most widely used mixed logit model-fitting procedure (lmer in Bates’ lme4 library). Below I provide some code that computes Nagelkerke and Cox-Snell pseudo R-squares for mixed logit models. Continue reading ‘Nagelkerke and CoxSnell Pseudo R2 for Mixed Logit Models’
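For reference, here are the two measures as they are usually defined for ordinary logistic regression. The symbols are my own shorthand: \ell_0 is the log-likelihood of the null (intercept-only) model, \ell_M is the log-likelihood of the fitted model, and n is the number of observations. For mixed logit models, which null model to compare against (with or without the random effects) is a choice you have to make, and the formulas do not settle it.

```latex
% Cox-Snell pseudo R-square
R^2_{\mathrm{CS}} = 1 - \exp\!\left(-\frac{2}{n}\,(\ell_M - \ell_0)\right)

% Nagelkerke pseudo R-square: Cox-Snell rescaled by its maximum attainable value
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - \exp\!\left(\frac{2}{n}\,\ell_0\right)}
```

The rescaling is there because the Cox-Snell measure cannot reach 1 for binary outcomes, so Nagelkerke divides by its upper bound to get a measure that ranges from 0 to 1.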
www est mort, vive www
Our web server (www.hlp.rochester.edu) bit the dust yesterday. It was a seven-year-old computer that originally came from a compute cluster and that we then had running non-stop for the last two years as a web server. A new computer is being ordered, but we’re likely to be down for a week or two, just so you know, if you care.
UPDATE (2009-09-16): WWW is back online.
It’s maaaaaaaaaa pleasure to announce that a couple of new folks will be joining/visiting HLP lab this Fall. It’s less of a pleasure to say that some of the folks will leave the lab to move on, but that’s how it goes. So, here comes an introduction and a farewell.
Two new students will join via the PhD program:
- Ting Qian has decided to join us for his graduate studies. He completed his B.Sc. in BCS (Artificial Intelligence track) at the University of Rochester after transferring from SUNY Oswego. He has worked on genetic algorithms, attribute-driven extraction of lexical classes, as well as on the distribution of information throughout discourses in written and spoken Mandarin Chinese (publications on the latter topic can be found on the HLP lab website).
- Masha Fedzechkina will join us coming from the University of Cologne (originally from Belarus), where she finished a Magister in Data Processing including classes in linguistics and CS. She’s planning to work on processing-driven effects on acquisition using both computational and behavioral methods.
Additionally, two graduate researchers from other universities will join the lab to lead our NSF-funded research project on Field-based Psycholinguistics in the Yucatan (joint work with Juergen Bohnemeyer, UB):
- Alice Lemieux will join us from the University of Chicago where she’s working on her PhD in Linguistics. She’s done other fieldwork (on Washo) before and is interested in language contact.
- Lindsay K. Butler will join us from the University of Arizona where she’s working on her PhD in Linguistics. Lindsay already has some background in Yucatec from an intensive summer workshop at UNC (including a couple of weeks in the Yucatan).
Alice and Lindsay will take classes on Yucatec from Juergen Bohnemeyer to obtain basic speaking knowledge of Yucatec and to learn about the linguistic structure of Yucatec. They will run sentence production studies on Yucatec Maya (in the Yucatan) including experiments on accessibility and weight effects on word order and morphological choices. I want to take this opportunity to publicly thank Lis Norcliffe and Tania Nikitina for absolutely invaluable help with the preparation of the grant. Lis hugely influenced the design of our experiments and co-wrote large sections of the grant. Of course, she’s not to be blamed for anything we may screw up. Thanks also go to Carlos Gomez Gallo and Katrina Housel who helped gather the pilot data for the grant during our previous visit to the Yucatan.
- In September, we will also have an HLP lab visitor. Elma Kerz is joining us from the RWTH Aachen in Germany where she teaches computer-based linguistics, psycholinguistics, and various other things. Her work includes papers on construction grammar, cognitive grammar, and grammaticalization.
Finally, HLP will have its first alumni this year:
- Benjamin Van Durme (“The Durmster”; PhD in CS with a Minor in Linguistics) got offers from Stanford (post-doc with Dan Jurafsky and Chris Manning) and the Human Language Technology Center of Excellence at Hopkins. After much consideration he chose Hopkins, where he will start soon as research faculty. Most of Ben’s work over recent years was with Len Schubert and, more recently, also with Dan Gildea, but he’s also been involved in several HLP lab projects including work on the link between words’ redundancy in context and their pronunciation as well as work assessing the use of Google n-grams for research on language processing. It’s unclear what will happen to our espresso machine, now that he’s gone. Ben, just you wait and see. You’ll come back for that (lukewarm) espresso!
- Carlos Gomez Gallo is about to wrap things up, too (things=PhD in CS with a Minor in Linguistics). He received post-doc offers from Minnesota and Harvard, and an offer to join the American University of Beirut as faculty. He’s going to join Maria Polinsky’s lab at Harvard to run cross-linguistic studies on linguistic representations and processing. It looks like collaborations (among other things on Spanish and Maya data we collected in the Yucatan last Winter) will continue since Masha is interested in similar questions. Carlos is also about to finish a manuscript on work investigating incremental language production beyond the clausal level. Much of his work at Rochester was concerned with the creation and use of the Fruitcart corpus (which he will not fail to mention if you run into him). This work was done in collaboration with James Allen and others in his lab. Carlos, good luck at Harvard!
So, welcome and ciao ciao (but visit).
Berlin – it still rocks
Ok, every once in a while I should be allowed a kinda personal post here (I just decided that). So, I choose … dickes B. I just had a couple of wonderful days in Berlin. It still is full of surprises and August is still the best time to visit the best city ever. Why? Continue reading ‘Berlin – it still rocks’
The LSA Summer Institute is almost over and it has been a lot of fun so far. I didn’t get to see nearly as many talks and classes as I had hoped to, but instead there were tons of interesting conversations, new ideas, and just nice moments hanging out in the sun.
Brief update: It couldn’t have been different: I missed my flight. That happens every time I try to leave the Bay Area. I am so used to it, I am not even trying to be on time anymore. Ah well, it gives me a chance to enjoy a cappuccino in my favorite SF cafe (Ritual Roasters) and even to attend Dan’s party (yippie!). Oh, and to upload some random pictures from the classroom. Yeah, pretty dark, I know. If you have better pictures, can you send them to me so I can upload them? Also, here are some pics from our office hours at Caffe Strada (thanks to Judith and Alex for a great job!):
- Random classroom shot (2)
- Random classroom shot
- Late night “office hours” at Jupiter’s
- Michi smiling with TGrep2 at his command (almost!)
- Judith and Alex working hard to spread the word of Switchboard
- Judith (at hour 2 of 6)
- hmm, probably at Jupiter’s again
LSA125-ers: thanks for an enjoyable class and for all the questions. I hope you keep enjoying your projects (or, if nothing else, now know for certain that you really really never want to work with corpora). Send us an update about your papers as they progress.
To everyone else out there: If you’re interested in the use of syntactic corpora to investigate language production, you may find our LSA125 class webpage useful (see especially the links and information on the corpus pages, but also the slides). If you use material from this page, please let us know. Thanks to Judith, we now have a nicely documented version of the TGrep2 Database Tools, which we have dubbed TDTlite. Alex and Judith have also prepared example projects. TDTlite allows you to combine the output of TGrep2 searches on syntactic corpora into a nice tab-delimited database that can be imported into R, Excel, or the stats program of your choice. While it doesn’t give you the full flexibility of scripting things yourself, it makes it considerably easier to start your own corpus-based project. We’re in the process of polishing things up for distribution (thanks to all the brave members of our class who helped us to understand which parts still need further improvement!). So, if something like that might be of interest to you, let us know whether you would like further information. We hope to have a beta release by the end of August.
Come to Cyberling
For all those of you at the LSA 09 Summer Institute: If you’re interested in replicability, scientific standards, resource creation and sharing, etc., come to next weekend’s Cyberling workshop. We need your input.
Two paper updates
Just a quick note. It feels good to be able to announce that the overview paper on cross-linguistic production (written with Elisabeth Norcliffe) is now available online. Thanks for your feedback and all the references that were sent our way.
Also: I’ve finally written up the first study from my thesis. Well, a considerably updated version of it. Anyway, if you’re interested in Uniform Information Density and/or the syntactic reduction of complement clauses in spontaneous speech … have a look at the paper… feedback is welcome. Oh, and did I mention that it is a corpus-based study?
Over and out (from lovely Berkeley, enjoying the LSA Summer Institute)
….. during the LSA Summer Institute at Berkeley. Come and join us. More information and registration can be found here.






