Author: aufrank

Diagnosing collinearity in mixed models from lme4


I’ve just uploaded files containing some useful functions to a public git repository. You can see the files directly without worrying about git at all by visiting regression-utils.R (direct download) and mer-utils.R (direct download).

Multinomial random effects models in R


This post is partly a response to this message. The author of that question is working on ordered categorical data. For that specific case, there are several packages in R that might work, none of which I’ve tried. The most promising is the function DPolmm() from DPpackage. It’s worth noting, though, that this function commits you to a Dirichlet Process prior for the random effects (instead of the more standard Gaussian). A different package, mprobit, allows one clustering factor, which could be suitable depending on the data set. MNP, mlogit, multinomRob, vbmp, nnet, and msm all offer some capability for modeling ordered categorical data, and it’s possible that one of them allows for random effects (though I haven’t found one that does yet). MCMCpack may also be useful, as it provides MCMC implementations for a large class of regression models. lrm() from the Design package handles ordered categorical data, and clustered bootstrap sampling can be used to account for a single clustering factor.

I’ve recently had some success using MCMCglmm for the analysis of unordered multinomial data, and want to post a quick annotated example here. It should be noted that the tutorial on the CRAN page is extremely useful, and I encourage anyone using the package to work through it.

I’m going to cheat a bit in my choice of data sets, in that I won’t be using data from a real experiment with a multinomial (or polychotomous) outcome. Instead, I want to use a publicly available data set with some relevance to language research. I also need a categorical dependent variable with more than two levels for this demo to be interesting. Looking through the data sets provided in the languageR package, I noticed that the dative data set has a column SemanticClass which has five levels. We’ll use this as our dependent variable for this example. We’ll investigate whether the semantic class of a ditransitive event is influenced by the modality in which it is produced (spoken or written).
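Before fitting anything, it can help to look at how the outcome is distributed across the predictor. The following is a minimal sketch, assuming the languageR package is installed; the object name class_by_modality is made up for this illustration.

```r
# Sketch only: assumes the languageR package is installed.
data("dative", package = "languageR")

# The dependent variable: a factor with five levels
levels(dative$SemanticClass)

# Counts of each semantic class by modality (spoken vs. written)
class_by_modality <- table(dative$SemanticClass, dative$Modality)
class_by_modality

# Proportion of each semantic class within each modality (columns sum to 1)
round(prop.table(class_by_modality, margin = 2), 2)

# Number of distinct verbs, the grouping factor for the random effects
nlevels(dative$Verb)
```

Cells with very few observations in this table are worth noting, since sparse categories can make a multinomial model harder to estimate.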

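library(MCMCglmm)
data("dative", package = "languageR")

# k outcome categories; the categorical family models k - 1 latent traits
k <- length(levels(dative$SemanticClass))
I <- diag(k - 1)              # identity matrix
J <- matrix(1, k - 1, k - 1)  # matrix of ones

# Note: this call uses `n` for the prior degrees of freedom; newer MCMCglmm
# releases appear to name this element `nu`, so adjust if the prior is rejected.
m <- MCMCglmm(SemanticClass ~ -1 + trait + Modality,
              random = ~ us(trait):Verb + us(Modality):Verb,
              rcov = ~ us(trait):units,
              prior = list(
                # residual covariance, held fixed (fix = 1)
                R = list(fix = 1, V = 0.5 * (I + J), n = 4),
                G = list(
                  G1 = list(V = diag(4), n = 4),   # us(trait):Verb
                  G2 = list(V = diag(2), n = 2))), # us(Modality):Verb
              burnin = 15000,
              nitt = 40000,
              family = "categorical",
              data = dative)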
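To make the residual prior less opaque, here is a small base R sketch of the matrix passed as V in the R prior. It hard-codes k to 5 (the number of levels of SemanticClass) so that it runs without languageR or MCMCglmm; the name V_resid is made up for this illustration.

```r
# Sketch only: k is hard-coded so the block runs without languageR.
k <- 5
I <- diag(k - 1)              # (k - 1) x (k - 1) identity matrix
J <- matrix(1, k - 1, k - 1)  # (k - 1) x (k - 1) matrix of ones

# The matrix used as V in the residual (R) prior
V_resid <- 0.5 * (I + J)
V_resid

# It has 1 on the diagonal and 0.5 everywhere else
stopifnot(all(diag(V_resid) == 1))
stopifnot(all(V_resid[upper.tri(V_resid)] == 0.5))

# A covariance matrix must be positive definite; chol() fails if it is not
chol_ok <- !inherits(try(chol(V_resid), silent = TRUE), "try-error")
chol_ok
```

Because the prior sets fix = 1, this matrix is not estimated: the residual covariance among the latent traits is held at these values throughout sampling.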

Read on for an explanation of this model specification, along with some functions for evaluating the model fit.


Using WinBUGS on a PPC OSX laptop connected to a Linux server


While R is an excellent tool for a wide variety of statistical analyses, it’s not the only game in town. Practitioners of Bayesian statistics have a few other tools that complement R nicely. One area where R originally lagged was in offering a general-purpose MCMC sampler. That situation has largely changed, but there are still cases where you might want to look outside of the R toolbox. In particular, certain Bayesian stats books are written with the assumption that exercises and examples can be executed in WinBUGS. While JAGS (Just Another Gibbs Sampler) runs natively on OSX and Linux, it can’t read WinBUGS .odc files.

Read on to see how I got WinBUGS running on my PowerPC OSX laptop connected to a Linux server.


New text resources available


Two new resources have recently become available that may be of interest to the NLP and Psycholinguistics communities. First, the New York Times has released “The New York Times Annotated Corpus”. It’s available through the LDC. It’s been marked up with tags for people, places, topics, and organizations. 650,000 of the 1.8 million articles (36%) contain human-written summaries. The LDC listing can be found here. A nice write-up of the release is at the NYT Open Blog.