Posts Tagged ‘R-code’

29 Aug 09

Nagelkerke and CoxSnell Pseudo R2 for Mixed Logit Models

What do you do when you need an intuitive measure of model quality for your logit (logistic) model? The problem is that logit models don’t have a measure like the R-square of linear models, which has a very intuitive interpretation. However, several pseudo R-square measures have been suggested, and some are more commonly used than others (e.g. Nagelkerke R2). In R, some model-fitting procedures for ordinary logistic regression provide the Nagelkerke R-square as part of the standard output (e.g. lrm in Harrell’s Design package). However, no such measure is provided for the most widely used mixed logit model-fitting procedure (lmer in Bates’ lme4 library). Below I provide some code that computes Nagelkerke and CoxSnell pseudo R-squares for mixed logit models. Continue reading ‘Nagelkerke and CoxSnell Pseudo R2 for Mixed Logit Models’
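For readers who want the gist before clicking through, here is a minimal sketch of how the two measures are computed from log-likelihoods. It is not the code from the full post: the helper name `pseudo_r2` is hypothetical, and the demo uses simulated data with an ordinary `glm` fit so that it runs without extra packages. For a mixed logit model one would pass the log-likelihood of the fitted model and of a baseline model instead; which baseline to compare against is a choice you have to make yourself.

```r
# Hypothetical helper: Cox-Snell and Nagelkerke pseudo R-squares
# from the log-likelihoods of a fitted model and a baseline (null) model.
pseudo_r2 <- function(ll_full, ll_null, n) {
  # Cox-Snell: 1 - (L0 / L1)^(2/n), written on the log scale
  cox_snell <- 1 - exp((2 / n) * (ll_null - ll_full))
  # Largest value Cox-Snell can reach for this null model: 1 - L0^(2/n)
  max_cox_snell <- 1 - exp((2 / n) * ll_null)
  # Nagelkerke rescales Cox-Snell so that its maximum is 1
  c(CoxSnell = cox_snell, Nagelkerke = cox_snell / max_cox_snell)
}

# Demo on simulated data (assumption: ordinary logistic regression)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))

full <- glm(y ~ x, family = binomial)
null <- glm(y ~ 1, family = binomial)

r2 <- pseudo_r2(as.numeric(logLik(full)), as.numeric(logLik(null)), n)
print(r2)
```

Because the Nagelkerke value is just the Cox-Snell value divided by its attainable maximum, it is never smaller than the Cox-Snell value.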

06 Jun 09

little function to clean up factor variables after subset-ing

I am always annoyed that one has to remind R to reduce the number of levels of a factor after a subset (of the original data set) has been created. In addition to screwing up tables (b/c they will contain all the zero rows/columns, too), this can also affect comparisons of factor values (“Factors do not have the same number of levels”), and it makes RData files much bigger than they need to be. In our lab, we often work with large data files (up to 800,000 rows and 100-350 variables are relatively common), so that an RData file containing just that data.frame can easily be 100MB+. Say you select 5,000 rows out of 800,000: that may still leave you with an RData file size of 50MB+ because R remembers all original levels for all factors still in the data.frame. The little script I attach below takes either a factor or a data.frame as input and returns it with only those levels that still occur in the data. In the files I recently worked with, that reduced the file size by 90%+, which in turn leads to a considerable speed-up in analyzing the data (I mean, on a small laptop, you will definitely feel the difference …). Anyway, nothing big, but maybe some of you will find it useful: Continue reading ‘little function to clean up factor variables after subset-ing’
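To make the idea concrete, here is a minimal sketch of such a clean-up function. It is not the script from the full post; the name `clean_factors` is hypothetical. It relies on the fact that calling `factor()` on an existing factor rebuilds it with only the levels that are actually present.

```r
# Hypothetical helper: drop unused levels from a factor, or from every
# factor column of a data.frame. Other column types are left untouched.
clean_factors <- function(x) {
  if (is.factor(x)) {
    return(factor(x))  # re-creating the factor discards unused levels
  }
  if (is.data.frame(x)) {
    is_fac <- vapply(x, is.factor, logical(1))
    if (any(is_fac)) {
      x[is_fac] <- lapply(x[is_fac], factor)
    }
    return(x)
  }
  stop("expected a factor or a data.frame")
}

# Demo: the subset no longer contains "c", but the level is still remembered
d <- data.frame(g = factor(c("a", "b", "c", "a")), v = 1:4)
sub <- d[d$g != "c", ]
levels_before <- levels(sub$g)

sub <- clean_factors(sub)
levels_after <- levels(sub$g)

print(levels_before)
print(levels_after)
```

More recent versions of base R also ship `droplevels()`, which does the same job for factors and data.frames.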

07 May 09

Multinomial random effects models in R

This post is partly a response to this message. The author of that question is working on ordered categorical data. For that specific case, there are several packages in R that might work, none of which I’ve tried. The most promising is the function DPolmm() from DPpackage. It’s worth noting, though, that in that package you are committed to a Dirichlet Process prior for the random effects (instead of the more standard Gaussian). A different package, mprobit, allows one clustering factor. This could be suitable, depending on the data set. MNP, mlogit, multinomRob, vbmp, nnet, and msm all offer some capability of modeling ordered categorical data, and it’s possible that one of them allows for random effects (though I haven’t discovered any yet). MCMCpack may also be useful, as it provides MCMC implementations for a large class of regression models. lrm() from the Design package handles ordered categorical data, and clustered bootstrap sampling can be used for a single cluster effect.

I’ve recently had some success using MCMCglmm for the analysis of unordered multinomial data, and want to post a quick annotated example here. It should be noted that the tutorial on the CRAN page is extremely useful, and I encourage anyone using the package to work through it.

I’m going to cheat a bit in my choice of data sets, in that I won’t be using data from a real experiment with a multinomial (or polychotomous) outcome. Instead, I want to use a publicly available data set with some relevance to language research. I also need a categorical dependent variable with more than two levels for this demo to be interesting. Looking through the data sets provided in the languageR package, I noticed that the dative data set has a column SemanticClass which has five levels. We’ll use this as our dependent variable for this example. We’ll investigate whether the semantic class of a ditransitive event is influenced by the modality in which it is produced (spoken or written).

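library(MCMCglmm)
data("dative", package = "languageR")

# k outcome categories; the categorical family works with k-1 latent traits
k <- length(levels(dative$SemanticClass))
I <- diag(k-1)                              # (k-1) x (k-1) identity matrix
J <- matrix(rep(1, (k-1)^2), c(k-1, k-1))   # (k-1) x (k-1) matrix of ones

m <- MCMCglmm(SemanticClass ~ -1 + trait + Modality,
              # by-verb random effects with unstructured covariance matrices
              random = ~ us(trait):Verb + us(Modality):Verb,
              rcov = ~ us(trait):units,
              # note: later MCMCglmm releases name the degree-of-belief
              # element nu rather than n
              prior = list(
                # residual covariance matrix is held fixed (fix=1)
                R = list(fix=1, V=0.5 * (I + J), n = 4),
                G = list(
                  G1 = list(V = diag(4), n = 4),
                  G2 = list(V = diag(2), n = 2))),
              burnin = 15000,
              nitt = 40000,
              family = "categorical",
              data = dative)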
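To see what the residual prior in the call above actually fixes, it can help to print the matrix 0.5 * (I + J) on its own. The small sketch below hard-codes k = 5 (the number of levels of SemanticClass mentioned in the text) so that it runs without loading languageR; the name V is only used here for illustration.

```r
# Assumption: k = 5 outcome categories, as stated for SemanticClass
k <- 5
I <- diag(k - 1)               # identity matrix
J <- matrix(1, k - 1, k - 1)   # matrix of ones

# The matrix passed as V in the R (residual) part of the prior
V <- 0.5 * (I + J)
print(V)
```

The result is a 4 x 4 matrix with 1 on the diagonal and 0.5 everywhere else, i.e. the residual variances are set to 1 and the residual covariances to 0.5.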

Read on for an explanation of this model specification, along with some functions for evaluating the model fit.

Continue reading ‘Multinomial random effects models in R’

27 Apr 09

Centering several variables

One of the most common issues in regression analyses of even balanced experimental data is collinearity between main effects and interactions. To avoid this problem, a simple first step is to center all predictors. In my experience folks often fail to do that simply because it’s a bit more work and we’re all lazy. So here’s an attempt at a simple R function that takes single variables as well as entire dataframes. Continue reading ‘Centering several variables’
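As a rough illustration of the idea (not the function from the full post), here is a minimal sketch of a centering helper that accepts either a single variable or a data.frame. The name `center_vars` is hypothetical, and the sketch only handles numeric and logical variables; factor columns in a data.frame are simply left alone.

```r
# Hypothetical helper: subtract the mean from a numeric/logical vector,
# or from every numeric column of a data.frame.
center_vars <- function(x) {
  center_one <- function(v) v - mean(v, na.rm = TRUE)
  if (is.data.frame(x)) {
    is_num <- vapply(x, is.numeric, logical(1))
    if (any(is_num)) {
      x[is_num] <- lapply(x[is_num], center_one)
    }
    return(x)
  }
  if (is.numeric(x) || is.logical(x)) {
    return(center_one(x))
  }
  stop("expected a numeric/logical vector or a data.frame")
}

# Demo on simulated predictors with non-zero means
set.seed(42)
d <- data.frame(a = rnorm(100, mean = 5), b = rnorm(100, mean = 5))
dc <- center_vars(d)

# Correlation between a main effect and the a:b interaction term,
# before and after centering
cor_raw      <- cor(d$a,  d$a  * d$b)
cor_centered <- cor(dc$a, dc$a * dc$b)
print(c(raw = cor_raw, centered = cor_centered))
```

Comparing the two correlations shows how much of the collinearity between a main effect and its interaction comes from the predictors’ means alone.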

20 Apr 09

Jaeger (2008), J Memory Language, 59, 434-446

Since I get asked every now and then for the R code I promised in my 2008 JML paper on mixed logit models, I have posted it here. Note, however, that the data analyzed in that paper is not mine and you need to request it from Inbal Arnon, who conducted the study. If you try to work your way through my paper, you may also find the following wiki pages from our lab with readings and more code helpful:

http://wiki.bcs.rochester.edu:2525/HlpLab/StatsCourses/

As a quick intro, you may find useful the talks from a recent workshop on the conceptual background, common issues, and solutions for ordinary and multilevel regression models, which some colleagues (Dale Barr, Roger Levy, Harald Baayen, Victor Kuperman, Austin Frank) and I gave at the CUNY sentence processing conference 2009. The talk slides are all linked to the schedule on that page. You’ll find detailed walk-throughs, R code, and conceptual overviews.

I’d appreciate it if you left a comment here in case this was useful. It helps us see what we should be posting. Cheers.

23 Jan 08

Modeling self-paced reading data: Effects of word length, word position, spill-over, etc.

I’ve been using a two-step approach, where in the first step I use all data (including fillers, but not practice items) of an experiment to fit a model of log-transformed raw reading times with:

23 Jan 08

R-code for reading time data preparation

Some time ago I posted some R-code on how to create spill-over data from a linger reading time file (for spill-over analysis of self-paced reading time data). Here are the steps that need to be done prior to that, importing from a linger file, data preparation, outlier check, etc.

Continue reading ‘R-code for reading time data preparation’

04 Dec 07

Categorical Data Analysis

If you’re interested in getting an intro to ordinary logistic regression and mixed logit models (logit models with random subject and item modeling) and why not to blindly trust ANOVA over proportions, even when they are based on (arcsine) transformed proportions, check out my paper on categorical data analysis (accepted for publication in JML; version prior to proofs). Feedback is welcome, but at this point I can’t really change that much.

Cheers to all you folks who helped me with this!

03 Dec 07

R-code for visual model summaries: linear mixed models

Here is some code to summarize the coefficients of a linear mixed model that produces nice graphs like the following one (well, the curved arrows were added in PowerPoint):

[Figure: An example slide of a linear mixed model summary]

But first some background about the example model: Continue reading ‘R-code for visual model summaries: linear mixed models’

23 Nov 07

Spill-over effects in self-paced reading

I’ve been working on some R-code for spill-over analysis for self-paced reading experiments. I’ll be posting the actual analysis a bit later. Here’s some code that adds the spill-over from previous words to each word: Continue reading ‘Spill-over effects in self-paced reading’
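To illustrate what adding spill-over from previous words means in practice, here is a minimal sketch that adds lagged copies of a variable within each subject and item. It is not the code from the full post: the function name `add_spillover` and the column names (Subject, Item, WordPosition, RT) are assumptions made up for this example, not the format of any particular reading time file.

```r
# Hypothetical helper: for each row, add the value of `value` from the
# k-th previous word in the same group (e.g. same subject and item).
add_spillover <- function(d, value, by, pos, lags = 1) {
  # Sort by grouping columns and word position so that lagging is meaningful
  d <- d[do.call(order, unname(as.list(d[c(by, pos)]))), , drop = FALSE]
  g <- interaction(d[by], drop = TRUE)
  for (k in lags) {
    # Shift values down by k within each group; the first k words get NA
    d[[paste0(value, "_prev", k)]] <-
      ave(d[[value]], g, FUN = function(v) c(rep(NA, k), v)[seq_along(v)])
  }
  d
}

# Demo with made-up reading times for two subjects reading one item
d <- data.frame(
  Subject      = c(1, 1, 1, 2, 2, 2),
  Item         = 1,
  WordPosition = c(1, 2, 3, 1, 2, 3),
  RT           = c(300, 350, 400, 280, 500, 320)
)
out <- add_spillover(d, value = "RT", by = c("Subject", "Item"),
                     pos = "WordPosition", lags = 1:2)
print(out)
```

The new columns (here RT_prev1 and RT_prev2) can then be entered as predictors of the reading time on the current word. The same helper could be used to lag other word properties, such as word length.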



