29
Aug
09

Nagelkerke and CoxSnell Pseudo R2 for Mixed Logit Models

What to do when you need an intuitive measure of model quality for your logit (logistic) model? The problem is that logit models don’t have a nice measure such as R-square for linear models, which has a super intuitive interpretation. However, several pseudo R-square measures have been suggested are some are more commonly used (e.g. Nagelkerke R2). In R, some model-fitting procedures for ordinary logistic regression provide the Nagelkerke R-square as part of the standard output (e.g. lrm in Harrell’s Design package). However, no such measure is provided for the most widely used mixed logit model-fitting procedure (lmer in Bates’ lme4 library). Below I provide some code that provides Nagelkerke and CoxSnell pseudo R-squares for mixed logit models.

Disclaimer1: I myself am not convinced that using pseudo R-squares as measure of model quality for logistic models is such a great idea (there’s a reason why they are called pseudo R-squares), but audiences (including reviewers) unfamiliar with logit models often are understandably unsatisfied with model comparisons (of your model against the baseline model) as a measure of the quality of the model. If you end up using the code below, I recommend you also read up on the definition of the R-square measures.

Disclaimer 2: The code provided below is obviously a fast hack. If you build on it and clean it up, it would be really nice, if you shoot me a note so that I can link to it, or if you post it as a reply to this thread.

my.lmer.nagelkerke <- function(f, d) {
 lmer.full= lmer(formula= as.formula(paste(f, collapse="+")), d, family="binomial")
 logLik.lmer.full= as.numeric(logLik(lmer.full))
 N.lmer.full= nrow(lmer.full@X)
 cat(paste("Full mixed model: L=", logLik.lmer.full, ", N=", N.lmer.full, "\n", sep=""))

 lmer.intercept= lmer(formula= as.formula(paste(unlist(strsplit(f[1], "~"))[1], paste("1", f[2], sep=" + "), sep="~ ")), data= d, family="binomial")
 logLik.lmer.intercept= as.numeric(logLik(lmer.intercept))
 N.lmer.intercept= nrow(lmer.intercept@X)
 cat(paste("Intercept mixed model: L=", logLik.lmer.intercept, ", N=", N.lmer.intercept, "\n", sep=""))

 lrm.full= lrm(formula= as.formula(f[1]), data= d)
 logLik.lrm.intercept= as.numeric(deviance(lrm.full)[1] / - 2)
 N.lrm.intercept= as.numeric(lrm.full$stats[1])
 cat(paste("Intercept ordinary model: L=", logLik.lrm.intercept, ", N=", N.lrm.intercept, "\n", sep=""))

 coxsnell.lmer= 1 - exp((logLik.lmer.intercept - logLik.lmer.full) * (2/N.lmer.full))
 nagelkerke.lmer= coxsnell.lmer / (1 - exp(logLik.lmer.intercept * (2/N.lmer.full)))
 cat(paste("Full model evaluated against mixed intercept model:\n\tCoxSnell R2: ", coxsnell.lmer, "\n\tNagelkerke R2: ", nagelkerke.lmer,"\n", sep=""))
 coxsnell.lrm= 1 - exp((logLik.lrm.intercept - logLik.lmer.full) * (2/N.lmer.full))
 nagelkerke.lrm= coxsnell.lrm / (1 - exp(logLik.lrm.intercept * (2/N.lmer.full)))
 cat(paste("Full model evaluated against ordinary intercept model:\n\tCoxSnell R2: ", coxsnell.lrm, "\n\tNagelkerke R2: ", nagelkerke.lrm,"\n", sep=""))
 coxsnell.lrm.2= 1 - exp((logLik.lrm.intercept - logLik.lmer.intercept) * (2/N.lmer.full))
 nagelkerke.lrm.2= coxsnell.lrm.2 / (1 - exp(logLik.lrm.intercept * (2/N.lmer.full)))
 cat(paste("Mixed intercept model evaluated against ordinary intercept model:\n\tCoxSnell R2: ", coxsnell.lrm.2, "\n\tNagelkerke R2: ", nagelkerke.lrm.2,"\n", sep=""))
}

Input: The function expects a binary character vector as first argument, which should be the fixed-effect part of the formula and the random effect part, and it expects the data set as second argument.

NB: The data set should only contain cases for which all predictors of the full model are defined.

Output: The function provides CoxSnell and Nagelkerke R-squares for the full model compared against two baseline models, (1) a mixed logit model with only the intercept and the random effects and (2) an ordinary logit model with only the intercept. It also provides (3) the R-square measures for (1) compared against the baseline model (2).

This can be seen as providing pseudo measures of the variance accounted for by the fixed effects compare to (1) the baseline mixed model and compared (2) the ordinary baseline model, while at the same providing a measure of how much the random effects account for compared to the ordinary baseline model.

HTH. Please improve, debug, etc.


4 Responses to “Nagelkerke and CoxSnell Pseudo R2 for Mixed Logit Models”


  1. December 7, 2010 at 11:50 pm

    This is nice little function! I have used your basic algorithm to create a “drop1″ function that computes the pseudo-R2 for each predictor in your model. It requires 2 arguments: the formula for the full lmer model as a character string, and the data frame. For example:

    my.lmer.nagelkerke.drop1(“Response ~ factor1 + factor1 + (1|factor3) + (1|factor4)”,data)

    Enjoy, and extend/debug!

    my.lmer.nagelkerke.drop1 <- function(f, d) {
    resp = unlist(strsplit(f,"[ ]*~[ ]*"))[1]
    factors = unlist(strsplit(unlist(strsplit(f,"[ ]*~[ ]*"))[2],"[ ]*\\+[ ]*"))

    lmer.full= lmer(formula= as.formula(paste(resp,paste(factors,collapse=" + "),sep=" ~ ")), d, family="binomial")
    logLik.lmer.full= as.numeric(logLik(lmer.full))
    N.lmer.full= nrow(lmer.full@X)
    cat(paste("Full mixed model: L=", logLik.lmer.full, ", N=", N.lmer.full, "\n", sep=""))

    for(factor in factors) {
    myFormula = as.formula(paste(resp,paste(factors[-which(factors==factor)],collapse=" + "),sep=" ~ "))
    lmer.intercept= lmer(formula= myFormula, data= d, family="binomial")
    logLik.lmer.intercept= as.numeric(logLik(lmer.intercept))
    N.lmer.intercept= nrow(lmer.intercept@X)
    #cat(paste("Intercept mixed model: L=", logLik.lmer.intercept, ", N=", N.lmer.intercept, "\n", sep=""))

    coxsnell.lmer= 1 – exp((logLik.lmer.intercept – logLik.lmer.full) * (2/N.lmer.full))
    nagelkerke.lmer= coxsnell.lmer / (1 – exp(logLik.lmer.intercept * (2/N.lmer.full)))
    cat(paste(factor,":\tCoxSnell R2: ", format(coxsnell.lmer), "\tNagelkerke R2: ", format(nagelkerke.lmer),"\n", sep=""))
    }
    }

  2. 2 fahad
    January 9, 2011 at 2:47 pm

    hi,

    how Pseudo-R2 is calculated from the SPSS table under the heading of ‘Variables in the Equation’

    because there are different variables
    which are
    B, S.E. , Wald , df , Sig , Exp(B)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s


Blog Stats

  • 117,920 hits

 

August 2009
M T W T F S S
« Jul   Oct »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Categories

RSS Language Log


Follow

Get every new post delivered to your Inbox.