What to do when you need an intuitive measure of model quality for your logit (logistic) model? The problem is that logit models don’t have a nice measure such as R-square for linear models, which has a super intuitive interpretation. However, several pseudo R-square measures have been suggested are some are more commonly used (e.g. Nagelkerke R2). In R, some model-fitting procedures for ordinary logistic regression provide the Nagelkerke R-square as part of the standard output (e.g. lrm in Harrell’s Design package). However, no such measure is provided for the most widely used mixed logit model-fitting procedure (lmer in Bates’ lme4 library). Below I provide some code that provides Nagelkerke and CoxSnell pseudo R-squares for mixed logit models.
Disclaimer1: I myself am not convinced that using pseudo R-squares as measure of model quality for logistic models is such a great idea (there’s a reason why they are called pseudo R-squares), but audiences (including reviewers) unfamiliar with logit models often are understandably unsatisfied with model comparisons (of your model against the baseline model) as a measure of the quality of the model. If you end up using the code below, I recommend you also read up on the definition of the R-square measures.
Disclaimer 2: The code provided below is obviously a fast hack. If you build on it and clean it up, it would be really nice, if you shoot me a note so that I can link to it, or if you post it as a reply to this thread.
my.lmer.nagelkerke <- function(f, d) {
lmer.full= lmer(formula= as.formula(paste(f, collapse="+")), d, family="binomial")
logLik.lmer.full= as.numeric(logLik(lmer.full))
N.lmer.full= nrow(lmer.full@X)
cat(paste("Full mixed model: L=", logLik.lmer.full, ", N=", N.lmer.full, "\n", sep=""))
lmer.intercept= lmer(formula= as.formula(paste(unlist(strsplit(f[1], "~"))[1], paste("1", f[2], sep=" + "), sep="~ ")), data= d, family="binomial")
logLik.lmer.intercept= as.numeric(logLik(lmer.intercept))
N.lmer.intercept= nrow(lmer.intercept@X)
cat(paste("Intercept mixed model: L=", logLik.lmer.intercept, ", N=", N.lmer.intercept, "\n", sep=""))
lrm.full= lrm(formula= as.formula(f[1]), data= d)
logLik.lrm.intercept= as.numeric(deviance(lrm.full)[1] / - 2)
N.lrm.intercept= as.numeric(lrm.full$stats[1])
cat(paste("Intercept ordinary model: L=", logLik.lrm.intercept, ", N=", N.lrm.intercept, "\n", sep=""))
coxsnell.lmer= 1 - exp((logLik.lmer.intercept - logLik.lmer.full) * (2/N.lmer.full))
nagelkerke.lmer= coxsnell.lmer / (1 - exp(logLik.lmer.intercept * (2/N.lmer.full)))
cat(paste("Full model evaluated against mixed intercept model:\n\tCoxSnell R2: ", coxsnell.lmer, "\n\tNagelkerke R2: ", nagelkerke.lmer,"\n", sep=""))
coxsnell.lrm= 1 - exp((logLik.lrm.intercept - logLik.lmer.full) * (2/N.lmer.full))
nagelkerke.lrm= coxsnell.lrm / (1 - exp(logLik.lrm.intercept * (2/N.lmer.full)))
cat(paste("Full model evaluated against ordinary intercept model:\n\tCoxSnell R2: ", coxsnell.lrm, "\n\tNagelkerke R2: ", nagelkerke.lrm,"\n", sep=""))
coxsnell.lrm.2= 1 - exp((logLik.lrm.intercept - logLik.lmer.intercept) * (2/N.lmer.full))
nagelkerke.lrm.2= coxsnell.lrm.2 / (1 - exp(logLik.lrm.intercept * (2/N.lmer.full)))
cat(paste("Mixed intercept model evaluated against ordinary intercept model:\n\tCoxSnell R2: ", coxsnell.lrm.2, "\n\tNagelkerke R2: ", nagelkerke.lrm.2,"\n", sep=""))
}
Input: The function expects a binary character vector as first argument, which should be the fixed-effect part of the formula and the random effect part, and it expects the data set as second argument.
NB: The data set should only contain cases for which all predictors of the full model are defined.
Output: The function provides CoxSnell and Nagelkerke R-squares for the full model compared against two baseline models, (1) a mixed logit model with only the intercept and the random effects and (2) an ordinary logit model with only the intercept. It also provides (3) the R-square measures for (1) compared against the baseline model (2).
This can be seen as providing pseudo measures of the variance accounted for by the fixed effects compare to (1) the baseline mixed model and compared (2) the ordinary baseline model, while at the same providing a measure of how much the random effects account for compared to the ordinary baseline model.
HTH. Please improve, debug, etc.
This is nice little function! I have used your basic algorithm to create a “drop1″ function that computes the pseudo-R2 for each predictor in your model. It requires 2 arguments: the formula for the full lmer model as a character string, and the data frame. For example:
my.lmer.nagelkerke.drop1(“Response ~ factor1 + factor1 + (1|factor3) + (1|factor4)”,data)
Enjoy, and extend/debug!
my.lmer.nagelkerke.drop1 <- function(f, d) {
resp = unlist(strsplit(f,"[ ]*~[ ]*"))[1]
factors = unlist(strsplit(unlist(strsplit(f,"[ ]*~[ ]*"))[2],"[ ]*\\+[ ]*"))
lmer.full= lmer(formula= as.formula(paste(resp,paste(factors,collapse=" + "),sep=" ~ ")), d, family="binomial")
logLik.lmer.full= as.numeric(logLik(lmer.full))
N.lmer.full= nrow(lmer.full@X)
cat(paste("Full mixed model: L=", logLik.lmer.full, ", N=", N.lmer.full, "\n", sep=""))
for(factor in factors) {
myFormula = as.formula(paste(resp,paste(factors[-which(factors==factor)],collapse=" + "),sep=" ~ "))
lmer.intercept= lmer(formula= myFormula, data= d, family="binomial")
logLik.lmer.intercept= as.numeric(logLik(lmer.intercept))
N.lmer.intercept= nrow(lmer.intercept@X)
#cat(paste("Intercept mixed model: L=", logLik.lmer.intercept, ", N=", N.lmer.intercept, "\n", sep=""))
coxsnell.lmer= 1 – exp((logLik.lmer.intercept – logLik.lmer.full) * (2/N.lmer.full))
nagelkerke.lmer= coxsnell.lmer / (1 – exp(logLik.lmer.intercept * (2/N.lmer.full)))
cat(paste(factor,":\tCoxSnell R2: ", format(coxsnell.lmer), "\tNagelkerke R2: ", format(nagelkerke.lmer),"\n", sep=""))
}
}
hi,
how Pseudo-R2 is calculated from the SPSS table under the heading of ‘Variables in the Equation’
because there are different variables
which are
B, S.E. , Wald , df , Sig , Exp(B)
I am not sure I understand.