I’ve just uploaded files containing some useful functions to a public git repository. You can see the files directly without worrying about git at all by visiting regression-utils.R (direct download) and mer-utils.R (direct download).
The first file contains functions that I use in building models. We all know that centering and standardizing regression predictors can reduce collinearity. This often leads to code like
library(lme4)
d <- within(sleepstudy, {
c.Days <- scale(Days, center = TRUE, scale = FALSE)
})
m <- lmer(Reaction ~ c.Days + (1 + c.Days | Subject), data = d)
That works fine, until you want to fit the same model to a subset of the data, like
m2 <- update(m, data = subset(d, Days > 4))
mean(m2@frame$c.Days) # != 0
In this case, c.Days will no longer be properly centered (its mean is now 2.5, not 0). If, however, we do the centering when we build the model, everything works nicely
m3 <- lmer(Reaction ~ scale(Days, center = TRUE, scale = FALSE) +
(1 + scale(Days, center = TRUE, scale = FALSE) | Subject),
data = d)
m4 <- update(m3, data = subset(d, Days > 4))
mean(m4@frame$`scale(Days, center = TRUE, scale = FALSE)`) # == 0
… except that the names of our predictors are now incredibly ugly. So regression-utils.R includes some shorthand functions for transformations commonly used in regression models. Using those functions, we can write
m5 <- lmer(Reaction ~ c.(Days) + (1 + c.(Days) | Subject), data = d)
m6 <- update(m5, data = subset(d, Days > 4))
mean(m6@frame$`c.(Days)`) # == 0
To me, this looks much nicer and makes it feasible to do transformations inside of formulas.
The functions contained in the file are:
-
c.(x): center a predictor -
z.(x): standardize (z-transform) a predictor -
r.(formula, ...): return standardized residuals from regressing a predictor against at least one other predictor -
s.(x): apply a transformation from Seber 1977 that puts the data in the range [-1,1] -
p.(x, ...): polynomial terms around x (uses orthogonal polynomials by default, see ?poly)
Now that we have a convenient way to reduce collinearity within our models (that can be reused on models fit to different subsets of the data), we want to measure the collinearity between the predictors. I’ve adapted three standard collinearity diagnostics to work directly on predictors in lmer glmer models. Let’s look at the effects of using orthogonal vs. natural polynomials on collinearity.
m.natural <- lmer(Reaction ~ p.(Days, 4, raw = TRUE) + (1 | Subject), data = sleepstudy)
m.orthogonal <- lmer(Reaction ~ p.(Days, 4) + (1 | Subject), data =sleepstudy)
## kappa, aka condition number.
## kappa < 10 is reasonable collinearity,
## kappa < 30 is moderate collinearity,
## kappa >= 30 is troubling collinearity
kappa.mer(m.natural) # 12.53
kappa.mer(m6) # 1.00, properly centered
## variance inflation factor, aka VIF
## values over 5 are troubling.
## should probably investigate anything over 2.5.
max(vif.mer(m.natural)) # 14.47
max(vif.mer(m.orthogonal)) # 1
## condition index and variance decomposition proportions,
## see ?colldiag from the package perturb
colldiag.mer(m.natural) # the quartic term has a high condition index, and shares a large portion of variance with the quadratic term
colldiag.mer(m.orthogonal) # all condition indeces are low, no need to worry about variance proportions
## highest correlation among predictors, can be found in triangle matrix output by summary() on a mer object
## investigate further for any absolute values greater than .4
maxcorr.mer(m.natural) # -0.96
maxcorr.mer(m.orthogonal) # 0.00
Patches and pull requests welcome!
mer-utils.R is looking good!
Please tell us how you would like this to be referenced in a paper. Better still, give a sample paragraph from a real or hypothetical paper that shows how to report the use of mer-utils.
Hi, I let Austin answer this, but I don’t think there is a paper (nor is there one planned, but maybe Austin will correct me). Maybe you can give Austin Frank (now at Haskins Labs, working with Jim Magnuson) a big shout out in the acknowledgments? Referring to the blog may help other folks to find this and other useful R code. Thank you!
Thanks for the input, Florian. I will do as you suggest for now, but note that we regularly reference code, as we do for R, R packages, and other software projects. I think that would look even better than an acknowledgment or footnote.
Hi Michael,
good point. Maybe Austin can put together a package with his functions. I’ll ask him.
Florian
Thanks for all of this, Austin! Question: You refer to kappa < 10 as reasonable. Baayen (2008:182) suggests < 6 is no collinearity; 6<K<30 medium. So about that 6-10 range (which, er, just happens to be where a model I'm looking at right now falls!)… Is there a definitive source on interpreting kappa to which you might point me?
Thanks!
Robin
Thanks for this Austin!
One question. You write:
“We all know that centering and standardizing regression predictors can reduce collinearity.”
What exactly do you mean here? Centering just shifts the values of a variable. This presumably makes interpretation of any interaction coefficient easier, but I am not sure it will alleviate problems of collinearity.
Hi Niels,
thanks for posting this question. Centering does reduce collinearity between predictors and their higher-order terms (such as interactions or e.g. polynomials).
Let me know whether you need more information or whether this makes sense. You might find some of the tutorials we have posted on the blog useful (e.g. Maureen Gillespie’s tutorial on coding, i think, talks about this, too http://hlplab.wordpress.com/2010/05/10/mini-womm/)
HTH,
Florian
Used your code to determine VIF for some multi-level modeling I’m doing. It worked like a charm. Many thanks!