Archive for the 'statistics/R' Category

01
Jun
12

transferring installed packages to a different installation of R

It used to take me a while to reinstall all the R packages that I use after upgrading to a new version of R.  I couldn’t think of another way to do this than to create a list of installed packages by examining the R package directory, and to manually select and install each one of those packages in the new version of R.  In order to ensure that my home and office installation of R had the same packages installed, I did something similar.

I recently discovered that there is a much, much easier way to transfer the packages that you have installed to a different installation of R.  I found some R code on the web that I adapted to my needs.  Here is what you need to do:

1. Run the script “store_packages.R” in your current version of R.

# store_packages.R
#
# stores a list of your currently installed packages

tmp = installed.packages()

installedpackages = as.vector(tmp[is.na(tmp[,"Priority"]), 1])
save(installedpackages, file="~/Desktop/installed_packages.rda")

(Make sure that all the quotation marks in the script are straight.  The scripts will generate an error if they include any curly quotation marks.  For some reason, when I saved this blog entry, some quotation marks changed to curly ones.  WordPress is probably to blame for this problem, which I have not been able to fix.)

2. Close R.  Open the installation of R that you want the packages to be installed in.

3. Run the script “restore_packages.R”.

# restore_packages.R
#
# installs each package from the stored list of packages

load("~/Desktop/installed_packages.rda")

for (count in seq_along(installedpackages)) install.packages(installedpackages[count])

Note that if you want to install the list of packages in an installation of R on a different computer, you should transfer the .rda file that is created by the store_packages script to that computer, and make sure that the path for the “load” command in the restore_packages script is set to the right location.
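A slightly more defensive variant of the two scripts can be sketched as follows. This is only an illustration under a few assumptions: the helper names store_packages() and missing_packages() are made up for this example, the list is written with saveRDS() instead of save(), and only packages that the new installation does not already have are requested.

```r
# Hypothetical helpers; the names and the file location are illustrative only.

# Write the names of all packages that are neither "base" nor "recommended"
# (i.e. whose Priority is NA) to `path`.
store_packages <- function(path) {
  pkgs <- installed.packages()
  user_pkgs <- unique(rownames(pkgs)[is.na(pkgs[, "Priority"])])
  saveRDS(user_pkgs, file = path)
  invisible(user_pkgs)
}

# Return the stored package names that the running installation lacks.
missing_packages <- function(path) {
  wanted <- readRDS(path)
  setdiff(wanted, rownames(installed.packages()))
}

# In the old installation:
# store_packages("~/Desktop/installed_packages.rds")
# In the new installation:
# install.packages(missing_packages("~/Desktop/installed_packages.rds"))
```

Because install.packages() accepts a character vector, no loop is needed, and rerunning the restore step after a partial failure only retries what is still missing.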

18
Apr
12

The reproducibility project

Thanks to Anne Pier Salverda, who made me aware of this project to replicate all studies in certain psych journals, including APA journals that publish psycholinguistic work, such as JEP:LMC. This might be a fine April Fools’ joke, slightly delayed, but it sure is a great idea! In a similar study, researchers apparently found that only 6 out of 53 cancer studies replicated (see linked article).

And while we are at it, here’s an article that, if followed, is guaranteed to increase the proportion of replications (whereas power, effect sizes, lower p-values, family-wise error corrections, min-F and all the other favorites out there are pretty much guaranteed to not do the job). Simmons et al. (2011), published in Psychological Science, shows what we should all know but what is all too often forgotten or belittled: lax criteria in excluding data, adding additional subjects, transforming data, and adding or removing covariates inflate the Type I error rate (in combination easily to over 60% false positives for p<.05!!!).  Enjoy.
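To get a feel for how just one of these practices inflates the false positive rate, here is a minimal simulation sketch. It is not Simmons et al.'s setup: the sample sizes, the single "peek", and the function name one_experiment() are assumptions chosen for illustration. Both groups are drawn from the same distribution, so every significant result is a false positive.

```r
# Assumed design for illustration, not the simulations reported in the article.
set.seed(1)

one_experiment <- function(n_start = 20, n_add = 10) {
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  # Analyst with a fixed sample size: test once.
  p_fixed <- t.test(x, y)$p.value
  # Flexible analyst: if the first test is not significant,
  # add more subjects per group and test again.
  p_flex <- p_fixed
  if (p_flex >= .05) {
    x <- c(x, rnorm(n_add))
    y <- c(y, rnorm(n_add))
    p_flex <- t.test(x, y)$p.value
  }
  c(fixed = p_fixed, flexible = p_flex)
}

p <- replicate(5000, one_experiment())
false_positive_rate <- rowMeans(p < .05)
print(false_positive_rate)
```

The fixed-sample rate should sit near the nominal .05, while the flexible rate comes out higher, even though only a single extra look at the data was allowed.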

20
Mar
12

Correlation plot matrices using the ellipse library

My new favorite library is the ellipse library. It includes functions for creating ellipses from various objects. It has a function, plotcorr(), that creates a correlation matrix plot in which each correlation is represented by an ellipse approximating the shape of a bivariate normal distribution with the same correlation. While the function itself works well, I wanted a bit more redundancy in my plots and modified the code. I kept (most of) the main features provided by the function and added a few: the ability to plot ellipses and correlation values on the same plot, the ability to manipulate what is placed along the diagonal, and control over the rounding of the numbers plotted. Here is an example with some color manipulations. The colors represent the strength and direction of the correlation, -1 to 0 to 1, with University of Rochester approved red to white to blue.

First the function code:

Continue reading ‘Correlation plot matrices using the ellipse library’

17
Nov
11

Lots of zeros? Be careful with your chi-square (exact or not) and the like

If you’re running chi-squares to analyze categorical data and you have lots of cells with very low counts (or even 0), be careful in how you interpret the result. There’s a nice article by Andrew Gelman on this topic, where he shows that the problem is that all the low counts can make it harder to detect the signal (and hence a significant deviation from the expected values for a part of the table). Put differently, you might have a significant pattern, but not detect it. I don’t think it’s so much of a problem for most of the tests we conduct, since contingency tables in psycholinguistic and linguistic research are usually rather small. I can’t recall the last time that I saw anything larger than a 3×4 or the like. From what I understand from Gelman’s post, it would seem that the problem he points out becomes more serious the larger the table is.
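As a quick check on your own data, it helps to look at the expected counts before trusting the test. The sketch below uses a made-up 3×4 table (the counts are invented for illustration and do not reproduce Gelman's example) and compares the asymptotic chi-square test with a Monte Carlo p-value and Fisher's exact test.

```r
# Made-up 3x4 table with several empty or near-empty cells.
counts <- matrix(c(12, 0, 1, 0,
                    9, 1, 0, 2,
                    3, 0, 0, 7),
                 nrow = 3, byrow = TRUE)

# R warns here that the chi-squared approximation may be incorrect;
# the warning is suppressed only to keep the output short.
asym <- suppressWarnings(chisq.test(counts))

# Expected counts under independence; count the cells below the
# usual rule-of-thumb threshold of 5.
print(round(asym$expected, 2))
low_cells <- sum(asym$expected < 5)

# Monte Carlo p-value that does not rely on the large-sample approximation.
set.seed(1)
mc <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)

# Fisher's exact test for the same table.
exact <- fisher.test(counts)

print(c(asymptotic = asym$p.value,
        monte_carlo = mc$p.value,
        fisher = exact$p.value))
```

None of these alternatives removes the underlying issue that sparse cells carry little information; they only avoid leaning on an approximation that the low expected counts make doubtful.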

27
Jul
11

New R resource for ordinary and multilevel regression modeling

Here’s what I received from the Centre for Multilevel Modelling at Bristol (I haven’t checked it out yet; registration seems to be free but required):

The Centre for Multilevel Modelling is very pleased to announce the addition of
R practicals to our free on-line multilevel modelling course. These give
detailed instructions of how to carry out a range of analyses in R, starting
from multiple regression and progressing through to multilevel modelling of
continuous and binary data using the lmer and glmer functions.

MLwiN and Stata versions of these practicals are already available.
You will need to log on or register onto the course to view these
practicals.

Read More...
http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13

13
Jul
11

R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson

Below I am sharing the R code for our paper on the serial founder effect:

This paper is a commentary on Atkinson’s 2011 Science article on the serial founder model (see also this interview with ScienceNews, in which parts of our comment in Linguistic Typology and follow-up work are summarized). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and to evaluate these models, using Atkinson’s data as an example. We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.

Finally, we discuss limits to the statistical analysis due to data sparseness. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes (for the effect) are included in the model. We also present simulations that show that the Type I error rate (false rejections) of the approach taken in Atkinson is many times higher than conventionally accepted (i.e. above .2 when .05 is the conventionally accepted rate of Type I errors).

The scripts presented below are not intended to allow full replication of our analyses (they lack annotation and we are not allowed to share the WALS data employed by Atkinson on this site anyway). However, there are many plots and tests in the paper that might be useful for typologists or other users of mixed models. For that reason, I am for now posting the raw code. Please comment below if you have questions and we will try to provide additional annotation for the scripts as needed and as time permits. If you find (parts of the) script(s) useful, please consider citing our article in Linguistic Typology.

25
Jun
11

More on random slopes and what it means if your effect is no longer significant after the inclusion of random slopes

I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to update the procedure I described at http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. As always, comments are welcome. What I am writing below are just suggestions.

[...] an insignificant effect in a (1 + factor|subj) model means that, after controlling for random by-subject variation in the slope/effect of factor, you find no (by-convention-significant) evidence for the effect. Like you suggest, this is due to the fact that there is between-subject variability in the slope that is sufficiently large to let us call into question the hypothesis that the ‘overall’ slope is significantly different from zero.

[...] So, what’s the rule of thumb here? If you run any of the standard simple designs (2×2, 2×3, 2×2×2, etc.) and you have the psychologist’s luxury of plenty of data (24+ items, 24+ subjects [...]), the full random effect structure is something you should entertain as your starting point. That’s in Clark’s spirit. That’s what F1 and F2 were meant for. [...] All of these approaches do not just capture random intercept differences by subject and item. They also aim to capture random slope differences.

[...] here’s what I’d recommend during tutorials now because it often saves time for psycholinguistic data. I am only writing down the random effects but, of course, I am assuming there are fixed effects, too, and that your design factors will remain in the model. Let’s look at a 2×2 design: Continue reading ‘More on random slopes and what it means if your effect is no longer significant after the inclusion of random slopes’

31
May
11

Two interesting papers on mixed models

While searching for something else, I just came across two papers that should be of interest to folks working with mixed models.

  • Schielzeth, H. and Forstmeier, W. 2009. Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology, Volume 20, Issue 2, 416-420.  I have seen the same point being made in several papers under review and at a recent CUNY (e.g. Doug Roland’s 2009? CUNY poster). On the one hand, it should be absolutely clear that random intercepts alone are often insufficient to account for violations of independence (this is a point I make every time I am teaching a tutorial). On the other hand, I have reviewed quite a number of papers where this mistake was made. So, here you go. In black and white. The moral is (once again) that no statistical procedure does what you think it should do if you don’t use it the way it was intended to be used.
  • The second paper takes on a more advanced issue, but one that is becoming more and more relevant. How can we test whether a random effect is essentially unnecessary, i.e. whether it has a variance of 0? Currently, most people conduct model comparison (following Baayen, Davidson and Bates, 2008).  But this approach is not recommended (and neither do Baayen et al. recommend it) if we want to test whether all random effects can be completely removed from the model (cf. the very useful R FAQ list, which states “do not compare lmer models with the corresponding lm fits, or glmer/glm; the log-likelihoods [...] include different additive terms”). This issue is taken on in Scheipl, F., Greven, S. and Küchenhoff, H. 2008. Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Computational Statistics & Data Analysis, Volume 52, Issue 7, 3283-3299. They present power comparisons of various tests.

31
May
11

Mixed models and Simpson’s paradox

For a paper I am currently working on, I started to think about Simpson’s paradox, which Wikipedia succinctly defines as

“a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science [...]”

The Wikipedia page also gives a nice visual illustration. Here’s my own version of it. The plot shows 15 groups, each with 20 data points. The groups happen to order along the x-axis (“Pseudo distance from origin”) in a way that suggests a negative trend of the Pseudo distance from origin against the outcome (“Pseudo normalized phonological diversity”). However, this trend does not hold within groups. As a matter of fact, in this particular sample, most groups show the opposite of the global trend (10 out of 15 within-group slopes are clearly positive). If this data set is analyzed by an ordinary linear regression (which does not have access to the grouping structure), the result will be a significant negative slope for the Pseudo distance from origin. So, I got curious: what about linear mixed models?
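The basic pattern is easy to reproduce in base R. The sketch below is not the data behind the plot: the slopes, noise levels and variable names are assumptions picked so that the reversal is unmistakable (here every within-group slope is positive, unlike the 10 out of 15 in the sample described above).

```r
# Simulated data under assumed parameters, for illustration only.
set.seed(1)
n_groups <- 15
n_per_group <- 20

group <- rep(seq_len(n_groups), each = n_per_group)

# Group centers move to the right along x while the group-level outcome drops;
# within each group, the outcome rises with x.
between_slope <- -1
within_slope <- 1
x <- group + rnorm(length(group), sd = 0.3)
y <- between_slope * group + within_slope * (x - group) +
  rnorm(length(group), sd = 0.1)
d <- data.frame(group = factor(group), x = x, y = y)

# Ordinary regression that ignores the grouping structure.
pooled_slope <- coef(lm(y ~ x, data = d))[["x"]]

# Separate regressions within each group.
group_slopes <- sapply(split(d, d$group),
                       function(g) coef(lm(y ~ x, data = g))[["x"]])

print(pooled_slope)
print(round(group_slopes, 2))
```

The pooled fit picks up the negative between-group trend, while the per-group fits recover the positive within-group slope.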

Continue reading ‘Mixed models and Simpson’s paradox’

26
Feb
11

Bayesian Data Analysis, p-values, and more: What do we need?

Some of you might find this open letter by John Kruschke (Indiana University) interesting. He is making a passionate argument to abandon traditional “20th century” data analysis in favor of Bayesian approaches.

24
Feb
11

Diagnosing collinearity in mixed models from lme4

I’ve just uploaded files containing some useful functions to a public git repository. You can see the files directly without worrying about git at all by visiting regression-utils.R (direct download) and mer-utils.R (direct download). Continue reading ‘Diagnosing collinearity in mixed models from lme4’

15
Jun
10

R code for LaTeX tables of lmer model effects

Here’s some R code that prints text to the console that you can copy and paste into a .tex file to create nice LaTeX tables of the fixed effects of lmer models (it only works for family="binomial"). Effects with p < .05 will appear in bold. The following code produces the table pasted below. It assumes a model called mod.all. prednames creates a mapping from predictor names in the model to the predictor names you want to appear in the table. Note that for the TeX to work you need to include \usepackage{booktabs} in the preamble.
Continue reading ‘R code for LaTeX tables of lmer model effects’

10
May
10

Mini-WOMM Montreal Slides Now Available…

Florian, Maureen Gillespie and I taught a mini-Workshop on Ordinary and Multilevel Models at McGill University last week. Our slides are available here:

Lecture1McGill (F. Jaeger)

Lecture2McGill (F. Jaeger)

CodingTutorial (M. Gillespie)

ModelComparisonTutorial (P. Graff)

Other materials, data files and scripts can be found here:

http://wiki.bcs.rochester.edu/HlpLab/StatsCourses

Feedback, comments and questions are much appreciated!

16
Apr
10

Annotated example analysis using mixed models

Jessica Nelson (Learning Research and Development Center, University of Pittsburgh) uploaded a step-by-step example analysis using mixed models to her blog. Each step is nicely annotated and Jessica also discusses some common problems she encountered while trying to analyze her data using mixed models. I think this is a nice example for anyone trying to learn to use mixed models. It goes through all/most of the steps outlined in Victor Kuperman’s and my WOMM tutorial (click on the graph to see it full size):

15
Apr
10

Coming up: Mini-WoMM in Montreal

If you are in the Montreal area, consider joining us for a Workshop on Ordinary and Multilevel Models to be held 5/3-4 at McGill and organized by Michael Wagner, Aparna Nadig, and Kris Onishi. The workshop will include the usual intros to linear regression, linear mixed models, logistic regression, and mixed logit models. We will also discuss common issues and solutions in regression modeling. Additionally, we will have a couple of special area lectures/tutorials:

  • Maureen Gillespie (Northeastern) will talk about different ways to code your variables and how that relates to the specific hypotheses you’re testing.
  • Peter Graff (MIT) will give a tutorial on logistic regression, specifically to test linguistic theories. In all likelihood, he will also sing. Which relates to the previous post, because he likes to sing about OT.

So, join us! I think there also will be a party =). Below is the full invitation (some details may change). Continue reading ‘Coming up: Mini-WoMM in Montreal’
