Some time ago, I announced that some folks have been thinking about organizing a small workshop on common issues and standards in regression modeling (including multilevel models) in psycholinguistic research to be held the day before CUNY 2009 (i.e. 03/25 at UC Davis). Here’s an update on this “workshop” along with some thoughts for planning.
The target audience for this workshop is researchers who have already used regression models in their research.
The goal is a discussion of quality standards for the field (how should we fit models? when can we trust them? what should we report in papers?). We hope to gather interested researchers who are beginning to use regression and multilevel methods and “experts” who have already used them for some time. Undoubtedly, there will not be only one correct statistical method for each problem, but hopefully the discussion will lead to some common ground and increased awareness of what experienced regression users consider “good”. It will be impossible to arrive at standards for all types of models, but the goal is to work towards some general minimum quality standards. Similarly, I don’t anticipate that we will answer all the questions that have been sent to me, but maybe we will make some progress.
The workshop will probably consist of some overview talks (see below for topics), enough time for a forum/discussion, and maybe an invited talk on future perspectives, the conceptual background of multilevel models, and/or advanced implementation details of currently available multilevel fitting methods.
Invited speaker(s)? Given the spontaneous decision to have this workshop, it’s going to be hard to get invited speakers from outside of the CUNY sentence processing community. But we are trying.
UPDATE: There is a chance that Andrew Gelman may be able to come to the workshop and give an overview presentation. That would obviously be awesome. Suggested topics also included the relation between models for prediction and models for data analysis, the relation between machine learning and the most recent developments in multilevel modeling, and going beyond predicting the mean (explicit modeling of variance and covariance components).
Specific topics and questions:
- Standards of regression model fitting (using the right link function, dealing with collinearity, guarding against overfitting) and what to report so that reviewers and readers can judge the adequacy of the model
  - when and how to include interactions (incl. basics of centering, coding, residualization)
  - when and how to test for non-linearities
  - standards for random effect in-/exclusions
  - standards for assessing significance of n-ary factor removal for n > 2 in mixed models that are not fit by ML
  - for more detail see also the previous post
- How to report, summarize, and visualize regression results (incl. interpretation of coefficients, coding of variables)
- Short refresher on conceptual background for multilevel models
- Other open issues:
  - Eye-tracking and other time series data
  - consequences of unbalanced data (incl. data resulting from balanced designs with high exclusion rates)
  - non-Gaussian link functions?
  - future developments, what’s coming up?
  - [Fermin says] one might want to discuss the possibility of predicting more than just means. For instance, it is becoming clear in the analysis of RT data that one will have variables and factors that affect means, and variables and factors that affect variances (and possibly several other factors in a distribution; see, e.g., the Balota et al. paper in the JML special issue).
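
To make Fermin’s point a bit more concrete, here is a minimal sketch in Python, using made-up data rather than anything from a real experiment. It simulates reaction times for two hypothetical conditions that share the same mean but differ in spread. The condition names, sample size, and distribution parameters are all assumptions chosen for illustration, and real RTs are of course not Gaussian.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical simulated RTs in ms. Both conditions have a true mean of 600,
# but condition B is more variable (SD 120 vs. 50). All numbers are invented,
# and the Gaussian shape is a simplifying assumption.
n = 2000
cond_a = [random.gauss(600, 50) for _ in range(n)]
cond_b = [random.gauss(600, 120) for _ in range(n)]

# A comparison of means sees (almost) nothing ...
mean_diff = statistics.mean(cond_b) - statistics.mean(cond_a)

# ... while a comparison of spreads picks up the difference.
sd_ratio = statistics.stdev(cond_b) / statistics.stdev(cond_a)

print(f"difference in means: {mean_diff:.1f} ms")
print(f"ratio of SDs (B/A): {sd_ratio:.2f}")
```

A model that only predicts the mean would treat these two conditions as equivalent, which is the kind of case where explicitly modeling variance components might matter.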
Feedback is, of course, very welcome.