R-code for reading time data preparation


Some time ago I posted some R code for creating spill-over variables from a Linger reading time file (for spill-over analysis of self-paced reading time data). Here are the steps that need to be done before that: importing the Linger file, preparing the data, checking for outliers, etc.


# load dataframe with ALL data (including fillers)
#######################################################################
# note: read.delim() expects a header row by default; add header=FALSE
# if your file starts directly with data
d <- as.data.frame(read.delim(file="C:\\Data\\readingtime-linger.rtm",
   sep=" ",
   col.names=c('EXPT','CONDITION','ITEM','SUBJ','Lpos','Wpos','WORD',
               'REGION', 'RTraw','zRTraw','RTresidual','zRTresidual',
               'CORRECT')
))

# EXPT = experiment ID (in case several experiments were combined in one
#        list)
# Lpos = position of stimulus in list
# Wpos = position of word in stimulus
# WORD = string of word
# REGION = defined reading time regions
# RTraw = raw reading times
# zRTraw = z-score of raw reading times
# CORRECT = was the comprehension question about the stimulus answered
#           correctly?


# some example data
# filler1 a _ 1 23 86 1 The 1 192 -0.699 -52.09 -0.189 100
# filler1 a _ 1 23 86 2 king 2 191 -0.930 -55.06 -0.261 100
# filler1 a _ 1 23 86 3 of 3 199 -0.668 -43.12 -0.170 100
# filler1 a _ 1 23 86 4 the 4 167 -0.815 -77.09 -0.296 100
# filler1 a _ 1 23 86 5 small 5 183 -0.586 -65.03 -0.201 100
# filler1 a _ 1 23 86 6 nation 6 175 -0.708 -75.00 -0.221 100
# filler1 a _ 1 23 86 7 used 7 199 -0.628 -47.06 -0.136 100
# filler1 a _ 1 23 86 8 to 8 183 -0.819 -59.12 -0.184 100
# filler1 a _ 1 23 86 9 wear 9 183 -0.909 -63.06 -0.105 100
# filler1 a _ 1 23 86 10 a 10 199 -0.663 -41.15 -0.119 100
# filler1 a _ 1 23 86 11 purple 11 183 -0.555 -67.00 -0.193 100
# filler1 a _ 1 23 86 12 and 12 191 -0.619 -53.09 -0.097 100
# filler1 a _ 1 23 86 13 gold 13 191 -0.892 -55.06 -0.124 100
# filler1 a _ 1 23 86 14 robe 14 191 -0.730 -55.06 -0.068 100
# filler1 a _ 1 23 86 15 made 15 215 -0.761 -31.06 0.140 100
# filler1 a _ 1 23 86 16 of 16 207 -0.600 -35.12 -0.027 100
# filler1 a _ 1 23 86 17 fancy 17 206 -0.877 -42.03 0.069 100
# filler1 a _ 1 23 86 18 thread 18 207 -0.729 -43.00 0.003 100
# filler1 a _ 1 23 86 19 in 19 215 -0.561 -27.12 -0.000 100
# filler1 a _ 1 23 86 20 the 20 231 -0.319 -13.09 -0.022 100
# filler1 a _ 1 23 86 21 bathtub. 21 375 -0.001 121.06 0.680 100
# ProdComp subj 2 21 23 87 1 The 1 208 -0.736 -36.09 0.050 100
# ProdComp subj 2 21 23 87 2 understudy 1 199 -0.786 -58.88 -0.112 100
# ProdComp subj 2 21 23 87 3 that 2 199 -0.936 -47.06 -0.291 100
# ProdComp subj 2 21 23 87 4 telephoned 3 198 -0.683 -59.88 -0.163 100
# ProdComp subj 2 21 23 87 5 the 4 231 -0.275 -13.09 -0.093 100
# ProdComp subj 2 21 23 87 6 agent 4 207 -0.309 -41.03 -0.137 100
# ProdComp subj 2 21 23 87 7 in 6 247 -0.466 4.88 -0.019 100
# ProdComp subj 2 21 23 87 8 Los 6 263 -0.412 18.91 0.033 100
# ProdComp subj 2 21 23 87 9 Angeles 6 231 -0.521 -20.97 -0.116 100
# ProdComp subj 2 21 23 87 10 an 7 215 -0.332 -27.12 -0.028 100
# ProdComp subj 2 21 23 87 11 hour 7 223 -0.314 -23.06 -0.019 100
# ProdComp subj 2 21 23 87 12 ago 7 175 -0.422 -69.09 -0.127 100
# ProdComp subj 2 21 23 87 13 shared 8 215 -0.455 -35.00 -0.166 100
# ProdComp subj 2 21 23 87 14 the 9 223 -0.397 -21.09 -0.190 100
# ProdComp subj 2 21 23 87 15 story 10 207 -0.663 -41.03 -0.010 100
# ProdComp subj 2 21 23 87 16 about 5 181 -1.004 -67.03 -0.195 100
# ProdComp subj 2 21 23 87 17 the 5 215 -0.778 -29.09 0.085 100
# ProdComp subj 2 21 23 87 18 job 5 190 -0.944 -54.09 -0.100 100
# ProdComp subj 2 21 23 87 19 and 11 203 -0.835 -41.09 -0.008 100
# ProdComp subj 2 21 23 87 20 felt 11 222 -0.713 -24.06 0.102 100
# ProdComp subj 2 21 23 87 21 relieved 11 190 -0.919 -63.94 -0.156 100
# ProdComp subj 2 21 23 87 22 immediately. 11 293 -0.256 31.18 0.459 100
# ...
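
# optional sanity check before trusting the column names
#######################################################################
# A minimal sketch, not part of the script proper: col.names only makes
# sense if every line of the file has as many fields as there are names.
# Note that the example rows above contain 14 space-separated values,
# while 13 column names are supplied, so this is worth checking on your
# own file. count.fields() is base R (utils); the helper name
# fields.per.line is made up for this illustration.
fields.per.line <- function(path, sep = " ") {
  # quote = "" and comment.char = "" so that apostrophes or '#' inside
  # words are not treated specially
  table(count.fields(path, sep = sep, quote = "", comment.char = ""))
}
# e.g. fields.per.line("C:\\Data\\readingtime-linger.rtm")
# ideally this shows a single field count that matches length(col.names)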


# look at dataset and understand its structure
#######################################################################
str(d)
summary(d)
nrow(d)

# exclude practice trials (reading times on them probably mostly reflect
# the training itself)
#######################################################################
# grepl() instead of -grep(): d[-grep(...), ] would drop ALL rows if no
# row matched
d <- d[!grepl("^practice", d$EXPT), ]
nrow(d)

# define some variables
#######################################################################
# log-transformed reading times (because those are usually more normally
# distributed, but check for yourself)
d$logRT <- log(d$RTraw)
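
# A minimal sketch for the "check for yourself" part (not part of the
# script proper; the helper name rt.skewness is made up here). It
# computes the sample skewness of the raw and of the log-transformed RTs;
# values closer to 0 indicate a more symmetric distribution.
rt.skewness <- function(rt) {
  skew <- function(x) {
    x <- x[is.finite(x)]            # drop NA, NaN and log(0) = -Inf
    mean((x - mean(x))^3) / sd(x)^3
  }
  c(raw = skew(rt), log = skew(log(rt)))
}
# e.g. rt.skewness(d$RTraw)
# for a visual check: qqnorm(log(d$RTraw)); qqline(log(d$RTraw))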

# word length in graphemes (letters)
d$Wlen <- nchar(as.character(d$WORD), type="chars")
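
# Optional variant (a sketch under an assumption, not part of the script
# proper): sentence-final words such as "bathtub." carry punctuation,
# which nchar() counts as a character. If you want letters only, strip
# punctuation first. Wlen.nopunct is a hypothetical helper name.
Wlen.nopunct <- function(word) {
  nchar(gsub("[[:punct:]]", "", as.character(word)), type="chars")
}
# e.g. d$Wlen <- Wlen.nopunct(d$WORD)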

# removing corrupted items (e.g. because they are missing conditions,
# had unintended ambiguities, etc.)
#######################################################################
d <- subset(d, !(EXPT == "ProdComp" & (ITEM %in% c(6,17,24))))

# exclude readers that are too slow (more than a few SDs away from the
# mean of readers' average log RTs)
#######################################################################
# Explore for each experiment whether there were subjects that were a lot
# slower on those stimuli than other subjects (outliers are almost always
# SLOW outliers).
# Use the same DV that you will also use for the main analysis.
# absolute z-scores of the by-subject mean log RTs, computed once and
# compared against three cutoffs (TRUE = within the cutoff)
subj.z <- abs(scale(unlist(lapply(split(d$logRT,
                        as.factor(as.character(d$SUBJ))), mean))))
subj.z < 3
subj.z < 2.5
subj.z < 2
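
# A minimal sketch (not part of the script proper) that returns the IDs
# of the flagged subjects instead of a TRUE/FALSE table. Unlike the
# abs() version above it is one-sided, i.e. it only flags SLOW readers;
# that choice, the helper name flag.slow.subjects and the default cutoff
# of 2.5 are assumptions made for this illustration.
flag.slow.subjects <- function(rt, subj, cutoff = 2.5) {
  subj.means <- tapply(rt, as.character(subj), mean)
  z <- (subj.means - mean(subj.means)) / sd(subj.means)
  names(z)[which(z > cutoff)]       # which() skips NA z-scores
}
# e.g. flag.slow.subjects(log(d$RTraw), d$SUBJ, cutoff = 2.5)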

# exclude outlying subjects identified above (here: subject 5)
d <- subset(d, SUBJ != "5")

# now you can apply the spill-over analysis