### Bayesian replication test

In a very cool paper on different types of replication tests, Verhagen and Wagenmakers proposed their own variant of a Bayesian replication test (see reference below). The test compares the “proponent’s” hypothesis that the new data constitute a replication against the “skeptic’s” hypothesis that the new data constitutes a null effect. The presents simulations comparing the different Bayesian and frequentist measures, and makes a compelling argument for the proposed new measure.

For the case of a t-test, Verhagen and Wagenmakers show how one can approximate the evidential support (Bayes Factor) for the replication hypothesis over the null hypothesis, BF_*r0, *based on just

- the
*t-*statistics of the original and new data - the number of data points in the original
*t*-test and*t*-test over the replication data

It occurred to me that one might be able to use the same measure to assess the replicability of individual effects in linear mixed models (i.e., the evidentiary support for the hypothesis that a specific effect in a multiple linear regression model replicates), in particular in cases where the different predictors in the model are all orthogonal to each other (e.g., in balanced designs). So one purpose of this post is to give everyone something to shout at if they disagree.

The second purpose is to post the code here. **The code is really all from Josine Verhagen’s webpage.** I’m posting it here for convenience. But I just

- wrote a quick and dirty wrapper function that uses the
*lmerTest*and*pbkrtest*packages to provide Satterthwaite (fast and often good for large data sets) or Kenward-Roger (more precise, but a*lot*slower and ever more so, the larger the data) approximations of the degrees of freedom for the*t*-test, which are used to get the ‘number of data points’ that went into the original and replication*t*-test. - update the plotting of the prior and posterior densities to use
*ggplot2*, so that the plot can be more easily modified subsequently.

For documentation and background, please see Verhagen and Wagenmaker’s excellent paper!

whichDFMethod <- function(model) { l = length(fitted(model)) # This threshold was set more or less arbitrarily, based on some # initial testing. You might want to change it. Satterthwaite's # approximation can result in hugely inflated DFs. if (l < 3000) { cat(paste0("Using Kenward-Roger's approximation of DFs for t-test over coefficient because the data set is small: ", l, "\n")) return("Kenward-Roger") } else { return("Satterthwaite") } } # wrapper function that calls ReplicationBayesfactorNCT with t-values and n1, n2 observations # based on the output of mixed model analyses. The test is conducted for the k=th t-test within # the mixed model, specified by the "coef" argument. Assumes that the models have been fit with lmerTest, # so that the approximated DFs can be used as n1 and n2 (number of observations in the original and # replication experiment). For that reason (and only that reason), the function requires lmerTest. replicationBF <- function(model1, model2, yhigh = 0, limits.x = NULL, coef = 1, M = 500000) { require(lmerTest) require(pbkrtest) summary1 = summary(model1, ddf=whichDFMethod(model1)) summary2 = summary(model2, ddf=whichDFMethod(model2)) print(summary1) cat("\n\n") print(summary2) cat("\n\nCalculting replication Bayes Factor:\n") out <- ReplicationBayesfactorNCT( coef(summary1)[coef,"t value"], coef(summary2)[coef,"t value"], coef(summary1)[coef,"df"] + 1, # Since the ReplicationBayesfactorNCT function subtracts 1 for the DFs (for sample == 1) coef(summary2)[coef,"df"] + 1, # Since the ReplicationBayesfactorNCT function subtracts 1 for the DFs (for sample == 1) sample = 1, plot = 1, post = 1, yhigh = yhigh, limits.x = limits.x, M = M ) return (out) } # Replication Bayes Factor taken from Josine Verhagen's webpage # (http://josineverhagen.com/?page_id=76). # See Verhagen and Wagenmakers (2014) for details. # # Minor modifications to the plotting made by fjaeger@ur.rochester.edu ReplicationBayesfactorNCT <- function( tobs, # t value in first experiment trep, # t value in replicated experiment n1, # first experiment: n in group 1 or total n n2, # second experiment: n in group 1 or total n m1 = 1, # first experiment: n in group 2 or total n m2 = 1, # second experiment: n in group 2 or total n sample = 1, # 1 = one sample t-test (or within), 2 = two sample t-test wod = dir, # working directory plot = 0, # 0 = no plot 1 = replication post = 0, # 0 = no posterior, 1 = estimate posterior M = 500000, yhigh = 0, limits.x = NULL ) { require(MCMCpack) require(ggplot2) # added by TFJ ################################################################## #STEP 1: compute the prior for delta based on the first experiment ################################### ############################### D <- tobs if (sample==1) { sqrt.n.orig <- sqrt(n1) df.orig <- n1 -1 } if (sample==2) { sqrt.n.orig <- sqrt( 1/(1/n1+1/m1)) # two sample alternative for sqrt(n) df.orig <- n1 + m1 -2 #degrees of freedom } # To find out quickly the lower and upper bound areas, the .025 and .975 quantiles of tobs at a range of values for D are computed. # To make the algorithm faster, only values in a reasonable area around D are computed. # Determination of area, larger with large D and small N range.D <- 4 + abs(D/(2*sqrt.n.orig)) sequence.D <- seq(D,D + range.D,.01) #make sequence with range # determine which D gives a quantile closest to tobs with an accuracy of .01 options(warn=-1) approximatelow.D <- sequence.D[which( abs(qt(.025,df.orig,sequence.D)-tobs)==min(abs(qt(.025,df.orig,sequence.D)-tobs)) )] options(warn=0) # Then a more accurate interval is computed within this area # Make sequence within .01 from value found before sequenceappr.D <- seq((approximatelow.D-.01),(approximatelow.D+.01),.00001) # determine which D gives a quantile closest to tobs with an accuracy of .00001 low.D <- sequenceappr.D[which( abs(qt(.025,df.orig,sequenceappr.D)-tobs)==min(abs(qt(.025,df.orig,sequenceappr.D)-tobs)) )] # Compute standard deviation for the corresponding normal distribution. sdlow.D <- (D-low.D)/qnorm(.025) # compute prior mean and as for delta prior.mudelta <- D/sqrt.n.orig prior.sdelta <- sdlow.D/sqrt.n.orig ################################################################## #STEP 2: Compute Replication Bayes Factor ################################################################## # For one sample t-test: within (group2==1) or between (group2==vector) if (sample==1) { df.rep <- n2-1 sqrt.n.rep <- sqrt(n2) Likelihood.Y.H0 <- dt(trep,df.rep) sample.prior <- rnorm(M,prior.mudelta,prior.sdelta) options(warn=-1) average.Likelihood.H1.delta <- mean(dt(trep,df.rep,sample.prior*sqrt.n.rep)) options(warn=0) BF <- average.Likelihood.H1.delta/Likelihood.Y.H0 } # For two sample t-test: if (sample==2) { df.rep <- n2 + m2 -2 sqrt.n.rep <- sqrt(1/(1/n2+1/m2)) Likelihood.Y.H0 <- dt(trep,df.rep) sample.prior <- rnorm(M,prior.mudelta,prior.sdelta) options(warn=-1) average.Likelihood.H1.delta <- mean(dt(trep,df.rep,sample.prior*sqrt.n.rep)) options(warn=0) BF <- average.Likelihood.H1.delta/Likelihood.Y.H0 } ################################################################# #STEP 3: Posterior distribution ################################################################# if( post == 1) { options(warn=-1) likelihood <- dt(trep,df.rep,sample.prior*sqrt.n.rep) prior.density <- dnorm(sample.prior,prior.mudelta,prior.sdelta) likelihood.x.prior <- likelihood * prior.density LikelihoodXPrior <- function(x) {dnorm(x,prior.mudelta,prior.sdelta) * dt(trep,df.rep,x*sqrt.n.rep) } fact <- integrate(LikelihoodXPrior,-Inf,Inf) posterior.density <- likelihood.x.prior/fact$value PosteriorDensityFunction <- function(x) {(dnorm(x,prior.mudelta,prior.sdelta) * dt(trep,df.rep,x*sqrt.n.rep))/fact$value } options(warn=0) mean <- prior.mudelta sdh <- prior.mudelta + .5*(max(sample.prior)[1] - prior.mudelta) sdl <- prior.mudelta - .5*(prior.mudelta - min(sample.prior)[1]) dev <- 2 for ( j in 1:10) { rangem <- seq((mean-dev)[1],(mean+dev)[1],dev/10) rangesdh <- seq((sdh-dev)[1],(sdh+dev)[1],dev/10) rangesdl <- seq((sdl-dev)[1],(sdl+dev)[1],dev/10) perc <- matrix(0,length(rangem),3) I<-min(length(rangem), length(rangesdh),length(rangesdl) ) for ( i in 1:I) { options(warn=-1) vpercm <- integrate(PosteriorDensityFunction, -Inf, rangem[i]) perc[i,1]<- vpercm$value vpercsh <- integrate(PosteriorDensityFunction, -Inf, rangesdh[i]) perc[i,2]<- vpercsh$value vpercsl <- integrate(PosteriorDensityFunction, -Inf, rangesdl[i]) perc[i,3]<- vpercsl$value options(warn=0) } mean <- rangem[which(abs(perc[,1]-.5)== min(abs(perc[,1]-.5)))] sdh <- rangesdh[which(abs(perc[,2]-pnorm(1))== min(abs(perc[,2]-pnorm(1))))] sdl <- rangesdl[which(abs(perc[,3]-pnorm(-1))== min(abs(perc[,3]-pnorm(-1))))] dev <- dev/10 } posterior.mean <- mean posterior.sd <- mean(c(abs(sdh- mean),abs(sdl - mean))) } if (post != 1) { posterior.mean <- 0 posterior.sd <- 0 } ###########OUT dat.SD=new.env() dat.SD$BF <- BF dat.SD$prior.mean= round(prior.mudelta,2) dat.SD$prior.sd= round(prior.sdelta,2) dat.SD$post.mean= round(posterior.mean,2) dat.SD$post.sd= round(posterior.sd,2) dat.SD=as.list(dat.SD) ########################################### #PLOT ########################################### if (plot==1) { if (post != 1) { options(warn =-1) rp <- ReplicationPosterior(trep,prior.mudelta,prior.sdelta,n2,m2=1,sample=sample) options(warn =0) posterior.mean <- rp[[1]] posterior.sd <- rp[[2]] } par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, bty = "n", las=1) high <- dnorm(posterior.mean,posterior.mean,posterior.sd) add <- high/5 if(yhigh==0) { yhigh <- high + add } scale <- 3 if (is.null(limits.x)) { min.x <- min((posterior.mean - scale*posterior.sd),(prior.mudelta - scale*prior.sdelta)) max.x <- max((posterior.mean + scale*posterior.sd),(prior.mudelta + scale*prior.sdelta)) } else { min.x <- limits.x[1] max.x <- limits.x[2] } ggplot( data = data.frame(x = 0), mapping = aes( x = x ) ) + scale_x_continuous(expression(Effect ~ size ~ delta), limits = c(min.x, max.x) ) + scale_y_continuous("Density", limits = c(0, yhigh) ) + stat_function( fun = function(x) dnorm(x, prior.mudelta, prior.sdelta), aes(linetype = "Before replication"), size = .75 ) + stat_function( fun = function(x) dnorm(x, posterior.mean, posterior.sd), aes(linetype = "After replication"), size = 1 ) + scale_linetype_manual("", values = c("Before replication" = 2, "After replication" = 1) ) + geom_point( x = 0, y = dnorm(0,prior.mudelta,prior.sdelta), color = "gray", size = 3 ) + geom_point( x = 0, y = dnorm(0,posterior.mean,posterior.sd), color = "gray", size = 3 ) + geom_segment( x = 0, xend = 0, y = dnorm(0,prior.mudelta,prior.sdelta), yend= dnorm(0,posterior.mean,posterior.sd), color = "gray" ) + annotate( geom = "text", x = sum(limits.x) / 2, y = yhigh - 1/10 * yhigh, label = as.expression(bquote(BF[r0] == .(round(BF, digits=2)))) ) + theme( legend.position = "bottom", legend.key.width = unit(1.5, "cm") ) } return(dat.SD) }

Verhagen, J., & Wagenmakers, E. J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143(4), 1457.