Predictions about Replication Success in OSF-Reproducibility Project

Ulrich Schimmack and Andrew Laughton (2015). Predicting Successful Replications in the OSF-Reproducibility Project.


This project used post-hoc-power analysis to predict the success rate in the OSF-Reproducibility Project. Various estimation methods predict a success rate around 50% in the Reproducibility Project. However, the success rate is predicted to vary by journal and content area. The replication rate for JEP:LMC and cognitive articles in Psychological Science is predicted to be much higher than the replication rate for articles published in JPSP and for social articles published in Psychological Science.  A table with predictions for 99 individual replication studies is provided.

The Open-Science-Framework (OSF) is run by the Center for Open Science. A few years ago, OSF organized the Reproducibility Project. The aim of the Reproducibility Project was to estimate the reproducibility of psychological science.

In order to use the results of the Reproducibility Project as information about the reproducibility of psychological studies in general, it is important to consider the sampling of studies that were included in the Reproducibility Project.

The project organizers decided to sample from three psychological journals: Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. Only articles published at the beginning of 2008 were available for replication. Articles often contained more than one study, and each study reported several tests of statistical significance. If an article contained multiple studies, the last study was used for replication. A total of 100 studies were reproduced.

The most important result of the Reproducibility Project is the rate of successful replications. As there are various definitions of success, it is important to define clearly what counts as a successful replication. We define a replication study as successful if an exact replication of the original study replicates the result of the original study. This means that if the original study produced a significant result that led to the conclusion that an effect is present (rejecting H0), the replication study also produced a significant result. It is important to realize that a non-significant result does not invalidate the conclusion of the original study because a non-significant result can be a type-II error (failing to reject the null-hypothesis when the null-hypothesis is false).

The definition of replication success as a significant result makes it possible to predict the outcome of the Reproducibility Project by using post-hoc-power analysis. Power is defined as the long-run frequency of significant results in a series of exact replication studies. For example, if power were 50%, the expected rate of successful replications in the Reproducibility Project would be 50 out of 100 studies. Cohen (1962) estimated that psychological studies have 60% power. Thus, we would expect 60 out of 100 studies in the Reproducibility Project to produce significant results. There are several caveats to using Cohen’s (1962) estimate of power in psychological research. First, power may have changed over time. Second, Cohen’s (1962) estimate was based on the reported effect sizes in journals. However, these effect sizes are inflated by publication bias and the use of questionable research practices (Schimmack, 2012; Sterling, 1959; Sterling et al., 1995). Third, power may vary across research areas. To address these concerns, Schimmack and Bruner have introduced Post-Hoc-Power curves as a method to estimate replicability for sets of studies with heterogeneity in power and publication bias.
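The link between power and the expected number of successful replications can be made concrete with a small sketch. It computes naive observed power from a z-score under a two-sided test with alpha = .05, treating the observed z-score as the true noncentrality. This is only an illustration of the definition of power; it is not the Post-Hoc-Power curve method, which is designed to handle publication bias and heterogeneity. The function names `power_from_z` and `expected_successes` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_from_z(z, alpha=0.05):
    """Two-sided power of an exact replication, ASSUMING the true
    noncentrality equals the observed z-score (naive observed power)."""
    crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    # A replication z-statistic is modelled as N(z, 1); power is the
    # probability that it falls beyond either critical value.
    upper_tail = 1 - _STD_NORMAL.cdf(crit - z)
    lower_tail = _STD_NORMAL.cdf(-crit - z)
    return upper_tail + lower_tail


def expected_successes(powers):
    """Expected number of significant replications: the sum of the
    individual power values (each study is one Bernoulli trial)."""
    return sum(powers)


if __name__ == "__main__":
    # A just-significant result (z = 1.96) implies roughly 50% observed power.
    print(round(power_from_z(1.96), 2))
    # 100 studies with 50% power each: 50 expected successes.
    print(expected_successes([0.5] * 100))
```

Under these assumptions, a result that is just significant has only about a coin-flip chance of being significant again, which is why inflated effect sizes matter so much for the predicted success rate.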

The method was first used to examine the predicted success rate of replication studies in the three journals selected for the Reproducibility Project. The analysis showed clear differences in replication rates.

Journal of Experimental Psychology: Learning, Memory, and Cognition (JEP:LMC) had the highest replicability score (71%). A time-trend analysis showed that the year 2008 was close to the historic average of JEP:LMC.

Psychological Science had the second highest score with 60%. Again, the year 2008 was close to the historic average of this journal.

The Journal of Personality and Social Psychology has three independent sections. The majority of studies selected for the Reproducibility Project were from the Attitudes and Social Cognition and the Interpersonal Relationships and Group Processes sections. The replicability scores for these two sections are similar, at 54% and 51%, respectively. The replicability scores in 2008 were below the historic average and averaged 48% for the two sections.


To get more specific predictions, we obtained a list of the OSF-Reproducibility Project studies. Our predictions are based on 99 articles that reported a (marginally) significant effect and were the target of a replication attempt (see the list of DOIs below). For each article, we converted the reported test statistic into a z-score. Power was determined using the homogeneous model for z-scores in the range from 2 to 4. Effects with a z-score greater than 4 were expected to have a 100% success rate (100% power). In addition, we estimated power based on all t- and F-values that were reported in the article using an automated text analysis. Test statistics were again converted into z-scores and analyzed to estimate power using the homogeneous effect size model for z-scores between 2 and 4. The success rate was estimated independently for both methods. In addition, a prediction for individual effects was made using information from both analyses. The two methods were considered to predict a successful replication when the power estimate was greater than 50% and the individual effect had a z-score greater than 3. When power was estimated below 50% and the z-score for the critical test was below 3, the replication attempt was predicted to fail. When the two tests produced inconsistent information, a more detailed analysis of the article was used to make predictions.
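The decision rule described above can be sketched in a few lines. The sketch makes several assumptions that are not in the text: test statistics are converted to z-scores through their two-sided p-value, a missing article-level power estimate (NA) is sent to manual review, and boundary cases (power exactly 50%, z exactly 3) are also sent to review. The names `z_from_two_sided_p` and `predict_replication` and the label "REVIEW" are hypothetical. The homogeneous power model itself is not implemented here.

```python
from statistics import NormalDist


def z_from_two_sided_p(p):
    """Absolute z-score whose two-sided p-value equals p (0 < p < 1).

    ASSUMPTION: conversion goes through the p-value of the reported test.
    """
    return NormalDist().inv_cdf(1 - p / 2)


def predict_replication(power_pct, z):
    """Return 'YES', 'NO', or 'REVIEW' for one original study.

    power_pct: article-level power estimate in percent, or None if unavailable.
    z: z-score of the critical test in the original study.
    """
    if z > 4:
        return "YES"  # z-scores above 4 are treated as 100% power
    if power_pct is None:
        return "REVIEW"  # ASSUMPTION: no estimate means manual inspection
    if power_pct > 50 and z > 3:
        return "YES"  # both sources of information agree on high power
    if power_pct < 50 and z < 3:
        return "NO"  # both sources of information agree on low power
    return "REVIEW"  # inconsistent information: inspect the article


if __name__ == "__main__":
    print(round(z_from_two_sided_p(0.05), 2))  # critical value for p = .05
    print(predict_replication(97, 3.98))  # consistent high power
    print(predict_replication(28, 2.68))  # consistent low power
    print(predict_replication(35, 3.19))  # inconsistent (low/high)
```

The "REVIEW" outcome corresponds to the inconsistent cases, for which the final YES or NO prediction came from a more detailed reading of the article rather than from a mechanical rule.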

For the 99 statistical tests in our analysis, we predicted that 52 would replicate.

The automated method, based on all t- and F-tests reported in an article, yielded an average power estimate of 52% for the 88 articles that contained useful information.

The PHP-curve based on the actual statistical tests is shown in the figure below. Based on the distribution of z-scores between 1.96 and 4, power for the 67 tests with z-scores in this range is estimated to be 45%. This means that 41 of these effects are expected to produce a significant result in exact replication studies with the same sample size. An additional 9 studies with z-scores greater than 4 are expected to replicate with 100% probability. This gives a total replication rate of 50/99, or about 50%.


Table of Predictions for Individual Studies

Article (DOI) Predicted Success (YES/NO) Power Estimate (%) z-score
Z-Score > 4
DOI: 10.1037/0278-7393.34.1.204 YES 98 > 4
DOI: 10.1037/0278-7393.34.2.257 YES 70 > 4
DOI: 10.1037/0278-7393.34.2.430 YES 65 > 4
DOI: 10.1037/0278-7393.34.3.546 YES 21 > 4
DOI: 10.1037/0278-7393.34.2.343 YES 18 > 4
DOI: 10.1037/0278-7393.34.1.219 YES NA > 4
Consistent High Power
DOI: 10.1037/0278-7393.34.3.524 YES 97 3.98
DOI: 10.1037/0278-7393.34.1.128 YES 97 3.00
DOI: 10.1037/0278-7393.34.1.167 YES 92 3.10
DOI: 10.1037/0278-7393.34.3.439 YES 75 3.80
DOI: 10.1037/0278-7393.34.1.97 YES 78 3.14
DOI: 10.1037/0278-7393.34.1.80 YES 71 3.73
DOI: 10.1037/0278-7393.34.3.460 YES 68 3.72
DOI: 10.1037/0278-7393.34.2.302 YES 61 3.06
Consistent Low Power
DOI: 10.1037/0278-7393.34.2.399 NO 28 2.68
DOI: 10.1037/0278-7393.34.3.478 NO 23 2.16
DOI: 10.1037/0278-7393.34.3.514 NO 17 1.91
DOI: 10.1037/0278-7393.34.1.249 NO 10 1.97
Inconsistent (Low/High)
DOI: 10.1037/0278-7393.34.1.230 YES 35 3.19
DOI: 10.1037/0278-7393.34.1.65 YES 30 3.51
DOI: 10.1037/0278-7393.34.1.146 YES NA 3.25
DOI: 10.1037/0278-7393.34.1.186 YES 82 2.92
DOI: 10.1037/0278-7393.34.2.353 NO 58 2.34
DOI: 10.1037/0278-7393.34.1.243 NO 91 1.97
DOI: 10.1037/0022-3514.94.4.718 NO 81 2.17
DOI: 10.1037/0278-7393.34.2.408 NO 80 2.19
DOI: 10.1037/0278-7393.34.2.369 NO 54 1.95
Z-Score > 4
doi:10.1111/j.1467-9280.2008.02046.x YES 92 > 4
doi:10.1111/j.1467-9280.2008.02051.x YES 84 > 4
doi:10.1111/j.1467-9280.2008.02080.x YES 66 > 4
doi:10.1111/j.1467-9280.2008.02136.x YES 54 > 4
doi:10.1111/j.1467-9280.2008.02041.x YES 60 > 4
Inconsistent (high/low)
doi:10.1111/j.1467-9280.2008.02049.x YES 97 2.97
doi:10.1111/j.1467-9280.2008.02042.x NO 93 2.38
doi:10.1111/j.1467-9280.2008.02098.x NO 85 2.20
doi:10.1111/j.1467-9280.2008.02070.x YES 64 2.73
doi:10.1111/j.1467-9280.2008.02064.x NO 62 2.23
doi:10.1111/j.1467-9280.2008.02050.x YES 51 2.78
Inconsistent (low/high)
doi:10.1111/j.1467-9280.2008.02044.x YES 36 3.04
Z-Score > 4
DOI: 10.1037/0022-3514.94.4.579 YES 74 > 4
DOI: 10.1037/0022-3514.94.4.647 YES NA > 4
DOI: 10.1037/0022-3514.95.2.420 YES 62 > 4
DOI: 10.1037/0022-3514.94.3.516 YES 87 > 4
DOI: 10.1037/0022-3514.94.4.718 YES 30 > 4
DOI: 10.1037/0022-3514.94.4.672 YES 82 > 4
Consistent High Power
DOI: 10.1037/0022-3514.94.4.600 YES 70 3.91
DOI: 10.1037/0022-3514. YES 62 3.36
DOI: 10.1037/0022-3514.94.4.615 YES 57 3.35
Consistent Low Power
DOI: 10.1037/0022-3514.94.1.158 NO 42 2.31
DOI: 10.1037/0022-3514.94.1.91 YES 34 2.93
DOI: 10.1037/0022-3514.94.3.429 YES 31 2.96
DOI: 10.1037/0022-3514.94.3.429 YES 31 2.96
DOI: 10.1037/a0012518 NO 28 2.00
DOI: 10.1037/0022-3514.94.4.560 NO 22 2.38
DOI: 10.1037/0022-3514.94.1.1 NO 22 1.97
DOI: 10.1037/0022-3514.94.1.48 NO 14 2.30
DOI: 10.1037/0022-3514.94.4.696 NO 13 2.18
DOI: 10.1037/a0012833 NO 11 2.42
DOI: 10.1037/0022-3514.94.3.396 NO 3 2.01
DOI: 10.1037/0022-3514.94.3.495 NO NA 1.91
Inconsistent (high/low)
DOI: 10.1037/0022-3514.94.1.116 YES 92 2.59
DOI: 10.1037/0022-3514.94.3.412 NO 92 2.05
DOI: 10.1037/0022-3514.95.2.293 NO 70 2.01
DOI: 10.1037/0022-3514.94.1.60 NO 68 2.16
DOI: 10.1037/0022-3514. NO 67 2.02
DOI: 10.1037/0022-3514.94.3.382 NO 59 2.31
DOI: 10.1037/0022-3514.94.1.16 YES 56 2.97
DOI: 10.1037/0022-3514.94.3.479 NO 54 2.35
DOI: 10.1037/0278-7393.34.1.237 YES 53 2.70
DOI: 10.1037/0022-3514.94.5.871 NO 51 1.96
Inconsistent (low/high)
DOI: 10.1037/0022-3514.95.2.274 YES 43 3.5
DOI: 10.1037/2333-8113.1.S.73 YES 26 3.2
DOI: 10.1037/0022-3514.95.1.76 YES 22 3.8
Z-Score > 4
doi:10.1111/j.1467-9280.2008.02100.x YES 97 > 4
doi:10.1111/j.1467-9280.2008.02092.x YES 88 > 4
doi:10.1111/j.1467-9280.2008.02088.x YES 61 > 4
doi:10.1111/j.1467-9280.2008.02089.x YES 30 > 4
Consistent High Power
Consistent Low Power
doi:10.1111/j.1467-9280.2008.02227.x NO 41 1.85
doi:10.1111/j.1467-9280.2008.02062.x NO 13 1.96
doi:10.1111/j.1467-9280.2008.02084.x NO 23 2.6
doi:10.1111/j.1467-9280.2008.02056.x NO 19 2.88
doi:10.1111/j.1467-9280.2008.02083.x NO 16 2.89
doi:10.1111/j.1467-9280.2008.02090.x NO 9 2.24
doi:10.1111/j.1467-9280.2008.02060.x NO 7 2.65
doi:10.1111/j.1467-9280.2008.02078.x NO 3 2.04
doi:10.1111/j.1467-9280.2008.02040.x NO 3 2.31
doi:10.1111/j.1467-9280.2008.02099.x NO NA 2.52
doi:10.1111/j.1467-9280.2008.02052.x NO NA 2.47
doi:10.1111/j.1467-9280.2008.02095.x NO NA 2.27
doi:10.1111/j.1467-9280.2008.02093.x NO NA 2.26
doi:10.1111/j.1467-9280.2008.02093.x NO NA 2.26
doi:10.1111/j.1467-9280.2008.02039.x NO NA 2.09
doi:10.1111/j.1467-9280.2008.02039.x NO NA 2.09
Inconsistent (high/low)
doi:10.1111/j.1467-9280.2008.02061.x YES 90 2.58
doi:10.1111/j.1467-9280.2008.02045.x YES 84 2.8
doi:10.1111/j.1467-9280.2008.02057.x NO 83 1.99
doi:10.1111/j.1467-9280.2008.02053.x YES 70 2.76
doi:10.1111/j.1467-9280.2008.02082.x NO 70 1.99
doi:10.1111/j.1467-9280.2008.02077.x NO 55 2.27
doi:10.1111/j.1467-9280.2008.02072.x YES NA 2.98
