Manuscript under review, copyright belongs to Jerry Brunner and Ulrich Schimmack
How replicable is psychology? A comparison of four methods of estimating replicability on the basis of test statistics in original studies
Jerry Brunner and Ulrich Schimmack
University of Toronto Mississauga
In the past five years, the replicability of original findings published in psychology journals has been questioned. We show that replicability can be estimated from the average statistical power of a set of studies. We then present four methods for estimating average power from studies that were selected for significance: p-curve, p-uniform, maximum likelihood, and z-curve. In large-scale simulation studies with both homogeneous and heterogeneous effect sizes, all four methods perform well when effect sizes are homogeneous, but only maximum likelihood and z-curve produce accurate estimates when effect sizes are heterogeneous. Applied to the studies in the Open Science Collaboration's Reproducibility Project, all four methods overestimate replicability; we discuss possible reasons for this discrepancy. Based on the simulation results, we recommend z-curve as a valid method for estimating replicability. We also validate a conservative bootstrap confidence interval that makes it possible to use z-curve with small sets of studies.
Keywords: Power estimation, Post-hoc power analysis, Publication bias, Maximum likelihood, P-curve, P-uniform, Z-curve, Effect size, Replicability, Simulation.
Link to manuscript: http://www.utstat.utoronto.ca/~brunner/zcurve2016/HowReplicable.pdf
Link to website with technical supplement:
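The abstract's central claim, that the replication rate of studies selected for significance equals their average power, can be illustrated with a small simulation. The sketch below is not the authors' method; it is a minimal, stdlib-only illustration under assumed conditions (heterogeneous true noncentrality values drawn uniformly from 0.5 to 3.0, a two-sided test at alpha = .05, and replication success defined as a significant result in the same direction):

```python
import random
from statistics import NormalDist

nd = NormalDist()
random.seed(1)
alpha = 0.05
crit = nd.inv_cdf(1 - alpha / 2)  # two-sided critical z, about 1.96

# Hypothetical population of studies with heterogeneous true
# noncentrality values (effect size divided by standard error).
ncp = [random.uniform(0.5, 3.0) for _ in range(100_000)]

# Original studies: the observed z-score is the true noncentrality plus
# standard normal noise; selection for significance keeps only studies
# whose observed z exceeds the criterion.
selected = [m for m in ncp if m + random.gauss(0, 1) > crit]

# Average power of the selected studies: the probability that a fresh
# z-score from the same study exceeds the criterion (upper tail only,
# matching the success definition used for replications below).
avg_power = sum(1 - nd.cdf(crit - m) for m in selected) / len(selected)

# Exact replications of the selected studies: success means the
# replication z-score also exceeds the criterion.
rep_rate = sum(m + random.gauss(0, 1) > crit for m in selected) / len(selected)

print(round(avg_power, 3), round(rep_rate, 3))  # the two agree closely
```

Because selection operates on the original z-scores and the replication draws are independent, the replication rate converges to the mean power of the selected studies, which is why estimating average power after selection for significance is equivalent to estimating replicability.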