Jerry Brunner and I developed two methods to estimate replicability of published results based on test statistics in original studies. One method, z-curve, is used to provide replicabiltiy estimates in my powergraphs.
In September, we submitted a manuscript that describes these methods to Psychological Methods, where it was rejected.
We now revised the manuscript. The new manuscript contains a detailed discussion of various criteria for replicability with arguments why a significant result in an exact replication study is an important, if not the only, criterion to evaluate the outcome of replication studies.
It also makes a clear distinction between selection for significance in an original study and the file drawer problem in a series of conceptual or exact replication studies. Our methods only assumes selection for significance in original studies, but no file drawer or questionable research practices. This idealistic assumption may explain why our model predicts a much higher success rate in the OSC reproducibility project (66%) than was actually obtained (36%). As there is ample evidence for file-drawers with non-significant conceptual replication studies, we believe that file-drawers and QRP contribute to the low success rate in the OSC project. However, we also mention concerns about the quality of some replication studies.
We hope that the revised version is clearer, but fundamentally nothing has changed. Reviewers at Psychological Methods didn’t like our paper, the editor thought NHST is no longer relevant (see editorial letter and reviews), but nobody challenged our statistical method or the results of our simulation studies that validate the method. It works and it provides an estimate of replicability under very idealistic conditions, which means we can only expect a considerably lower success rate in actual replication studies as long as researchers file-drawer non-significant results.