Amna Shakil and Ulrich Schimmack
In 2008, Turner and colleagues (2008) examined the presence of publication bias in clinical trials of antidepressants. They found that out of 74 FDA-registered studies, 51% showed positive results. However, positive results were much more likely to be published, as 94% of the published results were positive. There were two reasons for the inflated percentage of positive results. First, negative results were not published. Second, negative results were published as positive results. Turner and colleagues’ (2008) results received a lot of attention and cast doubt on the effectiveness of anti-depressants.
A year after Turner and colleagues (2008) published their study, Moreno, Sutton, Turner, Abrams, Cooper and Palmer (2009) examined the influence of publication bias on the effect-size estimate in clinical trials of antidepressants. They found no evidence of publication bias in the FDA-registered trials, leading the researchers to conclude that the FDA data provide an unbiased gold standard to examine biases in the published literature.
The effect size for treatment with anti-depressants in the FDA data was g = 0.31, 95% confidence interval 0.27 to 0.35. In contrast, the uncorrected average effect size in the published studies was g = 0.41, 95% confidence interval 0.37 to 0.45. This finding shows that publication bias inflates effect size estimates by 32% ((0.41 – 0.31)/0.31).
Moreno et al. (2009) also used regression analysis to obtain a corrected effect size estimate based on the biased effect sizes in the published literature. In this method, effect sizes are regressed on sampling error under the assumption that studies with smaller samples (and larger sampling error) have more bias. The intercept is used as an estimate of the population effect size when sampling error is zero. This correction method yielded an effect size estimate of g = 0.29, 95% confidence interval 0.23 to 0.35, which is similar to the gold standard estimate (.31).
The main limitation of the regression method is that other factors can produce a correlation between sample size and effect size (e.g., higher quality studies are more costly and use smaller samples). To avoid this problem, we used an alternative correction method that does not make this assumption.
The method uses the R-Index to examine bias in a published data set. The R-Index increases as statistical power increases and it decreases when publication bias is present. To obtain an unbiased effect size estimate, studies are selected to maximize the R-Index.
Since the actual data files were not available, graphs A and B from Moreno et al.’s (2009) study were used to obtain information about effect size and sample error of all the FDA-registered and the published journal articles.
The FDA-registered studies had the success rate of 53% and the observed power of 56%, resulting in an inflation of close to 0. The close match between the success rate and observed confirms FDA studies are not biased. Given the lack of bias (inflation), the most accurate estimate of the effect size is obtained by using all studies.
The published journal articles had a success rate of 86% and the observed power of 73%, resulting in the inflation rate of 12%. The inflation rate of 12% confirms that the published data set is biased. The R-Index subtracts the inflation rate from observed power to correct for inflation. Thus, the R-Index for the published studies is 73-12 = 61. The weighted effect size estimate was d = .40.
The next step was to select sets of studies to maximize the R-Index. As most studies were significant, the success rate could not change much. As a result, most of the increase would be achieved by selecting studies with higher sample sizes in order to increase power. The maximum R-Index was obtained for a cut-off point of N = 225. This left 14 studies with a total sample size of 4,170 participants. The success rate was 100% with median observed power of 85%. The Inflation was still 15%, but the R-Index was higher than it was for the full set of studies (70 vs. 61). The weighted average effect size in the selected set of powerful studies was d = .34. This result is very similar to the gold standard in the FDA data. The small discrepancy can be attributed to the fact that even studies with 85% power still have a small bias in the estimation of the true effect size.
In conclusion, our alternative effect size estimation procedure confirms Moreno et al.’s (2009) results using an alternative bias-correction method and shows that the R-Index can be a valuable tool to detect and correct for publication bias in other meta-analyses.
These results have important practical implications. The R-Index confirms that published clinical trials are biased and can provide false information about the effectiveness of drugs. It is therefore important to ensure that clinical trials are preregistered and that all results of clinical trials are published. The R-Index can be used to detect violations of these practices that lead to biased evidence. Another important finding is that clinical trials of antidepressants do show effectiveness and that antidepressants can be used as effective treatments of depression. The presence of publication bias should not be used to claim that antidepressants lack effectiveness.
Moreno, S. G., Sutton, A. J., Turner, E. H., Abrams, K. R., Cooper, N. J., Palmer, T. M., & Ades, A. E. (2009). Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. Bmj, 339, b2981.
Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252-260.