In 2015, Science published the results of the first empirical attempt to estimate the reproducibility of psychology. One key finding was that, out of 97 attempts to replicate a significant result, only 36% succeeded.
This finding fueled debates about a replication crisis in psychology. However, there have been few detailed examinations of individual studies to understand why a particular result did or did not replicate. The main reason is probably that conducting detailed examinations of every study is a daunting task. Another reason is that replication failures can be caused by several factors, and each study may have a different explanation. This makes it important to take an idiographic (study-centered) perspective.
The conclusions of these idiographic reviews will be used for a nomothetic research project that aims to predict actual replication outcomes from the statistical results reported in the original articles. These predictions will only be accurate if the replication studies were close replications of the original studies; otherwise, differences between the original study and the replication study may explain why replication studies failed.
The senior author of Article 140 was John A. Bargh. Other studies by Bargh have failed to replicate, and questions about his research were raised in an open letter by Nobel Laureate Daniel Kahneman, who prominently featured Bargh’s work in his popular book “Thinking, Fast and Slow.” Bargh’s response to this criticism can be described as stoic defiance. In 2017, he published a book on his work that did not mention replication failures or concerns about the replicability of social priming research in general. A quantitative review of the book showed that much of the cited evidence was weak.
Summary of Original Article
The article “Keeping One’s Distance: The Influence of Spatial Distance Cues on Affect and Evaluation” by Lawrence E. Williams and John A. Bargh was published in Psychological Science. The article has been cited 155 times overall and 17 times in 2017.
The main hypothesis is that physical distance influences social judgments without reference to the self. To provide evidence for this hypothesis, the authors reported four priming studies. Physical distance was primed by plotting points in a way that suggested physical closeness or distance. Mean differences indicated greater enjoyment of media depicting embarrassment, less emotional distress from violent media, lower estimates of the number of calories in unhealthy food, and weaker reports of emotional attachment to family members and hometowns in the distant prime condition than in the close prime condition.
In Study 1, 73 undergraduate students were assigned to three priming conditions (n = 24 per cell; 41 female, 32 male).
After the priming manipulation, participants were given an excerpt from the book “Good in Bed” and rated their enjoyment of an article titled “Loving a larger woman.”
An ANOVA showed a significant difference between the three groups, F(2, 67) = 3.14, p = .0497 (reported as p-rep = .88). The authors wisely did not conduct post-hoc comparisons of the close or distant condition with the control condition. Instead, they reported a significant linear trend, t(67) = 2.41, p = .019.
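The reported p-rep value can be checked against Killeen’s (2005) approximation, which Psychological Science required authors to report at the time. A minimal sketch, assuming the standard one-line approximation and the p-value reported above:

```python
# Killeen's (2005) p-rep approximation: the estimated probability of
# replicating the direction of an effect, computed from the p-value.
# Sketch only; assumes the conventional formula used around 2008.
p = .0497                                   # reported ANOVA p-value
p_rep = 1 / (1 + (p / (1 - p)) ** (2 / 3))  # ~.88
print(round(p_rep, 2))  # 0.88, matching the reported p-rep
```

The match confirms that p-rep carries no information beyond the p-value itself, which is one reason the statistic was later abandoned.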
Although men and women may respond differently to reading an article about making love to a larger woman, gender differences were not examined or reported.
Study 2 reduced the sample size from 73 participants to 42 participants (n = 14 per cell).
The dependent variable was liking of a violent and shocking story.
Despite the loss in statistical power, the ANOVA result was significant, F(2, 39) = 4.37, p = .019. The linear contrast was also significant, t(39) = 2.62, p = .012.
59 community members participated in Study 3.
This time, participants rated the caloric content (but not liking?) of healthy and unhealthy foods.
The mixed-model ANOVA showed a significant interaction, F(2, 56) = 3.36, p = .042. The interaction was due to significant differences between the priming conditions for unhealthy foods, F(2, 56) = 5.62, p = .006; there were no significant differences for healthy foods. No linear contrasts were reported. The figure shows that the significant effect was mainly due to a lower calorie estimate in the distant priming condition.
84 students participated in Study 4.
The dependent variable was the average of closeness ratings to siblings, parents, and hometown.
The ANOVA result was significant, F(2,81) = 4.97, p = .009. The linear contrast also showed a significant trend for lower closeness ratings after distance priming, t(81) = 2.86, p = .005.
All four studies showed significant results, with p-values of .019, .012, .006, and .005. These p-values correspond to z-scores of z = 2.35, 2.51, 2.75, and 2.81. Median observed power is 75%, while the success rate is 100%. The Replicability Index corrects for inflation by subtracting the difference between the success rate and median observed power from median observed power. An R-Index of 50% (75% − 25% inflation) is not terrible, but also not very reassuring. In the grading scheme of the R-Index, this is a D− (in comparison, many chapters in Bargh’s book have an index below 50% and an F).
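The computation above can be reproduced with the Python standard library. A sketch, assuming two-tailed p-values and alpha = .05 (the R-Index itself is Schimmack’s statistic):

```python
from statistics import NormalDist, median

nd = NormalDist()
p_values = [.019, .012, .006, .005]  # one p-value per study

# Convert two-tailed p-values to z-scores.
z_scores = [nd.inv_cdf(1 - p / 2) for p in p_values]

# Observed power: probability of reaching significance (alpha = .05,
# two-tailed) if the observed z-score were the true effect.
crit = nd.inv_cdf(0.975)  # 1.96
powers = [nd.cdf(z - crit) for z in z_scores]

median_power = median(powers)         # ~.75
success_rate = 1.0                    # 4 out of 4 studies significant
inflation = success_rate - median_power
r_index = median_power - inflation    # ~.50 (as in the text, rounded)
print(round(median_power, 2), round(r_index, 2))  # 0.75 0.49
```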
More concerning is that the variance of the four z-scores is only var(z) = 0.046, when the sampling error of independent studies implies a variance of 1. The Test of Insufficient Variance (TIVA) shows that a variance this small or smaller would occur in four independent studies with a probability of only p = .013. This suggests that non-random factors contributed to the observed results.
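TIVA compares the sample variance of the z-scores to the variance of 1 expected for independent studies: under that null hypothesis, (k − 1) × var(z) follows a chi-square distribution with k − 1 degrees of freedom, and the left tail gives the probability of such insufficient variance. A stdlib-only sketch (the chi-square CDF for 3 df is written out in closed form to avoid a SciPy dependency):

```python
import math
from statistics import NormalDist, variance

nd = NormalDist()
p_values = [.019, .012, .006, .005]
z_scores = [nd.inv_cdf(1 - p / 2) for p in p_values]  # two-tailed p -> z

k = len(z_scores)
var_z = variance(z_scores)  # sample variance of the z-scores, ~0.046

# Under H0 (independent studies, true variance = 1),
# (k - 1) * var_z ~ chi-square with k - 1 = 3 degrees of freedom.
stat = (k - 1) * var_z

# Left-tail chi-square CDF for 3 df (closed form):
# P(X <= x) = 2*Phi(sqrt(x)) - 1 - sqrt(2x/pi) * exp(-x/2)
tiva_p = (2 * nd.cdf(math.sqrt(stat)) - 1
          - math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
print(round(var_z, 3), round(tiva_p, 3))  # 0.046 0.013
```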
These results suggest that it would be difficult to replicate the reported results because the reported effect sizes may be inflated by the use of questionable research practices (QRPs).
Consistent with the OSC project guidelines, the authors replicated the last study in the article.
The sample size was moderately larger than in the original study (N = 133 vs. 84).
The simple procedure was reproduced exactly.
The ANOVA result was not significant, F(2,122) = 0.24, p = .79. The linear contrast was also not significant, t(122) = -0.59, p = .56.
The pattern of means showed a slightly higher closeness (to family) rating after priming closeness (M = 5.44, SD = 0.83) than in the control condition (M = 5.31, SD = 1.07). The mean for the distance priming condition was identical to the control condition (M = 5.31, SD = 1.15).
In conclusion, the replication study failed to replicate the original finding. Given the simplicity of the study, there are no obvious differences between the studies that could explain the replication failure. The most plausible explanation is that the original article reported inflated effect sizes.
Other Replication Failures
A previous replication attempt failed to replicate the results of Studies 3 and 4 (Pashler, Coburn, & Harris, 2012).
The replication of Study 3 had 92 participants (vs. 59 in the original study). The study failed to replicate the interaction between distance priming and healthiness of food, F(2,85) = 0.45, p = .64. The replication of Study 4 also had 92 participants, although some were excluded because they did not have siblings or parents or could not perform the plotting task. The final sample size was N = 71 (vs. 84 in the original study). It also failed to replicate the original result, F(2,68) = 0.31.
Pashler et al. (2012) found no plausible explanation for their replication failure, but they did not consider QRPs. In contrast, the bias analyses presented here suggest that QRPs were used to report only supportive evidence for distance priming. If QRPs were used, it is not surprising that unbiased replication attempts fail to reproduce the inflated original results.
Williams and Bargh (2008) proposed that a simple geometric task could prime distance or closeness and alter judgments of liking (close = like). Although they presented four studies with significant results, the evidence is not conclusive because bias tests suggest that the results are too good to be true. This impression was confirmed by the failure of a replication study that, thanks to a larger sample, had slightly more power to detect the predicted effect.
Although the replication failure was reported in 2015, the original article continues to be cited as if no replication failure occurred or the replication failure can be dismissed. A careful bias analysis suggests that the original results do not provide credible evidence for distance priming and the article should not be cited as evidence for it. Unless future studies with larger samples provide credible evidence for the effect, it remains doubtful that a simple geometric drawing task can alter evaluations of closeness to family members.
This replication failure has to be considered in the context of other replication failures of studies by John Bargh and of social priming studies in general. As Daniel Kahneman noted in his 2012 letter to Bargh:
“As all of you know, of course, questions have been raised about the robustness of priming results…. your field is now the poster child for doubts about the integrity of psychological research… people have now attached a question mark to the field, and it is your responsibility to remove it.”
Bargh’s response to this letter may be described as willful attentional blindness. Not addressing concerns about replicability may be one way to maintain confidence in oneself in the face of criticism and doubt, but it is not good science. Experimental social psychologists who still believe in social priming effects like distance priming need to demonstrate replicability with high-powered studies. So far, the results have not been encouraging (see the failure to replicate professor priming).
P.S. If you liked this blog post, the reason is that I primed you with a closeness prime (the featured image). In reality, this blog post is terrible. If you didn’t like the blog post, the priming manipulation didn’t work (ironic, isn’t it?).