DOI: 10.1037/0022-3514.86.2.345

There are a lot of articles with questionable statistical results, and it seems pointless to single out particular ones. However, once in a while an article catches my attention and I comment on its statistical results. This is one of those articles.

The format of this review highlights why articles like this pass peer review and are cited frequently as if they provided empirical facts. The reason is a phenomenon called “verbal overshadowing.” In work on eyewitness testimony, participants first see a picture of a perpetrator. Before the actual line-up task, they are asked to give a verbal description of the perpetrator. The verbal description can distort the memory of the actual face and lead to a higher rate of misidentifications. Something similar happens when researchers read articles. Sometimes they read only the abstract, but even when they read the whole article, the words can overshadow the actual empirical results. As a result, memory is influenced more strongly by the verbal descriptions than by the cold, hard statistical facts.

In the first part, I will present the results of the article verbally without numbers. In the second part, I will present only the numbers.

Part 1:

In the article “I Like Myself but I Don’t Know Why: Enhancing Implicit Self-Esteem by Subliminal Evaluative Conditioning” Ap Dijksterhuis reports the results of six studies (1-4, 5a, 5b). All studies used a partially or fully subliminal evaluative conditioning task to influence implicit measures of self-esteem. The abstract states: “Participants were repeatedly presented with trials in which the word I was paired with positive trait terms. Relative to control conditions, this procedure enhanced implicit self-esteem.” Study 1 used preferences for initials to measure implicit self-esteem, and “results confirmed the hypothesis that evaluative conditioning enhanced implicit self-esteem.” (p. 348). Study 2 modified the control condition and showed that “participants in the conditioned self-esteem condition showed higher implicit self-esteem after the treatment than before the treatment, relative to control participants” (p. 348). Study 3 changed the evaluative conditioning procedure. Now, both the CS and the US (positive trait terms) were presented subliminally for 17 ms. It also used the Implicit Association Test to measure implicit self-esteem. The results showed that “difference in response latency between blocks was much more pronounced in the conditioned self-esteem condition, indicating higher self-esteem” (p. 349). Study 4 also showed that “participants in the conditioned self-esteem condition exhibited higher implicit self-esteem than participants in the control condition” (p. 350). Studies 5a and 5b showed that “individuals whose self-esteem was enhanced seemed to be insensitive to personality feedback, whereas control participants whose self-esteem was not enhanced did show effects of the intelligence feedback.” (p. 352). The General Discussion section summarizes the results: “In our experiments, implicit self-esteem was enhanced through subliminal evaluative conditioning. Pairing the self-depicting word I with positive trait terms consistently improved implicit self-esteem.” (p. 352). A final conclusion section points out the potential of this work for enhancing self-esteem: “It is worthwhile to explicitly mention an intriguing aspect of the present work. Implicit self-esteem can be enhanced, at least temporarily, subliminally in about 25 seconds.” (p. 353).

Part 2:

Study | Statistic | p | z | Observed power (OP) |
1 | F(1,76)=5.15 | 0.026 | 2.22 | 0.60 |
2 | F(1,33)=4.32 | 0.046 | 2.00 | 0.52 |
3 | F(1,14)=8.84 | 0.010 | 2.57 | 0.73 |
4 | F(1,79)=7.45 | 0.008 | 2.66 | 0.76 |
5a | F(1,89)=4.91 | 0.029 | 2.18 | 0.59 |
5b | F(1,51)=4.74 | 0.034 | 2.12 | 0.56 |

All six studies produced statistically significant results. To achieve this outcome, two conditions have to be met: (a) the effect exists and (b) sampling error is small enough to avoid a failed study (i.e., a non-significant result even though the effect is real). The probability of obtaining a significant result is called power. The last column of the table shows observed power, which can be used to estimate the actual power of the six studies. Median observed power is 60%. With 60% power, we would expect only 60% of the 6 studies (3.6 studies) to produce a significant result, but all six studies show a significant result. This excess of significant results shows that the article presents an overly positive picture of the robustness of the effect. If these six studies were replicated exactly, we would not expect to obtain six significant results again. Moreover, the inflation of significant results also inflates the power estimate. The R-Index corrects for this by subtracting the inflation rate (100% observed success rate – 60% median observed power = 40%) from the power estimate. The R-Index is .60 – .40 = .20. Results with such a low R-Index often do not replicate in independent replication attempts.
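The arithmetic behind observed power and the R-Index can be sketched in a few lines of Python. This is a minimal illustration, not the author's own script: it assumes a two-sided test at alpha = .05, treats each observed z-score as the true effect, and uses the rounded z-scores from the table, so the results can differ from the reported values by about .01. The names `observed_power`, `median_power`, `inflation`, and `r_index` are chosen here for illustration.

```python
from statistics import NormalDist, median

# z-scores copied from the table (Studies 1, 2, 3, 4, 5a, 5b); rounded values.
z_scores = [2.22, 2.00, 2.57, 2.66, 2.18, 2.12]


def observed_power(z, alpha=0.05):
    """Power of a two-sided z-test if the true effect equals the observed z."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    # Probability that a replication lands beyond either critical value.
    return (1 - norm.cdf(z_crit - z)) + norm.cdf(-z_crit - z)


powers = [observed_power(z) for z in z_scores]
median_power = median(powers)

success_rate = 1.0                      # all six reported tests were significant
inflation = success_rate - median_power  # excess of successes over estimated power
r_index = median_power - inflation       # power estimate corrected for inflation

print("observed power:", [round(p, 2) for p in powers])
print("median power:", round(median_power, 3))
print("inflation:", round(inflation, 3))
print("R-Index:", round(r_index, 3))
print("expected significant results out of 6:", round(6 * median_power, 1))
```

Because the R-Index subtracts the inflation from the median power, it equals twice the median observed power minus the success rate, so a set of studies that all barely cross the significance threshold ends up with a very low value.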

Another method to examine the replicability of these results is to examine the variability of the z-scores (second-to-last column of the table). Each z-score reflects the strength of evidence against the null hypothesis. Even if the same study is replicated exactly, this measure will vary as a function of random sampling error. The expected variance is approximately 1 (the variance of a standard normal distribution). A much lower variance suggests that future studies will produce more variable results and, because the reported p-values are close to .05, that many of them will be non-significant. This bias test is called the Test of Insufficient Variance (TIVA). The variance of the z-scores is Var(z) = 0.07. The probability of such a restricted variance occurring by chance is p = .003 (1/300).
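The TIVA calculation can be sketched as follows. This is a hedged illustration under a stated assumption: the text does not spell out the test statistic, so the sketch assumes that (k - 1) times the sample variance of the z-scores is compared against the left tail of a chi-square distribution with k - 1 degrees of freedom, where 1 is the expected variance. The helper `chi2_cdf` is a hypothetical stdlib-only stand-in for a library routine such as `scipy.stats.chi2.cdf`, and the inputs are the rounded z-scores from the table.

```python
import math
from statistics import variance

# z-scores copied from the table (Studies 1, 2, 3, 4, 5a, 5b); rounded values.
z_scores = [2.22, 2.00, 2.57, 2.66, 2.18, 2.12]


def chi2_cdf(x, df, terms=200):
    """Left-tail chi-square probability P(X <= x).

    Uses the power series for the regularized lower incomplete gamma
    function, which converges quickly for the small x values used here.
    """
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of the series
    total = term
    for k in range(1, terms):
        term *= half_x / (a + k)
        total += term
    return total * math.exp(-half_x + a * math.log(half_x)) / math.gamma(a)


k = len(z_scores)
var_z = variance(z_scores)           # sample variance (n - 1 denominator)
chi2_stat = (k - 1) * var_z / 1.0    # assumed: expected variance without bias is 1
p_left = chi2_cdf(chi2_stat, df=k - 1)

print("Var(z):", round(var_z, 3))
print("chi-square statistic:", round(chi2_stat, 3))
print("left-tail p-value:", round(p_left, 4))
```

A left-tail test is used because the concern is too little variance, not too much: the smaller the variance relative to 1, the smaller the chi-square statistic and the smaller the p-value.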

Based on these results, the statistical evidence presented in this article is questionable and does not provide support for the conclusion that subliminal evaluative conditioning can enhance implicit self-esteem. Another problem with this conclusion is that implicit self-esteem measures have low reliability and low convergent validity. As a result, we would not expect strong and consistent effects of any experimental manipulation on these measures. Finally, even if a small and reliable effect could be obtained, it would remain an open question whether it reflects a change in implicit self-esteem or a systematic bias that the manipulation introduces into the measurement of implicit self-esteem. The article itself acknowledges open questions: “It is not yet known how long the effects of this manipulation last. In addition, it is not yet known whether people who could really benefit from enhanced self-esteem (i.e., people with problematically low levels of self-esteem) can benefit from subliminal conditioning techniques.” (p. 353). Twelve years later, we may wonder whether these results have been replicated in other laboratories and whether these effects last more than a few minutes after the conditioning experiment.

