Validity of the Implicit Association Test as a Measure of Implicit Attitudes

This blog post reports the results of an analysis of correlations among four explicit and three implicit attitude measures published by Ranganath, Smith, and Nosek (2008).

Original article:
Kate A. Ranganath, Colin Tucker Smith, & Brian A. Nosek (2008). Distinguishing automatic and controlled components of attitudes from direct and indirect measurement methods. Journal of Experimental Social Psychology 44 (2008) 386–396; doi:10.1016/j.jesp.2006.12.008

Abstract:
Distinct automatic and controlled processes are presumed to influence social evaluation. Most empirical approaches examine automatic processes using indirect methods, and controlled processes using direct methods. We distinguished processes from measurement methods to test whether a process distinction is more useful than a measurement distinction for taxonomies of attitudes. Results from two studies suggest that automatic components of attitudes can be measured directly. Direct measures of automatic attitudes were reports of gut reactions (Study 1) and behavioral performance in a speeded self-report task (Study 2). Confirmatory factor analyses comparing two factor models revealed better fits when self-reports of gut reactions and speeded self-reports shared a factor with automatic measures versus sharing a factor with controlled self-report measures. Thus, distinguishing attitudes by the processes they are presumed to measure (automatic versus controlled) is more meaningful than distinguishing based on the directness of measurement.

Description of Original Study

Study 1 measured relative attitudes towards heterosexuals and homosexuals with seven measures: four explicit measures and three reaction time tasks. The four explicit measures were:

Actual = Participants were asked to report their “actual feelings” towards gay and straight people when given enough time for full consideration on a scale ranging from 1 = very negative to 8 = very positive.

Gut = Participants were asked to report their “gut reaction” towards gay and straight people when given enough time for full consideration on a scale ranging from 1 = very negative to 8 = very positive.

Time0 and Time5: A second explicit rating task assessed an “attitude timeline”. Participants reported their attitudes toward the two groups at multiple time points: (1) instant reaction, (2) reaction a split-second later, (3) reaction after 1 s, (4) reaction after 5 s, and (5) reaction when given enough time to think fully. Only the first (Time0) and the last (Time5) rating were included in the model.

The three reaction time measures were the Implicit Association Test (IAT), the Go-NoGo Association Test (GNAT), and a Four-Category Sorting Paired Features Task (SPF). All three measures use differences in response times to measure attitudes.

Table A1 in the Appendix reported the correlations among the seven tasks.

         IAT   GNAT  SPF   GUT   Actual  Time0  Time5
IAT      1
GNAT     .36   1
SPF      .26   .18   1
GUT      .23   .33   .12   1
Actual   .16   .31   .01   .65   1
Time0    .19   .31   .16   .85   .50     1
Time5    .01   .24   .01   .54   .81     .50    1
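
As a rough check on the pattern in these correlations, the following Python sketch rebuilds the full matrix from the lower triangle and averages the correlations within and between the two types of measures. It assumes that the omitted first row and column of the table belong to the IAT; the names `LOWER`, `full_matrix`, and `mean_r` are made up for this illustration and are not part of the original analysis.

```python
from itertools import combinations

LABELS = ["IAT", "GNAT", "SPF", "GUT", "Actual", "Time0", "Time5"]

# Lower triangle of the correlation table, diagonal included.
# Assumption: the first (omitted) row/column is the IAT.
LOWER = [
    [1.0],
    [.36, 1.0],
    [.26, .18, 1.0],
    [.23, .33, .12, 1.0],
    [.16, .31, .01, .65, 1.0],
    [.19, .31, .16, .85, .50, 1.0],
    [.01, .24, .01, .54, .81, .50, 1.0],
]

IMPLICIT = ["IAT", "GNAT", "SPF"]
EXPLICIT = ["GUT", "Actual", "Time0", "Time5"]


def full_matrix(lower):
    """Mirror a lower-triangular list of rows into a symmetric matrix."""
    n = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def mean_r(matrix, labels, group_a, group_b=None):
    """Average correlation within group_a, or between group_a and group_b."""
    idx = {name: k for k, name in enumerate(labels)}
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(a, b) for a in group_a for b in group_b]
    values = [matrix[idx[a]][idx[b]] for a, b in pairs]
    return sum(values) / len(values)


R = full_matrix(LOWER)
print(round(mean_r(R, LABELS, IMPLICIT), 2))            # 0.27
print(round(mean_r(R, LABELS, EXPLICIT), 2))            # 0.64
print(round(mean_r(R, LABELS, IMPLICIT, EXPLICIT), 2))  # 0.17
```

Simple averages like these are only descriptive, but they already show that the explicit ratings hang together much more tightly than the reaction time tasks do, and that the two sets of measures are only weakly related to each other.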

The authors tested a variety of structural equation models. The best fitting model, preferred by the authors, was a model with three correlated latent factors. “In this three-factor model, self-reported gut feelings (Gut Feeling, Instant Feeling) comprised their own attitude factor distinct from a factor comprised of the indirect, automatic measures (IAT, GNAT, SPF) and from a factor comprised of the direct, controlled measures (Actual Feeling, Fully Considered Feeling). The data were an excellent fit (chi^2(12) = 10.8).”

The authors then state “while self-reported gut feelings were more similar to the indirect measures than to the other self-reported attitude measures, there was some unique variance in self-reported gut feelings that was distinct from both” (p. 391) and they go on to speculate that “one possibility is that these reports are a self-theory that has some but not complete correspondence with automatic evaluations” (p. 391). They also consider the possibility that “measures like the IAT, GNAT, and SPF partly assess automatic evaluations that are “experienced” and amenable to introspective report, and partly evaluations that are not” (p. 391). But they favor the hypothesis that “self-report of ‘gut feelings’ is a meaningful account of some components of automatic evaluation” (p. 391). They interpret these results as strong support for their “contention that a taxonomy of attitudes by measurement features is not as effective as one that distinguishes by presumed component processes” (p. 391). The conclusion reiterates this point. “The present studies suggest that attitudes have distinct but related automatic and controlled factors contributing to social evaluation and that parsing attitudes by underlying processes is superior to parsing attitude measures by measurement features” (p. 393). Surprisingly, the authors do not mention the three-factor model in the Discussion and instead claim support for a two-factor model that distinguishes processes rather than measures (explicit vs. implicit). “In both studies, model comparison using confirmatory factor analysis showed the data were better fit to a two-factor model distinguishing automatic and controlled components of attitudes than to a model distinguishing attitudes by whether they were measured directly or indirectly” (p. 393). The authors then suggest that some explicit measures (ratings of gut reactions) can measure automatic attitudes.
“These findings suggest that direct measures can be devised to capture automatic components of attitudes despite suggestions that indirect measures are essential for such assessments” (p. 393).

New Analysis 

The main problem with this article is that the authors never report parameter estimates for the model. Depending on the pattern of correlations among the three factors and the factor loadings, the interpretation of the results can change. I first tried to fit the three-factor model to the published correlation matrix (treating it as a covariance matrix with all variances set to 1). Mplus 7.1 reported a negative residual variance for Actual. The model also had one fewer degree of freedom than the published model. Fixing the residual variance of Actual did not solve the problem. I then proceeded to fit my own model. It is essentially the same as the three-factor model, except that I modeled the correlations among the three latent factors with a single higher-order factor. This factor represents variation in common causes that influence all attitude measures. The problem of negative variance in the Actual measure was solved by allowing for an extra correlation between the Actual and Gut ratings. As seen in the correlation table, these two explicit measures correlated more highly with each other (r = .65) than with the corresponding Time0 and Time5 measures (rs = .54, .50). As in the original article, model fit was good (see Figure). Figure 1 shows for the first time the parameter estimates of the model.



The loadings of the explicit measures on the primary latent factors are above .80. For single-item measures, this implies that these ratings are essentially measuring the same construct with some random error. Thus, the latent factors can be interpreted as explicit ratings of affective responses given immediately or after some reflection. The loadings of these two factors on the higher-order factor show that reflective and immediate responses are strongly influenced by the common factor. This is not surprising. Reflection may alter the immediate response somewhat, but it is unlikely to reverse or dramatically change the response a few seconds later. Interestingly, the immediate response has a higher loading on the attitude factor, although in this small sample the difference in loadings is not significant (chi^2(1) = 0.22). The third primary factor represents the shared variance among the three reaction time measures. It also loads on the general attitude factor, but the loading is weaker than the loadings for the explicit measures. The parameter estimates suggest that about 25% of the variance in this factor is explained by the common attitude factor (.51^2 = .26) and about 75% is unique to the reaction time measures. This variance component can be interpreted as unique variance in implicit measures. The factor loadings of the three reaction time measures are also relevant. The loading of the IAT suggests that only 28% (.53^2) of the observed variance in the IAT reflects the effect of causal factors that influence reaction time measures of attitudes. As some of this variance is also shared with explicit measures, only 21% ((.86*.53)^2) of the variance in the IAT represents the variance in the implicit attitude factor, where .86 = sqrt(1 - .51^2) is the path for the part of the reaction time factor that is independent of the general attitude factor. This has important implications for the use of the IAT to examine potential effects of implicit attitudes on behavior. Even if implicit attitudes had a strong effect on a behavior (r = .5), the correlation between IAT scores and the behavior would be only r = .86*.53*.5 = .23.
A sample size of N = 146 participants would be needed to have 80% power to provide significant evidence for such a relationship (p < .05, two-tailed). Given a more modest effect of attitudes on behavior, r = .86*.53*.30 = .14, the sample size would need to be larger (N = 398). As many studies of implicit attitudes and behavior used smaller samples, we would expect many non-significant results, unless non-significant results remain unreported and published results report inflated effect sizes. One solution to the problem of low power in studies of implicit attitudes would be the use of multiple implicit attitude measures. This study suggests that a battery of different reaction time tasks can be used to remove random and task-specific measurement error. Such a multi-method approach to the measurement of implicit attitudes is highly recommended for future studies because it would also help to interpret results of studies in which implicit attitudes do not influence behavior. If a set of implicit measures shows convergent validity and still fails to predict the behavior, this finding would indicate that implicit attitudes did not influence the behavior. In contrast, a null result with a single implicit measure may simply show that the measure failed to measure implicit attitudes.
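
The arithmetic behind these numbers can be retraced with a short Python sketch. It takes the two loadings quoted in the text (.51 and .53) as given, and it uses the Fisher z approximation for the sample size, which is an assumption on my part and not necessarily the method behind the figures above; the function names are invented for this illustration.

```python
import math
from statistics import NormalDist

HIGHER_ORDER_LOADING = 0.51  # reaction time factor on the general attitude factor
IAT_LOADING = 0.53           # IAT on the reaction time factor


def unique_path(loading):
    """Path for the part of a factor not explained by the higher-order factor."""
    return math.sqrt(1 - loading ** 2)


def expected_r(effect_of_implicit_attitude):
    """Expected IAT-behavior correlation for a given true effect on behavior."""
    return unique_path(HIGHER_ORDER_LOADING) * IAT_LOADING * effect_of_implicit_attitude


def n_for_power(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-tailed) via Fisher's z.

    Assumption: normal approximation to the sampling distribution of atanh(r).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Share of IAT variance tied to the reaction time factor as a whole,
# and to the part of that factor not shared with explicit measures.
print(round(IAT_LOADING ** 2, 2))
print(round((unique_path(HIGHER_ORDER_LOADING) * IAT_LOADING) ** 2, 2))

# Expected correlations for a strong (.5) and a more modest (.3) true effect,
# rounded to two decimals as in the text before computing sample sizes.
for effect in (0.5, 0.3):
    r = round(expected_r(effect), 2)
    print(effect, r, n_for_power(r))
```

Under this approximation the required sample sizes come out within a participant or two of the values quoted above. The small differences reflect the approximation and rounding up to a whole participant, not a different conclusion.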


This article reported some interesting data, but failed to report the parameter estimates of the preferred model. This analysis of the data showed that explicit measures are highly correlated with each other and show discriminant validity from implicit, reaction time measures. The new analysis also made it possible to estimate the amount of variance in the Implicit Association Test that is not shared with explicit measures but is shared with other implicit measures. The estimate of about 20% suggests that most of the variance in the IAT is due to factors other than implicit attitudes and that the test cannot be used to diagnose individuals. Whether the 20% of variance that is uniquely shared with other implicit measures reflects unconscious attitudes or method variance that is common to reaction time tasks remains unclear. The model also implies that the predictive validity of a single IAT for prejudiced behavior is expected to be small to moderate (r < .30), which means large samples are needed to study the effect of implicit attitudes on behavior.





