6 thoughts on “The Power of the Pen Paradigm: A Replicability Analysis”

  1. This is an interesting analysis, but I don’t think you should call it a “replicability analysis” given that you’re simply meta-analyzing studies that are so methodologically heterogeneous that they could never falsify an original hypothesis. A true (& valid) “replicability analysis” needs to ensure that each replication study uses a methodology that is indeed sufficiently methodologically similar to an original study, which is what Curate Science does, as outlined in our latest unified framework paper: https://osf.io/preprints/psyarxiv/uwmr8

    Also, you should stop (mis)defining replicability as statistical power (“…goal is to estimate the average replicability of a set of studies, where replicability is defined as the probability of obtaining a statistically significant result”) because this just confuses everything. Replicability is the extent to which an effect is independently replicable in sufficiently methodologically similar replication studies (which themselves are also each sufficiently transparently reported & exhibit sufficient analytic reproducibility): see https://osf.io/pxe5h/

    Finally, I disagree with your statement (“The same cannot be said about Darwin’s theory of emotion”) because his theory might still have merit, but has not **yet** been tested correctly (e.g., testing the facial feedback hypothesis using a very sophisticated camouflaged highly-repeated within-person design).


    1. 1. How do you define sufficiently similar?

      2. The core definition of replication is that the replication study produces the same result. If the result of the first study is statistical significance, this implies that the criterion for the replication study is also statistical significance.

      3. There may be other ways to do replicability analysis and to define replicability, but that does not mean that my use of the term is wrong, just different from your approach.

      4. Finally, it is a statistical fact that the probability of replicating an original finding is a function of power. Of course, power is not the only factor because replications are never exact, but power is by definition the probability of obtaining a significant result in the original study and in the replication study.

      What these analyses show is that all studies had very low power to detect a facial feedback effect and publication bias inflated the impression of the replicability of the effect.

      It would be interesting to see what conclusions we can draw from your analysis of pen-in-mouth studies.
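
A minimal sketch of the power argument in point 4, assuming a simple two-group between-subject design and a normal approximation to the test statistic. The function name `two_group_power` and the example effect size and sample size are hypothetical illustrations, not values taken from the studies discussed here.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison.

    Assumption: the test statistic is treated as normal with mean
    d * sqrt(n_per_group / 2), where d is the standardized mean difference.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = d * sqrt(n_per_group / 2)     # expected value of the z statistic
    # Probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical example: a small effect studied with 30 participants per group.
print(f"power = {two_group_power(0.2, 30):.3f}")
```

Under these assumptions, an exact replication with the same design and sample size has the same probability of reaching significance as the original study, which is why low power implies low expected replicability.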


  2. When you first posted this on Facebook, you asked me if I thought some papers were missing. I was in the midst of heavy teaching, so I didn’t have much time to go through things. I did note that a paper by Niedenthal (where I am also a co-author) was missing, but in that paper the “pen in the mouth” is cast as blocking mimicry rather than as a method for inducing emotions via facial feedback. (It was originally thought to do that, but it didn’t work, so, well, some late hypothesizing – but it was a long time ago I guess).

    I didn’t think that the “blocking of mimicry” was the focus of this collection, and when I finally had time to look closer, I also ignored papers that attempt to block mimicry rather than enhance it.

    The list I used was Strack’s compilation of papers that have investigated facial feedback, which he shared along with his commentary to the failed RRR. This is also the list I started using to delve deeper into facial feedback (and which I didn’t get very far with).

    I compared his list where the pen-manipulation was used, with the list here, and although there are overlaps, there are several papers on his list that aren’t here (and also, several papers here that weren’t on his list). – I copy them in here, in order of publication year.

    2007 Havas, D. A., Glenberg, A. M., & Rinck, M. (2007). Emotion simulation during language comprehension. Psychonomic Bulletin & Review, 14(3), 436-441.
    2008 Stel, M., van den Heuvel, C., & Smeets, R. C. (2008). Facial feedback mechanisms in autistic spectrum disorders. Journal of autism and developmental disorders, 38(7), 1250-1258.
    2009 Ashton-James, C., Maddux, W. W., Galinsky, A. D., & Chartrand, T. L. Feeling Badly Makes Us More Who We Are: Negative Affect Strengthens Culturally Consistent Self-Construals.
    2009 Topolinski, S., & Strack, F. (2009). The architecture of intuition: Fluency and affect determine intuitive judgments of semantic and visual coherence and judgments of grammaticality in artificial grammar learning. Journal of Experimental Psychology: General, 138(1), 39.
    2010 Blaesi, S., & Wilson, M. (2010). The mirror reflects both ways: Action influences perception of others. Brain and cognition, 72(2), 306-309.
    2013 Fernández-Abascal, E. G., & Díaz, M. D. M. (2013). Affective induction and creative thinking. Creativity Research Journal, 25(2), 213-221.
    2013 Topolinski, S., & Deutsch, R. (2013). Phasic affective modulation of semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(2), 414.
    2014 Bilewicz, M., & Kogan, A. (2014). Embodying imagined contact: Facial feedback moderates the intergroup consequences of mental simulation. British Journal of Social Psychology, 53(2), 387-395.
    2014 Chang, J., Zhang, M., Hitchman, G., Qiu, J., & Liu, Y. (2014). When you smile, you become happy: Evidence from resting state task-based fMRI. Biological psychology, 103, 100-106.
    2015 Lobmaier, J. S., & Fischer, M. H. (2015). Facial feedback affects perceived intensity but not quality of emotional expressions. Brain sciences, 5(3), 357-368.
    2015 Sel, A., Calvo-Merino, B., Tuettenberg, S., & Forster, B. (2015). When you smile, the world smiles at you: ERP evidence for self-expression effects on face processing. Social cognitive and affective neuroscience, 10(10), 1316-1322.

    I have done a rough coding of the papers, entering them into the R-index work-sheet. It is very rough (considering that I have spent no more than 3 interrupted half-days on it), and should probably be looked over. But, this is roughly what they add:

    Success Rate   Obs. Power    Inflation Rate   R-Index
    0,6875         0,587337408   0,100162592      0,487174815
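
For readers checking the arithmetic, here is a small sketch of how the four columns relate. It assumes the usual R-Index relationships (inflation is success rate minus observed power, and the R-Index is observed power minus inflation), which match the numbers in the table; the function name `r_index` is a hypothetical helper, not the work-sheet itself.

```python
from typing import Tuple


def r_index(success_rate: float, observed_power: float) -> Tuple[float, float]:
    """Return (inflation rate, R-Index).

    Assumed relationships, consistent with the table above:
      inflation = success rate - observed power
      R-Index   = observed power - inflation
    """
    inflation = success_rate - observed_power
    return inflation, observed_power - inflation


# Values from the table, written with decimal points instead of commas.
inflation, r = r_index(0.6875, 0.587337408)
print(f"inflation = {inflation:.6f}, R-Index = {r:.6f}")
```

The last digit of the R-Index can differ slightly from the table because the table values are themselves rounded.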

    It is a rather heterogeneous set of studies – although all of them include the smiling pen manipulation (ok, so one uses chopsticks). Outcomes vary.

    It would be nice updating with these, one way or another.


    1. Hi Ase, I looked through these studies. All except one were actually in our database. I did not include them in the analysis for a number of reasons, mainly within-subject designs and studies with DVs that are only tentatively related to experienced affect. I can do a sensitivity analysis to see whether the results change for this broader set of studies. Conclusions shouldn’t depend on the selection of studies.


  3. I figured it would be something like this. It wasn’t quite straightforward going through them (and, yes I noticed the repeated measures).

    But, it leads me to another thought that I have had for a while, since I started looking through these last year, and that is that there is possibly a need to chart the methods and measures (and the varieties of methods and measures), because, as they say, the devil is in the details. It isn’t enough to extract effect sizes, because how we manipulate and measure matters.

    I really like Malte Elson’s effort with the flexible measures site because it really clarifies the varieties of manipulations and measures.

