Many Labs 5: Testing pre-data collection peer review as an intervention to increase replicability
- Author
Charles R. Ebersole, Maya B. Mathur, Erica Baranski, Diane-Jo Bart-Plange, Nick Buttrick, Christopher R. Chartier, Katherine S. Corker, Martin Corley, Joshua K. Hartshorne, Hans IJzerman, Ljiljana B. Lazarevic, Hugh Rabagliati, Ivan Ropovik, Balazs Aczel, Lena Fanya Aeschbach, Luca Andrighetto, Jack Dennis Arnal, Holly Arrow, Peter Babincak, Bence Endre Bakos, Gabriel Baník, Ernest Baskin, Radomir Belopavlović, Michael Bernstein, Michal Bialek, Nicholas Bloxsom, Bojana Bodroža, Diane B. V. Bonfiglio, Leanne Boucher, Florian Brühlmann, Claudia Chloe Brumbaugh, Erica Casini, Yiling Chen, Carlo Chiorri, William J. Chopik, Oliver Christ, Heather M. Claypool, Sean Coary, Marija V. Čolić, W. Matthew Collins, Paul G. Curran, Chris Day, Benjamin Dering, Anna Dreber, John Edlund, Filipe Falcão, Anna Fedor, Lily Feinberg, Ian Ferguson, Máire Ford, Michael C. Frank, Emily Fryberger, Alexander Garinther, Katarzyna Gawryluk, Mauro Giacomantonio, Steffen Robert Giessner, Jon E. Grahe, Rosanna Elizabeth Guadagno, Ewa Hałasa, Peter Hancock, Joachim Hüffmeier, Sean Hughes, Katarzyna Idzikowska, Michael Inzlicht, Alan Jern, William Jimenez-Leal, Magnus Johannesson, Jennifer Alana Joy-Gaba, Mathias Kauff, Danielle Kellier, Mallory Kidwell, Amanda Kimbrough, Josiah King, Sabina Kołodziej, Marton Kovacs, Karolina Krasuska, Sue Kraus, Lacy Elise Krueger, Katarzyna Kuchno, Caio Ambrosio Lage, Eleanor V. Langford, Carmel Levitan, Tiago Jessé Souza Lima, Hause Lin, Samuel Lins, J. E. Loy, Dylan Manfredi, Lukasz Markiewicz, Madhavi Menon, Brett Mercier, Mitchell Metzger, Ailsa E. Millen, Jeremy K. Miller, Andres Montealegre, Don A. Moore, Gideon Nave, Austin Lee Nichols, Sarah Ann Novak, Ana Orlic, Angelo Panno, Kimberly P. Parks, Ivana Pedović, Emilian Pękala, Matthew R. Penner, Sebastiaan Pessers, Boban Petrovic, Thomas Pfeiffer, Damian Pieńkosz, Emanuele Preti, Danka Purić, Tiago Silva Ramos, Jon Ravid, Timothy Razza, Katrin Rentzsch, Juliette Richetin, Sean Chandler Rife, Anna Dalla Rosa, Janos Salamon, Blair Saunders, Przemyslaw Sawicki, Kathleen Schmidt, Kurt Schuepfer, Thomas Schultze, Stefan Schulz-Hardt, Astrid Schütz, Ani Shabazian, Rúben Filipe Lopes Silva, Barbara Sioma, Lauren Skorb, Luana Elayne Cunha Souza, Sara Steegen, LAR Stein, R. Weylin Sternglanz, Darko Stojilović, Daniel Storage, Gavin Brent Sullivan, Barnabas Szaszi, Peter Szecsi, Orsolya Szoke, Attila Szuts, Manuela Thomae, Natasha Davis Tidwell, Carly Tocco, Ann-Kathrin Torka, Francis Tuerlinckx, Wolf Vanpaemel, Leigh Ann Vaughn, Michelangelo Vianello, Domenico Viganola, Maria Vlachou, Ryan J. Walker, Sophia Christin Weissgerber, Aaron Lee Wichman, Bradford Jay Wiggins, Daniel Wolf, Michael James Wood, David A. Zealley, Iris Zezelj, Mark Zrubka, and Brian A. Nosek
- Abstract
Replications in psychological science sometimes fail to reproduce prior findings. If replications use methods that are unfaithful to the original study or ineffective in eliciting the phenomenon of interest, then a failure to replicate may be a failure of the protocol rather than a challenge to the original finding. Formal pre-data collection peer review by experts may address such shortcomings and increase replicability rates. We selected 10 replications from the Reproducibility Project: Psychology (RP:P; Open Science Collaboration, 2015) in which the original authors had expressed concerns about the replication designs before data collection; only one of these replications was "statistically significant" (p < .05). Commenters suggested that lack of adherence to expert review and low-powered tests were the reasons that most of these RP:P studies failed to replicate (Gilbert et al., 2016). We revised the replication protocols and received formal peer review prior to conducting new replications. We administered the RP:P and Revised protocols in multiple laboratories (median number of laboratories per original study = 6.5, range 3 to 9; median total sample = 1,279.5, range 276 to 3,512) for high-powered tests of each original finding with both protocols. Overall, Revised protocols produced effect sizes similar to those of RP:P protocols following the preregistered analysis plan (Δr = .002 or .014, depending on analytic approach). The median effect size for Revised protocols (r = .05) was similar to that for RP:P protocols (r = .04) and the original RP:P replications (r = .11), and smaller than that for the original studies (r = .37). The cumulative evidence from the original studies and three replication attempts suggests that effect sizes for all 10 findings (median r = .07; range .00 to .15) are 78% smaller on average than the original findings (median r = .37; range .19 to .50), with very precisely estimated effects.
- Published
- 2019