
How reliable are standard reading time analyses? Hierarchical bootstrap reveals substantial power over-optimism and scale-dependent Type I error inflation.

Authors :
Burchill, Zachary J.
Jaeger, T. Florian
Source :
Journal of Memory & Language. Apr2024, Vol. 136, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

• Hierarchical bootstrap as an alternative to parametric Type I/II error simulations.
• Identify inflated Type I & II errors of the two most common approaches to RT analyses.
• Identify substantial over-optimism of RT analyses in the literature.
• Identify issues with past parametric validations of analysis approaches.
• Provide recommendations for new standards in RT analyses.

We investigate the statistical power and Type I error rate of the two most common approaches to reading time (RT) analyses: assuming normality of residuals and homogeneity of variance in raw or log-transformed RTs. We first show that the assumptions of such analyses (such as t-tests, ANOVAs, and linear mixed-effects models) are consistently met neither by raw RTs nor by log-transformed RTs (or any other common power transform, including inverse-transformed RTs). Only a non-power transform (log-shift) provides a decent fit for all data sets and data preparation steps we consider. We then compare the statistical power and Type I error rate for linear mixed-effects models over raw or log-transformed RTs. Previous studies on this matter relied on parametrically generated data. We show why this is problematic, and introduce as an alternative a hierarchical bootstrap approach over naturally distributed reading times. This approach yields substantially different, and arguably more informative, results than the parametric simulation approaches we compare it to. Our results suggest that it is time to heed the advice others have provided for reading research: for any but the simplest designs, we find that both the rate of spurious significances and the rate of undetected true effects can strongly depend on the scale (e.g., raw or log-RTs) in which effects are assumed to be linear. Researchers should thus clearly motivate the choice of analysis on theoretical grounds, assess the robustness of findings under different analysis approaches, and discuss potential mismatches between analyses.
The R scripts and libraries shared in the accompanying OSF repo allow researchers to assess the reliability of their analyses via hierarchical bootstrap over their own data. [ABSTRACT FROM AUTHOR]
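
The hierarchical bootstrap named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their R scripts are in the OSF repo); it only assumes the generic two-level scheme of resampling subjects with replacement and then resampling each drawn subject's trials with replacement. The function name `hierarchical_bootstrap`, the `(subject, rt)` data layout, and the example values are hypothetical, and crossed item effects are ignored for brevity.

```python
import random
from collections import defaultdict


def hierarchical_bootstrap(trials, rng):
    """Draw one two-level bootstrap sample from observed reading times.

    trials: list of (subject_id, rt_ms) pairs (hypothetical layout).
    rng:    a random.Random instance, so that draws are reproducible.
    Returns a list of (new_subject_index, rt_ms) pairs.
    """
    # Group the observed reading times by subject.
    by_subject = defaultdict(list)
    for subject, rt in trials:
        by_subject[subject].append(rt)
    subjects = sorted(by_subject)

    resampled = []
    # Level 1: resample subjects with replacement.
    drawn = rng.choices(subjects, k=len(subjects))
    for new_index, subject in enumerate(drawn):
        rts = by_subject[subject]
        # Level 2: resample the drawn subject's trials with replacement.
        for rt in rng.choices(rts, k=len(rts)):
            # A subject drawn twice gets two distinct new indices.
            resampled.append((new_index, rt))
    return resampled


# Made-up example values, for illustration only.
example = [
    ("s1", 300.0), ("s1", 350.0),
    ("s2", 500.0), ("s2", 520.0), ("s2", 610.0),
    ("s3", 410.0),
]
sample = hierarchical_bootstrap(example, random.Random(0))
```

Because every resampled value is an observed reading time, the sample keeps the natural RT distribution rather than one generated from a parametric model. Estimating Type I error or power would additionally require assigning condition labels (with or without an added effect) to each sample and refitting the analysis many times, which this sketch leaves out.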

Details

Language :
English
ISSN :
0749596X
Volume :
136
Database :
Academic Search Index
Journal :
Journal of Memory & Language
Publication Type :
Academic Journal
Accession number :
175936205
Full Text :
https://doi.org/10.1016/j.jml.2023.104494