Pseudo-random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning.

Authors :
Naimi, Ashley I.
Yu, Ya-Hui
Bodnar, Lisa M.
Source :
Epidemiology; Nov 2024, Vol. 35 Issue 6, p779-786, 8p
Publication Year :
2024

Abstract

Background: The use of machine learning to estimate exposure effects introduces a dependence between the results of an empirical study and the value of the seed used to fix the pseudo-random number generator.

Methods: We used data from 10,038 pregnant women and a 10% subsample (N = 1004) to examine the extent to which the risk difference for the relation between fruit and vegetable consumption and preeclampsia risk changes under different seed values. We fit an augmented inverse probability weighted estimator with two Super Learner algorithms: a simple algorithm including random forests and single-layer neural networks and a more complex algorithm with a mix of tree-based, regression-based, penalized, and simple algorithms. We evaluated the distributions of risk differences, standard errors, and P values that result from 5000 different seed value selections.

Results: Our findings suggest important variability in the risk difference estimates, as well as an important effect of the stacking algorithm used. The interquartile range width of the risk differences in the full sample with the simple algorithm was 13 per 1000. However, all other interquartile ranges were roughly an order of magnitude lower. The medians of the distributions of risk differences differed according to the sample size and the algorithm used.

Conclusions: Our findings add another dimension of concern regarding the potential for "p-hacking," and further warrant the need to move away from simplistic evidentiary thresholds in empirical research. When empirical results depend on pseudo-random number generator seed values, caution is warranted in interpreting these results. [ABSTRACT FROM AUTHOR]
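
The seed-sensitivity check described in the Methods can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes simulated toy data, replaces the Super Learner with simple stratified means, and lets the seed control only the cross-fitting fold assignment of an augmented inverse probability weighted (AIPW) estimator. All function names (`simulate`, `fit_nuisance`, `aipw_rd`) and parameter values are hypothetical, chosen only to show how one might summarize the spread of risk differences across seed values.

```python
import random
import statistics

def simulate(n, data_seed=0):
    """Toy data (assumed): binary confounder x, exposure a, outcome y."""
    rng = random.Random(data_seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        a = rng.random() < (0.6 if x else 0.3)
        y = rng.random() < 0.10 + 0.05 * x - 0.03 * a  # true RD = -0.03
        rows.append((int(x), int(a), int(y)))
    return rows

def fit_nuisance(train):
    """Stratified means with add-one smoothing: P(A=1|x) and E[Y|a,x]."""
    ps, mu = {}, {}
    for x in (0, 1):
        ax = [a for xx, a, _ in train if xx == x]
        ps[x] = (sum(ax) + 1) / (len(ax) + 2)
        for a in (0, 1):
            ys = [y for xx, aa, y in train if xx == x and aa == a]
            mu[(a, x)] = (sum(ys) + 1) / (len(ys) + 2)
    return ps, mu

def aipw_rd(rows, seed, k=2):
    """Cross-fit AIPW risk difference; the seed fixes the fold assignment."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [rows[i] for i in idx if i not in held]
        ps, mu = fit_nuisance(train)
        for i in fold:
            x, a, y = rows[i]
            m1, m0, g = mu[(1, x)], mu[(0, x)], ps[x]
            # Efficient-influence-function contribution for observation i
            scores.append(m1 - m0 + a * (y - m1) / g
                          - (1 - a) * (y - m0) / (1 - g))
    return statistics.fmean(scores)

# Same data, many seeds: only the pseudo-random fold split changes.
rows = simulate(1000)
rds = [aipw_rd(rows, seed) for seed in range(200)]
q1, med, q3 = statistics.quantiles(rds, n=4)
print(f"median RD per 1000: {1000 * med:.1f}")
print(f"IQR width per 1000: {1000 * (q3 - q1):.2f}")
```

Because the nuisance models here are deterministic given the folds, the spread in this toy example reflects sample splitting alone; learners with internal randomness, such as random forests or neural networks, would add further seed dependence.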

Details

Language :
English
ISSN :
1044-3983
Volume :
35
Issue :
6
Database :
Supplemental Index
Journal :
Epidemiology
Publication Type :
Academic Journal
Accession number :
180792870
Full Text :
https://doi.org/10.1097/EDE.0000000000001785