1. Utilizing machine learning to predict participant response to follow-up health surveys in the Millennium Cohort Study
- Author
- Wisam Barkho, Nathan C. Carnes, Claire A. Kolaja, Xin M. Tu, Satbir K. Boparai, Sheila F. Castañeda, Beverly D. Sheppard, Jennifer L. Walstrom, Jennifer N. Belding, Rudolph P. Rull, and the Millennium Cohort Study Team
- Subjects
- Longitudinal studies, Survey nonresponse, Survey response, Machine learning, Latent class analysis, Survey outreach efforts, Medicine, Science
- Abstract
- The Millennium Cohort Study is a longitudinal study that collects self-reported survey data to examine the long-term effects of military service. Participant nonresponse to follow-up surveys presents a potential threat to the validity and generalizability of study findings. In recent years, predictive analytics has emerged as a promising tool to identify predictors of nonresponse. Here, we develop a high-skill classifier using machine learning techniques to predict participant response to follow-up surveys of the Millennium Cohort Study. Six supervised algorithms were employed to predict response to the 2021 follow-up survey. Using latent class analysis (LCA), we classified participants based on historical survey response and compared prediction performance with and without this variable. Feature analysis was subsequently conducted on the best-performing model. With the LCA variable included in the machine learning analysis, all six algorithms performed comparably. Without the LCA variable, random forest outperformed the benchmark regression model; however, overall prediction performance decreased. Feature analysis identified the LCA variable as the most important predictor. Our findings highlight the importance of historical response in improving prediction of participant response to follow-up surveys. Machine learning algorithms can be especially valuable when historical data are not available. Implementing these methods in longitudinal studies can enhance outreach efforts by strategically targeting participants, ultimately boosting survey response rates and mitigating nonresponse.
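The with/without comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the response probabilities are invented, `history_class` is a hypothetical stand-in for an LCA-derived class (no latent class analysis is actually fitted), and the simple rate-table classifier is not one of the six algorithms used in the study.

```python
import random
from collections import defaultdict

def make_participants(n, seed=0):
    """Generate synthetic participants; all probabilities are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical stand-in for an LCA class:
        # 0 = consistent, 1 = intermittent, 2 = rare responder.
        history_class = rng.choice([0, 1, 2])
        older = rng.choice([0, 1])  # a weak, made-up demographic feature
        p = [0.85, 0.50, 0.15][history_class] + (0.05 if older else -0.05)
        responded = 1 if rng.random() < p else 0
        rows.append({"history_class": history_class,
                     "older": older,
                     "responded": responded})
    return rows

def fit_rate_table(rows, features):
    """Estimate P(responded | feature values) by counting, with add-one smoothing."""
    counts = defaultdict(lambda: [1, 2])  # [responders + 1, total + 2]
    for r in rows:
        key = tuple(r[f] for f in features)
        counts[key][0] += r["responded"]
        counts[key][1] += 1
    return {k: c[0] / c[1] for k, c in counts.items()}

def auc(scores, labels):
    """Chance that a random responder outscores a random nonresponder (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def evaluate(features, train, test):
    """Fit on train, score test, and return the held-out AUC."""
    table = fit_rate_table(train, features)
    scores = [table.get(tuple(r[f] for f in features), 0.5) for r in test]
    return auc(scores, [r["responded"] for r in test])

rows = make_participants(2000)
train, test = rows[:1500], rows[1500:]
auc_with = evaluate(["history_class", "older"], train, test)
auc_without = evaluate(["older"], train, test)
print(f"AUC with history class:    {auc_with:.3f}")
print(f"AUC without history class: {auc_without:.3f}")
```

Because the synthetic response probability is driven mostly by `history_class`, dropping that feature lowers the held-out AUC. This mirrors only the direction of the effect reported in the abstract, not its magnitude.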
- Published
- 2024