Predicting the time to get back to work using statistical models and machine learning approaches.

Authors :
Bouliotis G
Underwood M
Froud R
Source :
BMC medical research methodology [BMC Med Res Methodol] 2024 Nov 29; Vol. 24 (1), pp. 295. Date of Electronic Publication: 2024 Nov 29.
Publication Year :
2024

Abstract

Background: Whether machine learning approaches are superior to classical statistical models for survival analyses, especially when hazards are not proportional, is unknown.
Objectives: To compare model performance and predictive accuracy of classic regressions and machine learning approaches using data from the Inspiring Families programme.
Methods: The Inspiring Families programme aims to support members of families with complex issues to return to work. We explored predictors of time to return to work, comparing proportional hazards models (semi-parametric Cox and flexible parametric Parmar-Royston, both in Stata) against survival penalised regression with an elastic net penalty (scikit-survival), a (conditional) survival forest algorithm (pySurvival), and a (kernel) survival support vector machine (pySurvival).
Results: At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, and all had low predictive power (concordance index between 0.51 and 0.61). The median time to finding the first job was about 254 days. The top five contributing variables were 'family issues and additional barriers', 'restriction of hours', 'available CV', 'self-employment considered' and 'education'. Harrell's concordance index ranged from 0.60 (Cox model) to 0.71 (Random Survival Forest), suggesting a better fit for the machine learning approaches. However, a comparison of predicted median time based on a selected scenario showed only minor differences.
Conclusion: Implementing a series of survival models with and without a proportional hazards assumption provides useful insight as well as better interpretation of the coefficients affected by non-linearities. However, that better fit does not translate to substantially higher predictive power and accuracy from using machine learning approaches. Further tuning of the machine learning algorithms may provide improved results.
Declarations:
Ethics approval and consent to participate: Ethical approval was given by the University of Warwick Biomedical and Scientific Research Ethics Sub-committee for the study Predicting RTW class membership using Machine Learning (PRIMAL), REGO-2018-2186. Informed consent was obtained from participants in the Inspiring Families programme, at the point of entry onto the programme, for the University of Warwick to use depersonalised data obtained from participants.
Consent for publication: N/A.
Competing interests: MU is chief investigator or co-investigator on multiple previous and current research grants from the UK National Institute for Health Research and Arthritis Research UK, and is a co-investigator on grants funded by the Australian NHMRC and Norwegian MRC. He was an NIHR Senior Investigator until March 2021. He receives some salary support from University Hospitals Coventry and Warwickshire. He is a co-investigator on two current and one completed NIHR-funded studies that have, or have had, additional support from Stryker Ltd. Until March 2020 he was an editor of the NIHR journal series and a member of the NIHR Journal Editors Group, for which he received a fee. RF is Chief Investigator on a research grant from the Norwegian Medical Research Council on return-to-work initiatives. RF and MU are shareholders and directors of a University of Warwick spinout company that provides data collection services for health services research. These services were not used in this study. GB declares no competing interests.
(© 2024. The Author(s).)
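
Note on the concordance index: the Results compare models by Harrell's concordance index, the proportion of comparable participant pairs in which the model ranks the participant with the shorter observed time as higher risk. The following is a minimal Python sketch of that calculation, not the authors' code. The function name `harrell_c_index`, the toy follow-up data, and the choice to skip pairs with tied times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times:       observed follow-up times (e.g. days to first job or censoring)
    events:      1 if the event was observed, 0 if the record is censored
    risk_scores: model output; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped for simplicity
        # 'first' is the participant with the shorter observed time
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0  # model ranks the earlier event as higher risk
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four participants, the third is censored at 300 days.
toy_times = [100, 254, 300, 400]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.6, 0.7, 0.2]

c_index = harrell_c_index(toy_times, toy_events, toy_scores)
print(round(c_index, 2))  # 4 of 5 comparable pairs are concordant -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported range of 0.60 to 0.71 indicates modest discrimination.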

Details

Language :
English
ISSN :
1471-2288
Volume :
24
Issue :
1
Database :
MEDLINE
Journal :
BMC medical research methodology
Publication Type :
Academic Journal
Accession number :
39614191
Full Text :
https://doi.org/10.1186/s12874-024-02390-4