Behrooz Mamandipoor, Raphael Romano Bruno, Bernhard Wernly, Georg Wolff, Jesper Fjølner, Antonio Artigas, Bernardo Bollen Pinto, Joerg C. Schefold, Malte Kelm, Michael Beil, Sviri Sigal, Susannah Leaver, Dylan W. De Lange, Bertrand Guidet, Hans Flaatten, Wojciech Szczeklik, Christian Jung, and Venet Osmani
Background
COVID-19 remains a complex disease in terms of its trajectory and the diversity of outcomes, rendering disease management and clinical resource allocation challenging. Varying symptomatology in older patients and the limitations of clinical scoring systems have created the need for more objective and consistent methods to aid clinical decision making. In this regard, machine learning methods have been shown to enhance prognostication while improving consistency. However, current machine learning approaches have been limited by small sample sizes and by a lack of generalisation across diverse patient populations and across patients admitted during different waves.
Objectives
We sought to investigate whether machine learning models, derived from routinely collected clinical data, can generalise well i) between European countries, ii) between European patients admitted during different COVID-19 waves, and iii) between geographically diverse patients, namely whether a model derived from the European patient cohort can be used to predict outcomes of patients admitted to Asian, African and American ICUs.
Methods
We compared Logistic Regression, Feed Forward Neural Network and XGBoost algorithms on data from 3,933 older patients with a confirmed COVID-19 diagnosis for predicting three outcomes: ICU mortality, 30-day mortality and low risk of deterioration. The patients were admitted to ICUs located in 37 countries between January 11, 2020, and April 27, 2021.
Results
The XGBoost model derived from the European cohort and externally validated in cohorts of Asian, African, and American patients achieved an AUC of 0.89 (95% CI 0.89–0.89) in predicting ICU mortality, an AUC of 0.86 (95% CI 0.86–0.86) in predicting 30-day mortality and an AUC of 0.86 (95% CI 0.86–0.86) in predicting low-risk patients.
Similar AUC performance was also achieved when predicting outcomes between European countries and between pandemic waves, and the models showed high calibration quality. Furthermore, saliency analysis showed that FiO2 values of up to 40% do not appear to increase the predicted risk of ICU and 30-day mortality, while PaO2 values of 75 mmHg or lower are associated with a sharp increase in the predicted risk of ICU and 30-day mortality. Lastly, increases in SOFA score also increase the predicted risk, but only up to a score of 8. Beyond this score, the predicted risk remains consistently high.
Conclusion
The models captured both the dynamic course of the disease and the similarities and differences between the diverse patient cohorts, enabling prediction of disease severity and identification of low-risk patients, and potentially supporting effective planning of essential clinical resources.
Trial registration number
NCT04321265.
Author summary
COVID-19 remains a complex disease, making it challenging to estimate the risk of deterioration of critically ill patients and, consequently, to allocate clinical resources such as ventilators. As a result, there is a need to support clinical decision making through objective methods and to address some of the limitations of current clinical scoring systems. In response, we developed machine learning models using routine clinical data of patients from 37 countries worldwide, including 18 European countries.
We find that i) machine learning models can predict outcomes in patients from diverse European countries, that is, which patients have a low risk of deterioration and which may require increased care, so that resources can be allocated efficiently; ii) routine clinical data from European patients can be used to predict outcomes in non-European patients, namely those admitted to Asian, African, and American intensive care units, without significantly affecting performance; and iii) routine clinical data collected during the first COVID-19 pandemic wave can be used to predict the risk of deterioration of patients admitted during subsequent waves. Our study is the first step towards improving standardisation and equity of critical care across healthcare institutions and, further afield, across diverse countries and territories.
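The external-validation results above are reported as an AUC with a 95% confidence interval. The abstract does not say how those intervals were computed, so the following is only a minimal sketch under an assumed approach: a percentile bootstrap over the patients of an external cohort. The function names `auc` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus an assumed percentile-bootstrap confidence interval."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Toy data only: 1 = died in ICU, 0 = survived; scores are predicted risks
# from a model fitted on a different (development) cohort.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point, lower, upper = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

In an external-validation setting the model would be fitted on the development cohort only, and `labels` and `scores` would come from the held-out cohort, so the interval reflects sampling variability in the external patients rather than in the training data.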