
Using random forest and biomarkers for differentiating COVID-19 and Mycoplasma pneumoniae infections.

Authors :
Zhou X
Zhang J
Deng XM
Fu FM
Wang JM
Zhang ZY
Zhang XQ
Luo YX
Zhang SY
Source :
Scientific reports [Sci Rep] 2024 Sep 30; Vol. 14 (1), pp. 22673. Date of Electronic Publication: 2024 Sep 30.
Publication Year :
2024

Abstract

The COVID-19 pandemic has underscored the critical need for precise diagnostic methods to distinguish between similar respiratory infections, such as COVID-19 and Mycoplasma pneumoniae (MP). Identifying key biomarkers and applying machine learning techniques such as random forest analysis can significantly improve diagnostic accuracy. We conducted a retrospective analysis of clinical and laboratory data from 214 patients with acute respiratory infections, collected between October 2022 and October 2023 at the Second Hospital of Nanping. Patients were categorized into three groups: COVID-19 positive (n = 52), MP positive (n = 140), and co-infected (n = 22). Key biomarkers, including C-reactive protein (CRP), procalcitonin (PCT), interleukin-6 (IL-6), and white blood cell (WBC) count, were evaluated. Correlation analyses were conducted to assess relationships between biomarkers within each group, and random forest analysis was applied to evaluate the discriminative power of these biomarkers. The random forest model demonstrated high classification performance, with areas under the ROC curve (AUCs) of 0.86 (95% CI: 0.70-0.97) for COVID-19, 0.79 (95% CI: 0.64-0.92) for MP, 0.69 (95% CI: 0.50-0.87) for co-infection, and 0.90 (95% CI: 0.83-0.95) for the micro-average ROC. The micro-average area under the precision-recall curve was 0.80 (95% CI: 0.69-0.91). Confusion matrices showed an overall accuracy of 0.77 and highlighted biomarker relationships. SHAP feature-importance analysis identified age (0.27), CRP (0.25), IL-6 (0.14), and PCT (0.14) as the most important predictors. Integrating computational methods, particularly random forest analysis, into the evaluation of clinical and biomarker data is a promising approach for enhancing diagnostic processes for infectious diseases.
Our findings support the use of specific biomarkers to differentiate between COVID-19 and MP, potentially leading to more targeted and effective diagnostic strategies. This study highlights the potential of machine learning techniques to improve disease classification in the era of precision medicine. (© 2024. The Author(s).)
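The abstract reports per-class AUCs, a micro-average AUC, and a confusion-matrix accuracy for a three-class problem (COVID-19, MP, co-infection). The following is a minimal, standard-library Python sketch of how those metrics relate, not the authors' code. It assumes each patient has one score per class, for example class probabilities such as those returned by scikit-learn's `RandomForestClassifier.predict_proba`; the abstract does not name the software used. The functions `binary_auc`, `per_class_auc`, `micro_average_auc`, `confusion_matrix`, and `accuracy`, and the four toy patients, are hypothetical illustrations and are not study data. Confidence intervals are not computed here.

```python
from typing import Dict, List, Sequence

def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a random
    positive outscores a random negative, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def per_class_auc(y_true, proba, classes) -> Dict[str, float]:
    """One-vs-rest AUC for each class, using that class's score column."""
    return {
        cls: binary_auc([1 if y == cls else 0 for y in y_true],
                        [row[i] for row in proba])
        for i, cls in enumerate(classes)
    }

def micro_average_auc(y_true, proba, classes) -> float:
    """Micro-average: binarize labels one-vs-rest, pool every
    (patient, class) pair into one binary problem, compute a single AUC."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for label, row in zip(y_true, proba):
        for cls, score in zip(classes, row):
            flat_labels.append(1 if label == cls else 0)
            flat_scores.append(score)
    return binary_auc(flat_labels, flat_scores)

def confusion_matrix(y_true, proba, classes) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes (highest score)."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for label, row in zip(y_true, proba):
        predicted = max(range(len(classes)), key=lambda i: row[i])
        matrix[index[label]][predicted] += 1
    return matrix

def accuracy(matrix: List[List[int]]) -> float:
    """Overall accuracy: diagonal (correct) counts over all patients."""
    total = sum(sum(r) for r in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Hypothetical toy scores for four patients (NOT data from the study).
classes = ["COVID-19", "MP", "co-infection"]
y_true = ["COVID-19", "MP", "MP", "co-infection"]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
]

print(micro_average_auc(y_true, proba, classes))  # 0.90625
print(per_class_auc(y_true, proba, classes))
cm = confusion_matrix(y_true, proba, classes)
print(cm)            # [[1, 0, 0], [0, 2, 0], [0, 1, 0]]
print(accuracy(cm))  # 0.75
```

Pooling all patient-class pairs means the micro-average is dominated by whichever classes contribute the most pairs. With a skewed split like 52/140/22, it can therefore sit above the weakest per-class AUC, as the co-infection value of 0.69 versus the micro-average of 0.90 in the abstract illustrates.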

Details

Language :
English
ISSN :
2045-2322
Volume :
14
Issue :
1
Database :
MEDLINE
Journal :
Scientific reports
Publication Type :
Academic Journal
Accession number :
39349769
Full Text :
https://doi.org/10.1038/s41598-024-74057-5