Classifying Dry Eye Disease Patients from Healthy Controls Using Machine Learning and Metabolomics Data.

Authors :
Amouei Sheshkal, Sajad
Gundersen, Morten
Alexander Riegler, Michael
Aass Utheim, Øygunn
Gunnar Gundersen, Kjell
Rootwelt, Helge
Prestø Elgstøen, Katja Benedikte
Lewi Hammer, Hugo
Source :
Diagnostics (2075-4418). Dec 2024, Vol. 14 Issue 23, p2696. 19p.
Publication Year :
2024

Abstract

Background: Dry eye disease is a common disorder of the ocular surface, leading patients to seek eye care. Clinical signs and symptoms are currently used to diagnose dry eye disease. Metabolomics, a method for analyzing biological systems, has been found helpful in identifying distinct metabolites in patients and in detecting metabolic profiles that may indicate dry eye disease at early stages. In this study, we explored the use of machine learning and metabolomics data to identify cataract patients who suffer from dry eye disease, a topic that, to our knowledge, has not been previously explored. As there is no one-size-fits-all machine learning model for metabolomics data, choosing the most suitable model can significantly affect the quality of predictions and subsequent metabolomics analyses. Methods: To address this challenge, we conducted a comparative analysis of eight machine learning models on two metabolomics data sets from cataract patients with and without dry eye disease. The models were evaluated and optimized using nested k-fold cross-validation. To assess the performance of these models, we selected a set of suitable evaluation metrics tailored to the data set's challenges. Results: The logistic regression model performed best overall, achieving the highest area under the curve score of 0.8378, a balanced accuracy of 0.735, a Matthews correlation coefficient of 0.5147, an F1-score of 0.8513, and a specificity of 0.5667. The XGBoost and Random Forest models also demonstrated good performance, ranking behind logistic regression. Conclusions: The results show that the logistic regression model with L2 regularization can outperform more complex models on an imbalanced data set with a small sample size and a high number of features, while also avoiding overfitting and delivering consistent performance across cross-validation folds. Additionally, the results demonstrate that it is possible to identify dry eye in cataract patients from tear film metabolomics data using machine learning models. [ABSTRACT FROM AUTHOR]
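
The evaluation scheme named in the abstract (nested k-fold cross-validation of an L2-regularized logistic regression, scored with AUC, balanced accuracy, Matthews correlation coefficient, F1-score, and specificity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic data from `make_classification`, the fold counts, the grid of `C` values, the 0.5 decision threshold, and the function name `nested_cv_scores` are all assumptions chosen for illustration, and the other seven models are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_scores(X, y, outer_splits=5, inner_splits=3, seed=0):
    """Nested CV: the inner loop tunes C, the outer loop estimates performance."""
    outer = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    # LogisticRegression applies an L2 penalty by default; C is its inverse strength.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid

    fold_scores = []
    for train_idx, test_idx in outer.split(X, y):
        # Hyperparameters are selected using the outer-training portion only,
        # so the outer-test fold never influences model selection.
        search = GridSearchCV(pipe, grid, cv=inner, scoring="roc_auc")
        search.fit(X[train_idx], y[train_idx])

        prob = search.predict_proba(X[test_idx])[:, 1]
        pred = (prob >= 0.5).astype(int)  # assumed decision threshold
        tn, fp, fn, tp = confusion_matrix(y[test_idx], pred, labels=[0, 1]).ravel()
        fold_scores.append({
            "auc": roc_auc_score(y[test_idx], prob),
            "balanced_accuracy": balanced_accuracy_score(y[test_idx], pred),
            "mcc": matthews_corrcoef(y[test_idx], pred),
            "f1": f1_score(y[test_idx], pred),
            "specificity": tn / (tn + fp),
        })
    # Average each metric over the outer folds.
    return {k: float(np.mean([s[k] for s in fold_scores])) for k in fold_scores[0]}


# Synthetic stand-in for a small, imbalanced, high-dimensional data set.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10,
    weights=[0.3, 0.7], random_state=0,
)
scores = nested_cv_scores(X, y)
print(scores)
```

Reporting specificity and balanced accuracy alongside F1 matters on imbalanced data: a classifier that favors the majority class can reach a high F1-score while still misclassifying many minority-class samples, which is the pattern a low specificity would reveal.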

Details

Language :
English
ISSN :
20754418
Volume :
14
Issue :
23
Database :
Academic Search Index
Journal :
Diagnostics (2075-4418)
Publication Type :
Academic Journal
Accession number :
181654176
Full Text :
https://doi.org/10.3390/diagnostics14232696