1. A class-imbalanced hybrid learning strategy based on Raman spectroscopy of serum samples for the diagnosis of hepatitis B, hepatitis C, and thyroid dysfunction.
- Author
- Leng H, Zhang Z, Chen C, and Chen C
- Subjects
- Humans, Thyroid Diseases diagnosis, Thyroid Diseases blood, Support Vector Machine, Algorithms, Machine Learning, Decision Trees, Spectrum Analysis, Raman methods, Hepatitis B diagnosis, Hepatitis B blood, Hepatitis C diagnosis, Hepatitis C blood
- Abstract
Computer-aided vibrational spectroscopy detection has achieved promising results in early disease diagnosis. However, in clinical medicine the number of available samples and the cost of spectral acquisition limit the data available for model training, and the amount of data varies greatly between diseases; both factors constrain the optimization and improvement of diagnostic models. In this study, vibrational spectroscopy data for three common diseases are selected as research objects, and experiments are conducted on the class imbalance that exists in medical data. Rather than relying on a single, independent method to address class imbalance in medical vibrational spectroscopy, the study considers the combined effect of multiple strategies. Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Decision Tree (DT) classifiers serve as baseline models on Raman spectroscopy medical datasets with different imbalance ratios. The performance of three strategies, Ensemble Learning, Feature Extraction, and Resampling, is evaluated on the class-imbalanced datasets using the G-mean and AUC metrics. The results show that each of the three strategies mitigates the negative impact of imbalanced learning. Building on this, we propose a hybrid ensemble classifier (HEC) that integrates resampling, feature extraction, and ensemble learning, and use it to verify the effectiveness of a hybrid learning strategy for the class imbalance problem. The G-mean and AUC values of the HEC method are 82.7 % and 83.12 % on the HBV dataset, 2.02 % and 1.98 % higher than the optimal strategy; 83.62 % and 83.76 % on the HCV dataset, 9.79 % and 8.47 % higher than the optimal strategy; and 77.56 % and 77.85 % on the thyroid dysfunction dataset, 6.92 % and 6.36 % higher than the optimal strategy.
The experimental results show that the HEC method achieves higher G-mean and AUC values than both the baseline classifiers and the optimal combinations using individual strategies. The HEC method can therefore effectively counteract the adverse effects of imbalanced learning and is expected to be applicable to a wider range of imbalance scenarios.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024. Published by Elsevier B.V.)
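The abstract reports G-mean and AUC rather than plain accuracy because accuracy rewards a classifier that simply predicts the majority class. The sketch below uses the standard textbook definitions (G-mean as the geometric mean of sensitivity and specificity; AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). It is not the authors' code: the binary labelling with the disease class as `1`, the function names, and the toy data are illustrative assumptions.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FN, TN, FP) for a binary task; `positive` marks the disease class (assumed label 1)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2.

    This is the Mann-Whitney U statistic divided by n_pos * n_neg; the O(n^2)
    pairwise loop is kept for clarity rather than speed.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy imbalanced set (hypothetical): 18 healthy (0) and 2 diseased (1) samples.
    y_true = [0] * 18 + [1] * 2
    # A trivial classifier that always predicts the majority class.
    y_pred = [0] * 20
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, g_mean(y_true, y_pred))  # prints 0.9 0.0
```

On the toy data the majority-class predictor reaches 90 % accuracy yet a G-mean of 0, because it never detects a diseased sample; this is the failure mode that imbalance-aware metrics expose.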
- Published
- 2024