1. Combination of ADASYN and random forest for classification of imbalanced lung cancer dataset.
- Author
- Pulungan, Annisa Fadhillah; Selvida, Desilia; and Silitonga, Agnes Irene
- Subjects
- *LUNG cancer, *ERROR rates, *RANDOM forest algorithms, *CAUSES of death, *ETIOLOGY of cancer, *CLASSIFICATION, *DEATH rate
- Abstract
- Cancer is one of the main causes of death in the world, and lung cancer is one of its forms. According to the World Health Organization, in 2014 the death rate from lung cancer in Indonesia was 21.8% in men and 9.1% in women, with 30,865 deaths from lung cancer each year across men and women. Many computational studies of lung cancer have been carried out, including ones that apply machine learning to lung cancer detection. One obstacle is class imbalance: the amounts of data for patients and non-patients are unequal, so an approach is needed to overcome this imbalance. One such method is ADASYN, which this study combines with the Random Forest classification algorithm. The study compares the performance of the Random Forest classification model before and after the ADASYN sampling method is used in the training process. The results show that the performance of the Random Forest model improved after ADASYN sampling was applied, with an AUC of 0.859 and an error rate of 4.9%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
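The oversampling step named in the abstract can be illustrated with a minimal, standard-library sketch of the ADASYN (Adaptive Synthetic Sampling) idea. It is not the authors' implementation: the function name `adasyn_samples`, its parameters, and the toy grid data are hypothetical, and interpolating only between minority-class neighbours is a simplifying assumption. The sketch gives each minority point a weight equal to the share of majority points among its `k` nearest neighbours, then generates more synthetic points around the minority points that are harder to learn.

```python
import math
import random


def adasyn_samples(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority points generated in the ADASYN style (sketch)."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_min = len(minority_pts)
    n_maj = len(X) - n_min
    # Total number of synthetic points needed; beta=1.0 aims at full balance.
    total = round((n_maj - n_min) * beta)
    if total <= 0 or n_min < 2:
        return []

    # Difficulty of each minority point: share of majority points among
    # its k nearest neighbours in the whole data set.
    ratios = []
    for p in minority_pts:
        nearest = sorted(
            (math.dist(p, q), label != minority)
            for q, label in zip(X, y) if q is not p
        )[:k]
        ratios.append(sum(is_major for _, is_major in nearest) / len(nearest))

    weight_sum = sum(ratios)
    if weight_sum == 0:
        return []  # no minority point has a majority neighbour

    synthetic = []
    for p, ratio in zip(minority_pts, ratios):
        # Harder points (higher ratio) receive more synthetic neighbours.
        n_new = round(ratio / weight_sum * total)
        # Assumption: interpolate towards the k nearest minority points only.
        mates = sorted(
            (q for q in minority_pts if q is not p),
            key=lambda q: math.dist(p, q),
        )[:k]
        for _ in range(n_new):
            q = rng.choice(mates)
            lam = rng.random()  # position on the segment from p to q
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic


# Toy data (hypothetical): a 6x6 grid of majority points, 4 minority points.
majority = [[float(i), float(j)] for i in range(6) for j in range(6)]
minority = [[2.5, 2.5], [2.6, 2.6], [2.4, 2.5], [3.5, 3.5]]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

synthetic = adasyn_samples(X, y)
print(f"majority={len(majority)} minority={len(minority)} synthetic={len(synthetic)}")
```

In a workflow like the one the abstract describes, the synthetic points would be added to the training split only, and the classifier would then be evaluated on untouched test data. The record does not state which software the authors used; library implementations such as the `ADASYN` class in imbalanced-learn and `RandomForestClassifier` in scikit-learn are one possible choice.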