1. A novel K-nearest neighbor classifier for lung cancer disease diagnosis.
- Author
- Sachdeva, Ravi Kumar; Bathla, Priyanka; Rani, Pooja; Lamba, Rohit; Ghantasala, G. S. Pradeep; Nassar, Ibrahim F.
- Subjects
- *K-nearest neighbor classification, *SUPPORT vector machines, *LUNG cancer, *LUNG diseases, *RANDOM forest algorithms, *PEARSON correlation (Statistics)
- Abstract
- One of the world's deadliest diseases is lung cancer. Based on a few features, machine learning techniques can help in the diagnosis of lung cancer. The performance of several classifiers: support vector machine (SVM), logistic regression (LR), Naïve Bayes (NB), random forest (RF), and K-nearest neighbor (KNN), was evaluated by the authors using the dataset available on Kaggle to create a systematic approach for the diagnosis of lung cancer disease based on readily observable signs and historical medical data without the requirement of CT scan images. The authors have proposed a novel approach for classification called Pearson correlation weighted KNN (PCWKNN), which is a modified version of KNN and uses Pearson correlation coefficient values to determine weights in a weighted KNN. The performance of the classifiers was evaluated using the hold-out validation method. SVM, LR, and RF were 96.77% accurate. NB obtained 95.16% accuracy. KNN achieved 91.93% accuracy. PCWKNN outperformed the employed classifiers and obtained an accuracy of 98.39%. Addressing the imperative for improved model generalization, the researchers utilized PCWKNN on an alternative, more extensive lung cancer dataset and subsequently broadened its application to diverse diseases, including the brain stroke dataset. The encouraging outcomes underscore PCWKNN's resilience and adaptability, suggesting its viability for real-world implementation. [ABSTRACT FROM AUTHOR]
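The abstract says only that PCWKNN uses Pearson correlation coefficient values as the weights in a weighted KNN; it does not state how the weights enter the classifier. The sketch below is therefore one possible reading, not the authors' implementation: it assumes each feature is weighted by the absolute value of its Pearson correlation with the class label, and that these weights scale a Euclidean distance before a majority vote. The function names, the toy data, and the choice of k are all hypothetical.

```python
import numpy as np


def pearson_feature_weights(X, y):
    """Return |Pearson r| between each feature column of X and the label y.

    Assumption: the abstract does not say which correlations are used;
    feature-to-label correlation is one plausible choice.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center every feature
    yc = y - y.mean()                # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; give them weight 0 instead of NaN.
    safe = np.where(denom > 0, denom, 1.0)
    r = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / safe, 0.0)
    return np.abs(r)


def weighted_knn_predict(X_train, y_train, X_test, weights, k=3):
    """Majority-vote KNN using a feature-weighted Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in X_test:
        # Each squared feature difference is scaled by that feature's weight.
        dist = np.sqrt((weights * (X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Hypothetical toy data: feature 0 tracks the label, feature 1 is unrelated.
X_train = np.array([[0.0, 0.0], [0.0, 10.0], [1.0, 0.0], [1.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.2, 10.0], [0.9, 0.0]])

weights = pearson_feature_weights(X_train, y_train)
predictions = weighted_knn_predict(X_train, y_train, X_test, weights, k=3)
```

In this reading, a feature that is uncorrelated with the label contributes nothing to the distance, so the neighbor search is driven by the informative features. Other readings, such as using the correlations to weight neighbor votes, would be equally consistent with the abstract.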
- Published
- 2024