1. Diagnosis of Breast Cancer Using Random Forests.
- Author
- Minnoor, Manas and Baths, Veeky
- Subjects
- RANDOM forest algorithms, SUPERVISED learning, MACHINE learning, CANCER diagnosis, SUPPORT vector machines, FEATURE selection
- Abstract
- Breast cancer was the most commonly diagnosed form of cancer in 2020. Early diagnosis of breast cancer results in a significant improvement in long-term survival rates. Current methods require consultation with experts, which is expensive and time-consuming and thus may not be accessible to all. This paper seeks to train and evaluate supervised machine learning models for the accurate and efficient detection of breast cancer. The Wisconsin Breast Cancer Database dataset describes 30 attributes of cell nuclei, including, but not limited to, their radius, texture, and concavity. It contains 569 instances, 212 of which are malignant tumors. The Random Forest algorithm outperforms other algorithms in classifying breast tumors as either malignant or benign and is thus selected as our primary model. It is trained on two different subsets of the dataset having 16 and 8 features, respectively, identified with the help of multiple feature selection methods. The Random Forest models are tested on a holdout set after hyperparameter tuning and achieve accuracies of 100% and 99.30%, respectively. The models are also compared with four other machine learning classification algorithms: Support Vector Machine (SVM), Decision Tree, Multilayer Perceptron, and K-Nearest Neighbors. The results confirm that Random Forest is the superior method for breast cancer diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
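The workflow summarized in the abstract (feature selection, Random Forest training, hyperparameter tuning, holdout evaluation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. It assumes scikit-learn and its bundled `load_breast_cancer` data, whose counts (569 instances, 30 features, 212 malignant) match those given in the abstract. The feature selector (`SelectKBest` with `f_classif`), the 25% holdout split, the parameter grid, and the helper name `run_sketch` are all assumptions made for illustration, since the record does not specify them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def run_sketch(n_features=16, seed=0):
    """Hypothetical helper: select features, tune a forest, score on a holdout set."""
    data = load_breast_cancer()
    X, y = data.data, data.target  # in scikit-learn's copy: 0 = malignant, 1 = benign

    # Assumed split: 25% holdout, stratified by class.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Feature selection sits inside the pipeline so it is refit on each
    # cross-validation fold and never sees the holdout data.
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=n_features)),
        ("forest", RandomForestClassifier(random_state=seed)),
    ])

    # Small illustrative grid; the values are assumptions.
    grid = {
        "forest__n_estimators": [100, 200],
        "forest__max_depth": [None, 5],
    }
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    holdout_accuracy = search.score(X_test, y_test)
    return X.shape, int((y == 0).sum()), search.best_params_, holdout_accuracy


if __name__ == "__main__":
    shape, n_malignant, best_params, acc = run_sketch(n_features=16)
    print(f"data shape: {shape}, malignant: {n_malignant}")
    print(f"best parameters: {best_params}")
    print(f"holdout accuracy: {acc:.4f}")
```

Calling `run_sketch(n_features=8)` would exercise the smaller of the two feature subsets mentioned in the abstract. Accuracies from this sketch depend on the split and the grid, so they should not be expected to reproduce the reported figures.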