1. Analysis of different machine learning classifiers on MP election commission and breast cancer big dataset
- Author
- Priyank Jain and Shriya Sahu
- Subjects
010302 applied physics, Learning classifier system, Computer science, business.industry, Big data, Decision tree, 02 engineering and technology, General Medicine, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, 01 natural sciences, Class (biology), Support vector machine, Naive Bayes classifier, ComputingMethodologies_PATTERNRECOGNITION, Software, 0103 physical sciences, Scalability, Artificial intelligence, 0210 nano-technology, business, computer
- Abstract
This paper is a unique effort to resolve the scaling issues of machine learning using the multi-node environment of Big Data. For this purpose, it combines machine learning with big data. Machine learning is a branch of Artificial Intelligence that trains computers to learn without being explicitly programmed. It concerns the development of computer programs and software whose behavior depends on the input dataset. In machine learning, classification is used to identify the class of instances: a new observation is assigned to the category, class, or group to which it belongs on the basis of the training dataset. Training datasets contain instances whose class is already known. An algorithm that implements classification is known as a classifier. The main aim of this paper is to evaluate how different classifiers perform on two different datasets: a real-time dataset of MP Election nomination (2018) and the standard Breast Cancer dataset released by the University of Wisconsin (1995). Various classifiers are available, such as Decision Trees, K-Nearest Neighbors, Support Vector Machine, Logistic Regression, and Naive Bayes. Different classifiers behave differently on different datasets, giving different accuracy depending on the kind and size of the dataset, so we used several classifiers in our work. When we applied the machine learning classifiers to the real-time dataset, i.e. the MPSEC dataset, accuracy went down. In contrast, when the same classifiers were applied to the standard Breast Cancer dataset, the results showed outstanding accuracy. This research work reveals interesting patterns in the comparison of a real-time and a standard dataset using different machine learning classifiers in combination with big data. It also incorporates parallelization using a multi-node environment, which is able to deal with scalability issues.
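To make the classification idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It fits two of the classifier families named above (K-Nearest Neighbors and Naive Bayes) on a training set whose classes are already known, then compares their accuracy on held-out instances. The data points, the names `TRAIN`, `TEST`, `knn_predict`, `fit_gaussian_nb`, `nb_predict`, and `accuracy`, and the choice of `k=3` are all assumptions made for illustration. The data is a hypothetical toy set, not the MPSEC or Breast Cancer dataset, and the code runs on a single machine rather than in a multi-node big data environment.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT from the MPSEC or Breast Cancer datasets).
# Each instance is (features, known_class).
TRAIN = [
    ((1.0, 1.2), "A"), ((1.5, 0.8), "A"), ((0.8, 1.0), "A"), ((1.2, 1.5), "A"),
    ((4.0, 4.2), "B"), ((4.5, 3.8), "B"), ((3.8, 4.0), "B"), ((4.2, 4.5), "B"),
]
TEST = [((1.1, 1.0), "A"), ((4.1, 4.1), "B"), ((0.9, 1.4), "A"), ((4.4, 4.0), "B")]


def knn_predict(train, x, k=3):
    """Assign x the majority class among its k nearest training instances."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate a prior, per-feature means, and per-feature variances per class."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(train), means, variances)
    return model


def nb_predict(model, x):
    """Return the class with the highest Gaussian Naive Bayes log-posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of test instances whose predicted class matches the known class."""
    return sum(predict(x) == label for x, label in test) / len(test)


nb_model = fit_gaussian_nb(TRAIN)
knn_acc = accuracy(lambda x: knn_predict(TRAIN, x), TEST)
nb_acc = accuracy(lambda x: nb_predict(nb_model, x), TEST)
print(f"k-NN accuracy: {knn_acc:.2f}")
print(f"Naive Bayes accuracy: {nb_acc:.2f}")
```

On cleanly separated toy data like this, both classifiers agree. The abstract's point is that on a noisier real-time dataset the same procedure can yield lower and more divergent accuracies, which is why several classifiers are compared.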
- Published
- 2023