Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification
- Source :
- BioData Mining
- Publication Year :
- 2016
- Publisher :
- Springer Science and Business Media LLC, 2016.
Abstract
- Background: An imbalanced dataset is a training dataset in which the interesting and uninteresting classes are represented in unequal proportions. In biomedical applications, samples from the class of interest, such as medical anomalies, positive clinical tests, and particular diseases, are often rare in a population. Because the target samples in the original dataset are few in number, a classification model induced from such training data predicts poorly, as it receives insufficient training on the minority class. Results: In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling within a swarm optimisation algorithm and adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with other versions of the SMOTE algorithm, ASCB_DmSMOTE shows significant improvements, including higher accuracy and credibility. Conclusions: Our proposed method combines two rebalancing techniques. It re-allocates the majority class in detail and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility along with greater accuracy of the classification model.
- Subjects :
- Computer science
Population
Binary number
Scale (descriptive set theory)
02 engineering and technology
Machine learning
Biochemistry
Imbalanced dataset
Biomedical data
020204 information systems
Credibility
0202 electrical engineering, electronic engineering, information engineering
Genetics
Oversampling
education
Molecular Biology
SMOTE
Research
Swarm behaviour
Swarm optimisation
Classification
Class (biology)
Computer Science Applications
Computational Mathematics
ComputingMethodologies_PATTERNRECOGNITION
Computational Theory and Mathematics
Dynamic Multi-objective
020201 artificial intelligence & image processing
Artificial intelligence
Data mining
Under-sampling
Algorithm
Details
- ISSN :
- 17560381
- Volume :
- 9
- Database :
- OpenAIRE
- Journal :
- BioData Mining
- Accession number :
- edsair.doi.dedup.....b150ec597635415b618c11b7cc17f0de
- Full Text :
- https://doi.org/10.1186/s13040-016-0117-1