Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

Authors :
Crabtree NM
Moore JH
Bowyer JF
George NI
Source :
BioData mining [BioData Min] 2017 Apr 24; Vol. 10, pp. 13. Date of Electronic Publication: 2017 Apr 24 (Print Publication: 2017).
Publication Year :
2017

Abstract

Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES.

Results: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small.

Conclusion: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
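
Illustrative note (not part of the published abstract): the stability criterion named in the Results can be sketched as follows. This is a minimal sketch that assumes the common set-based form of the Tanimoto distance, 1 - |A ∩ B| / |A ∪ B|; the exact formula used in the paper may differ. The function names and gene labels are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Set-based Tanimoto distance: 1 - |A & B| / |A | B|.

    Assumed definition for illustration; 0.0 means identical
    feature sets, 1.0 means no features in common.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of feature sets.

    Lower values indicate that repeated runs select more
    similar features, i.e. a more stable selection.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets selected in three repeated runs.
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneA", "geneB", "geneC"},
]
stability = mean_pairwise_distance(runs)
```

Under this definition, an algorithm that selects the same features on every resampled run scores 0, so smaller values correspond to a more stable feature set.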

Details

Language :
English
ISSN :
1756-0381
Volume :
10
Database :
MEDLINE
Journal :
BioData mining
Publication Type :
Academic Journal
Accession number :
28450890
Full Text :
https://doi.org/10.1186/s13040-017-0134-8