Primal-Dual for Classification with Rejection (PD-CR): A novel method for classification and feature selection. An application in metabolomics studies
- Source :
- BMC Bioinformatics, BioMed Central, 2021, 22 (1), pp.594 (Pp 1-17), ⟨10.1186/s12859-021-04478-w⟩
- Publication Year :
- 2021
- Publisher :
- HAL CCSD, 2021.
Abstract
- Background: Supervised classification methods have been used for many years for feature selection in metabolomics and other omics studies. We developed a novel primal-dual based classification method (PD-CR) that can perform classification with rejection and feature selection on high-dimensional datasets. PD-CR projects data onto a low-dimensional space and performs classification by minimizing an appropriate quadratic cost. It simultaneously optimizes the selected features and the prediction accuracy with a new tailored, constrained primal-dual method. The primal-dual framework is general enough to encompass various robust losses and to allow for convergence analysis. Here, we compared PD-CR to two commonly used methods: Partial Least Squares Discriminant Analysis (PLS-DA) and Random Forests. We analyzed two metabolomics datasets: a urinary metabolomics dataset from lung cancer patients and healthy controls, and a metabolomics dataset obtained from frozen glial tumor samples with mutated or wild-type isocitrate dehydrogenase (IDH).
Results: PD-CR was more accurate than PLS-DA and Random Forests for classification on both metabolomics datasets. It also selected biologically relevant metabolites. PD-CR has the advantage of providing a confidence score for each prediction, which can be used to perform classification with rejection. This substantially reduces the false discovery rate.
Conclusion: The confidence score provided with PD-CR adds considerable value to the prediction, as it includes a metric that every physician implicitly uses when making a medical decision: the probability of making the wrong choice. So far, one of the main obstacles to the use of machine learning in medicine has been that, when it comes to health issues, it is harder to trust the decision of a machine learning method than that of a physician. We believe that providing a confidence score associated with the decision would make these new tools more convincing if used in routine clinical practice.
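The idea of classification with rejection mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PD-CR algorithm: it only assumes that some classifier already provides a per-class confidence score in [0, 1], and it abstains when the top score falls below a threshold. The function names, the threshold value, and the toy scores and labels are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def predict_with_rejection(class_scores: Sequence[float],
                           threshold: float) -> Optional[int]:
    """Return the index of the top-scoring class, or None to abstain."""
    best = max(range(len(class_scores)), key=lambda k: class_scores[k])
    return best if class_scores[best] >= threshold else None


def summarize(predictions: List[Optional[int]],
              labels: List[int]) -> Tuple[float, float]:
    """Accuracy on accepted samples and the fraction of rejected samples."""
    accepted = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    rejection_rate = 1 - len(accepted) / len(labels)
    if not accepted:
        return float("nan"), rejection_rate
    accuracy = sum(p == y for p, y in accepted) / len(accepted)
    return accuracy, rejection_rate


# Made-up confidence scores for four samples and two classes (illustration only).
toy_scores = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
toy_labels = [0, 1, 1, 0]

# Hypothetical threshold: predictions with confidence below 0.8 are rejected.
preds = [predict_with_rejection(s, threshold=0.8) for s in toy_scores]
accuracy, rejection_rate = summarize(preds, toy_labels)
```

In this toy setting the two low-confidence samples are rejected, so the remaining predictions are all correct at the cost of abstaining on half of the samples. Raising or lowering the threshold trades coverage against the error rate among accepted predictions.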
- Subjects :
- False discovery rate
Support Vector Machine
QH301-705.5
Computer science
[SDV]Life Sciences [q-bio]
Computer applications to medicine. Medical informatics
R858-859.7
Feature selection
Biochemistry
03 medical and health sciences
0302 clinical medicine
Dimension (vector space)
[STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
Structural Biology
[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]
Convergence (routing)
Partial least squares regression
Humans
Metabolomics
Biology (General)
Least-Squares Analysis
Molecular Biology
030304 developmental biology
0303 health sciences
Applied Mathematics
Research
Discriminant Analysis
Pattern recognition
Linear discriminant analysis
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Computer Science Applications
3. Good health
Random forest
[INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]
030220 oncology & carcinogenesis
Artificial intelligence
Details
- Language :
- English
- ISSN :
- 1471-2105
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics, BioMed Central, 2021, 22 (1), pp.594 (Pp 1-17), ⟨10.1186/s12859-021-04478-w⟩
- Accession number :
- edsair.doi.dedup.....f013bd75951e1907027edf9c9a9e1d61