
LASSO‐based false‐positive selection for class‐imbalanced data in metabolomics.

Authors :
Fu, Guang‐Hui
Yi, Lun‐Zhao
Pan, Jianxin
Source :
Journal of Chemometrics. Oct2019, Vol. 33 Issue 10, pN.PAG-N.PAG. 1p.
Publication Year :
2019

Abstract

Feature selection and rebalancing are two preprocessing approaches in class‐imbalanced learning. Many LASSO‐type feature selection methods and applications have been reported recently, but most of them are not designed specifically for class‐imbalanced data. In this study, we proposed a LASSO‐based stable feature selection algorithm for class‐imbalanced data analysis, and we computed the false‐positive selection (FPS) under balanced and imbalanced settings from the selection frequency of each predictor during stable selection. Results on simulation studies and real data examples show that, when the data are highly correlated, class imbalance helps avoid the overselection caused by LASSO, and in most cases a lower FPS is obtained with class‐imbalanced data than with balanced data under the same settings. A statistical explanation for this phenomenon is given. In addition, class‐imbalanced data do not need to be rebalanced before performing such LASSO‐based feature selection with a stable strategy, and, to some degree, intentionally unbalancing balanced data could be an alternative strategy to weaken overselection and to perform biomarker identification that finds a few of the most important biomarkers.

A LASSO‐based stable feature selection algorithm was proposed for imbalanced data. With this stable feature selection strategy, it is not necessary to rebalance imbalanced data. Class imbalance even helps avoid the overselection caused by LASSO when the data are strongly correlated. [ABSTRACT FROM AUTHOR]
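The abstract describes the procedure only at a high level. The Python sketch below shows one plausible reading of "LASSO‐based stable selection with FPS from selection frequencies", under assumptions that are ours and not the authors': L1‐penalized logistic regression fitted by proximal gradient descent (ISTA), class‐stratified half‐subsamples, a selection‐frequency threshold of 0.6, and FPS taken as the share of truly irrelevant predictors that reach that threshold in a simulation with known informative predictors. The function names (`lasso_logistic`, `selection_frequency`, `false_positive_selection`, `simulate`) and all parameter values are hypothetical; this is not the published algorithm.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||beta||_1; the intercept is not penalized.
    Returns (intercept, coefficient vector).
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])           # prepend an intercept column
    # The mean log-loss gradient is Lipschitz with constant ||Xa||_2^2 / (4n).
    step = 4.0 * n / (np.linalg.norm(Xa, 2) ** 2)
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (Xa @ w)))   # numerically stable sigmoid
        w = w - step * (Xa.T @ (prob - y)) / n          # gradient step on all weights
        w[1:] = soft_threshold(w[1:], step * lam)       # shrink only the slopes
    return w[0], w[1:]


def selection_frequency(X, y, lam, n_subsamples=50, frac=0.5, seed=0):
    """Share of subsamples in which each predictor gets a nonzero LASSO coefficient.

    Assumption: subsampling is stratified by class (y must contain both 0 and 1),
    so the minority class appears in every subsample.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    classes = [np.flatnonzero(y == c) for c in (0, 1)]
    for _ in range(n_subsamples):
        idx = np.concatenate([
            rng.choice(members, size=max(1, int(frac * members.size)), replace=False)
            for members in classes
        ])
        _, beta = lasso_logistic(X[idx], y[idx], lam)
        counts += beta != 0
    return counts / n_subsamples


def false_positive_selection(freq, true_support, threshold=0.6):
    """Hypothetical FPS: fraction of irrelevant predictors with frequency >= threshold."""
    null = np.setdiff1d(np.arange(freq.size), true_support)
    return float(np.mean(freq[null] >= threshold))


def simulate(n, p, n_true, coef, rho, minority_frac, seed=0):
    """Equicorrelated Gaussian predictors (pairwise correlation rho) and binary labels
    from a logistic model; the intercept shift targets about minority_frac positives."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n, 1))                  # common factor induces correlation
    X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.standard_normal((n, p))
    eta = X[:, :n_true] @ np.full(n_true, coef)           # first n_true columns are informative
    eta -= np.quantile(eta, 1.0 - minority_frac)          # shift toward the target positive rate
    prob = 0.5 * (1.0 + np.tanh(0.5 * eta))
    y = (rng.random(n) < prob).astype(float)
    return X, y


if __name__ == "__main__":
    # Same sample size and penalty; only the class balance changes.
    for minority in (0.5, 0.2):
        X, y = simulate(n=200, p=20, n_true=3, coef=2.0, rho=0.3, minority_frac=minority)
        freq = selection_frequency(X, y, lam=0.08)
        fps = false_positive_selection(freq, np.arange(3))
        print(f"minority fraction {minority}: FPS = {fps:.3f}")
```

The loop compares FPS on a balanced and an imbalanced simulated dataset of the same size. The printed values depend on the seed, the penalty `lam`, the correlation `rho`, and the threshold, so the sketch illustrates the mechanics of frequency‐based FPS rather than reproducing the paper's findings.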

Details

Language :
English
ISSN :
0886-9383
Volume :
33
Issue :
10
Database :
Academic Search Index
Journal :
Journal of Chemometrics
Publication Type :
Academic Journal
Accession number :
139211047
Full Text :
https://doi.org/10.1002/cem.3177