Feature-specific quantile normalization and feature-specific mean–variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data.

Authors :
Skubleny, Daniel
Ghosh, Sunita
Spratlin, Jennifer
Schiller, Daniel E.
Rayat, Gina R.
Source :
BMC Bioinformatics. 3/29/2024, Vol. 25 Issue 1, p1-14. 14p.
Publication Year :
2024

Abstract

Background: Cross-platform normalization seeks to minimize technological bias between microarray and RNAseq whole-transcriptome data. Incorporating multiple gene expression platforms permits external validation of experimental findings and augments training sets for machine learning models. Here, we compare the performance of Feature Specific Quantile Normalization (FSQN) to a previously used but unvalidated and uncharacterized method we label as Feature Specific Mean Variance Normalization (FSMVN). We evaluate the performance of these methods for bidirectional normalization in the context of nested feature selection. Results: FSQN and FSMVN provided clinically equivalent bidirectional model performance with and without feature selection for colon CMS and breast PAM50 classification. Using principal component analysis, we determine that these methods eliminate batch effects related to technological platforms. Without feature selection, no statistical difference was identified between the performance of FSQN and FSMVN on cross-platform data and the performance of within-platform distributions. Under optimal feature selection conditions, the balanced accuracy of FSQN and FSMVN was statistically equivalent to the within-platform distribution performance in multivariable linear regression analysis. FSQN and FSMVN also provided similar performance to within-platform distributions as the number of selected genes used to create models decreases. Conclusions: In the context of generating supervised machine learning classifiers for molecular subtypes, FSQN and FSMVN are equally effective. Under optimal modeling conditions, FSQN and FSMVN provide equivalent model accuracy performance on cross-platform normalized data compared to within-platform data. Using cross-platform data should still be approached with caution, as subtle performance differences may exist depending on the classification problem and the training and testing distributions. [ABSTRACT FROM AUTHOR]
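
The abstract names the two normalization methods without giving their formulas. The sketch below is one plausible reading based on the method names alone, not the authors' implementation. It assumes FSMVN standardizes each gene in the data being normalized and rescales it to the mean and standard deviation of that gene in the target platform, and that FSQN maps each gene onto the empirical quantiles of that gene in the target platform. The function names `fsmvn` and `fsqn`, the samples-by-genes list layout, and the linear interpolation between target quantiles are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fsmvn(source, target):
    """Assumed FSMVN: per feature (column), z-score the source values, then
    rescale them to the target feature's mean and standard deviation."""
    n_features = len(source[0])
    out = [[0.0] * n_features for _ in source]
    for j in range(n_features):
        s_col = [row[j] for row in source]
        t_col = [row[j] for row in target]
        s_mu, s_sd = mean(s_col), pstdev(s_col)
        t_mu, t_sd = mean(t_col), pstdev(t_col)
        for i, x in enumerate(s_col):
            # A constant source feature has no spread; map it to the target mean.
            z = (x - s_mu) / s_sd if s_sd > 0 else 0.0
            out[i][j] = z * t_sd + t_mu
    return out


def fsqn(source, target):
    """Assumed FSQN: per feature (column), replace each source value with the
    target feature's value at the same empirical quantile (by rank)."""
    n_features = len(source[0])
    out = [[0.0] * n_features for _ in source]
    for j in range(n_features):
        s_col = [row[j] for row in source]
        t_sorted = sorted(row[j] for row in target)
        n, m = len(s_col), len(t_sorted)
        # Sample indices ordered by source value; ties keep their input order.
        order = sorted(range(n), key=s_col.__getitem__)
        for rank, i in enumerate(order):
            q = rank / (n - 1) if n > 1 else 0.5
            pos = q * (m - 1)
            lo = int(pos)
            hi = min(lo + 1, m - 1)
            frac = pos - lo
            # Linear interpolation between neighbouring target order statistics.
            out[i][j] = t_sorted[lo] * (1 - frac) + t_sorted[hi] * frac
    return out


# Toy data: rows are samples, columns are genes (values are made up).
rnaseq = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
microarray = [[100.0, 5.0], [200.0, 5.5], [300.0, 6.0], [400.0, 6.5]]

rnaseq_mv = fsmvn(rnaseq, microarray)
rnaseq_qn = fsqn(rnaseq, microarray)
```

Both functions operate one gene at a time, which is what "feature-specific" suggests. Under these assumptions, swapping the two arguments gives the opposite normalization direction, in line with the bidirectional setting the abstract evaluates.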

Details

Language :
English
ISSN :
14712105
Volume :
25
Issue :
1
Database :
Academic Search Index
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
176339204
Full Text :
https://doi.org/10.1186/s12859-024-05759-w