
Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons

Authors :
Marcel Smid
Robert R. J. Coebergh van den Braak
Harmen J. G. van de Werken
Job van Riet
Anne van Galen
Vanja de Weerd
Michelle van der Vlugt-Daane
Sandra I. Bril
Zarina S. Lalmahomed
Wigard P. Kloosterman
Saskia M. Wilting
John A. Foekens
Jan N. M. IJzermans
on behalf of the MATCH study group
John W. M. Martens
Anieta M. Sieuwerts
Source :
BMC Bioinformatics, Vol 19, Iss 1, Pp 1-13 (2018)
Publication Year :
2018
Publisher :
BMC, 2018.

Abstract

Background: Current normalization methods for RNA-sequencing data allow either intersample comparison, to identify differentially expressed (DE) genes, or intrasample comparison, for the discovery and validation of gene signatures. Most studies that optimize normalization methods validate them on simulated data. We describe a new method, GeTMM, which allows both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates). Starting from the same read count data, we compared GeTMM with the most commonly used normalization methods, namely TMM (used by edgeR), RLE (used by DESeq2) and TPM. The comparison covered distributions, effect of RNA quality, subtype classification, recurrence score, recall of DE genes and correlation to RT-qPCR data.

Results: We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison, while GeTMM performed similarly to TMM- and RLE-normalized data in intersample comparisons. Recall of DE genes was comparable among the normalization methods, while GeTMM yielded the lowest number of false-positive DE genes. Remarkably, we observed only limited detrimental effects in samples with low RNA quality.

Conclusions: We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalently with regard to intersample normalization, using the same normalized data. These combined properties enhance not only the general usefulness of RNA-seq but also its comparability with the many array-based gene expression data sets in the public domain.
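The abstract contrasts between-sample methods that ignore gene length (TMM, RLE) with a length-aware within-sample unit (TPM). The Python sketch below shows one way a gene-length correction can be combined with TMM. It assumes that GeTMM computes reads per kilobase (RPK) for each gene and then applies TMM normalization to the RPK values instead of to raw counts; the abstract itself does not spell out these steps. All function names (`rpk`, `tpm`, `pick_reference`, `tmm_factors`, `getmm`) are hypothetical. `tmm_factors` is a simplified reimplementation in the spirit of edgeR's TMM, trimming 30% of M-values and 5% of A-values as edgeR's defaults do. It is not a drop-in replacement for `edgeR::calcNormFactors`, and the reference-sample rule in `pick_reference` is an assumed upper-quartile heuristic. The toy counts are random and are not data from the study.

```python
import numpy as np


def rpk(counts, lengths_bp):
    """Reads per kilobase: divide each gene's counts (rows) by its length in kb."""
    return counts / (lengths_bp[:, None] / 1e3)


def tpm(counts, lengths_bp):
    """TPM: length-normalize first, then scale each sample (column) to sum to 1e6."""
    x = rpk(counts, lengths_bp)
    return x / x.sum(axis=0) * 1e6


def pick_reference(x):
    """Assumed heuristic: sample whose upper quartile is closest to the mean one."""
    uq = np.quantile(x / x.sum(axis=0), 0.75, axis=0)
    return int(np.argmin(np.abs(uq - uq.mean())))


def tmm_factors(x, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM normalization factors for a genes x samples matrix.

    Sketch only; not edgeR::calcNormFactors.
    """
    lib = x.sum(axis=0)
    factors = np.ones(x.shape[1])
    for j in range(x.shape[1]):
        obs, r = x[:, j], x[:, ref]
        keep = (obs > 0) & (r > 0)            # log-ratios are undefined for zeros
        p, q = obs[keep] / lib[j], r[keep] / lib[ref]
        m = np.log2(p / q)                    # M: log fold change vs. reference
        a = 0.5 * np.log2(p * q)              # A: average log abundance
        # weight = inverse of the approximate variance of M
        w = 1.0 / ((1 - p) / obs[keep] + (1 - q) / r[keep])
        m_lo, m_hi = np.quantile(m, [m_trim, 1 - m_trim])
        a_lo, a_hi = np.quantile(a, [a_trim, 1 - a_trim])
        sel = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)
        factors[j] = 2.0 ** np.average(m[sel], weights=w[sel])
    return factors / np.exp(np.mean(np.log(factors)))  # geometric mean of 1


def getmm(counts, lengths_bp):
    """Hypothetical GeTMM sketch (assumption: TMM applied to RPK, then CPM-style scaling)."""
    x = rpk(counts, lengths_bp)
    f = tmm_factors(x, pick_reference(x))
    return x / (x.sum(axis=0) * f) * 1e6, f


# Toy example: 200 genes, 3 samples sequenced at different depths (random data).
rng = np.random.default_rng(0)
lengths = rng.integers(500, 10_000, size=200).astype(float)
depth = np.array([1.0, 2.0, 0.5])
counts = rng.poisson(lengths[:, None] / 100 * depth).astype(float)
values, factors = getmm(counts, lengths)
```

In this sketch, each GeTMM column equals the corresponding TPM column divided by that sample's TMM factor. Gene-to-gene ratios within a sample are therefore the same as with TPM, while the factor carries the between-sample adjustment. That is one way to read the abstract's claim that a single normalized data set can serve both intra- and intersample analyses.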

Details

Language :
English
ISSN :
1471-2105
Volume :
19
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
edsdoj.0b7a92e9f436482ca4f434ca1ad429a9
Document Type :
article
Full Text :
https://doi.org/10.1186/s12859-018-2246-7