Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions
- Publication Year :
- 2017
- Publisher :
- Oxford University Press, 2017.
Abstract
- RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An under-emphasized feature of normalization is the assumptions upon which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this paper, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.
- Comment: 20 pages, 6 figures, 1 table. Supplementary information contains 9 pages, 1 table. For associated simulation code, see https://github.com/ciaranlevans/rnaSeqAssumptions
- Subjects :
- 0301 basic medicine
Normalization (statistics)
Paper
Differential expression analysis
Computer science
Machine learning
03 medical and health sciences
Databases, Genetic
False positive paradox
Humans
Quantitative Biology - Genomics
Computer Simulation
RNA, Messenger
Molecular Biology
Genomics (q-bio.GN)
Sequence Analysis, RNA
Gene Expression Profiling
Computational Biology
High-Throughput Nucleotide Sequencing
030104 developmental biology
FOS: Biological sciences
Artificial intelligence
Raw data
Information Systems
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....7888c931d5fbef09c06a131ea2d3976f