Overcoming the impacts of two-step batch effect correction on gene expression estimation and inference.

Authors :: Li, Tenglong
Zhang, Yuqing
Patil, Prasad
Johnson, W Evan
Source :: Biostatistics. Jul2023, Vol. 24 Issue 3, p635-652. 18p.
Publication Year :: 2023
Abstract: Nonignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can lead to difficulty in merging data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables into the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction compared to existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery controls and power of detection across a variety of batch effect scenarios.
Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html). [ABSTRACT FROM AUTHOR]

Subjects :: *GENE expression
*INFERENTIAL statistics

Details

Language :: English
ISSN :: 14654644
Volume :: 24
Issue :: 3
Database :: Academic Search Index
Journal :: Biostatistics
Publication Type :: Academic Journal
Accession number :: 164935214
Full Text :: https://doi.org/10.1093/biostatistics/kxab039
