POIBM: batch correction of heterogeneous RNA-seq datasets through latent sample matching

Authors :
Susanna Holmström
Sampsa Hautaniemi
Antti Häkkinen
Sampsa Hautaniemi / Principal Investigator
Faculty Common Matters (Faculty of Medicine)
Bioinformatics
Department of Biochemistry and Developmental Biology
Research Program in Systems Oncology
Faculty of Medicine
Source :
Bioinformatics
Publication Year :
2022
Publisher :
Oxford University Press (OUP), 2022.

Abstract

Motivation: RNA sequencing and other high-throughput technologies are essential in understanding complex diseases, such as cancers, but are susceptible to technical factors manifesting as patterns in the measurements. These batch patterns hinder the discovery of biologically relevant patterns. Unbiased batch effect correction in heterogeneous populations currently requires special experimental designs or phenotypic labels, which are not readily available for patient samples in existing datasets.

Results: We present POIBM, an RNA-seq batch correction method, which learns virtual reference samples directly from the data. We use a breast cancer cell line dataset to show that POIBM exceeds or matches the performance of previous methods, while being blind to the phenotypes. Further, we analyze The Cancer Genome Atlas RNA-seq data to show that batch effects plague many cancer types; POIBM effectively discovers the true replicates in stomach adenocarcinoma; and integrating the corrected data in endometrial carcinoma improves cancer subtyping.

Availability and implementation: https://bitbucket.org/anthakki/poibm/ (archived at https://doi.org/10.5281/zenodo.6122436).

Supplementary information: Supplementary data are available at Bioinformatics online.

Details

ISSN :
1367-4811 and 1367-4803
Volume :
38
Database :
OpenAIRE
Journal :
Bioinformatics
Accession number :
edsair.doi.dedup.....0bde99b8bd8f2bf9807d6e990be66fbe