Back to Search Start Over

Virus expression detection reveals RNA-sequencing contamination in TCGA

Authors :
Sara R. Selitsky
David Marron
Daniel Hollern
Lisle E. Mose
Katherine A. Hoadley
Corbin Jones
Joel S. Parker
Dirk P. Dittmer
Charles M. Perou
Source :
BMC Genomics, Vol 21, Iss 1, Pp 1-11 (2020)
Publication Year :
2020
Publisher :
BMC, 2020.

Abstract

Abstract Background Contamination of reagents and cross contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. Results We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the “common reference”, which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the “common reference”. One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. Conclusions Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples most-likely during library construction. Thus, these human tumors with H-HPV or XMRV reads were likely not infected with H-HPV 18 or XMRV.

Details

Language :
English
ISSN :
14712164
Volume :
21
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Genomics
Publication Type :
Academic Journal
Accession number :
edsdoj.6cefec523c50451aa5deede07a0e530b
Document Type :
article
Full Text :
https://doi.org/10.1186/s12864-020-6483-6