Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

Authors :
Michael Boehnke
Kimberly F. Doheny
Kurt N. Hetrick
Goo Jun
Matthew Flickinger
Jane Romm
Gonçalo R. Abecasis
Hyun Min Kang
Source :
The American Journal of Human Genetics. 91:839-848
Publication Year :
2012
Publisher :
Elsevier BV, 2012.

Abstract

DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies.
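As a rough illustration of approach (1), combining sequence reads with array-based genotypes, the sketch below estimates a contamination fraction by maximum likelihood over a grid. It is not the authors' model. It assumes a simple two-source mixture in which each read comes from the intended sample with probability 1 - alpha and from an unrelated contaminant with probability alpha, uses the population alternate-allele frequency as the contaminant's allele probability, and applies a fixed symmetric base error rate. All function names, parameters, and the read counts in the usage note are hypothetical.

```python
import math


def read_alt_prob(alpha, genotype, alt_freq, error=0.01):
    """Probability that a single read shows the alternate allele.

    genotype: alt-allele count (0, 1, or 2) of the intended sample,
              assumed known from array data.
    alt_freq: population alt-allele frequency, used here as the chance
              that a contaminating read carries the alt allele.
    """
    # Mix the intended sample and the contaminant (assumed mixture model).
    true_alt = (1.0 - alpha) * (genotype / 2.0) + alpha * alt_freq
    # Apply a symmetric base-calling error (assumed constant).
    return true_alt * (1.0 - error) + (1.0 - true_alt) * error


def log_likelihood(alpha, sites, error=0.01):
    """Binomial log-likelihood of alpha, up to a constant.

    sites: iterable of (genotype, alt_freq, alt_reads, ref_reads).
    """
    total = 0.0
    for genotype, alt_freq, alt_reads, ref_reads in sites:
        p = read_alt_prob(alpha, genotype, alt_freq, error)
        total += alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
    return total


def estimate_contamination(sites, error=0.01, steps=500):
    """Grid-search estimate of alpha over the interval [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, sites, error))
```

For example, calling `estimate_contamination([(0, 0.5, 59, 941), (2, 0.5, 941, 59)])` on these made-up counts gives an estimate near 0.10, since a homozygous site with about 5.9% discordant reads is what this toy model predicts at 10% contamination. A grid search is used only to keep the sketch dependency-free; a numerical optimizer would serve equally well.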

Details

ISSN :
0002-9297
Volume :
91
Database :
OpenAIRE
Journal :
The American Journal of Human Genetics
Accession number :
edsair.doi.dedup.....0333848056ac3d42f5c5290e0eaaf255
Full Text :
https://doi.org/10.1016/j.ajhg.2012.09.004