Back to Search Start Over

Revising a personal genome by comparing and combining data from two different sequencing platforms.

Authors :
Deokhoon Kim
Woo-Yeon Kim
Sun-Young Lee
Sung-Yeoun Lee
Hongseok Yun
Soo-Yong Shin
Jungyoun Lee
Yoojin Hong
Youngmi Won
Seong-Jin Kim
Yong Seok Lee
Sung-Min Ahn
Source :
PLoS ONE, Vol 8, Iss 4, p e60585 (2013)
Publication Year :
2013
Publisher :
Public Library of Science (PLoS), 2013.

Abstract

For the robust practice of genomic medicine, sequencing results must be compatible, regardless of the sequencing technologies and algorithms used. Presently, genome sequencing is still an imprecise science and is complicated by differences in the chemistry, coverage, alignment, and variant-calling algorithms. We identified ~3.33 million single nucleotide variants (SNVs) and ~3.62 million SNVs in the SJK genome using SOLiD and Illumina data, respectively. Approximately 3 million SNVs were concordant between the two platforms while 68,532 SNVs were discordant; 219,616 SNVs were SOLiD-specific and 516,080 SNVs were Illumina-specific (i.e., platform-specific). Concordant, discordant, and platform-specific SNVs were further analyzed and characterized. Overall, a large portion of heterozygous SNVs that were discordant with genotyping calls of single nucleotide polymorphism chips were highly confident. Approximately 70% of the platform-specific SNVs were located in regions containing repetitive sequences. Such platform-specificity may arise from differences between platforms, with regard to read length (36 bp and 72 bp vs. 50 bp), insert size (~100-300 bp vs. ~1-2 kb), sequencing chemistry (sequencing-by-synthesis using single nucleotides vs. ligation-based sequencing using oligomers), and sequencing quality. When data from the two platforms were merged for variant calling, the proportion of callable regions of the reference genome increased to 99.66%, which was 1.43% higher than the average callability of the two platforms, representing ~40 million bases. In this study, we compared the differences in sequencing results between two sequencing platforms. Approximately 90% of the SNVs were concordant between the two platforms, yet ~10% of the SNVs were either discordant or platform-specific, indicating that each platform had its own strengths and weaknesses. When data from the two platforms were merged, both the overall callability of the reference genome and the overall accuracy of the SNVs improved, demonstrating the likelihood that a re-sequenced genome can be revised using complementary data.

Subjects

Subjects :
Medicine
Science

Details

Language :
English
ISSN :
19326203
Volume :
8
Issue :
4
Database :
Directory of Open Access Journals
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
edsdoj.81e2f640617a4459990782f66aeed752
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pone.0060585