
Increasing calling accuracy, coverage, and read-depth in sequence data by the use of haplotype blocks.

Authors :
Torsten Pook
Adnane Nemri
Eric Gerardo Gonzalez Segovia
Daniel Valle Torres
Henner Simianer
Chris-Carolin Schoen
Source :
PLoS Genetics, Vol 17, Iss 12, p e1009944 (2021)
Publication Year :
2021
Publisher :
Public Library of Science (PLoS), 2021.

Abstract

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics: when resources are limited, geneticists and breeders must balance data quality against the number of genotyped lines across a variety of existing genotyping technologies. In this work, we propose a new imputation pipeline ("HBimpute") that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is to use haplotype blocks from the software HaploBlocker to identify locally similar lines and then to use the reads of all locally similar lines in the variant calling for a specific line. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. The overall imputation error rates are cut in half compared to state-of-the-art software like BEAGLE and STITCH, while the average read-depth is increased to 83X, thus enabling the calling of copy number variation. The usefulness of the imputed data panel is further evaluated by comparing the performance of the sequence data in common breeding applications to that of genomic data generated with a genotyping array. For both genome-wide association studies and genomic prediction, results are on par with or even slightly better than those obtained with high-density array data (600k). For genomic prediction in particular, we observe slightly higher data quality for the sequence data than for the 600k array, in the form of higher prediction accuracies. This occurred specifically when reducing the data panel to the set of markers shared between sequence and array, indicating that sequencing data can benefit from the same marker ascertainment as used in the array process to increase the quality and usability of genomic data.
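
The read-pooling idea described in the abstract can be illustrated with a small toy sketch. This is not the HBimpute implementation: the function names (`pool_reads_by_block`, `call_genotype`), the data layout, the block labels, and the simple majority call are all assumptions made for illustration. It only shows how lines that share a haplotype block in a window might contribute their reads to each other's variant call at a single marker.

```python
from collections import defaultdict


def pool_reads_by_block(read_counts, block_of_line):
    """Pool (ref, alt) read counts at one marker across lines sharing a block.

    read_counts:   {line: (ref_reads, alt_reads)} at a single marker.
    block_of_line: {line: block_id or None} for the window holding the marker
                   (hypothetical layout; None means no block was assigned).
    Returns {line: (pooled_ref, pooled_alt)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for line, (ref, alt) in read_counts.items():
        block = block_of_line.get(line)
        if block is not None:
            totals[block][0] += ref
            totals[block][1] += alt

    pooled = {}
    for line, (ref, alt) in read_counts.items():
        block = block_of_line.get(line)
        # Lines without a block keep only their own reads.
        pooled[line] = (ref, alt) if block is None else tuple(totals[block])
    return pooled


def call_genotype(ref, alt):
    """Toy majority call for a homozygous (doubled haploid) line.

    Returns 0 (reference), 1 (alternative), or None when reads are
    absent or tied. A real caller would use a probabilistic model.
    """
    if ref == alt:
        return None
    return 0 if ref > alt else 1


# Hypothetical low-depth data: DH2 has no reads of its own at this marker.
read_counts = {"DH1": (1, 0), "DH2": (0, 0), "DH3": (2, 0), "DH4": (0, 1)}
block_of_line = {"DH1": "B1", "DH2": "B1", "DH3": "B1", "DH4": None}

pooled = pool_reads_by_block(read_counts, block_of_line)
own_call = call_genotype(*read_counts["DH2"])   # None: no reads observed
pooled_call = call_genotype(*pooled["DH2"])     # 0: called from block B1 reads
```

In this toy example, line DH2 cannot be called from its own reads but receives a call once the reads of the other lines in block B1 are pooled, which also raises its effective read depth at the marker from 0 to 3.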

Subjects

Subjects :
Genetics
QH426-470

Details

Language :
English
ISSN :
1553-7390 and 1553-7404
Volume :
17
Issue :
12
Database :
Directory of Open Access Journals
Journal :
PLoS Genetics
Publication Type :
Academic Journal
Accession number :
edsdoj.f02646122a2948a9b14262c3ac94938d
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pgen.1009944