High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

Authors :: Dilthey AT
Gourraud PA
Mentzer AJ
Cereb N
Iqbal Z
McVean G
Source :: PLoS computational biology [PLoS Comput Biol] 2016 Oct 28; Vol. 12 (10), pp. e1005151. Date of Electronic Publication: 2016 Oct 28 (Print Publication: 2016).
Publication Year :: 2016
Abstract: Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant challenge to practical application.

Competing Interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: ATD and GM are partners in Peptide Groove, LLP. GM is a founder and shareholder of Genomics, Ltd. NC has ownership and management interest in Histogenetics.
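
The third step of the abstract (inferring the most likely pair of underlying alleles at each locus with a simple likelihood framework) can be illustrated with a toy sketch. This is not the HLA*PRG implementation: the read representation (a mapping from position to observed base), the uniform per-base error model, the function names, and the allele names and sequences are all assumptions made for illustration only.

```python
import math
from itertools import combinations_with_replacement


def read_given_allele(read, allele_seq, error_rate):
    """P(read | allele) under an assumed independent per-base error model."""
    p = 1.0
    for pos, base in read.items():
        # A matching base is observed with probability 1 - e,
        # each of the three other bases with probability e / 3.
        p *= (1.0 - error_rate) if allele_seq[pos] == base else error_rate / 3.0
    return p


def pair_log_likelihood(reads, seq1, seq2, error_rate=0.01):
    """Log-likelihood of all reads given a diploid pair of allele sequences."""
    total = 0.0
    for read in reads:
        # Each read is assumed to come from either allele with probability 1/2.
        mix = 0.5 * read_given_allele(read, seq1, error_rate) \
            + 0.5 * read_given_allele(read, seq2, error_rate)
        total += math.log(mix)
    return total


def infer_allele_pair(reads, alleles, error_rate=0.01):
    """Return the (unordered) allele pair with the highest likelihood."""
    candidates = combinations_with_replacement(sorted(alleles), 2)
    return max(
        candidates,
        key=lambda pair: pair_log_likelihood(
            reads, alleles[pair[0]], alleles[pair[1]], error_rate
        ),
    )


# Hypothetical toy alleles and reads (not real HLA sequences).
alleles = {"allele_1": "ACGT", "allele_2": "ACCT", "allele_3": "TCGT"}
reads = [
    {1: "C", 2: "G"},  # consistent with allele_1 and allele_3
    {2: "C", 3: "T"},  # consistent with allele_2 only
    {0: "A", 1: "C"},  # consistent with allele_1 and allele_2
    {0: "A", 2: "C"},  # consistent with allele_2 only
]
best_pair = infer_allele_pair(reads, alleles)
print(best_pair)
```

Enumerating all pairs is quadratic in the number of candidate alleles, which is acceptable for a toy example but says nothing about how the published method handles a database of over 10,000 allele sequences.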

Subjects :: Humans
Reference Values
Algorithms
Chromosome Mapping/methods
Genetics, Population
Genome, Human/genetics
Hemochromatosis Protein/genetics
High-Throughput Nucleotide Sequencing/methods

Details

Language :: English
ISSN :: 1553-7358
Volume :: 12
Issue :: 10
Database :: MEDLINE
Journal :: PLoS computational biology
Publication Type :: Academic Journal
Accession number :: 27792722
Full Text :: https://doi.org/10.1371/journal.pcbi.1005151
