
Haplotype Frequency Inference From Pooled Genetic Data With a Latent Multinomial Model

Authors :
Foo, Yong See
Flegg, Jennifer
Source :
IEEE/ACM Transactions on Computational Biology and Bioinformatics; November 2024, Vol. 21 Issue: 6 p1864-1873, 10p
Publication Year :
2024

Abstract

In genetic association studies, haplotype data provide more refined information than data about separate genetic markers. However, large-scale studies that genotype hundreds to thousands of individuals may provide only pooled results. Methods for inferring haplotype frequencies from pooled genetic data that scale well with pool size rely on a normal approximation, which we observe to produce unreliable inference when applied to real data. We illustrate cases where the approximation fails because the normal covariance matrix is near-singular. As an alternative to approximate methods, in this paper we propose two exact methods to infer haplotype frequencies from pooled genetic data based on a latent multinomial model, where the pooled results are considered integer combinations of latent, unobserved haplotype counts. One of our methods, latent count sampling via Markov bases, achieves approximately linear runtime with respect to pool size. Our exact methods produce more accurate inference than existing approximate methods for synthetic data and for haplotype data from the 1000 Genomes Project. We also demonstrate how our methods can be applied to time series of pooled genetic data, as a proof of concept of how our methods are relevant to more complex hierarchical settings, such as spatiotemporal models.
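
As a rough illustration of the latent multinomial setup described in the abstract, the sketch below simulates one pool. It is not the authors' implementation. It assumes biallelic markers, an observation that consists of per-marker allele counts, and hypothetical names (`haplotype_matrix`, `simulate_pool`, `freqs`, `move`) chosen only for this example. The latent haplotype counts `z` are drawn from a multinomial distribution, and the pooled result `y` is the integer combination `A @ z`.

```python
import itertools
import numpy as np


def haplotype_matrix(n_markers):
    """Return an (n_markers x 2**n_markers) 0/1 matrix.

    Column h lists the allele carried at each marker by haplotype h.
    """
    return np.array(list(itertools.product([0, 1], repeat=n_markers))).T


def simulate_pool(freqs, n_markers, pool_size, rng):
    """Draw latent haplotype counts z and the pooled allele counts y = A z."""
    A = haplotype_matrix(n_markers)
    z = rng.multinomial(pool_size, freqs)  # latent, unobserved counts
    y = A @ z                              # observed pooled counts per marker
    return A, z, y


rng = np.random.default_rng(0)
freqs = np.array([0.4, 0.3, 0.2, 0.1])  # assumed frequencies of 00, 01, 10, 11
A, z, y = simulate_pool(freqs, n_markers=2, pool_size=100, rng=rng)

# Several latent count vectors can give the same pooled result. This integer
# move leaves both y and the pool size unchanged, so z and z + move are
# indistinguishable from y alone (whenever z + move stays non-negative).
move = np.array([1, -1, -1, 1])
print("latent counts z:", z)
print("pooled counts y:", y)
print("A @ move:", A @ move)
```

Moves of this kind, which preserve the observed counts, are the sort of object a Markov basis collects. The sketch only shows why the latent counts are not identified by a single pooled observation and does not attempt the sampling scheme itself.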

Details

Language :
English
ISSN :
1545-5963 and 1557-9964
Volume :
21
Issue :
6
Database :
Supplemental Index
Journal :
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Type :
Periodical
Accession number :
ejs68307693
Full Text :
https://doi.org/10.1109/TCBB.2024.3420430