Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage

Authors :
Rasmussen, Malthe Sebro
Garcia-Erill, Genís
Korneliussen, Thorfinn Sand
Wiuf, Carsten
Albrechtsen, Anders
Source :
Rasmussen, M S, Garcia-Erill, G, Korneliussen, T S, Wiuf, C & Albrechtsen, A 2022, 'Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage', Genetics, vol. 222, no. 4, iyac148.
Publication Year :
2022

Abstract

The site frequency spectrum (SFS) is an important summary statistic in population genetics used for inference on demographic history and selection. However, estimation of the SFS from called genotypes introduces bias when working with low-coverage sequencing data. Methods exist for addressing this issue, but they can suffer from two problems. First, they can have very high computational demands, to the point that it may not be possible to run estimation for genome-scale data. Second, existing methods are prone to overfitting, especially for multi-dimensional SFS estimation. In this article, we present a stochastic expectation-maximisation algorithm for inferring the SFS from next-generation sequencing (NGS) data that addresses these challenges. We show that this algorithm greatly reduces runtime and enables estimation with constant, trivial RAM usage. Further, the algorithm reduces overfitting and thereby improves downstream inference. An implementation is available at github.com/malthesr/winsfs.
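
The general idea of a stochastic EM for the SFS can be illustrated with a minimal sketch. This is not the winsfs implementation, and every name and default in it (`em_step`, `stochastic_em`, `block_size`, `window`, `epochs`) is hypothetical. It assumes that each site comes with strictly positive likelihoods of the observed data given each possible derived allele count, and that the running estimate is the average of the most recent block-wise EM updates. A full EM pass would instead visit every site before updating the estimate once.

```python
import random
from collections import deque


def em_step(sfs, likelihoods):
    """One EM update on a set of sites: average the per-site posteriors."""
    totals = [0.0] * len(sfs)
    for site in likelihoods:
        # E-step: posterior over allele counts for this site under the current SFS.
        joint = [p * lik for p, lik in zip(sfs, site)]
        norm = sum(joint)
        for j, value in enumerate(joint):
            totals[j] += value / norm
    # M-step: the new SFS is the mean posterior across the sites.
    return [t / len(likelihoods) for t in totals]


def stochastic_em(likelihoods, block_size=100, window=5, epochs=3, seed=0):
    """Block-wise EM; the estimate is the mean of the last `window` block updates.

    Assumption for illustration only: block size, window length and epoch
    count are arbitrary choices, not values taken from the article.
    """
    rng = random.Random(seed)
    n_bins = len(likelihoods[0])
    sfs = [1.0 / n_bins] * n_bins  # uniform starting point
    recent = deque(maxlen=window)  # oldest block update is dropped automatically
    sites = list(likelihoods)
    for _ in range(epochs):
        rng.shuffle(sites)
        for start in range(0, len(sites), block_size):
            block = sites[start:start + block_size]
            recent.append(em_step(sfs, block))
            sfs = [sum(column) / len(recent) for column in zip(*recent)]
    return sfs
```

The sketch keeps all sites in a list for simplicity. Because each update touches only one block, a streaming variant could in principle read blocks from disk one at a time, which is the kind of design that keeps memory use independent of the number of sites.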

Details

Database :
OAIster
Journal :
Rasmussen, M S, Garcia-Erill, G, Korneliussen, T S, Wiuf, C & Albrechtsen, A 2022, 'Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage', Genetics, vol. 222, no. 4, iyac148.
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1382510845
Document Type :
Electronic Resource