1. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants
- Author
- Liang-Chi Chen, Yao-Ting Huang, Jyun-Hong Lin, and Shu-Qi Yu
- Subjects
Statistics and Probability, Computer science, Scale (descriptive set theory), Phaser, Genome, Biochemistry, Computer Science Applications, Computational Mathematics, Chromosome (genetic algorithm), Computational Theory and Mathematics, Metagenomics, Ultra fast, Human genome, Algorithm, Molecular Biology
- Abstract
Motivation: Long-read phasing has been used for reconstructing diploid genomes, improving variant calling and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large Structural Variations (SVs), and the efficiency is unsatisfactory for population-scale phasing. Results: This article presents a novel algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in 10–20 min, 10× faster than the state-of-the-art WhatsHap, HapCUT2 and Margin. In particular, co-phasing SNPs and SVs produces much larger haplotype blocks (N50 = 25 Mbp) than those of existing methods (N50 = 10–15 Mbp). We show that LongPhase combined with Nanopore ultra-long reads is a cost-effective and highly contiguous solution, which can produce between one and 26 blocks per chromosome arm without the need for additional trios, chromosome-conformation and strand-seq data. Availability and implementation: LongPhase is freely available at https://github.com/twolinin/LongPhase/. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2022