DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer.
- Source :
- Nature biotechnology [Nat Biotechnol] 2023 Feb; Vol. 41 (2), pp. 232-238. Date of Electronic Publication: 2022 Sep 01.
- Publication Year :
- 2023
Abstract
- Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies. (© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
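Note on the Q values quoted in the abstract: assuming they follow the standard Phred convention, Q = -10 · log10(p), where p is the per-base error probability, each step of 10 in Q corresponds to a tenfold drop in error rate. The sketch below is illustrative only and is not taken from the DeepConsensus code; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality Q."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Quality thresholds mentioned in the abstract (Phred scale assumed).
    for q in (20, 30, 40, 43, 45):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.2e}, about 1 error per {round(1 / p):,} bases")
```

Under this convention, Q20 corresponds to one expected error per 100 bases, Q30 to one per 1,000, and Q40 to one per 10,000.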
- Subjects :
- Sequence Analysis, DNA
- High-Throughput Nucleotide Sequencing
Details
- Language :
- English
- ISSN :
- 1546-1696
- Volume :
- 41
- Issue :
- 2
- Database :
- MEDLINE
- Journal :
- Nature biotechnology
- Publication Type :
- Academic Journal
- Accession number :
- 36050551
- Full Text :
- https://doi.org/10.1038/s41587-022-01435-7