DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer.

Authors :
Baid G
Cook DE
Shafin K
Yun T
Llinares-López F
Berthet Q
Belyaeva A
Töpfer A
Wenger AM
Rowell WJ
Yang H
Kolesnikov A
Ammar W
Vert JP
Vaswani A
McLean CY
Nattestad M
Chang PC
Carroll A
Source :
Nature biotechnology [Nat Biotechnol] 2023 Feb; Vol. 41 (2), pp. 232-238. Date of Electronic Publication: 2022 Sep 01.
Publication Year :
2023

Abstract

Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.
(© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
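
The Q20, Q30, Q40, Q43 and Q45 figures in the abstract are quality values on the Phred scale, where Q = -10 * log10(p) and p is the per-base error probability. The following minimal Python sketch shows that conversion. It is an illustration only and is not part of DeepConsensus or pbccs; the function names are hypothetical, chosen for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality value Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Read-quality thresholds quoted in the abstract.
    for q in (20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:g}, about 1 error in {round(1 / p):,} bases")

    # Assembly base accuracy quoted in the abstract (Q43 to Q45).
    # The ratio assumes the rounded Q values are exact, so treat it as approximate.
    ratio = phred_to_error_prob(45) / phred_to_error_prob(43)
    print(f"Q43 to Q45: error rate multiplied by roughly {ratio:.2f}")
```

On this scale, Q20, Q30 and Q40 correspond to error probabilities of 1%, 0.1% and 0.01%. Because the scale is logarithmic, the two-point gain from Q43 to Q45 implies a per-base error rate of roughly 0.63 times the original, taking the rounded values at face value.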

Details

Language :
English
ISSN :
1546-1696
Volume :
41
Issue :
2
Database :
MEDLINE
Journal :
Nature biotechnology
Publication Type :
Academic Journal
Accession number :
36050551
Full Text :
https://doi.org/10.1038/s41587-022-01435-7