
Beam search decoder for enhancing sequence decoding speed in single-molecule peptide sequencing data.

Authors :
Kipen, Javier
Jaldén, Joakim
Source :
PLoS Computational Biology. 11/7/2023, Vol. 19 Issue 11, p1-21. 21p.
Publication Year :
2023

Abstract

Next-generation single-molecule protein sequencing technologies have the potential to significantly accelerate biomedical research. These technologies offer sensitivity and scalability for proteomic analysis. One promising method is fluorosequencing, which involves cutting denatured proteins into peptides, attaching fluorophores to specific amino acids, and observing variations in light intensity as amino acids are removed one at a time. The original peptide is classified from the sequence of light-intensity reads, and proteins can subsequently be identified from this information. The stepwise removal of amino acids is achieved by anchoring the peptides to a surface at the C-terminus and using a process called Edman degradation to remove one amino acid at a time from the N-terminus. Even though a framework (Whatprot) has been proposed for the peptide classification task, processing times remain restrictive given the volume of reads produced by the massively parallel data acquisition system. In this paper, we propose a new beam search decoder with a novel state formulation that achieves considerably lower processing times at the expense of only a slight drop in accuracy compared to Whatprot. Furthermore, we explore how our novel state formulation may lead to even faster decoders in the future.

Author summary: Proteomic analyses frequently rely on mass spectrometry, a method characterized by a limited dynamic range that can cause low-abundance proteins to be overlooked. Single-molecule protein sequencing methods offer a way to address this limitation. Fluorosequencing is a cutting-edge single-molecule protein sequencing method that can distinguish peptide or protein molecules in a massively parallel fashion. The method has attracted interest from investors, as evidenced by the recent funding of Erisyon, a company developing this technology. The technique involves a challenging classification task: determining the original peptide sequence from light-intensity observations obtained after several Edman cycles.
A classifier combining k-nearest neighbors (kNN) with hidden Markov models (HMMs) has been shown to achieve close-to-optimal accuracy with tractable complexity. In this paper, we propose a new algorithm that significantly reduces computation time at the expense of a slight reduction in accuracy compared to this state-of-the-art method. [ABSTRACT FROM AUTHOR]
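The abstract does not spell out the authors' state formulation, so what follows is only a generic illustration of the underlying idea: a beam search that approximates HMM-based peptide classification by keeping only the most probable hidden states after each Edman cycle. It is a minimal Python sketch under loudly stated assumptions, not the paper's decoder and not Whatprot's API. Every name and parameter is hypothetical: a single fluorophore color attached to lysine (`LABELED`), a fixed per-cycle Edman success probability (`P_EDMAN`), and Gaussian read noise measured in units of one fluorophore (`SIGMA`); other error sources, such as fluorophore loss, are ignored. A hidden state is a pair (candidate peptide, number of N-terminal residues removed), and after each read only the `beam_width` best-scoring states across all candidates survive.

```python
import math
from collections import defaultdict

# Hypothetical model parameters (not taken from the paper or from Whatprot).
LABELED = {"K"}   # residues assumed to carry a fluorophore (one color only)
P_EDMAN = 0.9     # assumed probability that one Edman cycle removes a residue
SIGMA = 0.3       # assumed read-noise std. dev., in units of one fluorophore


def brightness(peptide: str, removed: int) -> int:
    """Ideal fluorophore count left after `removed` N-terminal residues are cut."""
    return sum(1 for aa in peptide[removed:] if aa in LABELED)


def log_emission(read: float, count: int) -> float:
    """Gaussian log-density of an intensity read given the ideal count."""
    z = (read - count) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def prune(scores, beam_width):
    """Keep only the `beam_width` highest-scoring states."""
    best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(best[:beam_width])


def beam_classify(reads, peptides, beam_width):
    """Return the index of the most probable peptide for a non-empty list of reads.

    A state is (peptide index, residues removed); its value is the log joint
    probability of that state and the reads seen so far, starting from a
    uniform prior over peptides with nothing removed before the first read.
    """
    prior = -math.log(len(peptides))
    beam = prune(
        {(i, 0): prior + log_emission(reads[0], brightness(p, 0))
         for i, p in enumerate(peptides)},
        beam_width,
    )
    for read in reads[1:]:
        incoming = defaultdict(list)
        for (i, k), lp in beam.items():
            if k < len(peptides[i]):
                incoming[(i, k + 1)].append(lp + math.log(P_EDMAN))    # cut succeeded
                incoming[(i, k)].append(lp + math.log(1.0 - P_EDMAN))  # cut failed
            else:
                incoming[(i, k)].append(lp)                            # nothing left to cut
        scores = {
            (i, k): logsumexp(lps) + log_emission(read, brightness(peptides[i], k))
            for (i, k), lps in incoming.items()
        }
        beam = prune(scores, beam_width)
    # Sum the surviving probability mass per peptide and pick the largest.
    per_peptide = defaultdict(list)
    for (i, _), lp in beam.items():
        per_peptide[i].append(lp)
    return max(per_peptide, key=lambda i: logsumexp(per_peptide[i]))
```

For example, `beam_classify([1.0, 0.0, 0.0, 0.0], ["AKAK", "KAAA", "AAKK"], beam_width=2)` returns `1`, because only `KAAA` starts with one labeled residue and loses it after the first cycle. Because pruning discards probability mass, the result is an approximation of the full forward computation; in this sketch a smaller `beam_width` does less work per cycle at the risk of dropping the correct hypothesis.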

Details

Language :
English
ISSN :
1553-734X
Volume :
19
Issue :
11
Database :
Academic Search Index
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
173472439
Full Text :
https://doi.org/10.1371/journal.pcbi.1011345