1. Beam search decoder for enhancing sequence decoding speed in single-molecule peptide sequencing data.
- Author
-
Kipen, Javier and Jaldén, Joakim
- Subjects
AMINO acid sequence ,MOLECULAR beams ,HIDDEN Markov models ,PEPTIDES ,AMINO acids ,PROTEOMICS - Abstract
Next-generation single-molecule protein sequencing technologies have the potential to significantly accelerate biomedical research. These technologies offer sensitivity and scalability for proteomic analysis. One auspicious method is fluorosequencing, which involves: cutting naturalized proteins into peptides, attaching fluorophores to specific amino acids, and observing variations in light intensity as one amino acid is removed at a time. The original peptide is classified from the sequence of light-intensity reads, and proteins can subsequently be recognized with this information. The amino acid step removal is achieved by attaching the peptides to a wall on the C-terminal and using a process called Edman Degradation to remove an amino acid from the N-Terminal. Even though a framework (Whatprot) has been proposed for the peptide classification task, processing times remain restrictive due to the massively parallel data acquisicion system. In this paper, we propose a new beam search decoder with a novel state formulation that obtains considerably lower processing times at the expense of only a slight accuracy drop compared to Whatprot. Furthermore, we explore how our novel state formulation may lead to even faster decoders in the future. Author summary: Proteomic analyses frequently rely on mass spectrometry, a method characterized by its limited dynamic range, potentially overlooking low-abundant proteins. To address this limitation, single-molecule protein sequencing methods offer a solution. Fluorosequencing is a cutting-edge single-molecule protein sequencing method, which can distinguish peptides or protein molecules massively parallelly. This method has attracted interest from investors, as evidenced by the recent funding of Erisyon, a company developing this technology. This technique contains a challenging classification task: determining the original peptide sequence from light-intensity observations obtained after several Edman cycles. A classifier based on a combination of k Nearest Neighbors (kNN) with Hidden Markov Models (HMM) had been shown to have close-to-optimal accuracy with tractable complexity. We propose in this paper a new algorithm that reduces computation time significantly at the expense of a slight reduction in accuracy compared to state-of-the-art method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF