1. Bidirectional de novo peptide sequencing using a transformer model.
- Author
Lee, Sangjeong and Kim, Hyunwoo
- Subjects
AMINO acid sequence, DEEP learning, TANDEM mass spectrometry, MASS spectrometry, AMINO acids, SEQUENCE analysis, PROTEOMICS, HEBBIAN memory
- Abstract
In proteomics, a crucial aspect is to identify peptide sequences. De novo sequencing methods have been widely employed to identify peptide sequences, and numerous tools have been proposed over the past two decades. Recently, deep learning approaches have been introduced for de novo sequencing. Previous methods focused on encoding tandem mass spectra and predicting peptide sequences from the first amino acid onwards. However, when predicting peptides using tandem mass spectra, the peptide sequence can be predicted not only from the first amino acid but also from the last amino acid due to the coexistence of b-ion (or a- or c-ion) and y-ion (or x- or z-ion) fragments in the tandem mass spectra. Therefore, it is essential to predict peptide sequences bidirectionally. Our approach, called NovoB, utilizes a Transformer model to predict peptide sequences bidirectionally, starting with both the first and last amino acids. In comparison to Casanovo, our method achieved an improvement in average peptide-level accuracy of approximately 9.8% across all species.

Author summary: Understanding the characteristics of data is very important in deep learning methods. When predicting sentences, the transformer model naturally predicts from the first word. For this reason, previous methods predicted peptide sequences from the first amino acid. However, in tandem mass spectra, it is possible to predict peptide sequences bidirectionally. This method shows better results than previous approaches because it can better encode tandem mass spectra. We have demonstrated that good results can be achieved simply by understanding the characteristics of such data and using the model appropriately. We hope that this paper will help various readers improve the performance of their models. Furthermore, given that bidirectional peptide sequence prediction is crucial in de novo peptide sequence analysis, we hope that this approach will be applied to both existing and future methods utilizing deep learning techniques. [ABSTRACT FROM AUTHOR]
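
Note: the abstract's point that b-ions and y-ions let a sequence be read from either end can be illustrated with a minimal Python sketch. This is not NovoB or any part of the authors' model; it only computes theoretical singly charged b- and y-ion ladders for a known peptide and reads residues back from the mass gaps. The function names (`b_ions`, `y_ions`, `read_ladder`) and the tolerance are hypothetical choices for illustration, and the residue table is a subset of the standard monoisotopic masses (leucine is left out because it has the same mass as isoleucine).

```python
# Monoisotopic residue masses in Da (standard values, subset only).
# Leucine is omitted on purpose: it is isobaric with isoleucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "I": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # y-ions keep the C-terminal OH plus one extra H


def b_ions(peptide):
    """Singly charged b-ion m/z values b_1..b_(n-1), built from the N-terminus."""
    total, ladder = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ladder.append(total + PROTON)
    return ladder


def y_ions(peptide):
    """Singly charged y-ion m/z values y_1..y_(n-1), built from the C-terminus."""
    total, ladder = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ladder.append(total + WATER + PROTON)
    return ladder


def read_ladder(mzs, offset, tol=0.01):
    """Map successive mass gaps in an ion ladder back to residue letters.

    `offset` is the m/z of the empty fragment (PROTON for b-ions,
    WATER + PROTON for y-ions). Gaps with no match within `tol` become '?'.
    """
    residues, prev = [], offset
    for mz in mzs:
        gap = mz - prev
        aa = min(RESIDUE_MASS, key=lambda a: abs(RESIDUE_MASS[a] - gap))
        residues.append(aa if abs(RESIDUE_MASS[aa] - gap) <= tol else "?")
        prev = mz
    return "".join(residues)


if __name__ == "__main__":
    peptide = "PEPTIDE"
    # Forward read from the b-ion ladder, backward read from the y-ion ladder.
    print(read_ladder(b_ions(peptide), PROTON))
    print(read_ladder(y_ions(peptide), WATER + PROTON))
```

In this sketch the b-ion ladder recovers the peptide from its first residue and the y-ion ladder recovers it from its last residue, which is the property of tandem mass spectra that motivates decoding in both directions.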
- Published
- 2024