A learned score function improves the power of mass spectrometry database search.

Authors :
Ananth, Varun
Sanders, Justin
Yilmaz, Melih
Wen, Bo
Oh, Sewoong
Source :
Bioinformatics. 2024 Supplement, Vol. 40, p. i410-i417. 8p.
Publication Year :
2024

Abstract

Motivation: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools.

Results: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides. [ABSTRACT FROM AUTHOR]
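
The abstract frames database search as scoring each observed spectrum against candidate peptides from a database and keeping the best match, with power measured by how many peptides are detected. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' code and not a Casanovo or Casanovo-DB API. It assumes a precursor mass window in ppm, unmodified peptides, reversed-sequence decoys with target-decoy competition, and a simple decoys/targets FDR estimate. The names `ScoreFn`, `search`, `shared_peak_score`, and `accepted_at_fdr` are hypothetical, and `shared_peak_score` is only a toy stand-in for a hand-designed score. A learned scorer would be passed through the same `score` argument.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)
# Monoisotopic residue masses (Da); no modifications are modeled in this sketch.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


@dataclass
class Spectrum:
    precursor_mz: float
    charge: int
    peaks: List[float]  # fragment m/z values; intensities are ignored here


ScoreFn = Callable[[Spectrum, str], float]  # hypothetical: any PSM scorer, hand-designed or learned
PSM = Tuple[float, bool]                    # (score, is_decoy)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def fragment_ions(seq: str) -> List[float]:
    """Singly charged b- and y-ion m/z values."""
    prefixes = [sum(RESIDUE[aa] for aa in seq[:i]) for i in range(1, len(seq))]
    total = sum(RESIDUE[aa] for aa in seq)
    b_ions = [m + PROTON for m in prefixes]
    y_ions = [total - m + WATER + PROTON for m in prefixes]
    return b_ions + y_ions


def shared_peak_score(spec: Spectrum, seq: str, tol: float = 0.02) -> float:
    """Toy hand-designed score: how many theoretical ions match an observed peak."""
    return float(sum(any(abs(p - ion) <= tol for p in spec.peaks)
                     for ion in fragment_ions(seq)))


def search(spectra: List[Spectrum], targets: List[str],
           score: ScoreFn, ppm: float = 10.0) -> List[PSM]:
    """Best-scoring PSM per spectrum, with targets and decoys competing."""
    decoys = [t[::-1] for t in targets]  # reversed-sequence decoys (one common choice)
    db = [(s, False) for s in targets] + [(s, True) for s in decoys]
    psms: List[PSM] = []
    for spec in spectra:
        neutral = (spec.precursor_mz - PROTON) * spec.charge
        best: Optional[PSM] = None
        for seq, is_decoy in db:
            if abs(peptide_mass(seq) - neutral) / neutral * 1e6 > ppm:
                continue  # outside the precursor mass window
            s = score(spec, seq)
            if best is None or s > best[0]:
                best = (s, is_decoy)
        if best is not None:
            psms.append(best)
    return psms


def accepted_at_fdr(psms: List[PSM], alpha: float = 0.01) -> int:
    """Most target PSMs accepted at any score cutoff with decoys/targets <= alpha."""
    n_target = n_decoy = accepted = 0
    for _, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target and n_decoy / n_target <= alpha:
            accepted = n_target
    return accepted
```

As a usage example, a synthetic spectrum built from the b- and y-ions of `PEPTIDEK` should be matched to that target rather than to its reversed decoy, yielding one PSM accepted at 1% FDR. Swapping in a different scorer changes only the `score` argument. The candidate selection and FDR bookkeeping stay the same.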

Details

Language :
English
ISSN :
1367-4803
Volume :
40
Database :
Academic Search Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
178778991
Full Text :
https://doi.org/10.1093/bioinformatics/btae218