Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra

Authors :
Rieder, Vera
Schork, Karin U.
Kerschke, Laura
Blank-Landeshammer, Bernhard
Sickmann, Albert
Rahnenführer, Jörg
Source :
Journal of Proteome Research; 20240101, Issue: Preprints
Publication Year :
2024

Abstract

In proteomics, liquid chromatography–tandem mass spectrometry (LC–MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
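The pipeline the abstract outlines (cosine distance between spectra, a clustering step, and validation against peptide annotations by purity) can be illustrated with a small sketch. It is not the authors' implementation: it uses only the simplest of the listed methods, connected components of a graph, and the bin width, the distance threshold, the toy peak lists, and the peptide labels are all assumptions chosen for illustration. Purity is computed here in its common textbook form, the fraction of spectra that carry the majority annotation of their cluster, which may differ in detail from the paper's definition.

```python
from collections import Counter, defaultdict
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict).

    bin_width is an assumed value, not taken from the paper.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty spectrum is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def connected_components(spectra, threshold):
    """Link spectra whose distance is below threshold; return one label each."""
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if cosine_distance(spectra[i], spectra[j]) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(spectra))]


def purity(labels, annotations):
    """Fraction of spectra carrying the majority annotation of their cluster."""
    clusters = defaultdict(list)
    for label, annotation in zip(labels, annotations):
        clusters[label].append(annotation)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels)


# Hypothetical toy data: (m/z, intensity) peak lists and made-up annotations.
raw = [
    [(100.1, 10.0), (200.2, 5.0)],
    [(100.3, 9.0), (200.4, 6.0)],
    [(150.0, 8.0), (300.0, 4.0)],
    [(150.2, 7.0), (300.1, 5.0)],
]
annotations = ["PEPTIDEA", "PEPTIDEA", "PEPTIDEB", "PEPTIDEB"]

spectra = [bin_spectrum(peaks) for peaks in raw]
labels = connected_components(spectra, threshold=0.1)  # assumed threshold
print(labels, purity(labels, annotations))
```

The all-pairs loop is quadratic in the number of spectra, so a sketch like this is only practical for small inputs. Restricting comparisons to spectra with similar precursor mass, in the spirit of the additional information the abstract attributes to the DISMS2 filter, is one way such a comparison set might be reduced.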

Details

Language :
English
ISSN :
1535-3893 and 1535-3907
Issue :
Preprints
Database :
Supplemental Index
Journal :
Journal of Proteome Research
Publication Type :
Periodical
Accession number :
ejs43358252
Full Text :
https://doi.org/10.1021/acs.jproteome.7b00427