1. Unified and Standardized Mass Spectrometry Data Processing in Python Using spectrum_utils
- Author
- Wout Bittremieux, Lev Levitsky, Matteo Pilz, Timo Sachsenberg, Florian Huber, Mingxun Wang, and Pieter C. Dorrestein
- Subjects
- Chemistry, General Chemistry, Biology, Biochemistry
- Abstract
Introduction: There exists a rich ecosystem of open-source mass spectrometry (MS) software. Compared to vendor software and other closed-source software, these open-source solutions provide flexibility to develop powerful functionalities, are verifiable through their open-source nature, and have garnered widespread community support to improve robustness and grow their capabilities. spectrum_utils provides a Python-based solution to cover common MS/MS spectrum operations, enabling the community to quickly prototype computational ideas for mass spectrometry projects and to produce publication-quality and interactive spectrum graphics. Here we present spectrum_utils version 0.4.0, which has been extended with support for community data standards, updated visualization functionalities, performance improvements, and integration with complementary MS software libraries. This has enabled spectrum_utils to grow into a building block of the MS Python ecosystem.
Methods: spectrum_utils supports several official data standards and best practices developed by the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO). Specifically, spectrum_utils now supports the Universal Spectrum Identifier (USI) for convenient retrieval of MS data from ProteomeXchange and other online resources, the ProForma 2.0 specification to encode proteoform information, and the mzPAF specification for standardized fragment ion annotations. Through a high-level application programming interface (API), flexible MS/MS data processing and visualization can be achieved in only a few lines of code. spectrum_utils has also been extended and made compatible with third-party Python MS libraries, including Pyteomics, pyOpenMS, and matchms, which together cover MS-based proteomics and metabolomics.
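To make the USI mentioned above more concrete, the following is a minimal sketch of how such an identifier breaks down into its fields. It uses only the Python standard library and is not the spectrum_utils API; `ParsedUsi` and `parse_usi` are hypothetical names introduced here for illustration, and the sketch assumes the run name itself contains no colons.

```python
from typing import NamedTuple, Optional


class ParsedUsi(NamedTuple):
    """Fields of a USI (hypothetical container, for illustration only)."""
    collection: str                # e.g. a ProteomeXchange dataset accession
    run_name: str                  # MS run (file) name within the collection
    index_type: str                # e.g. "scan"
    index: str                     # spectrum index within the run
    interpretation: Optional[str]  # optional ProForma peptidoform and charge


def parse_usi(usi: str) -> ParsedUsi:
    """Split a USI into its parts.

    Assumption: the run name contains no colons. Splitting at most five
    times keeps any colons inside the interpretation (for example in
    "[UNIMOD:35]") intact.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"Not a valid USI: {usi!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return ParsedUsi(parts[1], parts[2], parts[3], parts[4], interpretation)


example = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
```

Here `example.collection` identifies the dataset and `example.interpretation` carries the peptide sequence and precursor charge in ProForma notation; a resolver would use the first four fields to locate the spectrum.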
Preliminary Data: spectrum_utils provides a high-level Python API to perform common MS data tasks with a single line of code, such as annotating MS/MS spectra with their peptide labels after database searching or visualizing spectrum–spectrum matches from spectral library searching using mirror plots. Additionally, power users can build on the spectrum_utils functions and further customize their results by writing surrounding Python code.
As an example, we annotated fragment ions for 2,153,703 MS/MS spectra in the MassIVE-KB v1 spectral library, which is a repository-wide HCD spectral library derived from 227 public proteomics datasets on the MassIVE repository. This functionality is similar to that of some closed-source software, but it requires only a few lines of Python code, remains flexible, supports an extensive variety of PTMs through modification support in ProForma 2.0, and is fully cross-platform and open source. To interpret the observed fragment ions, we considered a, b, c, x, y, and z peptide fragments, immonium ions, internal fragment ions, and intact precursor ions. Additionally, common neutral losses were considered for any of these ions. Despite the many theoretical fragments that are possible, peak annotation of over 2 million MS/MS spectra took under 2.5 hours. On average, 74% of the observed intensity of the spectra could be explained by a matching peak interpretation. As expected, the most prevalent ion types were y ions and b ions. Because a large number of internal fragment ions can be considered, these also covered a non-negligible amount of intensity. Although the majority of explained intensity corresponds to fragments that do not include a neutral loss, a quarter of the observed intensity matches fragment ions that have undergone a wide variety of neutral losses, which indicates that considering appropriate neutral losses can improve the quality of spectrum annotations.
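The notion of "explained intensity" used above can be illustrated with a deliberately simplified sketch. It uses only the Python standard library rather than spectrum_utils, considers only singly charged b and y ions of unmodified peptides (no neutral losses, internal fragments, or immonium ions), and the function names `by_ions` and `explained_intensity` as well as the 20 ppm default tolerance are assumptions made for this example.

```python
# Monoisotopic masses in Da.
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def by_ions(peptide):
    """Return singly charged b and y fragment m/z values keyed by ion label."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        # b ions: N-terminal residues plus one proton.
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        # y ions: C-terminal residues plus water plus one proton.
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def explained_intensity(peaks, peptide, tol_ppm=20.0):
    """Fraction of summed intensity in peaks matching a b or y ion.

    peaks is an iterable of (m/z, intensity) pairs.
    """
    theoretical = list(by_ions(peptide).values())
    total = 0.0
    explained = 0.0
    for mz, intensity in peaks:
        total += intensity
        if any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theoretical):
            explained += intensity
    return explained / total if total else 0.0
```

Extending the set of theoretical fragments, for instance with neutral losses, can only increase the matched fraction in this scheme, which is the effect the abstract reports at library scale.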
Novel Aspect: spectrum_utils is a community-driven and open-source solution for powerful, flexible, and efficient MS data manipulation for proteomics and metabolomics.
- Published
- 2023