1. DeepLC can predict retention times for peptides that carry as-yet unseen modifications
- Author
Sven Degroeve, Niels Hulstaert, Robbin Bouwmeester, Lennart Martens, and Ralf Gabriels
- Subjects
Proteome; Computer science; Datasets as Topic; Peptide; Biochemistry; Atomic composition; Graphical user interface; Ambiguity; Identification (information); Algorithms; Biotechnology; Computational biology; Machine learning; SEQUENCE; Peptide Mapping; Deep learning; REVEALS; Humans; RATES; Molecular Biology; IDENTIFICATION; Biology and Life Sciences; Proteins; Cell Biology; Python (programming language); PERFORMANCE LIQUID-CHROMATOGRAPHY; Peptide Fragments; Workflow; Artificial intelligence; Protein Processing, Post-Translational; Retention time
- Abstract
The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex LC-MS identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open modification searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We therefore present DeepLC, a novel deep learning peptide retention time predictor utilizing a new peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides, and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC’s ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open modification search of CD8-positive T-cell proteome data. DeepLC is available under the permissive Apache 2.0 open source license and comes with a user-friendly graphical user interface, as well as a Python package on PyPI, Bioconda, and BioContainers for effortless workflow integration.
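The idea of encoding a peptide by atomic composition can be illustrated with a minimal sketch. This is not DeepLC's actual implementation or API: the function name `encode_peptide`, the small residue table, and the choice of six elements are assumptions made for illustration only. Each residue becomes a row of atom counts, and a modification is applied as a change in those counts, so a modification never seen during training still maps into the same feature space.

```python
# Illustrative sketch only; names and table contents are assumptions,
# not taken from the DeepLC code base.

ATOMS = ("C", "H", "N", "O", "S", "P")  # column order of the encoding

# Elemental composition of amino acid residues (amino acid minus H2O).
# Only a handful of residues are listed to keep the example short.
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
}

# Composition change introduced by each modification.
MODIFICATIONS = {
    "Oxidation": {"O": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def encode_peptide(sequence, modifications=None):
    """Return one row of atom counts per residue.

    `modifications` maps a zero-based residue position to a
    modification name from MODIFICATIONS.
    """
    modifications = modifications or {}
    matrix = []
    for position, residue in enumerate(sequence):
        counts = dict(RESIDUES[residue])  # copy so the table is not mutated
        mod_name = modifications.get(position)
        if mod_name is not None:
            # Add the modification's atoms to the residue's atoms.
            for atom, delta in MODIFICATIONS[mod_name].items():
                counts[atom] = counts.get(atom, 0) + delta
        matrix.append([counts.get(atom, 0) for atom in ATOMS])
    return matrix


if __name__ == "__main__":
    # Oxidised methionine differs from methionine by one oxygen atom.
    for row in encode_peptide("MAK", {0: "Oxidation"}):
        print(row)
```

Under this kind of encoding a model sees only atom counts, never a modification label, which is why predictions remain possible for modifications absent from the training data. A one-hot encoding of residue identities, by contrast, has no column for a modification it has not seen.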
- Published
- 2021