MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
- Source :
- BMC Bioinformatics, LOCUS Repositório Institucional da UFV, Universidade Federal de Viçosa (UFV), instacron:UFV, TU Graz
- Publication Year :
- 2016
- Publisher :
- BioMed Central, 2016.
Abstract
- Background: This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. A single MS/MS run yields thousands of spectra, which are then interpreted by software. Most of these programs use a protein database to match peptide sequences to the observed spectra. The resulting peptide-spectrum matches (PSMs) must also be assessed by computational tools, since manual evaluation is not practicable. The target-decoy database strategy is widely used for error estimation in PSM assessment. In general, however, that strategy does not account for sensitivity.
Results: In a previous study, we proposed the MUMAL method, which applies an artificial neural network to generate a model that classifies PSMs using decoy hits, with increased sensitivity. The present work shows that sensitivity can be further improved by associating a cost matrix with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra at the peptide level increases the chance of identifying more proteins. Second, the more appropriate PSM probability values produced by the threshold selector algorithm affect the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 improved sensitivity by around 15% compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage.
Conclusions: Including a cost matrix and a probability threshold selector algorithm in the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein-level identification, resulting in a powerful computational tool for shotgun proteomics.
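The interplay the abstract describes, between a cost matrix, a probability threshold, and decoy-based error estimation, can be illustrated with a minimal sketch. This is not the MUMAL2 implementation. It assumes a binary decision with zero cost for correct calls, in which case the cost-minimizing probability cutoff is cost_fp / (cost_fp + cost_fn), and it assumes one common target-decoy estimator (accepted decoys divided by accepted targets). The function names, the cost values, and the toy PSM list are all hypothetical.

```python
def min_cost_threshold(cost_fp, cost_fn):
    """Probability cutoff minimizing expected cost for a binary decision.

    Assumes correct decisions cost 0 (an assumption of this sketch).
    """
    return cost_fp / (cost_fp + cost_fn)


def decoy_fdr(psms, cutoff):
    """Estimate FDR among accepted PSMs as (#decoys / #targets).

    psms is a list of (probability, is_decoy) pairs; a PSM is accepted
    when its probability is at or above the cutoff.
    """
    accepted = [(p, d) for p, d in psms if p >= cutoff]
    decoys = sum(1 for _, is_decoy in accepted if is_decoy)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical toy data: (classifier probability, is_decoy)
psms = [
    (0.95, False), (0.90, False), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.20, True),
]

# Penalizing false negatives 3x more than false positives lowers the cutoff.
equal_cost_cutoff = min_cost_threshold(cost_fp=1.0, cost_fn=1.0)
sensitive_cutoff = min_cost_threshold(cost_fp=1.0, cost_fn=3.0)

for cutoff in (equal_cost_cutoff, sensitive_cutoff):
    n_targets = sum(1 for p, d in psms if p >= cutoff and not d)
    print(f"cutoff={cutoff:.2f} targets={n_targets} "
          f"estimated FDR={decoy_fdr(psms, cutoff):.2f}")
```

On this toy data, the lower cutoff accepts more target PSMs (higher sensitivity) at the price of a higher estimated error rate, which is the trade-off that a threshold selection step has to balance.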
- Subjects :
- 0301 basic medicine
Artificial neural network
Proteomics
Phosphoproteomics
Proteome
Computer science
Peptide/protein identification
ProteinProphet
Peptide
Tandem mass spectrometry
Biochemistry
law.invention
03 medical and health sciences
Cost sensitive classification
Structural Biology
law
Tandem Mass Spectrometry
Shotgun proteomics
Sensitivity (control systems)
Databases, Protein
Molecular Biology
Data mining
Probability
chemistry.chemical_classification
Applied Mathematics
Research
A protein
Computer Science Applications
Task (computing)
Identification (information)
030104 developmental biology
chemistry
Venn diagram
Protein identification
Neural Networks, Computer
DNA microarray
Peptides
Algorithm
Algorithms
Software
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 17
- Issue :
- Suppl 18
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....55aa856ee3adeaa42adecffcb497da2c