Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics
- Source :
- BMC Bioinformatics, Vol 8, Iss 1, p 323 (2007)
- Publication Year :
- 2007
- Publisher :
- BioMed Central, 2007.
Abstract
- Background: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The assignments of peptides to MS/MS spectra made by the SEQUEST searching algorithm are characterized by several scores, including Xcorr, ΔCn, Sp, Rsp and matched ion count. A filtering criterion built from several of these scores is used to separate correct identifications from random assignments. However, this filtering criterion has not been well optimized to date. Results: In this study, we implemented a machine learning approach known as a predictive genetic algorithm (GA) to optimize the filtering criteria so as to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which is a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach achieved similar performance in distinguishing true from false assignments with only 1/10 of the processing time. Moreover, the GA-based approach can easily be extended to process other database search results, as it does not rely on any assumption about the data. Conclusion: Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples and thereby improve proteome coverage.
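The optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' software: the abstract does not give the GA's encoding, operators, or the exact FDR formula, so everything below is an assumption made for illustration. The sketch uses only two of the scores (Xcorr and ΔCn), estimates the FDR as decoy hits divided by target hits, and uses a plain GA with truncation selection, uniform crossover and Gaussian mutation. The names `PSM`, `apply_filter`, `fitness` and `optimize` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical minimal record)."""
    xcorr: float
    delta_cn: float
    is_decoy: bool  # True if the match came from the decoy database


def apply_filter(psms, xcorr_min, delta_cn_min):
    """Return (target_count, estimated_fdr) for PSMs passing both cutoffs.

    Assumption: FDR is estimated as decoy hits / target hits.
    """
    targets = decoys = 0
    for p in psms:
        if p.xcorr >= xcorr_min and p.delta_cn >= delta_cn_min:
            if p.is_decoy:
                decoys += 1
            else:
                targets += 1
    fdr = decoys / targets if targets else 0.0
    return targets, fdr


def fitness(psms, genes, max_fdr):
    """Number of target hits, or 0 if the FDR limit is exceeded."""
    targets, fdr = apply_filter(psms, *genes)
    return targets if fdr <= max_fdr else 0


def optimize(psms, max_fdr=0.01, pop_size=30, generations=40, seed=0):
    """Search for (xcorr_min, delta_cn_min) maximizing hits at fixed FDR."""
    rng = random.Random(seed)
    bounds = [(0.0, 5.0), (0.0, 0.5)]  # assumed search ranges per cutoff

    def random_genes():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def mutate(genes):
        # Gaussian perturbation, clipped to the search range.
        return tuple(
            min(hi, max(lo, g + rng.gauss(0, 0.05 * (hi - lo))))
            for g, (lo, hi) in zip(genes, bounds)
        )

    population = [random_genes() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(
            population, key=lambda g: fitness(psms, g, max_fdr), reverse=True
        )
        parents = ranked[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))  # uniform crossover
            children.append(mutate(child))
        population = parents + children

    best = max(population, key=lambda g: fitness(psms, g, max_fdr))
    return best, apply_filter(psms, *best)
```

Calling `optimize(psms, max_fdr=0.01)` on a list of `PSM` records from a combined target and decoy search returns the best cutoff pair found together with the target count and estimated FDR at those cutoffs. Because the fitness depends only on decoy counts, no score distribution has to be modeled, which is the property the abstract contrasts with PeptideProphet.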
- Subjects :
- Proteomics
Proteome
PeptideProphet
Molecular Sequence Data
Information Storage and Retrieval
Biology
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Peptide Mapping
Mass Spectrometry
Software
Structural Biology
Search algorithm
Sequence Analysis, Protein
Genetic algorithm
Database search engine
Amino Acid Sequence
Shotgun proteomics
Databases, Protein
lcsh:QH301-705.5
Molecular Biology
Database
Applied Mathematics
Statistical model
Computer Science Applications
Data set
lcsh:Biology (General)
lcsh:R858-859.7
Database Management Systems
Algorithms
Research Article
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 8
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....fcb7019c546a46a84c6684984cd8aa2e