PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.
- Source :
- Bioinformatics. 5/15/2015, Vol. 31 Issue 10, p1544-1552. 9p.
- Publication Year :
- 2015
Abstract
- Motivation: The last decade has seen remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to the most comprehensive protein database, UniProtKB, are labelled as 'Unknown protein' or the like. The functionally annotated portion is also reported to contain 30-40% errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation. Results: Our results in free-text description line prediction show that we outperformed all competing methods by a clear margin. In GO prediction we show a clear improvement over our older method, which performed well in the CAFA 2011 challenge. Availability and implementation: The PANNZER program was developed using the Python programming language (Version 2.6). A stand-alone installation of PANNZER requires a MySQL database for data storage and the BLAST (BLASTALL v.2.2.21) tools for the sequence similarity search. The tutorial, evaluation test sets and results are available on the PANNZER web site. [ABSTRACT FROM AUTHOR]
- Subjects :
- *GENETIC databases
- *SEQUENCE analysis
- *AMINO acid sequence
Details
- Language :
- English
- ISSN :
- 1367-4803
- Volume :
- 31
- Issue :
- 10
- Database :
- Academic Search Index
- Journal :
- Bioinformatics
- Publication Type :
- Academic Journal
- Accession number :
- 102813619
- Full Text :
- https://doi.org/10.1093/bioinformatics/btu851