
PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.

Authors :
Koskinen, Patrik
Törönen, Petri
Nokso-Koivisto, Jussi
Holm, Liisa
Source :
Bioinformatics. 5/15/2015, Vol. 31 Issue 10, p1544-1552. 9p.
Publication Year :
2015

Abstract

Motivation: The last decade has seen a remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of the sequences submitted to UniProtKB, the most comprehensive protein database, are labelled as 'Unknown protein' or similar. In addition, the functionally annotated portion is reported to contain 30-40% errors. Here, we introduce Protein ANNotation with Z-score (PANNZER), a high-throughput tool for more reliable functional annotation. PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. It uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of functional annotation. Results: In free-text description line prediction, our method outperformed all competing methods by a clear margin. In GO prediction, we show a clear improvement over our older method, which performed well in the CAFA 2011 challenge. Availability and implementation: PANNZER was developed in the Python programming language (version 2.6). A stand-alone installation of PANNZER requires a MySQL database for data storage and the BLAST tools (BLASTALL v.2.2.21) for sequence similarity searches. The tutorial, evaluation test sets and results are available on the PANNZER web site. [ABSTRACT FROM AUTHOR]
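The abstract names its core technique, "weighted k-nearest neighbour methods with statistical testing", without detail. As a rough, generic illustration of that idea only (not PANNZER's actual scoring), the sketch below ranks candidate description labels from sequence-similarity hits by weighted vote. It then keeps only labels whose neighbour count is significantly above what a background label frequency would predict. The function name `rank_labels`, the `(label, weight)` input format, the binomial z-score, the background frequencies and the threshold `z_min` are all hypothetical choices for this example. It is written in Python 3 rather than the Python 2.6 the authors report.

```python
import math
from collections import defaultdict


def rank_labels(hits, background_freq, k=10, z_min=2.0):
    """Hypothetical weighted k-NN label ranking with a z-score filter.

    hits: list of (label, weight) pairs, e.g. a hit's description and a
        positive similarity weight such as a bit score (assumed input).
    background_freq: dict mapping label -> assumed fraction of database
        entries carrying that label (hypothetical background model).
    Returns (label, weighted_share, z) tuples, highest share first.
    """
    if not hits:
        return []
    # Keep the k neighbours with the largest weights.
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:k]
    n = len(top)
    total = sum(w for _, w in top)
    share = defaultdict(float)  # weighted vote per label
    count = defaultdict(int)    # unweighted neighbour count per label
    for label, w in top:
        share[label] += w / total
        count[label] += 1
    ranked = []
    for label in share:
        # Clamp p so the binomial standard deviation is never zero.
        p = min(max(background_freq.get(label, 1e-6), 1e-6), 1 - 1e-6)
        # z-score of the observed count against the binomial expectation n*p.
        z = (count[label] - n * p) / math.sqrt(n * p * (1 - p))
        if z >= z_min:
            ranked.append((label, share[label], z))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked


if __name__ == "__main__":
    hits = [("DNA polymerase", 250.0), ("DNA polymerase", 200.0),
            ("Uncharacterized protein", 180.0), ("DNA polymerase", 150.0),
            ("Uncharacterized protein", 90.0)]
    background = {"DNA polymerase": 0.01, "Uncharacterized protein": 0.3}
    print(rank_labels(hits, background))
```

The statistical filter is what distinguishes this from a plain majority vote. A label such as "Uncharacterized protein" can collect many neighbour votes simply because it is common in the database, and comparing its count with the background expectation discounts it.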

Details

Language :
English
ISSN :
13674803
Volume :
31
Issue :
10
Database :
Academic Search Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
102813619
Full Text :
https://doi.org/10.1093/bioinformatics/btu851