1. AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding
- Author
Lingyan Zheng, Shuiyang Shi, Mingkun Lu, Pan Fang, Ziqi Pan, Hongning Zhang, Zhimeng Zhou, Hanyu Zhang, Minjie Mou, Shijie Huang, Lin Tao, Weiqi Xia, Honglin Li, Zhenyu Zeng, Shun Zhang, Yuzong Chen, Zhaorong Li, and Feng Zhu
- Subjects
Protein function annotation, Long-tail problem, Protein representation, Pre-training, LSTM, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Protein function annotation has been one of the longstanding issues in the biological sciences, and various computational methods have been developed to address it. However, existing methods suffer from a serious long-tail problem: a large number of Gene Ontology (GO) families contain few annotated proteins. Herein, an innovative strategy named AnnoPRO was constructed, which enables sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory (LSTM)-based decoding. A variety of case studies based on different benchmarks were conducted, and they confirmed the superior performance of AnnoPRO among available methods. Source code and models are freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272
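To make the long-tail problem concrete, one can count how many proteins carry each GO term and measure what share of terms fall below a chosen annotation count. The Python sketch below is an illustration under stated assumptions and is not part of AnnoPRO. The input format (a dict mapping protein IDs to GO term IDs), the helper names `go_term_frequencies` and `tail_fraction`, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def go_term_frequencies(annotations):
    """Count how many proteins carry each GO term.

    `annotations` is a hypothetical input format:
    a mapping of protein ID -> iterable of GO term IDs.
    """
    counts = Counter()
    for terms in annotations.values():
        # Use a set so each term is counted at most once per protein.
        counts.update(set(terms))
    return counts


def tail_fraction(counts, threshold):
    """Fraction of GO terms annotated to fewer than `threshold` proteins."""
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c < threshold)
    return rare / len(counts)


# Toy data for illustration only; these are not figures from the paper.
toy = {
    "P1": ["GO:0005515", "GO:0003824"],
    "P2": ["GO:0005515"],
    "P3": ["GO:0005515", "GO:0016301"],
    "P4": ["GO:0005515", "GO:0003824", "GO:0042802"],
}
counts = go_term_frequencies(toy)
print(counts.most_common())
print(tail_fraction(counts, threshold=2))
```

In this toy set, one term is shared by every protein while half of the terms appear only once. On real GO annotation data the skew is what makes rare terms hard for a learned model to predict, which is the gap the abstract says AnnoPRO targets.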
- Published
- 2024