MiPepid: MicroPeptide identification tool using machine learning

Authors :
Michael Gribskov
Mengmeng Zhu
Source :
BMC Bioinformatics, Vol 20, Iss 1, Pp 1-11 (2019)
Publication Year :
2019

Abstract

Background: Micropeptides are small proteins with length [...]

Results: In this study, we have developed MiPepid, a machine-learning tool specifically for the identification of micropeptides. We trained MiPepid using carefully cleaned data from existing databases and used logistic regression with 4-mer features. With only the sequence information of an ORF, MiPepid is able to predict whether it encodes a micropeptide with 96% accuracy on a blind dataset of high-confidence micropeptides, and to correctly classify newly discovered micropeptides not included in either the training or the blind test data. Compared with state-of-the-art coding potential prediction methods, MiPepid performs exceptionally well, as other methods incorrectly classify most bona fide micropeptides as noncoding. MiPepid is alignment-free and runs sufficiently fast for genome-scale analyses. It is easy to use and is available at https://github.com/MindAI/MiPepid.

Conclusions: MiPepid was developed to specifically predict micropeptides, a category of proteins with increasing significance, from DNA sequences. It shows evident advantages over existing coding potential prediction methods on micropeptide identification. It is ready to use and runs fast.
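
The abstract names the method only as "logistic regression with 4-mer features". As a rough illustration of that idea, and not MiPepid's actual code, the sketch below turns a DNA sequence into a 256-dimensional vector of 4-mer frequencies and fits a small logistic regression by gradient descent. Several things here are assumptions: the overlapping sliding-window counting, the frequency normalization, the learning rate and epoch count, and all function names. The four training sequences are invented toy strings, not real micropeptide data.

```python
import math
from itertools import product

K = 4
# All 4**4 = 256 possible DNA 4-mers, in a fixed order, define the feature space.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(seq):
    """Return normalized frequencies of overlapping 4-mers (assumed scheme)."""
    seq = seq.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])  # windows with N etc. are skipped
        if idx is not None:
            counts[idx] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=10.0, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(seq, w, b):
    """Probability that seq belongs to the positive class under the fitted model."""
    x = kmer_features(seq)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy data: label 1 = "positive" ORF, label 0 = "negative" ORF.
train_seqs = [
    ("ATGGCCGCCGGCGCCTGA", 1),
    ("ATGGCGGCCGCGGGCTAA", 1),
    ("ATGTTTATAAATTTATAA", 0),
    ("ATGAATATTTTAAATTGA", 0),
]
X = [kmer_features(s) for s, _ in train_seqs]
y = [label for _, label in train_seqs]
w, b = train_logistic(X, y)
```

Calling `predict_proba(orf_sequence, w, b)` then scores a new ORF from its sequence alone, which mirrors the alignment-free setup the abstract describes. A real model would need the curated training data the authors mention, and a library implementation with regularization would be the more usual choice than this hand-rolled optimizer.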

Details

ISSN :
1471-2105
Volume :
20
Issue :
1
Database :
OpenAIRE
Journal :
BMC bioinformatics
Accession number :
edsair.doi.dedup.....cf412fd2ae7b3977c9eb51f3ec7390c9