MiPepid: MicroPeptide identification tool using machine learning
- Source :
- BMC Bioinformatics, Vol 20, Iss 1, Pp 1-11 (2019)
- Publication Year :
- 2019
Abstract
- Background: Micropeptides are small proteins with length [...]
Results: In this study, we have developed MiPepid, a machine-learning tool specifically for the identification of micropeptides. We trained MiPepid using carefully cleaned data from existing databases and used logistic regression with 4-mer features. With only the sequence information of an ORF, MiPepid is able to predict whether it encodes a micropeptide with 96% accuracy on a blind dataset of high-confidence micropeptides, and to correctly classify newly discovered micropeptides not included in either the training or the blind test data. Compared with state-of-the-art coding potential prediction methods, MiPepid performs exceptionally well, as other methods incorrectly classify most bona fide micropeptides as noncoding. MiPepid is alignment-free and runs sufficiently fast for genome-scale analyses. It is easy to use and is available at https://github.com/MindAI/MiPepid.
Conclusions: MiPepid was developed to specifically predict micropeptides, a category of proteins with increasing significance, from DNA sequences. It shows evident advantages over existing coding potential prediction methods on micropeptide identification. It is ready to use and runs fast.
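The abstract's core technique, logistic regression over 4-mer features of a DNA sequence, can be illustrated with a minimal standard-library sketch. This is not MiPepid's code or API: the function names, the frequency normalization, the learning rate, the epoch count, and the toy sequences are all assumptions made for illustration, and the abstract does not specify how the published tool computes or weights its features.

```python
import math
from itertools import product

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 DNA 4-mers
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(seq):
    """Overlapping 4-mer frequencies of a DNA sequence (a 256-long vector)."""
    seq = seq.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # windows containing non-ACGT symbols are skipped
            counts[idx] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss (assumed settings)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(seq, w, b):
    """Probability that a sequence belongs to the positive class."""
    x = kmer_features(seq)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy sequences, NOT real micropeptide data; labels are placeholders.
POSITIVE = ["ATGGCCGCCGCCGCC", "ATGGCGGCGGCGGCG"]
NEGATIVE = ["ATATATATATATATA", "TTTTAAAATTTTAAA"]

X = [kmer_features(s) for s in POSITIVE + NEGATIVE]
y = [1] * len(POSITIVE) + [0] * len(NEGATIVE)
w, b = train_logreg(X, y)

for s in POSITIVE + NEGATIVE:
    print(s, round(predict_proba(s, w, b), 3))
```

Because the features depend only on the sequence itself, a classifier of this kind needs no alignment step, which is consistent with the abstract's description of the tool as alignment-free. For the actual tool and its trained model, see the repository linked in the abstract.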
- Subjects :
- Computer science
Machine learning
computer.software_genre
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
DNA sequencing
03 medical and health sciences
Open Reading Frames
lncRNA
Structural Biology
Small peptide
Databases, Protein
Molecular Biology
lcsh:QH301-705.5
030304 developmental biology
chemistry.chemical_classification
0303 health sciences
business.industry
Coding
Applied Mathematics
030302 biochemistry & molecular biology
Computational Biology
Small ORF
Computer Science Applications
Amino acid
Open reading frame
Identification (information)
chemistry
lcsh:Biology (General)
Micropeptide
smORF
Noncoding
lcsh:R858-859.7
Artificial intelligence
DNA microarray
business
Peptides
computer
Software
sORF
Test data
Details
- ISSN :
- 1471-2105
- Volume :
- 20
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....cf412fd2ae7b3977c9eb51f3ec7390c9