iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
- Source :
- Briefings in bioinformatics. 21(3)
- Publication Year :
- 2019
Abstract
- With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this problem; however, all of these tools have limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit that integrates feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users who only want to upload their data set and select the functions they need; all necessary procedures and optimal settings are then completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and supports four feature output formats to facilitate direct use of the output or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is freely available via an online web server and a stand-alone toolkit.
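To make the first two steps named in the abstract (feature extraction and normalization) concrete, here is a minimal pure-Python sketch. It is not iLearn's code or API: the function names `kmer_frequencies` and `min_max_normalize`, the choice of a k-mer composition descriptor, the min-max scaling and the example sequences are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return k-mer frequencies of a sequence, ordered by k-mer (hypothetical helper)."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing symbols outside the alphabet
            counts[window] += 1
            total += 1
    # Divide by the number of counted windows so the vector sums to 1.
    return [counts[m] / total if total else 0.0 for m in kmers]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Illustrative DNA sequences (made up for this sketch).
sequences = ["ACGTACGT", "AAAACCCC", "GGGTTTAA"]
features = [kmer_frequencies(s, k=2) for s in sequences]
scaled = min_max_normalize(features)
```

Each sequence becomes a fixed-length vector (16 values for dinucleotides), which is the form that downstream selection, dimensionality reduction and machine-learning steps generally expect.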
- Subjects :
- Feature engineering
Web server
Computer science
0206 medical engineering
Feature extraction
Feature selection
02 engineering and technology
Machine learning
03 medical and health sciences
Cluster analysis
Molecular Biology
030304 developmental biology
0303 health sciences
Internet
Dimensionality reduction
Proteins
DNA
Python (programming language)
Ensemble learning
RNA
Artificial intelligence
Sequence Analysis
020602 bioinformatics
Algorithms
Information Systems
- ISSN :
- 1477-4054
- Volume :
- 21
- Issue :
- 3
- Database :
- OpenAIRE
- Journal :
- Briefings in bioinformatics
- Accession number :
- edsair.doi.dedup.....1f0d8aeaa31436baaeb76fa16146b6fa