Identification of bacteriophage genome sequences with representation learning
- Source :
- Bioinformatics (Oxford, England). 38(18)
- Publication Year :
- 2021
Abstract
- Motivation: Bacteriophages (phages) are viruses that infect and replicate within bacteria and archaea, and they are abundant in the human body. To investigate the relationship between phages and microbial communities, the first step is to identify phages from metagenome sequences. Currently, there are two main methods for identifying phages: database-based (alignment-based) methods and alignment-free methods. Database-based methods typically use a large number of sequences as references; alignment-free methods usually learn the features of the sequences with machine learning and deep learning models. Results: We propose INHERIT, which uses a deep representation learning model to integrate database-based and alignment-free methods, combining the strengths of both. Pre-training is used as an alternative way of acquiring knowledge representations from existing databases, while the BERT-style deep learning framework retains the advantage of alignment-free methods. We compare INHERIT with four existing methods on a third-party benchmark dataset. Our experiments show that INHERIT achieves better performance, with an F1-score of 0.9932. In addition, we find that pre-training two species separately helps the non-alignment deep learning model make more accurate predictions. Availability: The code for INHERIT is available at https://github.com/Celestial-Bai/INHERIT. Contact: yaozhong@ims.u-tokyo.ac.jp and imoto@hgc.jp. Supplementary information: Supplementary data are available at BioRxiv online.
- Subjects :
- Statistics and Probability
Computer science
Machine learning
Biochemistry
Similarity (psychology)
Humans
Bacteriophages
Representation (mathematics)
Molecular Biology
Bacteria
Deep learning
Replicate
Computer Science Applications
Identification (information)
Computational Mathematics
Computational Theory and Mathematics
Metagenomics
Benchmark (computing)
Metagenome
Artificial intelligence
Feature learning
Software
Details
- ISSN :
- 1367-4811
- Volume :
- 38
- Issue :
- 18
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....18706ac87d8703937626b3c7e5a2ee97