Fast dendrogram-based OTU clustering using sequence embedding

Authors :
Bertil Schmidt
Chee Keong Kwoh
Thuy-Diem Nguyen
Source :
BCB
Publication Year :
2014
Publisher :
ACM, 2014.

Abstract

Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing. In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the quadratic memory requirement of the clustering procedure. Our performance evaluation shows that CRiSPy-Embed achieves higher efficiency in terms of both runtime and memory consumption in comparison to existing dendrogram-based approaches. Furthermore, to obtain the final OTU grouping, CRiSPy-Embed dynamically determines a natural cutoff of the dendrogram. With this strategy, CRiSPy-Embed achieves better and more robust clustering outcomes compared to other notable OTU clustering pipelines.
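
Illustrative sketch (not from the paper): the abstract does not specify the embedding, the linkage, or the cutoff rule that CRiSPy-Embed uses, so the Python below only shows the general idea of alignment-free, dendrogram-based grouping under stated assumptions. It assumes a k-mer frequency embedding, Euclidean distance, naive average linkage, and a "largest gap in merge height" cutoff; all function names (`kmer_embed`, `average_linkage`, `cut_at_largest_gap`, `cluster_reads`) are hypothetical. Distances are recomputed on demand rather than stored in a matrix, which keeps memory low but makes this toy version slow.

```python
from itertools import product
from math import sqrt


def kmer_embed(seq, k=2):
    """Map a DNA read to a normalized k-mer frequency vector (no alignment)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in counts:  # skip k-mers containing ambiguous bases
            counts[km] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(vectors):
    """Naive agglomerative clustering.

    Returns the dendrogram as a list of (height, members_a, members_b)
    merges in the order they were performed.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(euclidean(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


def cut_at_largest_gap(n, merges):
    """Apply merges up to the largest jump in merge height (assumed cutoff rule)."""
    labels = list(range(n))
    if len(merges) < 2:
        return labels  # too few merges to define a gap; keep reads separate
    heights = [m[0] for m in merges]
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    stop = gaps.index(max(gaps)) + 1  # number of merges to keep
    for _, a, b in merges[:stop]:
        target = labels[a[0]]
        for i in b:
            labels[i] = target
    return labels


def cluster_reads(reads, k=2):
    """Embed reads, build the dendrogram, and cut it into OTU-like groups."""
    vectors = [kmer_embed(r, k) for r in reads]
    return cut_at_largest_gap(len(reads), average_linkage(vectors))
```

Calling `cluster_reads` on a list of read strings returns one integer label per read; reads sharing a label fall in the same group under these assumptions.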

Details

Database :
OpenAIRE
Journal :
Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Accession number :
edsair.doi...........77f613feb52a1e4d08a93edbfac5f68a
Full Text :
https://doi.org/10.1145/2649387.2649402