Sensitive clustering of protein sequences at tree-of-life scale using DIAMOND DeepClust
- Publication Year :
- 2023
- Publisher :
- Cold Spring Harbor Laboratory, 2023.
Abstract
- The biosphere genomics era is transforming life science research, but existing methods struggle to efficiently reduce the vast dimensionality of the protein universe. We present DIAMOND DeepClust, an ultra-fast cascaded clustering method optimized to cluster the 19 billion protein sequences currently defining the protein biosphere. As a result, we detect 1.7 billion clusters of which 32% hold more than one sequence. This means that 544 million clusters represent 94% of all known proteins, illustrating that clustering across the tree of life can significantly accelerate comparative studies in the Earth BioGenome era.
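The abstract's step from "32% of 1.7 billion clusters" to "544 million clusters represent 94% of all known proteins" can be checked with a short calculation. The sketch below is only an illustration, not part of the publication. It uses the rounded figures quoted in the abstract and assumes that every cluster outside the 32% holds exactly one sequence. The variable names are hypothetical.

```python
# Rounded figures as quoted in the abstract.
total_sequences = 19e9          # protein sequences clustered
total_clusters = 1.7e9          # clusters detected
multi_member_fraction = 0.32    # share of clusters with more than one sequence

# Clusters holding more than one sequence.
multi_member_clusters = total_clusters * multi_member_fraction

# Assumption: each remaining cluster is a singleton, i.e. one sequence.
singleton_clusters = total_clusters - multi_member_clusters

# Sequences not in singletons must sit in the multi-sequence clusters.
sequences_in_multi = total_sequences - singleton_clusters
coverage = sequences_in_multi / total_sequences

print(f"{multi_member_clusters / 1e6:.0f} million multi-sequence clusters")
print(f"{coverage:.1%} of sequences fall in multi-sequence clusters")
```

With these rounded inputs the result is about 544 million clusters covering roughly 94% of sequences, which agrees with the abstract. The small gap to exactly 94% comes from rounding in the published figures.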
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........2c170cba8695d29989c657f39b4b6e65