Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies.
- Source :
- Molecular ecology resources [Mol Ecol Resour] 2023 Feb; Vol. 23 (2), pp. 368-381. Date of Electronic Publication: 2022 Sep 15.
- Publication Year :
- 2023
Abstract
- Clustering approaches are pivotal to handle the many sequence variants obtained in DNA metabarcoding data sets, and therefore they have become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into molecular operational taxonomic units (MOTUs), each of which ideally represents a homogeneous taxonomic entity (e.g., a species or a genus). However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or over-merging even less tested. Here, we evaluated clustering threshold values for several metabarcoding markers under different criteria: limitation of MOTU over-merging, limitation of MOTU over-splitting, and trade-off between over-merging and over-splitting. We extracted sequences from a public database for nine markers, ranging from generalist markers targeting Bacteria or Eukaryota, to more specific markers targeting a class or a subclass (e.g., Insecta, Oligochaeta). Based on the distributions of pairwise sequence similarities within species and within genera, and on the rates of over-splitting and over-merging across different clustering thresholds, we were able to propose threshold values minimizing the risk of over-splitting, that of over-merging, or offering a trade-off between the two risks. For generalist markers, high similarity thresholds (0.96-0.99) are generally appropriate, while more specific markers require lower values (0.85-0.96). These results do not support the use of a fixed clustering threshold. Instead, we advocate careful examination of the most appropriate threshold based on the research objectives, the potential costs of over-splitting and over-merging, and the features of the studied markers. (© 2022 John Wiley & Sons Ltd.)
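To illustrate the trade-off described in the abstract, the following is a minimal Python sketch, not the authors' pipeline. It assumes sequences are already aligned to equal length, uses a simple fraction-of-identical-positions similarity, and approximates over-splitting and over-merging with pairwise proxies: the share of same-species pairs falling below the threshold, and the share of different-species pairs at or above it. The function names (`similarity`, `error_rates`) and the toy records are hypothetical.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def error_rates(records, threshold):
    """Return (over_splitting, over_merging) pairwise proxies.

    records: list of (species_label, aligned_sequence) tuples.
    over_splitting: share of same-species pairs with similarity < threshold.
    over_merging: share of different-species pairs with similarity >= threshold.
    """
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        sim = similarity(s1, s2)
        (within if sp1 == sp2 else between).append(sim)
    over_split = sum(s < threshold for s in within) / len(within) if within else 0.0
    over_merge = sum(s >= threshold for s in between) / len(between) if between else 0.0
    return over_split, over_merge


# Toy data (invented for illustration only, not taken from the study).
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTACGAAT"),
    ("sp_B", "TCGTACGAAT"),
]

for t in (0.75, 0.85, 0.95):
    split, merge = error_rates(records, t)
    print(f"threshold={t:.2f}  over-splitting={split:.2f}  over-merging={merge:.2f}")
```

On this toy input, raising the threshold lowers the over-merging proxy and raises the over-splitting proxy, which is the tension the abstract says a single fixed threshold cannot resolve across markers.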
Details
- Language :
- English
- ISSN :
- 1755-0998
- Volume :
- 23
- Issue :
- 2
- Database :
- MEDLINE
- Journal :
- Molecular ecology resources
- Publication Type :
- Academic Journal
- Accession number :
- 36052659
- Full Text :
- https://doi.org/10.1111/1755-0998.13709