Jean-Marc Frigerio, Martyn Kelly, Ana Baričević, Maria Kahlert, Sebastian Proft, Teofana Chonova, Demetrio Mora, Jonas Zimmermann, Martin Pfannkuchen, Laure Apothéloz-Perret-Gentil, Valentin Vasselon, Mathieu Ramon, Bonnie Bailet, Alain Franc
Affiliations: Swedish University of Agricultural Sciences (SLU); Université de Genève = University of Geneva (UNIGE) [Switzerland]; Institut Ruđer Bošković (IRB); Centre Alpin de Recherche sur les Réseaux Trophiques et Ecosystèmes Limniques (CARRTEL), Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); from patterns to models in computational biodiversity and biotechnology (PLEIADE), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Biodiversité, Gènes & Communautés (BioGeCo), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); University of Nottingham, UK (UON); Freie Universität Berlin; Fera Science Ltd
Funding: Stiftelsen Oscar och Lili Lamms Minne; Swedish Agency for Marine and Water Management; Ministry of Agriculture and Forestry in Finland; Federal Ministry of Education and Research (German Barcode of Life 2 Diatoms (GBOL2)) 01LI1501E; Croatian Science Foundation project: Life strategies of phytoplankton in the northern Adriatic UIP-2014-09-6563; European COST-Action DNAqua Net CA15219
Ecological assessment of lakes and rivers using benthic diatom assemblages currently requires considerable taxonomic expertise to identify species using light microscopy. This traditional approach is also time-consuming. Diatom metabarcoding is a promising alternative and there is increasing interest in using this approach for routine assessment. However, until now, analysis protocols for diatom metabarcoding have been developed and optimised by research groups working in isolation. The diversity of existing bioinformatics methods highlights the need for an assessment of the performance and comparability of results of different methods. The aim of this study was to test the correspondence of outputs from six bioinformatics pipelines currently in use for diatom metabarcoding in different European countries. Raw sequence data from 29 biofilm samples were treated by each of the bioinformatics pipelines, five of them using the same curated reference database. The outputs of the pipelines were compared in terms of sequence unit assemblages, taxonomic assignment, biotic index score and ecological assessment outcomes. The last three components were also compared to outputs from traditional light microscopy, which is currently accepted for ecological assessment of phytobenthos, as required by the Water Framework Directive. We also tested the performance of the pipelines on the two DNA markers (rbcL and 18S-V4) that are currently used by the working groups participating in this study. The sequence unit assemblages produced by different pipelines showed significant differences in terms of assigned and unassigned read numbers and sequence unit numbers. When comparing the taxonomic assignments at genus and species level, correspondence of the taxonomic assemblages between pipelines was weak. Most discrepancies were linked to differential detection or quantification of taxa, despite the use of the same reference database.
Subsequent calculation of biotic index scores also showed significant differences between approaches, which were reflected in the final ecological assessment. Use of the rbcL marker always resulted in better correlation among molecular datasets and also in results closer to those generated using traditional microscopy. This study shows that decisions made in pipeline design have implications for the dataset's structure and the taxonomic assemblage, which in turn may affect biotic index calculation and ecological assessment. There is a need to define best-practice bioinformatics parameters in order to ensure the best representation of diatom assemblages. Only the use of similar parameters will ensure the compatibility of data from different working groups. The future of diatom metabarcoding for ecological assessment may also lie in the development of new metrics using, for example, presence/absence instead of relative abundance data. © 2020 The Authors. Published by Elsevier B.V.
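
As an illustration of why differences in taxon quantification between pipelines can carry through to a biotic index score, and why a presence/absence metric might be less sensitive to them, the following minimal Python sketch computes a generic abundance-weighted average index of the Zelinka and Marvan form. This is an assumption made for illustration only: the abstract does not name the index used in the study, and the taxon labels, read counts, sensitivity values and function name below are all hypothetical.

```python
from typing import Dict


def weighted_average_index(
    counts: Dict[str, float],
    sensitivity: Dict[str, float],
    indicator_value: Dict[str, float],
    presence_absence: bool = False,
) -> float:
    """Generic weighted-average index: sum(w*s*v) / sum(w*v).

    w is the read count of a taxon, or 1 for every detected taxon when
    presence_absence is True. s is a (hypothetical) sensitivity score and
    v a (hypothetical) indicator weight. Taxa without a sensitivity score
    are ignored, as unassigned sequence units would be.
    """
    numerator = 0.0
    denominator = 0.0
    for taxon, count in counts.items():
        if count <= 0 or taxon not in sensitivity:
            continue
        weight = 1.0 if presence_absence else float(count)
        v = indicator_value.get(taxon, 1.0)
        numerator += weight * sensitivity[taxon] * v
        denominator += weight * v
    if denominator == 0.0:
        raise ValueError("no scored taxa detected in this sample")
    return numerator / denominator


# Hypothetical scores for three placeholder taxa (not real species data).
sensitivity = {"taxon_A": 5.0, "taxon_B": 3.0, "taxon_C": 1.0}
indicator_value = {"taxon_A": 1.0, "taxon_B": 1.0, "taxon_C": 1.0}

# The same sample as two hypothetical pipelines might report it:
# identical taxa detected, different read counts per taxon.
pipeline_1 = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
pipeline_2 = {"taxon_A": 20, "taxon_B": 30, "taxon_C": 50}

abundance_1 = weighted_average_index(pipeline_1, sensitivity, indicator_value)
abundance_2 = weighted_average_index(pipeline_2, sensitivity, indicator_value)
presence_1 = weighted_average_index(pipeline_1, sensitivity, indicator_value, True)
presence_2 = weighted_average_index(pipeline_2, sensitivity, indicator_value, True)
```

With these made-up numbers the abundance-weighted scores differ between the two pipelines (4.0 versus 2.4), while the presence/absence scores agree (3.0 for both) because the same taxa are detected. A presence/absence metric would still diverge wherever pipelines differ in which taxa they detect at all, so the sketch illustrates only the quantification case.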