Automated DNA-based plant identification for large-scale biodiversity assessment
- Source :
- Digital.CSIC. Repositorio Institucional del CSIC, instname
- Publication Year :
- 2015
- Publisher :
- Wiley-Blackwell, 2015.
Abstract
- Rapid degradation of tropical forests makes it urgent to improve the efficiency of large-scale biodiversity assessment. DNA barcoding can assist greatly in this task, but commonly used phenetic approaches for DNA-based identification rely on the existence of comprehensive reference databases, which are infeasible for hyperdiverse tropical ecosystems. Alternatively, phylogenetic methods are more robust to sparse taxon sampling but are time-consuming, while multiple alignment of species-diagnostic, typically length-variable, markers can be problematic across divergent taxa. We advocate the combination of phylogenetic and phenetic methods for taxonomic assignment of DNA-barcode sequences against incomplete reference databases such as GenBank, and we developed a pipeline to implement this approach in large-scale plant diversity projects. The pipeline workflow includes several steps: database construction and curation, query sequence clustering, sequence retrieval, distance calculation, multiple alignment and phylogenetic inference. We describe the strategies used to establish these steps and the optimization of parameters to fit the selected psbA-trnH marker. We tested the pipeline using infertile plant samples and herbivore diet sequences from the highly threatened Nicaraguan seasonally dry forest, exploiting a valuable purpose-built resource: a partial local reference database of plant psbA-trnH. The selected methodology proved efficient and reliable for high-throughput taxonomic assignment, and our results corroborate the advantage of applying ‘strict’ tree-based criteria to avoid false positives.
The pipeline tools are distributed as the scripts suite ‘BAGpipe’ (pipeline for Biodiversity Assessment using GenBank data), which can be readily adjusted to the purposes of other projects and applied to sequence-based identification for any marker or taxon.
This project was mainly funded by the ‘Ecology and Conservation Biology’ programme (BIOCON 08) of the Spanish private fund, ‘Fundación BBVA’ led by J.G.-Z. Additional funding was available for fieldwork thanks to Project C/032352/10 of the Interuniversity Cooperation and Research Programme (PCI-AECID 2010) of the Spanish Ministry of Foreign Affairs and Cooperation (MAEC). A.P. was funded by the postdoctoral programme ‘Juan de la Cierva’ of the Spanish Ministry of Science and Innovation (MICINN); G.D.C. by the predoctoral programme of the AECID (Spanish MAEC); and D.C. by the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSXC2-EW-B-02) and the Chinese National Science Foundation (Grants No. 30870268, 31172048, J1210002).
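The distance-calculation step named in the abstract can be illustrated with a minimal Python sketch. It is not taken from BAGpipe: the function names, the uncorrected p-distance measure, and the 0.02 threshold are all assumptions chosen for illustration, and the sequences are assumed to be already aligned. It covers only a phenetic nearest-reference lookup; the phylogenetic inference and the ‘strict’ tree-based criteria are not shown.

```python
from typing import Dict, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    (Hypothetical helper; not part of BAGpipe.)
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def nearest_reference(
    query: str,
    references: Dict[str, str],
    max_distance: float = 0.02,  # arbitrary illustrative threshold
) -> Optional[Tuple[str, float]]:
    """Return (name, distance) of the closest reference, or None when
    no reference lies within max_distance of the query."""
    best = min(
        ((name, p_distance(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
        default=None,
    )
    if best is None or best[1] > max_distance:
        return None
    return best


if __name__ == "__main__":
    # Toy reference set with made-up sequences and placeholder names.
    refs = {"Species_A": "ACGTACGTAC", "Species_B": "ACGTTCGTTC"}
    print(nearest_reference("ACGTACGTAC", refs))
    print(nearest_reference("TTTTTTTTTT", refs))
```

Returning `None` when no reference falls within the threshold mirrors the conservative stance the abstract describes: with an incomplete reference database, leaving a query unassigned is preferable to reporting a false positive.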
- Subjects :
- DNA, Plant
Dry tropical forest
Molecular Sequence Data
Nicaragua
Forests
Biology
computer.software_genre
Plant identification
Taxonomic assignment
Genetics
DNA Barcoding, Taxonomic
DNA barcoding
Phylogeny
Ecology, Evolution, Behavior and Systematics
Sequence clustering
Multiple sequence alignment
BAGpipe script suite
Phylogenetic tree
Ecology
Biodiversity
Sequence Analysis, DNA
Plants
Pipeline (software)
Taxon
psbA-trnH
GenBank
Identification (biology)
Data mining
computer
Biotechnology
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Digital.CSIC. Repositorio Institucional del CSIC, instname
- Accession number :
- edsair.doi.dedup.....4841b9f81bcfdc093098b9123312ad85