PIPEBAR and OverlapPER: tools for a fast and accurate DNA barcoding analysis and paired-end assembly
- Source :
- BMC Bioinformatics, Vol 19, Iss 1, Pp 1-10 (2018)
- Publication Year :
- 2018
- Publisher :
- Springer Science and Business Media LLC, 2018.
Abstract
- Background: Taxonomic identification of plants and insects is a difficult process that demands expert taxonomists and time, and species are often hard to distinguish on morphology alone. DNA barcodes allow rapid species discovery and identification and have been widely used for taxonomic identification by targeting known gene regions that discriminate between species. DNA barcode sequence analysis is usually carried out with processes and tools that still demand a high degree of interaction with the user or researcher. To reduce this interaction as much as possible, we propose PIPEBAR, a pipeline for analyzing DNA chromatograms from Sanger sequencing that ensures high-quality consensus sequences along with efficient running time. We also propose a paired-end read assembly tool, OverlapPER, which can be used in sequence with PIPEBAR or independently of it.
- Results: PIPEBAR is a command-line tool that automates the processing of large numbers of trace files. It is as accurate as the proprietary Geneious tool and faster than most popular software for barcoding analysis: it is 7 times faster than Geneious and 14 times faster than SeqTrace when processing hundreds of barcoding sequences. OverlapPER is a novel tool for accurately overlapping paired-end reads that accepts both substitution and indel errors and returns both the overlapped and non-overlapped regions of a pair of reads. OverlapPER obtained the best results compared to currently used tools when merging 1,000,000 simulated paired-end reads.
- Conclusions: PIPEBAR and OverlapPER run on most operating systems and are freely available, along with supporting code and documentation, at https://sourceforge.net/projects/PIPEBAR/ and https://sourceforge.net/projects/overlapper-reads/.
- Electronic supplementary material: The online version of this article (10.1186/s12859-018-2307-y) contains supplementary material, which is available to authorized users.
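The abstract describes merging a pair of reads by their overlap and reporting both the overlapped and the non-overlapped regions. The general idea can be illustrated with the minimal Python sketch below. It is not OverlapPER's algorithm: the function names, the `min_overlap` and `max_mismatch_rate` parameters, and the longest-overlap-first search are assumptions made for illustration, and the sketch tolerates substitution errors only (no indels, no quality scores).

```python
# Illustrative sketch only; NOT the OverlapPER implementation.
# All names and thresholds here are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a read pair by an ungapped suffix/prefix overlap.

    The reverse read is reverse-complemented, then the suffix of the
    forward read is compared with the prefix of that sequence, trying
    the longest candidate overlap first. An overlap of length k is
    accepted when it has at most max_mismatch_rate * k substitutions.
    Returns None when no acceptable overlap is found.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    for k in range(longest, min_overlap - 1, -1):
        fwd_part, rev_part = forward[-k:], rc[:k]
        mismatches = sum(a != b for a, b in zip(fwd_part, rev_part))
        if mismatches <= max_mismatch_rate * k:
            return {
                "left": forward[:-k],      # non-overlapped part of the forward read
                "overlap": fwd_part,       # overlapped region (forward bases kept)
                "right": rc[k:],           # non-overlapped part of the reverse read
                "mismatches": mismatches,
                "merged": forward + rc[k:],
            }
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACCGATGCATTGACCA"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[10:])
    print(merge_pair(fwd, rev))
```

Without quality scores, the sketch resolves a mismatch in the overlap by keeping the forward read's base; a real tool would normally use base qualities and would also need gapped alignment to handle indels.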
- Subjects :
- 0301 basic medicine
Computer science
Sequence analysis
lcsh:Computer applications to medicine. Medical informatics
computer.software_genre
Biochemistry
DNA barcoding
DNA sequencing
Paired-end assembly
03 medical and health sciences
chemistry.chemical_compound
Structural Biology
Consensus Sequence
DNA barcode
Consensus sequence
DNA Barcoding, Taxonomic
Frameshift Mutation
Indel
lcsh:QH301-705.5
Molecular Biology
Gene
Sanger
TRACE (psycholinguistics)
Base Sequence
Applied Mathematics
Pipeline (software)
Computer Science Applications
Identification (information)
030104 developmental biology
lcsh:Biology (General)
chemistry
DNA barcodes
Codon, Terminator
lcsh:R858-859.7
Data mining
Line (text file)
DNA microarray
computer
Software
DNA
Details
- ISSN :
- 14712105
- Volume :
- 19
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....152c08ed09cf4a4dfd4aea269db1f79e