Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
- Source :
- Genome Biology, Vol 20, Iss 1, Pp 1-18 (2019), Genome Biology
- Publication Year :
- 2019
- Publisher :
- BMC, 2019.
Abstract
- Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.
- Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.
- Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
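The abstract lists sensitivity, specificity, accuracy, precision, FDR, and F1 as benchmarking metrics. Below is a minimal Python sketch of how such metrics could be computed. It assumes the standard confusion-matrix definitions and assumes the counts are genomic base pairs obtained by comparing a test annotation against the curated reference annotation, one boolean "is TE" flag per base. The names `Confusion`, `confusion_from_masks`, and `metrics` are hypothetical illustrations, not part of EDTA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Base-pair counts from comparing a test annotation to a curated reference.

    Assumption: each genomic position is labeled TE / non-TE in both annotations.
    """
    tp: int  # bp called TE by both the test method and the reference
    fp: int  # bp called TE by the test method only
    tn: int  # bp called TE by neither
    fn: int  # bp called TE by the reference only


def confusion_from_masks(test: list[bool], ref: list[bool]) -> Confusion:
    """Tally per-base agreement between two equal-length TE masks (hypothetical helper)."""
    if len(test) != len(ref):
        raise ValueError("masks must cover the same positions")
    tp = sum(t and r for t, r in zip(test, ref))
    fp = sum(t and not r for t, r in zip(test, ref))
    fn = sum(r and not t for t, r in zip(test, ref))
    tn = len(test) - tp - fp - fn
    return Confusion(tp=tp, fp=fp, tn=tn, fn=fn)


def _safe_div(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics(c: Confusion) -> dict[str, float]:
    """Standard confusion-matrix metrics (definitions assumed, not quoted from the paper)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _safe_div(c.tp, c.tp + c.fn),
        "specificity": _safe_div(c.tn, c.tn + c.fp),
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": _safe_div(c.tp, c.tp + c.fp),
        "fdr": _safe_div(c.fp, c.tp + c.fp),
        "f1": _safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Toy 8-bp region: tp=2, fp=1, tn=4, fn=1.
    test_mask = [True, True, False, False, True, False, False, False]
    ref_mask = [True, False, False, True, True, False, False, False]
    for name, value in metrics(confusion_from_masks(test_mask, ref_mask)).items():
        print(f"{name}: {value:.3f}")
```

With these definitions precision and FDR always sum to 1, and the large non-TE background of a genome inflates specificity and accuracy. That is why sensitivity, precision, and F1 are usually the more informative of the six.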
- Subjects :
- 0106 biological sciences
Transposable element
lcsh:QH426-470
Inverted repeat
Annotation
Sequence assembly
Retrotransposon
Computational biology
Biology
01 natural sciences
Genome
03 medical and health sciences
Pipeline
Animals
Humans
lcsh:QH301-705.5
030304 developmental biology
0303 health sciences
Research
food and beverages
Molecular Sequence Annotation
Genome project
Pipeline (software)
Benchmarking
lcsh:Genetics
lcsh:Biology (General)
DNA Transposable Elements
Software
010606 plant biology & botany
Details
- Language :
- English
- Volume :
- 20
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Genome Biology
- Accession number :
- edsair.doi.dedup.....5c8d477fbd71277ffab94c14e014beb9