NucBreak: location of structural errors in a genome assembly by using paired-end Illumina reads
- Source :
- BMC Bioinformatics, Vol 21, Iss 1, Pp 1-11 (2020)
- Publication Year :
- 2018
Abstract
- Background: Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. The analysis results are highly dependent on the quality of the genome assemblies used. Assessing assembly accuracy may significantly increase the reliability of the analysis results and is therefore of great importance.
- Results: Here, we present a new tool called NucBreak aimed at localising structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. Existing alternative tools analyse reads that do not map properly to the assembly, for instance discordantly mapped reads, soft-clipped reads and singletons. NucBreak uses an entirely different method to localise the errors: it analyses the alignments of reads that are properly mapped to an assembly and exploits information about the alternative read alignments. It does not annotate detected errors. We have compared NucBreak with other existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam, as well as with several structural variant detection tools, including BreakDancer, Lumpy, and Wham, using both simulated and real datasets.
- Conclusions: The benchmarking results have shown that NucBreak in general predicts assembly errors of different types and sizes with relatively high sensitivity and with a lower false discovery rate than the other tools. Such a balance between sensitivity and false discovery rate makes NucBreak a good alternative to the existing assembly accuracy assessment tools and SV detection tools. NucBreak is freely available at https://github.com/uio-bmi/NucBreak under the MPL license.
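The two benchmarking metrics named in the abstract, sensitivity and false discovery rate, can be illustrated with a minimal sketch. It is not taken from the paper or from NucBreak itself: the function name `sensitivity_and_fdr`, the interval representation of errors, and the overlap-with-tolerance matching rule are all assumptions made for illustration, and the paper's actual benchmarking protocol may match predictions to true errors differently.

```python
def sensitivity_and_fdr(true_errors, predicted, tolerance=0):
    """Compute (sensitivity, false discovery rate) for predicted error regions.

    true_errors, predicted: lists of (start, end) intervals on one sequence.
    tolerance: hypothetical slack, in bases, allowed when matching intervals.
    The overlap-based matching rule is an assumption, not the paper's protocol.
    """
    def overlaps(a, b):
        # Two closed intervals match if they overlap after widening by tolerance.
        return a[0] <= b[1] + tolerance and b[0] <= a[1] + tolerance

    # A true error counts as detected if at least one prediction matches it.
    detected = sum(
        1 for t in true_errors if any(overlaps(t, p) for p in predicted)
    )
    # A prediction is a false positive if it matches no true error.
    false_pos = sum(
        1 for p in predicted if not any(overlaps(p, t) for t in true_errors)
    )

    sensitivity = detected / len(true_errors) if true_errors else 0.0
    fdr = false_pos / len(predicted) if predicted else 0.0
    return sensitivity, fdr


# Made-up example intervals, for illustration only.
truth = [(100, 200), (500, 600), (900, 950)]
calls = [(150, 160), (590, 700), (2000, 2100), (3000, 3010)]
sens, fdr = sensitivity_and_fdr(truth, calls)
```

Under this definition, a tool that reports many regions can reach high sensitivity at the cost of a high false discovery rate, which is the trade-off the Conclusions refer to.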
- Subjects :
- Computer science
Reliability (computer networking)
Sequence assembly
02 engineering and technology
Computational biology
computer.software_genre
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Genome
Illumina paired-end reads
03 medical and health sciences
Structural Biology
0202 electrical engineering, electronic engineering, information engineering
Sensitivity (control systems)
Comparative genomic analysis
Molecular Biology
lcsh:QH301-705.5
030304 developmental biology
Whole genome sequencing
0303 health sciences
Genome assembly
Applied Mathematics
Structural variant
High-Throughput Nucleotide Sequencing
Reproducibility of Results
Structural variant detection
020206 networking & telecommunications
Genomics
Sequence Analysis, DNA
Assembly accuracy assessment
Computer Science Applications
lcsh:Biology (General)
lcsh:R858-859.7
Data mining
DNA microarray
computer
Assembly errors
Software
Details
- ISSN :
- 14712105
- Volume :
- 21
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC bioinformatics
- Accession number :
- edsair.doi.dedup.....ac5d6d6b3ca2310a3e2791ff1fbd0e9b