PECC: Correcting contigs based on paired-end read distribution
- Source :
- Computational Biology and Chemistry. 69:178-184
- Publication Year :
- 2017
- Publisher :
- Elsevier BV, 2017.
Abstract
- Motivation: Cheap and fast next-generation sequencing (NGS) technologies have greatly facilitated research on de novo assembly. Reliable contigs are critical for constructing reliable scaffolds. However, contigs generated by most assemblers contain errors because of limitations in assembly strategy and computational complexity. Among these errors, misassemblies are among the most harmful. Results: In this paper, we propose a new method named “PECC” to identify and correct misassembly errors in contigs based on the paired-end read distribution. PECC extracts sequence regions with lower paired-end read support and verifies them based on the distribution of paired-end support. To validate its effectiveness, we applied PECC to the contigs produced by five popular assemblers on four real datasets, and we also carried out experiments to analyze the influence of PECC on scaffolding. The results show that PECC can reduce misassembly errors and improve scaffolding results, which demonstrates the promising applications of PECC in de novo assembly.
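The abstract describes the core idea only at a high level: scan a contig for stretches with unusually weak paired-end support, then check whether they look like misjoins. The sketch below is a minimal, hypothetical illustration of that idea, not PECC's published algorithm. It assumes each read pair has already been mapped to the contig and reduced to a half-open fragment span (start, end). It counts how many fragments span each base, flags runs whose support falls below a fixed fraction of the median, and breaks the contig inside each flagged run. The 0.2 ratio, the `edge` margin (used to mask contig ends, where fragment coverage naturally drops), the midpoint cut, and all function names are assumptions made for illustration.

```python
from statistics import median


def fragment_coverage(length, pairs):
    """Count how many read-pair fragments span each contig position.

    Hypothetical input: `pairs` is a list of half-open (start, end) fragment
    spans on the contig. A difference array keeps this O(length + pairs).
    """
    diff = [0] * (length + 1)
    for start, end in pairs:
        s, e = max(0, start), min(length, end)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    cov, running = [], 0
    for i in range(length):
        running += diff[i]
        cov.append(running)
    return cov


def low_support_regions(cov, ratio=0.2, edge=0):
    """Return half-open runs where coverage < ratio * median coverage.

    `edge` bases at each contig end are skipped (an assumed heuristic),
    because fragments cannot extend past the contig boundary.
    """
    threshold = ratio * median(cov)
    regions, start = [], None
    for i in range(edge, len(cov) - edge):
        if cov[i] < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(cov) - edge))
    return regions


def split_contig(seq, regions):
    """Break the contig at the midpoint of each flagged region (assumed rule)."""
    pieces, prev = [], 0
    for s, e in regions:
        cut = (s + e) // 2
        pieces.append(seq[prev:cut])
        prev = cut
    pieces.append(seq[prev:])
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Two blocks of 20 bp fragments that never bridge positions 50-54,
    # mimicking a junction with no paired-end support.
    pairs = [(s, s + 20) for s in range(0, 31)] + [(s, s + 20) for s in range(55, 81)]
    cov = fragment_coverage(100, pairs)
    regions = low_support_regions(cov, ratio=0.2, edge=20)
    print(regions)  # [(48, 57)]
    print([len(p) for p in split_contig("ACGT" * 25, regions)])  # [52, 48]
```

In this toy example the median fragment coverage is 12, so positions with fewer than 2.4 spanning fragments are flagged; the only such run inside the masked window is (48, 57), and the contig is cut at position 52. A real tool would also need to model the insert-size distribution and check discordant pairs before deciding that a low-support region is a misassembly rather than a coverage dip.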
- Subjects :
- 0301 basic medicine
Contig
Computer science
0206 medical engineering
Organic Chemistry
Sequence assembly
02 engineering and technology
Machine learning
Biochemistry
03 medical and health sciences
Computational Mathematics
030104 developmental biology
Structural Biology
Computation complexity
Artificial intelligence
020602 bioinformatics
Details
- ISSN :
- 14769271
- Volume :
- 69
- Database :
- OpenAIRE
- Journal :
- Computational Biology and Chemistry
- Accession number :
- edsair.doi.dedup.....a5a114d4918dc627fad0944a3e4127e7
- Full Text :
- https://doi.org/10.1016/j.compbiolchem.2017.03.012