Next Generation Cluster Editing

Authors :: Alexander Schönhuth
Thomas Bellitto
Tobias Marschall
Gunnar W. Klau
Evolutionary Intelligence
Source :: PeerJ PrePrints, 3, PeerJ PrePrints, German Conference on Bioinformatics
Publication Year :: 2015
Abstract: Genomic structural variations play key roles in genetic diversity and disease. Despite recent advances in structural variation discovery, many variants are yet to be discovered. Midsize insertions and deletions pose particularly involved algorithmic challenges. The recent CLEVER algorithm addressed these challenges with a statistical model on cliques in a graph whose nodes are read alignments and whose edges arise from a statistical test on length and overlap of read alignments. However, the resulting read alignment clusters tend to be too small and are heavily overlapping, which leads to losses in recall performance rates. Here we present a model based on weighted cluster editing, which alleviates these issues: clusters are provably non-overlapping and tend to be larger. In order to render the inherent optimization problem tractable on all read alignments of a genome, we present a novel, principled heuristic, which runs in time linear in the length of the genome. The heuristic is based on an exact polynomial-time algorithm for weighted cluster editing in one-dimensional point graphs. We demonstrate that the new model improves recall rates achieved by CLEVER.

Subjects :: Structural variation
FOS: Computer and information sciences
Theoretical computer science
Optimization problem
Computer science
Computer Science - Data Structures and Algorithms
Cluster (physics)
Graph theory
Statistical model
Data Structures and Algorithms (cs.DS)
Genome
Graph
Statistical hypothesis testing

Language :: English
Database :: OpenAIRE
Journal :: PeerJ PrePrints, 3, PeerJ PrePrints, German Conference on Bioinformatics
Accession number :: edsair.doi.dedup.....3164f228c4adc5536686b247e2fbd33f

Tools