1. Identification of Enriched Regions in ChIP-Seq Data via a Linear-Time Multi-Level Thresholding Algorithm
- Author
- Akram Vasighizaker, Musab Naik, and Luis Rueda
- Subjects
- Chromatin Immunoprecipitation, Binding Sites, Dynamic range, Computer science, business.industry, Applied Mathematics, 0206 medical engineering, Pattern recognition, DNA, Sequence Analysis, DNA, 02 engineering and technology, ENCODE, Chip, Thresholding, Identification (information), Genetics, Chromatin Immunoprecipitation Sequencing, Artificial intelligence, Noise (video), business, Heuristics, Time complexity, Algorithms, 020602 bioinformatics, Biotechnology
- Abstract
- Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has emerged as a superior alternative to microarray technology, as it provides higher resolution, less noise, greater coverage, and a wider dynamic range. While ChIP-Seq enables probing of DNA-protein interactions over the entire genome, it requires sophisticated tools to recognize hidden patterns and extract meaningful data. Over the years, various attempts have resulted in several algorithms that use different heuristics to accurately determine individual peaks corresponding to unique DNA-protein interactions. However, finding all the significant peaks with high accuracy in a reasonable time is still a challenge. In this work, we propose a multi-level thresholding algorithm, which we call LinMLTBS, to identify enriched regions in ChIP-Seq data. Although various suboptimal heuristics have been proposed for multi-level thresholding, we emphasize the use of an algorithm capable of obtaining an optimal solution while maintaining linear-time complexity. Testing various algorithms on various ENCODE project datasets shows that our approach attains higher accuracy than previously proposed peak finders while retaining a reasonable processing speed.
- Published
- 2022