1. High-performance method for identification of super enhancers from ChIP-Seq data with configurable cloud virtual machines
- Author
- Olga V. Bogatova, A. V. Orlov, and Natalia N. Orlova
- Subjects
Epigenomics, Computer science, Clinical Biochemistry, Cloud computing, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Chromatin immunoprecipitation followed by sequencing, 03 medical and health sciences, Next generation sequencing, Data structure alignment, lcsh:Science, 030304 developmental biology, 0105 earth and related environmental sciences, 0303 health sciences, Data processing, business.industry, Method Article, H3K27ac, Stitched enhancers, Medical Laboratory Technology, Identification (information), Parallel processing (DSP implementation), Virtual machine, Feature (computer vision), lcsh:Q, Data mining, business, Peak calling, computer
- Abstract
A universal method is proposed for rapidly identifying super-enhancers, which are large domains of multiple closely spaced enhancers. The method uses configurable cloud virtual machines (cVMs) and the rank-ordering of super-enhancers (ROSE) algorithm. Super-enhancers are identified by a cVM-based analysis of the ChIP-seq binding patterns of an active enhancer-associated mark. The method is described step by step: configuration of the cVM; ChIP-seq data alignment; peak calling; the ROSE algorithm; and interpretation of the results on a client machine. The method was validated by searching for super-enhancers using the H3K27ac mark in sample datasets from a cell line (human MCF-7), mouse tissue (heart), and human tissue (adrenal gland). The total analysis time for raw ChIP-seq data ranges from 15 to 48 min, depending on the number of initial short reads. Depending on the data processing step and the availability of multi-threading, a cVM can be scaled up to a multi-CPU configuration with a large amount of RAM. An important feature of the method is that it can run on a low-performance client machine with virtually any OS. The proposed method allows simultaneous and independent processing of different sample datasets on multiple clones of a single cVM.
• Cloud VMs were used for rapid processing of ChIP-seq data to identify super-enhancers.
• The method can use a low-performance computer with virtually any OS.
• It can be scaled up for parallel processing of individual sample datasets on their own VMs for rapid high-throughput processing.
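The stitching and rank-ordering idea behind the ROSE step can be illustrated with a minimal Python sketch. It is not the ROSE implementation or the authors' pipeline: the function names, the `(start, end, signal)` tuple layout, the 12.5 kb stitching gap, and the tangent-style cutoff rule are all assumptions made for illustration, and the input is taken to be peaks from a single chromosome with an already computed signal value.

```python
from typing import List, Tuple

# Hypothetical record layout: (start, end, signal) for peaks on one chromosome.
Region = Tuple[int, int, float]


def stitch_peaks(peaks: List[Region], max_gap: int = 12_500) -> List[Region]:
    """Merge peaks separated by at most max_gap bases, summing their signal.

    The 12.5 kb default is an assumed stitching distance, not a value
    stated in the abstract.
    """
    stitched: List[Region] = []
    for start, end, signal in sorted(peaks):
        if stitched and start - stitched[-1][1] <= max_gap:
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def call_super_enhancers(regions: List[Region]) -> List[Region]:
    """Return regions whose signal lies above the knee of the ranked curve.

    Regions are sorted by ascending signal, both axes are scaled to [0, 1],
    and the knee is taken as the point where a line of slope 1 touches the
    curve from below, i.e. the point minimising (scaled signal - scaled rank).
    """
    if len(regions) < 2:
        return []
    ranked = sorted(regions, key=lambda r: r[2])
    lowest, highest = ranked[0][2], ranked[-1][2]
    if highest == lowest:
        return []  # flat curve: no region stands out
    n = len(ranked)

    def offset(i: int) -> float:
        scaled_rank = i / (n - 1)
        scaled_signal = (ranked[i][2] - lowest) / (highest - lowest)
        return scaled_signal - scaled_rank

    knee = min(range(n), key=offset)
    cutoff = ranked[knee][2]
    return [r for r in ranked if r[2] > cutoff]
```

A typical use would be `call_super_enhancers(stitch_peaks(peaks))` applied per chromosome to the peak-calling output; in a real analysis the signal would normally be background-corrected read density rather than an arbitrary score.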
- Published
- 2020