Systematic Prediction of Regulatory Motifs from Human ChIP-Sequencing Data Based on a Deep Learning Framework
- Publication Year :
- 2018
- Publisher :
- Cold Spring Harbor Laboratory, 2018.
Abstract
- Identification of transcription factor binding sites (TFBSs) and cis-regulatory motifs (motifs for short) from genomics datasets provides a powerful view of the rules governing the interactions between TFs and DNA. Existing motif prediction methods, however, are limited by high false positive rates in TFBS identification, contributions from non-sequence-specific binding, and complex and indirect binding mechanisms. High-throughput next-generation sequencing data provides unprecedented opportunities to overcome these difficulties, as it offers multiple whole-genome-scale measurements of TF binding information. Uncovering this information brings new computational and modeling challenges in high-dimensional data mining and heterogeneous data integration. To improve the accuracy of TFBS identification and novel motif prediction in the human genome, we developed DESSO, a computational technique based on deep learning (DL) and high-performance computing. DESSO uses a deep neural network and the binomial distribution to optimize motif prediction. Our results showed that DESSO outperformed existing tools in predicting distinct motifs from the 690 in vivo ENCODE ChIP-Sequencing (ChIP-Seq) datasets for 161 human TFs in 91 cell lines. We also found that protein-protein interactions (PPIs) are prevalent among human TFs, and a total of 61 potential tethering binding events were identified among the 100 TFs in the K562 cell line. To further expand DESSO’s deep-learning capabilities, we included DNA shape features and found that (i) shape information has strong predictive power for TF-DNA binding specificity; and (ii) it aided in the identification of the shape motifs recognized by human TFs, which in turn contributed to the interpretation of TF-DNA binding in the absence of sequence recognition. DESSO and the analyses it enabled will continue to improve our understanding of how gene expression is controlled by TFs and of the complexities of DNA binding.
The source code and the predicted motifs and TFBSs from the 690 ENCODE TF ChIP-Seq datasets are freely available at the DESSO web server: http://bmbl.sdstate.edu/DESSO.
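The abstract states that DESSO combines a deep neural network with the binomial distribution but does not describe the statistical model. As a rough illustration only, the sketch below shows one generic way a binomial distribution can score motif enrichment: it asks how surprising it is to see a candidate motif in `peak_hits` of `n_peaks` ChIP-Seq peak sequences, given a hit rate estimated from background sequences. The function names, the pseudocount, and the choice of an upper-tail test are all assumptions made for this example and are not taken from DESSO.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k <= 0:
        return 1.0  # every outcome has at least 0 successes
    if k > n:
        return 0.0  # more successes than trials is impossible
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment_pvalue(
    peak_hits: int, n_peaks: int, background_hits: int, n_background: int
) -> float:
    """Hypothetical enrichment score for one candidate motif.

    Assumption: the background hit rate is estimated from separate
    background sequences, with a +1/+2 pseudocount so it is never 0 or 1.
    """
    p_background = (background_hits + 1) / (n_background + 2)
    return binomial_upper_tail(peak_hits, n_peaks, p_background)


if __name__ == "__main__":
    # Motif seen in 80 of 100 peaks versus 10 of 100 background sequences.
    print(motif_enrichment_pvalue(80, 100, 10, 100))
```

A smaller value indicates that the motif occurs in peak sequences more often than the background rate would explain. In practice a candidate set of motifs would also need multiple-testing correction, which this sketch omits.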
- Subjects :
- 0303 health sciences
Computer science
business.industry
Deep learning
Genomics
Computational biology
ENCODE
ChIP-sequencing
DNA binding site
03 medical and health sciences
chemistry.chemical_compound
0302 clinical medicine
chemistry
Gene expression
natural sciences
Human genome
Artificial intelligence
business
030217 neurology & neurosurgery
DNA
030304 developmental biology
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....86a84b934ea595f6e9928d018d6cdf11
- Full Text :
- https://doi.org/10.1101/417378