Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data
- Source :
- Nature Communications, Vol 11, Iss 1, Pp 1-11 (2020)
- Publication Year :
- 2020
- Publisher :
- Nature Portfolio, 2020.
Abstract
- Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments using Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirmed that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. Moreover, we present a deep neural network to predict sub-compartments using epigenome, replication timing, and sequence data. Our neural network yields more accurate sub-compartment predictions when SCI-determined sub-compartments are used as labels for training.
Accurate identification of sub-compartments from chromatin interaction data remains a challenge. Here, the authors introduce an algorithm combining graph embedding and unsupervised learning to predict sub-compartments using Hi-C data.
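The general idea in the abstract (treat genomic bins as graph nodes, embed them, then cluster the embeddings) can be illustrated with a minimal sketch. The abstract does not say which embedding or clustering method SCI uses, so the code below substitutes a spectral embedding and a plain k-means as stand-ins; the function names, the toy contact matrix, and all parameter values are assumptions made for illustration and are not taken from the SCI software.

```python
import numpy as np


def spectral_embedding(contacts, dim):
    """Embed each bin using the top eigenvectors of the degree-normalized contact matrix."""
    degree = contacts.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = contacts * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last columns are the top ones.
    _, eigenvectors = np.linalg.eigh(normalized)
    return eigenvectors[:, -dim:]


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


# Hypothetical toy "Hi-C" matrix: 20 bins in two groups that contact each other
# more often within a group (about 10) than between groups (about 1).
rng = np.random.default_rng(0)
n_bins = 20
group = np.repeat([0, 1], n_bins // 2)
contacts = np.where(group[:, None] == group[None, :], 10.0, 1.0)
noise = rng.random((n_bins, n_bins))
contacts = contacts + (noise + noise.T) / 2.0  # keep the matrix symmetric

# Embed the bins, then cluster the embeddings into two predicted groups.
labels = kmeans(spectral_embedding(contacts, dim=2), k=2)
print(labels)
```

On this toy input the two planted groups of bins end up in different clusters. Real Hi-C data would need normalization and a choice of resolution and cluster count, none of which this sketch addresses.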
- Subjects :
- 0301 basic medicine
Data Analysis
Graph embedding
Computer science
Science
General Physics and Astronomy
Gene Expression
Computational biology
General Biochemistry, Genetics and Molecular Biology
Article
03 medical and health sciences
Epigenome
0302 clinical medicine
Machine learning
Computer Graphics
Cluster Analysis
Humans
lcsh:Science
Cluster analysis
Hidden Markov model
Replication timing
Multidisciplinary
Artificial neural network
Reproducibility of Results
General Chemistry
Genomics
Chromatin
Markov Chains
Data processing
030104 developmental biology
Unsupervised learning
lcsh:Q
Neural Networks, Computer
K562 Cells
030217 neurology & neurosurgery
Software
Algorithms
Unsupervised Machine Learning
Details
- Language :
- English
- ISSN :
- 2041-1723
- Volume :
- 11
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Nature Communications
- Accession number :
- edsair.doi.dedup.....8defd19d11fa39ae1b85333f61827e8d