1. CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data
- Author
Shubham Khetan, Asa Thibodeau, Alper Eroglu, Ryan Tewhey, Duygu Ucar, and Michael L. Stitzel
- Subjects
Gene Expression; ATAC-seq; Regulatory Sequences, Nucleic Acid; Monocytes; Animal Cells; Medicine and Health Sciences; Biology (General); Materials; Cells, Cultured; Ecology; Chromosome Biology; Eukaryota; Genomics; Plants; Cell sorting; Legumes; Chromatin; Computational Theory and Mathematics; Modeling and Simulation; Physical Sciences; Chromatin Immunoprecipitation Sequencing; Epigenetics; Single-Cell Analysis; Cellular Types; Research Article; Cell type; QH301-705.5; Immune Cells; Materials Science; Immunology; Computational biology; Biology; DNA sequencing; Cellular and Molecular Neuroscience; Islets of Langerhans; Deep Learning; Genetics; Humans; Gene Regulation; Enhancer; Molecular Biology; Transcription factor; Ecology, Evolution, Behavior and Systematics; Organisms; Peas; Computational Biology; Biology and Life Sciences; Promoter; DNA; Cell Biology; Insulators; Genome Analysis; Genome Annotation
- Abstract
Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
- Author summary
Non-coding DNA sequences serve different functional roles to regulate gene expression. For these sequences to be active, they must be accessible for proteins to bind and carry out specific regulatory functions. Even so, mutations or other regulatory events may modulate their activity or regulatory function, making it critical to infer their function to understand their regulatory impact. Current sequencing technologies capture accessible sequences from low cell numbers, enabling the study of clinical samples. However, determining their functional role remains a challenge. For example, enhancers and insulators serve distinct regulatory functions, yet both fall in open chromatin regions and have similar genomic annotations (i.e., distance to transcription start site). Hence, alternative data sources and features (e.g., from DNA sequence or ATAC-seq data) must be integrated to distinguish them. Towards this goal, we developed CoRE-ATAC to infer whether open chromatin regions correspond to promoters, enhancers, or insulators. We demonstrate that CoRE-ATAC can infer regulatory functions in diverse cell types, capture activity differences modulated by genetic mutations, and can be applied to single cell ATAC-seq data to study rare cell populations. These inferences will further our understanding of how genes are regulated and how these regulatory mechanisms are disrupted with disease.
- Published
- 2021