1,326 results for "STARR-seq"
Search Results
2. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo.
- Author
-
Chang TY and Waxman DJ
- Abstract
Background: STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. Results: Here, we describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). Conclusions: HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences. Competing interests: The authors declare that they have no competing interests.
- Published
- 2024
3. Identification of Highly Repetitive Enhancers with Long-range Regulation Potential in Barley via STARR-seq.
- Author
-
Zhou W, Shi H, Wang Z, Huang Y, Ni L, Chen X, Liu Y, Li H, Li C, and Liu Y
- Subjects
- Histones metabolism, Histones genetics, DNA Transposable Elements genetics, Genome, Plant genetics, Repetitive Sequences, Nucleic Acid genetics, Sequence Analysis, DNA methods, Hordeum genetics, Hordeum metabolism, Enhancer Elements, Genetic genetics, Gene Expression Regulation, Plant genetics
- Abstract
Enhancers are DNA sequences that can strengthen transcription initiation. However, the global identification of plant enhancers is complicated due to uncertainty in the distance and orientation of enhancers, especially in species with large genomes. In this study, we performed self-transcribing active regulatory region sequencing (STARR-seq) for the first time to identify enhancers across the barley genome. A total of 7323 enhancers were successfully identified, and among 45 randomly selected enhancers, over 75% were effective as validated by a dual-luciferase reporter assay system in the lower epidermis of tobacco leaves. Interestingly, up to 53.5% of the barley enhancers were repetitive sequences, especially transposable elements (TEs), thus reinforcing the vital role of repetitive enhancers in gene expression. Both the common active mark H3K4me3 and repressive mark H3K27me3 were abundant among the barley STARR-seq enhancers. In addition, the functional range of barley STARR-seq enhancers seemed much broader than that of rice or maize and extended to ±100 kb of the gene body, and this finding was consistent with the high expression levels of genes in the genome. This study specifically depicts the unique features of barley enhancers and provides available barley enhancers for further utilization. (© The Author(s) 2024. Published by Oxford University Press and Science Press on behalf of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.)
- Published
- 2024
4. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights.
- Author
-
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, and Georgakopoulos-Soares I
- Subjects
- Humans, Computational Biology methods, Transcription Factors metabolism, Transcription Factors genetics, Gene Expression Regulation genetics, Animals, Regulatory Elements, Transcriptional genetics, CRISPR-Cas Systems genetics, Gene Regulatory Networks, Machine Learning
- Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease. (© 2024 The Authors. BioEssays published by Wiley Periodicals LLC.)
- Published
- 2024
5. An unbiased AAV-STARR-seq screen revealing the enhancer activity map of genomic regions in the mouse brain in vivo
- Author
-
Ya-Chien Chan, Eike Kienle, Martin Oti, Antonella Di Liddo, Maria Mendez-Lago, Dominik F. Aschauer, Manuel Peter, Michaela Pagani, Cosmas Arnold, Andreas Vonderheit, Christian Schön, Sebastian Kreuz, Alexander Stark, and Simon Rumpel
- Subjects
Medicine, Science
- Abstract
Enhancers are important cis-regulatory elements controlling cell-type specific expression patterns of genes. Furthermore, combinations of enhancers and minimal promoters are utilized to construct small, artificial promoters for gene delivery vectors. Large-scale functional screening methodology to construct genomic maps of enhancer activities has been successfully established in cultured cell lines; however, it has not yet been applied to terminally differentiated cells and tissues in a living animal. Here, we transposed the Self-Transcribing Active Regulatory Region Sequencing (STARR-seq) technique to the mouse brain using adeno-associated viruses (AAV) for the delivery of a highly complex screening library tiling entire genomic regions and covering in total 3 Mb of the mouse genome. We identified 483 sequences with enhancer activity, including sequences that were not predicted by DNA accessibility or histone marks. Characterizing the expression patterns of fluorescent reporters controlled by nine candidate sequences, we observed differential expression patterns also in sparse cell types. Together, our study provides an entry point for the unbiased study of enhancer activities in organisms during health and disease.
- Published
- 2023
6. Inference of Transcriptional Regulation From STARR-seq Data
- Subjects
Nucleotide sequencing -- Genetic aspects, Genetic research -- Genetic aspects, DNA sequencing -- Genetic aspects, RNA -- Genetic aspects, Genetic transcription -- Genetic aspects, Physical fitness, Health
- Abstract
2024 MAR 30 (NewsRx) -- By a News Reporter-Staff News Editor at Obesity, Fitness & Wellness Week -- According to news reporting based on a preprint abstract, our journalists obtained [...]
- Published
- 2024
7. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo
- Author
-
Chang, Ting-Ya and Waxman, David J
- Published
- 2024
8. Validation of Enhancer Regions in Primary Human Neural Progenitor Cells using Capture STARR-seq.
- Author
-
Gaynor-Gillett SC, Cheng L, Shi M, Liu J, Wang G, Spector M, Flaherty M, Wall M, Hwang A, Gu M, Chen Z, Chen Y, Consortium P, Moran JR, Zhang J, Lee D, Gerstein M, Geschwind D, and White KP
- Abstract
Genome-wide association studies (GWAS) and expression analyses implicate noncoding regulatory regions as harboring risk factors for psychiatric disease, but functional characterization of these regions remains limited. We performed capture STARR-sequencing of over 78,000 candidate regions to identify active enhancers in primary human neural progenitor cells (phNPCs). We selected candidate regions by integrating data from NPCs, prefrontal cortex, developmental timepoints, and GWAS. Over 8,000 regions demonstrated enhancer activity in the phNPCs, and we linked these regions to over 2,200 predicted target genes. These genes are involved in neuronal and psychiatric disease-associated pathways, including dopaminergic synapse, axon guidance, and schizophrenia. We functionally validated a subset of these enhancers using mutation STARR-sequencing and CRISPR deletions, demonstrating the effects of genetic variation on enhancer activity and enhancer deletion on gene expression. Overall, we identified thousands of highly active enhancers and functionally validated a subset of these enhancers, improving our understanding of regulatory networks underlying brain function and disease. Competing interests: Kevin P. White is a shareholder of Tempus Labs, Inc. and Provaxus, Inc. All other authors declare that they have no competing interests.
- Published
- 2024
9. New Findings from Sichuan Agricultural University Update Understanding of Genomics Proteomics and Bioinformatics (Identification of Highly Repetitive Enhancers With Long-range Regulation Potential In Barley Via Starr-seq)
- Subjects
Computational biology -- Research -- Laws, regulations and rules; Genomics -- Laws, regulations and rules -- Research; Genetic research -- Laws, regulations and rules; Proteomics -- Research -- Laws, regulations and rules; Government regulation; Biotechnology industry; Pharmaceuticals and cosmetics industries
- Abstract
2024 SEP 18 (NewsRx) -- By a News Reporter-Staff News Editor at Biotech Week -- Investigators publish new report on Biotechnology - Genomics Proteomics and Bioinformatics. According to news reporting [...]
- Published
- 2024
10. Inference of Transcriptional Regulation From STARR-seq Data
- Author
-
Safaeesirat, Amin, Taeb, Hoda, Tekoglu, Emirhan, Morova, Tunc, Lack, Nathan A., and Emberly, Eldon
- Published
- 2024
11. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo
- Subjects
Biological response modifiers, Liver, Genetic vectors, Biological sciences, Health
- Abstract
2024 JUN 25 (NewsRx) -- By a News Reporter-Staff News Editor at Life Science Weekly -- According to news reporting based on a preprint abstract, our journalists obtained the following [...]
- Published
- 2024
12. Identification of highly repetitive barley enhancers with long-range regulation potential via STARR-seq
- Author
-
Zhou, Wanlin, Shi, Haoran, Wang, Zhiqiang, Huang, Yuxin, Ni, Lin, Chen, Xudong, Liu, Yan, Li, Haojie, Li, Caixia, and Liu, Yaxi
- Published
- 2024
13. Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq
- Author
-
Marand, Alexandre
- Published
- 2024
14. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions
- Author
-
Donghoon Lee, Manman Shi, Jennifer Moran, Martha Wall, Jing Zhang, Jason Liu, Dominic Fitzgerald, Yasuhiro Kyono, Lijia Ma, Kevin P. White, and Mark Gerstein
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. An issue with the increased complexity and depth is that the coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content. Furthermore, STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed a negative binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to comprehensively and unbiasedly call enhancers in them.
- Published
- 2020
15. Underlying causes for prevalent false positives and false negatives in STARR-seq data.
- Author
-
Ni P, Wu S, and Su Z
- Abstract
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, some others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, the artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent and the underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results. (© The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)
- Published
- 2023
16. STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells
- Author
-
Tianran Peng, Yanan Zhai, Yaser Atlasi, Menno ter Huurne, Hendrik Marks, Hendrik G. Stunnenberg, and Wout Megchelenbrink
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Enhancers are distal regulators of gene expression that shape cell identity and control cell fate transitions. In mouse embryonic stem cells (mESCs), the pluripotency network is maintained by the function of a complex network of enhancers that are drastically altered upon differentiation. Genome-wide chromatin accessibility and histone modification assays are commonly used as a proxy for identifying putative enhancers and for describing their activity levels and dynamics. Results: Here, we applied STARR-seq, a genome-wide plasmid-based assay, as a read-out for the enhancer landscape in “ground-state” (2i+LIF; 2iL) and “metastable” (serum+LIF; SL) mESCs. This analysis reveals that active STARR-seq loci show modest overlap with enhancer locations derived from peak calling of ChIP-seq libraries for common enhancer marks. We unveil ZIC3-bound loci with significant STARR-seq activity in SL-ESCs. Knock-out of Zic3 removes STARR-seq activity only in SL-ESCs and increases their propensity to differentiate towards the endodermal fate. STARR-seq also reveals enhancers that are not accessible, masked by a repressive chromatin signature. We describe a class of dormant, p53-bound enhancers that gain H3K27ac under specific conditions, such as after treatment with nocodazole, or transiently during reprogramming from fibroblasts to pluripotency. Conclusions: Loci identified as active by STARR-seq often overlap with those identified by chromatin accessibility and active epigenetic marking, yet a significant fraction is epigenetically repressed or displays condition-specific enhancer activity.
- Published
- 2020
17. STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells
- Author
-
Peng, Tianran, Zhai, Yanan, Atlasi, Yaser, ter Huurne, Menno, Marks, Hendrik, Stunnenberg, Hendrik G., and Megchelenbrink, Wout
- Published
- 2020
18. Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq
- Author
-
Arnold, Cosmas D., Gerlach, Daniel, Stelzer, Christoph, Boryń, Łukasz M., Rath, Martina, and Stark, Alexander
- Published
- 2013
19. University Medical Center Researchers Publish New Studies and Findings in the Area of Gene Therapy (An unbiased AAV-STARR-seq screen revealing the enhancer activity map of genomic regions in the mouse brain in vivo)
- Subjects
Boehringer Ingelheim GmbH -- Reports, Medical centers -- Reports -- Research, Genes -- Reports -- Research, Gene therapy -- Research -- Reports, Physical fitness -- Research -- Reports, Pharmaceutical industry -- Reports -- Research, Health
- Abstract
2023 MAY 20 (NewsRx) -- By a News Reporter-Staff News Editor at Obesity, Fitness & Wellness Week -- Investigators publish new report on gene therapy. According to news originating from [...]
- Published
- 2023
20. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions
- Author
-
Lee, Donghoon, Shi, Manman, Moran, Jennifer, Wall, Martha, Zhang, Jing, Liu, Jason, Fitzgerald, Dominic, Kyono, Yasuhiro, Ma, Lijia, White, Kevin P., and Gerstein, Mark
- Published
- 2020
21. Study Findings on Genomics Proteomics and Bioinformatics Published by a Researcher at Sichuan Agricultural University (Identification of highly repetitive barley enhancers with long-range regulation potential via STARR-seq)
- Subjects
Biochemistry -- Laws, regulations and rules -- Research -- Reports; Computational biology -- Research -- Reports -- Laws, regulations and rules; Genomics -- Laws, regulations and rules -- Reports -- Research; Genetic research -- Laws, regulations and rules -- Reports; Proteomics -- Research -- Laws, regulations and rules -- Reports; Government regulation; Biotechnology industry; Pharmaceuticals and cosmetics industries
- Abstract
2024 MAR 13 (NewsRx) -- By a News Reporter-Staff News Editor at Biotech Week -- New study results on genomics proteomics and bioinformatics have been published. According to news originating [...]
- Published
- 2024
22. Global Quantitative Mapping of Enhancers in Rice by STARR-seq
- Author
-
Jialei Sun, Na He, Longjian Niu, Yingzhang Huang, Wei Shen, Yuedong Zhang, Li Li, and Chunhui Hou
- Subjects
Biology (General), QH301-705.5, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Enhancers activate transcription in a distance-, orientation-, and position-independent manner, which makes them difficult to identify. Self-transcribing active regulatory region sequencing (STARR-seq) measures the enhancer activity of millions of DNA fragments in parallel. Here we used STARR-seq to generate a quantitative global map of rice enhancers. Most enhancers were mapped within genes, especially at the 5′ untranslated regions (5′UTR) and in coding sequences. Enhancers were also frequently mapped proximal to silent and lowly-expressed genes in transposable element (TE)-rich regions. Analysis of the epigenetic features of enhancers at their endogenous loci revealed that most enhancers do not co-localize with DNase I hypersensitive sites (DHSs) and lack the enhancer mark of histone modification H3K4me1. Clustering analysis of enhancers according to their epigenetic marks revealed that about 40% of identified enhancers carried one or more epigenetic marks. Repressive H3K27me3 was frequently enriched with positive marks, H3K4me3 and/or H3K27ac, which together label enhancers. Intergenic enhancers were also predicted based on the location of DHS regions relative to genes, which overlap poorly with STARR-seq enhancers. In summary, we quantitatively identified enhancers by functional analysis in the genome of rice, an important model plant. This work provides a valuable resource for further mechanistic studies in different biological contexts. Keywords: Plant, Enhancer, Functional analysis, Epigenetic modification, Gene expression
- Published
- 2019
23. Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq.
- Author
-
Marand AP
- Abstract
The blueprints to development, response to the environment, and cellular function are largely the manifestation of distinct gene expression programs controlled by the spatiotemporal activity of cis-regulatory elements. Although biochemical methods for identifying accessible chromatin - a hallmark of active cis-regulatory elements - have been developed, approaches capable of measuring and quantifying cis-regulatory activity are only beginning to be realized. Massively Parallel Reporter Assays coupled to chromatin accessibility profiling present a high-throughput solution for testing the transcription-activating capacity of millions of putatively regulatory DNA sequences in parallel. However, clear computational pipelines for analyzing these high-throughput sequencing-based reporter assays are lacking. In this protocol, I lay out and rationalize a computational framework for the processing and analysis of Assay for Transposase Accessible Chromatin profiling followed by Self-Transcribed Active Regulatory Region sequencing (ATAC-STARR-seq) data from a recent study in Zea mays. The approach described herein can be adapted to other sequencing-based reporter assays and is largely agnostic to the model organism with the appropriate input substitutions. Competing interests: A.P.M. declares no competing interests.
- Published
- 2023
24. Challenges and considerations for reproducibility of STARR-seq assays
- Author
-
Maitreya Das, Ayaan Hossain, Deepro Banerjee, Craig Alan Praul, and Santhosh Girirajan
- Subjects
Genetics, Genetics (clinical)
- Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results.
- Published
- 2023
25. Genome‐wide prediction of activating regulatory elements in rice by combining STARR‐seq with FACS.
- Author
-
Tian, Wei, Huang, Xi, and Ouyang, Xinhao
- Subjects
- *PLANT performance, *FORECASTING, *PROTOPLASTS
- Abstract
Summary: Self‐transcribing active regulatory region sequencing (STARR‐seq) is widely used to identify enhancers at the whole‐genome level. However, whether STARR‐seq works as efficiently in plants as in animal systems remains unclear. Here, we determined that the traditional STARR‐seq method can be directly applied to rice (Oryza sativa) protoplasts to identify enhancers, though with limited efficiency. Intriguingly, we identified not only enhancers but also constitutive promoters with this technique. To increase the performance of STARR‐seq in plants, we optimized two procedures. We coupled fluorescence‐activated cell sorting (FACS) with STARR‐seq to alleviate the effect of background noise, and we minimized PCR cycles and retained duplicates during prediction, which significantly increased the positive rate for activating regulatory elements (AREs). Using this method, we determined that AREs are associated with AT‐rich regions and are enriched for a motif that the AP2/ERF family can recognize. Based on GC content preferences, AREs are clustered into two groups corresponding to promoters and enhancers. Either AT‐ or GC‐rich regions within AREs could boost transcription. Additionally, disruption of AREs resulted in abnormal expression of both proximal and distal genes, which suggests that STARR‐seq‐revealed elements function as enhancers in vivo. In summary, our work provides a promising method to identify AREs in plants.
- Published
- 2022
26. Correcting signal biases and detecting regulatory elements in STARR-seq data
- Author
-
Thomas N. Cowart, Alejandro Barrera, Andrew S. Allen, William H. Majoros, Alejandro Ochoa, Timothy E. Reddy, Jungkyun Seo, Graham Johnson, and Young-Sook Kim
- Subjects
0303 health sciences; Genome, Human; SIGNAL (programming language); High-Throughput Nucleotide Sequencing; Method; Statistical model; Computational biology; Variance (accounting); Biology; Regulatory region; 03 medical and health sciences; Enhancer Elements, Genetic; 0302 clinical medicine; STARR-seq; Bias; Genetics; Humans; Human genome; Precision and recall; 030217 neurology & neurosurgery; Genetics (clinical); 030304 developmental biology
- Abstract
High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.
- Published
- 2021
27. Challenges and considerations for reproducibility of STARR-seq assays.
- Author
-
Das M, Hossain A, Banerjee D, Praul CA, and Girirajan S
- Subjects
- Reproducibility of Results, Computational Biology methods, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Genome, Regulatory Sequences, Nucleic Acid
- Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results. (© 2023 Das et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2023
28. STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells
- Author
-
Peng, T., Zhai, Y., Atlasi, Yaser, Huurne, M.C. ter, Marks, Hendrik, Stunnenberg, Hendrik G., and Megchelenbrink, Wout
- Abstract
Contains fulltext: 224964.pdf (publisher's version) (Open Access)
- Published
- 2020
29. STARR-seq for high-throughput identification of plant enhancers.
- Author
-
Zhang L, Yung WS, and Huang M
- Subjects
- High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, Plants, Genome-Wide Association Study, Genomics, Regulatory Sequences, Nucleic Acid
- Abstract
Competing interests: No interests are declared.
- Published
- 2022
30. Underlying causes for prevalent false positives and false negatives in STARR-seq data
- Author
-
Ni, Pengyu, Wu, Siwen, and Su, Zhengchang
- Published
- 2023
31. Sequence model evaluation framework for STARR-seq peak calling.
- Author
-
Christopher R. Beal, John G. Peters, and Ronald J. Nowling
- Published
- 2021
32. Identification of Plant Enhancers and Their Constituent Elements by STARR-seq in Tobacco Leaves
- Author
-
Jackson Tonnies, Josh T. Cuperus, Michael W. Dorrity, Stanley Fields, Tobias Jores, and Christine Queitsch
- Subjects
0106 biological sciences; 0301 basic medicine; Light; Agrobacterium; Green Fluorescent Proteins; Population; Nicotiana benthamiana; Plant Science; Computational biology; Breakthrough Report; Proof of Concept Study; 01 natural sciences; Genome; In Brief; Plant Viruses; chemistry.chemical_compound; 03 medical and health sciences; Transformation, Genetic; STARR-seq; Gene Expression Regulation, Plant; Genes, Reporter; Plant virus; Tobacco; Promoter Regions, Genetic; Saturated mutagenesis; education; Enhancer; Gene; Triticum; Plant Proteins; 030304 developmental biology; 2. Zero hunger; 0303 health sciences; education.field_of_study; Reporter gene; biology; fungi; food and beverages; Cell Biology; Plants, Genetically Modified; biology.organism_classification; Plant Leaves; Transformation (genetics); Enhancer Elements, Genetic; 030104 developmental biology; chemistry; DNA; 010606 plant biology & botany
- Abstract
Genetic engineering of cis-regulatory elements in crop plants is a promising strategy to ensure food security. However, such engineering is currently hindered by our limited knowledge of plant cis-regulatory elements. Here, we adapted STARR-seq — a technology for the high-throughput identification of enhancers — for its use in transiently transformed tobacco leaves. We demonstrate that the optimal placement in the reporter construct of enhancer sequences from a plant virus, pea and wheat was just upstream of a minimal promoter, and that none of these four known enhancers was active in the 3′-UTR of the reporter gene. The optimized assay sensitively identified small DNA regions containing each of the four enhancers, including two whose activity was stimulated by light. Furthermore, we coupled the assay to saturation mutagenesis to pinpoint functional regions within an enhancer, which we recombined to create synthetic enhancers. Our results describe an approach to define enhancer properties that can be performed in potentially any plant species or tissue transformable by Agrobacterium and that can use regulatory DNA derived from any plant genome. One-sentence summary: We developed a high-throughput assay in transiently transformed tobacco leaves that can identify enhancers, characterize their functional elements and detect condition-specific enhancer activity.
- Published
- 2020
33. ATAC-STARR-seq reveals transcription factor–bound activators and silencers within chromatin-accessible regions of the human genome
- Author
-
Emily Hodges and Tyler Hansen
- Subjects
Genetics ,Genetics (clinical) - Abstract
Massively parallel reporter assays (MPRAs) test the capacity of putative gene regulatory elements to drive transcription on a genome-wide scale. Most gene regulatory activity occurs within accessible chromatin, and recently described methods have combined assays that capture these regions—such as assay for transposase-accessible chromatin using sequencing (ATAC-seq)—with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of accessible DNA (ATAC-STARR-seq). Here, we report an integrated approach that quantifies activating and silencing regulatory activity, chromatin accessibility, and transcription factor (TF) occupancy with one assay using ATAC-STARR-seq. Our strategy, including important updates to the ATAC-STARR-seq assay and workflow, enabled high-resolution testing of ∼50 million unique DNA fragments tiling ∼101,000 accessible chromatin regions in human lymphoblastoid cells. We discovered that 30% of all accessible regions contain an activator, a silencer, or both. Although few MPRA studies have explored silencing activity, we demonstrate that silencers occur at similar frequencies to activators, and they represent a distinct functional group enriched for unique TF motifs and repressive histone modifications. We further show that Tn5 cut-site frequencies are retained in the ATAC-STARR plasmid library compared to standard ATAC-seq, enabling TF occupancy to be ascertained from ATAC-STARR data. With this approach, we found that activators and silencers cluster by distinct TF footprint combinations, and these groups of activity represent different gene regulatory networks of immune cell function. Altogether, these data highlight the multilayered capabilities of ATAC-STARR-seq to comprehensively investigate the regulatory landscape of the human genome all from a single DNA fragment source.
- Published
- 2022
34. Filtering STARR-Seq Peaks for Enhancers with Sequence Models.
- Author
-
Ronald J. Nowling, Rafael Reple Geromel, and Benjamin Halligan
- Published
- 2020
- Full Text
- View/download PDF
35. Underlying causes for prevalent false positives and false negatives in STARR-seq data
- Author
-
Pengyu Ni, Siwen Wu, and Zhengchang Su
- Abstract
STARR-seq and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR peaks in repressive chromatin might be active in other cell/tissue types, others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. However, the prevalence of and underlying causes for these artifacts are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR peaks and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results.
- Published
- 2023
36. Diff-ATAC-STARR-Seq: A Method for Genome-Wide Functional Screening of Enhancer Activity in Vivo
- Author
-
Kazuki Nagayasu, Chihiro Andoh, Hisashi Shirakawa, and Shuji Kaneko
- Subjects
Pharmacology ,Pharmaceutical Science ,General Medicine - Published
- 2022
37. Sequence model evaluation framework for STARR-seq peak calling
- Author
-
John G. Peters, Christopher R. Beal, and Ronald J. Nowling
- Subjects
Sequence ,STARR-seq ,Software ,business.industry ,Computer science ,Experimental data ,Pattern recognition ,Artificial intelligence ,business ,Enhancer ,Base (topology) ,Data type ,Peak calling - Abstract
Enhancers are short regions of non-coding DNA that increase transcription rates of genes despite being located distantly from the genes themselves [5]. Enhancers are identified through experimental techniques such as ChIP-Seq or CUT&RUN with H3K4me1 and H3K27ac histone modifications, self-transcribing active regulatory region sequencing (STARR-Seq), and massively parallel reporter assays (MPRA). Machine learning models have been used in conjunction with experimental data to identify enhancer activity from sequences [3], predict enhancer-transcription factor interactions [4], and decode the enhancer regulatory language [2]. We describe a framework that connects peak calling errors to the prediction accuracy of sequence models. The key assumptions of our framework are that (1) enhancers have consistent sequence patterns that can be used to separate enhancers from control sequences, (2) errors in the training data impact prediction accuracies in predictable ways, and (3) prediction accuracy is a useful proxy for evaluating peak calling accuracy. In the framework, data sets are constructed from peak (positive) and randomly sampled (control) sequences. Machine learning models are trained and evaluated on the sequences in a cross-chromosome (cross-fold) setup. Lastly, the precision of the originating peaks is evaluated by calculating true and false positive rates. We applied our framework to evaluate peaks for D. melanogaster STARR-Seq data [1] called with the MACS software [6]. Although designed for ChIP-Seq data, MACS can be used to process other types of data, but users must be careful about parameter choices. We evaluated different parameter combinations with our framework and visual comparisons of called peaks. True and false positive rates ranged from a high of 88.0% to a low of 74.7% and from a low of 18.6% to a high of 49.4%, respectively. The default MACS parameters produced the highest true and lowest false positive rates, suggesting that the default parameters are also suitable for STARR-Seq data. Our results demonstrate the utility of our framework through a practical application and provide a base for future development.
- Published
- 2021
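The evaluation loop described in the abstract above (cross-chromosome folds followed by true and false positive rates) can be sketched roughly as follows. This is a minimal illustration only, not the authors' code: the record layout (`chrom`, `label`) and both function names are hypothetical, and the sequence model that would produce the predictions is left out.

```python
from collections import defaultdict


def cross_chromosome_folds(records):
    """Yield (held_out_chrom, train, test), holding out one chromosome per fold.

    `records` is a list of dicts with a hypothetical "chrom" key.
    """
    by_chrom = defaultdict(list)
    for rec in records:
        by_chrom[rec["chrom"]].append(rec)
    for chrom in sorted(by_chrom):
        test = by_chrom[chrom]
        train = [r for c, recs in by_chrom.items() if c != chrom for r in recs]
        yield chrom, train, test


def true_false_positive_rates(labels, predictions):
    """Return (TPR, FPR) for binary labels/predictions (1 = peak, 0 = control)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Holding out whole chromosomes keeps test sequences from sharing a chromosome with training sequences, which is one plausible reading of the "cross-chromosome (cross-fold) setup" mentioned in the abstract.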
38. Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq
- Author
-
Marand, Alexandre P., primary
- Published
- 2023
- Full Text
- View/download PDF
39. Identification of Barley Enhancers across Genome via STARR-seq
- Author
-
Zhou, Wanlin, primary, Shi, Haoran, additional, Wang, Zhiqiang, additional, Huang, Yuxin, additional, Ni, Lin, additional, Chen, Xudong, additional, Liu, Yan, additional, Li, Haojie, additional, Li, Caixia, additional, and Liu, Yaxi, additional
- Published
- 2022
- Full Text
- View/download PDF
40. Assessing genome-wide dynamic changes in enhancer activity during early mESC differentiation by FAIRE-STARR-seq
- Author
-
Edda Einfeldt, Mara Steiger, Laura V. Glaser, Martin Vingron, Alisa Fuchs, Alena van Boemmel, Sebastiaan H. Meijsing, and Ho-Ryun Chung
- Subjects
Regulation of gene expression ,Receptors, Retinoic Acid ,AcademicSubjects/SCI00010 ,Gene regulation, Chromatin and Epigenetics ,Retinoic acid ,Chromosome Mapping ,Cell Differentiation ,Mouse Embryonic Stem Cells ,Biology ,Cell biology ,Mice ,chemistry.chemical_compound ,Enhancer Elements, Genetic ,STARR-seq ,chemistry ,Retinoic acid receptor alpha ,Regulatory sequence ,Genetics ,Animals ,Function and Dysfunction of the Nervous System ,Enhancer ,Sequence motif ,Gene - Abstract
Embryonic stem cells (ESCs) can differentiate into any given cell type and therefore represent a versatile model to study the link between gene regulation and differentiation. To quantitatively assess the dynamics of enhancer activity during the early stages of murine ESC differentiation, we analyzed accessible genomic regions using STARR-seq, a massively parallel reporter assay. This resulted in a genome-wide quantitative map of active mESC enhancers, in pluripotency and during the early stages of differentiation. We find that only a minority of accessible regions is active and that such regions are enriched near promoters, characterized by specific chromatin marks, enriched for distinct sequence motifs, and modeling shows that active regions can be predicted from sequence alone. Regions that change their activity upon retinoic acid-induced differentiation are more prevalent at distal intergenic regions when compared to constitutively active enhancers. Further, analysis of differentially active enhancers verified the contribution of individual TF motifs toward activity and inducibility as well as their role in regulating endogenous genes. Notably, the activity of retinoic acid receptor alpha (RARα) occupied regions can either increase or decrease upon the addition of its ligand, retinoic acid, with the direction of the change correlating with spacing and orientation of the RARα consensus motif and the co-occurrence of additional sequence motifs. Together, our genome-wide enhancer activity map elucidates features associated with enhancer activity levels, identifies regulatory regions disregarded by computational prediction tools, and provides a resource for future studies into regulatory elements in mESCs.
- Published
- 2021
41. University of North Carolina Charlotte Researcher Provides New Study Findings on Genomics and Bioinformatics (Underlying causes for prevalent false positives and false negatives in STARR-seq data)
- Subjects
Biochemistry -- Research -- Reports ,Computational biology -- Research -- Reports ,Genetic research -- Reports ,Biotechnology industry ,Pharmaceuticals and cosmetics industries ,University of North Carolina -- Reports - Abstract
2023 OCT 11 (NewsRx) -- By a News Reporter-Staff News Editor at Biotech Week -- Current study results on genomics and bioinformatics have been published. According to news originating from [...]
- Published
- 2023
42. ATAC-STARR-seq reveals transcription factor-bound activators and silencers within chromatin-accessible regions of the human genome.
- Author
-
Hansen TJ and Hodges E
- Subjects
- Humans, Chromatin Immunoprecipitation Sequencing methods, Transposases metabolism, Transposases genetics, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA methods, Binding Sites, Silencer Elements, Transcriptional, Genome, Human, Chromatin metabolism, Chromatin genetics, Transcription Factors metabolism, Transcription Factors genetics
- Abstract
Massively parallel reporter assays (MPRAs) test the capacity of putative gene regulatory elements to drive transcription on a genome-wide scale. Most gene regulatory activity occurs within accessible chromatin, and recently described methods have combined assays that capture these regions, such as assay for transposase-accessible chromatin using sequencing (ATAC-seq), with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of accessible DNA (ATAC-STARR-seq). Here, we report an integrated approach that quantifies activating and silencing regulatory activity, chromatin accessibility, and transcription factor (TF) occupancy with one assay using ATAC-STARR-seq. Our strategy, including important updates to the ATAC-STARR-seq assay and workflow, enabled high-resolution testing of ∼50 million unique DNA fragments tiling ∼101,000 accessible chromatin regions in human lymphoblastoid cells. We discovered that 30% of all accessible regions contain an activator, a silencer, or both. Although few MPRA studies have explored silencing activity, we demonstrate that silencers occur at similar frequencies to activators, and they represent a distinct functional group enriched for unique TF motifs and repressive histone modifications. We further show that Tn5 cut-site frequencies are retained in the ATAC-STARR plasmid library compared to standard ATAC-seq, enabling TF occupancy to be ascertained from ATAC-STARR data. With this approach, we found that activators and silencers cluster by distinct TF footprint combinations, and these groups of activity represent different gene regulatory networks of immune cell function. Altogether, these data highlight the multilayered capabilities of ATAC-STARR-seq to comprehensively investigate the regulatory landscape of the human genome all from a single DNA fragment source. (© 2022 Hansen and Hodges; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2022
- Full Text
- View/download PDF
43. Functional Definition of Thyroid Hormone Response Elements Based on a Synthetic STARR-seq Screen.
- Author
-
Flamant F, Zekri Y, and Guyot R
- Subjects
- DNA metabolism, Response Elements, Retinoid X Receptors genetics, Thyroid Hormones, Receptors, Retinoic Acid genetics, Receptors, Thyroid Hormone metabolism
- Abstract
When bound to thyroid hormone, the nuclear receptor TRα1 activates the transcription of a number of genes in many cell types. It mainly acts by binding DNA as a heterodimer with retinoid X receptors at specific response elements related to the DR4 consensus sequence. However, the number of DR4-like elements in the genome far exceeds the number of occupied sites, indicating that minor variations in nucleotide composition deeply influence the DNA-binding capacity and transactivation activity of TRα1. An improved protocol of synthetic self-transcribing active regulatory region sequencing was used to quantitatively assess the transcriptional activity of thousands of synthetic sites in parallel. This functional screen highlights a strong correlation between the affinity of the heterodimers for DNA and their capacity to mediate the thyroid hormone response. (© The Author(s) 2022. Published by Oxford University Press on behalf of the Endocrine Society.)
- Published
- 2022
- Full Text
- View/download PDF
44. Identification of Barley Enhancers across Genome via STARR-seq
- Author
-
Wanlin Zhou, Haoran Shi, Zhiqiang Wang, Yuxin Huang, Lin Ni, Xudong Chen, Yan Liu, Haojie Li, Caixia Li, and Yaxi Liu
- Abstract
Enhancers are DNA sequences that can strengthen transcription initiation. However, the global identification of plant enhancers is complicated due to uncertainty in the distance and orientation of enhancers, especially in species with large genomes. In this study, we performed self-transcribing active regulatory region sequencing (STARR-seq) for the first time to identify enhancers across the barley genome. A total of 7323 enhancers were successfully identified, and among 45 randomly selected enhancers, over 75% were effective as validated by a dual-luciferase reporter assay system in the lower epidermis of tobacco leaves. Interestingly, up to 53.5% of the barley enhancers were repetitive sequences, especially transposable elements (TEs), thus reinforcing the vital role of repetitive enhancers in gene expression. Both the common active transcription marker H3K4me3 and repressive histone marker H3K27me3 were abundant among the barley STARR-seq enhancers. In addition, the functional range of barley STARR-seq enhancers seemed much broader than that of rice or maize and extended to ± 100 KB of the gene body, and this finding was consistent with the high expression levels of genes in the genome. This work specifically depicts the unique features of barley enhancers and provides available barley enhancers for further utilization.
- Published
- 2022
45. Diff-ATAC-STARR-Seq: A Method for Genome-Wide Functional Screening of Enhancer Activity in Vivo
- Author
-
Nagayasu, Kazuki, primary, Andoh, Chihiro, additional, Shirakawa, Hisashi, additional, and Kaneko, Shuji, additional
- Published
- 2022
- Full Text
- View/download PDF
46. Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq
- Author
-
Alexandre Marand
- Abstract
The blueprints to development, response to the environment, and cellular function are largely the manifestation of distinct gene expression programs controlled by the spatiotemporal activity of cis-regulatory elements. Although biochemical methods for identifying accessible chromatin – a hallmark of active cis-regulatory elements – have been developed, approaches capable of measuring and quantifying cis-regulatory activity are only beginning to be realized. Massively Parallel Reporter Assays coupled to chromatin accessibility profiling present a high-throughput solution for testing the transcription-activating capacity of millions of putatively regulatory DNA sequences in parallel. However, clear computational pipelines for analyzing these high-throughput sequencing-based reporter assays are lacking. In this protocol, I lay out and rationalize a computational framework for the processing and analysis of Assay for Transposase Accessible Chromatin profiling followed by Self-Transcribed Active Regulatory Region sequencing (ATAC-STARR-seq) data from a recent study in Zea mays. The approach described herein can be adapted to other sequencing-based reporter assays and is largely agnostic to the model organism with the appropriate input substitutions.
- Published
- 2023
47. Resolving a Systematic Error in STARR-seq for Quantitative Enhancer Activity Mapping
- Author
-
Yingzhang Huang, Longjian Niu, Jing Wan, Chunhui Hou, Lin Li, Na He, and Jialei Sun
- Subjects
STARR-seq ,biology ,Polyadenylation ,Gene expression ,Arabidopsis thaliana ,Computational biology ,Epigenetics ,Primer (molecular biology) ,Enhancer ,biology.organism_classification ,Gene - Abstract
STARR-seq assesses millions of fragments in parallel, measuring enhancer activity quantitatively. Here we show that STARR-seq is critically flawed with a systematic error in the cells of Arabidopsis thaliana (A. thaliana). A large number of self-transcripts (STs) are lost during reverse transcription because these STs are polyadenylated after alternative polyadenylation sites (APAS) inside the test sequences. We solved this problem by using a specially designed primer and recovered self-transcribed sequences independent of PAS usage. In A. thaliana, we identified active enhancers and also enhancers quiescent in their endogenous genomic loci. Unlike enhancers identified by traditional STARR-seq, enhancers identified by the new method are highly enriched in sequences proximal to the 5’ and 3’ ends of genes, and their epigenetic states correlate with gene expression levels. Our solution applies to methods based on self-transcript quantification. In addition, our results provide an invaluable functional enhancer activity map and insights into the functional complexity of enhancers in A. thaliana.
- Published
- 2020
48. Adaptation of STARR-seq method to be used with 3rd generation integrase-deficient promoterless lentiviral vectors
- Author
-
Azuolas Ciukas, Mantas Matjusaitis, and Donato Tedesco
- Subjects
Cell type ,STARR-seq ,biology ,Regulatory sequence ,Computer science ,biology.protein ,Promoter ,Computational biology ,Transfection ,Enhancer ,Gene ,Integrase - Abstract
The ability to functionally screen gene regulatory sequences, such as promoters and enhancers, in a high-throughput manner is an important prerequisite for many basic and translational research programs. One of the methods that allow such screening is STARR-seq, or self-transcribing active regulatory region sequencing. It allows millions of candidate sequences to be screened quickly in the cell type of interest. However, it relies on transfection as a delivery method, which can be a limiting step for some hard-to-transfect cells such as senescent cells. Here we show that integration-deficient and integration-competent promoterless lentiviral particles can be used to deliver STARR-seq constructs into cells. These constructs reported CMV enhancer activity at both the protein and mRNA level. While further validations are necessary, the ability to deliver STARR-seq libraries using lentiviral particles will significantly improve the versatility and usability of such a method. Competing Interest Statement: The authors have declared no competing interest.
- Published
- 2020
49. Synthetic STARR-seq reveals how DNA shape and sequence modulate transcriptional output and noise.
- Author
-
Stefanie Schöne, Melissa Bothe, Edda Einfeldt, Marina Borschiwer, Philipp Benner, Martin Vingron, Morgane Thomas-Chollier, and Sebastiaan H Meijsing
- Subjects
Genetics ,QH426-470 - Abstract
The binding of transcription factors to short recognition sequences plays a pivotal role in controlling the expression of genes. The sequence and shape characteristics of binding sites influence DNA binding specificity and have also been implicated in modulating the activity of transcription factors downstream of binding. To quantitatively assess the transcriptional activity of tens of thousands of designed synthetic sites in parallel, we developed a synthetic version of STARR-seq (synSTARR-seq). We used the approach to systematically analyze how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional regulation. Our approach resulted in the identification of a novel highly active functional GR binding sequence and revealed that sequence variation both within and flanking GR's core binding site can modulate GR activity without apparent changes in DNA binding affinity. Notably, we found that the sequence composition of variants with similar activity profiles was highly diverse. In contrast, groups of variants with similar activity profiles showed specific DNA shape characteristics indicating that DNA shape may be a better predictor of activity than DNA sequence. Finally, using single cell experiments with individual enhancer variants, we obtained clues indicating that the architecture of the response element can independently tune expression mean and cell-to-cell variability in gene expression (noise). Together, our studies establish synSTARR as a powerful method to systematically study how DNA sequence and shape modulate transcriptional output and noise.
- Published
- 2018
- Full Text
- View/download PDF
50. Filtering STARR-Seq Peaks for Enhancers with Sequence Models
- Author
-
Benjamin Halligan, Rafael Reple Geromel, and Ronald J. Nowling
- Subjects
0303 health sciences ,business.industry ,Pattern recognition ,Genome ,03 medical and health sciences ,genomic DNA ,0302 clinical medicine ,STARR-seq ,Histogram ,Artificial intelligence ,Enhancer ,business ,Peak calling ,030217 neurology & neurosurgery ,030304 developmental biology ,Mathematics ,Sequence (medicine) ,Reference genome - Abstract
STARR-Seq is a high-throughput technique for directly identifying genomic regions with enhancer activity [1]. Genomic DNA is sheared, inserted into artificial plasmids designed so that DNA with enhancer activity triggers self-transcription, and transfected into cultured cells. The resulting RNA is converted back into cDNA, sequenced, and aligned to a reference genome. "Peaks" are called by comparing observed read depth at each point to an expected read depth from control DNA using a statistical test. Examples of peak calling methods based on read depth include MACS2 [4], basicSTARRSeq, and STARRPeaker [3]. It is challenging to accurately distinguish between real peaks and artifacts in regions where mean read depth is low but the variance is high. Fortunately, enhancer activity is strongly correlated with sequence content. We propose using sequence-based machine learning models in a semi-supervised framework to filter peaks. 501-bp sequences centered on the ~11k STARR peaks from [1] were extracted from the Drosophila melanogaster dm3 genome. Randomly sampled 501-bp sequences were used as a negative set. Peaks were filtered using a Bonferroni-corrected significance value (α = 0.05) to create a "high-confidence" subset of ~2.2k peaks. A Logistic Regression model with k-mer count features was trained on the high-confidence peak sequences and their negatives and used to classify the remaining ~8.8k peak sequences. The self-trained, sequence-based model identified an additional ~3.7k candidate enhancers ("medium confidence"). The remaining ~5k STARR peaks were considered "low confidence" peaks. We plotted histograms of the read depth log-fold change for the three sets of peaks (high, medium, and low confidence) (see Figure 1). The distributions for the medium- and low-confidence peaks overlapped significantly. The sequence-based model identified enhancer candidates that would otherwise be filtered out using read depth alone. We called peaks for the 4 D. melanogaster FAIRE-Seq data sets from [2]. Sequencing data were cleaned with Trimmomatic, aligned to the dm3 genome with bwa backtrack, and filtered for mapping quality (q
- Published
- 2020
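Two ingredients named in the abstract above, k-mer count features and a Bonferroni-corrected cutoff for the "high-confidence" subset, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the value of `k`, the `p_value` field, and both function names are hypothetical, and the abstract does not say whether α = 0.05 is the level before or after correction (the code assumes before).

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a DNA sequence (k=3 is an arbitrary choice)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bonferroni_filter(peaks, alpha=0.05):
    """Keep peaks whose p-value passes alpha divided by the number of peaks tested.

    `peaks` is a list of dicts with a hypothetical "p_value" key.
    """
    if not peaks:
        return []
    threshold = alpha / len(peaks)
    return [p for p in peaks if p["p_value"] <= threshold]
```

In a self-training setup like the one described, the peaks that pass the filter would serve as positive training examples, and the k-mer count vectors of the remaining peaks would be scored by the trained classifier.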