
ATG deserts define a novel core promoter subclass

Authors :
Maxwell P. Lee
Aparna Kotekar
Kenneth H. Buetow
Dinah S. Singer
Kevin Howcroft
Howard H. Yang
Source :
Genome Research. 15:1189-1197
Publication Year :
2005
Publisher :
Cold Spring Harbor Laboratory, 2005.

Abstract

Regulation of gene expression is mediated by specific interactions of transcription factors with promoter DNA sequences, resulting in the assembly of the transcription machinery and onset of transcription (Chen et al. 1994; Roeder 1996; Berk 1999; Gill 2001; Kadonaga 2004). RNA pol II promoters are conceptually divided into two domains, upstream regulatory and core promoter regions. Although the diversity of transcription factor binding sites and the complexity of their organization in upstream regulatory regions have long been recognized (Struhl 2001), it is increasingly apparent that core promoter regions are also highly diverse and complex (Burke and Kadonaga 1997; Lagrange et al. 1998; Smale et al. 1998; Kutach and Kadonaga 2000; Willy et al. 2000; Smale 2001; Butler and Kadonaga 2002). Core promoters can be grouped according to the presence of specific DNA sequence elements such as the TATAA box (Singer et al. 1990; Butler and Kadonaga 2002), the Inr (Smale and Baltimore 1989; Zenzie-Gregory et al. 1993; Kaufmann and Smale 1994; Lo and Smale 1996; Smale et al. 1998), the TFIIB response element (BRE) (Lagrange et al. 1998; Littlefield et al. 1999), the downstream promoter element (DPE) (Burke and Kadonaga 1997; Burke et al. 1998; Kutach and Kadonaga 2000; Butler and Kadonaga 2001; Kadonaga 2002), or the MED-1 element (Ince and Scotto 1995). Another sequence feature common to many promoters is the presence of CpG islands (CGI) (Bird 1986; Gardiner-Garden and Frommer 1987; Cross and Bird 1995; Antequera 2003; Wang and Leung 2004). Although the presence of CGI has been used to localize promoters, not all CGI are associated with promoter regions. In general, CGI associated with promoters are distinguished from CGI not associated with promoters by their greater size (≥500 bp), higher G+C content (>0.55), and higher observed/expected CpG ratio (>0.65) (Takai and Jones 2002).
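
The three thresholds quoted above (size, G+C content, observed/expected CpG ratio) can be illustrated with a minimal Python sketch. It is not the procedure used by the authors or by Takai and Jones. The function names are hypothetical, the observed/expected ratio is computed with the commonly used formula (CpG count × length) / (C count × G count), and the sequence is assumed to be a plain string of A, C, G, and T for a single candidate region.

```python
def cpg_island_stats(seq: str) -> dict:
    """Return length, G+C fraction, and observed/expected CpG ratio of seq."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG / expected CpG = (CpG * N) / (C * G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def meets_promoter_cgi_thresholds(
    seq: str,
    min_len: int = 500,     # size threshold quoted in the abstract (>= 500 bp)
    min_gc: float = 0.55,   # G+C content threshold (> 0.55)
    min_oe: float = 0.65,   # observed/expected CpG threshold (> 0.65)
) -> bool:
    """Check one candidate region against the three quoted thresholds."""
    st = cpg_island_stats(seq)
    return (
        st["length"] >= min_len
        and st["gc_fraction"] > min_gc
        and st["obs_exp_cpg"] > min_oe
    )
```

A real CGI finder would scan a genome with sliding windows and merge adjacent hits; this sketch only evaluates one region that has already been delimited.
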
In the human genome, it is estimated that there are 41,468 CGI based on NCBI's Build 34 genome annotation (Takai and Jones 2002) and 37,000 in the mouse (Antequera and Bird 1993). Further, 90% of all housekeeping genes and 40% of all tissue-specific genes fall within CGI. For many genes a CGI is the only identifiable core promoter structure, but little is known about how CGI directly contribute to transcription initiation (Butler and Kadonaga 2002). The sequence elements in the core promoter and its structure can both contribute to the regulation of gene expression. In yeast, it has been shown that these different classes of core promoters subserve different functions. While only about 20% of promoters in the yeast genome have TATAA elements, 50% of stress-responsive genes are TATAA promoters (Basehoar et al. 2004; Zanton and Pugh 2004). In Drosophila, differential usage of two closely linked promoter elements of the ADH gene is developmentally regulated (Hansen and Tjian 1995). In mammalian cells, the usage of promoters associated with the CIITA gene is tissue specific (Wong et al. 2002). Core promoter regions also differ in their patterns of transcription start sites (TSS). Recent genome-wide analyses have reported that the majority of genes initiate transcription at multiple sites distributed over the core promoter region (Suzuki et al. 2004). The observed TSS range from unique to tightly clustered to highly dispersed among the different promoters examined. Based on an analysis of 276 genes, Suzuki and colleagues suggested that the presence of a TATAA promoter in 42 genes correlated with tightly clustered start sites. The functional significance of multiple TSS in a promoter is unknown. However, the diversity of TSS suggests that initiation at individual promoters is surprisingly complex and may be a target for transcriptional regulation. 
A major challenge is to understand the degree to which differential TSS utilization contributes to the regulation of gene expression. We have begun to address this challenge by characterizing the core promoter structure and patterns of expression of an MHC class I gene. The MHC class I gene family encodes cell-surface molecules that provide immune surveillance against intracellular pathogens. The classical class Ia genes HLA-A, B, and C in human and PD1 in miniature swine are ubiquitously expressed; however, their expression is actively regulated in a tissue-specific fashion (Singer and Maguire 1990; Le Bouteiller 1994; Girdlestone 1995; Howcroft and Singer 2003). The highest levels of class I gene expression are found in the cells and tissues of the immune system. The promoter region of the MHC class I gene, PD1, is contained within a CGI extending from -556 to +1452 bp relative to a YTCA+1GYY Inr-like sequence that is conserved among class I genes. Our in vitro transcription studies revealed that initiation occurs at multiple TSS within the core promoter (Howcroft et al. 2003). Indeed, individual TSS usage in vitro reflects the prior exposure history of cells to modulatory cytokines such as γ-interferon (IFNγ) that regulate class I expression. Here we report that differential transcription start site usage within the core promoter occurs in vivo in basal and activated transcription, demonstrating that transcription start-site selection is actively regulated. The regulation of class I transcription through the use of multiple TSS is made possible by the absence of any ATG codons within ∼460 bp upstream of the translation initiation codon of the class I gene. The presence of this “ATG desert” ensures that only a single protein product is made, regardless of the TSS selected. Importantly, we identify a subclass of promoters in the human, mouse, and rat genomes that contain ATG deserts, thereby defining a novel core promoter feature.
The ATG desert is a DNA segment that has a lower frequency of occurrence of the ATG trinucleotide than the surrounding sequences and spans a region of ∼1 kb both upstream and downstream of the major transcription start site. ATG deserts are an intrinsic feature of core promoters that do not contain canonical TATAA elements, independent of the presence of a CGI. We further document a significant correlation between the presence of ATG deserts and the use of multiple transcription start sites among non-TATAA promoters. A consequence of the presence of ATG deserts is that they enable the use of multiple TSS whose products all encode a single protein, thereby permitting the core promoter to serve as a platform where complex upstream regulatory signals are integrated through selective transcription start site usage.
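
The two quantities the abstract relies on, the ATG trinucleotide frequency of a region and the length of the ATG-free stretch immediately upstream of the translation initiation codon, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis. The function names are hypothetical, only the given strand is examined, and `start_codon_index` is assumed to be the 0-based position of the annotated initiation ATG within the sequence.

```python
def atg_frequency(seq: str) -> float:
    """ATG occurrences per trinucleotide position on the given strand."""
    s = seq.upper()
    positions = len(s) - 2  # number of trinucleotide start positions
    # "ATG" cannot overlap itself, so str.count() counts every occurrence
    return s.count("ATG") / positions if positions > 0 else 0.0


def atg_free_upstream_length(seq: str, start_codon_index: int) -> int:
    """Bases immediately upstream of the start codon that contain no ATG."""
    upstream = seq.upper()[:start_codon_index]
    last = upstream.rfind("ATG")  # nearest upstream ATG, or -1 if none
    if last == -1:
        return len(upstream)
    return len(upstream) - (last + 3)


def is_depleted(region: str, flank: str) -> bool:
    """True when the region has a lower ATG frequency than its flank."""
    return atg_frequency(region) < atg_frequency(flank)
```

For example, `atg_free_upstream_length("CCATGCCCCCCATGAAA", 11)` measures the stretch between the first ATG and the start codon at index 11. A comparison such as `is_depleted` says nothing about statistical significance, which would require a background model that the abstract does not specify.
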

Details

ISSN :
10889051
Volume :
15
Database :
OpenAIRE
Journal :
Genome Research
Accession number :
edsair.doi.dedup.....b91f652775c2d11fe27040548ee6778f