Jarkko Toivonen, Jussi Taipale, Teemu Kivioja, Arttu Jolma, Junaid Akhtar, Kazuhiro R. Nitta, Eileen E. M. Furlong, Ekaterina Morgunova, Korneel Hens, Yimeng Yin, Bart Deplancke
Affiliations: Research Programs Unit, Genome-Scale Biology (GSB) Research Program, and Department of Computer Science
Divergent morphology of species has largely been ascribed to genetic differences in the tissue-specific expression of proteins, which could be achieved by divergence in cis-regulatory elements or by altering the binding specificity of transcription factors (TFs). The relative importance of the latter has been difficult to assess, as previous systematic analyses of TF binding specificity have been performed using different methods in different species. To address this, we determined the binding specificities of 242 Drosophila TFs, and compared them to human and mouse data. This analysis revealed that TF binding specificities are highly conserved between Drosophila and mammals, and that for orthologous TFs, the similarity extends even to the level of very subtle dinucleotide binding preferences. The few human TFs with divergent specificities function in cell types not found in fruit flies, suggesting that evolution of TF specificities contributes to emergence of novel types of differentiated cells.
DOI: http://dx.doi.org/10.7554/eLife.04837.001
eLife digest
Flies look very different from humans, but both are descended from a common ancestor that existed over 600 million years ago. Some differences between animal species are due to them having different genes: stretches of DNA that contain the instructions to make proteins and other molecules. However, often differences are caused by the same or similar genes being switched on and off at different times and in different tissues in each species. The instructions that control when and where a gene is expressed are written in the sequence of DNA bases located in the regulatory region of the gene. These instructions are written in a language that is often called the ‘gene regulatory code’. This code is read and interpreted by proteins called transcription factors that bind to specific sequences of DNA (or ‘DNA words’) and increase or decrease gene expression.
Changes in gene expression between species could therefore be due to changes in the transcription factors and/or changes in the instructions within the regulatory regions of specific genes. Gene regulatory regions are not well conserved between species. However, it is unclear if the instructions in these regions are written using the same gene regulatory code, and whether transcription factors found in different species recognize different DNA words.
Nitta et al. have now used high-throughput methods to identify the DNA words recognized by 242 transcription factors from a fruit fly called Drosophila melanogaster. Nitta et al. then used new computational tools to find motifs, or collections of DNA words, that are recognized by each of the transcription factors. By comparing the motifs, they observed that, in spite of more than 600 million years of evolution, almost all known motifs found in humans and mice were recognized by fruit fly transcription factors.
Nitta et al. noted that both fruit flies and humans have transcription factors that recognize a few unique motifs, and confer properties that are specific to each species. For example, some of the transcription factors that control the development of the fruit fly wing are not present in humans. Moreover, fruit flies lack both mucus-producing goblet cells and the ability to recognize a motif read by the transcription factor that controls the development of these cells in humans.
The findings of Nitta et al. also indicate that transcription factors do not evolve to recognize subtly different DNA motifs, but instead appear constrained to recognize the same motifs. Thus, much like the genetic code that instructs how to build proteins, the gene regulatory code that determines how DNA sequences direct gene expression is also highly conserved in animals. The language used to guide the development of animals has, as such, remained very similar for millions of years. What makes animals different is differences in the content and length of the instructions that are written using this language into the regulatory regions of their genes.
DOI: http://dx.doi.org/10.7554/eLife.04837.002