Juan I. Fuxman Bass, Hamed S. Najafabadi, Kamesh Narasimhan, Hong Zheng, Matthew T. Weirauch, Sanie Mnaimneh, Samuel A. Lambert, Timothy P. Hughes, Jeremy Riddell, Ally Yang, Mihai Albu, John S. Reece-Hoyes, and Albertha J.M. Walhout
Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs.

DOI: http://dx.doi.org/10.7554/eLife.06967.001

eLife digest

Many scientists use ‘model’ species, such as the fruit fly or a nematode worm called Caenorhabditis elegans, in their research because these organisms have useful features that make it easier to carry out many experiments. For example, C. elegans has a smaller genome than many other animals, which is useful for studying the roles of individual genes or stretches of DNA.

Transcription factors are a type of protein that can bind to specific stretches of DNA, known as ‘motifs’, and help to switch certain genes on or off. These motifs may be close to the gene or further away in the genome, and therefore must stand out amongst the rest of the DNA, like lights on a landing strip. However, the motifs for only 10% of the estimated 763 transcription factors in C. elegans have been identified so far.

In this study, Narasimhan, Lambert, Yang et al. used a technique called a ‘protein binding microarray’ to identify the motifs for many more of the C. elegans transcription factors. These findings were then used to predict motifs for other transcription factors. Together, these methods increased the proportion of C. elegans transcription factors with known DNA-binding motifs from 10% to around 40%.

Now that more DNA motifs have been identified, it is possible to look for similarities and differences between them. For example, Narasimhan, Lambert, Yang et al. found that transcription factors with similar sequences can bind to very different motifs. On the other hand, some transcription factors that are very different are able to recognize very similar motifs. The experiments also indicate that motifs found very close to genes, in sequences known as ‘promoters’, may be able to interact with many proteins to influence the activity of genes.

Narasimhan, Lambert, Yang et al.'s findings increase the number of C. elegans transcription factors with a known motif, bringing the knowledge of these proteins more in line with the better-studied transcription factors of humans and fruit flies. The next challenge is to identify DNA motifs for the remaining 60% of transcription factors.

DOI: http://dx.doi.org/10.7554/eLife.06967.002