Computational Prediction of c-MYC Binding and Action by Integration of Multiple Data Sources.

Authors :
Chen, Yili
Blackwell, Thomas W.
Gao, Jing
Hewagama, Anura
Grifka, Heather M.
Lee, Angel W.
States, David J.
Source :
Blood; November 2006, Vol. 108 Issue: 11 p4345-4345, 1p
Publication Year :
2006

Abstract

c-MYC is an important proto-oncogene. Its actions are mediated by sequence-specific binding of the c-MYC protein to genomic DNA. While many c-MYC recognition sites can be identified in c-MYC responsive genes, many others are associated with genes showing no c-MYC response. It is not yet known how the cell determines which of the many c-MYC recognition sites are biologically active and directly bind c-MYC protein to regulate gene expression. We have developed a computational model that predicts c-MYC binding and functional activation as distinct processes. Our model integrates four types of evidence to predict functional c-MYC targets: genomic sequence, MYC binding, gene expression and gene function annotations. First, a Bayesian network classifier is used to predict c-MYC recognition sites likely to exhibit high-occupancy binding in chromatin immunoprecipitation studies. The classifier uses several types of sequence information, including DNA methylation predicted by a computational model that estimates the likelihood of genomic DNA methylation. In the second step, the DNA binding probability of MYC is combined with gene expression information from 9 independent microarray datasets in multiple tissues and with gene function annotations in Gene Ontology to predict the c-MYC targets. The prediction results were compared with the c-MYC targets in the public MYC target database [www.myccancergene.org], which collects c-MYC targets identified in the biomedical literature. In total, we predicted 599 likely c-MYC target genes in the human genome, of which 73 have been reported to be both bound and regulated by MYC, 83 are bound by MYC in vivo and another 93 are MYC regulated. The approach thus successfully identified many known c-MYC targets and also suggested many novel sites, including many that are remote from the transcription start site. Our findings suggest that, to identify c-MYC genomic targets, any study based on a single high-throughput dataset is likely to be insufficient. Using multiple gene expression datasets helps improve sensitivity, and integrating different data sources helps improve specificity.
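
The abstract does not say how the second step combines the binding probability with the expression and annotation evidence. As a rough illustration only, the sketch below assumes a naive Bayes style combination in which each evidence source contributes an independent likelihood ratio. The function names, the threshold, and all numeric values are hypothetical and are not taken from the authors' model.

```python
import math


def log_odds(p):
    """Convert a probability (0 < p < 1) to log-odds."""
    return math.log(p / (1.0 - p))


def combine_evidence(prior, likelihood_ratios):
    """Posterior probability from a prior and independent likelihood ratios.

    Assumption: evidence sources are conditionally independent, so
    posterior log-odds = prior log-odds + sum of log likelihood ratios.
    """
    total = log_odds(prior) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-total))


def predict_target(binding_prob, expression_lrs, annotation_lr, threshold=0.5):
    """Hypothetical second step: start from the predicted binding probability
    (step one) and update it with one likelihood ratio per expression dataset
    plus one for the gene function annotation."""
    posterior = combine_evidence(binding_prob, list(expression_lrs) + [annotation_lr])
    return posterior, posterior >= threshold


if __name__ == "__main__":
    # Made-up inputs: a weak binding prediction supported by two expression datasets.
    posterior, is_target = predict_target(0.2, [4.0, 2.0], 1.0)
    print(f"posterior={posterior:.3f} predicted_target={is_target}")
```

In this sketch a gene with a low binding probability can still be called a target when several expression datasets agree, which mirrors the abstract's point that multiple expression datasets help sensitivity while requiring agreement across data sources helps specificity.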

Details

Language :
English
ISSN :
00064971 and 15280020
Volume :
108
Issue :
11
Database :
Supplemental Index
Journal :
Blood
Publication Type :
Periodical
Accession number :
ejs56866743
Full Text :
https://doi.org/10.1182/blood.V108.11.4345.4345