1. A predictive model for identifying mini-regulatory modules in the mouse genome.
- Author
- Yaragatti M, Sandler T, and Ungar L
- Subjects
- Algorithms; Animals; Databases, Nucleic Acid; Expressed Sequence Tags; Genomics/methods; Mice; RNA, Messenger/genetics; Computational Biology/methods; Genome; Regulatory Sequences, Nucleic Acid/genetics
- Abstract
Motivation: Rapidly advancing genome technology has allowed access to a large number of diverse genomes and annotation data. We have defined a systems model that integrates assembly data, comparative genomics, gene predictions, mRNA and EST alignments and physiological tissue expression. Using these as predictive parameters, we engineered a machine learning approach to decipher putative active regions in the genome.
Results: Analysis of genomic sequences showed nucleosome-free region (NFR) modules containing a higher percentage of conserved regions, RNA-encoding sequences, CpG islands, splice sites and GC-rich areas. In contrast, random in silico fragments revealed higher percentages of DNA repeats and lower conservation. The larger conserved sequences from the Vista enhancer browser (VEB) showed a greater percentage of short DNA sequence matches and RNA coding regions in multiple species. Our model can predict small regulatory regions in the genome with >95% prediction accuracy using NFR modules and >85% prediction accuracy with VEB elements. Ultimately, this systems model can be applied to any organism to identify candidate transcriptional modules on a genome scale.
- Published
- 2009