1. Alignments anchored on genomic landmarks can aid in the identification of regulatory elements
- Author
David Landsman, Kannan Tharakaraman, Sergey L. Sheetlin, John L. Spouge, and Leonardo Mariño-Ramírez
- Subjects
Statistics and Probability; Human DNA; Amino Acid Motifs; Molecular Sequence Data; Genomics; Sequence alignment; Regulatory Sequences, Nucleic Acid; Biology; Biochemistry; Article; Cluster Analysis; Humans; Databases, Protein; Promoter Regions, Genetic; Cluster analysis; Molecular Biology; Statistical hypothesis testing; Models, Statistical; Base Sequence; Nucleotides; Computational Biology; Pattern recognition; DNA; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Gibbs sampling algorithm; Data mining; Motif (music); Artificial intelligence; Transcription Initiation Site; Sequence Alignment; Software; Gibbs sampling
- Abstract
Motivation: The transcription start site (TSS) has been located for an increasing number of genes across several organisms. Statistical tests have shown that some cis-acting regulatory elements have positional preferences with respect to the TSS, but few strategies have emerged for locating elements by their positional preferences. This paper elaborates such a strategy. First, we align promoter regions without gaps, anchoring the alignment on each promoter's TSS. Second, we apply a novel word-specific mask. Third, we apply a clustering test related to gapless BLAST statistics. The test examines whether any specific word is placed unusually consistently with respect to the TSS. Finally, our program A-GLAM, an extension of the GLAM program, uses significant word positions as new 'anchors' to realign the sequences. A Gibbs sampling algorithm then locates putative cis-acting regulatory elements. Usually, Gibbs sampling requires a preliminary masking step, to avoid convergence onto a dominant but uninteresting signal from a DNA repeat. However, since the positional anchors focus A-GLAM on the motif of interest, masking DNA repeats during Gibbs sampling becomes unnecessary.
Results: In a set of human DNA sequences with experimentally characterized TSSs, the placement of 791 octonucleotide words was unusually consistent (multiple test corrected P < 0.05). Alignments anchored on these words sometimes located statistically significant motifs inaccessible to GLAM or AlignACE.
Availability: The A-GLAM program and a list of statistically significant words are available at ftp://ftp.ncbi.nih.gov/pub/spouge/papers/archive/AGLAM/.
Contact: spouge@ncbi.nlm.nih.gov
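The first step described in the abstract, anchoring promoters on the TSS and asking where a given word falls relative to it, can be illustrated with a toy sketch. This is not the A-GLAM clustering test or its BLAST-related statistic; it only tallies word offsets relative to the TSS. The function names, the `(sequence, tss_index)` input format, and the example sequences are assumptions made for illustration.

```python
from collections import Counter


def word_offsets(promoters, word):
    """Collect the offsets of `word` relative to the TSS in every promoter.

    `promoters` is a list of (sequence, tss_index) pairs, where tss_index is
    the 0-based position of the TSS in the sequence (hypothetical format).
    A negative offset means the word starts upstream of the TSS.
    """
    offsets = []
    for seq, tss in promoters:
        start = seq.find(word)
        while start != -1:
            offsets.append(start - tss)
            start = seq.find(word, start + 1)  # allow overlapping hits
    return offsets


def most_common_offset(promoters, word):
    """Return (offset, hits_at_offset, total_hits), or None if no hits."""
    counts = Counter(word_offsets(promoters, word))
    if not counts:
        return None
    offset, hits = counts.most_common(1)[0]
    return offset, hits, sum(counts.values())


# Toy promoters (invented sequences, not data from the paper).
promoters = [
    ("CCTATAAAAGCCGGA", 12),
    ("GATTCTATAAAAGGCCGA", 15),
    ("TATAAAAGACGTACGT", 4),
]

print(most_common_offset(promoters, "TATAAAAG"))
```

A word whose hits pile up at one offset across many promoters is the kind of candidate the abstract's clustering test would then assess for statistical significance; that assessment is outside the scope of this sketch.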
- Published
- 2005