19 results on '"Minmei Hou"'
Search Results
2. Aligning Two Genomic Sequences That Contain Duplications.
- Author
-
Minmei Hou, Cathy Riemer, Piotr Berman, Ross C. Hardison, and Webb Miller
- Published
- 2009
3. Approximating the spanning star forest problem and its applications to genomic sequence alignment.
- Author
-
C. Thach Nguyen, Jian Shen, Minmei Hou, Li Sheng, Webb Miller, and Louxin Zhang
- Published
- 2007
4. Controlling Size When Aligning Multiple Genomic Sequences with Duplications.
- Author
-
Minmei Hou, Piotr Berman, Louxin Zhang, and Webb Miller
- Published
- 2006
5. Pico-inplace-inversions between human and chimpanzee.
- Author
-
Minmei Hou, Ping Yao, Angela Antonou, and Mitrick A. Johns
- Published
- 2011
6. Approximating the Spanning Star Forest Problem and Its Application to Genomic Sequence Alignment.
- Author
-
C. Thach Nguyen, Jian Shen, Minmei Hou, Li Sheng, Webb Miller, and Louxin Zhang
- Published
- 2008
7. HomologMiner: looking for homologous genomic groups in whole genomes.
- Author
-
Minmei Hou, Piotr Berman, Chih-Hao Hsu, and Robert S. Harris
- Published
- 2007
8. Mulan: Multiple-sequence local alignment and visualization for studying function and evolution
- Author
-
Ivan Ovcharenko, Belinda M. Giardine, Jian Ma, Minmei Hou, Gabriela G. Loots, Ross C. Hardison, Lisa Stubbs, and Webb Miller
- Subjects
Birds -- Genetic aspects; Online information services -- Analysis; Information services -- Analysis; Online services -- Analysis; Online information service; Health
- Abstract
Mulan, a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time, is introduced. The uses and applications of the Mulan tool are illustrated through multispecies comparisons of the GATA3 gene locus and the identification of elements that are conserved in a different way in avians than in other genomes, allowing speculation on the evolution of birds.
- Published
- 2005
9. 28-Way vertebrate alignment and conservation track in the UCSC Genome Browser
- Author
-
Ross C. Hardison, Kate R. Rosenbloom, Brian J. Raney, Thomas H. Pringle, Sergei L. Kosakovsky Pond, Minmei Hou, Adam Siepel, Richard Burhans, Arthur M. Lesk, Robert Baertsch, Robert S. Harris, George M. Weinstock, Belinda Giardine, Webb Miller, Mark Diekhans, David C. King, Richard A. Gibbs, W. James Kent, Svitlana Tyekucheva, Anton Nekrutenko, David Haussler, James Taylor, William J. Murphy, Eric S. Lander, Kerstin Lindblad-Toh, and Daniel Blankenberg
- Subjects
Resource; Guinea Pigs; Molecular Sequence Data; Codon, Initiator; Sequence alignment; Genome browser; Biology; Genome; Conserved sequence; Mice; Dogs; Databases, Genetic; Genetics; Animals; Humans; Coding region; natural sciences; Indel; Conserved Sequence; Genetics (clinical); Sequence Deletion; Base Sequence; Genome, Human; Stop codon; Rats; Mutagenesis, Insertional; Evolutionary biology; Cats; Codon, Terminator; Cattle; Human genome; Rabbits; Sequence Alignment
- Abstract
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
- Published
- 2007
10. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
- Author
-
Kate R. Rosenbloom, LaDeana W. Hillier, John Spieth, David Haussler, Jakob Skou Pedersen, George M. Weinstock, Richard A. Gibbs, W. James Kent, Richard K. Wilson, Adam Siepel, Webb Miller, Gill Bejerano, Minmei Hou, Stephen Richards, Hiram Clawson, and Angie S. Hinrichs
- Subjects
Insecta; Molecular Sequence Data; Biology; Human accelerated regions; Genome; Conserved non-coding sequence; Conserved sequence; Evolution, Molecular; Saccharomyces; Yeasts; Genetics; Animals; Humans; Caenorhabditis elegans; 3' Untranslated Regions; Base Pairing; Genome size; Conserved Sequence; Genetics (clinical); Base Sequence; Phylogenetic tree; Articles; biology.organism_classification; Insects; Caenorhabditis; Evolutionary biology; Vertebrates; DNA, Intergenic; Human genome
- Abstract
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%–8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%–53%), Caenorhabditis elegans (18%–37%), and Saccharomyces cerevisiae (47%–68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3′ UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
- Published
- 2005
11. Alignathon: a competitive assessment of whole-genome alignment methods
- Author
-
Inna Dubchak, Aaron E. Darling, Brian J. Raney, Glenn Hickey, Cedric Notredame, W. J. Kent, Vladimir Molodtsov, Dent Earl, David Haussler, Benedict Paten, Stephen Fitzgerald, Carsten Kemena, Jian Ma, Ngan Nguyen, Minmei Hou, Michael Brudno, Ionas Erb, Victor V. Solovyev, Kathryn Beal, Alexander Poliakov, Hiram Clawson, Robert S. Harris, Jia-Ming Chang, Igor Seledtsov, Jaebum Kim, and Javier Herrero
- Subjects
Resource; Bioinformatics; media_common.quotation_subject; Datasets as Topic; Genomics; Sequence alignment; Biology; Machine learning; computer.software_genre; Medical and Health Sciences; Software; Resource (project management); Genetics; Animals; Humans; Quality (business); Computer Simulation; Genetics (clinical); Phylogeny; media_common; Mammals; Multiple sequence alignment; Genome; business.industry; Reproducibility of Results; Computational Biology; Benchmarking; Biological Sciences; Artificial intelligence; business; computer; Sequence Alignment; Genome-Wide Association Study
- Abstract
© 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
- Published
- 2014
12. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser
- Author
-
Webb Miller, Kate Rosenbloom, Ross C. Hardison, Minmei Hou, James Taylor, Brian Raney, Richard Burhans, David C. King, Robert Baertsch, Daniel Blankenberg, Sergei L. Kosakovsky Pond, Anton Nekrutenko, Belinda Giardine, Robert S. Harris, Svitlana Tyekucheva, Mark Diekhans, Thomas H. Pringle, William J. Murphy, Arthur Lesk, George M. Weinstock, Kerstin Lindblad-Toh, Richard A. Gibbs, Eric S. Lander, Adam Siepel, David Haussler, and W. James Kent
- Subjects
Evolution -- Research; Human genome -- Research; Human genome -- Information management; Mutation (Biology) -- Research; Company systems management; Health
- Abstract
A set of alignments of 28 vertebrate genome sequences observed on the UCSC Human Genome Browser is described. The data source could be used to explore vertebrate and mammalian evolution.
- Published
- 2007
13. Analyses of deep mammalian sequence alignment and constraint predictions for 1% of the human genome
- Author
-
Elliott H. Margulies, Ariel S. Schwartz, Ari Loytynoja, Simon Whelan, Fabio Pardi, James Taylor, Sergey Nikolaev, Juan I. Montoya-Burgos, Minmei Hou, Colin N. Dewey, Adam Siepel, Ewan Birney, Damian Keefe, Gregory M. Cooper, George Asimenos, Daryl J. Thomas, Tim Massingham, Abel Ureta-Vidal, Benedict Paten, Eric A. Stone, Peter Bickel, James C. Mullikin, Ian Holmes, Kate R. Rosenbloom, James B. Brown, Nick Goldman, Ross Hardison, Webb Miller, David Haussler, Stylianos E. Antonarakis, Serafim Batzoglou, W. James Kent, Lior Pachter, Eric D. Green, and Arend Sidow
- Subjects
Mammals -- Genetic aspects; Nucleotide sequence -- Analysis; Biological diversity -- Analysis; Health
- Abstract
The sequence analyses of functional elements involving sequence generation, alignment and evolutionary constraint of 23 mammalian species are carried out using quantitative and qualitative methods. The constraint regions and the elements are found to be nonoverlapping.
- Published
- 2007
14. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
- Author
-
Nan Jiang, Alfonso Valencia, Rachel A. Harte, Abigail Woodroffe, Michael Seringhaus, Andrew Haydock, Eugene Davydov, Todd M. Lowe, Peggy J. Farnham, Robert E. Thurman, Tyler Alioto, Adam Ameur, Morgan Park, Roderic Guigó, Archana Thakkapallayil, Philipp Kapranov, Francis S. Collins, Donna Karolchik, Stefan Washietl, Kerstin Lindblad-Toh, Michael L. Tress, Barbara E. Stranger, Gregory M. Cooper, Kun Wang, Thomas R. Gingeras, Serafim Batzoglou, Peter D. Ellis, Annie Yang, Stylianos E. Antonarakis, Jonghwan Kim, Robert M. Andrews, W. James Kent, Kuo Ping Chiu, Madhavan Ganesh, Jason D. Lieb, Shane Neph, Albin Sandelin, Michael Hawrylycz, Eric S. Lander, Matthew T. Weirauch, Nick Goldman, Alexander E. Urban, Ian Bell, Anason S. Halees, Jan Komorowski, Webb Miller, Kandhadayar G. Srinivasan, Evelyn Cheung, David B. Jaffe, Peter J. Good, Gregory Lefebvre, Yuko Yoshinaga, Sylvain Foissac, Alexander W. Bruce, Mark Dickson, Christoph M. Koch, Antigone S. Dimas, Zhengdong D. Zhang, Matthew J. Oberley, Paul I.W. de Bakker, Arend Sidow, Xueqing Zhang, Molly Weaver, Jane Rogers, Jacquelyn R. Idol, Jeff Goldy, Haiyan Huang, William Stafford Noble, Angie S. Hinrichs, Sandeep Patel, David A. Nix, Lluís Armengol, Siew Woh Choo, Hong Sain Ooi, Sara Van Calcar, Ivan Adzhubei, Job Dekker, Sara J. Cooper, Hari Tammana, Valerie Maduro, Jason A. Greenbaum, Bing Ren, Sharon L. Squazzo, Jennifer C. McDowell, Chikatoshi Kai, Ivo L. Hofacker, Ian Dunham, Peter J. Bickel, Nancy Holroyd, Eduardo Eyras, Julien Lagarde, Fei Yao, Man Yu, Piero Carninci, Chia-Lin Wei, Alice C. Young, Yong Yu, Daryl J. Thomas, George Asimenos, Xiaoqin Xu, Galt P. Barber, Andrea Tanzer, Juan I. Montoya-Burgos, Sujit Dike, Nathan Day, Gregory E. Crawford, Michele Clamp, Todd Richmond, Nuria Lopez-Bigas, Vishwanath R. Iyer, Ewan Birney, Richard Humbert, Gary C. Hon, David Swarbreck, Xiaobin Guan, Sarah Wilcox, Nate Heintzman, Josep F. Abril, Elaine R. Mardis, Stefan Enroth, Charlie W.H. 
Lee, Nicholas Matthews, Benedict Paten, Robert Castelo, Michael A. Singer, Mousheng Xu, Chiou Yu Choo, Nancy F. Hansen, Elizabeth Rosenzweig, Patrick A. Navas, Jacqueline Chrast, Brett E. Johnson, Jan O. Korbel, Simon Whelan, Stephen Hartman, Ulas Karaoz, Ingileif B. Hallgrímsdóttir, David Haussler, Michael R. Brent, Jill Cheng, Gonçalo R. Abecasis, Ann S. Zweig, Sherman M. Weissman, Michael O. Dorschner, Jin Lian, Vinsensius B. Vega, Cordelia Langford, Alexandre Reymond, Mark Gerstein, Pawandeep Dhami, Ola Wallerman, Huaiyang Jiang, Lior Pachter, James Taylor, Eric A. Stone, David R. Inman, Yijun Ruan, Peter E. Newburger, Roland Green, Ari Löytynoja, Shelley Force Aldred, Alvaro Rada-Iglesias, Baishali Maskeri, Joel Rozowsky, Jorg Drenkow, Colin N. Dewey, Srinka Ghosh, Yutao Fu, Kayla E. Smith, Xavier Estivill, Donna M. Muzny, Christine P. Bird, Tim Hubbard, Jana Hertel, Kristin Missal, Neerja Karnani, Ericka M. Johnson, Nan Zhang, Zhou Zhu, Stephen C. J. Parker, Minmei Hou, Charlotte N. Henrichsen, Heather A. Hirsch, Caroline Manzano, Laura A. Liefer, Kim C. Worley, Robert Baertsch, Mark S. Guyer, Ross C. Hardison, Zheng Lian, Hiram Clawson, Leah O. Barrera, Manja Lindemeyer, James Cuff, Chunxu Qu, Jun Kawai, Jennifer Hillman-Jackson, Eric D. Green, Robert W. Blakesley, Abel Ureta-Vidal, Rhona K. Stuart, Fabio Pardi, Peter J. Sabo, Edward A. Sekinger, John S. Mattick, Ankit Malhotra, Taane G. Clark, James G. R. Gilbert, James C. Mullikin, Deyou Zheng, Robert M. Kuhn, Tae Hoon Kim, M. Geoff Rosenfeld, Kirsten Lee, Jörg Hackermüller, Oliver M. Dovey, Deanna M. Church, Kyle J. Munn, Peter F. Stadler, Phillippe Couttet, Claudia Fried, Jaafar N. Haidar, Kris A. Wetterstrand, Wing-Kin Sung, Paul G. Giresi, Jia Qian Wu, Ruth Taylor, David A. Wheeler, Zarmik Moqtaderi, Adam Siepel, Michael Snyder, Ian Holmes, Jun Liu, Olof Emanuelsson, Kevin Struhl, Saurabh Asthana, Akshay Bhinge, Adam Frankish, Yoshihide Hayashizaki, Ghia Euskirchen, Joel D. Martin, Robert S. 
Fulton, Ugrappa Nagalakshmi, Heike Fiegler, Gayle K. Clelland, Shane C. Dillon, Fidencio Neri, Elliott H. Margulies, Sean Davis, Mark Bieda, Tristan Frum, Michael S. Kuehn, Heather Trumbower, Pamela J. Thomas, Kazutoyo Osoegawa, Richard A. Gibbs, Emmanouil T. Dermitzakis, Julian L. Huppert, Richard K. Wilson, Tina Graves, Zhiping Weng, Anthony Shafer, Baoli Zhu, Christopher K. Glass, Patrick J. Boyle, Hennady P. Shulha, Maxim Koriabine, Christoph Flamm, David Vetrie, Nigel P. Carter, Patrick Ng, Peter Kraus, John A. Stamatoyannopoulos, George M. Weinstock, Tim Massingham, Jane M. Lin, Damian Keefe, Jean L. Chang, Shamil R. Sunyaev, Sergey Nikolaev, Kate R. Rosenbloom, Carine Wyss, Hua Cao, Keith D. James, Michael C. Zody, Gerard G. Bouffard, Atif Shahab, Nathan D. Trinklein, James B. Brown, Erica Sodergren, Xiaodong Zhao, Rosa Luna, Sante Gnerre, Paul Flicek, Joanna C. Fowler, Andrew D. Kern, Jakob Skou Pedersen, David C. King, Anindya Dutta, Elise A. Feingold, Richard M. Myers, Richard Sandstrom, Catherine Ucla, Thomas D. Tullius, Mikhail Nefedov, Claes Wadelius, Jennifer Harrow, Christopher M. Taylor, Xiaoling Zhang, Pieter J. de Jong, Dermitzakis, Emmanouil, and Reymond, Alexandre
- Subjects
DNA Replication; RNA, Messenger/genetics; Chromatin Immunoprecipitation; Heterozygote; RNA, Untranslated; Transcription, Genetic; Systems biology; Histones/metabolism; RNA, Untranslated/genetics; Pilot Projects; Genomics; Computational biology; Regulatory Sequences, Nucleic Acid; Biology; ENCODE; Genome; Article; DNase-Seq; Histones; Evolution, Molecular; Exons/genetics; Humans; ddc:576.5; Transcription Factors/metabolism; RNA, Messenger; Conserved Sequence; Chromatin/genetics/metabolism; Genetics; Transcription, Genetic/genetics; Multidisciplinary; Genome, Human; GENCODE; Genetic Variation; Exons; Chromatin; Genetic Variation/genetics; Regulatory Sequences, Nucleic Acid/genetics; Human genome; Conserved Sequence/genetics; Transcription Initiation Site; Functional genomics; Genome, Human/genetics; Transcription Factors; Protein Binding
- Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view about chromatin structure has emerged, including its interrelationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded novel mechanistic and evolutionary insights about the functional landscape of the human genome. Together, these studies are defining a path forward to pursue a more-comprehensive characterisation of human genome function.
- Published
- 2007
15. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome
- Author
-
James Cuff, Nancy F. Hansen, Rachel A. Harte, Jean L. Chang, Abel Ureta-Vidal, Fabio Pardi, Michele Clamp, Xiaobin Guan, Erica Sodergren, Richard A. Moore, Alice C. Young, Huaiyang Jiang, Kim C. Worley, Sante Gnerre, Adam Siepel, Eric A. Stone, Ann S. Zweig, Peter J. Bickel, Donna M. Muzny, Eric D. Green, Simon Whelan, Jacquelyn R. Idol, Ariel S. Schwartz, Robert W. Blakesley, Ross C. Hardison, Arend Sidow, Robert M. Kuhn, Donna Karolchik, Elaine R. Mardis, Ian Holmes, Daryl J. Thomas, James C. Mullikin, George Asimenos, Robert S. Fulton, Galt P. Barber, Elliott H. Margulies, Webb Miller, Ewan Birney, Jacqueline E. Schein, Lior Pachter, Ari Löytynoja, Minmei Hou, Serafim Batzoglou, Gregory M. Cooper, Eric S. Lander, Nick Goldman, Baishali Maskeri, George M. Weinstock, Hiram Clawson, Matthew A. Field, Tim Massingham, Damian Keefe, Heather Trumbower, David B. Jaffe, Tina Graves, David Haussler, Valerie Maduro, Richard A. Gibbs, Richard K. Wilson, Stylianos E. Antonarakis, Carrie A. Matthewson, W. James Kent, Morgan Park, David A. Wheeler, Kerstin Lindblad-Toh, Sergey Nikolaev, Kate R. Rosenbloom, Gerard G. Bouffard, Angie S. Hinrichs, James B. Brown, Marco A. Marra, Benedict Paten, Colin N. Dewey, Jennifer C. McDowell, Juan I. Montoya-Burgos, Pamela J. Thomas, James Taylor, Montoya Burgos, Juan Ignacio, and Antonarakis, Stylianos
- Subjects
Genetics; Mammals; ddc:616; Genome, Human; Sequence alignment; Computational biology; Biology; ENCODE; Genome; Article; Constraint (information theory); Evolution, Molecular; Consistency (database systems); Open Reading Frames; Phylogenetics; Human Genome Project; Animals; Humans; Human genome; Mammals/genetics; Sequence Alignment; Genetics (clinical); Phylogeny; Sequence (medicine)
- Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
- Published
- 2007
16. HomologMiner: looking for homologous genomic groups in whole genomes
- Author
-
Robert S. Harris, Piotr Berman, Chih Hao Hsu, and Minmei Hou
- Subjects
Statistics and Probability; Sequence analysis; Interspersed repeat; Molecular Sequence Data; Computational biology; Biology; Biochemistry; Genome; Tandem repeat; Sequence Homology, Nucleic Acid; Gene cluster; Direct repeat; Molecular Biology; Segmental duplication; Repetitive Sequences, Nucleic Acid; Genetics; Base Sequence; Chromosome Mapping; Sequence Analysis, DNA; Computer Science Applications; Computational Mathematics; Variable number tandem repeat; Computational Theory and Mathematics; Sequence Alignment; Algorithms; Software
- Abstract
Motivation: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. Results: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. Availability: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab Contact: mhou@cse.psu.edu
- Published
- 2007
17. Mulan: Multiple-sequence local alignment and visualization for studying function and evolution
- Author
-
Lisa Stubbs, Jian Ma, Ross C. Hardison, Ivan Ovcharenko, Minmei Hou, Belinda Giardine, Webb Miller, and Gabriela G. Loots
- Subjects
Sequence alignment; Computational biology; GATA3 Transcription Factor; Biology; Genome; Conserved sequence; Evolution, Molecular; Mice; Chicken Special/Resources; Sequence Homology, Nucleic Acid; Genetics; Computer Graphics; Animals; Humans; Genetics (clinical); Conserved Sequence; Phylogeny; Smith–Waterman algorithm; Binding Sites; Phylogenetic tree; Genome, Human; Fishes; Computational Biology; Genome project; Rats; DNA binding site; DNA-Binding Proteins; Trans-Activators; Human genome; Anura; Chickens; Sequence Alignment; Software
- Abstract
Multiple-sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes, and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multispecies comparisons of the GATA3 gene locus and the identification of elements that are conserved in a different way in avians than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://www.bx.psu.edu/miller_lab/.
- Published
- 2005
18. Genome sequence of the Brown Norway rat yields insights into mammalian evolution
- Author
-
Rui Chen, George M. Weinstock, Cynthia Pfannkoch, Chris P. Ponting, Mark S. Guyer, Manuel L. Gonzalez-Garay, James Taylor, Yixin Chen, Eric D. Green, Simon Cawley, Jo Gullings-Handley, Granger G. Sutton, Jose M. Duarte, Stephen M. J. Searle, Laura Elnitski, Aleksandar Milosavljevic, Alicia Hawes, Stephen C. Mockrin, Oliver Delgado, Shannon Dugan-Rocha, Christine Deramo, Dean Pasko, Marina Alexandersson, Eitan E. Winter, Robert W. Blakesley, Donna Karolchik, Huajun Wang, David Shteynberg, Diane M. Dunn, Carlos López-Otín, Abel Ureta-Vidal, Jia Qian Wu, A. Glodek, Shan Yang, Natasja Wye, Sue Daniels, Keita Geer, Arian F.A. Smit, Jozef Lazar, Pallavi Eswara, Carl Fosler, Douglas Smith, Martin Krzywinski, Uma Mudunuri, George Miner, Herbert Schulz, Angie S. Hinrichs, Manimozhiyan Arumugam, Josep F. Abril, Ursula Vitt, Andrei Volkov, Peter J. Tonellato, Von Bing Yap, Bingshan Li, Jyoti Shetty, Ian Bosdet, Evgeny M. Zdobnov, San Diego Glenn Tesler, Chris Fjell, Yi Zhang, Francis S. Collins, Serafim Batzoglou, Robert Baertsch, Laura Clarke, David Neil Cooper, Carrie Mathewson, Diana L. Kolbe, Kate R. Rosenbloom, Valerie Curwen, Bret A. Payseur, Gerard G. Bouffard, Michael R. Brent, Barbara J. Trask, Scott A. Beatson, Sourav Chatterji, Francisco Camara, Detlev Ganten, Andrew R. Jackson, Claire M. Fraser, Klaus Lindpaintner, Yue Liu, Mark Raymond Adams, Robert A. Holt, Erik Gustafson, Hiram Clawson, Michael L. Metzker, John Douglas Mcpherson, Gregory M. Cooper, Martin S. Taylor, Scott Schwartz, Hui Huang, Darryl Gietzen, Patrick Cahill, Geoffrey Okwuonu, Sandra Hines, J. Craig Venter, Jan Monti, David Steffen, Marco A. Marra, Arnold Kana, Richard D. Emes, Asim Sarosh Siddiqui, Erica Sodergren, Mario Caccamo, Jim Wingrove, Richard R. Copley, Leo Goodstadt, Francesca Chiaromonte, Davinder Virk, Kirt Martin, Colin N. Dewey, Xiang Qin, T. Dan Andrews, K. James Durbin, Michael P. McLeod, Susan Bromberg, Pavel A. Pevzner, Petra Brandt, Austin J. 
Cooney, Don Jennings, Baoli Zhu, Lynn Doucette-Stamm, Heather Trumbower, Eray Tüzün, Kristian Stevens, Norbert Hubner, Young-Ae Lee, Zhiping Gu, Harold Riethman, Xose S. Puente, Cynthia Sitter, Michael Brudno, Gerald Nyakatura, Oliver Hummel, Caleb Webber, Olivier Couronne, Kim Fechtel, W. J. Kent, Zhengdong D. Zhang, Xing Zhi Song, Matt Weirauch, Ewan Birney, Richard A. Gibbs, William C. Nierman, Anne E. Kwitek, Alexander Poliakov, Mary Barnstead, Jeanette Schmidt, Yanru Ren, Howard J. Jacob, Kateryna D. Makova, Edward M. Rubin, Susan Old, Trixie Nguyen, Arend Sidow, Nicolas Bray, Hong Mei Lee, Lisa M. D'Souza, Heinz Himmelbauer, Cara Woodwark, Peter G. Amanatides, Paul Havlak, Janet M. Young, Eduardo Eyras, Thomas Kreitler, Heming Xing, Sofiya Shatsman, Kushal Chakrabarti, Stephen Rice, Cheryl A. Evans, Kim C. Worley, Peter D. Stenson, Rachel Gill, Pieter J. de Jong, Jacqueline E. Schein, Lior Pachter, Steve Ferriera, Santa Cruz David Haussler, Ross C. Hardison, Holly Baden-Tillson, Margaret Adetobi, Krishna M. Roskin, Guillaume Bourque, Eric A. Stone, Emmanuel Mongin, Michele Clamp, Margaret Morgan, Richard Durbin, Cathy Riemer, Anton Nekrutenko, Mikita Suyama, Soo H. Chin, Kenneth J. Kalafus, Anat Caspi, Donna M. Muzny, Inna Dubchak, Shaying Zhao, Sofyia Abramzon, Michael I. Jensen-Seaman, Steven E. Scherer, Lora Lewis, M. Mar Albà, Terrence S. Furey, Peer Bork, Trevor Woodage, David A. Wheeler, Hans Lehrach, Graham R. Scott, Bin Ma, Paula E. Burch, Robert B. Weiss, Kazutoyo Osoegawa, Evan E. Eichler, Amy Egan, Webb Miller, Cheryl L. Kraft, Steven J.M. Jones, Jeffrey A. Bailey, Roderic Guigó, David Torrents, Heike Zimdahl, Adam Felsenfeld, Jane Peterson, Simon N. Twigger, Claudia Goesele, Keith Weinstock, Minmei Hou, and Zdobnov, Evgeny
- Subjects
Male; Models, Molecular; Mammalian Genetics; RNA, Untranslated; Retroelements; Sequence analysis; Gene prediction; Centromere; Genomics; Biology; Regulatory Sequences, Nucleic Acid; Genome; DNA, Mitochondrial; Polymorphism, Single Nucleotide; Rat Genome Database; Evolution, Molecular; Mice; Gene Duplication; Rats, Inbred BN; Animals; Humans; ddc:576.5; Gene; Whole genome sequencing; Genetics; Base Composition; Multidisciplinary; Sequence Analysis, DNA; Telomere; Chromosomes, Mammalian; Introns; Rats; Evolutionary biology; Mutagenesis; DNA Transposable Elements; CpG Islands; RNA Splice Sites
- Abstract
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
- Published
- 2003
19. Approximating the Spanning Star Forest Problem and Its Application to Genomic Sequence Alignment.
- Author
-
C. Thach Nguyen, Jian Shen, Minmei Hou, Li Sheng, Webb Miller, and Louxin Zhang
- Subjects
ALGORITHMS; POLYNOMIALS; APPROXIMATION theory; GRAPH theory; LINEAR programming; MATHEMATICAL models; GENETICS
- Abstract
This paper studies the algorithmic issues of the spanning star forest problem. We prove the following results: (1) There is a polynomial-time approximation scheme for planar graphs; (2) there is a polynomial-time 3/5-approximation algorithm for graphs; (3) it is NP-hard to approximate the problem within ratio 259/260 + ϵ for graphs; (4) there is a linear-time algorithm to compute the maximum star forest of a weighted tree; (5) there is a polynomial-time ½-approximation algorithm for weighted graphs. We also show how to apply this spanning star forest model to aligning multiple genomic sequences over a tandem duplication region.
- Published
- 2008
Discovery Service for Jio Institute Digital Library