Author: "Joshua C. Stein" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Joshua C. Stein"' showing total 55 results

1. Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Author: Shujun Ou, Jianing Liu, Kapeel M. Chougule, Arkarachai Fungtammasan, Arun S. Seetharam, Joshua C. Stein, Victor Llaca, Nancy Manchanda, Amanda M. Gilbert, Sharon Wei, Chen-Shan Chin, David E. Hufnagel, Sarah Pedersen, Samantha J. Snodgrass, Kevin Fengler, Margaret Woodhouse, Brian P. Walenz, Sergey Koren, Adam M. Phillippy, Brett T. Hannigan, R. Kelly Dawe, Candice N. Hirsch, Matthew B. Hufford, and Doreen Ware
Subjects: Science
Abstract: Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes in order to allocate finite resources using maize as an example.
Published: 2020

2. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing

Author: Bo Wang, Elizabeth Tseng, Michael Regulski, Tyson A Clark, Ting Hon, Yinping Jiao, Zhenyuan Lu, Andrew Olson, Joshua C. Stein, and Doreen Ware
Subjects: Science
Abstract: Zea mays is an important crop species and genetic model but uncertainties remain regarding the structure of the transcriptome. Here Wang et al. use single-molecule sequencing and size-fractionated libraries to identify novel transcripts and isoforms illustrating the complexity of maize mRNA.
Published: 2016

3. Ensembl Genomes 2020 - enabling non-vertebrate genomic research.

Author: Kevin L. Howe, Bruno Contreras-Moreira, Nishadi De Silva, Gareth Maslen, Wasiu A. Akanni, James E. Allen, Jorge Álvarez-Jarreta, Matthieu Barba, Dan M. Bolser, Lahcen Cambell, Manuel Carbajo, Marc Chakiachvili, Mikkel B. Christensen, Carla A. Cummins, Alayne Cuzick, Paul Davis, Silvie Fexova, Astrid Gall, Nancy George, Laurent Gil, Parul Gupta, Kim E. Hammond-Kosack, Erin Haskell, Sarah E. Hunt, Pankaj Jaiswal, Sophie H. Janacek, Paul J. Kersey, Nick Langridge, Uma Maheswari, Thomas Maurel, Mark D. McDowall, Benjamin Moore, Matthieu Muffato, Guy Naamati, Sushma Naithani, Andrew Olson, Irene Papatheodorou, Mateus Patricio, Michael Paulini, Helder Pedro, Emily Perry, Justin Preece, Marc Rosello, Matthew Russell, Vasily Sitnik, Daniel M. Staines, Joshua C. Stein, Marcela K. Tello-Ruiz, Stephen J. Trevanion, Martin Urban, Sharon Wei, Doreen Ware, Gary Williams, Andrew D. Yates, and Paul Flicek
Published: 2020

4. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.

Author: Paul Julian Kersey, James E. Allen, Alexis Allot, Matthieu Barba, Sanjay Boddu, Bruce J. Bolt, Denise Carvalho-Silva, Mikkel B. Christensen, Paul Davis, Christoph Grabmueller, Navin Kumar, Zicheng Liu, Thomas Maurel, Benjamin Moore, Mark D. McDowall, Uma Maheswari, Guy Naamati, Victoria Newman, Chuang Kee Ong, Michael Paulini, Helder Pedro, Emily Perry, Matthew Russell, Helen Sparrow, Electra Tapanari, Kieron R. Taylor, Alessandro Vullo, Gareth Williams, Amonida Zadissa, Andrew Olson, Joshua C. Stein, Sharon Wei, Marcela K. Tello-Ruiz, Doreen Ware, Aurelien Luciani, Simon C. Potter, Robert D. Finn, Martin Urban, Kim E. Hammond-Kosack, Dan M. Bolser, Nishadi De Silva, Kevin L. Howe, Nicholas Langridge, Gareth Maslen, Daniel Michael Staines, and Andrew D. Yates
Published: 2018

5. Gramene 2018: unifying comparative genomics and pathway resources for plant research.

Author: Marcela K. Tello-Ruiz, Sushma Naithani, Joshua C. Stein, Parul Gupta, Michael Campbell, Andrew Olson, Sharon Wei, Justin Preece, Matthew J. Geniza, Yinping Jiao, Young Koung Lee, Bo Wang, Joseph Mulvaney, Kapeel Chougule, Justin Elser, Noor Al-Bader, Sunita Kumari, James Thomason, Vivek Kumar, Daniel M. Bolser, Guy Naamati, Electra Tapanari, Nuno A. Fonseca, Laura Huerta, Haider Iqbal, Maria Keays, Alfonso Muñoz-Pomer Fuentes, Y. Amy Tang, Antonio Fabregat, Peter D'Eustachio, Joel Weiser, Lincoln D. Stein, Robert Petryszak, Irene Papatheodorou, Paul J. Kersey, Patti Lockhart, Crispin Taylor, Pankaj Jaiswal, and Doreen Ware
Published: 2018

6. Improved maize reference genome with single-molecule technologies.

Author: Yinping Jiao, Paul Peluso, Jinghua Shi, Tiffany Y. Liang, Michelle C. Stitzer, Bo Wang, Michael Campbell, Joshua C. Stein, Xuehong Wei, Chen-Shan Chin, Katherine E. Guill, Michael Regulski, Sunita Kumari, Andrew Olson, Jonathan Gent, Kevin L. Schneider, Thomas K. Wolfgruber, Michael R. May, Nathan M. Springer, Eric Antoniou, W. Richard McCombie, Gernot G. Presting, Michael D. McMullen, Jeffrey Ross-Ibarra, R. Kelly Dawe, Alex Hastie, David R. Rank, and Doreen Ware
Published: 2017

7. Ensembl Genomes 2016: more genomes, more complexity.

Author: Paul Julian Kersey, James E. Allen, Irina M. Armean, Sanjay Boddu, Bruce J. Bolt, Denise Carvalho-Silva, Mikkel B. Christensen, Paul Davis, Lee J. Falin, Christoph Grabmueller, Jay C. Humphrey, Arnaud Kerhornou, Julia Khobova, Naveen K. Aranganathan, Nicholas Langridge, Ernesto Lowy-Gallego, Mark D. McDowall, Uma Maheswari, Michael Nuhn, Chuang Kee Ong, Bert Overduin, Michael Paulini, Helder Pedro, Emily Perry, Giulietta Spudich, Electra Tapanari, Brandon Walts, Gareth Williams, Marcela K. Tello-Ruiz, Joshua C. Stein, Sharon Wei, Doreen Ware, Daniel M. Bolser, Kevin L. Howe, Eugene Kulesha, Daniel Lawson, Gareth Maslen, and Daniel M. Staines
Published: 2016

8. Gramene 2016: comparative plant genomics and pathway resources.

Author: Marcela K. Tello-Ruiz, Joshua C. Stein, Sharon Wei, Justin Preece, Andrew Olson, Sushma Naithani, Vindhya Amarasinghe, Palitha Dharmawardhana, Yinping Jiao, Joseph Mulvaney, Sunita Kumari, Kapeel Chougule, Justin Elser, Bo Wang, James Thomason, Daniel M. Bolser, Arnaud Kerhornou, Brandon Walts, Nuno A. Fonseca, Laura Huerta, Maria Keays, Y. Amy Tang, Helen E. Parkinson, Antonio Fabregat, Sheldon J. McKay, Joel Weiser, Peter D'Eustachio, Lincoln Stein, Robert Petryszak, Paul J. Kersey, Pankaj Jaiswal, and Doreen Ware
Published: 2016

9. Ensembl Genomes 2013: scaling up access to genome-wide data.

Author: Paul Julian Kersey, James E. Allen, Mikkel B. Christensen, Paul Davis, Lee J. Falin, Christoph Grabmueller, Daniel Seth Toney Hughes, Jay C. Humphrey, Arnaud Kerhornou, Julia Khobova, Nicholas Langridge, Mark D. McDowall, Uma Maheswari, Gareth Maslen, Michael Nuhn, Chuang Kee Ong, Michael Paulini, Helder Pedro, Iliana Toneva, Mary Ann Tuli, Brandon Walts, Gareth Williams, Derek Wilson, Ken Youens-Clark, Marcela K. Monaco, Joshua C. Stein, Xuehong Wei, Doreen Ware, Daniel M. Bolser, Kevin Lee Howe, Eugene Kulesha, Daniel Lawson, and Daniel Michael Staines
Published: 2014

10. Gramene 2013: comparative plant genomics resources.

Author: Marcela K. Monaco, Joshua C. Stein, Sushma Naithani, Sharon Wei, Palitha Dharmawardhana, Sunita Kumari, Vindhya Amarasinghe, Ken Youens-Clark, James Thomason, Justin Preece, Shiran Pasternak, Andrew Olson, Yinping Jiao, Zhenyuan Lu, Daniel M. Bolser, Arnaud Kerhornou, Daniel M. Staines, Brandon Walts, Guanming Wu, Peter D'Eustachio, Robin Haw, David Croft, Paul J. Kersey, Lincoln Stein, Pankaj Jaiswal, and Doreen Ware
Published: 2014

11. Gramene database in 2010: updates and extensions.

Author: Ken Youens-Clark, Edward S. Buckler, Terry M. Casstevens, Charles Chen, Genevieve DeClerck, Paul S. Derwent, Palitha Dharmawardhana, Pankaj Jaiswal, Paul J. Kersey, A. S. Karthikeyan, Jerry Lu, Susan McCouch, Liya Ren, William Spooner, Joshua C. Stein, Jim Thomason, Sharon Wei, and Doreen Ware
Published: 2011

12. Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Author: R. Kelly Dawe, Jianing Liu, Kapeel Chougule, Adam M. Phillippy, Chen-Shan Chin, Sharon Wei, Brian P. Walenz, Sergey Koren, Samantha J. Snodgrass, Brett T. Hannigan, Joshua C. Stein, Arkarachai Fungtammasan, Nancy Manchanda, Arun S. Seetharam, Margaret R. Woodhouse, Kevin Fengler, Sarah Pedersen, Candice N. Hirsch, Shujun Ou, Matthew B. Hufford, Doreen Ware, Victor Llaca, Amanda M. Gilbert, and David E. Hufnagel
Subjects: 0301 basic medicine, 0106 biological sciences, Transposable element, Agricultural genetics, Computer science, Heterochromatin, Science, General Physics and Astronomy, Sequence assembly, Computational biology, Biology, 01 natural sciences, Genome, Zea mays, General Biochemistry, Genetics and Molecular Biology, Article, 03 medical and health sciences, Centromere, Genome assembly algorithms, Resource allocation (computer), Inbreeding, lcsh:Science, Gene, 030304 developmental biology, Sequence (medicine), Repetitive Sequences, Nucleic Acid, 2. Zero hunger, 0303 health sciences, Multidisciplinary, Base Sequence, High-Throughput Nucleotide Sequencing, General Chemistry, 030104 developmental biology, DNA Transposable Elements, lcsh:Q, Line (text file), Limited resources, Genome, Plant, 010606 plant biology & botany
Abstract: Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature. Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes in order to allocate finite resources using maize as an example.
Published: 2020

13. Ensembl Genomes 2020—enabling non-vertebrate genomic research

Author: Bruno Contreras-Moreira, Laurent Gil, Uma Maheswari, Erin Haskell, Paul Flicek, Kim E. Hammond-Kosack, Stephen J. Trevanion, Matthieu Barba, Sushma Naithani, Matthew Russell, Mark D. McDowall, Nancy George, Andrew D. Yates, Emily Perry, Vasily Sitnik, Dan Bolser, Michael Paulini, Paul J. Kersey, Pankaj Jaiswal, Mikkel B. Christensen, Helder Pedro, Gareth Maslen, Lahcen Cambell, Sophie Helen Janacek, Daniel M. Staines, Gary Williams, Matthieu Muffato, Carla Cummins, Wasiu Akanni, Marc Rosello, James E. Allen, Irene Papatheodorou, Astrid Gall, Nick Langridge, Marc Chakiachvili, Joshua C. Stein, Mateus Patricio, Silvie Fexova, Nishadi De Silva, Sarah E. Hunt, Benjamin Moore, Martin Urban, Justin Preece, Marcela K. Tello-Ruiz, Guy Naamati, Andrew Olson, Parul Gupta, Thomas Maurel, Jorge Alvarez-Jarreta, Doreen Ware, Paul Davis, Manuel Carbajo, Alayne Cuzick, Kevin L. Howe, and Sharon Wei
Subjects: 0106 biological sciences, Tree of life, Genomics, Context (language use), Computational biology, Biology, 01 natural sciences, Genome, User-Computer Interface, 03 medical and health sciences, Resource (project management), Reference Values, Ensembl Genomes, Databases, Genetic, Genetics, Database Issue, Animals, Ensembl, Caenorhabditis elegans, 030304 developmental biology, Internet, 0303 health sciences, Computational Biology, Genetic Variation, Molecular Sequence Annotation, Plants, Phenotype, Genome, Fungal, Algorithms, Genome, Bacterial, Genome, Plant, Software, 010606 plant biology & botany
Abstract: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Published: 2019

14. Gene disruption by structural mutations drives selection in US rice breeding over the last century

Author: Anna M. McClung, Ramey C Youngblood, Daniel G. Peterson, Walid Korani, Justin N. Vaughn, Sheron A. Simpson, Jane Grimwood, Doreen Ware, Jeremy D. Edwards, Joshua C. Stein, and Brian E. Scheffler
Subjects: Transposable element, Oryza sativa, biology, Gene mapping, Evolutionary biology, food and beverages, Oryza glaberrima, biology.organism_classification, Indel, Gene, Genome, Selection (genetic algorithm)
Abstract: The genetic basis of general plant vigor is of major interest to food producers, yet the trait is recalcitrant to genetic mapping because of the number of loci involved, their small effects, and linkage. Observations of heterosis in many crops suggest that recessive, malfunctioning versions of genes are a major cause of poor performance, yet we have little information on the mutational spectrum underlying these disruptions. To address this question, we generated a long-read assembly of a tropical japonica rice (Oryza sativa) variety, Carolina Gold, which allowed us to identify structural mutations (>50 bp) and orient them with respect to their ancestral state using the outgroup, Oryza glaberrima. Supporting prior work, we find substantial genome expansion in the sativa branch. While transposable elements (TEs) account for the largest share of size variation, the majority of events are not directly TE-mediated. Tandem duplications are the most common source of insertions and are highly enriched among 50-200 bp mutations. To explore the relative impact of various mutational classes on crop fitness, we then track these structural events over the last century of US rice improvement using 101 resequenced varieties. Within this material, a pattern of temporary hybridization between medium and long-grain varieties was followed by recent divergence. During this long-term selection, structural mutations that impact gene exons have been removed at a greater rate than intronic indels and single-nucleotide mutations. These results support the use of ab initio estimates of mutational burden, based on structural data, as an orthogonal predictor in genomic selection. Significance Statement: Some crop varieties have superior performance across years and environments. In hybrids, harmful mutations in one parent are masked by the ancestral alleles in the other parent, resulting in increased vigor. Unfortunately, these mutations are very difficult to identify precisely because, individually, they only have a small effect. In this study, we use long-read sequencing to characterize the entire mutational spectrum between two rice varieties. We then track these mutations through the last century of rice breeding. We show that large structural mutations in exons are selected against at a greater rate than any other mutational class. These findings illuminate the nature of deleterious alleles and will guide attempts to predict variety vigor based solely on genomic information.
Published: 2020

15. Detailed analysis of a contiguous 22-Mb region of the maize genome.

Author: Fusheng Wei, Joshua C Stein, Chengzhi Liang, Jianwei Zhang, Robert S Fulton, Regina S Baucom, Emanuele De Paoli, Shiguo Zhou, Lixing Yang, Yujun Han, Shiran Pasternak, Apurva Narechania, Lifang Zhang, Cheng-Ting Yeh, Kai Ying, Dawn H Nagel, Kristi Collura, David Kudrna, Jennifer Currie, Jinke Lin, Hyeran Kim, Angelina Angelova, Gabriel Scara, Marina Wissotski, Wolfgang Golser, Laura Courtney, Scott Kruchowski, Tina A Graves, Susan M Rock, Stephanie Adams, Lucinda A Fulton, Catrina Fronick, William Courtney, Melissa Kramer, Lori Spiegel, Lydia Nascimento, Ananth Kalyanaraman, Cristian Chaparro, Jean-Marc Deragon, Phillip San Miguel, Ning Jiang, Susan R Wessler, Pamela J Green, Yeisoo Yu, David C Schwartz, Blake C Meyers, Jeffrey L Bennetzen, Robert A Martienssen, W Richard McCombie, Srinivas Aluru, Sandra W Clifton, Patrick S Schnable, Doreen Ware, Richard K Wilson, and Rod A Wing
Subjects: Genetics, QH426-470
Abstract: Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, performed on approximately 1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.
Published: 2009

16. Gramene database: Navigating plant comparative genomics resources

Author: Paul J. Kersey, Yinping Jiao, Robert Petryszak, Joshua C. Stein, Lincoln D. Stein, Doreen Ware, Antonio Fabregat, Sunita Kumari, Kapeel Chougule, Justin Preece, Laura Huerta, Peter D'Eustachio, Sharon Wei, Parul Gupta, Sushma Naithani, Andrew Olson, Pankaj Jaiswal, Maria Keays, Joel Weiser, Young Koung Lee, Marcela K. Tello-Ruiz, and Joseph Mulvaney
Subjects: 0301 basic medicine, Systems biology, Genomic data, Plant Science, Biology, computer.software_genre, Biochemistry, Genome, Article, 03 medical and health sciences, Annotation, lcsh:Botany, Genetics, 2. Zero hunger, Comparative genomics, Phylogenetic tree, Database, food and beverages, Cell Biology, 15. Life on land, Pathway analysis, Data science, lcsh:QK1-989, 030104 developmental biology, Expression data, computer, Developmental Biology
Abstract: Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.
Published: 2016

17. Publisher Correction: Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza

Author: Ann Danowitz, Shu-Min Kao, Manyuan Long, Thomas Wicker, Andrea R. Gschwend, Chengjun Zhang, Jayson Talag, Dave Flowers, Railson Schreinert dos Santos, Derrick J. Zwickl, Bin Han, Jetty S.S. Ammiraju, Seunghee Lee, Claude Becker, Muhua Wang, Scott A. Jackson, Qi Feng, Ramil Mauleon, Kshirod K. Jena, Luis F. Rivera, Moaine El Baidouri, Jeremy Schmutz, Eric Lasserre, Kevin G. Nyberg, Jhih wun Zeng, Robert J Henry, Jose Luis Goicoechea, Carlos A. Machado, Daniel da Rosa Farias, Michael J. Sanderson, Kapeel Chougule, Jianwei Zhang, Nori Kurata, Yi Liao, Julie Jacquemin, Yeisoo Yu, Christos Noutsos, Chuanzhu Fan, Joshua C. Stein, Richard Cooke, Rod A. Wing, Marie-Christine Carpentier, Aiko Iwata, Dongying Gao, Carlos E.M. Londono, Nickolai Alexandrov, Olivier Panaud, Kenneth L. McNally, Xiang Song, Li Zhang, Cheng chieh Wu, Antonio Costa de Oliveira, Dario Copetti, Andrea Zuccolo, Fu Jin Wei, Mingsheng Chen, Sharon Wei, Dave Kudrna, Yue-Ie C. Hsing, Doreen Ware, Jun Wang, Detlef Weigel, Paul L. Sanchez, Luciano Carlos da Maia, and Qiang Zhao
Subjects: 0301 basic medicine, Plant genetics, Genomics, Biology, Oryza, biology.organism_classification, Genome, 03 medical and health sciences, 030104 developmental biology, Genus, Evolutionary biology, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Genetics, Domestication
Abstract: This article was not made open access when initially published online, which was corrected before print publication. In addition, ORCID links were missing for 12 authors and have been added to the HTML and PDF versions of the article.
Published: 2018

18. Gene disruption by structural mutations drives selection in US rice breeding over the last century

Author: Jane Grimwood, Walid Korani, Anna M. McClung, Sheron A. Simpson, Daniel G. Peterson, Joshua C. Stein, Jeremy D. Edwards, Brian E. Scheffler, Kapeel Chougule, Justin N. Vaughn, Ramey C Youngblood, and Doreen Ware
Subjects: 0106 biological sciences, Cancer Research, Heredity, DNA Repair, Single Nucleotide Polymorphisms, QH426-470, 01 natural sciences, Genome, Database and Informatics Methods, INDEL Mutation, Genetics (clinical), 0303 health sciences, biology, Eukaryota, food and beverages, Genomics, Plants, Genetic Mapping, Experimental Organism Systems, Seeds, Sequence Analysis, Genome, Plant, Research Article, Crops, Agricultural, Bioinformatics, Environment, Oryza glaberrima, Genes, Plant, Research and Analysis Methods, Oryza, 03 medical and health sciences, Gene mapping, Plant and Algal Models, Genetics, Grasses, Selection, Genetic, Indel, Molecular Biology, Gene, Alleles, Ecology, Evolution, Behavior and Systematics, Selection (genetic algorithm), 030304 developmental biology, Oryza sativa, Organisms, Biology and Life Sciences, biology.organism_classification, Plant Breeding, Haplotypes, Genetic Loci, Evolutionary biology, Mutation, DNA Transposable Elements, Animal Studies, Hybridization, Genetic, Gene-Environment Interaction, Rice, Sequence Alignment, 010606 plant biology & botany
Abstract: The genetic basis of general plant vigor is of major interest to food producers, yet the trait is recalcitrant to genetic mapping because of the number of loci involved, their small effects, and linkage. Observations of heterosis in many crops suggest that recessive, malfunctioning versions of genes are a major cause of poor performance, yet we have little information on the mutational spectrum underlying these disruptions. To address this question, we generated a long-read assembly of a tropical japonica rice (Oryza sativa) variety, Carolina Gold, which allowed us to identify structural mutations (>50 bp) and orient them with respect to their ancestral state using the outgroup, Oryza glaberrima. Supporting prior work, we find substantial genome expansion in the sativa branch. While transposable elements (TEs) account for the largest share of size variation, the majority of events are not directly TE-mediated. Tandem duplications are the most common source of insertions and are highly enriched among 50-200 bp mutations. To explore the relative impact of various mutational classes on crop fitness, we then track these structural events over the last century of US rice improvement using 101 resequenced varieties. Within this material, a pattern of temporary hybridization between medium and long-grain varieties was followed by recent divergence. During this long-term selection, structural mutations that impact gene exons have been removed at a greater rate than intronic indels and single-nucleotide mutations. These results support the use of ab initio estimates of mutational burden, based on structural data, as an orthogonal predictor in genomic selection. Author summary: Some crop varieties have superior performance across years and environments. In hybrids, harmful mutations in one parent are masked by the ancestral alleles in the other parent, resulting in increased vigor. Unfortunately, these mutations are very difficult to identify precisely because, individually, they only have a small effect. In this study, we use long-read sequencing to characterize the entire mutational spectrum between two rice varieties. We then track these mutations through the last century of rice breeding. We show that large structural mutations in exons are selected against at a greater rate than any other mutational class. These findings illuminate the nature of deleterious alleles and will guide attempts to predict variety vigor based solely on genomic information.
Published: 2021

19. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing

Author: Elizabeth Tseng, Andrew Olson, Joshua C. Stein, Ting Hon, Yinping Jiao, Zhenyuan Lu, Bo Wang, Doreen Ware, Michael Regulski, and Tyson A. Clark
Subjects: 0301 basic medicine, Science, General Physics and Astronomy, Biology, Genome, Polymerase Chain Reaction, Zea mays, General Biochemistry, Genetics and Molecular Biology, DNA sequencing, Article, Transcriptome, 03 medical and health sciences, Gene Expression Regulation, Plant, Genetic model, Gene, Plant Proteins, 2. Zero hunger, Regulation of gene expression, Whole genome sequencing, Genetics, Multidisciplinary, Sequence Analysis, RNA, Gene Expression Profiling, fungi, food and beverages, General Chemistry, Gene expression profiling, 030104 developmental biology
Abstract: Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from 6 tissues capturing ∼70% of the genes annotated in maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought. Zea mays is an important crop species and genetic model but uncertainties remain regarding the structure of the transcriptome. Here Wang et al. use single-molecule sequencing and size-fractionated libraries to identify novel transcripts and isoforms illustrating the complexity of maize mRNA.
Published: 2016

20. Improved RNA‐seq Workflows Using CyVerse Cyberinfrastructure

Author: Doreen Ware, Robert R. Klein, Kapeel Chougule, Joshua C. Stein, Xiaofei Wang, Upendra K. Devisetty, and Liya Wang
Subjects: 0301 basic medicine, Protocol (science), Sequence Analysis, RNA, Computer science, business.industry, Gene Expression Profiling, Molecular Sequence Annotation, General Medicine, computer.software_genre, Visualization, 03 medical and health sciences, ComputingMethodologies_PATTERNRECOGNITION, 030104 developmental biology, Workflow, Cyberinfrastructure, Software, Scalability, RNA, Messenger, Data mining, business, computer, Sorghum, Reference genome, Graphical user interface
Abstract: RNA-seq is a vital method for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols use sequencing reads of length 50 to 150 nucleotides for alignment to the reference genome and assembly of transcripts. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use. Tuxedo, a popular alignment-based protocol for RNA-seq analysis, has been updated with HISAT2, StringTie, StringTie-merge, and Ballgown, and the updated protocol outperforms its predecessor. Similarly, new pseudo-alignment-based protocols like Kallisto and Sleuth reduce runtime and improve performance. However, these tools are challenging for researchers lacking command-line experience. Here, we describe two new RNA-seq analysis protocols, in which all tools are deployed on CyVerse Cyberinfrastructure with user-friendly graphical user interfaces, and validate their performance using plant RNA-seq data. (c) 2018 by John Wiley & Sons, Inc.
Published: 2018

21. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza

Author: Seunghee Lee, Jetty S.S. Ammiraju, Railson Schreinert dos Santos, Ann Danowitz, Shu-Min Kao, Li Zhang, Chengjun Zhang, Cheng chieh Wu, Dongying Gao, Carlos E.M. Londono, Scott A. Jackson, Yi Liao, Mingsheng Chen, Chuanzhu Fan, Andrea Zuccolo, Muhua Wang, Christos Noutsos, Rod A. Wing, Manyuan Long, Robert J Henry, Marie-Christine Carpentier, Kshirod K. Jena, Aiko Iwata, Yue-Ie C. Hsing, Jose Luis Goicoechea, Bin Han, Richard Cooke, Joshua C. Stein, Luis F. Rivera, Thomas Wicker, Dario Copetti, Fu Jin Wei, Claude Becker, Paul L. Sanchez, Qi Feng, Andrea R. Gschwend, Ramil Mauleon, Carlos A. Machado, Derrick J. Zwickl, Daniel da Rosa Farias, Jayson Talag, Dave Flowers, Eric Lasserre, Nickolai Alexandrov, Yeisoo Yu, Moaine El Baidouri, Luciano Carlos da Maia, Jeremy Schmutz, Dave Kudrna, Olivier Panaud, Kenneth L. McNally, Xiang Song, Kevin G. Nyberg, Nori Kurata, Qiang Zhao, Kapeel Chougule, Jhih wun Zeng, Antonio Costa de Oliveira, Jianwei Zhang, Doreen Ware, Jun Wang, Detlef Weigel, Sharon Wei, Julie Jacquemin, and Michael J. Sanderson
Affiliations: Ecology and Evolutionary Biology [Tucson] (EEB), University of Arizona, Dipartimento Sci Agr & Ambientali, Università degli Studi di Udine - University of Udine [Italie], Wuhan University [China], University of Georgia [USA], Chinese Academy of Agricultural Mechanization Sciences (CCCME), Laboratoire Génome et développement des plantes (LGDP), Université de Perpignan Via Domitia (UPVD)-Centre National de la Recherche Scientifique (CNRS), Gregor Mendel Institute of Molecular Plant Biology (GMI), Austrian Academy of Sciences (OeAW), Tsukuba University of Technology, Graduate School of Comprehensive Human Sciences, Université de Tsukuba = University of Tsukuba, Sismologie (IPGS) (IPGS-Sismologie), Institut de physique du globe de Strasbourg (IPGS), Université de Strasbourg (UNISTRA)-Institut national des sciences de l'Univers (INSU - CNRS)-Centre National de la Recherche Scientifique (CNRS)-Université de Strasbourg (UNISTRA)-Institut national des sciences de l'Univers (INSU - CNRS)-Centre National de la Recherche Scientifique (CNRS), United States Department of Energy, Institute of Plant Biology, University of Zurich, Plant Genomics and Breeding Center, Department of Biology, and Cold Spring Harbor Laboratory
Subjects: 0301 basic medicine, Crops, Agricultural, Evolution, Genetic Speciation, [SDV]Life Sciences [q-bio], Plant genetics, Introgression, Genomics, Crops, Oryza, Genome, Medical and Health Sciences, Evolution, Molecular, Domestication, 03 medical and health sciences, Molecular evolution, Phylogenetics, Genetics, ComputingMilieux_MISCELLANEOUS, Phylogeny, Conserved Sequence, 2. Zero hunger, Agricultural, biology, food and beverages, Molecular, Genetic Variation, Plant, 15. Life on land, Biological Sciences, biology.organism_classification, 030104 developmental biology, Evolutionary biology, Genome, Plant, Developmental Biology
Abstract: The genus Oryza is a model system for the study of molecular evolution over time scales ranging from a few thousand to 15 million years. Using 13 reference genomes spanning the Oryza species tree, we show that despite few large-scale chromosomal rearrangements rapid species diversification is mirrored by lineage-specific emergence and turnover of many novel elements, including transposons, and potential new coding and noncoding genes. Our study resolves controversial areas of the Oryza phylogeny, showing a complex history of introgression among different chromosomes in the young 'AA' subclade containing the two domesticated species. This study highlights the prevalence of functionally coupled disease resistance genes and identifies many new haplotypes of potential use for future crop protection. Finally, this study marks a milestone in modern rice research with the release of a complete long-read assembly of IR 8 'Miracle Rice', which relieved famine and drove the Green Revolution in Asia 50 years ago.
Published: 2018

22. The maize W22 genome provides a foundation for functional genomics and transposon biology

Author: Kokulapalan Wimalanathan, R. Kelly Dawe, Erik Vollbrecht, Karen E. Koch, Toru Kudo, Sharon Wei, Daniel L. Vera, Ethalinda K. S. Cannon, Qing Li, Paul S. Chomet, Michael S. Campbell, A. Mark Settles, Yinping Jiao, Julia Vrebalov, Christine M. Gault, Dustin Mayfield-Jones, Chunguang Du, Fang Bai, Omer Barad, Doreen Ware, Masaharu Suzuki, Hank W. Bass, Robert Bukowski, Georg Jander, Ruth Davenport, Kevin R. Ahern, John L. Portwood, Doron Shem-Tov, Fei Lu, Wenwei Xiong, Jinghua Shi, Donald R. McCarty, Tobias G. Köllner, Gil Ben-Zvi, Carson M. Andorf, Gil Ronen, Wenbin Mei, Limei He Du, Katherine A. Easterling, Nathan M. Springer, Jaclyn M. Noshay, Hugo K. Dooner, Sarah N. Anderson, Thomas P. Brutnell, Ilya Soifer, Jiahn-Chou Guan, Michelle C. Stitzer, Margaret R. Woodhouse, Charles T. Hunter, W. Brad Barbazuk, Edward S. Buckler, Joshua C. Stein, Kobi Baruch, and Guy Kol
Subjects: 0301 basic medicine, Transposable element, DNA Copy Number Variations, DNA, Plant, Genomics, Computational biology, Biology, Genes, Plant, Genome, Zea mays, DNA sequencing, Chromosomes, Plant, 03 medical and health sciences, Open Reading Frames, Genetics, Copy-number variation, Whole genome sequencing, Sequence Analysis, DNA, DNA Methylation, Chromatin, 030104 developmental biology, DNA Transposable Elements, Functional genomics, Genome, Plant, Reference genome
Abstract: The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.
Published: 2017

23. Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes

Author: Carson M. Andorf, Kevin L. Childs, Jikai Lei, Doreen Ware, Carson Holt, Shin-Han Shiu, Michael S. Campbell, Andrew Olson, Mei Yee Law, Joshua C. Stein, Nicholas Panchy, Mark Yandell, Ning Jiang, Yanni Sun, Carolyn J. Lawrence, and Dian Jiao
Subjects: Quality Control, RNA, Untranslated, Physiology, Pseudogene, Plant Science, Computational biology, Vertebrate and Genome Annotation Project, Biology, Genes, Plant, Zea mays, Genome, Annotation, Databases, Genetic, Genetics, Gene, health care economics and organizations, Whole genome sequencing, Models, Genetic, Molecular Sequence Annotation, Exons, Gene Annotation, Genome project, Breakthrough Technologies, humanities, Introns, Genome, Plant, Pseudogenes
Abstract: The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes.
Published: 2014

24. Gramene 2018: unifying comparative genomics and pathway resources for plant research

Author: Lincoln D. Stein, Pankaj Jaiswal, Dan Bolser, Parul Gupta, Guy Naamati, Sunita Kumari, Crispin B. Taylor, Kapeel Chougule, Antonio Fabregat, Nuno A. Fonseca, Alfonso Muñoz-Pomer Fuentes, Joseph Mulvaney, James Thomason, Noor Al-Bader, Yinping Jiao, Robert Petryszak, Electra Tapanari, Y. Amy Tang, Paul J. Kersey, Justin Preece, Matthew Geniza, Haider Iqbal, Patti Lockhart, Bo Wang, Young Koung Lee, Doreen Ware, Sushma Naithani, Peter D'Eustachio, Laura Huerta, Vivek Kumar, Sharon Wei, Justin Elser, Marcela K. Tello-Ruiz, Irene Papatheodorou, Andrew Olson, Maria Keays, Joel Weiser, Michael S. Campbell, and Joshua C. Stein
Subjects: 0301 basic medicine, Genetic Research, Knowledge Bases, Genomics, Genome browser, Computational biology, Paralogous Gene, Biology, Bioinformatics, Genome, Epigenesis, Genetic, 03 medical and health sciences, User-Computer Interface, Gene Expression Regulation, Plant, Databases, Genetic, Genetics, Ensembl, Database Issue, Synteny, 2. Zero hunger, Comparative genomics, Genetic Variation, Molecular Sequence Annotation, Gene Annotation, 15. Life on land, Plants, 030104 developmental biology, Gene Ontology, Genome, Plant, Metabolic Networks and Pathways, Software
Abstract: Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene–gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.
Published: 2017

25. Improved maize reference genome with single-molecule technologies

Author: Xuehong Wei, Tiffany Y. Liang, Nathan M. Springer, Yinping Jiao, Doreen Ware, Thomas K. Wolfgruber, Jonathan I. Gent, Michael D. McMullen, Andrew Olson, Michelle C. Stitzer, Joshua C. Stein, Bo Wang, Paul Peluso, Chen-Shan Chin, Jinghua Shi, Eric Antoniou, Jeffrey Ross-Ibarra, Kevin L. Schneider, W. Richard McCombie, R. Kelly Dawe, Michael Regulski, Katherine E. Guill, Alex Hastie, Sunita Kumari, Gernot G. Presting, Michael R. May, Michael S. Campbell, and David R. Rank
Subjects: 0301 basic medicine, Optics and Photonics, Messenger, Genome informatics, Genome, Contig Mapping, Phylogeny, 2. Zero hunger, Genetics, Multidisciplinary, Contig, High-Throughput Nucleotide Sequencing, Reference Standards, Single Molecule Imaging, DNA, Intergenic, Genome, Plant, Crops, Agricultural, Transposable element, General Science & Technology, 1.1 Normal biological development and functioning, Centromere, Crops, Computational biology, Biology, Genes, Plant, Zea mays, Chromosomes, Plant, Article, Chromosomes, 03 medical and health sciences, Gene density, RNA, Messenger, Gene, Sorghum, Agricultural, Intergenic, Human Genome, Molecular Sequence Annotation, Gene Annotation, Plant, DNA, 030104 developmental biology, Genes, DNA Transposable Elements, RNA, Plant sciences, Reference genome
Abstract: An improved reference genome for maize, using single-molecule sequencing and high-resolution optical mapping, enables characterization of structural variation and repetitive regions, and identifies lineage expansions of transposable elements that are unique to maize. A better map of the maize genome: The maize genome was initially reported in 2009 but with some accuracy limitations. Doreen Ware and colleagues report a new reference genome for maize using single-molecule sequencing and high-resolution optical mapping. The technique shows improvements in the gene space including resolution of gaps and misassemblies and correction of order and orientation of genes. The authors characterize structural variation and repetitive regions, and identify transposable element lineage expansions unique to maize. Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes. Supplementary information: The online version of this article (doi:10.1038/nature22971) contains supplementary material, which is available to authorized users.
Published: 2017

26. Disentangling Methodological and Biological Sources of Gene Tree Discordance on Oryza (Poaceae) Chromosome 3

Author: Derrick J. Zwickl, Joshua C. Stein, Doreen Ware, Michael J. Sanderson, and Rod A. Wing
Subjects: Genetics, Introgression, Inference, Oryza, Sequence alignment, Context (language use), Biology, Classification, Genome, Chromosomes, Plant, Tree (data structure), Phylogenetics, Evolutionary biology, Phylogenomics, Genome, Plant, Phylogeny, Ecology, Evolution, Behavior and Systematics
Abstract: We describe new methods for characterizing gene tree discordance in phylogenomic data sets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Using an exceptionally complete set of genome sequences for the short arm of chromosome 3 in Oryza (rice) species, we applied these methods to identify the causes and consequences of differing patterns of discordance in the sets of gene trees inferred using a panel of 20 distinct analysis pipelines. We found that discordance patterns were strongly affected by aspects of data selection, alignment, and alignment masking. Unusual patterns of discordance evident when using certain pipelines were reduced or eliminated by using alternative pipelines, suggesting that they were the product of methodological biases rather than evolutionary processes. In some cases, once such biases were eliminated, evolutionary processes such as introgression could be implicated. Additionally, patterns of gene tree discordance had significant downstream impacts on species tree inference. For example, inference from supermatrices was positively misleading when pipelines that led to biased gene trees were used. Several results may generalize to other data sets: we found that gene tree and species tree inference gave more reasonable results when intron sequence was included during sequence alignment and tree inference, the alignment software PRANK was used, and detectable "block-shift" alignment artifacts were removed. We discuss our findings in the context of well-established relationships in Oryza and continuing controversies regarding the domestication history of O. sativa. (gene trees; multilocus data; Oryza; phylogenomics; phylogeny reconstruction; species trees.)
Published: 2014

27. Improved maize reference genome with single molecule technologies

Author: Jinghua Shi, Xuehong Wei, Michael S. Campbell, Eric Antoniou, Katherine E. Guill, Alex Hastie, Michael D. McMullen, Michelle C. Stitzer, Bo Wang, Jeffrey Ross-Ibarra, Nathan M. Springer, Thomas K. Wolfgruber, Gernot G. Presting, Sunita Kumari, Tiffany Y. Liang, Jonathan I. Gent, Chen-Shan Chin, Kelly Dawe, Yinping Jiao, Paul Peluso, Doreen Ware, Michael Regulski, Andrew Olson, Richard W. McCombie, Joshua C. Stein, David R. Rank, Michael R. May, and Kevin L. Schneider
Subjects: 0106 biological sciences, Transposable element, 0303 health sciences, Contig, Genomics, Computational biology, Gene Annotation, Biology, 01 natural sciences, Genome, 03 medical and health sciences, Gene density, 030304 developmental biology, 010606 plant biology & botany, Reference genome, Single molecule real time sequencing
Abstract: Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate elucidation of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here, we report the assembly and annotation of maize, a genetic and agricultural model species, using Single Molecule Real-Time (SMRT) sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and significant improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed over 130,000 intact transposable elements (TEs), allowing us to identify TE lineage expansions unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by SMRT sequencing. In addition, comparative optical mapping of two other inbreds revealed a prevalence of deletions in the low gene density region and maize lineage-specific genes.
Published: 2016
Full Text: View/download PDF

28. MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of Plant Genome Annotations

Author: Mei Yee Law, Gaurav D. Moghe, Carolyn J. Lawrence, Yanni Sun, David E. Hufnagel, Mark Yandell, Kevin L. Childs, Rujira Achawanantakun, Joshua C. Stein, Shin-Han Shiu, Dian Jiao, Doreen Ware, Ning Jiang, Michael S. Campbell, Jikai Lei, and Carson Holt
Subjects: Physiology, Pseudogene, Arabidopsis, Plant Science, Computational biology, Genes, Plant, Zea mays, Genome, Article, Annotation, Genetics, Arabidopsis thaliana, Repetitive Sequences, Nucleic Acid, biology, Computational Biology, Reproducibility of Results, The Arabidopsis Information Resource, Molecular Sequence Annotation, Exons, Genome project, biology.organism_classification, Non-coding RNA, Alternative Splicing, Genome, Plant, Pseudogenes, Software
Abstract: We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds.
Published: 2013

29. Gramene 2013: comparative plant genomics resources

Author: Sunita Kumari, Vindhya Amarasinghe, Pankaj Jaiswal, Andrew Olson, Brandon Walts, Sushma Naithani, Paul J. Kersey, Joshua C. Stein, Daniel M. Staines, Marcela K. Monaco, Guanming Wu, Arnaud Kerhornou, Justin Preece, Palitha Dharmawardhana, Peter D'Eustachio, James Thomason, Dan Bolser, Yinping Jiao, David Croft, Lincoln D. Stein, Ken Youens-Clark, Robin Haw, Shiran Pasternak, Zhenyuan Lu, Sharon Wei, and Doreen Ware
Subjects: Crops, Agricultural, Systems biology, Genomics, Computational biology, Genome, Arabidopsis, Databases, Genetic, Genetics, Synteny, 2. Zero hunger, Internet, Phylogenetic tree, biology, food and beverages, Genetic Variation, Molecular Sequence Annotation, 15. Life on land, Plants, biology.organism_classification, VII. Plant databases, Functional genomics, Genome, Plant, Metabolic Networks and Pathways
Abstract: Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Published: 2013

30. Ensembl Genomes 2013: scaling up access to genome-wide data

Author: Kevin L. Howe, Ken Youens-Clark, Lee J. Falin, Michael Nuhn, Paul J. Kersey, Dan Bolser, Gareth Maslen, Brandon Walts, Daniel M. Staines, Christoph Grabmueller, Doreen Ware, Julia Khobova, Arnaud Kerhornou, Mikkel B. Christensen, Nicholas Langridge, Eugene Kulesha, Xuehong Wei, Marcela K. Monaco, Daniel Lawson, Jay C. Humphrey, Gareth Williams, Helder Pedro, Michael Paulini, Chuang Kee Ong, Derek Wilson, Daniel S.T. Hughes, James E. Allen, Uma Maheswari, Iliana Toneva, Mark D. McDowall, Mary Ann Tuli, Joshua C. Stein, and Paul Davis
Subjects: 0106 biological sciences, Genomics, Context (language use), Computational biology, Bacterial genome size, Biology, 01 natural sciences, Genome, 03 medical and health sciences, Ensembl Genomes, Databases, Genetic, Genetics, Animals, Ensembl, 030304 developmental biology, Internet, 0303 health sciences, Molecular Sequence Annotation, Genome project, ComputingMethodologies_PATTERNRECOGNITION, Genome, Fungal, Edible Grain, Genome, Bacterial, Genome, Plant, Software, IV. Viruses, bacteria, protozoa and fungi, 010606 plant biology & botany, Reference genome
Abstract: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.
Published: 2013

31. Ensembl Genomes 2016: more genomes, more complexity

Author: Paul J. Kersey, Daniel M. Staines, Jay C. Humphrey, Bert Overduin, Julia Khobova, Doreen Ware, Paul Davis, Gareth Maslen, Emily Perry, Kevin L. Howe, Electra Tapanari, Bruce J. Bolt, Sharon Wei, Michael Nuhn, Joshua C. Stein, Ernesto Lowy, Naveen K. Aranganathan, Irina M. Armean, Brandon Walts, Mikkel B. Christensen, Giulietta Spudich, James E. Allen, Lee J. Falin, Marcela K. Tello-Ruiz, Gareth Williams, Christoph Grabmueller, Denise Carvalho-Silva, Mark D. McDowall, Daniel Lawson, Nicholas Langridge, Chuang Kee Ong, Uma Maheswari, Helder Pedro, Dan Bolser, Arnaud Kerhornou, Michael Paulini, Sanjay Boddu, and Eugene Kulesha
Subjects: 0301 basic medicine, Genomics, Context (language use), Bacterial genome size, Computational biology, Biology, Genome, Polyploidy, 03 medical and health sciences, Ensembl Genomes, Databases, Genetic, Genetics, Ensembl, Animals, Database Issue, Whole genome sequencing, Eukaryota, Genetic Variation, Diploidy, Invertebrates, 030104 developmental biology, Genome, Fungal, Sequence Alignment, Genome, Bacterial, Genome, Plant, Reference genome
Abstract: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Published: 2016

32. Gramene: A Resource for Comparative Analysis of Plants Genomes and Pathways

Author: Ken Youens-Clark, Pankaj Jaiswal, Marcela K. Tello-Ruiz, Sharon Wei, Joshua C. Stein, and Doreen Ware
Subjects: 0301 basic medicine, Comparative genomics, Phylogenetic tree, Genome browser, Computational biology, Biology, Genome, Structural variation, 03 medical and health sciences, Upload, 030104 developmental biology, Phylogenetics, Botany, Synteny
Abstract: Gramene is an integrated informatics resource for accessing, visualizing, and comparing plant genomes and biological pathways. Originally targeting grasses, Gramene has grown to host annotations for economically important and research model crops, including wheat, potato, tomato, banana, grape, poplar, and Chlamydomonas. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. This chapter outlines system requirements for end users and database hosting, data types and basic navigation within Gramene, and provides examples of how to (1) view a phylogenetic tree for a family of transcription factors, (2) explore genetic variation in the orthologues of a gene with a known trait association, and (3) upload, visualize, and privately share end user data into a new genome browser track. Moreover, this is the first publication describing Gramene's new web interface, intended to provide a simplified portal to the most complete and up-to-date set of plant genome and pathway annotations.
Published: 2016

33. Gramene 2016: comparative plant genomics and pathway resources

Author: Sharon Wei, Justin Preece, Andrew Olson, Y. Amy Tang, James Thomason, Bo Wang, Palitha Dharmawardhana, Paul J. Kersey, Dan Bolser, Lincoln D. Stein, Helen Parkinson, Joseph Mulvaney, Vindhya Amarasinghe, Sheldon J. McKay, Robert Petryszak, Yinping Jiao, Pankaj Jaiswal, Brandon Walts, Maria Keays, Sushma Naithani, Antonio Fabregat, Joshua C. Stein, Justin Elser, Nuno A. Fonseca, Joel Weiser, Marcela K. Tello-Ruiz, Doreen Ware, Sunita Kumari, Kapeel Chougule, Peter D'Eustachio, Arnaud Kerhornou, and Laura Huerta
Subjects: 0301 basic medicine, Gene Expression, Genomics, Computational biology, Biology, Bioinformatics, Genome, 03 medical and health sciences, Annotation, Databases, Genetic, Genetics, Ensembl, Database Issue, 2. Zero hunger, Comparative genomics, Internet, food and beverages, Genetic Variation, Molecular Sequence Annotation, 15. Life on land, Plants, 030104 developmental biology, Functional genomics, Plant genomics, Genome, Plant, Metabolic Networks and Pathways
Abstract: Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
Published: 2015

34. Gramene database in 2010: updates and extensions

Author: Doreen Ware, William Spooner, Ken Youens-Clark, Sharon Wei, Liya Ren, Paul J. Kersey, Jerry Lu, Joshua C. Stein, Palitha Dharmawardhana, Pankaj Jaiswal, Terry M. Casstevens, A. S. Karthikeyan, Paul S. Derwent, Genevieve DeClerck, Charles Chen, Edward S. Buckler, James Thomason, and Susan R. McCouch
Subjects: 0106 biological sciences, Quantitative Trait Loci, Genomics, Genome browser, computer.software_genre, Genes, Plant, 01 natural sciences, Genome, Synteny, 03 medical and health sciences, Annotation, Databases, Genetic, Genetics, Ensembl, 030304 developmental biology, 2. Zero hunger, 0303 health sciences, Database, biology, food and beverages, Chromosome Mapping, Genetic Variation, Articles, 15. Life on land, Plants, biology.organism_classification, Brachypodium, Web service, computer, Genome, Plant, Metabolic Networks and Pathways, 010606 plant biology & botany
Abstract: Now in its 10th year, the Gramene database (http://www.gramene.org) has grown from its primary focus on rice, the first fully-sequenced grass genome, to become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape in addition to several species of rice. Gramene began with the addition of an Ensembl genome browser and has expanded in the last decade to become a robust resource for plant genomics hosting a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully-structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database. Twice a year, Gramene releases a major build of the database and makes interim releases to correct errors or to make important updates to software and/or data.
Published: 2010

35. Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor

Author: Steven R. Eichten, Nathan M. Springer, Ruth A. Swanson-Wagner, Doreen Ware, Peter Tiffin, Joshua C. Stein, and Sunita Kumari
Subjects: Recombination, Genetic, Genetics, Comparative Genomic Hybridization, Genotype, Research, Gene Dosage, Chromosome Mapping, Genetic Variation, Biology, Genes, Plant, Zea mays, Genome, Gene dosage, Structural variation, Species Specificity, Genetic variation, Gene family, Copy-number variation, Genetic variability, Gene, Genetics (clinical)
Abstract: Individuals of the same species are generally thought to have very similar genomes. However, there is growing evidence that structural variation in the form of copy number variation (CNV) and presence–absence variation (PAV) can lead to variation in the genome content of individuals within a species. Array comparative genomic hybridization (CGH) was used to compare gene content and copy number variation among 19 diverse maize inbreds and 14 genotypes of the wild ancestor of maize, teosinte. We identified 479 genes exhibiting higher copy number in some genotypes (UpCNV) and 3410 genes that have either fewer copies or are missing in the genome of at least one genotype relative to B73 (DownCNV/PAV). Many of these DownCNV/PAV are examples of genes present in B73, but missing from other genotypes. Over 70% of the CNV/PAV examples are identified in multiple genotypes, and the majority of events are observed in both maize and teosinte, suggesting that these variants predate domestication and that there is not strong selection acting against them. Many of the genes affected by CNV/PAV are either maize specific (thus possible annotation artifacts) or members of large gene families, suggesting that the gene loss can be tolerated through buffering by redundant functions encoded elsewhere in the genome. While this structural variation may not result in major qualitative variation due to genetic buffering, it may significantly contribute to quantitative variation.
Published: 2010

36. New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

Author: Mark Wright, Jer Ming Chia, Elena Ghiban, Joshua C. Stein, Eric Antonio, James Gurtowski, Doreen Ware, Lyza G. Maron, Alejandro Hernandez Wences, W. Richard McCombie, Hayan Lee, Susan R. McCouch, Eric Biggers, Michael C. Schatz, and Melissa Kramer
Subjects: 0106 biological sciences, 2. Zero hunger, 0303 health sciences, Sequence assembly, Genomics, Computational biology, Biology, 01 natural sciences, Genome, Phenotype, Structural variation, 03 medical and health sciences, Gene, Organism, 030304 developmental biology, 010606 plant biology & botany, Reference genome
Abstract: The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
Published: 2014

37. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor

Author: Frank M. You, Gerard R. Lazo, Ming-Cheng Luo, Yong Q. Gu, Shiran Pasternak, Song Weining, Karin R. Deal, Patrick E. McGuire, W. Richard McCombie, Shiyong Chen, Wanlong Li, Shahryar F. Kianian, Yuqin Hu, Melissa Kramer, Sunish K. Sehgal, Naxin Huo, Chad M. Jorgensen, Jirui Wang, Doreen Ware, Jan Dvorak, Michael W. Bevan, Yong Zhang, Mihaela Martis, Bikram S. Gill, Jaroslav Doležel, Olin D. Anderson, Joshua C. Stein, Klaus F. X. Mayer, Yaqin Ma, Hana Šimková, and Yi Wang
Subjects: Genetic Markers, Chromosomes, Artificial, Bacterial, Genome evolution, Centromere, Genes, Plant, Poaceae, Polymorphism, Single Nucleotide, Genome, Chromosomes, Plant, Single nucleotide polymorphism, Synteny, Gene density, Oryza, BAC contig coassembly, Evolution, Molecular, Contig Mapping, Aegilops tauschii, Triticum, Recombination, Genetic, Whole genome sequencing, Genetics, Multidisciplinary, biology, Shotgun sequencing, food and beverages, Sequence Analysis, DNA, Genome project, Biological Sciences, biology.organism_classification, Brachypodium distachyon, Genome, Plant
Abstract: The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions.
Published: 2013

38. Signaling the Arrest of Pollen Tube Development in Self-Incompatible Plants

Author: Joshua C. Stein, June B. Nasrallah, Muthugapatti K. Kandasamy, and Mikhail E. Nasrallah
Subjects: Multidisciplinary, Pollination, Mechanism (biology), Botany, Recognition system, Pollen tube, Signal transduction, Biology, Receptor, Protein kinase A, Inbreeding, Cell biology
Abstract: Self-incompatibility (SI), the cellular recognition system that limits inbreeding, has served as a paradigm for the study of cell-to-cell communication in plants since the phenomenon was first described by Darwin. Recent studies indicate that SI is achieved by diverse molecular mechanisms in different plant species. In the mustard family, the mechanism of SI shows parallels to the signaling systems found in animals that are mediated by cell-surface receptors with signal-transducing protein kinase activity.
Published: 1994

39. Evidence for Network Evolution in an Arabidopsis Interactome Map

Author: Viviana Romero, Jeffery L. Dangl, Jyotika Mirchandani, M. Shahid Mukhtar, Eric Olivares, Samuel J. Pevzner, Gopalakrishna Ramaswamy, Jonathan D. Chesnut, Geetha M. Swamilingiah, Thomas Rolland, Doreen Ware, William Spooner, Tijana Milenkovic, Edward A. Rietman, Huaming Chen, Rosa Cheuk Kim, Pascal Braun, Frederick P. Roth, Lantian Gai, Matthew M. Poulin, Gourab Ghoshal, Robert J. Schmitz, Balaji Santhanam, Stacy Wu, Murat Tasan, Paul Shinn, Michael E. Cusick, Danielle Byrdsong, Marc Vidal, Andrew MacWilliams, Selma Waaijers, Chris de los Reyes, Jonathan D. Moore, Uday Matrubutham, Fana Gebreab, Patrick Reichert, Claire Lurin, Dario Monachello, Changyu Fan, Jean Vandenhaute, Padmavathi Balumuri, Matija Dreze, Vanessa Bautista, Yong-Yeol Ahn, Albert-László Barabási, Natasa Przulj, Benoit Charloteaux, Joshua C. Stein, Tong Hao, Mary Galli, Joseph R. Ecker, Junshi Yazaki, Amélie Dricot, Suswapna Patnaik, Melissa Duarte, Sabrina Rabello, Evan M. Weiner, Anne-Ruxandra Carvunis, Christopher Kim, Rosa Quan, Patrick Gilles, Bryan J. Gutierrez, David E. Hill, Stanley Tam, Harvard Medical School [Boston] (HMS), Department of Genetics [Boston], Faculté de Médecine, Université de Liège, Faculté Universitaire Notre Dame de la Paix, Partenaires INRAE, Salk Institute for Biological Studies, Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, department of biological chemistry and molecular pharmacology, Life Technologies, Unité de recherche en génomique végétale (URGV), Institut National de la Recherche Agronomique (INRA)-Université d'Évry-Val-d'Essonne (UEVE)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Department of Biology, Northern Arizona University [Flagstaff], Northeastern University [Boston], University of Notre Dame [Indiana] (UND), University of Warwick, Boston University [Boston] (BU), Department of Computing, Imperial College London, Cold Spring Harbor Laboratory (CSHL), Eagle Genomics Ltd, Eagle Genomics, and University of Warwick [Coventry]
Subjects: 0106 biological sciences, binding, [SDV]Life Sciences [q-bio], plant, Computational biology, Biology, 01 natural sciences, Interactome, 03 medical and health sciences, Arabidopsis, Botany, expression, [SDV.BV]Life Sciences [q-bio]/Vegetal Biology, genome, 030304 developmental biology, 2. Zero hunger, 0303 health sciences, Multidisciplinary, evolve, fungi, food and beverages, 15. Life on land, biology.organism_classification, duplicated gene, protein interaction network, fate, plasticity, divergence, 010606 plant biology & botany
Abstract: Plants have unique features that evolved in response to their environments and ecosystems. A full account of the complex cellular networks that underlie plant-specific functions is still missing. We describe a proteome-wide binary protein-protein interaction map for the interactome network of the plant Arabidopsis thaliana containing about 6200 highly reliable interactions between about 2700 proteins. A global organization of plant biological processes emerges from community analyses of the resulting network, together with large numbers of novel hypothetical functional links between proteins and pathways. We observe a dynamic rewiring of interactions following gene duplication events, providing evidence for a model of evolution acting upon interactome networks. This and future plant interactome maps should facilitate systems approaches to better understand plant biology and improve crops.
Published: 2011

40. A Plant Receptor-Like Gene, the S-Locus Receptor Kinase of Brassica oleracea L., Encodes a Functional Serine/Threonine Kinase

Author: Joshua C. Stein and June B. Nasrallah
Subjects: Physiology, Molecular Sequence Data, Brassica, Plant Science, Protein Serine-Threonine Kinases, Biology, Genes, Plant, MAP2K7, Genetics, Protein phosphorylation, Amino Acid Sequence, c-Raf, Phosphorylation, Protein kinase A, Protein kinase C, Plant Proteins, Serine/threonine-specific protein kinase, Binding Sites, Base Sequence, food and beverages, DNA, Receptor protein serine/threonine kinase, Molecular biology, Biochemistry, bacteria, Casein kinase 2, Protein Kinases, Research Article
Abstract: To investigate the catalytic properties of the Brassica oleracea S-locus receptor kinase (SRK), we have expressed the domain that is homologous to protein kinases as a fusion protein in Escherichia coli. Following in vivo labeling of cultures with 32P-labeled inorganic phosphate, we observed phosphorylation of the fusion protein on serine and threonine, but not on tyrosine. In contrast, labeling was not observed when lysine-524, a residue conserved among all protein kinases, was mutated to arginine, thus confirming that SRK phosphorylation was the result of intrinsic serine/threonine kinase activity.
Published: 1993

41. The B73 maize genome: complexity, diversity, and dynamics

Author: Kristi Collura, Nay Thane, Sanzhen Liu, Sharon Wei, Joshua C. Stein, Jason Waligorski, Shanmugam Rajasekar, Robert A. Martienssen, Patrick S. Schnable, Marc Cotton, Georgina Lopez, R. Kelly Dawe, Jennifer Sgro, Krista Delaney, Linda McMahan, Krishna L. Kanchi, Qi Sun, Jeffrey L. Bennetzen, Asif T. Chinwalla, Zhijie Liu, Gernot G. Presting, Jennifer S. Hodges, Jianwei Zhang, Doreen Ware, William Spooner, Melissa Kramer, Stephanie Muller, Kelly Mead, Jeffrey A. Jeddeloh, Peter Van Buren, W. Richard McCombie, Thomas J. Wang, Stephanie M. Jackson, Beth Miller, Ananth Kalyanaraman, Wolfgang Golser, Rene Lomeli, Aswathy Sebastian, Ara Ko, Alan M. Myers, Carol Soderlund, Kai Ying, Thomas K. Wolfgruber, Lixing Yang, Sunita Kumari, Yujun Han, Jayson Talag, John D. Nguyen, Shawn Leonard, Shiran Pasternak, Chad Tomlinson, Barbara Gillam, Angelina Angelova, Weizu Chen, Bryan W. Penning, Catrina Fronick, Apurva Narechania, Zeljko Dujmic, Matt Cordes, Tina Graves, Cheng Ting Yeh, Jennifer Currie, Michael S. Waterman, Seunghee Lee, Amy Denise Reily, Sandra W. Clifton, Jean-Marc Deragon, Matthew W. Vaughn, Jessica Ruppert, Chengzhi Liang, Dan Nettleton, Maureen C. McCann, Michele Braidotti, Scott Kruchowski, Shiguo Zhou, Ning Jiang, Feiyu Du, Cindy Strong, Thomas P. Brutnell, Scott J. Emrich, Nicholas C. Carpita, Michael J. Levy, Srinivas Aluru, Yi Jia, Liya Ren, Laura Courtney, Teri Mueller, Ruifeng He, Marco Cardenas, Fusheng Wei, Brandon Delgado, Lalit Ponnala, Robert S. Fulton, Elizabeth Applebaum, Jinke Lin, Kevin L. Schneider, Le Yan, Kelsi Rotter, Ben Faga, Susan M. Rock, Elizabeth Ingenthron, Adam Scimone, Andrea Zuccolo, Cristian Chaparro, Neha Shah, Qihui Zhu, Hye-Ran Lee, Richard P. Westerman, Chuanzhu Fan, Dave Kudrna, Rachel Abbott, Lidia Nascimento, Jer Ming Chia, Kerri Ochoa, Lindsey Phelps, Elizabeth Ashley, Damon Lisch, Lucinda Fulton, Gabriel Scara, Bill Courtney, Lori Spiegel, Kim D. Delehaunty, Anupma Sharma, Andrew Levy, Hyeran Kim, Richard K. Wilson, Patrick Minx, Rod A. Wing, Phillip SanMiguel, An-Ping Hsia, Yan Fu, Kyung Kim, Nathan M. Springer, Regina S. Baucom, Woojin Kim, Jason Falcone, Pinghua Li, David C. Schwartz, W. Brad Barbazuk, Jamey Higginbotham, Susan R. Wessler, T. K. Thane, Jessica Henke, Hao Wang, Jiming Jiang, Yeisoo Yu, Sara Kohlberg, Claude Ambroise, Kevin Crouse, Theresa Zutavern, Pamela Marchetto, David Campos, Lifang Zhang, James C. Estill, Dawn H. Nagel, Marina Wissotski, Eddie Belter, Center for Plant Genomics, Iowa State University (ISU), Cold Spring Harbor Laboratory (CSHL), Department of Genetics [Saint-Louis], Washington University in Saint Louis (WUSTL), Ecology and Evolutionary Biology [Tucson] (EEB), University of Arizona, Department of Genetics, Development, and Cell Biology, Department of Electrical and Computer Engineering [Iowa], University of Iowa [Iowa City], University of Florida [Gainesville] (UF), Department of Genetics, University of Georgia [USA], Cornell University [New York], Department of Botany and Plant Pathology, Purdue University [West Lafayette], Laboratoire Génome et développement des plantes (LGDP), Université de Perpignan Via Domitia (UPVD)-Centre National de la Recherche Scientifique (CNRS), Department of Agronomy, NimbleGen, Department of Horticulture, Michigan State University [East Lansing], Michigan State University System-Michigan State University System, Lawrence Berkeley National Laboratory [Berkeley] (LBNL), Department of Biological Sciences [West Lafayette], and Department of plant Biology
Subjects: 0106 biological sciences, MESH: Genome, Plant, MESH: Sequence Analysis, DNA, MESH: Zea mays, MESH: Base Sequence, MESH: RNA, Plant, [SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy, 01 natural sciences, Genome, Divergence, MESH: Ploidies, MESH: DNA Methylation, MESH: Genes, Plant, Nested association mapping, Copy-number variation, MESH: Genetic Variation, MESH: Chromosomes, Plant, 2. Zero hunger, Genetics, 0303 health sciences, Multidisciplinary, [SDV.BBM.BS]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Structural Biology [q-bio.BM], [SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE], [SDV.BBM.MN]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular Networks [q-bio.MN], Arabidopsis-Thaliana, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Retrotransposons, MESH: DNA Transposable Elements, Helitron, MESH: Centromere, MESH: Recombination, Genetic, MESH: DNA Copy Number Variations, Ploidy, Transposable element, Genome evolution, Evolution, [SDV.BC]Life Sciences [q-bio]/Cellular Biology, Biology, Methylation, [SDV.GEN.GPL]Life Sciences [q-bio]/Genetics/Plants genetics, 03 medical and health sciences, MESH: Retroelements, MESH: Inbreeding, [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN], Zea-Mays, [SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biochemistry [q-bio.BM], MESH: DNA, Plant, Gene, 030304 developmental biology, MESH: Molecular Sequence Data, Transposable Elements, [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular biology, Plant, 15. Life on land, MESH: Crops, Agricultural, [SDV.BV.AP]Life Sciences [q-bio]/Vegetal Biology/Plant breeding, Genes, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], MESH: Chromosome Mapping, MESH: MicroRNAs, 010606 plant biology & botany
Abstract: A-Maize-ing. Maize is one of our oldest and most important crops, having been domesticated approximately 9000 years ago in central Mexico. Schnable et al. (p. 1112; see the cover) present the results of sequencing the B73 inbred maize line. The findings elucidate how maize became diploid after an ancestral doubling of its chromosomes and reveal transposable element movement and activity and recombination. Vielle-Calzada et al. (p. 1078) have sequenced the Palomero Toluqueño (Palomero) landrace, a highland popcorn from Mexico, which, when compared to the B73 line, reveals multiple loci impacted by domestication. Swanson-Wagner et al. (p. 1118) exploit possession of the genome to analyze expression differences occurring between lines. The identification of single nucleotide polymorphisms and copy number variations among lines was used by Gore et al. (p. 1115) to generate a Haplotype map of maize. While chromosomal diversity in maize is high, it is likely that recombination is the major force affecting the levels of heterozygosity in maize. The availability of the maize genome will help to guide future agricultural and biofuel applications (see the Perspective by Feuillet and Eversole).
Published: 2009

42. Detailed analysis of a contiguous 22-Mb region of the maize genome

Author: Cheng Ting Yeh, Lucinda Fulton, Phillip San Miguel, Srinivas Aluru, Shiran Pasternak, Stephanie Adams, Lori Spiegel, Fusheng Wei, Regina S. Baucom, Melissa Kramer, Patrick S. Schnable, Blake C. Meyers, Lixing Yang, Rod A. Wing, Pamela J. Green, Catrina Fronick, Robert S. Fulton, Kristi Collura, Cristian Chaparro, Scott Kruchowski, Gabriel Scara, Susan M. Rock, David Kudrna, Joshua C. Stein, Richard K. Wilson, Yeisoo Yu, Jianwei Zhang, Dawn H. Nagel, Hyeran Kim, Jinke Lin, Emanuele De Paoli, William Courtney, Marina Wissotski, Sandra W. Clifton, Lifang Zhang, Angelina Angelova, Lydia Nascimento, Apurva Narechania, Laura Courtney, Robert A. Martienssen, Wolfgang Golser, Kai Ying, Ananth Kalyanaraman, Chengzhi Liang, Doreen Ware, Ning Jiang, Shiguo Zhou, David C. Schwartz, Susan R. Wessler, Jennifer Currie, Jeffrey L. Bennetzen, W. Richard McCombie, Yujun Han, Tina Graves, Jean-Marc Deragon, Ecology and Evolutionary Biology [Tucson] (EEB), University of Arizona, Cold Spring Harbor Laboratory (CSHL), Department of Genetics [Saint-Louis], Washington University in Saint Louis (WUSTL), Department of Genetics, University of Georgia [USA], Delaware Biotechnology Institute, University of Delaware [Newark], Laboratory for Molecular and Computational Genomics [Madison], University of Wisconsin-Madison, Department of Plant Biology [Athens], Center for Plant Genomics, Iowa State University (ISU), School of Electrical Engineering and Computer Science (EECS), Washington State University (WSU), Laboratoire Génome et développement des plantes (LGDP), Université de Perpignan Via Domitia (UPVD)-Centre National de la Recherche Scientifique (CNRS), Department of Horticulture and Landscape Architecture, Purdue University [West Lafayette], Department of Horticulture, Michigan State University [East Lansing], Michigan State University System-Michigan State University System, Department of Electrical and Computer Engineering, and Ecker, Joseph R
Subjects: 0106 biological sciences, MESH: Zea mays, Sequence Homology, MESH: Base Sequence, MESH: RNA, Plant, [SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy, 01 natural sciences, Gene Duplication, MESH: Genes, Plant, MESH: Chromosomes, Plant, Base Pairing, 0303 health sciences, [SDV.BBM.BS]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Structural Biology [q-bio.BM], [SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE], MESH: Gene Duplication, food and beverages, [SDV.BBM.MN]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular Networks [q-bio.MN], Physical Chromosome Mapping, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], MESH: DNA Transposable Elements, RNA, Plant, Genetics and Genomics/Comparative Genomics, Transposable element, Evolution, MESH: Gene Rearrangement, Molecular Sequence Data, Gene redundancy, MESH: Physical Chromosome Mapping, Evolution, Molecular, [SDV.GEN.GPL]Life Sciences [q-bio]/Genetics/Plants genetics, 03 medical and health sciences, Open Reading Frames, [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN], Genetics, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Genetics and Genomics/Plant Genomes and Evolution, Synteny, [SDV.GEN]Life Sciences [q-bio]/Genetics, MESH: Molecular Sequence Data, Molecular, Oryza, [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular biology, Gene rearrangement, Plant, MESH: Open Reading Frames, Genes, Genetic Loci, Mutation, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Developmental Biology, MESH: Genome, Plant, Cancer Research, Sequence assembly, Retrotransposon, Genome, MESH: Sorghum, Genetics (clinical), MESH: Evolution, Molecular, 2. Zero hunger, Gene Rearrangement, MESH: Synteny, Genetics and Genomics/Bioinformatics, MESH: Oryza sativa, Genome, Plant, Research Article, Biotechnology, MESH: Mutation, lcsh:QH426-470, MESH: Base Pairing, Computational biology, [SDV.BC]Life Sciences [q-bio]/Cellular Biology, Biology, Genes, Plant, Zea mays, MESH: Sequence Homology, Nucleic Acid, MESH: Genetic Loci, Chromosomes, Plant, Chromosomes, Sequence Homology, Nucleic Acid, [SDV.BV]Life Sciences [q-bio]/Vegetal Biology, [SDV.BBM]Life Sciences [q-bio]/Biochemistry, Molecular Biology, Genetics and Genomics/Genomics, [SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biochemistry [q-bio.BM], Gene, Sorghum, 030304 developmental biology, Base Sequence, Nucleic Acid, Human Genome, lcsh:Genetics, Genetics and Genomics/Genome Projects, [SDV.BV.AP]Life Sciences [q-bio]/Vegetal Biology/Plant breeding, DNA Transposable Elements, RNA, 010606 plant biology & botany
Abstract: Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, performed on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses. Author Summary: Maize is a major cereal crop and key experimental system for eukaryotic biology. Previous investigations of the maize genome at the sequence level have primarily focused on analyses of genome survey sequences and BAC contigs. Here we used a comprehensive set of resources to construct an ordered and oriented 22-Mb sequence from chromosome 4 that represents 1% of the maize genome. Genome annotation revealed the presence of 544 genes that are interspersed with transposable elements (TEs), which occupy 83.8% of the sequence. Fifty-one genes were involved in 14 tandem gene clusters and most appear to have arisen after lineage divergence. TEs, especially helitrons, were found to contain gene fragments and were widely distributed in gene-rich regions. Large inversions and unequal gene deletion between the two homoeologous maize regions were the main contributors to synteny disruption among maize, sorghum, and rice. We also show that small RNAs are primarily associated with TEs across the region. Comparison of this ordered and oriented sequence with the corresponding uncurated region in the whole genome sequence of maize resulted in improvements in TE annotation that will ultimately enhance detection sensitivity and characterization of TEs. Doing so is likely to improve the specificity of gene annotations.
Published: 2009

43. A Genome-Wide Characterization of MicroRNA Genes in Maize

Author: Michael D. McMullen, Doreen Ware, Apurva Narechania, Zhijie Liu, Katherine E. Guill, Lifang Zhang, Joshua C. Stein, Christopher G. Maher, Sunita Kumari, and Jer Ming Chia
Subjects: 0106 biological sciences, Cancer Research, Small RNA, Genome evolution, lcsh:QH426-470, RNA Splicing, Molecular Sequence Data, Computational Biology/Comparative Sequence Analysis, Biology, Genes, Plant, 01 natural sciences, Genome, Synteny, Zea mays, Homology (biology), Genetics and Genomics/Plant Genetics and Gene Expression, Conserved sequence, 03 medical and health sciences, Plant Biology/Plant Genetics and Gene Expression, Open Reading Frames, Gene Expression Regulation, Plant, Sequence Homology, Nucleic Acid, Genetics, RNA, Messenger, Genetics and Genomics/Genomics, Molecular Biology, Gene, Genetics (clinical), Ecology, Evolution, Behavior and Systematics, Conserved Sequence, Sorghum, 030304 developmental biology, 2. Zero hunger, Regulation of gene expression, 0303 health sciences, Base Sequence, Nucleotides, Gene Expression Profiling, Genetic Variation, Gene expression profiling, lcsh:Genetics, MicroRNAs, Organ Specificity, Multigene Family, 010606 plant biology & botany, Research Article
Abstract: MicroRNAs (miRNAs) are small, non-coding RNAs that play essential roles in plant growth, development, and stress response. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling identified 150 high-confidence genes within 26 miRNA families. For 25 families, expression was verified by deep-sequencing of small RNA libraries that were prepared from an assortment of maize tissues. PCR–RACE amplification of 68 miRNA transcript precursors, representing 18 families conserved across several plant species, showed that splice variation and the use of alternative transcriptional start and stop sites is common within this class of genes. Comparison of sequence variation data from diverse maize inbred lines versus teosinte accessions suggests that the mature miRNAs are under strong purifying selection while the flanking sequences evolve equivalently to other genes. Since maize is derived from an ancient tetraploid, the effect of whole-genome duplication on miRNA evolution was examined. We found that, like protein-coding genes, duplicated miRNA genes underwent extensive gene-loss, with ∼35% of ancestral sites retained as duplicate homoeologous miRNA genes. This number is higher than that observed with protein-coding genes. A search for putative miRNA targets indicated bias towards genes in regulatory and metabolic pathways. As maize is one of the principal models for plant growth and development, this study will serve as a foundation for future research into the functional roles of miRNA genes. Author Summary: MicroRNAs are non-coding RNAs that regulate gene expression post-transcriptionally and play roles in diverse pathways including those acting on development and responses to stress. Here, we describe a genome-wide computational prediction of maize miRNA genes and their characterization with respect to expression, putative targets, evolution following whole genome duplication, and allelic diversity. The structures of unprocessed primary miRNA transcripts were determined by 5′ RACE and 3′ RACE. Expression profiles were surveyed in five tissue types by deep-sequencing of small RNA libraries. We predicted miRNA targets computationally based on the most recent maize protein annotations. Analysis of the predicted functions of target genes, on the basis of gene ontology, supported their roles in regulatory processes. We identified putative orthologs in Sorghum based on an analysis of synteny and found that maize-homoeologous miRNA genes were retained more frequently than expected. We also explored miRNA nucleotide diversity among many maize inbred lines and partially inbred teosinte lines. The results indicated that mature miRNA genes were highly conserved during their evolution. This preliminary characterization based on our findings provides a framework for future analysis of miRNA genes and their roles in key traits of maize as feed, fodder, and biofuel.
Published: 2009

44. Molecular cloning of a putative receptor protein kinase gene encoded at the self-incompatibility locus of Brassica oleracea

Author: Joshua C. Stein, Mikhail E. Nasrallah, Douglas C. Boyes, Bruce Howlett, and June B. Nasrallah
Subjects: Molecular Sequence Data, Restriction Mapping, Oligonucleotides, Gene Expression, Receptors, Cell Surface, Locus (genetics), Brassica, Biology, Genes, Plant, Pollen coat, Polymerase Chain Reaction, Homology (biology), Amino Acid Sequence, RNA, Messenger, Cloning, Molecular, Protein kinase A, Gene, Alleles, Regulation of gene expression, Genetics, Multidisciplinary, Base Sequence, food and beverages, Protein-Tyrosine Kinases, Blotting, Northern, Transmembrane domain, Pollen, Pollen-pistil interaction, Polymorphism, Restriction Fragment Length, Research Article
Abstract: Self-recognition between pollen and stigma during pollination in Brassica oleracea is genetically controlled by the multiallelic self-incompatibility locus (S). We describe the S receptor kinase (SRK) gene, a previously uncharacterized gene that resides at the S locus. The nucleotide sequences of genomic DNA and of cDNAs corresponding to SRK predict a putative transmembrane receptor having serine/threonine-specific protein kinase activity. Its extracellular domain exhibits striking homology to the secreted product of the S-locus glycoprotein (SLG) gene and is connected via a single pass transmembrane domain to a protein kinase catalytic center. SRK alleles derived from different S-locus genotypes are highly polymorphic and have apparently evolved in unison with genetically linked alleles of SLG. SRK directs the synthesis of several alternative transcripts, which potentially encode different protein products, and these transcripts were detected exclusively in reproductive organs. The identification of SRK may provide new perspectives into the signal transduction mechanism underlying pollen recognition.
Published: 1991

45. Transformation of Brassica oleracea with an S-locus gene from B. campestris changes the self-incompatibility phenotype

Author: Kinya Toriyama, Joshua C. Stein, Mikhail E. Nasrallah, and June B. Nasrallah
Subjects: Genetics, biology, fungi, Genetic transfer, Brassica, food and beverages, General Medicine, Genetically modified crops, biology.organism_classification, Transformation (genetics), Brassica oleracea, Allele, Agronomy and Crop Science, Gene, Pollen-pistil interaction, Biotechnology
Abstract: An SLG gene derived from the S-locus and encoding an S-locus-specific glycoprotein of Brassica campestris L. was introduced via Agrobacterium-mediated transformation into B. oleracea L. A self-incompatible hybrid and another with partial self-compatibility were used as recipients. The transgenic plants were altered in their pollen-stigma interaction and were fully compatible upon self-pollination. Reciprocal crosses between the transgenic plants and untransformed control plants indicated that the stigma reaction was changed in one recipient strain while the pollen reaction was altered in the other. Due to interspecific incompatibility, we could not demonstrate whether or not the introduced SLG gene confers a new allelic specificity in the transgenic plants. Our results show that the introduced SLG gene perturbs the self-incompatibility phenotype of stigma and pollen.
Published: 1991

46. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

Author: François Sabot, Nils Stein, Apurva Narechania, Doreen Ware, Thomas Wicker, Joshua C. Stein, Giang Thu Vu, and Andreas Graner
Subjects: 0106 biological sciences, Chromosomes, Artificial, Bacterial, DNA, Plant, lcsh:QH426-470, lcsh:Biotechnology, Sequence assembly, Hybrid genome assembly, 580 Plants (Botany), Biology, Genes, Plant, 01 natural sciences, Genome, 03 medical and health sciences, 10126 Department of Plant and Microbial Biology, 1311 Genetics, lcsh:TP248.13-248.65, Genetics, Gene, Repetitive Sequences, Nucleic Acid, 030304 developmental biology, 2. Zero hunger, 0303 health sciences, Bacterial artificial chromosome, Shotgun sequencing, Chromosome Mapping, food and beverages, Hordeum, Sequence Analysis, DNA, genomic DNA, lcsh:Genetics, 1305 Biotechnology, DNA microarray, Genome, Plant, Research Article, 010606 plant biology & botany, Biotechnology
Abstract: Background Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. Conclusion An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. 
The restriction that a particular MDR index can not be used across species is outweighed by the low costs of Illumina/Solexa sequencing which makes any chosen genome accessible for whole-genome sequence sampling.
Published: 2008

47. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

Author: Doreen Ware, Apurva Narechania, Stefan Kurtz, and Joshua C. Stein
Subjects: 0106 biological sciences, Genome evolution, lcsh:QH426-470, Gene prediction, lcsh:Biotechnology, Genomics, Retrotransposon, Computational biology, Biology, 01 natural sciences, Genome, Zea mays, 03 medical and health sciences, lcsh:TP248.13-248.65, Genetics, Methods, Sorghum, 030304 developmental biology, 2. Zero hunger, 0303 health sciences, Shotgun sequencing, Methodology Article, Computational Biology, food and beverages, Oryza, Genome project, lcsh:Genetics, k-mer, DNA Transposable Elements, Genome, Plant, Software, 010606 plant biology & botany, Biotechnology
Abstract: Background The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks. Results Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10^9 bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low-complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize.
Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity. Conclusion The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see http://www.zbh.uni-hamburg.de/Tallymer.
Published: 2008

48. Engineering vitamin E content: from Arabidopsis mutant to soy oil

Author: Alison L. Van Eenennaam, Kim Lincoln, Timothy P. Durrett, Henry E. Valentin, Christine K. Shewmaker, Greg M. Thorne, Jian Jiang, Susan R. Baszis, Charlene K. Levering, Eric D. Aasen, Ming Hao, Joshua C. Stein, Susan R. Norris, and Robert L. Last
Subjects: Methyltransferase, DNA, Complementary, medicine.medical_treatment, Transgene, Mutant, Molecular Sequence Data, Arabidopsis, Tocopherols, Plant Science, Biology, Gene Expression Regulation, Enzymologic, Gene Expression Regulation, Plant, medicine, Escherichia coli, Vitamin E, Tocopherol, Amino Acid Sequence, Cloning, Molecular, Alleles, Regulation of gene expression, Sequence Homology, Amino Acid, Arabidopsis Proteins, food and beverages, Cell Biology, Methyltransferases, Sequence Analysis, DNA, biology.organism_classification, Plants, Genetically Modified, Enzyme assay, Soybean Oil, Biochemistry, Mutation, biology.protein, Soybeans, Research Article
Abstract: We report the identification and biotechnological utility of a plant gene encoding the tocopherol (vitamin E) biosynthetic enzyme 2-methyl-6-phytylbenzoquinol methyltransferase. This gene was identified by map-based cloning of the Arabidopsis mutation vitamin E pathway gene3-1 (vte3-1), which causes increased accumulation of delta-tocopherol and decreased gamma-tocopherol in the seed. Enzyme assays of recombinant protein supported the hypothesis that At-VTE3 encodes a 2-methyl-6-phytylbenzoquinol methyltransferase. Seed-specific expression of At-VTE3 in transgenic soybean reduced seed delta-tocopherol from 20 to 2%. These results confirm that At-VTE3 protein catalyzes the methylation of 2-methyl-6-phytylbenzoquinol in planta and show the utility of this gene in altering soybean tocopherol composition. When At-VTE3 was coexpressed with At-VTE4 (gamma-tocopherol methyltransferase) in soybean, the seed accumulated to >95% alpha-tocopherol, a dramatic change from the normal 10%, resulting in a greater than eightfold increase of alpha-tocopherol and an up to fivefold increase in seed vitamin E activity. These findings demonstrate the utility of a gene identified in Arabidopsis to alter the tocopherol composition of commercial seed oils, a result with both nutritional and food quality implications.
Published: 2003

49. Srk

Author: June B. Nasrallah and Joshua C. Stein
Subjects: chemistry.chemical_classification, Genetics, Serine, chemistry, Haplotype, Autophosphorylation, food and beverages, Threonine, Allele, Biology, Null allele, Gene, Amino acid
Abstract: The chapter discusses the S-locus receptor PK (Srk) that has a receptor-like structure, possesses intrinsic serine/threonine PK activity, and is capable of autophosphorylation. The pattern of expression of SRK gene and its high degree of sequence polymorphism among different self-incompatibility haplotypes suggest a role for Srk as a receptor in pollen/stigma recognition. Plants bearing null alleles of SRK exhibit loss of the self-incompatibility response. Potential ligands and substrates remain to be identified. Each S-locus haplotype (>50 are known) is believed to encode a distinct Srk allele. Srk6, Srk2, and Srk910 have been characterized. These show as much as 32% sequence divergence at the amino acid level. The S-locus glycoprotein (Slg) is highly similar to the extracellular domain of Srk and is also encoded at the S locus. When derived from the same S-locus haplotype, Srk and Slg have as high as 90% amino acid identity. SRK transcript variants potentially encode N-terminally truncated and C-terminally truncated forms. SRK6 and SRK2 were sequenced from B. oleracea. SRK910 was sequenced from a B. campestris haplotype that was introgressed into B. napus.
Published: 1995

50. An alternative transcript of the S locus glycoprotein gene in a class II pollen-recessive self-incompatibility haplotype of Brassica oleracea encodes a membrane-anchored protein

Author: Joshua C. Stein, Titima Tantikanjana, Mikhail E. Nasrallah, June B. Nasrallah, and Che-Hong Chen
Subjects: Transgene, Immunoblotting, Molecular Sequence Data, Plant Science, Brassica, Biology, Genes, Plant, Gene product, Tobacco, Amino Acid Sequence, RNA, Messenger, Gene, Peptide sequence, Glycoproteins, Plant Proteins, Genetics, chemistry.chemical_classification, Base Sequence, Reproduction, Alternative splicing, Haplotype, RNA, Membrane Proteins, Cell Biology, DNA, Molecular biology, Biological Evolution, Alternative Splicing, Plants, Toxic, chemistry, Haplotypes, Multigene Family, Pollen, Glycoprotein, Protein Kinases, Research Article
Abstract: Recent reports have shown that SLG, one of two genes linked to the S locus of Brassica, encodes a secreted glycoprotein. We have used RNA gel blot analysis, genomic and cDNA clone analysis, expression in transgenic plants, and immunodetection to characterize SLG2, the SLG gene derived from the S2 haplotype. This haplotype belongs to the class II group of S haplotypes that exhibit a weak incompatibility phenotype and are pollen recessive. We showed that SLG2 produces two transcript forms: the expected 1.6-kb transcript that predicts a secreted glycoprotein and an alternative 1.8-kb transcript that predicts a membrane-anchored protein. Stigmas of the S2 haplotype and pistils of transgenic tobacco plants transformed with the SLG2 gene produce a membrane-associated 62-kD protein as well as soluble 57- and 58-kD glycoforms. Because of the sequence similarity between SLG2 and the extracellular domain of the S Locus Receptor Kinase (SRK2) gene, the membrane-anchored form of SLG2 may be viewed as a naturally occurring truncated form of the receptor that lacks the kinase catalytic domain. The occurrence of this protein has potential implications for the activity of the full-length receptor. Furthermore, the underlying structure of the SLG2 gene suggests the evolution of SLG from an ancestral SRK-like gene.
Published: 1993
