10 results on '"Ware, D."'
Search Results
2. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats
- Author
- Wicker, T, Narechania, A, Sabot, F, Stein, J, Vu, G T H, Graner, A, Ware, D, and Stein, N
- Abstract
BACKGROUND: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. RESULTS: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. CONCLUSION: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. The restriction that a particular MDR index can not be us
- Published
- 2008
3. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats
- Author
- Wicker, T, Narechania, A, Sabot, F, Stein, J, Vu, G T H, Graner, A, Ware, D, and Stein, N
- Subjects
- 2. Zero hunger
4. A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset.
- Author
- Zhou Y, Kathiresan N, Yu Z, Rivera LF, Yang Y, Thimma M, Manickam K, Chebotarov D, Mauleon R, Chougule K, Wei S, Gao T, Green CD, Zuccolo A, Xie W, Ware D, Zhang J, McNally KL, and Wing RA
- Subjects
- Workflow, Plant Breeding, Software, High-Throughput Nucleotide Sequencing methods, Polymorphism, Single Nucleotide, Genome, Plant
- Abstract
Background: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. Results: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy with comparable results with previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a "subpopulation aware" 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). Conclusions: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment. (© 2024. The Author(s).)
- Published
- 2024
5. An ultra-high-density map as a community resource for discerning the genetic basis of quantitative traits in maize.
- Author
- Liu H, Niu Y, Gonzalez-Portilla PJ, Zhou H, Wang L, Zuo T, Qin C, Tai S, Jansen C, Shen Y, Lin H, Lee M, Ware D, Zhang Z, Lübberstedt T, and Pan G
- Subjects
- Genetic Linkage, Genome, Plant, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Zea mays physiology, Chromosome Mapping methods, Quantitative Trait Loci, Zea mays genetics
- Abstract
Background: To safeguard the food supply for the growing human population, it is important to understand and exploit the genetic basis of quantitative traits. Next-generation sequencing technology performs advantageously and effectively in genetic mapping and genome analysis of diverse genetic resources. Hence, we combined re-sequencing technology and a bin map strategy to construct an ultra-high-density bin map with thousands of bin markers to precisely map a quantitative trait locus. Results: In this study, we generated a linkage map containing 1,151,856 high quality SNPs between Mo17 and B73, which were verified in the maize intermated B73 × Mo17 (IBM) Syn10 population. This resource is an excellent complement to existing maize genetic maps available in an online database (iPlant, http://data.maizecode.org/maize/qtl/syn10/). Moreover, in this population combined with the IBM Syn4 RIL population, we detected 135 QTLs for flowering time and plant height traits across the two populations. Eighteen known functional genes and twenty-five candidate genes for flowering time and plant height trait were fine-mapped into a 2.21-4.96 Mb interval. Map expansion and segregation distortion were also analyzed, and evidence for inadvertent selection of early flowering time in the process of mapping population development was observed. Furthermore, an updated integrated map with 1,151,856 high-quality SNPs, 2,916 traditional markers and 6,618 bin markers was constructed. The data were deposited into the iPlant Discovery Environment (DE), which provides a fundamental resource of genetic data for the maize genetic research community. Conclusions: Our findings provide basic essential genetic data for the maize genetic research community. An updated IBM Syn10 population and a reliable, verified high-quality SNP set between Mo17 and B73 will aid in future molecular breeding efforts.
- Published
- 2015
6. Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid.
- Author
- Dugas DV, Monaco MK, Olsen A, Klein RR, Kumari S, Ware D, and Klein PE
- Subjects
- Gene Expression Regulation, Plant, Gene Regulatory Networks, Genes, Plant, Osmosis, Phenotype, Promoter Regions, Genetic, RNA, Plant genetics, Sequence Analysis, RNA, Sorghum physiology, Transcription Factors genetics, Water physiology, Abscisic Acid pharmacology, Droughts, Sorghum genetics, Transcriptome
- Abstract
Background: Higher plants exhibit remarkable phenotypic plasticity allowing them to adapt to an extensive range of environmental conditions. Sorghum is a cereal crop that exhibits exceptional tolerance to adverse conditions, in particular, water-limiting environments. This study utilized next generation sequencing (NGS) technology to examine the transcriptome of sorghum plants challenged with osmotic stress and exogenous abscisic acid (ABA) in order to elucidate genes and gene networks that contribute to sorghum's tolerance to water-limiting environments with a long-term aim of developing strategies to improve plant productivity under drought. Results: RNA-Seq results revealed transcriptional activity of 28,335 unique genes from sorghum root and shoot tissues subjected to polyethylene glycol (PEG)-induced osmotic stress or exogenous ABA. Differential gene expression analyses in response to osmotic stress and ABA revealed a strong interplay among various metabolic pathways including abscisic acid and 13-lipoxygenase, salicylic acid, jasmonic acid, and plant defense pathways. Transcription factor analysis indicated that groups of genes may be co-regulated by similar regulatory sequences to which the expressed transcription factors bind. We successfully exploited the data presented here in conjunction with published transcriptome analyses for rice, maize, and Arabidopsis to discover more than 50 differentially expressed, drought-responsive gene orthologs for which no function had been previously ascribed. Conclusions: The present study provides an initial assemblage of sorghum genes and gene networks regulated by osmotic stress and hormonal treatment. We are providing an RNA-Seq data set and an initial collection of transcription factors, which offer a preliminary look into the cascade of global gene expression patterns that arise in a drought tolerant crop subjected to abiotic stress.
These resources will allow scientists to query gene expression and functional annotation in response to drought.
- Published
- 2011
7. Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration.
- Author
- Nelson RT, Avraham S, Shoemaker RC, May GD, Ware D, and Gessler DD
- Abstract
Background: Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced "swap") offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services. Methods: We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info. Results: A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs, and seven services delivering genetic locus information. For LIS we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences with the third service providing nucleic acid sequence comparison capability (BLAST). Conclusions: The need for semantic integration technologies has preceded available solutions.
We report the feasibility of mapping high priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable for use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.
- Published
- 2010
8. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes.
- Author
- Kurtz S, Narechania A, Stein JC, and Ware D
- Subjects
- Computational Biology methods, Methods, Oryza, Software, Sorghum, Zea mays, DNA Transposable Elements, Genome, Plant, Genomics methods
- Abstract
Background: The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks. Results: Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10^9 bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (approximately 0.45x) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low-complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity. Conclusion: The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see http://www.zbh.uni-hamburg.de/Tallymer.
- Published
- 2008
9. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats.
- Author
- Wicker T, Narechania A, Sabot F, Stein J, Vu GT, Graner A, Ware D, and Stein N
- Subjects
- Chromosome Mapping, Chromosomes, Artificial, Bacterial, DNA, Plant genetics, Genes, Plant, Genome, Plant, Hordeum genetics, Repetitive Sequences, Nucleic Acid, Sequence Analysis, DNA methods
- Abstract
Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. Conclusion: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences.
The restriction that a particular MDR index can not be used across species is outweighed by the low costs of Illumina/Solexa sequencing which makes any chosen genome accessible for whole-genome sequence sampling.
- Published
- 2008
10. Integration of hybridization-based markers (overgos) into physical maps for comparative and evolutionary explorations in the genus Oryza and in Sorghum.
- Author
- Hass-Jacobus BL, Futrell-Griggs M, Abernathy B, Westerman R, Goicoechea JL, Stein J, Klein P, Hurwitz B, Zhou B, Rakhshan F, Sanyal A, Gill N, Lin JY, Walling JG, Luo MZ, Ammiraju JS, Kudrna D, Kim HR, Ware D, Wing RA, San Miguel P, and Jackson SA
- Subjects
- Chromosomes, Artificial, Bacterial genetics, DNA Fingerprinting, Evolution, Molecular, Gene Library, Sequence Alignment, Sequence Homology, Nucleic Acid, Species Specificity, Chromosome Mapping methods, Chromosomes, Plant genetics, DNA Probes, Genetic Markers, Genome, Plant, Nucleic Acid Hybridization, Oryza genetics, Sorghum genetics
- Abstract
Background: With the completion of the genome sequence for rice (Oryza sativa L.), the focus of rice genomics research has shifted to the comparison of the rice genome with genomes of other species for gene cloning, breeding, and evolutionary studies. The genus Oryza includes 23 species that shared a common ancestor 8-10 million years ago making this an ideal model for investigations into the processes underlying domestication, as many of the Oryza species are still undergoing domestication. This study integrates high-throughput, hybridization-based markers with BAC end sequence and fingerprint data to construct physical maps of rice chromosome 1 orthologues in two wild Oryza species. Similar studies were undertaken in Sorghum bicolor, a species which diverged from cultivated rice 40-50 million years ago. Results: Overgo markers, in conjunction with fingerprint and BAC end sequence data, were used to build sequence-ready BAC contigs for two wild Oryza species. The markers drove contig merges to construct physical maps syntenic to rice chromosome 1 in the wild species and provided evidence for at least one rearrangement on chromosome 1 of the O. sativa versus Oryza officinalis comparative map. When rice overgos were aligned to available S. bicolor sequence, 29% of the overgos aligned with three or fewer mismatches; of these, 41% gave positive hybridization signals. Overgo hybridization patterns supported colinearity of loci in regions of sorghum chromosome 3 and rice chromosome 1 and suggested that a possible genomic inversion occurred in this syntenic region in one of the two genomes after the divergence of S. bicolor and O. sativa. Conclusion: The results of this study emphasize the importance of identifying conserved sequences in the reference sequence when designing overgo probes in order for those probes to hybridize successfully in distantly related species.
As interspecific markers, overgos can be used successfully to construct physical maps in species which diverged less than 8 million years ago, and can be used in a more limited fashion to examine colinearity among species which diverged as much as 40 million years ago. Additionally, overgos are able to provide evidence of genomic rearrangements in comparative physical mapping studies.
- Published
- 2006