22 results on '"Carson, M."'
Search Results
2. Functional annotation and meta-analysis of maize transcriptomes reveal genes involved in biotic and abiotic stress
- Author
-
Rita K Hayford, Olivia C Haley, Ethalinda K Cannon, John L Portwood, Jack M Gardiner, Carson M Andorf, and Margaret R Woodhouse
- Subjects
Differentially expressed genes ,Maize ,RNA-Sequencing ,Transcription factors ,Gene Ontology ,Abiotic stress ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470
- Abstract
Background: Environmental stress factors, such as biotic and abiotic stress, are becoming more common due to climate variability, significantly affecting global maize yield. Transcriptome profiling studies provide insights into the molecular mechanisms underlying stress response in maize, though the functions of many genes are still unknown. To enhance the functional annotation of maize-specific genes, MaizeGDB has outlined a data-driven approach with an emphasis on identifying genes and traits related to biotic and abiotic stress.
Results: We mapped high-quality RNA-Seq expression reads from 24 different publicly available datasets (17 abiotic and seven biotic studies) generated from the B73 cultivar to the recent version of the reference genome B73 (B73v5) and deduced stress-related functional annotation of maize gene models. We conducted a robust meta-analysis of the transcriptome profiles from the datasets to identify maize loci responsive to stress, identifying 3,230 differentially expressed genes (DEGs): 2,555 DEGs regulated in response to abiotic stress, 408 DEGs regulated during biotic stress, and 267 common DEGs (co-DEGs) that overlap between abiotic and biotic stress. We discovered hub genes from network analyses, and among the hub genes of the co-DEGs we identified a putative NAC domain transcription factor superfamily protein (Zm00001eb369060) IDP275, which previously responded to herbivory and drought stress. IDP275 was up-regulated in our analysis in response to eight different abiotic and four different biotic stresses. A gene set enrichment and pathway analysis of hub genes of the co-DEGs revealed hormone-mediated signaling processes and phenylpropanoid biosynthesis pathways, respectively. Using phylostratigraphic analysis, we also demonstrated how abiotic and biotic stress genes differentially evolve to adapt to changing environments.
Conclusions: These results will help facilitate the functional annotation of multiple stress response gene models and annotation in maize. Data can be accessed and downloaded at the Maize Genetics and Genomics Database (MaizeGDB).
- Published
- 2024
3. Co-expression pan-network reveals genes involved in complex traits within maize pan-genome
- Author
-
Cagirici, H Busra, Andorf, Carson M, and Sen, Taner Z
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,Pediatric Research Initiative ,Biotechnology ,Aetiology ,2.1 Biological and endogenous factors ,Zea mays ,Genome-Wide Association Study ,Multifactorial Inheritance ,Phenotype ,Gene Regulatory Networks ,Polymorphism ,Single Nucleotide ,Co-expression network ,Pan-network ,Maize ,Pan-genome ,GWAS ,Complex traits ,Tassel branch number ,Starch ,Microbiology ,Plant Biology ,Crop and Pasture Production ,Plant Biology & Botany ,Crop and pasture production ,Plant biology
- Abstract
Background: With the advances in the high throughput next generation sequencing technologies, genome-wide association studies (GWAS) have identified a large set of variants associated with complex phenotypic traits at a very fine scale. Despite the progress in GWAS, identification of genotype-phenotype relationship remains challenging in maize due to its nature with dozens of variants controlling the same trait. As the causal variations results in the change in expression, gene expression analyses carry a pivotal role in unraveling the transcriptional regulatory mechanisms behind the phenotypes.
Results: To address these challenges, we incorporated the gene expression and GWAS-driven traits to extend the knowledge of genotype-phenotype relationships and transcriptional regulatory mechanisms behind the phenotypes. We constructed a large collection of gene co-expression networks and identified more than 2 million co-expressing gene pairs in the GWAS-driven pan-network which contains all the gene-pairs in individual genomes of the nested association mapping (NAM) population. We defined four sub-categories for the pan-network: (1) core-network contains the highest represented ~1% of the gene-pairs, (2) near-core network contains the next highest represented 1-5% of the gene-pairs, (3) private-network contains ~50% of the gene pairs that are unique to individual genomes, and (4) the dispensable-network contains the remaining 50-95% of the gene-pairs in the maize pan-genome. Strikingly, the private-network contained almost all the genes in the pan-network but lacked half of the interactions. We performed gene ontology (GO) enrichment analysis for the pan-, core-, and private- networks and compared the contributions of variants overlapping with genes and promoters to the GWAS-driven pan-network.
Conclusions: Gene co-expression networks revealed meaningful information about groups of co-regulated genes that play a central role in regulatory processes. Pan-network approach enabled us to visualize the global view of the gene regulatory network for the studied system that could not be well inferred by the core-network alone.
- Published
- 2022
4. Tissue-specific gene expression and protein abundance patterns are associated with fractionation bias in maize
- Author
-
Walsh, Jesse R, Woodhouse, Margaret R, Andorf, Carson M, and Sen, Taner Z
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Biotechnology ,Human Genome ,Generic health relevance ,Chromosome Mapping ,Evolution ,Molecular ,Gene Duplication ,Gene Expression ,Gene Expression Regulation ,Plant ,Gene Ontology ,Genes ,Plant ,Genome ,Plant ,Phylogeny ,Plant Proteins ,Pollen ,Polyploidy ,Zea mays ,Subgenome ,Gene expression ,Protein abundance ,Maize ,Functional divergence ,Microbiology ,Plant Biology ,Crop and Pasture Production ,Plant Biology & Botany ,Crop and pasture production ,Plant biology
- Abstract
BACKGROUND: Maize experienced a whole-genome duplication event approximately 5 to 12 million years ago. Because this event occurred after speciation from sorghum, the pre-duplication subgenomes can be partially reconstructed by mapping syntenic regions to the sorghum chromosomes. During evolution, maize has had uneven gene loss between each ancient subgenome. Fractionation and divergence between these genomes continue today, constantly changing genetic make-up and phenotypes and influencing agronomic traits.
RESULTS: Here we regenerate the subgenome reconstructions for the most recent maize reference genome assembly. Based on both expression and abundance data for homeologous gene pairs across multiple tissues, we observed functional divergence of genes across subgenomes. Although the genes in the larger maize subgenome are often expressing more highly than their homeologs in the smaller subgenome, we observed cases where homeolog expression dominance switches in different tissues. We demonstrate for the first time that protein abundances are higher in the larger subgenome, but they also show tissue-specific dominance, a pattern similar to RNA expression dominance. We also find that pollen expression is uniquely decoupled from protein abundance.
CONCLUSION: Our study shows that the larger subgenome has a greater range of functional assignments and that there is a relative lack of overlap between the subgenomes in terms of gene functions than would be suggested by similar patterns of gene expression and protein abundance. Our study also revealed that some reactions are catalyzed uniquely by the larger and smaller subgenomes. The tissue-specific, nonequivalent expression-level dominance pattern observed here implies a change in regulatory control which favors differentiated selective pressure on the retained duplicates leading to eventual change in gene functions.
- Published
- 2020
5. Co-expression pan-network reveals genes involved in complex traits within maize pan-genome
- Author
-
H. Busra Cagirici, Carson M. Andorf, and Taner Z. Sen
- Subjects
Co-expression network ,Pan-network ,Maize ,Pan-genome ,GWAS ,Complex traits ,Botany ,QK1-989
- Abstract
Background: With the advances in the high throughput next generation sequencing technologies, genome-wide association studies (GWAS) have identified a large set of variants associated with complex phenotypic traits at a very fine scale. Despite the progress in GWAS, identification of genotype-phenotype relationship remains challenging in maize due to its nature with dozens of variants controlling the same trait. As the causal variations results in the change in expression, gene expression analyses carry a pivotal role in unraveling the transcriptional regulatory mechanisms behind the phenotypes.
Results: To address these challenges, we incorporated the gene expression and GWAS-driven traits to extend the knowledge of genotype-phenotype relationships and transcriptional regulatory mechanisms behind the phenotypes. We constructed a large collection of gene co-expression networks and identified more than 2 million co-expressing gene pairs in the GWAS-driven pan-network which contains all the gene-pairs in individual genomes of the nested association mapping (NAM) population. We defined four sub-categories for the pan-network: (1) core-network contains the highest represented ~1% of the gene-pairs, (2) near-core network contains the next highest represented 1–5% of the gene-pairs, (3) private-network contains ~50% of the gene pairs that are unique to individual genomes, and (4) the dispensable-network contains the remaining 50–95% of the gene-pairs in the maize pan-genome. Strikingly, the private-network contained almost all the genes in the pan-network but lacked half of the interactions. We performed gene ontology (GO) enrichment analysis for the pan-, core-, and private- networks and compared the contributions of variants overlapping with genes and promoters to the GWAS-driven pan-network.
Conclusions: Gene co-expression networks revealed meaningful information about groups of co-regulated genes that play a central role in regulatory processes. Pan-network approach enabled us to visualize the global view of the gene regulatory network for the studied system that could not be well inferred by the core-network alone.
- Published
- 2022
6. A pan-genomic approach to genome databases using maize as a model system
- Author
-
Margaret R. Woodhouse, Ethalinda K. Cannon, John L. Portwood, Lisa C. Harper, Jack M. Gardiner, Mary L. Schaeffer, and Carson M. Andorf
- Subjects
Databases ,Genomes ,Maize ,Pan-genome ,Nomenclature ,Browsers ,Botany ,QK1-989
- Abstract
Research in the past decade has demonstrated that a single reference genome is not representative of a species’ diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.
- Published
- 2021
7. Tissue-specific gene expression and protein abundance patterns are associated with fractionation bias in maize
- Author
-
Jesse R. Walsh, Margaret R. Woodhouse, Carson M. Andorf, and Taner Z. Sen
- Subjects
Subgenome ,Gene expression ,Protein abundance ,Maize ,Functional divergence ,Botany ,QK1-989
- Abstract
Background: Maize experienced a whole-genome duplication event approximately 5 to 12 million years ago. Because this event occurred after speciation from sorghum, the pre-duplication subgenomes can be partially reconstructed by mapping syntenic regions to the sorghum chromosomes. During evolution, maize has had uneven gene loss between each ancient subgenome. Fractionation and divergence between these genomes continue today, constantly changing genetic make-up and phenotypes and influencing agronomic traits.
Results: Here we regenerate the subgenome reconstructions for the most recent maize reference genome assembly. Based on both expression and abundance data for homeologous gene pairs across multiple tissues, we observed functional divergence of genes across subgenomes. Although the genes in the larger maize subgenome are often expressing more highly than their homeologs in the smaller subgenome, we observed cases where homeolog expression dominance switches in different tissues. We demonstrate for the first time that protein abundances are higher in the larger subgenome, but they also show tissue-specific dominance, a pattern similar to RNA expression dominance. We also find that pollen expression is uniquely decoupled from protein abundance.
Conclusion: Our study shows that the larger subgenome has a greater range of functional assignments and that there is a relative lack of overlap between the subgenomes in terms of gene functions than would be suggested by similar patterns of gene expression and protein abundance. Our study also revealed that some reactions are catalyzed uniquely by the larger and smaller subgenomes. The tissue-specific, nonequivalent expression-level dominance pattern observed here implies a change in regulatory control which favors differentiated selective pressure on the retained duplicates leading to eventual change in gene functions.
- Published
- 2020
8. A pan-genomic approach to genome databases using maize as a model system
- Author
-
Woodhouse, Margaret R., Cannon, Ethalinda K., Portwood, II, John L., Harper, Lisa C., Gardiner, Jack M., Schaeffer, Mary L., and Andorf, Carson M.
- Published
- 2021
9. MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database
- Author
-
Md Shamimuzzaman, Jack M. Gardiner, Amy T. Walsh, Deborah A. Triant, Justin J. Le Tourneau, Aditi Tayal, Deepak R. Unni, Hung N. Nguyen, John L. Portwood, Ethalinda K. S. Cannon, Carson M. Andorf, and Christine G. Elsik
- Subjects
data mining ,genome database ,InterMine ,maize ,Zea mays ,Plant culture ,SB1-1110
- Abstract
MaizeMine is the data mining resource of the Maize Genetics and Genomics Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.
- Published
- 2020
10. Maize GO Annotation—Methods, Evaluation, and Review (maize‐GAMER)
- Author
-
Kokulapalan Wimalanathan, Iddo Friedberg, Carson M. Andorf, and Carolyn J. Lawrence‐Dill
- Subjects
functional annotation ,gene ontology ,genomics ,GO ,maize ,Botany ,QK1-989
- Abstract
We created a new high‐coverage, robust, and reproducible functional annotation of maize protein‐coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein‐coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high‐coverage, high‐confidence annotation set, we used sequence similarity and protein domain presence methods as well as mixed‐method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize‐GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi.org/10.7946/P2M925).
- Published
- 2018
11. A pan-genomic approach to genome databases using maize as a model system
- Author
-
Ethalinda K. S. Cannon, Carson M. Andorf, Mary L. Schaeffer, Lisa C. Harper, John L. Portwood, Margaret R. Woodhouse, and Jack M. Gardiner
- Subjects
Transposable element ,Browsers ,Locus (genetics) ,Plant Science ,Computational biology ,Biology ,NAM founders ,Pan-genome ,Genome ,Zea mays ,Structural variation ,Database ,Databases ,Genomes ,Gene ,Nomenclature ,Data Collection ,Botany ,Genetic Variation ,Epigenome ,Genomics ,Data Accuracy ,Maize ,Databases as Topic ,QK1-989 ,human activities ,Genome, Plant ,Reference genome
- Abstract
Research in the past decade has demonstrated that a single reference genome is not representative of a species’ diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12870-021-03173-5.
- Published
- 2021
12. Tissue-specific gene expression and protein abundance patterns are associated with fractionation bias in maize
- Author
-
Carson M. Andorf, Margaret R. Woodhouse, Taner Z. Sen, and Jesse R. Walsh
- Subjects
0106 biological sciences ,0301 basic medicine ,Protein abundance ,Gene Expression ,Plant Science ,Biology ,Genes, Plant ,01 natural sciences ,Genome ,Zea mays ,Evolution, Molecular ,Polyploidy ,03 medical and health sciences ,Gene Expression Regulation, Plant ,lcsh:Botany ,Gene Duplication ,Gene duplication ,Gene expression ,Functional divergence ,Subgenome ,Gene ,Phylogeny ,Synteny ,Plant Proteins ,Tissue-Specific Gene Expression ,Chromosome Mapping ,lcsh:QK1-989 ,Maize ,030104 developmental biology ,Gene Ontology ,Evolutionary biology ,Pollen ,Genome, Plant ,010606 plant biology & botany ,Reference genome ,Research Article
- Abstract
Background: Maize experienced a whole-genome duplication event approximately 5 to 12 million years ago. Because this event occurred after speciation from sorghum, the pre-duplication subgenomes can be partially reconstructed by mapping syntenic regions to the sorghum chromosomes. During evolution, maize has had uneven gene loss between each ancient subgenome. Fractionation and divergence between these genomes continue today, constantly changing genetic make-up and phenotypes and influencing agronomic traits.
Results: Here we regenerate the subgenome reconstructions for the most recent maize reference genome assembly. Based on both expression and abundance data for homeologous gene pairs across multiple tissues, we observed functional divergence of genes across subgenomes. Although the genes in the larger maize subgenome are often expressing more highly than their homeologs in the smaller subgenome, we observed cases where homeolog expression dominance switches in different tissues. We demonstrate for the first time that protein abundances are higher in the larger subgenome, but they also show tissue-specific dominance, a pattern similar to RNA expression dominance. We also find that pollen expression is uniquely decoupled from protein abundance.
Conclusion: Our study shows that the larger subgenome has a greater range of functional assignments and that there is a relative lack of overlap between the subgenomes in terms of gene functions than would be suggested by similar patterns of gene expression and protein abundance. Our study also revealed that some reactions are catalyzed uniquely by the larger and smaller subgenomes. The tissue-specific, nonequivalent expression-level dominance pattern observed here implies a change in regulatory control which favors differentiated selective pressure on the retained duplicates leading to eventual change in gene functions.
- Published
- 2020
13. Association mapping across a multitude of traits collected in diverse environments in maize.
- Author
-
Mural, Ravi V, Sun, Guangchao, Grzybowski, Marcin, Tross, Michael C, Jin, Hongyu, Smith, Christine, Newton, Linsey, Andorf, Carson M, Woodhouse, Margaret R, Thompson, Addie M, Sigmon, Brandi, and Schnable, James C
- Subjects
GENOTYPE-environment interaction ,PLANT genetics ,GENETIC variation ,QUANTITATIVE genetics ,GENOME-wide association studies
- Abstract
Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker data (18M markers) from 2 partially overlapping maize association panels comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets enabled the identification of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated to be 3 genes based on a reference set of genes with known phenotypes. Examples were observed of both genetic loci associated with variation in diverse traits (e.g. above-ground and below-ground traits), as well as individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction.
- Published
- 2022
14. Maize GO Annotation—Methods, Evaluation, and Review (maize‐GAMER)
- Author
-
Carson M. Andorf, Carolyn J. Lawrence-Dill, Iddo Friedberg, and Kokulapalan Wimalanathan
- Subjects
0301 basic medicine ,Computer science ,Genomics ,Plant Science ,Computational biology ,maize ,Biochemistry, Genetics and Molecular Biology (miscellaneous) ,Set (abstract data type) ,03 medical and health sciences ,Annotation ,genomics ,Critical Assessment of Function Annotation ,Ecology, Evolution, Behavior and Systematics ,Original Research ,2. Zero hunger ,030102 biochemistry & molecular biology ,Ecology ,Gene ontology ,Botany ,Mutant phenotype ,functional annotation ,030104 developmental biology ,Direct assay ,Functional annotation ,GO ,QK1-989 ,gene ontology
- Abstract
We created a new high‐coverage, robust, and reproducible functional annotation of maize protein‐coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein‐coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high‐coverage, high‐confidence annotation set, we used sequence similarity and protein domain presence methods as well as mixed‐method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize‐GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi.org/10.7946/P2M925).
- Published
- 2018
15. Crowdsourcing Image Analysis for Plant Phenomics to Generate Ground Truth Data for Machine Learning
- Author
-
Nigel Lee, Jonathan W. Kelly, Zachary D. Siegel, Carson M. Andorf, Dan Nettleton, Scott Zarecor, Iddo Friedberg, Baskar Ganapathysubramanian, Darwin A. Campbell, Naihui Zhou, and Carolyn J. Lawrence-Dill
- Subjects
0301 basic medicine ,0106 biological sciences ,Computer science ,Image Processing ,Pilot Projects ,Plant Science ,Plant Genetics ,computer.software_genre ,01 natural sciences ,Field (computer science) ,Food Supply ,Task (project management) ,Machine Learning ,0302 clinical medicine ,Image Processing, Computer-Assisted ,lcsh:QH301-705.5 ,media_common ,2. Zero hunger ,0303 health sciences ,Ground truth ,Ecology ,Applied Mathematics ,Simulation and Modeling ,Eukaryota ,Agriculture ,Plants ,Data Accuracy ,Phenotypes ,Phenotype ,Computational Theory and Mathematics ,Experimental Organism Systems ,Research Design ,Modeling and Simulation ,Physical Sciences ,Engineering and Technology ,Crowdsourcing ,The Internet ,Algorithms ,Research Article ,Crops, Agricultural ,Computer and Information Sciences ,Best practice ,media_common.quotation_subject ,Crops ,Context (language use) ,Research and Analysis Methods ,Machine learning ,Cellular and Molecular Neuroscience ,Machine Learning Algorithms ,03 medical and health sciences ,Model Organisms ,Plant and Algal Models ,Artificial Intelligence ,Genetics ,Humans ,Quality (business) ,Grasses ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Crop Genetics ,Internet ,business.industry ,Organisms ,Biology and Life Sciences ,Pilot Studies ,15. Life on land ,Maize ,030104 developmental biology ,lcsh:Biology (General) ,Data quality ,Signal Processing ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,Mathematics ,Crop Science ,010606 plant biology & botany
- Abstract
The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large number of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
Author summary: Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective in recognizing select parts of images, but they require high quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics, specifically, segmenting a corn tassel (the male flower of the corn plant) from the often-cluttered images of a cornfield. We provided images to students, and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
- Published
- 2018
16. G-Quadruplex (G4) Motifs in the Maize (Zea mays L.) Genome Are Enriched at Specific Locations in Thousands of Genes Coupled to Energy Status, Hypoxia, Low Sugar, and Nutrient Deprivation
- Author
-
M. Elizabeth Stroupe, Drena Dobbs, Hank W. Bass, Carolyn J. Lawrence, Carson M. Andorf, Karen E. Koch, and Mykhailo Kopylov
- Subjects
0106 biological sciences ,Immunoglobulin gene ,DNA, Plant ,Biology ,Genes, Plant ,Zea mays ,01 natural sciences ,Genome ,03 medical and health sciences ,chemistry.chemical_compound ,Oxygen Consumption ,G4 ,Gene Expression Regulation, Plant ,Transcription (biology) ,Genetics ,Hypoxia ,3' Untranslated Regions ,Molecular Biology ,Gene ,030304 developmental biology ,Regulator gene ,2. Zero hunger ,Regulation of gene expression ,0303 health sciences ,Sucrose synthase ,G-quadruplex ,Models, Genetic ,Circular Dichroism ,Maize ,G-Quadruplexes ,chemistry ,Carbohydrate Metabolism ,Energy Metabolism ,Sequence motif ,Genome, Plant ,Metabolic Networks and Pathways ,DNA ,010606 plant biology & botany
- Abstract
The G-quadruplex (G4) elements comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The reversibility and structural diversity of G4s make them highly versatile genetic structures, as demonstrated by their roles in various functions including telomere metabolism, genome maintenance, immunoglobulin gene diversification, transcription, and translation. Sequence motifs capable of forming G4 DNA are typically located in telomere repeat DNA and other non-telomeric genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 non-telomeric G4 motifs in maize (Zea mays L., B73 AGPv2), 29% of which were in non-repetitive genomic regions. G4 motif hotspots exhibited non-random enrichment in genes at two locations on the antisense strand, one in the 5′ UTR and the other at the 5′ end of the first intron. Several genic G4 motifs were shown to adopt sequence-specific and potassium-dependent G4 DNA structures in vitro. The G4 motifs were prevalent in key regulatory genes associated with hypoxia (group VII ERFs), oxidative stress (DJ-1/GATase1), and energy status (AMPK/SnRK) pathways. They also showed statistical enrichment for genes in metabolic pathways that function in glycolysis, sugar degradation, inositol metabolism, and base excision repair. Collectively, the maize G4 motifs may represent conditional regulatory elements that can aid in energy status gene responses. Such a network of elements could provide a mechanistic basis for linking energy status signals to gene regulation in maize, a model genetic system and major world crop species for feed, food, and fuel.
- Published
- 2014
17. MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.
- Author
-
Shamimuzzaman, Md, Gardiner, Jack M., Walsh, Amy T., Triant, Deborah A., Le Tourneau, Justin J., Tayal, Aditi, Unni, Deepak R., Nguyen, Hung N., Portwood II, John L., Cannon, Ethalinda K. S., Andorf, Carson M., and Elsik, Christine G.
- Subjects
GENETIC databases ,DATA warehousing ,DATA mining ,CORN ,SINGLE nucleotide polymorphisms - Abstract
MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2020
18. Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning.
- Author
-
Zhou, Naihui, Siegel, Zachary D., Zarecor, Scott, Lee, Nigel, Campbell, Darwin A., Andorf, Carson M., Nettleton, Dan, Lawrence-Dill, Carolyn J., Ganapathysubramanian, Baskar, Kelly, Jonathan W., and Friedberg, Iddo
- Subjects
CROWDSOURCING ,MACHINE learning ,IMAGE analysis ,DATA quality ,SCHOOL credits - Abstract
The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large number of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2018
19. Enhanced pan-genomic resources at the maize genetics and genomics database.
- Author
-
Cannon, Ethalinda K, Portwood, John L, Hayford, Rita K, Haley, Olivia C, Gardiner, Jack M, Andorf, Carson M, and Woodhouse, Margaret R
- Subjects
*DATABASES , *PROTEINS , *BIOLOGICAL models , *GENOMICS , *CORN , *RESEARCH funding , *GENE expression , *PROTEOMICS , *MEDICAL research , *GENETICS , *GENOMES , *BIOMARKERS , *SEQUENCE analysis - Abstract
Pan-genomes, encompassing the entirety of genetic sequences found in a collection of genomes within a clade, are more useful than single reference genomes for studying species diversity. This is especially true for a species like Zea mays, which has a particularly diverse and complex genome. Presenting pan-genome data, analyses, and visualization is challenging, especially for a diverse species, but more so when pan-genomic data is linked to extensive gene model and gene data, including classical gene information, markers, insertions, expression and proteomic data, and protein structures as is the case at MaizeGDB. Here, we describe MaizeGDB's expansion to include the genic subset of the Zea pan-genome in a pan-gene data center featuring the maize genomes hosted at MaizeGDB, and the outgroup teosinte Zea genomes from the Pan-Andropoganeae project. The new data center offers a variety of browsing and visualization tools, including sequence alignment visualization, gene trees and other tools, to explore pan-genes in Zea that were calculated by the pipeline Pandagma. Combined, these data will help maize researchers study the complexity and diversity of Zea and use the comparative functions to validate pan-gene relationships for a selected gene model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Maize GO Annotation—Methods, Evaluation, and Review (maize‐GAMER).
- Author
-
Wimalanathan, Kokulapalan, Friedberg, Iddo, Andorf, Carson M., and Lawrence‐Dill, Carolyn J.
- Abstract
We created a new high‐coverage, robust, and reproducible functional annotation of maize protein‐coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein‐coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high‐coverage, high‐confidence annotation set, we used sequence similarity and protein domain presence methods as well as mixed‐method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize‐GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi.org/10.7946/P2M925). [ABSTRACT FROM AUTHOR]
- Published
- 2018
21. Maize protein structure resources at the maize genetics and genomics database.
- Author
-
Woodhouse, Margaret R., Portwood II, John L., Sen, Shatabdi, Hayford, Rita K., Gardiner, Jack M., Cannon, Ethalinda K., Harper, Lisa C., and Andorf, Carson M.
- Subjects
*DATABASES , *GENETICS , *CORN , *PLANT proteins , *GENOMICS , *MOLECULAR structure - Abstract
Protein structures play an important role in bioinformatics, such as in predicting gene function or validating gene model annotation. However, determining protein structure was, until now, costly and time-consuming, which resulted in a structural biology bottleneck. With the release of programs such as AlphaFold and ESMFold, this bottleneck has been reduced by several orders of magnitude, permitting protein structural comparisons of entire genomes within reasonable timeframes. MaizeGDB has leveraged this technological breakthrough by offering several new tools to accelerate protein structural comparisons between maize and other plants as well as human and yeast outgroups. MaizeGDB also offers bulk downloads of these comparative protein structure data, along with predicted functional annotation information. In this way, MaizeGDB is poised to assist maize researchers in assessing functional homology, gene model annotation quality, and other information unavailable to maize scientists even a few years ago. [ABSTRACT FROM AUTHOR]
- Published
- 2023
22. Rapid method of evaluating maize for sheath-collar feeding resistance to the European corn borer (Lepidoptera: Pyralidae)
- Author
-
Carson, M. A., Sisco, R., Meehan, M. E., and Kaster, L. V.
- Subjects
CORN
- Published
- 1991