Author: "F Alex Feltus" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"F Alex Feltus"' showing total 92 results

51. Exploring Lossy Compression of Gene Expression Matrices

Author: Alexandra Poulos, M. Reed Bender, Coleman B. McKnight, F. Alex Feltus, and Jon Calhoun
Subjects: Lossless compression, Physics::Instrumentation and Detectors, Computer science, Data_CODINGANDINFORMATIONTHEORY, Lossy compression, computer.software_genre, Data type, Domain (software engineering), Reduction (complexity), Workflow, Compression (functional analysis), Compression ratio, Data mining, computer
Abstract: Gene Expression Matrices (GEMs) are a fundamental data type in the genomics domain. As the size and scope of genomics experiments increase, researchers are struggling to process large GEMs through downstream workflows with currently accepted practices. In this paper, we propose a methodology to reduce the size of GEMs using multiple approaches. Our method partitions data into discrete fields based on data type and employs state-of-the-art lossless and lossy compression algorithms to reduce the input data size. This work explores a variety of lossless and lossy compression methods to determine which methods work the best for each component of a GEM. We evaluate the accuracy of the compressed GEMs by running them through the Knowledge Independent Network Construction (KINC) workflow and comparing the quality of the resulting gene co-expression network with a lossless control to verify result fidelity. Results show that utilizing a combination of lossy and lossless compression results in compression ratios up to 9.77× on a Yeast GEM, while still preserving the biological integrity of the data. Usage of the compression methodology on the Cancer Cell Line Encyclopedia (CCLE) GEM resulted in compression ratios up to 9.26×. By using this methodology, researchers in the genomics domain may be able to process previously inaccessible GEMs while realizing significant reduction in computational costs.
Published: 2019
Full Text: View/download PDF

52. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases

Author: Ming Chen, F. Alex Feltus, Jill L. Wegrzyn, Margaret Staton, Helena Rasche, Abdullah Almsaeed, Lacey-Anne Sanderson, Shawna Spoor, Chun-Huai Cheng, Kirstin E. Bett, Stephen P. Ficklin, Sook Jung, Anthony Bretaudeau, Bradford Condon, and Dorrie Main
Subjects: Computer science, Biological database, Ontology (information science), computer.software_genre, Data type, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, Resource (project management), open science, Databases, Genetic, ontology, database, ComputingMilieux_MISCELLANEOUS, ontologie, 030304 developmental biology, base de données, 0303 health sciences, Internet, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Database, Community engagement, Application programming interface, Information Dissemination, 030302 biochemistry & molecular biology, Genomics, Online community, Biota, Data sharing, Original Article, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], General Agricultural and Biological Sciences, Transcriptome, computer, Software, Information Systems
Abstract: Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.
Published: 2019
Full Text: View/download PDF

53. Moving Just Enough Deep Sequencing Data to Get the Job Done

Author: Ethan M. Bensman, William L. Poehlman, Nicholas Mills, Walter B. Ligon, and F. Alex Feltus
Subjects: FASTQ format, 0303 health sciences, Computer science, Applied Mathematics, RNA-Seq, Computational biology, high-throughput DNA sequencing, FASTQ, Biochemistry, DNA sequencing, Deep sequencing, Computer Science Applications, 03 medical and health sciences, Computational Mathematics, High-Throughput DNA Sequencing, 0302 clinical medicine, ComputingMethodologies_PATTERNRECOGNITION, lcsh:Biology (General), data transfers, Molecular Biology, lcsh:QH301-705.5, 030217 neurology & neurosurgery, 030304 developmental biology, Original Research
Abstract: Motivation: As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing the datasets may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. Results: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts using an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared with the transfer of the full dataset, which showed a reduction of approximately 25% in the total transfer time. These results suggest that as sequencing datasets get larger, one way to speed up analysis is to simply transfer the minimal amount of data that still sufficiently detects biological signal. Availability: All results were generated using public datasets from NCBI and publicly available open source software.
Published: 2019

54. Linking Binary Gene Relationships to Drivers of Renal Cell Carcinoma Reveals Convergent Function in Alternate Tumor Progression Paths

Author: F. Alex Feltus, James J. Hsieh, and William L. Poehlman
Subjects: 0301 basic medicine, Oncogene Proteins, Fusion, Carcinogenesis, medicine.medical_treatment, lcsh:Medicine, Datasets as Topic, Biology, medicine.disease_cause, urologic and male genital diseases, Article, Targeted therapy, PBRM1, 03 medical and health sciences, 0302 clinical medicine, Renal cell carcinoma, medicine, Humans, Gene Regulatory Networks, lcsh:Science, Gene, neoplasms, Carcinoma, Renal Cell, Neoplasm Staging, Mutation, BAP1, Multidisciplinary, Tumor Suppressor Proteins, lcsh:R, Cancer, medicine.disease, female genital diseases and pregnancy complications, Kidney Neoplasms, DNA-Binding Proteins, Gene Expression Regulation, Neoplastic, 030104 developmental biology, Tumor progression, Von Hippel-Lindau Tumor Suppressor Protein, Cancer research, Disease Progression, lcsh:Q, Transcriptome, Ubiquitin Thiolesterase, 030217 neurology & neurosurgery, Transcription Factors
Abstract: Renal cell carcinoma (RCC) subtypes are characterized by distinct molecular profiles. Using RNA expression profiles from 1,009 RCC samples, we constructed a condition-annotated gene coexpression network (GCN). The RCC GCN contains binary gene coexpression relationships (edges) specific to conditions including RCC subtype and tumor stage. As an application of this resource, we discovered RCC GCN edges and modules that were associated with genetic lesions in known RCC driver genes, including VHL, a common initiating clear cell RCC (ccRCC) genetic lesion, and PBRM1 and BAP1, which are early genetic lesions in the Braided Cancer River Model (BCRM). Since ccRCC tumors with PBRM1 mutations respond to targeted therapy differently than tumors with BAP1 mutations, we focused on ccRCC-specific edges associated with tumors that exhibit alternate mutation profiles: VHL-PBRM1 or VHL-BAP1. We found specific blends of molecular functions associated with these two mutation paths. Despite these mutation-associated edges having unique genes, they were enriched for the same immunological functions, suggesting a convergent functional role for alternate gene sets consistent with the BCRM. The condition-annotated RCC GCN described herein is a novel data mining resource for the assignment of polygenic biomarkers and their relationships to RCC tumors with specific molecular and mutational profiles.
Published: 2019

55. Integrity Protection for Scientific Workflow Data: Motivation and Initial Experiences

Author: William L. Poehlman, Mats Rynge, Omkar Bhide, F. Alex Feltus, Von Welch, Karan Vahi, Randy Heiland, Raquel Hill, Anirban Mandal, Ewa Deelman, and Ilya Baldin
Subjects: Computer science, business.industry, RAID, Encryption, Computer security, computer.software_genre, law.invention, Workflow, law, Data integrity, Checksum, Data Corruption, Erasure code, business, computer, Workflow management system
Abstract: With the continued rise of scientific computing and the enormous increases in the size of data being processed, scientists must consider whether the processes for transmitting and storing data sufficiently assure the integrity of the scientific data. When integrity is not preserved, computations can fail and result in increased computational cost due to reruns, or worse, results can be corrupted in a manner not apparent to the scientist and produce invalid science results. Technologies such as TCP checksums, encrypted transfers, checksum validation, RAID and erasure coding provide integrity assurances at different levels, but they may not scale to large data sizes and may not cover a workflow from end-to-end, leaving gaps in which data corruption can occur undetected. In this paper we explore an approach of assuring data integrity - considering either malicious or accidental corruption - for workflow executions orchestrated by the Pegasus Workflow Management System. To validate our approach, we introduce Chaos Jungle - a toolkit providing an environment for validating integrity verification mechanisms by allowing researchers to introduce a variety of integrity errors during data transfers and storage. In addition to controlled experiments with Chaos Jungle, we provide analysis of integrity errors that we encountered when running production workflows.
Published: 2019
Full Text: View/download PDF

56. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

Author: F. Alex Feltus, Stephen P. Ficklin, Scott M. Gibson, and Melissa C. Smith
Published: 2013
Full Text: View/download PDF

57. Uncovering biomarker genes with enriched classification potential from Hallmark gene sets

Author: Benjamin T. Shealy, Melissa C. Smith, F. Alex Feltus, Courtney A. Shearer, and Colin Targonski
Subjects: 0301 basic medicine, Candidate gene, lcsh:Medicine, Computational biology, Biology, Article, 03 medical and health sciences, 0302 clinical medicine, Cancer genome, Gene expression, Databases, Genetic, Machine learning, Biomarkers, Tumor, Cancer genomics, Humans, Genetic Predisposition to Disease, lcsh:Science, Gene, Genetic Association Studies, Multidisciplinary, Gene Expression Profiling, Gene sets, lcsh:R, Computational Biology, Oncogenes, Phenotype, Gene expression profiling, 030104 developmental biology, Gene Ontology, Biomarker (medicine), lcsh:Q, 030217 neurology & neurosurgery, Algorithms
Abstract: Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call “candidate genes”, by evaluating the ability of gene combinations to classify samples from a dataset, which we call “classification potential”. Our algorithm, Gene Oracle, uses a neural network to test user defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
Published: 2018

58. Widespread Genotype-Phenotype Correlations in Intellectual Disability

Author: Manuel F. Casanova, Julia L. Sharp, Zachary Gerstner, Emily L. Casanova, and F. Alex Feltus
Subjects: 0303 health sciences, WikiPathways : Pathways for the people, Computational biology, Biology, medicine.disease, Phenotype, 03 medical and health sciences, 0302 clinical medicine, Gene interaction, Intellectual disability, Genotype, Human Phenotype Ontology, medicine, Autism, KEGG, 030217 neurology & neurosurgery, 030304 developmental biology
Abstract: Background: Linking genotype to phenotype is a major aim of genetics research, yet many complex conditions continue to hide their underlying biochemical mechanisms. Recent research provides evidence that relevant gene-phenotype associations are discoverable in the study of intellectual disability (ID). Here we expand on that work, identifying distinctive gene interaction modules with unique enrichment patterns reflective of associated clinical features in ID. Methods: Two hundred twelve forms of monogenic ID were curated according to comorbidities with autism and epilepsy. These groups were further subdivided according to secondary clinical symptoms of complex versus simple facial dysmorphia and neurodegenerative-like features due to their clinical prominence, modest symptom overlap, and probable etiological divergence. An aggregate gene interaction ID network for these phenotype subgroups was discovered via a public database of known gene interactions: protein-protein, genetic, and mRNA coexpression. Additional annotation resources (Gene Ontology, Human Phenotype Ontology, TRANSFAC/JASPAR, and KEGG/WikiPathways) were utilized to assess functional and phenotypic enrichment modules within the full ID network. Results: Phenotypic analysis revealed high rates of complex facial dysmorphia in ID with comorbid autism. In contrast, neurodegenerative-like features were overrepresented in ID with epilepsy. Network analysis subsequently showed that gene groups divided according to clinical features of interest resulted in distinctive interaction clusters, with unique functional enrichments according to module. Conclusions: These data suggest that specific comorbid and secondary clinical features in ID are predictive of underlying genotype. In summary, ID form unique clusters, which are comprised of individual conditions with remarkable genotypic and phenotypic overlap.
Published: 2017
Full Text: View/download PDF

59. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study

Author: William L. Poehlman, F. Alex Feltus, Christopher Watson, Leland J. Dunwoodie, Stephen P. Ficklin, and Kimberly E. Roche
Subjects: 0301 basic medicine, Science, Normal Distribution, Gene regulatory network, Computational biology, Biology, computer.software_genre, Article, Normal distribution, 03 medical and health sciences, 0302 clinical medicine, Neoplasms, Cancer genome, Humans, Gene Regulatory Networks, Gene, Regulation of gene expression, Multidisciplinary, Models, Genetic, Cancer case, Gene Expression Profiling, Reproducibility of Results, Mixture model, Gene Expression Regulation, Neoplastic, Tumor Subtype, Gene Ontology, 030104 developmental biology, 030220 oncology & carcinogenesis, Medicine, Data mining, computer, Algorithms
Abstract: A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
Published: 2017
Full Text: View/download PDF

60. Tissue specific analysis of bioconversion traits in the bioenergy grass Sorghum bicolor

Author: Keshav C. Das, Andrew H. Paterson, Joshua P. Vandenbrink, Ryan E. Hammonds, F. Alex Feltus, J. Michael Henson, and Roger N. Hilten
Subjects: biology, Bioconversion, Trichoderma viride, food and beverages, Cellulase, Sorghum, biology.organism_classification, Hydrolysis, Agronomy, Bioenergy, Enzymatic hydrolysis, biology.protein, Food science, Agronomy and Crop Science, Sweet sorghum
Abstract: In order for lignocellulose conversion to bioenergy products to be optimal, biomass hydrolysis efficiency must be increased. In conjunction with optimized enzymes and pretreatment strategies, genetic improvement of feedstock conversion potential is a common sense approach to increase end product yield. In this study, feedstock composition and crystallinity index traits were investigated across twenty Sorghum bicolor varieties for tissue specific relationships with enzymatic hydrolysis (Trichoderma viride and Aspergillus niger cellulases). It was found that hydrolysis yield potential was higher in stem than in leaf tissue. Lignin content was shown to be negatively correlated with hydrolysis rates in leaf but not stem tissue. Crystallinity index was negatively correlated with stem tissue hydrolysis rate in two grow-out years. In addition, pretreatment efficacy varied among tissue types of multiple genotypes. Dose–response curves for T. viride cellulase and ammonium hydroxide pretreatment revealed genotype and tissue specific hydrolysis rates, which suggests that these factors be optimized prior to large-scale implementation of a specific feedstock–conversion process combination. Butanol production correlates with hydrolysis rate in stem but not leaf tissue. This study suggests that selection of specific sorghum genotypes with high stem to leaf ratios could improve hydrolysis efficiency and end product yield.
Published: 2013
Full Text: View/download PDF

61. Evidence of function for conserved noncoding sequences in Arabidopsis thaliana

Author: Michael Freeling, Sabarinath Subramaniam, F. Alex Feltus, and Jacob B. Spangler
Subjects: Transcription, Genetic, Ultraviolet Rays, Physiology, Arabidopsis, Plant Science, Plant Growth Regulators, Gene Expression Regulation, Plant, Genes, Duplicate, Transcription (biology), Gene expression, Arabidopsis thaliana, Gene, Conserved Sequence, Oligonucleotide Array Sequence Analysis, Genetics, Regulation of gene expression, Base Sequence, biology, food and beverages, biology.organism_classification, Noncoding DNA, Introns, Paleopolyploidy, Thermodynamics, DNA, Intergenic, 5' Untranslated Regions
Abstract: • Whole genome duplication events provide a lineage with a large reservoir of genes that can be molded by evolutionary forces into phenotypes that fit alternative environments. A well-studied whole genome duplication, the α-event, occurred in an ancestor of the model plant Arabidopsis thaliana. Retained segments of the α-event have been defined in recent years in the form of duplicate protein coding sequences (α-pairs) and associated conserved noncoding DNA sequences (CNSs). Our aim was to identify any association between CNSs and α-pair co-functionality at the gene expression level. • Here, we tested for correlation between CNS counts and α-pair co-expression and expression intensity across nine expression datasets: aerial tissue, flowers, leaves, roots, rosettes, seedlings, seeds, shoots and whole plants. • We provide evidence for a putative regulatory role of the CNSs. The association of CNSs with α-pair co-expression and expression intensity varied by gene function, subgene position and the presence of transcription factor binding motifs. A range of possible CNS regulatory mechanisms, including intron-mediated enhancement, messenger RNA fold stability and transcriptional regulation, are discussed. • This study provides a framework to understand how CNS motifs are involved in the maintenance of gene expression after a whole genome duplication event.
Published: 2011
Full Text: View/download PDF

62. Identification and mapping of conserved ortholog set (COS) II sequences of cacao and their conversion to SNP markers for marker-assisted selection in Theobroma cacao and comparative genomics studies

Author: Christopher A. Saski, Ping Zheng, Andrew Farmer, Keithanne Mockaitis, Dorrie Main, Raymond J. Schnell, Juan Carlos Motamayor, David N. Kuhn, Gregory D. May, F. Alex Feltus, and Don Livingstone
Subjects: Genetics, Comparative genomics, Expressed sequence tag, biology, Theobroma, Forestry, Horticulture, Quantitative trait locus, Marker-assisted selection, biology.organism_classification, Gene mapping, Genetic marker, Molecular Biology, Synteny
Abstract: Theobroma cacao (cacao) is a tree cultivated in the tropics around the world for its seeds that are the source of both chocolate and cocoa butter. Genetic marker development for marker-assisted selection (MAS) is critical for the success of cacao breeding for disease resistance and yield. To develop conserved ortholog set II (COSII) single-nucleotide polymorphism (SNP) markers for MAS in cacao, we have used three strategies and three types of cacao genetic and sequence data to identify and map 98 cacao COSII genes. The resources available at the time these studies were first undertaken dictated the strategy utilized. For the first strategy, SNPs were identified using cacao expressed sequence tags homologous to COSII sequences. Strategy II utilized a leaf transcriptome of cacao genotype “Matina 1–6” and Strategy III the genomic sequence of a 3-Mb region of “Matina 1–6” linkage group 5 associated with an important quantitative trait locus (QTL) for resistance to black pod. We have identified SNP markers for 83 of the 98 mapped COSII genes, and 19 of these SNP markers co-locate with QTLs. These COSII SNP markers, the first identified for cacao, will be used for genotyping and off-typing in cacao breeding programs and employed for genetic mapping and syntenic studies to trace co-location of genes regulating traits of importance between cacao and other species.
Published: 2011
Full Text: View/download PDF

63. The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks

Author: Feng Luo, F. Alex Feltus, and Stephen P. Ficklin
Subjects: Genetics, Regulation of gene expression, Microarray, Physiology, Gene expression, Mutant, Pair-rule gene, Plant Science, Biology, Gene, Phenotype, Function (biology)
Abstract: Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
Published: 2010
Full Text: View/download PDF

64. A sorghum diversity panel biofuel feedstock screen for genotypes with high hydrolysis yield potential

Author: Jim R. Frederick, Joshua P. Vandenbrink, Maria P. Delgado, and F. Alex Feltus
Subjects: biology, Trichoderma viride, food and beverages, Biomass, Cellulase, biology.organism_classification, Sorghum, Hydrolysis, Agronomy, Biofuel, Bioenergy, biology.protein, Food science, Agronomy and Crop Science, Sweet sorghum
Abstract: The rate of hydrolysis among a genetically diverse panel of 381 field-grown sorghum (Sorghum bicolor L.) varieties was investigated. A high-throughput 96-well plate method was created to test large numbers of replicated sample biomass hydrolysis rates using Trichoderma viride cellulase. Analysis of the entire panel showed a wide range of hydrolysis rates, ranging from 0.6 μg/h/U cellulase to 2.7 μg/h/U cellulase, with an average rate of release of 1.5 μg/h/U cellulase. The detected hydrolysis rate is the hydrolysis yield potential (HYP) for each sorghum variety. Additionally, pretreatment with ammonium hydroxide increased the rate of hydrolysis by an average of 1.9 fold, yet did not correlate with non-pretreated hydrolysis yield potential. This study identifies specific sorghum varieties with high HYP and sets the stage for the genetic mapping of HYP genes.
Published: 2010
Full Text: View/download PDF

65. The first genome-level transcriptome of the wood-degrading fungus Phanerochaete chrysosporium grown on red oak

Author: Shin Sato, Ming Tien, Prashanti R. Iyer, and F. Alex Feltus
Subjects: DNA, Complementary, Sequence analysis, Molecular Sequence Data, Cellulase, Phanerochaete, Lignin, Microbiology, Fungal Proteins, Quercus, chemistry.chemical_compound, Polysaccharides, Genetics, Cellulases, Cluster Analysis, RNA, Messenger, Cellulose, Gene Library, Chrysosporium, Expressed Sequence Tags, Expressed sequence tag, Base Sequence, biology, Reverse Transcriptase Polymerase Chain Reaction, Gene Expression Profiling, beta-Glucosidase, Fungal genetics, Sequence Analysis, DNA, General Medicine, Lignin peroxidase, biology.organism_classification, chemistry, Biochemistry, biology.protein, Genome, Fungal
Abstract: As part of an effort to determine all the gene products involved in wood degradation, we have performed massively parallel pyrosequencing on an expression library from the white rot fungus Phanerochaete chrysosporium grown in shallow stationary cultures with red oak as the carbon source. Approximately 48,000 high quality sequence tags (246 bp average length) were generated. 53% of the sequence tags aligned to 4,262 P. chrysosporium gene models, and an additional 18.5% of the tags reliably aligned to the P. chrysosporium genome providing evidence for 961 putative novel fragmented gene models. Due to their role in lignocellulose degradation, the secreted proteins were focused upon. Our results show that the four enzymes required for cellulose degradation: endocellulase, exocellulase CBHI, exocellulase CBHII, and beta-glucosidase are all produced. For hemicellulose degradation, not all known enzymes were produced, but endoxylanases, acetyl xylan esterases and mannosidases were detected. For lignin degradation, the role of peroxidases has been questioned; however, our results show that lignin peroxidase is highly expressed along with the H₂O₂-generating enzyme, alcohol oxidase. The transcriptome snapshot reveals that H₂O₂ generation and utilization are central in wood degradation. Our results also reveal new transcripts that encode extracellular proteins with no known function.
Published: 2009
Full Text: View/download PDF

66. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)

Author: Kerr Wall, Alexandre Dionne-Laporte, F. Alex Feltus, Rafael Navajas-Pérez, Jon Y. Suzuki, Wei Wang, Matthew Jones, Yun Feng, Shaobin Hou, Christine M. Ackerman, Yun J. Zhu, Qingyi Yu, Thomas Mitchell-Olds, Paul H. Moore, Jiming Jiang, Wubin Qian, Lei Wang, Peng Du, Dorothy E. Shippen, Yan Ren, Brad W. Porter, Henrik H. Albert, Jyothi Thimmapuram, Chao Liu, Andrea R. Gschwend, Claude W. dePamphilis, Ming-Cheng Luo, Jeffrey D. Palmer, Robert E. Paull, Maya Devi Paidi, Ming Li Wang, Jianmei Wang, Vikki Friedman, Steven L. Salzberg, Andrew H. Paterson, Rachel L. Skelton, Yingjun Li, Eric Lyons, Moriah Eustice, Danny W. Rice, David A. Christopher, Savarni Tripathi, Pavel Senin, Junguo Shen, Tak Sugimura, David R. Nelson, Peizhu Guan, Gernot G. Presting, John E. Bowers, Neupane Kabi Raj, Jan E. Murray, Hairong Wei, Brian J. Haas, Stephen M. Mount, Haibao Tang, Dennis Gonsalves, Arthur L. Delcher, Aaron J. Windsor, Ricelle A. Acob, Andrea Blas, A. Max Burroughs, Ning Jiang, Ray Ming, Xiyin Wang, Maqsudul Alam, Cuixia Chen, Eric J. Tong, Manuel J. Torres, Michael Freeling, Mary A. Schuler, Beth Irikura, Lu Feng, Ching Man Wai, Eugene V. Shakirov, Jong Kuk Na, Jimmy H. Saw, Kanako L. T. Lewis, Todd P. Michael, Benjamin V. Ly, Michael C. Schatz, Lei Liu, Ratnesh Singh, Wenli Zhang, Jianping Wang, and Niranjan Nagarajan
Subjects: Nuclear gene, Molecular Sequence Data, Arabidopsis, Biology, Plant disease resistance, Genes, Plant, Genome, Article, Contig Mapping, Databases, Genetic, Gene, Genetics, Whole genome sequencing, Tropical Climate, Multidisciplinary, Carica, food and beverages, Sequence Analysis, DNA, Plants, Genetically Modified, biology.organism_classification, Chloroplast DNA, Sequence Alignment, Functional genomics, Genome, Plant, Transcription Factors
Abstract: Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of ‘SunUp’ papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.
Published: 2008
Full Text: View/download PDF

67. Meta-analysis of Polyploid Cotton QTL Shows Unequal Contributions of Subgenomes to a Complex Network of Genes and Gene Clusters Implicated in Lint Fiber Development

Author: Thea A. Wilkins, C. Wayne Smith, Yehoshua Saranga, Xavier Draye, Junkang Rong, V. N. Waghmare, O. Lloyd May, Peng W. Chee, F. Alex Feltus, John R. Gannaway, Gary J. Pierce, Jonathan F. Wendel, Andrew H. Paterson, and Robert J. Wright
Subjects: Genetics, Gossypium, DNA, Plant, Quantitative Trait Loci, Plant genetics, Chromosome Mapping, food and beverages, Investigations, Biology, Quantitative trait locus, Genes, Plant, Genome, Polyploidy, Phenotype, Family-based QTL mapping, Polyploid, Multigene Family, Mutation, Cotton Fiber, Ploidy, Gene, Crosses, Genetic, Genome, Plant, Synteny
Abstract: QTL mapping experiments yield heterogeneous results due to the use of different genotypes, environments, and sampling variation. Compilation of QTL mapping results yields a more complete picture of the genetic control of a trait and reveals patterns in organization of trait variation. A total of 432 QTL mapped in one diploid and 10 tetraploid interspecific cotton populations were aligned using a reference map and depicted in a CMap resource. Early demonstrations that genes from the non-fiber-producing diploid ancestor contribute to tetraploid lint fiber genetics gain further support from multiple populations and environments and advanced-generation studies detecting QTL of small phenotypic effect. Both tetraploid subgenomes contribute QTL at largely non-homeologous locations, suggesting divergent selection acting on many corresponding genes before and/or after polyploid formation. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely-related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fiber variation involves a complex network of interacting genes. Members of the lint fiber development network appear clustered, with cluster members showing heterogeneous phenotypic effects. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks.
Published: 2007
Full Text: View/download PDF

68. Chromosomal location and gene paucity of the male specific region on papaya Y chromosome

Author: Qingyi Yu, Boris Vyskot, Roman Hobza, Rachel L. Skelton, Cornelia Lemke, Jimmy H. Saw, Xiue Wang, Paul H. Moore, Jiming Jiang, Weiwei Jin, Maqsudul Alam, Shaobin Hou, Andrea Blas, Andrew H. Paterson, Ray Ming, and F. Alex Feltus
Subjects: Genetics, Chromosomes, Artificial, Bacterial, Carica, Chromosome Mapping, General Medicine, Sex Determination Processes, Biology, Genes, Plant, Y chromosome, Chromosomes, Plant, digestive system diseases, Chromosome 17 (human), Chromosome 16, Chromosome 3, Chromosome 18, Chromosome 19, Chromosome 21, Molecular Biology, In Situ Hybridization, Fluorescence, Chromosome 12, Repetitive Sequences, Nucleic Acid
Abstract: Sex chromosomes in flowering plants evolved recently and many of them remain homomorphic, including those in papaya. We investigated the chromosomal location of papaya's small male specific region of the hermaphrodite Y (Yh) chromosome (MSY) and its genomic features. We conducted chromosome fluorescence in situ hybridization mapping of Yh-specific bacterial artificial chromosomes (BACs) and placed the MSY near the centromere of the papaya Y chromosome. Then we sequenced five MSY BACs to examine the genomic features of this specialized region, which resulted in the largest collection of contiguous genomic DNA sequences of a Y chromosome in flowering plants. Extreme gene paucity was observed in the papaya MSY with no functional gene identified in 715 kb MSY sequences. A high density of retroelements and local sequence duplications were detected in the MSY that is suppressed for recombination. Location of the papaya MSY near the centromere might have provided recombination suppression and fostered paucity of genes in the male specific region of the Y chromosome. Our findings provide critical information for deciphering the sex chromosomes in papaya and reference information for comparative studies of other sex chromosomes in animals and plants.
Published: 2007
Full Text: View/download PDF

69. DNA motifs associated with aberrant CpG island methylation

Author: F. Alex Feltus, Joseph F. Costello, Christoph Plass, Eva K. Lee, and Paula M. Vertino
Subjects: Alu, Alu element, Repetitive DNA, Biology, Transfection, 03 medical and health sciences, 0302 clinical medicine, Epigenetics of physical exercise, Alu Elements, Genetics, Humans, Epigenetics, Repeated sequence, Cells, Cultured, 030304 developmental biology, 0303 health sciences, DNA methylation, Base Sequence, DNA, Sequence Analysis, DNA, Methylation, Discriminant analysis, Differentially methylated regions, Classification techniques, CpG site, 030220 oncology & carcinogenesis, CpG Islands, Algorithms
Abstract: Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms (http://meme.sdsc.edu). The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.
Published: 2006
Full Text: View/download PDF

70. Molecular Biology of the 3β-Hydroxysteroid Dehydrogenase/Δ5-Δ4 Isomerase Gene Family

Author: Marie-Louise Ricketts, Sebastien Gingras, Penny Soucy, F. Alex Feltus, Michael H. Melner, and Jacques Simard
Subjects: Genetics, Regulation of gene expression, endocrine system, HSD3B2 Gene, Endocrinology, Diabetes and Metabolism, Biology, Molecular biology, Endocrinology, HSD3B1, Gene expression, HSD3B2, Gene family, Signal transduction, Gene
Abstract: The 3beta-hydroxysteroid dehydrogenase/Delta(5)-Delta(4) isomerase (3beta-HSD) isoenzymes are responsible for the oxidation and isomerization of Delta(5)-3beta-hydroxysteroid precursors into Delta(4)-ketosteroids, thus catalyzing an essential step in the formation of all classes of active steroid hormones. In humans, expression of the type I isoenzyme accounts for the 3beta-HSD activity found in placenta and peripheral tissues, whereas the type II 3beta-HSD isoenzyme is predominantly expressed in the adrenal gland, ovary, and testis, and its deficiency is responsible for a rare form of congenital adrenal hyperplasia. Phylogeny analyses of the 3beta-HSD gene family strongly suggest that the need for different 3beta-HSD genes occurred very late in mammals, with subsequent evolution in a similar manner in other lineages. Therefore, to a large extent, the 3beta-HSD gene family should have evolved to facilitate differential patterns of tissue- and cell-specific expression and regulation involving multiple signal transduction pathways, which are activated by several growth factors, steroids, and cytokines. Recent studies indicate that HSD3B2 gene regulation involves the orphan nuclear receptors steroidogenic factor-1 and dosage-sensitive sex reversal adrenal hypoplasia congenita critical region on the X chromosome gene 1 (DAX-1). Other findings suggest a potential regulatory role for STAT5 and STAT6 in transcriptional activation of HSD3B2 promoter. It was shown that epidermal growth factor (EGF) requires intact STAT5; on the other hand IL-4 induces HSD3B1 gene expression, along with IL-13, through STAT 6 activation. However, evidence suggests that multiple signal transduction pathways are involved in IL-4 mediated HSD3B1 gene expression. 
Indeed, a better understanding of the transcriptional factors responsible for the fine control of 3beta-HSD gene expression may provide insight into mechanisms involved in the functional cooperation between STATs and nuclear receptors as well as their potential interaction with other signaling transduction pathways such as GATA proteins. Finally, the elucidation of the molecular basis of 3beta-HSD deficiency has highlighted the fact that mutations in the HSD3B2 gene can result in a wide spectrum of molecular repercussions, which are associated with the different phenotypic manifestations of classical 3beta-HSD deficiency and also provide valuable information concerning the structure-function relationships of the 3beta-HSD superfamily. Furthermore, several recent studies using type I and type II purified enzymes have elegantly further characterized structure-function relationships responsible for kinetic differences and coenzyme specificity.
Published: 2005
Full Text: View/download PDF

71. An SNP Resource for Rice Genetics and Breeding Based on Subspecies Indica and Japonica Genome Alignments

Author: Andrew H. Paterson, F. Alex Feltus, Stefan R. Schulze, Jun Wan, Ning Jiang, and James C. Estill
Subjects: Recombination, Genetic, Genetics, Positional cloning, Molecular Sequence Data, Genetic Variation, food and beverages, Oryza, Single-nucleotide polymorphism, Breeding, Tag SNP, Biology, Polymorphism, Single Nucleotide, Genome, Resources, SNP genotyping, Genetic marker, Gene pool, Indel, Sequence Alignment, Genome, Plant, Sorghum, Genetics (clinical)
Abstract: Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% ± 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% ± 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp.
Published: 2004
Full Text: View/download PDF

72. Glucocorticoids enhance activation of the human type II 3β-hydroxysteroid dehydrogenase/Δ5–Δ4 isomerase gene

Author: Michael H. Melner, William J. Kovacs, Sebastien Gingras, Wendell E. Nicholson, Barbara J. Clark, F. Alex Feltus, Jacques Simard, and Stéphanie Côté
Subjects: Hydrocortisone, Transcription, Genetic, Endocrinology, Diabetes and Metabolism, Clinical Biochemistry, Response element, Steroid Isomerases, Biochemistry, Dexamethasone, Endocrinology, Glucocorticoid receptor, STAT5 Transcription Factor, Tumor Cells, Cultured, Enzyme Inhibitors, STAT5, Steroidogenic acute regulatory protein, Milk Proteins, Aminoglutethimide, Neoplasm Proteins, DNA-Binding Proteins, medicine.anatomical_structure, Tetradecanoylphorbol Acetate, Molecular Medicine, hormones, hormone substitutes, and hormone antagonists, Glucocorticoid, medicine.drug, endocrine system, medicine.medical_specialty, animal structures, Biology, Response Elements, Receptors, Glucocorticoid, Anterior pituitary, Multienzyme Complexes, Internal medicine, medicine, Humans, RNA, Messenger, Glucocorticoids, Molecular Biology, Hormone response element, Progesterone Reductase, Tumor Suppressor Proteins, Cell Biology, Phosphoproteins, Adrenal Cortex Neoplasms, Enzyme Activation, Trans-Activators, biology.protein, HeLa Cells
Abstract: Glucocorticoids indirectly alter adrenocortical steroid output through the inhibition of ACTH secretion by the anterior pituitary. However, previous studies suggest that glucocorticoids can directly affect adrenocortical steroid production. Therefore, we have investigated the ability of glucocorticoids to affect transcription of adrenocortical steroid biosynthetic enzymes. One potential target of glucocorticoid action in the adrenal is an enzyme critical for adrenocortical steroid production: 3beta-hydroxysteroid dehydrogenase/Delta5-Delta4 isomerase (3beta-HSD). Treatment of the adrenocortical cell line (H295R) with the glucocorticoid agonist dexamethasone (DEX) increased cortisol production and 3beta-HSD mRNA levels alone or in conjunction with phorbol ester. This increase in 3beta-HSD mRNA was paralleled by increases in Steroidogenic Acute Regulatory Protein (StAR) mRNA levels. The human type II 3beta-HSD promoter lacks a consensus palindromic glucocorticoid response element (GRE) but does contain a Stat5 response element (Stat5RE) suggesting that glucocorticoids could affect type II 3beta-HSD transcription via interaction with Stat5. Transfection experiments show enhancement of human type II 3beta-HSD promoter activity by coexpression of the glucocorticoid receptor (GR) and Stat5A and treatment with 100nM dexamethasone. Furthermore, removal of the Stat5RE either by truncation of the 5' flanking sequence in the promoter or introduction of point mutations to the Stat5RE abolished the ability of DEX to enhance 3beta-HSD promoter activity. These studies demonstrate the ability of glucocorticoids to directly enhance the expression of an adrenal steroidogenic enzyme gene albeit independent of a consensus palindromic glucocorticoid response element.
Published: 2002
Full Text: View/download PDF

73. BAC sequencing using pooled methods

Author: Christopher A. Saski, F. Alex Feltus, Laxmi Parida, and Niina Haiminen
Subjects: DNA, Bacterial, Isopropyl Thiogalactoside, Chromosomes, Artificial, Bacterial, Genomic Library, Indoles, High-Throughput Nucleotide Sequencing, Galactosides, DNA Restriction Enzymes, Sequence Analysis, DNA, DNA Fingerprinting, Contig Mapping, Escherichia coli, Genome, Bacterial, Software
Abstract: Shotgun sequencing and assembly of a large, complex genome can be both expensive and challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are main factors that plague de novo genome sequencing projects that typically result in highly fragmented assemblies and are difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Published: 2014

74. Systems genetics: a paradigm to improve discovery of candidate genes and mechanisms underlying complex traits

Author: F. Alex Feltus
Subjects: Genetics, Candidate gene, Systems Biology, Quantitative Trait Loci, Plant Science, General Medicine, Computational biology, Biology, Quantitative trait locus, Plants, Genome, Quantitative Trait, Heritable, Gene interaction, Gene mapping, Expression quantitative trait loci, Agronomy and Crop Science, Functional genomics, Gene, Genetic Association Studies
Abstract: Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits.
Published: 2013

75. Identification of bioconversion quantitative trait loci in the interspecific cross Sorghum bicolor × Sorghum propinquum

Author: Andrew H. Paterson, Valorie H. Goff, F. Alex Feltus, Wenqian Kong, Huizhe Jin, and Joshua P. Vandenbrink
Subjects: Genetic Markers, Genotype, Genetic Linkage, Population, Quantitative Trait Loci, Carbohydrates, Biomass, Lignocellulosic biomass, Quantitative trait locus, Breeding, Genes, Plant, complex mixtures, Zea mays, X-Ray Diffraction, Bioenergy, Genetics, Plant breeding, education, Crosses, Genetic, Sorghum, education.field_of_study, biology, food and beverages, Chromosome Mapping, General Medicine, biology.organism_classification, Phenotype, Agronomy, Agronomy and Crop Science, Sweet sorghum, Biotechnology
Abstract: For lignocellulosic bioenergy to be economically viable, genetic improvements must be made in feedstock quality including both biomass total yield and conversion efficiency. Toward this goal, multiple studies have considered candidate genes and discovered quantitative trait loci (QTL) associated with total biomass accumulation and/or grain production in bioenergy grass species including maize and sorghum. However, very little research has been focused on genes associated with increased biomass conversion efficiency. In this study, Trichoderma viride fungal cellulase hydrolysis activity was measured for lignocellulosic biomass (leaf and stem tissue) obtained from individuals in a F5 recombinant inbred Sorghum bicolor × Sorghum propinquum mapping population. A total of 49 QTLs (20 leaf, 29 stem) were associated with enzymatic conversion efficiency. Interestingly, six high-density QTL regions were identified in which four or more QTLs overlapped. In addition to enzymatic conversion efficiency QTLs, two QTLs were identified for biomass crystallinity index, a trait which has been shown to be inversely correlated with conversion efficiency in bioenergy grasses. The identification of these QTLs provides an important step toward identifying specific genes relevant to increasing conversion efficiency of bioenergy feedstocks. DNA markers linked to these QTLs could be useful in marker-assisted breeding programs aimed at increasing overall bioenergy yields concomitant with selection of high total biomass genotypes.
Published: 2013

76. Sequencing papaya X and Yh chromosomes reveals molecular basis of incipient sex chromosome evolution

Author: Ray Ming, Jennifer Han, Paul H. Moore, Jiming Jiang, Cornelia Lemke, Ming Li Wang, Rafael Navajas-Pérez, Rishi Aryal, Ching Man Wai, Andrea R. Gschwend, Fanchang Zeng, Jianping Wang, F. Alex Feltus, Robert VanBuren, Cuixia Chen, Eric J. Tong, Jan E. Murray, Maqsudul Alam, Qingyi Yu, Wenli Zhang, Jong Kuk Na, Deborah Charlesworth, Ratnesh Singh, Xiang Jia Min, and Andrew H. Paterson
Subjects: Genetics, Chromosomes, Artificial, Bacterial, Multidisciplinary, Sex Chromosomes, Models, Genetic, Retroelements, Sequence analysis, Carica, Molecular Sequence Data, Chromosome, Chromosome Mapping, Sequence Analysis, DNA, Biology, Biological Sciences, DNA sequencing, Chromosomes, Plant, Evolution, Molecular, Molecular evolution, Chromosome Duplication, Chromosome Inversion, Homologous chromosome, Gene, X chromosome, Chromosomal inversion, Repetitive Sequences, Nucleic Acid
Abstract: Sex determination in papaya is controlled by a recently evolved XY chromosome pair, with two slightly different Y chromosomes controlling the development of males (Y) and hermaphrodites (Yh). To study the events of early sex chromosome evolution, we sequenced the hermaphrodite-specific region of the Yh chromosome (HSY) and its X counterpart, yielding an 8.1-megabase (Mb) HSY pseudomolecule, and a 3.5-Mb sequence for the corresponding X region. The HSY is larger than the X region, mostly due to retrotransposon insertions. The papaya HSY differs from the X region by two large-scale inversions, the first of which likely caused the recombination suppression between the X and Yh chromosomes, followed by numerous additional chromosomal rearrangements. Altogether, including the X and/or HSY regions, 124 transcription units were annotated, including 50 functional pairs present in both the X and HSY. Ten HSY genes had functional homologs elsewhere in the papaya autosomal regions, suggesting movement of genes onto the HSY, whereas the X region had none. Sequence divergence between 70 transcripts shared by the X and HSY revealed two evolutionary strata in the X chromosome, corresponding to the two inversions on the HSY, the older of which evolved about 7.0 million years ago. Gene content differences between the HSY and X are greatest in the older stratum, whereas the gene content and order of the collinear regions are identical. Our findings support theoretical models of early sex chromosome evolution.
Published: 2012

77. Saccharinae Bioinformatics Resources

Author: Alan R. Gingle and F. Alex Feltus
Subjects: Data access, Reference genome sequence, Computer science, Network data, Saccharinae, Bioinformatics, computer.software_genre, Data type, computer, Data integration
Abstract: The primary goal of this chapter is to provide practical information for utilizing the array of Saccharinae bioinformatics resources that are presently available. The chapter begins with the description of a survey of Saccharinae bioinformatics resources that was undertaken early in 2010. Resources are categorized by life science area(s), available data types, and modes of data access. Navigating resources and searching for Saccharinae data is then described through a broad collection of search examples that cover categories ranging from maps, markers, and genomic sequence through transcriptome-, proteome-, and biochemistry-related data. Data integration, as means for providing answers to more complex biological questions, is discussed in terms of existing applications of reference genome sequence and possible future applications of co-expression network data.
Published: 2012
Full Text: View/download PDF

78. Construction of physical maps for the sex-specific regions of papaya sex chromosomes

Author: Jong-Kuk Na, Andrew H. Paterson, Rafael Pérez, Wenli Zhang, Jan E. Murray, Qingyi Yu, F. Alex Feltus, Paul H. Moore, Jiming Jiang, Jianping Wang, Cuixia Chen, Andrea R. Gschwend, Zdenek Kubat, and Ray Ming
Subjects: Genetic Markers, 0106 biological sciences, Chromosomes, Artificial, Bacterial, lcsh:QH426-470, Heterochromatin, lcsh:Biotechnology, Biology, 01 natural sciences, Chromosomes, Plant, 03 medical and health sciences, Chromosome regions, lcsh:TP248.13-248.65, Genetics, Primer walking, Crosses, Genetic, X chromosome, 030304 developmental biology, Recombination, Genetic, 2. Zero hunger, 0303 health sciences, Bacterial artificial chromosome, Bacterial artificial chromosome (BAC), Base Sequence, Carica papaya, Carica, Physical Chromosome Mapping, Chromosome, Sex chromosomes, Sex determination, lcsh:Genetics, Suppression of recombination, Sex linkage, Research Article, Microsatellite Repeats, 010606 plant biology & botany, Biotechnology
Abstract: Background: Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions and uncovering the early events of sex chromosome evolution and to identify the sex determination genes for crop improvement. Results: A reiterate chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapped BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb including a 900 Kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion. Conclusion: The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2–3 million years ago. The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
Published: 2012

79. Novel nuclear intron-spanning primers for Arecaceae evolutionary biology

Author: F. Alex Feltus, Andrew H. Paterson, Christine D. Bacon, and C. Donovan Bailey
Subjects: Genetics, Nuclear gene, Evolutionary biology, law, Intron, Arecaceae, Biology, biology.organism_classification, Ecology, Evolution, Behavior and Systematics, Polymerase chain reaction, Biotechnology, law.invention
Abstract: In this study, 96 nuclear ‘conserved intron-scanning primers’ were screened across subfamilies of the Arecaceae (palms) for potential use in research focused on palm evolutionary biology. Primers were evaluated based on their ability to amplify single polymerase chain reaction products in Arecaceae, the clarity of sequencing reads, and the interspecific variability observed. Ultimately, the results suggest that: (i) seven of the loci are likely to be suitable when comparing non-Arecaceae outgroups and Arecaceae ingroups; (ii) seven loci may be of use when comparing subfamilies of Arecaceae; and (iii) four of the loci may be of use when comparing closely related genera.
Published: 2011

80. Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

Author: Niina Haiminen, F. Alex Feltus, and Laxmi Parida
Subjects: Chromosomes, Artificial, Bacterial, lcsh:QH426-470, In silico, lcsh:Biotechnology, Pooling, Arabidopsis, Genomics, Hybrid genome assembly, Computational biology, Biology, Genome, lcsh:TP248.13-248.65, Genetics, Genomic library, Base Pairing, Genomic Library, Shotgun sequencing, Sequence Analysis, DNA, Reference Standards, lcsh:Genetics, DNA microarray, Genome, Plant, Research Article, Biotechnology
Abstract: Background: We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. Conclusions: BAC pooling works better than WGS, however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
Published: 2011

81. The Sorghum bicolor genome and the diversification of grasses

Author: Eric Lyons, Haibao Tang, Manuel Spannagl, Uffe Hellsten, Heidrun Gundlach, Daniel S. Rokhsar, Alan R. Gingle, Jane Grimwood, Doreen Ware, Maureen C. McCann, Joachim Messing, Igor V. Grigoriev, C. Thomas Hash, Thomas Wicker, Therese Mitros, Rémy Bruggmann, Yu Wang, Inna Dubchak, Mihaela Martis, Georg Haberer, Bryan W. Penning, Andrew H. Paterson, Beat Keller, Ray Ming, Klaus F. X. Mayer, John E. Bowers, F. Alex Feltus, Peter Westhoff, Stephen Kresovich, Lifang Zhang, Mehboob-ur-Rahman, Jeremy Schmutz, Udo Gowik, Alexander Poliakov, Christopher G. Maher, Apurva Narechania, Patricia E. Klein, Nicholas C. Carpita, Asaf Salamov, Daniel G. Peterson, Robert Otillar, Xiyin Wang, Jarrod Chapman, Arvind K. Bharti, Michael Freeling, and University of Zurich
Subjects: Arabidopsis, Genomics, Retrotransposon, 580 Plants (Botany), Genes, Plant, Poaceae, Zea mays, Genome, Chromosomes, Plant, Evolution, Molecular, 10126 Department of Plant and Microbial Biology, Gene Duplication, Botany, Genome size, Gene, Sorghum, Sequence Deletion, Recombination, Genetic, Genetics, 1000 Multidisciplinary, Multidisciplinary, Concerted evolution, biology, food and beverages, Oryza, Sequence Analysis, DNA, biology.organism_classification, Populus, Sequence Alignment, Sweet sorghum, Genome, Plant
Abstract: Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
Published: 2009
Full Text: View/download PDF

82. Editorial: AutoImmune Premature Ovarian Failure—Endocrine Aspects of a T Cell Disease1

Author: F. Alex Feltus and Michael H. Melner
Subjects: Autoimmune disease, medicine.medical_specialty, endocrine system diseases, medicine.medical_treatment, T cell, Autoantibody, Disease, Biology, medicine.disease, medicine.disease_cause, Autoimmunity, Premature ovarian failure, Thymectomy, Endocrinology, medicine.anatomical_structure, Antigen, Internal medicine, Immunology, medicine
Abstract: Premature ovarian failure (POF) is the loss of ovarian function in women less than 40 yr of age (reviewed in Ref. 1). It is associated with sex steroid deficiency, amenorrhea, infertility, and elevated serum gonadotropins. While there are multiple etiologies of POF including the exposure to iatrogenic treatments (chemotherapy, radiation), viral agents, and rare genetic disorders, in most patients no etiology can be identified (idiopathic POF). Significant evidence suggests that autoimmunity is a cause of some forms of ovarian failure although specific ovarian antigens are not known and the mechanisms of autoimmune disease development are unclear. Autoimmune POF in humans is frequently associated with other manifestations of autoimmune disease. For example, POF can precede the onset of Addison’s disease or adrenal autoimmunity leading to a deficiency of adrenocortical hormones (1). Autoimmune POF is characterized by inflammatory infiltration of developing follicles, production of antiovarian antibodies, atrophy, and sparing of primordial follicles (1-3). Autoantibodies in these diseases sometimes react with common antigens in steroid-producing cells of the ovary and adrenal cortex. Common antigens identified have been steroidogenic enzymes including P450 side-chain cleavage (P450scc), 17α-hydroxylase, and 3β-hydroxysteroid dehydrogenase (4-6). The identification of specific antigens involved in POF is important for multiple reasons. First, the development of appropriate reagents to screen for the presence of antibodies to these antigens could provide an analytical tool for diagnosing the disease, identifying patients at risk for developing the disease, and detecting patients who may respond to immune-modulating therapies. Second, these tools could be used in research to further understand the mechanisms of disease development and the mechanisms of ovarian pathology associated with the disease.
Lastly, the identification of these antigens provides new information on novel proteins in the ovary and their potential function. Animal models of autoimmune premature ovarian failure have yielded important insight into both potential mechanisms of autoimmune disease development and ovarian antigens that may affect disease progression. These models for autoimmune ovarian failure can be induced by multiple methods such as immunization with specific ovarian antigens or neonatal thymectomy in specific genetic strains of mice. A detailed review of the findings will not be repeated here but some major developments will be summarized. The most important development in these animal models of autoimmune ovarian failure comes from multiple studies, all suggesting that the basis of the disease is a cell-mediated autoimmune reaction caused by an alteration in T cell regulation (1, 3). This is most evident in the neonatal thymectomy animal model. The removal of the thymus in specific genetic strains of mice (e.g. BALB/c or A/J) between postnatal days 2 and 5 results in autoimmune ovarian failure. There is a progressive onset of the disease that is potentiated by puberty and the most severe inflammation occurs between 4-14 weeks after thymectomy (3). The proposed mechanism of the disease (3) is that autoreactive T cells (CD4+) are generated during normal processes such as apoptosis of follicles in the ovary. These autoreactive cells are normally controlled by CD4+ T cells with suppressor activity. However, because these cells are generated in the thymus after the first week of life, neonatal thymectomy results in a dramatic loss in T cells with suppressor function. This animal model strongly implicates T cell regulation in the disease process.
Published: 1999
Full Text: View/download PDF

83. Genomics of Sorghum, a Semi-Arid Cereal and Emerging Model for Tropical Grass Genomics

Author: John E. Bowers, F. Alex Feltus, and Andrew H. Paterson
Subjects: Crop, biology, Agronomy, fungi, food and beverages, Tropics, Genomics, Sorghum, biology.organism_classification, Genome, Functional genomics, Sweet sorghum, Arid
Abstract: Sorghum, an important failsafe crop in the global agroecosystem, is also emerging as a model for tropical grasses based on its small and well-characterized genome, low level of gene duplication, and close relationship to the larger and more complex genomes of maize and sugarcane. A whole-genome shotgun sequence of the sorghum genome is complete and being annotated. The sorghum sequence, together with the attributes of sorghum as a prospective functional genomics and association genetics system, has many implications for better understanding the structure, function, and evolution of cereal genomes. In addition, the sequence will raise to a new level the opportunities to engage genomics in the improvement of human livelihood in arid and semi-arid tropical regions in which sorghum is a staple. Already established as a seed-based ethanol crop, progress in understanding the genetic control of perenniality in sorghum makes it also promising as a cellulosic biofuels crop.
Published: 2008
Full Text: View/download PDF

84. Low X/Y divergence in four pairs of papaya sex-linked genes

Author: Lei Liu, Shaobin Hou, Andrew H. Paterson, F. Alex Feltus, Qingyi Yu, Jan E. Murray, Matthew Jones, Paul H. Moore, Jiming Jiang, Richard C. Moore, Olivia Veatch, Ray Ming, Maqsudul Alam, Jimmy H. Saw, Cornelia Lemke, and Jyothi Thimmapuram
Subjects: Genetics, Bacterial artificial chromosome, Chromosomes, Artificial, Bacterial, Sex Chromosomes, X Chromosome, Carica, Chromosome, Chromosomal translocation, Karyotype, Cell Biology, Plant Science, Biology, Sex Determination Processes, Y chromosome, Chromosomes, Plant, Evolution, Molecular, Molecular evolution, Y Chromosome, Animals, Small supernumerary marker chromosome, Sex linkage
Abstract: Summary: Sex chromosomes in flowering plants, in contrast to those in animals, evolved relatively recently and only a few are heteromorphic. The homomorphic sex chromosomes of papaya show features of incipient sex chromosome evolution. We investigated the features of paired X- and Y-specific bacterial artificial chromosomes (BACs), and estimated the time of divergence in four pairs of sex-linked genes. We report the results of a comparative analysis of long contiguous genomic DNA sequences between the X and hermaphrodite Y (Yh) chromosomes. Numerous chromosomal rearrangements were detected in the male-specific region of the Y chromosome (MSY), including inversions, deletions, insertions, duplications and translocations, showing the dynamic evolutionary process on the MSY after recombination ceased. DNA sequence expansion was documented in the two regions of the MSY, demonstrating that the cytologically homomorphic sex chromosomes are heteromorphic at the molecular level. Analysis of sequence divergence between four X and Yh gene pairs resulted in an estimated age of divergence of between 0.5 and 2.2 million years, supporting a recent origin of the papaya sex chromosomes. Our findings indicate that sex chromosomes did not evolve at the family level in Caricaceae, and reinforce the theory that sex chromosomes evolve at the species level in some lineages.
Published: 2007

85. Molecular biology of the 3beta-hydroxysteroid dehydrogenase/delta5-delta4 isomerase gene family

Author: Jacques Simard, Marie-Louise Ricketts, Sébastien Gingras, Penny Soucy, F. Alex Feltus, and Michael H. Melner
Subjects: Male, Base Sequence, Progesterone Reductase, Placenta, Molecular Sequence Data, Steroid Isomerases, Gene Expression Regulation, Enzymologic, Evolution, Molecular, Isoenzymes, Structure-Activity Relationship, Species Specificity, Multienzyme Complexes, Organ Specificity, Pregnancy, Adrenal Glands, Animals, Humans, Female, Amino Acid Sequence, Gonads, Promoter Regions, Genetic, Phylogeny
Abstract: The 3beta-hydroxysteroid dehydrogenase/Delta(5)-Delta(4) isomerase (3beta-HSD) isoenzymes are responsible for the oxidation and isomerization of Delta(5)-3beta-hydroxysteroid precursors into Delta(4)-ketosteroids, thus catalyzing an essential step in the formation of all classes of active steroid hormones. In humans, expression of the type I isoenzyme accounts for the 3beta-HSD activity found in placenta and peripheral tissues, whereas the type II 3beta-HSD isoenzyme is predominantly expressed in the adrenal gland, ovary, and testis, and its deficiency is responsible for a rare form of congenital adrenal hyperplasia. Phylogeny analyses of the 3beta-HSD gene family strongly suggest that the need for different 3beta-HSD genes occurred very late in mammals, with subsequent evolution in a similar manner in other lineages. Therefore, to a large extent, the 3beta-HSD gene family should have evolved to facilitate differential patterns of tissue- and cell-specific expression and regulation involving multiple signal transduction pathways, which are activated by several growth factors, steroids, and cytokines. Recent studies indicate that HSD3B2 gene regulation involves the orphan nuclear receptors steroidogenic factor-1 and dosage-sensitive sex reversal adrenal hypoplasia congenita critical region on the X chromosome gene 1 (DAX-1). Other findings suggest a potential regulatory role for STAT5 and STAT6 in transcriptional activation of the HSD3B2 promoter. It was shown that epidermal growth factor (EGF) requires intact STAT5; on the other hand, IL-4 induces HSD3B1 gene expression, along with IL-13, through STAT6 activation. However, evidence suggests that multiple signal transduction pathways are involved in IL-4-mediated HSD3B1 gene expression.
Indeed, a better understanding of the transcriptional factors responsible for the fine control of 3beta-HSD gene expression may provide insight into mechanisms involved in the functional cooperation between STATs and nuclear receptors as well as their potential interaction with other signaling transduction pathways such as GATA proteins. Finally, the elucidation of the molecular basis of 3beta-HSD deficiency has highlighted the fact that mutations in the HSD3B2 gene can result in a wide spectrum of molecular repercussions, which are associated with the different phenotypic manifestations of classical 3beta-HSD deficiency and also provide valuable information concerning the structure-function relationships of the 3beta-HSD superfamily. Furthermore, several recent studies using type I and type II purified enzymes have elegantly further characterized structure-function relationships responsible for kinetic differences and coenzyme specificity.
Published: 2005

86. The repetitive landscape of the chicken genome

Author: Robert Ivarie, Jon S. Robertson, Jason A. Morrison, F. Alex Feltus, Vincent Magrini, Stefan R. Schulze, Thomas Wicker, Andrew H. Paterson, Daniel G. Peterson, Elaine R. Mardis, and Richard K. Wilson
Subjects: Genetics, Genome evolution, Genome, Retroelements, Genetic Vectors, Terminal Repeat Sequences, Sequence assembly, Hybrid genome assembly, Genome project, Computational biology, Biology, Evolution, Molecular, Cot analysis, DNA Transposable Elements, Direct repeat, Animals, Chicken Special/Letters, Chickens, Genetics (clinical), Reference genome, Gene Library, Repetitive Sequences, Nucleic Acid
Abstract: Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, and low-copy fractions of the chicken genome. Sequencing high-copy DNA of chicken to about 2.7× coverage of its estimated sequence complexity led to the initial identification of several new repeat families, which were then used for a survey of the newly released first draft of the complete chicken genome. The analysis provided insight into the diversity and biology of known repeat structures such as CR1 and CNM, for which only limited sequence data had previously been available. Cot sequence data also resulted in the identification of four novel repeats (Birddawg, Hitchcock, Kronos, and Soprano), two new subfamilies of CR1 repeats, and many elements absent from the chicken genome assembly. Multiple autonomous elements were found for a novel Mariner-like transposon, Galluhop, in addition to nonautonomous deletion derivatives. Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies. This study also exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available.
Published: 2004

87. Conserved Non-Coding Regulatory Signatures in Arabidopsis Co-Expressed Gene Modules

Author: Michael Freeling, Jacob B. Spangler, Stephen P. Ficklin, F. Alex Feltus, and Feng Luo
Subjects: 0106 biological sciences, RNA, Untranslated, Plant Evolution, Arabidopsis, Gene regulatory network, lcsh:Medicine, Genetic Networks, Plant Science, Biology, Models, Biological, 01 natural sciences, Genome, Transcriptomes, Conserved sequence, Molecular Genetics, 03 medical and health sciences, Gene Expression Regulation, Plant, Genome Analysis Tools, Gene Duplication, Plant Genomics, Arabidopsis thaliana, Gene Regulatory Networks, Gene Regulation, lcsh:Science, Genome Evolution, Gene, Conserved Sequence, 030304 developmental biology, Regulatory Networks, Regulation of gene expression, Genetics, 0303 health sciences, Multidisciplinary, Gene Expression Profiling, Systems Biology, lcsh:R, Chromosome Mapping, Computational Biology, Genomics, biology.organism_classification, Gene expression profiling, lcsh:Q, Genome Expression Analysis, Research Article, 010606 plant biology & botany
Abstract: Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome.
Published: 2012
Full Text: View/download PDF

88. A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling

Author: Cemal Erdem, Arnab Mutsuddy, Ethan M. Bensman, William B. Dodd, Michael M. Saint-Antoine, Mehdi Bouhaddou, Robert C. Blake, Sean M. Gross, Laura M. Heiser, F. Alex Feltus, and Marc R. Birtwistle
Subjects: Science
Abstract: Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. Here the authors develop a scalable, open-source pipeline for constructing and simulating large-scale, single-cell mechanistic models, an important building block for clinically-predictive mechanistic models and interpretable big data integration.
Published: 2022
Full Text: View/download PDF

89. NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles

Author: Benafsh Husain, Allison R. Hickman, Yuqing Hang, Benjamin T. Shealy, Karan Sapra, and F. Alex Feltus
Subjects: mutual information, differential RNA expression, cerebellar gene regulatory network, Genetics, QH426-470
Abstract: Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
Published: 2020
Full Text: View/download PDF
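
Illustrative note on record 89: the abstract's point that correlation metrics can miss non-linear dependencies that mutual information (MI) detects can be seen in a small sketch. This is not the NetExtractor implementation; the equal-width binning, the bin count, the function names, and the synthetic y = x² data are all assumptions chosen for illustration, and the Gaussian mixture model step described in the abstract is omitted.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def bin_index(v, lo, hi, bins):
    """Map v to an equal-width bin in [lo, hi]; the maximum lands in the last bin."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)

def mutual_information(xs, ys, bins=4):
    """Histogram-based MI estimate in bits (hypothetical helper, equal-width bins)."""
    n = len(xs)
    bx = [bin_index(x, min(xs), max(xs), bins) for x in xs]
    by = [bin_index(y, min(ys), max(ys), bins) for y in ys]
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in joint.items()
    )

# Synthetic, symmetric, non-monotonic relationship: y = x^2 on [-2, 2].
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

print("Pearson r:", pearson(xs, ys))            # close to zero by symmetry
print("MI (bits):", mutual_information(xs, ys))  # clearly positive
```

Because y is fully determined by x here, the near-zero Pearson value reflects the metric's blindness to this shape, not an absence of dependence; the MI estimate depends on the binning and should be read qualitatively.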

90. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study

Author: Stephen P. Ficklin, Leland J. Dunwoodie, William L. Poehlman, Christopher Watson, Kimberly E. Roche, and F. Alex Feltus
Subjects: Medicine, Science
Abstract: A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
Published: 2017
Full Text: View/download PDF

91. EdgeCrafting: mining embedded, latent, nonlinear patterns to construct gene relationship networks.

Author: B. Husain, M. Reed Bender, and F. Alex Feltus
Subjects: Algorithms, Phenotype, Gene Expression Profiling methods, Gene Regulatory Networks
Abstract: The mechanisms that coordinate cellular gene expression are highly complex and intricately interconnected. Thus, it is necessary to move beyond a fully reductionist approach to understanding genetic information flow and begin focusing on the networked connections between genes that organize cellular function. Continued advancements in computational hardware, coupled with the development of gene correlation network algorithms, provide the capacity to study networked interactions between genes rather than their isolated functions. For example, gene coexpression networks are used to construct gene relationship networks using linear metrics such as Spearman or Pearson correlation. Recently, there have been tools designed to deepen these analyses by differentiating between intrinsic vs extrinsic noise within gene expression values, identifying different modules based on tissue phenotype, and capturing potential nonlinear relationships. In this report, we introduce an algorithm with a novel application of image-based segmentation modalities utilizing blob detection techniques applied for detecting bigenic edges in a gene expression matrix. We applied this algorithm, called EdgeCrafting, to a bulk RNA-sequencing gene expression matrix comprising healthy kidney and cancerous kidney data. We then compared EdgeCrafting against 4 other RNA expression analysis techniques: Weighted Gene Correlation Network Analysis, Knowledge Independent Network Construction, NetExtractor, and differential gene expression analysis. (© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.)
Published: 2022
Full Text: View/download PDF

92. RNA-seq analyses of Arabidopsis thaliana seedlings after exposure to blue-light phototropic stimuli in microgravity.

Author: Vandenbrink JP, Herranz R, Poehlman WL, Feltus FA, Villacampa A, Ciska M, Medina FJ, and Kiss JZ
Subjects: Seedlings, Arabidopsis, Arabidopsis Proteins, Space Flight, Weightlessness
Abstract: Premise: Plants synthesize information from multiple environmental stimuli when determining their direction of growth. Gravity, being ubiquitous on Earth, plays a major role in determining the direction of growth and overall architecture of the plant. Here, we utilized the microgravity environment on board the International Space Station (ISS) to identify genes involved in influencing growth and development of phototropically stimulated seedlings of Arabidopsis thaliana. Methods: Seedlings were grown on the ISS, and RNA was extracted from 7 samples (pools of 10-15 plants) grown in microgravity (μg) or Earth gravity conditions (1-g). Transcriptomic analyses via RNA sequencing (RNA-seq) of differential gene expression were performed using the HISAT2-Stringtie-DESeq2 RNASeq pipeline. Differentially expressed genes were further characterized by using Pathway Analysis and enrichment for Gene Ontology classifications. Results: For 296 genes that were found significantly differentially expressed between plants in microgravity compared to 1-g controls, Pathway Analysis identified eight molecular pathways that were significantly affected by reduced gravity conditions. Specifically, light-associated pathways (e.g., photosynthesis-antenna proteins, photosynthesis, porphyrin, and chlorophyll metabolism) were significantly downregulated in microgravity. Conclusions: Gene expression in A. thaliana seedlings grown in microgravity was significantly altered compared to that of the 1-g control. Understanding how plants grow in conditions of microgravity not only aids in our understanding of how plants grow and respond to the environment but will also help to efficiently grow plants during long-range space missions. (© 2019 Botanical Society of America.)
Published: 2019
Full Text: View/download PDF
