17 results on '"Parrello B"'
Search Results
2. The RAST Server: Rapid Annotations using Subsystems Technology
- Author
-
Overbeek Ross A, Osterman Andrei L, Olson Robert, Olsen Gary J, Meyer Folker, Kubal Michael, Glass Elizabeth M, Gerdes Svetlana, Formsma Kevin, Edwards Robert A, Disz Terrence, DeJongh Matthew, Best Aaron A, Bartels Daniela, Aziz Ramy K, McNeil Leslie K, Paarmann Daniel, Paczian Tobias, Parrello Bruce, Pusch Gordon D, Reich Claudia, Stevens Rick, Vassieva Olga, Vonstein Veronika, Wilke Andreas, and Zagnitko Olga
- Subjects
Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Abstract Background The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.
- Published
- 2008
- Full Text
- View/download PDF
3. Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance.
- Author
-
Wattam AR, Bowers N, Brettin T, Conrad N, Cucinell C, Davis JJ, Dickerman AW, Dietrich EM, Kenyon RW, Machi D, Mao C, Nguyen M, Olson RD, Overbeek R, Parrello B, Pusch GD, Shukla M, Stevens RL, Vonstein V, and Warren AS
- Subjects
- Humans, Software, Genome, Bacterial, Anti-Bacterial Agents pharmacology, Web Browser, United States, National Institute of Allergy and Infectious Diseases (U.S.), Genomics methods, Computational Biology methods, Drug Resistance, Bacterial genetics, Databases, Genetic, Bacteria genetics, Bacteria drug effects
- Abstract
As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/ . The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource., (© 2024. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.)
- Published
- 2024
- Full Text
- View/download PDF
4. Predicting variable gene content in Escherichia coli using conserved genes.
- Author
-
Nguyen M, Elmore Z, Ihle C, Moen FS, Slater AD, Turner BN, Parrello B, Best AA, and Davis JJ
- Subjects
- Escherichia coli genetics, Genome, Bacterial genetics, Plasmids, Escherichia coli Proteins genetics, Anti-Infective Agents
- Abstract
Having the ability to predict the protein-encoding gene content of an incomplete genome or metagenome-assembled genome is important for a variety of bioinformatic tasks. In this study, as a proof of concept, we built machine learning classifiers for predicting variable gene content in Escherichia coli genomes using only the nucleotide k-mers from a set of 100 conserved genes as features. Protein families were used to define orthologs , and a single classifier was built for predicting the presence or absence of each protein family occurring in 10%-90% of all E. coli genomes. The resulting set of 3,259 extreme gradient boosting classifiers had a per-genome average macro F1 score of 0.944 [0.943-0.945, 95% CI]. We show that the F1 scores are stable across multi-locus sequence types and that the trend can be recapitulated by sampling a smaller number of core genes or diverse input genomes. Surprisingly, the presence or absence of poorly annotated proteins, including "hypothetical proteins" was accurately predicted (F1 = 0.902 [0.898-0.906, 95% CI]). Models for proteins with horizontal gene transfer-related functions had slightly lower F1 scores but were still accurate (F1s = 0.895, 0.872, 0.824, and 0.841 for transposon, phage, plasmid, and antimicrobial resistance-related functions, respectively). Finally, using a holdout set of 419 diverse E. coli genomes that were isolated from freshwater environmental sources, we observed an average per-genome F1 score of 0.880 [0.876-0.883, 95% CI], demonstrating the extensibility of the models. Overall, this study provides a framework for predicting variable gene content using a limited amount of input sequence data. IMPORTANCE Having the ability to predict the protein-encoding gene content of a genome is important for assessing genome quality, binning genomes from shotgun metagenomic assemblies, and assessing risk due to the presence of antimicrobial resistance and other virulence genes. In this study, we built a set of binary classifiers for predicting the presence or absence of variable genes occurring in 10%-90% of all publicly available E. coli genomes. Overall, the results show that a large portion of the E. coli variable gene content can be predicted with high accuracy, including genes with functions relating to horizontal gene transfer. This study offers a strategy for predicting gene content using limited input sequence data., Competing Interests: The authors declare no conflict of interest.
- Published
- 2023
- Full Text
- View/download PDF
5. Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning.
- Author
-
Hanke P, Parrello B, Vasieva O, Akins C, Chlenski P, Babnigg G, Henry C, Foflonker F, Brettin T, Antonopoulos D, Stevens R, and Fonstein M
- Abstract
The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh , metL , dapA , and dhaM genes as well as overexpression of the pntAB , ppc , and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models., Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (©2023PublishedbyElsevierB.V.)
- Published
- 2023
- Full Text
- View/download PDF
6. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR.
- Author
-
Olson RD, Assaf R, Brettin T, Conrad N, Cucinell C, Davis JJ, Dempsey DM, Dickerman A, Dietrich EM, Kenyon RW, Kuscuoglu M, Lefkowitz EJ, Lu J, Machi D, Macken C, Mao C, Niewiadomska A, Nguyen M, Olsen GJ, Overbeek JC, Parrello B, Parrello V, Porter JS, Pusch GD, Shukla M, Singh I, Stewart L, Tan G, Thomas C, VanOeffelen M, Vonstein V, Wallace ZS, Warren AS, Wattam AR, Xia F, Yoo H, Zhang Y, Zmasek CM, Scheuermann RH, and Stevens RL
- Subjects
- Humans, Bacteria genetics, Computational Biology, Databases, Genetic, Influenza, Human, Genomics, Viruses genetics, Software
- Abstract
The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities., (Published by Oxford University Press on behalf of Nucleic Acids Research 2022.)
- Published
- 2023
- Full Text
- View/download PDF
7. Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC.
- Author
-
Parrello B, Butler R, Chlenski P, Pusch GD, and Overbeek R
- Subjects
- Cluster Analysis, Humans, Sequence Analysis, DNA methods, Metagenome, Metagenomics methods
- Abstract
Large amounts of metagenomically-derived data are submitted to PATRIC for analysis. In the future, we expect even more jobs submitted to PATRIC will use metagenomic data. One in-demand use case is the extraction of near-complete draft genomes from assembled contigs of metagenomic origin. The PATRIC metagenome binning service utilizes the PATRIC database to furnish a large, diverse set of reference genomes. We provide a new service for supervised extraction and annotation of high-quality, near-complete genomes from metagenomically-derived contigs. Reference genomes are assigned to putative draft genome bins based on the presence of single-copy universal marker roles in the sample, and contigs are sorted into these bins by their similarity to reference genomes in PATRIC. Each set of binned contigs represents a draft genome that will be annotated by RASTtk in PATRIC. A structured-language binning report is provided containing quality measurements and taxonomic information about the contig bins. The PATRIC metagenome binning service emphasizes extraction of high-quality genomes for downstream analysis using other PATRIC tools and services. Due to its supervised nature, the binning service is not appropriate for mining novel or extremely low-coverage genomes from metagenomic samples., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2021
- Full Text
- View/download PDF
8. The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.
- Author
-
Davis JJ, Wattam AR, Aziz RK, Brettin T, Butler R, Butler RM, Chlenski P, Conrad N, Dickerman A, Dietrich EM, Gabbard JL, Gerdes S, Guard A, Kenyon RW, Machi D, Mao C, Murphy-Olson D, Nguyen M, Nordberg EK, Olsen GJ, Olson RD, Overbeek JC, Overbeek R, Parrello B, Pusch GD, Shukla M, Thomas C, VanOeffelen M, Vonstein V, Warren AS, Xia F, Xie D, Yoo H, and Stevens R
- Subjects
- Algorithms, Animals, Caenorhabditis elegans genetics, Chickens genetics, Drosophila melanogaster genetics, Host-Pathogen Interactions genetics, Humans, Internet, Macaca mulatta genetics, Metagenomics, Mice, National Institute of Allergy and Infectious Diseases (U.S.), Phenotype, Phylogeny, Rats, Swine genetics, United States, Zebrafish genetics, Bacteria genetics, Computational Biology methods, Databases, Genetic
- Abstract
The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data., (Published by Oxford University Press on behalf of Nucleic Acids Research 2019.)
- Published
- 2020
- Full Text
- View/download PDF
9. A machine learning-based service for estimating quality of genomes using PATRIC.
- Author
-
Parrello B, Butler R, Chlenski P, Olson R, Overbeek J, Pusch GD, Vonstein V, and Overbeek R
- Subjects
- Metagenomics standards, Software, Databases, Genetic, Genome, Archaeal, Genome, Bacterial, Machine Learning, Metagenomics methods
- Abstract
Background: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel., Description: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome.We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies., Conclusion: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.
- Published
- 2019
- Full Text
- View/download PDF
10. PATRIC as a unique resource for studying antimicrobial resistance.
- Author
-
Antonopoulos DA, Assaf R, Aziz RK, Brettin T, Bun C, Conrad N, Davis JJ, Dietrich EM, Disz T, Gerdes S, Kenyon RW, Machi D, Mao C, Murphy-Olson DE, Nordberg EK, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Santerre J, Shukla M, Stevens RL, VanOeffelen M, Vonstein V, Warren AS, Wattam AR, Xia F, and Yoo H
- Subjects
- Computational Biology trends, Genome, Microbial, Humans, Internet, Molecular Sequence Annotation, Computational Biology methods, Databases, Genetic statistics & numerical data, Drug Resistance, Microbial genetics, Systems Integration
- Abstract
The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes., (© The Author 2017. Published by Oxford University Press.)
- Published
- 2019
- Full Text
- View/download PDF
11. KBase: The United States Department of Energy Systems Biology Knowledgebase.
- Author
-
Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia JM, Chia JM, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, and Yu D
- Subjects
- Humans, United States, Computational Biology methods, Database Management Systems trends, Knowledge Bases, Systems Biology trends
- Published
- 2018
- Full Text
- View/download PDF
12. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center.
- Author
-
Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, Conrad N, Dietrich EM, Disz T, Gabbard JL, Gerdes S, Henry CS, Kenyon RW, Machi D, Mao C, Nordberg EK, Olsen GJ, Murphy-Olson DE, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Vonstein V, Warren A, Xia F, Yoo H, and Stevens RL
- Subjects
- Anti-Bacterial Agents pharmacology, Bacteria drug effects, Bacteria metabolism, Bacterial Proteins genetics, Bacterial Proteins metabolism, Drug Resistance, Bacterial, Molecular Sequence Annotation, Proteome, Proteomics methods, Software, Web Browser, Bacteria genetics, Computational Biology methods, Databases, Genetic, Genome, Bacterial, Genomics methods
- Abstract
The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics., (Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.)
- Published
- 2017
- Full Text
- View/download PDF
13. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation.
- Author
-
Faria JP, Davis JJ, Edirisinghe JN, Taylor RC, Weisenhorn P, Olson RD, Stevens RL, Rocha M, Rocha I, Best AA, DeJongh M, Tintle NL, Parrello B, Overbeek R, and Henry CS
- Abstract
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus , and Staphylococcus aureus , each of which represents increasing degrees of phylogenetic distance from E. coli . Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.
- Published
- 2016
- Full Text
- View/download PDF
14. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.
- Author
-
Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Thomason JA 3rd, Stevens R, Vonstein V, Wattam AR, and Xia F
- Subjects
- Molecular Sequence Annotation methods, Software
- Abstract
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
- Published
- 2015
- Full Text
- View/download PDF
15. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).
- Author
-
Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, and Stevens R
- Subjects
- Bacterial Proteins chemistry, Bacterial Proteins genetics, Bacterial Proteins physiology, Genomics, Internet, Software, Databases, Genetic, Genome, Archaeal, Genome, Bacterial, Molecular Sequence Annotation
- Abstract
In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
- Published
- 2014
- Full Text
- View/download PDF
16. SEED servers: high-performance access to the SEED genomes, annotations, and metabolic models.
- Author
-
Aziz RK, Devoid S, Disz T, Edwards RA, Henry CS, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Stevens RL, Vonstein V, and Xia F
- Subjects
- Escherichia coli genetics, Escherichia coli metabolism, Genomics methods, Genomics statistics & numerical data, Internet, Metabolomics methods, Metabolomics statistics & numerical data, Molecular Sequence Annotation methods, Molecular Sequence Annotation statistics & numerical data, Reproducibility of Results, Computational Biology methods, Databases, Factual statistics & numerical data, Information Storage and Retrieval methods, Software
- Abstract
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.
- Published
- 2012
- Full Text
- View/download PDF
17. The RAST Server: rapid annotations using subsystems technology.
- Author
-
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, and Zagnitko O
- Subjects
- Databases, Nucleic Acid, Genes, rRNA genetics, Genome, Archaeal, Genome, Bacterial, Open Reading Frames genetics, Phylogeny, Proteins genetics, RNA, Transfer genetics, Reproducibility of Results, Sensitivity and Specificity, Time Factors, User-Computer Interface, Computational Biology methods
- Abstract
Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them., Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service., Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.
- Published
- 2008
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.