22 results on '"Benjamin C. Hitz"'
Search Results
2. The ENCODE Uniform Analysis Pipelines
- Author
Benjamin C. Hitz, Jin-Wook Lee, Otto Jolanki, Meenakshi S. Kagda, Keenan Graham, Paul Sud, Idan Gabdank, J. Seth Strattan, Cricket A. Sloan, Timothy Dreszer, Laurence D. Rowe, Nikhil R. Podduturi, Venkat S. Malladi, Esther T. Chan, Jean M. Davidson, Marcus Ho, Stuart Miyasato, Matt Simison, Forrest Tanaka, Yunhai Luo, Ian Whaling, Eurie L. Hong, Brian T. Lee, Richard Sandstrom, Eric Rynes, Jemma Nelson, Andrew Nishida, Alyssa Ingersoll, Michael Buckley, Mark Frerker, Daniel S Kim, Nathan Boley, Diane Trout, Alex Dobin, Sorena Rahmanian, Dana Wyman, Gabriela Balderrama-Gutierrez, Fairlie Reese, Neva C. Durand, Olga Dudchenko, David Weisz, Suhas S. P. Rao, Alyssa Blackburn, Dimos Gkountaroulis, Mahdi Sadr, Moshe Olshansky, Yossi Eliaz, Dat Nguyen, Ivan Bochkov, Muhammad Saad Shamim, Ragini Mahajan, Erez Aiden, Tom Gingeras, Simon Heath, Martin Hirst, W. James Kent, Anshul Kundaje, Ali Mortazavi, Barbara Wold, and J. Michael Cherry
- Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses. Database URL: https://www.encodeproject.org/
- Published
- 2023
3. Annotating and prioritizing human non-coding variants with RegulomeDB
- Author
Shengcheng Dong, Nanxiang Zhao, Emma Spragins, Meenakshi S. Kagda, Mingjie Li, Pedro Assis, Otto Jolanki, Yunhai Luo, J Michael Cherry, Alan P Boyle, and Benjamin C Hitz
- Abstract
Nearly 90% of the disease risk-associated variants identified from genome-wide association studies (GWAS) are in non-coding regions of the genome. The annotations obtained from analyzing functional genomics assays can provide additional information to pinpoint causal variants, which are often not the lead variants identified from association studies. However, the lack of available annotation tools limits the use of such data. To address the challenge, we have previously built the RegulomeDB database for prioritizing and annotating variants in non-coding regions [1], which has been a highly utilized resource for the research community (Supplementary Fig. 1). RegulomeDB annotates a variant by intersecting its position with genomic intervals identified from functional genomic assays and computational approaches. It also incorporates those hits of a variant into a heuristic ranking score, representing its potential to be functional in regulatory elements. Here we present a newer version of the RegulomeDB web server, RegulomeDB v2.1 (http://regulomedb.org). We improve and boost annotation power by incorporating thousands of newly processed data from functional genomic assays in GRCh38 assembly, and now include probabilistic scores from the SURF algorithm that was the top performing non-coding variant predictor in CAGI 5 [2]. We also provide interactive charts and genome browser views to allow users an easy way to perform exploratory analyses in different tissue contexts.
- Published
- 2022
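Illustrative sketch for the RegulomeDB entry above: the abstract says a variant is annotated by intersecting its position with genomic intervals. The snippet below shows one simple way such an intersection could look. It is not RegulomeDB's implementation; the `Interval` type, the `annotate_variant` function, the evidence labels, the sample coordinates, and the 0-based half-open coordinate convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)
    label: str  # hypothetical evidence label


def annotate_variant(chrom, pos, intervals):
    """Return the labels of every interval that contains the given position."""
    return [
        iv.label
        for iv in intervals
        if iv.chrom == chrom and iv.start <= pos < iv.end
    ]


# Made-up intervals standing in for assay-derived regions.
EXAMPLE_INTERVALS = [
    Interval("chr1", 100, 200, "chip_peak"),
    Interval("chr1", 150, 300, "dnase_peak"),
    Interval("chr2", 100, 200, "motif"),
]

print(annotate_variant("chr1", 160, EXAMPLE_INTERVALS))
```

A linear scan keeps the idea visible; a production tool would more plausibly use an indexed structure such as an interval tree, and would feed the collected hits into a separate scoring step.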
4. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal
- Author
Casey Litton, Zachary Myers, Ulugbek K. Baymuradov, Benjamin C. Hitz, Meenakshi S. Kagda, Otto Jolanki, Jin-Wook Lee, Stuart R. Miyasato, Keenan Graham, Idan Gabdank, Forrest Y. Tanaka, Bonita R. Lam, J. Seth Strattan, Jason A. Hilton, J. Michael Cherry, Yunhai Luo, Philip Adenekan, Paul Sud, Emma O'Neill, Jennifer Jou, and Khine Lin
- Subjects
Interoperability; Cloud computing; Data_CODINGANDINFORMATIONTHEORY; Biology; ENCODE; World Wide Web; Mice; 03 medical and health sciences; 0302 clinical medicine; Documentation; Software; Databases, Genetic; Genetics; Database Issue; Animals; Humans; 030304 developmental biology; 0303 health sciences; Genome, Human; business.industry; DNA; Genomics; Visualization; Open data; Encyclopedia; business; 030217 neurology & neurosurgery
- Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
- Published
- 2019
5. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
- Author
Joel Rozowsky, Jorg Drenkow, Yucheng T Yang, Gamze Gursoy, Timur Galeev, Beatrice Borsari, Charles B Epstein, Kun Xiong, Jinrui Xu, Jiahao Gao, Keyang Yu, Ana Berthel, Zhanlin Chen, Fabio Navarro, Jason Liu, Maxwell S Sun, James Wright, Justin Chang, Christopher JF Cameron, Noam Shoresh, Elizabeth Gaskell, Jessika Adrian, Sergey Aganezov, François Aguet, Gabriela Balderrama-Gutierrez, Samridhi Banskota, Guillermo Barreto Corona, Sora Chee, Surya B Chhetri, Gabriel Conte Cortez Martins, Cassidy Danyko, Carrie A Davis, Daniel Farid, Nina P Farrell, Idan Gabdank, Yoel Gofin, David U Gorkin, Mengting Gu, Vivian Hecht, Benjamin C Hitz, Robbyn Issner, Melanie Kirsche, Xiangmeng Kong, Bonita R Lam, Shantao Li, Bian Li, Tianxiao Li, Xiqi Li, Khine Zin Lin, Ruibang Luo, Mark Mackiewicz, Jill E Moore, Jonathan Mudge, Nicholas Nelson, Chad Nusbaum, Ioann Popov, Henry E Pratt, Yunjiang Qiu, Srividya Ramakrishnan, Joe Raymond, Leonidas Salichos, Alexandra Scavelli, Jacob M Schreiber, Fritz J Sedlazeck, Lei Hoon See, Rachel M Sherman, Xu Shi, Minyi Shi, Cricket Alicia Sloan, J Seth Strattan, Zhen Tan, Forrest Y Tanaka, Anna Vlasova, Jun Wang, Jonathan Werner, Brian Williams, Min Xu, Chengfei Yan, Lu Yu, Christopher Zaleski, Jing Zhang, Kristin Ardlie, J Michael Cherry, Eric M Mendenhall, William S Noble, Zhiping Weng, Morgan E Levine, Alexander Dobin, Barbara Wold, Ali Mortazavi, Bing Ren, Jesse Gillis, Richard M Myers, Michael P Snyder, Jyoti Choudhary, Aleksandar Milosavljevic, Michael C Schatz, Roderic Guigó, Bradley E Bernstein, Thomas R Gingeras, and Mark Gerstein
- Subjects
Genetic variants; Genomics; Preprint; Computational biology; Biology; Personal genomics
- Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of personal epigenomes, for ∼25 tissues and >10 assays in four donors (>1500 open-access functional genomic and proteomic datasets, in total). Each dataset is mapped to a matched, diploid personal genome, which has long-read phasing and structural variants. The mappings enable us to identify >1 million loci with allele-specific behavior. These loci exhibit coordinated epigenetic activity along haplotypes and less conservation than matched, non-allele-specific loci, in a fashion broadly paralleling tissue-specificity. Surprisingly, they can be accurately modelled just based on local nucleotide-sequence context. Combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci and enables models for transferring known eQTLs to difficult-to-profile tissues. Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
- Published
- 2021
6. The ENCODE Portal as an Epigenomics Resource
- Author
J. Seth Strattan, Khine Lin, Keenan Graham, Casey Litton, Emma O'Neill, Philip Adenekan, Jason A. Hilton, Paul Sud, Benjamin C. Hitz, Idan Gabdank, J. Michael Cherry, Yunhai Luo, Forrest Y. Tanaka, Zachary Myers, Jennifer Jou, Stuart R. Miyasato, Ulugbek K. Baymuradov, Otto Jolanki, Meenakshi S. Kagda, Jin-Wook Lee, and Bonita R. Lam
- Subjects
Epigenomics; Computer science; Genomics; ENCODE; Article; 03 medical and health sciences; Mice; Data file; Databases, Genetic; Animals; Humans; Protocol (object-oriented programming); 030304 developmental biology; 0303 health sciences; Internet; Metadata; Information retrieval; Genome, Human; 030305 genetics & heredity; General Medicine; DNA; DNA Methylation; Metadata modeling; Chromatin; ComputingMethodologies_PATTERNRECOGNITION; Human genome; Software
- Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal. Support Protocol 1: Batch downloading. Support Protocol 2: Using the cart to download files. Support Protocol 3: Visualize data. Alternate Protocol: Query building and programmatic access.
- Published
- 2019
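Illustrative sketch for the entry above: the abstract mentions programmatic search via a REST API. The snippet below shows how a search URL for the portal might be assembled and fetched with the Python standard library. The `/search/` path and the `type`, `format`, and `limit` query parameters reflect the portal's public search interface as commonly documented, but treat them, the `assay_title` filter, and the helper names `build_search_url` and `fetch_json` as assumptions to check against the portal's own help pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PORTAL = "https://www.encodeproject.org"


def build_search_url(object_type, limit=5, **filters):
    """Assemble a search URL; extra keyword arguments become query filters."""
    params = {"type": object_type, "format": "json", "limit": limit}
    params.update(filters)
    return f"{PORTAL}/search/?{urlencode(params)}"


def fetch_json(url):
    """Request a URL and decode the JSON body (performs a network call)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Only the URL is built here, so running this block needs no network access.
print(build_search_url("Experiment", limit=2, assay_title="ATAC-seq"))
```

Passing the printed URL to `fetch_json` would return the parsed response; which key holds the result list is left for the reader to confirm in the portal documentation.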
7. ENCODE data at the ENCODE portal
- Author
Forrest Y. Tanaka, Esther T. Chan, Marcus Ho, Cricket A. Sloan, Nikhil R. Podduturi, J. Seth Strattan, Eurie L. Hong, Jean M. Davidson, Benjamin C. Hitz, Brian T. Lee, Greg Roe, Timothy R. Dreszer, Laurence D. Rowe, Idan Gabdank, Aditi K. Narayanan, Venkat S. Malladi, and J. Michael Cherry
- Subjects
0301 basic medicine; Genomics; Computational biology; Biology; ENCODE; Genome; Mice; 03 medical and health sciences; Databases, Genetic; Genetics; Animals; Humans; Database Issue; Gene; Genome, Human; Proteins; DNA; Visualization; Metadata; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Genes; DNA methylation; RNA; Human genome
- Abstract
The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
- Published
- 2015
8. Prevention of data duplication for high throughput sequencing repositories
- Author
J. Seth Strattan, Carrie A. Davis, Forrest Y. Tanaka, Benjamin C. Hitz, J. Michael Cherry, Keenan Graham, Jean M. Davidson, Jason A. Hilton, Idan Gabdank, Kathrina C. Onate, Stuart R. Miyasato, Otto Jolanki, Timothy R. Dreszer, Esther T. Chan, Aditi K. Narayanan, Ulugbek K. Baymuradov, and Cricket A. Sloan
- Subjects
0301 basic medicine; Computer science; business.industry; Extramural; MEDLINE; Computational biology; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Text mining; Data deduplication; Original Article; Databases, Nucleic Acid; General Agricultural and Biological Sciences; business; Data Curation; 030217 neurology & neurosurgery; Information Systems
- Abstract
Prevention of unintended duplication is one of the ongoing challenges many databases have to address. Working with high-throughput sequencing data, the complexity of that challenge increases with the complexity of the definition of a duplicate. In a computational data model, a data object represents a real entity like a reagent or a biosample. This representation is similar to how a card represents a book in a paper library catalog. Duplicated data objects not only waste storage, they can mislead users into assuming the model represents more than the single entity. Even if it is clear that two objects represent a single entity, data duplication opens the door to potential inconsistencies between the objects since the content of the duplicated objects can be updated independently, allowing divergence of the metadata associated with the objects. This is analogous to a situation in which a paper library catalog mistakenly contains two cards for a single copy of a book. If these cards simultaneously list two different individuals as the current borrower, it would be difficult to determine which of the two actually has the book. Unfortunately, in a large database with multiple submitters, unintended duplication is to be expected. In this article, we present three principal guidelines the Encyclopedia of DNA Elements (ENCODE) Portal follows in order to prevent unintended duplication of both actual files and data objects: definition of identifiable data objects (I), object uniqueness validation (II) and de-duplication mechanism (III). In addition to explaining our modus operandi, we elaborate on the methods used for identification of sequencing data files. Comparison of the approach taken by the ENCODE Portal vs other widely used biological data repositories is provided. Database URL: https://www.encodeproject.org/
- Published
- 2018
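Illustrative sketch for the entry above: the abstract lists object uniqueness validation as one of three guidelines but does not say how it is done. The snippet below shows one generic way to reject a second registration of the same file content, keyed on a SHA-256 digest. The choice of digest, the `FileRegistry` class, the `DuplicateObjectError` exception, and the sample identifiers are all assumptions for illustration and should not be read as the ENCODE Portal's actual mechanism.

```python
import hashlib


class DuplicateObjectError(ValueError):
    """Raised when identical content is registered under a second identifier."""


class FileRegistry:
    def __init__(self):
        # Maps content digest -> identifier of the first object registered.
        self._by_digest = {}

    def register(self, identifier, content):
        """Record content under identifier, refusing duplicates of known content."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None and existing != identifier:
            raise DuplicateObjectError(
                f"{identifier} duplicates content already stored as {existing}"
            )
        self._by_digest[digest] = identifier
        return digest


registry = FileRegistry()
registry.register("FILE_A", b"ACGTACGT")  # hypothetical identifier and content
```

Checking at submission time, rather than cleaning up afterwards, matches the library-card analogy in the abstract: the second card is never filed.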
9. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now
- Author
Edith D. Wong, Maria C. Costanzo, Dianna G. Fisk, Marek S. Skrzypek, Selina S. Dwight, Fred S. Dietrich, Paul Lloyd, Robert S. Nash, Kalpana Karra, Stacia R. Engel, Gail Binkley, Matt Simison, J. Michael Cherry, Benjamin C. Hitz, Stuart R. Miyasato, Rama Balakrishnan, and Shuai Weng
- Subjects
Databases, Factual; Sequence analysis; Saccharomyces cerevisiae; Investigations; ENCODE; genome release; Genome; Open Reading Frames; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Genetics; model organism; Molecular Biology; Genetics (clinical); 030304 developmental biology; Whole genome sequencing; Internet; 0303 health sciences; biology; reference sequence; Chromosome Mapping; Sequence Analysis, DNA; Genome project; S288C; biology.organism_classification; Yeast; Genome, Fungal; 030217 neurology & neurosurgery; Reference genome
- Abstract
The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.
- Published
- 2014
10. Annotation of functional variation in personal genomes using RegulomeDB
- Author
J. Michael Cherry, Shuai Weng, Konrad J. Karczewski, Manoj Hariharan, Alan P. Boyle, Michael Snyder, Marc A. Schaub, Maya Kasowski, Benjamin C. Hitz, Julie Park, Eurie L. Hong, and Yong Cheng
- Subjects
Resource; Nonsynonymous substitution; Genotype; Genome-wide association study; Computational biology; Regulatory Sequences, Nucleic Acid; Biology; ENCODE; Polymorphism, Single Nucleotide; Genome; Open Reading Frames; Annotation; Databases, Genetic; Genetics; Humans; Lupus Erythematosus, Systemic; Tumor Necrosis Factor alpha-Induced Protein 3; Genetics (clinical); Internet; Genome, Human; Intracellular Signaling Peptides and Proteins; Genetic Variation; Nuclear Proteins; Molecular Sequence Annotation; DNA-Binding Proteins; Human genome; Genome-Wide Association Study; Personal genomics
- Abstract
As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
- Published
- 2012
11. Saccharomyces Genome Database: the genomics resource of budding yeast
- Author
Marek S. Skrzypek, Eurie L. Hong, Edith D. Wong, Cynthia J. Krieger, Selina S. Dwight, Stuart R. Miyasato, Maria C. Costanzo, Robert S. Nash, Jodi E. Hirschman, Esther T. Chan, Kalpana Karra, Benjamin C. Hitz, Julie Park, Dianna G. Fisk, J. Michael Cherry, Karen R. Christie, Shuai Weng, Matt Simison, Rama Balakrishnan, Stacia R. Engel, Gail Binkley, and Craig Amundsen
- Subjects
Genes, Fungal; Saccharomyces cerevisiae; Genomics; Genome browser; Computational biology; Saccharomyces; Genome; 03 medical and health sciences; 0302 clinical medicine; Terminology as Topic; Databases, Genetic; Web page; Genetics; 030304 developmental biology; 0303 health sciences; biology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Articles; biology.organism_classification; Phenotype; ComputingMethodologies_PATTERNRECOGNITION; Encyclopedia; Genome, Fungal; Software; 030217 neurology & neurosurgery
- Abstract
The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.
- Published
- 2011
12. The Gene Ontology: enhancements for 2011
- Author
P D'Eustachio, Benjamin C. Hitz, Julie Park, Paul Browne, Douglas G. Howe, Cynthia J. Krieger, Kalpana Karra, Stan Laulederkind, Karen R. Christie, Susan Tweedie, Eurie L. Hong, Lydie Bougueleret, Michele Magrane, Cathy R. Gresham, Rolf Apweiler, Lisa Matthews, Dong Li, Philippa J. Talmud, Ioannis Xenarios, J. M. Cherry, Tanya Z. Berardini, Deborah A. Siegele, Rama Balakrishnan, D. Sitnikov, A. Auchinchloss, Selina S. Dwight, Tony Sawford, Paul J. Kersey, Ruth C. Lovering, Ruth Y. Eberhardt, Ursula Hinz, Lakshmi Pillai, Sylvain Poux, Edith D. Wong, Klemens Pichler, Kati Laiho, Malcolm J. Gardner, Stephen G. Oliver, Lionel Breuza, Kara Dolinski, P Lemercier, Kristian B. Axelsen, Midori A. Harris, Adrienne E. Zweifel, H. Drabkin, Guillaume Keller, Marek S. Skrzypek, Daniel M. Staines, Fiona M. McCarthy, Nicholas H. Brown, Mark D. McDowall, Antonia Lock, Mary Shimoyama, Maria C. Costanzo, Teresia Buza, S. Jimenez, Rex L. Chisholm, Paul W. Sternberg, Hui Wang, Nadine Gruaz-Gumowski, Chantal Hulo, Rebecca E. Foulger, Melinda R. Dwinell, Judith A. Blake, Marcus C. Chibucos, B. K. McIntosh, C. D. Amundsen, Jane Lomax, L Famiglietti, Tom Hayman, Michael Tognolli, Eva Huala, James C. Hu, Patrick Masson, Maria Jesus Martin, Benoit Bely, Shuai Weng, Heather C. Wick, E. Dimmer, L. Ni, Catherine Rivoire, Christopher J. Mungall, H. Sehra, P. Duek-Roggli, Maria Victoria Schneider, Dianna G. Fisk, Michael S. Livstone, Ivo Pedruzzi, Shyamala Sundaram, Donna K. Slonim, Isabelle Cusin, Stuart R. Miyasato, Timothy F. Lowry, Varsha K. Khodiyar, Seth Carbon, Elisabeth Coudert, Jürg Bähler, Juancarlos Chan, Evelyn Camon, Daniel P. Renfro, Anne Estreicher, M. C. Blatter, Robert S. Nash, P Gaudet, Sven Heinicke, K. Van Auken, Stacia R. Engel, Alan Bridge, Ralf Stephan, Mary E. Dolan, Shane C. Burgess, Petra Fey, Shur-Jen Wang, Damien Lieberherr, Duncan Legge, P. Porras Millán, Andre Stutz, Yasmin Alam-Faruque, Gail Binkley, Bernd Roechert, S. Branconi-Quintaje, Ghislaine Argoud-Puy, S. 
Basu, Kim Rutherford, M. Moinat, Monte Westerfield, Arnaud Gos, Eleanor J Stanley, Valerie Wood, Ranjana Kishore, Diego Poggioli, S. Ferro-Rojas, Victoria Petri, Florence Jungo, Suzanna E. Lewis, Emmanuel Boutet, Warren A. Kibbe, M Feuermann, Claire O'Donovan, W. M. Chan, J. James, David P. Hill, Rachael P. Huntley, M. Gwinn Giglio, Paul Thomas, Jodi E. Hirschman, Paola Roncaglia, Gene Ontology Consortium, Blake, JA., Dolan, M., Drabkin, H., Hill, DP., Ni, L., Sitnikov, D., Burgess, S., Buza, T., Gresham, C., McCarthy, F., Pillai, L., Wang, H., Carbon, S., Lewis, SE., Mungall, CJ., Gaudet, P., Chisholm, RL., Fey, P., Kibbe, WA., Basu, S., Siegele, DA., McIntosh, BK., Renfro, DP., Zweifel, AE., Hu, JC., Brown, NH., Tweedie, S., Alam-Faruque, Y., Apweiler, R., Auchinchloss, A., Axelsen, K., Argoud-Puy, G., Bely, B., Blatter, M-., Bougueleret, L., Boutet, E., Branconi, S., Breuza, L., Bridge, A., Browne, P., Chan, WM., Coudert, E., Cusin, I., Dimmer, E., Duek-Roggli, P., Eberhardt, R., Estreicher, A., Famiglietti, L., Ferro-Rojas, S., Feuermann, M., Gardner, M., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Huntley, R., James, J., Jimenez, S., Jungo, F., Keller, G., Laiho, K., Legge, D., Lemercier, P., Lieberherr, D., Magrane, M., Martin, MJ., Masson, P., Moinat, M., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Porras Millán, P., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Sehra, H., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Xenarios, I., Foulger, R., Lomax, J., Roncaglia, P., Camon, E., Khodiyar, VK., Lovering, RC., Talmud, PJ., Chibucos, M., Gwinn Giglio, M., Dolinski, K., Heinicke, S., Livstone, MS., Stephan, R., Harris, MA., Oliver, SG., Rutherford, K., Wood, V., Bahler, J., Lock, A., Kersey, PJ., McDowall, MD., Staines, DM., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, T., Wang, S-., Petri, V., Lowry, T., D'Eustachio, P., Matthews, L., Amundsen, CD., Balakrishnan, R., Binkley, G., Cherry, JM., Christie, KR., 
Costanzo, MC., Dwight, SS., Engel, SR., Fisk, DG., Hirschman, JE., Hitz, BC., Hong, EL., Karra, K., Krieger, CJ., Miyasato, SR., Nash, RS., Park, J., Skrzypek, MS., Weng, S., Wong, ED., Berardini, TZ., Li, D., Huala, E., Slonim, D., Wick, H., Thomas, P., Chan, J., Kishore, R., Sternberg, P., Van Auken, K., Howe, D., and Westerfield, M.
- Subjects
Quality Control; 0303 health sciences; media_common.quotation_subject; Databases, Genetic; Molecular Sequence Annotation/standards; Vocabulary, Controlled; Inference; Molecular Sequence Annotation; Articles; Biology; Ontology (information science); World Wide Web; Open Biomedical Ontologies; 03 medical and health sciences; Annotation; 0302 clinical medicine; Resource (project management); Controlled vocabulary; Genetics; Social media; Function (engineering); 030217 neurology & neurosurgery; 030304 developmental biology; media_common
- Abstract
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
- Published
- 2011
13. Expanded protein information at SGD: new pages and proteome browser
- Author
Rama Balakrishnan, Chandra L. Theesfeld, Robert S. Nash, Maria C. Costanzo, J. Michael Cherry, Kara Dolinski, Marek S. Skrzypek, Eurie L. Hong, Mark Schroeder, David Botstein, Shuai Weng, Michael S. Livstone, Stacia R. Engel, Selina S. Dwight, Christopher Lane, Gail Binkley, Benjamin C. Hitz, Julie Park, Stuart R. Miyasato, Jodi E. Hirschman, Karen R. Christie, Anand Sethuraman, Dianna G. Fisk, Qing Dong, and Rose Oughtred
- Subjects
Proteomics; Internet; Saccharomyces cerevisiae Proteins; Information retrieval; Protein family; business.industry; Saccharomyces cerevisiae; Articles; Biology; Bioinformatics; Visualization; User-Computer Interface; ComputingMethodologies_PATTERNRECOGNITION; Protein Annotation; Sequence Analysis, Protein; Web page; Proteome; Computer Graphics; Genetics; The Internet; Genome, Fungal; Databases, Protein; business; Hidden Markov model
- Abstract
The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.
- Published
- 2007
14. Principles of metadata organization at the ENCODE data coordination center
- Author
Benjamin C. Hitz, Aditi K. Narayanan, Jason A. Hilton, Idan Gabdank, Cricket A. Sloan, Venkat S. Malladi, J. Seth Strattan, J. Michael Cherry, Greg Roe, Jean M. Davidson, Forrest Y. Tanaka, Laurence D. Rowe, Eurie L. Hong, Timothy R. Dreszer, Nikhil R. Podduturi, Marcus Ho, Brian T. Lee, and Esther T. Chan
- Subjects
0301 basic medicine; Quality Control; Computer science; ENCODE; General Biochemistry, Genetics and Molecular Biology; World Wide Web; 03 medical and health sciences; Mice; 0302 clinical medicine; Nucleic Acids; Data file; Databases, Genetic; Animals; Humans; Caenorhabditis elegans; Data collection; Data element; Data Collection; Metadata standard; Computational Biology; High-Throughput Nucleotide Sequencing; Reproducibility of Results; DNA; Metadata repository; Metadata; 030104 developmental biology; Drosophila melanogaster; 030220 oncology & carcinogenesis; Encyclopedia; Original Article; General Agricultural and Biological Sciences; Sequence Alignment; Algorithms; Information Systems
- Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.
- Published
- 2015
15. Ontology application and use at the ENCODE DCC
- Author
-
Marcus Ho, Stuart R. Miyasato, W. James Kent, J. Seth Strattan, Jean M. Davidson, Nikhil R. Podduturi, Cricket A. Sloan, Greg Roe, Eurie L. Hong, Laurence D. Rowe, Brian T. Lee, Esther T. Chan, J. Michael Cherry, Drew T. Erickson, Forrest Y. Tanaka, Benjamin C. Hitz, Venkat S. Malladi, and Matt Simison
- Subjects
Information retrieval; Transcription, Genetic; Standardization; Computer science; Experimental data; Molecular Sequence Annotation; Ontology (information science); ENCODE; General Biochemistry, Genetics and Molecular Biology; Set (abstract data type); World Wide Web; Metadata; Mice; Gene Ontology; Databases, Genetic; Encyclopedia; Animals; Humans; Original Article; Gene Regulatory Networks; General Agricultural and Biological Sciences; Data Curation; Information Systems
- Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects. Database URL: https://www.encodeproject.org/
- Published
- 2015
16. Correction: Corrigendum: InterMOD: integrated data and tools for the unification of model organism research
- Author
-
J. Michael Cherry, Quang M. Trinh, Andrew Vallejos, Lincoln Stein, Jelena Aleksic, Gos Micklem, Richard N. Smith, Benjamin C. Hitz, Pushkala Jayaraman, Rachel Lyne, Howie Motenko, Joel Richardson, Christian Pich, Elizabeth A. Worthey, Gail Binkley, Simon N. Twigger, Kalpana Karra, J. D. Wong, Rama Balakrishnan, Steven B. Neuhauser, Todd W. Harris, Julie Sullivan, Monte Westerfield, and Sierra A. T. Moxon
- Subjects
Multidisciplinary; Unification; Computer science; Data science; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; Data mining; Model organism; computer; 030217 neurology & neurosurgery
- Abstract
CORRIGENDUM: InterMOD: integrated data and tools for the unification of model organism research
- Published
- 2013
17. YeastMine—an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit
- Author
-
Eurie L. Hong, Benjamin C. Hitz, Julie Park, Rama Balakrishnan, Kalpana Karra, Gail Binkley, J. Michael Cherry, Gos Micklem, and Julie Sullivan
- Subjects
Computer science; Interface (computing); Saccharomyces cerevisiae; Data type; General Biochemistry, Genetics and Molecular Biology; World Wide Web; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Databases, Genetic; General Literature: Reference (e.g., dictionaries, encyclopedias, glossaries); 030304 developmental biology; Internet; 0303 health sciences; Information retrieval; biology; Original Articles; File format; Budding yeast; Data warehouse; Template; Database Management Systems; The Internet; Genome, Fungal; General Agricultural and Biological Sciences; business; 030217 neurology & neurosurgery; Information Systems
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) provides high-quality curated genomic, genetic, and molecular information on the genes and their products of the budding yeast Saccharomyces cerevisiae. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (http://www.InterMine.org), an open source data warehouse system with a sophisticated querying interface, to create YeastMine (http://yeastmine.yeastgenome.org). YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine offers multiple scenarios in which it can be used such as a powerful search interface, a discovery tool, a curation aid and also a complex database presentation format. DATABASE URL: http://yeastmine.yeastgenome.org.
- Published
- 2012
18. New mutant phenotype data curation system in the Saccharomyces Genome Database
- Author
-
Robert S. Nash, Marek S. Skrzypek, Maria C. Costanzo, Eurie L. Hong, Stacia R. Engel, Edith D. Wong, Gail Binkley, J. Michael Cherry, and Benjamin C. Hitz
- Subjects
Genetics; 0303 health sciences; Saccharomyces genome database; Data curation; biology; 030302 biochemistry & molecular biology; Mutant; Saccharomyces cerevisiae; Locus (genetics); Phenotype; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; Annotation; Original Article; General Agricultural and Biological Sciences; Gene; 030304 developmental biology; Information Systems
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) organizes and displays molecular and genetic information about the genes and proteins of baker's yeast, Saccharomyces cerevisiae. Mutant phenotype screens have been the starting point for a large proportion of yeast molecular biological studies, and are still used today to elucidate the functions of uncharacterized genes and discover new roles for previously studied genes. To greatly facilitate searching and comparison of mutant phenotypes across genes, we have devised a new controlled-vocabulary system for capturing phenotype information. Each phenotype annotation is represented as an ‘observable’, which is the entity, or process that is observed, and a ‘qualifier’ that describes the change in that entity or process in the mutant (e.g. decreased, increased, or abnormal). Additional information about the mutant, such as strain background, allele name, conditions under which the phenotype is observed, or the identity of relevant chemicals, is captured in separate fields. For each gene, a summary of the mutant phenotype information is displayed on the Locus Summary page, and the complete information is displayed in tabular format on the Phenotype Details Page. All of the information is searchable and may also be downloaded in bulk using SGD's Batch Download Tool or Download Data Files Page. In the future, phenotypes will be integrated with other curated data to allow searching across different types of functional information, such as genetic and physical interaction data and Gene Ontology annotations. Database URL: http://www.yeastgenome.org/
- Published
- 2008
19. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database
- Author
-
Rama Balakrishnan, Sage T. Hellerstedt, J. Michael Cherry, Janos Demeter, Edith D. Wong, Stacia R. Engel, Gail Binkley, Marek S. Skrzypek, Travis K. Sheppard, Maria C. Costanzo, Robert S. Nash, Kelley Paskov, Kalpana Karra, Shuai Weng, Giltae Song, Kyla S. Dalusag, and Benjamin C. Hitz
- Subjects
0301 basic medicine; Saccharomyces cerevisiae; Locus (genetics); Biology; ENCODE; Genome; General Biochemistry, Genetics and Molecular Biology; Saccharomyces; User-Computer Interface; 03 medical and health sciences; Protein sequencing; Databases, Genetic; natural sciences; Gene; Genetics; Reproducibility of Results; Molecular Sequence Annotation; Genomics; Genome project; 030104 developmental biology; Database Update; Genome, Fungal; General Agricultural and Biological Sciences; Information Systems; Reference genome
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org
- Published
- 2016
20. Gene Ontology annotations at SGD: new data sources and annotation methods
- Author
-
Stuart R. Miyasato, Rama Balakrishnan, Shuai Weng, Dianna G. Fisk, Eurie L. Hong, David Botstein, Robert S. Nash, Jodi E. Hirschman, Marek S. Skrzypek, Edith D. Wong, Selina S. Dwight, Michael S. Livstone, Stacia R. Engel, Kathy K. Zhu, J. Michael Cherry, Benjamin C. Hitz, Rose Oughtred, Julie Park, Kara Dolinski, Gail Binkley, Karen R. Christie, Cynthia J. Krieger, Maria C. Costanzo, and Qing Dong
- Subjects
Genetics; Data source; Internet; Information retrieval; Saccharomyces cerevisiae Proteins; Gene ontology; Genes, Fungal; Computational Biology; Genomics; Saccharomyces cerevisiae; Articles; Biology; Genome; Annotation; User-Computer Interface; Vocabulary, Controlled; Controlled vocabulary; Databases, Genetic; UniProt; Experimental methods; Genome, Fungal; General Literature: Reference (e.g., dictionaries, encyclopedias, glossaries)
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.
- Published
- 2007
21. The Saccharomyces Genome Database provides comprehensive information about the biology of S. cerevisiae and tools for studies in comparative genomics
- Author
-
S. Miyasoto, Rose Oughtred, Michael S. Livstone, Stacia R. Engel, Robert S. Nash, Gail Binkley, Qing Dong, Dianna G. Fisk, Marek S. Skrzypek, Maria C. Costanzo, Mark Schroeder, J. M. Cherry, Eurie L. Hong, Rey Andrada, Karen R. Christie, David Botstein, Shuai Weng, Benjamin C. Hitz, Edith D. Wong, Rama Balakrishnan, Selina S. Dwight, Jinha M. Park, Jodi E. Hirschman, and Kara Dolinski
- Subjects
Comparative genomics; Saccharomyces genome database; Genetics; Computational biology; Biology; Molecular Biology; Biochemistry; Biotechnology
- Published
- 2007
22. Free energy determinants of secondary structure formation: III. beta-turns and their role in protein folding
- Author
-
Benjamin C. Hitz, An-Suei Yang, and Barry Honig
- Subjects
Protein Folding; Databases, Factual; Chemistry; Protein Conformation; Solvation; Hydrogen Bonding; Dipeptides; Conformational entropy; Force field (chemistry); Protein Structure, Secondary; Crystallography; Structural Biology; Chemical physics; Metastability; Data Interpretation, Statistical; Thermodynamics; Protein folding; Twist; Molecular Biology; Peptide sequence; Protein secondary structure; Monte Carlo Method; Software
- Abstract
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
- Published
- 1996