50 results on '"Benjamin C. Hitz"'
Search Results
2. RNAget: an API to securely retrieve RNA quantifications.
- Author
-
Sean Upchurch, Emilio Palumbo, Jeremy Adams, David Bujold, Guillaume Bourque, Jared Nedzel, Keenan Graham, Meenakshi S. Kagda, Pedro Assis, Benjamin C. Hitz, Emilio Righi, Roderic Guigó, Barbara J. Wold, Alvis Brazma, Julia Burchard, Joe Capka, Michael Cherry, Laura Clarke, Brian Craft, Manolis Dermitzakis, Mark Diekhans, John Dursi, Michael Sean Fitzsimons, Zac Flaming, Romina Garrido, Alfred Gil, Paul Godden, Matt Green, Mitch Guttman, Brian Haas, Max Haeussler, Bo Li, Sten Linnarsson, Adam Lipski, David Liu, Simonne Longerich, David Lougheed, Jonathan Manning, John C. Marioni, Christopher Meyer, Stephen B. Montgomery, Alyssa Morrow, Alfonso Muñoz-Pomer Fuentes, Jared L. Nedzel, David Nguyen, Kevin Osborn, Francis Ouellette, Irene Papatheodorou, Dmitri D. Pervouchine, Arun K. Ramani, Jordi Rambla, Bashir Sadjad, David Steinberg, Jeremiah Talkar, Timothy Tickle, Kathy Tzeng, Saman Vaisipour, Sean Watford, Barbara Wold, Zhenyu Zhang, and Jing Zhu
- Published
- 2023
3. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal.
- Author
-
Yunhai Luo, Benjamin C. Hitz, Idan Gabdank, Jason A. Hilton, Meenakshi S. Kagda, Bonita Lam, Zachary Myers, Paul Sud, Jennifer Jou, Khine Lin, Ulugbek K. Baymuradov, Keenan Graham, Casey Litton, Stuart R. Miyasato, J. Seth Strattan, Otto Jolanki, Jin-Wook Lee, Forrest Tanaka, Philip Adenekan, Emma O'Neill, and J. Michael Cherry
- Published
- 2020
4. The Encyclopedia of DNA elements (ENCODE): data portal update.
- Author
-
Carrie A. Davis, Benjamin C. Hitz, Cricket A. Sloan, Esther T. Chan, Jean M. Davidson, Idan Gabdank, Jason A. Hilton, Kriti Jain, Ulugbek K. Baymuradov, Aditi K. Narayanan, Kathrina C. Onate, Keenan Graham, Stuart R. Miyasato, Timothy R. Dreszer, J. Seth Strattan, Otto Jolanki, Forrest Tanaka, and J. Michael Cherry
- Published
- 2018
5. Annotating and prioritizing human non-coding variants with RegulomeDB v.2
- Author
-
Shengcheng Dong, Nanxiang Zhao, Emma Spragins, Meenakshi S. Kagda, Mingjie Li, Pedro Assis, Otto Jolanki, Yunhai Luo, J. Michael Cherry, Alan P. Boyle, and Benjamin C. Hitz
- Subjects
Genetics
- Published
- 2023
6. The ENCODE Uniform Analysis Pipelines
- Author
-
Benjamin C. Hitz, Jin-Wook Lee, Otto Jolanki, Meenakshi S. Kagda, Keenan Graham, Paul Sud, Idan Gabdank, J. Seth Strattan, Cricket A. Sloan, Timothy Dreszer, Laurence D. Rowe, Nikhil R. Podduturi, Venkat S. Malladi, Esther T. Chan, Jean M. Davidson, Marcus Ho, Stuart Miyasato, Matt Simison, Forrest Tanaka, Yunhai Luo, Ian Whaling, Eurie L. Hong, Brian T. Lee, Richard Sandstrom, Eric Rynes, Jemma Nelson, Andrew Nishida, Alyssa Ingersoll, Michael Buckley, Mark Frerker, Daniel S Kim, Nathan Boley, Diane Trout, Alex Dobin, Sorena Rahmanian, Dana Wyman, Gabriela Balderrama-Gutierrez, Fairlie Reese, Neva C. Durand, Olga Dudchenko, David Weisz, Suhas S. P. Rao, Alyssa Blackburn, Dimos Gkountaroulis, Mahdi Sadr, Moshe Olshansky, Yossi Eliaz, Dat Nguyen, Ivan Bochkov, Muhammad Saad Shamim, Ragini Mahajan, Erez Aiden, Tom Gingeras, Simon Heath, Martin Hirst, W. James Kent, Anshul Kundaje, Ali Mortazavi, Barbara Wold, and J. Michael Cherry
- Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses. Database URL: https://www.encodeproject.org/
- Published
- 2023
7. ENCODE data at the ENCODE portal.
- Author
-
Cricket A. Sloan, Esther T. Chan, Jean M. Davidson, Venkat S. Malladi, J. Seth Strattan, Benjamin C. Hitz, Idan Gabdank, Aditi K. Narayanan, Marcus Ho, Brian T. Lee, Laurence D. Rowe, Timothy R. Dreszer, Greg Roe, Nikhil R. Podduturi, Forrest Tanaka, Eurie L. Hong, and J. Michael Cherry
- Published
- 2016
8. The Saccharomyces Genome Database Variant Viewer.
- Author
-
Travis K. Sheppard, Benjamin C. Hitz, Stacia R. Engel, Giltae Song, Rama Balakrishnan, Gail Binkley, Maria C. Costanzo, Kyla S. Dalusag, Janos Demeter, Sage T. Hellerstedt, Kalpana Karra, Robert S. Nash, Kelley M. Paskov, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, and J. Michael Cherry
- Published
- 2016
9. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
- Author
-
Joel Rozowsky, Jiahao Gao, Beatrice Borsari, Yucheng T. Yang, Timur Galeev, Gamze Gürsoy, Charles B. Epstein, Kun Xiong, Jinrui Xu, Tianxiao Li, Jason Liu, Keyang Yu, Ana Berthel, Zhanlin Chen, Fabio Navarro, Maxwell S. Sun, James Wright, Justin Chang, Christopher J.F. Cameron, Noam Shoresh, Elizabeth Gaskell, Jorg Drenkow, Jessika Adrian, Sergey Aganezov, François Aguet, Gabriela Balderrama-Gutierrez, Samridhi Banskota, Guillermo Barreto Corona, Sora Chee, Surya B. Chhetri, Gabriel Conte Cortez Martins, Cassidy Danyko, Carrie A. Davis, Daniel Farid, Nina P. Farrell, Idan Gabdank, Yoel Gofin, David U. Gorkin, Mengting Gu, Vivian Hecht, Benjamin C. Hitz, Robbyn Issner, Yunzhe Jiang, Melanie Kirsche, Xiangmeng Kong, Bonita R. Lam, Shantao Li, Bian Li, Xiqi Li, Khine Zin Lin, Ruibang Luo, Mark Mackiewicz, Ran Meng, Jill E. Moore, Jonathan Mudge, Nicholas Nelson, Chad Nusbaum, Ioann Popov, Henry E. Pratt, Yunjiang Qiu, Srividya Ramakrishnan, Joe Raymond, Leonidas Salichos, Alexandra Scavelli, Jacob M. Schreiber, Fritz J. Sedlazeck, Lei Hoon See, Rachel M. Sherman, Xu Shi, Minyi Shi, Cricket Alicia Sloan, J Seth Strattan, Zhen Tan, Forrest Y. Tanaka, Anna Vlasova, Jun Wang, Jonathan Werner, Brian Williams, Min Xu, Chengfei Yan, Lu Yu, Christopher Zaleski, Jing Zhang, Kristin Ardlie, J Michael Cherry, Eric M. Mendenhall, William S. Noble, Zhiping Weng, Morgan E. Levine, Alexander Dobin, Barbara Wold, Ali Mortazavi, Bing Ren, Jesse Gillis, Richard M. Myers, Michael P. Snyder, Jyoti Choudhary, Aleksandar Milosavljevic, Michael C. Schatz, Bradley E. Bernstein, Roderic Guigó, Thomas R. Gingeras, and Mark Gerstein
- Subjects
Allele-specific activity; Predictive models; Personal genome; eQTLs; Transformer model; Functional genomics; GTEx; Genome annotations; Structural variants; General Biochemistry, Genetics and Molecular Biology; Tissue specificity; Functional epigenomes; ENCODE
- Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
- Published
- 2023
10. Annotating and prioritizing human non-coding variants with RegulomeDB
- Author
-
Shengcheng Dong, Nanxiang Zhao, Emma Spragins, Meenakshi S. Kagda, Mingjie Li, Pedro Assis, Otto Jolanki, Yunhai Luo, J Michael Cherry, Alan P Boyle, and Benjamin C Hitz
- Abstract
Nearly 90% of the disease risk-associated variants identified from genome-wide association studies (GWAS) are in non-coding regions of the genome. The annotations obtained from analyzing functional genomics assays can provide additional information to pinpoint causal variants, which are often not the lead variants identified from association studies. However, the lack of available annotation tools limits the use of such data. To address the challenge, we have previously built the RegulomeDB database for prioritizing and annotating variants in non-coding regions [1], which has been a highly utilized resource for the research community (Supplementary Fig. 1). RegulomeDB annotates a variant by intersecting its position with genomic intervals identified from functional genomic assays and computational approaches. It also incorporates those hits of a variant into a heuristic ranking score, representing its potential to be functional in regulatory elements. Here we present a newer version of the RegulomeDB web server, RegulomeDB v2.1 (http://regulomedb.org). We improve and boost annotation power by incorporating thousands of newly processed data from functional genomic assays in GRCh38 assembly, and now include probabilistic scores from the SURF algorithm that was the top performing non-coding variant predictor in CAGI 5 [2]. We also provide interactive charts and genome browser views to allow users an easy way to perform exploratory analyses in different tissue contexts.
- Published
- 2022
11. Saccharomyces genome database provides new regulation data.
- Author
-
Maria C. Costanzo, Stacia R. Engel, Edith D. Wong, Paul Lloyd, Kalpana Karra, Esther T. Chan, Shuai Weng, Kelley M. Paskov, Greg R. Roe, Gail Binkley, Benjamin C. Hitz, and J. Michael Cherry
- Published
- 2014
12. Saccharomyces Genome Database: the genomics resource of budding yeast.
- Author
-
J. Michael Cherry, Eurie L. Hong, Craig Amundsen, Rama Balakrishnan, Gail Binkley, Esther T. Chan, Karen R. Christie, Maria C. Costanzo, Selina S. Dwight, Stacia R. Engel, Dianna G. Fisk, Jodi E. Hirschman, Benjamin C. Hitz, Kalpana Karra, Cynthia J. Krieger, Stuart R. Miyasato, Robert S. Nash, Julie Park, Marek S. Skrzypek, Matt Simison, Shuai Weng, and Edith D. Wong
- Published
- 2012
13. Saccharomyces Genome Database provides mutant phenotype data.
- Author
-
Stacia R. Engel, Rama Balakrishnan, Gail Binkley, Karen R. Christie, Maria C. Costanzo, Selina S. Dwight, Dianna G. Fisk, Jodi E. Hirschman, Benjamin C. Hitz, Eurie L. Hong, Cynthia J. Krieger, Michael S. Livstone, Stuart R. Miyasato, Robert S. Nash, Rose Oughtred, Julie Park, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, Kara Dolinski, David Botstein, and J. Michael Cherry
- Published
- 2010
14. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.
- Author
-
Benjamin C Hitz, Laurence D Rowe, Nikhil R Podduturi, David I Glick, Ulugbek K Baymuradov, Venkat S Malladi, Esther T Chan, Jean M Davidson, Idan Gabdank, Aditi K Narayana, Kathrina C Onate, Jason Hilton, Marcus C Ho, Brian T Lee, Stuart R Miyasato, Timothy R Dreszer, Cricket A Sloan, J Seth Strattan, Forrest Y Tanaka, Eurie L Hong, and J Michael Cherry
- Subjects
Medicine; Science
- Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.
- Published
- 2017
15. Gene Ontology annotations at SGD: new data sources and annotation methods.
- Author
-
Eurie L. Hong, Rama Balakrishnan, Qing Dong, Karen R. Christie, Julie Park, Gail Binkley, Maria C. Costanzo, Selina S. Dwight, Stacia R. Engel, Dianna G. Fisk, Jodi E. Hirschman, Benjamin C. Hitz, Cynthia J. Krieger, Michael S. Livstone, Stuart R. Miyasato, Robert S. Nash, Rose Oughtred, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, Kathy K. Zhu, Kara Dolinski, David Botstein, and J. Michael Cherry
- Published
- 2008
16. Principles of metadata organization at the ENCODE data coordination center.
- Author
-
Eurie L. Hong, Cricket A. Sloan, Esther T. Chan, Jean M. Davidson, Venkat S. Malladi, J. Seth Strattan, Benjamin C. Hitz, Idan Gabdank, Aditi K. Narayanan, Marcus Ho, Brian T. Lee, Laurence D. Rowe, Timothy R. Dreszer, Greg R. Roe, Nikhil R. Podduturi, Forrest Tanaka, Jason A. Hilton, and J. Michael Cherry
- Published
- 2016
17. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.
- Author
-
Giltae Song, Rama Balakrishnan, Gail Binkley, Maria C. Costanzo, Kyla S. Dalusag, Janos Demeter, Stacia R. Engel, Sage T. Hellerstedt, Kalpana Karra, Benjamin C. Hitz, Robert S. Nash, Kelley M. Paskov, Travis K. Sheppard, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, and J. Michael Cherry
- Published
- 2016
18. Expanded protein information at SGD: new pages and proteome browser.
- Author
-
Robert S. Nash, Shuai Weng, Benjamin C. Hitz, Rama Balakrishnan, Karen R. Christie, Maria C. Costanzo, Selina S. Dwight, Stacia R. Engel, Dianna G. Fisk, Jodi E. Hirschman, Eurie L. Hong, Michael S. Livstone, Rose Oughtred, Julie Park, Marek S. Skrzypek, Chandra L. Theesfeld, Gail Binkley, Qing Dong, Christopher Lane, Stuart R. Miyasato, Anand Sethuraman, Mark Schroeder, Kara Dolinski, David Botstein, and J. Michael Cherry
- Published
- 2007
19. Prevention of data duplication for high throughput sequencing repositories.
- Author
-
Idan Gabdank, Esther T. Chan, Jean M. Davidson, Jason A. Hilton, Carrie A. Davis, Ulugbek K. Baymuradov, Aditi K. Narayanan, Kathrina C. Onate, Keenan Graham, Stuart R. Miyasato, Timothy R. Dreszer, J. Seth Strattan, Otto Jolanki, Forrest Tanaka, Benjamin C. Hitz, Cricket A. Sloan, and J. Michael Cherry
- Published
- 2018
20. Ontology application and use at the ENCODE DCC.
- Author
-
Venkat S. Malladi, Drew T. Erickson, Nikhil R. Podduturi, Laurence D. Rowe, Esther T. Chan, Jean M. Davidson, Benjamin C. Hitz, Marcus Ho, Brian T. Lee, Stuart R. Miyasato, Greg R. Roe, Matt Simison, Cricket A. Sloan, J. Seth Strattan, Forrest Tanaka, W. James Kent, J. Michael Cherry, and Eurie L. Hong
- Published
- 2015
21. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal
- Author
-
Casey Litton, Zachary Myers, Ulugbek K. Baymuradov, Benjamin C. Hitz, Meenakshi S. Kagda, Otto Jolanki, Jin-Wook Lee, Stuart R. Miyasato, Keenan Graham, Idan Gabdank, Forrest Y. Tanaka, Bonita R. Lam, J. Seth Strattan, Jason A. Hilton, J. Michael Cherry, Yunhai Luo, Philip Adenekan, Paul Sud, Emma O'Neill, Jennifer Jou, and Khine Lin
- Subjects
Interoperability; Cloud computing; Data_CODINGANDINFORMATIONTHEORY; Biology; ENCODE; World Wide Web; Mice; 03 medical and health sciences; 0302 clinical medicine; Documentation; Software; Databases, Genetic; Genetics; Database Issue; Animals; Humans; 030304 developmental biology; 0303 health sciences; Genome, Human; business.industry; DNA; Genomics; Visualization; Open data; Encyclopedia; business; 030217 neurology & neurosurgery
- Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
- Published
- 2019
22. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
- Author
-
Joel Rozowsky, Jorg Drenkow, Yucheng T Yang, Gamze Gursoy, Timur Galeev, Beatrice Borsari, Charles B Epstein, Kun Xiong, Jinrui Xu, Jiahao Gao, Keyang Yu, Ana Berthel, Zhanlin Chen, Fabio Navarro, Jason Liu, Maxwell S Sun, James Wright, Justin Chang, Christopher JF Cameron, Noam Shoresh, Elizabeth Gaskell, Jessika Adrian, Sergey Aganezov, François Aguet, Gabriela Balderrama-Gutierrez, Samridhi Banskota, Guillermo Barreto Corona, Sora Chee, Surya B Chhetri, Gabriel Conte Cortez Martins, Cassidy Danyko, Carrie A Davis, Daniel Farid, Nina P Farrell, Idan Gabdank, Yoel Gofin, David U Gorkin, Mengting Gu, Vivian Hecht, Benjamin C Hitz, Robbyn Issner, Melanie Kirsche, Xiangmeng Kong, Bonita R Lam, Shantao Li, Bian Li, Tianxiao Li, Xiqi Li, Khine Zin Lin, Ruibang Luo, Mark Mackiewicz, Jill E Moore, Jonathan Mudge, Nicholas Nelson, Chad Nusbaum, Ioann Popov, Henry E Pratt, Yunjiang Qiu, Srividya Ramakrishnan, Joe Raymond, Leonidas Salichos, Alexandra Scavelli, Jacob M Schreiber, Fritz J Sedlazeck, Lei Hoon See, Rachel M Sherman, Xu Shi, Minyi Shi, Cricket Alicia Sloan, J Seth Strattan, Zhen Tan, Forrest Y Tanaka, Anna Vlasova, Jun Wang, Jonathan Werner, Brian Williams, Min Xu, Chengfei Yan, Lu Yu, Christopher Zaleski, Jing Zhang, Kristin Ardlie, J Michael Cherry, Eric M Mendenhall, William S Noble, Zhiping Weng, Morgan E Levine, Alexander Dobin, Barbara Wold, Ali Mortazavi, Bing Ren, Jesse Gillis, Richard M Myers, Michael P Snyder, Jyoti Choudhary, Aleksandar Milosavljevic, Michael C Schatz, Roderic Guigó, Bradley E Bernstein, Thomas R Gingeras, and Mark Gerstein
- Subjects
Genetic variants; Genomics; Preprint; Computational biology; Biology; Personal genomics
- Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of personal epigenomes, for ∼25 tissues and >10 assays in four donors (>1500 open-access functional genomic and proteomic datasets, in total). Each dataset is mapped to a matched, diploid personal genome, which has long-read phasing and structural variants. The mappings enable us to identify >1 million loci with allele-specific behavior. These loci exhibit coordinated epigenetic activity along haplotypes and less conservation than matched, non-allele-specific loci, in a fashion broadly paralleling tissue-specificity. Surprisingly, they can be accurately modelled just based on local nucleotide-sequence context. Combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci and enables models for transferring known eQTLs to difficult-to-profile tissues. Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
- Published
- 2021
23. The YeastGenome app: the Saccharomyces Genome Database at your fingertips.
- Author
-
Edith D. Wong, Kalpana Karra, Benjamin C. Hitz, Eurie L. Hong, and J. Michael Cherry
- Published
- 2013
24. YeastMine - an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit.
- Author
-
Rama Balakrishnan, Julie Park, Kalpana Karra, Benjamin C. Hitz, Gail Binkley, Eurie L. Hong, Julie M. Sullivan, Gos Micklem, and J. Michael Cherry
- Published
- 2012
25. New mutant phenotype data curation system in the Saccharomyces Genome Database.
- Author
-
Maria C. Costanzo, Marek S. Skrzypek, Robert S. Nash, Edith D. Wong, Gail Binkley, Stacia R. Engel, Benjamin C. Hitz, Eurie L. Hong, and J. Michael Cherry
- Published
- 2009
26. The Encyclopedia of DNA elements (ENCODE): data portal update
- Author
-
Aditi K. Narayanan, Benjamin C. Hitz, Timothy R. Dreszer, Kriti Jain, Otto Jolanki, Idan Gabdank, Keenan Graham, Kathrina C. Onate, Jason A. Hilton, Stuart R. Miyasato, J. Michael Cherry, Cricket A. Sloan, J. Seth Strattan, Carrie A. Davis, Esther T. Chan, Jean M. Davidson, Forrest Y. Tanaka, and Ulugbek K. Baymuradov
- Subjects
0301 basic medicine; Download; Interface (Java); Datasets as Topic; Genomics; Biology; Bioinformatics; ENCODE; World Wide Web; 03 medical and health sciences; Mice; User-Computer Interface; Databases, Genetic; Genetics; Database Issue; Animals; Humans; Caenorhabditis elegans; Metadata; Genome, Human; High-Throughput Nucleotide Sequencing; DNA; Visualization; 030104 developmental biology; Drosophila melanogaster; Gene Components; Encyclopedia; Data Display; Forecasting
- Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
- Published
- 2017
27. The ENCODE Portal as an Epigenomics Resource
- Author
-
J. Seth Strattan, Khine Lin, Keenan Graham, Casey Litton, Emma O'Neill, Philip Adenekan, Jason A. Hilton, Paul Sud, Benjamin C. Hitz, Idan Gabdank, J. Michael Cherry, Yunhai Luo, Forrest Y. Tanaka, Zachary Myers, Jennifer Jou, Stuart R. Miyasato, Ulugbek K. Baymuradov, Otto Jolanki, Meenakshi S. Kagda, Jin-Wook Lee, and Bonita R. Lam
- Subjects
Epigenomics; Computer science; Genomics; ENCODE; Article; 03 medical and health sciences; Mice; Data file; Databases, Genetic; Animals; Humans; Protocol (object-oriented programming); 030304 developmental biology; 0303 health sciences; Internet; Metadata; Information retrieval; Genome, Human; 030305 genetics & heredity; General Medicine; DNA; DNA Methylation; Metadata modeling; Chromatin; ComputingMethodologies_PATTERNRECOGNITION; Human genome; Software
- Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal. Support Protocol 1: Batch downloading. Support Protocol 2: Using the cart to download files. Support Protocol 3: Visualize data. Alternate Protocol: Query building and programmatic access.
- Published
- 2019
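The abstract of result 27 mentions searching the portal programmatically via REST API. As a rough illustration only, the sketch below builds such a query URL in Python without sending it. The `/search/` path and the `type`, `assay_title`, `format` and `limit` parameters are assumptions that are not stated in the abstract and should be checked against the portal's own documentation; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Portal address as given in the abstracts above.
BASE_URL = "https://www.encodeproject.org"


def build_search_url(params, limit=5):
    """Return a search URL that asks for JSON rather than an HTML page.

    The path and parameter names are assumptions made for illustration,
    not details taken from the abstract.
    """
    query = list(params) + [("format", "json"), ("limit", str(limit))]
    return f"{BASE_URL}/search/?{urlencode(query)}"


# Hypothetical query: a handful of experiments for one assay type.
url = build_search_url([("type", "Experiment"), ("assay_title", "ATAC-seq")])
print(url)
```

Building the URL separately from fetching it keeps the query easy to inspect or paste into a browser before any request is made.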
28. ENCODE data at the ENCODE portal
- Author
-
Forrest Y. Tanaka, Esther T. Chan, Marcus Ho, Cricket A. Sloan, Nikhil R. Podduturi, J. Seth Strattan, Eurie L. Hong, Jean M. Davidson, Benjamin C. Hitz, Brian T. Lee, Greg Roe, Timothy R. Dreszer, Laurence D. Rowe, Idan Gabdank, Aditi K. Narayanan, Venkat S. Malladi, and J. Michael Cherry
- Subjects
0301 basic medicine; Genomics; Computational biology; Biology; ENCODE; Genome; Mice; 03 medical and health sciences; Databases, Genetic; Genetics; Animals; Humans; Database Issue; Gene; Genome, Human; Proteins; DNA; Visualization; Metadata; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Genes; DNA methylation; RNA; Human genome
- Abstract
The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
- Published
- 2015
29. The Saccharomyces Genome Database Variant Viewer
- Author
-
Janos Demeter, Marek S. Skrzypek, Benjamin C. Hitz, Robert S. Nash, Kalpana Karra, Stacia R. Engel, Sage T. Hellerstedt, Gail Binkley, J. Michael Cherry, Maria C. Costanzo, Rama Balakrishnan, Travis K. Sheppard, Kelley Paskov, Giltae Song, Edith D. Wong, Shuai Weng, and Kyla S. Dalusag
- Subjects
0301 basic medicine; Sequence analysis; Saccharomyces cerevisiae; Sequence alignment; Computational biology; Genome; Saccharomyces; 03 medical and health sciences; Annotation; User-Computer Interface; Sequence Analysis, Protein; Databases, Genetic; Genetics; Database Issue; natural sciences; Sequence (medicine); biology; Genetic Variation; Molecular Sequence Annotation; Sequence Analysis, DNA; biology.organism_classification; 030104 developmental biology; Genome, Fungal; Sequence Alignment
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer.
- Published
- 2015
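The abstract of result 29 says that alignments and summaries are encoded in JSON. The sketch below is one way such a summary might look and is not the Variant Viewer's actual schema: the field names (`variants`, `position`, `reference`, `strains`), the strain labels and the helper `summarize_alignment` are all hypothetical, and the sequences are toy data.

```python
import json


def summarize_alignment(reference, aligned):
    """List alignment columns where at least one strain differs from the reference.

    Positions are 1-based alignment columns; "-" denotes a gap.
    """
    variants = []
    for pos, ref_base in enumerate(reference):
        diffs = {
            strain: seq[pos]
            for strain, seq in aligned.items()
            if seq[pos] != ref_base
        }
        if diffs:
            variants.append(
                {"position": pos + 1, "reference": ref_base, "strains": diffs}
            )
    return variants


# Toy, equal-length aligned sequences (hypothetical strain names).
reference = "ATGCA-T"
aligned = {"strainA": "ATGCA-T", "strainB": "ATCCAGT"}

summary = summarize_alignment(reference, aligned)
payload = json.dumps({"variants": summary}, sort_keys=True)
print(payload)
```

Storing only the differing columns keeps the payload small, which suits a summary view that links out to the full alignment.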
30. Prevention of data duplication for high throughput sequencing repositories
- Author
-
J. Seth Strattan, Carrie A. Davis, Forrest Y. Tanaka, Benjamin C. Hitz, J. Michael Cherry, Keenan Graham, Jean M. Davidson, Jason A. Hilton, Idan Gabdank, Kathrina C. Onate, Stuart R. Miyasato, Otto Jolanki, Timothy R. Dreszer, Esther T. Chan, Aditi K. Narayanan, Ulugbek K. Baymuradov, and Cricket A. Sloan
- Subjects
0301 basic medicine; Computer science; business.industry; Extramural; MEDLINE; Computational biology; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Text mining; Data deduplication; Original Article; Databases, Nucleic Acid; General Agricultural and Biological Sciences; business; Data Curation; 030217 neurology & neurosurgery; Information Systems
- Abstract
Prevention of unintended duplication is one of the ongoing challenges many databases have to address. For high-throughput sequencing data, the complexity of that challenge increases with the complexity of the definition of a duplicate. In a computational data model, a data object represents a real entity like a reagent or a biosample. This representation is similar to how a card represents a book in a paper library catalog. Duplicated data objects not only waste storage but can also mislead users into assuming the model represents more than the single entity. Even if it is clear that two objects represent a single entity, data duplication opens the door to potential inconsistencies between the objects, since the content of the duplicated objects can be updated independently, allowing divergence of the metadata associated with the objects. This is analogous to a paper library catalog that mistakenly contains two cards for a single copy of a book. If these cards simultaneously list two different individuals as the current borrower, it is difficult to determine which of the two actually has the book. Unfortunately, in a large database with multiple submitters, unintended duplication is to be expected. In this article, we present three principal guidelines the Encyclopedia of DNA Elements (ENCODE) Portal follows in order to prevent unintended duplication of both actual files and data objects: definition of identifiable data objects (I), object uniqueness validation (II) and de-duplication mechanism (III). In addition to explaining our modus operandi, we elaborate on the methods used for identification of sequencing data files. A comparison of the approach taken by the ENCODE Portal with those of other widely used biological data repositories is provided. Database URL: https://www.encodeproject.org/
- Published
- 2018
- Full Text
- View/download PDF
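Guideline II in the abstract above (object uniqueness validation) can be illustrated with a minimal sketch. It assumes that duplicate files are detected by comparing content checksums; the abstract does not say which mechanism the ENCODE Portal actually uses, so the MD5 choice, the `FileRegistry` class and the accession strings below are all hypothetical.

```python
import hashlib


def content_md5(data: bytes) -> str:
    """Return the hex MD5 digest of a file's content."""
    return hashlib.md5(data).hexdigest()


class FileRegistry:
    """Hypothetical registry that rejects files whose content is already known."""

    def __init__(self):
        self._by_checksum = {}  # maps checksum -> accession of first submitter

    def register(self, accession: str, data: bytes) -> str:
        """Register a file; raise ValueError if identical content already exists."""
        checksum = content_md5(data)
        existing = self._by_checksum.get(checksum)
        if existing is not None:
            raise ValueError(f"{accession} duplicates content of {existing}")
        self._by_checksum[checksum] = accession
        return checksum
```

Registering the same bytes under a second accession raises an error that names the first accession, which is the behavior a submitter would need in order to resolve the duplicate.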
31. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now
- Author
-
Edith D. Wong, Maria C. Costanzo, Dianna G. Fisk, Marek S. Skrzypek, Selina S. Dwight, Fred S. Dietrich, Paul Lloyd, Robert S. Nash, Kalpana Karra, Stacia R. Engel, Gail Binkley, Matt Simison, J. Michael Cherry, Benjamin C. Hitz, Stuart R. Miyasato, Rama Balakrishnan, and Shuai Weng
- Subjects
Databases, Factual ,Sequence analysis ,Saccharomyces cerevisiae ,Investigations ,ENCODE ,genome release ,Genome ,Open Reading Frames ,User-Computer Interface ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,model organism ,Molecular Biology ,Genetics (clinical) ,030304 developmental biology ,Whole genome sequencing ,Internet ,0303 health sciences ,biology ,reference sequence ,Chromosome Mapping ,Sequence Analysis, DNA ,Genome project ,S288C ,biology.organism_classification ,Yeast ,Genome, Fungal ,030217 neurology & neurosurgery ,Reference genome - Abstract
The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.
- Published
- 2014
- Full Text
- View/download PDF
32. Saccharomyces genome database provides new regulation data
- Author
-
J. Michael Cherry, Maria C. Costanzo, Kelley Paskov, Edith D. Wong, Paul Lloyd, Kalpana Karra, Esther T. Chan, Stacia R. Engel, Gail Binkley, Greg Roe, Shuai Weng, and Benjamin C. Hitz
- Subjects
Saccharomyces cerevisiae Proteins ,Transcription, Genetic ,Saccharomyces cerevisiae ,Gene regulatory network ,Locus (genetics) ,Biology ,Genome ,Gene product ,03 medical and health sciences ,Gene Expression Regulation, Fungal ,Databases, Genetic ,Genetics ,Gene Regulatory Networks ,Gene ,030304 developmental biology ,0303 health sciences ,Internet ,Binding Sites ,030302 biochemistry & molecular biology ,biology.organism_classification ,Protein Structure, Tertiary ,DNA binding site ,Genome, Fungal ,Candidate Disease Gene ,IV. Viruses, bacteria, protozoa and fungi ,Transcription Factors - Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the community resource for genomic, gene and protein information about the budding yeast Saccharomyces cerevisiae, containing a variety of functional information about each yeast gene and gene product. We have recently added regulatory information to SGD and present it on a new tabbed section of the Locus Summary entitled 'Regulation'. We are compiling transcriptional regulator-target gene relationships, which are curated from the literature at SGD or imported, with permission, from the YEASTRACT database. For nearly every S. cerevisiae gene, the Regulation page displays a table of annotations showing the regulators of that gene, and a graphical visualization of its regulatory network. For genes whose products act as transcription factors, the Regulation page also shows a table of their target genes, accompanied by a Gene Ontology enrichment analysis of the biological processes in which those genes participate. We additionally synthesize information from the literature for each transcription factor in a free-text Regulation Summary, and provide other information relevant to its regulatory function, such as DNA binding site motifs and protein domains. All of the regulation data are available for querying, analysis and download via YeastMine, the InterMine-based data warehouse system in use at SGD.
- Published
- 2013
33. Annotation of functional variation in personal genomes using RegulomeDB
- Author
-
J. Michael Cherry, Shuai Weng, Konrad J. Karczewski, Manoj Hariharan, Alan P. Boyle, Michael Snyder, Marc A. Schaub, Maya Kasowski, Benjamin C. Hitz, Julie Park, Eurie L. Hong, and Yong Cheng
- Subjects
Resource ,Nonsynonymous substitution ,Genotype ,Genome-wide association study ,Computational biology ,Regulatory Sequences, Nucleic Acid ,Biology ,ENCODE ,Polymorphism, Single Nucleotide ,Genome ,Open Reading Frames ,Annotation ,Databases, Genetic ,Genetics ,Humans ,Lupus Erythematosus, Systemic ,Tumor Necrosis Factor alpha-Induced Protein 3 ,Genetics (clinical) ,Internet ,Genome, Human ,Intracellular Signaling Peptides and Proteins ,Genetic Variation ,Nuclear Proteins ,Molecular Sequence Annotation ,DNA-Binding Proteins ,Human genome ,Genome-Wide Association Study ,Personal genomics - Abstract
As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 full sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
- Published
- 2012
- Full Text
- View/download PDF
34. Saccharomyces Genome Database: the genomics resource of budding yeast
- Author
-
Marek S. Skrzypek, Eurie L. Hong, Edith D. Wong, Cynthia J. Krieger, Selina S. Dwight, Stuart R. Miyasato, Maria C. Costanzo, Robert S. Nash, Jodi E. Hirschman, Esther T. Chan, Kalpana Karra, Benjamin C. Hitz, Julie Park, Dianna G. Fisk, J. Michael Cherry, Karen R. Christie, Shuai Weng, Matt Simison, Rama Balakrishnan, Stacia R. Engel, Gail Binkley, and Craig Amundsen
- Subjects
Genes, Fungal ,Saccharomyces cerevisiae ,Genomics ,Genome browser ,Computational biology ,Saccharomyces ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Terminology as Topic ,Databases, Genetic ,Web page ,Genetics ,030304 developmental biology ,0303 health sciences ,biology ,High-Throughput Nucleotide Sequencing ,Molecular Sequence Annotation ,Articles ,biology.organism_classification ,Phenotype ,ComputingMethodologies_PATTERNRECOGNITION ,Encyclopedia ,Genome, Fungal ,Software ,030217 neurology & neurosurgery - Abstract
The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.
- Published
- 2011
- Full Text
- View/download PDF
35. The Gene Ontology: enhancements for 2011
- Author
-
P D'Eustachio, Benjamin C. Hitz, Julie Park, Paul Browne, Douglas G. Howe, Cynthia J. Krieger, Kalpana Karra, Stan Laulederkind, Karen R. Christie, Susan Tweedie, Eurie L. Hong, Lydie Bougueleret, Michele Magrane, Cathy R. Gresham, Rolf Apweiler, Lisa Matthews, Dong Li, Philippa J. Talmud, Ioannis Xenarios, J. M. Cherry, Tanya Z. Berardini, Deborah A. Siegele, Rama Balakrishnan, D. Sitnikov, A. Auchinchloss, Selina S. Dwight, Tony Sawford, Paul J. Kersey, Ruth C. Lovering, Ruth Y. Eberhardt, Ursula Hinz, Lakshmi Pillai, Sylvain Poux, Edith D. Wong, Klemens Pichler, Kati Laiho, Malcolm J. Gardner, Stephen G. Oliver, Lionel Breuza, Kara Dolinski, P Lemercier, Kristian B. Axelsen, Midori A. Harris, Adrienne E. Zweifel, H. Drabkin, Guillaume Keller, Marek S. Skrzypek, Daniel M. Staines, Fiona M. McCarthy, Nicholas H. Brown, Mark D. McDowall, Antonia Lock, Mary Shimoyama, Maria C. Costanzo, Teresia Buza, S. Jimenez, Rex L. Chisholm, Paul W. Sternberg, Hui Wang, Nadine Gruaz-Gumowski, Chantal Hulo, Rebecca E. Foulger, Melinda R. Dwinell, Judith A. Blake, Marcus C. Chibucos, B. K. McIntosh, C. D. Amundsen, Jane Lomax, L Famiglietti, Tom Hayman, Michael Tognolli, Eva Huala, James C. Hu, Patrick Masson, Maria Jesus Martin, Benoit Bely, Shuai Weng, Heather C. Wick, E. Dimmer, L. Ni, Catherine Rivoire, Christopher J. Mungall, H. Sehra, P. Duek-Roggli, Maria Victoria Schneider, Dianna G. Fisk, Michael S. Livstone, Ivo Pedruzzi, Shyamala Sundaram, Donna K. Slonim, Isabelle Cusin, Stuart R. Miyasato, Timothy F. Lowry, Varsha K. Khodiyar, Seth Carbon, Elisabeth Coudert, Jürg Bähler, Juancarlos Chan, Evelyn Camon, Daniel P. Renfro, Anne Estreicher, M. C. Blatter, Robert S. Nash, P Gaudet, Sven Heinicke, K. Van Auken, Stacia R. Engel, Alan Bridge, Ralf Stephan, Mary E. Dolan, Shane C. Burgess, Petra Fey, Shur-Jen Wang, Damien Lieberherr, Duncan Legge, P. Porras Millán, Andre Stutz, Yasmin Alam-Faruque, Gail Binkley, Bernd Roechert, S. Branconi-Quintaje, Ghislaine Argoud-Puy, S. Basu, Kim Rutherford, M. Moinat, Monte Westerfield, Arnaud Gos, Eleanor J Stanley, Valerie Wood, Ranjana Kishore, Diego Poggioli, S. Ferro-Rojas, Victoria Petri, Florence Jungo, Suzanna E. Lewis, Emmanuel Boutet, Warren A. Kibbe, M Feuermann, Claire O'Donovan, W. M. Chan, J. James, David P. Hill, Rachael P. Huntley, M. Gwinn Giglio, Paul Thomas, Jodi E. Hirschman, Paola Roncaglia, Gene Ontology Consortium, Blake, JA., Dolan, M., Drabkin, H., Hill, DP., Ni, L., Sitnikov, D., Burgess, S., Buza, T., Gresham, C., McCarthy, F., Pillai, L., Wang, H., Carbon, S., Lewis, SE., Mungall, CJ., Gaudet, P., Chisholm, RL., Fey, P., Kibbe, WA., Basu, S., Siegele, DA., McIntosh, BK., Renfro, DP., Zweifel, AE., Hu, JC., Brown, NH., Tweedie, S., Alam-Faruque, Y., Apweiler, R., Auchinchloss, A., Axelsen, K., Argoud-Puy, G., Bely, B., Blatter, M-., Bougueleret, L., Boutet, E., Branconi, S., Breuza, L., Bridge, A., Browne, P., Chan, WM., Coudert, E., Cusin, I., Dimmer, E., Duek-Roggli, P., Eberhardt, R., Estreicher, A., Famiglietti, L., Ferro-Rojas, S., Feuermann, M., Gardner, M., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Huntley, R., James, J., Jimenez, S., Jungo, F., Keller, G., Laiho, K., Legge, D., Lemercier, P., Lieberherr, D., Magrane, M., Martin, MJ., Masson, P., Moinat, M., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Porras Millán, P., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Sehra, H., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Xenarios, I., Foulger, R., Lomax, J., Roncaglia, P., Camon, E., Khodiyar, VK., Lovering, RC., Talmud, PJ., Chibucos, M., Gwinn Giglio, M., Dolinski, K., Heinicke, S., Livstone, MS., Stephan, R., Harris, MA., Oliver, SG., Rutherford, K., Wood, V., Bahler, J., Lock, A., Kersey, PJ., McDowall, MD., Staines, DM., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, T., Wang, S-., Petri, V., Lowry, T., D'Eustachio, P., Matthews, L., Amundsen, CD., Balakrishnan, R., Binkley, G., Cherry, JM., Christie, KR., Costanzo, MC., Dwight, SS., Engel, SR., Fisk, DG., Hirschman, JE., Hitz, BC., Hong, EL., Karra, K., Krieger, CJ., Miyasato, SR., Nash, RS., Park, J., Skrzypek, MS., Weng, S., Wong, ED., Berardini, TZ., Li, D., Huala, E., Slonim, D., Wick, H., Thomas, P., Chan, J., Kishore, R., Sternberg, P., Van Auken, K., Howe, D., and Westerfield, M.
- Subjects
Quality Control ,0303 health sciences ,media_common.quotation_subject ,Databases, Genetic ,Molecular Sequence Annotation/standards ,Vocabulary, Controlled ,Inference ,Molecular Sequence Annotation ,Articles ,Biology ,Ontology (information science) ,World Wide Web ,Open Biomedical Ontologies ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Resource (project management) ,Controlled vocabulary ,Genetics ,Social media ,Function (engineering) ,030217 neurology & neurosurgery ,030304 developmental biology ,media_common - Abstract
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
- Published
- 2011
- Full Text
- View/download PDF
36. Saccharomyces Genome Database provides mutant phenotype data
- Author
-
Marek S. Skrzypek, Gail Binkley, Michael S. Livstone, Rose Oughtred, Shuai Weng, Stuart R. Miyasato, David Botstein, Eurie L. Hong, Rama Balakrishnan, Jodi E. Hirschman, Robert S. Nash, Benjamin C. Hitz, Maria C. Costanzo, Julie Park, Cynthia J. Krieger, Dianna G. Fisk, Stacia R. Engel, Kara Dolinski, Edith D. Wong, Karen R. Christie, Selina S. Dwight, and J. Michael Cherry
- Subjects
Protein domain ,Mutant ,Saccharomyces cerevisiae ,Genes, Fungal ,Information Storage and Retrieval ,Biology ,medicine.disease_cause ,Saccharomyces ,Databases, Genetic ,Genetics ,medicine ,DNA, Fungal ,Databases, Protein ,Mutation ,Internet ,Saccharomyces genome database ,Computational Biology ,Articles ,biology.organism_classification ,Phenotype ,Yeast ,Protein Structure, Tertiary ,Genome, Fungal ,Databases, Nucleic Acid ,Software - Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker’s or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.
- Published
- 2009
37. The Gene Ontology project in 2008
- Author
-
John Day Richter, Rex L. Chisholm, Carol J. Bult, Petra Fey, Michael S. Livstone, Susan Bromberg, Evelyn Camon, Suzanna E. Lewis, Janan T. Eppig, Emily Dimmer, Mary Shimoyama, Ni Li, Rose Oughtred, Rolf Apweiler, Stuart R. Miyasato, Edith D. Wong, Tanya Z. Berardini, Maria C. Costanzo, Christopher J. Mungall, David P. Hill, Ruth C. Lovering, Valerie Wood, Marek S. Skrzypek, Jodi E. Hirschman, J. Michael Cherry, Li Donghui, Seth Carbon, Jennifer R. Wortman, Kara Dolinski, Giorgio Valle, Kathy K. Zhu, Susan Tweedie, Shane C. Burgess, Stacia R. Engel, Trudy Torto Alalibo, Paul W. Sternberg, Fiona M. McCarthy, Pankaj Jaiswal, Doug Howe, Ranjana Kishore, Jennifer I. Deegan, Warren A. Kibbe, Gail Binkley, Simon N. Twigger, Harold J. Drabkin, Erika Feltrin, Martin Aslett, Qing Dong, Matthew Berriman, David Botstein, Victoria Petri, Pascale Gaudet, Candace Collmer, Shuai Weng, Cynthia J. Krieger, Linda Hannick, Dianna G. Fisk, Robert S. Nash, Rachael P. Huntley, Nicola Mulder, Jennifer L. Smith, Sue Povey, Seung Y. Rhee, Stan Laulederkind, Benjamin C. Hitz, Julie Park, Howard J. Jacob, Midori A. Harris, Michelle G. Giglio, Judith A. Blake, Martin Ringwald, Erich M. Schwarz, Daniel Barrell, Rama Balakrishnan, Alexander D. Diehl, Trent E. Seigfried, Amelia Ireland, Eurie L. Hong, Jane Lomax, Karen Eilbeck, Michael Ashburner, Karen R. Christie, Kimberly Van Auken, Mary E. Dolan, Varsha K. Khodiyar, and Monte Westerfield
- Subjects
Interface (Java) ,Genomics ,Biology ,Bioinformatics ,Vocabulary ,World Wide Web ,Open Biomedical Ontologies ,Databases ,03 medical and health sciences ,Annotation ,Mice ,User-Computer Interface ,0302 clinical medicine ,Resource (project management) ,Genetic ,Controlled vocabulary ,Databases, Genetic ,Genetics ,Animals ,Humans ,Sequence Ontology ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,0303 health sciences ,Internet ,business.industry ,Articles ,Rats ,Sequence Analysis ,Vocabulary, Controlled ,030220 oncology & carcinogenesis ,The Internet ,ComputingMethodologies_GENERAL ,Controlled ,business ,Caltech Library Services - Abstract
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of ‘reference’ genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.
- Published
- 2007
38. Expanded protein information at SGD: new pages and proteome browser
- Author
-
Rama Balakrishnan, Chandra L. Theesfeld, Robert S. Nash, Maria C. Costanzo, J. Michael Cherry, Kara Dolinski, Marek S. Skrzypek, Eurie L. Hong, Mark Schroeder, David Botstein, Shuai Weng, Michael S. Livstone, Stacia R. Engel, Selina S. Dwight, Christopher Lane, Gail Binkley, Benjamin C. Hitz, Julie Park, Stuart R. Miyasato, Jodi E. Hirschman, Karen R. Christie, Anand Sethuraman, Dianna G. Fisk, Qing Dong, and Rose Oughtred
- Subjects
Proteomics ,Internet ,Saccharomyces cerevisiae Proteins ,Information retrieval ,Protein family ,business.industry ,Saccharomyces cerevisiae ,Articles ,Biology ,Bioinformatics ,Visualization ,User-Computer Interface ,ComputingMethodologies_PATTERNRECOGNITION ,Protein Annotation ,Sequence Analysis, Protein ,Web page ,Proteome ,Computer Graphics ,Genetics ,The Internet ,Genome, Fungal ,Databases, Protein ,business ,Hidden Markov model - Abstract
The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.
- Published
- 2007
- Full Text
- View/download PDF
39. Principles of metadata organization at the ENCODE data coordination center
- Author
-
Benjamin C. Hitz, Aditi K. Narayanan, Jason A. Hilton, Idan Gabdank, Cricket A. Sloan, Venkat S. Malladi, J. Seth Strattan, J. Michael Cherry, Greg Roe, Jean M. Davidson, Forrest Y. Tanaka, Laurence D. Rowe, Eurie L. Hong, Timothy R. Dreszer, Nikhil R. Podduturi, Marcus Ho, Brian T. Lee, and Esther T. Chan
- Subjects
0301 basic medicine ,Quality Control ,Computer science ,ENCODE ,General Biochemistry, Genetics and Molecular Biology ,World Wide Web ,03 medical and health sciences ,Mice ,0302 clinical medicine ,Nucleic Acids ,Data file ,Databases, Genetic ,Animals ,Humans ,Caenorhabditis elegans ,Data collection ,Data element ,Data Collection ,Metadata standard ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,DNA ,Metadata repository ,Metadata ,030104 developmental biology ,Drosophila melanogaster ,030220 oncology & carcinogenesis ,Encyclopedia ,Original Article ,General Agricultural and Biological Sciences ,Sequence Alignment ,Algorithms ,Information Systems - Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.
- Published
- 2015
40. Ontology application and use at the ENCODE DCC
- Author
-
Marcus Ho, Stuart R. Miyasato, W. James Kent, J. Seth Strattan, Jean M. Davidson, Nikhil R. Podduturi, Cricket A. Sloan, Greg Roe, Eurie L. Hong, Laurence D. Rowe, Brian T. Lee, Esther T. Chan, J. Michael Cherry, Drew T. Erickson, Forrest Y. Tanaka, Benjamin C. Hitz, Venkat S. Malladi, and Matt Simison
- Subjects
Information retrieval ,Transcription, Genetic ,Standardization ,Computer science ,Experimental data ,Molecular Sequence Annotation ,Ontology (information science) ,ENCODE ,General Biochemistry, Genetics and Molecular Biology ,Set (abstract data type) ,World Wide Web ,Metadata ,Mice ,Gene Ontology ,Databases, Genetic ,Encyclopedia ,Animals ,Humans ,Original Article ,Gene Regulatory Networks ,General Agricultural and Biological Sciences ,Data Curation ,Information Systems - Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects. Database URL: https://www.encodeproject.org/
- Published
- 2015
- Full Text
- View/download PDF
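The ontology-driven searches mentioned in the abstract above can be illustrated with a minimal sketch: a query for a term also matches experiments annotated with any more specific term beneath it. The hierarchy, term labels and experiment ids below are toy values invented for illustration; they are not taken from any real ontology or from the ENCODE Portal, and the abstract does not describe the portal's implementation.

```python
from collections import defaultdict

# Toy is_a hierarchy (child -> parents); hypothetical, not a real ontology.
PARENTS = {
    "hepatocyte": ["liver cell"],
    "liver cell": ["cell"],
    "neuron": ["cell"],
}


def descendants(term, parents=PARENTS):
    """Return the term plus every term reachable from it via child links."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    seen, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(term, annotations):
    """Return sorted ids of experiments annotated with the term or a descendant."""
    wanted = descendants(term)
    return sorted(exp for exp, t in annotations.items() if t in wanted)
```

With annotations such as `{"exp1": "hepatocyte", "exp3": "liver cell"}`, a search for "liver cell" returns both experiments even though only one carries that exact label, which is the kind of connection a plain string match would miss.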
41. Correction: Corrigendum: InterMOD: integrated data and tools for the unification of model organism research
- Author
-
J. Michael Cherry, Quang M. Trinh, Andrew Vallejos, Lincoln Stein, Jelena Aleksic, Gos Micklem, Richard N. Smith, Benjamin C. Hitz, Pushkala Jayaraman, Rachel Lyne, Howie Motenko, Joel Richardson, Christian Pich, Elizabeth A. Worthey, Gail Binkley, Simon N. Twigger, Kalpana Karra, J. D. Wong, Rama Balakrishnan, Steven B. Neuhauser, Todd W. Harris, Julie Sullivan, Monte Westerfield, and Sierra A. T. Moxon
- Subjects
Multidisciplinary ,Unification ,Computer science ,ved/biology ,ved/biology.organism_classification_rank.species ,computer.software_genre ,Data science ,03 medical and health sciences ,0302 clinical medicine ,030220 oncology & carcinogenesis ,Data mining ,Model organism ,computer ,030217 neurology & neurosurgery - Abstract
CORRIGENDUM: InterMOD: integrated data and tools for the unification of model organism research
- Published
- 2013
- Full Text
- View/download PDF
42. InterMOD: integrated data and tools for the unification of model organism research
- Author
-
Richard N. Smith, Pushkala Jayaraman, Rama Balakrishnan, Elizabeth A. Worthey, Steven B. Neuhauser, Gail Binkley, Julie Sullivan, Lincoln Stein, J. D. Wong, Jelena Aleksic, Sierra A. T. Moxon, J. Michael Cherry, Monte Westerfield, Todd W. Harris, Quang M. Trinh, Rachel Lyne, Benjamin C. Hitz, Gos Micklem, Simon N. Twigger, Andrew Vallejos, Howie Motenko, Joel Richardson, Christian Pich, and Kalpana Karra
- Subjects
Unification ,Databases, Factual ,media_common.quotation_subject ,ved/biology.organism_classification_rank.species ,Biology ,computer.software_genre ,Article ,Data modeling ,03 medical and health sciences ,Consistency (database systems) ,0302 clinical medicine ,Comparative research ,Databases, Genetic ,Animals ,Function (engineering) ,Model organism ,030304 developmental biology ,media_common ,0303 health sciences ,Multidisciplinary ,Genome ,Models, Genetic ,ved/biology ,Genomics ,Data science ,Data warehouse ,DECIPHER ,Data mining ,computer ,030217 neurology & neurosurgery - Abstract
Model organisms are widely used for understanding basic biology and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
- Published
- 2013
43. The YeastGenome app: the Saccharomyces Genome Database at your fingertips
- Author
-
Benjamin C. Hitz, Kalpana Karra, Eurie L. Hong, Edith D. Wong, and J. Michael Cherry
- Subjects
business.product_category ,Computer science ,Genes, Fungal ,Genome browser ,Saccharomyces cerevisiae ,General Biochemistry, Genetics and Molecular Biology ,World Wide Web ,Access to Information ,03 medical and health sciences ,0302 clinical medicine ,Databases, Genetic ,Internet access ,030304 developmental biology ,0303 health sciences ,Internet ,business.industry ,Fungal genetics ,Hyperlink ,Online help ,Gene nomenclature ,ComputingMethodologies_PATTERNRECOGNITION ,The Internet ,Original Article ,Genome, Fungal ,General Agricultural and Biological Sciences ,business ,Mobile device ,030217 neurology & neurosurgery ,Cell Phone ,Information Systems - Abstract
The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, and provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help that describes basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD’s mission to provide free and open access to all its data and annotations.
- Published
- 2013
44. YeastMine—an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit
- Author
-
Eurie L. Hong, Benjamin C. Hitz, Julie Park, Rama Balakrishnan, Kalpana Karra, Gail Binkley, J. Michael Cherry, Gos Micklem, and Julie Sullivan
- Subjects
Computer science ,Interface (computing) ,Saccharomyces cerevisiae ,Data type ,General Biochemistry, Genetics and Molecular Biology ,World Wide Web ,User-Computer Interface ,03 medical and health sciences ,0302 clinical medicine ,Databases, Genetic ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,Internet ,0303 health sciences ,Information retrieval ,biology ,business.industry ,Original Articles ,biology.organism_classification ,File format ,Budding yeast ,Data warehouse ,Template ,Database Management Systems ,The Internet ,Genome, Fungal ,General Agricultural and Biological Sciences ,business ,030217 neurology & neurosurgery ,Information Systems
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) provides high-quality curated genomic, genetic, and molecular information on the genes of the budding yeast Saccharomyces cerevisiae and their products. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (http://www.InterMine.org), an open source data warehouse system with a sophisticated querying interface, to create YeastMine (http://yeastmine.yeastgenome.org). YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine can be used in multiple ways: as a powerful search interface, a discovery tool, a curation aid and a complex database presentation format. Database URL: http://yeastmine.yeastgenome.org.
- Published
- 2012
45. New mutant phenotype data curation system in the Saccharomyces Genome Database
- Author
-
Robert S. Nash, Marek S. Skrzypek, Maria C. Costanzo, Eurie L. Hong, Stacia R. Engel, Edith D. Wong, Gail Binkley, J. Michael Cherry, and Benjamin C. Hitz
- Subjects
Genetics ,0303 health sciences ,Saccharomyces genome database ,Data curation ,biology ,030302 biochemistry & molecular biology ,Mutant ,Saccharomyces cerevisiae ,Locus (genetics) ,biology.organism_classification ,Phenotype ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Annotation ,Original Article ,General Agricultural and Biological Sciences ,Gene ,030304 developmental biology ,Information Systems
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) organizes and displays molecular and genetic information about the genes and proteins of baker's yeast, Saccharomyces cerevisiae. Mutant phenotype screens have been the starting point for a large proportion of yeast molecular biological studies, and are still used today to elucidate the functions of uncharacterized genes and discover new roles for previously studied genes. To greatly facilitate searching and comparison of mutant phenotypes across genes, we have devised a new controlled-vocabulary system for capturing phenotype information. Each phenotype annotation is represented as an ‘observable’, which is the entity or process that is observed, and a ‘qualifier’ that describes the change in that entity or process in the mutant (e.g. decreased, increased, or abnormal). Additional information about the mutant, such as strain background, allele name, conditions under which the phenotype is observed, or the identity of relevant chemicals, is captured in separate fields. For each gene, a summary of the mutant phenotype information is displayed on the Locus Summary page, and the complete information is displayed in tabular format on the Phenotype Details Page. All of the information is searchable and may also be downloaded in bulk using SGD's Batch Download Tool or Download Data Files Page. In the future, phenotypes will be integrated with other curated data to allow searching across different types of functional information, such as genetic and physical interaction data and Gene Ontology annotations. Database URL: http://www.yeastgenome.org/
- Published
- 2008
46. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database
- Author
-
Rama Balakrishnan, Sage T. Hellerstedt, J. Michael Cherry, Janos Demeter, Edith D. Wong, Stacia R. Engel, Gail Binkley, Marek S. Skrzypek, Travis K. Sheppard, Maria C. Costanzo, Robert S. Nash, Kelley Paskov, Kalpana Karra, Shuai Weng, Giltae Song, Kyla S. Dalusag, and Benjamin C. Hitz
- Subjects
0301 basic medicine ,Saccharomyces cerevisiae ,Locus (genetics) ,Biology ,ENCODE ,Genome ,General Biochemistry, Genetics and Molecular Biology ,Saccharomyces ,User-Computer Interface ,03 medical and health sciences ,Protein sequencing ,Databases, Genetic ,natural sciences ,Gene ,Genetics ,Reproducibility of Results ,Molecular Sequence Annotation ,Genomics ,Genome project ,biology.organism_classification ,030104 developmental biology ,Database Update ,Genome, Fungal ,General Agricultural and Biological Sciences ,Information Systems ,Reference genome
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org
- Published
- 2016
47. Gene Ontology annotations at SGD: new data sources and annotation methods
- Author
-
Stuart R. Miyasato, Rama Balakrishnan, Shuai Weng, Dianna G. Fisk, Eurie L. Hong, David Botstein, Robert S. Nash, Jodi E. Hirschman, Marek S. Skrzypek, Edith D. Wong, Selina S. Dwight, Michael S. Livstone, Stacia R. Engel, Kathy K. Zhu, J. Michael Cherry, Benjamin C. Hitz, Rose Oughtred, Julie Park, Kara Dolinski, Gail Binkley, Karen R. Christie, Cynthia J. Krieger, Maria C. Costanzo, and Qing Dong
- Subjects
Genetics ,Data source ,Internet ,Information retrieval ,Saccharomyces cerevisiae Proteins ,Gene ontology ,Genes, Fungal ,Computational Biology ,Genomics ,Saccharomyces cerevisiae ,Articles ,Biology ,Genome ,Annotation ,User-Computer Interface ,Vocabulary, Controlled ,Controlled vocabulary ,Databases, Genetic ,UniProt ,Experimental methods ,Genome, Fungal ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries)
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.
- Published
- 2007
48. The Saccharomyces Genome Database provides comprehensive information about the biology of S. cerevisiae and tools for studies in comparative genomics
- Author
-
S. Miyasoto, Rose Oughtred, Michael S. Livstone, Stacia R. Engel, Robert S. Nash, Gail Binkley, Qing Dong, Dianna G. Fisk, Marek S. Skrzypek, Maria C. Costanzo, Mark Schroeder, J. M. Cherry, Eurie L. Hong, Rey Andrada, Karen R. Christie, David Botstein, Shuai Weng, Benjamin C. Hitz, Edith D. Wong, Rama Balakrishnan, Selina S. Dwight, Jinha M. Park, Jodi E. Hirschman, and Kara Dolinski
- Subjects
Comparative genomics ,Saccharomyces genome database ,Genetics ,Computational biology ,Biology ,Molecular Biology ,Biochemistry ,Biotechnology
- Published
- 2007
49. Free energy determinants of secondary structure formation: III. beta-turns and their role in protein folding
- Author
-
Benjamin C. Hitz, An-Suei Yang, and Barry Honig
- Subjects
Protein Folding ,Databases, Factual ,Chemistry ,Protein Conformation ,Solvation ,Hydrogen Bonding ,Dipeptides ,Conformational entropy ,Force field (chemistry) ,Protein Structure, Secondary ,Crystallography ,Structural Biology ,Chemical physics ,Metastability ,Data Interpretation, Statistical ,Thermodynamics ,Protein folding ,Twist ,Molecular Biology ,Peptide sequence ,Protein secondary structure ,Monte Carlo Method ,Software
- Abstract
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
- Published
- 1996
50. H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency
- Author
-
Edith D. Wong, Thomas A. Rando, Salah Mahmoudi, Michael Snyder, Julie C. Baker, Keerthana Devarajan, Benjamin C. Hitz, Anshul Kundaje, Kalpana Karra, Duygu Ucar, Elena Mancini, Elizabeth A. Pollina, Rakhi Gupta, J. Michael Cherry, Anne Brunet, Aaron Daugherty, and Bérénice A. Benayoun
- Subjects
Cell type ,Transcription, Genetic ,Cells ,education ,RNA polymerase II ,Computational biology ,Methylation ,Article ,General Biochemistry, Genetics and Molecular Biology ,Histones ,Histone H3 ,03 medical and health sciences ,0302 clinical medicine ,Neural Stem Cells ,Artificial Intelligence ,Histone code ,Animals ,Humans ,Gene ,030304 developmental biology ,Genetics ,0303 health sciences ,biology ,Biochemistry, Genetics and Molecular Biology(all) ,Lysine ,030302 biochemistry & molecular biology ,Promoter ,Genomics ,Histone Code ,Mice, Inbred C57BL ,Histone ,biology.protein ,H3K4me3 ,RNA Polymerase II ,030217 neurology & neurosurgery
- Abstract
Summary: Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency rather than increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.