70 results on "Sequence Ontology"
Search Results
2. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
- Author
-
Bolleman, Jerven T, Mungall, Christopher J, Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul JP, Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki, and Cock, Peter JA
- Subjects
Information and Computing Sciences; Human Genome; Genetics; Networking and Information Technology R&D (NITRD); Biotechnology; Generic health relevance; Biological Ontologies; Databases, Genetic; Databases, Protein; Fuzzy Logic; Humans; Molecular Sequence Annotation; Nucleotides; Proteins; Reference Books; Semantics; Annotation; Data integration; RDF; SPARQL; Semantic Web; Sequence feature; Sequence ontology; Standardisation; Other Biological Sciences; Artificial Intelligence and Image Processing; Information Systems; Information and computing sciences
- Abstract
Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
- Published
- 2016
3. FALDO: A semantic standard for describing the location of nucleotide and protein feature annotation
- Author
-
Cock, Peter [The James Hutton Institute, Dundee (United Kingdom)]
- Published
- 2016
- Full Text
- View/download PDF
4. Clinical Research in the Postgenomic Era
- Author
-
Meystre, Stephane M., Gouripeddi, Ramkiran, Richesson, Rachel L., editor, and Andrews, James E., editor
- Published
- 2019
- Full Text
- View/download PDF
5. Sequence Ontology
- Author
-
Eilbeck, Karen, Holt, Carson, Dubitzky, Werner, editor, Wolkenhauer, Olaf, editor, Cho, Kwang-Hyun, editor, and Yokota, Hiroki, editor
- Published
- 2013
- Full Text
- View/download PDF
6. Clinical Research in the Postgenomic Era
- Author
-
Meystre, Stephane M., Narus, Scott P., Mitchell, Joyce A., Richesson, Rachel L., editor, and Andrews, James E., editor
- Published
- 2012
- Full Text
- View/download PDF
7. Experiences Using Logic Programming in Bioinformatics
- Author
-
Mungall, Chris, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Hill, Patricia M., editor, and Warren, David S., editor
- Published
- 2009
- Full Text
- View/download PDF
8. GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files
- Author
-
Norhan Yasser and Ahmed Karam
- Subjects
Untranslated region; Computer science; Big data; Health Informatics; Genomics; Computational biology; Genome; DNA sequencing; General Biochemistry, Genetics and Molecular Biology; Exon; 03 medical and health sciences; Annotation; Intergenic region; Humans; Sequence Ontology; Gene; 030304 developmental biology; computer.programming_language; Whole genome sequencing; 0303 health sciences; Information retrieval; Genome, Human; business.industry; 030302 biochemistry & molecular biology; Intron; Computational Biology; Molecular Sequence Annotation; Genome project; Gene Annotation; Python (programming language); File format; Computer Science Applications; ComputingMethodologies_PATTERNRECOGNITION; business; computer; Software
- Abstract
Manipulating and analyzing publicly available genomic datasets has become a daily task in bioinformatics and genomics laboratories. The release of several genome sequencing projects prompts bioinformaticians to develop automated scripts and pipelines that analyze genomic datasets, in particular gene annotation pipelines. Non-developers need fully featured programs for handling genome annotation files, and genomic data analysis can be accelerated by reducing genome annotation and sequence files to specific features. To extract genome features from GTF or GFF3 files precisely, the GAD script (https://github.com/bio-projects/GAD) provides a simple graphical user interface that can be interpreted by all Python versions installed on different operating systems. The GAD script contains entry widgets that can analyze multiple genome sequence and annotation files with a click. Its functions extract genome features such as upstream genes, downstream genes, intergenic regions, genes, transcripts, exons, introns, coding sequences, five prime untranslated regions, three prime untranslated regions, and other ambiguous Sequence Ontology terms. The GAD script outputs the results in several file formats, such as BED, GTF/GFF3, and FASTA, which are supported by other bioinformatics programs. Our script could be incorporated into various pipelines in genomics laboratories with the aim of accelerating data analysis.
- Published
- 2020
- Full Text
- View/download PDF
9. Murine allele and transgene symbols: ensuring unique, concise, and informative nomenclature
- Author
-
C. L. Smith and M. N. Perry
- Subjects
Mutation; Strain (biology); Computational biology; Genomics; Biology; Mouse Genome Informatics; medicine.disease_cause; Genome; Human genetics; Mice; Databases, Genetic; Genetics; medicine; Animals; Transgenes; Allele; Sequence Ontology; Gene; Alleles
- Abstract
In addition to naturally occurring sequence variation and spontaneous mutations, a wide array of technologies exists for modifying the mouse genome. Standardized nomenclature, including allele, transgene, and other mutation nomenclature, as well as persistent unique identifiers (PUID) are critical for effective scientific communication, comparison of results, and integration of data into knowledgebases such as Mouse Genome Informatics (MGI), Alliance of Genome Resources, and International Mouse Strain Resource (IMSR). As well as being the authoritative source for mouse gene, allele, and strain nomenclature, MGI integrates published and unpublished genomic, phenotypic, and expression data while linking to other online resources for a complete view of the mouse as a valuable model organism. The International Committee on Standardized Genetic Nomenclature for Mice has developed allele nomenclature rules and guidelines that take into account the number of genes impacted, the method of allele generation, and the nature of the sequence alteration. To capture details that cannot be included in allele symbols, MGI has further developed allele-to-gene relationships using Sequence Ontology (SO) definitions for mutations that provide links between alleles and the genes affected. MGI is also using HGVS variant nomenclature for variants associated with alleles, which will enhance searching for mutations and improve cross-species comparison. With the ability to assign unique and informative symbols as well as to link alleles with more than one gene, allele and transgene nomenclature rules and guidelines provide an unambiguous way to represent alterations in the mouse genome and facilitate data integration among multiple resources such as the Alliance of Genome Resources and International Mouse Strain Resource.
- Published
- 2021
10. Formalization of gene regulation knowledge using ontologies and gene ontology causal activity models
- Author
-
María del Mar Roldán-García, José Antonio Vera-Ramos, Belén Juanes Cortés, Stefan Schulz, Martin Kuiper, Astrid Lægreid, Pascale Gaudet, Ruth C. Lovering, Colin Logie, and Jesualdo Tomás Fernández-Breis
- Subjects
Biological data; Knowledge representation and reasoning; Models, Genetic; Computer science; Interoperability; Biophysics; Molecular Sequence Annotation; Protégé; Ontology (information science); Biochemistry; Data science; Annotation; Gene Ontology; Gene Expression Regulation; Structural Biology; Genetics; ComputingMethodologies_GENERAL; Sequence Ontology; Molecular Biology; Natural language; Data Curation
- Abstract
Gene regulation computational research requires handling and integrating large amounts of heterogeneous data. The Gene Ontology has demonstrated that ontologies play a fundamental role in biological data interoperability and integration. Ontologies help to express data and knowledge in a machine processable way, which enables complex querying and advanced exploitation of distributed data. Contributing to improve data interoperability in gene regulation is a major objective of the GREEKC Consortium, which aims to develop a standardized gene regulation knowledge commons. GREEKC proposes the use of ontologies and semantic tools for developing interoperable gene regulation knowledge models, which should support data annotation. In this work, we study how such knowledge models can be generated from cartoons of gene regulation scenarios. The proposed method consists of generating descriptions in natural language of the cartoons; extracting the entities from the texts; finding those entities in existing ontologies to reuse as much content as possible, especially from well known and maintained ontologies such as the Gene Ontology, the Sequence Ontology, the Relations Ontology and ChEBI; and implementation of the knowledge models. The models have been implemented using Protege, a general ontology editor, and Noctua, the tool developed by the Gene Ontology Consortium for the development of causal activity models to capture more comprehensive annotations of genes and link their activities in a causal framework for Gene Ontology Annotations. We applied the method to two gene regulation scenarios and illustrate how to apply the models generated to support the annotation of data from research articles.
- Published
- 2021
11. Sequence Ontology terminology for gene regulation
- Author
-
David W. Sant, Christopher J. Mungall, Colin Logie, Stefan Schulz, Daniel R. Zerbino, Karen Eilbeck, Ruth C. Lovering, and Michael Sinclair
- Subjects
Regulation of gene expression; Logical reasoning; Biophysics; Computational biology; Ontology (information science); Locus Control Region; Biochemistry; Knowledge commons; Terminology; Biological Ontologies; Gene Expression Regulation; Expression (architecture); Structural Biology; Controlled vocabulary; Genetics; Regulatory Elements, Transcriptional; Sequence Ontology; Molecular Biology
- Abstract
The Sequence Ontology (SO) is a structured, controlled vocabulary that provides terms and definitions for genomic annotation. The Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) initiative has gathered input from many groups of researchers, including the SO, the Gene Ontology (GO), and gene regulation experts, with the goal of curating information about how gene expression is regulated at the molecular level. Here we discuss recent updates to the SO reflecting current knowledge. We have developed more accurate human-readable terms (also known as classes), including new definitions, and relationships related to the expression of genes. New findings continue to give us insight into the biology of gene regulation, including the order of events, and participants in those events. These updates to the SO support logical reasoning with the current understanding of gene expression regulation at the molecular level.
- Published
- 2021
- Full Text
- View/download PDF
12. RNAcentral 2021: Secondary structure integration, improved sequence search and new member databases
- Author
-
Alex Bateman, Dimitra Karagkouni, Robin R. Gutell, Lina Ma, Ruth C. Lovering, Prita Mani, Artemis G. Hatzigeorgiou, Pieter-Jan Volders, Elspeth A. Bruford, Simon Kay, Kevin J. Peterson, Lauren M. Lui, Steven J Marygold, Todd M. Lowe, Jamie J. Cannone, Anton S. Petrov, Patricia P. Chan, Robert D. Finn, Adam Frankish, Stefan E. Seemann, David Hoksza, Bastian Fromm, Ioanna Kalvari, Maciej Szymanski, Ruth L. Seal, Ruth Barshir, Pieter Mestdagh, Simona Panni, Carlos Eduardo Ribas, Michelle S. Scott, Pablo Porras, Simon Fishilevich, Anton I. Petrov, Sam Griffiths-Jones, Blake A. Sweeney, Zhang Zhang, Jonathan M. Mudge, Zasha Weinberg, Sridhar Ramachandran, Jan Gorodkin, Shuai Weng, Eric P. Nawrocki, Wojciech M. Karlowski, Barbara Kramarz, Philia Bouchard-Bourelle, and Gil dos Santos
- Subjects
RNA, Untranslated; Interface (Java); AcademicSubjects/SCI00010; CURATION; Rfam; Biology; computer.software_genre; ANNOTATION; MiRBase; 03 medical and health sciences; Annotation; Betacoronavirus; 0302 clinical medicine; Genetics; Medicine and Health Sciences; Database Issue; Animals; Humans; Sequence Ontology; Gene; 030304 developmental biology; 0303 health sciences; Internet; Database; Base Sequence; Sequence Analysis, RNA; Fungi; RNA; Molecular Sequence Annotation; Non-coding RNA; GENE; Gene Ontology; Nucleic Acid Conformation; Databases, Nucleic Acid; computer; Apicomplexa; 030217 neurology & neurosurgery; Software
- Abstract
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org.
- Published
- 2021
- Full Text
- View/download PDF
13. The Pfam protein families database in 2019
- Author
-
Alfredo Smart, Simon C. Potter, Layla Hirsh, Sara El-Gebali, Jaina Mistry, Sean R. Eddy, Matloob Qureshi, Alex Bateman, Aurelien Luciani, Gustavo A. Salazar, Damiano Piovesan, Silvio C. E. Tosatto, Robert D. Finn, Lisanna Paladin, Erik L. L. Sonnhammer, and Lorna Richardson
- Subjects
Repetitive Sequences, Amino Acid; 0303 health sciences; Repetitive Sequences; Proteins; Molecular Sequence Annotation; Computational biology; Structural classification; Biology; 03 medical and health sciences; 0302 clinical medicine; Protein Domains; Functional annotation; Genetics; Tandem Repeat Sequence; Database Issue; Protein Families Database; Databases, Protein; Sequence Ontology; 030217 neurology & neurosurgery; 030304 developmental biology
- Abstract
The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors’ ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.
- Published
- 2018
- Full Text
- View/download PDF
14. Evolution of the Sequence Ontology terms and relationships.
- Author
-
Mungall, Christopher J., Batchelor, Colin, and Eilbeck, Karen
- Abstract
The Sequence Ontology is an established ontology, with a large user community, for the purpose of genomic annotation. We are reforming the ontology to provide better terms and relationships to describe the features of biological sequence, for both genomic and derived sequence. The SO is working within the guidelines of the OBO Foundry to provide interoperability between SO and the other related OBO ontologies. Here, we report changes and improvements made to SO including new relationships to better define the mereological, spatial and temporal aspects of biological sequence. [Copyright Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
15. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome.
- Author
-
Reese, Justin T, Childers, Christopher P, Sundaram, Jaideep P, Dickens, C Michael, Childs, Kevin L, Vile, Donald C, and Elsik, Christine G
- Subjects
*CATTLE; *EUKARYOTIC genomes; *COMMUNITY support; *ANNOTATIONS; *GENOMES; *BIOLOGICAL databases; *MULTIAGENT systems
- Abstract
Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
16. Abstract 3218: CKB: a Clinical Knowledge Base for interpreting correlations among cancer, biomarker and drug
- Author
-
Bolong He, Jing Ma, Hua Dong, Bingru Sun, Hank Yang, and Yizhou Ye
- Subjects
Cancer Research; Mechanism (biology); Computer science; business.industry; Cancer; Computational biology; Disease; Ontology (information science); medicine.disease; Biomarker (cell); Oncology; Disease Ontology; Knowledge base; medicine; Sequence Ontology; business
- Abstract
Background: The field of cancer diagnosis and therapy is being empowered by Next Generation Sequencing (NGS) technology in recent years. The correlations among cancer, biomarker and drug are usually analyzed and used to guide the personalized drug recommendation for patients. However, the accurate correlation analysis faces two major challenges. First, large amount of existing variants carry the high variability in pathogenicity and clinical relevance. Second, the knowledge concerning the correlations is not easily accessible in the case of rare variants. The quantity of currently known variants, and dramatic increase of newly identified variants have showed the need for building up a knowledge base, for interpreting correlations among cancer, biomarker and drug. Method: To make the knowledge base accurately maintainable and friendly usable, we designed knowledge structures (or ontologies), and implemented an ontology based Clinical Knowledge Base (CKB). CKB consists of evidences from three different aspects, including cancer (i.e. disease), biomarker (genes and their corresponding variants), and drug. The inclusion of Gene Ontology (GO), Sequence Ontology (SO), Disease Ontology (DO) and KEGG Pathway Ontology (PO) not only describes the individual entities and the relationships between them, but also facilitates the discovery of correlations for potential biomarkers through ontology searching mechanism. CKB was implemented in Java and MySQL, and it provided a web based UI to input, analyze, relate and retrieve the data. Results: CKB supports effective retrieval of comprehensive information of cancer, biomarker and drug, and it highlights the sophisticated relationship among the respective entities. By inputting a biomarker of interest, evidences associated with the relevant genes/variants and relevant cancer types will be displayed. Evidences will be sorted by the significance levels. 
The levels include FDA/NMPA approved drugs, NCCN guidelines, clinical trials, clinical case report, pre-clinical evidence, and others. Furthermore, for the advanced users, the correlations between biomarker and cancer type can be explored for the research perspective of umbrella or basket clinical trials. CKB is open to public for research use at ckb.sodayun.com. Conclusions: Ontology is a powerful modeling methodology for medical knowledge. To manage the exponentially increasing NGS variants and their correlation with cancer and drug, ontology based knowledge base like CKB will play a significant role to help provide accurate correlation interpretation. It will be valuable for researchers and clinicians to better understand the correlations, in order to design new diagnostic assay or prescribe therapeutic regimens for their patients. For the more usage of CKB, more research is required, for example, how to embed CKB ontologies into a NGS pipeline. Citation Format: Jing Ma, Yizhou Ye, Hua Dong, Bingru Sun, Bolong He, Hank Yang. CKB: a Clinical Knowledge Base for interpreting correlations among cancer, biomarker and drug [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 3218.
- Published
- 2020
- Full Text
- View/download PDF
17. Gene Model Annotations for Drosophila melanogaster: The Rule-Benders
- Author
-
Susan E. St. Pierre, L. Sian Gramates, Beverley B. Matthews, David B. Emmert, Andrew J. Schroeder, Pinglei Zhou, Madeline A. Crosby, Susan M. Russo, Gilberto dos Santos, Kathleen Falls, and William M. Gelbart
- Subjects
Context (language use); Computational biology; Investigations; multiphasic exon; non-AUG translation start; Databases, Genetic; Genetics; Animals; shared promoter; FlyBase : A Database of Drosophila Genes & Genomes; Sequence Ontology; Molecular Biology; Gene; Genetics (clinical); Translational frameshift; Base Sequence; Models, Genetic; biology; Intron; Molecular Sequence Annotation; biology.organism_classification; Mitochondria; Drosophila melanogaster; Protein Biosynthesis; Codon, Terminator; stop-codon suppression; bicistronic; RNA Editing; RNA Splice Sites
- Abstract
In the context of the FlyBase annotated gene models in Drosophila melanogaster, we describe the many exceptional cases we have curated from the literature or identified in the course of FlyBase analysis. These range from atypical but common examples such as dicistronic and polycistronic transcripts, noncanonical splices, trans-spliced transcripts, noncanonical translation starts, and stop-codon readthroughs, to single exceptional cases such as ribosomal frameshifting and HAC1-type intron processing. In FlyBase, exceptional genes and transcripts are flagged with Sequence Ontology terms and/or standardized comments. Because some of the rule-benders create problems for handlers of high-throughput data, we discuss plans for flagging these cases in bulk data downloads.
- Published
- 2015
- Full Text
- View/download PDF
18. Combining clinical and genomics queries using i2b2 - Three methods
- Author
-
Lori C. Phillips, Matteo Gabetta, Paul Avillach, Michael McDuffie, Riccardo Bellazzi, Shawn N. Murphy, Alal Eran, and Isaac S. Kohane
- Subjects
0301 basic medicine; Data management; Big data; Biological database; lcsh:Medicine; computer.software_genre; Health informatics; Database and Informatics Methods; Sequence Ontology; lcsh:Science; Data Management; Multidisciplinary; Physics; Classical Mechanics; Nonsense Mutation; Genomics; Genomic Databases; Drag; 3. Good health; Physical Sciences; System integration; Information Technology; Research Article; Data integration; Computer and Information Sciences; Health Informatics; Fluid Mechanics; Computational biology; Biology; Research and Analysis Methods; Polymorphism, Single Nucleotide; Continuum Mechanics; Databases; 03 medical and health sciences; Genomic Medicine; Genetics; Humans; business.industry; lcsh:R; Biology and Life Sciences; Computational Biology; Fluid Dynamics; Genome Analysis; Precision medicine; Data science; Systems Integration; Biological Databases; 030104 developmental biology; Mutation; Programming Languages; lcsh:Q; business; computer; Software
- Abstract
We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: The first is based on Sequence Ontology, the second is based on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine.
- Published
- 2017
19. Understanding the Systems Biology of Pathogen Virulence Using Semantic Methodologies
- Author
-
Kevin Shieh, Aaron Golden, Julie Sullivan, David Rhee, Kami Kim, and Gos Micklem
- Subjects
0301 basic medicine; education.field_of_study; Computer science; Systems biology; Population; Context (language use); Genomics; Data science; Quantitative Biology - Quantitative Methods; Data warehouse; 03 medical and health sciences; 030104 developmental biology; Cyberinfrastructure; Knowledge extraction; FOS: Biological sciences; Sequence Ontology; education; Quantitative Methods (q-bio.QM)
- Abstract
Systems biology approaches to the integrative study of cells, organs and organisms offer the best means of understanding in a holistic manner the diversity of molecular assays that can now be implemented in a high throughput manner. Such assays can sample the genome, epigenome, proteome, metabolome and microbiome contemporaneously, allowing us for the first time to perform a complete analysis of physiological activity. The central problem remains empowering the scientific community to actually implement such an integration, across seemingly diverse data types and measurements. One promising solution is to apply semantic techniques on a self-consistent and implicitly correct ontological representation of these data types. In this paper we describe how we have applied one such solution, based around the InterMine data warehouse platform which uses as its basis the Sequence Ontology, to facilitate a systems biology analysis of virulence in the apicomplexan pathogen Toxoplasma gondii, a common parasite that infects up to half the world's population, with acute pathogenic risks for immuno-compromised individuals or pregnant mothers. Our solution, which we named 'toxoMine', has provided both a platform for our collaborators to perform such integrative analyses and also opportunities for such cyberinfrastructure to be further developed, particularly to take advantage of possible semantic similarities of value to knowledge discovery in the Omics enterprise. We discuss these opportunities in the context of further enhancing the capabilities of this powerful integrative platform. To appear in the Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC 2016).
- Published
- 2016
20. Using semantic web rules to reason on an ontology of pseudogenes
- Author
-
Ekta Khurana, Matthew E. Holford, Mark Gerstein, and Kei-Hoi Cheung
- Subjects
Statistics and Probability; Databases, Factual; Computer science; Databases and Ontologies; Information Storage and Retrieval; 02 engineering and technology; Ontology (information science); Biochemistry; World Wide Web; Open Biomedical Ontologies; 03 medical and health sciences; Text mining; 0202 electrical engineering, electronic engineering, information engineering; SPARQL; Sequence Ontology; Molecular Biology; Semantic Web; 030304 developmental biology; Internet; 0303 health sciences; Hierarchy; Information retrieval; Hierarchy (mathematics); business.industry; computer.file_format; Semantic reasoner; Ismb 2010 Conference Proceedings July 11 to July 13, 2010, Boston, Ma, Usa; Original Papers; Semantics; Computer Science Applications; Computational Mathematics; Vocabulary, Controlled; Computational Theory and Mathematics; Knowledge base; Ontology; 020201 artificial intelligence & image processing; business; computer; Pseudogenes
- Abstract
Motivation: Recent years have seen the development of a wide range of biomedical ontologies. Notable among these is Sequence Ontology (SO) which offers a rich hierarchy of terms and relationships that can be used to annotate genomic data. Well-designed formal ontologies allow data to be reasoned upon in a consistent and logically sound way and can lead to the discovery of new relationships. The Semantic Web Rules Language (SWRL) augments the capabilities of a reasoner by allowing the creation of conditional rules. To date, however, formal reasoning, especially the use of SWRL rules, has not been widely used in biomedicine. Results: We have built a knowledge base of human pseudogenes, extending the existing SO framework to incorporate additional attributes. In particular, we have defined the relationships between pseudogenes and segmental duplications. We then created a series of logical rules using SWRL to answer research questions and to annotate our pseudogenes appropriately. Finally, we were left with a knowledge base which could be queried to discover information about human pseudogene evolution. Availability: The fully populated knowledge base described in this document is available for download from http://ontology.pseudogene.org. A SPARQL endpoint from which to query the dataset is also available at this location. Contact: matthew.holford@yale.edu; mark.gerstein@yale.edu
- Published
- 2010
- Full Text
- View/download PDF
21. SOBA: sequence ontology bioinformatics analysis
- Author
-
Karen Eilbeck, Guozhen Fan, and Barry Moore
- Subjects
Genomics ,Biology ,Bioinformatics ,Domain (software engineering) ,Terminology ,World Wide Web ,User-Computer Interface ,03 medical and health sciences ,Annotation ,Consistency (database systems) ,0302 clinical medicine ,Software ,Genetics ,Sequence Ontology ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,Internet ,0303 health sciences ,business.industry ,Software development ,Articles ,Data Interpretation, Statistical ,business ,Sequence Analysis ,030217 neurology & neurosurgery - Abstract
The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
- Published
- 2010
- Full Text
- View/download PDF
22. Using Semantic Web Technologies to Annotate and Align Microarray Designs
- Author
-
Sebastian Szpakowski, Michael Krauthammer, and James P. McCusker
- Subjects
Cancer Research ,Genomics ,Biology ,Ontology (information science) ,computer.software_genre ,lcsh:RC254-282 ,Annotation ,semantic web ,genomics ,SPARQL ,ontology ,Sequence Ontology ,Semantic Web ,data integration ,computer.programming_language ,Information retrieval ,Methodology ,computer.file_format ,lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,Data science ,ComputingMethodologies_PATTERNRECOGNITION ,Oncology ,annotation ,computer ,Data integration ,RDF query language - Abstract
In this paper, we annotate and align two different gene expression microarray designs using the Genomic ELement Ontology (GELO). GELO is a new ontology that leverages an existing community resource, Sequence Ontology (SO), to create views of genomically-aligned data in a semantic web environment. We start the process by mapping array probes to genomic coordinates. The coordinates represent an implicit link between the probes and multiple genomic elements, such as genes, transcripts, miRNA, and repetitive elements, which are represented using concepts in SO. We then use the RDF Query Language (SPARQL) to create explicit links between the probes and the elements. We show how the approach allows us to easily determine the element coverage and genomic overlap of the two array designs. We believe that the method will ultimately be useful for integration of cancer data across multiple omic studies. The ontology and other materials described in this paper are available at http://krauthammerlab.med.yale.edu/wiki/Gelo .
- Published
- 2009
23. The Protein Feature Ontology: a tool for the unification of protein feature annotations
- Author
-
Gabrielle A. Reeves, Michele Magrane, Janet M. Thornton, Karen Eilbeck, Luisa Montecchi-Palazzi, Henning Hermjakob, Claire O'Donovan, Andreas Prlić, Midori A. Harris, Rafael C. Jimenez, Sandra Orchard, and Tim Hubbard
- Subjects
Statistics and Probability ,Proteome ,Computer science ,computer.internet_protocol ,Process ontology ,Ontology (information science) ,Biochemistry ,Article ,OWL-S ,Structural genomics ,Open Biomedical Ontologies ,World Wide Web ,Protein structure ,Upper ontology ,Databases, Protein ,Sequence Ontology ,Molecular Biology ,Internet ,Ontology-based data integration ,Suggested Upper Merged Ontology ,Computational Biology ,Proteins ,Computer Science Applications ,Computational Mathematics ,Vocabulary, Controlled ,Computational Theory and Mathematics ,Posttranslational modification ,Ontology ,computer ,Ontology alignment ,Software - Abstract
The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of protein structures and sequences. Due to the high-throughput nature of these projects, many of the molecules are uncharacterised and their functions unknown. This, in turn, has led to the need for a greater number and diversity of tools and databases providing annotation through transfer based on homology and prediction methods. Though many such tools to annotate protein sequence and structure exist, they are spread throughout the world, often with dedicated individual web pages. This situation does not provide a consensus view of the data and hinders comparison between methods. Integration of these methods is needed. So far this has not been possible since there was no common vocabulary available that could be used as a standard language. A variety of terms could be used to describe any particular feature ranging from different spellings to completely different terms. The Protein Feature Ontology (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS) is a structured controlled vocabulary for features of a protein sequence or structure. It provides a common language for tools and methods to use, so that integration and comparison of their annotations is possible. The Protein Feature Ontology comprises approximately 100 positional terms (located in a particular region of the sequence), which have been integrated into the Sequence Ontology (SO). 40 non-positional terms which describe general protein properties have also been defined and, in addition, post-translational modifications are described by using an already existing ontology, the Protein Modification Ontology (MOD). The Protein Feature Ontology has been used by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in 14 European countries generating over 150 distinct annotation types for protein sequences and structures.
- Published
- 2008
- Full Text
- View/download PDF
24. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration
- Author
-
Susanna-Assunta Sansone, Louis J. Goldberg, Suzanna E. Lewis, Karen Eilbeck, Jonathan Bard, Michael Ashburner, Nigam H. Shah, Alan Ruttenberg, Amelia Ireland, Christopher J. Mungall, Patricia L. Whetzel, Philippe Rocca-Serra, Werner Ceusters, Neocles B. Leontis, Barry Smith, Richard H. Scheuermann, Cornelius Rosse, and William J. Bug
- Subjects
Ontology for Biomedical Investigations ,Biomedical Engineering ,Information Storage and Retrieval ,Bioengineering ,Biological Ontologies ,Ontology (information science) ,Bioinformatics ,Nervous System ,Applied Microbiology and Biotechnology ,Data science ,Basic Formal Ontology ,Article ,Open Biomedical Ontologies ,Vocabulary, Controlled ,Terminology as Topic ,OBO Foundry ,Humans ,Molecular Medicine ,Nervous System Physiological Phenomena ,IDEF5 ,Sequence Ontology ,Biotechnology - Abstract
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
- Published
- 2007
- Full Text
- View/download PDF
25. The Gene Ontology project in 2008
- Author
-
John Day Richter, Rex L. Chisholm, Carol J. Bult, Petra Fey, Michael S. Livstone, Susan Bromberg, Evelyn Camon, Suzanna E. Lewis, Janan T. Eppig, Emily Dimmer, Mary Shimoyama, Ni Li, Rose Oughtred, Rolf Apweiler, Stuart R. Miyasato, Edith D. Wong, Tanya Z. Berardini, Maria C. Costanzo, Christopher J. Mungall, David P. Hill, Ruth C. Lovering, Valerie Wood, Marek S. Skrzypek, Jodi E. Hirschman, J. Michael Cherry, Li Donghui, Seth Carbon, Jennifer R. Wortman, Kara Dolinski, Giorgio Valle, Kathy K. Zhu, Susan Tweedie, Shane C. Burgess, Stacia R. Engel, Trudy Torto Alalibo, Paul W. Sternberg, Fiona M. McCarthy, Pankaj Jaiswal, Doug Howe, Ranjana Kishore, Jennifer I. Deegan, Warren A. Kibbe, Gail Binkley, Simon N. Twigger, Harold J. Drabkin, Erika Feltrin, Martin Aslett, Qing Dong, Matthew Berriman, David Botstein, Victoria Petri, Pascale Gaudet, Candace Collmer, Shuai Weng, Cynthia J. Krieger, Linda Hannick, Dianna G. Fisk, Robert S. Nash, Rachael P. Huntley, Nicola Mulder, Jennifer L. Smith, Sue Povey, Seung Y. Rhee, Stan Laulederkind, Benjamin C. Hitz, Julie Park, Howard J. Jacob, Midori A. Harris, Michelle G. Giglio, Judith A. Blake, Martin Ringwald, Erich M. Schwarz, Daniel Barrell, Rama Balakrishnan, Alexander D. Diehl, Trent E. Seigfried, Amelia Ireland, Eurie L. Hong, Jane Lomax, Karen Eilbeck, Michael Ashburner, Karen R. Christie, Kimberly Van Auken, Mary E. Dolan, Varsha K. Khodiyar, and Monte Westerfield
- Subjects
Interface (Java) ,Genomics ,Biology ,Bioinformatics ,Vocabulary ,World Wide Web ,Open Biomedical Ontologies ,Databases ,03 medical and health sciences ,Annotation ,Mice ,User-Computer Interface ,0302 clinical medicine ,Resource (project management) ,Genetic ,Controlled vocabulary ,Databases, Genetic ,Genetics ,Animals ,Humans ,Sequence Ontology ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,0303 health sciences ,Internet ,business.industry ,Articles ,Rats ,Sequence Analysis ,Vocabulary, Controlled ,030220 oncology & carcinogenesis ,The Internet ,ComputingMethodologies_GENERAL ,Controlled ,business ,Caltech Library Services - Abstract
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of ‘reference’ genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.
- Published
- 2007
26. Sequence Ontology Annotation Guide
- Author
-
Karen Eilbeck and Suzanna E. Lewis
- Subjects
Sequence ,Information retrieval ,Article Subject ,lcsh:QH426-470 ,Computer science ,Construct (python library) ,Ontology (information science) ,Visualization ,Annotation ,lcsh:Genetics ,lcsh:Biology (General) ,Controlled vocabulary ,Genetics ,lcsh:Q ,Line (text file) ,Sequence Ontology ,lcsh:Science ,Molecular Biology ,lcsh:QH301-705.5 ,Research Article ,Biotechnology - Abstract
The Sequence Ontology (SO) [13] aims to unify the way in which we describe sequence annotations, by providing a controlled vocabulary of terms and the relationships between them. Using SO terms to label the parts of sequence annotations greatly facilitates downstream analyses of their contents, as it ensures that annotations produced by different groups conform to a single standard. This greatly facilitates analyses of annotation contents and characteristics, e.g. comparisons of UTRs, alternative splicing, etc. Because SO also specifies the relationships between features, e.g. part_of, kind_of, annotations described with SO terms are also better substrates for validation and visualization software. This document provides a step-by-step guide to producing a SO compliant file describing a sequence annotation. We illustrate this by using an annotated gene as an example. First we show where the terms needed to describe the gene's features are located in SO and their relationships to one another. We then show line by line how to format the file to construct a SO compliant annotation of this gene.
- Published
- 2004
27. The NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs
- Author
-
Giorgio De Caro, Arianna Consiglio, Domenica D'Elia, Andreas Gisel, Giorgio Grillo, Sabino Liuni, Angelica Tulipano, and Flavio Licciulli
- Subjects
Information retrieval ,Database ,Computer science ,non-coding RNA ,computer.software_genre ,Non-coding RNA ,MiRBase ,RefSeq ,Ensembl ,User interface ,Sequence Ontology ,computer ,database ,Coding (social sciences) - Abstract
The recent availability of high-throughput technologies, like next generation sequencing (NGS) platforms, has provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes in a large number of organisms. However, one of the most challenging tasks for bioinformaticians is to develop tools that provide biologists with easy access to curated and non-redundant collections of sequence data. Non-coding RNAs, for a long time believed to be non-functional, are emerging as the largest and most important family of gene regulators. NonCode aReNA Database is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts. Originally developed as a component of a bigger project, composed of a data warehouse for the functional annotation of ncRNAs from NGS data, NonCode aReNA DB is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. Sequences have been classified in diverse biotypes and associated with Sequence Ontology terms. The database can be queried by using multi-criteria and ontological search, through an easy-to-use web interface, and data exported as non-redundant collections of transcripts annotated in VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The database is updated through an automatic pipeline and the last update was in January 2015. Presently NonCode aReNA DB contains 134,908 human ncRNAs classified in 24 biotypes, and the next update will include transcripts of Mus musculus and Arabidopsis thaliana. Acknowledgements This work was supported by the Italian MIUR Flagship Project "Epigen".
- Published
- 2015
- Full Text
- View/download PDF
28. Formalization of Genome Interval Relations
- Author
-
Christopher J. Mungall
- Subjects
Core (game theory) ,Text mining ,Theoretical computer science ,business.industry ,Order (exchange) ,Computer science ,Informatics ,Interval (graph theory) ,Genomics ,Sequence Ontology ,business ,Genome - Abstract
In order to take full advantage of next generation genomics data, I need informatics methods to be based on agreed-upon, formally specified standards that can be implemented easily in a uniform fashion without ambiguity. These standards should be encoded as logical formulae, so that provably correct and efficient decision procedures can be used for query answering and validation. In this paper I present the core of such a standard for sequence data: a collection of definitions of relations that hold between genomic intervals, and an algebra for performing operations upon these intervals. I show how these relations can be used to extend and formalize concepts in the Sequence Ontology (SO).
- Published
- 2014
- Full Text
- View/download PDF
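The abstract of entry 28 mentions relations that hold between genomic intervals and an algebra of operations on them. The following is only a minimal illustrative sketch of what such relations can look like in code, not the formalization from the paper itself. The `Interval` class, the half-open coordinate convention, and the function names `overlaps`, `contains`, `adjacent` and `intersection` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval, half-open [start, end), on one sequence."""
    seq_id: str
    start: int
    end: int

    def __post_init__(self):
        # Reject intervals whose start lies beyond their end.
        if self.start > self.end:
            raise ValueError("start must not exceed end")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one position on the same sequence."""
    return a.seq_id == b.seq_id and a.start < b.end and b.start < a.end


def contains(a: Interval, b: Interval) -> bool:
    """True if b lies entirely within a on the same sequence."""
    return a.seq_id == b.seq_id and a.start <= b.start and b.end <= a.end


def adjacent(a: Interval, b: Interval) -> bool:
    """True if a and b touch end-to-start without sharing a position."""
    return a.seq_id == b.seq_id and (a.end == b.start or b.end == a.start)


def intersection(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of a and b, or None if they do not overlap."""
    if not overlaps(a, b):
        return None
    return Interval(a.seq_id, max(a.start, b.start), min(a.end, b.end))


# Made-up coordinates, purely for illustration.
gene = Interval("chr1", 100, 500)
exon = Interval("chr1", 150, 200)
next_exon = Interval("chr1", 200, 260)
probe = Interval("chr1", 180, 230)
```

With these made-up coordinates, `contains(gene, exon)` holds, `adjacent(exon, next_exon)` holds, and `intersection(exon, probe)` yields the interval from 180 to 200. Half-open coordinates are chosen here only because they make adjacency a simple equality test; other conventions would need the comparisons adjusted.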
29. Gene Ontology: tool for the unification of biology
- Author
-
Gavin Sherlock, Midori A. Harris, Allan Peter Davis, Laurie Issel-Tarver, Joel E. Richardson, J.T. Eppig, David P. Hill, Kara Dolinski, J. M. Cherry, M Ashburner, Suzanna E. Lewis, Heather Butler, Judith A. Blake, Selina S. Dwight, Gerald M. Rubin, John C. Matese, M. Ringwald, David Botstein, Catherine A. Ball, and Andrew Kasarskis
- Subjects
Genetics ,Databases, Factual ,Metaphysics ,Sequence Analysis, DNA ,Computational biology ,Biology ,Basic Formal Ontology ,Article ,Open Biomedical Ontologies ,Computer Communication Networks ,Mice ,Eukaryotic Cells ,Genes ,Disease Ontology ,Terminology as Topic ,OBO Foundry ,Human Phenotype Ontology ,Animals ,Humans ,Critical Assessment of Function Annotation ,Sequence Ontology ,Molecular Biology ,Blast2GO - Abstract
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
- Published
- 2000
- Full Text
- View/download PDF
30. GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database
- Author
-
Graziano Pappadà, Gaetano Scioscia, Paolo Pannarale, Domenico Catalano, Giorgio De Caro, Flavio Licciulli, Pietro Leo, Giorgio Grillo, and Francesco Rubino
- Subjects
SQL ,Computer science ,Flat file database ,Relational database ,Expert Systems ,External Data Representation ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,Structural Biology ,Schema (psychology) ,Animals ,Sequence Ontology ,Semantic Web ,Molecular Biology ,030304 developmental biology ,computer.programming_language ,0303 health sciences ,Internet ,Database ,Applied Mathematics ,Research ,030302 biochemistry & molecular biology ,Database schema ,Computational Biology ,Semantic reasoner ,Biodiversity ,Computer Science Applications ,Entrez ,GenBank ,Ontology ,Databases, Nucleic Acid ,computer ,Software - Abstract
Background In the scientific biodiversity community, the need to build a bridge between molecular and traditional biodiversity studies is increasingly perceived. We believe that information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data found in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. 
In GIDL, Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries coming from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded in the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows more powerful query expressiveness compared with the most commonly used sequence retrieval systems like Entrez or SRS.
- Published
- 2012
- Full Text
- View/download PDF
31. Non-coding RNA bioinformatics platform for full backing of the high-throughput sequencing experiments generated by next-generation sequencing technologies
- Author
-
Arianna Consiglio, Giorgio Grillo, Andreas Gisel, Sabino Liuni, Flavio Licciulli, Angelica Tulipano, and G De Caro
- Subjects
Genetics ,Gene nomenclature ,HUGO Gene Nomenclature Committee ,RNA ,Biology ,Sequence Ontology ,Bioinformatics ,Non-coding RNA ,MiRBase ,DNA sequencing ,Reference genome - Abstract
Motivations. Short non-coding RNA molecules (20-30 nucleotides long) play an important role in the regulation of gene expression by interacting with their target RNAs. This interaction generally downregulates gene expression either affecting RNA stability or repressing translation. Different classes of small regulatory non-coding RNAs (sncRNAs) have been discovered and studied so far, and new families continue to be described, which differ in the proteins required for their biogenesis, the mechanism of target recognition and regulation, and the biological pathways they control [1,2]. In particular, three major classes of sncRNAs have been mostly investigated: small interfering RNAs (siRNAs), micro-RNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) [1,2,3]. siRNAs direct the endonucleolytic cleavage of their target RNAs through a mechanism known as RNA interference (RNAi), miRNAs can repress translation or direct degradation of their target mRNA generally through imperfect complementary pairing on their 3'UTRs, whereas the major role of piRNAs is to ensure germline stability by repressing transposable elements (TEs). Recently, the advent of new Next-Generation Sequencing (NGS) technologies has greatly increased the throughput of transcriptome studies, thus allowing an unprecedented investigation of non-coding RNAs. Regulatory pathways involving ncRNAs, such as miRNAs, are now being elucidated in detail and functions for long non-coding RNAs are also emerging. The huge amount of transcript data produced by high-throughput sequencing requires the development and implementation of suitable bioinformatics workflows for their analysis and interpretation. Here we describe a bioinformatics resource to classify and analyze the non-coding RNA component of human transcriptome sequence data obtained by different NGS platforms (Roche 454, Illumina and Solid). Methods. 
The ncRNAs bioinformatics platform is organized according to a typical three-tier architecture: an analysis tier for ncRNA detection, classification and functional analyses; a data tier made up of a data warehouse used to store the analysis results, the ncRNAs reference database (a non-redundant collection of ncRNA sequences retrieved from fRNAdb, RNAdb, mirBASE, NONCODE and others), the reference genome and other useful annotation databases like HGNC nomenclature [4], Sequence Ontology (SO) [5] and Entrez Gene; a web tier module for querying the analysis results and the annotation stored in the ncRNAs reference database. The core of the platform is the analysis workflow. In figure 1 we show the pipeline for classification and functional annotation of the non-coding RNA (ncRNA) fraction obtained through high-throughput sequencing (HTS) experiments using different NGS technologies. The input data for the bioinformatics platform can be either the read data obtained by different NGS platforms (Roche 454, Illumina and Solid) or previously mapped reads stored in users' SAM/BAM files. Results. The ncRNA bioinformatics platform - through a combination of an analysis pipeline, a data warehouse and a user-friendly web interface - is able to: i. detect and classify reads in known functional ncRNA categories using Sequence Ontology classification, HGNC nomenclature, gene names and miRNA accessions; ii. extract read collections belonging to a given category for further analysis; iii. quantify ncRNA expression based on annotations derived from different reference ncRNA databases; iv. generate some statistics of expressed ncRNAs, indicating the RPKM (reads per kilobase of RNA model per million mapped reads) value for each Sequence Ontology class; v. detect differential expression of ncRNAs between two conditions (i.e. normal/pathological); vi. create collections of interesting clusters of reads mapped on the genome but not detected as known ncRNA; vii. 
filter out reads mapping to ribosomal RNAs and mtDNA transcripts; viii. create a collection of unmapped residual reads (chimeras, artifacts, and contaminations). References 1. Ghildiyal, M. and Zamore, P. D. (2009) Small silencing RNAs: an expanding universe. Nature Rev. Genet. 10, 94-108. 2. Malone, C. D. and Hannon, G. J. (2009) Small RNAs as guardians of the genome. Cell 136, 656-668. 3. Kim, V. N., Han, J. and Siomi, M. C. (2009) Biogenesis of small RNAs in animals. Nature Rev. Mol. Cell Biol. 10, 126-139. 4. Wright, M. and Bruford, E. (2011) Naming 'junk': Human non-protein coding RNA (ncRNA) gene nomenclature. Human Genomics 5(2), 90-98. 5. Eilbeck et al. (2005) The Sequence Ontology: A tool for the unification of genome annotations. Genome Biology 6:R44.
- Published
- 2012
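The abstract of entry 31 reports RPKM (reads per kilobase of RNA model per million mapped reads) values for each Sequence Ontology class. As a rough sketch of the standard RPKM arithmetic only: the function name `rpkm` and the example numbers are assumptions for illustration, and how the platform aggregates reads and lengths per class is not stated in the abstract.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of RNA model per million mapped reads.

    RPKM = read_count / ((length_bp / 1e3) * (total_mapped_reads / 1e6))
         = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Made-up numbers: 500 reads on a 2,000 bp RNA model in a library
# of 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)  # 25.0
```

Normalizing by length and by library size is what makes values comparable across RNA models of different sizes and across sequencing runs of different depth.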
32. Using ontologies for supporting genomic sequence annotation projects
- Author
-
José Antonio Miñarro-Giménez, Santiago Torres Martínez, Marisa Madrid, María del Carmen Legaz-García, and Jesualdo Tomás Fernández-Breis
- Subjects
Open Biomedical Ontologies ,Information retrieval ,Ontology-based data integration ,Process ontology ,Suggested Upper Merged Ontology ,Upper ontology ,Biological Ontologies ,ComputingMethodologies_GENERAL ,Ontology (information science) ,Biology ,Sequence Ontology - Abstract
Traditionally, biological knowledge has been represented in an understandable way for humans, but not for machines. During the last years, the implementation and success of the Gene Ontology has influenced the development of several biological ontologies, which make it possible for machines to understand and process biological knowledge. One of such bio-ontologies is the Sequence Ontology, which provides a shared vocabulary for defining the terms that should be used for normalizing the annotations produced by sequencing projects. In this work, we present how this ontology has been incorporated into a tool that is used for supporting researchers in the annotation of genomes. Besides, this tool also reuses knowledge and integrates data from previous work from our group related to orthology and genetic disorders.
- Published
- 2011
- Full Text
- View/download PDF
33. BiologicalNetworks 2.0 - an integrative view of genome biology data
- Author
-
Amarnath Gupta, Yulia Dubinina, Mayya Sedova, Sergey Kozhenkov, Michael Baitaluk, and Julia Ponomarenko
- Subjects
Genomics ,Computational biology ,Biology ,computer.software_genre ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Databases, Genetic ,RDF ,Sequence Ontology ,Molecular Biology ,lcsh:QH301-705.5 ,Organism ,Oligonucleotide Array Sequence Analysis ,030304 developmental biology ,Internet ,0303 health sciences ,Applied Mathematics ,Frame (networking) ,Computational Biology ,Molecular Sequence Annotation ,Sequence Analysis, DNA ,computer.file_format ,Computer Science Applications ,lcsh:Biology (General) ,Genome Biology ,Database Management Systems ,lcsh:R858-859.7 ,computer ,Software ,030217 neurology & neurosurgery ,Data integration - Abstract
Background A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which are making an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not integrated sufficiently enough to give researchers a straightforward opportunity to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by a multitude of such factors as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and other) and their relations (interactions, co-expression, co-citations, and other). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org.
- Published
- 2010
34. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome
- Author
-
Justin T. Reese, Kevin L. Childs, Christopher P. Childers, Donald C. Vile, Jaideep P. Sundaram, Christine G. Elsik, and C. Michael Dickens
- Subjects
lcsh:QH426-470 ,lcsh:Biotechnology ,Statistics as Topic ,Vertebrate and Genome Annotation Project ,Biology ,computer.software_genre ,Proteomics ,Genome ,Database ,Annotation ,lcsh:TP248.13-248.65 ,Databases, Genetic ,Genetics ,Animals ,Sequence Ontology ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,Internet ,Molecular Sequence Annotation ,Bovine genome ,lcsh:Genetics ,Cattle ,DNA microarray ,computer ,Biotechnology - Abstract
Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.
- Published
- 2010
35. A Formal Ontology of Sequences
- Author
-
Heinrich Herre, Robert Hoehndorf, and Janet Kelso
- Subjects
Computer science ,business.industry ,Bioinformatics ,Process ontology ,Suggested Upper Merged Ontology ,Ontology (information science) ,computer.software_genre ,Genetics & Genomics ,Open Biomedical Ontologies ,Formal ontology ,OBO Foundry ,Upper ontology ,General Materials Science ,Data mining ,Artificial intelligence ,Sequence Ontology ,business ,computer ,Natural language processing - Abstract
The Sequence Ontology is an OBO Foundry ontology that provides categories of sequences and sequence features that are applied to the annotation of genomes. To facilitate interoperability with other domain ontologies and to provide a foundation for automated inference, we provide here an axiom system for the Sequence and Junction categories in first- and second-order predicate logics.
- Published
- 2009
36. Evolution of the Sequence Ontology terms and relationships
- Author
-
Karen Eilbeck, Colin Batchelor, and Christopher J. Mungall
- Subjects
Biomedical Research ,Information retrieval ,Computer science ,Ontology-based data integration ,Interoperability ,Health Informatics ,Documentation ,Sequence Analysis, DNA ,Genome project ,Ontology (information science) ,Article ,Computer Science Applications ,Open Biomedical Ontologies ,Biomedical ontology ,Genes ,Sequence Ontology ,OBO Foundry ,Databases, Genetic ,Humans ,General Materials Science ,Genome annotation ,Sequence (medicine) - Abstract
The Sequence Ontology is an established ontology, with a large user community, for the purpose of genomic annotation. We are reforming the ontology to provide better terms and relationships to describe the features of biological sequence, for both genomic and derived sequence. The SO is working within the guidelines of the OBO Foundry to provide interoperability between SO and the other related OBO ontologies. Here, we report changes and improvements made to SO including new relationships to better define the mereological, spatial and temporal aspects of biological sequence.
- Published
- 2009
37. Evolution of the Sequence Ontology terms and relationships
- Author
-
Christopher J. Mungall and Karen Eilbeck
- Subjects
business.industry ,Computer science ,Bioinformatics ,OBO Foundry ,General Materials Science ,Artificial intelligence ,computer.software_genre ,business ,Sequence Ontology ,computer ,Natural language processing ,Mereology ,Sequence (medicine) - Abstract
The Sequence Ontology is undergoing reform to meet the standards of the OBO Foundry. Here we report some of the incremental changes and improvements made to SO. We also propose new relationships to better define the mereological, spatial and temporal aspects of biological sequence.
- Published
- 2009
38. Immunogenetic sequence annotation based on IMGT-ONTOLOGY
- Author
-
Marie-Paule Lefranc, Géraldine Folch, Fatena Bellahcene, Patrice Duroux, François Ehrenmann, Véronique Giudicelli, and Joumana Jabado-Michaloud
- Subjects
genomic DNA ,Annotation ,Sequence annotation ,Molecular type ,General Materials Science ,Chain type ,Computational biology ,Ontology (information science) ,Biology ,Sequence Ontology - Abstract
IMGT/LIGM-DB^1^ is the first and largest IMGT® database^2^, in which more than 136,000 immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences from human and 235 other vertebrate species are managed, analysed and annotated (April 2009). The expert annotation of these sequences and the added standardized knowledge are based on IMGT-ONTOLOGY, the first ontology developed in the field of immunogenetics and immunoinformatics.^3^ The annotation of immunogenetic sequences requires considerable expertise, owing to the unusual structure (non-classical exon/intron structure) of the IG and TR genes and the characteristic chain synthesis resulting from DNA V-J and V-D-J rearrangements. The way these sequences are annotated depends on the molecular type (gDNA, mRNA, cDNA or protein) and the configuration type (germline or rearranged), and on whether sequences from the species concerned are present in the IMGT reference directory sets. IMGT/V-QUEST^5^ and internal tools (IMGT/Automat, IMGT/LIGMotif, IMGT/BLAST and IMGT/DomainGapAlign) were developed. The first step in annotation identifies the chain type (for instance IG-Heavy) and assigns standardized keywords (IDENTIFICATION axiom). The second step is the classification of IG and TR genes and alleles (CLASSIFICATION axiom). The third step is the description (DESCRIPTION axiom) of the V, D, J and C genes and alleles with specific standardized labels. There are more than 590 IMGT standardized labels, of which 64 have been entered in the Sequence Ontology (SO). The delimitation of the FR-IMGT and CDR-IMGT lengths and the positions of conserved amino acids based on the IMGT unique numbering (NUMEROTATION axiom) make it possible to bridge the gap between sequences and 3D structures.^6^ The complete annotation of immunogenetic germline (V, D, J) and C sequences is followed by the update of the IMGT Repertoire (IMGT Gene tables, Alignments of alleles, Protein displays, Colliers de Perles, etc.), the IMGT® gene database (IMGT/GENE-DB) and the IMGT reference directory sets of the IMGT® tools (IMGT/V-QUEST, IMGT/JunctionAnalysis and IMGT/DomainGapAlign).
- Published
- 2009
39. Protein Ontology and Community Curation
- Author
-
Cecilia Arighi
- Subjects
Information retrieval ,Computer science ,Bioinformatics ,Structural Classification of Proteins database ,Ontology (information science) ,Unique identifier ,Open Biomedical Ontologies ,Annotation ,Documentation ,ComputingMethodologies_PATTERNRECOGNITION ,Molecular Cell Biology ,General Materials Science ,Relevance (information retrieval) ,Sequence Ontology - Abstract
The Protein Ontology (PRO) is designed as a formal and well-principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from the classification of proteins, on the basis of evolutionary relationships at the homeomorphic level, to the representation of the multiple protein forms of a gene, such as those resulting from alternative splicing, cleavage and/or post-translational modifications. As an ontology, PRO differs from a database in that it provides description about the protein types and their relationships. In this way PRO can be integrated with or cross-referenced by other ontologies and/or databases. The representation of specific protein entities in PRO allows precise definition of objects in pathways, complexes, or in disease modeling. This is useful for proteomics studies where isoforms and modified forms must be differentiated and for biological pathway/network representation where the cascade of events often depends on a specific protein modification. The PRO framework is designed to allow the community to curate any protein entities of interest and will provide a stable unique identifier to any protein type. PRO is manually curated starting with content derived from various data sources coupling with scientific literature. Only annotation with experimental evidence is included, and is in the form of relationship to other ontologies (such as Gene Ontology, Sequence Ontology, and PSI-MOD). We have developed a web-based curation editor for PRO community annotation. In the tutorial, we will first give a brief introduction to the ontology and its relevance to the research communities - OBO ontologies, MOD, pathway and other databases, and any resources that need references/links to protein types. We will show the components of the PRO entry report, and how to search the ontology. 
Then, we will walk through an example where we will teach the basic curation steps: accessing the web editor, entering the protein to be defined with source attribution, and adding functional annotation. We will provide the necessary tools and documentation so that the user will be able to start curating the protein types of interest. PRO URL: http://pir.georgetown.edu/pro/
- Published
- 2009
40. Linking Biological Databases Semantically for Knowledge Discovery
- Author
-
Sudha Ram, Kunpeng Zhang, and Wei Wei
- Subjects
Data sharing ,Biological data ,Information retrieval ,Workflow ,Knowledge extraction ,Computer science ,Biological database ,Ontology (information science) ,Semantic data model ,Sequence Ontology ,Data science - Abstract
Many important life sciences questions are aimed at studying the relationships and interactions between biological functions/processes and biological entities such as genes. The answers may be found by examining diverse types of biological/genomic databases. Finding these answers, however, requires accessing and retrieving data from diverse biological data sources. More importantly, sophisticated knowledge discovery processes involve traversing large numbers of inherent links among various data sources. Currently, the links among data are either implemented as hyperlinks without explicitly indicating their meanings and labels, or hidden in a seemingly simple text format. Consequently, biologists spend numerous hours identifying potentially useful links and following each lead manually, which is time-consuming and error-prone. Our research is aimed at constructing semantic relationships among all biological entities. We have designed a semantic model to categorize and formally define the links. By incorporating ontologies such as the Gene Ontology or the Sequence Ontology, we propose techniques to analyze the links embedded within and among data records, to explicitly label their semantics, and to facilitate link traversal, querying, and data sharing. Users may then ask complicated and ad hoc questions and even design their own workflow to support their knowledge discovery processes. In addition, we have performed an empirical analysis to demonstrate that our method can not only improve the efficiency of querying multiple databases, but also yield more useful information.
- Published
- 2008
41. BOOTSTRAPPING THE RECOGNITION AND ANAPHORIC LINKING OF NAMED ENTITIES IN DROSOPHILA ARTICLES
- Author
-
Ted Briscoe, Andreas Vlachos, Caroline Gasperin, and Ian Lewin
- Subjects
Information retrieval ,biology ,business.industry ,Computer science ,Anaphora (linguistics) ,Bootstrapping (linguistics) ,biology.organism_classification ,computer.software_genre ,Annotation ,Gene nomenclature ,Extant taxon ,Artificial intelligence ,Drosophila (subgenus) ,business ,FlyBase : A Database of Drosophila Genes & Genomes ,Sequence Ontology ,computer ,Natural language processing - Abstract
This paper demonstrates how Drosophila gene name recognition and anaphoric linking of gene names and their products can be achieved using existing information in FlyBase and the Sequence Ontology. Extending an extant approach to gene name recognition, we achieved an F-score of 0.8559, and we report a preliminary experiment using a baseline anaphora resolution algorithm. We also present guidelines for annotation of gene mentions in texts and outline how the resulting system is used to aid FlyBase curation.
- Published
- 2005
42. The Sequence Ontology: a tool for the unification of genome annotations
- Author
-
Karen Eilbeck, Michael Ashburner, Christopher J. Mungall, Lincoln Stein, Suzanna E. Lewis, Mark Yandell, and Richard Durbin
- Subjects
Method ,Documentation ,Computational biology ,Biology ,Ontology (information science) ,computer.software_genre ,Set (abstract data type) ,03 medical and health sciences ,0302 clinical medicine ,Software Design ,Terminology as Topic ,OBO Foundry ,Databases, Genetic ,Controlled vocabulary ,RNA, Messenger ,Automated reasoning ,Sequence Ontology ,030304 developmental biology ,0303 health sciences ,business.industry ,Computational Biology ,Exons ,Genomics ,Alternative Splicing ,ComputingMethodologies_PATTERNRECOGNITION ,Vocabulary, Controlled ,Artificial intelligence ,Zebrafish Information Network genome database ,business ,computer ,030217 neurology & neurosurgery ,Natural language processing ,Mereology - Abstract
The goal of the Sequence Ontology (SO) project is to produce a structured controlled vocabulary with a common set of terms and definitions for parts of a genomic annotation, and to describe the relationships among them. Details of SO construction, design and use, particularly with regard to part-whole relationships, are discussed, and the practical utility of SO is demonstrated for a set of genome annotations from Drosophila melanogaster. The Sequence Ontology (SO) is a structured controlled vocabulary for the parts of a genomic annotation. SO provides a common set of terms and definitions that will facilitate the exchange, analysis and management of genomic data. Because SO treats part-whole relationships rigorously, data described with it can become substrates for automated reasoning, and instances of sequence features described by the SO can be subjected to a group of logical operations termed extensional mereology operators.
- Published
- 2005
43. The Gene Ontology project
- Author
-
Jane Lomax, Jennifer I. Clark, Midori A. Harris, and Amelia Ireland
- Subjects
Open Biomedical Ontologies ,Annotation ,Information retrieval ,Computer science ,Component (UML) ,Controlled vocabulary ,Biocurator ,Ontology (information science) ,Sequence Ontology ,Critical Assessment of Function Annotation - Abstract
The main goal of the Gene Ontology (GO) project is to support the construction and use of structured, controlled vocabularies to address the growing need for meaningful annotation of genes and their products in different organisms. There are three key aspects of the GO project: the development of dynamic, controlled vocabularies that can be applied to all organisms even as knowledge of the roles of gene products is accumulating and changing; the application of GO terms in annotating genes or gene products; and the development and maintenance of databases and software for querying, displaying, and manipulating ontologies and associated annotation sets. The GO vocabularies are four nonoverlapping, structured networks of terms that describe key aspects of biology. Molecular function describes the activities or tasks performed by individual gene products at the molecular level; biological process describes broad biological goals that are accomplished by ordered assemblies of molecular functions; cellular component encompasses subcellular structures, locations, and macromolecular complexes; sequence ontology includes genome feature terms. The GO project's resources are available to the public at http://www.geneontology.org. Keywords: ontology; annotation; database; model organism; function; process; component; controlled vocabulary
- Published
- 2005
45. Query Language for Biological Databases (Dotazovací jazyk pro databáze biologických dat)
- Author
-
Martínek, Tomáš, Vogel, Ivan, and Bahurek, Tomáš
- Abstract
As the amount of biological data grows rapidly, biological databases are becoming more important. Knowledge discovery (identifying connections that were unknown at the time of data entry) is an essential aspect of these databases. Gaining knowledge from them requires constructing complicated SQL queries, which demands advanced knowledge of SQL and of the database schema in use. Biologists usually lack this knowledge, which creates a need for a tool that offers a more intuitive interface for querying biological databases. This work proposes ChQL, an intuitive query language for the Chado biological database. ChQL allows biologists to assemble a query from terms they are familiar with, without knowing SQL or the Chado database schema. The work also implements an application for querying a Chado database using ChQL: a web interface guides the user through assembling a sentence in ChQL, and the application translates this sentence into an SQL query, sends it to the Chado database, and displays the returned data in a table. The results are evaluated by testing queries on real data.