Descriptor: "Literature curation" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Literature curation"' showing total 13 results

1. The case of the gluten bibliome

Author: Pérez-Pérez, Martín, Ferreira, Tânia, Igrejas, Gilberto, Fdez-Riverola, Florentino, and LAQV@REQUIMTE
Subjects: Ontology-based methods, Text mining, Artificial Intelligence, Deep learning, Relation extraction, Literature curation, Gluten, Engineering(all), Computer Science Applications
Abstract: SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, the “Centro singular de investigación de Galicia” (accreditation 2019-2022) funded by the European Regional Development Fund (ERDF)-Ref. ED431G2019/06. The authors also acknowledge the postdoctoral fellowship [ED481B-2019-032] of Martín Pérez-Pérez, funded by Xunta de Galicia. Funding for open access charge: Universidade de Vigo/CISUG. Publisher Copyright: © 2022 The Author(s). Discovering relevant biomedical interactions in the literature is crucial for enhancing biology research. This curation process has an essential role in studying the different processes and interactions reported that affect the biological process (e.g., genome, metabolome, and transcriptome). In this sense, the objective of this work is twofold: to reduce the manual effort required to curate and review the existing biochemical interactions reported in the gluten-related bibliome, and to propose a novel vector-space integrated into a deep learning model that assists manual curators in a real curation task by learning from their previous decisions. With this objective, the present work proposes a novel vector-space that combines (i) high-level lexical and syntactic inference features such as Wordnets and health-related domain ontologies, (ii) unsupervised semantic resources such as word embeddings, (iii) semantic and syntactic sentence knowledge, (iv) abbreviation resolution support, (v) several state-of-the-art named-entity recognition methods, and, finally, (vi) different feature construction and optimization techniques to support a semi-automatic curation workflow. The application of the proposed workflow over a classified set of 2,451 relevant gluten-related documents produces a total of 8,349 relevant and 471,813 irrelevant relations distributed in thirteen health-related domain categories. Experimental results showed that the proposed workflow is a valuable approach for a semi-automatic relation extraction task. It was able to obtain satisfactory results in the early stages of a real-world curation task and saved manual annotation effort by learning from the decisions made by manual curators in iterative annotation rounds. The average F-score for the proposed relation categories was 0.731, with the lowest F-score at 0.47 and the highest at 0.929. The different resources used in this work, as well as the manually curated corpus, are publicly available on our GitHub repository.
Published: 2022

2. Exploring Curated Conformational Ensembles of Intrinsically Disordered Proteins in the Protein Ensemble Database

Author: Federica Quaglia, Silvio C. E. Tosatto, Tamas Lazar, András Hatos, Peter Tompa, Damiano Piovesan, Faculty of Sciences and Bioengineering Sciences, Department of Bio-engineering Sciences, and Structural Biology Brussels
Subjects: conformational ensembles, database, intrinsically disordered proteins, literature curation, PED, structural modeling, Databases, Protein, Molecular Dynamics Simulation, Scattering, Small Angle, X-Ray Diffraction, Intrinsically Disordered Proteins, Small Angle, Computer science, Health Informatics, computer.software_genre, Intrinsically disordered proteins, 01 natural sciences, General Biochemistry, Genetics and Molecular Biology, Field (computer science), Scattering, Databases, 03 medical and health sciences, Consistency (database systems), 0103 physical sciences, General Pharmacology, Toxicology and Pharmaceutics, Protocol (object-oriented programming), Conformational ensembles, 030304 developmental biology, 0303 health sciences, 010304 chemical physics, General Immunology and Microbiology, Database, Protein, General Neuroscience, Metadata, Medical Laboratory Technology, User interface, computer
Abstract: The Protein Ensemble Database (PED; https://proteinensemble.org/) is the major repository of conformational ensembles of intrinsically disordered proteins (IDPs). Conformational ensembles of IDPs are primarily provided by their authors or occasionally collected from literature, and are subsequently deposited in PED along with the corresponding structured, manually curated metadata. The modeling of conformational ensembles usually relies on experimental data from small-angle X-ray scattering (SAXS), fluorescence resonance energy transfer (FRET), NMR spectroscopy, and molecular dynamics (MD) simulations, or a combination of these techniques. The growing number of scientific studies based on these data, along with the astounding and swift progress in the field of protein intrinsic disorder, has required a significant update and upgrade of PED, first published in 2014. To this end, the database was entirely renewed in 2020 and now has a dedicated team of biocurators who provide manually curated descriptions of the methods and conditions applied to generate the conformational ensembles and check the consistency of the data. Here, we present a detailed description of how to explore PED with its protein pages and experimental pages, and how to interpret entries of conformational ensembles. We describe how to efficiently search conformational ensembles deposited in PED by means of its web interface and API. We demonstrate how to make sense of the PED protein page and its associated experimental entry pages with reference to the yeast Sic1 use case. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Performing a search in PED; Support Protocol 1: Programmatic access with the PED API; Basic Protocol 2: Interpreting the protein page and the experimental entry page (the Sic1 use case); Support Protocol 2: Downloading options; Support Protocol 3: Understanding the validation report (the Sic1 use case); Basic Protocol 3: Submitting new conformational ensembles to PED; Basic Protocol 4: Providing feedback in PED.
Published: 2021

3. DisProt: intrinsic protein disorder annotation in 2020

Author: Hatos, András, Hajdu-Soltész, Borbála, Monzon, Alexander M., Palopoli, Nicolas, Álvarez, Lucía, Aykac-Fas, Burcu, Bassot, Claudio, Benítez, Guillermo I., Bevilacqua, Martina, Chasapi, Anastasia, Chemes, Lucia, Davey, Norman E., Davidović, Radoslav, Dunker, A. Keith, Elofsson, Arne, Gobeill, Julien, Foutel, Nicolás S. González, Sudha, Govindarajan, Guharoy, Mainak, Horvath, Tamas, Iglesias, Valentin, Kajava, Andrey V., Kovacs, Orsolya P., Lamb, John, Lambrughi, Matteo, Lazar, Tamas, Leclercq, Jeremy Y., Leonardi, Emanuela, Macedo-Ribeiro, Sandra, Macossay-Castillo, Mauricio, Maiani, Emiliano, Manso, José A., Marino-Buslje, Cristina, Martínez-Pérez, Elizabeth, Mészáros, Bálint, Mičetić, Ivan, Minervini, Giovanni, Murvai, Nikoletta, Necci, Marco, Ouzounis, Christos A., Pajkos, Mátyás, Paladin, Lisanna, Pancsa, Rita, Papaleo, Elena, Parisi, Gustavo, Pasche, Emilie, Barbosa Pereira, Pedro J., Promponas, Vasilis J., Pujols, Jordi, Quaglia, Federica, Ruch, Patrick, Salvatore, Marco, Schad, Eva, Szabo, Beata, Szaniszló, Tamás, Tamana, Stella, Tantos, Agnes, Veljkovic, Nevena, Ventura, Salvador, Vranken, Wim, Dosztányi, Zsuzsanna, Tompa, Peter, Tosatto, Silvio C. E., Piovesan, Damiano, Promponas, Vasilis J. 
[0000-0003-3352-4831], Universita degli Studi di Padova, Vinča Institute of Nuclear Sciences, University of Belgrade [Belgrade], Stockholm University, Laboratoire Leibniz (Leibniz - IMAG), Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF), Laboratoire de biochimie théorique [Paris] (LBT (UPR_9080)), Université Paris Diderot - Paris 7 (UPD7)-Centre National de la Recherche Scientifique (CNRS)-Institut de biologie physico-chimique (IBPC (FR_550)), Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Institut de Chimie du CNRS (INC), Reproductive Neuroscience Unit, Department of Obstetrics and Gynecology and Department of Neurobiology, Yale University School of Medicine, Centre de recherche en Biologie Cellulaire (CRBM), Université Montpellier 1 (UM1)-Université Montpellier 2 - Sciences et Techniques (UM2)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Università degli Studi di Modena e Reggio Emilia (UNIMORE), Structural Biology Brussels (SBB), Vrije Universiteit Brussel (VUB), Instituto de Biologia Molecular e Celular (IBMC), Hungarian Academy of Sciences (MTA), Institute of Agrobiotechnology, National Center for Research and Technology, Universidad Nacional de Quilmes (UNQ), Université de Genève et Hôpitaux Universitaires de Genève (SIM), Hôpitaux Universitaires de Genève (HUG), Biophysics and Bioinformatics Laboratory, Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona (UAB), Service d'informatique médicale (SIM), Hôpitaux de Genève, Department of Structural Biology, Biophysics Research Group [Budapest] (ELTE-MTA 'Lendület'), Department of Bio-engineering Sciences, Structural Biology Brussels, Faculty of Sciences and Bioengineering Sciences, Informatics and Applied Informatics, Chemistry, Basic (bio-) Medical Sciences, Instituto de Investigação e Inovação em 
Saúde, Université Joseph Fourier - Grenoble 1 (UJF)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Université Paris Diderot - Paris 7 (UPD7)-Institut de biologie physico-chimique (IBPC), Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université Montpellier 1 (UM1), Vrije Universiteit [Brussels] (VUB), Universitat Autònoma de Barcelona [Barcelona] (UAB), Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7)-Institut de biologie physico-chimique (IBPC (FR_550)), and Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Disordered proteins, Interface (Java), Disorder Ontology, [SDV]Life Sciences [q-bio], Interoperability, [SDV.BC]Life Sciences [q-bio]/Cellular Biology, Ontology (information science), Biology, 03 medical and health sciences, Annotation, Genetics, Database Issue, Databases, Protein, ComputingMilieux_MISCELLANEOUS, Data Curation, 030304 developmental biology, Graphical user interface, Structure (mathematical logic), 0303 health sciences, Intrinsically Disordered Proteins / chemistry, Information retrieval, Intrinsically disordered proteins, Data curation, Molecular sequence annotation, business.industry, 030302 biochemistry & molecular biology, Intrinsic protein, Biological Ontologies, Molecular Sequence Annotation, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Intrinsically Disordered Proteins, Dark proteome, Ontology, Intrisic protein disorder, Literature curation, business, Biological ontologies
Abstract: The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore to illuminating the 'dark' proteome.
Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT) of Argentina [PICT-2015/3367, PICT-2017/1924]; Ministry of Education, Science and Technological Development of the Republic of Serbia [ON173001]; Vetenskapsrådet [2016-03798]; Hungarian National Research, Development, and Innovation Office (NKFIH) [FK-128133]; Italian Ministry of Health Young Investigator Grant [GR-2011-02347754]; Ministerio de Economía y Competitividad (MINECO) [BIO2016-78310-R]; ICREA (ICREA-Academia 2015); Fundação para a Ciência e a Tecnologia (FCT, Portugal); European Regional Development Fund [POCI-01-0145-FEDER-031173, POCI-01-0145-FEDER-029221]; Mexican National Council of Science and Technology (CONACYT) [215503]; Elixir-GR, Action ‘Reinforcement of the Research and Innovation Infrastructure’, Operational Programme ‘Competitiveness, Entrepreneurship and Innovation’ [NSRF 2014-2020], co-financed by Greece and the European Union (European Regional Development Fund); Hungarian Academy of Sciences [PREMIUM-2017-48]; Carlsberg Distinguished Fellowship [CF18-0314]; Danmarks Grundforskningsfond [DNRF125]; National Research, Development and Innovation Office [K-125340]; Research Foundation Flanders (FWO) [G.0328.16N]; Hungarian Academy of Sciences [LP2014-18]; OTKA [K108798 and K124670]. This project has received funding from the European Union’s Horizon 2020 research and innovation programme [778247]. Funding for open access charge: European Union’s Horizon 2020 research and innovation programme [778247]. Conflict of interest statement. None declared.
Published: 2020

4. Exploring Manually Curated Annotations of Intrinsically Disordered Proteins with DisProt

Author: Federica Quaglia, András Hatos, Damiano Piovesan, and Silvio C. E. Tosatto
Subjects: 0303 health sciences, Information retrieval, DisProt, community curation, database, intrinsically disordered proteins, literature curation, Interface (Java), Computer science, Protein Conformation, 030305 genetics & heredity, Computational Biology, General Medicine, Intrinsically disordered proteins, 03 medical and health sciences, User-Computer Interface, Humans, Databases, Protein, Protocol (object-oriented programming), Data Curation, 030304 developmental biology
Abstract: DisProt is the major repository of manually curated data for intrinsically disordered proteins collected from the literature. Although lacking a stable tertiary structure under physiological conditions, intrinsically disordered proteins carry out a plethora of biological functions, some of them directly arising from their flexible nature. A growing number of scientific studies have been published during the last few decades in an effort to shed light on their unstructured state, their binding modes, and their functions. DisProt makes use of a team of expert biocurators to provide up-to-date annotations of intrinsically disordered proteins from the literature, making them available to the scientific community. Here we present a comprehensive description of how to use DisProt in different contexts and provide a detailed explanation of how to explore and interpret manually curated annotations of intrinsically disordered proteins. We describe how to search DisProt annotations, using both the web interface and the API for programmatic access. Finally, we explain how to visualize and interpret a DisProt entry, p53, a widely studied protein characterized by the presence of unstructured N-terminal and C-terminal regions. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Performing a search in DisProt; Support Protocol 1: Downloading options; Support Protocol 2: Programmatic access with DisProt REST API; Basic Protocol 2: Visualizing and interpreting DisProt entries: the p53 use case; Basic Protocol 3: Providing feedback and submitting new intrinsic disorder-related data.
Published: 2020

5. Systematic evaluation of isoform function in literature reports of alternative splicing

Author: James Liu, Ellie Hogan, Chao Chun Liu, Minh Phan, Sophia Ly, Shamsuddin A. Bhuiyan, Paul Pavlidis, and Brandon Huntington
Subjects: 0301 basic medicine, Gene isoform, lcsh:QH426-470, lcsh:Biotechnology, Computational biology, Functional diversity, Biology, Genome, Mice, 03 medical and health sciences, 0302 clinical medicine, lcsh:TP248.13-248.65, Genetics, Animals, Humans, Protein Isoforms, Ensembl, Gene, 030304 developmental biology, 0303 health sciences, Repertoire, Isoform function, Alternative splicing, Computational Biology, Genome project, lcsh:Genetics, Alternative Splicing, 030104 developmental biology, RNA splicing, Splice isoforms, DNA microarray, Literature curation, 030217 neurology & neurosurgery, Function (biology), Biotechnology, Research Article
Abstract: Background: Although most genes in mammalian genomes have multiple isoforms, an ongoing debate is whether these isoforms are all functional as well as the extent to which they increase the functional repertoire of the genome. To ground this debate in data, it would be helpful to have a corpus of experimentally-verified cases of genes which have functionally distinct splice isoforms (FDSIs). Results: We established a curation framework for evaluating experimental evidence of FDSIs, and analyzed over 700 human and mouse genes, strongly biased towards genes that are prominent in the alternative splicing literature. Despite this bias, we found experimental evidence meeting the classical definition for functionally distinct isoforms for ~5% of the curated genes. If we relax our criteria for inclusion to include weaker forms of evidence, the fraction of genes with evidence of FDSIs remains low (~13%). We provide evidence that this picture will not change substantially with further curation and conclude there is a large gap between the presumed impact of splicing on gene function and the experimental evidence. Furthermore, many functionally distinct isoforms were not traceable to a specific isoform in Ensembl, a database that forms the basis for much computational research. Conclusions: We conclude that the claim that alternative splicing vastly increases the functional repertoire of the genome is an extrapolation from a limited number of empirically supported cases. We also conclude that more work is needed to integrate experimental evidence and genome annotation databases. Our work should help shape research around the role of splicing on gene function from presuming large general effects to acknowledging the need for stronger experimental evidence. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5013-2) contains supplementary material, which is available to authorized users.
Published: 2018

6. Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature

Author: K. Van Auken, Paul W. Sternberg, Hans-Michael Müller, and Yuling Li
Subjects: 0301 basic medicine, Text corpus, PubMed, Text mining, Information extraction, Computer science, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, Biochemistry, User-Computer Interface, 03 medical and health sciences, Search engine, Upload, Structural Biology, Information retrieval, Animals, Data Mining, Humans, Knowledge retrieval, Caenorhabditis elegans, lcsh:QH301-705.5, Molecular Biology, Internet, 030102 biochemistry & molecular biology, Ontology, business.industry, Applied Mathematics, Publications, Search engine indexing, Molecular Sequence Annotation, Literature search engine, Computer Science Applications, Search Engine, Gene Ontology, 030104 developmental biology, lcsh:Biology (General), Model organism databases, lcsh:R858-859.7, The Internet, Literature curation, business, computer, Software, Algorithms
Abstract: Background: The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. Results: We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Conclusion: Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc
Published: 2018

7. A Framework for Collaborative Curation of Neuroscientific Literature

Author: Sean Hill, Christian O'Reilly, and Elisabetta Iavarone
Subjects: 0301 basic medicine, Traceability, Computer science, neural network modeling, Biomedical Engineering, Neuroscience (miscellaneous), computer.software_genre, 03 medical and health sciences, Annotation, 0302 clinical medicine, Bounding overwatch, Methods, literature curation, ontology, thalamocortical loop, Graphical user interface, computer.programming_language, Information retrieval, Data curation, business.industry, Python (programming language), Toolbox, Computer Science Applications, 030104 developmental biology, annotation tools, Ontology, Data mining, business, computer, 030217 neurology & neurosurgery, Neuroscience
Abstract: Large models of complex neuronal circuits require specifying numerous parameters, with values that often need to be extracted from the literature, a tedious and error-prone process. To help establish shareable curated corpora of annotations, we have developed a literature curation framework comprising an annotation format, a Python API (NeuroAnnotation Toolbox; NAT), and a user-friendly graphical interface (NeuroCurator). This framework allows the systematic annotation of relevant statements and model parameters. The context of the annotated content is made explicit in a standard way by associating it with ontological terms (e.g., species, cell types, brain regions). The exact position of the annotated content within a document is specified by the starting character of the annotated text, or the number of the figure, the equation, or the table, depending on the context. Alternatively, the provenance of parameters can also be specified by bounding boxes. Parameter types are linked to curated experimental values so that they can be systematically integrated into models. We demonstrate the use of this approach by releasing a corpus describing different modeling parameters associated with thalamo-cortical circuitry. The proposed framework supports rigorous management of large sets of parameters, solving common difficulties in their traceability. Further, it allows easier classification of literature information and more efficient and systematic integration of such information into models and analyses. Collaborative curation of the literature could be a powerful force driving future modeling endeavors.
Published: 2017

8. Relation mining experiments in the pharmacogenomics domain

Author: Gerold Schneider, Simon Clematide, Fabio Rinaldi, University of Zurich, and Rinaldi, Fabio
Subjects: PharmGKB, Text mining, Databases, Factual, Relation (database), Abstracting and Indexing, Computer science, Process (engineering), Knowledge Bases, 410 Linguistics, Health Informatics, 02 engineering and technology, 000 Computer science, knowledge & systems, Task (project management), Ranking (information retrieval), 03 medical and health sciences, 1706 Computer Science Applications, 0202 electrical engineering, electronic engineering, information engineering, Animals, Data Mining, Humans, 2718 Health Informatics, 030304 developmental biology, 0303 health sciences, Data curation, business.industry, Computational Biology, Data science, 3. Good health, Computer Science Applications, Knowledge base, Pharmacogenetics, 10105 Institute of Computational Linguistics, 020201 artificial intelligence & image processing, Personalized medicine, Literature curation, Pharmacogenomics, business
Abstract: Graphical abstract: The first figure illustrates evaluation results for the three different approaches towards extraction of drug/gene/disease relationships from the biomedical literature discussed in the paper. The second figure shows the ODIN curation system as deployed in an assisted curation experiment using PharmGKB data. Highlights: (1) We suggest the usage of PharmGKB data for a shared task similar to BioCreative. The same evaluation tools and criteria could be used. (2) Three relation mining methods are presented and evaluated. (3) Relative merits of the ranking metrics (TAP-k and AUC iP/R) are discussed. (4) An interactive system for the assisted curation of interactions automatically derived from the pharmacogenomics literature is introduced. The mutual interactions among genes, diseases, and drugs are at the heart of biomedical research, and are especially important for the pharmacological industry. The recent trend towards personalized medicine makes it increasingly relevant to be able to tailor drugs to specific genetic makeups. The pharmacogenetics and pharmacogenomics knowledge base (PharmGKB) aims at capturing relevant information about such interactions from several sources, including curation of the biomedical literature. Advanced text mining tools which can support the process of manual curation are increasingly necessary in order to cope with the deluge of new published results. However, effective evaluation of those tools requires the availability of manually curated data as gold standard. In this paper we discuss how the existing PharmGKB database can be used for such an evaluation task in a way similar to the usage of gold standard data derived from protein-protein interaction databases in one of the recent BioCreative shared tasks. Additionally, we present our own considerations and results on the feasibility and difficulty of such a task.
Published: 2012

9. Utilization of ontology look-up services in information retrieval for biomedical literature

Author: Vishnyakova, Dina, Pasche, Emilie, Lovis, Christian, and Ruch, Patrick
Subjects: Medical Subject Headings, User-Computer Interface, Ontology, Abstracting and Indexing, Information retrieval, Data Mining, Database Management Systems, Literature curation, Periodicals as Topic, ddc:616.0757, Databases, Bibliographic, Natural Language Processing, Semantics
Abstract: With the vast amount of biomedical data, we face the need to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies facilitated the combination of various data sources (e.g. scientific literature, clinical data repository) by increasing the quality of information retrieval and reducing the maintenance efforts. In this context, we developed Ontology Look-up services (OLS), based on NEWT and MeSH vocabularies. Our services were involved in some information retrieval tasks such as gene/disease normalization. The implementation of OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context. Precision in normalization tasks was boosted by about 20%.
Published: 2013

10. GIDMP: good protein-protein interaction data metamining practice

Author: Dariusz Plewczynski and Tomas Klingström
Subjects: Proteomics, Interactome, End user, Bioinformatics, Short Communication, Computational Biology, Proteins, Cell Biology, Biology, Metamining, Biochemistry, Signaling, World Wide Web, Protein-protein interaction, Web page, Protein Interaction Mapping, Table (database), Data Mining, Literature curation, Databases, Protein, Pathways, Systems biology, Molecular Biology, Biological sciences
Abstract: Studying the interactome is one of the exciting frontiers of proteomics, as shown at recent bioinformatics conferences (for example ISMB 2010, or ECCB 2010). Distribution of data is facilitated by a large number of databases. Metamining databases have been created in order to allow researchers access to several databases in one search, but there are serious difficulties for end users to evaluate the metamining effort. Therefore we suggest a new standard, “Good Interaction Data Metamining Practice” (GIDMP), which could be easily automated and requires only very minor inclusion of statistical data on each database homepage. Widespread adoption of the GIDMP standard would provide users with: (1) a standardized way to evaluate the statistics provided by each metamining database, thus enhancing the end-user experience; (2) a stable contact point for each database, allowing the smooth transition of statistics; (3) a fully automated system, enhancing time- and cost-effectiveness. The proposed information can be presented as a few hidden lines of text on the source database www page, and a constantly updated table for a metamining database included in the source/credits web page.
Published: 2010

11. @Note: a workbench for biomedical text mining

Author: Daniel Glez-Peña, Florentino Fdez-Riverola, Sónia Carneiro, Paulo Maia, Anália Lourenço, Eugénio C. Ferreira, Isabel Rocha, Miguel Rocha, Rafael Carreira, and Universidade do Minho
Subjects: Biomedical Research, Information extraction, Databases, Factual, Computer science, 0206 medical engineering, Interoperability, Information Storage and Retrieval, Health Informatics, 02 engineering and technology, Scientific literature, computer.software_genre, 03 medical and health sciences, Annotation, User-Computer Interface, Named-entity recognition, Information retrieval, Component-based software development, 030304 developmental biology, Natural Language Processing, 0303 health sciences, Semantic annotation, Science & Technology, Benchmarking, Biomedical text mining, Computer Science Applications, Semantics, Named entity recognition, Vocabulary, Controlled, Component-based software engineering, Literature curation, computer, 020602 bioinformatics, Software
Abstract: Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists’ needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used. Fundação para a Ciência e a Tecnologia (FCT)
Published: 2008

12. The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining

Author: Stefanie De Bodt, Yves Van de Peer, Sara Jover-Gil, José Luis Micol, Jesper T. Gronlund, Richard G. H. Immink, Katja Baerenfaller, Sofie Van Landeghem, Gerco C. Angenent, Rubén Casanova-Sáez, Wilhelm Gruissem, Vicky Buchanan-Wollaston, Dóra Szakonyi, Pierre Hilson, Lieven Baeyens, Aalt D. J. van Dijk, Jonas Blomme, David Wilson-Sánchez, Fabio Fiorani, Asuka Kuwabara, Sean Walsh, David Esteve-Bruna, Nathalie Gonzalez, Tamara Muñoz-Nortes, Dirk Inzé, Department of Plant Systems Biology, VIB, and Department of Plant Biotechnology and Bioinformatics, Ghent University [Belgium] (UGENT), Department of Biology, Swiss Federal Institute of Technology, Instituto de Bioingeniería, Universidad Miguel Hernández [Elche] (UMH), Warwick Systems Biology Centre, and School of Life Sciences, University of Warwick, Plant Research International, Bioscience, Wageningen University and Research Center (WUR), Genomics Research Institute (GRI), University of Pretoria (UPSpace), Institut Jean-Pierre Bourgin (IJPB), Institut National de la Recherche Agronomique (INRA)-AgroParisTech, Universiteit Gent = Ghent University [Belgium] (UGENT), and Wageningen University and Research [Wageningen] (WUR)
Subjects: EXPRESSION, Leaf growth, INFORMATION, Relational database, Computer science, [SDV]Life Sciences [q-bio], Arabidopsis, Context (language use), Plant Science, Scientific literature, computer.software_genre, Biochemistry, Wiskundige en Statistische Methoden - Biometris, TEXT, Open Biomedical Ontologies, Consistency (database systems), BIOS Applied Bioinformatics, lcsh:Botany, Genetics, Laboratorium voor Moleculaire Biologie, TOOL, BIOS Plant Development Systems, Mathematical and Statistical Methods - Biometris, Literature curation, Data integration, Data curation, PLANT ONTOLOGY, Biology and Life Sciences, Cell Biology, 15. Life on land, Data science, GENE, lcsh:QK1-989, PROTEIN INTERACTIONS, Data sharing, DIFFERENTIATION, MAINTENANCE, Data mining, Laboratory of Molecular Biology, computer, Developmental Biology, GENERATION
Abstract: The information that connects genotypes and phenotypes is essentially embedded in research articles written in natural language. To facilitate access to this knowledge, we constructed a framework for the curation of the scientific literature studying the molecular mechanisms that control leaf growth and development in Arabidopsis thaliana (Arabidopsis). Standard structured statements, called relations, were designed to capture diverse data types, including phenotypes and gene expression linked to genotype description, growth conditions, genetic and molecular interactions, and details about molecular entities. Relations were then annotated from the literature, defining the relevant terms according to standard biomedical ontologies. This curation process was supported by a dedicated graphical user interface, called Leaf Knowtator. A total of 283 primary research articles were curated by a community of annotators, yielding 9947 relations monitored for consistency and over 12,500 references to Arabidopsis genes. This information was converted into a relational database (KnownLeaf) and merged with other public Arabidopsis resources relative to transcriptional networks, protein–protein interaction, gene co-expression, and additional molecular annotations. Within KnownLeaf, leaf phenotype data can be searched together with molecular data originating either from this curation initiative or from external public resources. Finally, we built a network (LeafNet) with a portion of the KnownLeaf database content to graphically represent the leaf phenotype relations in a molecular context, offering an intuitive starting point for knowledge mining. Literature curation efforts such as ours provide high quality structured information accessible to computational analysis, and thereby to a wide range of applications. Current Plant Biology, 2, ISSN:2214-6628

13. A Supervised Learning Approach for Imbalanced Text Classification of Biomedical Literature Triage

Subjects: text classification, machine learning, data sampling, literature curation, imbalance learning, undersampling, bioinformatics, fungal genomics, triage, biomedical literature
