Learning from biomedical linked data to suggest valid pharmacogenes

Authors :
Adrien Coulet
Kevin Dalleau
Patrice Ringot
Sébastien Da Silva
Ndeye Coumba Ndiaye
Yassine Marzougui
Knowledge representation, reasoning (ORPAILLEUR)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
Ecole Nationale Supérieure des Mines de Nancy (ENSMN)
Institut Mines-Télécom [Paris] (IMT)-Université de Lorraine (UL)
Interactions Gène-Environnement en Physiopathologie Cardio-Vasculaire (IGE-PCV)
Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Lorraine (UL)
ANR PractiKPharma project, grant ANR-15-CE23-0028, funded by the French National Research Agency (http://practikpharma.loria.fr/) and *Snowflake, an Inria associate team (http://snowflake.loria.fr/)
Snowflake Inria Associate Team
Inria@SiliconValley
Snowball Inria Associate Team
ANR-15-CE23-0028,PractiKPharma,Confrontation entre connaissances de l'état de l'art et connaissances extraites de dossiers patients en pharmacogénomique(2015)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Coulet, Adrien
Interactions humain-machine, objets connectés, contenus numériques, données massives et connaissance - Confrontation entre connaissances de l'état de l'art et connaissances extraites de dossiers patients en pharmacogénomique - - PractiKPharma2015 - ANR-15-CE23-0028 - AAPG2015 - VALID
Source :
Journal of Biomedical Semantics, BioMed Central, 2017, Vol 8, Iss 1, pp.16 (Pp 1-12). ⟨10.1186/s13326-017-0125-1⟩
Publication Year :
2017
Publisher :
HAL CCSD, 2017.

Abstract

Background: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tend to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, only molecular networks or the biomedical literature have been used, whereas many other resources are available.

Method: We propose here to consume a larger and more diverse set of resources using linked data related to genes, drugs, or diseases. One advantage of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus makes it easier to consider features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including for example DisGeNET and ClinVar. We use machine learning to identify and prioritize the pharmacogenes that are most likely to be valid, considering the selected linked data. This identification relies on the classification of gene–drug pairs as either pharmacogenomically associated or not. It was tested with two machine learning methods, random forest and graph kernel, whose results are compared in this article.

Results: We assembled a set of linked data related to pharmacogenomics, comprising 2,610,793 triples from six distinct resources. Learning from these data, random forest identifies valid pharmacogenes with an F-measure of 0.73 in a 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided, and how they were obtained is discussed.

Electronic supplementary material: The online version of this article (doi:10.1186/s13326-017-0125-1) contains supplementary material, which is available to authorized users.
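The evaluation protocol mentioned in the abstract (an F-measure on binary gene–drug pair labels, estimated by 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `f_measure` and `k_fold_splits` are hypothetical, the fold assignment is a simple round-robin chosen for illustration, and no classifier (random forest or graph kernel) is included.

```python
from typing import List, Sequence, Tuple


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs over n examples.

    Assumption: examples are assigned to folds round-robin; the article
    does not say how its folds were drawn.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train_idx, test_idx))
    return splits


# Toy labels: 1 = pharmacogenomically associated pair, 0 = not associated.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 1, 0]
toy_score = f_measure(toy_true, toy_pred)
```

In a full experiment, a classifier would be trained on each `train_indices` set and scored with `f_measure` on the matching `test_indices` set, and the per-fold scores would then be aggregated.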

Subjects

Subjects :
0301 basic medicine
Graph kernel
Computer science
Knowledge discovery from databases
computer.software_genre
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
0302 clinical medicine
False positive paradox
[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]
[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
Valid pharmacogenes
Linked data
Computer Science Applications
Random forest
Identification (information)
Phenotype
[SDV.SP.PHARMA] Life Sciences [q-bio]/Pharmaceutical sciences/Pharmacology
030220 oncology & carcinogenesis
lcsh:R858-859.7
Information Systems
Computer Networks and Communications
Health Informatics
[SDV.GEN.GH] Life Sciences [q-bio]/Genetics/Human genetics
lcsh:Computer applications to medicine. Medical informatics
Machine learning
Set (abstract data type)
03 medical and health sciences
[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]
Computer Graphics
Selection (linguistics)
Data mining
Semantic Web
Linkage (software)
business.industry
Research
Computational Biology
030104 developmental biology
Pharmacogenetics
Artificial intelligence
business
Pharmacogenomics
computer

Details

Language :
English
ISSN :
20411480
Database :
OpenAIRE
Journal :
Journal of Biomedical Semantics
Accession number :
edsair.doi.dedup.....6bda02895091220390c9c8163426fb5b
Full Text :
https://doi.org/10.1186/s13326-017-0125-1