53 results on '"Soualmia LF"'
Search Results
2. eHealth beyond the horizon -- get IT there. Mining knowledge from corpora: an application to retrieval and indexing.
- Author
-
Soualmia LF, Dahamna B, Darmoni S, Andersen SK, Klein GO, Schulz S, Aarts J, and Mazzoleni MC
- Published
- 2008
3. Merging Biomedical Ontologies with BioSTransformers.
- Author
-
Menad S, Abdeddaïm S, and Soualmia LF
- Subjects
- Neural Networks, Computer, Humans, Unified Medical Language System, Biological Ontologies, Semantics
- Abstract
Ontologies play a key role in representing and structuring domain knowledge. In the biomedical domain, the need for this type of representation is crucial for structuring, coding, and retrieving data. However, available ontologies do not encompass all the relevant concepts and relationships. In this paper, we propose the framework SiMHOMer (Siamese Models for Health Ontologies Merging) to semantically merge and integrate the most relevant ontologies in the healthcare domain, with a first focus on diseases, symptoms, drugs, and adverse events. We propose to rely on the siamese neural models we developed and trained on biomedical data, BioSTransformers, to identify new relevant relations between concepts and to create new semantic relations, the objective being to build a new merging ontology that could be used in applications. To validate the proposed approach and the new relations, we relied on the UMLS Metathesaurus and the Semantic Network. Our first results show promising improvements for future research.
- Published
- 2024
- Full Text
- View/download PDF
4. Harnessing the Core Propagation Phenomenon Ontology to Develop a Knowledge Graph for Tracking Health-Related Phenomena.
- Author
-
Medeiros GHA, Soualmia LF, and Zanni-Merk C
- Subjects
- Humans, SARS-CoV-2, Biological Ontologies, Unified Medical Language System, COVID-19
- Abstract
Biomedical data analysis and visualization often demand data experts for each unique health event. There is a clear lack of automatic tools for semantic visualization of the spread of health risks through biomedical data. Illnesses such as coronavirus disease (COVID-19) and Monkeypox spread rampantly around the world before governments could make decisions based on the analysis of such data. We propose the design of a knowledge graph (KG) for spatio-temporal tracking of public health event propagation. To achieve this, we propose the specialization of the Core Propagation Phenomenon Ontology (PropaPhen) into a health-related propagation phenomenon domain ontology. Data from the UMLS and OpenStreetMaps are suggested for instantiating the proposed knowledge graph. Finally, the results of a use case on COVID-19 data from the World Health Organization are analyzed to evaluate the possibilities of our approach.
- Published
- 2024
- Full Text
- View/download PDF
5. Informatics for One Health.
- Author
-
Hollis KF, Mougin F, and Soualmia LF
- Subjects
- Humans, Medical Informatics, One Health
- Abstract
Objectives: To introduce the 2023 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction and overview to the 2023 IMIA Yearbook where the special topic is "Informatics for One Health". The special topic, survey papers and some best papers are discussed. The section changes in the Yearbook editorial committee are also described. Results: IMIA Yearbook 2023 provides many perspectives on a relatively new topic called "One Digital Health". The subject is vast, and includes the use of digital technologies to promote the well-being of people and animals, but also of the environment in which they evolve. Many sections produced new work in the topic including One Health and all sections included the latest themes in many specialties in medical informatics. Conclusions: The theme of "Informatics for One Health" is relatively new but the editors of the IMIA Yearbook have presented excellent and thought-provoking work for biomedical informatics in 2023. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2023
- Full Text
- View/download PDF
6. Modeling and integrating interactions involving the CYP450 enzyme system in a multi-terminology server: Contribution to information extraction from a clinical data warehouse.
- Author
-
Gosselin L, Letord C, Leguillon R, Soualmia LF, Dahamna B, Mouazer A, Disson F, Darmoni SJ, and Grosjean J
- Subjects
- Humans, Data Warehousing, Cytochrome P-450 Enzyme System genetics, Cytochrome P-450 Enzyme System metabolism
- Abstract
Introduction: The cytochrome P450 (CYP450) enzyme system is involved in the metabolism of certain drugs and is responsible for most drug interactions. These interactions result in either an enzymatic inhibition or an enzymatic induction mechanism that has an impact on the therapeutic management of patients. Detecting these drug interactions will allow for better predictability in therapeutic response. Therefore, computerized solutions can represent a valuable help for clinicians in their tasks of detection. Objective: The objective of this study is to provide a structured data source of interactions involving the CYP450 enzyme system. These interactions are intended to be integrated in the cross-lingual multi-terminology server HeTOP (Health Terminologies and Ontologies Portal), to support the query processing of the clinical data warehouse (CDW) EDSaN (Entrepôt de Données de Santé Normand). Material and Methods: A selection and curation of drug components (DCs) that share a relationship with the CYP450 system was performed from several international data sources. The DCs were linked according to the type of relationship, which can be substrate, inhibitor, or inducer. These relationships were then integrated into the HeTOP server. To validate the CYP450 relationships, a semantic query was performed on the CDW, whose search engine is founded on HeTOP data (concepts, terms, and relations). Results: A total of 776 DCs are associated by a new interaction relationship, integrated in HeTOP, by 14 enzymes. These are CYP450 1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4, 3A7, 11B1, 11B2 mitochondrial and P-glycoprotein, constituting a total of 2,088 relationships. A general modelling of cytochromic interactions was performed. From this model, 233,006 queries were processed in less than two hours, demonstrating the usefulness and performance of our CDW implementation. Moreover, they showed that in our university hospital, the concurrent prescription that could cause a cytochromic interaction is Bisoprolol with Amiodarone by enzymatic inhibition for 2,493 patients. Discussion: The queries submitted to the CDW EDSaN made it possible to highlight the molecules most often prescribed simultaneously and potentially responsible for cytochromic interactions. In a second step, it would be interesting to evaluate the real clinical impact by looking for possible adverse effects of these interactions in the patients' files. Other computational solutions for cytochromic interactions exist. The impact of CYP450 is particularly important for drugs with a narrow therapeutic window (NTW) as they can lead to increased toxicity or therapeutic failure. It is also important to define which drug component is a pro-drug and to consider the many genetic polymorphisms of patients. Conclusion: The HeTOP server contains a non-negligible number of relationships between drug components and CYP450 from multiple reference sources. These data allow us to query our Clinical Data Warehouse to highlight these cytochromic interactions. It would be interesting in the future to assess the actual clinical impact in hospital reports. Competing Interests: Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2022 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
7. Inclusive Digital Health.
- Author
-
Mougin F, Hollis KF, and Soualmia LF
- Subjects
- Humans, Artificial Intelligence, Pandemics, Algorithms, COVID-19, Medical Informatics
- Abstract
Objectives: To introduce the 2022 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction and overview to the 2022 IMIA Yearbook whose special topic is "Inclusive Digital Health: Addressing Equity, Literacy, and Bias for Resilient Health Systems". The special topic, survey papers, section editor synopses and some best papers are discussed. The sections' changes in the Yearbook Editorial Committee are also described. Results: As shown in the previous edition, health informatics in the context of a global pandemic has led to the development of ways to collect, standardize, disseminate and reuse data worldwide. The Corona Virus Disease 2019 (COVID-19) pandemic has demonstrated the need for timely, reliable, open, and globally available information to support decision making. It has also highlighted the need to address social inequities and disparities in access to care across communities. This edition of the Yearbook acknowledges the fact that much work has been done to study health equity in recent years in the various fields of health informatics research. Conclusion: There is a strong desire to better consider disparities between populations to avoid biases being induced in Artificial Intelligence algorithms in particular. Telemedicine and m-health must be more inclusive for people with disabilities or living in isolated geographical areas. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2022
- Full Text
- View/download PDF
8. Analyzing SARS-CoV-2 Sequence Patterns by Semantic Trajectories.
- Author
-
Laddada W, Zanni-Merk C, and Soualmia LF
- Subjects
- Genome, Viral genetics, Humans, Pandemics, Semantics, COVID-19, SARS-CoV-2 genetics
- Abstract
Since the beginning of the pandemic due to the SARS-CoV-2 emergence, several variants have been observed all over the world. One of the last known, Omicron, caused a large spread of the virus in a few days, and several countries reached a record number of contaminations. Indeed, the mutation in the Spike region of the virus played an important role in altering its behavior. Therefore, it is important to understand the virus evolution by extracting and analyzing the virus structure of each variant. In this work we show how sequence patterns could be analyzed and extracted by means of semantic trajectory modeling. To do so, we designed a graph-based model in which the genome organization is handled using nodes and edges to represent respectively the nucleotides and sequence connections (points of interest and routes for trajectories). The modeling choices and pattern extraction from the graph made it possible to retrieve a region where a mutation occurred in Omicron (NCBI version: OM011974.1).
- Published
- 2022
- Full Text
- View/download PDF
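The graph representation described in the abstract of result 8 (nucleotides as nodes, sequence connections as edges) can be illustrated with a minimal sketch. This is not the authors' model: the function names `build_sequence_graph` and `differing_regions`, the toy sequences, and the assumption that the two sequences are already aligned and of equal length are all hypothetical choices made for illustration.

```python
def build_sequence_graph(sequence):
    """Represent a sequence as nodes (position, base) and edges between neighbours."""
    nodes = [(i, base) for i, base in enumerate(sequence)]
    edges = [(i, i + 1) for i in range(len(sequence) - 1)]
    return nodes, edges


def differing_regions(ref_nodes, var_nodes):
    """Return inclusive (start, end) position ranges where the bases differ.

    Assumes both node lists come from aligned sequences of equal length.
    """
    regions = []
    start = None
    for (i, ref_base), (_, var_base) in zip(ref_nodes, var_nodes):
        if ref_base != var_base:
            if start is None:
                start = i  # a differing region begins here
        elif start is not None:
            regions.append((start, i - 1))  # the region ended at the previous node
            start = None
    if start is not None:
        regions.append((start, len(ref_nodes) - 1))
    return regions


# Toy sequences (hypothetical, not taken from any real genome record)
ref_nodes, ref_edges = build_sequence_graph("ATGCCGTA")
var_nodes, var_edges = build_sequence_graph("ATGAAGTA")
mutated = differing_regions(ref_nodes, var_nodes)
print(mutated)
```

A real analysis would need sequence alignment first, since insertions and deletions shift positions; the sketch only handles substitutions.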
9. OntoBioStat: Supporting Causal Diagram Design and Analysis.
- Author
-
Pressat Laffouilhère T, Grosjean J, Bénichou J, Darmoni SJ, and Soualmia LF
- Subjects
- Bias, Causality, Biometry, Biostatistics
- Abstract
Suitable causal inference in biostatistics can be best achieved by knowledge representation thanks to causal diagrams or directed acyclic graphs. However, necessary and sufficient causes are not easily represented. Since existing ontologies do not fill this gap, we designed OntoBioStat in order to enable covariate selection support based on causal relation representations. OntoBioStat's automatic ontological causal diagram construction and inferences are detailed in this study. OntoBioStat inferences are allowed by Semantic Web Rule Language rules and axioms. First, statements made by the users include outcome, exposure, covariate, and causal relation specification. Then, reasoning enables automatic construction using generic instances of Meta_Variable and Necessary_Variable classes. Finally, inferred classes highlight potential bias such as confounder-like. An ontological causal diagram built with OntoBioStat was compared to a standard causal diagram (without OntoBioStat) in a theoretical study. It was found that confounding and bias were not completely identified by the standard causal diagram, and erroneous covariate sets were provided. Further research is needed in order to make OntoBioStat more usable.
- Published
- 2022
- Full Text
- View/download PDF
10. Assisting Data Retrieval with a Drug Knowledge Graph.
- Author
-
Lelong R, Dahamna B, Leguillon R, Grosjean J, Letord C, Darmoni SJ, and Soualmia LF
- Subjects
- France, Humans, Information Storage and Retrieval, Knowledge, Pattern Recognition, Automated, Pharmaceutical Preparations
- Abstract
The Normandy health data warehouse EDSaN integrates the medication orders from the University Hospital of Rouen (France). This study aims at describing the design and the evaluation of an information retrieval system founded on a complex and semantically augmented knowledge graph dedicated to EDSaN drug prescriptions. The system is intended to help health professionals select drugs during the search process. The manual evaluation of the relevance of the returned drugs showed encouraging results as expected. A deeper analysis in order to improve the ranking method is needed and will be performed in future work.
- Published
- 2022
- Full Text
- View/download PDF
11. Tracing and analyzing COVID-19 dissemination using knowledge graphs.
- Author
-
Medeiros GHA, Soualmia LF, Zanni-Merk C, and Hagverdiyev R
- Abstract
The COVID-19 (SARS-CoV-2) spread around the globe could have been halted if we had had a better understanding of the situation and applied more restrictive measures for travel adapted to each country. This is due to a lack of efficient tools to visualize, analyze and control the virus dissemination. In the context of virus proliferation, analyzing flight connections between countries and COVID-19 data seems helpful to understand spatial and temporal information about the virus and its possible spread. To manage these complex, massive, and heterogeneous data, we propose a methodology based on knowledge graph models. Several analyses and visualization tools can be applied, and our results show that these knowledge graph models may be a promising way to study the dissemination of any virus. These graphs can also be easily enriched with additional information that could be useful in the future to analyze or predict other interesting indicators. (© 2022 The Author(s). Published by Elsevier B.V.)
- Published
- 2022
- Full Text
- View/download PDF
12. Health Data, Information, and Knowledge Sharing for Addressing the COVID-19.
- Author
-
Soualmia LF, Hollis KF, Mougin F, and Séroussi B
- Subjects
- Health Information Exchange, Humans, Medical Informatics, COVID-19, Health Communication, Information Dissemination
- Abstract
Objectives: To introduce the 2021 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction and overview to the 2021 IMIA Yearbook whose special topic is "Managing Pandemics with Health Informatics - Successes and Challenges". The Special Topic, the keynote paper, and survey papers are discussed. The IMIA President's statement and the IMIA dialogue with the World Health Organization are introduced. The sections' changes in the Yearbook Editorial Committee are also described. Results: Health informatics, in the context of a global pandemic, led to the development of ways to collect, standardize, disseminate and reuse data worldwide: public health data but also information from social networks and scientific literature. Fact checking methods were mostly based on artificial intelligence and natural language processing. The pandemic also introduced new challenges for telehealth support in times of critical response. Next generation sequencing in bioinformatics helped in decoding the sequence of the virus and the development of messenger ribonucleic acid (mRNA) vaccines. Conclusions: The Corona Virus Disease 2019 (COVID-19) pandemic shows the need for timely, reliable, open, and globally available information to support decision making and efficiently control outbreaks. Applying Findable, Accessible, Interoperable, and Reusable (FAIR) requirements for data is a key success factor while challenging ethical issues have to be considered. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2021
- Full Text
- View/download PDF
13. Ontological Models Supporting Covariates Selection in Observational Studies.
- Author
-
Pressat Laffouilhère T, Grosjean J, Bénichou J, Darmoni SJ, and Soualmia LF
- Subjects
- Causality
- Abstract
In the context of causal inference, biostatisticians use causal diagrams to select covariates in order to build multivariate models. These diagrams represent dataset variables and their relations but have some limitations (representing interactions, bidirectional causal relations). The MetBrAYN project aims at building an ontology-based process to tackle these issues. The knowledge acquired by the biostatistician during a methodological consultation for a research question will be represented in a general ontology. In order to aggregate various forms of knowledge, the ontology will act as a wrapper. Ontology-based causal diagrams will be semi-automatically built. Founded on inference rules, the global system will help biostatisticians to curate it and to visualize recommended covariates for their research question.
- Published
- 2021
- Full Text
- View/download PDF
14. Patient and Graph Embeddings for Predictive Diagnosis of Drug Iatrogenesis.
- Author
-
Soualmia LF, Lafon V, and Darmoni SJ
- Subjects
- Algorithms, Humans, Knowledge Bases, Semantics, Artificial Intelligence, Pharmaceutical Preparations
- Abstract
In the context of the IA.TROMED project we intend to develop and evaluate original algorithmic methods that will rely on semantic enrichment of embeddings by combining new deep learning algorithms, such as models founded on transformers, and symbolic artificial intelligence. The documents' embeddings, the graphs' embeddings of biomedical concepts, and patients' embeddings, all of them semantically enriched with aligned formal ontologies and semantic networks, will constitute a layer that will play the role of a queryable and searchable knowledge base that will supply the IA.TROMED's clinical, predictive, and iatrogenic diagnosis support module.
- Published
- 2021
- Full Text
- View/download PDF
15. OntoRepliCov: an Ontology-Based Approach for Modeling the SARS-CoV-2 Replication Process.
- Author
-
Laddada W, Soualmia LF, Zanni-Merk C, Ayadi A, Frydman C, L'Hote I, and Imbert I
- Abstract
Understanding the replication machinery of viruses helps in suggesting and trying effective antiviral strategies. Exhaustive knowledge about the proteins' structure, their function, or their interaction is one of the preconditions for successfully modeling it. In this context, modeling methods based on a formal representation with a high semantic expressiveness would be relevant to extract proteins and their nucleotide or amino acid sequences as an element from the replication process. Consequently, our approach relies on the use of semantic technologies to design the SARS-CoV-2 replication machinery. This provides the ability to infer new knowledge related to each step of the virus replication. More specifically, we developed an ontology-based approach, enriched with a reasoning process, of a complete replication machinery process for SARS-CoV-2. We present in this paper a partial overview of our ontology OntoRepliCov to describe one step of this process, namely, the continuous translation or protein synthesis, through classes, properties, axioms, and SWRL (Semantic Web Rule Language) rules. (© 2021 The Author(s). Published by Elsevier B.V.)
- Published
- 2021
- Full Text
- View/download PDF
16. Transparency of Health Informatics Processes as the Condition of Healthcare Professionals' and Patients' Trust and Adoption: the Rise of Ethical Requirements.
- Author
-
Séroussi B, Hollis KF, and Soualmia LF
- Subjects
- Artificial Intelligence ethics, Attitude of Health Personnel, Bioethical Issues, Health Personnel, Humans, Attitude to Health, Medical Informatics ethics, Trust
- Abstract
Objectives: To provide an introduction to the 2020 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: This editorial provides an introduction and overview to the 2020 IMIA Yearbook whose special topic is "Ethics in Health Informatics". The keynote paper, the survey paper of the Special Topic section, and the paper about Donald Lindberg's ethical scientific openness in the History of Medical Informatics chapter of the Yearbook are discussed. Changes in the Yearbook Editorial Committee are also described. Results: Inspired by medical ethics, ethics in health informatics progresses with the advances in biomedical informatics. With the wide use of EHRs, the enlargement of the care team perimeter, the need for data sharing for care continuity, the reuse of data for the sake of research, and the implementation of AI-powered decision support tools, new ethics requirements are necessary to address issues such as threats on privacy, confidentiality breaches, poor security practices, lack of patient information, tension on data sharing and reuse policies, need for more transparency on apps effectiveness, biased algorithms with discriminatory outcomes, guarantee on trustworthy AI, and concerns on the re-identification of de-identified data. Conclusions: Despite privacy rules rooted in the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the USA and even more restrictive new regulations such as the EU General Data Protection Regulation published in May 2018, some people do not believe their data will be kept confidential and may not share sensitive information with a provider, which may also induce unethical situations. Transparency on healthcare data processes is a condition of healthcare professionals' and patients' trust and their adoption of digital tools. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2020
- Full Text
- View/download PDF
17. Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study.
- Author
-
Lelong R, Soualmia LF, Grosjean J, Taalba M, and Darmoni SJ
- Abstract
Background: The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information. Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context. Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities able to express data requirements in terms of the whole set of interconnected conceptual entities that compose health information. Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine-related or data-related limitations that could explain the results for each criterion were also observed. Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It enables more genericity in the information retrieval process. It also allows the semantic description of health information to be fully exploited. Despite their semantic annotation, searching within clinical narratives remained the major challenge of the system. A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system combined with its generic entity-centered vision enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of those unstructured data. (© Romain Lelong, Lina F Soualmia, Julien Grosjean, Mehdi Taalba, Stéfan J Darmoni. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 20.12.2019.)
- Published
- 2019
- Full Text
- View/download PDF
18. The MeSH-Gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for Semantic Similarity.
- Author
-
Abdeddaïm S, Vimard S, and Soualmia LF
- Subjects
- Humans, MEDLINE, PubMed, Medical Subject Headings, Neural Networks, Computer, Semantics
- Abstract
Eliciting semantic similarity between concepts remains a challenging task. Recent approaches founded on embedding vectors have gained in popularity as they have proven able to efficiently capture semantic relationships. The underlying idea is that two words that have close meanings occur in similar contexts. In this study, we propose a new neural network model, named MeSH-gram, which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead of words. Trained on the publicly available PubMed/MEDLINE corpus, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and at several context window sizes. A deeper comparison is performed with twenty existing models. All the obtained results with Spearman's rank correlations between human scores and computed similarities show that MeSH-gram (i) outperforms the skip-gram model and (ii) is comparable to the best methods that need more computation and external resources.
- Published
- 2019
- Full Text
- View/download PDF
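The evaluation measure named in the abstract of result 18, Spearman's rank correlation between human scores and computed similarities, can be sketched in plain Python. The scores below are invented for illustration and are not taken from the paper or from any reference standard; the helper names `average_ranks` and `spearman` are likewise hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Invented example: human similarity ratings vs. model cosine similarities
human_scores = [4.0, 3.5, 1.0, 2.0, 3.0]
model_scores = [0.91, 0.75, 0.10, 0.40, 0.55]
rho = spearman(human_scores, model_scores)
print(round(rho, 3))
```

Because the measure depends only on rank order, a model is rewarded for ordering concept pairs the way annotators do, whatever the scale of its similarity values.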
19. Artificial Intelligence in Health Informatics: Hype or Reality?
- Author
-
Hollis KF, Soualmia LF, and Séroussi B
- Subjects
- Artificial Intelligence, Medical Informatics
- Abstract
Objectives: To provide an introduction to the 2019 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: This editorial presents an overview and introduction to the 2019 IMIA Yearbook which includes the special topic "Artificial Intelligence in Health: New Opportunities, Challenges, and Practical Implications". The special topic is discussed, the IMIA President's statement is introduced, and changes in the Yearbook editorial team are described. Results: Artificial intelligence (AI) in Medicine arose in the 1970s from new approaches for representing expert knowledge with computers. Since then, AI in medicine has gradually evolved toward essentially data-driven approaches with great results in image analysis. However, data integration, storage, and management still present clear challenges, among them the lack of explainability of the results produced by data-driven AI methods. Conclusion: With more health data availability, and the recent developments of efficient and improved machine learning algorithms, there is a renewed interest in AI in medicine. The objective is to help health professionals improve patient care while also reducing costs. However, the other costs of AI, including ethical issues when processing personal health data by algorithms, should be included. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2019
- Full Text
- View/download PDF
20. Cimind: A phonetic-based tool for multilingual named entity recognition in biomedical texts.
- Author
-
Cabot C, Darmoni S, and Soualmia LF
- Subjects
- Algorithms, Humans, Natural Language Processing, Phonetics, Vocabulary, Controlled
- Abstract
Background: Extracting concepts from biomedical texts is key to supporting many advanced applications such as biomedical information retrieval. However, in clinical notes Named Entity Recognition (NER) has to deal with various types of errors such as spelling errors, grammatical errors, truncated sentences, and non-standard abbreviations. Moreover, in numerous countries, NER is challenged by the availability of many resources originally developed and only suitable for English texts. This paper presents the Cimind system, a multilingual system dedicated to named entity recognition in medical texts based on a phonetic similarity measure. Methods: Cimind performs entity recognition by combining phonetic recognition using the DM phonetic algorithm to deal with spelling errors and string similarity measures. Three main steps are processed to identify terms in a controlled vocabulary: normalization, candidate selection by phonetic similarity and candidate ranking. Results: Cimind was evaluated in the 2016 and 2017 editions of the CLEF eHealth challenge in the CépiDC/CDC tasks. In 2017, it obtained on each corpus the following results: English dataset: 83.9% P, 78.3% R, 81.0% F1; French raw dataset: 85.7% P, 68.9% R, 76.4% F1; French aligned dataset: 83.5% P, 77.5% R, 80.4% F1. It ranked first in French and fourth in English in official runs. (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
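The precision (P), recall (R), and F1 figures quoted in the abstract of result 20 are related by the standard harmonic-mean formula, F1 = 2PR / (P + R). The short check below recomputes F1 from the reported P and R values; the function name `f1_score` and the dictionary layout are illustrative choices, and only the numbers come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, as reported in the abstract for 2017
reported = {
    "English": (83.9, 78.3),
    "French raw": (85.7, 68.9),
    "French aligned": (83.5, 77.5),
}

f1_by_corpus = {name: round(f1_score(p, r), 1) for name, (p, r) in reported.items()}
print(f1_by_corpus)
# → {'English': 81.0, 'French raw': 76.4, 'French aligned': 80.4}
```

The recomputed values agree with the F1 scores stated in the abstract to one decimal place.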
21. A 21st Century Embarrassment of Riches: The Balance Between Health Data Access, Usage, and Sharing.
- Author
-
Holmes JH, Soualmia LF, and Séroussi B
- Subjects
- Confidentiality, Information Dissemination, Medical Informatics
- Abstract
Objectives: To provide an introduction to the 2018 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: This editorial provides an overview and introduction to the 2018 IMIA Yearbook whose special topic is "Between access and privacy: Challenges in sharing health data". The special topic editors and section are discussed, and the new section of the 2018 Yearbook, Cancer Informatics, is introduced. Changes in the Yearbook editorial team are also described. Results: With the exponential burgeoning of health-related data, and attendant demands for sharing and using these data, the special topic for 2018 is noteworthy for its timeliness. Data sharing brings responsibility for preservation of data privacy, and for this, patient perspectives are of paramount importance in understanding how patients view their health data and how their privacy should be protected. Conclusion: With the increase in availability of health-related data from many different sources and contexts, there is an urgent need for informaticians to become aware of their role in maintaining the balance between data sharing and privacy. Competing Interests: Disclosure: The authors report no conflicts of interest in this work. (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2018
- Full Text
- View/download PDF
22. Transforming Data into Knowledge: How to Improve the Efficiency of Clinical Care?
- Author
-
Séroussi B, Soualmia LF, and Holmes JH
- Subjects
- Internationality, Societies, Medical, Data Mining, Information Dissemination, Medical Informatics
- Abstract
Objectives: To provide an introduction to the 2017 IMIA Yearbook of Medical Informatics by the editors. Methods: We present a brief overview of the 2017 special topic "Learning from experience: Secondary use of patient data". We review our choice of special topic section editors, present the new section "Health Information Management", and discuss transitions in the editorial team. Results: In this edition of the Yearbook, we focused on one of the most important issues for the medical informatics community: The secondary use of clinical data. With the ubiquitous adoption of electronic health records (EHRs) and the increasing availability of genomic and environmental data, as well as the accessibility of unstructured data in social media, issues related to data integration, storage, and management, as well as the need for novel analytic approaches are clear challenges. The paradigm of Learning Health Systems (LHSs) is presented in the keynote paper and survey papers review the significant developments in allied fields such as clinical research, clinical systems, translational informatics, and public health over the past two years. IMIA Working Groups also contributed to this topic. Conclusion: The 2017 issue of the IMIA yearbook focuses on the secondary use of patient data and presents the difficulties that still need to be solved before witnessing the actual development of LHSs., Competing Interests: Disclosure The authors report no conflicts of interest in this work., (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2017
- Full Text
- View/download PDF
23. On Contributing to the Progress of Medical Informatics as Publisher.
- Author
-
Haux R, Geissbuhler A, Holmes J, Jaulent MC, Koch S, Kulikowski CA, Lehmann CU, McCray AT, Séroussi B, Soualmia LF, and van Bemmel JH
- Subjects
- History, 20th Century, History, 21st Century, Medical Informatics history, Publishing history
- Abstract
May 1st, 2017, will mark Dieter Bergemann's 80th birthday. As Chief Executive Officer and Owner of Schattauer Publishers from 1983 to 2016, the biomedical and health informatics community owes him a great debt of gratitude. The past and present editors of Methods of Information in Medicine, the IMIA Yearbook of Medical Informatics, and Applied Clinical Informatics want to honour and thank Dieter Bergemann by providing a brief biography that emphasizes his contributions, by reviewing his critical role as an exceptionally supportive publisher for Schattauer's three biomedical and health informatics periodicals, and by sharing some personal anecdotes. Over the past 40 years, Dieter Bergemann has been an influential, if behind-the-scenes, driving force in biomedical and health informatics publications, helping to ensure success in the dissemination of our field's research and practice., Competing Interests: Disclosure The authors report no conflicts of interest in this work., (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2017
- Full Text
- View/download PDF
24. Evaluation of the Terminology Coverage in the French Corpus LiSSa.
- Author
-
Cabot C, Soualmia LF, Grosjean J, Griffon N, and Darmoni SJ
- Subjects
- Information Storage and Retrieval methods, Language, Systematized Nomenclature of Medicine, Translating, Databases, Bibliographic, Natural Language Processing, Vocabulary, Controlled
- Abstract
Extracting concepts from medical texts is key to supporting many advanced applications in medical information retrieval. Entity recognition in French texts is moreover hampered because many available resources were originally developed for English texts. This paper proposes an evaluation of the terminology coverage in a corpus of 50,000 French articles extracted from the bibliographic database LiSSa. This corpus was automatically indexed with 32 health terminologies, published in French or translated. Then, the terminologies providing the best coverage of these documents were determined. The results show that major resources such as the NCI and SNOMED CT thesauri achieve the largest annotation of the corpus while specific French resources prove to be valuable assets.
- Published
- 2017
25. Efficient Results in Semantic Interoperability for Health Care. Findings from the Section on Knowledge Representation and Management.
- Author
-
Soualmia LF and Charlet J
- Subjects
- Biological Ontologies, Chronic Disease, Humans, Knowledge Management, Rare Diseases classification, Information Storage and Retrieval, Vocabulary, Controlled
- Abstract
Objectives: To summarize excellent current research in the field of Knowledge Representation and Management (KRM) within the health and medical care domain., Method: We provide a synopsis of the 2016 IMIA selected articles as well as a related synthetic overview of the current and future field activities. A first step of the selection was performed through MEDLINE querying with a list of MeSH descriptors completed by a list of terms adapted to the KRM section. The second step of the selection was completed by the two section editors who separately evaluated the set of 1,432 articles. The third step of the selection consisted of a collective work that merged the evaluation results to retain 15 articles for peer-review., Results: The selection and evaluation process of this Yearbook's section on Knowledge Representation and Management has yielded four excellent and interesting articles regarding semantic interoperability for health care by gathering heterogeneous sources (knowledge and data) and auditing ontologies. In the first article, the authors present a solution based on standards and Semantic Web technologies to access distributed and heterogeneous datasets in the domain of breast cancer clinical trials. The second article describes a knowledge-based recommendation system that relies on ontologies and Semantic Web rules in the context of diet in chronic diseases. The third article is related to concept recognition and text mining to derive a model of common human diseases and a phenotypic network of common diseases. In the fourth article, the authors highlight the need for auditing SNOMED CT. They propose to use a crowd-based method for ontology engineering., Conclusions: The current research activities further illustrate the continuous convergence of Knowledge Representation and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care by proposing solutions to cope with the problem of semantic interoperability. Indeed, there is a need for powerful tools able to manage and interpret complex, large-scale and distributed datasets and knowledge bases, but also a need for user-friendly tools developed for the clinicians in their daily practice.
- Published
- 2016
- Full Text
- View/download PDF
26. Retrieving Clinical and Omic Data from Electronic Health Records.
- Author
-
Cabot C, Lelong R, Grosjean J, Soualmia LF, and Darmoni SJ
- Subjects
- Systems Integration, User-Computer Interface, Biological Ontologies organization & administration, Data Mining methods, Databases, Genetic, Electronic Health Records organization & administration, Medical Record Linkage methods, Natural Language Processing
- Published
- 2016
27. Bioinformatics Methods and Tools to Advance Clinical Care. Findings from the Yearbook 2015 Section on Bioinformatics and Translational Informatics.
- Author
-
Soualmia LF and Lecroq T
- Subjects
- Humans, Pharmacogenetics, Proteome, Computational Biology, Genomics, Precision Medicine
- Abstract
Objectives: To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and clinical care., Method: We provide a synopsis of the articles selected for the IMIA Yearbook 2015, from which we attempt to derive a synthetic overview of current and future activities in the field. As last year, a first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor separately evaluated the set of 1,594 articles and the evaluation results were merged to retain 15 articles for peer-review., Results: The selection and evaluation process of this Yearbook's section on Bioinformatics and Translational Informatics yielded four excellent articles regarding data management and genome medicine that are mainly tool-based papers. In the first article, the authors present PPISURV, a tool for uncovering the role of specific genes in cancer survival outcome. The second article describes the classifier PredictSNP, which combines six performing tools for predicting disease-related mutations. In the third article, by presenting a high-coverage map of the human proteome using high resolution mass spectrometry, the authors highlight the need for using mass spectrometry to complement genome annotation. The fourth article is also related to patient survival and decision support. The authors present data mining methods for large-scale datasets of past transplants. The objective is to identify chances of survival., Conclusions: The current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care. Indeed, there is a need for powerful tools for managing and interpreting complex, large-scale genomic and biological datasets, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts contribute to the challenge of giving the results obtained a clinical impact, towards personalized medicine.
- Published
- 2015
- Full Text
- View/download PDF
28. A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.
- Author
-
Griffon N, Schuers M, Soualmia LF, Grosjean J, Kerdelhué G, Kergourlay I, Dahamna B, and Darmoni SJ
- Subjects
- France, Humans, Medical Subject Headings, Information Storage and Retrieval methods, Language, PubMed statistics & numerical data, Search Engine statistics & numerical data
- Abstract
Background: PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing., Objective: The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French)., Methods: To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy., Results: More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French., Conclusions: It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.
- Published
- 2014
- Full Text
- View/download PDF
29. Managing large-scale genomic datasets and translation into clinical practice.
- Author
-
Lecroq T and Soualmia LF
- Subjects
- Electronic Health Records, Genomics, Humans, Male, Translational Research, Biomedical, Computational Biology, Medical Informatics
- Abstract
Objective: To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain., Method: We provide a synopsis of the articles selected for the IMIA Yearbook 2014, from which we attempt to derive a synthetic overview of current and future activities in the field. A first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor independently evaluated the set of 1,851 articles and 15 articles were retained for peer-review., Results: The selection and evaluation process of this Yearbook's section on Bioinformatics and Translational Informatics yielded three excellent articles regarding data management and genome medicine. In the first article, the authors present VEST (Variant Effect Scoring Tool), a supervised machine learning tool for prioritizing variants found in exome sequencing projects that are more likely involved in human Mendelian diseases. In the second article, the authors show how to infer surnames of male individuals by crossing anonymous publicly available genomic data from the Y chromosome and public genealogy data banks. The third article presents a statistical framework called iCluster+ that can perform pattern discovery in integrated cancer genomic data. This framework was able to determine different tumor subtypes in colon cancer., Conclusions: The current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on large-scale biological, genomic, and Electronic Health Records data. Indeed, there is a need for powerful tools for managing and interpreting complex data, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts are contributing to the challenge of giving the results a clinical impact and even moving towards personalized medicine in the near future.
- Published
- 2014
- Full Text
- View/download PDF
30. Evaluating alignment quality between iconic language and reference terminologies using similarity metrics.
- Author
-
Griffon N, Kerdelhué G, Soualmia LF, Merabti T, Grosjean J, Lamy JB, Venot A, Duclos C, and Darmoni SJ
- Subjects
- Electronic Health Records standards, Humans, International Classification of Diseases statistics & numerical data, Medical Subject Headings statistics & numerical data, Unified Medical Language System standards, Information Storage and Retrieval standards, Terminology as Topic, Vocabulary, Controlled
- Abstract
Background: Visualization of Concepts in Medicine (VCM) is a compositional iconic language that aims to ease information retrieval in Electronic Health Records (EHR), clinical guidelines or other medical documents. Using VCM language in medical applications requires alignment with medical reference terminologies. Alignments from the Medical Subject Headings (MeSH) thesaurus and the International Classification of Diseases - tenth revision (ICD10) to VCM are presented here. The aim of this study was to evaluate alignment quality between VCM and other terminologies using different measures of inter-alignment agreement before integration in EHR., Methods: For medical literature retrieval purposes and EHR browsing, the MeSH thesaurus and the ICD10, both organized hierarchically, were aligned to VCM language. Some MeSH to VCM alignments were performed automatically but others were performed manually and validated. ICD10 to VCM alignment was entirely manually performed. Inter-alignment agreement was assessed on ICD10 codes and MeSH descriptors, sharing the same Concept Unique Identifiers in the Unified Medical Language System (UMLS). Three metrics were used to compare two VCM icons: binary comparison, crude Dice Similarity Coefficient (DSCcrude), and semantic Dice Similarity Coefficient (DSCsemantic), based on Lin similarity. An analysis of discrepancies was performed., Results: MeSH to VCM alignment resulted in 10,783 relations: 1,830 of which were manually performed and 8,953 were automatically inherited. ICD10 to VCM alignment led to 19,852 relations. UMLS gathered 1,887 alignments between ICD10 and MeSH. Only 1,606 of them were used for this study. Inter-alignment agreement using only validated MeSH to VCM alignment was 74.2% [70.5-78.0]CI95%, DSCcrude was 0.93 [0.91-0.94]CI95%, and DSCsemantic was 0.96 [0.95-0.96]CI95%. Discrepancy analysis revealed that even if two thirds of errors came from the reviewers, UMLS was nevertheless responsible for one third., Conclusions: This study has shown strong overall inter-alignment agreement between MeSH to VCM and ICD10 to VCM manual alignments. VCM icons have now been integrated into a guideline search engine (http://www.cismef.org) and a health terminologies portal (http://www.hetop.eu).
- Published
- 2014
- Full Text
- View/download PDF
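Two of the three metrics named in the abstract above, binary comparison and the crude Dice Similarity Coefficient, can be sketched in Python as follows. The sketch uses the general set-based Dice definition, 2|A ∩ B| / (|A| + |B|), and assumes that an icon can be modelled as a set of component labels; that representation and the example labels are hypothetical and are not taken from the VCM specification. The semantic variant based on Lin similarity is not covered, since it requires a concept hierarchy.

```python
def binary_match(icon_a: set[str], icon_b: set[str]) -> bool:
    """Binary comparison: two icons agree only if all components are identical."""
    return icon_a == icon_b


def dice_coefficient(icon_a: set[str], icon_b: set[str]) -> float:
    """Crude Dice coefficient: 2 * |A & B| / (|A| + |B|).

    Two empty icons are treated as identical (a convention chosen here).
    """
    if not icon_a and not icon_b:
        return 1.0
    return 2 * len(icon_a & icon_b) / (len(icon_a) + len(icon_b))


# Hypothetical icons that share two of their three components.
icon_from_mesh = {"red", "heart", "tumor"}
icon_from_icd10 = {"red", "heart", "infection"}

print(binary_match(icon_from_mesh, icon_from_icd10))
print(round(dice_coefficient(icon_from_mesh, icon_from_icd10), 3))
```

Binary comparison counts any difference as full disagreement, whereas the Dice coefficient gives partial credit for shared components. That is consistent with the abstract reporting a lower binary agreement (74.2%) than DSCcrude (0.93).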
31. An approach to compare bio-ontologies portals.
- Author
-
Grosjean J, Soualmia LF, Bouarech K, Jonquet C, and Darmoni SJ
- Subjects
- Pattern Recognition, Automated methods, Algorithms, Biological Ontologies, Data Curation methods, Documentation methods, Information Storage and Retrieval methods, Natural Language Processing, Semantics
- Abstract
Background: main biomedical information retrieval systems are based on controlled vocabularies and most specifically on terminologies or ontologies (T/O). These classification structures allow indexing, coding, and annotating different kinds of documents. Many T/O have been created for different purposes, and it has become difficult to find specific concepts in the multitude of existing nomenclatures. The NCBO (National Center for Biomedical Ontologies) BioPortal and the CISMeF (Catalogue et Index des Sites Médicaux de langue Française) HeTOP projects have been developed to tackle this issue., Objective: the present work consists in comparing both portals., Methods: we propose a set of criteria to compare bio-ontologies portals in terms of goals, features, technologies and usability., Results: BioPortal and HeTOP have been compared based on the given criteria. While both portals are designed to store and make T/O available to the community and share many basic features, they differ on several points mainly because of their basic purposes., Conclusion: thanks to the comparison criteria, we can assume that a merge between BioPortal and HeTOP is possible in terms of functionalities. The main difficulties will be about merging the data repositories and applying different policies on T/O content.
- Published
- 2014
32. Improving information retrieval with multiple health terminologies in a quality-controlled gateway.
- Author
-
Soualmia LF, Sakji S, Letord C, Rollin L, Massari P, and Darmoni SJ
- Abstract
Background: The Catalog and Index of French-language Health Internet resources (CISMeF) is a quality-controlled health gateway, primarily for Web resources in French (n=89,751). Recently, we achieved a major improvement in the structure of the catalogue by setting-up multiple terminologies, based on twelve health terminologies available in French, to overcome the potential weakness of the MeSH thesaurus, which is the main and pivotal terminology we use for indexing and retrieval since 1995. The main aim of this study was to estimate the added-value of exploiting several terminologies and their semantic relationships to improve Web resource indexing and retrieval in CISMeF, in order to provide additional health resources which meet the users' expectations., Methods: Twelve terminologies were integrated into the CISMeF information system to set up multiple-terminologies indexing and retrieval. The same sets of thirty queries were run: (i) by exploiting the hierarchical structure of the MeSH, and (ii) by exploiting the additional twelve terminologies and their semantic links. The two search modes were evaluated and compared., Results: The overall coverage of the multiple-terminologies search mode was improved by comparison to the coverage of using the MeSH (16,283 vs. 14,159) (+15%). These additional findings were estimated at 56.6% relevant results, 24.7% intermediate results and 18.7% irrelevant., Conclusion: The multiple-terminologies approach improved information retrieval. These results suggest that integrating additional health terminologies was able to improve recall. Since performing the study, 21 other terminologies have been added which should enable us to make broader studies in multiple-terminologies information retrieval.
- Published
- 2013
- Full Text
- View/download PDF
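The coverage figures in the abstract above can be checked with a short calculation. The sketch below only recomputes the reported relative gain from the two totals given (16,283 vs. 14,159). The function name is an assumption introduced for this example.

```python
def relative_gain(baseline: int, improved: int) -> float:
    """Relative increase of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


mesh_only_results = 14_159            # results using the MeSH hierarchy alone
multi_terminology_results = 16_283    # results using the additional terminologies

additional_results = multi_terminology_results - mesh_only_results
gain = relative_gain(mesh_only_results, multi_terminology_results)

print(additional_results)
print(f"{gain:.1%}")
```

The difference of 2,124 additional results corresponds to a gain of about 15% over the MeSH-only search mode, which matches the figure reported in the abstract.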
33. Validating the semantics of a medical iconic language using ontological reasoning.
- Author
-
Lamy JB, Soualmia LF, Kerdelhué G, Venot A, and Duclos C
- Subjects
- Semantics, Vocabulary, Controlled
- Abstract
To help clinicians read medical texts such as clinical practice guidelines or drug monographs, we proposed an iconic language called VCM. This language can use icons to represent the main medical concepts, including diseases, symptoms, treatments and follow-up procedures, by combining various pictograms, shapes and colors. However, the semantics of this language have not been formalized, and users may create inconsistent icons, e.g. by combining the "tumor" shape and the "sleeping" pictograms into a "tumor of sleeping" icon. This work aims to represent the VCM language using DLs and OWL for evaluating its semantics by reasoners, and in particular for determining inconsistent icons. We designed an ontology to formalize the semantics of VCM icons using the Protégé editor and scripts for translating the VCM lexicon into OWL. We evaluated the ability of the ontology to determine icon consistency for a set of 100 random icons. The evaluation showed good results for determining icon consistency, with a high sensitivity. The ontology may also be useful for the design of mapping between VCM and other medical terminologies, for generating textual labels for icons, and for developing user interfaces for creating VCM icons., (Copyright © 2012 Elsevier Inc. All rights reserved.)
- Published
- 2013
- Full Text
- View/download PDF
34. From genome sequencing to bedside. Findings from the section on bioinformatics and translational informatics.
- Author
-
Lecroq T and Soualmia LF
- Subjects
- Genome, Human, Genomics, Humans, Publishing, Computational Biology, Medical Informatics
- Abstract
Objectives: To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and evidence-based medicine., Method: We provide a synopsis of the articles selected for the IMIA Yearbook 2013, from which we attempt to derive a synthetic overview of current and future activities in the field. Three steps of selection were performed by querying PubMed and Web of Science. A first set of 5,549 articles was refined into a second set of 1,272 articles from which 15 articles were retained for peer-review., Results: The selection and evaluation process of this Yearbook's section on Bioinformatics and Translational Informatics yielded four excellent articles regarding the Human Genome and Medicine. Exploiting genomic data depends on having the appropriate reference annotation available. In the first article, the goal of the GENCODE Consortium is to produce and publish the GENCODE human reference gene set. As a result, it is composed of merged manual and automatic annotations, which are frequently updated from public experimental databases. The quality of genome sequencing is platform-dependent. In the second article, a generic database independent from the sequencing technologies, Huvariome, can help to identify errors and inconsistencies in sequencing. To understand complex diseases of patients it will be of great importance to detect rare gene variants. This is the aim of the third study. Finally, in the last article, plasma DNA from healthy individuals and from patients suffering from cancer is compared., Conclusions: The current research activities attest to the continuous convergence of Bioinformatics and Medical Informatics for clinical practice. For instance, a direct use of high throughput sequencing technologies for patients could aid the diagnosis of complex diseases (such as cancer) without invasive surgery (such as biopsy) but only with blood analysis. However, ongoing genomic tests will generate massive amounts of data and will imply new trends in the near future: "Big Data" and smart health management.
- Published
- 2013
35. Multi-lingual search engine to access PubMed monolingual subsets: a feasibility study.
- Author
-
Darmoni SJ, Soualmia LF, Griffon N, Grosjean J, Kerdelhué G, Kergourlay I, and Dahamna B
- Subjects
- Database Management Systems, Feasibility Studies, Information Storage and Retrieval methods, Natural Language Processing, Software, Data Mining methods, Multilingualism, PubMed classification, Search Engine methods, Translating, User-Computer Interface, Vocabulary, Controlled
- Abstract
PubMed contains many articles in languages other than English but it is difficult to find them using the English version of the Medical Subject Headings (MeSH) Thesaurus. The aim of this work is to propose a tool allowing access to a PubMed subset in one language, and to evaluate its performance. Translations of MeSH were enriched and gathered in the information system. PubMed subsets in main European languages were also added to our database, using a dedicated parser. The CISMeF generic semantic search engine was evaluated on the response time for simple queries. MeSH descriptors are currently available in 11 languages in the information system. All the 654,000 PubMed citations in French were integrated into the CISMeF database. None of the response times exceeded the threshold defined for usability (2 seconds). It is now possible to freely access biomedical literature in French using a tool in French; health professionals and lay people with limited English proficiency may find it useful. It will be expanded to several European languages: German, Spanish, Norwegian and Portuguese.
- Published
- 2013
36. Integrating the human phenotype ontology into HeTOP terminology-ontology server.
- Author
-
Grosjean J, Merabti T, Soualmia LF, Letord C, Charlet J, Robinson PN, and Darmoni SJ
- Subjects
- Humans, Phenotype, Systems Integration, Biological Ontologies, Databases, Genetic, Genetic Predisposition to Disease genetics, Medical Record Linkage methods, Natural Language Processing, Terminology as Topic, Vocabulary, Controlled
- Abstract
The Human Phenotype Ontology (HPO) is a controlled vocabulary which provides phenotype data related to genes or diseases. The Health Terminology/Ontology Portal (HeTOP) is a tool dedicated to both human beings and computers to access and browse biomedical terminologies or ontologies (T/O). The objective of this work was to integrate the HPO into HeTOP in order to enhance both works. This integration is a success and allows users to search and browse the HPO with a dedicated interface. Furthermore, the HPO has been enhanced with the addition of content such as new synonyms, translations, mappings. Integrating T/O such as the HPO into HeTOP is a benefit to vocabularies because it allows enrichment of them and it is also a benefit for HeTOP which provides a better service to both humans and machines.
- Published
- 2013
37. Translating MeSH concepts.
- Author
-
Soualmia LF, Letord C, Merabti T, Griffon N, Manel J, and Darmoni SJ
- Subjects
- France, United States, Information Storage and Retrieval methods, Medical Record Linkage methods, Medical Subject Headings, Natural Language Processing, Terminology as Topic, Translating
- Abstract
The concept-oriented structure of the MeSH® thesaurus is not yet in common use. Nevertheless, it has been shown that a concept-based querying of PubMed may be of interest. To take full advantage of the concept-oriented structure of MeSH in the information retrieval tool associated with the CISMeF catalogue, it was necessary to translate such concepts into French.
- Published
- 2013
38. Assisting the translation of SNOMED CT into French.
- Author
-
Merabti T, Soualmia LF, Grosjean J, Letord C, and Darmoni SJ
- Subjects
- Algorithms, France, Medical Record Linkage methods, Natural Language Processing, Semantics, Symbolism, Systematized Nomenclature of Medicine, Translating
- Abstract
The objective of this study is to evaluate two approaches assisting the translation of SNOMED CT into French. Two types of approaches were combined: a concept-based one, which relies on conceptual information of the UMLS Metathesaurus, and a lexical-based one, which relies on NLP techniques. In addition to the French terminologies (whether included in UMLS or not). Using the concept-based approach, a set of 156,157 (39.4%) SNOMED CT terms were translated to at least one French term from UMLS. Expanded to the French terms from UMLS terminologies translated by CISMeF, 2,548 (+0.7%) additional SNOMED CT terms were translated to at least one French term. Using the lexical-based approach, a set of 145,737 (36.8%) SNOMED CT terms were translated to at least one French term from HeTOP. The qualitative evaluation showed that 44% of the translations were rated as "relevant". Overall, the two approaches have provided the translation of 168,750 (42.6%) SNOMED CT terms into French using different bilingual terminological sources included in UMLS or in HeTOP.
- Published
- 2013
39. Improving information retrieval using Medical Subject Headings Concepts: a test case on rare and chronic diseases.
- Author
-
Darmoni SJ, Soualmia LF, Letord C, Jaulent MC, Griffon N, Thirion B, and Névéol A
- Subjects
- Algorithms, Chronic Disease, Electronic Data Processing, France, Humans, Information Storage and Retrieval, Language, MEDLINE statistics & numerical data, Quality Control, Rare Diseases, Abstracting and Indexing statistics & numerical data, Databases as Topic statistics & numerical data, Medical Subject Headings statistics & numerical data, Terminology as Topic
- Abstract
Background: As more scientific work is published, it is important to improve access to the biomedical literature. Since 2000, when Medical Subject Headings (MeSH) Concepts were introduced, the MeSH Thesaurus has been concept based. Nevertheless, information retrieval is still performed at the MeSH Descriptor or Supplementary Concept level., Objective: The study assesses the benefit of using MeSH Concepts for indexing and information retrieval., Methods: Three sets of queries were built for thirty-two rare diseases and twenty-two chronic diseases: (1) using PubMed Automatic Term Mapping (ATM), (2) using Catalog and Index of French-language Health Internet (CISMeF) ATM, and (3) extrapolating the MEDLINE citations that should be indexed with a MeSH Concept., Results: Type 3 queries retrieve significantly fewer results than type 1 or type 2 queries (about 18,000 citations versus 200,000 for rare diseases; about 300,000 citations versus 2,000,000 for chronic diseases). CISMeF ATM also provides better precision than PubMed ATM for both disease categories., Discussion: Using MeSH Concept indexing instead of ATM could theoretically improve retrieval performance with the current indexing policy. However, using MeSH Concept information retrieval and indexing rules would be a fundamentally better approach. These modifications have already been implemented in the CISMeF search engine.
- Published
- 2012
- Full Text
- View/download PDF
40. Matching health information seekers' queries to medical terms.
- Author
-
Soualmia LF, Prieur-Gaston E, Moalla Z, Lecroq T, and Darmoni SJ
- Subjects
- Humans, Internet, Language, Medical Informatics instrumentation, Vocabulary, Controlled, Algorithms, Information Storage and Retrieval, Medical Informatics methods
- Abstract
Background: The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods: In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction with the thresholds retained in the first run. Results: According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is carried out before spelling correction. Conclusions: Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records.
- Published
- 2012
- Full Text
- View/download PDF
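Note on record 40: the abstract names the normalized Levenshtein edit distance and a comparator threshold but does not spell out the computation. The following is a minimal sketch, not the authors' implementation. It assumes the distance is normalized by the length of the longer string and that a term is suggested when its distance is at or below the threshold; the `suggest` helper and the toy vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: distance divided by the longer length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def suggest(query: str, vocabulary: list[str], threshold: float = 0.3) -> list[str]:
    """Hypothetical helper: vocabulary terms within the threshold, closest first."""
    scored = [(normalized_levenshtein(query, term), term) for term in vocabulary]
    return [term for distance, term in sorted(scored) if distance <= threshold]


if __name__ == "__main__":
    # Toy vocabulary, invented for illustration only.
    print(suggest("diabete", ["asthma", "diabetic", "diabetes"]))
```

The default of 0.3 mirrors the threshold reported in the abstract, but whether that figure applies to a distance or to a similarity score is not stated there, so the comparison direction is an assumption.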
41. Translating the Foundational Model of Anatomy into French using knowledge-based and lexical methods.
- Author
-
Merabti T, Soualmia LF, Grosjean J, Palombi O, Müller JM, and Darmoni SJ
- Subjects
- France, Humans, Linguistics, Natural Language Processing, Subject Headings, Unified Medical Language System, Knowledge Bases, Models, Anatomic, Terminology as Topic, Translating, Vocabulary, Controlled
- Abstract
Background: The Foundational Model of Anatomy (FMA) is the reference ontology regarding human anatomy. FMA vocabulary was integrated into the Health Multi Terminological Portal (HMTP) developed by CISMeF based on the CISMeF Information System which also includes 26 other terminologies and controlled vocabularies, mainly in French. However, FMA is primarily in English. In this context, the translation of FMA English terms into French could also be useful for searching and indexing French anatomy resources. Various studies have investigated automatic methods to assist the translation of medical terminologies or create multilingual medical vocabularies. The goal of this study was to facilitate the translation of FMA vocabulary into French. Methods: We compare two types of approaches to translate the FMA terms into French. The first is UMLS-based, relying on the conceptual information of the UMLS Metathesaurus. The second is lexically based, relying on several Natural Language Processing (NLP) tools. Results: The UMLS-based approach produced a translation of 3,661 FMA terms into French whereas the lexical approach produced a translation of 3,129 FMA terms into French. A qualitative evaluation was made on 100 FMA terms translated by each method. For the UMLS-based approach, among the 100 translations, 52% were manually rated as "very good" and only 7% as "bad". For the lexical approach, among the 100 translations, 47% were rated as "very good" and 20% as "bad". Conclusions: Overall, the two methods yielded a low rate of translations. The two approaches permitted us to semi-automatically translate 3,776 FMA terms from English into French; these were added to the existing 10,844 French FMA terms in the HMTP (4,436 FMA French terms and 6,408 FMA terms manually translated).
- Published
- 2011
- Full Text
- View/download PDF
42. Health multi-terminology portal: a semantic added-value for patient safety.
- Author
-
Grosjean J, Merabti T, Dahamna B, Kergourlay I, Thirion B, Soualmia LF, and Darmoni SJ
- Subjects
- Hospital Administration, Humans, Internet, Documentation methods, Information Storage and Retrieval methods, Safety Management organization & administration, Semantics, Terminology as Topic
- Abstract
Since the mid-90s, several quality-controlled health gateways were developed. In France, CISMeF is the leading health gateway. It indexes Internet resources from the main institutions, using the MeSH thesaurus and the Dublin Core metadata element set. Since 2005, the CISMeF Information System (IS) includes 24 health terminologies, classifications and thesauri for indexing and information retrieval. This work aims at creating a Health Multi-Terminology Portal (HMTP) and connecting it to the CISMeF Terminology Database, mainly for searching concepts and terms among all the health controlled vocabularies available in French (or in English and translated into French) and browsing it dynamically. To integrate the terminologies in the CISMeF IS, three steps are necessary: (1) designing a meta-model into which each terminology can be integrated, (2) developing a process to include terminologies into the HMTP, (3) building and integrating existing and new inter-terminology mappings into the HMTP. A total of 24 terminologies are included in the HMTP, with 575,300 concepts, 852,000 synonyms, 222,800 definitions and 1,180,000 relations. Eighteen of these terminologies are not yet included in the UMLS, among them some from the World Health Organization. Since January 2010, HMTP has been used daily by CISMeF librarians to index in multi-terminology mode. A health multi-terminology portal is a valuable tool helping the indexing and the retrieval of resources from a quality-controlled patient safety gateway. It can also be very useful for teaching or performing audits in terminology management.
- Published
- 2011
43. Evaluation of multi-terminology super-concepts for information retrieval.
- Author
-
Griffon N, Soualmia LF, Névéol A, Massari P, Thirion B, Dahamna B, and Darmoni SJ
- Subjects
- Algorithms, Catalogs as Topic, Electronic Data Processing, Humans, Internet, Medical Subject Headings, Reproducibility of Results, Software, Statistics as Topic, Terminology as Topic, Abstracting and Indexing, Information Storage and Retrieval methods, Medical Informatics methods
- Abstract
Background: Following a recent change in the indexing policy for French quality controlled health gateway CISMeF, multiple terminologies are now being used for indexing in addition to MeSH®. Objective: To evaluate precision and recall of super-concepts for information retrieval in a multi-terminology paradigm compared to MeSH-only. Methods: We evaluate the relevance of resources retrieved by multi-terminology super-concepts and MeSH-only super-concepts queries. Results: Recall was 8-14% higher for multi-terminology super-concepts compared to MeSH only super-concepts. Precision decreased from 0.66 for MeSH only super-concepts to 0.61 for multi-terminology super-concepts. Retrieval performance was found to vary significantly depending on the super-concepts (p<10^-4) and indexing methods (manual vs automatic; p<0.004). Conclusion: A multi-terminology paradigm contributes to increase recall but lowers precision. Automated tools for indexing are not accurate enough to allow a very precise information retrieval.
- Published
- 2011
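Note on record 43: the abstract reports precision and recall without defining them. As a reminder of the standard set-based definitions, here is a small sketch; the document identifiers and the two retrieval runs are invented toy data, not figures from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set overlap.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: ten relevant documents, two hypothetical retrieval runs.
    relevant = set(range(1, 11))
    narrow_run = {1, 2, 3, 4, 11, 12}                 # fewer results
    broad_run = {1, 2, 3, 4, 5, 6, 11, 12, 13, 14}    # more results
    print(precision_recall(narrow_run, relevant))
    print(precision_recall(broad_run, relevant))
```

In this toy case the broader run finds more of the relevant documents while a smaller share of what it returns is relevant, which is the same kind of trade-off the abstract describes.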
44. Mining knowledge from corpora: an application to retrieval and indexing.
- Author
-
Soualmia LF, Dahamna B, and Darmoni S
- Subjects
- Expert Systems, France, Humans, Libraries, Medical, Medical Subject Headings, Natural Language Processing, Unified Medical Language System, Vocabulary, Controlled, Abstracting and Indexing, Consumer Health Information, Information Storage and Retrieval, Internet
- Abstract
The present work aims at discovering new associations between medical concepts to be exploited as input in retrieval and indexing. Material and Methods: Association rules method is applied to documents. The process is carried out on three major document categories referring to e-health information consumers: health professionals, students and lay people. Association rules evaluation is founded on statistical measures combined with domain knowledge. Results: Association rules represent existing relations between medical concepts (60.62%) and new knowledge (54.21%). Based on observations, 463 expert rules are defined by medical librarians for retrieval and indexing. Conclusions: Association rules bear out existing relations, produce new knowledge and support users and indexers in document retrieval and indexing.
- Published
- 2008
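Note on record 44: the abstract says association rules are evaluated with statistical measures but does not name them. Support and confidence are the two most common such measures, so the sketch below uses them as an assumption; the indexed documents are invented toy data.

```python
def support(itemset, transactions):
    """Fraction of transactions (documents) that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


if __name__ == "__main__":
    # Toy corpus: each document is represented by its set of index terms.
    documents = [
        {"asthma", "child", "allergy"},
        {"asthma", "allergy"},
        {"asthma", "child"},
        {"diabetes", "child"},
    ]
    print(support({"asthma", "allergy"}, documents))
    print(confidence({"asthma"}, {"allergy"}, documents))
```

A rule such as asthma -> allergy would then be kept or discarded by comparing these values with minimum thresholds, which the abstract does not specify.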
45. A MEDLINE categorization algorithm.
- Author
-
Darmoni SJ, Névéol A, Renard JM, Gehanno JF, Soualmia LF, Dahamna B, and Thirion B
- Subjects
- Bibliometrics, France, Internet, Libraries, Medical, Periodicals as Topic, Semantics, Abstracting and Indexing, Algorithms, Information Storage and Retrieval methods, MEDLINE, Medical Subject Headings, Medicine classification, Specialization, Terminology as Topic
- Abstract
Background: Categorization is designed to enhance resource description by organizing content description so as to enable the reader to grasp quickly and easily what are the main topics discussed in it. The objective of this work is to propose a categorization algorithm to classify a set of scientific articles indexed with the MeSH thesaurus, and in particular those of the MEDLINE bibliographic database. In a large bibliographic database such as MEDLINE, finding materials of particular interest to a specialty group, or relevant to a particular audience, can be difficult. The categorization refines the retrieval of indexed material. In the CISMeF terminology, metaterms can be considered as super-concepts. They were primarily conceived to improve recall in the CISMeF quality-controlled health gateway. Methods: The MEDLINE categorization algorithm (MCA) is based on semantic links existing between MeSH terms and metaterms on the one hand and between MeSH subheadings and metaterms on the other hand. These links are used to automatically infer a list of metaterms from any MeSH term/subheading indexing. Medical librarians manually select the semantic links. Results: The MEDLINE categorization algorithm lists the medical specialties relevant to a MEDLINE file by decreasing order of their importance. The MEDLINE categorization algorithm is available on a Web site. It can run on any MEDLINE file in a batch mode. As an example, the top 3 medical specialties for the set of 60 articles published in BioMed Central Medical Informatics & Decision Making, which are currently indexed in MEDLINE, are: information science, organization and administration, and medical informatics. Conclusion: We have presented a MEDLINE categorization algorithm in order to classify the medical specialties addressed in any MEDLINE file in the form of a ranked list of relevant specialties. The categorization method introduced in this paper is based on the manual indexing of resources with MeSH (terms/subheadings) pairs by NLM indexers. This algorithm may be used as a new bibliometric tool.
- Published
- 2006
- Full Text
- View/download PDF
46. Strategies for health information retrieval.
- Author
-
Soualmia LF, Dahamna B, Thirion B, and Darmoni SJ
- Subjects
- France, Internet, Terminology as Topic, Information Storage and Retrieval methods, Medical Informatics
- Abstract
Background: The amount of health data accessible on the Web is increasing and the Internet has become a major source of health information. Many tools and search engines are available but medical information retrieval remains difficult for both health professionals and patients. Objective: In this paper we describe heuristics that aim at matching queries as closely as possible with the content of the documents in the context of the CISMeF catalogue (Catalogue and Index of Health Resources in French) and its Doc'CISMeF search tool. The queries are represented by terms and the content of the documents is indexed by a terminology based on the MeSH thesaurus. Results: Several operations are performed to match the terms of the terminology: natural language processing techniques on multi-word queries, phonemisation, spelling correction, plain text search with adjacency, etc. Each one is tested to evaluate its contribution in matching the terminology and the indexed documents. Conclusion: The implemented heuristics contribute significantly to maximising the recall of the Doc'CISMeF search tool.
- Published
- 2006
47. A method of cross-lingual consumer health information retrieval.
- Author
-
Névéol A, Pereira S, Soualmia LF, Thirion B, and Darmoni SJ
- Subjects
- France, MEDLINE, Information Storage and Retrieval methods, Medical Informatics, Multilingualism
- Abstract
Objectives: This paper presents a method of cross-language information retrieval aiming to make medical information available to patients in French and English, regardless of the query language they wish to use. Methods: We describe the two MeSH-related terminologies used in this work. We show that the French patient synonyms included in CISMeF can be automatically mapped to the English consumer-oriented health topics used in MEDLINEplus, via the MeSH thesaurus. The links between French and English patient terms thus inferred can subsequently be exploited to automatically translate patient queries. Results: 129 MEDLINEplus topics have been mapped to 142 CISMeF patient synonyms. Contextual links for cross-language retrieval have been added to the patient dedicated French information Gateway CISMeF. Conclusion: We have presented an efficient method for cross-lingual patient information retrieval in French and English, which may also be applied to other language pairs, subject to the availability of patient terminologies and of the MeSH thesaurus in these languages.
- Published
- 2006
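Note on record 47: the abstract describes mapping French patient synonyms to English consumer health topics by using MeSH as a pivot. The sketch below illustrates that idea as a simple join on shared MeSH descriptors. The function name, the data layout and all sample entries are hypothetical and are not taken from CISMeF or MEDLINEplus data.

```python
from collections import defaultdict


def map_via_pivot(french_to_mesh, topic_to_mesh):
    """Link each French synonym to the English topics sharing a MeSH descriptor."""
    # Invert the English side: descriptor -> topics indexed with it.
    mesh_to_topics = defaultdict(set)
    for topic, descriptors in topic_to_mesh.items():
        for descriptor in descriptors:
            mesh_to_topics[descriptor].add(topic)

    mapping = {}
    for synonym, descriptors in french_to_mesh.items():
        topics = set()
        for descriptor in descriptors:
            topics |= mesh_to_topics.get(descriptor, set())
        if topics:  # synonyms without a shared descriptor stay unmapped
            mapping[synonym] = sorted(topics)
    return mapping


if __name__ == "__main__":
    # Invented sample entries for illustration only.
    french_to_mesh = {
        "crise cardiaque": ["Myocardial Infarction"],
        "rhume des foins": ["Rhinitis, Allergic, Seasonal"],
        "mal de dos": ["Back Pain"],
    }
    topic_to_mesh = {
        "Heart Attack": ["Myocardial Infarction"],
        "Hay Fever": ["Rhinitis, Allergic, Seasonal"],
    }
    print(map_via_pivot(french_to_mesh, topic_to_mesh))
```

A query entered with a mapped French synonym could then be reissued with the linked English topic, which is one way to read the query translation step mentioned in the abstract.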
48. Combining different standards and different approaches for health information retrieval in a quality-controlled gateway.
- Author
-
Soualmia LF and Darmoni SJ
- Subjects
- France, Medical Subject Headings, Information Storage and Retrieval standards, Internet
- Abstract
The Internet as a source of information is increasing in preeminence in numerous fields, including health. We describe in this paper the CISMeF project (acronym of Catalogue and Index of French-speaking Medical Sites), which has been designed to help health information consumers and health professionals to find what they are looking for among the numerous health documents available online. The catalogue is founded on two standards: a set of metadata and a terminology based on the MeSH thesaurus which has the same structure and use as an ontology of the medical domain. The structure of the catalogue allows us to place the project at an overlap between the present Web, which is informal, and the forthcoming Semantic Web. Many features of information retrieval and navigation through the catalogue were developed. These features take into account the type of end-user (health professional, medical student, patient). The CISMeF-patients catalogue is a sub-catalogue of CISMeF and is dedicated to patients and the general public. It shares the same model as CISMeF whereas MEDLINE and MedlinePlus do not. We also propose to couple two approaches (morphological processing and data mining) to help the users by correcting and refining their queries.
- Published
- 2005
- Full Text
- View/download PDF
49. Enhancing the MeSH thesaurus to retrieve French online health resources in a quality-controlled gateway.
- Author
-
Douyère M, Soualmia LF, Névéol A, Rogozan A, Dahamna B, Leroy JP, Thirion B, and Darmoni SJ
- Subjects
- France, Humans, Language, Quality Control, Semantics, Abstracting and Indexing, Health Resources, Information Storage and Retrieval, Internet, MEDLINE, Medical Subject Headings, Online Systems, Terminology as Topic
- Abstract
The amount of health information available on the Internet is considerable. In this context, several health gateways have been developed. Among them, CISMeF (Catalogue and Index of Health Resources in French) was designed to catalogue and index health resources in French. The goal of this article is to describe the various enhancements to the MeSH thesaurus developed by the CISMeF team to adapt this terminology to the broader field of health Internet resources instead of scientific articles for the MEDLINE bibliographic database. CISMeF uses two standard tools for organizing information: the MeSH thesaurus and several metadata element sets, in particular the Dublin Core metadata format. The heterogeneity of Internet health resources led the CISMeF team to enhance the MeSH thesaurus with the introduction of two new concepts: resource types and metaterms. CISMeF resource types are a generalization of the publication types of MEDLINE. A resource type describes the nature of the resource and MeSH keyword/qualifier pairs describe the subject of the resource. A metaterm is generally a medical specialty or a biological science, which has semantic links with one or more MeSH keywords, qualifiers and resource types. The CISMeF terminology is exploited for several tasks: resource indexing performed manually, resource categorization performed automatically, visualization and navigation through the concept hierarchies and information retrieval using the Doc'CISMeF search engine. The CISMeF health gateway uses several MeSH thesaurus enhancements to optimize information retrieval, hierarchy navigation and automatic indexing.
- Published
- 2004
- Full Text
- View/download PDF
50. Using CISMeF MeSH "Encapsulated" terminology and a categorization algorithm for health resources.
- Author
-
Névéol A, Soualmia LF, Douyère M, Rogozan A, Thirion B, and Darmoni SJ
- Subjects
- Algorithms, Electronic Data Processing, Expert Systems, France, Humans, Information Storage and Retrieval standards, Internet, Medicine, National Library of Medicine (U.S.), Software, Specialization, United States, User-Computer Interface, Abstracting and Indexing methods, Catalogs, Library, Databases, Bibliographic, Information Storage and Retrieval methods, Subject Headings
- Abstract
Introduction: CISMeF is a Quality Controlled Health Gateway using a terminology based on the Medical Subject Headings (MeSH) thesaurus that displays medical specialties (metaterms) and the relationships existing between them and MeSH terms. Objective: The need to classify the resources within the catalogue has led us to combine this type of semantic information with domain expert knowledge for health resources categorization purposes. Material and Methods: A two-step categorization process consisting of mapping resource keywords to CISMeF metaterms and ranking metaterms by decreasing coverage in the resource has been developed. We evaluate this algorithm on a random set of 123 resources extracted from the CISMeF catalogue. Our gold standard for this evaluation is the manual classification provided by a domain expert, viz. a librarian of the team. Results: The CISMeF algorithm shows 81% precision and 93% recall, and 62% of the resources were assigned a "fully relevant" or "fairly relevant" categorization according to strict standards. Discussion: A thorough analysis of the results has enabled us to find gaps in the knowledge modeling of the CISMeF terminology. The necessary adjustments having been made, the algorithm is currently used in CISMeF for resource categorization.
- Published
- 2004
- Full Text
- View/download PDF
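Note on record 50: the abstract outlines a two-step categorization (map resource keywords to metaterms, then rank metaterms by decreasing coverage). The sketch below is one possible reading of those two steps, not the CISMeF algorithm itself. It assumes that "coverage" means the share of a resource's keywords linked to a metaterm; the link table and keywords are invented.

```python
from collections import Counter


def categorize(keywords, keyword_to_metaterms):
    """Rank metaterms by the share of the resource's keywords linked to them."""
    counts = Counter()
    # Step 1: map each keyword to its metaterms (hypothetical link table).
    for keyword in keywords:
        for metaterm in set(keyword_to_metaterms.get(keyword, ())):
            counts[metaterm] += 1
    # Step 2: rank by decreasing coverage, ties broken alphabetically.
    total = len(keywords)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(metaterm, count / total) for metaterm, count in ranked]


if __name__ == "__main__":
    # Invented semantic links between keywords and medical specialties.
    links = {
        "asthma": ["pneumology", "allergology"],
        "bronchi": ["pneumology"],
        "child": ["pediatrics"],
    }
    for metaterm, coverage in categorize(["asthma", "bronchi", "child"], links):
        print(f"{metaterm}: {coverage:.2f}")
```

In the published work the semantic links are selected by medical librarians, so the quality of the ranking depends on that manually curated table rather than on the counting step shown here.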