1. Advancing pharmacogenomics research: automated extraction of insights from PubMed using SpaCy NLP framework.
- Author
- Dos Reis EC, Caneppa S, Vasconcelos P, and de Lima Santos PCJ
- Abstract
This paper presents a methodology for automatically extracting insights from PubMed articles using a Natural Language Processing (NLP) framework. Our approach, leveraging advanced NLP techniques and Named Entity Recognition (NER), is crucial for advancing pharmacogenomics and other scientific fields that benefit from streamlined access to literature through automated services like RESTful APIs.

Building a new NLP model presents several challenges. First, it is essential to have a thorough understanding of the field in order to define relevant entities. Second, the construction of a diverse and consistent set of examples is crucial. Finally, the effective utilization of pre-established models is of paramount importance, as demonstrated in this work.

Our model, validated via ten-fold cross-validation, achieved over 70% recall and precision for all entities in the training set. We provide a reproducible pipeline for the scientific community and propose a structured approach for qualitative analysis and clustering of results. This methodology refines literature reviews, optimizes knowledge extraction, and supports broader application across diverse research domains. An online platform could further extend these benefits to researchers, educators, and practitioners.
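
To make the reported evaluation concrete, the following is a minimal sketch of how per-entity precision and recall could be computed under k-fold cross-validation. It is not the authors' pipeline: the function names, the span representation `(doc_id, start, end, label)`, the exact-match criterion, and the entity labels `GENE` and `DRUG` are all assumptions chosen for illustration, since the abstract does not list the entity types or the matching rule.

```python
from collections import Counter


def per_label_precision_recall(gold, predicted):
    """Score exact-match entity spans per label.

    gold, predicted: sets of (doc_id, start, end, label) tuples
    (a hypothetical representation, not taken from the paper).
    Returns {label: (precision, recall)}.
    """
    true_pos = Counter(label for *_, label in gold & predicted)
    n_pred = Counter(label for *_, label in predicted)
    n_gold = Counter(label for *_, label in gold)
    scores = {}
    for label in set(n_gold) | set(n_pred):
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_gold[label] if n_gold[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy data with hypothetical labels, for illustration only.
gold = {(0, 0, 7, "GENE"), (0, 20, 28, "DRUG"), (1, 5, 11, "GENE")}
predicted = {(0, 0, 7, "GENE"), (0, 20, 28, "DRUG"), (1, 5, 11, "DRUG")}
scores = per_label_precision_recall(gold, predicted)
```

In a ten-fold setup, each fold's held-out documents would be scored this way after training on the other nine folds, and the per-label scores would then be aggregated across folds. The aggregation rule is also an assumption here, as the abstract does not state one.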
- Published
- 2024