219 results on '"clinical text"'
Search Results
2. ICH-PRNet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism
- Author
-
Yu, Xinlei, Elazab, Ahmed, Ge, Ruiquan, Zhu, Jichao, Zhang, Lingyan, Jia, Gangyong, Wu, Qing, Wan, Xiang, Li, Lihua, and Wang, Changmiao
- Published
- 2025
- Full Text
- View/download PDF
3. Reshaping free-text radiology notes into structured reports with generative question answering transformers
- Author
-
Bergomi, Laura, Buonocore, Tommaso M., Antonazzo, Paolo, Alberghi, Lorenzo, Bellazzi, Riccardo, Preda, Lorenzo, Bortolotto, Chandra, and Parimbelli, Enea
- Published
- 2024
- Full Text
- View/download PDF
4. Enhancing the Efficiency of Lung Disease Classification Based on Multi-modal Fusion Model
- Author
-
Truong, Thi-Diem, Huynh, Phuoc-Hai, Nguyen, Van Hoa, Do, Thanh-Nghi, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Ghosh, Ashish, Series Editor, Xu, Zhiwei, Series Editor, Thai-Nghe, Nguyen, editor, Do, Thanh-Nghi, editor, and Benferhat, Salem, editor
- Published
- 2025
- Full Text
- View/download PDF
5. End-to-end pseudonymization of fine-tuned clinical BERT models
- Author
-
Thomas Vakili, Aron Henriksson, and Hercules Dalianis
- Subjects
Natural language processing, Language models, BERT, Electronic health records, Clinical text, De-identification, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models consist of large amounts of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies and replaces sensitive entities with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on the predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also find no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs.
- Published
- 2024
- Full Text
- View/download PDF
6. End-to-end pseudonymization of fine-tuned clinical BERT models: Privacy preservation with maintained data utility.
- Author
-
Vakili, Thomas, Henriksson, Aron, and Dalianis, Hercules
- Subjects
LANGUAGE models, DATA privacy, PRIVACY, NATURAL language processing
- Abstract
Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models consist of large amounts of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies and replaces sensitive entities with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on the predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also find no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Cross-Lingual Name Entity Recognition from Clinical Text Using Mixed Language Query
- Author
-
Shi, Kunli, Chen, Gongchi, Gu, Jinghang, Qian, Longhua, Zhou, Guodong, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Xu, Hua, editor, Chen, Qingcai, editor, Lin, Hongfei, editor, Wu, Fei, editor, Liu, Lei, editor, Tang, Buzhou, editor, Hao, Tianyong, editor, and Huang, Zhengxing, editor
- Published
- 2024
- Full Text
- View/download PDF
8. Modeling disagreement in automatic data labeling for semi-supervised learning in Clinical Natural Language Processing
- Author
-
Hongshu Liu, Nabeel Seedat, and Julia Ive
- Subjects
automated labeling, clinical text, Natural Language Processing, radiology, semi-supervised learning, uncertainty, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Introduction: Computational models providing accurate estimates of their uncertainty are crucial for risk management associated with decision-making in healthcare contexts. This is especially true since many state-of-the-art systems are trained using the data which have been labeled automatically (self-supervised mode) and tend to overfit. Methods: In this study, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports. This problem remains understudied for Natural Language Processing in the healthcare domain. Results: We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of three uncertainty labels based on the negative log predictive probability (NLPP) evaluation metric and mean maximum predicted confidence levels (MMPCL), whilst retaining strong predictive performance. Discussion: Our conclusions highlight the utility of probabilistic models applied to “noisy” labels and that similar methods could provide utility for Natural Language Processing (NLP) based automated labeling tasks.
- Published
- 2024
- Full Text
- View/download PDF
9. Semantic Web Techniques for Clinical Topic Detection in Health Care.
- Author
-
RAMAN, R., SAHAYARAJ, Kishore Anthuvan, SONI, Mukesh, NAYAK, Nihar Ranjan, GOVINDARAJ, Ramya, and SINGH, Nikhil Kumar
- Subjects
MEDICAL technology, SEMANTICS, MEDICAL care, MICROBLOGS, TIME series analysis
- Abstract
This paper investigates and proposes a new clustering method that takes into account the timing characteristics of frequently used feature words and the semantic similarity of microblog short texts, and designs and implements microblog topic detection based on the clustering results. The aim of the proposed research is to provide a new cluster overlap reduction method based on the divisions of semantic memberships to solve limited semantic expression and diversify short microblog contents. First, by defining the time-series frequent word set of the microblog text, a feature word selection method for hot topics is given; then the initial clustering of the microblog is obtained according to the time-series recurring feature word set. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations
- Author
-
Báez, Pablo, Campillos-Llanos, Leonardo, Núñez, Fredy, and Dunstan, Jocelyn
- Published
- 2024
- Full Text
- View/download PDF
11. Natural Language Processing and Text Mining (Turning Unstructured Data into Structured)
- Author
-
Bagheri, Ayoub, Giachanou, Anastasia, Mosteiro, Pablo, Verberne, Suzan, Asselbergs, Folkert W., editor, Denaxas, Spiros, editor, Oberski, Daniel L., editor, and Moore, Jason H., editor
- Published
- 2023
- Full Text
- View/download PDF
12. On the Impact of the Vocabulary for Domain-Adaptive Pretraining of Clinical Language Models
- Author
-
Lamproudis, Anastasios, Henriksson, Aron, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Roque, Ana Cecília A., editor, Gracanin, Denis, editor, Lorenz, Ronny, editor, Tsanas, Athanasios, editor, Bier, Nathalie, editor, Fred, Ana, editor, and Gamboa, Hugo, editor
- Published
- 2023
- Full Text
- View/download PDF
13. Clinical Abbreviation Disambiguation Using Clinical Variants of BERT
- Author
-
Wagh, Atharwa, Khanna, Manju, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Morusupalli, Raghava, editor, Dandibhotla, Teja Santosh, editor, Atluri, Vani Vathsala, editor, Windridge, David, editor, Lingras, Pawan, editor, and Komati, Venkateswara Rao, editor
- Published
- 2023
- Full Text
- View/download PDF
14. Temporal Relation Extraction from Clinical Texts Using Knowledge Graphs
- Author
-
Knez, Timotej, Žitnik, Slavko, van der Aalst, Wil, Series Editor, Ram, Sudha, Series Editor, Rosemann, Michael, Series Editor, Szyperski, Clemens, Series Editor, Guizzardi, Giancarlo, Series Editor, Nurcan, Selmin, editor, Opdahl, Andreas L., editor, Mouratidis, Haralambos, editor, and Tsohou, Aggeliki, editor
- Published
- 2023
- Full Text
- View/download PDF
15. A Hybrid Model for Prediction and Progression of COVID-19 Using Clinical Text Data and Chest X-rays
- Author
-
Devan, Swetha V., Lakshmi, K. S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Smys, S., editor, Balas, Valentina Emilia, editor, and Palanisamy, Ram, editor
- Published
- 2022
- Full Text
- View/download PDF
16. Clinical Named Entity Recognition Methods: An Overview
- Author
-
Pagad, Naveen S., Pradeep, N., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Khanna, Ashish, editor, Gupta, Deepak, editor, Bhattacharyya, Siddhartha, editor, Hassanien, Aboul Ella, editor, Anand, Sameer, editor, and Jaiswal, Ajay, editor
- Published
- 2022
- Full Text
- View/download PDF
17. Clinical Text Classification with Word Representation Features and Machine Learning Algorithms.
- Author
-
Almazaydeh, Laiali, Abuhelaleh, Mohammed, Al Tawil, Arar, and Elleithy, Khaled
- Subjects
MACHINE learning, NAIVE Bayes classification, K-nearest neighbor classification, SUPPORT vector machines, ELECTRONIC health records, MEDICAL transcription, MEDICAL coding
- Abstract
Clinical text classification of electronic medical records is a challenging task. Existing electronic records suffer from irrelevant text, misspellings, semantic ambiguity, and abbreviations. The approach reported in this paper elaborates on machine learning techniques to develop an intelligent framework for classification of the medical transcription dataset. The proposed approach is based on four main phases: the text preprocessing phase, word representation phase, features reduction phase and classification phase. We used four machine learning algorithms (support vector machines, naïve Bayes, logistic regression and k-nearest neighbors) in combination with different word representation models. We applied the four algorithms to bag-of-words, TF-IDF, and Word2vec representations. Experimental results were evaluated based on precision, recall, accuracy and F1 score. The best results were obtained by combining the k-NN classifier with the Word2vec word representation, achieving an accuracy of 92% in classifying the medical specialties based on the transcription text. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Automatic Diagnosis of COVID-19 Patients from Unstructured Data Based on a Novel Weighting Scheme.
- Author
-
Mahdi, Amir Yasseen and Yuhaniz, Siti Sophiayati
- Subjects
COVID-19, COVID-19 testing, VIRUS diseases, PUBLIC hospitals, FEATURE extraction
- Abstract
The extraction of features from unstructured clinical data of COVID-19 patients is critical for guiding clinical decision-making and diagnosing this viral disease. Furthermore, an early and accurate diagnosis of COVID-19 can reduce the burden on healthcare systems. In this paper, an improved Term Weighting technique combined with Parts-Of-Speech (POS) Tagging is proposed to reduce dimensions for automatic and effective classification of clinical text related to COVID-19 disease. Term Frequency-Inverse Document Frequency (TF-IDF) is the most often used term weighting scheme (TWS). However, TF-IDF has drawbacks that several developments have sought to address; in particular, it is not efficient enough to classify text by assigning effective weights to the terms in unstructured data. In this research, we propose a modified term weighting scheme, RTF-C-IEF, and compare the proposed model with four extraction methods: TF, TF-IDF, TF-IHF, and TF-IEF. The experiment was conducted on two new datasets for COVID-19 patients. The first dataset was collected from government hospitals in Iraq with 3053 clinical records, and the second dataset, with 1446 clinical reports, was collected from several different websites. Based on the experimental results using several popular classifiers applied to the COVID-19 datasets, we observe that the proposed scheme RTF-C-IEF is a consistent performer with the best scores in most of the experiments. Further, the modified RTF-C-IEF proposed in the study outperformed the original scheme and other employed term weighting methods in most experiments. Thus, the proper selection of term weighting scheme among the different methods improves the performance of the classifier and helps to find the informative term. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. Bi-LSTM-CRF Network for Clinical Event Extraction With Medical Knowledge Features
- Author
-
Shunli Zhang, Yancui Li, Shiyong Li, and Fang Yan
- Subjects
Clinical text, entity recognition, deep learning, natural language processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Extracting clinical event expressions and their types from clinical text is a fundamental task for many applications in clinical NLP. State-of-the-art systems need handcrafted features and do not take into account the representation of low-frequency words. To address these issues, a Bi-LSTM-CRF neural network architecture based on medical knowledge features is proposed. First, we employ convolutional neural networks (CNNs) to encode character-level information of a word and extract medical knowledge features from an open-source clinical knowledge system. Then, we concatenate character-level and word-level embeddings and the medical knowledge features of words together, and feed them into a bi-directional long short-term memory (Bi-LSTM) network to build context information of each word. Finally, we jointly use a conditional random field (CRF) to decode labels for the whole sentence. We evaluate our model on two publicly available clinical datasets, namely the THYME corpus and the 2012 i2b2 dataset. Experimental results show that our model outperforms previous state-of-the-art systems with different methodologies, including machine learning-based methods, deep learning-based methods, and BERT-based methods.
- Published
- 2022
- Full Text
- View/download PDF
20. Temporal disambiguation of relative temporal expressions in clinical texts
- Author
-
Amy L. Olex and Bridget T. McInnes
- Subjects
natural language processing, temporal reasoning, temporal expression recognition and normalization, clinical text, relative temporal expression, error analysis, Bibliography. Library science. Information resources
- Abstract
Temporal expression recognition and normalization (TERN) is the foundation for all higher-level temporal reasoning tasks in natural language processing, such as timeline extraction, so it must be performed well to limit error propagation. Achieving new heights in state-of-the-art performance for TERN in clinical texts requires knowledge of where current systems struggle. In this work, we summarize the results of a detailed error analysis for three top performing state-of-the-art TERN systems that participated in the 2012 i2b2 Clinical Temporal Relation Challenge, and compare our own home-grown system Chrono to identify specific areas in need of improvement. Performance metrics and an error analysis reveal that all systems have reduced performance in normalization of relative temporal expressions, specifically in disambiguating temporal types and in the identification of the correct anchor time. To address the issue of temporal disambiguation we developed and integrated a module into Chrono that utilizes temporally fine-tuned contextual word embeddings to disambiguate relative temporal expressions. Chrono now achieves state-of-the-art performance for temporal disambiguation of relative temporal expressions in clinical text, and is the only TERN system to output dual annotations into both TimeML and SCATE schemes.
- Published
- 2022
- Full Text
- View/download PDF
21. Using BART to Automatically Generate Discharge Summaries from Swedish Clinical Text
- Author
-
Berg, Nils and Dalianis, Hercules
- Abstract
Documentation is a regular part of contemporary healthcare practices and one such documentation task is the creation of a discharge summary, which summarizes a care episode. However, manually writing discharge summaries is a time-consuming task, and research has shown that discharge summaries are often lacking quality in various respects. To alleviate this problem, text summarization methods could be applied to text from electronic health records, such as patient notes, to automatically create a discharge summary. Previous research has been conducted on this topic on text in various languages and with various methods, but no such research has been conducted on Swedish text. In this paper, four data sets extracted from a Swedish clinical corpus were used to fine-tune four BART language models to perform the task of summarizing Swedish patient notes into a discharge summary. Out of these models, the best performing model was manually evaluated by a senior, now retired, nurse and clinical coder. The evaluation results show that the best performing model produces discharge summaries of overall low quality. This is possibly due to issues in the data extracted from the Health Bank research infrastructure, which warrants further work on this topic.
- Published
- 2024
22. Improving Medical Care for Adults with Intellectual Disabilities (ID): Can Automated Processing of Electronic Health Record Clinical Text Assist in ID Detection Among the General Outpatient Population?
- Author
-
Rijs, Joyce (author)
- Abstract
Background: Undetected Intellectual Disability (ID) can lead to chronic stress due to overestimation by society. Chronic stress can cause stress-related health issues, like hypertension, chronic fatigue and abdominal complaints. When a physician (General Practitioner (GP) or medical specialist) does not recognize that a patient has ID, the relation with stress may go unnoticed. In that case, the complaint is often treated as a purely somatic problem, while the underlying cause (overestimation due to unrecognized ID) remains untreated. This can increase healthcare consumption and impair the patient’s quality of life. While physicians with ID-expertise can recognize subtle signs of mild ID, physicians without extensive experience will easily overlook the ID. To improve medical care for patients with ID, we aim to improve ID detection among physicians. As it is not feasible to give all individual doctors an ‘ID-recognition training’, we study the possibility of using AI to improve ID detection. In the past years, we have been working on an ‘ID Alert’ (IDA) using ML. In previous phases of the IDA project, structured Electronic Health Record (EHR) data was used for the creation of an IDA. In addition, in the current study, we investigate the use of unstructured EHR data (clinical text). Methods: We analyzed unstructured correspondence files of 200 ID-adults and 200 non-ID adults of Novicare, an organization that provides multidisciplinary care to clients with complex and chronic conditions in intra- and extramural settings. Structured clinical data was unavailable. Therefore, we used an automated method of text extraction, de-identification and two types of feature extraction (bag-of-words and clinical concept extraction). Features were compared between ID-adults and non-ID adults. Significant features that were unlikely to be intrinsically different between ID- and non-ID adults were excluded. 
The remaining significant features were used for the traini, TM30004; 35 ECTS, Technical Medicine
- Published
- 2024
23. Transformer-based active learning for multi-class text annotation and classification.
- Author
-
Afzal M, Hussain J, Abbas A, Hussain M, Attique M, and Lee S
- Abstract
Objective: Data-driven methodologies in healthcare necessitate labeled data for effective decision-making. However, medical data, particularly in unstructured formats, such as clinical notes, often lack explicit labels, making manual annotation challenging and tedious. Methods: This paper introduces a novel deep active learning framework designed to facilitate the annotation process for multiclass text classification, specifically using the SOAP (subjective, objective, assessment, plan) framework, a widely recognized medical protocol. Our methodology leverages transformer-based deep learning techniques to automatically annotate clinical notes, significantly easing the manual labor involved and enhancing classification performance. Transformer-based deep learning models, with their ability to capture complex patterns in large datasets, represent a cutting-edge approach for advancing natural language processing tasks. Results: We validate our approach through experiments on a diverse set of clinical notes from publicly available datasets, comprising over 426 documents. Our model not only demonstrates superior classification accuracy, with an F1 score improvement of 4.8% over existing methods, but also provides a practical tool for healthcare professionals, potentially improving clinical documentation practices and patient care. Conclusions: The research underscores the synergy between active learning and advanced deep learning, paving the way for future exploration of automatic text annotation and its implications for clinical informatics. Future studies will aim to integrate multimodal data and large language models to enhance the richness and accuracy of clinical text analysis, opening new pathways for comprehensive healthcare insights. Competing Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. (© The Author(s) 2024.)
- Published
- 2024
- Full Text
- View/download PDF
24. Analysis of Medical Documents with Text Mining and Association Rule Mining
- Author
-
Reátegui, Ruth, Ratté, Sylvie, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Rocha, Álvaro, editor, Ferrás, Carlos, editor, and Paredes, Manolo, editor
- Published
- 2019
- Full Text
- View/download PDF
25. A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts.
- Author
-
Bose, Priyankar, Srinivasan, Sriram, Sleeman IV, William C., Palta, Jatinder, Kapoor, Rishabh, and Ghosh, Preetam
- Subjects
DATA mining, ELECTRONIC health records, EXTRACTION techniques, TASK performance, NATURAL language processing, TEXT messages
- Abstract
Significant growth in Electronic Health Records (EHR) over the last decade has provided an abundance of clinical text that is mostly unstructured and untapped. This huge amount of clinical text data has motivated the development of new information extraction and text mining techniques. Named Entity Recognition (NER) and Relationship Extraction (RE) are key components of information extraction tasks in the clinical domain. In this paper, we highlight the present status of clinical NER and RE techniques in detail by discussing the existing proposed NLP models for the two tasks and their performances and discuss the current challenges. Our comprehensive survey on clinical NER and RE encompasses current challenges, state-of-the-art practices, and future directions in information extraction from clinical text. This is the first attempt to discuss both of these interrelated topics together in the clinical context. We identified many research articles published based on different approaches and looked at applications of these tasks. We also discuss the evaluation metrics that are used in the literature to measure the effectiveness of these two NLP methods and future research directions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
26. Limitations of Transformers on Clinical Text Classification.
- Author
-
Gao, Shang, Alawad, Mohammed, Young, M. Todd, Gounley, John, Schaefferkoetter, Noah, Yoon, Hong Jun, Wu, Xiao-Cheng, Durbin, Eric B., Doherty, Jennifer, Stroup, Antoinette, Coyle, Linda, and Tourassi, Georgia
- Subjects
CONVOLUTIONAL neural networks, NATURAL language processing, CLASSIFICATION, DEEP learning, DEFAULT (Finance)
- Abstract
Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state-of-the-art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures – a word-level convolutional neural network and a hierarchical self-attention network – and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT – pretraining and WordPiece tokenization – may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases rather than understanding the contextual meaning of sequences of text. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
27. Automatic Extraction and Decryption of Abbreviations from Domain-Specific Texts.
- Author
-
EGOROV, Michil and FUNKNER, Anastasia
- Abstract
This paper explores the problems of extraction and decryption of abbreviations from domain-specific texts in Russian. The main focus is unstructured electronic medical records, which pose specific preprocessing problems. The major challenge is that there is no uniform way to write medical histories. The aim of the paper is to generalize the way of decrypting abbreviations from any variant of text. A dataset of nearly three million medical records was collected. A classifier model was trained in order to extract and decrypt abbreviations. After testing the proposed method with 224,307 records, the model showed an F1 score of 93.7% on a valid dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
28. An efficient prototype method to identify and correct misspellings in clinical text
- Author
-
T. Elizabeth Workman, Yijun Shao, Guy Divita, and Qing Zeng-Treitler
- Subjects
Spelling analysis, Spelling correction, Clinical text, Word embeddings, Word2Vec, Medicine, Biology (General), QH301-705.5, Science (General), Q1-390
- Abstract
Objective: Misspellings in clinical free text present challenges to natural language processing. With an objective to identify misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora, surgical pathology reports, and emergency department progress and visit notes, extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. Results: In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979, respectively, for the surgical pathology reports, and emergency department progress and visit notes, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar among the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.
- Published
- 2019
- Full Text
- View/download PDF
29. Incorporating Domain Knowledge into Natural Language Inference on Clinical Texts
- Author
-
Mingming Lu, Yu Fang, Fengqi Yan, and Maozhen Li
- Subjects
Attention mechanism, clinical text, medical domain knowledge, natural language inference, word representation, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Making inference on clinical texts is a task which has not been fully studied. With the newly released, expert annotated MedNLI dataset, this task is being boosted. Compared with open domain data, clinical texts present unique linguistic phenomena, e.g., a large number of medical terms and abbreviations, different written forms for the same medical concept, which make inference much harder. Incorporating domain-specific knowledge is a way to eliminate this problem. In this paper, we assemble a new module that incorporates medical concept definitions on the classic enhanced sequential inference model (ESIM), which first extracts the most relevant medical concept for each word, if it exists, then encodes the definition of this medical concept with a bidirectional long short-term memory network (BiLSTM) to obtain domain-specific definition representations, and attends these definition representations over vanilla word embeddings. The empirical evaluations are conducted to demonstrate that our model improves the prediction performance and achieves a high level of accuracy on the MedNLI dataset. Specifically, the knowledge enhanced word representations contribute significantly to the entailment class.
- Published
- 2019
- Full Text
- View/download PDF
30. Extraction of Temporal Events from Clinical Text Using Semi-supervised Conditional Random Fields
- Author
-
Moharasan, Gandhimathi, Ho, Tu-Bao, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Tan, Ying, editor, Takagi, Hideyuki, editor, and Shi, Yuhui, editor
- Published
- 2017
- Full Text
- View/download PDF
31. ContextMEL: Classifying Contextual Modifiers in Clinical Text.
- Author
-
Chocrón, Paula, Abella, Álvaro, and de Maeztu, Gabriel
- Subjects
NATURAL language processing ,COMPUTATIONAL linguistics ,ELECTRONIC health records ,DEEP learning ,ALGORITHMS ,MEDICAL records - Abstract
Taking advantage of electronic health records in clinical research requires the development of natural language processing tools to extract data from unstructured text in different languages. A key task is the detection of contextual modifiers, such as understanding whether a concept is negated or if it belongs to the past. We present ContextMEL, a method to build classifiers for contextual modifiers that is independent of the specific task and the language, allowing for a fast model development cycle. ContextMEL uses annotation by experts to build a curated dataset, and state-of-the-art deep learning architectures to train models with it. We discuss the application of ContextMEL for three modifiers, namely Negation, Temporality and Certainty, on Spanish and Catalan medical text. The metrics we obtain show our models are suitable for industrial use, outperforming commonly used rule-based approaches such as the NegEx algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. De-Identifying Swedish EHR Text Using Public Resources in the General Domain.
- Author
-
CHOMUTARE, Taridzo, YIGZAW, Kassaye Yitbarek, BUDRIONIS, Andrius, MAKHLYSHEVA, Alexandra, GODTLIEBSEN, Fred, and DALIANIS, Hercules
- Abstract
Sensitive data is normally required to develop rule-based or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes; and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed improved precision and recall from 55.62% and 80.02% with the base EHR embedding layer, to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text; and this could be useful in cases where the data is both sensitive and in low-resource languages. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
33. Deep learning in clinical natural language processing: a methodical review.
- Author
-
Wu, Stephen, Roberts, Kirk, Datta, Surabhi, Du, Jingcheng, Ji, Zongcheng, Si, Yuqi, Soni, Sarvesh, Wang, Qiong, Wei, Qiang, Xiang, Yang, Zhao, Bo, and Xu, Hua
- Abstract
Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research. Materials and Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. Results: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a "long tail" of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. Discussion: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning). Conclusion: Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
34. Improving Dependency Parsing on Clinical Text with Syntactic Clusters from Web Text
- Author
-
Qiao, Xiuming, Cao, Hailong, Zhao, Tiejun, Chen, Kehai, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Hirose, Akira, editor, Ozawa, Seiichi, editor, Doya, Kenji, editor, Ikeda, Kazushi, editor, Lee, Minho, editor, and Liu, Derong, editor
- Published
- 2016
- Full Text
- View/download PDF
35. A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts
- Author
-
Priyankar Bose, Sriram Srinivasan, William C. Sleeman, Jatinder Palta, Rishabh Kapoor, and Preetam Ghosh
- Subjects
electronic health records ,clinical text ,natural language processing ,named entity recognition ,relationship extraction ,machine learning ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Significant growth in Electronic Health Records (EHR) over the last decade has provided an abundance of clinical text that is mostly unstructured and untapped. This huge amount of clinical text data has motivated the development of new information extraction and text mining techniques. Named Entity Recognition (NER) and Relationship Extraction (RE) are key components of information extraction tasks in the clinical domain. In this paper, we highlight the present status of clinical NER and RE techniques in detail by discussing the existing proposed NLP models for the two tasks and their performances and discuss the current challenges. Our comprehensive survey on clinical NER and RE encompasses current challenges, state-of-the-art practices, and future directions in information extraction from clinical text. This is the first attempt to discuss both of these interrelated topics together in the clinical context. We identified many research articles published based on different approaches and looked at applications of these tasks. We also discuss the evaluation metrics that are used in the literature to measure the effectiveness of these two NLP methods and future research directions.
- Published
- 2021
- Full Text
- View/download PDF
36. Named Entity Recognition in Clinical Text Based on E-CNN and BLSTM-CRF [基于 E-CNN 和 BLSTM-CRF 的临床文本命名实体识别].
- Author
-
曹春萍 and 关鹏举
- Subjects
*CONDITIONAL random fields , *SHORT-term memory , *INFORMATION modeling , *MEDICAL records , *MODEL railroads , *MATHEMATICAL convolutions - Abstract
In the task of named entity recognition of biomedical clinical medical record text, the traditional solution affects the identification of some composite entities because the boundary of the entity is not accurately defined. By studying the characteristics of composite entities, this paper proposed an ensemble convolution neural network (E-CNN) model combined with bidirectional long short-term memory network (BLSTM) and conditional random field (CRF). By setting the size of different convolution windows for the convolution layer in CNN, it captured richer boundary feature information between multiple words. Then it passed the integrated feature information to the BLSTM model for training, and finally obtained the final sequence annotation from the CRF model. The experimental results show that the proposed method has a good effect on the composite entity recognition in the clinical medical record text. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
37. Surface Similarity and Semantic Similarity in Generated Clinical Cases [Similarité surfacique et similarité sémantique dans des cas cliniques générés]
- Author
-
Hiebel, Nicolas, Ferret, Olivier, Fort, Karën, Névéol, Aurélie, Université Paris-Saclay, Centre National de la Recherche Scientifique (CNRS), Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Sciences et Technologies des Langues (STL), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Département Intelligence Ambiante et Systèmes Interactifs (DIASI), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Semantic Analysis of Natural Language (SEMAGRAMME), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique 
(CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Sorbonne Université (SU), and ANR-20-CE23-0026,CODEINE,Création éthique de données textuelles artificielles : Synthèse Automatique de documents Hospitaliers(2020)
- Subjects
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,French ,Synthetic Text ,Generation ,[INFO]Computer Science [cs] ,Génération ,Similarity ,Clinical Text ,Similarité ,Texte clinique ,Texte synthétique ,Français - Abstract
National audience; The restricted availability of clinical documents is an obstacle to natural language processing research in the medical domain. The clinical corpora that are relatively easy to access in French (E3C (Magnini et al., 2020), CAS (Grabar et al., 2018)) are not fully representative of the confidential documents found in hospitals. Sharing knowledge within the scientific community is difficult. No reproducibility is possible, nor are comparisons with other methods or data. One avenue for creating a shareable resource as a substitute for confidential data is to generate data similar to the private data. This could allow people with access to a private corpus to generate a freely distributable corpus from it. By sharing the generation method, it would also be possible to reproduce the experiment on other confidential data. Making the generated data available would then give the scientific community a ground for testing, comparison, discussion, and mutual support in biomedical NLP research. We propose here a method for evaluating generated clinical texts based on sentence embeddings.
- Published
- 2023
38. Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications
- Author
-
Dalianis, Hercules, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Paltoglou, Georgios, editor, Loizides, Fernando, editor, and Hansen, Preben, editor
- Published
- 2014
- Full Text
- View/download PDF
39. History of Terminology and Terminological Logics
- Author
-
Elkin, Peter L., Tuttle, Mark Samuel, and Elkin, Peter L., editor
- Published
- 2012
- Full Text
- View/download PDF
40. Natural Language Processing – The Basics
- Author
-
Pestian, John P., Deleger, Louise, Savova, Guergana K., Dexheimer, Judith W., Solti, Imre, and Hutton, John J., editor
- Published
- 2012
- Full Text
- View/download PDF
41. Medical Time Phrase Recognition Based on BLSTM Network [基于 BLSTM 网络的医学时间短语识别].
- Author
-
张顺利, 王应军, and 姬东鸿
- Subjects
*CONVOLUTIONAL neural networks , *RANDOM fields , *FEATURE extraction , *INFORMATION modeling , *MACHINE learning , *NATURAL language processing - Abstract
Recognizing time phrases from clinical text is a fundamental task for many applications in clinical NLP. Traditional methods based on rules and machine learning require the design of complex rules and feature extraction, and the serial method used by most systems may lead to error propagation. This paper proposed a novel neural network based on bidirectional long short-term memory (BLSTM) to identify clinical time expressions and their types simultaneously. Firstly, it combined character-level word embedding trained by convolutional neural network (CNN) with word embedding trained from large-scale biomedical corpus together as input to BLSTM. Then it utilized BLSTM to model context information of each word. Finally, it employed conditional random field (CRF) to optimize the output of BLSTM. This paper evaluated the model on Task 12 of SemEval-2016. It receives the best F1 value without requiring any handcrafted features or rules. Compared with the state-of-the-art systems in this task, the proposed model improves the F1 scores by 3%. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
42. Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model.
- Author
-
Bhate NJ, Mittal A, He Z, and Luo X
- Abstract
Demographics, social determinants of health, and family history documented in the unstructured text within the electronic health records are increasingly being studied to understand how this information can be utilized with the structured data to improve healthcare outcomes. After the GPT models were released, many studies have applied GPT models to extract this information from the narrative clinical notes. Different from the existing work, our research focuses on investigating zero-shot learning for extracting this information together by providing minimum information to the GPT model. We utilize de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information. Given that the GPT model might provide text different from the text in the original data, we explore two sets of evaluation metrics, including the traditional NER evaluation metrics and semantic similarity evaluation metrics, to completely understand the performance. Our results show that the GPT-3.5 method achieved an average of 0.975 F1 on demographics extraction, 0.615 F1 on social determinants extraction, and 0.722 F1 on family history extraction. We believe these results can be further improved through model fine-tuning or few-shot learning. Through the case studies, we also identified the limitations of the GPT models, which need to be addressed in future research.
- Published
- 2023
- Full Text
- View/download PDF
43. Natural Language Processing of Medical Reports
- Author
-
Taira, Ricky K., Bui, Alex A.T., editor, and Taira, Ricky K., editor
- Published
- 2010
- Full Text
- View/download PDF
44. Phenotero: Annotate as you write.
- Author
-
Hombach, Daniela, Schwarz, Jana M., Knierim, Ellen, Schuelke, Markus, Seelow, Dominik, and Köhler, Sebastian
- Subjects
*PHENOTYPES , *GENETIC disorders , *GENE ontology , *PATIENT education , *MOLECULAR diagnosis - Abstract
In clinical genetics, the Human Phenotype Ontology as well as disease ontologies are often used for deep phenotyping of patients and coding of clinical diagnoses. However, assigning ontology classes to patient descriptions is often disconnected from writing patient reports or manuscripts in word processing software. This additional workload and the requirement to install dedicated software may discourage usage of ontologies for parts of the target audience. Here we present Phenotero, a freely available and simple solution to annotate patient phenotypes and diseases at the time of writing clinical reports or manuscripts. We adopt Zotero, a citation management software, to create a tool that allows referencing classes from ontologies within text at the time of writing. We expect this approach to decrease the additional workload to a minimum while ensuring high quality associations with ontology classes. Standardized collection of phenotypic information at the time of describing the patient allows for streamlining the clinic workflow and efficient data entry. It will subsequently promote clinical and molecular diagnosis with the ultimate goal of better understanding genetic diseases. Thus, we believe that Phenotero eases the usage of ontologies and controlled vocabularies in the field of clinical genetics. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
45. Classifying medical relations in clinical text via convolutional neural networks.
- Author
-
He, Bin, Guan, Yi, and Dai, Rui
- Subjects
*ARTIFICIAL neural networks , *MEDICAL records , *CONSTRAINT satisfaction , *POOLINGS of interest , *ECONOMIC competition - Abstract
Deep learning research on relation classification has achieved solid performance in the general domain. This study proposes a convolutional neural network (CNN) architecture with a multi-pooling operation for medical relation classification on clinical records and explores a loss function with a category-level constraint matrix. Experiments using the 2010 i2b2/VA relation corpus demonstrate these models, which do not depend on any external features, outperform previous single-model methods and our best model is competitive with the existing ensemble-based method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
46. Animal Models of Diabetes
- Author
-
Haluzik, Martin, Reitman, Marc L., and Poretsky, Leonid, editor
- Published
- 2004
- Full Text
- View/download PDF
47. Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models
- Author
-
Lamproudis, Anastasios, Henriksson, Aron, and Dalianis, Hercules
- Abstract
Research has shown that using generic language models – specifically, BERT models – in specialized domains may be sub-optimal due to domain differences in language use and vocabulary. There are several techniques for developing domain-specific language models that leverage the use of existing generic language models, including continued and domain-adaptive pretraining with in-domain data. Here, we investigate a strategy based on using a domain-specific vocabulary, while leveraging a generic language model for initialization. The results demonstrate that domain-adaptive pretraining, in combination with a domain-specific vocabulary – as opposed to a general-domain vocabulary – yields improvements on two downstream clinical NLP tasks for Swedish. The results highlight the value of domain-adaptive pretraining when developing specialized language models and indicate that it is beneficial to adapt the vocabulary of the language model to the target domain prior to continued, domain-adaptive pretraining of a generic language model.
- Published
- 2022
- Full Text
- View/download PDF
48. Systematic Review: Use of information extracted from unstructured text in prognostic clinical prediction models
- Author
-
Seinen, Tom
- Subjects
Statistics and Probability ,clinical event prediction ,Databases and Information Systems ,systematic review ,Computer Sciences ,Physical Sciences and Mathematics ,Medicine and Health Sciences ,Health Information Technology ,Longitudinal Data Analysis and Time Series ,unstructured text ,natural language processing ,clinical text - Abstract
The aim of this review is to determine how and in what situations prognostic clinical event prediction models, using information extracted from unstructured text data, have been developed.
- Published
- 2022
- Full Text
- View/download PDF
49. CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives
- Author
-
Hiebel, Nicolas, Ferret, Olivier, Fort, Karën, Névéol, Aurélie, Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Information, Langue Ecrite et Signée (ILES), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Sciences et Technologies des Langues (STL), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Département Intelligence Ambiante et Systèmes Interactifs (DIASI), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Semantic Analysis of Natural Language (SEMAGRAMME), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National 
de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Sorbonne Université (SU), ANR-20-CE23-0026,CODEINE,Création éthique de données textuelles artificielles : Synthèse Automatique de documents Hospitaliers(2020), CEA, Contributeur MAP, and Création éthique de données textuelles artificielles : Synthèse Automatique de documents Hospitaliers - - CODEINE2020 - ANR-20-CE23-0026 - AAPG2020 - VALID
- Subjects
Similarité sémantique ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,Français Semantic Similarity ,Développement de corpus ,French ,[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing ,Corpus Development ,Semantic Similarity ,Clinical Text ,Texte clinique ,Français - Abstract
National audience; Natural Language Processing relies on the availability of annotated corpora for training and evaluating models. There are very few resources for semantic similarity in the clinical domain in French. Herein, we introduce a definition of similarity guided by clinical facts and apply it to the development of a new shared corpus of 1,000 sentence pairs manually annotated with similarity scores. We evaluate the corpus through experiments of automatic similarity measurement. We show that a model of sentence embeddings can capture similarity with state-of-the-art performance on the DEFT STS shared task data set (Spearman=0.8343). We also show that CLISTER is complementary to DEFT STS.
- Published
- 2022
50. Medical Text Classification Using Convolutional Neural Networks.
- Author
-
HUGHES, Mark, Irene LI, KOTOULAS, Spyros, and Toyotaro SUZUMURA
- Abstract
We present an approach to automatically classify clinical text at a sentence level. We are using deep convolutional neural networks to represent complex features. We train the network on a dataset providing a broad categorization of health information. Through a detailed evaluation, we demonstrate that our method outperforms several approaches widely used in natural language processing tasks by about 15%. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF