AltibbiVec: A Word Embedding Model for Medical and Health Applications in the Arabic Language
- Source :
- IEEE Access, Vol 9, Pp 133875-133888 (2021)
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
- Abstract :
- In recent years, the use of natural language processing (NLP) and machine learning (ML) techniques in clinical decision support systems has shown its ability to improve and automate the diagnosis process and to reduce potential clinical errors. NLP in Arabic is more intricate than in languages such as English because of several limitations, including a lack of datasets and analytical resources. Nevertheless, a clinical decision support system in the Arabic context is of significant importance. A fundamental step in NLP is extracting features from text-based data via text embedding. A word embedding is a numeric representation of words that encodes statistical, semantic, or contextual information. Building a neural word embedding model requires hundreds of thousands of data instances to find hidden patterns of relationships within sentences, and extracting relevant, informative features improves the performance of the learning algorithms. The objective of this paper is to propose an Arabic neural word embedding model for the medical and healthcare context, called "AltibbiVec". Around 1.5 million medical consultations and questions, written in different dialects, were obtained from the Altibbi telemedicine company and used to train the embedding model. Three embedding models are developed and compared: Word2Vec, fastText, and GloVe. The trained models were evaluated on several criteria, including word clustering and word similarity, as well as on a specialty-based question classification task. The results show that Word2Vec and fastText capture the semantics of text better than GloVe; hence, they are recommended for NLP-based healthcare applications.
- Subjects :
- Word embedding
General Computer Science
Computer science
Context (language use)
computer.software_genre
Semantics
Data modeling
General Materials Science
Word2vec
fastText
Context model
Arabic
business.industry
pre-trained
General Engineering
healthcare
word embedding
TK1-9971
Embedding
GloVe
Artificial intelligence
Electrical engineering. Electronics. Nuclear engineering
business
computer
Natural language processing
Word (computer architecture)
Details
- Language :
- English
- ISSN :
- 21693536
- Volume :
- 9
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....5e3437a7df5c471d1fc1f5312bf19e6e
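The abstract states that the trained models were evaluated partly by the similarity of words. In embedding models such as Word2Vec, fastText, and GloVe, the similarity of two words is commonly measured as the cosine of the angle between their vectors; this record does not say which measure the paper used, so treat that as an assumption. The minimal sketch below uses hypothetical 3-dimensional toy vectors (illustrative values, not taken from AltibbiVec) and hypothetical helper names `cosine_similarity` and `most_similar` to show how nearest-neighbour words are ranked.

```python
import math

# Hypothetical 3-dimensional toy embeddings with illustrative values only.
# Real models trained on ~1.5 million consultations would use many more dimensions.
EMBEDDINGS = {
    "headache": [0.9, 0.1, 0.0],
    "migraine": [0.85, 0.15, 0.05],
    "fever": [0.6, 0.5, 0.1],
    "insulin": [0.05, 0.1, 0.95],
    "diabetes": [0.1, 0.05, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=2):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for neighbour, score in most_similar("headache", EMBEDDINGS):
        print(f"{neighbour}: {score:.3f}")
```

With these toy values, "headache" ranks "migraine" first and "fever" second, while "insulin" and "diabetes" cluster together; a well-trained medical embedding is expected to show the same kind of grouping for related terms.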