1. Exploration of the optimal deep learning model for English-Japanese machine translation of medical device adverse event terminology
- Author
- Ayako Yagahara, Masahito Uesugi, and Hideto Yokoi
- Subjects
- Machine translation, Deep learning, Medical device adverse event terminology, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: In Japan, reporting of medical device malfunctions and related health problems is mandatory, and efforts are being made to standardize terminology through the Adverse Event Terminology Collection of the Japan Federation of Medical Device Associations (JFMDA). Internationally, the Adverse Event Terminology of the International Medical Device Regulators Forum (IMDRF-AET) provides a standardized terminology collection in English. Mapping between the JFMDA terminology collection and the IMDRF-AET is critical to international harmonization. However, translating the terminology collections from English to Japanese and reconciling them is done manually, which imposes a high human workload and risks inaccuracies.
Objective: To identify the optimal machine translation model for translating the IMDRF-AET into Japanese, as one component of an automatic terminology mapping system.
Methods: English-Japanese parallel data were obtained by randomly extracting 50 sentences from the terms and definitions of the IMDRF-AET published by the Ministry of Health, Labor and Welfare in Japan. The English sentences were fed into the following machine translation models to produce Japanese translations: mBART50, m2m-100, Google Translation, Multilingual T5, GPT-3, ChatGPT, and GPT-4. The translations were evaluated with the quantitative metrics BiLingual Evaluation Understudy (BLEU), Character Error Rate (CER), Word Error Rate (WER), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Bidirectional Encoder Representations from Transformers (BERT) score, and qualitatively by four experts.
Results: GPT-4 outperformed the other models in both the quantitative and qualitative evaluations. ChatGPT matched GPT-4 in the qualitative evaluation but had lower quantitative scores. The scores of the other models, including mBART50 and m2m-100, lagged behind, particularly in CER and BERT score.
Conclusion: GPT-4's superior performance in translating medical terminology indicates its potential utility in improving the efficiency of the terminology mapping system.
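As an illustration of two of the quantitative metrics named in the abstract, the following is a minimal sketch of CER and WER computed from Levenshtein edit distance. It is not the authors' evaluation code; the function names and the example sentences are hypothetical. Because Japanese is written without spaces, WER is assumed here to operate on text that has already been split into words by some tokenizer, which the record does not specify.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference_tokens, hypothesis_tokens):
    """Word Error Rate over pre-tokenized word lists (tokenizer is assumed)."""
    if not reference_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(reference_tokens, hypothesis_tokens) / len(reference_tokens)


# Hypothetical reference translation and model output
reference = "医療機器の故障"
hypothesis = "医療機器の破損"
print(cer(reference, hypothesis))
print(wer(["医療", "機器", "の", "故障"], ["医療", "機器", "の", "破損"]))
```

Lower values are better for both metrics, and a value above 1.0 is possible when the hypothesis is much longer than the reference.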
- Published
- 2025