Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary.

Authors :
Xu Y
Chen L
Wei J
Ananiadou S
Fan Y
Qian Y
Chang EI
Tsujii J
Source :
BMC bioinformatics [BMC Bioinformatics] 2015 May 09; Vol. 16, pp. 149. Date of Electronic Publication: 2015 May 09.
Publication Year :
2015

Abstract

Background: Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual medical lexicon of Chinese and English is needed to meet the demand for multi-lingual and multi-national treatment. We extract a bilingual lexicon from English and Chinese discharge summaries, starting from a small seed lexicon. The lexical terms fall into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a context-based label propagation (LP) method to extract candidate translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method that first obtains translation candidates for each component word of a Chinese MWT and then generates their combinations, from which the system selects a set of plausible translation candidates.

Results: We compare our LP method with a baseline method based on simple context similarity. The LP-based method outperforms the baseline, achieving 4.44% Acc1, 24.44% Acc10, and 62.22% Acc100, where AccN denotes top-N accuracy. For MWTs, the accuracy of the LP method drops to 5.41% Acc10 and 8.11% Acc20. Our experiments show that the method based on term alignment improves the performance for MWTs to 16.22% Acc10 and 27.03% Acc20.

Conclusions: We constructed a framework for building an English-Chinese term dictionary from discharge summaries in the two languages. Our experiments show that the LP-based method, augmented with the term alignment method, will help reduce the manual work required to compile a bilingual dictionary of clinical terms.
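
The two ideas the abstract relies on, top-N accuracy (AccN) and building MWT candidates from per-word translation candidates, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the cutoff `k`, and the toy data are hypothetical, and the sketch assumes component translations are joined in source order, whereas the published method also selects plausible candidates from the generated combinations.

```python
from itertools import product


def top_n_accuracy(ranked_candidates, gold, n):
    """Fraction of source terms whose gold translation is in the top n candidates.

    ranked_candidates: dict mapping a source term to its ranked candidate list.
    gold: dict mapping a source term to its reference translation.
    """
    if not ranked_candidates:
        return 0.0
    hits = sum(
        1
        for term, candidates in ranked_candidates.items()
        if gold.get(term) in candidates[:n]
    )
    return hits / len(ranked_candidates)


def compose_mwt_candidates(components, word_candidates, k=2):
    """Combine the top-k translations of each component word of a multi-word term.

    Assumption: translations are concatenated in source word order; a real
    system would also need to handle reordering and filter implausible output.
    """
    per_word = [word_candidates.get(word, [])[:k] for word in components]
    if any(not options for options in per_word):
        return []  # a component has no translation candidate at all
    return [" ".join(combo) for combo in product(*per_word)]


# Toy data, invented for illustration only.
word_candidates = {"胸": ["chest", "thoracic"], "痛": ["pain", "ache"]}
mwt_candidates = compose_mwt_candidates(["胸", "痛"], word_candidates)
```

With the toy data above, `mwt_candidates` holds the four combinations of the two candidate lists; scoring a ranked version of such a list against a reference lexicon with `top_n_accuracy` yields figures of the same kind as the Acc10 and Acc20 values reported in the abstract.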

Details

Language :
English
ISSN :
1471-2105
Volume :
16
Database :
MEDLINE
Journal :
BMC bioinformatics
Publication Type :
Academic Journal
Accession number :
25956056
Full Text :
https://doi.org/10.1186/s12859-015-0606-0