Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation
- Source :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing. 23:1209-1220
- Publication Year :
- 2015
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2015.
Abstract
- Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, existing approaches to constructing larger LMs have two main drawbacks: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus on monolingual information from the target corpora only, and redundant n-grams have not been fully utilized in SMT. Recently, continuous-space language models (CSLMs), especially neural network language models (NNLMs), have shown great improvement in the accuracy of probability estimates for predicting target words. However, most of these CSLM and NNLM approaches still consider monolingual information only or require additional corpora. In this paper, we propose a novel neural-network-based bilingual LM growing method. Compared to the existing approaches, the proposed method enables us to use the bilingual parallel corpus for LM growing in SMT. The results show that our new method significantly outperforms the existing approaches in both SMT performance and computational efficiency.
- Subjects :
- Acoustics and Ultrasonics
Artificial neural network
Machine translation
business.industry
Computer science
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
Space (commercial competition)
computer.software_genre
Parallel corpora
Domain (software engineering)
Computational Mathematics
ComputingMethodologies_PATTERNRECOGNITION
Cache language model
Computer Science (miscellaneous)
Language model
Artificial intelligence
Electrical and Electronic Engineering
business
Focus (optics)
computer
Natural language processing
Details
- ISSN :
- 2329-9304 and 2329-9290
- Volume :
- 23
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Accession number :
- edsair.doi...........44527833f34c992b34399f64ba42bcc2
- Full Text :
- https://doi.org/10.1109/taslp.2015.2425220