112 results for "Masao Utiyama"
Search Results
2. Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles
- Author
-
Haojie Yu, Masao Utiyama, Zhuosheng Zhang, and Hai Zhao
- Subjects
Word embedding, Acoustics and Ultrasonics, Machine translation, Computer science, Generalization, Natural language understanding, Context (language use), Multimodality, Computational Mathematics, Computer Science (miscellaneous), Language model, Artificial intelligence, Electrical and Electronic Engineering, Natural language processing, Word (computer architecture) - Abstract
Recent pre-trained language models (PrLMs) offer a new, performant way to obtain contextualized word representations by modeling sequence-level context. Although PrLMs generally provide more effective contextualized word representations than non-contextualized models, they rely on textual context alone, without the diverse cues that multimodal signals can provide. This paper thus proposes a visual representation method that explicitly enhances conventional word embeddings with multiple-aspect senses drawn from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset in which each word corresponds to diverse related images. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and generalization capability of the proposed approach. Analysis shows that our method with visual guidance pays more attention to content words, improves representation diversity, and is potentially beneficial for improving disambiguation accuracy.
- Published
- 2022
- Full Text
- View/download PDF
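A minimal sketch of the general idea in entry 2, fusing a word's text embedding with features of its related images. The dictionary structure, the assumption that image features are already projected into the text embedding space, and the interpolation weight `alpha` are illustrative choices, not details from the paper:

```python
import numpy as np

def visually_enhanced_embedding(word, text_emb, word_image_feats, alpha=0.5):
    """Fuse a word's text embedding with the mean of its image features.

    word_image_feats: dict mapping a word to a list of image feature vectors
    (e.g., CNN features of the diverse images linked to that word), assumed
    to be projected to the same dimension as text_emb.
    alpha: interpolation weight (an illustrative choice, not from the paper).
    """
    feats = word_image_feats.get(word)
    if not feats:                       # no visual entry: back off to text only
        return text_emb
    visual = np.mean(np.stack(feats), axis=0)
    visual = visual / (np.linalg.norm(visual) + 1e-8)  # normalize the scale
    return (1 - alpha) * text_emb + alpha * visual
```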
3. Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion
- Author
-
Hour Kaing, Sethserey Sam, Katsuhito Sudoh, Chenchen Ding, Eiichiro Sumita, Satoshi Nakamura, Masao Utiyama, and Sopheap Seng
- Subjects
Analytic language, General Computer Science, Part-of-speech tagging, Computer science, Lexical analysis, Artificial intelligence, Natural language processing - Abstract
As a highly analytic language, Khmer exhibits considerable ambiguity in tokenization and part-of-speech (POS) tagging. This study investigates this topic. Specifically, a 20,000-sentence Khmer corpus with manual tokenization and POS-tagging annotation is released after a series of work over the last four years. This is the largest morphologically annotated Khmer dataset as of 2020, when this article was prepared. Based on the annotated data, experiments were conducted to establish a comprehensive benchmark for the automatic tokenization and POS-tagging of Khmer. Specifically, a support vector machine, a conditional random field (CRF), a long short-term memory (LSTM)-based recurrent neural network, and an integrated LSTM-CRF model are investigated and discussed. As a primary conclusion, processing at the morpheme level is satisfactory for the provided data. However, it is intrinsically difficult to identify further grammatical constituents of compounds or phrases because of the complex analytic features of the language. Syntactic annotation and automatic parsing for Khmer are scheduled for the near future.
- Published
- 2021
- Full Text
- View/download PDF
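Entry 3 benchmarks CRF and LSTM sequence labelers for joint Khmer tokenization and POS tagging. A minimal character-level CRF sketch of that setup, using the third-party `sklearn_crfsuite` package and an assumed joint label scheme such as `B-n`/`I-n` (token-initial/token-internal, plus POS), which is one standard reduction of the joint task:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def char_features(sent, i):
    """Simple context-window features for character i of a sentence string."""
    return {
        "char": sent[i],
        "prev": sent[i - 1] if i > 0 else "<s>",
        "next": sent[i + 1] if i < len(sent) - 1 else "</s>",
    }

def to_features(sentences):
    return [[char_features(s, i) for i in range(len(s))] for s in sentences]

# Labels combine a boundary marker and a POS tag, e.g. "B-n" starts a noun
# token and "I-n" continues it, so one tagger does both tasks jointly.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
# crf.fit(to_features(train_sents), train_labels)
# predictions = crf.predict(to_features(test_sents))
```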
4. Context-aware positional representation for self-attention networks
- Author
-
Masao Utiyama, Rui Wang, Eiichiro Sumita, and Kehai Chen
- Subjects
Machine translation, Computer science, Cognitive Neuroscience, Representation (systemics), Context (language use), Translation (geometry), Computer Science Applications, Artificial Intelligence, Position (vector), Sentence, Natural language processing, Word (computer architecture), Transformer (machine learning model) - Abstract
In self-attention networks (SANs), positional embeddings are used to model order dependencies between words in the input sentence; they are added to word embeddings to form an input representation, which enables the SAN-based neural model to perform (multi-head) and stack (multi-layer) self-attentive functions in parallel to learn the representation of the input sentence. However, this input representation involves only static order dependencies based on the discrete position indexes of words; that is, it is independent of context information, which may make it weak at modeling the input sentence. To address this issue, we propose a novel positional representation method that models order dependencies based on n-gram context or sentence context in the input sentence, allowing SANs to learn a more effective sentence representation. To validate the effectiveness of the proposed method, we apply it to a neural machine translation model, a typical SAN-based neural model. Experimental results on two widely used translation tasks, WMT14 English-to-German and WMT17 Chinese-to-English, show that the proposed approach significantly improves translation performance over the strong Transformer baseline.
- Published
- 2021
- Full Text
- View/download PDF
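Entry 4 replaces fixed position indexes with context-dependent positional representations. A PyTorch sketch of one plausible reading, deriving a "positional" signal from n-gram context with a convolution; the exact parameterization in the paper may well differ:

```python
import torch
import torch.nn as nn

class ContextAwarePosition(nn.Module):
    """Derive positional representations from n-gram context instead of
    fixed position indices (an illustrative reading of the abstract, not
    the authors' exact model)."""

    def __init__(self, d_model, ngram=3):
        super().__init__()
        # Odd n-gram sizes with this padding preserve sequence length.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=ngram,
                              padding=ngram // 2)

    def forward(self, word_emb):            # word_emb: (batch, seq_len, d_model)
        ctx = self.conv(word_emb.transpose(1, 2)).transpose(1, 2)
        return word_emb + ctx                # context-dependent "position" signal
```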
5. Unsupervised Neural Machine Translation for Similar and Distant Language Pairs
- Author
-
Benjamin Marie, Haipeng Sun, Rui Wang, Masao Utiyama, Eiichiro Sumita, Kehai Chen, and Tiejun Zhao
- Subjects
General Computer Science, Machine translation, Computer science, Translation (geometry), German, Empirical research, Simple (abstract algebra), Convergence (routing), Quality (business), Artificial intelligence, Natural language processing, Word (computer architecture) - Abstract
Unsupervised neural machine translation (UNMT) has achieved remarkable results for several language pairs, such as French–English and German–English. Most previous studies have focused on modeling UNMT systems; few studies have investigated the effect of UNMT on specific languages. In this article, we first empirically investigate UNMT for four diverse language pairs (French/German/Chinese/Japanese–English). We confirm that the performance of UNMT in translation tasks for similar language pairs (French/German–English) is dramatically better than for distant language pairs (Chinese/Japanese–English). We empirically show that the lack of shared words and different word orderings are the main reasons that lead UNMT to underperform in Chinese/Japanese–English. Based on these findings, we propose several methods, including artificial shared words and pre-ordering, to improve the performance of UNMT for distant language pairs. Moreover, we propose a simple general method to improve translation performance for all these four language pairs. The existing UNMT model can generate a translation of a reasonable quality after a few training epochs owing to a denoising mechanism and shared latent representations. However, learning shared latent representations restricts the performance of translation in both directions, particularly for distant language pairs, while denoising dramatically delays convergence by continuously modifying the training data. To avoid these problems, we propose a simple, yet effective and efficient, approach that (like UNMT) relies solely on monolingual corpora: pseudo-data-based unsupervised neural machine translation. Experimental results for these four language pairs show that our proposed methods significantly outperform UNMT baselines.
- Published
- 2021
- Full Text
- View/download PDF
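Entry 5 proposes, among other things, pseudo-data-based UNMT built solely from monolingual corpora. A heavily simplified sketch of one way to build pseudo-parallel data from a bilingual lexicon (e.g., induced from cross-lingual embeddings); the `lexicon` and the copy-through fallback for unknown words are illustrative assumptions, not the paper's exact procedure:

```python
def make_pseudo_parallel(mono_sents, lexicon):
    """Build pseudo-parallel pairs by word-for-word translation of
    monolingual target sentences through a bilingual lexicon.
    Unknown words are copied through unchanged."""
    pairs = []
    for sent in mono_sents:
        pseudo = [lexicon.get(tok, tok) for tok in sent.split()]
        pairs.append((" ".join(pseudo), sent))  # (pseudo source, real target)
    return pairs

# pairs = make_pseudo_parallel(target_monolingual, tgt2src_lexicon)
# ...then train a standard NMT model on `pairs` as a warm start.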
6. Constituency Parsing by Cross-Lingual Delexicalization
- Author
-
Masao Utiyama, Satoshi Nakamura, Eiichiro Sumita, Chenchen Ding, Katsuhito Sudoh, and Hour Kaing
- Subjects
Structure (mathematical logic), Cross-lingual, Parsing, Dependency (UML), General Computer Science, Computer science, General Engineering, Syntactic parsing, Margin (machine learning), Selection (linguistics), General Materials Science, Syntactic structure, Artificial intelligence, Natural language processing, Electrical and Electronic Engineering, Delexicalization - Abstract
Cross-lingual transfer is an important technique for low-resource language processing. At present, most research on syntactic parsing focuses on dependency structures. This work investigates cross-lingual parsing of another important type of syntactic structure, the constituency structure. We propose a delexicalized approach, in which part-of-speech sequences of rich-resource languages are used to train cross-lingual models to parse low-resource languages. We also investigate measurements for selecting proper rich-resource languages for specific low-resource languages. The experiments show that the delexicalized approach outperforms state-of-the-art unsupervised models on six languages by margins of 4.2 to 37.0 sentence-level F1 points. Based on the experimental results, the limitations and future work of the delexicalized approach are discussed.
- Published
- 2021
- Full Text
- View/download PDF
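Entry 6's delexicalization amounts to replacing words with POS tags so that a parser trained on a rich-resource treebank transfers through the shared tag vocabulary. A minimal sketch; the `parser` object in the usage comments is hypothetical:

```python
def delexicalize(sentences):
    """Map each (word, POS) sequence to its POS sequence so a constituency
    parser trained on a rich-resource treebank can be applied to a
    low-resource language sharing the same POS inventory.
    `sentences` is assumed to be a list of lists of (word, pos) pairs."""
    return [[pos for _word, pos in sent] for sent in sentences]

# parser.train(delexicalize(rich_resource_sents), rich_resource_trees)
# trees = parser.parse(delexicalize(low_resource_pos_tagged_sents))
```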
7. Bilingual Subword Segmentation for Neural Machine Translation
- Author
-
Masao Utiyama, Eiichiro Sumita, Akihiro Tamura, Takashi Ninomiya, and Hiroyuki Deguchi
- Subjects
Machine translation, Computer science, Translation (geometry), Segmentation, Artificial intelligence, Sentence, Natural language processing, Transformer (machine learning model) - Abstract
This paper proposes a new subword segmentation method for neural machine translation, “Bilingual Subword Segmentation,” which tokenizes sentences so as to minimize the difference between the number of subword units in a sentence and that of its translation. While existing subword segmentation methods tokenize a sentence without considering its translation, the proposed method tokenizes a sentence using subword units induced from bilingual sentences; this can be more favorable to machine translation. Evaluations on the WAT Asian Scientific Paper Excerpt Corpus (ASPEC) English-to-Japanese and Japanese-to-English translation tasks and the WMT14 English-to-German and German-to-English translation tasks show that our bilingual subword segmentation improves the performance of Transformer neural machine translation (by up to +0.81 BLEU).
- Published
- 2021
- Full Text
- View/download PDF
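Entry 7's stated objective is, roughly, segmentations whose source and target subword counts match. The sketch below only approximates that objective by sampling candidate segmentations with SentencePiece's subword regularization and keeping the closest pair; the paper instead induces subword units from bilingual sentences, and the model file names are assumptions:

```python
import sentencepiece as spm

sp_src = spm.SentencePieceProcessor(model_file="src.model")  # assumed models
sp_tgt = spm.SentencePieceProcessor(model_file="tgt.model")

def bilingual_segment(src, tgt, n_samples=16):
    """Among sampled segmentations, pick the pair whose subword counts
    differ least, approximating the 'matched number of subword units'
    objective. Sampling uses SentencePiece subword regularization."""
    best, best_gap = None, float("inf")
    for _ in range(n_samples):
        s = sp_src.encode(src, out_type=str, enable_sampling=True,
                          alpha=0.1, nbest_size=-1)
        t = sp_tgt.encode(tgt, out_type=str, enable_sampling=True,
                          alpha=0.1, nbest_size=-1)
        gap = abs(len(s) - len(t))
        if gap < best_gap:
            best, best_gap = (s, t), gap
    return best
```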
8. Extremely low-resource neural machine translation for Asian languages
- Author
-
Masao Utiyama, Raj Dabre, Atsushi Fujita, Eiichiro Sumita, Raphael Rubino, and Benjamin Marie
- Subjects
Linguistics and Language, Machine translation, Computer science, Translation (geometry), Language and Linguistics, Synthetic data, Set (abstract data type), Artificial Intelligence, Quality (business), Data processing, Noise (video), Computational linguistics, Software, Natural language processing - Abstract
This paper presents a set of effective approaches for handling extremely low-resource language pairs in self-attention-based neural machine translation (NMT), focusing on English and four Asian languages. Starting from an initial set of parallel sentences used to train bilingual baseline models, we introduce additional monolingual corpora and data processing techniques to improve translation quality. We describe a series of best practices and empirically validate the methods through an evaluation conducted on eight translation directions, based on state-of-the-art NMT approaches such as hyper-parameter search, data augmentation with forward and backward translation in combination with tags and noise, and joint multilingual training. Experiments show that the commonly used default architecture of self-attention NMT models does not reach the best results, validating previous work on the importance of hyper-parameter tuning. Additionally, empirical results indicate how much synthetic data is required to efficiently increase model capacity while reaching the best translation quality measured by automatic metrics. We show that the best NMT models, trained on large amounts of tagged back-translations, outperform three other synthetic data generation approaches. Finally, a comparison with statistical machine translation (SMT) indicates that extremely low-resource NMT requires a large amount of synthetic parallel data obtained with back-translation to close the performance gap with the preceding SMT approach.
- Published
- 2020
- Full Text
- View/download PDF
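Entry 8 finds tagged back-translation particularly effective. A minimal sketch of the tagging step only; the `<BT>` token name is an illustrative choice and must be added to the source vocabulary:

```python
def tag_back_translations(synthetic_pairs, tag="<BT>"):
    """Prepend a reserved token to the source side of back-translated
    sentence pairs so the model can distinguish synthetic data from
    genuine parallel data, as in tagged back-translation."""
    return [(f"{tag} {src}", tgt) for src, tgt in synthetic_pairs]

# train_data = genuine_pairs + tag_back_translations(bt_pairs)
```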
9. Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation
- Author
-
Tiejun Zhao, Masao Utiyama, Eiichiro Sumita, Akihiro Tamura, and Chunpeng Ma
- Subjects
Machine translation, Computer science, Speech recognition, Deep learning, Axiomatic system, Encoder decoder, Artificial intelligence, Word (computer architecture) - Published
- 2020
- Full Text
- View/download PDF
10. Improving neural machine translation through phrase-based soft forced decoding
- Author
-
Graham Neubig, Satoshi Nakamura, Masao Utiyama, Eiichiro Sumita, and Jingyi Zhang
- Subjects
Linguistics and Language, Phrase, Machine translation, Computer science, Speech recognition, Translation (geometry), Language and Linguistics, Artificial Intelligence, Translation rule, Path (graph theory), Table (database), Computational linguistics, Software, Decoding methods - Abstract
Compared to traditional statistical machine translation (SMT), such as phrase-based machine translation (PBMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of PBMT is limited by the phrase-based translation rule table. We propose a phrase-based soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the phrase-based decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
- Published
- 2020
- Full Text
- View/download PDF
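Entry 10 reranks NMT n-best lists with a phrase-based decoding cost. A minimal sketch of the reranking step only (the soft forced decoding that produces the cost is the paper's core contribution and is not reproduced here); the interpolation `weight` is an assumed tunable that would normally be set on a development set:

```python
def rerank_nbest(nbest, pbmt_cost, weight=0.5):
    """Rerank NMT n-best outputs by interpolating the NMT model score
    with a phrase-based (soft forced) decoding cost.

    nbest: list of (hypothesis, nmt_score), higher scores better.
    pbmt_cost: callable returning a decoding cost, lower is better.
    """
    scored = [(hyp, nmt - weight * pbmt_cost(hyp)) for hyp, nmt in nbest]
    return max(scored, key=lambda pair: pair[1])[0]
```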
11. A Burmese (Myanmar) Treebank
- Author
-
Masao Utiyama, Win Pa Pa, Eiichiro Sumita, Khin Mar Soe, Sann Su Su Yee, and Chenchen Ding
- Subjects
General Computer Science, Computer science, Treebank, Phrase structure rules, Languages of Asia, Guideline, Burmese, Annotation, Component (UML), Artificial intelligence, Sentence, Natural language processing - Abstract
A 20,000-sentence Burmese (Myanmar) treebank of news articles has been released under a CC BY-NC-SA license. Complete phrase structure annotation was developed for each sentence from the morphologically annotated data prepared in the previous work of Ding et al. [1]. As the final result of the Burmese component in the Asian Language Treebank Project, this is the first large-scale, open-access treebank for the Burmese language. The annotation details and features of this treebank are presented.
- Published
- 2020
- Full Text
- View/download PDF
12. Towards More Diverse Input Representation for Neural Machine Translation
- Author
-
Masao Utiyama, Muyun Yang, Hai Zhao, Tiejun Zhao, Eiichiro Sumita, Rui Wang, and Kehai Chen
- Subjects
Word embedding, Acoustics and Ultrasonics, Machine translation, Computer science, Pattern recognition, Speech processing, ENCODE, Computational Mathematics, Computer Science (miscellaneous), Embedding, NIST, Artificial intelligence, Electrical and Electronic Engineering, Performance improvement, Transformer (machine learning model) - Abstract
Source input information plays a very important role in Transformer-based translation systems. In practice, the word embedding and positional embedding of each word are summed to form the input representation. Self-attention networks then encode the global dependencies in this input representation to generate a source representation. However, this processing adopts only a single source feature and excludes richer and more diverse features such as recurrence features, local features, and syntactic features, which yields a less expressive representation and thereby hinders further improvement in translation performance. In this paper, we introduce a simple and efficient method to encode more diverse source features into the input representation simultaneously, thereby learning an effective source representation through self-attention networks. In particular, the proposed grouped strategy is applied only to the input representation layer, preserving both the diversity of translation information and the efficiency of the self-attention networks. Experimental results show that our approach improves translation performance over state-of-the-art Transformer baselines on the WMT14 English-to-German and NIST Chinese-to-English machine translation tasks.
- Published
- 2020
- Full Text
- View/download PDF
13. Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement
- Author
-
Tiejun Zhao, Haipeng Sun, Rui Wang, Eiichiro Sumita, Masao Utiyama, and Kehai Chen
- Subjects
Structure (mathematical logic), Word embedding, Acoustics and Ultrasonics, Machine translation, Computer science, Noise reduction, Speech recognition, Initialization, Computational Mathematics, Computer Science (miscellaneous), Language model, Electrical and Electronic Engineering, Layer (object-oriented design), Encoder - Abstract
Unsupervised cross-lingual language representation initialization methods such as unsupervised bilingual word embedding (UBWE) pre-training and cross-lingual masked language model (CMLM) pre-training, together with mechanisms such as denoising and back-translation, have advanced unsupervised neural machine translation (UNMT), which has achieved impressive results on several language pairs, particularly French-English and German-English. Typically, UBWE focuses on initializing the word embedding layer in the encoder and decoder of UNMT, whereas the CMLM focuses on initializing the entire encoder and decoder of UNMT. However, UBWE/CMLM training and UNMT training are independent, which makes it difficult to assess how the quality of UBWE/CMLM affects the performance of UNMT during UNMT training. In this paper, we first empirically explore relationships between UNMT and UBWE/CMLM. The empirical results demonstrate that the performance of UBWE and CMLM has a significant influence on the performance of UNMT. Motivated by this, we propose a novel UNMT structure with cross-lingual language representation agreement to capture the interaction between UBWE/CMLM and UNMT during UNMT training. Experimental results on several language pairs demonstrate that the proposed UNMT models improve significantly over the corresponding state-of-the-art UNMT baselines.
- Published
- 2020
- Full Text
- View/download PDF
14. Neural Machine Translation With Sentence-Level Topic Context
- Author
-
Masao Utiyama, Eiichiro Sumita, Rui Wang, Tiejun Zhao, and Kehai Chen
- Subjects
Acoustics and Ultrasonics, Machine translation, Computer science, Context (language use), Translation (geometry), Convolutional neural network, Computational Mathematics, Computer Science (miscellaneous), Artificial intelligence, Electrical and Electronic Engineering, Language translation, Baseline (configuration management), Natural language processing, Sentence - Abstract
Traditional neural machine translation (NMT) methods use word-level context to predict target-language translations while neglecting sentence-level context, which has been shown to benefit translation prediction in statistical machine translation. This paper represents the sentence-level context as latent topic representations using a convolutional neural network, and designs a topic attention mechanism to integrate source sentence-level topic context into both attention-based and Transformer-based NMT. In particular, our method can improve the performance of NMT by modeling source topics and translations jointly. Experiments on the large-scale LDC Chinese-to-English translation tasks and the WMT’14 English-to-German translation tasks show that the proposed approach achieves significant improvements over baseline systems.
- Published
- 2019
- Full Text
- View/download PDF
15. Towards Burmese (Myanmar) Morphological Analysis
- Author
-
Chenchen Ding, Masao Utiyama, Khin Mar Soe, Khin Thandar Nwet, Win Pa Pa, Eiichiro Sumita, and Hnin Thu Zar Aye
- Subjects
Conditional random field, General Computer Science, Computer science, Lexical analysis, Treebank, Languages of Asia, Thesaurus, Burmese, Recurrent neural network, Artificial intelligence, Syllable, Natural language processing - Abstract
This article presents a comprehensive study of two primary tasks in Burmese (Myanmar) morphological analysis: tokenization and part-of-speech (POS) tagging. Twenty thousand Burmese newswire sentences are annotated with two-layer tokenization and POS-tagging information, as one component of the Asian Language Treebank Project. The annotated corpus has been released under a CC BY-NC-SA license; it was the largest open-access database of annotated Burmese when this manuscript was prepared in 2017. Detailed descriptions of the preparation, refinement, and features of the annotated corpus are provided in the first half of the article. Facilitated by the annotated corpus, experiment-based investigations are presented in the second half, in which the standard sequence-labeling approach of conditional random fields and a long short-term memory (LSTM)-based recurrent neural network (RNN) are applied and discussed. We reach several general conclusions, covering the effect of joint tokenization and POS-tagging and the importance of ensembling for stabilizing the performance of the LSTM-based RNN. This study provides a solid basis for further studies on Burmese processing.
- Published
- 2019
- Full Text
- View/download PDF
16. Text Compression-Aided Transformer Encoding
- Author
-
Masao Utiyama, Eiichiro Sumita, Kehai Chen, Rui Wang, Zhuosheng Zhang, Hai Zhao, and Zuchao Li
- Subjects
Downstream (software development), Computer science, Machine learning, Text mining, Artificial Intelligence, Encoding (memory), Transformer (machine learning model), Applied Mathematics, Computational Theory and Mathematics, Benchmark (computing), Computer Vision and Pattern Recognition, Encoder, Computation and Language (cs.CL), Software, Text compression, Meaning (linguistics) - Abstract
Text encoding is one of the most important steps in Natural Language Processing (NLP). It is handled well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought significant improvements in the performance of many NLP tasks. Though the Transformer encoder may effectively capture general information in its resulting representations, the backbone information, meaning the gist of the input text, is not specifically focused on. In this paper, we propose explicit and implicit text compression approaches to enhance Transformer encoding, and we evaluate models using these approaches on several typical downstream tasks that rely heavily on the encoding. Our explicit text compression approaches use dedicated models to compress text, while our implicit text compression approach simply adds an additional module to the main model to handle text compression. We propose three ways of integration, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the backbone information into Transformer-based models for various downstream tasks. Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results over strong baselines. We therefore conclude that, compared with the baseline encodings, text compression helps the encoders learn better language representations.
- Published
- 2021
17. User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization
- Author
-
Eiichiro Sumita, Shohei Higashiyama, Taro Watanabe, and Masao Utiyama
- Subjects
Normalization (statistics), Text corpus, Computer science, Morphological analysis, Benchmark (computing), Artificial intelligence, Computation and Language (cs.CL), Natural language processing - Abstract
Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrate the low performance of existing MA/LN methods on non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.
- Published
- 2021
- Full Text
- View/download PDF
18. NICT’s Neural Machine Translation Systems for the WAT21 Restricted Translation Task
- Author
-
Eiichiro Sumita, Masao Utiyama, Hai Zhao, and Zuchao Li
- Subjects
Vocabulary, Backbone network, Machine translation, Computer science, Inference, Machine learning, Translation (geometry), Task (project management), Artificial intelligence, Decoding methods, Transformer (machine learning model) - Abstract
This paper describes our system (Team ID: nictrb) for participating in the WAT’21 restricted machine translation task. In our submitted system, we designed a new training approach for restricted machine translation. By sampling from the translation target, we can solve the problem that ordinary training data does not have a restricted vocabulary. With the further help of constrained decoding in the inference phase, we achieved better results than the baseline, confirming the effectiveness of our solution. In addition, we also tried the vanilla and sparse Transformer as the backbone network of the model, as well as model ensembling, which further improved the final translation performance.
- Published
- 2021
- Full Text
- View/download PDF
19. Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios
- Author
-
Eiichiro Sumita, Haipeng Sun, Tiejun Zhao, Rui Wang, Kehai Chen, and Masao Utiyama
- Subjects
Training set, Machine translation, Computer science, Training (meteorology), Translation (geometry), Estonian, Artificial intelligence, Self-training, Computation and Language (cs.CL), Natural language processing - Abstract
Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian, and UNMT systems usually perform poorly when there is no adequate training corpus for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in this case. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.
- Published
- 2020
20. Reference Language based Unsupervised Neural Machine Translation
- Author
-
Rui Wang, Eiichiro Sumita, Hai Zhao, Zuchao Li, and Masao Utiyama
- Subjects
Machine translation, Computer science, Supervised learning, SIGNAL (programming language), Translation (geometry), Agreement, Pivot language, Subject (grammar), Artificial intelligence, Computation and Language (cs.CL), Natural language processing - Abstract
Exploiting a common language as an auxiliary has a long tradition in machine translation: in the absence of a source-to-target parallel corpus, supervised machine translation can benefit from a well-resourced pivot language. The rise of unsupervised neural machine translation (UNMT) almost completely lifts the parallel-corpus curse, though UNMT still suffers from unsatisfactory performance because of the vagueness of the clues available for its core back-translation training. Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language shares a parallel corpus only with the source; this corpus still provides a signal clear enough to aid the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over a strong baseline that uses only one auxiliary language, demonstrating the usefulness of the proposed reference language-based UNMT and establishing a good starting point for the community.
- Published
- 2020
21. Modeling Future Cost for Neural Machine Translation
- Author
-
Kehai Chen, Tiejun Zhao, Eiichiro Sumita, Rui Wang, Chaoqun Duan, Conghui Zhu, and Masao Utiyama
- Subjects
Context model, Acoustics and Ultrasonics, Machine translation, Artificial neural network, Computer science, Machine learning, Speech processing, Computational Mathematics, Computer Science (miscellaneous), Artificial intelligence, Electrical and Electronic Engineering, Representation (mathematics), Computation and Language (cs.CL), Word (computer architecture), Decoding methods, Transformer (machine learning model) - Abstract
Existing neural machine translation (NMT) systems use sequence-to-sequence neural networks to generate the target translation word by word, and then make the generated word at each time-step as consistent as possible with its counterpart in the references. However, the trained translation model tends to focus on ensuring the accuracy of the generated target word at the current time-step and does not consider its future cost, that is, the expected cost of generating the subsequent target translation (i.e., the next target word). To address this issue, in this article we propose a simple and effective method to model the future cost of each target word for NMT systems. In detail, a future cost representation is learned from the currently generated target word and its contextual information, and is used to compute an additional loss that guides the training of the NMT model. Furthermore, the learned future cost representation at the current time-step is used to help generate the next target word during decoding. Experimental results on three widely used translation datasets, WMT14 English-to-German, WMT14 English-to-French, and WMT17 Chinese-to-English, show that the proposed approach achieves significant improvements over a strong Transformer-based NMT baseline.
- Published
- 2020
22. NOVA
- Author
-
Masao Utiyama, Chenchen Ding, and Eiichiro Sumita
- Subjects
Flexibility (engineering), General Computer Science, Relation (database), Computer science, Lexical analysis, Southeast Asian, Burmese, Annotation, Nova (rocket), Artificial intelligence, Natural language processing, Word (computer architecture) - Abstract
A feasible and flexible annotation system is designed for joint tokenization and part-of-speech (POS) tagging to annotate languages without a natural definition of words. This design was motivated by the fact that word separators are not used in many highly analytic East and Southeast Asian languages. Although several of these languages are well studied, e.g., Chinese and Japanese, many are understudied and low-resource, e.g., Burmese (Myanmar) and Khmer. In the first part of the article, the proposed annotation system, named nova, is introduced. nova contains only four basic tags (n, v, a, and o); these tags can be further modified and combined to accommodate complex linguistic phenomena in tokenization and POS tagging. In the second part of the article, the feasibility and flexibility of nova are illustrated through annotation practice on Burmese and Khmer. The relation between nova and two universal POS tagsets is discussed in the final part of the article.
- Published
- 2018
- Full Text
- View/download PDF
23. Sentence Selection and Weighting for Neural Machine Translation Domain Adaptation
- Author
-
Rui Wang, Kehai Chen, Andrew Finch, Lemao Liu, Eiichiro Sumita, and Masao Utiyama
- Subjects
Phrase, Acoustics and Ultrasonics, Machine translation, Computer science, Speech processing, Domain (software engineering), Weighting, Computational Mathematics, Computer Science (miscellaneous), Selection (linguistics), Task analysis, Artificial intelligence, Electrical and Electronic Engineering, Sentence, Natural language processing - Abstract
Neural machine translation (NMT) has been prominent in many machine translation tasks. However, in some domain-specific tasks, only corpora from similar domains can improve translation performance; if out-of-domain corpora are added directly to the in-domain corpus, translation performance may even degrade. Therefore, domain adaptation techniques are essential for solving the NMT domain problem. Most existing methods for domain adaptation were designed for conventional phrase-based machine translation; for NMT domain adaptation, there have been only a few studies on topics such as fine-tuning, domain tags, and domain features. In this paper, we pursue four goals for sentence-level NMT domain adaptation. First, the NMT's internal sentence embedding is exploited, and the sentence embedding similarity is used to select out-of-domain sentences that are close to the in-domain corpus. Second, we propose three sentence weighting methods, i.e., sentence weighting, domain weighting, and batch weighting, to balance the data distribution during NMT training. Third, we propose dynamic training methods to adjust the sentence selection and weighting during NMT training. Fourth, to solve the multidomain problem in real-world NMT scenarios, where the domain distributions of training and testing data often mismatch, we propose a multidomain sentence weighting method to balance the domain distributions of the training data and to match the domain distributions of the training and testing data. The proposed methods are evaluated on the International Workshop on Spoken Language Translation (IWSLT) English-to-French/German tasks and a multidomain English-to-French task. Empirical results show that the sentence selection and weighting methods significantly improve NMT performance, outperforming the existing baselines.
- Published
- 2018
- Full Text
- View/download PDF
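Entry 23's first step scores out-of-domain sentences by the similarity between their embeddings and the in-domain data. A minimal numpy sketch of centroid-based selection; taking the sentence embedding from the NMT encoder and using a top-k cutoff are assumptions consistent with, but not specified by, the abstract:

```python
import numpy as np

def select_by_similarity(in_domain_embs, out_domain, top_k):
    """Rank out-of-domain sentences by cosine similarity between their
    embeddings and the in-domain centroid, keeping the top_k closest.

    in_domain_embs: array of in-domain sentence embeddings.
    out_domain: list of (sentence, embedding) pairs.
    """
    centroid = np.mean(in_domain_embs, axis=0)
    centroid = centroid / np.linalg.norm(centroid)

    def cosine(vec):
        return float(vec @ centroid) / (np.linalg.norm(vec) + 1e-8)

    ranked = sorted(out_domain, key=lambda pair: cosine(pair[1]), reverse=True)
    return ranked[:top_k]
```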
24. A Neural Approach to Source Dependence Based Context Model for Statistical Machine Translation
- Author
-
Rui Wang, Tiejun Zhao, Muyun Yang, Lemao Liu, Kehai Chen, Masao Utiyama, Akihiro Tamura, and Eiichiro Sumita
- Subjects
Phrase, Acoustics and Ultrasonics, Machine translation, Computer science, Context (language use), Translation (geometry), Computer Science (miscellaneous), Electrical and Electronic Engineering, Language translation, Context model, Artificial neural network, Programming language, Computational Mathematics, Artificial intelligence, Word (computer architecture), Natural language processing - Abstract
In statistical machine translation, translation prediction considers not only the aligned source word itself but also its source contextual information. Learning context representations is a promising method for improving translation results, particularly through neural networks. Most of the existing methods process context words sequentially and neglect long-distance dependencies in the source. In this paper, we propose a novel neural approach to source dependence-based context representation for translation prediction. The proposed model is capable of not only encoding source long-distance dependencies but also capturing functional similarities to better predict translations (i.e., word form translations and ambiguous word translations). To verify our method, the proposed model is incorporated into phrase-based and hierarchical phrase-based translation models, respectively. Experiments on large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves significant improvement over the baseline systems and outperforms several existing context-enhanced methods.
- Published
- 2018
- Full Text
- View/download PDF
25. Translation Quality Estimation Using Only Bilingual Corpora
- Author
-
Atsushi Fujita, Andrew Finch, Eiichiro Sumita, Masao Utiyama, and Lemao Liu
- Subjects
Acoustics and Ultrasonics, Machine translation, Computer science, Postediting, Machine learning, Annotation, Computer Science (miscellaneous), Quality (business), Electrical and Electronic Engineering, Speech processing, Marginal likelihood, Computational Mathematics, Recurrent neural network, Artificial intelligence, Natural language processing, Spoken language - Abstract
In computer-aided translation scenarios, quality estimation of machine translation hypotheses plays a critical role. Existing methods for word-level translation quality estimation (TQE) rely on the availability of manually annotated TQE training data obtained via direct annotation or postediting. However, due to the cost of human labor, such data are either limited in size or available for only a few tasks in practice. To avoid relying on such annotated TQE data, this paper proposes an approach to train word-level TQE models using bilingual corpora, which are typically used for machine translation training and are relatively easy to access. We formalize the training of our proposed method under the framework of maximum marginal likelihood estimation. To avoid degenerate solutions, we propose a novel regularized training objective whose optimization is achieved by an efficient approximation. Extensive experiments on both written and spoken language datasets empirically show that our approach yields performance comparable to standard training on annotated data.
- Published
- 2017
- Full Text
- View/download PDF
26. Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training
- Author
-
Tiejun Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita, Haipeng Sun, Kehai Chen, and Xugang Lu
- Subjects
Machine translation, Computer science, Noise reduction, Speech recognition, Adversarial system, Robustness (computer science), Computation and Language (cs.CL), Word order - Abstract
Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. The main advantage of UNMT is that the large training corpora it requires consist only of easily collected monolingual text, while its performance on some translation tasks is only slightly worse than that of supervised neural machine translation, which requires expensive annotated translation pairs. In most studies, UNMT is trained on clean data, without considering its robustness to noisy data. However, in real-world scenarios, the collected input sentences usually contain noise, which degrades the performance of the translation system because UNMT is sensitive to small perturbations of the input. In this paper, we explicitly take noisy data into consideration, for the first time, to improve the robustness of UNMT-based systems. We first define two types of noise in training sentences, word noise and word order noise, and empirically investigate their effects on UNMT; we then propose adversarial training methods with a denoising process for UNMT. Experimental results on several language pairs show that our proposed methods substantially improve the robustness of conventional UNMT systems in noisy scenarios.
- Published
- 2020
- Full Text
- View/download PDF
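Entry 26 defines word noise and word order noise. A minimal sketch of two common implementations of these noise types (random word dropping and bounded local shuffling); the parameters `p_drop` and `k` are illustrative choices:

```python
import random

def word_noise(tokens, p_drop=0.1):
    """Randomly drop words, one of the two noise types the abstract
    defines; always keeps at least one token."""
    kept = [tok for tok in tokens if random.random() > p_drop]
    return kept or tokens[:1]

def word_order_noise(tokens, k=3):
    """Locally shuffle word order: each word moves at most about k
    positions, a common implementation of order noise in denoising UNMT."""
    keys = [i + random.uniform(0, k) for i in range(len(tokens))]
    return [tok for _, tok in sorted(zip(keys, tokens))]
```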
27. A System for Worldwide COVID-19 Information Aggregation
- Author
-
Atsuyuki Morishima, Hiroyoshi Ito, Sadao Kurohashi, Frederic Bergeron, Masao Utiyama, Hirokazu Kiyomaru, Qianying Liu, Ying Zhong, Shinji Suzuki, Akiko Aizawa, Yugo Murawaki, Masaki Kobayashi, Kazumasa Omura, Yusuke Miyao, Masaki Matsubara, Yu Tanaka, Nobuhiro Ueda, Honai Ueoka, Haiyue Song, Masashi Toyoda, Katsuhiko Hayashi, Kentaro Inui, Junjie Chen, Ribeka Tanaka, Eiichiro Sumita, Fei Cheng, Daisuke Kawahara, Takashi Kodama, and Masaru Kitsuregawa
- Subjects
Coronavirus disease 2019 (COVID-19), Sanitation, Machine translation, Computer science, Crowdsourcing, Data science, Information aggregation, Pandemic, Quality (business), Computation and Language (cs.CL) - Abstract
The global COVID-19 pandemic has made the public pay close attention to related news covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly among countries (e.g., in policies and the development of the epidemic), so citizens are also interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation that contains reliable articles from 10 regions in 7 languages, sorted by topic. Our dataset of reliable COVID-19-related websites, collected through crowdsourcing, ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic classifier trained on our article-topic pair dataset helps users find the information they are interested in efficiently by sorting articles into different categories.
- Published
- 2020
- Full Text
- View/download PDF
29. Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
- Author
-
Eiichiro Sumita, Raj Dabre, Masao Utiyama, Chenchen Ding, and Abhisek Chakrabarty
- Subjects
Machine translation, Computer science, Low resource, Translation (geometry), Linguistics, Quality (business), Relevance (information retrieval), Representation (mathematics), Word (computer architecture), BLEU - Abstract
In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or automatically extracted features into the NMT framework is known to be beneficial. However, this study emphasizes that the relevance of the features is crucial to performance. Specifically, we propose two methods, 1) self relevance and 2) word-based relevance, to improve the representation of features for NMT. Experiments are conducted on translation tasks from English to eight Asian languages, with no more than twenty thousand sentences for training. The proposed methods improve translation quality for all tasks by up to 3.09 BLEU points. Discussions with visualization provide the explainability of the proposed methods: we show that the relevance methods assign weights to features, thereby enhancing their impact on low-resource machine translation.
- Published
- 2020
- Full Text
- View/download PDF
30. Content Word Aware Neural Machine Translation
- Author
-
Rui Wang, Eiichiro Sumita, Masao Utiyama, and Kehai Chen
- Subjects
Machine translation, Computer science, Content word, Word lists by frequency, Artificial intelligence, Natural language processing, Sentence, Transformer (machine learning model) - Abstract
Neural machine translation (NMT) encodes the source sentence in a uniform way to generate the target sentence word by word. However, NMT does not consider the importance of each word to the sentence meaning; for example, some words (content words) express more important meaning than others (function words). To address this limitation, we first use word frequency information to distinguish between content and function words in a sentence, and then design a content word-aware NMT model to improve translation performance. Empirical results on the WMT14 English-to-German, WMT14 English-to-French, and WMT17 Chinese-to-English translation tasks show that the proposed methods can significantly improve the performance of Transformer-based NMT.
- Published
- 2020
- Full Text
- View/download PDF
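Entry 30 distinguishes content from function words by word frequency. A minimal sketch using a frequency cutoff; the `top_n` threshold is an illustrative assumption, since the abstract does not give the paper's exact criterion:

```python
from collections import Counter

def split_content_function(corpus_tokens, top_n=100):
    """Approximate the content/function distinction by corpus frequency:
    the most frequent word types are treated as function words, the rest
    as content words."""
    freq = Counter(corpus_tokens)
    function_words = {word for word, _ in freq.most_common(top_n)}
    content_words = set(freq) - function_words
    return content_words, function_words
```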
31. Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
- Author
-
Masao Utiyama, Tiejun Zhao, Eiichiro Sumita, Rui Wang, Kehai Chen, and Haipeng Sun
- Subjects
Basis (linear algebra), Machine translation, Computer science, Translation (geometry), Simple (abstract algebra), Artificial intelligence, Computation and Language (cs.CL), Distillation, Encoder, Natural language processing - Abstract
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time; that is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for all language pairs. On the basis of the empirical findings, we propose two knowledge distillation methods to further enhance multilingual UNMT performance. Our experiments on a dataset with English translated to and from twelve other languages (covering three language families and six language branches) show remarkable results: the approach surpasses strong unsupervised individual baselines, achieves promising performance between non-English language pairs in zero-shot translation scenarios, and alleviates poor performance for low-resource language pairs.
- Published
- 2020
- Full Text
- View/download PDF
32. Explicit Sentence Compression for Neural Machine Translation
- Author
-
Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Hai Zhao, Eiichiro Sumita, and Zhuosheng Zhang
- Subjects
Sentence compression, Machine translation, Computer science, General Medicine, Artificial intelligence, Computation and Language (cs.CL), Natural language processing, Sentence, Transformer (machine learning model) - Abstract
State-of-the-art Transformer-based neural machine translation (NMT) systems still follow the standard encoder-decoder framework, in which the source sentence representation is handled well by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of the sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression objective is used to learn the backbone information in a sentence. We propose three integration strategies, backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
- Published
- 2019
33. Document-level Neural Machine Translation with Associated Memory Network
- Author
-
Hai Zhao, Kehai Chen, Masao Utiyama, Shu Jiang, Bao-liang Lu, Rui Wang, Eiichiro Sumita, and Zuchao Li
- Subjects
Machine translation, Computer science, Document level, Artificial Intelligence, Hardware and Architecture, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Computation and Language (cs.CL), Software, Natural language processing - Abstract
Standard neural machine translation (NMT) is built on the assumption that sentences can be translated independently of their document-level context. Most existing document-level NMT approaches make do with a coarse sense of global document-level information, whereas this work focuses on exploiting detailed document-level context by means of a memory network. The memory network's capacity for detecting the parts of memory most relevant to the current sentence offers a natural solution for modeling rich document-level context. In this work, the proposed document-aware memory network is implemented to enhance a Transformer NMT baseline. Experiments on several tasks show that the proposed method significantly improves NMT performance over strong Transformer baselines and other related studies.
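A minimal sketch of the memory-reading step, in PyTorch, under the assumption that the memory stores one vector per document sentence; the actual document-aware memory network is more elaborate.

    import torch.nn as nn

    class DocumentMemoryAttention(nn.Module):
        """Sketch: each state of the current sentence attends over the
        stored document-sentence representations, and the retrieved
        context is fused back with a residual connection."""
        def __init__(self, d_model, n_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              batch_first=True)

        def forward(self, sent_states, doc_memory):
            # sent_states: (batch, sent_len, d)
            # doc_memory:  (batch, n_doc_sents, d)
            context, _ = self.attn(query=sent_states, key=doc_memory,
                                   value=doc_memory)
            return sent_states + context  # residual fusion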
- Published
- 2019
34. Improving Sublanguage Translation via Global Pre-ordering
- Author
-
Masao Utiyama, Eiichiro Sumita, Yuji Matsumoto, and Masaru Fuji
- Subjects
Thesaurus (information retrieval) ,business.industry ,Computer science ,Artificial intelligence ,computer.software_genre ,Translation (geometry) ,business ,computer ,Natural language processing ,Sublanguage
- Published
- 2017
- Full Text
- View/download PDF
35. Learning local word reorderings for hierarchical phrase-based statistical machine translation
- Author
-
Jingyi Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura, and Graham Neubig
- Subjects
Linguistics and Language ,Phrase ,Machine translation ,Computer science ,business.industry ,Speech recognition ,Statistical model ,02 engineering and technology ,computer.software_genre ,Translation (geometry) ,Language and Linguistics ,03 medical and health sciences ,0302 clinical medicine ,Artificial Intelligence ,030221 ophthalmology & optometry ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Computational linguistics ,business ,computer ,Software ,Natural language processing ,Sentence ,Word (computer architecture) - Abstract
Statistical models for reordering source words have been used to enhance hierarchical phrase-based statistical machine translation. Existing word-reordering models learn reorderings either for any two source words in a sentence or only for two contiguous words. This paper proposes a series of separate sub-models that learn reorderings for word pairs at different distances. Our experiments demonstrate that reordering sub-models for word pairs with distances below a specific threshold are useful for improving translation quality. Compared with previous work, our method exploits helpful word-reordering information more effectively and efficiently; it improves a basic hierarchical phrase-based system by 2.4-3.1 BLEU points while keeping the average time for translating one sentence under 10 s.
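A minimal sketch of the idea of distance-specific sub-models, in PyTorch; the pair features, the binary monotone/swap orientation, and the bucket layout are simplifying assumptions for illustration.

    import torch
    import torch.nn as nn

    class DistanceBucketedReordering(nn.Module):
        """One reordering classifier per word-pair distance 1..max_distance;
        pairs farther apart are ignored, reflecting the finding that only
        short-range pairs help."""
        def __init__(self, d_feat, max_distance=5):
            super().__init__()
            self.sub_models = nn.ModuleList(
                nn.Linear(2 * d_feat, 2) for _ in range(max_distance)
            )
            self.max_distance = max_distance

        def forward(self, feat_i, feat_j, distance):
            # feat_i, feat_j: (d_feat,) features of the two source words
            if not (1 <= distance <= self.max_distance):
                return None  # beyond the useful threshold
            pair = torch.cat([feat_i, feat_j], dim=-1)
            return self.sub_models[distance - 1](pair)  # monotone/swap logits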
- Published
- 2016
- Full Text
- View/download PDF
36. Improving Neural Machine Translation with Neural Syntactic Distance
- Author
-
Masao Utiyama, Tiejun Zhao, Chunpeng Ma, Akihiro Tamura, and Eiichiro Sumita
- Subjects
Sequence ,Parsing ,Artificial neural network ,Machine translation ,Computer science ,business.industry ,02 engineering and technology ,010501 environmental sciences ,Translation (geometry) ,computer.software_genre ,01 natural sciences ,Syntax ,Tree (data structure) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence ,0105 earth and related environmental sciences ,BLEU - Abstract
The explicit use of syntactic information has proved useful for neural machine translation (NMT). However, previous methods resort to either tree-structured neural networks or long linearized sequences, both of which are inefficient. Neural syntactic distance (NSD) makes it possible to represent a constituent tree with a sequence whose length matches the number of words in the sentence. NSD has been used for constituent parsing, but not for machine translation. We propose five strategies to improve NMT with NSD. Experiments show that improving NMT with NSD is not trivial; nevertheless, the proposed strategies improve the translation performance of the baseline model (+2.1 (En–Ja), +1.3 (Ja–En), +1.2 (En–Ch), and +1.0 (Ch–En) BLEU).
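For intuition, one common formulation of syntactic distance from the constituent-parsing literature assigns each split between adjacent words the height of the smallest constituent covering that split. The sketch below computes this for a binarized tree given as nested tuples; it yields n-1 split distances for n words, which a padding convention can extend to length n. The exact variant used in the paper may differ.

    def _walk(tree):
        """Return (words, distances, height) for a nested-tuple tree."""
        if isinstance(tree, str):          # a leaf word
            return [tree], [], 0
        left_w, left_d, lh = _walk(tree[0])
        right_w, right_d, rh = _walk(tree[1])
        height = max(lh, rh) + 1
        # the split between the two children gets this node's height
        return left_w + right_w, left_d + [height] + right_d, height

    def syntactic_distances(tree):
        words, dists, _ = _walk(tree)
        return words, dists

    # Example: ((("the", "cat"), "sat"), "down")
    # -> words ['the', 'cat', 'sat', 'down'], distances [1, 2, 3]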
- Published
- 2019
- Full Text
- View/download PDF
37. NICT’s Supervised Neural Machine Translation Systems for the WMT19 News Translation Task
- Author
-
Eiichiro Sumita, Atsushi Fujita, Masao Utiyama, Rui Wang, Raj Dabre, Kehai Chen, and Benjamin Marie
- Subjects
Machine translation ,Computer science ,business.industry ,Translation (geometry) ,computer.software_genre ,Parallel corpora ,Task (project management) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Artificial intelligence ,Transfer of learning ,business ,Baseline (configuration management) ,computer ,Natural language processing - Abstract
In this paper, we describe the supervised neural machine translation (NMT) systems that we developed for the Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish directions of the news translation task. We focused on leveraging multilingual transfer learning and back-translation for the extremely low-resource pairs Kazakh↔English and Gujarati↔English. For Chinese↔English, we used the provided parallel data, augmented with a large quantity of back-translated monolingual data, to train state-of-the-art NMT systems. We then employed techniques that have proven to be most effective, such as back-translation, fine-tuning, and model ensembling, to generate the primary Chinese↔English submissions. For English→Finnish, our submission from WMT18 remains a strong baseline despite the increase in parallel corpora for this year's task.
- Published
- 2019
- Full Text
- View/download PDF
38. MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method
- Author
-
Eiichiro Sumita, Masao Utiyama, and Chenchen Ding
- Subjects
Computer science ,business.industry ,computer.software_genre ,Unicode ,language.human_language ,Romanization ,Burmese ,Transcription (linguistics) ,language ,Input method ,Artificial intelligence ,business ,computer ,Digitization ,Natural language processing - Abstract
MY-AKKHARA is a method for inputting Burmese text encoded in the Unicode standard, based on a commonly accepted Latin transcription. With this method, arbitrary Burmese strings can be accurately entered using the 26 lowercase Latin letters, while the 26 uppercase Latin letters serve as shortcuts for lowercase letter sequences. The frequency of Burmese characters is taken into account in MY-AKKHARA to achieve an efficient keystroke distribution on a QWERTY keyboard. Given that the Unicode standard has not been extensively used in the digitization of Burmese, we hope that MY-AKKHARA can contribute to the widespread use of Unicode in Myanmar and provide a platform for smart Burmese input methods in the future. An implementation of MY-AKKHARA running on Windows is released at http://www2.nict.go.jp/astrec-att/member/ding/my-akkhara.html
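The core conversion step of a romanization-based input method can be sketched as a greedy longest-match lookup, as below. The four-entry table is purely illustrative; MY-AKKHARA's real mapping covers the full Burmese script and additionally uses uppercase letters as shortcuts.

    # Tiny illustrative table from Latin transcription to Burmese Unicode.
    TABLE = {
        "k": "\u1000",   # က  KA
        "kh": "\u1001",  # ခ  KHA
        "m": "\u1019",   # မ  MA
        "a": "\u102c",   # ာ  vowel sign AA
    }

    def convert(keystrokes: str) -> str:
        out, i = [], 0
        max_len = max(map(len, TABLE))
        while i < len(keystrokes):
            for l in range(max_len, 0, -1):      # longest match first
                chunk = keystrokes[i:i + l]
                if chunk in TABLE:
                    out.append(TABLE[chunk])
                    i += l
                    break
            else:
                out.append(keystrokes[i])        # pass unknowns through
                i += 1
        return "".join(out)

    print(convert("ma"))  # -> မာ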
- Published
- 2019
- Full Text
- View/download PDF
39. Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English
- Author
-
Chenchen Ding, Benjamin Marie, Hour Kaing, Atsushi Fujita, Eiichiro Sumita, Masao Utiyama, and Aye Myat Mon
- Subjects
Machine translation ,Computer science ,business.industry ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,Translation (geometry) ,01 natural sciences ,ComputingMethodologies_PATTERNRECOGNITION ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,0105 earth and related environmental sciences - Abstract
This paper presents the NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks. For all translation directions, we built state-of-the-art supervised neural (NMT) and statistical (SMT) machine translation systems, using cleaned and normalized monolingual data. Our combination of NMT and SMT performed among the best systems for the four translation directions. We also investigated the feasibility of unsupervised machine translation for low-resource and distant language pairs, and confirmed observations from previous work showing that unsupervised MT is still largely unable to handle them.
- Published
- 2019
- Full Text
- View/download PDF
40. Recurrent Positional Embedding for Neural Machine Translation
- Author
-
Masao Utiyama, Rui Wang, Eiichiro Sumita, and Kehai Chen
- Subjects
Machine translation ,Computer science ,business.industry ,Pattern recognition ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Recurrent neural network ,Embedding ,NIST ,Artificial intelligence ,0305 other medical science ,business ,computer ,0105 earth and related environmental sciences ,Transformer (machine learning model) - Abstract
In the Transformer network architecture, positional embeddings are used to encode order dependencies into the input representation. However, this input representation involves only static order dependencies based on discrete numerical information; that is, it is independent of word content. To address this issue, this work proposes a recurrent positional embedding approach based on word vectors. In this approach, recurrent positional embeddings are learned by a recurrent neural network, encoding word-content-based order dependencies into the input representation. They are then integrated into the existing multi-head self-attention model as independent heads or as part of each head. The experimental results show that the proposed approach improves translation performance over the state-of-the-art Transformer baseline on the WMT'14 English-to-German and NIST Chinese-to-English translation tasks.
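A minimal PyTorch sketch of the idea: run a recurrent network over the word embeddings so that the resulting "positional" signal depends on word content. Integration by simple addition is a simplification; the paper integrates the embeddings into the self-attention heads.

    import torch.nn as nn

    class RecurrentPositionalEmbedding(nn.Module):
        """Derive content-dependent positional embeddings by running
        an RNN over the word embeddings, so position information
        reflects what was actually said, not just the index."""
        def __init__(self, d_model):
            super().__init__()
            self.rnn = nn.GRU(d_model, d_model, batch_first=True)

        def forward(self, word_emb):
            # word_emb: (batch, seq_len, d_model)
            rpe, _ = self.rnn(word_emb)
            return word_emb + rpe  # simplified fusion by addition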
- Published
- 2019
- Full Text
- View/download PDF
41. English-Myanmar Supervised and Unsupervised NMT: NICT’s Machine Translation Systems at WAT-2019
- Author
-
Rui Wang, Kehai Chen, Haipeng Sun, Chenchen Ding, Masao Utiyama, and Eiichiro Sumita
- Subjects
Machine translation ,Computer science ,business.industry ,05 social sciences ,Rank (computer programming) ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,language.human_language ,Task (project management) ,Burmese ,0502 economics and business ,language ,Language model ,Artificial intelligence ,050207 economics ,business ,computer ,Natural language processing ,0105 earth and related environmental sciences ,BLEU - Abstract
This paper presents the NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)-English task in both translation directions. We built neural machine translation (NMT) systems for these tasks. Our NMT systems were trained with language-model pretraining, and back-translation was also adopted. According to BLEU score, our NMT systems ranked third in English-to-Myanmar and second in Myanmar-to-English.
- Published
- 2019
- Full Text
- View/download PDF
42. Sentence-Level Agreement for Neural Machine Translation
- Author
-
Masao Utiyama, Rui Wang, Tiejun Zhao, Eiichiro Sumita, Kehai Chen, Mingming Yang, and Min Zhang
- Subjects
Machine translation ,Artificial neural network ,business.industry ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,computer.software_genre ,Agreement ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,NIST ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,Representation (mathematics) ,computer ,Natural language processing ,Sentence ,media_common - Abstract
The training objective of neural machine translation (NMT) is to minimize the loss between the words in the translated sentences and those in the references. Although there is a natural correspondence between the source sentence and the target sentence in NMT, this relationship has only been represented implicitly by the entire neural network, and the training objective is computed at the word level. In this paper, we propose a sentence-level agreement module that directly minimizes the difference between the representations of the source and target sentences. The proposed agreement module can be integrated into NMT as an additional training objective and can also be used to enhance the representation of the source sentences. Empirical results on the NIST Chinese-to-English and WMT English-to-German tasks show that the proposed agreement module can significantly improve NMT performance.
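The agreement term can be sketched as a distance between pooled sentence representations, added to the usual word-level loss; mean pooling and the L2 distance are assumptions for illustration.

    import torch

    def sentence_agreement_loss(src_states, tgt_states):
        """Sketch of a sentence-level agreement term: minimize the
        distance between mean-pooled source and target sentence
        representations. Both inputs: (batch, seq_len, d)."""
        src_vec = src_states.mean(dim=1)   # (batch, d)
        tgt_vec = tgt_states.mean(dim=1)   # (batch, d)
        return torch.norm(src_vec - tgt_vec, dim=-1).mean()

    # total_loss = word_level_ce + lambda_agree * sentence_agreement_loss(...)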
- Published
- 2019
- Full Text
- View/download PDF
43. Incorporating Word Attention into Character-Based Word Segmentation
- Author
-
Masao Utiyama, Eiichiro Sumita, Shohei Higashiyama, Yoshiaki Oida, Masao Ideuchi, Isaac Okada, and Yohei Sakamoto
- Subjects
Feature engineering ,Artificial neural network ,Computer science ,business.industry ,Text segmentation ,Inference ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,0202 electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,020201 artificial intelligence & image processing ,Segmentation ,Artificial intelligence ,business ,computer ,0105 earth and related environmental sciences - Abstract
Neural network models have been actively applied to word segmentation, especially for Chinese, because of their ability to minimize the effort spent on feature engineering. Typical segmentation models are categorized as character-based, which conduct exact inference, or word-based, which utilize word-level information. We propose a character-based model that utilizes word information to leverage the advantages of both types of models. Our model learns the importance of multiple candidate words for a character on the basis of an attention mechanism and makes use of it for segmentation decisions. The experimental results show that our model achieves better performance than state-of-the-art models on both Japanese and Chinese benchmark datasets.
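A minimal sketch of the attention step for a single character, in PyTorch: the character's hidden state attends over the embeddings of its candidate words, and the summarized word evidence is appended to the state used for the tagging decision. All shapes and names are illustrative.

    import torch
    import torch.nn as nn

    class CandidateWordAttention(nn.Module):
        """Attend from a character state over candidate-word embeddings
        and concatenate the resulting word context to the state."""
        def __init__(self, d_char, d_word):
            super().__init__()
            self.proj = nn.Linear(d_char, d_word)

        def forward(self, char_state, cand_word_embs):
            # char_state: (d_char,); cand_word_embs: (n_cands, d_word)
            query = self.proj(char_state)                 # (d_word,)
            scores = cand_word_embs @ query               # (n_cands,)
            attn = torch.softmax(scores, dim=0)
            word_context = attn @ cand_word_embs          # (d_word,)
            return torch.cat([char_state, word_context], dim=-1)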
- Published
- 2019
- Full Text
- View/download PDF
44. Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation
- Author
-
Tiejun Zhao, Haipeng Sun, Masao Utiyama, Kehai Chen, Eiichiro Sumita, and Rui Wang
- Subjects
Word embedding ,Machine translation ,Computer science ,Speech recognition ,media_common.quotation_subject ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,Agreement ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0305 other medical science ,Encoder ,computer ,0105 earth and related environmental sciences ,media_common - Abstract
Unsupervised bilingual word embedding (UBWE), together with other technologies such as back-translation and denoising, has helped unsupervised neural machine translation (UNMT) achieve remarkable results for several language pairs. In previous methods, UBWE is first trained using non-parallel monolingual corpora, and this pre-trained UBWE is then used to initialize the word embeddings in the encoder and decoder of UNMT; that is, UBWE and UNMT are trained separately. In this paper, we first empirically investigate the relationship between UBWE and UNMT. The empirical findings show that the performance of UNMT is significantly affected by the performance of UBWE. On this basis, we propose two methods that train UNMT with UBWE agreement. Empirical results on several language pairs show that the proposed methods significantly outperform conventional UNMT.
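One simple way to couple the two trainings, sketched below, is to add a regularizer that keeps the translation model's embedding table in agreement with the pretrained UBWE rather than using the latter only for initialization. This is an illustrative reading of "training with UBWE agreement", not the paper's exact objectives.

    import torch

    def ubwe_agreement_loss(nmt_embedding, pretrained_ubwe):
        """Penalize drift of the UNMT embedding table away from the
        pretrained cross-lingual embeddings during training.
        Both tensors: (vocab, d)."""
        return torch.norm(nmt_embedding - pretrained_ubwe, dim=-1).mean()

    # total_loss = unmt_loss + lambda_agree * ubwe_agreement_loss(...)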
- Published
- 2019
- Full Text
- View/download PDF
45. NICT’s Machine Translation Systems for CCMT-2019 Translation Task
- Author
-
Eiichiro Sumita, Rui Wang, Masao Utiyama, and Kehai Chen
- Subjects
Machine translation ,business.industry ,Computer science ,Artificial intelligence ,Translation (geometry) ,computer.software_genre ,business ,computer ,Natural language processing ,Task (project management) - Abstract
This paper describes the NICT's neural machine translation systems for the Chinese↔English directions in the CCMT-2019 shared news translation task. We used the provided parallel data, augmented with a large quantity of back-translated monolingual data, to train state-of-the-art NMT systems. We then employed techniques that have proven to be most effective, such as fine-tuning and model ensembling, to generate the primary submissions for the Chinese↔English translation tasks.
- Published
- 2019
- Full Text
- View/download PDF
46. Neural Machine Translation with Reordering Embeddings
- Author
-
Masao Utiyama, Rui Wang, Kehai Chen, and Eiichiro Sumita
- Subjects
Phrase ,Machine translation ,Computer science ,business.industry ,02 engineering and technology ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,NIST ,Embedding ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,computer ,Encoder ,Natural language processing ,Sentence ,Transformer (machine learning model) - Abstract
The reordering model plays an important role in phrase-based statistical machine translation, yet few works exploit reordering information in neural machine translation. In this paper, we propose a reordering mechanism that learns the reordering embedding of a word from its contextual information. These learned reordering embeddings are stacked together with self-attention networks to learn sentence representations for machine translation. The reordering mechanism can be easily integrated into both the encoder and the decoder of the Transformer translation system. Experimental results on the WMT'14 English-to-German, NIST Chinese-to-English, and WAT Japanese-to-English translation tasks demonstrate that the proposed methods can significantly improve the performance of the Transformer.
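Loosely, the mechanism can be pictured as a content-dependent perturbation of the positional embedding, as in the PyTorch sketch below; the paper's actual reordering mechanism inside self-attention is more involved, and this wiring is an assumption.

    import torch
    import torch.nn as nn

    class ReorderingEmbedding(nn.Module):
        """Sketch: derive a per-word reordering signal from context and
        use it to modulate the positional embedding before stacking
        with self-attention layers."""
        def __init__(self, d_model):
            super().__init__()
            self.context = nn.Linear(d_model, d_model)

        def forward(self, states, pos_emb):
            # states, pos_emb: (batch, seq_len, d_model)
            penalty = torch.tanh(self.context(states))  # content-dependent
            return states + pos_emb * penalty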
- Published
- 2019
- Full Text
- View/download PDF
47. NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task
- Author
-
Masao Utiyama, Atsushi Fujita, Eiichiro Sumita, Benjamin Marie, Rui Wang, Kehai Chen, and Haipeng Sun
- Subjects
Machine translation ,Computer science ,business.industry ,Simple (abstract algebra) ,Artificial intelligence ,Translation (geometry) ,business ,computer.software_genre ,computer ,Natural language processing ,Task (project management) - Abstract
This paper presents the NICT's participation in the WMT19 unsupervised news translation task. We participated in the unsupervised translation direction German-Czech. Our primary submission to the task is the result of a simple combination of our unsupervised neural and statistical machine translation systems. Our system is ranked first for the German-to-Czech translation task, using only the data provided by the organizers ("constrained"), according to both BLEU-cased and human evaluation. We also performed contrastive experiments with other language pairs, namely English-Gujarati and English-Kazakh, to better assess the effectiveness of unsupervised machine translation for distant language pairs and in truly low-resource conditions.
- Published
- 2019
- Full Text
- View/download PDF
48. SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing
- Author
-
Rui Wang, Eiichiro Sumita, Zhuosheng Zhang, Zuchao Li, Hai Zhao, and Masao Utiyama
- Subjects
Training set ,Parsing ,Computer science ,business.industry ,Multi-task learning ,Graph parsing ,computer.software_genre ,Language acquisition ,End-to-end principle ,Pruning algorithm ,Artificial intelligence ,business ,F1 score ,computer ,Natural language processing - Abstract
This paper describes our SJTU-NICT system for the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL). Our system uses a graph-based approach to model a variety of semantic graph parsing tasks. Our main contributions in the submitted system are as follows: (1) our model is fully end-to-end and can be trained on the given training set alone, without relying on any extra training sources, including the companion data provided by the organizer; (2) we extend our graph pruning algorithm to a variety of semantic graphs, addressing the problem of an excessive semantic graph search space; and (3) we introduce multi-task learning for multiple objectives within the same framework. The evaluation results show that our system achieved second place in the overall F1 score and the best F1 score on the DM framework.
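The multi-task aspect can be sketched as a shared encoder with one scoring head per framework and a summed loss, as below; the framework names, dimensions, and head design are illustrative assumptions.

    import torch.nn as nn

    class MultiFrameworkHeads(nn.Module):
        """Sketch of multi-task learning across graph banks: a shared
        encoder (any sequence model) feeds one scoring head per
        framework, and per-framework losses are summed."""
        def __init__(self, d_model, frameworks=("dm", "psd", "eds")):
            super().__init__()
            self.heads = nn.ModuleDict(
                {f: nn.Linear(d_model, d_model) for f in frameworks}
            )

        def forward(self, shared_states, framework):
            # shared_states: (batch, seq_len, d) from the shared encoder
            return self.heads[framework](shared_states)

    # one training step over several frameworks:
    # total_loss = sum(loss_fn(heads(encode(batch[f]), f), gold[f])
    #                  for f in ("dm", "psd", "eds"))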
- Published
- 2019
- Full Text
- View/download PDF
49. Graph-Based Bilingual Word Embedding for Statistical Machine Translation
- Author
-
Masao Utiyama, Rui Wang, Sabine Ploux, Eiichiro Sumita, Bao-Liang Lu, Hai Zhao, Centre d'Analyse et de Mathématique sociales (CAMS), and École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Phrase ,Word embedding ,General Computer Science ,Machine translation ,Computer science ,business.industry ,[SCCO.NEUR]Cognitive science/Neuroscience ,010102 general mathematics ,Context (language use) ,02 engineering and technology ,Pointwise mutual information ,computer.software_genre ,01 natural sciences ,[SHS]Humanities and Social Sciences ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,Embedding ,020201 artificial intelligence & image processing ,Artificial intelligence ,0101 mathematics ,business ,computer ,Natural language processing ,Word (computer architecture) ,ComputingMilieux_MISCELLANEOUS - Abstract
Bilingual word embedding has been shown to be helpful for statistical machine translation (SMT). However, most existing methods suffer from two drawbacks. First, they consider only simple contexts, such as an entire document or a fixed-size sliding window, to build word embeddings, and ignore latent useful information in the selected context. Second, the word sense, rather than the word itself, should be the minimal semantic unit, yet most existing methods still operate on word representations. To overcome these drawbacks, this article presents a novel Graph-Based Bilingual Word Embedding (GBWE) method that projects bilingual word senses into a multidimensional semantic space. First, a bilingual word co-occurrence graph is constructed using the co-occurrence counts and pointwise mutual information between words. Maximal complete subgraphs (cliques), which play the role of minimal units for bilingual sense representation, are then dynamically extracted according to the contextual information. Subsequently, correspondence analysis, principal component analysis, and neural networks are used to summarize the clique-word matrix into lower dimensions to build the embedding model. Without contextual information, the proposed GBWE can be applied to lexical translation; given contextual information, GBWE provides a dynamic solution for bilingual word representations, which can be applied to phrase translation and generation. Empirical results show that GBWE can enhance the performance of lexical translation, as well as Chinese/French-to-English and Chinese-to-Japanese phrase-based SMT tasks (IWSLT, NTCIR, NIST, and WAT).
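The graph side of the pipeline can be sketched with networkx and numpy as below: build a co-occurrence graph, extract maximal cliques as sense units, and reduce the clique-word matrix. The toy corpus and unweighted edges are simplifications; the article weights edges by co-occurrence and pointwise mutual information and also uses correspondence analysis.

    import itertools
    import networkx as nx
    import numpy as np

    sentences = [["bank", "river", "water"], ["bank", "money", "loan"]]

    G = nx.Graph()
    for sent in sentences:
        for u, v in itertools.combinations(set(sent), 2):
            G.add_edge(u, v)  # real model: PMI-weighted edges

    cliques = list(nx.find_cliques(G))        # maximal complete subgraphs
    vocab = sorted(G.nodes)
    M = np.zeros((len(cliques), len(vocab)))  # clique-word matrix
    for i, clique in enumerate(cliques):
        for w in clique:
            M[i, vocab.index(w)] = 1.0

    # Summarize into lower dimensions (PCA via SVD), standing in for the
    # correspondence/principal component analyses of the article.
    U, S, Vt = np.linalg.svd(M - M.mean(axis=0), full_matrices=False)
    word_vectors = Vt[:2].T                   # one 2-D vector per word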
- Published
- 2018
- Full Text
- View/download PDF
50. Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
- Author
-
Rui Wang, Eiichiro Sumita, and Masao Utiyama
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Machine translation ,Computer science ,Epoch (reference date) ,Speech recognition ,05 social sciences ,Training (meteorology) ,Sampling (statistics) ,Sample (statistics) ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,0502 economics and business ,NIST ,050207 economics ,Computation and Language (cs.CL) ,computer ,Sentence ,0105 earth and related environmental sciences - Abstract
Traditional neural machine translation (NMT) uses a fixed training procedure in which each sentence is sampled once per epoch. In reality, some sentences are well learned during the first few epochs; under this approach, however, the well-learned sentences continue to be trained along with those that were not well learned for 10-30 epochs, which wastes time. Here, we propose an efficient method to dynamically sample sentences in order to accelerate NMT training. In this approach, a weight is assigned to each sentence based on the measured difference between the training costs of two iterations, and in each epoch a certain percentage of sentences is dynamically sampled according to these weights. Empirical results on the NIST Chinese-to-English and WMT English-to-German tasks show that the proposed method can significantly accelerate NMT training and improve NMT performance. (Comment: Revised version of ACL-2018)
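A minimal numpy sketch of the sampling step: weight each sentence by how much its cost changed between two iterations, then draw a fixed fraction of the corpus for the next epoch. The keep ratio and the epsilon floor are illustrative choices.

    import numpy as np

    def dynamic_sample(cost_prev, cost_curr, keep_ratio=0.8, rng=None):
        """Weight sentences by the change in their training cost and
        sample a fraction of the corpus for the next epoch.
        cost_prev, cost_curr: per-sentence costs from two iterations."""
        rng = rng or np.random.default_rng()
        weights = np.abs(cost_prev - cost_curr) + 1e-8  # still-learning signal
        probs = weights / weights.sum()
        n_keep = int(len(weights) * keep_ratio)
        return rng.choice(len(weights), size=n_keep, replace=False, p=probs)

    # idx = dynamic_sample(costs_epoch1, costs_epoch2)
    # train the next epoch only on the sentences in idx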
- Published
- 2018