WCC-EC 2.0: Enhancing Neural Machine Translation with a 1.6M+ Web-Crawled English-Chinese Parallel Corpus.
- Source :
- Electronics (2079-9292); Apr2024, Vol. 13 Issue 7, p1381, 15p
- Publication Year :
- 2024
- Abstract :
- This research introduces WCC-EC 2.0 (Web-Crawled Corpus—English and Chinese), a comprehensive parallel corpus designed for enhancing Neural Machine Translation (NMT), featuring over 1.6 million English-Chinese sentence pairs meticulously gathered via web crawling. This corpus, extracted through an advanced web crawler, showcases the vast linguistic diversity and richness of English and Chinese, uniquely spanning the rarely covered news and music domains. Our methodical approach in web crawling and corpus assembly, coupled with rigorous experiments and manual evaluations, demonstrated its superiority by achieving high BLEU scores, marking significant strides in translation accuracy and model resilience. Its inclusion of these specific areas adds significant value, providing a unique dataset that enriches the scope for NMT research and development. With the rise of NMT technology, WCC-EC 2.0 emerges not only as an invaluable resource for researchers and developers, but also as a pivotal tool for improving translation accuracy, training more resilient models, and promoting interlingual communication. [ABSTRACT FROM AUTHOR]
- Subjects :
- MACHINE translating; CORPORA; CHINESE language; ENGLISH language; RESEARCH personnel
- Language :
- English
- ISSN :
- 2079-9292
- Volume :
- 13
- Issue :
- 7
- Database :
- Complementary Index
- Journal :
- Electronics (2079-9292)
- Publication Type :
- Academic Journal
- Accession number :
- 176594274
- Full Text :
- https://doi.org/10.3390/electronics13071381