Automatic Translation between Mixtec to Spanish Languages Using Neural Networks.

Authors :
Santiago-Benito, Hermilo
Córdova-Esparza, Diana-Margarita
Castro-Sánchez, Noé-Alejandro
García-Ramirez, Teresa
Romero-González, Julio-Alejandro
Terven, Juan
Source :
Applied Sciences (2076-3417); Apr 2024, Vol. 14 Issue 7, p2958, 21p
Publication Year :
2024

Abstract

This paper introduces a novel method for collecting and translating texts from Mixtec to Spanish. The method comprises four primary steps. First, we collected a Mixtec–Spanish corpus that includes 4568 sentences from educational and religious domain texts. To enhance the parallel corpus, we generated synthetic data with GPT-3.5. Second, we cleaned the data with a semi-automatic approach followed by preprocessing and tokenization. In preprocessing, we removed stop words, duplicated sentences, special characters, and numbers, and converted the text to lowercase. Third, we performed semi-automatic alignment to find the correspondence between Mixtec and Spanish sentences and generate the sentence-level aligned texts necessary for translation. Finally, we trained automatic translation models based on recurrent neural networks, bidirectional recurrent neural networks, and Transformers. Our system achieved a BLEU score of 95.66 for Mixtec-to-Spanish translation and 99.87 for Spanish-to-Mixtec translation. We also obtained a translation edit rate (TER) of 0.5 for Spanish-to-Mixtec and a TER of 16.5 for Mixtec-to-Spanish. Our research stands out as a pioneering effort in the field of automatic Mixtec-to-Spanish translation in Mexico, filling a gap identified in the current literature. [ABSTRACT FROM AUTHOR]
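
The preprocessing step named in the abstract (lowercasing and removing stop words, duplicated sentences, special characters, and numbers) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `clean_sentence` and `preprocess_pairs`, the tiny Spanish stop-word list, and the toy sentence pairs are all assumptions made for illustration. The sketch also assumes that accented letters are kept after Unicode NFC normalization and that apostrophes are treated as special characters, which may not match how the paper handles Mixtec orthography.

```python
import re
import unicodedata

# Hypothetical stop-word list for illustration; the paper's actual list is not given here.
SPANISH_STOP_WORDS = frozenset({"el", "la", "los", "las", "de", "y", "a", "en"})


def clean_sentence(text, stop_words=frozenset()):
    """Lowercase, drop digits and non-letter characters, then drop stop words."""
    # NFC composes base letters with their accents where possible, so accented
    # letters survive the regex below (an assumption about the orthography).
    text = unicodedata.normalize("NFC", text).lower()
    # Replace runs of non-word characters, digits, and underscores with a space.
    text = re.sub(r"[\W\d_]+", " ", text)
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


def preprocess_pairs(pairs, src_stop_words=frozenset(), tgt_stop_words=frozenset()):
    """Clean (source, target) sentence pairs; drop empty and duplicated pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        pair = (clean_sentence(src, src_stop_words),
                clean_sentence(tgt, tgt_stop_words))
        # Skip pairs where either side became empty, and exact duplicates.
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


# Toy (Mixtec, Spanish) pairs, invented for this example only.
toy_pairs = [
    ("Ñuu  Savi 123!", "El pueblo de la lluvia."),
    ("ñuu savi", "el pueblo de la lluvia"),  # duplicate after cleaning
    ("", "hola"),                            # empty source side
]
result = preprocess_pairs(toy_pairs, tgt_stop_words=SPANISH_STOP_WORDS)
print(result)
```

Cleaning both sides of a pair together, and deduplicating on the cleaned pair rather than on each side separately, keeps the corpus sentence-aligned, which the later alignment and training steps depend on.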

Details

Language :
English
ISSN :
20763417
Volume :
14
Issue :
7
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
176597175
Full Text :
https://doi.org/10.3390/app14072958