Speech recognition model design for Sundanese language using WAV2VEC 2.0.

Authors :: Cryssiover, Albert; Zahra, Amalia
Source :: International Journal of Speech Technology; Mar2024, Vol. 27 Issue 1, p171-177, 7p
Publication Year :: 2024
Abstract: Indonesia has a variety of languages, one of which is Sundanese, a regional language that is at risk of extinction. One way to help prevent its extinction is speech recognition, which is among the most advanced technologies for enabling machines to simulate human behavior. In this study, the researchers fine-tuned the existing pre-trained Wav2Vec 2.0 model so that it could learn and predict Sundanese. Fine-tuning was carried out on both the Wav2Vec 2.0 Base model and the Wav2Vec 2.0 Large model using a Sundanese language dataset of approximately 53 hours. Fine-tuning took 1 to 2 hours, and the resulting models were then evaluated with the word error rate (WER) metric. The fine-tuned models gave good results, with a WER of 23.5% for the Wav2Vec 2.0 Base model and 24.0% for the Wav2Vec 2.0 Large model. [ABSTRACT FROM AUTHOR]
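
The abstract reports results as WER, so a minimal sketch of how that metric is commonly computed may help. It assumes the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the placeholder sentences are illustrative and do not come from the paper, which may have used a different tool or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Placeholder transcripts: one substitution and one deletion over four words.
example_wer = word_error_rate("one two three four", "one too three")
print(f"WER = {example_wer:.1%}")
```

Under this definition, a WER of 23.5% corresponds to roughly 23.5 word errors per 100 reference words; WER can exceed 100% when the hypothesis contains many insertions.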