
Speech Recognition by Simply Fine-tuning BERT

Authors :
Chia-Hua Wu
Hsin-Min Wang
Kuan-Yu Chen
Shang-Bao Luo
Tomoki Toda
Wen-Chin Huang
Source :
ICASSP
Publication Year :
2021

Abstract

We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, a language model (LM) trained on large-scale unlabeled text data that can generate rich contextual representations. Our assumption is that, given a history context sequence, a powerful LM can narrow the range of possible choices, so the speech signal need only serve as a simple clue. Hence, compared to conventional ASR systems that train a powerful acoustic model (AM) from scratch, we believe that speech recognition is possible by simply fine-tuning a BERT model. As an initial study, we demonstrate the effectiveness of the proposed idea on the AISHELL dataset and show that stacking a very simple AM on top of BERT can yield reasonable performance.

Accepted to ICASSP 2021
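The core idea of stacking a very simple AM on top of BERT can be sketched as a thin acoustic front-end that projects speech features into BERT's input embedding space, after which BERT itself is fine-tuned to predict output tokens. The sketch below is illustrative only: the feature dimension, hidden size, and single linear projection are assumptions for demonstration, not the paper's exact architecture.

```python
import numpy as np

def simple_acoustic_frontend(frames, proj):
    """Map acoustic frames (T, feat_dim) into a BERT-sized embedding
    space (T, hidden_dim) with one linear projection -- a stand-in for
    the 'very simple AM' stacked on BERT. The projected sequence would
    then be fed to BERT in place of token embeddings, and the whole
    model fine-tuned with a standard classification loss over the
    output vocabulary. (Hypothetical sketch, not the paper's code.)"""
    return frames @ proj

rng = np.random.default_rng(0)
feat_dim, hidden_dim, T = 80, 768, 50   # assumed: 80-dim filterbanks, BERT-base hidden size
proj = rng.standard_normal((feat_dim, hidden_dim)) * 0.01
frames = rng.standard_normal((T, feat_dim))  # placeholder acoustic features

embeds = simple_acoustic_frontend(frames, proj)
print(embeds.shape)  # (50, 768)
```

The design point this illustrates is that the acoustic side stays deliberately lightweight; the heavy lifting of narrowing the hypothesis space is left to the pretrained LM during fine-tuning.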

Details

Language :
English
Database :
OpenAIRE
Journal :
ICASSP
Accession number :
edsair.doi.dedup.....85c5e0688028ef9237e8c610d100060c