Speech Recognition by Simply Fine-tuning BERT
- Source :
- ICASSP
- Publication Year :
- 2021
Abstract
- We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, a language model (LM) trained on large-scale unlabeled text data that can generate rich contextual representations. Our assumption is that, given a history context sequence, a powerful LM can narrow the range of possible choices, so the speech signal can serve as a simple clue. Hence, compared to conventional ASR systems that train a powerful acoustic model (AM) from scratch, we believe that speech recognition is possible by simply fine-tuning a BERT model. As an initial study, we demonstrate the effectiveness of the proposed idea on the AISHELL dataset and show that stacking a very simple AM on top of BERT can yield reasonable performance.
- Accepted to ICASSP 2021
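To make the architecture the abstract describes concrete, here is a minimal PyTorch/Transformers sketch of stacking a very simple AM on a pre-trained BERT and fine-tuning the whole stack for character prediction. This is an illustration, not the authors' implementation: the model name `bert-base-chinese`, the 80-dimensional filterbank input, the single linear projection used as the "simple AM", and the assumption of one acoustic frame per output token are all choices made here for brevity.

```python
# Sketch only: a tiny acoustic front-end feeds speech features into a
# pre-trained BERT, whose contextual representations are fine-tuned to
# predict characters. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

class BertASR(nn.Module):
    def __init__(self, feat_dim=80, bert_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # pre-trained LM
        hidden = self.bert.config.hidden_size
        # "Very simple AM": project acoustic features into BERT's embedding space.
        self.acoustic_proj = nn.Linear(feat_dim, hidden)
        # Predict one character (vocabulary token) per position.
        self.classifier = nn.Linear(hidden, self.bert.config.vocab_size)

    def forward(self, feats, attention_mask=None):
        # feats: (batch, time, feat_dim) acoustic features, e.g. log-Mel
        # filterbanks, assumed here to be pre-aligned one frame per token.
        inputs_embeds = self.acoustic_proj(feats)
        out = self.bert(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
        return self.classifier(out.last_hidden_state)  # (batch, time, vocab)

# Usage: fine-tune all parameters end-to-end with cross-entropy on
# character targets; no AM is trained from scratch.
model = BertASR()
logits = model(torch.randn(2, 16, 80))  # two utterances, 16 frames each
```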
- Subjects :
- FOS: Computer and information sciences
- FOS: Electrical engineering, electronic engineering, information engineering
- Computer Science - Computation and Language (cs.CL)
- Computer Science - Sound (cs.SD)
- Electrical Engineering and Systems Science - Audio and Speech Processing (eess.AS)
- Speech recognition
- Language model
- Acoustic model
- Signal processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ICASSP
- Accession number :
- edsair.doi.dedup.....85c5e0688028ef9237e8c610d100060c