1. End-to-End Language Identification Using a Residual Convolutional Neural Network with Attentive Temporal Pooling
- Author
Gautam Bhattacharya, Joao Monteiro, Tiago H. Falk, and Jahangir Alam
- Subjects
Language identification, Computer science, Speech recognition, Pooling, 020206 networking & telecommunications, 02 engineering and technology, Residual, Convolutional neural network, Set (abstract data type), Variable (computer science), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Benchmark (computing), Embedding, 020201 artificial intelligence & image processing
- Abstract
In this work, we tackle the problem of end-to-end language identification from speech. To this end, we propose a residual convolutional neural network, exploiting the ability of such architectures to take large contextual segments of the input into account. To support variable-length inputs, a self-attention mechanism is applied on top of the final convolutional layer. This yields a learnable temporal feature pooling scheme that embeds utterances of varying duration into a fixed-dimensional space. Evaluation is performed on data containing ten oriental languages under different test conditions, namely short-duration recordings, confusing-language trials, and a set of trials that includes unseen non-target languages. End-to-end evaluation shows that the proposed framework significantly outperforms well-known benchmark methods under the considered evaluation conditions.
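The attentive temporal pooling idea can be illustrated with a minimal sketch, assuming that each frame output by the final convolutional layer is a D-dimensional vector and that a single learned scoring vector `w` and bias `b` assign one scalar score per frame. The record does not give the paper's exact attention parameterization, so the function names `softmax` and `attentive_pool` and the linear scoring rule are hypothetical choices for illustration only. Softmax-normalized scores weight the frames, and their weighted sum gives one fixed-size vector regardless of how many frames the utterance has.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames: List[List[float]], w: List[float], b: float = 0.0) -> List[float]:
    """Pool T frames of dimension D into a single D-dimensional vector.

    Hypothetical parameterization: score_t = w . h_t + b (one scalar per frame).
    The paper's actual attention layer may use a different scoring function.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    # One unnormalized attention score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, f)) + b for f in frames]
    # Attention weights over time sum to 1.
    alphas = softmax(scores)
    # Weighted sum over the time axis gives a fixed D-dimensional embedding.
    return [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]


if __name__ == "__main__":
    w = [0.5, -0.25]
    short_utt = [[1.0, 2.0], [3.0, 4.0]]          # 2 frames
    long_utt = [[0.1 * t, 1.0] for t in range(9)]  # 9 frames
    print(attentive_pool(short_utt, w))
    print(attentive_pool(long_utt, w))
```

Both utterances above map to 2-dimensional vectors even though they contain 2 and 9 frames, which is the property the abstract relies on for handling variable-duration inputs. With an all-zero `w`, the weights become uniform and the scheme reduces to plain average pooling; learning `w` lets the model emphasize more informative frames instead.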
- Published
- 2019