
Real to H-space Encoder for Speech Recognition

Authors :: Georges Linarès
Mohamed Morchid
Renato De Mori
Titouan Parcollet
Laboratoire Informatique d'Avignon (LIA)
Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
McGill University = Université McGill [Montréal, Canada]
Source :: INTERSPEECH 2019, Jun 2019, Graz, Austria
Publication Year :: 2019
Publisher :: HAL CCSD, 2019.
Abstract: Deep neural networks (DNNs), and more precisely recurrent neural networks (RNNs), are at the core of modern automatic speech recognition systems because they process input sequences efficiently. Recently, it has been shown that input representations based on multidimensional algebras, such as complex and quaternion numbers, give neural networks a more natural, compressive and powerful representation of the input signal, outperforming common real-valued NNs. Indeed, quaternion-valued neural networks (QNNs) better learn both internal dependencies, such as the relation between the Mel-filter-bank value of a specific time frame and its time derivatives, and global dependencies, which describe the relations between time frames. Nonetheless, QNNs are limited to quaternion-valued input signals, so it is difficult to benefit from this representation with real-valued input data. This paper tackles this weakness by introducing a real-to-quaternion encoder that allows QNNs to process any one-dimensional input features, such as traditional Mel-filter-banks for automatic speech recognition.
Comment: Accepted at INTERSPEECH 2019
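
Illustrative sketch (not taken from the paper): the abstract does not describe how the real-to-quaternion encoder works internally, so the following is only one plausible reading. It assumes a real-valued affine projection whose outputs are grouped into four components (r, i, j, k) per quaternion and then normalized to unit length. The function name `real_to_quaternion`, the component layout, the 40-dimensional input frame, and the random weights are all hypothetical choices made for the example.

```python
import math
import random


def real_to_quaternion(features, weights, biases):
    """Map a real feature vector to unit quaternions (r, i, j, k).

    Assumption: the encoder is a plain real-valued affine layer whose
    4*m outputs are regrouped into m quaternions. This is a sketch, not
    the method described in the paper.
    """
    n_out = len(weights)
    if n_out % 4 != 0:
        raise ValueError("number of output units must be a multiple of 4")

    # Real-valued affine projection: one output per weight row.
    projected = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]

    m = n_out // 4
    quaternions = []
    for q in range(m):
        # Hypothetical layout: [all r | all i | all j | all k].
        r, i, j, k = (projected[q + c * m] for c in range(4))
        # Normalize to a unit quaternion; fall back to 1.0 to avoid
        # dividing by zero when all four components are zero.
        norm = math.sqrt(r * r + i * i + j * j + k * k) or 1.0
        quaternions.append((r / norm, i / norm, j / norm, k / norm))
    return quaternions


# Toy usage with made-up sizes: 40 real features in, 8 quaternions out.
rng = random.Random(0)
n_features, n_quaternions = 40, 8
weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    for _ in range(4 * n_quaternions)
]
biases = [0.0] * (4 * n_quaternions)
frame = [rng.random() for _ in range(n_features)]
encoded = real_to_quaternion(frame, weights, biases)
```

In this sketch, `encoded` holds one 4-tuple per quaternion, which is the kind of input a quaternion-valued layer would expect. In a trained system the weights would be learned jointly with the downstream network rather than drawn at random.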

Subjects :: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
FOS: Computer and information sciences
Sound (cs.SD)
Relation (database)
Computer science
Speech recognition
Computer Science::Neural and Evolutionary Computation
[INFO] Computer Science [cs]
Computer Science - Sound
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Quaternion
Representation (mathematics)
Index Terms: quaternion neural networks
Computer Science - Computation and Language
Artificial neural network
Frame (networking)
Process (computing)
speech recognition
recurrent neural networks
Recurrent neural network
Encoder
Computation and Language (cs.CL)
Electrical Engineering and Systems Science - Audio and Speech Processing

Details

Language :: English
Database :: OpenAIRE
Journal :: INTERSPEECH 2019, Jun 2019, Graz, Austria
Accession number :: edsair.doi.dedup.....5b861c082624b8ed15e3fdc9eba6f156
