1. Statistical methods in multi-speaker automatic speech recognition
- Author
- Jean-Paul Haton, J. F. Mari, Anne Boyer, P. Divoux, Kamel Smaïli, J.-C. Di Martino; Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS), Institut National Polytechnique de Lorraine (INPL), Université Nancy 2, Université Henri Poincaré - Nancy 1 (UHP)
- Subjects
Computer science, Speech recognition, Markov models, 02 engineering and technology, Markov model, Dynamic programming, Clustering, 0504 sociology, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Management of Technology and Innovation, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Finite-state machine, Markov chain, 05 social sciences, Vector quantization, Automatic speech recognition, 050401 social sciences methods, 020206 networking & telecommunications, Multi-speaker, Modeling and Simulation, Word recognition, Training phase, Artificial intelligence, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Natural language processing
- Abstract
Automatic speech recognition and understanding (ASR) plays an important role in man-machine communication. Substantial industrial development is currently in progress in this area. However, after some 40 years of effort, several fundamental questions remain open. This paper presents a comparative study of four methods for multi-speaker word recognition: (i) clustering of acoustic templates, (ii) comparison with a finite state automaton, (iii) dynamic programming and vector quantization, (iv) stochastic Markov sources. To make the results comparable, the four methods were tested on the same material: the ten digits (0 to 9), each pronounced four times by 60 different speakers (30 male and 30 female). Our experiments distinguish between multi-speaker systems (capable of recognizing words pronounced by speakers who were used during the training phase) and speaker-independent systems (capable of recognizing words pronounced by speakers totally unknown to the system). Half of the corpus (15 male and 15 female speakers) was used for training, and the remainder for testing.
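The corpus arithmetic and the speaker-level split described in the abstract can be sketched as follows. This is only an illustration: the speaker labels, the choice of which 15 speakers of each sex go into training, and all variable names are assumptions, since the abstract does not specify them.

```python
from itertools import product

DIGITS = range(10)        # the ten digits, 0 to 9
REPETITIONS = 4           # each digit pronounced four times per speaker

# Hypothetical speaker labels: 30 male and 30 female speakers.
speakers = [(sex, idx) for sex in ("M", "F") for idx in range(30)]

# One entry per utterance: (speaker, digit, repetition).
corpus = list(product(speakers, DIGITS, range(REPETITIONS)))

# Assumption: the first 15 speakers of each sex are used for training.
# The abstract only states the 15/15 proportion, not which speakers.
train_speakers = {(sex, idx) for sex in ("M", "F") for idx in range(15)}

train = [utt for utt in corpus if utt[0] in train_speakers]
test = [utt for utt in corpus if utt[0] not in train_speakers]

print(len(corpus), len(train), len(test))
```

Because the split is made by speaker rather than by utterance, no test speaker appears in the training set, which corresponds to the speaker-independent condition defined in the abstract.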
- Published
- 1988