1. High-Resolution Speaker Counting in Reverberant Rooms Using CRNN with Ambisonics Features
- Author
- Pierre-Amaury Grumiaux, Laurent Girin, Srdan Kitic, and Alexandre Guerin
- Subjects
- FOS: Computer and information sciences, Sound (cs.SD), Reverberation, Microphone, Computer science, Ambisonics, Speech recognition, Networking & telecommunications, Engineering and technology, Speaker diarisation, Sound recording and reproduction, Noise, Recurrent neural network, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Audio signal processing
- Abstract
- Speaker counting is the task of estimating the number of people who are speaking simultaneously in an audio recording. For several audio processing tasks, such as speaker diarization, separation, localization, and tracking, knowing the number of speakers at each time step is a prerequisite, or at least a strong advantage, and it also enables low-latency processing. To that end, we address the speaker counting problem with a multichannel convolutional recurrent neural network that produces an estimate at short-term frame resolution. We trained the network to predict up to 5 concurrent speakers in a multichannel mixture, using simulated data covering many different conditions in terms of source and microphone positions, reverberation, and noise. The network can predict the number of speakers with good accuracy at frame resolution. (5 pages, 1 figure)
- Published
- 2021