Robust f0 extraction from monophonic signals using adaptive sub-band filtering
- Source :
- Speech Communication. 116:77-85
- Publication Year :
- 2020
- Publisher :
- Elsevier BV, 2020.
Abstract
- Fundamental frequency (f0) extraction plays an important role in the processing of monophonic signals such as speech and song. It is essential in various real-time applications such as emotion recognition and speech/singing voice discrimination. Several f0 extraction methods have been proposed over the years, but no single algorithm works well for both speech and song. In this paper, we propose a novel approach that can accurately estimate f0 from speech as well as songs. First, voiced/unvoiced detection is performed using a novel RNN-LSTM-based approach. Then, each voiced frame is decomposed into several sub-bands. From each sub-band of a voiced frame, the candidate pitch periods are identified using autocorrelation and non-linear operations. Finally, Viterbi decoding is used to form the final pitch contours. The performance of the proposed method is evaluated using popular speech (Keele, CMU-ARCTIC) and song (MIR-1K, LYRICS) databases. The evaluation results show that the proposed method performs equally well for speech and monophonic songs, and outperforms state-of-the-art methods. Further, the efficacy of the proposed f0 extraction method is demonstrated by developing an interactive SARGAM learning tool.
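The autocorrelation step mentioned in the abstract can be illustrated with a minimal sketch. This is a plain full-band autocorrelation peak picker for a single voiced frame, not the paper's adaptive sub-band method, and it omits the voiced/unvoiced detector, the non-linear operations and the Viterbi smoothing. The function name `autocorr_f0`, the search range `fmin`/`fmax` and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np

def autocorr_f0(frame, sr, fmin=60.0, fmax=1000.0):
    """Estimate f0 (Hz) of one voiced frame from its autocorrelation peak.

    Hypothetical helper: fmin/fmax are assumed search bounds, not values
    taken from the paper.
    """
    frame = frame - np.mean(frame)  # remove DC offset
    # Full autocorrelation; keep only the non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                      # shortest period considered
    lag_max = min(int(sr / fmin), len(ac) - 1)    # longest period considered
    # The lag with the strongest self-similarity is the pitch-period candidate.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

# Synthetic 40 ms frame: a 200 Hz sine sampled at 16 kHz (assumed test signal).
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 200.0 * t)
f0 = autocorr_f0(frame, sr)
```

On real speech or singing, a single full-band peak like this is prone to octave errors, which is presumably one motivation for combining per-sub-band candidates with a path search such as Viterbi decoding.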
- Subjects :
- Linguistics and Language
Computer science
Communication
Speech recognition
Autocorrelation
Frame (networking)
020206 networking & telecommunications
02 engineering and technology
Fundamental frequency
Lyrics
01 natural sciences
Language and Linguistics
Computer Science Applications
Viterbi decoder
Modeling and Simulation
0103 physical sciences
0202 electrical engineering, electronic engineering, information engineering
Extraction methods
Computer Vision and Pattern Recognition
Emotion recognition
Singing
010301 acoustics
Software
Details
- ISSN :
- 0167-6393
- Volume :
- 116
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi...........d2bcd78822e83af7e76d53a2fc7494c3
- Full Text :
- https://doi.org/10.1016/j.specom.2019.11.006