1. Neural RAPT: deep learning-based pitch tracking with prior algorithmic knowledge instillation.
- Author
- Wang, Kai; Liu, Jingjing; Peng, Yizhou; Huang, Hao
- Abstract
The task of estimating the fundamental frequency of an audio signal, also known as pitch tracking, has been a long-standing research area in speech signal processing. Recently, there has been a growing interest in data-driven pitch tracking methods that leverage deep learning technologies. However, deep learning-based pitch tracking models often neglect the methodologies employed in well-established signal processing-based pitch tracking algorithms, which incorporate valuable prior knowledge. Motivated by this, we propose Neural RAPT, an interpretable neural network-based pitch tracking model that incorporates signal processing knowledge from RAPT. The proposed model consists of a front-end module and a back-end module. The front-end module adopts the U-Net structure, which inherently involves downsampling and upsampling processes similar to RAPT. To enhance the U-Net, we introduce a neural autocorrelation function (ACF) module using masked CNN, along with a normalization layer that models the sample pair-wise product in the normalized cross-correlation function (NCCF). The back-end module is based on the Transformer architecture. The model is evaluated by pitch tracking experiments on the PTDB-TUG database and noisy mixtures with the NOISEX-92 database at different SNRs. The experimental results demonstrate that incorporating algorithmic knowledge into the model design leads to improved performance. The proposed model inherits the advantages of high accuracy from traditional pitch tracking algorithms, while also benefiting from the noise robustness offered by neural network-based methods. Consequently, our model exhibits superiority over existing deep learning-based pitch tracking methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
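For readers unfamiliar with the normalized cross-correlation function (NCCF) mentioned in the abstract, the following is a minimal sketch of the classic signal-processing quantity as used in RAPT-style pitch trackers. It is not the neural ACF module or the normalization layer proposed in the paper. The function names `nccf` and `estimate_f0`, the window length `n`, and the search range `fmin`/`fmax` are illustrative assumptions, and the simple peak picking stands in for RAPT's actual candidate selection and dynamic programming.

```python
import numpy as np

def nccf(frame, n, max_lag):
    """Normalized cross-correlation of the first n samples of `frame`
    with the n-sample window starting at each lag 0..max_lag.
    `frame` must hold at least n + max_lag samples."""
    frame = np.asarray(frame, dtype=float)
    ref = frame[:n]
    e_ref = np.dot(ref, ref)              # energy of the reference window
    phi = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        seg = frame[k:k + n]              # window shifted by k samples
        denom = np.sqrt(e_ref * np.dot(seg, seg))
        # Sum of sample pair-wise products, normalized by both window energies
        phi[k] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return phi

def estimate_f0(frame, sr, n, fmin=110.0, fmax=400.0):
    """Hypothetical helper: pick the lag with the highest NCCF inside
    the allowed pitch range and convert it to Hz."""
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    phi = nccf(frame, n, max_lag)
    best_lag = min_lag + int(np.argmax(phi[min_lag:max_lag + 1]))
    return sr / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz (period of 40 samples)
sr = 8000
t = np.arange(400)
tone = np.sin(2 * np.pi * 200 * t / sr)
f0 = estimate_f0(tone, sr, n=240)
```

Because each lag is normalized by the energy of both windows, the NCCF stays in [-1, 1] regardless of signal level, which is the property the abstract's normalization layer is said to model.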