
A characteristic extraction method for VoicePrint slice statistics base on joint time-frequency processing.

Authors :
Liang, Guolong
Guo, Shaoxiang
Zou, Nan
Wu, Guanyi
Source :
Applied Acoustics. Jan2024, Vol. 216, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

• A new Mel feature fusion method is proposed.
• From the perspective of joint time–frequency processing, voiceprint slice statistical features are proposed. They make effective use of time information and perform better than Mel features, addressing the insufficient use of time information in Mel feature extraction.
• The two types of features proposed in this article demonstrated strong performance on both simulated and real data.

Mel-Frequency Cepstral Coefficients (MFCC) and their differential coefficients are widely used as typical non-linear spectral envelope features in passive sonar target recognition. MFCC emphasizes the frequency-domain information at the low-frequency end of the target's spectrum, while the differential coefficients add the dynamic characteristics of MFCC in the time domain. However, time information is still not sufficiently utilized. In response to this issue, we propose two solutions: one is a feature fusion method, the Weighted Mel-Frequency Cepstral Coefficient based on Bhattacharyya Distance (BD-WMFCC); the other is a feature extraction method, VoicePrint Slice Statistics (VPSS), based on joint time–frequency processing. BD-WMFCC uses the Bhattacharyya distance to weight the differential coefficients and combines them with MFCC, making fuller use of time information. VPSS is a linear spectral feature that exploits the stability of the spectral intensity within a given bandwidth over continuous time. It extracts the steady-state characteristics of the target's spectrum over time in bands of equal width. Simulation and real-data analysis show that VPSS demonstrates outstanding separability and classification performance among both non-linear and linear spectral features, with strong stability. In the real-data analysis, the accuracy of VPSS features with AlexNet is 3.8% and 5.7% higher than that of the spectral (Fre) and Mel features, respectively. [ABSTRACT FROM AUTHOR]
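
The abstract does not give the BD-WMFCC formulas, so the following is only an illustrative sketch of the general idea: compute differential (delta) coefficients from MFCC frames, measure how well each delta coefficient separates two classes with the Bhattacharyya distance, and use the normalized distances as weights before concatenating with the MFCC. The function names (`delta`, `bhattacharyya_gaussian`, `fuse`), the per-coefficient univariate Gaussian assumption, the regression-style delta, and the normalization of the weights are all assumptions made for illustration, not the authors' published method.

```python
import numpy as np

def delta(features, width=2):
    """Regression-style differential coefficients along the time axis.

    features: array of shape (n_frames, n_coeffs). Edges are padded by
    repeating the first/last frame (an assumption of this sketch).
    """
    n = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]    # frames t + k
        behind = padded[width - k : width - k + n]   # frames t - k
        out += k * (ahead - behind)
    return out / denom

def bhattacharyya_gaussian(x, y, eps=1e-12):
    """Per-coefficient Bhattacharyya distance between two sample sets,
    modelling each coefficient as a univariate Gaussian (assumption)."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    v1, v2 = x.var(axis=0) + eps, y.var(axis=0) + eps
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
    var_term = 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    return mean_term + var_term

def fuse(mfcc_a, mfcc_b):
    """Hypothetical fusion: weight delta coefficients by normalized
    Bhattacharyya distance, then concatenate them with the MFCC."""
    d_a, d_b = delta(mfcc_a), delta(mfcc_b)
    w = bhattacharyya_gaussian(d_a, d_b)
    w = w / (w.sum() + 1e-12)
    return np.hstack([mfcc_a, d_a * w]), np.hstack([mfcc_b, d_b * w]), w

# Synthetic stand-ins for MFCC frames of two target classes.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(50, 13))
class_b = rng.normal(0.5, 2.0, size=(50, 13))
fused_a, fused_b, weights = fuse(class_a, class_b)
```

Under these assumptions, delta coefficients whose distributions differ more between the two classes receive larger weights, so they contribute more to the fused feature vector.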

Details

Language :
English
ISSN :
0003682X
Volume :
216
Database :
Academic Search Index
Journal :
Applied Acoustics
Publication Type :
Academic Journal
Accession number :
174689005
Full Text :
https://doi.org/10.1016/j.apacoust.2023.109814