
Data Augmentation and D-vector Representation Methods for Speaker Change Detection

Authors :
Jeon Gue Park
Shin Cha
Jisu Park
Seongbae Eun
Young-Sun Yun
Source :
RACS
Publication Year :
2020
Publisher :
ACM, 2020.

Abstract

Speaker Change Detection (SCD) is the task of detecting the points in a conversation where the speaker changes. A typical SCD system or speaker diarization system divides the conversation into homogeneous segments, partitioned according to speaker identity. Although d-vectors extracted with deep neural network models are used to identify or verify speakers, they are often considered insufficient for training a model that detects speaker changes from acoustic information alone. Because few datasets are dedicated to training such systems, progress in SCD research has been slow and performance remains poor. We therefore present a data augmentation method based on the TIMIT dataset that is tailored to SCD, and we propose several methods for representing d-vectors in SCD systems, together with preliminary results. In the proposed data augmentation method, speaker boundary information is converted into a probability according to its offset in a given frame, and these probabilities are collected over the segment. To model speaker boundaries, we concatenate two randomly selected sentences from a corpus intended for speech recognition. Preliminary experimental results, specifically recall, show the potential of the proposed approaches. In future work, we will add linguistic information to the proposed classification system, or extend it to a hybrid of d-vectors and frame vectors, or to convolutional networks.
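The augmentation idea described in the abstract (join two sentences from different speakers, turn the junction into a per-frame boundary probability that depends on offset, then collect those probabilities per segment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the probability function or the aggregation rule, so the Gaussian decay (`sigma`) and the per-segment maximum used here are assumptions. The names `make_pair`, `concat_with_boundary`, and `segment_labels` are hypothetical, and `corpus` is assumed to be a dict mapping TIMIT-style speaker IDs to lists of precomputed feature arrays of shape (frames, dims).

```python
import numpy as np


def concat_with_boundary(utt_a, utt_b, sigma=5.0):
    """Concatenate two (frames x dims) feature arrays and return per-frame
    speaker-change probabilities that decay with offset from the junction.
    ASSUMPTION: Gaussian decay; the source does not specify the function."""
    x = np.concatenate([utt_a, utt_b], axis=0)
    boundary = utt_a.shape[0]  # index of the first frame of the second speaker
    # Offset of each frame centre from the junction, which lies between
    # frame boundary-1 and frame boundary.
    offset = (np.arange(x.shape[0]) + 0.5) - boundary
    prob = np.exp(-0.5 * (offset / sigma) ** 2)
    return x, prob, boundary


def segment_labels(prob, seg_len, hop):
    """Collect frame probabilities into fixed-length sliding segments.
    ASSUMPTION: a segment's label is the maximum frame probability inside it."""
    starts = range(0, len(prob) - seg_len + 1, hop)
    return np.array([prob[s:s + seg_len].max() for s in starts])


def make_pair(corpus, rng, sigma=5.0):
    """Pick one random sentence from each of two distinct speakers and join them.
    `corpus` is a hypothetical dict: speaker ID -> list of feature arrays."""
    spk_a, spk_b = rng.choice(list(corpus.keys()), size=2, replace=False)
    utt_a = corpus[spk_a][rng.integers(len(corpus[spk_a]))]
    utt_b = corpus[spk_b][rng.integers(len(corpus[spk_b]))]
    return concat_with_boundary(utt_a, utt_b, sigma=sigma)
```

With a 40-frame first sentence, the two frames on either side of the junction (indices 39 and 40) receive the highest probability, and segments that do not overlap the junction receive labels close to zero. Using a smooth probability rather than a hard 0/1 label is one plausible way to tolerate small misalignments in the boundary position.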

Details

Database :
OpenAIRE
Journal :
Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Accession number :
edsair.doi...........7e4784f0c9902d78dba0d734fb22126e
Full Text :
https://doi.org/10.1145/3400286.3418270