8 results for "Matrouf, Driss"
Search Results
2. How to Leverage DNN-based speech enhancement for multi-channel speaker verification?
- Author
- Dowerah, Sandipana, Serizel, Romain, Jouvet, Denis, Mohammadamini, Mohammad, and Matrouf, Driss
- Subjects
- Computer Science - Sound, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
- Speaker verification (SV) suffers from unsatisfactory performance in far-field scenarios due to environmental noise and the adverse impact of room reverberation. This work presents a benchmark of multichannel speech enhancement for far-field speaker verification. One approach is deep neural network-based; the other combines a deep neural network with signal processing. We integrated a DNN architecture with signal processing techniques to carry out various experiments. Our approach is compared to existing state-of-the-art approaches. We examine the importance of enrollment in pre-processing, which has been largely overlooked in previous studies. Experimental evaluation shows that pre-processing can improve SV performance as long as the enrollment files are processed similarly to the test data and the test and enrollment occur within similar SNR ranges. Considerable improvement is obtained on the generated data and all the noise conditions of the VOiCES dataset.
- Published
- 2022
3. A bridge between features and evidence for binary attribute-driven perfect privacy
- Author
- Noé, Paul-Gauthier, Nautsch, Andreas, Matrouf, Driss, Bousquet, Pierre-Michel, and Bonastre, Jean-François
- Subjects
- Computer Science - Cryptography and Security, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
- Attribute-driven privacy aims to conceal a single user's attribute, contrary to anonymisation, which tries to hide the full identity of the user in some data. When the attribute to protect from malicious inferences is binary, perfect privacy requires the log-likelihood-ratio to be zero, resulting in no strength-of-evidence. This work presents an approach based on normalizing flow that maps a feature vector into a latent space where the evidence, related to the binary attribute, and an independent residual are disentangled. It can be seen as a non-linear discriminant analysis where the mapping is invertible, allowing generation by mapping the latent variable back to the original space. This framework allows manipulation of the log-likelihood-ratio of the data and therefore allows setting it to zero for privacy. We show the applicability of the approach on an attribute-driven privacy task where the sex information is removed from speaker embeddings. Results on the VoxCeleb2 dataset show the efficiency of the method, which outperforms, in terms of privacy and utility, our previous experiments based on adversarial disentanglement., Comment: ICASSP 2022
- Published
- 2021
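The perfect-privacy condition described in the abstract above (a log-likelihood-ratio of exactly zero for the binary attribute) can be illustrated with a toy model. This is only a sketch under the assumption of equal-variance Gaussian class likelihoods in a 1-D latent space; it is not the paper's normalizing-flow construction.

```python
import numpy as np

# Toy two-class model of the latent "evidence" for a binary attribute
# (e.g. sex): class 0 and class 1 likelihoods are equal-variance Gaussians.
MU0, MU1, SIGMA = -1.0, 1.0, 1.0

def log_gauss(z, mu, sigma):
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def llr(z):
    """Log-likelihood-ratio of class 1 vs class 0 for latent evidence z."""
    return log_gauss(z, MU1, SIGMA) - log_gauss(z, MU0, SIGMA)

def zero_evidence(z):
    """Move z to the decision boundary, where both classes are equally
    likely, so the LLR (the strength-of-evidence) is exactly zero."""
    return np.full_like(z, (MU0 + MU1) / 2.0)

z = np.array([-2.0, 0.3, 1.7])
print(llr(zero_evidence(z)))  # [0. 0. 0.]: no evidence about the attribute
```

Setting the LLR to zero in this way destroys the attribute information while, in the paper's invertible setting, an independent residual is kept so the rest of the representation survives.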
4. Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation
- Author
- Noé, Paul-Gauthier, Mohammadamini, Mohammad, Matrouf, Driss, Parcollet, Titouan, Nautsch, Andreas, and Bonastre, Jean-François
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
- Abstract
- In speech technologies, the speaker's voice representation is used in many applications such as speech recognition, voice conversion, speech synthesis and, obviously, user authentication. Modern vocal representations of the speaker are based on neural embeddings. In addition to the targeted information, these representations usually contain sensitive information about the speaker, such as age, sex, physical state, education level or ethnicity. In order to allow the user to choose which information to protect, we introduce in this paper the concept of attribute-driven privacy preservation in speaker voice representation. It allows a person to hide one or more personal aspects from a potential malicious interceptor and from the application provider. As a first solution to this concept, we propose to use an adversarial autoencoding method that disentangles a given speaker attribute in the voice representation, thus allowing its concealment. We focus here on the sex attribute for an Automatic Speaker Verification (ASV) task. Experiments carried out using the VoxCeleb datasets have shown that the proposed method enables the concealment of this attribute while preserving ASV ability., Comment: Accepted to Interspeech 2021
- Published
- 2020
5. Speech Pseudonymisation Assessment Using Voice Similarity Matrices
- Author
- Noé, Paul-Gauthier, Bonastre, Jean-François, Matrouf, Driss, Tomashenko, Natalia, Nautsch, Andreas, and Evans, Nicholas
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Cryptography and Security
- Abstract
- The proliferation of speech technologies and rising privacy legislation call for the development of privacy preservation solutions for speech applications. These are essential since speech signals convey a wealth of rich, personal and potentially sensitive information. Anonymisation, the focus of the recent VoicePrivacy initiative, is one strategy to protect speaker identity information. Pseudonymisation solutions aim not only to mask the speaker identity and preserve the linguistic content, quality and naturalness, as is the goal of anonymisation, but also to preserve voice distinctiveness. Existing metrics for the assessment of anonymisation are ill-suited, and those for the assessment of pseudonymisation are completely lacking. Based upon voice similarity matrices, this paper proposes the first intuitive visualisation of pseudonymisation performance for speech signals and two novel metrics for objective assessment. They reflect the two key pseudonymisation requirements of de-identification and voice distinctiveness., Comment: Interspeech 2020
- Published
- 2020
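A voice similarity matrix of the kind mentioned above is, at its core, a matrix of pairwise speaker similarities. The minimal sketch below uses cosine similarity between per-speaker mean embeddings of hypothetical random data; the paper itself builds its matrices from verification (LLR) scores, so this is an illustration of the structure only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical speaker embeddings: 4 speakers x 3 utterances x 16 dims.
emb = rng.normal(size=(4, 3, 16))

def speaker_similarity_matrix(emb):
    """Cosine similarity between mean embeddings of every speaker pair.
    The diagonal reflects self-similarity; off-diagonals, confusability."""
    means = emb.mean(axis=1)
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    return means @ means.T

M = speaker_similarity_matrix(emb)
print(M.shape)  # (4, 4); diagonal entries are exactly 1.0
```

A strong diagonal with weak off-diagonals indicates distinct, re-identifiable voices; pseudonymisation metrics quantify how this pattern changes after processing.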
6. Data augmentation versus noise compensation for x-vector speaker recognition systems in noisy environments
- Author
- Mohammadamini, Mohammad and Matrouf, Driss
- Subjects
- Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
- The explosion of available speech data and new speaker modeling methods based on deep neural networks (DNN) have made it possible to develop more robust speaker recognition systems. Among DNN speaker modelling techniques, the x-vector system has shown a degree of robustness in noisy environments. Previous studies suggest that by increasing the number of speakers in the training data and using data augmentation, more robust speaker recognition systems are achievable in noisy environments. In this work, we want to know whether explicit noise compensation techniques remain effective despite the general noise robustness of these systems. For this study, we use two different x-vector networks: the first is trained on VoxCeleb1 (Protocol1), and the second is trained on VoxCeleb1+VoxCeleb2 (Protocol2). We propose to add a denoising x-vector subsystem before scoring. Experimental results show that the x-vector system used in Protocol2 is more robust than the one used in Protocol1. Despite this observation, we show that explicit noise compensation gives almost the same relative EER gain in both protocols. For example, in Protocol2 we obtain a 21% to 66% improvement in EER with denoising techniques.
- Published
- 2020
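The relative EER gain quoted above is conventionally the relative reduction of the equal error rate. A one-line sketch with hypothetical numbers (the specific EER values are illustrative, not from the paper):

```python
def relative_eer_gain(eer_baseline, eer_denoised):
    """Relative EER reduction in percent: 100 * (base - new) / base."""
    return 100.0 * (eer_baseline - eer_denoised) / eer_baseline

# Hypothetical EERs: a baseline of 10% reduced to 7.9% by denoising.
print(round(relative_eer_gain(10.0, 7.9), 1))  # 21.0 (% relative gain)
```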
7. ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
- Author
- Wang, Xin, Yamagishi, Junichi, Todisco, Massimiliano, Delgado, Hector, Nautsch, Andreas, Evans, Nicholas, Sahidullah, Md, Vestman, Ville, Kinnunen, Tomi, Lee, Kong Aik, Juvela, Lauri, Alku, Paavo, Peng, Yu-Huai, Hwang, Hsin-Te, Tsao, Yu, Wang, Hsin-Min, Maguer, Sebastien Le, Becker, Markus, Henderson, Fergus, Clark, Rob, Zhang, Yu, Wang, Quan, Jia, Ye, Onuma, Kai, Mushika, Koji, Kaneda, Takashi, Jiang, Yuan, Liu, Li-Juan, Wu, Yi-Chiao, Huang, Wen-Chin, Toda, Tomoki, Tanaka, Kou, Kameoka, Hirokazu, Steiner, Ingmar, Matrouf, Driss, Bonastre, Jean-Francois, Govender, Avashna, Ronanki, Srikanth, Zhang, Jing-Xuan, and Ling, Zhen-Hua
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Cryptography and Security, Computer Science - Sound, Electrical Engineering and Systems Science - Signal Processing
- Abstract
- Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. While they originate from the same source database and same underlying protocol, they are explored in two specific use case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than possible previously. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment on spoofed data in logical access. It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona-fide utterances even by human subjects., Comment: Accepted, Computer Speech and Language. This manuscript version is made available under the CC-BY-NC-ND 4.0. For the published version on Elsevier website, please visit https://doi.org/10.1016/j.csl.2020.101114
- Published
- 2019
8. LIA system description for NIST SRE 2016
- Author
- Rouvier, Mickael, Bousquet, Pierre-Michel, Ajili, Moez, Kheder, Waad Ben, Matrouf, Driss, and Bonastre, Jean-François
- Subjects
- Computer Science - Sound
- Abstract
- This paper describes the LIA speaker recognition system developed for the NIST Speaker Recognition Evaluation (SRE) 2016 campaign. Eight sub-systems are developed, all based on a state-of-the-art approach, i-vector/PLDA, which represents the mainstream technique in text-independent speaker recognition. These sub-systems differ in the acoustic feature extraction front-end (MFCC, PLP), in the i-vector extraction stage (UBM, DNN or two-feats posteriors), and in the data-shifting (IDVC, mean-shifting). The submitted system is a score-level fusion of these eight sub-systems.
- Published
- 2016
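Score-level fusion, as described in the abstract above, is typically a weighted sum of the sub-systems' per-trial scores. A minimal sketch with two hypothetical sub-systems and hand-picked weights (the paper fuses eight, with weights normally tuned on a development set):

```python
import numpy as np

# Hypothetical verification scores from two sub-systems over 3 trials.
scores = np.array([
    [1.2, -0.3, 0.8],   # sub-system 1
    [0.9,  0.1, 1.1],   # sub-system 2
])
weights = np.array([0.6, 0.4])  # illustrative fusion weights

# Score-level fusion: one fused score per trial.
fused = weights @ scores
print(fused)  # [ 1.08 -0.14  0.92]
```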