112 results on '"Probabilistic linear discriminant analysis"'
Search Results
2. Enhancing the Damage Detection and Classification of Unknown Classes with a Hybrid Supervised–Unsupervised Approach.
- Author
-
Stagi, Lorenzo, Sclafani, Lorenzo, Tronci, Eleonora M., Betti, Raimondo, Milana, Silvia, Culla, Antonio, Roveri, Nicola, and Carcaterra, Antonio
- Subjects
FISHER discriminant analysis, STRUCTURAL health monitoring, CLASSIFICATION algorithms, SUPERVISED learning, DYNAMICAL systems
- Abstract
Most damage-assessment strategies for dynamic systems only distinguish between undamaged and damaged conditions without recognizing the level or type of damage or considering unseen conditions. This paper proposes a novel framework for structural health monitoring (SHM) that combines supervised and unsupervised learning techniques to assess damage using a system's structural response (e.g., the acceleration response of big infrastructures). The objective is to enhance the benefits of a supervised learning framework while addressing the challenges of working in an SHM context. The proposed framework uses a Linear Discriminant Analysis (LDA)/Probabilistic Linear Discriminant Analysis (PLDA) strategy that enables learning the distributions of known classes and the performance of probabilistic estimations on new incoming data. The methodology is developed and proposed in two versions. The first version is used in the context of controlled, conditioned monitoring or for post-damage assessment, while the second analyzes the single observational data. Both strategies are built in an automatic framework able to classify known conditions and recognize unseen damage classes, which are then used to update the classification algorithm. The proposed framework's effectiveness is first tested considering the acceleration response of a numerically simulated 12-degree-of-freedom system. Then, the methodology's practicality is validated further by adopting the experimental monitoring data of the benchmark study case of the Z24 bridge. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Enhancing the Damage Detection and Classification of Unknown Classes with a Hybrid Supervised–Unsupervised Approach
- Author
-
Lorenzo Stagi, Lorenzo Sclafani, Eleonora M. Tronci, Raimondo Betti, Silvia Milana, Antonio Culla, Nicola Roveri, and Antonio Carcaterra
- Subjects
structural health monitoring, damage detection, cepstral coefficients, probabilistic linear discriminant analysis, Z24 bridge, Technology
- Abstract
Most damage-assessment strategies for dynamic systems only distinguish between undamaged and damaged conditions without recognizing the level or type of damage or considering unseen conditions. This paper proposes a novel framework for structural health monitoring (SHM) that combines supervised and unsupervised learning techniques to assess damage using a system’s structural response (e.g., the acceleration response of big infrastructures). The objective is to enhance the benefits of a supervised learning framework while addressing the challenges of working in an SHM context. The proposed framework uses a Linear Discriminant Analysis (LDA)/Probabilistic Linear Discriminant Analysis (PLDA) strategy that enables learning the distributions of known classes and the performance of probabilistic estimations on new incoming data. The methodology is developed and proposed in two versions. The first version is used in the context of controlled, conditioned monitoring or for post-damage assessment, while the second analyzes the single observational data. Both strategies are built in an automatic framework able to classify known conditions and recognize unseen damage classes, which are then used to update the classification algorithm. The proposed framework’s effectiveness is first tested considering the acceleration response of a numerically simulated 12-degree-of-freedom system. Then, the methodology’s practicality is validated further by adopting the experimental monitoring data of the benchmark study case of the Z24 bridge.
- Published
- 2024
- Full Text
- View/download PDF
4. Performance Evaluation of Language Identification on Emotional Speech Corpus of Three Indian Languages
- Author
-
Basu, Joyanta, Majumder, Swanirbhar, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bhattacharyya, Siddhartha, editor, Dutta, Paramartha, editor, and Datta, Kakali, editor
- Published
- 2021
- Full Text
- View/download PDF
5. Printer source identification by feature modeling in the total variable printer space.
- Author
-
Hamzehyan, Roozbeh, Razzazi, Farbod, and Behrad, Alireza
- Subjects
*COMPUTER printers, *OPTICAL character recognition, *ALGORITHMS, *PLURALITY voting, *FISHER discriminant analysis, *FACTOR analysis
- Abstract
Advances in the digital world have attracted the attention of many researchers in terms of developing digital forensics. Using machine learning methods for print source identification is one of the developing areas in the field of digital forensics. In this paper, a new method is presented for printer source identification by modeling the primary Local Binary Pattern (LBP) features in the total variable printer space and extracting the secondary features based on the joint factor analysis. Only one low‐dimensional i‐vector feature is employed for each document image without using any optical character recognition (OCR) algorithm or similar processes. This property eliminates the requirement for the majority voting algorithm and reduces the computational cost of the classification process. Furthermore, the proposed algorithm is not limited to a specific language or character set. The capabilities of the proposed method in extracting useful discriminant information from the sparse print shadow texture are revealed through simulation. The simulation results showed that the proposed algorithm obtained the accuracy of 98.48% by refining the basic features of LBP, which is comparable to the results of the state‐of‐the‐art approaches in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
6. An Accurate WiFi Indoor Positioning Algorithm for Complex Pedestrian Environments.
- Author
-
Yu, Da and Li, Changgeng
- Abstract
This paper proposes a precise WiFi fingerprinting indoor positioning algorithm for complex pedestrian environments. We transform the disturbed received signal strength (RSS) from the original space to latent space using the improved probabilistic linear discriminant analysis (PLDA). In the latent space, Bayes rule is used to calculate the posterior probability of the similarity between the test point and the reference points, and the K reference points with the highest posterior probability are weighted to estimate the position. Actual on-site experiments involving three floors demonstrate that the mean localization error of the proposed algorithm is 1.38 m, which outperforms the Horus algorithm by 29% under the same test conditions. In addition, by studying the variability of the mean value of RSS in different pedestrian environments, the fingerprint maps in different states of personnel movement are simulated. Using these simulated maps, the average localization error of the proposed algorithm increases slightly to 1.63 m, while the workload required during the offline training phase is significantly reduced. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. Speech Replay Spoofing Attack Detection System Based on Fusion of Classification Algorithms
- Author
-
А.А. Лепендин, Я.А. Филин, and П.В. Малинин
- Subjects
automatic speaker verification, voice spoofing, replay attacks, universal background model, i-vector, probabilistic linear discriminant analysis, tree boosting, model fusion, Physics, QC1-999, History (General), D1-2009
- Abstract
The rapid development of modern digital speech processing and recording technologies makes it necessary to take into account the potential threats from speech replay attacks. We propose an ensemble fusion replay attack detection system. It uses constant Q cepstral coefficients as speech features and short-time mean normalization for their preprocessing. The set of binary classifiers includes a Bayesian classifier based on multiple Gaussian mixture models, an i-vector based Gaussian Probabilistic Linear Discriminant Analysis and the XGBoost tree boosting algorithm. Fusion of scores was performed by a modified logistic regression algorithm from the BOSARIS toolbox. The ASVspoof 2017 corpus is used in the experiments as the main database for evaluating anti-spoofing systems. The results demonstrate that the proposed system can provide substantially better performance than the baseline Gaussian mixture model classifier. The preprocessing of cepstral features is crucial for better system performance. High evaluation performance can be obtained using only a few algorithms in the set. The attained equal error rate of EER=12.44% for our fusion classifier is competitive with the best results obtained during the last two years. DOI 10.14258/izvasu(2018)1-19
- Published
- 2018
- Full Text
- View/download PDF
8. Recovering Pose and 3D Deformable Shape from Multi-instance Image Ensembles
- Author
-
Agudo, Antonio, Moreno-Noguer, Francesc, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Lai, Shang-Hong, editor, Lepetit, Vincent, editor, Nishino, Ko, editor, and Sato, Yoichi, editor
- Published
- 2017
- Full Text
- View/download PDF
9. Local Learning Multiple Probabilistic Linear Discriminant Analysis
- Author
-
Yang, Yi, Sun, Jiasong, and Stephanidis, Constantine, editor
- Published
- 2015
- Full Text
- View/download PDF
10. PLDA Speaker Verification with Limited Speech Data
- Author
-
Ridzik, Andrej, Rusko, Milan, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Ronzhin, Andrey, editor, Potapova, Rodmonga, editor, and Fakotakis, Nikos, editor
- Published
- 2015
- Full Text
- View/download PDF
11. Neural adversarial learning for speaker recognition.
- Author
-
Chien, Jen-Tzung and Peng, Kang-Ting
- Subjects
*AUTOMATIC speech recognition, *ARTIFICIAL neural networks, *DEEP learning, *EMBEDDINGS (Mathematics), *DISCRIMINANT analysis, *MATHEMATICAL optimization
- Abstract
• A series of neural adversarial learning methods are proposed for speaker recognition.
• Deep learning is implemented for PLDA speaker recognition with i-vectors.
• Adversarial manifold learning is proposed to build a subspace model with neighbor embeddings in latent variables.
• Multiobjective learning is performed for minimax optimization.
• Adversarial augmentation learning is proposed to compensate the imbalanced data problem in speaker recognition.
This paper presents the adversarial learning approaches to deal with various tasks in speaker recognition based on probabilistic linear discriminant analysis (PLDA) which is seen as a latent variable model for reconstruction of i-vectors. The first task aims to reduce the dimension of i-vectors based on an adversarial manifold learning where the adversarial neural networks of generator and discriminator are merged to preserve neighbor embedding of i-vectors in a low-dimensional space. The generator is trained to fool the discriminator with the generated samples in latent space. A PLDA subspace model is constructed by jointly minimizing a PLDA reconstruction error, a manifold loss for neighbor embedding and an adversarial loss caused by the generator and discriminator. The second task of adversarial learning is developed to tackle the imbalanced data problem. A PLDA based generative adversarial network is trained to generate new i-vectors to balance the size of training utterances across different speakers. An adversarial augmentation learning is proposed for robust speaker recognition. In particular, the minimax optimization is performed to estimate a generator and a discriminator where the class conditional i-vectors produced by generator could not be distinguished from real i-vectors via discriminator. A multiobjective learning is realized for a specialized neural model with the cosine similarity between real and fake i-vectors as well as the regularization for Gaussianity. Experiments are conducted to show the merit of adversarial learning in subspace construction and data augmentation for PLDA-based speaker recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
12. Learning the Face Prior for Bayesian Face Recognition
- Author
-
Lu, Chaochao, Tang, Xiaoou, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fleet, David, editor, Pajdla, Tomas, editor, Schiele, Bernt, editor, and Tuytelaars, Tinne, editor
- Published
- 2014
- Full Text
- View/download PDF
13. Joint PLDA for Simultaneous Modeling of Two Factors.
- Author
-
Ferrer, Luciana and McLaren, Mitchell
- Subjects
*DISCRIMINANT analysis, *FACTOR analysis, *BIOMETRIC identification, *LATENT variables, *SPEECH processing systems
- Abstract
Probabilistic linear discriminant analysis (PLDA) is a method used for biometric problems like speaker or face recognition that models the variability of the samples using two latent variables, one that depends on the class of the sample and another one that is assumed independent across samples and models the within-class variability. In this work, we propose a generalization of PLDA that enables joint modeling of two sample-dependent factors: the class of interest and a nuisance condition. The approach does not change the basic form of PLDA but rather modifies the training procedure to consider the dependency across samples of the latent variable that models within-class variability. While the identity of the nuisance condition is needed during training, it is not needed during testing since we propose a scoring procedure that marginalizes over the corresponding latent variable. We show results on a multilingual speaker-verification task, where the language spoken is considered a nuisance condition. The proposed joint PLDA approach leads to significant performance gains in this task for two different data sets, in particular when the training data contains mostly or only monolingual speakers. [ABSTRACT FROM AUTHOR]
- Published
- 2019
14. Modelling and compensation for language mismatch in speaker verification.
- Author
-
Misra, Abhinav and Hansen, John H.L.
- Subjects
*LOUDSPEAKERS, *AUTOMATIC speech recognition, *NEAREST neighbor analysis (Statistics), *AFFINE transformations, *NONPARAMETRIC estimation, *DISCRIMINANT analysis, *ORAL communication
- Abstract
Language mismatch represents one of the more difficult challenges in achieving effective speaker verification in naturalistic audio streams. The portion of bi-lingual speakers worldwide continues to grow making speaker verification for speech technology more difficult. In this study, three specific methods are proposed to address this issue. Experiments are conducted on the PRISM (Promoting Robustness in Speaker Modeling) evaluation-set. We first show that adding small amounts of multi-lingual seed data to the Probabilistic Linear Discriminant Analysis (PLDA) development set, leads to a significant relative improvement of +17.96% in system Equal Error Rate (EER). Second, we compute the eigendirections that represent the distribution of multi-lingual data added to PLDA. We show that by adding these new eigendirections as part of the Linear Discriminant Analysis (LDA), and then minimizing them to directly compensate for language mismatch, further performance gains for speaker verification are achieved. By combining both multi-lingual PLDA and this minimization step with the new set of eigendirections, we obtain a +26.03% relative improvement in EER. In practical scenarios, it is highly unlikely that multi-lingual seed data representing the languages present in the test-set would be available. Hence, in the third phase, we address such scenarios, by proposing a method for Locally Weighted Linear Discriminant Analysis (LWLDA). In this third method, we reformulate the LDA equations to incorporate a local affine transform that weighs the same speaker samples. This method effectively preserves the local intrinsic information represented by the multimodal structure of the within-speaker scatter matrix, thereby helping to improve the class discriminating ability of LDA. It also helps in extending the ability of LDA to transform the speaker i-Vectors to dimensions that are greater than the total number of speaker classes. 
Using LWLDA, a relative improvement of +8.54% is obtained in system EER. LWLDA provides even more gain when multi-lingual seed data is available, and improves the system performance by a relative +26.03% in terms of EER. We also compare LWLDA to the recently proposed Nearest Neighbor Non-Parametric Discriminant Analysis (NDA). We show that not only is LWLDA better than NDA in terms of system performance but is also computationally less expensive. Comparative studies on DARPA Robust Automatic Transcription of Speech (RATS) corpus also show that LWLDA consistently outperforms NDA and LDA on different evaluation conditions. Our solutions offer new directions for addressing a challenging problem which has received limited attention in the speaker recognition community. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
15. PLDA-based mean shift speakers' short segments clustering.
- Author
-
Salmun, Itay, Shapiro, Ilya, Opher, Irit, and Lapidot, Itshak
- Subjects
*SPEECH processing systems, *LINEAR statistical models, *DISCRIMINANT analysis, *PROBABILITY theory, *SPEECH processing software
- Abstract
This paper extends upon a previous work using Mean Shift algorithm to perform speaker clustering on i-vectors generated from short speech segments. In this paper we examine the effectiveness of probabilistic linear discriminant analysis (PLDA) scoring as the metric of the mean shift clustering algorithm in the presence of different numbers of speakers. Our proposed method, combined with k-nearest neighbors (kNN) for bandwidth estimation, yields better and more robust results in comparison to the cosine similarity with fixed neighborhood bandwidth for clustering segments of large numbers of speakers. In the case of 30 speakers, we achieved significant improvement in cluster and speaker purity with the PLDA-based mean shift algorithm compared to the cosine-based baseline system. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
16. How to make embeddings suitable for PLDA.
- Author
-
Li, Zhuo, Xiao, Runqiu, Chen, Hangting, Zhao, Zhenduo, Wang, Wenchao, and Zhang, Pengyuan
- Subjects
*PROBABILISTIC databases, *DISCRIMINANT analysis, *EMBEDDINGS (Mathematics), *SIMPLICITY, *DATABASES
- Abstract
Probabilistic linear discriminant analysis (PLDA) is widely implemented in speaker verification tasks. However, PLDA has limitations owing to its assumptions. In this study, we explore how to make deep speaker embeddings suitable for PLDA in complex situations. We analyze PLDA in detail and summarize its three important properties, Gaussianity, simplicity, and domain sensitivity. For the Gaussianity, by comparing the discrimination and Gaussianity of embeddings extracted from different layers of speaker extractors with different numbers of segment-level fully connected (Fc) layers, we demonstrate that embeddings extracted from the first Fc layer of models with two segment-level Fc layers are more suitable for PLDA. Secondly, several common speaker datasets comprise multiple short-duration speech segments extracted from long speech. We find that embeddings of short speech segments extracted from the long speech are less reliable and have complex within-class distributions. By determining the weighted average of embeddings extracted from short-duration speech segments, we simplify the embeddings distribution and make the embeddings suitable for PLDA. Thirdly, PLDA is sensitive to domain mismatches. We propose data adaptation methods that work directly on raw speech to eliminate explicit mismatches, such as the codecs and the environment noise mismatches. We prove that the data adaptation methods achieve performance improvements of PLDA and show strong complementarity with backend adaptation methods. We conduct extensive experiments, using the NIST SRE CTS superset, VoxCeleb, and SRE16 as the training set, and the SRE21 set as the evaluation set mainly. The experimental results show that our methods effectively improve the overall performance of PLDA.
• We investigate how to make embeddings suitable for PLDA.
• We analyze the effect of the number of segment-level fully-connected layers on embeddings.
• We explore the relationship between speech segment duration and embeddings.
• We propose data adaptation method to eliminate the relatively explicit mismatch.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Sentence‐HMM state‐based i‐vector/PLDA modelling for improved performance in text dependent single utterance speaker verification.
- Author
-
Büyük, Osman
- Abstract
In this paper, we make use of hidden Markov model (HMM) state alignment information in i‐vector/probabilistic linear discriminant analysis (PLDA) framework to improve the verification performance in a text‐dependent single utterance (TDSU) task. In the TDSU task, speakers repeat a fixed utterance in both enrollment and authentication sessions. Although Gaussian mixture models (GMMs) have been the dominant modeling technique for text‐independent applications, an HMM based method might be better suited for the TDSU task since it captures the co‐articulation information better. Recently, powerful channel compensation techniques such as joint factor analysis (JFA), i‐vectors and PLDA have been proposed for GMM based text‐independent speaker verification. In this study, we train a separate i‐vector/PLDA model for each sentence HMM state in order to utilize the alignment information of the HMM states in a TDSU task. The proposed method is tested using a multi‐channel speaker verification database. In the experiments, it is observed that HMM state based i‐vector/PLDA (i‐vector/PLDA‐HMM) provides approximately 67% relative reduction in equal error rate (EER) when compared to the i‐vector/PLDA. The proposed method also outperforms the baseline GMM and sentence HMM methods. It yields approximately 51% relative reduction in EER over the best performing sentence HMM method. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
18. Sparse kernel machines with empirical kernel maps for PLDA speaker verification.
- Author
-
Rao, Wei and Mak, Man-Wai
- Subjects
*NATURAL language processing, *AUTOMATIC speech recognition, *KERNEL operating systems, *SPEECH processing systems, *SUPPORT vector machines
- Abstract
Previous studies have demonstrated the benefits of PLDA–SVM scoring with empirical kernel maps for i-vector/PLDA speaker verification. The method not only performs significantly better than the conventional PLDA scoring and utilizes the multiple enrollment utterances of target speakers effectively, but also opens up opportunity for adopting sparse kernel machines in PLDA-based speaker verification systems. This paper proposes taking the advantages of empirical kernel maps by incorporating them into a more advanced kernel machine called relevance vector machines (RVMs). The paper reports extensive analyses on the behaviors of RVMs and provides insight into the properties of RVMs and their applications in i-vector/PLDA speaker verification. Results on NIST 2012 SRE demonstrate that PLDA–RVM outperforms the conventional PLDA and that it achieves a comparable performance as PLDA–SVM. Results also show that PLDA–RVM is much sparser than PLDA–SVM. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
19. Task-Driven Variability Model for Speaker Verification
- Author
-
Chen Chen and Jiqing Han
- Subjects
0209 industrial biotechnology, Speaker verification, Computer science, business.industry, Applied Mathematics, Network structure, Pattern recognition, 02 engineering and technology, Latent variable, Information loss, Total variability, Probabilistic linear discriminant analysis, 020901 industrial engineering & automation, Signal Processing, Embedding, Artificial intelligence, business, Classifier (UML)
- Abstract
The total variability model (TVM)/probabilistic linear discriminant analysis (PLDA) framework is one of the most popular methods for speaker verification. In this framework, the i-vector representations are first extracted from utterances via an estimated TVM and then employed to estimate the PLDA parameters for classification. The TVM and PLDA are estimated serially, so the information loss in the TVM is inherited by the i-vectors, and then passed into the PLDA classifier. More seriously, this loss cannot be compensated by the PLDA. To solve this problem, we propose a task-driven variability model (TDVM) to jointly estimate the TVM and PLDA classifier. In this method, the feedback from the PLDA can supervise the optimal solution of the TVM to move toward the space that has the maximum between-class separation and minimum within-class variation. Meanwhile, this space is suitable for open-set test which can deal with unenrolled speakers. Unlike most embedding methods which extract the embedding representations via the stack of network structures, the TDVM contains the assumptions about latent variables, which can enhance the interpretation of speaker representation extraction. The proposed method is evaluated on the King-ASR-010 and VoxCeleb databases, and the experimental results show that the TDVM method can achieve better performance than the traditional TVM/PLDA and VGG-M network with different cost functions.
- Published
- 2019
- Full Text
- View/download PDF
20. Speech intelligibility assessment of dysarthria using Fisher vector encoding.
- Author
-
H. M., Chandrashekar, Karjigi, Veena, and Sreedevi, N.
- Subjects
*DYSARTHRIA, *ARTIFICIAL neural networks, *INTELLIGIBILITY of speech, *NEUROMUSCULAR diseases, *SPECTRUM analysis, *CEPSTRUM analysis (Mechanics)
- Abstract
• Intelligibility assessment of dysarthric speech using UA and TORGO databases.
• Different frame-level features are extracted from temporal, spectral, and cepstral domains.
• Temporal encoding and Fisher vector encoding techniques are used to convert frame-level features to utterance level features.
• Probabilistic linear discriminant analysis (PLDA) and artificial neural network (ANN) classifiers are used for the intelligibility assessment of dysarthric speech.
• Fisher vector encoding performance is superior to temporal encoding and ANN performance is superior to PLDA.
Neuromuscular disorders can lead to dysarthria. Dysarthria is a speech disorder that mainly affects the human motor speech system. It often results in reduced speech intelligibility. Speech intelligibility is one of the parameters to assess the severity of dysarthria. The intelligibility of speech decreases as the severity increases. Automatic speech intelligibility assessment could be reliable and cost-effective compared to the conventional methods which need experienced speech pathologists. Automatic intelligibility assessment includes feature extraction and classification. Features extracted from spectral and cepstral domains are investigated for use in intelligibility assessment. Frame level features are extracted from different signal representations in spectral and cepstral domains. Fisher vector encoding and temporal encoding that uses descriptive statistics are applied to convert frame level features into utterance level. These features are fed to PLDA and ANN classifiers for intelligibility assessment. Overall, the performance of Fisher vector encoding is found to be superior compared to temporal encoding. In all features, the performance of ANN is better in assessing the intelligibility levels compared to PLDA as expected. The STFT, as well as harmonics in spectral domain and CQCC in the cepstral domain, performed well for intelligibility assessment of unseen data of TORGO database. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Anonymous speaker clusters: Making distinctions between anonymised speech recordings with clustering interface
- Author
-
Anaïs Chanclu, Benjamin O'Brien, Natalia A. Tomashenko, and Jean-François Bonastre; Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS); Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI; ANR-18-CE23-0018, DEEP-PRIVACY (distributed, personalized, privacy-preserving learning for speech processing, 2018); ANR-18-JSTS-0001, VoicePersonae (voice identity cloning and protection, 2018); ANR-19-DATA-0008, Harpocrates (open data, tools and challenges for voice anonymisation, 2019); ANR-17-CE39-0016, VoxCrim (voice comparison applied to forensic science, 2017); European Project: 825081, H2020, COMPRISE (2018)
- Subjects
Speaker verification ,Computer science ,subjective evaluation ,Interface (computing) ,Speech recognition ,Speech characteristics ,Speech synthesis ,02 engineering and technology ,computer.software_genre ,privacy ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Similarity (network science) ,speech synthesis ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,anonymisation ,speaker identification ,020206 networking & telecommunications ,[SCCO.LING]Cognitive science/Linguistics ,Pearson product-moment correlation coefficient ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,symbols ,[SCCO.LING] Cognitive science/Linguistics ,0305 other medical science ,computer ,clustering
- Abstract
Our study examined the performance of evaluators tasked with grouping natural and anonymised speech recordings into clusters based on their perceived similarities. Speech stimuli were selected from the VCTK corpus; two systems developed for the VoicePrivacy 2020 Challenge were used for anonymisation. The Baseline-1 (B1) system was developed by using x-vectors and neural waveform models, while the Baseline-2 (B2) system relied on digital-signal-processing techniques. Seventy-four evaluators completed three trials composed of 16 recordings with either natural or anonymised speech generated from a single system. F-measure and cluster purity metrics were used to assess evaluator accuracy. Probabilistic linear discriminant analysis (PLDA) scores from an automatic speaker verification system were generated to quantify similarity between recordings and were correlated with the subjective results. Our findings showed that non-native English speaking evaluators had significantly lower F-measure means when presented with anonymised recordings. We observed no significant effect for cluster purity. Pearson correlation procedures revealed that PLDA scores generated from natural and B2-anonymised speech recordings correlated positively with F-measure and cluster purity metrics. These findings show that evaluators were able to use the interface to cluster natural and anonymised speech recordings and suggest that anonymisation systems modelled like B1 are more effective at suppressing identifiable speech characteristics.
- Published
- 2021
22. An advanced channel compensation method for speaker recognition.
- Author
-
Imamverdiyev, Yadigar and Sukhostat, Lyudmila
- Abstract
This paper discusses various methods for channel compensation that reduce errors in speech data transmitted through different channels in order to increase the accuracy of speaker recognition systems. The aim of this paper is to analyze and evaluate the effects of channel distortion and noise on speech. Methods used for channel equalization and compensation to improve the efficiency of speaker verification and identification systems over various communication channels are described. A scalable approach to probabilistic linear discriminant analysis for speaker recognition is presented. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
23. On the Application of the Probabilistic Linear Discriminant Analysis to Face Recognition across Expression.
- Author
-
Wibowo, Moh Edi, Tjondronegoro, Dian, and Zhang, Ligang
- Abstract
Facial expression is one of the main issues of face recognition in uncontrolled environments. In this paper, we apply the probabilistic linear discriminant analysis (PLDA) method to recognize faces across expressions. Several PLDA approaches are tested and cross-evaluated on the Cohn-Kanade and JAFFE databases. With fewer samples per gallery subject, high recognition rates comparable to previous works have been achieved, indicating the robustness of the approaches. Among the approaches, the mixture of PLDAs has demonstrated better performance. The experimental results also indicate that facial regions around the cheeks, eyes, and eyebrows are more discriminative than regions around the mouth, jaw, chin, and nose. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
24. Gender-independent speaker recognition using source normalisation.
- Author
-
McLaren, Mitchell and van Leeuwen, David A.
- Abstract
Source-normalisation (SN) was proposed to improve the robustness of i-vector-based speaker recognition for under-resourced and unseen cross-speech-source evaluation conditions. The technique of source-normalisation estimates directions of undesired within-speaker variation more accurately than traditional methods when cross-source variation is not explicitly observed from each speaker in system development data. Incorporated into Within Class Covariance Normalisation (WCCN), source-normalisation provides significant improvements to speaker recognition based on i-vectors. This paper proposes a novel approach to gender-independent Probabilistic LDA (PLDA) through the use of SN-WCCN to normalise for the variation that separates genders as a pre-processing step for i-vector based PLDA classification. Evaluated on the NIST 2010 speaker recognition evaluation (SRE) dataset, the proposed approach demonstrated performance comparable to a typical gender-dependent configuration. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
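As a rough illustration of the Within Class Covariance Normalisation (WCCN) step named in the abstract above, the sketch below implements plain WCCN only, not the source-normalised variant the paper proposes. The function names, the use of NumPy, and the choice to average per-speaker covariances with equal weight are all assumptions made for this example.

```python
import numpy as np

def within_class_covariance(X, labels):
    """Average of the per-class covariance matrices (rows of X are vectors)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)          # centre each class on its own mean
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn_projection(X, labels):
    """Return B with B @ B.T = inv(W); projecting x -> B.T @ x whitens W."""
    W = within_class_covariance(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

# Hypothetical usage: rows of X are i-vectors, labels are speaker ids.
# Z = X @ wccn_projection(X, labels)
```

After projection the within-class covariance of the data becomes the identity, which is the property that later scoring stages rely on. W must be invertible, so each dimension needs enough training vectors.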
25. Probabilistic Linear Discriminant Analysis for Acoustic Modeling.
- Author
-
Liang Lu and Renals, Steve
- Subjects
DISCRIMINANT analysis ,ACOUSTIC models ,AUTOMATIC speech recognition ,HIDDEN Markov models ,GAUSSIAN mixture models - Abstract
In this letter, we propose a new acoustic modeling approach for automatic speech recognition based on probabilistic linear discriminant analysis (PLDA), which is used to model the state density function for standard hidden Markov models (HMMs). Unlike conventional Gaussian mixture models (GMMs), where the correlations are weakly modelled by using diagonal covariance matrices, PLDA captures the correlations of feature vectors in subspaces without vastly expanding the model. It also allows the use of high-dimensional feature input, and is therefore more flexible in making use of different types of acoustic features. We performed preliminary experiments on the Switchboard corpus, and demonstrated the feasibility of this acoustic model. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
26. On the Complementarity of Phone Posterior Probabilities for Improved Speaker Recognition.
- Author
-
Diez, Mireia, Varona, Amparo, Penagarikano, Mikel, Rodriguez-Fuentes, Luis Javier, and Bordel, German
- Subjects
VOICEPRINTS ,PROBABILITY theory ,MATHEMATICAL sequences ,PERFORMANCE evaluation ,PHONETICS - Abstract
In this letter, we apply Phone Log-Likelihood Ratio (PLLR) features to the task of speaker recognition. PLLRs, which are computed on the phone posterior probabilities provided by phone decoders, convey acoustic-phonetic information in a sequence of frame-level vectors, and therefore can be easily plugged into traditional acoustic systems, just by replacing the Mel-Frequency Cepstral Coefficients (MFCC) or an alternate representation. To study the performance of the proposed features, MFCC-based and PLLR-based systems are trained under an i-vector-PLDA approach. Results on the NIST 2010 and 2012 Speaker Recognition Evaluation databases show that, despite yielding lower performance than the acoustic system, the system based on PLLR features does provide significant gains when both systems are fused, which reveals a complementarity among features, and provides a suitable and effective way of using higher level phonetic information in speaker recognition systems. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
27. Voice biometrics using linear Gaussian model.
- Author
-
Yang, Hai, Xu, Yunfei, Huang, Houjun, Zhou, Ruohua, and Yan, Yonghong
- Abstract
This study introduces a linear Gaussian model‐based framework for voice biometrics. The model works with discrete‐time linear dynamical systems. The motivation of the study is to use the linear Gaussian modelling method in voice biometrics, and to show that the accuracy offered by this method is comparable with that of other state‐of‐the‐art methods such as Probabilistic Linear Discriminant Analysis and the two‐covariance model. An expectation–maximisation algorithm is derived to train the model and a Bayesian solution is used to calculate the log‐likelihood ratio score of all trials of speakers. This approach performed well on the core‐extended conditions of the NIST 2010 Speaker Recognition Evaluation, and is competitive with Gaussian probabilistic linear discriminant analysis in terms of the normalised decision cost function. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
28. Classification of non-tumorous skin pigmentation disorders using voting based probabilistic linear discriminant analysis
- Author
-
Wee Ser, Steven Tien Guan Thng, Yunfeng Liang, Lei Sun, Qiping Chen, Zhiping Lin, Feng Lin, School of Computer Science and Engineering, School of Electrical and Electronic Engineering, and Interdisciplinary Graduate School (IGS)
- Subjects
Male ,Melasma ,media_common.quotation_subject ,Health Informatics ,02 engineering and technology ,Probabilistic linear discriminant analysis ,030207 dermatology & venereal diseases ,03 medical and health sciences ,0302 clinical medicine ,Skin Pigmentation Disorder ,Voting ,Image Processing, Computer-Assisted ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Humans ,Nevus ,media_common ,Non-tumorous ,Contextual image classification ,business.industry ,Pattern recognition ,medicine.disease ,V-PLDA ,Nevus of Ota ,Computer Science Applications ,Electrical and electronic engineering [Engineering] ,Computer science and engineering [Engineering] ,Female ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Pigmentation Disorders ,Algorithms - Abstract
Non-tumorous skin pigmentation disorders can have a huge negative emotional impact on patients. The correct diagnosis of these disorders is essential for proper treatments to be instituted. In this paper, we present a computerized method for classifying five non-tumorous skin pigmentation disorders (i.e., freckles, lentigines, Hori's nevus, melasma and nevus of Ota) based on probabilistic linear discriminant analysis (PLDA). To address the large within-class variance problem with pigmentation images, a voting based PLDA (V-PLDA) approach is proposed. The proposed V-PLDA method is tested on a dataset that contains 150 real-world images taken from patients. It is shown that the proposed V-PLDA method obtains significantly higher classification accuracy (4% or more with p < 0.001 in the analysis of variance (ANOVA) test) than the original PLDA method, as well as several state-of-the-art image classification methods. To the authors' best knowledge, this is the first study that focuses on the non-tumorous skin pigmentation image classification problem. Therefore, this paper could provide a benchmark for subsequent research on this topic. Additionally, the proposed V-PLDA method demonstrates promising performance in clinical applications related to skin pigmentation disorders.
- Published
- 2018
- Full Text
- View/download PDF
29. A speaker verification backend with robust performance across conditions
- Author
-
Niko Brümmer, Mitchell McLaren, and Luciana Ferrer
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Sound (cs.SD) ,Speaker verification ,Training set ,Artificial neural network ,Calibration (statistics) ,Computer science ,Speech recognition ,Binary number ,Computer Science - Sound ,Machine Learning (cs.LG) ,Theoretical Computer Science ,Probabilistic linear discriminant analysis ,Human-Computer Interaction ,Joint (audio engineering) ,Software - Abstract
In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently and, in some cases, dramatically improves calibration, compared to the standard PLDA approach, on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential — the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-box performance on a large variety of conditions.
- Published
- 2022
- Full Text
- View/download PDF
30. Pairwise Discriminative Speaker Verification in the I-Vector Space.
- Author
-
Cumani, Sandro, Brummer, Niko, Burget, Lukáš, Laface, Pietro, Plchot, Oldřich, and Vasilakakis, Vasileios
- Subjects
DISCRIMINANT analysis ,AUTOMATIC speech recognition ,SUPPORT vector machines ,CLASSIFICATION algorithms ,MEMORY - Abstract
This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker and the hypothesis that they belong to different speakers. This approach is an alternative to the usual discriminative setup that discriminates between a speaker and all the other speakers. We use a discriminative classifier based on a Support Vector Machine (SVM) that is trained to estimate the parameters of a symmetric quadratic function approximating a log-likelihood ratio score without explicit modeling of the i-vector distributions as in the generative Probabilistic Linear Discriminant Analysis (PLDA) models. Training these models is feasible because it is not necessary to expand the i-vector pairs, which would be expensive or even impossible for medium-sized training sets. The results of experiments performed on the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation are competitive with the ones obtained by generative models, in terms of normalized Detection Cost Function and Equal Error Rate. Moreover, we show that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
31. Face Recognition using Simplified Probabilistic Linear Discriminant Analysis.
- Author
-
Vesnicer, Boštjan, Gros, Jerneja Žganec, Pavešić, Nikola, and Štruc, Vitomir
- Subjects
HUMAN facial recognition software ,PROBABILISTIC databases ,DISCRIMINANT analysis ,PROBLEM solving ,VERIFICATION of computer systems ,ROBUST control - Abstract
Face recognition in uncontrolled environments remains an open problem that has not been satisfactorily solved by existing recognition techniques. In this paper, we tackle this problem using a variant of the recently proposed Probabilistic Linear Discriminant Analysis (PLDA). We show that simplified versions of the PLDA model, which are regularly used in the field of speaker recognition, rely on certain assumptions that not only result in a simpler PLDA model, but also reduce the computational load of the technique and - as indicated by our experimental assessments - improve recognition performance. Moreover, we show that, contrary to the general belief that PLDA-based methods produce well calibrated verification scores, score normalization techniques can still deliver significant performance gains, but only if non-parametric score normalization techniques are employed. Last but not least, we demonstrate the competitiveness of the simplified PLDA model for face recognition by comparing our results with the state-of-the-art results from the literature obtained on the second version of the large-scale Face Recognition Grand Challenge (FRGC) database. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
32. Improving PLDA speaker verification performance using domain mismatch compensation techniques
- Author
-
Hafizur Rahman, Sridha Sridharan, Ahilan Kanagasundaram, David Dean, and Ivan Himawan
- Subjects
Speaker verification ,Training set ,Computer science ,business.industry ,Speech recognition ,Word error rate ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Linear discriminant analysis ,Theoretical Computer Science ,Probabilistic linear discriminant analysis ,Human-Computer Interaction ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Score fusion ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Invariant (mathematics) ,0305 other medical science ,business ,Software ,Subspace topology - Abstract
Highlights: Domain mismatch significantly affects the speaker verification performance. Domain invariant linear discriminant analysis (DI-LDA) for compensating domain mismatch in the LDA subspace. Domain invariant probabilistic linear discriminant analysis (DI-PLDA) for domain mismatch modelling in the PLDA subspace. DI-LDA approach followed by the DI-PLDA (DI-PLDA[DI-LDA]) to compensate domain mismatch from both LDA and PLDA subspaces. Limited target domain data requirement using domain mismatch compensation techniques.
The performance of state-of-the-art i-vector speaker verification systems relies on a large amount of training data for probabilistic linear discriminant analysis (PLDA) modeling. During the evaluation, it is also crucial that the target condition data is matched well with the development data used for PLDA training. However, in many practical scenarios, these systems have to be developed, and trained, using data that is often outside the domain of the intended application, since the collection of a significant amount of in-domain data is often difficult. Experimental studies have found that PLDA speaker verification performance degrades significantly due to this development/evaluation mismatch. This paper introduces a domain-invariant linear discriminant analysis (DI-LDA) technique for out-domain PLDA speaker verification that compensates domain mismatch in the LDA subspace. We also propose a domain-invariant probabilistic linear discriminant analysis (DI-PLDA) technique for domain mismatch modeling in the PLDA subspace, using only a small amount of in-domain data. In addition, we propose the sequential and score-level combination of DI-LDA and DI-PLDA to further improve out-domain speaker verification performance. Experimental results show the proposed domain mismatch compensation techniques yield at least 27% and 14.5% improvement in equal error rate (EER) over a pooled PLDA system for telephone-telephone and interview-interview conditions, respectively. Finally, we show that the improvement over the baseline pooled system can be attained even when significantly reducing the number of in-domain speakers, down to 30 in most of the evaluation conditions.
- Published
- 2018
- Full Text
- View/download PDF
33. Evaluating the Performance of Speaker Recognition Solutions in E-Commerce Applications
- Author
-
Dusan Starcevic, Olja Krčadinac, and Uroš Šošević
- Subjects
biometrics ,Biometry ,Biometrics ,Computer science ,Speech recognition ,media_common.quotation_subject ,TP1-1185 ,02 engineering and technology ,E-commerce ,Biochemistry ,Analytical Chemistry ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Consistency (database systems) ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Instrumentation ,media_common ,Focus (computing) ,speaker recognition ,business.industry ,Communication ,Chemical technology ,Commerce ,Discriminant Analysis ,Recognition, Psychology ,020206 networking & telecommunications ,identity management systems ,Certainty ,Speaker recognition ,Atomic and Molecular Physics, and Optics ,e-commerce applications ,Identity (object-oriented programming) ,0305 other medical science ,business ,Algorithms - Abstract
Two important tasks in many e-commerce applications are verifying the identity of the user accessing the system and determining the level of rights that the user has for accessing and manipulating the system's resources. The performance of these tasks is directly dependent on the certainty of establishing the identity of the user. The main research focus of this paper is a user identity verification approach based on voice recognition techniques. The paper presents research results connected to the usage of open-source speaker recognition technologies in e-commerce applications with an emphasis on evaluating the performance of the algorithms they use. Four open-source speaker recognition solutions (SPEAR, MARF, ALIZE, and HTK) have been evaluated in cases of mismatched conditions during training and recognition phases. In practice, mismatched conditions are influenced by various lengths of spoken sentences, different types of recording devices, and the usage of different languages in training and recognition phases. All tests conducted in this research were performed in laboratory conditions using a specially designed framework for multimodal biometrics. The obtained results are consistent with the findings of recent research showing that i-vectors and solutions based on probabilistic linear discriminant analysis (PLDA) continue to be the dominant speaker recognition approaches for text-independent tasks.
- Published
- 2021
- Full Text
- View/download PDF
34. A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification
- Author
-
Mohammad Azharuddin Laskar and Rabul Hussain Laskar
- Subjects
Speaker verification ,Fuzzy clustering ,Computational Theory and Mathematics ,Artificial Intelligence ,Control and Systems Engineering ,Computer science ,business.industry ,Pattern recognition ,Artificial intelligence ,I vector ,business ,Theoretical Computer Science ,Probabilistic linear discriminant analysis - Published
- 2020
- Full Text
- View/download PDF
35. From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification.
- Author
-
Rajan, Padmanabhan, Afanasyev, Anton, Hautamäki, Ville, and Kinnunen, Tomi
- Subjects
VECTOR analysis ,VERIFICATION of computer systems ,LINEAR statistical models ,MAXIMUM likelihood statistics ,DATA analysis ,MATRIX inversion - Abstract
The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the computation of the required matrix inversions and determinants. The performance of five different scoring methods and the effect of i-vector length normalization are compared experimentally. We conclude that length normalization is a useful technique for all but one of the scoring methods considered, and that averaging i-vectors is the most effective of the methods compared. We also study the application of multicondition training to the PLDA model. Our experiments indicate that multicondition training is more effective in estimating PLDA hyperparameters than it is for likelihood computation. Finally, we look at the effect of the configuration of the enrollment data on PLDA scoring, studying the properties of conditional dependence and the number of enrollment utterances per target speaker. Our experiments indicate that these properties affect the performance of the PLDA model. These results further support the conclusion that i-vector averaging is a simple and effective way to process multiple enrollment utterances. [Copyright © Elsevier]
- Published
- 2014
- Full Text
- View/download PDF
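The averaging strategy summarised in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's derivation: it uses the zero-mean two-covariance form of Gaussian PLDA, with a between-speaker covariance `B` and a within-speaker covariance `W` that are assumed to be already estimated, and it treats the average of the length-normalised enrollment vectors as a single enrollment vector. All function names are hypothetical.

```python
import numpy as np

def length_normalize(x):
    """Scale a vector to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def _log_gauss_zero_mean(x, cov):
    """Log density of a zero-mean Gaussian with covariance cov at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (quad + logdet + len(x) * np.log(2.0 * np.pi))

def two_cov_llr(enroll, test, B, W):
    """Log-likelihood ratio: same speaker versus different speakers."""
    d = len(enroll)
    T = B + W                                  # total covariance of one vector
    pair = np.concatenate([enroll, test])
    zeros = np.zeros((d, d))
    same = np.block([[T, B], [B, T]])          # shared speaker variable couples the pair
    diff = np.block([[T, zeros], [zeros, T]])  # independent speakers
    return _log_gauss_zero_mean(pair, same) - _log_gauss_zero_mean(pair, diff)

def score_averaged_enrollment(enroll_vectors, test, B, W):
    """Average length-normalised enrollment vectors, then score as one vector."""
    avg = np.mean([length_normalize(v) for v in enroll_vectors], axis=0)
    return two_cov_llr(avg, length_normalize(test), B, W)
```

Building the full 2d x 2d covariance keeps the sketch short and easy to check. A practical system would use the closed-form expressions and the cheaper matrix inversions that the paper discusses.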
36. Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification
- Author
-
Man Sun, Shuai Wang, Yanmin Qian, Kai Yu, and Yexin Yang
- Subjects
Training set ,Speaker verification ,Noise measurement ,Computer science ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Effective method ,NIST ,0305 other medical science ,Generative grammar - Abstract
Data augmentation is an effective method to increase the quantity of training data, which improves the model's robustness and generalization ability. In this paper, we propose a generative adversarial network (GAN) based data augmentation approach for probabilistic linear discriminant analysis (PLDA), which is a standard back-end for state-of-the-art x-vector based speaker verification systems. Instead of generating new spectral feature samples, a conditional Wasserstein GAN is adopted to directly generate x-vectors. Experiments are carried out on the standard NIST SRE 2016 evaluation dataset. Compared to manually adding noise, the GAN-augmented PLDA achieves better performance, and this performance can be further boosted when combined with manually augmented data. EERs of 11.68% and 4.43% were obtained for the Tagalog and Cantonese evaluation conditions, respectively.
- Published
- 2018
- Full Text
- View/download PDF
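Several abstracts in this listing report results as an equal error rate (EER). As a hedged illustration of what that figure measures, the sketch below estimates the EER from two lists of trial scores by sweeping a decision threshold over the observed scores and taking the point where the miss rate and the false-alarm rate are closest. The function name and the simple threshold sweep are assumptions for this example; evaluation toolkits typically interpolate on the detection error trade-off curve instead.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: operating point where miss rate ~= false-alarm rate."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    # A trial is accepted when its score is >= the threshold.
    miss = np.array([(tar < t).mean() for t in thresholds])   # targets rejected
    fa = np.array([(non >= t).mean() for t in thresholds])    # non-targets accepted
    i = np.argmin(np.abs(miss - fa))
    return float((miss[i] + fa[i]) / 2.0)
```

A relative improvement such as "27% in EER" then compares two such values, for example (EER_baseline - EER_proposed) / EER_baseline.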
37. The Scalable Version of Probabilistic Linear Discriminant Analysis and Its Potential as A Classifier for Audio Signal Classification
- Author
-
Yuechi Jiang and H. F. Frank Leung
- Subjects
Training set ,business.industry ,Computer science ,Audio signal classification ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Facial recognition system ,Probabilistic linear discriminant analysis ,Support vector machine ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Symmetric matrix ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Classifier (UML) - Abstract
Probabilistic Linear Discriminant Analysis (PLDA) has exhibited good performance in face recognition and speaker verification. However, it is not widely used as a general-purpose classifier. The major limitation of PLDA lies in that, in the original formulation, the modeling part and the prediction part require the inversion of large matrices, whose sizes are proportional to the number of training vectors in a class. The original formulation of PLDA is not scalable if there are many training vectors, because the matrices will become too large to be inverted. In the literature, some scalable versions for the modeling part have been proposed. In this paper, we propose the scalable version for the prediction part, which completes the scalable version of PLDA. This makes PLDA able to handle a large number of training data, enabling PLDA to be used as a general-purpose classifier for different classification tasks. We then apply PLDA as the classifier to three different audio signal classification tasks, and compare its performance with Support Vector Machine (SVM), which is a widely used general-purpose classifier. Experimental results show that PLDA performs very well and can be even better than SVM, in terms of classification accuracy.
- Published
- 2018
- Full Text
- View/download PDF
38. Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification
- Author
-
Hafizur Rahman, Clinton Fookes, Sridha Sridharan, David Dean, and Ivan Himawan
- Subjects
Speaker verification ,Training set ,Computer science ,business.industry ,Feature extraction ,Pattern recognition ,Artificial intelligence ,Invariant (mathematics) ,Covariance ,I vector ,business ,Combined approach ,Probabilistic linear discriminant analysis - Abstract
The performance of current state-of-the-art i-vector based probabilistic linear discriminant analysis (PLDA) speaker verification depends on large volumes of training data, ideally in the target domain. However, in real-world applications, it is often difficult to collect a sufficient amount of target domain data for successful PLDA training. Thus, an adequate amount of domain mismatch compensated out-domain data must be used as the basis of PLDA training. In this paper, we introduce a domain-invariant i-vector extraction (DI-IVEC) approach to extract domain mismatch compensated out-domain i-vectors using limited in-domain (target) data for adaptation. In this method, in-domain prior information is utilised to remove the domain mismatch during the i-vector extraction stage. The proposed method provides at least 17.3% improvement in EER over an out-domain-only trained baseline when speaker labels are absent and a 27.2% improvement in EER when speaker labels are known. A further improvement is obtained when the DI-IVEC approach is used in combination with a domain-invariant covariance normalization (DICN) approach. This combined approach is found to work well with reduced in-domain adaptation data, where only 1000 unlabelled i-vectors are required to perform better than a baseline in-domain PLDA approach.
- Published
- 2018
- Full Text
- View/download PDF
39. Robust discriminative training against data insufficiency in PLDA-based speaker verification
- Author
-
Koichi Shinoda, Sangeeta Biswas, and Johan Rohdin
- Subjects
Training set ,Speaker verification ,Computer science ,business.industry ,Speech recognition ,Score ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Overfitting ,Theoretical Computer Science ,Probabilistic linear discriminant analysis ,Human-Computer Interaction ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Discriminative model ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,NIST ,Artificial intelligence ,0305 other medical science ,business ,Software - Abstract
Highlights: We address data insufficiency in discriminative PLDA training. First, we compensate for statistical dependencies in the training data. Second, we propose three constrained discriminative training schemes.
Probabilistic linear discriminant analysis (PLDA) with i-vectors as features has become one of the state-of-the-art methods in speaker verification. Discriminative training (DT) has proven to be effective for improving PLDA's performance but suffers more from data insufficiency than generative training (GT). In this paper, we achieve robustness against data insufficiency in DT in two ways. First, we compensate for statistical dependencies in the training data by adjusting the weights of the training trials in order for the training loss to be an accurate estimate of the expected loss. Second, we propose three constrained DT schemes, among which the best was a discriminatively trained transformation of the PLDA score function having four parameters. Experiments on the male telephone part of the NIST SRE 2010 confirmed the effectiveness of our proposed techniques. For various numbers of training speakers, the combination of weight-adjustment and the constrained DT scheme gave between 7% and 19% relative improvements in Cllr (log-likelihood-ratio cost) over GT followed by score calibration. Compared to another baseline, DT of all the parameters of the PLDA score function, the improvements were larger.
- Published
- 2016
- Full Text
- View/download PDF
40. Non-speaker information reduction from Cosine Similarity Scoring in i-vector based speaker verification
- Author
-
Hossein Sameti, Bagher BabaAli, Alireza Mirian, and Hossein Zeinali
- Subjects
Speaker verification ,General Computer Science ,business.industry ,Speech recognition ,Cosine similarity ,Normalization (image processing) ,Scoring methods ,Word error rate ,Pattern recognition ,I vector ,Probabilistic linear discriminant analysis ,Control and Systems Engineering ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Mathematics - Abstract
Cosine similarity and Probabilistic Linear Discriminant Analysis (PLDA) in i-vector space are two state-of-the-art scoring methods in the speaker verification field. While PLDA usually gives better accuracy, Cosine Similarity Scoring (CSS) remains a widely used method due to its simplicity and acceptable performance. In this domain, several channel compensation and score normalization methods have been proposed to improve performance. We investigate non-speaker information in the cosine similarity metric and propose a new approach to remove it from the decision-making process. I-vectors hold a large amount of non-speaker information such as channel effects, language, and phonetic content. This type of information increases the verification error rate and hence should be removed from the scoring method. To this end, we propose a method that estimates non-speaker information between two i-vectors using the development set and subtracts it from the cosine similarity. The results indicate that the proposed method performed better than the other implemented methods based on cosine similarity. Furthermore, in certain cases the performance of this method was better than that of the PLDA method, and combining it with PLDA improved performance in most cases.
- Published
- 2015
- Full Text
- View/download PDF
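For orientation, plain cosine similarity scoring between two i-vectors, as referred to in the abstract above, can be sketched as follows. This is a minimal illustration that assumes i-vectors are given as equal-length lists of floats; it does not reproduce the paper's non-speaker correction term, whose exact form is not given in the abstract. The name `cosine_score` is hypothetical.

```python
import math

def cosine_score(ivector_a, ivector_b):
    # Cosine of the angle between two vectors: dot product over the product of norms.
    dot = sum(a * b for a, b in zip(ivector_a, ivector_b))
    norm_a = math.sqrt(sum(a * a for a in ivector_a))
    norm_b = math.sqrt(sum(b * b for b in ivector_b))
    return dot / (norm_a * norm_b)
```

A verification decision would then compare the score against a threshold tuned on development data.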
41. Automatic versus human speaker verification: The case of voice mimicry
- Author
-
Ville Hautamäki, Anne-Maria Laukkanen, Rosa González Hautamäki, and Tomi Kinnunen
- Subjects
Linguistics and Language ,Speaker verification ,Computer science ,business.industry ,Communication ,Speech recognition ,Word error rate ,Speaker recognition ,computer.software_genre ,Language and Linguistics ,Computer Science Applications ,Probabilistic linear discriminant analysis ,Modeling and Simulation ,Mimicry ,Active listening ,Statistical analysis ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Software ,Natural language processing - Abstract
In this work, we compare the performance of three modern speaker verification systems and non-expert human listeners in the presence of voice mimicry. Our goal is to gain insights into how vulnerable speaker verification systems are to mimicry attacks and to compare this with the performance of human listeners. We study both a traditional Gaussian mixture model-universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. For the studied material in the Finnish language, the mimicry attack slightly decreased the equal error rate (EER) for GMM-UBM from 10.83 to 10.31, while for the i-vector systems the EER increased from 6.80 to 13.76 and from 4.36 to 7.38. The performance of the human listening panel shows that imitated speech increases the difficulty of the speaker verification task. It is even more difficult to recognize a person who is intentionally concealing his or her identity. For Impersonator A, the average listener made 8 errors in 34 trials, while the automatic systems made 6 errors on the same set. For Impersonator B, the average listener made 7 errors in 28 trials, while the automatic systems made 7 to 9 errors. A statistical analysis of the listener performance was also conducted. We found a statistically significant association, with p = 0.00019 and R² = 0.59, between listener accuracy and self-reported factors only when familiar voices were present in the test.
- Published
- 2015
- Full Text
- View/download PDF
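The equal error rate (EER) quoted in the abstract above is the operating point where the false acceptance and false rejection rates coincide. The sketch below is one simple way to approximate it from two score lists, assuming that higher scores mean "same speaker"; it is an illustration rather than the evaluation code used in the study, and `equal_error_rate` is a hypothetical name.

```python
def equal_error_rate(target_scores, nontarget_scores):
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance: non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

With finite trial lists the two rates rarely cross exactly, so this version returns the midpoint at the closest threshold; interpolating along the error curve is a common refinement.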
42. Autonomous Selection of i-Vectors for PLDA Modelling in Speaker Verification
- Author
-
Johan Rohdin, Sangeeta Biswas, and Koichi Shinoda
- Subjects
Linguistics and Language ,Training set ,Speaker verification ,Computer science ,business.industry ,Communication ,Speech recognition ,Pattern recognition ,Speaker recognition ,Language and Linguistics ,Computer Science Applications ,Probabilistic linear discriminant analysis ,Modeling and Simulation ,Outlier ,NIST ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Selection (genetic algorithm) ,Data selection - Abstract
Recently, systems combining i-vector and probabilistic linear discriminant analysis (PLDA) have become one of the state-of-the-art methods in text-independent speaker verification. The training data of a PLDA model is often collected from a large, diverse population. However, including irrelevant or noisy training data may deteriorate the verification performance. In this paper, we first show that data selection using k-NN improves the speaker verification performance. We then present a robust way of selecting k based on the local distance-based outlier factor (LDOF). We call this method flexible k-NN (fk-NN). We conduct experiments on male and female trials of several telephone conditions of the NIST 2006, 2008, 2010 and 2012 Speaker Recognition Evaluations (SRE). By using fk-NN, we discard a substantial amount of irrelevant or noisy training data without depending on tuning k, and achieve significant performance improvements on the NIST SRE sets.
- Published
- 2015
43. Speaker adaptation using probabilistic linear discriminant analysis for continuous speech recognition.
- Author
-
Jeong, Y.
- Abstract
The application of probabilistic linear discriminant analysis (PLDA) to speaker adaptation for automatic speech recognition based on hidden Markov models is proposed. By expressing the set of acoustic models of each of the training speakers in a matrix and treating each column as a sample, the small sample problem that can be encountered in PLDA if only one sample is available for each training speaker is overcome. In the continuous speech recognition experiments, the performance of the PLDA based approach improves over the principal component analysis (PCA) based approach and the two‐dimensional PCA based approach for adaptation data longer than 12 s. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
44. Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition
- Author
-
Abhinav Misra, Shivesh Ranjan, and John H. L. Hansen
- Subjects
business.industry ,Computer science ,Speech recognition ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Speaker recognition ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Noise ,0202 electrical engineering, electronic engineering, information engineering ,Learning based ,Artificial intelligence ,0305 other medical science ,business ,Curriculum - Published
- 2017
- Full Text
- View/download PDF
45. Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification
- Author
-
Mohammad Mehdi Homayounpour and Abbas Khosravani
- Subjects
Speaker verification ,Computer science ,business.industry ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,I vector ,01 natural sciences ,Probabilistic linear discriminant analysis ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,010301 acoustics - Published
- 2017
- Full Text
- View/download PDF
46. Full-posterior PLDA based speaker diarization of telephone conversations
- Author
-
Songzan Guan, Yanni Chen, Yonghong Yan, and Wei Hong
- Subjects
Speaker diarisation ,Scoring system ,Computer science ,Speech recognition ,Feature extraction ,Posterior probability ,Cluster analysis ,Speaker recognition ,Data modeling ,Probabilistic linear discriminant analysis - Abstract
Conventional speaker diarization systems based on factor analysis mainly differ in i-vector scoring, such as cosine scoring and the more recent probabilistic linear discriminant analysis (PLDA) scoring technique. However, during the clustering process, the accuracy of PLDA scoring decreases for short speech segments. The problem becomes even worse when the segments have arbitrary duration. In this paper, we choose a modified PLDA model, called full posterior distribution PLDA (FP-PLDA), for clustering instead of the standard PLDA model (Std-PLDA). The new model exploits the intrinsic uncertainty of the i-vector extraction. The experiment shows that FP-PLDA is especially effective for short and variable-duration speech segments. It reduces the diarization error rate by around 41% relative to the cosine scoring system and by 30.98% relative to the standard PLDA system.
- Published
- 2017
- Full Text
- View/download PDF
47. Local training in speaker verification for PLDA
- Author
-
Priya Ranjan, Amit Ujlayan, and Hunny Pahuja
- Subjects
Normalization (statistics) ,Speaker verification ,Computer science ,business.industry ,Machine learning ,computer.software_genre ,Automation ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Artificial intelligence ,0305 other medical science ,business ,computer - Abstract
For the i-vector model, probabilistic linear discriminant analysis (PLDA) is a normalization approach that delivers significant performance in speaker verification. However, it requires a large amount of development data, which is costly in many cases. Unsupervised adaptation is one possible approach; it uses unlabeled data to adapt the PLDA scatter matrices to the target domain. In this paper, a ‘local training’ approach is used to train the PLDA model. In this approach, local labels discriminate the speakers within a single conversation and are therefore easier to obtain than normal ‘global labels’. The proposed approach can deliver better performance, particularly with limited globally-labeled data.
- Published
- 2017
- Full Text
- View/download PDF
48. Performance evaluation of mixtures of PLDA and conventional PLDA for a small-set speaker verification system
- Author
-
Qianhui Wan and Martin Bouchard
- Subjects
030507 speech-language pathology & audiology ,03 medical and health sciences ,Engineering ,Speaker verification ,Noise measurement ,business.industry ,Robustness (computer science) ,Speech recognition ,0305 other medical science ,business ,Small set ,Computer Science::Information Theory ,Probabilistic linear discriminant analysis - Abstract
This paper compares the use of signal to noise ratio (SNR)-dependent and SNR-independent mixtures of probabilistic linear discriminant analysis (PLDA) versus conventional PLDA, under multi-noise and multi-SNR conditions for a small-set speaker verification system. Results indicate that conventional PLDA is more robust under multi-SNR conditions. The effect of the testing speech length is also examined and speech signals with a length of 5 seconds were found to achieve acceptable results.
- Published
- 2017
- Full Text
- View/download PDF
49. Dynamic probabilistic linear discriminant analysis for video classification
- Author
-
Stefanos Zafeiriou, Irene Kotsia, Mihalis A. Nicolaou, and Alessandro Fabris
- Subjects
business.industry ,Computer science ,Divergence-from-randomness model ,Probabilistic logic ,Pattern recognition ,02 engineering and technology ,Machine learning ,computer.software_genre ,Facial recognition system ,Face Recognition ,Temporal database ,Data modeling ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Generative model ,Component Analysis ,0202 electrical engineering, electronic engineering, information engineering ,Probabilistic Linear Discriminant Analysis ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,Canonical correlation ,business ,Cluster analysis ,computer - Abstract
Component Analysis (CA) comprises statistical techniques that decompose signals into appropriate latent components, relevant to a task at hand (e.g., clustering, segmentation, classification). Recently, an explosion of research in CA has been witnessed, with several novel probabilistic models proposed (e.g., Probabilistic Principal CA, Probabilistic Linear Discriminant Analysis (PLDA), Probabilistic Canonical Correlation Analysis). PLDA is a popular generative probabilistic CA method that incorporates knowledge regarding class labels and furthermore introduces class-specific and sample-specific latent spaces. While PLDA has been shown to outperform several state-of-the-art methods, it is nevertheless a static model; any feature-level temporal dependencies that arise in the data are ignored. As has been repeatedly shown, appropriate modelling of temporal dynamics is crucial for the analysis of temporal data (e.g., videos). In this light, we propose the first, to the best of our knowledge, probabilistic LDA formulation that models dynamics, the so-called Dynamic-PLDA (DPLDA). DPLDA is a generative model suitable for video classification and is able to jointly model the label information (e.g., face identity, consistent over videos of the same subject), as well as dynamic variations of each individual video. Experiments on video classification tasks such as face and facial expression recognition show the efficacy of the proposed method.
- Published
- 2017
50. Role of voice activity detection methods for the speakers in the wild challenge
- Author
-
S. R. Mahadeva Prasanna, Sarfaraz Jelil, Rohan Kumar Das, and Rohit Sinha
- Subjects
Speaker verification ,Voice activity detection ,Noise measurement ,Computer science ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Speaker recognition ,Probabilistic linear discriminant analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Noise (video) ,Mel-frequency cepstrum ,0305 other medical science - Abstract
One of the major reasons for the performance degradation of a speaker verification (SV) system in real-world conditions is its inability to spot speech regions due to the presence of noise. This work focuses on the role of voice activity detection (VAD) methods in alleviating such shortcomings. The experiments are conducted on the core-core task of the speakers in the wild (SITW) challenge. Two VAD approaches are explored in this work. One of them is the recently proposed self-adaptive VAD and the other is based on vowel-like region (VLR) detection. For evaluating the effectiveness of these approaches, the SV systems are developed using the i-vector framework in the front-end and probabilistic linear discriminant analysis (PLDA) in the back-end. The self-adaptive VAD based system shows better performance than the VLR based system in high-SNR conditions. Under degraded conditions, the VLR based method is relatively more robust than self-adaptive VAD. Exploiting these complementary features, significant improvements in SV performance are noted with the fusion of the scores of the two systems.
- Published
- 2017
- Full Text
- View/download PDF