Author: "Garner, Philip N." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Garner, Philip N."' showing total 237 results

Start Over Author "Garner, Philip N."

237 results on '"Garner, Philip N."'

1. Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Author: He, Mutian and Garner, Philip N.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Architectures such as Linformer and Mamba have recently emerged as competitive linear time replacements for transformers. However, corresponding large pretrained models are often unavailable, especially in non-text domains. To remedy this, we present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model to a linear time substitute and fine-tunes it to a target task. We also compare several means to guide the fine-tuning to optimally retain the desired inference capability from the original model. The methods differ in their use of the target model and the trajectory of the parameters. In a series of empirical studies on language processing, language modeling, and speech processing, we show that CALD can effectively recover the result of the original model, and that the guiding strategy contributes to the result. Some reasons for the variation are suggested., Comment: 15 pages, 4 figures
Published: 2024

2. A Bayesian Interpretation of Adaptive Low-Rank Adaptation

Author: Chen, Haolin and Garner, Philip N.
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Motivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.
Published: 2024

3. An investigation of modularity for noise robustness in conformer-based ASR

Author: de Gibson, Louise Coppieters, Garner, Philip N., and Honnet, Pierre-Edouard
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Whilst state of the art automatic speech recognition (ASR) can perform well, it still degrades when exposed to acoustic environments that differ from those used when training the model. Unfamiliar environments for a given model may well be known a-priori, but yield comparatively small amounts of adaptation data. In this experimental study, we investigate to what extent recent formalisations of modularity can aid adaptation of ASR to new acoustic environments. Using a conformer based model and fixed routing, we confirm that environment awareness can indeed lead to improved performance in known environments. However, at least on the (CHIME) datasets in the study, it is difficult for a classifier module to distinguish different noisy environments, a simpler distinction between noisy and clean speech being the optimal configuration. The results have clear implications for deploying large models in particular environments with or without a-priori knowledge of the environmental noise., Comment: 5 pages, 3 figures
Published: 2024

4. Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks

Author: Bittar, Alexandre and Garner, Philip N.
Subjects: Computer Science - Computation and Language, Quantitative Biology - Neurons and Cognition
Abstract: Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales. We present a physiologically inspired speech recognition architecture, compatible and scalable with deep learning frameworks, and demonstrate that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network. Significant cross-frequency couplings, indicative of these oscillations, are measured within and across network layers during speech processing, whereas no such interactions are observed when handling background noise inputs. Furthermore, our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance. Overall, on top of developing our understanding of synchronisation phenomena notably observed in the human auditory pathway, our architecture exhibits dynamic and efficient information processing, with relevance to neuromorphic technology.
Published: 2024
Full Text: View/download PDF

5. Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

Author: Chen, Haolin and Garner, Philip N.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning
Abstract: We are motivated primarily by the adaptation of text-to-speech synthesis models; however we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with the low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and using the Kronecker-factored approximation produces a better preservation of the pre-training knowledge than the diagonal ones.
Published: 2024

6. Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding

Author: He, Mutian and Garner, Philip N.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recently, large pretrained language models have demonstrated strong language understanding capabilities. This is particularly reflected in their zero-shot and in-context learning abilities on downstream tasks through prompting. To assess their impact on spoken language understanding (SLU), we evaluate several such models like ChatGPT and OPT of different sizes on multiple benchmarks. We verify the emergent ability unique to the largest models as they can reach intent classification accuracy close to that of supervised models with zero or few shots on various languages given oracle transcripts. By contrast, the results for smaller models fitting a single GPU fall far behind. We note that the error cases often arise from the annotation scheme of the dataset; responses from ChatGPT are still reasonable. We show, however, that the model is worse at slot filling, and its performance is sensitive to ASR errors, suggesting serious challenges for the application of those textual models on SLU., Comment: 6 pages, 2 figures; Accepted by Interspeech 2023
Published: 2023

7. The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Author: He, Mutian and Garner, Philip N.
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desired for speech models that work on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that the task of speech translation (ST) is a good means of pretraining speech models for end-to-end SLU on both intra- and cross-lingual scenarios. By introducing ST, our models reach higher performance over baselines on monolingual and multilingual intent classification as well as spoken question answering using SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also create new benchmark datasets from both synthetic and real sources, for speech summarization and low-resource/zero-shot transfer from English to French or Spanish. We further show the value of preserving knowledge for the ST pretraining task for better downstream performance, possibly using Bayesian transfer regularizers., Comment: 16 pages, 3 figures; accepted by Findings of EMNLP 2023
Published: 2023

8. An investigation into the adaptability of a diffusion-based TTS model

Author: Chen, Haolin and Garner, Philip N.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Given the recent success of diffusion in producing natural-sounding synthetic speech, we investigate how diffusion can be used in speaker adaptive TTS. Taking cues from more traditional adaptation approaches, we show that adaptation can be included in a diffusion pipeline using conditional layer normalization with a step embedding. However, we show experimentally that, whilst the approach has merit, such adaptation alone cannot approach the performance of Transformer-based techniques. In a second experiment, we show that diffusion can be optimally combined with Transformer, with the latter taking the bulk of the adaptation load and the former contributing to improved naturalness.
Published: 2023

9. Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition

Author: Bittar, Alexandre and Garner, Philip N.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Compared to conventional artificial neurons that produce dense and real-valued responses, biologically-inspired spiking neurons transmit sparse and binary information, which can also lead to energy-efficient implementations. Recent research has shown that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method. They have shown promising results on speech command recognition tasks. Using the same technique, we show that they are scalable to large vocabulary continuous speech recognition, where they are capable of replacing LSTMs in the encoder with only minor loss of performance. This suggests that they may be applicable to more involved sequence-to-sequence tasks. Moreover, in contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
Published: 2022

10. Low-Level Physiological Implications of End-to-End Learning of Speech Recognition

Author: de Gibson, Louise Coppieters and Garner, Philip N.
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Current speech recognition architectures perform very well from the point of view of machine learning, hence user interaction. This suggests that they are emulating the human biological system well. We investigate whether the inference can be inverted to provide insights into that biological system; in particular the hearing mechanism. Using SincNet, we confirm that end-to-end systems do learn well known filterbank structures. However, we also show that wider band-width filters are important in the learned structure. Whilst some benefits can be gained by initialising both narrow and wide-band filters, physiological constraints suggest that such filters arise in mid-brain rather than the cochlea. We show that standard machine learning architectures must be modified to allow this process to be emulated neurally., Comment: Submitted to INTERSPEECH 2022
Published: 2022

11. Bayesian Recurrent Units and the Forward-Backward Algorithm

Author: Bittar, Alexandre and Garner, Philip N.
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm. The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks, while retaining a probabilistic interpretation from the direct correspondence with hidden Markov models. Whilst the contribution is mainly theoretical, experiments on speech recognition indicate that adding the derived units at the end of state-of-the-art recurrent architectures can improve the performance at a very low cost in terms of trainable parameters., Comment: Submitted to INTERSPEECH 2022
Published: 2022
Full Text: View/download PDF

12. A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers

Author: Antonello, Niccolò and Garner, Philip N.
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Neural Network (NN) classifiers can assign extreme probabilities to samples that have not appeared during training (out-of-distribution samples) resulting in erroneous and unreliable predictions. One of the causes for this unwanted behaviour lies in the use of the standard softmax operator which pushes the posterior probabilities to be either zero or unity hence failing to model uncertainty. The statistical derivation of the softmax operator relies on the assumption that the distributions of the latent variables for a given class are Gaussian with known variance. However, it is possible to use different assumptions in the same derivation and attain from other families of distributions as well. This allows derivation of novel operators with more favourable properties. Here, a novel operator is proposed that is derived using $t$-distributions which are capable of providing a better description of uncertainty. It is shown that classifiers that adopt this novel operator can be more robust to out of distribution samples, often outperforming NNs that use the standard softmax operator. These enhancements can be reached with minimal changes to the NN architecture., Comment: 5 pages, 5 figures, to be published in IEEE Signal Processing Letters, reproducible code https://github.com/idiap/tsoftmax
Published: 2020
Full Text: View/download PDF

13. A Bayesian Approach to Recurrence in Neural Networks

Author: Garner, Philip N. and Tong, Sibo
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We begin by reiterating that common neural network activation functions have simple Bayesian origins. In this spirit, we go on to show that Bayes's theorem also implies a simple recurrence relation; this leads to a Bayesian recurrent unit with a prescribed feedback formulation. We show that introduction of a context indicator leads to a variable feedback that is similar to the forget mechanism in conventional recurrent units. A similar approach leads to a probabilistic input gate. The Bayesian formulation leads naturally to the two pass algorithm of the Kalman smoother or forward-backward algorithm, meaning that inference naturally depends upon future inputs as well as past ones. Experiments on speech recognition confirm that the resulting architecture can perform as well as a bidirectional recurrent network with the same number of parameters as a unidirectional one. Further, when configured explicitly bidirectionally, the architecture can exceed the performance of a conventional bidirectional recurrence.
Published: 2019
Full Text: View/download PDF

14. A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes

Author: Gerazov, Branislav, Bailly, Gérard, Mohammed, Omar, Xu, Yi, and Garner, Philip N.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general purpose end-to-end mappings using millions of tunable parameters. The shift towards black box machine learning models has nonetheless posed the reverse problem -- a compelling need to discover knowledge, to explain, visualise and interpret. Our work bridges between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the context-sensitive variation of the constituent elementary prosodic contours. We show that the VPM can give insight into the intrinsic variability of these prosodic prototypes through learning a meaningful prosodic latent space representation structure. We also show that the VPM is able to capture prosodic phenomena that have multiple dimensions of context based variability. Since it is based on the principle of superposition, the VPM does not necessitate the use of specially crafted corpora for the analysis, opening up the possibilities of using big data for prosody analysis. In a speech synthesis scenario, the model can be used to generate a dynamic and natural prosody contour that is devoid of averaging effects., Comment: Updated with recurrent version of contour generators, unified prosodic latent space, and performance evaluation with baseline
Published: 2018

15. Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Author: Tong, Sibo, Garner, Philip N., and Bourlard, Hervé
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained to model a universal International Phonetic Alphabet (IPA)-based phone set using the CTC loss function. Learning Hidden Unit Contribution (LHUC) is investigated to perform language adaptive training. In addition, dropout during cross-lingual adaptation is also studied and tested in order to mitigate the overfitting problem. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC and it is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters shows consistent improvement on limited data. Applying dropout during adaptation can further improve the system and achieve competitive performance with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited data.
Published: 2017

16. Investigating a neural all pass warp in modern TTS applications

Author: Schnell, Bastian and Garner, Philip N.
Published: 2022
Full Text: View/download PDF

17. Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Author: Cernak, Milos, Lazaridis, Alexandros, Asaei, Afsaneh, and Garner, Philip N.
Subjects: Computer Science - Sound, Computer Science - Computation and Language
Abstract: Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows transmission of information (such as phonemes) segment by segment that decreases the bit rate. However, the encoder based on a phoneme speech recognition may create bursts of segmental errors. Segmental errors are further propagated to optional suprasegmental (such as syllable) information coding. Together with the errors of voicing detection in pitch parametrization, HMM-based speech coding creates speech discontinuities and unnatural speech sound artefacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on phonological (sub-phonetic) representation of speech, and it is designed as a composition of deep and spiking NNs: a bank of phonological analysers at the transmitter, and a phonological synthesizer at the receiver, both realised as deep NNs, and a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). A combination of phonological features defines much more sound patterns than phonetic features defined by HMM-based speech coders, and the finer analysis/synthesis code contributes into smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artefacts of the encoded speech. A single forward pass is required during the speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.
Published: 2016
Full Text: View/download PDF

18. Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

Author: Taghizadeh, Mohammad J., Parhizkar, Reza, Garner, Philip N., Bourlard, Herve, and Asaei, Afsaneh
Subjects: Computer Science - Sound, Computer Science - Learning
Abstract: This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries based on a novel Euclidean distance matrix completion algorithm by alternative low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. The theoretical guarantees of the calibration performance are obtained considering the random and locally structured missing entries as well as the measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones along with the noise level and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups are conducted to demonstrate these theoretical insights. A significant improvement is achieved by the proposed Euclidean distance matrix completion algorithm over the state-of-the-art techniques for ad hoc microphone array calibration., Comment: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 2014
Published: 2014

19. Training a Filter-Based Model of the Cochlea in the Context of Pre-Trained Acoustic Models.

Author: Coppieters de Gibson, Louise and Garner, Philip N.
Subjects: ACOUSTIC models, COCHLEA, HYBRID systems, SPEECH perception, AUTOMATIC speech recognition, COCHLEA physiology, FILTER banks
Abstract: Auditory research aims in general to lead to understanding of physiological processes. By contrast, the state of the art in automatic speech processing (notably recognition) is dominated by large pre-trained models that are meant to be used as black-boxes. In this work, we integrate a physiologically plausible (albeit simple filter-based) model of the cochlea into a much larger pre-trained acoustic model for speech recognition. We show that the hybrid system can be trained and evaluated with various combinations of fine-tuning and self-supervision. The results broadly show that the system automatically yields structures that are known to work well. Moreover, these structures lack artifacts that were apparent in (our) previous work using less sophisticated neural models. We conclude that the hybrid structure is an appropriate way to proceed in auditory research, more generally allowing the work to take advantage of larger models and databases from which it would not otherwise benefit. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Cross-lingual adaptation of a CTC-based multilingual acoustic model

Author: Tong, Sibo, Garner, Philip N., and Bourlard, Hervé
Published: 2018
Full Text: View/download PDF

21. Intonation modelling using a muscle model and perceptually weighted matching pursuit

Author: Honnet, Pierre-Edouard, Gerazov, Branislav, Gjoreski, Aleksandar, and Garner, Philip N.
Published: 2018
Full Text: View/download PDF

22. Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

Author: Korshunov, Pavel, primary, Chen, Haolin, additional, Garner, Philip N., additional, and Marcel, Sébastien, additional
Published: 2023
Full Text: View/download PDF

23. The Idiap Speech Synthesis System for the Blizzard Challenge 2023

Author: Chen, Haolin, primary, He, Mutian, additional, Gibson, Louise Coppieters de, additional, and Garner, Philip N., additional
Published: 2023
Full Text: View/download PDF

24. Diffusion Transformer for Adaptive Text-to-Speech

Author: Chen, Haolin, primary and Garner, Philip N., additional
Published: 2023
Full Text: View/download PDF

25. Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding

Author: He, Mutian, primary and Garner, Philip N., additional
Published: 2023
Full Text: View/download PDF

26. An Agonist-Antagonist Pitch Production Model

Author: Gerazov, Branislav, Garner, Philip N., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ronzhin, Andrey, editor, Potapova, Rodmonga, editor, and Németh, Géza, editor
Published: 2016
Full Text: View/download PDF

27. Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer

Author: Sečujski, Milan, Gerazov, Branislav, Csapó, Tamás Gábor, Delić, Vlado, Garner, Philip N., Gjoreski, Aleksandar, Guennec, David, Ivanovski, Zoran, Melov, Aleksandar, Németh, Géza, Stojković, Ana, Szaszák, György, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ronzhin, Andrey, editor, Potapova, Rodmonga, editor, and Németh, Géza, editor
Published: 2016
Full Text: View/download PDF

28. A Basis Function Approach to Position Estimation Using Microwave Arrays

Author: Webb, Andrew R. and Garner, Philip N.
Published: 1999

29. DNN-Based Speech Synthesis: Importance of Input Features and Training Data

Author: Lazaridis, Alexandros, Potard, Blaise, Garner, Philip N., Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Ronzhin, Andrey, editor, Potapova, Rodmonga, editor, and Fakotakis, Nikos, editor
Published: 2015
Full Text: View/download PDF

30. Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

Author: Korshunov, Pavel, Chen, Haolin, Garner, Philip N., Marcel, Sebastien, Korshunov, Pavel, Chen, Haolin, Garner, Philip N., and Marcel, Sebastien
Abstract: The task of deepfakes detection is far from being solved by speech or vision researchers. Several publicly available databases of fake synthetic video and speech were built to aid the development of detection methods. However, existing databases typically focus on visual or voice modalities and provide no proof that their deepfakes can in fact impersonate any real person. In this paper, we present the first realistic audio-visual database of deepfakes SWAN-DF, where lips and speech are well synchronized and video have high visual and audio qualities. We took the publicly available SWAN dataset of real videos with different identities to create audio-visual deepfakes using several models from DeepFaceLab and blending techniques for face swapping and HiFiVC, DiffVC, YourTTS, and FreeVC models for voice conversion. From the publicly available speech dataset LibriTTS, we also created a separate database of only audio deepfakes LibriTTS-DF using several latest text to speech methods: YourTTS, Adaspeech, and TorToiSe. We demonstrate the vulnerability of a state of the art speaker recognition system, such as ECAPA-TDNN-based model from SpeechBrain, to the synthetic voices. Similarly, we tested face recognition system based on the MobileFaceNet architecture to several variants of our visual deepfakes. The vulnerability assessment show that by tuning the existing pretrained deepfake models to specific identities, one can successfully spoof the face and speaker recognition systems in more than 90% of the time and achieve a very realistic looking and sounding fake video of a given person., Comment: 10 pages, 3 figures, 3 tables
Published: 2023

31. Ad hoc microphone array calibration: Euclidean distance matrix completion algorithm and theoretical guarantees

Author: Taghizadeh, Mohammad J., Parhizkar, Reza, Garner, Philip N., Bourlard, Hervé, and Asaei, Afsaneh
Published: 2015
Full Text: View/download PDF

32. Enhanced diffuse field model for ad hoc microphone array calibration

Author: Taghizadeh, Mohammad J., Garner, Philip N., and Bourlard, Hervé
Published: 2014
Full Text: View/download PDF

33. Using out-of-language data to improve an under-resourced speech recognizer

Author: Imseng, David, Motlicek, Petr, Bourlard, Hervé, and Garner, Philip N.
Published: 2014
Full Text: View/download PDF

34. A surrogate gradient spiking baseline for speech command recognition

Author: Bittar, Alexandre and Garner, Philip N.
Subjects: General Neuroscience
Published: 2022
Full Text: View/download PDF

35. Bayesian Recurrent Units and the Forward-Backward Algorithm

Author: Bittar, Alexandre, primary and Garner, Philip N., additional
Published: 2022
Full Text: View/download PDF

36. Low-Level Physiological Implications of End-to-End Learning for Speech Recognition

Author: Coppieters de Gibson, Louise, primary and Garner, Philip N., additional
Published: 2022
Full Text: View/download PDF

37. Exploiting foreign resources for DNN-based ASR

Author: Motlicek, Petr, Imseng, David, Potard, Blaise, Garner, Philip N., and Himawan, Ivan
Published: 2015
Full Text: View/download PDF

38. An Agonist-Antagonist Pitch Production Model

Author: Gerazov, Branislav, primary and Garner, Philip N., additional
Published: 2016
Full Text: View/download PDF

39. Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer

Author: Sečujski, Milan, primary, Gerazov, Branislav, additional, Csapó, Tamás Gábor, additional, Delić, Vlado, additional, Garner, Philip N., additional, Gjoreski, Aleksandar, additional, Guennec, David, additional, Ivanovski, Zoran, additional, Melov, Aleksandar, additional, Németh, Géza, additional, Stojković, Ana, additional, and Szaszák, György, additional
Published: 2016
Full Text: View/download PDF

40. Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition

Author: Garner, Philip N.
Published: 2011
Full Text: View/download PDF

41. DNN-Based Speech Synthesis: Importance of Input Features and Training Data

Author: Lazaridis, Alexandros, primary, Potard, Blaise, additional, and Garner, Philip N., additional
Published: 2015
Full Text: View/download PDF

42. Learning to Translate Low-Resourced Swiss German Dialectal Speech into Standard German Text

Author: Khosravani, Abbas, primary, Garner, Philip N., additional, and Lazaridis, Alexandros, additional
Published: 2021
Full Text: View/download PDF

43. An Evaluation Benchmark for Automatic Speech Recognition of German-English Code-Switching

Author: Khosravani, Abbas, primary, Garner, Philip N., additional, and Lararidis, Alexandros, additional
Published: 2021
Full Text: View/download PDF

44. Modeling Dialectal Variation for Swiss German Automatic Speech Recognition

Author: Khosravani, Abbas, primary, Garner, Philip N., additional, and Lazaridis, Alexandros, additional
Published: 2021
Full Text: View/download PDF

45. Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction

Author: Schnell, Bastian, primary and Garner, Philip N., additional
Published: 2021
Full Text: View/download PDF

46. Current trends in multilingual speech processing

Author: BOURLARD, HERVÉ, DINES, JOHN, MAGIMAI-DOSS, MATHEW, GARNER, PHILIP N, IMSENG, DAVID, MOTLICEK, PETR, LIANG, HUI, SAHEER, LAKSHMI, and VALENTE, FABIO
Published: 2011
Full Text: View/download PDF

47. A Bayesian Interpretation of the Light Gated Recurrent Unit

Author: Bittar, Alexandre, primary and Garner, Philip N., additional
Published: 2021
Full Text: View/download PDF

48. A $t$-Distribution Based Operator for Enhancing Out of Distribution Robustness of Neural Network Classifiers

Author: Antonello, Niccolo, primary and Garner, Philip N., additional
Published: 2020
Full Text: View/download PDF

49. A Bayesian Approach to Recurrence in Neural Networks

Author: Garner, Philip N., primary and Tong, Sibo, additional
Published: 2020
Full Text: View/download PDF

50. A Bayesian Approach to Recurrence in Neural Networks.

Author: Garner, Philip N. and Tong, Sibo
Subjects: *FORWARD-backward algorithm, *SPEECH perception, *ALGORITHMS, *PSYCHOLOGICAL feedback, *ECONOMIC indicators, *LOGIC circuits
Abstract: We begin by reiterating that common neural network activation functions have simple Bayesian origins. In this spirit, we go on to show that Bayes's theorem also implies a simple recurrence relation; this leads to a Bayesian recurrent unit with a prescribed feedback formulation. We show that introduction of a context indicator leads to a variable feedback that is similar to the forget mechanism in conventional recurrent units. A similar approach leads to a probabilistic input gate. The Bayesian formulation leads naturally to the two pass algorithm of the Kalman smoother or forward-backward algorithm, meaning that inference naturally depends upon future inputs as well as past ones. Experiments on speech recognition confirm that the resulting architecture can perform as well as a bidirectional recurrent network with the same number of parameters as a unidirectional one. Further, when configured explicitly bidirectionally, the architecture can exceed the performance of a conventional bidirectional recurrence. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

237 results on '"Garner, Philip N."'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources