Author: "Hain, Thomas" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hain, Thomas"' showing total 385 results

51. Automatic detection of behavioural codes in team interactions

Author: Hasan, Madina, Jefferson, Nicholas, Hain, Thomas, and Dawson, Jeremy
Published: 2022
Full Text: View/download PDF

52. WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Author: Huang, Shengjie, Chen, Mingjie, Xu, Yanyan, Ke, Dengfeng, Hain, Thomas, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pham, Duc Nghia, editor, Theeramunkong, Thanaruk, editor, Governatori, Guido, editor, and Liu, Fenrong, editor
Published: 2021
Full Text: View/download PDF

53. Use of Speaker Metadata for Improving Automatic Pronunciation Assessment

Author: Saenz, Jose Antonio Lopez, Hain, Thomas, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Espinosa-Anke, Luis, editor, Martín-Vide, Carlos, editor, and Spasić, Irena, editor
Published: 2021
Full Text: View/download PDF

54. Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement

Author: Close, George, primary, Ravenscroft, William, additional, Hain, Thomas, additional, and Goetze, Stefan, additional
Published: 2024
Full Text: View/download PDF

55. SCORE: Self-Supervised Correspondence Fine-Tuning for Improved Content Representations

Author: Meghanani, Amit, primary and Hain, Thomas, additional
Published: 2024
Full Text: View/download PDF

56. Progressive Unsupervised Domain Adaptation for ASR Using Ensemble Models and Multi-Stage Training

Author: Ahmad, Rehan, primary, Farooq, Muhammad Umar, additional, and Hain, Thomas, additional
Published: 2024
Full Text: View/download PDF

57. Combining Conformer and Dual-Path-Transformer Networks for Single Channel Noisy Reverberant Speech Separation

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2024
Full Text: View/download PDF

58. Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models

Author: Mogridge, Rhiannon, primary, Close, George, additional, Sutherland, Robert, additional, Hain, Thomas, additional, Barker, Jon, additional, Goetze, Stefan, additional, and Ragni, Anton, additional
Published: 2024
Full Text: View/download PDF

59. H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model

Author: Shi, Yanpei, Huang, Qiang, and Hain, Thomas
Published: 2021
Full Text: View/download PDF

60. Automatic Genre and Show Identification of Broadcast Media

Author: Doulaty, Mortaza, Saz, Oscar, Ng, Raymond W. M., and Hain, Thomas
Subjects: Computer Science - Multimedia, Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives. Effective techniques are needed to make such data accessible further. Automatic meta-data labelling of broadcast media is an essential task for multimedia indexing, where it is standard to use multi-modal input for such purposes. This paper describes a novel method for automatic detection of media genre and show identities using acoustic features, textual features or a combination thereof. Furthermore the inclusion of available meta-data, such as time of broadcast, is shown to lead to very high performance. Latent Dirichlet Allocation is used to model both acoustics and text, yielding fixed dimensional representations of media recordings that can then be used in Support Vector Machines based classification. Experiments are conducted on more than 1200 hours of TV broadcasts from the British Broadcasting Corporation (BBC), where the task is to categorise the broadcasts into 8 genres or 133 show identities. On a 200-hour test set, accuracies of 98.6% and 85.7% were achieved for genre and show identification respectively, using a combination of acoustic and textual features with meta-data.
Comment: Proc. of 17th Interspeech (2016), San Francisco, California, USA
Published: 2016

61. The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

Author: Saz, Oscar, Doulaty, Mortaza, Deena, Salil, Milner, Rosanna, Ng, Raymond W. M., Hasan, Madina, Liu, Yulan, and Hain, Thomas
Subjects: Computer Science - Computation and Language
Abstract: We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows. Transcription was one of four tasks proposed in the MGB challenge, with the aim of advancing the state of the art of automatic speech recognition, speaker diarisation and automatic alignment of subtitles for broadcast media. Four topics are investigated in this work: Data selection techniques for training with unreliable data, automatic speech segmentation of broadcast media shows, acoustic modelling and adaptation in highly variable environments, and language modelling of multi-genre shows. The final system operates in multiple passes, using an initial unadapted decoding stage to refine segmentation, followed by three adapted passes: a hybrid DNN pass with input features normalised by speaker-based cepstral normalisation, another hybrid stage with input features normalised by speaker feature-MLLR transformations, and finally a bottleneck-based tandem stage with noise and speaker factorisation. The combination of these three system outputs provides a final error rate of 27.5% on the official development set, consisting of 47 multi-genre shows.
Comment: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 13-17 Dec 2015, Scottsdale, Arizona, USA
Published: 2015
Full Text: View/download PDF

62. Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation

Author: Doulaty, Mortaza, Saz, Oscar, Ng, Raymond W. M., and Hain, Thomas
Subjects: Computer Science - Computation and Language
Abstract: This paper presents a new method for the discovery of latent domains in diverse speech data, for the use of adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition. Our work focuses on transcription of multi-genre broadcast media, which is often only categorised broadly in terms of high level genres such as sports, news, documentary, etc. However, in terms of acoustic modelling these categories are coarse. Instead, it is expected that a mixture of latent domains can better represent the complex and diverse behaviours within a TV show, and therefore lead to better and more robust performance. We propose a new method, whereby these latent domains are discovered with Latent Dirichlet Allocation, in an unsupervised manner. These are used to adapt DNNs using the Unique Binary Code (UBIC) representation for the LDA domains. Experiments conducted on a set of BBC TV broadcasts, with more than 2,000 shows for training and 47 shows for testing, show that the use of LDA-UBIC DNNs reduces the error up to 13% relative compared to the baseline hybrid DNN models.
Comment: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 13-17 Dec 2015, Scottsdale, Arizona, USA
Published: 2015
Full Text: View/download PDF

63. Background-tracking Acoustic Features for Genre Identification of Broadcast Shows

Author: Saz, Oscar, Doulaty, Mortaza, and Hain, Thomas
Subjects: Computer Science - Sound
Abstract: This paper presents a novel method for extracting acoustic features that characterise the background environment in audio recordings. These features are based on the output of an alignment that fits multiple parallel background-based Constrained Maximum Likelihood Linear Regression transformations asynchronously to the input audio signal. With this setup, the resulting features can track changes in the audio background like appearance and disappearance of music, applause or laughter, independently of the speakers in the foreground of the audio. The ability to provide this type of acoustic description in audiovisual data has many potential applications, including automatic classification of broadcast archives or improving automatic transcription and subtitling. In this paper, the performance of these features in a genre identification task in a set of 332 BBC shows is explored. The proposed background-tracking features outperform short-term Perceptual Linear Prediction features in this task using Gaussian Mixture Model classifiers (62% vs 72% accuracy). The use of more complex classifiers, Hidden Markov Models and Support Vector Machines, increases the performance of the system with the novel background-tracking features to 79% and 81% in accuracy respectively.
Published: 2015
Full Text: View/download PDF

64. The USFD Spoken Language Translation System for IWSLT 2014

Author: Ng, Raymond W. M., Doulaty, Mortaza, Doddipatla, Rama, Aziz, Wilker, Shah, Kashif, Saz, Oscar, Hasan, Madina, AlHarbi, Ghada, Specia, Lucia, and Hain, Thomas
Subjects: Computer Science - Computation and Language
Abstract: The University of Sheffield (USFD) participated in the International Workshop for Spoken Language Translation (IWSLT) in 2014. In this paper, we will introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives a BLEU score of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation task with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.
Published: 2015

65. Data-selective Transfer Learning for Multi-Domain Speech Recognition

Author: Doulaty, Mortaza, Saz, Oscar, and Hain, Thomas
Subjects: Computer Science - Learning, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics. This paper proposes a novel technique to overcome negative transfer by efficient selection of speech data for acoustic model training. Here data is chosen on relevance for a specific target. A submodular function based on likelihood ratios is used to determine how acoustically similar each training utterance is to a target test set. The approach is evaluated on a wide-domain data set, covering speech from radio and TV broadcasts, telephone conversations, meetings, lectures and read speech. Experiments demonstrate that the proposed technique both finds relevant data and limits negative transfer. Results on a 6-hour test set show a relative improvement of 4% with data selection over using all data in PLP based models, and 2% with DNN features.
Published: 2015

66. Unsupervised Domain Discovery using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition

Author: Doulaty, Mortaza, Saz, Oscar, and Hain, Thomas
Subjects: Computer Science - Computation and Language
Abstract: Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However, the concept of domain is complex and not bound to clear criteria. Hence it is often not evident if data should be considered to be out-of-domain. While both acoustic and language models can be domain specific, work in this paper concentrates on acoustic modelling. We present a novel method to perform unsupervised discovery of domains using Latent Dirichlet Allocation (LDA) modelling. Here a set of hidden domains is assumed to exist in the data, whereby each audio segment can be considered to be a weighted mixture of domain properties. The classification of audio segments into domains allows the creation of domain specific acoustic models for automatic speech recognition. Experiments are conducted on a dataset of diverse speech data covering speech from radio and TV broadcasts, telephone conversations, meetings, lectures and read speech, with a joint training set of 60 hours and a test set of 6 hours. Maximum A Posteriori (MAP) adaptation to LDA based domains was shown to yield relative Word Error Rate (WER) improvements of up to 16% compared to pooled training, and up to 10% compared with models adapted with human-labelled prior domain knowledge.
Published: 2015

67. On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

68. Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning

Author: Islam, Elaf, primary, Hain, Thomas, additional, and Sudro, Protima Nomo, additional
Published: 2023
Full Text: View/download PDF

69. Deriving Translational Acoustic Sub-Word Embeddings

Author: Meghanani, Amit, primary and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

70. MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition

Author: Farooq, Muhammad Umar, primary, Ahmad, Rehan, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

71. Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection

Author: El Hannani, Asmaa, Errattahi, Rahhal, Salmam, Fatima Zahra, Hain, Thomas, and Ouahmane, Hassan
Published: 2021
Full Text: View/download PDF

72. Die Unternehmensgruppe Nassauische Heimstätte/Wohnstadt als Beispiel für eine zukunftsweisende Orientierung im Wohnungsbau

Author: Hain, Thomas, Lüter, Felix, Reich, Sebastian, Worms, Martin J., editor, and Radermacher, Franz J., editor
Published: 2018
Full Text: View/download PDF

73. Energieeffizienz, Klimaschutz und Nachhaltigkeit im Wohnungsbau

Author: Hain, Thomas, Lüter, Felix, Reich, Sebastian, Worms, Martin J., editor, and Radermacher, Franz J., editor
Published: 2018
Full Text: View/download PDF

74. Use of Speaker Metadata for Improving Automatic Pronunciation Assessment

Author: Saenz, Jose Antonio Lopez, primary and Hain, Thomas, additional
Published: 2021
Full Text: View/download PDF

75. WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Author: Huang, Shengjie, primary, Chen, Mingjie, additional, Xu, Yanyan, additional, Ke, Dengfeng, additional, and Hain, Thomas, additional
Published: 2021
Full Text: View/download PDF

76. System-independent ASR error detection and classification using Recurrent Neural Network

Author: Errattahi, Rahhal, EL Hannani, Asmaa, Hain, Thomas, and Ouahmane, Hassan
Published: 2019
Full Text: View/download PDF

77. The Effect of Spoken Language on Speech Enhancement Using Self-Supervised Speech Representation Loss Functions

Author: Close, George, primary, Hain, Thomas, additional, and Goetze, Stefan, additional
Published: 2023
Full Text: View/download PDF

78. Adapting Pretrained Models for Adult to Child Voice Conversion

Author: Sudro, Protima Nomo, primary, Ragni, Anton, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

79. Probing Statistical Representations for End-to-End ASR

Author: Ollerenshaw, Anna, primary, Jalal, Md Asif, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

80. On Data Sampling Strategies for Training Neural Network Speech Separation Models

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

81. The University of Sheffield CHiME-7 UDASE Challenge Speech Enhancement System

Author: Close, George L., primary, Ravenscroft, William, additional, Hain, Thomas, additional, and Goetze, Stefan, additional
Published: 2023
Full Text: View/download PDF

82. Domain Adaptive Self-supervised Training of Automatic Speech Recognition

Author: Do, Cong-Thanh, primary, Doddipatla, Rama, additional, Li, Mohan, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

83. Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

Author: Farooq, Muhammad Umar, primary and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

84. Exploring Speech Representations for Proficiency Assessment in Language Learning

Author: Islam, Elaf, primary, Park, Chanho, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

85. Lightly supervised alignment of subtitles on multi-genre broadcasts

Author: Saz, Oscar, Deena, Salil, Doulaty, Mortaza, Hasan, Madina, Khaliq, Bilal, Milner, Rosanna, Ng, Raymond W. M., Olcoz, Julia, and Hain, Thomas
Published: 2018
Full Text: View/download PDF

86. Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Ng, Raymond W.M., Nicolao, Mauro, and Hain, Thomas
Published: 2017
Full Text: View/download PDF

87. Perceive and Predict: Self-Supervised Speech Representation Based Loss Functions for Speech Enhancement

Author: Close, George, primary, Ravenscroft, William, additional, Hain, Thomas, additional, and Goetze, Stefan, additional
Published: 2023
Full Text: View/download PDF

88. Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation

Author: Ahmad, Rehan, primary, Jalal, Md Asif, additional, Umar Farooq, Muhammad, additional, Ollerenshaw, Anna, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

89. Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2023
Full Text: View/download PDF

90. Acoustic adaptation to dynamic background conditions with asynchronous transformations

Author: Saz, Oscar and Hain, Thomas
Published: 2017
Full Text: View/download PDF

91. Hidden model sequence models for automatic speech recognition

Author: Hain, Thomas
Subjects: 620
Published: 2002

92. Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition

Author: Loweimi, Erfan, Doulaty, Mortaza, Barker, Jon, Hain, Thomas, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Dediu, Adrian-Horia, editor, Martín-Vide, Carlos, editor, and Vicsi, Klára, editor
Published: 2015
Full Text: View/download PDF

93. Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system

Author: Kamper, Herman, de Wet, Febe, Hain, Thomas, and Niesler, Thomas
Published: 2014
Full Text: View/download PDF

94. Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Author: Farooq, Muhammad Umar, primary, Narayana, Darshan Adiga Haniya, additional, and Hain, Thomas, additional
Published: 2022
Full Text: View/download PDF

95. Investigating the Impact of Crosslingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Author: Farooq, Muhammad Umar, primary and Hain, Thomas, additional
Published: 2022
Full Text: View/download PDF

96. Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals

Author: Close, George, primary, Hollands, Samuel, additional, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2022
Full Text: View/download PDF

97. Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2022
Full Text: View/download PDF

98. Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

Author: Ravenscroft, William, primary, Goetze, Stefan, additional, and Hain, Thomas, additional
Published: 2022
Full Text: View/download PDF

99. MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

Author: Close, George, primary, Hain, Thomas, additional, and Goetze, Stefan, additional
Published: 2022
Full Text: View/download PDF

100. The 2007 AMI(DA) System for Meeting Transcription

Author: Hain, Thomas, Burget, Lukas, Dines, John, Garau, Giulia, Karafiat, Martin, van Leeuwen, David, Lincoln, Mike, Wan, Vincent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Stiefelhagen, Rainer, editor, Bowers, Rachel, editor, and Fiscus, Jonathan, editor
Published: 2008
Full Text: View/download PDF
