113 results for "Dominique Fohr"
Search Results
2. DNN-based semantic rescoring models for speech recognition
- Author
-
Irina Illina, Dominique Fohr; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. The authors thank the DGA (Direction Générale de l'Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation, who support the funding of this study and the 'Man-Machine Teaming' (MMT) scientific program.
- Subjects
Artificial neural network, Computer science, Speech recognition, Automatic speech recognition, Word error rate, Context (language use), Semantic data model, Semantics, Word2vec, Language model, [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], embeddings, Word (computer architecture), BERT
- Abstract
In this work, we address the problem of improving an automatic speech recognition (ASR) system. We want to efficiently model long-term semantic relations between words and to introduce this information through a semantic model. We propose neural network (NN) semantic models for rescoring the N-best hypothesis list. These models use two types of representations as DNN input features: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Semantic information is computed from these representations and used in a hypothesis-pair comparison mode. We perform experiments on the publicly available TED-LIUM dataset, on both clean speech and speech mixed with real noise, in line with our industrial project context. The proposed BERT-based rescoring approach yields a significant word error rate (WER) improvement over the ASR system without semantic rescoring models, under all experimented conditions and with both n-gram and recurrent NN language models (Long Short-Term Memory, LSTM).
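The core rescoring operation this abstract describes, re-ranking an N-best list by a weighted combination of acoustic, language-model, and semantic scores, can be sketched as follows. All weights and score values here are illustrative stand-ins, not numbers from the paper.

```python
# Toy N-best rescoring: re-rank hypotheses by a weighted sum of
# acoustic, language-model, and semantic scores (higher is better).
# All scores and weights below are illustrative values.

def rescore_nbest(hypotheses, w_acoustic=1.0, w_lm=0.5, w_semantic=0.8):
    """Return the hypotheses sorted by combined score, best first."""
    def combined(h):
        return (w_acoustic * h["acoustic"]
                + w_lm * h["lm"]
                + w_semantic * h["semantic"])
    return sorted(hypotheses, key=combined, reverse=True)

nbest = [
    {"text": "recognize speech", "acoustic": -4.1, "lm": -2.0, "semantic": -1.0},
    {"text": "wreck a nice beach", "acoustic": -4.0, "lm": -2.5, "semantic": -3.0},
]
best = rescore_nbest(nbest)[0]["text"]
```

In this toy example the semantic score overturns the slightly better acoustic score of the second hypothesis, which is exactly the effect a semantic rescoring model is meant to have.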
- Published
- 2021
3. Explaining Deep Learning Models for Speech Enhancement
- Author
-
Sunit Sivasankaran (Microsoft Corporation, Redmond), Emmanuel Vincent, Dominique Fohr; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. This work was made with the support of the French National Research Agency through the project VOCADOM 'Robust voice command adapted to the user and to the context for AAL' (ANR-16-CE33-0006). Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER, several universities and other organizations (see https://www.grid5000.fr).
- Subjects
Artificial neural network, Computer science, Deep learning, Speech recognition, Word error rate, Context (language use), explainable AI, Speech enhancement, Noise, feature attribution, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Robustness (computer science), [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD], Feature (machine learning), Artificial intelligence
- Abstract
We consider the problem of explaining the robustness, to mismatched noise conditions, of neural networks used to compute time-frequency masks for speech enhancement. We employ the Deep SHapley Additive exPlanations (DeepSHAP) feature attribution method to quantify the contribution of every time-frequency bin in the input noisy speech signal to every time-frequency bin in the output time-frequency mask. We define an objective metric, referred to as the speech relevance score, that summarizes the obtained SHAP values, and show that it correlates with the enhancement performance as measured by the word error rate on the CHiME-4 real evaluation dataset. We use the speech relevance score to explain the generalization ability of three speech enhancement models trained using synthetically generated speech-shaped noise, noise from a professional sound effects library, or real CHiME-4 noise. To the best of our knowledge, this is the first study of neural network explainability in the context of speech enhancement.
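A score that summarizes per-bin attributions can be sketched as the fraction of attribution mass that falls on speech-dominated bins. This is only one plausible aggregation, with illustrative values; the paper derives its attributions with DeepSHAP and defines its own metric.

```python
# Toy "speech relevance" aggregation: fraction of total absolute
# attribution mass falling on bins labeled as speech-dominated.
# Attribution values and the speech mask are illustrative stand-ins.

def speech_relevance(attributions, speech_mask):
    """attributions: one float per time-frequency bin.
    speech_mask: one bool per bin, True for speech-dominated bins."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    speech = sum(abs(a) for a, s in zip(attributions, speech_mask) if s)
    return speech / total

attr = [0.5, -0.1, 0.2, 0.05]       # per-bin attributions
mask = [True, False, True, False]   # which bins carry speech energy
score = speech_relevance(attr, mask)
```

A model whose mask decisions rely mostly on speech-dominated input bins would score close to 1 under this toy metric; one distracted by noise bins would score lower.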
- Published
- 2021
4. BERT-based Semantic Model for Rescoring N-best Speech Recognition List
- Author
-
Irina Illina, Dominique Fohr; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. The authors thank the DGA (Direction Générale de l'Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation, who support the funding of this study and the 'Man-Machine Teaming' scientific program in which this project takes place.
- Subjects
Artificial neural network, Computer science, Speech recognition, Automatic speech recognition, Word error rate, Semantic data model, Semantic consistency, Noise, Semantic context, Word2vec, [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], embeddings, Word (computer architecture), BERT
- Abstract
This work aims to improve automatic speech recognition (ASR) by modeling long-term semantic relations. We propose to do so by rescoring the ASR N-best hypothesis list. To achieve this, we propose two deep neural network (DNN) models that combine semantic, acoustic, and linguistic information. Our DNN rescoring models are designed to select hypotheses with better semantic consistency and therefore lower word error rate (WER). As part of the input features to our DNN models, we investigate a powerful representation: dynamic contextual embeddings from the Transformer-based BERT. Acoustic and linguistic features are also included. We perform experiments on the publicly available TED-LIUM dataset, evaluating in clean and noisy conditions, with an n-gram model and a Recurrent Neural Network Language Model (RNNLM), more precisely a Long Short-Term Memory (LSTM) model. The proposed rescoring approaches give significant WER improvements over the ASR system without rescoring models. Furthermore, the combination of rescoring methods based on BERT and GPT-2 scores achieves the best results.
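The final combination step, merging per-hypothesis scores from a BERT-based and a GPT-2-based rescorer, can be sketched as a linear interpolation. The mixing weight and scores are hypothetical, not values from the paper.

```python
# Toy interpolation of two rescoring models' per-hypothesis scores.
# Higher combined score wins; lam is a hypothetical mixing weight.

def interpolate(bert_scores, gpt2_scores, lam=0.6):
    return [lam * b + (1 - lam) * g for b, g in zip(bert_scores, gpt2_scores)]

bert = [-1.2, -0.4, -2.0]   # one score per N-best hypothesis
gpt2 = [-0.9, -1.1, -0.5]
combined = interpolate(bert, gpt2)
best_idx = max(range(len(combined)), key=combined.__getitem__)
```

In practice the weight would be tuned on a development set; here hypothesis 1 wins because it scores well under both models.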
- Published
- 2021
5. Unsupervised Domain Adaptation in Cross-corpora Abusive Language Detection
- Author
-
Irina Illina, Dominique Fohr, Tulika Bose; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. This work was supported in part by the French PIA project 'Lorraine Université d'Excellence' (reference ANR-15-IDEX-04-LUE), the IMPACT-OLKi project, and the Grid'5000 testbed.
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Domain adaptation, Language identification, Computer science, [INFO] Computer Science [cs], Task (project management), Annotation, Artificial intelligence, Language model, Adaptation (computer science), Natural language processing
- Abstract
State-of-the-art abusive language detection models report strong in-corpus performance, but underperform when evaluated on abusive comments that differ from the training scenario. As human annotation involves substantial time and effort, models that can adapt to newly collected comments are useful. In this paper, we investigate the effectiveness of several Unsupervised Domain Adaptation (UDA) approaches for the task of cross-corpora abusive language detection. For comparison, we adapt a variant of the BERT model, trained on large-scale abusive comments, using Masked Language Model (MLM) fine-tuning. Our evaluation shows that the UDA approaches yield sub-optimal performance, while MLM fine-tuning does better in the cross-corpora setting. A detailed analysis reveals the limitations of the UDA approaches and emphasizes the need for efficient adaptation methods for this task.
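MLM fine-tuning relies on randomly masking input tokens and training the model to recover them. The masking step can be sketched as below, using BERT's usual 15% rate; the tokenization and rate mirror common practice, not necessarily this paper's exact configuration.

```python
import random

# Toy BERT-style token masking for MLM fine-tuning: replace roughly
# 15% of tokens with [MASK] and record the originals as targets.

def mask_tokens(tokens, rate=0.15, seed=0):
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append("[MASK]")
            targets.append(tok)   # the model must predict these
        else:
            masked.append(tok)
            targets.append(None)  # ignored in the loss
    return masked, targets

toks = "models adapt poorly to new abusive comments".split()
masked, targets = mask_tokens(toks, seed=1)
```

Training on in-domain unlabeled comments with this objective is what lets the model adapt its representations without any new labels.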
- Published
- 2021
6. Exploring Conditional Language Model Based Data Augmentation Approaches for Hate Speech Classification
- Author
-
Dana Ruiter, Ashwin Geet D'Sa, Dietrich Klakow, Dominique Fohr, and Irina Illina
- Subjects
Training set, Artificial neural network, Computer science, Speech recognition, Classifier (linguistics), Labeled data, Language model, Speech classification, Generative grammar
- Abstract
Deep Neural Network (DNN) based classifiers have gained increased attention in hate speech classification. However, the performance of DNN classifiers increases with the quantity of available training data, and in reality hate speech datasets contain only a small amount of labeled data. To counter this, Data Augmentation (DA) techniques are often used to increase the number of labeled samples and thereby improve the classifier's performance. In this article, we explore augmenting training samples with a conditional language model. Our approach uses a single class-conditioned Generative Pre-Trained Transformer-2 (GPT-2) language model for DA, avoiding the need for multiple class-specific GPT-2 models. We study the effect of increasing the quantity of augmented data and show that adding a few hundred samples significantly improves the classifier's performance. Furthermore, we evaluate the effect of filtering the generated data used for DA. Our approach demonstrates relative improvements of up to 7.3% and up to 25.0% in macro-averaged F1 on two widely used hate speech corpora.
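Class-conditioned generation with a single model typically prepends a class control token to the prompt, and the filtering step keeps only generated samples that a classifier assigns confidently to the intended class. A sketch of that pipeline follows; `generate` and `classify` are hypothetical placeholders for the conditioned GPT-2 model and the trained classifier.

```python
# Toy conditional data augmentation with confidence filtering.
# generate() and classify() are hypothetical stand-ins for a
# class-conditioned GPT-2 and a trained hate speech classifier.

def generate(label, n):
    # A real system would sample from GPT-2 conditioned on the label token.
    return [f"<{label}> sample {i}" for i in range(n)]

def classify(text, label):
    # Stand-in confidence: pretend even-numbered samples are confident.
    return 0.9 if text.endswith(("0", "2", "4")) else 0.4

def augment(label, n, threshold=0.7):
    """Keep only generated samples the classifier trusts for this label."""
    return [t for t in generate(label, n) if classify(t, label) >= threshold]

kept = augment("hate", 5)
```

The surviving samples would then be added to the labeled training set for the target class.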
- Published
- 2021
7. Multiword Expression Features for Automatic Hate Speech Detection
- Author
-
Nicolas Zampieri, Dominique Fohr, and Irina Illina
- Subjects
Voice activity detection, Artificial neural network, Computer science, Deep learning, Multiword expression, Word2vec, Artificial intelligence, Encoder, Sentence, Word (computer architecture), Natural language processing
- Abstract
The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is infeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units of more than one word that can have idiomatic and compositional meanings. We propose to integrate MWE features into a deep neural network-based HSD framework. Our baseline HSD system relies on the Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.
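The three-branch idea, one feature vector per information source merged before classification, reduces to vector concatenation at the fusion point. The dimensions and values below are illustrative stand-ins, not the paper's actual feature sizes.

```python
# Toy fusion of three feature branches: a sentence embedding (USE in
# the paper), a one-hot MWE category vector, and an MWE embedding.
# All vectors here are illustrative stand-ins.

def fuse(sentence_emb, mwe_category_onehot, mwe_emb):
    # Concatenate the three branches into one classifier input vector.
    return sentence_emb + mwe_category_onehot + mwe_emb

use_vec = [0.1, 0.3, -0.2]   # sentence embedding
mwe_cat = [0, 1, 0]          # e.g. one-hot for an "idiomatic" category
mwe_vec = [0.05, -0.4]       # MWE embedding
features = fuse(use_vec, mwe_cat, mwe_vec)
```

In the real system each branch first passes through its own layers before fusion; the concatenated vector then feeds the shared classification layers.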
- Published
- 2021
8. Introduction of semantic model to help speech recognition
- Author
-
Stephane Level, Dominique Fohr, Irina Illina; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. The authors thank the DGA (Direction Générale de l'Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation, who support the funding of this study and the 'Man-Machine Teaming' (MMT) scientific program in which this research project takes place.
- Subjects
Computer science, Speech recognition, Automatic speech recognition, Word error rate, Context (language use), Semantic data model, Term (time), Embeddings, Noise, Semantic context, Similarity (psychology), Word2vec, [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], Sentence
- Abstract
Current Automatic Speech Recognition (ASR) systems mainly take into account acoustic, lexical, and local syntactic information; long-term semantic relations are not used. ASR performance degrades significantly when training and testing conditions differ, for example because of noise, in which case the acoustic information can be less reliable. To help a noisy ASR system, we propose to supplement it with a semantic module. This module re-evaluates the N-best speech recognition hypothesis list and can be seen as a form of adaptation to the noisy context. For the words in the processed sentence that may have been poorly recognized, the module chooses words that better fit the semantic context of the sentence. To achieve this, we introduce the notions of a context part and of possibility zones, which measure the similarity between the semantic context of the document and the corresponding possible hypotheses. The proposed methodology uses two continuous word representations: word2vec and fastText. We conduct experiments on the publicly available TED talks dataset (TED-LIUM) mixed with real noise. The proposed method achieves a significant improvement of the word error rate (WER) over the ASR system without semantic information.
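The underlying operation, scoring a candidate word against the semantic context of the document, reduces to a cosine similarity between the word's embedding and an averaged context embedding. The 3-dimensional vectors below are illustrative; the paper uses word2vec and fastText embeddings.

```python
import math

# Toy semantic scoring: cosine similarity between a candidate word's
# embedding and the mean embedding of the context words.
# The 3-d embeddings below are illustrative stand-ins.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def context_score(word_vec, context_vecs):
    """Similarity of a candidate word to the averaged context."""
    dim = len(word_vec)
    mean = [sum(v[i] for v in context_vecs) / len(context_vecs)
            for i in range(dim)]
    return cosine(word_vec, mean)

context = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]   # e.g. topic-related words
candidate_a = [0.95, 0.05, 0.05]               # semantically close word
candidate_b = [0.0, 1.0, 0.0]                  # unrelated word
```

A semantic module of this kind prefers hypothesis words whose embeddings sit close to the running context, which is how it can overrule unreliable acoustic evidence in noise.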
- Published
- 2020
9. BERT and fastText Embeddings for Automatic Detection of Toxic Speech
- Author
-
Ashwin Geet D'Sa, Dominique Fohr, Irina Illina; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine. Experiments were carried out using the Grid'5000 testbed.
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], word embeddings, Computer science, Natural language processing, Offensive, deep neural networks, hate speech detection, Classifier (linguistics), Social media, The Internet, Artificial intelligence, Word (computer architecture)
- Abstract
With the expansion of Internet usage, catering to the dissemination of an individual's thoughts and expressions, there has been an immense increase in the spread of online hate speech. Social media, community forums, and discussion platforms are common grounds for online discussion where people can communicate freely. However, this freedom of speech may be misused to argue aggressively, offend others, and spread verbal violence. As there is no clear distinction between the terms offensive, abusive, hate, and toxic speech, in this paper we group all of them under toxic speech. In many countries, online toxic speech is punishable by law, so it is important to automatically detect and remove it from online media. In this work, we propose automatic classification of toxic speech using embedding representations of words and deep learning techniques. We perform binary and multi-class classification on a Twitter corpus and study two approaches: (a) extracting word embeddings and then using a DNN classifier; (b) fine-tuning the pre-trained BERT model. We observed that BERT fine-tuning performed much better. The proposed methodology can be applied to any other type of social media comment.
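Approach (a), extracting word embeddings and feeding a classifier, usually pools per-word vectors into a fixed-size comment representation; mean pooling is the simplest choice. The tiny 2-dimensional embedding table below is a hypothetical stand-in for fastText.

```python
# Toy comment representation for approach (a): mean-pool per-word
# embeddings into one fixed-size vector for a downstream classifier.
# The 2-d embedding table is a hypothetical stand-in for fastText.

EMB = {"you": [0.1, 0.2], "are": [0.0, 0.1], "toxic": [0.9, -0.5]}
UNK = [0.0, 0.0]  # fallback for out-of-vocabulary words

def comment_vector(comment):
    vecs = [EMB.get(w, UNK) for w in comment.lower().split()]
    dim = len(UNK)
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

vec = comment_vector("you are toxic")
```

The pooled vector has a fixed dimension regardless of comment length, which is what allows a plain DNN classifier to consume it; BERT fine-tuning instead learns the whole representation end to end.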
- Published
- 2020
10. Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification
- Author
-
Dietrich Klakow, Irina Illina, Dana Ruiter, Ashwin Geet D'Sa, Dominique Fohr; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine; Saarland University, Saarbrücken. Experiments were carried out using the Grid'5000 testbed.
- Subjects
[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Computer science, Speech recognition, Labeled data, [INFO] Computer Science [cs], Semi-supervised learning, Speech classification, Classifier (UML), Label propagation
- Abstract
Research on hate speech classification has received increased attention. In real-life scenarios, only a small amount of labeled hate speech data is available to train a reliable classifier. Semi-supervised learning takes advantage of a small amount of labeled data and a large amount of unlabeled data. In this paper, label propagation-based semi-supervised learning is explored for the task of hate speech classification. The quality of labeling the unlabeled set depends on the input representations. In this work, we show that pre-trained representations are label agnostic and, when used with label propagation, yield poor results. Neural network-based fine-tuning can be adopted to learn task-specific representations using a small amount of labeled data. We show that fully fine-tuned representations may not always be the best representations for label propagation, and that intermediate representations may perform better in a semi-supervised setup.
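Label propagation spreads the labels of the few labeled points to unlabeled neighbors through a similarity graph. A minimal one-round sketch over a toy graph (not the paper's setup, where neighbors come from representation similarity) looks like this:

```python
# Toy one-round label propagation on a small similarity graph.
# Each unlabeled node takes the majority label among its labeled
# neighbors. The graph and seed labels are illustrative.

def propagate(edges, labels):
    """edges: dict node -> list of neighbor nodes.
    labels: dict of known (seed) labels; returns an extended copy."""
    new_labels = dict(labels)
    for node, neighbors in edges.items():
        if node in labels:
            continue  # seed labels stay fixed
        votes = [labels[n] for n in neighbors if n in labels]
        if votes:
            new_labels[node] = max(set(votes), key=votes.count)
    return new_labels

edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
labels = {"a": "hate", "d": "neutral"}
result = propagate(edges, labels)
```

Running further rounds would propagate labels deeper into the graph; the paper's point is that the quality of the neighborhoods, hence of the representations, decides whether this works.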
- Published
- 2020
11. RNN Language Model Estimation for Out-of-Vocabulary Words
- Author
-
Dominique Fohr and Irina Illina
- Subjects
Perplexity, Artificial neural network, Probability estimation, Computer science, Lexicon, Out of vocabulary, Recurrent neural network, Proper noun, Language model, Artificial intelligence, Natural language processing
- Abstract
One important issue for speech recognition systems is out-of-vocabulary (OOV) words. These words, often proper nouns or new words, are essential for documents to be transcribed correctly, so they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of the closest in-vocabulary (IV) words (the list of brothers) to a given OOV proper noun: the RNNLM probabilities of these words are used to estimate the probabilities of the OOV proper noun. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
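The list-of-brothers idea estimates an OOV proper noun's probability from the RNNLM probabilities of its closest in-vocabulary words. A minimal sketch, averaging the brothers' probabilities (one plausible reading of the idea, with illustrative numbers):

```python
# Toy OOV probability estimate: average the language-model
# probabilities of the OOV word's closest in-vocabulary "brothers".
# The brother list and probabilities are illustrative; the paper
# studies three methods for retrieving the brother list.

def oov_probability(brothers, lm_prob):
    """brothers: in-vocabulary words similar to the OOV proper noun.
    lm_prob: dict word -> RNNLM probability in the current context."""
    probs = [lm_prob[b] for b in brothers if b in lm_prob]
    return sum(probs) / len(probs) if probs else 0.0

lm_prob = {"Paris": 0.02, "London": 0.015, "Berlin": 0.01}
p = oov_probability(["Paris", "London", "Berlin"], lm_prob)
```

Because the estimate only reads probabilities the RNNLM already produces for IV words, the model needs no retraining and its architecture stays intact, which is the advantage the abstract highlights.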
- Published
- 2020
12. A Fine-grained Multilingual Analysis Based on the Appraisal Theory: Application to Arabic and English Videos
- Author
-
Odile Mella, Karima Abidi, Kamel Smaïli, David Langlois, Denis Jouvet, Dominique Fohr; Statistical Machine Translation and Speech Modelization and Text (SMarT) and Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS, Université de Lorraine
- Subjects
Appraisal theory ,Word embedding ,Arabic ,Polarity (physics) ,Computer science ,business.industry ,Sentiment analysis ,02 engineering and technology ,computer.software_genre ,Video analysis ,language.human_language ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,language ,Sentiment Analysis ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Utterance ,Natural language processing ,Graduation - Abstract
International audience; The objective of this paper is to compare the opinions expressed in two videos in two different languages. To do so, a fine-grained approach inspired by appraisal theory is used to analyze the content of videos that address the same topic. In general, methods devoted to sentiment analysis study the polarity of a text or an utterance. The appraisal approach goes beyond basic polarity and considers more detailed sentiments by covering additional attributes of opinions such as Attitude, Graduation and Engagement. To achieve such a comparison, we collected, within the AMIS (Chist-Era) project, a corpus of 1503 Arabic and 1874 English videos. These videos need to be aligned in order to compare their contents, which is why we propose several methods to make them comparable. The best method is then selected to align them and to constitute the dataset necessary for fine-grained sentiment analysis.
- Published
- 2019
- Full Text
- View/download PDF
13. Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations
- Author
-
Juan-Manuel Torres-Moreno, Kamel Smaïli, David Langlois, Denis Jouvet, Odile Mella, Mohamed Amine Menacer, Dominique Fohr, Karima Abidi, Carlos-Emiliano González-Gallardo, Fatiha Sadat, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Université du Québec à Montréal = University of Québec in Montréal (UQAM), and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
- Subjects
Computer science ,Arabic ,business.industry ,Automatic speech recognition ASR ,Foreign language ,Word error rate ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,Automatic summarization ,language.human_language ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Segmentation ,Text summarization ,0202 electrical engineering, electronic engineering, information engineering ,language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Scope (computer science) ,Natural language processing ,Video summarisation - Abstract
International audience; In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed within the scope of the AMIS project, which aims at helping a user understand videos given in a foreign language (Arabic). For that, the project proposes several strategies to translate and summarize the videos. One of them consists in transcribing the Arabic videos, summarizing the transcriptions, and translating the summary. In this paper we describe the video corpus that was collected from YouTube, and we present and evaluate the transcription-summarization part of this strategy. Moreover, we present the Automatic Speech Recognition (ASR) system used to transcribe the videos and show how we adapted this system to the Algerian dialect. Then, we describe how we automatically segment the sequence of words provided by the ASR system into sentences, and how we summarize the obtained sequence of sentences. We evaluate our approach both objectively and subjectively. Results show that the ASR system performs well in terms of Word Error Rate on MSA, but needs to be adapted to deal with Algerian dialect data. The subjective evaluation shows the same behaviour as ASR: transcriptions for videos containing dialectal data were scored better than those for videos containing only MSA data. However, summaries based on transcriptions are not rated as well, even when the transcriptions themselves are rated better. Finally, the study shows that features such as the lengths of transcriptions and summaries, and the subjective score of transcriptions, explain only 31% of the subjective score of summaries.
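The transcription-summarization strategy described above rests on extractive summarization of sentence-segmented ASR output. As an illustration only (not the AMIS implementation), a minimal frequency-based extractive summarizer can be sketched as follows; the `ratio` parameter and the term-frequency scoring are assumptions:

```python
import math
from collections import Counter

def extractive_summary(sentences, ratio=0.3):
    """Score each sentence by the average corpus-level term frequency of
    its words, then keep the top-scoring sentences in original order."""
    words = [s.lower().split() for s in sentences]
    tf = Counter(w for ws in words for w in ws)

    def score(ws):
        return sum(tf[w] for w in ws) / max(len(ws), 1)

    k = max(1, math.ceil(len(sentences) * ratio))
    # Rank sentence indices by score, keep the top k, restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(words[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]

# Toy usage on invented sentences.
sents = ["the cat sat on the mat", "dogs bark loudly", "the cat ran away"]
summary = extractive_summary(sents, ratio=0.3)
```

Real systems (such as those evaluated in the paper) use far richer features than raw term frequency, but the select-and-reorder skeleton is the same.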
- Published
- 2019
- Full Text
- View/download PDF
14. Summarizing videos into a target language: Methodology, architectures and evaluation
- Author
-
Begona Garcia-Zapirain, Lucjan Janowski, Michał Grega, Odile Mella, David Langlois, Kamel Smaïli, Eric SanJuan, Denis Jouvet, Dominique Fohr, Mikołaj Leszczuk, Elvys Linhares Pontes, Mohamed-Amine Menacer, Carlos-Emiliano González-Gallardo, Amaia Mendez, Juan-Manuel Torres-Moreno, Arian Koźbiał, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, AGH University of Science and Technology [Krakow, PL] (AGH UST), University of Deusto
(DEUSTO), and Universidad de Deusto (DEUSTO)
- Subjects
Statistics and Probability ,Machine translation ,Computer science ,media_common.quotation_subject ,Foreign language ,Text Boundary Segmentation ,02 engineering and technology ,Automatic Speech Recognition ,computer.software_genre ,Speech segmentation ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Artificial Intelligence ,Component (UML) ,0202 electrical engineering, electronic engineering, information engineering ,Quality (business) ,Objective and Subjective evaluations ,media_common ,Focus (computing) ,Information retrieval ,Video Summarization ,General Engineering ,Collecting data ,Statistical Machine Translation ,Automatic summarization ,Text and Audio Summarization ,020201 artificial intelligence & image processing ,State (computer science) ,computer - Abstract
International audience; The aim of this work is to report the results of the Chist-Era project AMIS (Access Multilingual Information opinionS). The purpose of AMIS is to answer the following question: how can information in a foreign language be made accessible to everyone? The issue is not limited to translating a source video into a target-language video, since the objective is to provide only the main idea of an Arabic video in English. This objective requires research in several areas that have not all reached maturity: video summarization, speech recognition, machine translation, audio summarization and speech segmentation. In this article we present several possible architectures to achieve our objective, but focus on only one of them. The scientific challenges are presented, and we explain how we deal with them. One of the big challenges of this work is to devise a way to objectively evaluate a system composed of several components, knowing that each of them has its limits and may propagate errors from earlier components. A subjective evaluation procedure is also proposed, in which several annotators were mobilized to assess the quality of the achieved summaries.
- Published
- 2019
- Full Text
- View/download PDF
15. Machine Translation on a Parallel Code-Switched Corpus
- Author
-
Odile Mella, David Langlois, Kamel Smaïli, Mohamed Amine Menacer, Denis Jouvet, Dominique Fohr, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Neural Machine Translation ,050101 languages & linguistics ,Machine translation ,Computer science ,Carry (arithmetic) ,02 engineering and technology ,computer.software_genre ,Translation (geometry) ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Code (semiotics) ,Parallel code-switched corpora ,Resource (project management) ,Code-switching ,Phenomenon ,0202 electrical engineering, electronic engineering, information engineering ,0501 psychology and cognitive sciences ,business.industry ,05 social sciences ,Statistical Machine Translation ,16. Peace & justice ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Utterance - Abstract
International audience; Code-switching (CS) is the phenomenon that occurs when a speaker alternates between two or more languages within an utterance or discourse. In this work, we investigate the existence of code-switching in formal text, namely proceedings of multilingual institutions. Our study is carried out on Arabic-English code-mixing in a parallel corpus extracted from official documents of the United Nations. We build a parallel code-switched corpus with two reference translations, one in pure Arabic and the other in pure English. We also carry out a human evaluation of this resource, with the aim of using it to evaluate the translation of code-switched documents. To the best of our knowledge, this kind of corpus does not exist; the one we propose is unique. This paper examines several methods to translate the code-switched corpus: conventional statistical machine translation, end-to-end neural machine translation and multi-task learning.
- Published
- 2019
- Full Text
- View/download PDF
16. Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition
- Author
-
Emmanuel Vincent, Sunit Sivasankaran, Dominique Fohr, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Grid'5000, ANR-16-CE33-0006,VOCADOM,Commande vocale robuste adaptée à la personne et au contexte pour l'autonomie à domicile(2016), This work was made with the support of the French National Research Agency, in the framework of the project VOCADOM 'Robust voice command adapted to the user and to the context for AAL' (ANR-16-CE33-0006).
Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr) and using the EXPLOR centre, hosted by the University of Lorraine., Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Multichannel speech separation ,WSJ0-2mix reverberated ,Signal processing ,Noise measurement ,Artificial neural network ,Computer science ,Speech recognition ,Word error rate ,020206 networking & telecommunications ,02 engineering and technology ,Speech processing ,Signal-to-noise ratio ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Audio and Speech Processing (eess.AS) ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Adaptive beamformer ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from the speaker location, which is then used in the second stage to estimate a time-frequency mask corresponding to the localized speaker using a neural network. This mask is used to compute the second-order statistics and to derive an adaptive beamformer in the third stage. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and study the performance of the proposed pipeline in terms of the word error rate (WER). An average WER of $29.4$% was achieved using the ground truth localization information and $42.4$% using the localization information estimated via GCC-PHAT. The signal-to-interference ratio (SIR) between the speakers has a higher impact on the ASR performance, to the extent of reducing the WER by $59$% relative for a SIR increase of $15$ dB. By contrast, increasing the spatial distance to $50^\circ$ or more improves the WER by only $23$% relative. Comment: Submitted to ICASSP 2020
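The estimated localization in the pipeline above comes from GCC-PHAT. As a rough illustration (not the authors' code), a two-microphone GCC-PHAT time-delay estimator can be sketched as follows; the sampling rate and the interpolation-free peak picking are simplifying assumptions:

```python
import numpy as np

def gcc_phat(x, y, fs=16000, max_tau=None):
    """Estimate the time delay between two microphone signals using the
    GCC-PHAT weighting. Positive return value: x arrives later than y."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12            # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that negative lags precede positive lags, then pick the peak.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy usage: a white-noise signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
sig = rng.standard_normal(2048)
delayed = np.concatenate((np.zeros(5), sig[:-5]))
tau = gcc_phat(delayed, sig, fs=16000)
```

With multiple microphone pairs, such pairwise delays are combined to estimate a direction of arrival; sub-sample accuracy would additionally require interpolating around the correlation peak.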
- Published
- 2019
- Full Text
- View/download PDF
17. A First Summarization System of a Video in a Target Language
- Author
-
Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Denis Jouvet, Michał Grega, Amaia Mendez, Begona Garcia-Zapirain, Kamel Smaïli, David Langlois, Lucjan Janowski, Artur Komorowski, Damian Świst, Arian Koźbiał, Elvys Linhares Pontes, Odile Mella, Mohamed Amine Menacer, Dominique Fohr, Mikołaj Leszczuk, Eric SanJuan, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, AGH University of Science and Technology [Krakow, PL] (AGH 
UST), Universidad de Deusto (DEUSTO), École Polytechnique de Montréal (EPM), and University of Deusto (DEUSTO)
- Subjects
Machine translation ,Computer science ,Process (engineering) ,media_common.quotation_subject ,Foreign language ,Video Summarization ,Compression ,020206 networking & telecommunications ,Text Boundary Segmentation · ,02 engineering and technology ,computer.software_genre ,Automatic summarization ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,World Wide Web ,Speech Recognition ,Machine Translation ,Fragment (logic) ,Text summarization ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Aptitude ,Content (Freudian dream analysis) ,computer ,media_common ,Meaning (linguistics) - Abstract
International audience; In this paper, we present the first results of the project AMIS (Access Multilingual Information opinionS) funded by Chist-Era. The main goal of this project is to understand the content of a video in a foreign language. In this work, we regard the understanding process as the ability to capture the most important ideas contained in a medium expressed in a foreign language. In other words, understanding is approached through the global meaning of the content of a medium, and not through the meaning of each fragment of a video. Several stumbling blocks remain before reaching this goal. They concern the following aspects: video summarization, speech recognition, machine translation and speech segmentation. All these issues are discussed, and the methods used to develop each of these components are presented. A first implementation has been achieved, and each component of this system is evaluated on representative test data. We also propose a protocol for a global subjective evaluation of AMIS.
- Published
- 2018
18. An Integrated AMIS Prototype for Automated Summarization and Translation of Newscasts and Reports
- Author
-
Mohamed Amine Menacer, Denis Jouvet, Juan-Manuel Torres-Moreno, Odile Mella, Michał Grega, Mikołaj Leszczuk, Dominique Fohr, Carlos-Emiliano González-Gallardo, Elvys Linhares Pontes, Kamel Smaïli, AGH University of Science and Technology [Krakow, PL] (AGH UST), Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, École Polytechnique de Montréal (EPM), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Kazimierz Choroś, Marek Kopel, Elżbieta Kukla, and Andrzej Siemiński
- Subjects
Machine translation ,Computer science ,business.industry ,0206 medical engineering ,02 engineering and technology ,Translation (geometry) ,computer.software_genre ,020601 biomedical engineering ,Automatic summarization ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,03 medical and health sciences ,Software modules ,0302 clinical medicine ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,Natural language processing - Abstract
International audience; In this paper we present the results of the integration work on the system designed for automated summarization and translation of newscasts and reports. We show the proposed system architectures and list the available software modules. Thanks to well-defined interfaces, the software modules may be used as building blocks, allowing easy experimentation with different summarization scenarios.
- Published
- 2018
- Full Text
- View/download PDF
19. Keyword Based Speaker Localization: Localizing a Target Speaker in a Multi-speaker Environment
- Author
-
Emmanuel Vincent, Dominique Fohr, and Sunit Sivasankaran
- Subjects
Reverberation ,Computer science ,Speech recognition ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,02 engineering and technology ,Noise (video) ,Convolutional neural network ,Word (computer architecture) ,Task (project management) - Abstract
Speaker localization is a hard task, especially in adverse environmental conditions involving reverberation and noise. In this work we introduce the new task of localizing the speaker who uttered a given keyword, e.g., the wake-up word of a distant-microphone voice command system, in the presence of overlapping speech. We employ a convolutional neural network based localization system and investigate multiple identifiers as additional inputs to the system in order to characterize this speaker. We conduct experiments using ground truth identifiers, which are obtained assuming the availability of clean speech, and also in realistic conditions where the identifiers are computed from the corrupted speech. We find that the identifier consisting of the ground truth time-frequency mask corresponding to the target speaker provides the best localization performance, and we propose methods to estimate such a mask in adverse reverberant and noisy conditions using the considered keyword.
- Published
- 2018
- Full Text
- View/download PDF
20. Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents
- Author
-
Irina Illina and Dominique Fohr
- Subjects
Vocabulary ,Word embedding ,Artificial neural network ,Recall ,business.industry ,Computer science ,media_common.quotation_subject ,computer.software_genre ,Transcription (linguistics) ,Proper noun ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase vocabulary coverage, a huge amount of text data should be used. In this paper, we extend previously proposed neural network word embedding models: the word vector representation proposed by Mikolov is enriched with an additional non-linear transformation. This model makes it possible to better take into account lexical and semantic word relationships. In the context of broadcast news transcription, and in terms of recall, experimental results show a good ability of the proposed model to select new relevant proper names.
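The idea of enriching a Mikolov-style embedding with an extra non-linear transformation and then retrieving related proper names by cosine similarity can be sketched as follows. This is an illustrative sketch, not the paper's trained model: the weights `W` and `b` are hypothetical stand-ins for parameters that would be learned, and the vectors are toy data.

```python
import numpy as np

def transform(E, W, b):
    """Non-linear transformation layer applied on top of word2vec-style
    embeddings E (rows = words); W and b would be learned in practice."""
    return np.tanh(E @ W + b)

def nearest(query_vec, E, k=1):
    """Return the indices of the k rows of E most cosine-similar to query_vec."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return np.argsort(-(En @ q))[:k]

# Toy usage: three 2-d "word" vectors and an identity transformation.
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
Z = transform(E, np.eye(2), np.zeros(2))
idx = nearest(np.array([1.0, 0.05]), E, k=2)
```

In the retrieval setting described by the paper, the ranked neighbours of a document representation would be the candidate proper names to add to the vocabulary.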
- Published
- 2018
- Full Text
- View/download PDF
21. Dynamic Extension of ASR Lexicon Using Wikipedia Data
- Author
-
Badr M. Abdullah, Irina Illina, Dominique Fohr, Illina, Irina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Text corpus ,Vocabulary ,Word embedding ,Computer science ,media_common.quotation_subject ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,02 engineering and technology ,[INFO] Computer Science [cs] ,Semantics ,computer.software_genre ,Lexicon ,01 natural sciences ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,lexicon extension ,Proper noun ,[INFO]Computer Science [cs] ,010301 acoustics ,media_common ,Context model ,business.industry ,Search engine indexing ,Automatic speech recognition ,word embedding ,out-of-vocabulary words ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
International audience; Despite recent progress in developing Large Vocabulary Continuous Speech Recognition systems (LVCSR), these systems suffer from Out-Of-Vocabulary (OOV) words. In many cases, the OOV words are Proper Nouns (PNs). The correct recognition of PNs is essential for broadcast news, audio indexing, etc. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR. We focus on dynamic (document-dependent) extension of the LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus: Wikipedia. This corpus contains a huge number of PNs, which are grouped into semantically similar classes using word embeddings. We use a two-step approach: first, we select pertinent OOV PN classes with a multi-class Deep Neural Network (DNN); secondly, we rank the OOVs of the selected classes. Experiments on French broadcast news show that the Bi-GRU model outperforms the other studied models. Speech recognition experiments demonstrate the effectiveness of the proposed methodology.
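The two-step retrieval (class selection, then in-class ranking) can be illustrated with a toy sketch. Here the paper's DNN class selector is replaced by a simple centroid-similarity heuristic, and all names and vectors are invented example data, not from the paper:

```python
import numpy as np

def retrieve_oov_pns(doc_vec, class_centroids, class_members, top_classes=1):
    """Two-step retrieval sketch: (1) pick the classes whose centroid is most
    similar to the document vector (a heuristic stand-in for the DNN
    classifier); (2) rank the OOV proper names of those classes by cosine
    similarity to the document."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    cls = np.argsort([-cos(doc_vec, c) for c in class_centroids])[:top_classes]
    cands = [(name, vec) for c in cls for name, vec in class_members[c]]
    return [name for name, vec in sorted(cands, key=lambda nv: -cos(doc_vec, nv[1]))]

# Toy usage with two classes and invented proper-name embeddings.
doc_vec = np.array([1.0, 0.1])
centroids = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
members = [
    [("Nancy", np.array([0.9, 0.2])), ("Metz", np.array([1.0, 0.0]))],
    [("Paris", np.array([0.1, 1.0]))],
]
ranked = retrieve_oov_pns(doc_vec, centroids, members, top_classes=1)
```

The ranked names would then be the candidates added dynamically to the ASR lexicon for that document.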
- Published
- 2018
22. Topic segmentation in ASR transcripts using bidirectional rnns for change detection
- Author
-
Imran Sheikh, Irina Illina, Dominique Fohr, TCS Innovation Labs, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Grid'5000, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
- Subjects
business.industry ,Computer science ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,Cohesion (linguistics) ,topic segmentation ,Recurrent neural network ,Discriminative model ,Broadcast television systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Segmentation ,The Internet ,recurrent neural networks ,Artificial intelligence ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Change detection ,Natural language processing - Abstract
International audience; Topic segmentation methods are mostly based on the idea of lexical cohesion, in which lexical distributions are analysed across the document and segment boundaries are marked in areas of low cohesion. We propose a novel approach for topic segmentation in speech recognition transcripts that measures lexical cohesion using bidirectional Recurrent Neural Networks (RNNs). The bidirectional RNNs capture the context of the preceding and the following sets of words, and the two contexts are compared to perform topic change detection. In contrast to existing works based on sequence and discriminative models for topic segmentation, our approach requires neither a segmented corpus nor (pseudo) topic labels for training. Our model is trained on news articles obtained from the Internet. Evaluation on ASR transcripts of French TV broadcast news programs demonstrates the effectiveness of the proposed approach.
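The change-detection idea of comparing past and following contexts can be sketched as below. This is a hedged simplification: averaged word embeddings stand in for the trained bidirectional RNN states of the paper, and the window size and similarity threshold are hypothetical parameters.

```python
import numpy as np

def detect_topic_changes(word_vecs, window=3, threshold=0.5):
    # Compare the mean embedding of the `window` words before position i
    # with the mean embedding of the `window` words after it; a low cosine
    # similarity between the two context vectors suggests a topic boundary.
    word_vecs = np.asarray(word_vecs, dtype=float)
    boundaries = []
    for i in range(window, len(word_vecs) - window):
        past = word_vecs[i - window:i].mean(axis=0)
        future = word_vecs[i:i + window].mean(axis=0)
        sim = past @ future / (np.linalg.norm(past) * np.linalg.norm(future) + 1e-9)
        if sim < threshold:
            boundaries.append(i)
    return boundaries
```

In the paper the two context vectors are produced by the forward and backward RNNs, so the comparison can be learned rather than fixed to plain cosine distance.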
- Published
- 2017
23. Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect
- Author
-
Kamel Smaïli, Mohamed Amine Menacer, Odile Mella, David Langlois, Denis Jouvet, Dominique Fohr, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en 
Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Arabic ,Computer science ,Speech recognition ,02 engineering and technology ,computer.software_genre ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Algerian dialect ,MSA ,0202 electrical engineering, electronic engineering, information engineering ,General Environmental Science ,business.industry ,French ,Acoustic model ,020206 networking & telecommunications ,language.human_language ,Language model ,language ,Modern Standard Arabic ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
International audience; This paper addresses the development of an Automatic Speech Recognition system for Modern Standard Arabic (MSA) and its extension to Algerian dialect. Algerian dialect is very different from the Arabic dialects of the Middle East, since it is highly influenced by the French language. In this article, we start by presenting the new automatic speech recognition system, named ALASR (Arabic Loria Automatic Speech Recognition). The acoustic model of ALASR is based on a DNN approach and the language model is a classical n-gram. Several options are investigated in this paper to find the best combination of models and parameters. ALASR achieves good results for MSA in terms of WER (14.02%), but it completely collapses on an Algerian dialect data set of 70 minutes (a WER of 89%). In order to take into account the impact of the French language on the Algerian dialect, we combine in ALASR two acoustic models: the original one (MSA) and a French one trained on the ESTER corpus. This solution was adopted because no transcribed speech data for Algerian dialect are available. This combination leads to a substantial absolute reduction of the word error rate of 24%. © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 3rd International Conference on Arabic Computational Linguistics.
- Published
- 2017
24. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition
- Author
-
Georges Linarès, Dominique Fohr, Irina Illina, Imran Sheikh, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), Grid'5000, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre 
National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Topic model ,Vocabulary ,Acoustics and Ultrasonics ,Computer science ,media_common.quotation_subject ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Context (language use) ,02 engineering and technology ,010501 environmental sciences ,Semantics ,computer.software_genre ,01 natural sciences ,Latent Dirichlet allocation ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,symbols.namesake ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Electrical and Electronic Engineering ,out-of-vocabulary ,proper names ,0105 earth and related environmental sciences ,media_common ,Context model ,business.industry ,Computational Mathematics ,large vocabulary continuous speech recognition ,Automatic indexing ,semantic context ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,Language model ,business ,computer ,Natural language processing - Abstract
International audience; The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a Neural Bag-of-Weighted-Words (NBOW2) model which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
- Published
- 2017
- Full Text
- View/download PDF
25. An enhanced automatic speech recognition system for Arabic
- Author
-
Mohamed Amine Menacer, Kamel Smaïli, Denis Jouvet, Dominique Fohr, Odile Mella, David Langlois, Smaïli, Kamel, Statistical Machine Translation and Speech Modelization and Text (SMarT), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), AMIS - Chist-Era Project, Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications 
(LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Computer science ,Arabic ,Speech recognition ,Arabic Speech recognition ,Word error rate ,02 engineering and technology ,computer.software_genre ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Task (project management) ,Arabic speech recognition ,0202 electrical engineering, electronic engineering, information engineering ,business.industry ,020206 networking & telecommunications ,language.human_language ,Language model ,Focus (linguistics) ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,Modern Standard Arabic ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,Rewriting ,business ,computer ,Natural language processing ,Arabic processing - Abstract
International audience; Automatic speech recognition for Arabic is a very challenging task. Despite all the classical techniques for Automatic Speech Recognition (ASR), which can be efficiently applied to Arabic speech recognition, it is essential to take the language specificities into consideration to improve system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to the Arabic language, namely its complex morphology and the absence of short vowels in written text, which leads to several potential, often conflicting, vowelizations for each grapheme. We develop an ASR system for MSA using the Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42 for the baseline system and a 12.2 relative improvement by rescoring the lattice and by rewriting the output with the correct hamza above or below Alif.
- Published
- 2017
26. Dynamic adjustment of language models for automatic speech recognition using word similarity
- Author
-
Anna Currey, Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
- Subjects
word embeddings ,language modeling ,Vocabulary ,Perplexity ,Computer science ,media_common.quotation_subject ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Word error rate ,02 engineering and technology ,computer.software_genre ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Data modeling ,ASR ,lexicon extension ,0202 electrical engineering, electronic engineering, information engineering ,Proper noun ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,media_common ,Context model ,business.industry ,OOV ,020206 networking & telecommunications ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,020201 artificial intelligence & image processing ,Language model ,Artificial intelligence ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
International audience; Out-of-vocabulary (OOV) words can pose a particular problem for automatic speech recognition (ASR) of broadcast news. The language models (LMs) of ASR systems are typically trained on static corpora, whereas new words (particularly new proper nouns) are continually introduced in the media. Additionally, such OOVs are often content-rich proper nouns that are vital to understanding the topic. In this work, we explore methods for dynamically adding OOVs to language models by adapting the n-gram language model used in our ASR system. We propose two strategies: the first relies on finding in-vocabulary (IV) words similar to the OOVs, where word embeddings are used to define similarity. Our second strategy leverages a small contemporary corpus to estimate OOV probabilities. The models we propose yield improvements in perplexity over the baseline; in addition, the corpus-based approach leads to a significant decrease in proper noun error rate over the baseline in recognition experiments.
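The first strategy above, assigning an OOV a probability derived from similar in-vocabulary words, can be sketched as below. This is a simplified stand-in for the paper's n-gram adjustment: the function name and the k-nearest-neighbour averaging are illustrative assumptions, and renormalization of the full unigram distribution is omitted.

```python
import numpy as np

def oov_unigram_prob(oov_vec, iv_vecs, iv_probs, k=2):
    # Cosine similarity between the OOV embedding and every IV embedding.
    oov_vec = np.asarray(oov_vec, dtype=float)
    iv_vecs = np.asarray(iv_vecs, dtype=float)
    sims = iv_vecs @ oov_vec / (
        np.linalg.norm(iv_vecs, axis=1) * np.linalg.norm(oov_vec) + 1e-9)
    # Estimate: mean unigram probability of the k most similar IV words.
    top = np.argsort(sims)[-k:]
    return float(np.asarray(iv_probs)[top].mean())
```

In practice the mass given to OOVs would be taken from the existing distribution so that the adjusted language model still sums to one.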
- Published
- 2016
- Full Text
- View/download PDF
27. Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
- Author
-
Irina Illina, Imran Sheikh, Georges Linarès, Dominique Fohr, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la 
Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Computer science ,Speech recognition ,Process (computing) ,oov ,Context (language use) ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Task (project management) ,Bag-of-words model ,0202 electrical engineering, electronic engineering, information engineering ,lvcsr ,Embedding ,Proper noun ,020201 artificial intelligence & image processing ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Layer (object-oriented design) ,proper names ,0105 earth and related environmental sciences - Abstract
International audience; Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To enable recovery of the PNs missed by the system, relevant OOV PNs can be retrieved by exploiting the semantic context of the spoken content. In this paper, we explore the Neural Bag-of-Words (NBOW) model, proposed previously for text classification, to retrieve relevant OOV PNs. We propose a Neural Bag-of-Weighted-Words (NBOW2) model in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and has the ability to capture (task-specific) keywords in a NBOW model. With experiments on French broadcast news videos we show that the NBOW and NBOW2 models outperform earlier methods based on raw embeddings from LDA and Skip-gram. Combining NBOW with NBOW2 gives faster convergence during training.
- Published
- 2016
- Full Text
- View/download PDF
28. Learning Word Importance with the Neural Bag-of-Words Model
- Author
-
Imran Sheikh, Georges Linarès, Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
- Subjects
business.industry ,Computer science ,Speech recognition ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,Task (project management) ,Bag-of-words model ,Classifier (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Word (computer architecture) ,Natural language processing ,0105 earth and related environmental sciences - Abstract
The Neural Bag-of-Words (NBOW) model performs classification with an average of the input word vectors and achieves impressive performance. While the NBOW model learns word vectors targeted for the classification task, it does not explicitly model which words are important for a given task. In this paper we propose an improved NBOW model with the ability to learn task-specific word importance weights. The word importance weights are learned by introducing a new weighted-sum composition of the word vectors. With experiments on standard topic and sentiment classification tasks, we show that (a) our proposed model learns meaningful word importance for a given task and (b) our model gives the best accuracies among the BOW approaches. We also show that the learned word importance weights are comparable to tf-idf based word weights when used as features in a BOW-SVM classifier.
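The weighted-sum composition at the heart of the improved NBOW model can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the importance scalars are taken as given (in the model they are learned jointly with the word vectors), and a softmax normalization is one plausible choice.

```python
import numpy as np

def nbow2_document_vector(word_vecs, importance):
    # Normalize per-word scalar importance weights (softmax, shifted for
    # numerical stability) so they sum to one.
    word_vecs = np.asarray(word_vecs, dtype=float)
    importance = np.asarray(importance, dtype=float)
    a = np.exp(importance - importance.max())
    w = a / a.sum()
    # Weighted sum of word vectors: words with higher importance
    # contribute more to the document representation.
    return (word_vecs * w[:, None]).sum(axis=0)
```

With all-equal importance weights this reduces to the plain NBOW average, which is why the model can only improve on the averaging baseline.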
- Published
- 2016
29. Document level semantic context for retrieving OOV proper names
- Author
-
Irina Illina, Georges Linarès, Dominique Fohr, Imran Sheikh, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Context (language use) ,02 engineering and technology ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Document level ,Phonetic search technology ,0202 electrical engineering, electronic engineering, information engineering ,Semantic context ,Proper noun ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,proper names ,Information retrieval ,Training set ,business.industry ,Search engine indexing ,OOV ,semantic ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,computer ,Natural language processing ,indexing - Abstract
International audience; Recognition of Proper Names (PNs) in speech is important for content-based indexing and browsing of audio-video data. However, many PNs are Out-Of-Vocabulary (OOV) words for the LVCSR systems used in these applications, due to the diachronic nature of the data. By exploiting the semantic context of the audio, relevant OOV PNs can be retrieved and the target PNs can then be recovered. To retrieve OOV PNs, we propose to represent their context with document-level semantic vectors, and show that this approach is able to handle OOV PNs that are less frequent in the training data. We study different representations, including Random Projections, LSA, LDA, Skip-gram, CBOW and GloVe. A further evaluation of the recovery of target OOV PNs using a phonetic search shows that document-level semantic context is reliable for recovery of OOV PNs.
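Retrieval with document-level semantic vectors can be sketched as below. This is a hedged simplification: a plain average of word embeddings stands in for the Random Projection/LSA/LDA/Skip-gram/CBOW/GloVe representations compared in the paper, and the PN context vectors are assumed to be precomputed from text where each PN occurs.

```python
import numpy as np

def doc_vector(words, emb):
    # Document-level semantic vector: average of the embeddings of the
    # document's in-vocabulary words (unknown words are skipped).
    vecs = [emb[w] for w in words if w in emb]
    return np.mean(np.asarray(vecs, dtype=float), axis=0)

def rank_pns_by_context(transcript_words, pn_context_vecs, emb):
    # Rank candidate OOV PNs by cosine similarity between the transcript's
    # document vector and each PN's precomputed context vector.
    d = doc_vector(transcript_words, emb)

    def cos(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    return sorted(pn_context_vecs, key=lambda pn: cos(d, pn_context_vecs[pn]),
                  reverse=True)
```

The top-ranked PNs would then feed the phonetic search that recovers the target PNs in the audio.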
- Published
- 2016
- Full Text
- View/download PDF
30. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval
- Author
-
Irina Illina, Georges Linarès, Dominique Fohr, Imane Nkairi, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis, and ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012)
- Subjects
Vocabulary ,Out-of-vocabulary words ,Computer science ,media_common.quotation_subject ,Word error rate ,Context (language use) ,02 engineering and technology ,Proper names ,Speech recognition ,computer.software_genre ,Out of vocabulary ,Task (project management) ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,Proper noun ,[INFO]Computer Science [cs] ,media_common ,business.industry ,020206 networking & telecommunications ,Key (cryptography) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Vocabulary augmentation - Abstract
International audience; Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the automatic speech recognition system vocabulary using lexical and temporal features in diachronic documents. We also study different metrics for proper name selection in order to limit the vocabulary augmentation and therefore the impact on ASR performance. Recognition results show a significant reduction of the proper name error rate using an augmented vocabulary.
- Published
- 2016
- Full Text
- View/download PDF
31. Different word representations and their combination for proper name retrieval from diachronic documents
- Author
-
Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Vocabulary ,LDA ,Computer science ,Speech recognition ,media_common.quotation_subject ,computer.software_genre ,Latent Dirichlet allocation ,Document level ,symbols.namesake ,Transcription (linguistics) ,Word representation ,Proper noun ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,proper names ,vocabulary extension ,media_common ,Context model ,Artificial neural network ,business.industry ,speech recognition ,neural networks ,out-of-vocabulary words ,symbols ,Artificial intelligence ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Natural language processing - Abstract
International audience; This paper deals with the problem of high-quality transcription systems for very large vocabulary automatic speech recognition (ASR). We investigate the problem of automatic retrieval of out-of-vocabulary (OOV) proper names (PNs), taking into account the temporal, syntactic and semantic context of words. Nowadays, artificial neural networks (NNs) are widely used in natural language processing: continuous space representations of words are learned automatically from unstructured text data. To model latent topics at the document level, Latent Dirichlet Allocation (LDA) has been successful. In this paper, we propose OOV PN retrieval using (1) temporal versus topic context modeling; (2) different word representation spaces for word-level and document-level context modeling; and (3) combinations of retrieval results. Experimental evaluation on broadcast news data shows that the proposed method combinations lead to better results, confirming the complementarity of the methods.
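One way to combine retrieval results from different representation spaces, as the abstract describes, is rank fusion. The sketch below uses reciprocal rank fusion as an assumed stand-in for the paper's actual combination scheme; the ranked lists and names are invented:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked candidate lists; items ranked high in any
    list accumulate a high fused score (k dampens rank differences)."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked OOV PN lists from two retrieval methods.
temporal = ["Merkel", "Hollande", "Tsipras"]
topical  = ["Tsipras", "Varoufakis", "Merkel"]
fused = reciprocal_rank_fusion([temporal, topical])
```

Names retrieved by both methods (here "Merkel" and "Tsipras") rise to the top of the fused list, which is one simple way the complementarity of methods can be exploited.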
- Published
- 2015
- Full Text
- View/download PDF
32. Continuous Word Representation using Neural Networks for Proper Name Retrieval from Diachronic Documents
- Author
-
Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents 
diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID, Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Vocabulary ,Computer science ,Speech recognition ,media_common.quotation_subject ,Word error rate ,02 engineering and technology ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Word representation ,Proper noun ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,proper names ,media_common ,vocabulary extension ,Recall ,Artificial neural network ,business.industry ,speech recognition ,020206 networking & telecommunications ,neural networks ,out-of-vocabulary words ,Artificial intelligence ,Transcription (software) ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,0305 other medical science ,business ,computer ,Natural language processing - Abstract
International audience; Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach to increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural networks have been successfully applied to a variety of speech recognition tasks. In this paper, we investigate whether neural networks can enhance word representation in vector space for the vocabulary extension of a speech recognition system. This is achieved by using the high-quality continuous vector representations of words, learned from large amounts of unstructured text data, proposed by Mikolov. This model makes it possible to take lexical and semantic word relationships into account. The proposed methodology is evaluated in the context of broadcast news transcription. The obtained recall and ASR proper name error rate are compared to those obtained using a cosine-based vector space methodology. Experimental results show a good ability of the proposed model to capture semantic and lexical information.
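A cosine-based retrieval step like the baseline mentioned above could look as follows; the toy embeddings and candidate names are invented for illustration, not taken from the paper:

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(context_vec, candidates):
    # candidates: {proper_name: embedding}; most context-similar name first.
    return sorted(candidates,
                  key=lambda pn: cosine(context_vec, candidates[pn]),
                  reverse=True)

context = [0.9, 0.1, 0.0]  # toy embedding of the document context
candidates = {"Obama": [0.8, 0.2, 0.1], "Everest": [0.0, 0.1, 0.9]}
ranked = rank_candidates(context, candidates)  # "Obama" first
```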
- Published
- 2015
33. OOV Proper Name retrieval using topic and lexical context models
- Author
-
Imran Sheikh, Georges Linarès, Dominique Fohr, and Irina Illina
- Subjects
Topic model ,Vocabulary ,Context model ,Computer science ,business.industry ,media_common.quotation_subject ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Search engine indexing ,Context (language use) ,computer.software_genre ,Keyword spotting ,Selection (linguistics) ,Proper noun ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
Retrieving Proper Names (PNs) specific to an audio document can be useful for vocabulary selection and OOV recovery in speech recognition, as well as in keyword spotting and audio indexing tasks. We propose methods to infer and retrieve OOV PNs relevant to an audio news document by using probabilistic topic models trained over diachronic text news. The LVCSR hypothesis for the audio news document is analysed for latent topics, which are then used to retrieve relevant OOV PNs. Using an LDA topic model, we obtain Recall up to 0.87 and Mean Average Precision (MAP) of 0.26 with only the top 10% of the retrieved OOV PNs. We further propose methods to re-score and retrieve rare OOV PNs, as well as a lexical context model to improve the target OOV PN rankings assigned by the topic model, which may be biased due to the prominence of certain news events. Re-scoring rare OOV PNs improves Recall, whereas the lexical context model improves MAP.
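The topic-based retrieval can be sketched as marginalizing proper-name probabilities over the document's inferred topics. This is an assumed simplification of the LDA pipeline, and the topic names and probabilities below are invented:

```python
def score_oov_pns(doc_topics, pn_given_topic):
    """Rank OOV PNs by sum_t p(topic|doc) * p(pn|topic)."""
    scores = {}
    for pn, topic_probs in pn_given_topic.items():
        scores[pn] = sum(doc_topics[t] * p for t, p in topic_probs.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical topic posterior inferred from the LVCSR hypothesis.
doc_topics = {"greek_crisis": 0.7, "sports": 0.3}
# Hypothetical PN distributions learned from diachronic text news.
pn_given_topic = {
    "Tsipras": {"greek_crisis": 0.4, "sports": 0.0},
    "Federer": {"greek_crisis": 0.0, "sports": 0.5},
}
ranked = score_oov_pns(doc_topics, pn_given_topic)
```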
- Published
- 2015
- Full Text
- View/download PDF
34. Neural Networks for Proper Name Retrieval in the Framework of Automatic Speech Recognition
- Author
-
Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche 
en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Vocabulary ,Artificial neural network ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,speech recognition ,Context (language use) ,computer.software_genre ,neural networks ,out-of-vocabulary words ,Proper noun ,Speech analytics ,Artificial intelligence ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,Representation (mathematics) ,computer ,Word (computer architecture) ,Natural language processing ,proper names ,media_common ,vocabulary extension - Abstract
International audience; The problem of out-of-vocabulary words, more precisely proper name retrieval in speech recognition, is investigated. The speech recognition vocabulary is extended using diachronic documents. This article explores a new method based on a neural network (NN), proposed recently by Mikolov. The NN uses high-quality continuous representations of words learned from large amounts of unstructured text data and predicts the surrounding words of one input word. Different strategies for using the NN to take lexical context into account are proposed. Experimental results on broadcast speech recognition, and a comparison with previously proposed methods, show the ability of the NN representation to model the semantic and lexical context of proper names.
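Scoring a candidate proper name against its lexical context with continuous word representations might be sketched as follows. This is a simplification of the skip-gram idea (a name whose embedding aligns with the embeddings of surrounding words fits the context); the vectors and names are invented:

```python
def context_score(pn_vec, context_vecs):
    # Sum of dot products between the candidate's embedding and each
    # embedded context word: higher means the name fits the context better.
    return sum(sum(a * b for a, b in zip(pn_vec, c)) for c in context_vecs)

# Toy embeddings of the words surrounding an OOV region.
context = [[0.9, 0.1], [0.8, 0.0]]
# Toy embeddings of two candidate OOV proper names.
candidates = {"Hollande": [0.7, 0.2], "Everest": [0.1, 0.9]}
best = max(candidates, key=lambda pn: context_score(candidates[pn], context))
```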
- Published
- 2015
35. Recognition of OOV Proper Names in Diachronic Audio News
- Author
-
Imran Sheikh, Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine 
(UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
- Subjects
Topic model ,Text corpus ,business.industry ,Computer science ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Search engine indexing ,computer.software_genre ,Latent Dirichlet allocation ,symbols.namesake ,Phonetic search technology ,symbols ,Proper noun ,The Internet ,Artificial intelligence ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Natural language processing ,ComputingMilieux_MISCELLANEOUS - Abstract
LVCSR-based audio indexing approaches are preferred as they allow search, navigation, browsing and structuring of audio/video documents based on their content. A major challenge with LVCSR-based indexing of diachronic audio data, e.g. broadcast audio news, is OOV words, and specifically OOV PNs, which are very important for indexing applications. In this paper we propose an approach for recognition of OOV PNs in audio news documents using PNs extracted from collections of diachronic text news from the internet. The approach has two steps: (a) reduce the long list of OOV PNs in the diachronic text corpus to a smaller list of OOV PNs relevant to the audio document, using probabilistic topic models; (b) perform a phonetic search for the target OOV PNs within the reduced list of relevant OOV PNs. We evaluate our approach on French broadcast news videos published over a period of 6 months. A Latent Dirichlet Allocation topic model is trained on diachronic text news to model PN-topic relationships and then to retrieve OOV PNs relevant to the audio document. Our proposed method retrieves up to 90% of the relevant OOV PNs while reducing the OOV PN search space to only 5% of the total OOV PNs. Phonetic search for target OOV PNs gives an F1-score of up to 0.392.
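The two-step approach can be sketched with a topic-score filter followed by a phonetic match. Levenshtein distance over phoneme strings stands in for the paper's actual phonetic search, and the names, phone strings and threshold below are all invented:

```python
def edit_distance(a, b):
    # Single-row Levenshtein distance between two sequences.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def retrieve(target_phones, pn_phones, topic_scores, threshold=0.1):
    # (a) topic-based reduction of the OOV PN list
    relevant = [pn for pn in pn_phones if topic_scores.get(pn, 0.0) >= threshold]
    # (b) phonetic search within the reduced list
    return min(relevant, key=lambda pn: edit_distance(target_phones, pn_phones[pn]))

pn_phones = {"Tsipras": "tsipRas", "Chirac": "SiRak"}  # toy phonetizations
topic_scores = {"Tsipras": 0.6, "Chirac": 0.02}
match = retrieve("tsipRa", pn_phones, topic_scores)
```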
- Published
- 2015
36. About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic models
- Author
-
Denis Jouvet, Dominique Fohr, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Jouvet, Denis, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing ,Computer science ,Speech recognition ,Word error rate ,02 engineering and technology ,030507 speech-language pathology & audiology ,03 medical and health sciences ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,0202 electrical engineering, electronic engineering, information engineering ,unsupervised training ,Adaptation (computer science) ,Selection (genetic algorithm) ,Measure (data warehouse) ,business.industry ,speech recognition ,Process (computing) ,Training (meteorology) ,020206 networking & telecommunications ,Pattern recognition ,LVCSR ,combining recognizer outputs ,data selection ,Artificial intelligence ,Transcription (software) ,0305 other medical science ,business ,Word (computer architecture) - Abstract
International audience; This paper introduces the combination of speech decoders for selecting automatically transcribed speech data for unsupervised training or adaptation of acoustic models. Here, the combination relies on the use of a forward-based and a backward-based decoder. Best performance is achieved when selecting automatically transcribed data (speech segments) that have the same word hypotheses when processed by the Sphinx forward-based and the Julius backward-based transcription systems, and this selection process outperforms confidence measure based selection. Results are reported and discussed for adaptation and for full training from scratch, using data resulting from various selection processes, whether alone or in addition to the baseline manually transcribed data. Overall, selecting automatically transcribed speech segments that have the same word hypotheses when processed by the Sphinx forward-based and Julius backward-based recognizers, and adding this automatically transcribed and selected data to the manually transcribed data leads to significant word error rate reductions on the ESTER2 data when compared to the baseline system trained only on manually transcribed speech data.
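The selection rule, keeping only segments on which the forward-based and backward-based decoders produce identical word hypotheses, can be sketched as follows (the segment IDs and hypotheses are invented):

```python
def select_agreeing_segments(forward_hyps, backward_hyps):
    """Keep segment IDs whose two automatic transcriptions match exactly;
    these are the segments retained for unsupervised training."""
    return [seg for seg in forward_hyps
            if seg in backward_hyps and forward_hyps[seg] == backward_hyps[seg]]

# Hypothetical outputs of a forward-based and a backward-based decoder.
fwd = {"seg1": "bonjour à tous", "seg2": "la crise grecque"}
bwd = {"seg1": "bonjour à tous", "seg2": "la crise grecques"}
selected = select_agreeing_segments(fwd, bwd)  # ["seg1"]
```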
- Published
- 2014
- Full Text
- View/download PDF
37. Constitution d'un Corpus de Français Langue Etrangère destiné aux Apprenants Allemands
- Author
-
Odile Mella, Anne Bonneau, Dominique Fohr, Camille Fauth, Vincent Colotte, Yves Laprie, Jürgen Trouvain, Denis Jouvet, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Allgemeine Linguistik Computational Linguistics and phonetics (Allgemeine Linguistik), Saarland University [Saarbrücken], ANR IFCASL, Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
French ,phonétique ,phonetics ,allemand ,parole non-native ,français ,02 engineering and technology ,German ,[SCCO.LING]Cognitive science/Linguistics ,language learning ,corpus de parole ,lcsh:Social Sciences ,lcsh:H ,apprentissage des langues ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,[INFO]Computer Science [cs] ,0305 other medical science ,non native speech corpora - Abstract
Most language corpora focus on written linguistic phenomena and concern English (see the "Learner corpora around the world" website of the University of Louvain, Belgium). Phonetic research on L2 acquisition is generally oriented towards the study of segmental phenomena, and most studies also have English as the target language. Current L2 speech models, such as the Speech Learning Model (Flege, 1995) or Best's Perceptual Assimilation Model (Best, 1995), often neglect prosodic aspects. Our study concerns French as a second language and is part of a larger project carried out in partnership with a German university, one of whose goals is the development of computer-assisted language learning (an ANR-DFG project, Agence Nationale de la Recherche and Deutsche Forschungsgemeinschaft, awarded to the Parole team of LORIA UMR 7503, Nancy, France, and to the Computational Linguistics and Phonetics group FR 4.7 of Saarland University, Saarbrücken, Germany), in which French and German are the target languages. For the German-French pair, few parallel corpora are available. We present here the construction of a corpus of oral productions by native and non-native speakers for the German-French pair. Our corpus aims to bring to light the phonetic and phonological deviations that German speakers produce when learning French. This work is part of a larger project, which aims to study the difficulties that French speakers encounter when learning German, and vice versa. Accordingly, fifty German speakers will be recruited from universities and secondary schools in Germany, and fifty French speakers from the same settings in France. 
Both populations will produce, on the one hand, the corpus in the foreign language (French for the German speakers and German for the French speakers) and, on the other hand, the corpus in their mother tongue (German for the Germans and French for the French). The corpora thus obtained should enable us to identify the difficulties that German or French speakers encounter when learning French or German. The control data are twofold, since we can refer both to the learners' productions in their mother tongue (here German) and to those of native speakers (here German speakers). We present here only the construction of the French corpus.
- Published
- 2014
- Full Text
- View/download PDF
38. Combining Forward-based and Backward-based Decoders for Improved Speech Recognition Performance
- Author
-
Denis Jouvet, Dominique Fohr, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Jouvet, Denis, Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing ,Computer science ,Speech recognition ,Word error rate ,020206 networking & telecommunications ,02 engineering and technology ,Set (abstract data type) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Speech recognition performance ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,0202 electrical engineering, electronic engineering, information engineering ,0305 other medical science ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing ,Test data - Abstract
International audience; Combining outputs of speech recognizers is a known way of increasing speech recognition performance. The ROVER approach handles efficiently such combinations. In this paper we show that the best performance is not achieved by combining the outputs of the best set of recognizers, but rather by combining outputs of recognizers that rely on different processing components, and in particular on a different order (backward vs. forward) for processing speech frames. Indeed, much better speech recognition results were obtained by combining outputs of sphinx-based recognizers with outputs of Julius-based recognizers than by combining the same number of outputs from only sphinx-based recognizers, even if the individual sphinx-based systems led to better results than the individual Julius-based recognizers. Further experiments have also been conducted using sphinx-based tools for processing speech frames in reverse order (i.e. backward in time). The results clearly show that combining forward-based and backward-based decoders provide significant improvement with respect to a combination of forward only or backward only decoders. Experiments have been conducted on the ESTER2 and ETAPE speech corpora. Overall, combining sphinx-based and Julius-based systems led to 18.6% word error rate on ESTER2 test data, and 24.5% word error rate on ETAPE test data.
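ROVER's voting stage can be illustrated with a simplified word-level majority vote; real ROVER first aligns the hypotheses into a word transition network, which this sketch assumes has already been done, and the hypotheses are invented:

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Majority vote over already-aligned word sequences: at each position,
    keep the word proposed by the most recognizers."""
    n_words = len(aligned_hyps[0])
    assert all(len(h) == n_words for h in aligned_hyps)
    return [Counter(h[i] for h in aligned_hyps).most_common(1)[0][0]
            for i in range(n_words)]

# Three hypothetical recognizer outputs for the same segment.
hyps = [["the", "cat", "sat"],
        ["the", "bat", "sat"],
        ["the", "cat", "sad"]]
voted = rover_vote(hyps)  # ["the", "cat", "sat"]
```

Errors made by only one recognizer ("bat", "sad") are outvoted, which is why combining decoders with different error patterns, such as forward- and backward-based ones, helps most.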
- Published
- 2013
39. Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition
- Author
-
Dominique Fohr, Odile Mella, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
language modeling ,Perplexity ,business.industry ,Latent semantic analysis ,Computer science ,Speech recognition ,speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Function (mathematics) ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Random indexing ,n-gram ,Cache language model ,random indexing ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Language model ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,0305 other medical science ,business ,computer ,Natural language processing - Abstract
International audience; This paper presents the results and conclusions of a study on the introduction of semantic information, through the Random Indexing paradigm, into the statistical language models used in speech recognition. Random Indexing is an alternative to Latent Semantic Analysis (LSA) that addresses the scalability problem of LSA. After a brief presentation of Random Indexing (RI), this paper describes different methods to estimate the RI matrix, then how to derive probabilities from the RI matrix, and finally how to combine them with n-gram language model probabilities. It then analyzes the performance of these different RI methods and their combinations with a 4-gram language model by computing the perplexity of a test corpus of 290,000 words from the French evaluation campaign ETAPE. The main conclusions are that (1) regardless of the method, function words should not be taken into account in the estimation of the RI matrix; and (2) the two methods RI_basic and TTRI_w achieved the best perplexity, i.e. a relative gain of 3% compared to the perplexity of the 4-gram language model alone (136.2 vs. 140.4).
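The core Random Indexing idea can be sketched in a few lines: each word gets a fixed sparse ternary index vector, and a word's context vector accumulates the index vectors of its neighbours; semantic relatedness is then a cosine between context vectors. This is a toy illustration (tiny dimensions, a symmetric context window), not the paper's specific RI_basic or TTRI_w estimation methods; real RI uses vectors of thousands of dimensions.

```python
import random
from math import sqrt

DIM, NONZERO = 50, 4  # tiny for illustration; practical RI uses thousands of dims

def index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector: a few +1/-1 entries, the rest zero."""
    v = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        v[pos] = rng.choice((-1.0, 1.0))
    return v

def train_ri(corpus, window=2, seed=0):
    """Accumulate each word's context vector over a symmetric window."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: index_vector(rng) for w in vocab}
    context = {w: [0.0] * DIM for w in vocab}
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:  # add the neighbour's index vector to w's context
                    context[w] = [a + b for a, b in zip(context[w], index[sent[j]])]
    return context

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words appearing in identical contexts (e.g. "cat" and "dog" in "the cat/dog runs") end up with identical context vectors, which is the signal the paper converts into probabilities and interpolates with the 4-gram model. The paper's first conclusion corresponds to filtering function words out of the neighbour loop.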
- Published
- 2013
40. Analysis and Combination of Forward and Backward Based Decoders for Improved Speech Transcription
- Author
-
Denis Jouvet and Dominique Fohr
- Subjects
Search algorithm ,Computer science ,Speech recognition ,SIGNAL (programming language) ,Process (computing) ,Language model ,Speech transcription ,Space (commercial competition) ,Heuristics ,Decoding methods - Abstract
This paper analyzes the behavior of forward-based and backward-based decoders used for speech transcription. Experiments showed that backward-based decoding leads to recognition performance similar to that of forward-based decoding, which is consistent with the fact that both systems handle similar information through the acoustic, lexical and language models. However, because of heuristics, the search algorithms used in decoding explore only a limited portion of the search space. As forward-based and backward-based approaches do not process the speech signal in the same temporal direction, they explore different portions of the search space, leading to complementary systems that can be efficiently combined using the ROVER approach. The speech transcription results achieved by combining forward-based and backward-based systems are significantly better than the results obtained by combining the same number of forward-only or backward-only systems. This confirms the complementarity of the forward and backward approaches and thus the usefulness of their combination.
- Published
- 2013
- Full Text
- View/download PDF
41. Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning
- Author
-
Anne Bonneau, Dominique Fohr, Irina Illina, Luiza Orosanu, Denis Jouvet, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), and Jouvet, Denis
- Subjects
incorrect entries ,Computer science ,[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing ,Speech recognition ,Chinese speech synthesis ,Speech coding ,Speech synthesis ,Pronunciation ,computer.software_genre ,01 natural sciences ,03 medical and health sciences ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,0103 physical sciences ,Speech analytics ,Foreign language learning ,010301 acoustics ,030304 developmental biology ,[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing ,Cued speech ,0303 health sciences ,business.industry ,Speech corpus ,Speech processing ,non-native speech ,constrained and unconstrained alignments ,Artificial intelligence ,business ,computer ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Natural language processing - Abstract
International audience; This article analyzes the detection of incorrect entries of non-native speech in the context of foreign language learning. The purpose is to detect and reject incorrect entries (i.e. those for which the speech signal does not correspond at all to the associated text) while remaining tolerant to the mispronunciations of non-native speech. The proposed approach relies on the comparison between two text-to-speech alignments: one constrained by the text being checked, and another, unconstrained one corresponding to a phonetic decoding. Several comparison criteria are described and combined via a logistic regression function. The article analyzes the influence of different settings, such as the impact of non-native pronunciation variants, the impact of learning the decision functions on native or on non-native speech, and the impact of combining various comparison criteria. The performance evaluations are conducted on both native and non-native speech.
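The final decision step described here, combining several alignment-comparison criteria through a logistic regression function, can be sketched as below. The feature values, weights and bias are made-up placeholders (in the paper they would come from the constrained/unconstrained alignment comparison and from training the regression), so only the combination mechanism is illustrated.

```python
from math import exp

def logistic_combine(features, weights, bias):
    """Combine comparison criteria into a single accept/reject score in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

# Hypothetical criteria for one utterance: a likelihood ratio between the
# text-constrained and the free phonetic alignment, and the fraction of
# phones on which the two alignments agree.
features = [0.8, 0.9]             # made-up values for illustration
weights, bias = [2.0, 3.0], -2.5  # would be learned by logistic regression
score = logistic_combine(features, weights, bias)
is_correct_entry = score > 0.5    # accept the entry if the score is high enough
```

The threshold on the score trades off rejecting truly incorrect entries against wrongly rejecting heavily accented but valid ones, which is the tolerance issue the article studies.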
- Published
- 2012
42. Multilingual Recognition of Non-Native Speech using Acoustic Model Transformation and Pronunciation Modeling
- Author
-
Irina Illina, Dominique Fohr, Ghazi Bouselmi, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), and Fohr, Dominique
- Subjects
Linguistics and Language ,Computer science ,Speech recognition ,First language ,Non-native pronunciations of English ,Pronunciation ,computer.software_genre ,01 natural sciences ,Language and Linguistics ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0103 physical sciences ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Hidden Markov model ,010301 acoustics ,business.industry ,Acoustic model ,Speech corpus ,Human-Computer Interaction ,Transformation (function) ,Computer Vision and Pattern Recognition ,Artificial intelligence ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,0305 other medical science ,business ,computer ,Software ,Natural language processing ,Spoken language - Abstract
International audience; This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. This paper presents a novel approach for introducing these automatically learned confusion rules into the recognition system. The modified HMM of a phoneme of the spoken foreign language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the European project HIWIRE non-native corpus, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than classical acoustic adaptation of the HMM when the foreign origin of the speaker is known: we obtain a 22% WER reduction compared to the reference system. Furthermore, we take into account the written form of the spoken words, since non-native speakers may rely on the spelling of a word in order to pronounce it; this approach does not provide any further improvement.
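The effect of confusion rules can be illustrated at the lexicon level: each rule maps a phoneme to its likely non-native substitutes, and applying the rules adds alternate pronunciations alongside the canonical one. This sketch applies one substitution per variant and works on a pronunciation dictionary rather than inside the HMM topology the paper modifies; the phone symbols and rules are illustrative.

```python
def expand_lexicon(lexicon, confusion_rules):
    """Add alternate pronunciations obtained by applying confusion rules.

    lexicon: word -> list of pronunciations (tuples of phone symbols)
    confusion_rules: phone -> list of non-native substitute phones
    Each generated variant applies a single substitution.
    """
    expanded = {}
    for word, prons in lexicon.items():
        variants = set(prons)  # always keep the canonical pronunciations
        for pron in prons:
            for i, ph in enumerate(pron):
                for sub in confusion_rules.get(ph, []):
                    variants.add(pron[:i] + (sub,) + pron[i + 1:])
        expanded[word] = sorted(variants)
    return expanded

# Illustrative rule: French speakers often replace the English 'dh' sound
lex = {"this": [("dh", "ih", "s")]}
rules = {"dh": ["z", "d"]}
expanded = expand_lexicon(lex, rules)
# expanded["this"] keeps ('dh','ih','s') and adds ('z','ih','s') and ('d','ih','s')
```

Keeping the canonical pronunciation in the variant set mirrors the paper's point that correctly pronounced phonemes must still be recognized.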
- Published
- 2012
43. Evaluating grapheme-to-phoneme converters in automatic speech recognition context
- Author
-
Dominique Fohr, Irina Illina, Denis Jouvet, Analysis, perception and recognition of speech (PAROLE), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Conditional random field ,business.industry ,Computer science ,Speech recognition ,Grapheme ,speech recognition ,020206 networking & telecommunications ,Context (language use) ,02 engineering and technology ,Pronunciation ,Lexicon ,computer.software_genre ,Grapheme-to-phoneme ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,pronunciation lexicon ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Precision and recall ,Hidden Markov model ,computer ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Natural language processing - Abstract
International audience; This paper deals with the evaluation of grapheme-to-phoneme (G2P) converters in a speech recognition context. The precision and recall rates are investigated as potential measures of the quality of the multiple generated pronunciation variants. Very different results are obtained depending on whether the frequency of occurrence of the words is taken into account. Since G2P systems are rarely evaluated on the basis of speech recognition performance, the originality of this paper consists in using a speech recognition system to evaluate the G2P pronunciation variants. The results show that the training process is quite robust to some errors in the pronunciation lexicon, whereas pronunciation lexicon errors are harmful in the decoding process. Noticeable speech recognition performance improvements are achieved by combining two different G2P converters, one based on conditional random fields and the other on joint multigram models, as well as by checking the pronunciation variants of the most frequent words.
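The precision and recall of generated pronunciation variants, the first evaluation discussed above, can be computed as a straightforward set comparison against a reference lexicon. A minimal sketch with made-up pronunciations follows; the unweighted version is shown, whereas the paper notes that weighting each word by its frequency of occurrence changes the results considerably.

```python
def variant_precision_recall(generated, reference):
    """Precision/recall of G2P pronunciation variants vs. a reference lexicon.

    generated, reference: word -> list of pronunciation strings.
    """
    tp = fp = fn = 0
    for word in reference:
        gen = set(generated.get(word, []))
        ref = set(reference[word])
        tp += len(gen & ref)   # variants the G2P got right
        fp += len(gen - ref)   # spurious variants (hurt precision)
        fn += len(ref - gen)   # missed variants (hurt recall)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: the G2P over-generates one variant for "read"
gen = {"read": ["r iy d", "r eh d", "r ah d"]}
ref = {"read": ["r iy d", "r eh d"]}
p, r = variant_precision_recall(gen, ref)  # p = 2/3, r = 1.0
```

A frequency-weighted version would multiply each word's tp/fp/fn contributions by its corpus count before summing.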
- Published
- 2012
- Full Text
- View/download PDF
44. KNOWLEDGE-BASED TECHNIQUES IN ACOUSTIC-PHONETIC DECODING OF SPEECH: INTEREST AND LIMITATIONS
- Author
-
Yves Laprie, Dominique Fohr, and Jean-Paul Haton
- Subjects
Knowledge representation and reasoning ,Process (engineering) ,Computer science ,business.industry ,computer.software_genre ,Model-based reasoning ,Abductive reasoning ,Expert system ,Knowledge-based systems ,Artificial Intelligence ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Software architecture ,Set (psychology) ,business ,computer ,Software ,Natural language processing - Abstract
A major step in the process of speech understanding is acoustic-phonetic decoding, which can be defined as the automatic mapping of the continuous speech wave onto a set of predetermined linguistic units such as phones, diphones, syllables, etc. This paper addresses this problem through an approach that exploits an explicit description of all kinds of available knowledge about speech communication phenomena, within the general framework of an artificial-intelligence knowledge-based system. We first recall the main difficulties of acoustic-phonetic decoding with a practical example. We then present the APHODEX system, which we have been designing for the past eight years, in terms of software architecture and of knowledge representation and reasoning. The practical evaluation of this system is then carried out at the different levels of feature extraction, segmentation and labelling. Finally, we discuss the limitations of our approach and present the ongoing effort to overcome these limitations, especially through the use of abductive reasoning.
- Published
- 1994
- Full Text
- View/download PDF
45. Grapheme-to-Phoneme Conversion using Conditional Random Fields
- Author
-
Denis Jouvet, Dominique Fohr, Irina Illina, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP), International Speech Communication Association (ISCA) et The Italian Regional SIG - AISV (Italian Speech Communication Association), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), and Jouvet, Denis
- Subjects
Conditional random field ,[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing ,business.industry ,Computer science ,Bigram ,Speech recognition ,Grapheme ,020206 networking & telecommunications ,Context (language use) ,Pattern recognition ,02 engineering and technology ,Pronunciation ,030507 speech-language pathology & audiology ,03 medical and health sciences ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,0305 other medical science ,business ,Precision and recall ,Hidden Markov model ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Independence (probability theory) ,[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing - Abstract
International audience; We propose an approach to grapheme-to-phoneme conversion based on a probabilistic method: Conditional Random Fields (CRF). CRFs allow long-term prediction and rely on relaxed state independence assumptions. Moreover, we propose an algorithm for the one-to-one letter-to-phoneme alignment needed for CRF training; this alignment is based on a discrete HMM. The proposed system is validated on two pronunciation dictionaries. Different CRF features are studied: POS tags, context size, and unigrams versus bigrams. Our approach compares favorably with the state-of-the-art Joint-Multigram Models in terms of pronunciation quality, and provides better recall and precision measures for the generation of multiple pronunciation variants.
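The one-to-one letter-to-phoneme alignment needed before CRF training can be sketched as a dynamic program: each letter maps to exactly one phone or to a null symbol when the word has more letters than phones. This stands in for the discrete-HMM alignment the abstract mentions, and the `score` function below is a hypothetical stand-in for trained emission probabilities.

```python
def align_letters_phones(letters, phones, score):
    """One-to-one letter/phone alignment by dynamic programming.

    Each letter maps to one phone or to the null symbol '_'; phones are
    consumed in order, so len(phones) must not exceed len(letters).
    score(letter, phone) rewards plausible pairs.
    """
    n, m = len(letters), len(phones)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(min(i, m) + 1):
            # option 1: letter i-1 maps to the null symbol
            if best[i - 1][j] > best[i][j]:
                best[i][j], back[i][j] = best[i - 1][j], (i - 1, j, "_")
            # option 2: letter i-1 maps to phone j-1
            if j:
                cand = best[i - 1][j - 1] + score(letters[i - 1], phones[j - 1])
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, (i - 1, j - 1, phones[j - 1])
    pairs, i, j = [], n, m
    while i:  # trace the best path back to recover the letter/phone pairs
        pi, pj, p = back[i][j]
        pairs.append((letters[i - 1], p))
        i, j = pi, pj
    return pairs[::-1]
```

With such an alignment, the silent "b" of "lamb" is paired with the null symbol, giving the one-label-per-letter training data a linear-chain CRF needs.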
- Published
- 2011
46. Frame-Synchronous and Local Confidence Measures for Automatic Speech recognition
- Author
-
Odile Mella, Jean-Paul Haton, Joseph Razik, Dominique Fohr, Université de Toulon (UTLN), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), and Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP)
- Subjects
likelihood ratio ,Modalities ,Computer science ,Low Confidence ,Speech recognition ,Posterior probability ,Frame (networking) ,speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,frame-synchronous measure ,Measure (mathematics) ,posterior probability ,Comprehension ,Artificial Intelligence ,Confidence measure ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Transcription (software) ,Software - Abstract
International audience; In this paper, we introduce two new confidence measures for large vocabulary speech recognition systems. The major feature of these measures is that they can be computed without waiting for the end of the audio stream. We propose two kinds of confidence measures: frame-synchronous and local. The frame-synchronous ones can be computed as soon as a frame is processed by the recognition engine and are based on a likelihood ratio. The local measures estimate a local posterior probability in the vicinity of the word to analyze. We evaluated our confidence measures within the framework of the automatic transcription of French broadcast news using the EER criterion. Our local measures achieved results very close to the best state-of-the-art measure (EER of 23.0% compared to 22.0%). We then conducted a preliminary experiment to assess the contribution of our confidence measures to improving the comprehension of an automatic transcription for the hearing impaired. We introduced several modalities to highlight words of low confidence in this transcription, and showed that these modalities, used with our local confidence measure, improved the comprehension of the automatic transcription.
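A likelihood-ratio confidence of the kind the frame-synchronous measures build on can be sketched as the best hypothesis' share of the total likelihood mass at a frame, computed in the log domain for numerical stability. This is a generic illustration of the ratio, not the paper's exact formulation.

```python
from math import exp, log

def frame_confidence(best_loglik, competitor_logliks):
    """Frame-level confidence as a normalized likelihood ratio.

    Ratio of the best hypothesis' likelihood to the sum over all
    hypotheses active at the same frame (a local posterior estimate).
    Computed with log-sum-exp to avoid underflow.
    """
    all_logs = [best_loglik] + list(competitor_logliks)
    m = max(all_logs)
    denom = m + log(sum(exp(x - m) for x in all_logs))
    return exp(best_loglik - denom)
```

When the best hypothesis dominates its competitors the measure approaches 1.0, and it can be emitted as soon as the frame's active hypotheses are known, which is what makes the measure usable on the fly.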
- Published
- 2011
- Full Text
- View/download PDF
47. A wavelet-based parameterization for speech/music discrimination
- Author
-
Emmanuel Didiot, Irina Illina, Odile Mella, Dominique Fohr, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Frequency band ,Computer science ,Speech recognition ,Word error rate ,Speech/music discrimination ,02 engineering and technology ,Wavelets ,Theoretical Computer Science ,Static parameters ,Reduction (complexity) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Wavelet ,Segmentation ,0202 electrical engineering, electronic engineering, information engineering ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Signal processing ,Human-Computer Interaction ,Sound recording and reproduction ,Computer Science::Sound ,Dynamic parameters ,020201 artificial intelligence & image processing ,Mel-frequency cepstrum ,0305 other medical science ,Long-term parameters ,Software ,Energy (signal processing) - Abstract
International audience; This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transform (FT), which is well adapted to stationary signals. In order to take into account the non-stationarity of music/speech signals, this work proposes to study wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Different experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterizations, especially for music/non-music discrimination. Our parameterization yielded a significant reduction of the error rate: more than 30% relative improvement was obtained for the envisaged tasks compared to the MFCC parameterization.
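The static feature extraction described above, one energy value per wavelet frequency band, can be sketched with the Haar wavelet (the simplest family; the paper compares several families and numbers of vanishing moments). Each decomposition level halves the band, and the log-energy of each detail band plus the final approximation gives one feature vector per frame.

```python
from math import log

def haar_step(x):
    """One level of the orthonormal Haar transform: approximation + detail."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    return a, d

def band_energies(signal, levels):
    """Log-energy per wavelet band for one frame (static parameters).

    signal length must be divisible by 2**levels; a small epsilon keeps
    the log finite for silent bands.
    """
    feats, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(log(sum(v * v for v in detail) + 1e-10))
    feats.append(log(sum(v * v for v in approx) + 1e-10))
    return feats
```

Because the transform is orthonormal, the band energies sum to the frame energy; dynamic and long-term parameters would then be built as derivatives and longer-window statistics of these static features.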
- Published
- 2010
- Full Text
- View/download PDF
48. Detection of OOV words by combining acoustic confidence measures with linguistic features
- Author
-
Dominique Fohr, Irina Illina, Frederik Stouten, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), and Fohr, Dominique
- Subjects
Vocabulary ,Computer science ,Speech recognition ,media_common.quotation_subject ,Feature extraction ,confidence measures ,02 engineering and technology ,Lexicon ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Rule-based machine translation ,Transcription (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,Detection theory ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,media_common ,Artificial neural network ,Grammar ,business.industry ,OOV ,020206 networking & telecommunications ,LVCSR ,Linguistics ,Artificial intelligence ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,0305 other medical science ,business ,computer ,Natural language processing - Abstract
International audience; This paper describes the design of an out-of-vocabulary word (OOV) detector. Such a system is intended to detect segments that correspond to OOV words (words that are not included in the lexicon) in the output of an LVCSR system. The OOV detector uses acoustic confidence measures that are derived from several systems: a word recognizer constrained by a lexicon, a phone recognizer constrained by a grammar, and a phone recognizer without constraints. On top of that, it also uses some linguistic features. The experimental results on a French broadcast news transcription task showed that, for our approach, precision equals recall at 35%.
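The core acoustic cue behind such a detector can be sketched as a per-frame likelihood gap: where the unconstrained phone recognizer scores much better than the lexicon-constrained word recognizer, the lexicon was forced into a poor match, hinting at an OOV word. The segment format and threshold below are illustrative; the paper additionally uses a grammar-constrained phone recognizer and linguistic features before the final decision.

```python
def detect_oov(segments, gap_threshold=1.5):
    """Flag segments whose per-frame likelihood gap exceeds a threshold.

    Each segment carries log-likelihoods from a lexicon-constrained word
    decoder ('word_ll') and an unconstrained phone decoder ('free_ll');
    the threshold value is a made-up placeholder.
    """
    flagged = []
    for seg in segments:
        gap = (seg["free_ll"] - seg["word_ll"]) / seg["frames"]
        if gap > gap_threshold:
            flagged.append(seg["id"])
    return flagged

# Hypothetical segments: a large gap on segment 1 suggests an OOV word
segments = [{"id": 1, "free_ll": -100.0, "word_ll": -130.0, "frames": 10},
            {"id": 2, "free_ll": -100.0, "word_ll": -105.0, "frames": 10}]
```

Sweeping the threshold trades precision against recall, which is how an operating point such as the paper's precision-equals-recall at 35% is chosen.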
- Published
- 2009
- Full Text
- View/download PDF
49. JTrans: an open-source software for semi-automatic text-to-speech alignment
- Author
-
Dominique Fohr, Christophe Cerisara, Odile Mella, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), and Cerisara, Christophe
- Subjects
Audio mining ,[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing ,Computer science ,Speech recognition ,[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing ,Speech synthesis ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Transcription (linguistics) ,Speech analytics ,060201 languages & linguistics ,business.industry ,speech recognition ,Acoustic model ,alignment ,Speech corpus ,06 humanities and the arts ,Transcriber ,Speech processing ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,JTrans ,0602 languages and literature ,Artificial intelligence ,Transcription (software) ,0305 other medical science ,business ,computer ,Natural language processing - Abstract
International audience; Aligning speech corpora with text transcriptions is an important requirement of many speech processing and data mining applications, as well as of linguistic research. Despite recent progress in the field of speech recognition, many linguists still manually align spontaneous and noisy speech recordings to guarantee a good alignment quality. This work proposes an open-source Java software package with an easy-to-use GUI that integrates dedicated semi-automatic speech alignment algorithms that can be dynamically controlled and guided by the user. The objective of this software is to facilitate and speed up the process of creating and aligning speech corpora.
- Published
- 2009
- Full Text
- View/download PDF
50. Frame-Synchronous and Local Confidence Measures for on-the-fly Automatic Speech Recognition
- Author
-
Joseph Razik, Odile Mella, Dominique Fohr, Jean-Paul Haton, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Lorraine (INPL)-Université Nancy 2-Université Henri Poincaré - Nancy 1 (UHP), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
030507 speech-language pathology & audiology ,03 medical and health sciences ,likelihood ratio ,confidence measures ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,02 engineering and technology ,Speech recognition ,0305 other medical science ,posterior probability - Abstract
International audience; This paper presents several new confidence measures with the major advantage that they can be evaluated as soon as possible, without having to wait for the recognition process to be completed. We have defined two kinds of confidence measures: the first kind can be computed synchronously with the frames processed by the engine, and the second with a slight delay. Such measures are useful for driving the recognition process by modifying the likelihood score, or for validating recognized words in on-the-fly applications such as keyword spotting and on-line automatic speech transcription for deaf people. The EER evaluation on a French broadcast news corpus shows performance close to the batch version of these measures (23.0% against 22.0% EER) with only 0.84 s of data before and after the word to be analyzed.
- Published
- 2008