Author: "Georges Linarès" / Language: english - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Georges Linarès"' showing total 56 results

1. Query-Driven Strategy for On-the-Fly Term Spotting in Spontaneous Speech

Author: Mickael Rouvier, Georges Linarès, and Benjamin Lecouteux
Subjects: Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95
Abstract: Spoken utterance retrieval has been widely studied in recent decades, with the purpose of indexing large audio databases or of detecting keywords in continuous speech streams. While the indexing of closed corpora can be performed via a batch process, on-line spotting systems have to synchronously detect the targeted spoken utterances. We propose a two-level architecture for on-the-fly term spotting. The first level performs a fast detection of the speech segments that probably contain the targeted utterance. The second level refines the detection on the selected segments, by using a speech recognizer based on a query-driven decoding algorithm. Experiments are conducted on both broadcast and spontaneous speech corpora. We investigate the impact of the spontaneity level on system performance. Results show that our method remains effective even if the recognition rates are significantly degraded by disfluencies.
Published: 2010
Full Text: View/download PDF

2. Compact Acoustic Models for Embedded Speech Recognition

Author: Christophe Lévy, Georges Linarès, and Jean-François Bonastre
Subjects: Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95
Abstract: Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. In order to fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model in order to obtain each HMM state-dependent probability density function, so that only the transformation parameters need to be stored. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique for acoustic models is also proposed. In order to significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques) with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints.
Published: 2009
Full Text: View/download PDF

3. E2E-SINCNET: TOWARD FULLY END-TO-END SPEECH RECOGNITION

Author: Mohamed Morchid, Titouan Parcollet, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and PNRIA
Subjects: Lossless compression, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Computer science, Speech recognition, 05 social sciences, Word error rate, 010501 environmental sciences, [INFO] Computer Science [cs], 01 natural sciences, Signal, Field (computer science), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], End-to-end principle, 0502 economics and business, Waveform, [INFO]Computer Science [cs], Mel-frequency cepstrum, 050207 economics, Joint (audio engineering), 0105 earth and related environmental sciences
Abstract: Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel-filter-banks or Mel-frequency cepstral coefficients. Nonetheless, and despite worse performance, E2E ASR models processing raw waveforms are an active research field due to the lossless nature of the input signal. In this paper, we propose the E2E-SincNet, a novel fully E2E ASR model that goes from the raw waveform to the text transcripts by merging two recent and powerful paradigms: SincNet and the joint CTC-attention training scheme. The conducted experiments on two different speech recognition tasks show that our approach outperforms previously investigated E2E systems relying either on the raw waveform or pre-computed acoustic features, with a reported top-of-the-line Word Error Rate (WER) of 4.7% on the Wall Street Journal (WSJ) dataset.
Published: 2020

4. Real to H-space Encoder for Speech Recognition

Author: Georges Linarès, Mohamed Morchid, Renato De Mori, Titouan Parcollet, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and McGill University = Université McGill [Montréal, Canada]
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], FOS: Computer and information sciences, Sound (cs.SD), Relation (database), Computer science, Speech recognition, Computer Science::Neural and Evolutionary Computation, [INFO] Computer Science [cs], Computer Science - Sound, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], Quaternion, Representation (mathematics), Index Terms: quaternion neural networks, Computer Science - Computation and Language, Artificial neural network, Frame (networking), Process (computing), speech recognition, recurrent neural networks, Recurrent neural network, Encoder, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Deep neural networks (DNNs), and more precisely recurrent neural networks (RNNs), are at the core of modern automatic speech recognition systems, due to their efficiency in processing input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to bring to neural networks a more natural, compressive and powerful representation of the input signal by outperforming common real-valued NNs. Indeed, quaternion-valued neural networks (QNNs) better learn both internal dependencies, such as the relation between the Mel-filter-bank value of a specific time frame and its time derivatives, and global dependencies, describing the relations that exist between time frames. Nonetheless, QNNs are limited to quaternion-valued input signals, and it is difficult to benefit from this powerful representation with real-valued input data. This paper proposes to tackle this weakness by introducing a real-to-quaternion encoder that allows QNNs to process any one-dimensional input features, such as traditional Mel-filter-banks for automatic speech recognition. Comment: Accepted at INTERSPEECH 2019
Published: 2019

5. Quaternion Convolutional Neural Networks For Theme Identification Of Telephone Conversations

Author: Mohamed Morchid, Renato De Mori, Titouan Parcollet, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and McGill University = Université McGill [Montréal, Canada]
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], business.industry, Computer science, Process (computing), 02 engineering and technology, 010501 environmental sciences, [INFO] Computer Science [cs], 01 natural sciences, Convolutional neural network, Task (project management), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Identification (information), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Task analysis, Code (cryptography), 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Artificial intelligence, business, Quaternion, ComputingMilieux_MISCELLANEOUS, 0105 earth and related environmental sciences
Abstract: Quaternion convolutional neural networks (QCNN) are powerful architectures to learn and model external dependencies that exist between neighbor features of an input vector, and internal latent dependencies within the feature. This paper proposes to evaluate the effectiveness of the QCNN on a realistic theme identification task of spoken telephone conversations between agents and customers from the call center of the Paris transportation system (RATP). We show that QCNNs are more suitable than real-valued CNN to process multidimensional data and to code internal dependencies. Indeed, real-valued CNNs deal with both internal and external relations at the same level since components of an entity are processed independently. Experimental evidence is provided that the proposed QCNN architecture always outperforms real-valued equivalent CNN models in the theme identification task of the DECODA corpus. It is also shown that QCNN accuracy results are the best achieved so far on this task, while reducing by a factor of 4 the number of model parameters.
Published: 2018

6. Audiovisual speaker diarization of TV series

Author: Georges Linarès, Xavier Bost, and Serigne Gueye
Subjects: Speaker diarisation, FOS: Computer and information sciences, Modality (human–computer interaction), Computer Science - Computation and Language, Series (mathematics), Computer science, Speech recognition, ComputerApplications_MISCELLANEOUS, Intonation (linguistics), Set (psychology), Computation and Language (cs.CL), Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed by using each modality; the two resulting partitions of the instance set are then optimally matched, before the remaining instances, corresponding to cases of disagreement between both modalities, are finally processed. The results obtained by applying such a multi-modal approach to fictional films turn out to outperform those obtained by relying on a single modality.
Published: 2018

7. Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Author: Chiheb Trabelsi, Titouan Parcollet, Yoshua Bengio, Ying Zhang, Renato De Mori, Mohamed Morchid, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Montreal Institute for Learning Algorithms [Montréal] (MILA), Centre de Recherches Mathématiques [Montréal] (CRM), and Université de Montréal (UdeM)-Université de Montréal (UdeM)
Subjects: FOS: Computer and information sciences, Sound (cs.SD), 0209 industrial biotechnology, Computer science, Speech recognition, Word error rate, Machine Learning (stat.ML), TIMIT, 02 engineering and technology, Convolutional neural network, Computer Science - Sound, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 020901 industrial engineering & automation, Audio and Speech Processing (eess.AS), Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), [INFO]Computer Science [cs], Quaternion, Index Terms: quaternion convolutional neural networks, Artificial neural network, Quaternion algebra, business.industry, Deep learning, deep learning, automatic speech recognition, Computer Science - Learning, 020201 artificial intelligence & image processing, Artificial intelligence, business, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency to process multidimensional inputs as entities, to encode internal dependencies, and to solve many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs. Comment: Accepted at INTERSPEECH 2018
Published: 2018
Full Text: View/download PDF

8. A TOPIC MODELING BASED REPRESENTATION TO DETECT TWEET LOCATIONS. EXAMPLE OF THE EVENT 'JE SUIS CHARLIE'

Author: Mohamed Morchid, Richard Dufour, Eitan Altman, Yonathan Portilla, Georges Linarès, and Didier Josselin
Subjects: Topic model, lcsh:Applied optics. Photonics, 0209 industrial biotechnology, Computer science, Sample (statistics), 02 engineering and technology, computer.software_genre, Latent Dirichlet allocation, lcsh:Technology, Set (abstract data type), symbols.namesake, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Spatial analysis, Information retrieval, Event (computing), lcsh:T, lcsh:TA1501-1820, lcsh:TA1-2040, symbols, 020201 artificial intelligence & image processing, Data mining, lcsh:Engineering (General). Civil engineering (General), computer, Meaning (linguistics)
Abstract: Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The tweet content, meaning and location show how an event such as the bursty "JeSuisCharlie", which happened in France in January 2015, is comprehended in different countries. This research aims at clustering the tweets according to the co-occurrence of their terms, including the country, and forecasting the probable country of a non-located tweet, knowing its content. First, we present the process of collecting a large quantity of data from the Twitter website. We finally have a set of 2,189 located tweets about "Charlie", from the 7th to the 14th of January. We describe an original method adapted from the Author-Topic (AT) model based on the Latent Dirichlet Allocation (LDA) method. We define a homogeneous space containing both lexical content (words) and spatial information (country). During a training process on a part of the sample, we provide a set of clusters (topics) based on statistical relations between lexical and spatial terms. During a clustering task, we evaluate the method's effectiveness on the rest of the sample, which reaches up to 95% of good assignment. It shows that our model is pertinent to foresee tweet location after a learning process.
Published: 2015

9. Extraction and Analysis of Dynamic Conversational Networks from TV Series

Author: Serigne Gueye, Xavier Bost, Vincent Labatut, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR & FR Agorantic, Mehmet Kaya, Jalal Kawash, Suheil Khoury, Min-Yuh Day, and ANR-14-CE24-0022,GaFes,Galeries des Festivals(2014)
Subjects: Dynamic network analysis, Computer science, Open problem, 02 engineering and technology, computer.software_genre, [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI], 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Narrative, Sequence, Social network, Series (mathematics), business.industry, [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM], 020207 software engineering, 16. Peace & justice, Dynamic Social Network, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], TV Series, Plot Analysis, Artificial intelligence, business, Focus (optics), computer, Smoothing, Natural language processing
Abstract: Identifying and characterizing the dynamics of modern TV series subplots is an open problem. One way is to study the underlying social network of interactions between the characters. Standard dynamic network extraction methods rely on temporal integration, either over the whole considered period, or as a sequence of several time-slices. However, they turn out to be inappropriate in the case of TV series, because the scenes shown onscreen alternately focus on parallel storylines, and do not necessarily respect a traditional chronology. In this article, we introduce Narrative Smoothing, a novel network extraction method taking advantage of the plot properties to solve some of their limitations. We apply our method to a corpus of 3 popular series, and compare it to both standard approaches. Narrative smoothing leads to more relevant observations when it comes to the characterization of the protagonists and their relationships, confirming its appropriateness to model the intertwined storylines constituting the plots.
Published: 2018
Full Text: View/download PDF

10. Deep quaternion neural networks for spoken language understanding

Author: Mohamed Morchid, Titouan Parcollet, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Artificial neural network, Computer science, business.industry, Deep learning, deep learning, 020206 networking & telecommunications, 02 engineering and technology, Construct (python library), [INFO] Computer Science [cs], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Identification (information), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Artificial intelligence, Quaternion, business, Subspace topology, Spoken language, Abstraction (linguistics)
Abstract: The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these toolkits, but it also embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed so that user-defined acoustic models can be plugged in naturally. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly locally or on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Published: 2017

11. Impact Of Content Features For Automatic Online Abuse Detection

Author: Etienne Papegnies, Vincent Labatut, Richard Dufour, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Région PACA, and Nectar de Code
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Matching (statistics), Computer science, Process (engineering), business.industry, Computer Science - Social and Information Networks, 02 engineering and technology, Moderation, [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI], Task (project management), Computer Science - Information Retrieval, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Human–computer interaction, 020204 information systems, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, The Internet, Online Communities, business, Abuse Detection, Information Retrieval (cs.IR), Natural Language Processing
Abstract: Online communities have gained considerable importance in recent years due to the increasing number of people connected to the Internet. Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great financial interest for community maintainers. Often, the industry uses basic approaches such as bad-word filtering and regular expression matching to assist the moderators. In this article, we consider the task of automatically determining if a message is abusive. This task is complex since messages are written in a non-standardized way, including spelling errors, abbreviations, and community-specific codes. First, we evaluate the system that we propose using standard features of online messages. Then, we evaluate the impact of the addition of pre-processing strategies, as well as original specific features developed for the community of an online in-browser strategy game. We finally propose to analyze the usefulness of this wide range of features using feature selection. This work can lead to two possible applications: 1) automatically flag potentially abusive messages to draw the moderator's attention to a narrow subset of messages; and 2) fully automate the moderation process by deciding whether a message is abusive without any human intervention.
Published: 2017
Full Text: View/download PDF

12. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Author: Georges Linarès, Dominique Fohr, Irina Illina, Imran Sheikh, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), Grid'5000, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
Subjects: Topic model, Vocabulary, Acoustics and Ultrasonics, Computer science, media_common.quotation_subject, Speech recognition, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Context (language use), 02 engineering and technology, 010501 environmental sciences, Semantics, computer.software_genre, 01 natural sciences, Latent Dirichlet allocation, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], symbols.namesake, 0202 electrical engineering, electronic engineering, information engineering, Computer Science (miscellaneous), Electrical and Electronic Engineering, out-of-vocabulary, proper names, 0105 earth and related environmental sciences, media_common, Context model, business.industry, Computational Mathematics, large vocabulary continuous speech recognition, Automatic indexing, semantic context, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, Language model, business, computer, Natural language processing
Abstract: The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a Neural Bag-of-Weighted Words (NBOW2) model which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives a faster convergence during training. Second pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
Published: 2017
Full Text: View/download PDF

13. Improving multi-stream classification by mapping sequence-embedding in a high dimensional space

Author: Mohamed Morchid, Mohamed Bouaziz, Richard Dufour, Georges Linarès, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), Université de Bourgogne (UB), Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Sequence, Artificial neural network, Computer science, business.industry, Word error rate, Context (language use), Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Support vector machine, Recurrent neural network, Hyperplane, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Embedding, Artificial intelligence, business, ComputingMilieux_MISCELLANEOUS, 0105 earth and related environmental sciences
Abstract: Most of the Natural and Spoken Language Processing tasks now employ Neural Networks (NN), allowing them to reach impressive performances. Embedding features allow the NLP systems to represent input vectors in a latent space and to improve the observed performances. In this context, Recurrent Neural Network (RNN) based architectures such as Long Short-Term Memory (LSTM) are well known for their capacity to encode sequential data into a non-sequential hidden vector representation, called sequence embedding. In this paper, we propose an LSTM-based multi-stream sequence embedding in order to encode parallel sequences by a single non-sequential latent representation vector. We then propose to map this embedding representation in a high-dimensional space using a Support Vector Machine (SVM) in order to classify the multi-stream sequences by finding out an optimal hyperplane. Multi-stream sequence embedding allowed the SVM classifier to more efficiently profit from information carried by both parallel streams and longer sequences. The system achieved the best performance, in a multi-stream sequence classification task, with a gain of 9 points in error rate compared to an SVM trained on the original input sequences.
Published: 2016
Full Text: View/download PDF

14. Quaternion Neural Networks for Spoken Language Understanding

Author: Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori, Pierre-Michel Bousquet, Titouan Parcollet, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Document Structure Description, Computer science, 02 engineering and technology, [INFO] Computer Science [cs], computer.software_genre, Machine learning, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 030507 speech-language pathology & audiology, 03 medical and health sciences, 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], Quaternion, Representation (mathematics), ComputingMilieux_MISCELLANEOUS, Artificial neural network, Quaternion algebra, business.industry, Deep learning, Multilayer perceptron, 020201 artificial intelligence & image processing, Artificial intelligence, 0305 other medical science, business, computer, Natural language processing, Spoken language
Abstract: Machine Learning (ML) techniques have allowed a great performance improvement of different challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), or Multilayer Perceptron (MLP), recently received great interest from researchers due to their representation capability of complex internal structures in a low dimensional subspace. However, MLPs employ document representations based on basic word level or topic-based features. Therefore, these basic representations reveal little in the way of document statistical structure by only considering words or topics contained in the document as a “bag-of-words”, ignoring relations between them. We propose to remedy this weakness by extending the complex features based on Quaternion algebra presented in [1] to neural networks called QMLP. This original QMLP approach is based on hyper-complex algebra to take into consideration feature dependencies in documents. New document features, based on the document structure itself, used as input of the QMLP, are also investigated in this paper, in comparison to those initially proposed in [1]. Experiments made on an SLU task from a real framework of human spoken dialogues showed that our QMLP approach associated with the proposed document features outperforms other approaches, with an accuracy gain of 2% with respect to the MLP based on real numbers and more than 3% with respect to the first Quaternion-based features proposed in [1]. We finally demonstrated that fewer iterations are needed by our QMLP architecture to be efficient and to reach promising accuracies.
Published: 2016

15. Parallel Long Short-Term Memory for multi-stream classification

Author: Richard Dufour, Mohamed Morchid, Georges Linarès, Renato De Mori, Mohamed Bouaziz, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), Université de Bourgogne (UB), Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: FOS: Computer and information sciences, Sequence, Computer Science - Computation and Language, business.industry, Computer science, Speech recognition, Process (computing), Pattern recognition, Context (language use), 02 engineering and technology, Multi stream, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Machine Learning (cs.LG), Task (computing), Computer Science - Learning, Recurrent neural network, 020204 information systems, Logic gate, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Hidden Markov model, Computation and Language (cs.CL), ComputingMilieux_MISCELLANEOUS
Abstract: Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow DNN-based models such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks to manage long-term dependencies. Nevertheless, these RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM) directions. Yet most of the information available nowadays comes from multistream or multimedia documents, and requires RNNs to process this information synchronously during training. This paper presents an original LSTM-based architecture, named Parallel LSTM (PLSTM), that processes multiple parallel synchronized input sequences in order to predict a common output. The proposed PLSTM method could be used for parallel sequence classification purposes. The PLSTM approach is evaluated on an automatic telecast genre sequence classification task and compared with different state-of-the-art architectures. Results show that the proposed PLSTM method outperforms the baseline n-gram models as well as the state-of-the-art LSTM approach. Comment: 2016 IEEE Workshop on Spoken Language Technology
Published: 2016
Full Text: View/download PDF

16. Spoken Language Understanding in a Latent Topic-based Subspace

Author: Killian Janod, Richard Dufour, Mohamed Morchid, Pierre-Michel Bousquet, Georges Linarès, Waad Ben Kheder, Mohamed Bouaziz, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), and Université de Bourgogne (UB)
Subjects: Computer science, author-topic model, factor analysis, 02 engineering and technology, computer.software_genre, Latent Dirichlet allocation, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], document clustering, symbols.namesake, Transcription (linguistics), Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, c-vector, business.industry, Document classification, 020206 networking & telecommunications, Document clustering, ComputingMethodologies_PATTERNRECOGNITION, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, Subspace topology, Spoken language
Abstract: International audience; Performance of spoken language understanding applications declines when spoken documents are automatically transcribed in noisy conditions, due to high Word Error Rates (WER). To improve robustness to transcription errors, recent solutions propose to map these automatic transcriptions into a latent space. These studies have compared classical topic-based representations such as Latent Dirichlet Allocation (LDA), supervised LDA and author-topic (AT) models. An original compact representation, called c-vector, has recently been introduced to work around the tricky choice of the number of latent topics in these topic-based representations. Moreover, c-vectors increase the robustness of document classification with respect to transcription errors by compacting different LDA representations of the same speech document into a reduced space, thereby compensating for most of the noise in the document representation. The main drawback of this method is the number of sub-tasks needed to build the c-vector space. This paper proposes both to improve this compact representation (c-vector) of spoken documents and to reduce the number of sub-tasks needed, using an original framework in a robust low-dimensional space of features from a set of AT models, called "Latent Topic-based Sub-space" (LTS). In comparison to LDA, the AT model considers not only the dialogue content (words), but also the class related to the document. Experiments are conducted on the DECODA corpus containing speech conversations from the call-center of the RATP Paris transportation company. Results show that the original LTS representation outperforms the best previous compact representation (c-vector), with a substantial gain of more than 2.5% in terms of correctly labeled conversations.
Published: 2016
Full Text: View/download PDF

17. Deep Stacked Autoencoders for Spoken Language Understanding

Author: Mohamed Morchid, Killian Janod, Richard Dufour, Renato De Mori, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: 030507 speech-language pathology & audiology, 03 medical and health sciences, Computer science, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, 0305 other medical science, Linguistics, ComputingMilieux_MISCELLANEOUS, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Spoken language
Abstract: International audience
Published: 2016
Full Text: View/download PDF

18. Learning Word Importance with the Neural Bag-of-Words Model

Author: Imran Sheikh, Georges Linarès, Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
Subjects: business.industry, Computer science, Speech recognition, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Task (project management), Bag-of-words model, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], business, computer, Word (computer architecture), Natural language processing, 0105 earth and related environmental sciences
Abstract: The Neural Bag-of-Words (NBOW) model performs classification with an average of the input word vectors and achieves impressive performance. While the NBOW model learns word vectors targeted for the classification task, it does not explicitly model which words are important for a given task. In this paper we propose an improved NBOW model with the ability to learn task-specific word importance weights. The word importance weights are learned by introducing a new weighted sum composition of the word vectors. With experiments on standard topic and sentiment classification tasks, we show that (a) our proposed model learns meaningful word importance for a given task and (b) our model gives the best accuracies among the BOW approaches. We also show that the learned word importance weights are comparable to tf-idf based word weights when used as features in a BOW SVM classifier.
Published: 2016

19. Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots

Author: Xavier Bost, Vincent Labatut, Serigne Gueye, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), ANR-14-CE24-0022,GaFes,Galeries des Festivals(2014), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: FOS: Computer and information sciences, Dynamic networks, Computer science, 02 engineering and technology, Machine learning, computer.software_genre, Social networks, Field (computer science), Computer Science - Information Retrieval, 0202 electrical engineering, electronic engineering, information engineering, Narrative, Plot (narrative), Social and Information Networks (cs.SI), Plot analysis, Series (mathematics), Social network, Point (typography), business.industry, [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM], 020207 software engineering, Computer Science - Social and Information Networks, Multimedia (cs.MM), Dynamics (music), [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], 020201 artificial intelligence & image processing, TV series, Artificial intelligence, business, computer, Information Retrieval (cs.IR), Smoothing, Natural language processing, Computer Science - Multimedia
Abstract: International audience; Modern popular TV series often develop complex storylines spanning several seasons, but are usually watched in quite a discontinuous way. As a result, the viewer generally needs a comprehensive summary of the previous season's plot before the new one starts. Generating such summaries first requires identifying and characterizing the dynamics of the series' subplots. One way of doing so is to study the underlying social network of interactions between the characters involved in the narrative. The standard tools used in the Social Network Analysis field to extract such a network rely on an integration of time, either over the whole considered period, or as a sequence of several time-slices. However, they turn out to be inappropriate in the case of TV series, because the scenes shown onscreen alternately focus on parallel storylines, and do not necessarily respect a traditional chronology. This makes existing extraction methods ill-suited to describing the dynamics of relationships between characters, or to getting a relevant instantaneous view of the current social state in the plot. This is especially true for characters shown as interacting with each other at some previous point in the plot but temporarily neglected by the narrative. In this article, we introduce narrative smoothing, a novel, still exploratory, network extraction method. It smooths the relationship dynamics based on the plot properties, aiming to overcome some of the limitations of the standard approaches. In order to assess our method, we apply it to a new corpus of 3 popular TV series, and compare it to both standard approaches. Our results are promising, showing that narrative smoothing leads to more relevant observations when it comes to the characterization of the protagonists and their relationships. It could be used as a basis for further modeling the intertwined storylines constituting TV series plots.
Published: 2016
Full Text: View/download PDF

20. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval

Author: Irina Illina, Georges Linarès, Dominique Fohr, Imane Nkairi, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis, and ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012)
Subjects: Vocabulary, Out-of-vocabulary words, Computer science, media_common.quotation_subject, Word error rate, Context (language use), 02 engineering and technology, Proper names, Speech recognition, computer.software_genre, Out of vocabulary, Task (project management), 0202 electrical engineering, electronic engineering, information engineering, Selection (linguistics), Proper noun, [INFO]Computer Science [cs], media_common, business.industry, 020206 networking & telecommunications, Key (cryptography), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, Vocabulary augmentation
Abstract: International audience; Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the vocabulary of the automatic speech recognition (ASR) system using lexical and temporal features in diachronic documents. We also study different metrics for proper name selection in order to limit the vocabulary augmentation and therefore its impact on ASR performance. Recognition results show a significant reduction of the proper name error rate using an augmented vocabulary.
Published: 2016
Full Text: View/download PDF

21. Predicting popularity dynamics of online contents using data filtering methods

Author: Georges Linarès, Rachid El-Azouzi, Cedric Richier, Tania Jimenez, Eitan Altman, Jimenez, Tania, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Models for the performance analysis and the control of networks (MAESTRO), Inria Sophia Antipolis - Méditerranée (CRISAM), and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: [INFO.INFO-MM] Computer Science [cs]/Multimedia [cs.MM], Exploit, Computer science, Process (engineering), [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM], Baseline model, 020206 networking & telecommunications, 02 engineering and technology, [INFO] Computer Science [cs], computer.software_genre, Popularity, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, Data filtering, Dynamics (music), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Data mining, [INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation, computer, ComputingMilieux_MISCELLANEOUS
Abstract: This paper proposes a new prediction process to explain and predict the popularity evolution of YouTube videos. We exploit a prior study on the classification of YouTube videos in order to predict the evolution of videos' view counts. This classification makes it possible to identify important factors of the observed popularity dynamics. In particular, we use this classification as a filtering method to identify the factors responsible for this popularity evolution. Results from extensive experiments show that the proposed prediction process is able to reduce the average prediction errors compared to a state-of-the-art baseline model. We also evaluate the impact of adding popularity criteria in the classification.
Published: 2016

22. Topic-space based setup of a neural network for theme identification of highly imperfect transcriptions

Author: Mohamed Morchid, Georges Linarès, Richard Dufour, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Artificial neural network, Computer science, Time delay neural network, business.industry, Hidden layer, Speech recognition, Index Terms— Artificial neural network, Initialization, Latent Dirichlet allocation, Machine learning, computer.software_genre, symbols.namesake, Categorization, Prior probability, symbols, Weights initialization, Speech analytics, [INFO]Computer Science [cs], Artificial intelligence, business, computer, Classifier (UML)
Abstract: International audience; This paper presents a method for speech analytics that integrates a topic-space based representation into a feed-forward artificial neural network (FFANN) working as a document classifier. The proposed method consists of configuring the FFANN's topology and initializing its weights according to a previously estimated topic space. A setup based on thematic priors is expected to improve the efficiency of the FFANN's weight optimization process, while speeding up training and improving classification accuracy. This method is evaluated on a spoken dialogue categorization task composed of customer-agent dialogues from the call-centre of the Paris Public Transportation Company. Results show the interest of the proposed setup method, with a gain of more than 4 points in terms of classification accuracy compared to the baseline. Moreover, experiments highlight that performance is only weakly dependent on the FFANN's topology with the LDA-based configuration, in comparison to a classical empirical setup.
Published: 2015
Full Text: View/download PDF

23. An Author-Topic based Approach to Cluster Tweets and Mine their Location

Author: Richard Dufour, Didier Josselin, Georges Linarès, Yonathan Portilla, Mohamed Morchid, Jean-Valère Cossu, Alexandre Reiffers-Masson, Marc El-Bèze, Eitan Altman, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Models for the performance analysis and the control of networks (MAESTRO), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Études des Structures, des Processus d’Adaptation et des Changements de l’Espace (ESPACE), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Avignon Université (AU)-Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Université Nice Sophia Antipolis (1965 - 2019) (UNS), and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
Subjects: 0209 industrial biotechnology, Keywords: Author-Topic model, Computer science, Twitter, Sample (statistics), 02 engineering and technology, computer.software_genre, Latent Dirichlet allocation, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Task (project management), Set (abstract data type), symbols.namesake, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Author-Topic model, Tweet location, Cluster analysis, Spatial analysis, ComputingMilieux_MISCELLANEOUS, General Environmental Science, Tweets location, Information retrieval, Process (computing), [SHS.GEO]Humanities and Social Sciences/Geography, Topic modeling, Author topic model, symbols, General Earth and Planetary Sciences, 020201 artificial intelligence & image processing, Data mining, computer, Meaning (linguistics)
Abstract: Presented as poster at Spatial Statistics Conference 2015, Avignon, France, June 2015; International audience; Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The content, meaning and location of tweets show how an event, such as the bursty “JeSuisCharlie” event that happened in France in January 2015, is comprehended in different countries. This research aims at clustering the tweets according to the co-occurrence of their terms, including the country, and at forecasting the probable country of a non-located tweet, knowing its content. First, we present the process of collecting a large quantity of data from the Twitter website. We finally have a set of 2.189 located tweets about “Charlie”, from the 7th to the 14th of January. We describe an original method adapted from the Author-Topic (AT) model based on the Latent Dirichlet Allocation (LDA) method. We define a homogeneous space containing both lexical content (words) and spatial information (country). During a training process on a part of the sample, we provide a set of clusters (topics) based on statistical relations between lexical and spatial terms. During a clustering task, we evaluate the effectiveness of the method on the rest of the sample, where it reaches up to 95% of correct assignments.
Published: 2015
Full Text: View/download PDF

24. Identification de personnes dans des flux multimédia

Author: Frédéric Béchet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoit Favre, Mickael Rouvier, Rémi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Gregory Senay, Pierre Trilly, Laboratoire d'informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2-Université de Provence - Aix-Marseille 1-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'informatique Fondamentale de Marseille (LIF), Centre National de la Recherche Scientifique (CNRS)-École Centrale de Marseille (ECM)-Aix Marseille Université (AMU), France Télécom Recherche & Développement (FT R&D), France Télécom, France Télécom Recherche et Développement [Lannion] (FTR&D), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), FOX MIIRE (LIFL), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Aix Marseille Université (AMU)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS), Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS)-Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Multimedia indexing, Person recognition, Reconnaissance de personnes, Indexation multimédia, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Abstract: International audience; This paper describes a multi-modal person recognition system for broadcast video, developed to participate in the REPERE challenge, which was organized jointly by the DGA and the ANR (the French National Research Agency) and ended in 2014. The main track of this challenge targets the identification of all persons occurring in at least one modality of a video, either as audible speakers or as faces visible on screen. The main scientific issue addressed by this challenge is the combination of audio and video information extraction processes to improve extraction performance in both modalities. In this paper, we present a strategy for speaker identification based on enriching speaker diarization with features related to the “understanding” of the video scenes: text overlay transcription and analysis, automatic situation identification (TV set, report), the number of people visible, the TV set layout, and even the camera when available. Experiments on the REPERE corpus show the interest of the proposed approach.
Published: 2015

25. Author-topic based representation of call-center conversations

Author: Richard Dufour, Mohamed Morchid, Mohamed Bouallegue, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Process (engineering), Computer science, Human/human con-versation, Context (language use), 02 engineering and technology, Space (commercial competition), Speech recognition, computer.software_genre, 01 natural sciences, Latent Dirichlet allocation, Task (project management), 010104 statistics & probability, symbols.namesake, Transcription (linguistics), 0202 electrical engineering, electronic engineering, information engineering, Speech analytics, [INFO]Computer Science [cs], Latent Dirichlet Allocation, 0101 mathematics, business.industry, Representation (systemics), Index Terms— Author-Topic model, Classification, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing
Abstract: International audience; Performance of Automatic Speech Recognition (ASR) systems drops dramatically when transcribing conversations recorded in noisy conditions. Speech analytics suffer from this poor automatic transcription quality. To tackle this difficulty, one solution consists of mapping transcriptions into a space of hidden topics. This abstract representation helps mitigate the drawbacks of the ASR process. The best-known and most commonly used such representation is the topic-based representation from a Latent Dirichlet Allocation (LDA). Several studies demonstrate the effectiveness and reliability of this high-level representation. During the LDA learning process, the distribution of words in each topic is estimated automatically. Nonetheless, in the context of a classification task, no consideration is given to the targeted classes. Thus, if the targeted application is to find the main theme related to a dialogue, this information should be taken into consideration. In this paper, we propose to compare a classical topic-based representation of a dialogue with a new one based not only on the dialogue content itself (words), but also on the theme related to the dialogue. This original representation is based on the author-topic (AT) model. The effectiveness of the proposed representation is evaluated on a classification task from automatic transcriptions of dialogues between an agent and a customer of the Paris Transportation Company. Experiments confirm that this author-topic model approach outperforms the classical topic representation by far, with a substantial gain of more than 7% in terms of correctly labeled conversations.
Published: 2014
Full Text: View/download PDF

26. Feature selection using Principal Component Analysis for massive retweet detection

Author: Richard Dufour, Pierre-Michel Bousquet, Mohamed Morchid, Georges Linarès, Juan-Manuel Torres-Moreno, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Computer science, 020207 software engineering, Context (language use), Feature selection, 02 engineering and technology, computer.software_genre, Popularity, Set (abstract data type), Artificial Intelligence, Signal Processing, Principal component analysis, 0202 electrical engineering, electronic engineering, information engineering, Selection (linguistics), 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Computer Vision and Pattern Recognition, Data mining, computer, Software
Abstract: International audience; Social networks have become a major actor in massive information propagation. The popularity of the Twitter platform is due in part to its capability of relaying messages (i.e. tweets) posted by users. This particular mechanism, called retweet, allows users to massively share tweets they consider potentially interesting for others. In this paper, we propose to study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. Finally, we propose to automatically detect the massively retweeted messages. The qualitative study is used to select the features allowing the best classification performance. We show that selecting only the most correlated features leads to the best classification accuracy (F-measure of 65.7%), with a gain of about 2.4 points in comparison to the use of the complete set of features.
Published: 2014
Full Text: View/download PDF

27. An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

Author: Mohamed Morchid, Driss Matrouf, Renato De Mori, Richard Dufour, Georges Linarès, Mohamed Bouallegue, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and McGill University = Université McGill [Montréal, Canada]
Subjects: Information retrieval, Computer science, business.industry, Speaker recognition, computer.software_genre, I vector, Latent Dirichlet allocation, symbols.namesake, Transcription (linguistics), symbols, [INFO]Computer Science [cs], Granularity, Artificial intelligence, business, computer, Natural language processing
Abstract: Various studies highlighted that topic-based approaches give a powerful spoken content representation of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual document, to solve issues related to topic space granularity. Firstly, various topic spaces are estimated with different numbers of classes from a Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called c-vector, originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm. Our identification system reaches an accuracy of 85%, with a significant gain of 9 points compared to the baseline (best single topic space configuration).
Published: 2014
Full Text: View/download PDF

28. Subspace Gaussian Mixture Models for Dialogues Classification

Author: Driss Matrouf, Mohamed Bouallegue, Mohamed Morchid, Georges Linarès, Renato De Mori, Richard Dufour, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, McGill University = Université McGill [Montréal, Canada], and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: Topic model, GMM subspace, Computer science, 02 engineering and technology, [INFO] Computer Science [cs], Machine learning, computer.software_genre, Latent Dirichlet allocation, 030507 speech-language pathology & audiology, 03 medical and health sciences, symbols.namesake, LDA features, 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], Latent Dirichlet Allocation, Representation (mathematics), business.industry, 020206 networking & telecommunications, Index Terms: Human/Human conversation analysis, Mixture model, Expression (mathematics), theme identification, Identification (information), symbols, Artificial intelligence, 0305 other medical science, business, Theme (computing), computer, Subspace topology
Abstract: The main objective of this paper is to identify themes from dialogues of telephone conversations in a real-life customer care service. In order to capture significant semantic content in spite of high expression variability, features are extracted in a large number of hidden spaces constructed with a Latent Dirichlet Allocation (LDA) approach. Multiple views of a spoken document can then be represented with several hidden topic models. Nonetheless, the model diversity due to the multi-model approach introduces a new type of variability. An approach is proposed based on features extracted in a common homogeneous subspace with the purpose of reducing the multi-span representation variability. A Gaussian Mixture Model subspace model, inspired by previous work on speaker identification, is proposed for theme identification. This representation, novel for theme classification, is compared with the direct application of multiple topic-model representations. Experiments are reported using a corpus collected in the call center of the Paris Transportation Service. Results show the effectiveness of the proposed representation paradigm with a theme identification accuracy of 78.8%, showing a significant improvement with respect to previous results on the same corpus.
Published: 2014

29. Factor Analysis based Semantic Variability Compensation for Automatic Conversation Representation

Author: Mohamed Bouallegue, Driss Matrouf, Georges Linarès, Mohamed Morchid, Richard Dufour, Renato De Mori, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, McGill University = Université McGill [Montréal, Canada], and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: Computer science, Speech recognition, media_common.quotation_subject, 02 engineering and technology, [INFO] Computer Science [cs], computer.software_genre, Latent Dirichlet allocation, 030507 speech-language pathology & audiology, 03 medical and health sciences, symbols.namesake, Dimension (vector space), Semantic variability, 0202 electrical engineering, electronic engineering, information engineering, Conversation, [INFO]Computer Science [cs], Latent Dirichlet Allocation, Variability compensation, Representation (mathematics), Index Terms: Human/Human conversation representation, media_common, business.industry, 020206 networking & telecommunications, Speech processing, Identification (information), symbols, Automatic classification, Noise (video), Artificial intelligence, Factor analysis, 0305 other medical science, business, computer, Subspace topology, Natural language processing
Abstract: The main objective of this paper is to identify themes from dialogues of telephone conversations in a real-life customer care service. In this task, the word semantic variability contained in these conversations may impact the classification performance by retaining the noise in their vectorial representation. In this article, we propose an original method to compensate for this semantic variability using the Factor Analysis (FA) paradigm, initially designed for speech processing tasks to compensate for the acoustic variability, mainly in Speaker Verification (SV) and Automatic Speech Recognition (ASR). In our proposal, we used the FA paradigm to estimate the semantic variability as an additive component located in a subspace of low dimension (with respect to the super-vector space). This additive semantic variability is estimated in the Factor Analysis model space. From this estimation, a specific vector transformation is obtained and is applied to vectors of dialogue representation. Experiments are reported using a corpus collected in the call center of the Paris Transportation Service. Results show the effectiveness of the proposed representation paradigm with a theme identification accuracy of 80.0%, showing a significant improvement with respect to previous results on the same corpus. Index Terms: Human/Human conversation representation, Semantic variability, Factor analysis, Variability compensation, Automatic classification, Latent Dirichlet Allocation.
Published: 2014

30. Improving dialogue classification using a topic space representation and a Gaussian classifier based on the decision rule

Author: Pierre-Michel Bousquet, Mohamed Bouallegue, Renato De Mori, Georges Linarès, Mohamed Morchid, Richard Dufour, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and McGill University = Université McGill [Montréal, Canada]
Subjects: business.industry, Computer science, Latent dirichlet allocation, SVM, Decision rule, Space (commercial competition), Machine learning, computer.software_genre, Latent Dirichlet allocation, Support vector machine, Reduction (complexity), Identification (information), symbols.namesake, Index Terms— Speech analytics, ComputingMethodologies_PATTERNRECOGNITION, Theme classification, symbols, [INFO]Computer Science [cs], Artificial intelligence, business, Representation (mathematics), Gaussian process, computer
Abstract: In this paper, we study the impact of dialogue representations and classification methods in the task of theme identification of telephone conversation services having highly imperfect automatic transcriptions. Two dialogue representations are first compared: the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-IDF-Gini) method and the Latent Dirichlet Allocation (LDA) approach. We then propose to study an original classification method that takes advantage of the LDA topic space representation, highlighted as the best dialogue representation. To do so, two assumptions about topic representation led us to choose a Gaussian process (GP) based method. This approach is compared with a Support Vector Machine (SVM) classification method. Results show that the GP approach is a better solution to deal with the multiple theme complexity of a dialogue, no matter the conditions studied (manual or automatic transcriptions). We finally discuss the impact of the topic space reduction on the classification accuracy.
Published: 2014
Full Text: View/download PDF

31. Multimodal understanding for person recognition in video broadcasts

Author: Frédéric Béchet, Pierre Tirilly, Corinne Fredouille, Benjamin Bigot, Mickael Rouvier, Gregory Senay, Meriem Bendris, Benoit Favre, Rémi Auguste, Georges Linarès, Géraldine Damnati, Richard Dufour, Delphine Charlet, Jean Martinet, Laboratoire d'informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2-Université de Provence - Aix-Marseille 1-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'informatique Fondamentale de Marseille (LIF), Aix Marseille Université (AMU)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS), France Télécom Recherche & Développement (FT R&D), France Télécom, France Télécom Recherche et Développement [Lannion] (FTR&D), Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS)-Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, FOX MIIRE (LIFL), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal 
et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-École Centrale de Marseille (ECM)-Aix Marseille Université (AMU), and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
Subjects: Modality (human–computer interaction), Multimedia, Computer science, Speech recognition, 020207 software engineering, 02 engineering and technology, computer.software_genre, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Speaker diarisation, Information extraction, Identification (information), Transcription (linguistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Person recognition, Set (psychology), computer, ComputingMilieux_MISCELLANEOUS
Abstract: This paper describes a multi-modal person recognition system for video broadcast developed for participating in the DefiRepere challenge. The main track of this challenge targets the identification of all persons occurring in a video either in the audio modality (speakers) or the image modality (faces). This system was developed by the PERCOL team involving 4 research labs in France and was ranked first at the 2014 Defi-Repere challenge. The main scientific issue addressed by this challenge is the combination of audio and video information extraction processes for improving the extraction performance in both modalities. In this paper, we present the strategy followed by the PERCOL team for speaker identification based on enriching the speaker diarization with features related to the "understanding" of the video scenes: text overlay transcription and analysis, automatic situation identification (TV set, report), the number of people visible, TV set disposition and even the camera when available. Experiments on the REPERE corpus show interesting results on the speaker identification system enriched by the scene understanding features and the usefulness of the speaker to identify faces.
Published: 2014

32. I-vector based Representation of Highly Imperfect Automatic Transcriptions

Author: Driss Matrouf, Renato De Mori, Mohamed Morchid, Georges Linarès, Richard Dufour, Mohamed Bouallegue, Déposants HAL-Avignon, bibliothèque Universitaire, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and McGill University = Université McGill [Montréal, Canada]
Subjects: Topic model, Computer science, Speech recognition, media_common.quotation_subject, Context (language use), 02 engineering and technology, [INFO] Computer Science [cs], Latent Dirichlet allocation, 030507 speech-language pathology & audiology, 03 medical and health sciences, symbols.namesake, Transcription (linguistics), 0202 electrical engineering, electronic engineering, information engineering, Speech analytics, Conversation, [INFO]Computer Science [cs], Latent Dirichlet Allocation, Baseline (configuration management), Representation (mathematics), media_common, speech recognition, joint factor analysis, 020206 networking & telecommunications, Speaker recognition, symbols, Index Terms: human/human conversation, 0305 other medical science, i-vectors
Abstract: The performance of Automatic Speech Recognition (ASR) systems drops dramatically when used in noisy environments. Speech analytics suffer from this poor quality of automatic transcriptions. In this paper, we seek to identify themes from dialogues of telephone conversation services using multiple topic-spaces estimated with a Latent Dirichlet Allocation (LDA) approach. This technique consists in estimating several topic models that offer different views of the document. Unfortunately, such a multi-model approach also introduces additional variabilities due to the model diversity. We propose to extract the useful information from the full model-set by using an i-vector based approach, previously developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus, which contains records from the call center of the Paris Transportation Company. Results show the effectiveness of the proposed representation paradigm, our identification system reaching an accuracy of 84.7%, with a gain of 3.3 points compared to the baseline.
Published: 2014

33. Person name spotting by combining acoustic matching and LDA topic models

Author: Richard Dufour, Corinne Fredouille, Gregory Senay, Georges Linarès, Benjamin Bigot, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: Topic model, business.industry, Computer science, Speech recognition, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Spotting, [INFO] Computer Science [cs], Latent Dirichlet allocation, 030507 speech-language pathology & audiology, 03 medical and health sciences, symbols.namesake, 0202 electrical engineering, electronic engineering, information engineering, symbols, [INFO]Computer Science [cs], Artificial intelligence, 0305 other medical science, business, ComputingMilieux_MISCELLANEOUS
Abstract: In this article, we are interested in the spoken term detection task, with a particular focus on Person Name (PN) spotting in automatic speech recognition (ASR) system outputs. We propose a two-step method that combines an acoustic matching based on a Phoneme Confusion Network (PCN) with a semantic rescoring based on Latent Dirichlet Allocation (LDA) models. The first module finds, in the PCN, potential PN candidates in speech segments, while the second is in charge of ranking the competing PNs according to an LDA topic model. The proposed LDA-based approach significantly outperforms the baseline system based on a search in the ASR phoneme lattice, obtaining an F-measure score of 77.04% on PN detection.
Published: 2013

34. Theme Identification in Telephone Service Conversations using Quaternions of Speech Features

Author: Mohamed Morchid, Georges Linarès, Marc El-Bèze, Renato De Mori, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Déposants HAL-Avignon, bibliothèque Universitaire, and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
Subjects: Service (systems architecture), human/human conversation analysis, Computer science, Speech recognition, media_common.quotation_subject, Index Terms: Speech analytics, 020206 networking & telecommunications, 02 engineering and technology, [INFO] Computer Science [cs], topic identification, 030507 speech-language pathology & audiology, 03 medical and health sciences, Identification (information), Word lists by frequency, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), quaternion algebra, Conversation, [INFO]Computer Science [cs], 0305 other medical science, Quaternion, Theme (computing), media_common
Abstract: The paper introduces new features for describing possible focus variation in a human/human conversation. The application considered is a real-life telephone customer care service. The purpose is to hypothesize the dominant theme of conversations between a casual customer calling. Conversations are processed by an automatic speech recognition system that provides hypotheses used for extracting word frequency. Features are extracted in different, broadly defined and partially overlapped, time segments. Combinations of each feature in different segments are represented in a quaternion algebra framework. The advantage of the proposed approach is made evident by the statistically significant improvements in theme classification accuracy.
Published: 2013

35. A LDA-based method for automatic tagging of Youtube videos

Author: Mohamed Morchid, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
Subjects: Topic model, Computer science, business.industry, Keyword extraction, speech recognition, Pattern recognition, 02 engineering and technology, Latent Dirichlet allocation, Small set, symbols.namesake, Robustness (computer science), keyword extraction, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, symbols, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Artificial intelligence, Granularity, Transcription (software), business, Tag system, Index Terms— audio categorization, structuring multimedia collection
Abstract: This article presents a method for automatic tagging of Youtube videos. The proposed method combines an automatic speech recognition (ASR) system, which extracts the spoken contents, and a keyword extraction component that aims at finding a small set of tags representing a video. In order to improve the robustness of the tagging system to recognition errors, a video transcription is represented in a topic space obtained by a Latent Dirichlet Allocation (LDA), in which each dimension is automatically characterized by a list of weighted terms. Tags are extracted by combining the weighted word lists of the best LDA classes. We evaluate this method by employing the user-provided tags of Youtube videos as reference and we investigate the impact of the topic model granularity. The obtained results demonstrate the interest of such a model to improve the robustness of the tagging system.
Published: 2013
Full Text: View/download PDF

36. Event detection from image hosting services by slightly-supervised multi-span context models

Author: Richard Dufour, Mohamed Morchid, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Topic model, 0209 industrial biotechnology, business.industry, Event (computing), Computer science, 020207 software engineering, Statistical model, Context (language use), 02 engineering and technology, Internet hosting service, Machine learning, computer.software_genre, Latent Dirichlet allocation, Set (abstract data type), Data set, symbols.namesake, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, symbols, [INFO]Computer Science [cs], Artificial intelligence, Data mining, business, computer
Abstract: We present a method to detect social events in a set of pictures from an image hosting service (Flickr). This method relies on the analysis of user-generated tags, by using statistical models trained on both a small set of manually annotated data and a large data set collected from the Internet. Social event modeling relies on a multi-span topic model based on LDA (Latent Dirichlet Allocation). Experiments are conducted in the experimental setup of the MediaEval'2011 evaluation campaign. The proposed system significantly outperforms the best system of this benchmark, reaching an F-measure score of about 71%.
Published: 2013
Full Text: View/download PDF

37. Person name recognition in ASR outputs using continuous context models

Author: Corinne Fredouille, Richard Dufour, Gregory Senay, Benjamin Bigot, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: business.industry, Computer science, InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI), Speech recognition, Index Terms— spoken document retrieval, lexical context representation, 020206 networking & telecommunications, Context (language use), 02 engineering and technology, Latent variable, computer.software_genre, spoken name detection, 030507 speech-language pathology & audiology, 03 medical and health sciences, Content analysis, Order (business), 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], Artificial intelligence, 0305 other medical science, business, computer, Natural language processing, ComputingMilieux_MISCELLANEOUS
Abstract: The detection and characterization, in audiovisual documents, of speech utterances where person names are pronounced is an important cue for spoken content analysis. This paper tackles the problem of retrieving spoken person names in the 1-Best ASR outputs of broadcast TV shows. Our assumption is that a person name is a latent variable produced by the lexical context it appears in. Thereby, a spoken name could be derived from ASR outputs even if it has not been proposed by the speech recognition system. A new context model is proposed in order to capture lexical and structural information surrounding a spoken name. The fundamental hypothesis of this study has been validated on broadcast TV documents available in the context of the REPERE challenge.
Published: 2013
Full Text: View/download PDF

38. Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Author: Yannick Estève, Guillaume Gravier, Georges Linarès, Benjamin Lecouteux, Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Laboratoire d'Informatique de l'Université du Maine (LIUM), Le Mans Université (UM)-Centre National de la Recherche Scientifique (CNRS), Multimedia content-based indexing (TEXMEX), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique 
(Inria), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique
Subjects: Voice activity detection, Acoustics and Ultrasonics, Computer science, business.industry, Speech recognition, Speech coding, [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM], Word error rate, 02 engineering and technology, Integrated approach, 030507 speech-language pathology & audiology, 03 medical and health sciences, Search algorithm, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Electrical and Electronic Engineering, 0305 other medical science, business, Radio broadcasting, Decoding methods
Abstract: Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain an improvement of 14.5% relative word error rate over the best single system, as opposed to the 6.7% with a ROVER combination. An in-depth analysis of the DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm. The application of DDA to both A* and beam-search-based decoders yields similar performance.
Published: 2013

39. Confidence measure for speech indexing based on Latent Dirichlet Allocation

Author: Grégory Senay, Georges Linarès, and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, [INFO] Computer Science [cs]
Abstract: This paper presents a confidence measure for speech indexing that aims to predict the indexing quality of a speech document for a Spoken Document Retrieval (SDR) task. We first introduce how the indexing quality of a speech document is evaluated. Then, we present our method to predict the indexing quality of a speech document. It is based on a confidence measure provided by an automatic speech recognition system and the detection of semantic outliers implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the French Broadcast news campaign ESTER2 in a classical SDR scenario where users submit text-queries to a search engine. Results demonstrate an overall improvement when the detection is done with the LDA model. The detection rate is always above 70%. Index Terms: speech indexing, confidence measure, spoken document retrieval, latent dirichlet allocation
Published: 2012

40. ON THE USE OF LINGUISTIC FEATURES IN AN AUTOMATIC SYSTEM FOR SPEECH ANALYTICS OF TELEPHONE CONVERSATIONS

Author: Renato De Mori, Marc El-Bèze, Benjamin Maza, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), McGill University = Université McGill [Montréal, Canada], and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: speech understanding, Vocabulary, human/human conversation analysis, Computer science, business.industry, media_common.quotation_subject, Index Terms: speech analytics, [INFO] Computer Science [cs], computer.software_genre, Set (abstract data type), Test set, Taxonomy (general), dialogue classification, call centre performance monitoring, Speech analytics, Conversation, [INFO]Computer Science [cs], Artificial intelligence, business, computer, Natural language processing, Sentence, media_common
Abstract: Research on the analysis of human/human conversations in a call centre is described. The purpose of the research is to provide short reports of each conversation with information useful for monitoring the call centre efficiency. Data from real users discussing over the telephone with agents are processed by an automatic speech recognition (ASR) system. Reports are grouped into classes by the agents based on a predefined taxonomy. A train set of manually transcribed data is used for training the extraction of features relevant to the application and the classification of the conversations. The use of all the words of the application vocabulary, of automatically selected keywords, and of automatically learned sentence chunks containing semantic classes of words are compared and evaluated with a totally different test set. The results show a significant increase in performance when chunks are used even in comparison with the use of bags of words obtained with a boosting algorithm.
Published: 2011

41. Bag of n-gram driven decoding for LVCSR system harnessing

Author: Georges Linarès, Yannick Estève, Fethi Bougares, Paul Deléglise, Déposants HAL-Avignon, bibliothèque Universitaire, and AMOKRANE, HAKIM
Subjects: Voice activity detection, Matching (graph theory), System combination, Computer science, business.industry, Speech recognition, Speech coding, system combination, [INFO] Computer Science [cs], Machine learning, computer.software_genre, n-gram, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Search algorithm, Auxiliary system, Artificial intelligence, Index Terms: speech recognition, business, computer, ComputingMilieux_MISCELLANEOUS, Decoding methods, bag of n-gram driven decoding
Abstract: This paper focuses on the combination of automatic speech recognition systems based on driven decoding paradigms. The driven decoding algorithm (DDA) involves the use of a 1-best hypothesis provided by an auxiliary system as another knowledge source in the search algorithm of a primary system. In previous studies, it was shown that DDA outperforms ROVER when the primary system is guided by a more accurate system. In this paper we propose a new method to manage auxiliary transcriptions, which are presented as a bag-of-n-grams (BONG) without temporal matching. These modifications make it easier to combine several hypotheses given by different auxiliary systems. Using BONG combination with hypotheses provided by two auxiliary systems, each of which obtained more than 23% WER on the same data, our experiments show that a CMU Sphinx based ASR system can reduce its WER from 19.85% to 18.66%, which is better than the results reached with DDA or classical ROVER combination.
Published: 2011

42. A SEGMENT-LEVEL CONFIDENCE MEASURE FOR SPOKEN DOCUMENT RETRIEVAL

Author: Georges Linarès, Gregory Senay, Benjamin Lecouteux, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), and Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)
Subjects: Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 02 engineering and technology, confidence measures, Speech recognition, Semantics, computer.software_genre, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Task (project management), Search engine, Text mining, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Relevance (information retrieval), Document retrieval, 050107 human factors, Measure (data warehouse), Information retrieval, business.industry, 05 social sciences, Search engine indexing, 020206 networking & telecommunications, spoken document retrieval, Index (publishing), Artificial intelligence, Transcription (software), business, computer, Natural language processing
Abstract: International audience; This paper presents a semantic confidence measure that aims to predict the relevance of automatic transcripts for a Spoken Document Retrieval (SDR) task. The proposed prediction method relies on the combination of an Automatic Speech Recognition (ASR) confidence measure and a Semantic Compacity Index (SCI) that estimates the relevance of the words considering the semantic context in which they occurred. Experiments are conducted on the French broadcast news corpus ESTER, by simulating a classical SDR usage scenario: users submit text queries to a search engine that is expected to return the most relevant documents for the query. Results demonstrate the interest of using semantic-level information to predict the transcription indexability.
Published: 2011

43. Combination of Probabilistic and Possibilistic Language Models

Author: Stanislas Oger, Vladimir Popescu, Georges Linarès, and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: [INFO] Computer Science [cs]
Abstract: In a previous paper we proposed Web-based language models relying on possibility theory. These models explicitly represent the possibility of word sequences. In this paper we propose to find the best way of combining this kind of model with classical probabilistic models, in the context of automatic speech recognition. We propose several combination approaches, depending on the nature of the combined models. With respect to the baseline, the best combination provides an absolute word error rate reduction of about 1% on broadcast news transcription, and of 3.5% on domain-specific multimedia document transcription. Index Terms: language models, world wide web, possibility measure, automatic speech recognition
Published: 2010

44. Factor Analysis for Audio-based Video Genre Classification

Author: Driss Matrouf, Mickael Rouvier, Georges Linarès, and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: Channel (digital image), Computer science, business.industry, Speech recognition, Pattern recognition, automatic classification, [INFO] Computer Science [cs], Mixture model, Domain (software engineering), Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Factor (programming language), Index Terms: video genre identification, Feature (machine learning), Artificial intelligence, business, Factor Analysis, computer, computer.programming_language
Abstract: Statistical classifiers operate on features that generally include both useful and useless information. These two types of information are difficult to separate in the feature domain. Recently, a new paradigm based on Latent Factor Analysis (LFA) proposed a model decomposition into useful and useless components. This method was successfully applied to speaker and language recognition tasks. In this paper, we study the use of LFA for video genre classification by using only the audio channel. We propose a classification method based on short-term cepstral features and Gaussian Mixture Models (GMM) or Support Vector Machine (SVM) classifiers, which are combined with Factor Analysis (FA). Experiments are conducted on a corpus composed of 5 types of video (music, commercials, cartoons, movies and news). The relative classification error reduction obtained by using the best factor analysis configuration with respect to the baseline system, Gaussian Mixture Model Universal Background Model (GMM-UBM), is about 56%, corresponding to a correct identification rate of about 90%.
Published: 2009

45. Probabilistic and Possibilistic Language Models Based on the World Wide Web

Author: Stanislas Oger, Vladimir Popescu, Georges Linarès, and Déposants HAL-Avignon, bibliothèque Universitaire
Subjects: [INFO] Computer Science [cs]
Abstract: Usually, language models are built either from a closed corpus, or by using World Wide Web retrieved documents, which are considered as a closed corpus themselves. In this paper we propose several other ways, more adapted to the nature of the Web, of using this resource for language modeling. We first start by improving an approach consisting in estimating n-gram probabilities from Web search engine statistics. Then, we propose a new way of considering the information extracted from the Web in a probabilistic framework. Then, we also propose to rely on Possibility Theory for effectively using this kind of information. We compare these two approaches on two automatic speech recognition tasks: (i) transcribing broadcast news data, and (ii) transcribing domain-specific data, concerning surgical operation film comments. We show that the two approaches are effective in different situations. Index Terms: language modeling, World Wide Web, possibility measure, automatic speech recognition
Published: 2009

46. Compact Acoustic Models for Embedded Speech Recognition

Author: Jean-François Bonastre, Christophe Lévy, and Georges Linarès
Subjects: Acoustics and Ultrasonics, Computer science, Speech recognition, Resource constraints, lcsh:QC221-246, Acoustic model, Probability density function, Speaker recognition, Speech processing, lcsh:QA75.5-76.95, Transformation (function), Computer Science::Sound, lcsh:Acoustics. Sound, lcsh:Electronic computers. Computer science, Electrical and Electronic Engineering, Adaptation (computer science), Hidden Markov model
Abstract: Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. In order to fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model in order to obtain each HMM state-dependent probability density function, so that only the transformation parameters need to be stored. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique of acoustic models is also proposed. In order to significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques) with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints.
Published: 2009

47. Combined low level and high level features for Out-Of-Vocabulary Word detection

Author: Benjamin Lecouteux, Georges Linarès, Benoit Favre, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and International Computer Science Institute [Berkeley] (ICSI)
Subjects: [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], speech recognition, confidence measures, Index Terms: OOV word detection, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], ComputingMilieux_MISCELLANEOUS
Abstract: International audience; no abstract
Published: 2009

48. Frame-Based Acoustic Feature Integration for Speech Understanding

Author: Loïc Barrault, Driss Matrouf, R. De Mori, Christophe Servan, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: speech understanding, 0209 industrial biotechnology, Computer science, Speech recognition, Word error rate, Topology (electrical circuits), 02 engineering and technology, Set (abstract data type), Reduction (complexity), 030507 speech-language pathology & audiology, 03 medical and health sciences, 020901 industrial engineering & automation, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, frame based combination, business.industry, speech recognition, posterior probabilities combination, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Pattern recognition, Computer Science::Sound, Feature (computer vision), Artificial intelligence, 0305 other medical science, business, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Spoken language, Interpolation
Abstract: International audience; With the purpose of improving Spoken Language Understanding (SLU) performance, a combination of different acoustic speech recognition (ASR) systems is proposed. State a posteriori probabilities obtained with systems using different acoustic feature sets are combined with log-linear interpolation. In order to perform a coherent combination of these probabilities, acoustic models must have the same topology (i.e. same set of states). For this purpose, a fast and efficient twin model training protocol is proposed. By a wise choice of acoustic feature sets and log-linear interpolation of their likelihood ratios, a substantial Concept Error Rate (CER) reduction has been observed on the test part of the French MEDIA corpus.
Published: 2008
Full Text: View/download PDF

49. On-demand new word learning using world wide web

Author: Pascal Nocera, Georges Linarès, Stanislas Oger, Frédéric Béchet, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Subjects: Computer science, Word processing, Context (language use), 02 engineering and technology, Natural languages, Speech recognition, Lexicon, computer.software_genre, World Wide Web, 030507 speech-language pathology & audiology, 03 medical and health sciences, 0202 electrical engineering, electronic engineering, information engineering, Information retrieval, Relevance (information retrieval), [INFO]Computer Science [cs], Index Terms— Lexical modeling, Semantic Web, business.industry, 020201 artificial intelligence & image processing, Artificial intelligence, Transcription (software), 0305 other medical science, business, computer, Word (computer architecture), Natural language, Natural language processing
Abstract: International audience; Most Web-based methods for lexicon augmentation consist in capturing global semantic features of the targeted domain in order to collect relevant documents from the Web. We suggest that the local context of the out-of-vocabulary (OOV) words contains relevant information on the OOV words. With this information, we propose to use the Web to build locally-augmented lexicons which are used in a final local decoding pass. Our experiments confirm the relevance of the Web for OOV word retrieval. Different methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words in the transcription process based on part-of-speech models. This technique recovers 7.6% of the significant OOV words, and the accuracy of the system is improved.
Published: 2008
Full Text: View/download PDF

50. GENERALIZED DRIVEN DECODING FOR SPEECH RECOGNITION SYSTEM COMBINATION

Author: Benjamin Lecouteux, Georges Linarès, Yannick Estève, Guillaume Gravier, Laboratoire d'Informatique de Grenoble (LIG), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI, Avignon Université (AU), Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM), Creating and exploiting explicit links between multimedia fragments (LinkMedia), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), MEDIA ET INTERACTIONS (IRISA-D6), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), CentraleSupélec, Télécom Bretagne, Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES), École normale supérieure - Rennes (ENS Rennes), Université de Bretagne Sud (UBS), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA), Speech and sound data modeling and processing (METISS), Université de Rennes (UR), and ANR-06-MDCA-0006,EPAC,Exploration de masse de documents audio pour l'extraction et le traitement de la parole conversationnelle(2006)
Subjects: System combination, Computer science, business.industry, Speech recognition, Speech coding, Word error rate, system combination, Pattern recognition, 02 engineering and technology, Integrated approach, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], 030507 speech-language pathology & audiology, 03 medical and health sciences, Search algorithm, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Artificial intelligence, 0305 other medical science, business, Decoding methods, ComputingMilieux_MISCELLANEOUS
Abstract: International audience; The Driven Decoding Algorithm (DDA) is initially an integrated approach for the combination of 2 automatic speech recognition (ASR) systems. It consists in guiding the search algorithm of a primary ASR system by the one-best hypothesis of an auxiliary system. In this paper, we generalize DDA to confusion-network driven decoding and we propose new combination schemes for multiple system combination. Since previous experiments involved 2 ASR systems on broadcast news data, the proposed extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.
Published: 2008
