279 results for '"Georges Linarès"'
Search Results
152. INEX 2012 Benchmark: A Semantic Space for Tweets Contextualization.
- Author
-
Mohamed Morchid and Georges Linarès
- Published
- 2012
153. MediaEval Benchmark: Social Event Detection using LDA and external resources.
- Author
-
Mohamed Morchid and Georges Linarès
- Published
- 2011
154. LIA @ MediaEval 2011: Compact representation of heterogeneous descriptors for video genre classification.
- Author
-
Mickael Rouvier and Georges Linarès
- Published
- 2011
155. Transcriber Driving Strategies for Transcription Aid System.
- Author
-
Grégory Senay, Georges Linarès, Benjamin Lecouteux, Stanislas Oger, and Thierry Michel
- Published
- 2010
156. Learning to retrieve out-of-vocabulary words in speech recognition.
- Author
-
Imran A. Sheikh, Irina Illina, Dominique Fohr, and Georges Linarès
- Published
- 2015
157. Local Methods for On-Demand Out-of-Vocabulary Word Retrieval.
- Author
-
Stanislas Oger, Georges Linarès, and Frédéric Béchet
- Published
- 2008
158. Real to H-Space Autoencoders for Theme Identification in Telephone Conversations
- Author
-
Xavier Bost, Titouan Parcollet, Renato De Mori, Mohamed Morchid, and Georges Linarès (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); McGill University, Montréal, Canada)
- Subjects
Acoustics and Ultrasonics, Computer science, Feature extraction, Semantics, Machine learning, Quaternion neural networks, Artificial neural network, Deep learning, Spoken language understanding, Autoencoder, Quaternion autoencoder, Spoken language, Artificial intelligence
- Abstract
Machine learning (ML) and deep learning with deep neural networks (DNN) have drastically improved the performance of modern systems on numerous spoken language understanding (SLU) tasks. While most current research focuses on new neural architectures that enhance performance in realistic conditions, few recent works have investigated the use of different algebras within neural networks (NN) to better represent the nature of the data being processed. To this end, quaternion-valued neural networks (QNN) have shown better performance, and an important reduction in the number of neural parameters compared to traditional real-valued neural networks, when dealing with multidimensional signals. Nonetheless, the use of QNNs is strictly limited to quaternion input or output features. This paper introduces a new unsupervised method based on a hybrid autoencoder (AE), called the real-to-quaternion autoencoder (R2H), to extract a quaternion-valued input signal from any real-valued data so it can be processed by QNNs. Experiments performed to identify the most relevant theme of a given telephone conversation from a customer care service (CCS) demonstrate that the R2H approach outperforms all previously established models, whether real- or quaternion-valued, in terms of accuracy and with up to four times fewer neural parameters.
- Published
- 2020
- Full Text
- View/download PDF
159. GMM-based acoustic modeling for embedded speech recognition.
- Author
-
Christophe Lévy, Georges Linarès, and Jean-François Bonastre
- Published
- 2006
- Full Text
- View/download PDF
160. Imperfect transcript driven speech recognition.
- Author
-
Benjamin Lecouteux, Georges Linarès, Pascal Nocera, and Jean-François Bonastre
- Published
- 2006
- Full Text
- View/download PDF
161. A survey of quaternion neural networks
- Author
-
Mohamed Morchid, Georges Linarès, and Titouan Parcollet
- Subjects
Linguistics and Language, Signal processing, Artificial neural network, Computer science, Language and Linguistics, Artificial Intelligence, Quaternion
- Abstract
Quaternion neural networks have recently received increasing interest due to noticeable improvements over real-valued neural networks on real-world tasks such as image, speech, and signal processing. The extension of quaternion numbers to neural architectures has reached state-of-the-art performance with a reduction in the number of neural parameters. This survey reviews past and recent research on quaternion neural networks and their applications in different domains, detailing the methods, algorithms, and applications of each proposed quaternion-valued neural network.
- Published
- 2019
- Full Text
- View/download PDF
162. Modelling View-count Dynamics in YouTube.
- Author
-
Cédric Richier, Eitan Altman, Rachid El Azouzi, Tania Altman, Georges Linarès, and Yonathan Portilla
- Published
- 2014
163. Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis
- Author
-
Richard Dufour, Mohamed Morchid, Killian Janod, Georges Linarès, and Renato De Mori (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI))
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Feature extraction, Bottleneck, Transcription (linguistics), Robustness (computer science), Conversation, Speech processing, Autoencoder, Conversation analysis, Artificial intelligence, Natural language processing
- Abstract
Automatic transcription of spoken documents is affected by transcription errors that are especially frequent when speech is acquired in severely noisy conditions. Automatic speech recognition errors induce errors in the linguistic features used for a variety of natural language processing tasks. Recently, denoising autoencoders (DAE) and stacked autoencoders (SAE) have been proposed, with interesting results, for acoustic feature denoising tasks. This paper deals with the recovery of corrupted linguistic features in spoken documents. Solutions based on DAEs and SAEs are considered and evaluated in a spoken conversation analysis task. To improve conversation theme classification accuracy, the possibility of combining abstractions obtained from manual and automatic transcription features is considered. As a result, two original representations of highly imperfect spoken documents are introduced. They are based on the bottleneck features of a supervised autoencoder that takes advantage of both noisy and clean transcriptions to improve the robustness of error-prone representations. Experimental results on a spoken conversation theme identification task show substantial accuracy improvements with the proposed recovery of corrupted features.
- Published
- 2017
- Full Text
- View/download PDF
164. E2E-SincNet: Toward Fully End-to-End Speech Recognition
- Author
-
Mohamed Morchid, Titouan Parcollet, and Georges Linarès (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); PNRIA)
- Subjects
Lossless compression, Computer science, Speech recognition, Word error rate, End-to-end principle, Waveform, Mel-frequency cepstrum
- Abstract
Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel-filter-banks or Mel-frequency cepstral coefficients. Nonetheless, and despite lower performance, E2E ASR models processing raw waveforms remain an active research field due to the lossless nature of the input signal. In this paper, we propose E2E-SincNet, a novel fully E2E ASR model that goes from the raw waveform to the text transcript by merging two recent and powerful paradigms: SincNet and the joint CTC-attention training scheme. Experiments on two different speech recognition tasks show that our approach outperforms previously investigated E2E systems relying either on the raw waveform or on pre-computed acoustic features, with a reported top-of-the-line Word Error Rate (WER) of 4.7% on the Wall Street Journal (WSJ) dataset.
- Published
- 2020
165. M2H-GAN: A GAN-Based Mapping from Machine to Human Transcripts for Speech Understanding
- Author
-
Xavier Bost, Mohamed Morchid, Georges Linarès, and Titouan Parcollet (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI))
- Subjects
Computer science, Speech recognition, Deep learning, Machine learning, Computation and language, Context (language use), Identification (information), Artificial intelligence, Spoken language
- Abstract
Deep learning is at the core of recent spoken language understanding (SLU) tasks. More precisely, deep neural networks (DNNs) have drastically increased the performance of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human, manually transcribed (TRS) and an automatically transcribed (ASR) version of the conversations. Nonetheless, due to production constraints, only the ASR transcripts are considered when building automatic classifiers; TRS transcripts are only used to measure the performance of ASR systems. Moreover, the classification accuracies recently obtained by DNN-based systems are close to those reached by humans, and it is difficult to increase them further by considering the ASR transcripts alone. This paper proposes to distill the TRS knowledge available during the training phase into the ASR representation, using a new generative adversarial network called M2H-GAN to generate a TRS-like version of an ASR document and improve theme identification performance. (Submitted to INTERSPEECH 2019.)
- Published
- 2019
- Full Text
- View/download PDF
166. Real to H-space Encoder for Speech Recognition
- Author
-
Georges Linarès, Mohamed Morchid, Renato De Mori, and Titouan Parcollet (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); McGill University, Montréal, Canada)
- Subjects
Computer science, Speech recognition, Sound, Quaternion neural networks, Recurrent neural networks, Encoder, Quaternion, Audio and speech processing
- Abstract
Deep neural networks (DNNs), and more precisely recurrent neural networks (RNNs), are at the core of modern automatic speech recognition systems, due to their efficiency in processing input sequences. Recently, it has been shown that input representations based on multidimensional algebras, such as complex and quaternion numbers, give neural networks a more natural, compact, and powerful representation of the input signal, outperforming common real-valued NNs. Indeed, quaternion-valued neural networks (QNNs) better learn both internal dependencies, such as the relation between the Mel-filter-bank value of a specific time frame and its time derivatives, and global dependencies, describing the relations that exist between time frames. Nonetheless, QNNs are limited to quaternion-valued input signals, and it is difficult to benefit from this powerful representation with real-valued input data. This paper proposes to tackle this weakness by introducing a real-to-quaternion encoder that allows QNNs to process any one-dimensional input features, such as traditional Mel-filter-banks for automatic speech recognition. (Accepted at INTERSPEECH 2019.)
- Published
- 2019
167. Bidirectional Quaternion Long Short-term Memory Recurrent Neural Networks for Speech Recognition
- Author
-
Mohamed Morchid, Georges Linarès, Titouan Parcollet, and Renato De Mori
- Subjects
Signal processing, Computer science, Speech recognition, Sequence, Machine learning, Long short-term memory, Recurrent neural network, Quaternion, Audio and speech processing
- Abstract
Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems. In particular, long short-term memory (LSTM) recurrent neural networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long- and short-term dependencies in sequences of inter-dependent features. Nonetheless, internal dependencies within the elements composing multidimensional features are weakly considered by traditional real-valued representations. We propose a novel quaternion long short-term memory (QLSTM) recurrent neural network that takes into account both the external relations between the features composing a sequence and the internal latent structural dependencies, via the quaternion algebra. QLSTMs are compared to LSTMs on a memory copy task and a realistic speech recognition application on the Wall Street Journal (WSJ) dataset. QLSTM reaches better performance in both experiments with up to $2.8$ times fewer learning parameters, leading to a more expressive representation of the information. (Submitted to ICASSP 2019; arXiv admin note: text overlap with arXiv:1806.04418.)
- Published
- 2019
- Full Text
- View/download PDF
168. Quaternion Convolutional Neural Networks for Heterogeneous Image Processing
- Author
-
Mohamed Morchid, Georges Linarès, and Titouan Parcollet (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI))
- Subjects
Computer science, Computer vision and pattern recognition, Machine learning, Image processing, Iterative reconstruction, Grayscale, Convolutional neural network, Quaternion, Training set, Artificial neural network, Pixel, Color image, Pattern recognition, Artificial intelligence
- Abstract
Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn, independently of the training data, both the local dependencies between the three components (R, G, B) of a pixel and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) address this problem by introducing multidimensional algebra to the CNN. This paper explores the fundamental reason for the success of QCNNs over CNNs by investigating the impact of the Hamilton product on a color image reconstruction task performed from gray-scale-only training. By learning both internal and external relations independently, and with fewer parameters than a real-valued convolutional encoder-decoder (CAE), the quaternion convolutional encoder-decoder (QCAE) perfectly reconstructed unseen color images, while the CAE produced worse, gray-scale versions. (Submitted to ICASSP 2019.)
- Published
- 2019
- Full Text
- View/download PDF
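The Hamilton product discussed in the abstract above is the algebraic core of quaternion networks: a pixel's (R, G, B) components ride in the imaginary parts of a single quaternion, and one four-number quaternion weight mixes all three channels at once. A minimal NumPy sketch of that idea (the variable names and values are illustrative, not from the paper):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions given as (r, x, y, z) arrays."""
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return np.array([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    ])

# An RGB pixel encoded as a pure quaternion (real part 0).
pixel = np.array([0.0, 0.8, 0.2, 0.5])   # (0, R, G, B)
# One quaternion weight: 4 real numbers, where a real-valued layer
# would need a 3x3 = 9-parameter matrix to mix the channels.
weight = np.array([0.5, 0.1, -0.3, 0.2])

out = hamilton(weight, pixel)
```

Because every output channel is a fixed, weight-shared combination of all input channels, the channels cannot be treated independently; this forced sharing is the parameter reduction the abstract credits to the Hamilton product.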
169. Nonlinear GSM echo cancellation: application to speech recognition.
- Author
-
Laurent Barcharolli, Georges Linarès, J.-P. Costa, and Jean-François Bonastre
- Published
- 2003
170. Impact of Word Error Rate on theme identification task of highly imperfect human–human conversations
- Author
-
Georges Linarès, Mohamed Morchid, and Richard Dufour
- Subjects
Computer science, Speech recognition, Word error rate, Latent Dirichlet allocation, Discriminative model, Transcription (linguistics), Speech analytics, Conversation, Gaussian process, Support vector machine, Human-Computer Interaction, Artificial intelligence, Natural language processing
- Abstract
Highlights:
- Review of the impact of dialogue representations and classification methods.
- Discussion of the impact of discriminative words in terms of transcription accuracy.
- Original study evaluating the impact of the WER in the LDA topic space.
A review is proposed of the impact of word representations and classification methods on the task of theme identification for telephone conversation services with highly imperfect automatic transcriptions. We first compare two word-based representations: the classical Term Frequency-Inverse Document Frequency with Gini purity criterion (TF-IDF-Gini) method and the latent Dirichlet allocation (LDA) approach. We then introduce a classification method that takes advantage of the LDA topic space representation, highlighted as the best word representation. To do so, two assumptions about topic representation led us to choose a Gaussian Process (GP) based method, whose performance is compared with a classical Support Vector Machine (SVM) classification method. Experiments showed that the GP approach is a better solution for dealing with the multiple-theme complexity of a dialogue, regardless of the conditions studied (manual or automatic transcriptions) (Morchid et al., 2014). To better understand the results obtained with the different word representation methods and classification approaches, we then discuss the impact of discriminative and non-discriminative words extracted by both word representation methods in terms of transcription accuracy (Morchid et al., 2014). Finally, we propose a novel study evaluating the impact of the Word Error Rate (WER) on the LDA topic space learning process as well as on the theme identification task. This original qualitative study points out that selecting a small subset of words having the lowest WER (instead of using all the words) allows the system to better classify automatic transcriptions, with an absolute gain of 0.9 point over the best performance previously achieved on this dialogue classification task (precision of 83.3%).
- Published
- 2016
- Full Text
- View/download PDF
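The pipeline described in the abstract above (documents projected into an LDA topic space, then classified with a GP or an SVM) can be sketched with scikit-learn. The tiny corpus, theme labels, and topic count below are invented for illustration and bear no relation to the DECODA data:

```python
# Illustrative sketch: documents -> bag-of-words -> LDA topic space ->
# theme classification, comparing an SVM with a Gaussian Process.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier

docs = [
    "lost my travel card on the bus",
    "monthly pass renewal and ticket fares",
    "itinerary from the airport to the city center",
    "which line goes to the stadium",
]
themes = ["lost_and_found", "fares", "itinerary", "itinerary"]

# LDA is fit on raw term counts; each document becomes a distribution
# over a small number of latent topics.
counts = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(
    n_components=2, random_state=0
).fit_transform(counts)

# Train both classifiers on the same low-dimensional topic features.
for clf in (SVC(kernel="rbf"), GaussianProcessClassifier()):
    clf.fit(topics, themes)
    print(type(clf).__name__, clf.predict(topics))
```

The low dimensionality of the topic space is what makes the GP tractable here; on raw TF-IDF vectors its cubic training cost would be prohibitive.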
171. Conversational Networks for Automatic Online Moderation
- Author
-
Georges Linarès, Vincent Labatut, Richard Dufour, and Etienne Papegnies (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); Nectar de Code; Région PACA)
- Subjects
Computer science, User-generated content, Machine learning, Social computing, Information retrieval, Discriminative model, Classification algorithms, Text analysis, Online community, Human-Computer Interaction, Statistical classification, Task analysis, The Internet, Network theory (graphs), Classifier, Social and information networks, Computation and language
- Abstract
Moderation of user-generated content in an online community is a challenge with great socio-economic ramifications, but the costs incurred by delegating this work to human agents are high. For this reason, an automatic system able to detect abuse in user-generated content is of great interest. There are a number of ways to tackle this problem, but the most commonly seen in practice are word filtering and regular expression matching. Their main limitations are their vulnerability to intentional obfuscation on the part of the users and their context-insensitive nature. Moreover, they are language-dependent and may require appropriate corpora for training. In this paper, we propose a system for automatic abuse detection that completely disregards message content. We first extract a conversational network from raw chat logs and characterize it through topological measures. We then use these measures as features to train a classifier on our abuse detection task. We thoroughly assess our system on a dataset of user comments originating from a French Massively Multiplayer Online Game. We identify the most appropriate network extraction parameters and discuss the discriminative power of our features, relative to their topological and temporal nature. Our method reaches an F-measure of 83.89 when using the full feature set, improving on existing approaches. With a selection of the most discriminative features, we dramatically cut computing time while retaining most of the performance (82.65).
- Published
- 2019
- Full Text
- View/download PDF
172. Remembering winter was coming: Character-oriented video summaries of TV series
- Author
-
Raphaël Roth, Damien Malinas, Xavier Bost, Vincent Labatut, Serigne Gueye, Martha Larson, and Georges Linarès (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); ORKIS SAS; Delft Multimedia Information Retrieval Lab (DMIR), Delft University of Technology (TU Delft); Laboratoire Culture et Communication (LCC), Avignon Université (AU); supported by ANR-14-CE24-0022 "GaFes" (2014) and FR3621 Agorantic)
- Subjects
Computer Networks and Communications, Computer science, Media Technology, Narrative, Plot (narrative), Social network analysis, Information retrieval, Plot analysis, Grammar, Social network, Filmmaking, Data Science, Extractive summarization, Dynamic social network, Language & Communication, Multimedia, TV series, Language & Speech Technology
- Abstract
Today's popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive use of summaries to re-engage with the plot. Automatic generation of video summaries of TV series' complex stories requires, first, modeling the dynamics of the plot and, second, extracting relevant sequences. In this paper, we tackle plot modeling by considering the social network of interactions between the characters involved in the narrative: substantial, durable changes in a major character's social environment suggest a new development relevant to the summary. Once identified, these major stages in each character's storyline can be used as a basis for completing the summary with related sequences. Our algorithm combines such social network analysis with filmmaking grammar to automatically generate character-oriented video summaries of TV series from partially annotated data. We evaluate it with a user study in a real-world scenario: a large sample of viewers was asked to rank video summaries centered on five characters of the popular TV series Game of Thrones, a few weeks before the new, sixth season was released. Our results reveal the ability of character-oriented summaries to re-engage viewers in television series and confirm the contributions of modeling the plot content and exploiting stylistic patterns to identify salient sequences.
- Published
- 2019
- Full Text
- View/download PDF
173. Quaternion Convolutional Neural Networks For Theme Identification Of Telephone Conversations
- Author
-
Mohamed Morchid, Renato De Mori, Titouan Parcollet, and Georges Linarès (Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU) - Centre d'Enseignement et de Recherche en Informatique (CERI); McGill University, Montréal, Canada)
- Subjects
Computer science, Convolutional neural network, Task analysis, Identification (information), Feature (machine learning), Quaternion, Artificial intelligence
- Abstract
Quaternion convolutional neural networks (QCNN) are powerful architectures for learning and modeling both the external dependencies that exist between the neighboring features of an input vector and the internal latent dependencies within each feature. This paper evaluates the effectiveness of the QCNN on a realistic theme identification task over spoken telephone conversations between agents and customers from the call center of the Paris transportation system (RATP). We show that QCNNs are more suitable than real-valued CNNs for processing multidimensional data and encoding internal dependencies; real-valued CNNs deal with internal and external relations at the same level, since the components of an entity are processed independently. Experimental evidence shows that the proposed QCNN architecture always outperforms equivalent real-valued CNN models on the theme identification task of the DECODA corpus. QCNN accuracy results are also the best achieved so far on this task, while reducing the number of model parameters by a factor of 4.
- Published
- 2018
174. Audiovisual speaker diarization of TV series
- Author
-
Georges Linarès, Xavier Bost, and Serigne Gueye
- Subjects
Speaker diarisation, Modality (human-computer interaction), Computer science, Speech recognition, Intonation (linguistics), Computation and language, Multimedia
- Abstract
Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed using each modality separately; the two resulting partitions of the instance set are then optimally matched, before the remaining instances, corresponding to cases of disagreement between the two modalities, are finally processed. The results obtained by applying this multi-modal approach to fictional films outperform those obtained by relying on a single modality.
- Published
- 2018
175. Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
- Author
-
Chiheb Trabelsi, Titouan Parcollet, Yoshua Bengio, Ying Zhang, Renato De Mori, Mohamed Morchid, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Montreal Institute for Learning Algorithms [Montréal] (MILA), Centre de Recherches Mathématiques [Montréal] (CRM), and Université de Montréal (UdeM)-Université de Montréal (UdeM)
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,0209 industrial biotechnology ,Computer science ,Speech recognition ,Word error rate ,Machine Learning (stat.ML) ,TIMIT ,02 engineering and technology ,Convolutional neural network ,Computer Science - Sound ,Machine Learning (cs.LG) ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,020901 industrial engineering & automation ,Audio and Speech Processing (eess.AS) ,Statistics - Machine Learning ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,[INFO]Computer Science [cs] ,Quaternion ,Index Terms: quaternion convolutional neural networks ,Artificial neural network ,Quaternion algebra ,business.industry ,Deep learning ,deep learning ,automatic speech recognition ,Computer Science - Learning ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments on the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs., Accepted at INTERSPEECH 2018
- Published
- 2018
- Full Text
- View/download PDF
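The quaternion processing described in entry 175 above hinges on the Hamilton product, which mixes the four components of two quaternions instead of treating them independently. A minimal pure-Python sketch of that product, and of how an acoustic frame's views could be grouped into one quaternion (the variable names and values are illustrative, not taken from the paper):

```python
def hamilton(q, p):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return (r1*r2 - x1*x2 - y1*y2 - z1*z2,
            r1*x2 + x1*r2 + y1*z2 - z1*y2,
            r1*y2 - x1*z2 + y1*r2 + z1*x2,
            r1*z2 + x1*y2 - y1*x2 + z1*r2)

# Group four views of one time-frame component into a single quaternion,
# e.g. (static, first derivative, second derivative, energy) -- illustrative.
frame = (0.5, 0.1, -0.2, 0.0)

# One quaternion weight carries 4 real parameters, whereas a real-valued
# dense map between two 4-dimensional entities would need 16: this is the
# source of the factor-of-4 parameter reduction claimed in the abstracts.
weight = (0.3, -0.4, 0.2, 0.1)
activation = hamilton(weight, frame)
```

A quick sanity check is the defining identity i·j = k: `hamilton((0,1,0,0), (0,0,1,0))` returns `(0, 0, 0, 1)`.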
176. A TOPIC MODELING BASED REPRESENTATION TO DETECT TWEET LOCATIONS. EXAMPLE OF THE EVENT 'JE SUIS CHARLIE'
- Author
-
Mohamed Morchid, Richard Dufour, Eitan Altman, Yonathan Portilla, Georges Linarès, and Didier Josselin
- Subjects
Topic model ,lcsh:Applied optics. Photonics ,0209 industrial biotechnology ,Computer science ,Sample (statistics) ,02 engineering and technology ,computer.software_genre ,Latent Dirichlet allocation ,lcsh:Technology ,Set (abstract data type) ,symbols.namesake ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,Spatial analysis ,Information retrieval ,Event (computing) ,lcsh:T ,lcsh:TA1501-1820 ,lcsh:TA1-2040 ,symbols ,020201 artificial intelligence & image processing ,Data mining ,lcsh:Engineering (General). Civil engineering (General) ,computer ,Meaning (linguistics) - Abstract
Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The tweet content, meaning, and location show how an event, such as the bursty "JeSuisCharlie" event that happened in France in January 2015, is perceived in different countries. This research aims at clustering tweets according to the co-occurrence of their terms, including the country, and forecasting the probable country of a non-located tweet given its content. First, we present the process of collecting a large quantity of data from the Twitter website. We obtained a set of 2,189 located tweets about "Charlie", posted from the 7th to the 14th of January. We describe an original method adapted from the Author-Topic (AT) model, itself based on Latent Dirichlet Allocation (LDA). We define a homogeneous space containing both lexical content (words) and spatial information (country). During a training process on part of the sample, we obtain a set of clusters (topics) based on statistical relations between lexical and spatial terms. During a clustering task, we evaluate the effectiveness of the method on the rest of the sample, reaching up to 95% correct assignment. This shows that our model is able to predict tweet location after a learning process.
- Published
- 2015
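The core intuition of entry 176 above, predicting the probable country of an unlocated tweet from word/country co-occurrences learned on located tweets, can be illustrated with a much simpler count-based stand-in (naive Bayes with add-one smoothing), not the AT-LDA model itself. All data below is toy data, not the "Charlie" corpus:

```python
from collections import Counter, defaultdict
import math

# Toy located tweets: (words, country) -- purely illustrative.
tweets = [
    (["je", "suis", "charlie", "paris"], "FR"),
    (["liberte", "presse", "charlie"], "FR"),
    (["free", "speech", "charlie"], "US"),
    (["solidarity", "paris", "attack"], "US"),
]

word_counts = defaultdict(Counter)   # per-country word frequencies
country_counts = Counter()           # country priors
for words, country in tweets:
    country_counts[country] += 1
    word_counts[country].update(words)

def predict_country(words):
    """Most probable country for an unlocated tweet (add-one smoothing)."""
    vocab = {w for c in word_counts for w in word_counts[c]}
    best, best_lp = None, -math.inf
    for c in country_counts:
        lp = math.log(country_counts[c] / sum(country_counts.values()))
        total = sum(word_counts[c].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[c][w] + 1) / total)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(predict_country(["je", "suis", "charlie"]))  # → "FR" on this toy data
```

The AT model replaces these raw counts with latent topics shared between words and countries, which is what lets it generalize beyond exact word overlaps.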
177. Audio-Based Video Genre Identification
- Author
-
Georges Linarès, Stanislas Oger, Bernard Merialdo, Mickael Rouvier, Driss Matrouf, Yingbo Li, Laboratoire d'Informatique Fondamental, Université Pierre et Marie Curie - Paris 6 (UPMC), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and Eurecom [Sophia Antipolis]
- Subjects
Volunteered geographic information ,Acoustics and Ultrasonics ,Computer science ,video genre classification ,Speech recognition ,Feature extraction ,Index Terms—Automatic classification ,Word error rate ,Pragmatics ,linguistic feature extrac-tion ,Support vector machine ,Computational Mathematics ,Identification (information) ,Complementarity (molecular biology) ,Cepstrum ,Computer Science (miscellaneous) ,[INFO]Computer Science [cs] ,Electrical and Electronic Engineering - Abstract
International audience; This paper presents investigations into the automatic identification of video genre through audio channel analysis. Genre refers to editorial styles such as commercials, movies, or sports. We propose and evaluate methods based on both low- and high-level descriptors, in the cepstral or time domains, but also on an analysis of the global structure of the document and its linguistic contents. The proposed features are then combined and their complementarity is evaluated. On a database composed of single-story web videos, the best audio-only system achieves a 9% Classification Error Rate (CER). Finally, we evaluate the complementarity of the proposed audio features with video features classically used for Video Genre Identification (VGI). Results demonstrate the complementarity of the modalities for genre recognition, the final audio-video system reaching a 6% CER.
- Published
- 2015
- Full Text
- View/download PDF
178. Extraction and Analysis of Dynamic Conversational Networks from TV Series
- Author
-
Serigne Gueye, Xavier Bost, Vincent Labatut, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR & FR Agorantic, Mehmet Kaya, Jalal Kawash, Suheil Khoury, Min-Yuh Day, and ANR-14-CE24-0022,GaFes,Galeries des Festivals(2014)
- Subjects
Dynamic network analysis ,Computer science ,Open problem ,02 engineering and technology ,computer.software_genre ,[INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI] ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Narrative ,Sequence ,Social network ,Series (mathematics) ,business.industry ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,020207 software engineering ,16. Peace & justice ,Dynamic Social Network ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,TV Series ,Plot Analysis ,Artificial intelligence ,business ,Focus (optics) ,computer ,Smoothing ,Natural language processing - Abstract
International audience; Identifying and characterizing the dynamics of modern TV series subplots is an open problem. One way to address it is to study the underlying social network of interactions between the characters. Standard dynamic network extraction methods rely on temporal integration, either over the whole considered period or as a sequence of several time slices. However, they turn out to be inappropriate in the case of TV series, because the scenes shown onscreen alternately focus on parallel storylines and do not necessarily respect a traditional chronology. In this article, we introduce narrative smoothing, a novel network extraction method that takes advantage of the plot properties to overcome some of these limitations. We apply our method to a corpus of three popular series and compare it to both standard approaches. Narrative smoothing leads to more relevant observations when it comes to characterizing the protagonists and their relationships, confirming its appropriateness for modeling the intertwined storylines constituting the plots.
- Published
- 2018
- Full Text
- View/download PDF
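Entry 178 above contrasts narrative smoothing with standard temporal-integration baselines. The time-slice baseline it is compared against can be sketched in a few lines: accumulate per-slice edge weights from scene-level interactions. The scene tuples below are illustrative, not from the corpus:

```python
from collections import defaultdict

# Scene-level interactions: (time, speaker_a, speaker_b) -- toy data.
scenes = [(0, "A", "B"), (2, "A", "C"), (5, "B", "C"), (6, "A", "B"), (7, "A", "B")]

def time_slice_network(scenes, slice_len):
    """Standard time-slice baseline: integrate interactions over fixed
    windows of length slice_len, yielding one weighted graph per slice."""
    slices = defaultdict(lambda: defaultdict(int))
    for t, u, v in scenes:
        # Undirected edge keyed by the unordered speaker pair.
        slices[t // slice_len][frozenset((u, v))] += 1
    return slices

net = time_slice_network(scenes, 5)
# net[0] covers t in [0, 5): edges A-B (weight 1) and A-C (weight 1).
```

Narrative smoothing departs from this by letting an edge's weight evolve with the story's own rhythm rather than with fixed clock windows, which is exactly the limitation the abstract points at when storylines alternate on screen.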
179. Integration of Word and Semantic Features for Theme Identification in Telephone Conversations.
- Author
-
Yannick Estève, Mohamed Bouallegue, Carole Lailler, Mohamed Morchid, Richard Dufour, Georges Linarès, Driss Matrouf, and Renato De Mori
- Published
- 2015
- Full Text
- View/download PDF
180. Deep quaternion neural networks for spoken language understanding
- Author
-
Mohamed Morchid, Titouan Parcollet, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and Parcollet, Titouan
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Artificial neural network ,Computer science ,business.industry ,Deep learning ,deep learning ,020206 networking & telecommunications ,02 engineering and technology ,Construct (python library) ,[INFO] Computer Science [cs] ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Identification (information) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,[INFO]Computer Science [cs] ,Artificial intelligence ,Quaternion ,business ,Subspace topology ,Spoken language ,Abstraction (linguistics) - Abstract
International audience; The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these toolkits, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly either locally or on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
- Published
- 2017
181. Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations
- Author
-
Mohamed Morchid, Georges Linarès, Titouan Parcollet, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Computer science ,Speech recognition ,Index Terms: Spoken language understanding ,020206 networking & telecommunications ,02 engineering and technology ,Overfitting ,Perceptron ,Autoencoder ,Small set ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Denoising encoder-decoder neural networks ,Identification (information) ,symbols.namesake ,Gaussian noise ,Quaternion algebra ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,[INFO]Computer Science [cs] ,020201 artificial intelligence & image processing ,Quaternion ,Neural networks ,Subspace topology - Abstract
International audience; In recent decades, encoder-decoders, or autoencoders (AE), have received great interest from researchers due to their ability to construct robust representations of documents in a low-dimensional subspace. Nonetheless, autoencoders reveal little in the way of a spoken document's internal structure, since they consider the words or topics contained in the document only as isolated basic elements, and they tend to overfit on small corpora of documents. Quaternion Multilayer Perceptrons (QMLP) have therefore been introduced to capture such internal latent dependencies, whereas denoising autoencoders (DAE) employ different stochastic noises to better process small sets of documents. This paper presents a novel autoencoder, called the Quaternion Denoising Encoder-Decoder (QDAE), that builds on both the hitherto-proposed DAE (to manage small corpora) and the QMLP (to consider internal latent structures). Moreover, the paper defines an original angular Gaussian noise adapted to the specificity of hyper-complex algebra. Experiments, conducted on a theme identification task of spoken dialogues from the DECODA framework, show that the QDAE obtains promising gains of 3% and 1.5% compared to the standard real-valued denoising autoencoder and the QMLP respectively.
- Published
- 2017
- Full Text
- View/download PDF
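The real-valued denoising autoencoder that the QDAE of entry 181 is compared against can be sketched in plain NumPy: corrupt the input with Gaussian noise, then train a tied-weight encoder-decoder to reconstruct the clean input. Sizes, learning rate, and data below are illustrative toy choices, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 8))                      # 32 toy "documents", 8 features

W = rng.normal(0.0, 0.1, (8, 4))             # tied encoder/decoder weights
b, c = np.zeros(4), np.zeros(8)
lr = 0.05

def forward(Xn):
    H = np.tanh(Xn @ W + b)                  # encode the (possibly noisy) input
    return H, H @ W.T + c                    # decode toward the clean input

_, X0 = forward(X)
mse_before = ((X0 - X) ** 2).mean()

for _ in range(500):
    Xn = X + rng.normal(0.0, 0.1, X.shape)   # fresh Gaussian corruption
    H, Xr = forward(Xn)
    err = Xr - X                             # reconstruction error vs CLEAN X
    dH = (err @ W) * (1.0 - H ** 2)          # backprop through tanh
    W -= lr * (Xn.T @ dH + err.T @ H) / len(X)   # tied-weight gradient
    b -= lr * dH.mean(axis=0)
    c -= lr * err.mean(axis=0)

_, X1 = forward(X)
mse_after = ((X1 - X) ** 2).mean()           # lower than mse_before
```

The QDAE keeps this training scheme but replaces the real-valued algebra with quaternion operations and the isotropic noise with the paper's angular Gaussian noise.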
182. Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition
- Author
-
Georges Linarès, Dominique Fohr, Irina Illina, Imran Sheikh, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), Grid'5000, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Topic model ,Vocabulary ,Acoustics and Ultrasonics ,Computer science ,media_common.quotation_subject ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Context (language use) ,02 engineering and technology ,010501 environmental sciences ,Semantics ,computer.software_genre ,01 natural sciences ,Latent Dirichlet allocation ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,symbols.namesake ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Electrical and Electronic Engineering ,out-of-vocabulary ,proper names ,0105 earth and related environmental sciences ,media_common ,Context model ,business.industry ,Computational Mathematics ,large vocabulary continuous speech recognition ,Automatic indexing ,semantic context ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,Language model ,business ,computer ,Natural language processing - Abstract
International audience; The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations, and the Neural Bag-of-Words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a Neural Bag-of-Weighted-Words (NBOW2) model which learns to assign higher weights to words that are important for the retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
- Published
- 2017
- Full Text
- View/download PDF
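The difference between the NBOW and NBOW2 models of entry 182 comes down to how word vectors are averaged: uniformly (NBOW) versus weighted by a learned per-word importance (NBOW2). A toy sketch of the two averaging schemes; the vectors and weights below are hand-picked for illustration, not trained:

```python
# Illustrative 2-d word embeddings and per-word importance weights.
emb = {
    "station": [0.9, 0.1], "metro": [0.8, 0.2],
    "the":     [0.1, 0.1], "of":    [0.1, 0.0],
}
importance = {"station": 0.9, "metro": 0.8, "the": 0.1, "of": 0.1}

def nbow(words):
    """NBOW: uniform average of the word vectors."""
    vs = [emb[w] for w in words]
    return [sum(col) / len(vs) for col in zip(*vs)]

def nbow2(words):
    """NBOW2: importance-weighted average, so content words dominate."""
    ws = [importance[w] for w in words]
    vs = [[wi * x for x in emb[w]] for wi, w in zip(ws, words)]
    return [sum(col) / sum(ws) for col in zip(*vs)]

doc = ["the", "metro", "station", "of"]
# nbow2(doc) sits much closer to the content words than nbow(doc) does,
# which is why learned weights help retrieve topically relevant OOV names.
```

In the paper the importance weights are learned jointly with the retrieval task rather than fixed by hand as here.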
183. Graph-based Features for Automatic Online Abuse Detection
- Author
-
Richard Dufour, Georges Linarès, Vincent Labatut, Etienne Papegnies, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Provence Alpes Côte d'Azur Nectar de Code, Nathalie Camelin, Yannick Estève, and Carlos Martín-Vide
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,moderation ,Information retrieval ,Computer science ,Graph based ,Computer Science - Social and Information Networks ,02 engineering and technology ,Moderation ,[INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI] ,Graph ,Abuse detection ,Computer Science - Information Retrieval ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,Online communities ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,020204 information systems ,Obfuscation ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Text categorization ,Information Retrieval (cs.IR) - Abstract
International audience; While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually. Automating this task is an important step in reducing the financial cost associated with moderation, but the majority of automated approaches strictly based on message content are highly vulnerable to intentional obfuscation. In this paper, we discuss methods for extracting conversational networks from raw multi-participant chat logs, and we study the contribution of graph features to a classification system that aims to determine whether a given message is abusive. The conversational graph-based system yields unexpectedly high performance, with results comparable to those previously obtained with a content-based approach.
- Published
- 2017
- Full Text
- View/download PDF
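Entry 183 above extracts a conversational network from raw chat logs and feeds graph features to an abuse classifier. A minimal sketch of that pipeline: link each author to the authors of the few preceding messages, then compute a simple graph feature per user. The proximity-window heuristic and the toy log are illustrative assumptions, not the paper's exact extraction rule:

```python
from collections import defaultdict

# Toy chat log: (author, text) in posting order -- illustrative only.
log = [("alice", "hi"), ("bob", "hello"), ("carol", "hey"),
       ("troll", "spam"), ("alice", "ignore him"), ("bob", "agreed")]

def conversation_graph(log, window=2):
    """Weighted undirected graph: an edge links an author to the authors
    of the `window` preceding messages (a simple proximity heuristic)."""
    edges = defaultdict(int)
    for i, (author, _) in enumerate(log):
        for prev_author, _ in log[max(0, i - window):i]:
            if prev_author != author:
                edges[frozenset((author, prev_author))] += 1
    return edges

def degree(edges, node):
    """Weighted degree: one basic graph feature fed to the classifier."""
    return sum(w for e, w in edges.items() if node in e)

g = conversation_graph(log)
```

In the paper such per-node features (degree, centrality, and so on), computed around the message under review, are what the classifier consumes instead of, or alongside, the message text itself.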
184. Improving multi-stream classification by mapping sequence-embedding in a high dimensional space
- Author
-
Mohamed Morchid, Mohamed Bouaziz, Richard Dufour, Georges Linarès, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), Université de Bourgogne (UB), Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Sequence ,Artificial neural network ,Computer science ,business.industry ,Word error rate ,Context (language use) ,Pattern recognition ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Support vector machine ,Recurrent neural network ,Hyperplane ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Embedding ,Artificial intelligence ,business ,ComputingMilieux_MISCELLANEOUS ,0105 earth and related environmental sciences - Abstract
Most Natural and Spoken Language Processing tasks now employ Neural Networks (NN), allowing them to reach impressive performance. Embedding features allow NLP systems to represent input vectors in a latent space and to improve the observed performance. In this context, Recurrent Neural Network (RNN) based architectures such as Long Short-Term Memory (LSTM) are well known for their capacity to encode sequential data into a non-sequential hidden vector representation, called a sequence embedding. In this paper, we propose an LSTM-based multi-stream sequence embedding that encodes parallel sequences into a single non-sequential latent representation vector. We then propose to map this embedding representation into a high-dimensional space using a Support Vector Machine (SVM), in order to classify the multi-stream sequences by finding an optimal separating hyperplane. The multi-stream sequence embedding allowed the SVM classifier to profit more efficiently from the information carried by both parallel streams and longer sequences. The system achieved the best performance on a multi-stream sequence classification task, with a gain of 9 points in error rate compared to an SVM trained on the original input sequences.
- Published
- 2016
- Full Text
- View/download PDF
185. Quaternion Neural Networks for Spoken Language Understanding
- Author
-
Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori, Pierre-Michel Bousquet, Titouan Parcollet, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, and Parcollet, Titouan
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Document Structure Description ,Computer science ,02 engineering and technology ,[INFO] Computer Science [cs] ,computer.software_genre ,Machine learning ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Quaternion ,Representation (mathematics) ,ComputingMilieux_MISCELLANEOUS ,Artificial neural network ,Quaternion algebra ,business.industry ,Deep learning ,Multilayer perceptron ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,computer ,Natural language processing ,Spoken language - Abstract
Machine Learning (ML) techniques have allowed great performance improvements on different challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), or Multilayer Perceptrons (MLP), have recently received great interest from researchers due to their capability to represent complex internal structures in a low-dimensional subspace. However, MLPs employ document representations based on basic word-level or topic-based features. Such basic representations reveal little in the way of document statistical structure, since they consider the words or topics contained in the document only as a "bag-of-words", ignoring the relations between them. We propose to remedy this weakness by extending the complex features based on quaternion algebra presented in [1] to neural networks, called QMLP. This original QMLP approach relies on hyper-complex algebra to take feature dependencies within documents into consideration. New document features, based on the document structure itself and used as input to the QMLP, are also investigated in this paper, in comparison to those initially proposed in [1]. Experiments on an SLU task from a real framework of human spoken dialogues showed that our QMLP approach, associated with the proposed document features, outperforms other approaches, with an accuracy gain of 2% with respect to the MLP based on real numbers and more than 3% with respect to the first quaternion-based features proposed in [1]. We finally demonstrate that fewer iterations are needed for our QMLP architecture to be efficient and to reach promising accuracies.
- Published
- 2016
186. Parallel Long Short-Term Memory for multi-stream classification
- Author
-
Richard Dufour, Mohamed Morchid, Georges Linarès, Renato De Mori, Mohamed Bouaziz, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), Université de Bourgogne (UB), Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
FOS: Computer and information sciences ,Sequence ,Computer Science - Computation and Language ,business.industry ,Computer science ,Speech recognition ,Process (computing) ,Pattern recognition ,Context (language use) ,02 engineering and technology ,Multi stream ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Machine Learning (cs.LG) ,Task (computing) ,Computer Science - Learning ,Recurrent neural network ,020204 information systems ,Logic gate ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Hidden Markov model ,Computation and Language (cs.CL) ,ComputingMilieux_MISCELLANEOUS - Abstract
Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow such DNN-based models, namely Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks, to manage long-term dependencies. Nevertheless, these RNNs process a single input stream in one (LSTM) or two (bidirectional LSTM) directions. Yet much of the information available nowadays comes from multiple streams or multimedia documents, and requires RNNs to process these streams synchronously during training. This paper presents an original LSTM-based architecture, named Parallel LSTM (PLSTM), that processes multiple parallel synchronized input sequences in order to predict a common output. The proposed PLSTM method can be used for parallel sequence classification purposes. The PLSTM approach is evaluated on an automatic telecast genre sequence classification task and compared with different state-of-the-art architectures. Results show that the proposed PLSTM method outperforms the baseline n-gram models as well as the state-of-the-art LSTM approach., Comment: 2016 IEEE Workshop on Spoken Language Technology
- Published
- 2016
- Full Text
- View/download PDF
187. Spoken Language Understanding in a Latent Topic-based Subspace
- Author
-
Killian Janod, Richard Dufour, Mohamed Morchid, Pierre-Michel Bousquet, Georges Linarès, Waad Ben Kheder, Mohamed Bouaziz, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Département de Recherche en Ingéniérie des Véhicules pour l'Environnement (DRIVE), and Université de Bourgogne (UB)
- Subjects
Computer science ,author-topic model ,factor analysis ,02 engineering and technology ,computer.software_genre ,Latent Dirichlet allocation ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,document clustering ,symbols.namesake ,Transcription (linguistics) ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,c-vector ,business.industry ,Document classification ,020206 networking & telecommunications ,Document clustering ,ComputingMethodologies_PATTERNRECOGNITION ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Subspace topology ,Spoken language - Abstract
International audience; The performance of spoken language understanding applications declines when spoken documents are automatically transcribed in noisy conditions, due to high Word Error Rates (WER). To improve robustness to transcription errors, recent solutions propose to map these automatic transcriptions into a latent space. These studies have compared classical topic-based representations such as Latent Dirichlet Allocation (LDA), supervised LDA, and author-topic (AT) models. An original compact representation, called the c-vector, has recently been introduced to work around the tricky choice of the number of latent topics in these topic-based representations. Moreover, c-vectors increase the robustness of document classification with respect to transcription errors by compacting different LDA representations of a same speech document in a reduced space, thereby compensating for most of the noise in the document representation. The main drawback of this method is the number of sub-tasks needed to build the c-vector space. This paper proposes both to improve this compact representation (c-vector) of spoken documents and to reduce the number of needed sub-tasks, using an original framework built on a robust low-dimensional space of features from a set of AT models, called the Latent Topic-based Subspace (LTS). In comparison to LDA, the AT model considers not only the dialogue content (words), but also the class related to the document. Experiments are conducted on the DECODA corpus, containing speech conversations from the call center of the RATP Paris transportation company. Results show that the original LTS representation outperforms the best previous compact representation (c-vector), with a substantial gain of more than 2.5% in terms of correctly labeled conversations.
- Published
- 2016
- Full Text
- View/download PDF
188. Deep Stacked Autoencoders for Spoken Language Understanding
- Author
-
Mohamed Morchid, Killian Janod, Richard Dufour, Renato De Mori, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
030507 speech-language pathology & audiology ,03 medical and health sciences ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,0305 other medical science ,Linguistics ,ComputingMilieux_MISCELLANEOUS ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Spoken language - Abstract
International audience
- Published
- 2016
- Full Text
- View/download PDF
189. Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
- Author
-
Irina Illina, Imran Sheikh, Georges Linarès, Dominique Fohr, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la 
Recherche Scientifique (CNRS)-Université de Lorraine (UL)
- Subjects
Computer science ,Speech recognition ,Process (computing) ,oov ,Context (language use) ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Task (project management) ,Bag-of-words model ,0202 electrical engineering, electronic engineering, information engineering ,lvcsr ,Embedding ,Proper noun ,020201 artificial intelligence & image processing ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Layer (object-oriented design) ,proper names ,0105 earth and related environmental sciences - Abstract
International audience; Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To enable recovery of the PNs missed by the system, relevant OOV PNs can be retrieved by exploiting the semantic context of the spoken content. In this paper, we explore the Neural Bag-of-Words (NBOW) model, proposed previously for text classification, to retrieve relevant OOV PNs. We propose a Neural Bag-of-Weighted-Words (NBOW2) model in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and is able to capture (task-specific) keywords in a NBOW model. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform earlier methods based on raw embeddings from LDA and Skip-gram. Combining NBOW with NBOW2 gives faster convergence during training.
- Published
- 2016
- Full Text
- View/download PDF
190. Learning Word Importance with the Neural Bag-of-Words Model
- Author
-
Imran Sheikh, Georges Linarès, Dominique Fohr, Irina Illina, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Fohr, Dominique, and BLANC - Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio - - ContNomina2012 - ANR-12-BS02-0009 - BLANC - VALID
- Subjects
business.industry ,Computer science ,Speech recognition ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,Task (project management) ,Bag-of-words model ,Classifier (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,business ,computer ,Word (computer architecture) ,Natural language processing ,0105 earth and related environmental sciences - Abstract
The Neural Bag-of-Words (NBOW) model performs classification with an average of the input word vectors and achieves impressive performance. While the NBOW model learns word vectors targeted to the classification task, it does not explicitly model which words are important for a given task. In this paper we propose an improved NBOW model with the ability to learn task-specific word importance weights. The word importance weights are learned by introducing a new weighted-sum composition of the word vectors. With experiments on standard topic and sentiment classification tasks, we show that (a) our proposed model learns meaningful word importance for a given task, and (b) our model gives the best accuracies among the BOW approaches. We also show that the learned word importance weights are comparable to tf-idf based word weights when used as features in a BOW-SVM classifier.
- Published
- 2016
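The weighted-sum composition described in the two NBOW abstracts above can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax normalization of per-word importance scores and all array names (`E`, `a`, `nbow2_compose`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 100, 16
E = rng.normal(size=(vocab_size, dim))  # word embedding matrix (learned in the real model)
a = rng.normal(size=vocab_size)         # scalar importance score per word (learned)

def nbow2_compose(word_ids):
    """Weighted-sum composition: normalize the importance scores of the
    document's words (softmax, an assumption here) and apply the resulting
    weights to the word embeddings, instead of a plain average."""
    scores = a[word_ids]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ E[word_ids]        # document vector, shape (dim,)

doc = [3, 17, 42, 42, 7]
vec = nbow2_compose(doc)
assert vec.shape == (dim,)
```

In the plain NBOW model the weights would simply be `1 / len(word_ids)`; learning them instead is what lets the model emphasize task-specific keywords.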
191. Document level semantic context for retrieving OOV proper names
- Author
-
Irina Illina, Georges Linarès, Dominique Fohr, Imran Sheikh, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Context (language use) ,02 engineering and technology ,computer.software_genre ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Document level ,Phonetic search technology ,0202 electrical engineering, electronic engineering, information engineering ,Semantic context ,Proper noun ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,proper names ,Information retrieval ,Training set ,business.industry ,Search engine indexing ,OOV ,semantic ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,computer ,Natural language processing ,indexing - Abstract
International audience; Recognition of Proper Names (PNs) in speech is important for content-based indexing and browsing of audio-video data. However, many PNs are Out-Of-Vocabulary (OOV) words for the LVCSR systems used in these applications, due to the diachronic nature of the data. By exploiting the semantic context of the audio, relevant OOV PNs can be retrieved and then the target PNs can be recovered. To retrieve OOV PNs, we propose to represent their context with document-level semantic vectors, and show that this approach is able to handle OOV PNs that are less frequent in the training data. We study different representations, including Random Projections, LSA, LDA, Skip-gram, CBOW and GloVe. A further evaluation of the recovery of target OOV PNs using a phonetic search shows that document-level semantic context is reliable for recovery of OOV PNs.
- Published
- 2016
- Full Text
- View/download PDF
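The retrieval step described in the abstract above (matching a document-level semantic vector against candidate OOV proper-name contexts) can be sketched as a cosine-similarity ranking. Everything here is illustrative: the vectors are random stand-ins for real LSA/LDA/Skip-gram representations, and the names and helper functions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 32

# Hypothetical document-level semantic vector for the spoken document, and
# one context vector per candidate OOV proper name (e.g. averaged from
# training documents mentioning that name).
doc_vec = rng.normal(size=dim)
pn_context = {name: rng.normal(size=dim) for name in ["Hollande", "Merkel", "Obama"]}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_oov_pns(doc_vec, pn_context):
    """Rank candidate OOV proper names by the similarity of their
    semantic context vector to the document vector."""
    return sorted(pn_context, key=lambda n: cosine(doc_vec, pn_context[n]), reverse=True)

ranking = rank_oov_pns(doc_vec, pn_context)
assert len(ranking) == 3
```

In the papers, the top-ranked names would then be added to the recognizer's lexicon or checked against the audio with a phonetic search.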
192. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval
- Author
-
Irina Illina, Georges Linarès, Dominique Fohr, Imane Nkairi, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis, and ANR-12-BS02-0009,ContNomina,Exploitation du contexte pour la reconnaissance de noms propres dans les documents diachroniques audio(2012)
- Subjects
Vocabulary ,Out-of-vocabulary words ,Computer science ,media_common.quotation_subject ,Word error rate ,Context (language use) ,02 engineering and technology ,Proper names ,Speech recognition ,computer.software_genre ,Out of vocabulary ,Task (project management) ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,Proper noun ,[INFO]Computer Science [cs] ,media_common ,business.industry ,020206 networking & telecommunications ,Key (cryptography) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Vocabulary augmentation - Abstract
International audience; Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the automatic speech recognition system vocabulary using lexical and temporal features of diachronic documents. We also study different metrics for proper name selection in order to limit the vocabulary augmentation and therefore its impact on ASR performance. Recognition results show a significant reduction of the proper name error rate using an augmented vocabulary.
- Published
- 2016
- Full Text
- View/download PDF
193. Integrating imperfect transcripts into speech recognition systems for building high-quality corpora
- Author
-
Georges Linarès, Benjamin Lecouteux, Stanislas Oger, Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Computer science ,Speech recognition ,02 engineering and technology ,acoustic model training ,computer.software_genre ,text-to-speech alignment ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Theoretical Computer Science ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Search algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Voice activity detection ,business.industry ,Acoustic model ,Speech corpus ,Speech processing ,Human-Computer Interaction ,Scripting language ,020201 artificial intelligence & image processing ,Artificial intelligence ,Transcription (software) ,0305 other medical science ,business ,Error detection and correction ,computer ,Software ,Natural language processing - Abstract
International audience; The training of state-of-the-art automatic speech recognition (ASR) systems requires huge relevant training corpora. The cost of such databases is high and remains a major limitation for the development of speech-enabled applications in particular contexts (e.g. low-density languages or specialized domains). On the other hand, a large amount of data can be found in news prompts, movie subtitles, scripts, etc. The use of such data as training corpora could provide a low-cost solution to the acoustic model estimation problem. Unfortunately, prior transcripts are seldom exact with respect to the content of the speech signal, and suffer from a lack of temporal information. This paper tackles the issue of improving prompt-based speech corpora by addressing the problems mentioned above. We propose a method for locating accurate transcript segments in speech signals and automatically correcting errors or missing transcript portions surrounding these segments. This method relies on a new decoding strategy in which the search algorithm is driven by the imperfect transcription of the input utterances. The experiments are conducted on French, using the ESTER database and a set of recordings (and associated prompts) from RTBF (Radio Télévision Belge Francophone). The results demonstrate the effectiveness of the proposed approach, in terms of both error correction and text-to-speech alignment.
- Published
- 2012
- Full Text
- View/download PDF
194. Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification
- Author
-
Georges Linarès, Florian Verdet, Mickael Rouvier, Driss Matrouf, and Jean-François Bonastre
- Subjects
business.industry ,Computer science ,Speech recognition ,Context (language use) ,Pattern recognition ,computer.software_genre ,Speaker recognition ,Theoretical Computer Science ,Human-Computer Interaction ,Support vector machine ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Feature (machine learning) ,Artificial intelligence ,Computational linguistics ,Audio signal processing ,business ,computer ,Software - Abstract
Audio pattern classification is a particular statistical classification task and includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, recently, video genre classification. The features used in all these tasks are generally based on a short-term cepstral representation. The cepstral vectors contain both useful information and nuisance variability, which are difficult to separate in this domain. Recently, in the context of GMM-based recognizers, a novel approach using a Factor Analysis (FA) paradigm has been proposed for decomposing the target model into a useful information component and a session variability component. This approach is called Joint Factor Analysis (JFA), since it jointly models the nuisance variability and the useful information using the FA statistical method. The JFA approach has even been combined with Support Vector Machines, known for their discriminative power. In this article, we successfully apply this paradigm to three automatic audio processing applications: speaker verification, language recognition and video genre classification. This is done by applying the same process and using the same free software toolkit. We show that this approach allows a relative error reduction of over 50% in all the aforementioned audio processing tasks.
- Published
- 2011
- Full Text
- View/download PDF
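The decomposition mentioned in the abstract above can be sketched numerically. The abstract does not spell out the model, so this assumes the commonly used JFA formulation of a session-dependent GMM mean supervector, M = m + Vy + Ux, where V spans the useful (speaker) subspace and U the nuisance (session) subspace; all matrices and factors here are random stand-ins, not estimated quantities.

```python
import numpy as np

rng = np.random.default_rng(2)

sv_dim, r_v, r_u = 512, 10, 5        # supervector dim, eigenvoice / eigenchannel ranks
m = rng.normal(size=sv_dim)          # UBM mean supervector
V = rng.normal(size=(sv_dim, r_v))   # eigenvoice matrix (useful information subspace)
U = rng.normal(size=(sv_dim, r_u))   # eigenchannel matrix (nuisance/session subspace)

y = rng.normal(size=r_v)             # speaker factors
x = rng.normal(size=r_u)             # session factors

# Session-dependent supervector under the assumed JFA model:
M = m + V @ y + U @ x

# Session compensation: subtract the estimated nuisance component,
# leaving only the useful-information part of the model.
M_compensated = M - U @ x
assert np.allclose(M_compensated, m + V @ y)
```

In practice the factors y and x are jointly estimated from data rather than known, which is the hard part of the method; this sketch only shows the algebra of the compensation step.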
195. Query-Driven Strategy for On-the-Fly Term Spotting in Spontaneous Speech
- Author
-
Georges Linarès, Mickael Rouvier, Benjamin Lecouteux, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Audio mining ,Voice activity detection ,Acoustics and Ultrasonics ,Computer science ,business.industry ,Speech recognition ,Search engine indexing ,lcsh:QC221-246 ,Speech corpus ,Spotting ,Speech processing ,computer.software_genre ,lcsh:QA75.5-76.95 ,lcsh:Acoustics. Sound ,[INFO]Computer Science [cs] ,Speech analytics ,lcsh:Electronic computers. Computer science ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer ,Utterance ,Natural language processing - Abstract
International audience; Spoken utterance retrieval has been widely studied in recent decades, with the purpose of indexing large audio databases or detecting keywords in continuous speech streams. While the indexing of closed corpora can be performed via a batch process, on-line spotting systems have to detect the targeted spoken utterances synchronously. We propose a two-level architecture for on-the-fly term spotting. The first level performs a fast detection of the speech segments that probably contain the targeted utterance. The second level refines the detection on the selected segments, using a speech recognizer based on a query-driven decoding algorithm. Experiments are conducted on both broadcast and spontaneous speech corpora. We investigate the impact of the spontaneity level on system performance. Results show that our method remains effective even when recognition rates are significantly degraded by disfluencies.
- Published
- 2010
- Full Text
- View/download PDF
196. Predicting popularity dynamics of online contents using data filtering methods
- Author
-
Georges Linarès, Rachid El-Azouzi, Cedric Richier, Tania Jimenez, Eitan Altman, Jimenez, Tania, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Models for the performance analysis and the control of networks (MAESTRO), Inria Sophia Antipolis - Méditerranée (CRISAM), and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
[INFO.INFO-MM] Computer Science [cs]/Multimedia [cs.MM] ,Exploit ,Computer science ,Process (engineering) ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,Baseline model ,020206 networking & telecommunications ,02 engineering and technology ,[INFO] Computer Science [cs] ,computer.software_genre ,Popularity ,[INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation ,Data filtering ,Dynamics (music) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,[INFO]Computer Science [cs] ,Data mining ,[INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation ,computer ,ComputingMilieux_MISCELLANEOUS - Abstract
This paper proposes a new prediction process to explain and predict the popularity evolution of YouTube videos. We exploit a prior study on the classification of YouTube videos in order to predict the evolution of videos' view-count. This classification makes it possible to identify important factors of the observed popularity dynamics. In particular, we use this classification as a filtering method to identify the factors responsible for this popularity evolution. Results from extensive experiments show that the proposed prediction process reduces the average prediction error compared to a state-of-the-art baseline model. We also evaluate the impact of adding popularity criteria to the classification.
- Published
- 2016
197. An Author-Topic based Approach to Cluster Tweets and Mine their Location
- Author
-
Richard Dufour, Didier Josselin, Georges Linarès, Yonathan Portilla, Mohamed Morchid, Jean-Valère Cossu, Alexandre Reiffers-Masson, Marc El-Bèze, Eitan Altman, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Models for the performance analysis and the control of networks (MAESTRO), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Études des Structures, des Processus d’Adaptation et des Changements de l’Espace (ESPACE), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Avignon Université (AU)-Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Université Nice Sophia Antipolis (1965 - 2019) (UNS), and Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU)
- Subjects
0209 industrial biotechnology ,Keywords: Author-Topic model ,Computer science ,Twitter ,Sample (statistics) ,02 engineering and technology ,computer.software_genre ,Latent Dirichlet allocation ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Task (project management) ,Set (abstract data type) ,symbols.namesake ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Author-Topic model ,Tweet location ,Cluster analysis ,Spatial analysis ,ComputingMilieux_MISCELLANEOUS ,General Environmental Science ,Tweets location ,Information retrieval ,Process (computing) ,[SHS.GEO]Humanities and Social Sciences/Geography ,Topic modeling ,Author topic model ,symbols ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,Data mining ,computer ,Meaning (linguistics) - Abstract
Presented as a poster at the Spatial Statistics Conference 2015, Avignon, France, June 2015; International audience; Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The tweet content, meaning and location show how an event, such as the bursty "JeSuisCharlie" event that happened in France in January 2015, is comprehended in different countries. This research aims at clustering tweets according to the co-occurrence of their terms, including the country, and forecasting the probable country of a non-located tweet, knowing its content. First, we present the process of collecting a large quantity of data from the Twitter website. We finally obtain a set of 2,189 located tweets about "Charlie", from the 7th to the 14th of January. We describe an original method adapted from the Author-Topic (AT) model, based on the Latent Dirichlet Allocation (LDA) method. We define a homogeneous space containing both lexical content (words) and spatial information (country). During a training process on part of the sample, we derive a set of clusters (topics) based on statistical relations between lexical and spatial terms. During a clustering task, we evaluate the method's effectiveness on the rest of the sample, reaching up to 95% of correct assignment.
- Published
- 2015
- Full Text
- View/download PDF
198. Author-topic based representation of call-center conversations
- Author
-
Richard Dufour, Mohamed Morchid, Mohamed Bouallegue, Georges Linarès, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Process (engineering) ,Computer science ,Human/human con-versation ,Context (language use) ,02 engineering and technology ,Space (commercial competition) ,Speech recognition ,computer.software_genre ,01 natural sciences ,Latent Dirichlet allocation ,Task (project management) ,010104 statistics & probability ,symbols.namesake ,Transcription (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,Speech analytics ,[INFO]Computer Science [cs] ,Latent Dirichlet Allocation ,0101 mathematics ,business.industry ,Representation (systemics) ,Index Terms— Author-Topic model ,Classification ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
International audience; Performance of Automatic Speech Recognition (ASR) systems drops dramatically when transcribing conversations recorded in noisy conditions. Speech analytics suffer from this poor automatic transcription quality. To tackle this difficulty, a solution consists in mapping transcriptions into a space of hidden topics. This abstract representation makes it possible to compensate for the drawbacks of the ASR process. The well-known and commonly used representation is the topic-based one obtained from Latent Dirichlet Allocation (LDA). Several studies demonstrate the effectiveness and reliability of this high-level representation. During the LDA learning process, the distribution of words over each topic is estimated automatically. Nonetheless, in the context of a classification task, no consideration is made of the targeted classes. Thus, if the targeted application is to find the main theme of a dialogue, this information should be taken into account. In this paper, we propose to compare a classical topic-based representation of a dialogue with a new one based not only on the dialogue content itself (words), but also on the theme related to the dialogue. This original representation is based on the author-topic (AT) model. The effectiveness of the proposed representation is evaluated on a classification task using automatic transcriptions of dialogues between an agent and a customer of the Paris Transportation Company. Experiments confirm that this author-topic model approach outperforms by far the classical topic representation, with a substantial gain of more than 7% in terms of correctly labeled conversations.
- Published
- 2014
- Full Text
- View/download PDF
199. Feature selection using Principal Component Analysis for massive retweet detection
- Author
-
Richard Dufour, Pierre-Michel Bousquet, Mohamed Morchid, Georges Linarès, Juan-Manuel Torres-Moreno, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
Computer science ,020207 software engineering ,Context (language use) ,Feature selection ,02 engineering and technology ,computer.software_genre ,Popularity ,Set (abstract data type) ,Artificial Intelligence ,Signal Processing ,Principal component analysis ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,020201 artificial intelligence & image processing ,[INFO]Computer Science [cs] ,Computer Vision and Pattern Recognition ,Data mining ,computer ,Software - Abstract
International audience; Social networks have become a major actor in massive information propagation. In the context of the Twitter platform, its popularity is due in part to the capability of relaying messages (i.e. tweets) posted by users. This particular mechanism, called retweet, allows users to massively share tweets they consider potentially interesting for others. In this paper, we propose to study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. Finally, we propose to automatically detect massively retweeted messages. The qualitative study is used to select the features allowing the best classification performance. We show that selecting only the most correlated features leads to the best classification accuracy (F-measure of 65.7%), a gain of about 2.4 points compared to using the complete set of features.
- Published
- 2014
- Full Text
- View/download PDF
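The PCA-based feature analysis described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the feature matrix is synthetic, and ranking features by their absolute loading on the first principal component is one simple way to read "most correlated features"; the paper's exact selection criterion is not given in this record.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical tweet feature matrix: rows = tweets, columns = features
# (e.g. follower count, hashtag count, tweet length, ...).
X = rng.normal(size=(200, 8))
X[:, 0] += 3 * X[:, 1]            # make two features strongly correlated

def pca_loadings(X):
    """Principal axes of the centered feature matrix, via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt                     # rows are principal axes in feature space

def top_features(X, n):
    """Select the n features with the largest absolute loading
    on the first principal component."""
    loadings = np.abs(pca_loadings(X)[0])
    return list(np.argsort(loadings)[::-1][:n])

selected = top_features(X, 3)
assert len(selected) == 3
```

On this synthetic data the high-variance, correlated features dominate the first component, so they are the ones selected; the classifier would then be trained on those columns only.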
200. Theme identification in human-human conversations with features from specific speaker type hidden spaces
- Author
-
Richard Dufour, Mohamed Bouallegue, Georges Linarès, Mohamed Morchid, Renato De Mori, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, McGill University = Université McGill [Montréal, Canada], and Déposants HAL-Avignon, bibliothèque Universitaire
- Subjects
LDA ,Computer science ,Gaussian ,human/human telephone conversation analysis ,02 engineering and technology ,[INFO] Computer Science [cs] ,Type (model theory) ,computer.software_genre ,Machine learning ,Set (abstract data type) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Simple (abstract algebra) ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Estimation theory ,business.industry ,Index Terms: Spoken language understanding ,020206 networking & telecommunications ,topic identification ,Identification (information) ,symbols ,Artificial intelligence ,0305 other medical science ,business ,computer ,Theme (computing) ,Natural language processing ,Word (computer architecture) - Abstract
International audience; This paper describes research on topic identification in real-world customer service telephone conversations between an agent and a customer. Separate hidden spaces are considered for agents, for customers, and for the combination of both, in order to separate semantic constituents from the speaker types and their possible relations. Probabilities of hidden topic features are then used by separate Gaussian classifiers to compute theme probabilities for each speaker type. A simple strategy, requiring no additional parameter estimation, is introduced to classify themes with confidence indicators for each theme hypothesis. Experimental results on a real-life application show that features from speaker-type-specific hidden spaces capture useful semantic content, with significantly better performance than independent word-based features or a single feature set. The results also show that the proposed strategy makes it possible to conduct surveys on collections of conversations by automatically selecting processed samples with high theme identification accuracy.
- Published
- 2014
- Full Text
- View/download PDF
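The per-speaker-type classification with an agreement-based confidence indicator, as described in the abstract, can be sketched as follows. The topic-posterior vectors are synthetic Dirichlet samples standing in for the paper's LDA hidden-space features, and the diagonal-covariance Gaussians and agreement rule are illustrative assumptions, not the authors' exact models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for topic posteriors: one vector per conversation,
# per speaker type. Real features come from hidden spaces trained on
# each speaker type, which is not reproduced here.
def topic_features(theme, n, dim=8):
    alpha = np.full(dim, 0.3)
    alpha[theme] = 4.0                      # theme-dependent topic peak
    return rng.dirichlet(alpha, size=n)

themes = (0, 1, 2)
train = {spk: {t: topic_features(t, 40) for t in themes}
         for spk in ("agent", "customer")}

# One diagonal-covariance Gaussian per (speaker type, theme)
def fit(X):
    return X.mean(axis=0), X.var(axis=0) + 1e-6

models = {spk: {t: fit(train[spk][t]) for t in themes} for spk in train}

def log_gauss(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x_agent, x_customer):
    # Separate classifiers per speaker type; their agreement acts as a
    # simple confidence indicator for selecting reliable samples.
    best = {}
    for spk, x in (("agent", x_agent), ("customer", x_customer)):
        scores = {t: log_gauss(x, *models[spk][t]) for t in themes}
        best[spk] = max(scores, key=scores.get)
    return best["agent"], best["agent"] == best["customer"]

theme, confident = classify(topic_features(2, 1)[0], topic_features(2, 1)[0])
print(theme, confident)
```

Keeping only the conversations where the two speaker-type classifiers agree is one way to realize the paper's idea of automatically selecting samples with high theme identification accuracy.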