67 results for "Louis-Philippe Morency"
Search Results
2. What is Multimodal?
- Author
-
Louis-Philippe Morency
- Published
- 2022
- Full Text
- View/download PDF
3. Toward Causal Understanding of Therapist-Client Relationships: A Study of Language Modality and Social Entrainment
- Author
-
Alexandria Vail, Jeffrey Girard, Lauren Bylsma, Jeffrey Cohn, Jay Fournier, Holly Swartz, and Louis-Philippe Morency
- Published
- 2022
- Full Text
- View/download PDF
4. Crossmodal Clustered Contrastive Learning: Grounding of Spoken Language to Gesture
- Author
-
Dong Won Lee, Chaitanya Ahuja, and Louis-Philippe Morency
- Published
- 2021
- Full Text
- View/download PDF
5. Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment
- Author
-
Peter Wu, Liu Ziyin, Louis-Philippe Morency, Paul Pu Liang, and Ruslan Salakhutdinov
- Subjects
Computer Science - Machine Learning (cs.LG), Computer Science - Computation and Language (cs.CL), Computer Science - Artificial Intelligence (cs.AI), Computer Science - Computer Vision and Pattern Recognition (cs.CV), Multimodal learning, Meta-learning, Modalities, Natural language processing, Spoken language
- Abstract
How can we generalize to a new prediction task at test time when it also uses a new modality as input? More importantly, how can we do this with as little annotated data as possible? This problem of cross-modal generalization is a new research milestone with concrete impact on real-world applications. For example, can an AI system start understanding spoken language from mostly written text? Or can it learn the visual steps of a new recipe from only text descriptions? In this work, we formalize cross-modal generalization as a learning paradigm to train a model that can (1) quickly perform new tasks (from new domains) while (2) being originally trained on a different input modality. Such a learning paradigm is crucial for generalization to low-resource modalities such as spoken speech in rare languages while utilizing a different high-resource modality such as text. One key technical challenge that makes it different from other learning paradigms such as meta-learning and domain adaptation is the presence of different source and target modalities which will require different encoders. We propose an effective solution based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data while ensuring quick generalization to new tasks across different modalities. This approach uses key ideas from cross-modal learning and meta-learning, and presents strong results on the cross-modal generalization problem. We benchmark several approaches on 3 real-world classification tasks: few-shot recipe classification from text to images of recipes, object classification from images to audio of objects, and language classification from text to spoken speech across 100 languages spanning many rare languages. Our results demonstrate strong performance even when the new target modality has only a few (1-10) labeled samples and in the presence of noisy labels, a scenario particularly prevalent in low-resource modalities.
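To make the alignment idea concrete, here is a minimal PyTorch sketch of aligning two modality-specific encoders with a contrastive objective; the encoder sizes, the InfoNCE-style loss, and the toy data are illustrative assumptions, not the authors' meta-alignment implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Modality-specific encoder mapping raw features into a shared embedding space."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-norm embeddings

def alignment_loss(z_src, z_tgt, temperature=0.1):
    """InfoNCE-style loss pulling paired source/target embeddings together."""
    logits = z_src @ z_tgt.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_src.size(0))         # i-th source pairs with i-th target
    return F.cross_entropy(logits, targets)

# Toy usage: align hypothetical text features (300-d) with audio features (40-d).
text_enc, audio_enc = Encoder(300), Encoder(40)
text_batch, audio_batch = torch.randn(32, 300), torch.randn(32, 40)
loss = alignment_loss(text_enc(text_batch), audio_enc(audio_batch))
loss.backward()
```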
- Published
- 2021
- Full Text
- View/download PDF
6. Multimodal and Multitask Approach to Listener's Backchannel Prediction
- Author
-
Michal Muszynski, Ryo Ishii, Xutong Ren, and Louis-Philippe Morency
- Subjects
Backchannel, Human–computer interaction, Computer science, Multi-task learning, Active listening, Conversation, Sensory cue, Dyad
- Abstract
The listener's backchannel has the important function of encouraging a current speaker to hold their turn and continue to speak, which enables smooth conversation. The listener monitors the speaker's turn-management (a.k.a. speaking and listening) willingness and his/her own willingness to display backchannel behavior. Many studies have focused on predicting the appropriate timing of the backchannel so that conversational agents can display backchannel behavior in response to a user who is speaking. To the best of our knowledge, none of them added the prediction of turn-changing and participants' turn-management willingness to the backchannel prediction model in dyad interactions. In this paper, we proposed a novel backchannel prediction model that can jointly predict turn-changing and turn-management willingness. We investigated the impact of modeling turn-changing and willingness to improve backchannel prediction. Our proposed model is based on trimodal inputs, that is, acoustic, linguistic, and visual cues from conversations. Our results suggest that adding turn-management willingness as a prediction task improves the performance of backchannel prediction within the multi-modal multi-task learning approach, while adding turn-changing prediction is not useful for improving the performance of backchannel prediction.
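As a rough illustration of the multi-task setup described above, the following PyTorch sketch shares a trimodal encoder between a backchannel head and a willingness head; the feature dimensions, loss weighting, and toy inputs are assumptions for illustration only, not the authors' model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskBackchannelModel(nn.Module):
    """Shared trimodal encoder with separate heads for backchannel prediction and
    turn-management-willingness prediction (a simple multi-task arrangement)."""
    def __init__(self, acoustic_dim, linguistic_dim, visual_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim + visual_dim, hidden),
            nn.ReLU(),
        )
        self.backchannel_head = nn.Linear(hidden, 1)   # will the listener backchannel?
        self.willingness_head = nn.Linear(hidden, 1)   # turn-management willingness score

    def forward(self, acoustic, linguistic, visual):
        h = self.encoder(torch.cat([acoustic, linguistic, visual], dim=-1))
        return self.backchannel_head(h), self.willingness_head(h)

# Toy usage with made-up feature sizes (acoustic 40-d, linguistic 300-d, visual 20-d).
model = MultiTaskBackchannelModel(40, 300, 20)
bc_logit, will_score = model(torch.randn(8, 40), torch.randn(8, 300), torch.randn(8, 20))
# Joint objective: classification loss for backchannel plus a weighted regression loss
# for willingness; the 0.5 weight is a free hyperparameter.
loss = F.binary_cross_entropy_with_logits(bc_logit, torch.randint(0, 2, (8, 1)).float()) \
     + 0.5 * F.mse_loss(will_score, torch.rand(8, 1))
```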
- Published
- 2021
- Full Text
- View/download PDF
7. Impact of Personality on Nonverbal Behavior Generation
- Author
-
Louis-Philippe Morency, Ryo Ishii, Yukiko I. Nakano, and Chaitanya Ahuja
- Subjects
Virtual agent, Nonverbal communication, Nonverbal behavior, Eye tracking, Personality, Big Five personality traits, Psychology, Spoken language, Cognitive psychology, Dyad
- Abstract
To realize natural-looking virtual agents, one key technical challenge is to automatically generate nonverbal behaviors from spoken language. Since nonverbal behavior varies depending on personality, it is important to generate these nonverbal behaviors to match the expected personality of a virtual agent. In this work, we study how personality traits relate to the process of generating individual nonverbal behaviors from the whole body, including the head, eye gaze, arms, and posture. To study this, we first created a dialogue corpus including transcripts, a broad range of labelled nonverbal behaviors, and the Big Five personality scores of participants in dyad interactions. We constructed models that can predict each nonverbal behavior label given, as input, a language representation of the participants' spoken sentences. Our experimental results show that personality can help improve the prediction of nonverbal behaviors.
- Published
- 2020
- Full Text
- View/download PDF
8. Can Prediction of Turn-management Willingness Improve Turn-changing Modeling?
- Author
-
Michal Muszynski, Ryo Ishii, Xutong Ren, and Louis-Philippe Morency
- Subjects
Modalities, Multi-task learning, Conversation, Active listening, Psychology, Sensory cue, Cognitive psychology, Dyad
- Abstract
For smooth conversation, participants must carefully monitor the turn-management (a.k.a. speaking and listening) willingness of other conversational partners and adjust turn-changing behaviors accordingly. Many studies have focused on predicting the actual moments of speaker changes (a.k.a. turn-changing), but to the best of our knowledge, none of them explicitly modeled the turn-management willingness from both speakers and listeners in dyad interactions. We address the problem of building models that predict this willingness for both parties. Our models are based on trimodal inputs, including acoustic, linguistic, and visual cues from conversations. We also study the impact of modeling willingness to help improve the task of turn-changing prediction. We introduce a dyadic conversation corpus with annotated scores of speaker/listener turn-management willingness. Our results show that using all three modalities from both the speaker and the listener is important for predicting turn-management willingness. Furthermore, explicitly adding willingness as a prediction task improves the performance of turn-changing prediction. Also, turn-management willingness prediction becomes more accurate with this multi-task learning approach.
- Published
- 2020
- Full Text
- View/download PDF
9. Multimodal Behavioral Markers Exploring Suicidal Intent in Social Media Videos
- Author
-
Vaibhav Vaibhav, Mahmoud Ismail, Vasu Sharma, Jeffrey M. Girard, Ankit Shah, and Louis-Philippe Morency
- Subjects
Clinical psychology, Computer science, Suicidal intent, Hand movements, Developmental psychology, Social media, Suicidal ideation
- Abstract
Suicide is one of the leading causes of death in the modern world. In this digital age, individuals are increasingly using social media to express themselves and often use these platforms to express suicidal intent. Various studies have inspected suicidal intent behavioral markers in controlled environments, but it remains unexplored whether such markers generalize to suicidal intent expressed on social media. In this work, we set out to study multimodal behavioral markers related to suicidal intent when expressed in social media videos. We explore verbal, acoustic and visual behavioral markers in the context of identifying individuals at higher risk of a suicide attempt. Our analysis reveals that frequent silences, slouched shoulders, rapid hand movements and profanity are predominant multimodal behavioral markers indicative of suicidal intent.
- Published
- 2019
- Full Text
- View/download PDF
10. ElderReact: A Multimodal Dataset for Recognizing Emotional Response in Aging Adults
- Author
-
Xinyu Wang, Kaixin Ma, Mingtong Zhang, Jeffrey M. Girard, Xinru Yang, and Louis-Philippe Morency
- Subjects
Computer science, Anger, Disgust, Sadness, Surprise, Happiness, Emotional expression, Valence (psychology)
- Abstract
Automatic emotion recognition plays a critical role in technologies such as intelligent agents and social robots and is increasingly being deployed in applied settings such as education and healthcare. Most research to date has focused on recognizing the emotional expressions of young and middle-aged adults and, to a lesser extent, children and adolescents. Very few studies have examined automatic emotion recognition in older adults (i.e., elders), who represent a large and growing population worldwide. Given that aging causes many changes in facial shape and appearance and has been found to alter patterns of nonverbal behavior, there is strong reason to believe that automatic emotion recognition systems may need to be developed specifically (or augmented) for the elder population. To promote and support this type of research, we introduce a newly collected multimodal dataset of elders reacting to emotion elicitation stimuli. Specifically, it contains 1323 video clips of 46 unique individuals with human annotations of six discrete emotions: anger, disgust, fear, happiness, sadness, and surprise, as well as valence. We present a detailed analysis of the most indicative features for each emotion. We also establish several baselines using unimodal and multimodal features on this dataset. Finally, we show that models trained on a dataset from another age group do not generalize well to elders.
- Published
- 2019
- Full Text
- View/download PDF
11. Learning an appearance-based gaze estimator from one million synthesised images
- Author
-
Louis-Philippe Morency, Andreas Bulling, Tadas Baltrusaitis, Erroll Wood, and Peter Robinson
- Subjects
Appearance-based gaze estimation, Learning-by-synthesis, 3D model, Real-time rendering, Rendering (computer graphics), Animation, Gaze, Human eye, Computer vision, Artificial intelligence, 3D morphable model
- Abstract
Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings but require large amounts of labelled training data. Learning-by-synthesis was proposed as a promising solution to this problem but current methods are limited with respect to speed, appearance variability, and the head pose and gaze angle distribution they can synthesize. We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye region images as training data. Our method combines a novel generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or in cases in which the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite only using a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.
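The lightweight nearest-neighbour baseline mentioned above can be pictured with a few lines of scikit-learn; the feature vectors and gaze targets below are random stand-ins, not UnityEyes data or the paper's actual feature representation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# Hypothetical arrays: eye-image feature vectors and gaze angles (yaw, pitch in radians)
# for synthesized training images and a handful of real test images.
synth_features = rng.random((10000, 128))                   # stand-in for rendered-eye features
synth_gaze     = rng.uniform(-1.0, 1.0, size=(10000, 2))    # corresponding gaze labels
real_features  = rng.random((5, 128))

# Nearest-neighbour gaze estimator: predict the distance-weighted average gaze
# of the k most similar synthetic images.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(synth_features, synth_gaze)
predicted_gaze = knn.predict(real_features)                 # (5, 2) yaw/pitch estimates
```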
- Published
- 2018
- Full Text
- View/download PDF
12. Toward Objective, Multifaceted Characterization of Psychotic Disorders
- Author
-
Louis-Philippe Morency, Elizabeth Liebson, Alexandria Katarina Vail, and Justin T. Baker
- Subjects
Psychosis, Computer science, Mental illness, Clinical decision support system, Nonverbal communication, Fluency, Speech disfluency, Written language, Cognitive psychology, Spoken language
- Abstract
Psychotic disorders are forms of severe mental illness characterized by abnormal social function and a general sense of disconnect with reality. The evaluation of such disorders is often complex, as their multifaceted nature is often difficult to quantify. Multimodal behavior analysis technologies have the potential to help address this need and supply timelier and more objective decision support tools in clinical settings. While written language and nonverbal behaviors have been previously studied, the present analysis takes the novel approach of examining the rarely-studied modality of spoken language of individuals with psychosis as naturally used in social, face-to-face interactions. Our analyses expose a series of language markers associated with psychotic symptom severity, as well as interesting interactions between them. In particular, we examine three facets of spoken language: (1) lexical markers, through a study of the function of words; (2) structural markers, through a study of grammatical fluency; and (3) disfluency markers, through a study of dialogue self-repair. Additionally, we develop predictive models of psychotic symptom severity, which achieve significant predictive power on both positive and negative psychotic symptom scales. These results constitute a significant step toward the design of future multimodal clinical decision support tools for computational phenotyping of mental illness.
- Published
- 2018
- Full Text
- View/download PDF
13. Temporally Selective Attention Model for Social and Affective State Recognition in Multimedia Content
- Author
-
Justine Cassell, Liangke Gui, Michael Madaio, Louis-Philippe Morency, Amy Ogan, and Hongliang Yu
- Subjects
Multimedia, Computer science, Speech recognition, Sentiment analysis, Affect (psychology), Perceptual system, Salience (neuroscience), Selective attention
- Abstract
The sheer amount of human-centric multimedia content has led to increased research on human behavior understanding. Most existing methods model behavioral sequences without considering the temporal saliency. This work is motivated by the psychological observation that temporally selective attention enables the human perceptual system to process the most relevant information. In this paper, we introduce a new approach, named Temporally Selective Attention Model (TSAM), designed to selectively attend to salient parts of human-centric video sequences. Our TSAM models learn to recognize affective and social states using a new loss function called speaker-distribution loss. Extensive experiments show that our model achieves the state-of-the-art performance on rapport detection and multimodal sentiment analysis. We also show that our speaker-distribution loss function can generalize to other computational models, improving the prediction performance of deep averaging network and Long Short Term Memory (LSTM).
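A hedged sketch of temporally selective attention pooling is shown below; the frame-feature dimension and classifier head are illustrative assumptions, and the paper's speaker-distribution loss is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionPooling(nn.Module):
    """Score every frame of a behavioural sequence and pool with softmax weights,
    so that temporally salient frames dominate the clip-level representation."""
    def __init__(self, feat_dim, hidden=64, num_classes=2):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames):                    # frames: (batch, time, feat_dim)
        scores = self.scorer(frames).squeeze(-1)  # (batch, time) saliency scores
        weights = F.softmax(scores, dim=1)        # attention distribution over time
        pooled = (weights.unsqueeze(-1) * frames).sum(dim=1)
        return self.classifier(pooled), weights

# Toy usage: 4 clips of 50 frames, each frame described by a 128-d feature vector.
model = TemporalAttentionPooling(feat_dim=128)
logits, attn = model(torch.randn(4, 50, 128))
```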
- Published
- 2017
- Full Text
- View/download PDF
14. Exceptionally Social
- Author
-
Behnaz Nojavanasghari, Charles E. Hughes, and Louis-Philippe Morency
- Subjects
Education, Recommender system, Multimodal interaction, Social skills, Social design, Autism, Interactor, Psychology, Cognitive load, Simulation, Avatar, Cognitive psychology
- Abstract
Avatar-mediated and virtual environments hold a unique potential for promoting social skills in children with autism. This paper describes the design of "Exceptionally Social," which is an interactive system that uses avatars to mediate human-to-human interactions for social skills training of children with autism. This system aims to offer the following functionalities: (1) it gives children the opportunity to practice social skills in a safe environment, under various contexts; (2) it changes the dynamics of the interactions based on the child's affective states; (3) it provides visual support for children to teach them different social skills and facilitate their learning; and (4) it reduces the cognitive load on the interactor (a trained human orchestrating the avatars' behaviors) by providing real-time feedback about a child's affective states and suggesting appropriate visual supports using a recommendation system.
- Published
- 2017
- Full Text
- View/download PDF
15. EmoReact: a multimodal approach and dataset for recognizing emotional responses in children
- Author
-
Behnaz Nojavanasghari, Charles E. Hughes, Louis-Philippe Morency, and Tadas Baltrusaitis
- Subjects
Social robot, Computer science, Emotion classification, Facial analysis, Curiosity, Emotion recognition, Valence (psychology), Cognitive psychology
- Abstract
Automatic emotion recognition plays a central role in the technologies underlying social robots, affect-sensitive human computer interaction design and affect-aware tutors. Although there has been a considerable amount of research on automatic emotion recognition in adults, emotion recognition in children has been understudied. This problem is more challenging as children tend to fidget and move around more than adults, leading to more self-occlusions and non-frontal head poses. Also, the lack of publicly available datasets for children with annotated emotion labels leads most researchers to focus on adults. In this paper, we introduce a newly collected multimodal emotion dataset of children between the ages of four and fourteen years old. The dataset contains 1102 audio-visual clips annotated for 17 different emotional states: six basic emotions, neutral, valence and nine complex emotions including curiosity, uncertainty and frustration. Our experiments compare unimodal and multimodal emotion recognition baseline models to enable future research on this topic. Finally, we present a detailed analysis of the most indicative behavioral cues for emotion recognition in children.
- Published
- 2016
- Full Text
- View/download PDF
16. Deep multimodal fusion for persuasiveness prediction
- Author
-
Jayanth Koushik, Deepak Gopinath, Tadas Baltrusaitis, Louis-Philippe Morency, and Behnaz Nojavanasghari
- Subjects
Multimodal fusion, Persuasion, Modalities, Multimedia, Computer science, Data science, Personality, Social multimedia
- Abstract
Persuasiveness is a high-level personality trait that quantifies the influence a speaker has on the beliefs, attitudes, intentions, motivations, and behavior of the audience. With social multimedia becoming an important channel in propagating ideas and opinions, analyzing persuasiveness is very important. In this work, we use the publicly available Persuasive Opinion Multimedia (POM) dataset to study persuasion. One of the challenges associated with this problem is the limited amount of annotated data. To tackle this challenge, we present a deep multimodal fusion architecture which is able to leverage complementary information from individual modalities for predicting persuasiveness. Our methods show significant improvement in performance over previous approaches.
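A minimal sketch of a modality-branch-then-fuse network of the kind described above might look as follows; the branch sizes and modality dimensions are assumptions for illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class DeepMultimodalFusion(nn.Module):
    """Modality-specific sub-networks followed by a joint fusion layer that outputs
    a single persuasiveness score."""
    def __init__(self, dims, hidden=32):
        super().__init__()
        self.branches = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for name, d in dims.items()
        })
        self.fusion = nn.Sequential(nn.Linear(hidden * len(dims), hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, inputs):   # inputs: dict mapping modality name -> (batch, dim) tensor
        parts = [branch(inputs[name]) for name, branch in self.branches.items()]
        return self.fusion(torch.cat(parts, dim=-1))

# Toy usage with made-up feature sizes for acoustic, visual, and text descriptors.
model = DeepMultimodalFusion({"acoustic": 40, "visual": 60, "text": 300})
score = model({"acoustic": torch.randn(4, 40),
               "visual": torch.randn(4, 60),
               "text": torch.randn(4, 300)})    # (4, 1) predicted persuasiveness
```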
- Published
- 2016
- Full Text
- View/download PDF
17. Recognizing Human Actions in the Motion Trajectories of Shapes
- Author
-
Melissa Roemmele, Soja-Marie Morgens, Andrew S. Gordon, and Louis-Philippe Morency
- Subjects
Computer science, Animation, Crowdsourcing, Motion, Recurrent neural network, Human–computer interaction, Perception, Motion perception
- Abstract
People naturally anthropomorphize the movement of nonliving objects, as social psychologists Fritz Heider and Marianne Simmel demonstrated in their influential 1944 research study. When they asked participants to narrate an animated film of two triangles and a circle moving in and around a box, participants described the shapes' movement in terms of human actions. Using a framework for authoring and annotating animations in the style of Heider and Simmel, we established new crowdsourced datasets where the motion trajectories of animated shapes are labeled according to the actions they depict. We applied two machine learning approaches, a spatial-temporal bag-of-words model and a recurrent neural network, to the task of automatically recognizing actions in these datasets. Our best results outperformed a majority baseline and showed similarity to human performance, which encourages further use of these datasets for modeling perception from motion trajectories. Future progress on simulating human-like motion perception will require models that integrate motion information with top-down contextual knowledge.
- Published
- 2016
- Full Text
- View/download PDF
18. ERM4CT 2015
- Author
-
Ronald Böck, Kim Hartmann, Louis-Philippe Morency, Albert Ali Salah, Ingo Siegert, and Björn Schuller
- Subjects
Cognitive science, Multimedia, Representation, Psychology, Multimodal interaction, Multimodality
- Abstract
In this paper, the organisers present a brief overview of the Workshop on Emotion Representation and Modelling for Companion Systems (ERM4CT). The ERM4CT 2015 Workshop is held in conjunction with the 17th ACM International Conference on Multimodal Interaction (ICMI 2015) taking place in Seattle, USA. The ERM4CT is the follow-up of three previous workshops on emotion modelling for affective human-computer interaction and companion systems. Apart from its usual focus on emotion representations and models, this year's ERM4CT puts special emphasis on.
- Published
- 2015
- Full Text
- View/download PDF
19. Multimodal Public Speaking Performance Assessment
- Author
-
Louis-Philippe Morency, Mathieu Chollet, Torsten Wörtwein, Stefan Scherer, Rainer Stiefelhagen, and Boris Schauerte
- Subjects
Public speaking, Nonverbal communication, Nonverbal behavior, Modalities, Multimedia, Computer science
- Abstract
The ability to speak proficiently in public is essential for many professions and in everyday life. Public speaking skills are difficult to master and require extensive training. Recent developments in technology enable new approaches for public speaking training that allow users to practice in engaging and interactive environments. Here, we focus on the automatic assessment of nonverbal behavior and multimodal modeling of public speaking behavior. We automatically identify audiovisual nonverbal behaviors that are correlated to expert judges' opinions of key performance aspects. These automatic assessments enable a virtual audience to provide feedback that is essential for training during a public speaking performance. We utilize multimodal ensemble tree learners to automatically approximate expert judges' evaluations to provide post-hoc performance assessments to the speakers. Our automatic performance evaluation is highly correlated with the experts' opinions with r = 0.745 for the overall performance assessments. We compare multimodal approaches with single modalities and find that the multimodal ensembles consistently outperform single modalities.
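To illustrate the general recipe of regressing expert scores from per-modality features and comparing single-modality models with a multimodal combination, here is a small scikit-learn sketch on synthetic data; it is not the authors' ensemble-tree pipeline, and the correlations it prints on random data are meaningless.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 200                                    # hypothetical number of public-speaking clips
audio_feats  = rng.normal(size=(n, 30))    # e.g. prosody statistics (stand-in values)
visual_feats = rng.normal(size=(n, 40))    # e.g. gesture/posture statistics (stand-in values)
expert_score = rng.normal(size=n)          # expert judges' overall performance rating

def fit_and_predict(X, y):
    """Train on the first 150 clips, predict the held-out 50."""
    model = GradientBoostingRegressor().fit(X[:150], y[:150])
    return model.predict(X[150:])

# Single-modality predictions vs. a simple multimodal combination (feature concatenation).
pred_audio  = fit_and_predict(audio_feats, expert_score)
pred_visual = fit_and_predict(visual_feats, expert_score)
pred_multi  = fit_and_predict(np.hstack([audio_feats, visual_feats]), expert_score)

for name, pred in [("audio", pred_audio), ("visual", pred_visual), ("multimodal", pred_multi)]:
    r, _ = pearsonr(expert_score[150:], pred)
    print(f"{name}: r = {r:.3f}")
```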
- Published
- 2015
- Full Text
- View/download PDF
20. Session details: Oral Session 4: Communication Dynamics
- Author
-
Louis-Philippe Morency
- Subjects
Multimedia, Computer science, Session (computer science)
- Published
- 2015
- Full Text
- View/download PDF
21. Exploring Behavior Representation for Learning Analytics
- Author
-
Louis-Philippe Morency, Marcelo Worsley, Paulo Blikstein, and Stefan Scherer
- Subjects
Computer science, Learning analytics, Machine learning, Data science, Learning sciences, Segmentation, Artificial intelligence, Engineering design process
- Abstract
Multimodal analysis has long been an integral part of studying learning. Historically, multimodal analyses of learning have been extremely laborious and time-intensive. However, researchers have recently been exploring ways to use multimodal computational analysis in the service of studying how people learn in complex learning environments. In an effort to advance this research agenda, we present a comparative analysis of four different data segmentation techniques. In particular, we propose affect- and pose-based data segmentation as alternatives to human-based segmentation and fixed-window segmentation. In a study of ten dyads working on an open-ended engineering design task, we find that affect- and pose-based segmentation are more effective than traditional approaches for drawing correlations between learning-relevant constructs and multimodal behaviors. We also find that pose-based segmentation outperforms the two more traditional segmentation strategies for predicting student success on the hands-on task. In this paper, we discuss the algorithms used, our results, and the implications that this work may have in non-education-related contexts.
- Published
- 2015
- Full Text
- View/download PDF
22. Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits
- Author
-
Moitreya Chatterjee, Stefan Scherer, Louis-Philippe Morency, and Sunghyun Park
- Subjects
Modalities, Computer science, Machine learning, Generative model, Discriminative model, Semantic similarity, Artificial intelligence, Cluster analysis, Gesture
- Abstract
Human communication involves conveying messages through both verbal and non-verbal channels (facial expression, gestures, prosody, etc.). Nonetheless, the task of learning these patterns for a computer by combining cues from multiple modalities is challenging because it requires effective representation of the signals and consideration of the complex interactions between them. From the machine learning perspective, this presents a two-fold challenge: a) Modeling the intermodal variations and dependencies; b) Representing the data using an apt number of features, such that the necessary patterns are captured while allaying concerns such as over-fitting. In this work, we attempt to address these aspects of multimodal recognition, in the context of recognizing two essential speaker traits, namely passion and credibility of online movie reviewers. We propose a novel ensemble classification approach that combines two different perspectives on classifying multimodal data. Each of these perspectives attempts to independently address the two-fold challenge. In the first, we combine the features from multiple modalities but assume inter-modality conditional independence. In the second, we explicitly capture the correlation between the modalities in a low-dimensional space and explore a novel clustering-based kernel similarity approach for recognition. Additionally, this work investigates a recent technique for encoding text data that captures semantic similarity of verbal content and preserves word-ordering. The experimental results on a recent public dataset show significant improvement of our approach over multiple baselines. Finally, we also analyze the most discriminative elements of a speaker's non-verbal behavior that contribute to his/her perceived credibility/passionateness.
- Published
- 2015
- Full Text
- View/download PDF
23. Exploring feedback strategies to improve public speaking
- Author
-
Ari Shapiro, Torsten Wörtwein, Stefan Scherer, Louis-Philippe Morency, and Mathieu Chollet
- Subjects
Public speaking, Nonverbal communication, Multimedia, Computer science, Human–computer interaction, Virtual reality
- Abstract
Good public speaking skills convey strong and effective communication, which is critical in many professions and used in everyday life. The ability to speak publicly requires a lot of training and practice. Recent technological developments enable new approaches for public speaking training that allow users to practice in a safe and engaging environment. We explore feedback strategies for public speaking training that are based on an interactive virtual audience paradigm. We investigate three study conditions: (1) a non-interactive virtual audience (control condition), (2) direct visual feedback, and (3) nonverbal feedback from an interactive virtual audience. We perform a threefold evaluation based on self-assessment questionnaires, expert assessments, and two objectively annotated measures of eye-contact and avoidance of pause fillers. Our experiments show that the interactive virtual audience brings together the best of both worlds: increased engagement and challenge as well as improved public speaking skills as judged by experts.
- Published
- 2015
- Full Text
- View/download PDF
24. Session details: Keynote Address 2
- Author
-
Louis-Philippe Morency
- Subjects
Multimedia, Computer science, Session (computer science)
- Published
- 2014
- Full Text
- View/download PDF
25. A Multimodal Context-based Approach for Distress Assessment
- Author
-
Sayan Ghosh, Louis-Philippe Morency, and Moitreya Chatterjee
- Subjects
Distress, Computer science, Psychological distress, Cognitive psychology
- Abstract
The increasing prevalence of psychological distress disorders, such as depression and post-traumatic stress, necessitates a serious effort to create new tools and technologies to help with their diagnosis and treatment. In recent years, new computational approaches have been proposed to objectively analyze patient non-verbal behaviors over the duration of the entire interaction between the patient and the clinician. In this paper, we go beyond non-verbal behaviors and propose a tri-modal approach which integrates verbal behaviors with acoustic and visual behaviors to analyze psychological distress during dyadic semi-structured interviews. Our approach exploits the advantages of the dyadic nature of these interactions to contextualize the participant responses based on the affective components (intimacy and polarity levels) of the questions. We validate our approach using one of the largest corpora of semi-structured interviews for distress assessment, consisting of 154 multimodal dyadic interactions. Our results show significant improvement in distress prediction performance when integrating verbal behaviors with acoustic and visual behaviors. In addition, our analysis shows that contextualizing the responses improves the prediction performance, most significantly with positive and intimate questions.
- Published
- 2014
- Full Text
- View/download PDF
26. Dyadic Behavior Analysis in Depression Severity Assessment Interviews
- Author
-
Ying Yang, Stefan Scherer, Zakia Hammal, Louis-Philippe Morency, and Jeffrey F. Cohn
- Subjects
Clinical trial, Severity assessment, Computer science, Dyadic interaction, Hamilton Rating Scale for Depression, Interpersonal communication, Depression, Communicative behavior, Clinical psychology
- Abstract
Previous literature suggests that depression impacts vocal timing of both participants and clinical interviewers but is mixed with respect to acoustic features. To investigate further, 57 middle-aged adults (men and women) with Major Depression Disorder and their clinical interviewers (all women) were studied. Participants were interviewed for depression severity on up to four occasions over a 21 week period using the Hamilton Rating Scale for Depression (HRSD), which is a criterion measure for depression severity in clinical trials. Acoustic features were extracted for both participants and interviewers using COVAREP Toolbox. Missing data occurred due to missed appointments, technical problems, or insufficient vocal samples. Data from 36 participants and their interviewers met criteria and were included for analysis to compare between high and low depression severity. Acoustic features for participants varied between men and women as expected, and failed to vary with depression severity for participants. For interviewers, acoustic characteristics strongly varied with severity of the interviewee's depression. Accommodation - the tendency of interactants to adapt their communicative behavior to each other - between interviewers and interviewees was inversely related to depression severity. These findings suggest that interviewers modify their acoustic features in response to depression severity, and depression severity strongly impacts interpersonal accommodation.
- Published
- 2014
- Full Text
- View/download PDF
27. Session details: Poster Session 1
- Author
-
Oya Aran and Louis-Philippe Morency
- Subjects
Multimedia, Session (computer science), Psychology
- Published
- 2014
- Full Text
- View/download PDF
28. Computational Analysis of Persuasiveness in Social Multimedia
- Author
-
Sunghyun Park, Louis-Philippe Morency, Kenji Sagae, Han Suk Shim, and Moitreya Chatterjee
- Subjects
Persuasion, Modalities, Multimedia, Computer science, Negotiation, Human–computer interaction, Conversation, Computational analysis, Social multimedia
- Abstract
Our lives are heavily influenced by persuasive communication, and it is essential in almost any type of social interaction, from business negotiation to conversation with our friends and family. With the rapid growth of social multimedia websites, it is becoming ever more important and useful to understand persuasiveness in the context of social multimedia content online. In this paper, we introduce our newly created multimedia corpus of 1,000 movie review videos obtained from a social multimedia website called ExpoTV.com, which will be made freely available to the research community. Our research results presented here revolve around the following three main research hypotheses. Firstly, we show that computational descriptors derived from verbal and nonverbal behavior can be predictive of persuasiveness. We further show that combining descriptors from multiple communication modalities (audio, text and visual) improves the prediction performance compared to using those from a single modality alone. Secondly, we investigate whether having prior knowledge of a speaker expressing a positive or negative opinion helps better predict the speaker's persuasiveness. Lastly, we show that it is possible to make comparable predictions of persuasiveness by only looking at thin slices (shorter time windows) of a speaker's behavior.
- Published
- 2014
- Full Text
- View/download PDF
29. Search Strategies for Pattern Identification in Multimodal Data
- Author
-
Louis-Philippe Morency, Francis Quek, and Chreston Miller
- Subjects
Multimodal search, Pattern identification, Information retrieval, Event (computing), Computer science, Multimodal data
- Abstract
The analysis of multimodal data benefits from meaningful search and retrieval. This paper investigates strategies of searching multimodal data for event patterns. Through three longitudinal case studies, we observed researchers exploring and identifying event patterns in multimodal data. The events were extracted from different multimedia signal sources ranging from annotated video transcripts to interaction logs. Each researcher's data has varying temporal characteristics (e.g., sparse, dense, or clustered) that posed several challenges for identifying relevant patterns. We identify unique search strategies and better understand the aspects that contributed to each.
- Published
- 2014
- Full Text
- View/download PDF
30. Toward crowdsourcing micro-level behavior annotations
- Author
-
Louis-Philippe Morency, Sunghyun Park, and Philippa Shoemark
- Subjects
Micro level, Generalization, Computer science, Reliability, Training, Crowdsourcing, Machine learning, Annotation, Artificial intelligence
- Abstract
Research that involves human behavior analysis usually requires laborious and costly efforts for obtaining micro-level behavior annotations on a large video corpus. With the emerging paradigm of crowdsourcing however, these efforts can be considerably reduced. We first present OCTAB (Online Crowdsourcing Tool for Annotations of Behaviors), a web-based annotation tool that allows precise and convenient behavior annotations in videos, directly portable to popular crowdsourcing platforms. As part of OCTAB, we introduce a training module with specialized visualizations. The training module's design was inspired by an observational study of local experienced coders, and it enables an iterative procedure for effectively training crowd workers online. Finally, we present an extensive set of experiments that evaluates the feasibility of our crowdsourcing approach for obtaining micro-level behavior annotations in videos, showing the reliability improvement in annotation accuracy when properly training online crowd workers. We also show the generalization of our training approach to a new independent video corpus.
- Published
- 2014
- Full Text
- View/download PDF
31. Audiovisual behavior descriptors for depression assessment
- Author
-
Giota Stratou, Louis-Philippe Morency, and Stefan Scherer
- Subjects
Correlation, Modalities, Emotional expressivity, Discriminative model, Computer science, Depression scale, Depression, Cognitive psychology
- Abstract
We investigate audiovisual indicators, in particular measures of reduced emotional expressivity and psycho-motor retardation, for depression within semi-structured virtual human interviews. Based on a standard self-assessment depression scale we investigate the statistical discriminative strength of the audiovisual features on a depression/no-depression basis. Within subject-independent unimodal and multimodal classification experiments we find that early feature-level fusion yields promising results and confirms the statistical findings. We further correlate the behavior descriptors with the assessed depression severity and find considerable correlation. Lastly, a joint multimodal factor analysis reveals two prominent factors within the data that show both statistical discriminative power as well as strong linear correlation with the depression severity score. These preliminary results based on a standard factor analysis are promising and motivate us to investigate this approach further in the future, while incorporating additional modalities.
- Published
- 2013
- Full Text
- View/download PDF
32. Learning a sparse codebook of facial and body microexpressions for emotion recognition
- Author
-
Randall Davis, Yale Song, and Louis-Philippe Morency
- Subjects
Computer science, Speech recognition, Codebook, Pattern recognition, Sensor fusion, Motion, Discriminative model, Artificial intelligence, Neural coding
- Abstract
Obtaining a compact and discriminative representation of facial and body expressions is a difficult problem in emotion recognition. Part of the difficulty is capturing microexpressions, i.e., short, involuntary expressions that last for only a fraction of a second: at a micro-temporal scale, there are so many other subtle face and body movements that do not convey semantically meaningful information. We present a novel approach to this problem by exploiting the sparsity of the frequent micro-temporal motion patterns. Local space-time features are extracted over the face and body region for a very short time period, e.g., few milliseconds. A codebook of microexpressions is learned from the data and used to encode the features in a sparse manner. This allows us to obtain a representation that captures the most salient motion patterns of the face and body at a micro-temporal scale. Experiments performed on the AVEC 2012 dataset show our approach achieving the best published performance on the arousal dimension based solely on visual features. We also report experimental results on audio-visual emotion recognition, comparing early and late data fusion techniques.
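A toy version of the codebook idea, learning a sparse dictionary over local descriptors and max-pooling the codes into a clip-level representation, can be sketched with scikit-learn; the descriptor dimensionality and codebook size below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Hypothetical local space-time descriptors extracted around the face and body over
# very short windows (each row is one descriptor); real features would come from a
# video pipeline, not random numbers.
descriptors = rng.random((2000, 72))

# Learn an overcomplete codebook and encode each descriptor with only a few atoms.
learner = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                      transform_algorithm="omp",
                                      transform_n_nonzero_coefs=5,
                                      random_state=0)
codes = learner.fit(descriptors).transform(descriptors)   # (2000, 128), mostly zeros

# Clip-level representation: max-pool the sparse codes of the descriptors belonging
# to one clip, keeping only its most salient micro-motion patterns.
clip_codes = codes[:300]
clip_representation = clip_codes.max(axis=0)              # (128,) feature vector
```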
- Published
- 2013
- Full Text
- View/download PDF
33. ICMI 2013 grand challenge workshop on multimodal learning analytics
- Author
-
Nadir Weibel, Louis-Philippe Morency, Sharon Oviatt, Stefan Scherer, and Marcelo Worsley
- Subjects
Subject-matter expert, Modalities, Computer science, Human–computer interaction, Analytics, Content analysis, Learning analytics, Curriculum, Data science, Learning sciences
- Abstract
Advances in learning analytics are contributing new empirical findings, theories, methods, and metrics for understanding how students learn. It also contributes to improving pedagogical support for students' learning through assessment of new digital tools, teaching strategies, and curricula. Multimodal learning analytics (MMLA)[1] is an extension of learning analytics and emphasizes the analysis of natural rich modalities of communication across a variety of learning contexts. This MMLA Grand Challenge combines expertise from the learning sciences and machine learning in order to highlight the rich opportunities that exist at the intersection of these disciplines. As part of the Grand Challenge, researchers were asked to predict: (1) which student in a group was the dominant domain expert, and (2) which problems that the group worked on would be solved correctly or not. Analyses were based on a combination of speech, digital pen and video data. This paper describes the motivation for the grand challenge, the publicly available data resources and results reported by the challenge participants. The results demonstrate that multimodal prediction of the challenge goals: (1) is surprisingly reliable using rich multimodal data sources, (2) can be accomplished using any of the three modalities explored, and (3) need not be based on content analysis.
- Published
- 2013
- Full Text
- View/download PDF
34. Automatic multimodal descriptors of rhythmic body movement
- Author
-
Marwa Mahmoud, Peter Robinson, and Louis-Philippe Morency
- Subjects
Multimodal fusion, Modalities, Rhythm, Computer science, Speech recognition, Body movement, Computer vision, Artificial intelligence, Gesture
- Abstract
Prolonged durations of rhythmic body gestures have been shown to correlate with different types of psychological disorders. To date, there is no automatic descriptor that can robustly detect those behaviours. In this paper, we propose a cyclic gesture descriptor that can detect and localise rhythmic body movements by taking advantage of both colour and depth modalities. We show experimentally that our rhythmic descriptor can successfully localise rhythmic gestures such as hand fidgeting, leg fidgeting, and rocking, with performance significantly higher than a majority-vote classification baseline. Our experiments also demonstrate the importance of fusing both modalities, with a significant increase in performance when compared to individual modalities.
- Published
- 2013
- Full Text
- View/download PDF
35. Interactive relevance search and modeling
- Author
-
Louis-Philippe Morency, Francis Quek, and Chreston Miller
- Subjects
Computer science, Multimodal data, Machine learning, Human–computer interaction, Multimodal analysis, Relevance (information retrieval), Artificial intelligence, Statistical hypothesis testing
- Abstract
In this paper, we present the findings of three longitudinal case studies in which a new method for conducting multimodal analysis of human behavior is tested. The focus of this new method is to engage a researcher integrally in the analysis process and allow them to guide the identification and discovery of relevant behavior instances within multimodal data. The case studies resulted in the creation of two analysis strategies: Single-Focus Hypothesis Testing and Multi-Focus Hypothesis Testing. Each was shown to be beneficial to multimodal analysis by supporting either a single focused deep analysis or analysis across multiple angles in unison. These strategies exemplified how challenging questions can be answered for multimodal datasets. The new method is described, and the case studies' findings are presented, detailing how the new method supports multimodal analysis and opens the door for a new breed of analysis methods. Two of the three case studies resulted in publishable results for the respective participants.
- Published
- 2013
- Full Text
- View/download PDF
36. Who is persuasive?
- Author
-
Gelareh Mohammadi, Sunghyun Park, Alessandro Vinciarelli, Kenji Sagae, and Louis-Philippe Morency
- Subjects
Persuasion, Persuasive communication, Modalities, Multimedia, Computer science, Personality perception, Personality, Social multimedia, Social psychology, Social influence
- Abstract
Persuasive communication is part of everyone's daily life. With the emergence of social websites like YouTube, Facebook and Twitter, persuasive communication is now seen online on a daily basis. This paper explores the effect of multi-modality and perceived personality on the persuasiveness of social multimedia content. The experiments are performed over a large corpus of movie review clips from YouTube, which is presented to online annotators in three different modalities: text only, audio only, and video. The annotators evaluated the persuasiveness of each review across different modalities and judged the personality of the speaker. Our detailed analysis confirmed several research hypotheses designed to study the relationships between persuasion, perceived personality and communicative channel, namely modality. Three hypotheses are designed: the first hypothesis studies the effect of communication modality on persuasion, the second hypothesis examines the correlation between persuasion and personality perception, and the third hypothesis, derived from the first two, explores how communication modality influences personality perception.
- Published
- 2013
- Full Text
- View/download PDF
37. Multimodal prediction of expertise and leadership in learning groups
- Author
-
Sharon Oviatt, Nadir Weibel, Stefan Scherer, and Louis-Philippe Morency
- Subjects
Mathematical problem, Modalities, Multimedia, Computer science, Exploratory research, Multimodal learning analytics, Artificial intelligence, Natural language processing
- Abstract
In this study, we investigate low-level predictors from audio and writing modalities for the separation and identification of socially dominant leaders and experts within a study group. We use a multimodal dataset of situated, computer-assisted group learning tasks: groups of three high-school students solve a number of mathematical problems in two separate sessions. In order to automatically identify the socially dominant student and expert in the group, we analyze a number of prosodic and voice quality features as well as writing-based features. In this preliminary study, we identify a number of promising acoustic and writing predictors for the disambiguation of leaders, experts and other students. We believe that this exploratory study reveals key opportunities for future analysis of multimodal learning analytics based on a combination of audio and writing signals.
- Published
- 2012
- Full Text
- View/download PDF
38. Towards sensing the influence of visual narratives on human affect
- Author
-
Verónica Pérez-Rosas, Mihai Burzo, Rada Mihalcea, Louis-Philippe Morency, Alexis Narvaez, and Daniel McDuff
- Subjects
Affective behavior, Modalities, Computer science, Affect (psychology), Computer vision, Narrative, Artificial intelligence, Affective computing, Cognitive psychology
- Abstract
In this paper, we explore a multimodal approach to sensing affective state during exposure to visual narratives. Using four different modalities, consisting of visual facial behaviors, thermal imaging, heart rate measurements, and verbal descriptions, we show that we can effectively predict changes in human affect. Our experiments show that these modalities complement each other, and illustrate the role played by each of the four modalities in detecting human affect.
- Published
- 2012
- Full Text
- View/download PDF
39. Structural and temporal inference search (STIS)
- Author
-
Chreston Miller, Francis Quek, and Louis-Philippe Morency
- Subjects
Sequence, Computer science, Multimodal data, Inference, Machine learning, Pattern identification, Human interaction, Multimodal analysis, Artificial intelligence
- Abstract
There are a multitude of annotated behavior corpora (manual and automatic annotations) available as research expands in multimodal analysis of human behavior. Despite the rich representations within these datasets, search strategies are limited with respect to the advanced representations and complex structures describing human interaction sequences. The relationships amongst human interactions are structural in nature. Hence, we present Structural and Temporal Inference Search (STIS) to support search for relevant patterns within a multimodal corpus based on the structural and temporal nature of human interactions. The user defines the structure of a behavior of interest driving a search focused on the characteristics of the structure. Occurrences of the structure are returned. We compare against two pattern mining algorithms purposed for pattern identification amongst sequences of symbolic data (e.g., sequence of events such as behavior interactions). The results are promising as STIS performs well with several datasets.
- Published
- 2012
- Full Text
- View/download PDF
40. Multimodal human behavior analysis
- Author
-
Randall Davis, Louis-Philippe Morency, and Yale Song
- Subjects
Conditional random field, Modalities, Computer science, Feature vector, Pattern recognition, Machine learning, Kernel method, Graphical model, Artificial intelligence, Canonical correlation
- Abstract
Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes capturing nonlinear hidden dynamics easier and MV-HCRF helps learning interaction across modalities.
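The kernelized correlation step can be approximated compactly by combining an explicit kernel map with linear CCA, as in the hedged scikit-learn sketch below; this is a practical stand-in for KCCA on two views and omits the MV-HCRF sequence model entirely. The feature sizes are made up.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
audio  = rng.normal(size=(200, 20))   # hypothetical prosodic features per segment
visual = rng.normal(size=(200, 30))   # hypothetical visual (head/gesture) features per segment

# Approximate the kernel feature map explicitly (Nystroem), then run linear CCA on the
# mapped features; together this approximates a kernelized CCA between the two views.
phi_audio  = Nystroem(kernel="rbf", n_components=100, random_state=0).fit_transform(audio)
phi_visual = Nystroem(kernel="rbf", n_components=100, random_state=0).fit_transform(visual)

cca = CCA(n_components=5).fit(phi_audio, phi_visual)
z_audio, z_visual = cca.transform(phi_audio, phi_visual)  # maximally correlated projections
```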
- Published
- 2012
- Full Text
- View/download PDF
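The sketch below uses plain linear CCA from scikit-learn as a simplified stand-in for the paper's kernel CCA step: it finds projections of two synthetic modality feature sets that are maximally correlated. The MV-HCRF stage and the nonlinear kernel are not reproduced; the data and dimensions are assumptions.

# Simplified stand-in for the KCCA step: linear CCA finds projections of
# audio and visual features that are maximally correlated across modalities.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=(n, 4))  # latent signal shared across modalities

# Synthetic audio and visual features: shared structure plus modality-specific noise.
audio = shared @ rng.normal(size=(4, 12)) + 0.5 * rng.normal(size=(n, 12))
visual = shared @ rng.normal(size=(4, 30)) + 0.5 * rng.normal(size=(n, 30))

cca = CCA(n_components=4)
audio_proj, visual_proj = cca.fit_transform(audio, visual)

# Correlation per projected component; high values indicate shared cross-modal structure.
for k in range(4):
    r = np.corrcoef(audio_proj[:, k], visual_proj[:, k])[0, 1]
    print(f"component {k}: correlation = {r:.2f}")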
41. Step-wise emotion recognition using concatenated-HMM
- Author
-
Derya Ozkan, Stefan Scherer, and Louis-Philippe Morency
- Subjects
Dimension (vector space) ,Dynamics (music) ,business.industry ,Computer science ,Speech recognition ,Regression analysis ,Emotion recognition ,Artificial intelligence ,business ,Set (psychology) ,Hidden Markov model - Abstract
Human emotion is an important part of human-human communication, since the emotional state of an individual often affects the way he or she reacts to others. In this paper, we present a method based on concatenated Hidden Markov Models (co-HMM) to infer dimensional and continuous emotion labels from audio-visual cues. Our method is based on the assumption that continuous emotion levels can be modeled by a set of discrete values. Accordingly, we represent each emotional dimension by step-wise label classes and learn the intrinsic and extrinsic dynamics with our co-HMM model. We evaluate our approach on the Audio-Visual Emotion Challenge (AVEC 2012) dataset. Our results show considerable improvement over the baseline regression model provided with AVEC 2012. A toy illustration of the step-wise quantization and decoding idea follows this entry.
- Published
- 2012
- Full Text
- View/download PDF
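To illustrate the step-wise idea under assumed parameters (this is not the paper's trained co-HMM), the sketch below quantizes a continuous valence signal into five discrete levels and decodes a smooth level sequence with a small Viterbi pass whose transition matrix favors staying in the same or an adjacent level.

# Illustration of step-wise emotion labels: quantize a continuous dimension
# into discrete levels and decode with Viterbi. Emission and transition values
# are toy assumptions, not the paper's parameters.
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """Standard Viterbi decoding. log_emit has shape (T, S): per-frame log-likelihoods."""
    T, S = log_emit.shape
    score = log_prior + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # rows: previous state, columns: next state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Continuous valence predictions per frame (synthetic, in [-1, 1]).
rng = np.random.default_rng(2)
valence = np.clip(np.cumsum(rng.normal(scale=0.1, size=50)), -1, 1)

# Step-wise label classes: 5 equal-width bins over [-1, 1].
S = 5
edges = np.linspace(-1, 1, S + 1)
centers = (edges[:-1] + edges[1:]) / 2

# Emission log-likelihoods: closer bin centers are more likely.
log_emit = -((valence[:, None] - centers[None, :]) ** 2) / 0.05
# Transitions favor staying in the same or an adjacent level (smooth dynamics).
trans = np.array([[1.0 if abs(i - j) <= 1 else 0.05 for j in range(S)] for i in range(S)])
log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
log_prior = np.log(np.full(S, 1.0 / S))

states = viterbi(log_emit, log_trans, log_prior)
print("decoded step-wise levels:", states)
print("reconstructed valence   :", np.round(centers[states], 2))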
42. Session details: 3 Vision
- Author
-
Louis-Philippe Morency
- Subjects
Multimedia ,Session (computer science) ,Psychology ,computer.software_genre ,computer - Published
- 2012
- Full Text
- View/download PDF
43. I already know your answer
- Author
-
Jonathan Gratch, Sunghyun Park, and Louis-Philippe Morency
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,Past history ,Nonverbal communication ,Negotiation ,Respondent ,Artificial intelligence ,Everyday life ,business ,Human communication ,media_common ,Intuition ,Cognitive psychology - Abstract
Be it in the workplace or with family and friends, negotiation is a fundamental part of everyday life, and a system that can automatically predict negotiation outcomes would have substantial implications. In this paper, we focus on finding nonverbal behaviors that are predictive of immediate outcomes (acceptances or rejections of proposals) in a dyadic negotiation. Looking at the nonverbal behaviors of the respondent alone would be inadequate, since ample predictive information can also reside in the behaviors of the proposer and in the past history between the two parties. With this intuition in mind, we show that a more accurate prediction can be achieved by considering all three sources of information together. We evaluate our approach on a face-to-face negotiation dataset consisting of 42 dyadic interactions and show that integrating all three sources of information outperforms each individual predictor. A hypothetical three-source feature-fusion sketch is given after this entry.
- Published
- 2012
- Full Text
- View/download PDF
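As an assumed illustration of the three-source idea (respondent, proposer, and interaction history), the sketch below concatenates synthetic feature blocks for each source and compares a combined classifier against single-source baselines. The feature contents, the SVM choice, and the sample size are placeholders, not details from the paper.

# Hypothetical sketch: combine respondent, proposer, and history features
# into one accept/reject predictor. All data here is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 120  # proposal events (synthetic; the paper uses 42 dyadic interactions)

respondent = rng.normal(size=(n, 10))  # e.g., respondent smiles, head gestures
proposer = rng.normal(size=(n, 10))    # e.g., proposer gaze, gesture rate
history = rng.normal(size=(n, 6))      # e.g., counts of prior accepts/rejects
accepted = rng.integers(0, 2, size=n)  # immediate outcome label

def score(features):
    return cross_val_score(SVC(kernel="rbf"), features, accepted, cv=5).mean()

print("respondent only:", score(respondent))
print("proposer only  :", score(proposer))
print("history only   :", score(history))
print("all three      :", score(np.concatenate([respondent, proposer, history], axis=1)))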
44. 1st international workshop on multimodal learning analytics
- Author
-
Stefan Scherer, Marcelo Worsley, and Louis-Philippe Morency
- Subjects
Area studies ,Computer science ,Analytics ,business.industry ,Multimodal analysis ,Educational technology ,Statistical analysis ,Student learning ,business ,Multimodal learning analytics ,Data science ,Learning sciences - Abstract
This summary describes the 1st International Workshop on Multimodal Learning Analytics. This area of study brings together the technologies of multimodal analysis with the learning sciences. The intersection of these domains should enable researchers to foster an improved understanding of student learning, lead to the creation of more natural and enriching learning interfaces, and motivate the development of novel techniques for tackling challenges that are specific to education.
- Published
- 2012
- Full Text
- View/download PDF
45. Computational study of human communication dynamic
- Author
-
Louis-Philippe Morency
- Subjects
Communication ,Facial expression ,business.industry ,Interpersonal communication ,medicine.disease ,Gaze ,Autism spectrum disorder ,Human–computer interaction ,medicine ,Robot ,Psychology ,business ,Human communication ,Gesture ,Social behavior - Abstract
Face-to-face communication is a highly dynamic process in which participants mutually exchange and interpret linguistic and gestural signals. Even when only one person speaks at a time, other participants continuously exchange information among themselves and with the speaker through gesture, gaze, posture, and facial expressions. To correctly interpret high-level communicative signals, an observer needs to jointly integrate all spoken words, subtle prosodic changes, and simultaneous gestures from all participants. In this paper, we present our ongoing research effort at the USC MultiComp Lab to create models of human communication dynamics that explicitly take into consideration the multimodal and interpersonal aspects of human face-to-face interactions. The computational framework presented in this paper has wide applicability, including the recognition of human social behaviors, the synthesis of natural animations for robots and virtual humans, improved multimedia content analysis, and the diagnosis of social and behavioral disorders (e.g., autism spectrum disorder).
- Published
- 2011
- Full Text
- View/download PDF
46. Towards multimodal sentiment analysis
- Author
-
Louis-Philippe Morency, Payal Doshi, and Rada Mihalcea
- Subjects
World Wide Web ,Modalities ,business.industry ,Constant flow ,Computer science ,Multimodal data ,Sentiment analysis ,The Internet ,business ,Joint (audio engineering) ,Relevant information ,Task (project management) - Abstract
With more than 10,000 new videos posted online every day on social websites such as YouTube and Facebook, the internet is becoming an almost infinite source of information. One crucial challenge for the coming decade is to be able to harvest relevant information from this constant flow of multimodal data. This paper addresses the task of multimodal sentiment analysis and conducts proof-of-concept experiments demonstrating that a joint model integrating visual, audio, and textual features can be effectively used to identify sentiment in Web videos. This paper makes three important contributions. First, it addresses for the first time the task of tri-modal sentiment analysis and shows that it is a feasible task that can benefit from the joint exploitation of visual, audio, and textual modalities. Second, it identifies a subset of audio-visual features relevant to sentiment analysis and presents guidelines on how to integrate these features. Finally, it introduces a new dataset consisting of real online data, which will be useful for future research in this area. A toy tri-modal fusion sketch follows this entry.
- Published
- 2011
- Full Text
- View/download PDF
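The following toy sketch illustrates one plausible form of tri-modal fusion, not the paper's actual feature set: TF-IDF features from toy transcripts are concatenated with assumed audio-visual descriptors and fed to a logistic-regression sentiment classifier.

# Hypothetical tri-modal fusion sketch: textual utterance features combined
# with simple audio/visual descriptors to classify sentiment polarity.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

transcripts = [
    "i really love this camera it works great",
    "the battery died after one day terrible",
    "amazing picture quality highly recommend",
    "worst purchase ever it broke immediately",
    "pretty happy with the sound and the screen",
    "awful support and the lens is scratched",
]
labels = np.array([1, 0, 1, 0, 1, 0])  # 1 = positive, 0 = negative (toy)

# Assumed audio/visual descriptors per segment: [mean pitch, pause rate, smile ratio].
audio_visual = np.array([
    [0.7, 0.1, 0.8],
    [0.3, 0.5, 0.1],
    [0.8, 0.2, 0.9],
    [0.2, 0.6, 0.0],
    [0.6, 0.2, 0.7],
    [0.3, 0.5, 0.2],
])

text_features = TfidfVectorizer().fit_transform(transcripts)
joint = hstack([text_features, csr_matrix(audio_visual)])  # joint tri-modal representation

clf = LogisticRegression().fit(joint, labels)
print("training accuracy (toy data):", clf.score(joint, labels))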
47. Session details: Analysis and recognition
- Author
-
Louis-Philippe Morency
- Subjects
Multimedia ,Computer science ,Session (computer science) ,computer.software_genre ,computer - Published
- 2010
- Full Text
- View/download PDF
48. 3rd international workshop on affective interaction in natural environments (AFFINE)
- Author
-
Christopher Peters, Kostas Karpouzis, Jean-Claude Martin, Laurel D. Riek, Ginevra Castellano, and Louis-Philippe Morency
- Subjects
Embodied agent ,Identification (information) ,Social robot ,Computer science ,Human–computer interaction ,Embodied cognition ,Interpretation (philosophy) ,Natural (music) ,Robot ,Affine transformation ,computer.software_genre ,Affect (psychology) ,computer - Abstract
The 3rd International Workshop on Affective Interaction in Natural Environments, AFFINE, follows a number of successful AFFINE workshops and events commencing in 2008. A key aim of AFFINE is the identification and investigation of significant open issues in real-time, affect-aware applications 'in the wild' and especially in embodied interaction, for example, with robots or virtual agents. AFFINE seeks to bring together researchers working on the real-time interpretation of user behaviour with those who are concerned with social robot and virtual agent interaction frameworks.
- Published
- 2010
- Full Text
- View/download PDF
49. Co-occurrence graphs
- Author
-
Louis-Philippe Morency
- Subjects
business.industry ,Head (linguistics) ,Computer science ,Speech recognition ,Representation (systemics) ,Co-occurrence ,computer.software_genre ,Graph ,Discriminative model ,Gesture recognition ,Contextual information ,Artificial intelligence ,business ,computer ,Natural language processing ,Gesture - Abstract
Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To accurately recognize visual feedback, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in multiparty interactions (e.g., meetings). An important contribution of this paper is our data-driven representation, called co-occurrence graphs, which models co-occurrence between contextual cues, such as spoken words and pauses, and visual head gestures. By analyzing these co-occurrence patterns we can automatically select relevant contextual features and predict when visual gestures are more likely. Using a discriminative approach to multimodal integration, our contextual representation based on co-occurrence graphs improves head gesture recognition performance on a publicly available dataset of multiparty interactions. A toy co-occurrence counting sketch follows this entry.
- Published
- 2009
- Full Text
- View/download PDF
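The paper's co-occurrence graph representation is not detailed in the abstract; the sketch below only illustrates the underlying counting idea with invented cue names and an assumed time window: contextual cues that are frequently followed by a head nod receive a higher relevance score.

# Illustrative co-occurrence counting between contextual cues (spoken words,
# pauses) and listener head gestures; cue names and window size are invented.
from collections import Counter

# (time in seconds, cue) events from other participants, and nod onset times.
cue_events = [(1.0, "okay"), (1.4, "pause"), (3.2, "right"), (6.0, "you_know"),
              (6.3, "pause"), (9.1, "so"), (9.5, "pause")]
nod_onsets = [1.9, 6.8, 10.0]

WINDOW = 1.5  # a cue "co-occurs" with a nod if the nod starts within this many seconds

cue_total = Counter(cue for _, cue in cue_events)
cue_with_nod = Counter()
for t, cue in cue_events:
    if any(0.0 <= onset - t <= WINDOW for onset in nod_onsets):
        cue_with_nod[cue] += 1

# Simple relevance score: fraction of cue occurrences followed by a nod.
for cue in cue_total:
    score = cue_with_nod[cue] / cue_total[cue]
    print(f"{cue:10s} nod-follow rate = {score:.2f}")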
50. Use of context in vision processing
- Author
-
Anton Nijholt, Yuri Ivanov, Ralph Braspenning, Maja Pantic, Louis-Philippe Morency, Hamid Aghajan, and Ming-Hsuan Yang
- Subjects
Computer science ,intelligent headlight control ,IR-70415 ,HMI-MI: MULTIMODAL INTERACTIONS ,Machine Learning ,METIS-266455 ,Human–computer interaction ,Robustness (computer science) ,Contextual information ,human- human interaction ,Computer vision ,Modalities ,Ambient intelligence ,business.industry ,EWI-17696 ,driving context ,Cognitive neuroscience of visual object recognition ,context-driven event interpretation ,Object recognition ,camera sensors ,statistical relational models ,image/video content analysis ,Vision science ,smart homes ,Enabling ,visual gesture recognition ,Systems design ,Algorithm design ,Artificial intelligence ,business - Abstract
Recent efforts to define ambient intelligence applications based on user-centric concepts, the advent of technologies in different sensing modalities, and the expanding interest in multimodal information fusion and in situation-aware, dynamic vision processing algorithms have created a common motivation across research disciplines to utilize context as a key enabler of application-oriented vision system design. Improved robustness, efficient use of sensing and computing resources, dynamic task assignment to different operating modules, and adaptation to event and user behavior models are among the benefits a vision processing system can gain from contextual information. The Workshop on Use of Context in Vision Processing (UCVP) aims to address the opportunities for incorporating contextual information into algorithm design for single- and multi-camera vision systems, as well as for systems in which vision is complemented by other sensing modalities such as audio, motion, proximity, and occupancy.
- Published
- 2009
- Full Text
- View/download PDF