25 results for '"Soroosh Mariooryad"'
Search Results
2. Speaker Generation.
- Author
- Daisy Stanton, Matt Shannon, Soroosh Mariooryad, R. J. Skerry-Ryan, Eric Battenberg, Tom Bagby, and David Kao
- Published
- 2022
- Full Text
- View/download PDF
3. Wave-Tacotron: Spectrogram-Free End-to-End Text-to-Speech Synthesis.
- Author
- Ron J. Weiss, R. J. Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, and Diederik P. Kingma
- Published
- 2021
- Full Text
- View/download PDF
4. Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis.
- Author
- Eric Battenberg, R. J. Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, and Tom Bagby
- Published
- 2020
- Full Text
- View/download PDF
5. Semi-Supervised Generative Modeling for Controllable Speech Synthesis.
- Author
- Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, R. J. Skerry-Ryan, Daisy Stanton, David Kao, and Tom Bagby
- Published
- 2020
6. Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora.
- Author
- Soroosh Mariooryad, Reza Lotfian, and Carlos Busso
- Published
- 2014
- Full Text
- View/download PDF
7. Automatic characterization of speaking styles in educational videos.
- Author
- Soroosh Mariooryad, Anitha Kannan, Dilek Hakkani-Tür, and Elizabeth Shriberg
- Published
- 2014
- Full Text
- View/download PDF
8. Feature and model level compensation of lexical content for facial emotion recognition.
- Author
- Soroosh Mariooryad and Carlos Busso
- Published
- 2013
- Full Text
- View/download PDF
9. Analysis and Compensation of the Reaction Lag of Evaluators in Continuous Emotional Annotations.
- Author
- Soroosh Mariooryad and Carlos Busso
- Published
- 2013
- Full Text
- View/download PDF
10. Audiovisual corpus to analyze whisper speech.
- Author
- Tam Tran, Soroosh Mariooryad, and Carlos Busso
- Published
- 2013
- Full Text
- View/download PDF
11. Factorizing speaker, lexical and emotional variabilities observed in facial expressions.
- Author
- Soroosh Mariooryad and Carlos Busso
- Published
- 2012
- Full Text
- View/download PDF
12. Detecting Sleepiness by Fusing Classifiers Trained with Novel Acoustic Features.
- Author
- Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H. L. Hansen, and Carlos Busso
- Published
- 2011
- Full Text
- View/download PDF
13. Speaker Generation
- Author
- Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, and David Kao
- Subjects
- FOS: Computer and information sciences, Sound (cs.SD), I.2.7, G.3, Computer Science - Machine Learning, Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Machine Learning (cs.LG), Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
- This work explores the task of synthesizing speech in nonexistent human-sounding voices. We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task. TacoSpawn is a recurrent attention-based text-to-speech model that learns a distribution over a speaker embedding space, which enables sampling of novel and diverse speakers. Our method is easy to implement, and does not require transfer learning from speaker ID systems. We present objective and subjective metrics for evaluating performance on this task, and demonstrate that our proposed objective metrics correlate with human perception of speaker similarity. Audio samples are available on our demo page. Comment: 12 pages, 3 figures, 4 tables, appendix with 2 tables. (A brief code sketch of the sampling idea follows this entry.)
- Published
- 2021
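The abstract above describes learning a distribution over a speaker embedding space and sampling novel speakers from it. The snippet below is a minimal, hypothetical sketch of that sampling idea only; it is not the TacoSpawn implementation, and the embedding table is random stand-in data: fit a mixture model over existing speaker embeddings, then draw new embeddings that a conditioned TTS decoder could, in principle, voice.

```python
# Illustrative sketch (not the TacoSpawn implementation): fit a distribution
# over a table of learned speaker embeddings and sample novel "speakers".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained TTS model's speaker-embedding table:
# 200 training speakers, 128-dimensional embeddings.
speaker_embeddings = rng.normal(size=(200, 128))

# Learn a parametric distribution over the embedding space.
prior = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
prior.fit(speaker_embeddings)

# Sample embeddings for novel, nonexistent speakers; a TTS decoder conditioned
# on such an embedding would then synthesize speech in a new voice.
novel_embeddings, _ = prior.sample(n_samples=5)
print(novel_embeddings.shape)  # (5, 128)
```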
14. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
- Author
- Eric Battenberg, Soroosh Mariooryad, Ron Weiss, RJ Skerry-Ryan, and Diederik P. Kingma
- Subjects
- FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Artificial neural network, Computer science, Speech processing, Computer Science - Sound, Autoregressive model, Flow (mathematics), Cascade, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Spectrogram, Waveform, Algorithm, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing, Block (data storage)
- Abstract
- We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop. Output waveforms are modeled as a sequence of non-overlapping fixed-length blocks, each one containing hundreds of samples. The interdependencies of waveform samples within each block are modeled using the normalizing flow, enabling parallel training and synthesis. Longer-term dependencies are handled autoregressively by conditioning each flow on preceding blocks. This model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features or additional loss terms. Contemporary state-of-the-art text-to-speech (TTS) systems use a cascade of separately learned models: one (such as Tacotron) which generates intermediate features (such as spectrograms) from text, followed by a vocoder (such as WaveRNN) which generates waveform samples from the intermediate features. The proposed system, in contrast, does not use a fixed intermediate representation, and learns all parameters end-to-end. Experiments show that the proposed model generates speech with quality approaching a state-of-the-art neural TTS system, with significantly improved generation speed. Comment: 6 pages including supplement, 3 figures. Accepted to ICASSP 2021. (A toy sketch of the block-autoregressive loop follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
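As a rough illustration of the block-autoregressive generation loop sketched in the abstract above, the toy code below emits fixed-length waveform blocks by transforming noise, conditioned on a text encoding and a running summary of previous blocks. The `toy_flow` function is a stand-in affine map, not a real normalizing flow, and all tensors are random placeholders rather than anything from the paper.

```python
# Toy sketch of block-autoregressive waveform generation (illustrative only;
# not the Wave-Tacotron architecture). Each decoder step emits one fixed-length
# block of samples by transforming noise, conditioned on the text encoding and
# a summary of previously generated blocks.
import numpy as np

rng = np.random.default_rng(0)
BLOCK = 256        # samples per block ("hundreds of samples")
N_BLOCKS = 40      # autoregressive decoder steps
COND_DIM = 64

def toy_flow(noise, conditioning):
    """Stand-in for an invertible normalizing flow: an affine map whose
    scale and shift are derived from the conditioning vector."""
    params = np.resize(conditioning, 2 * BLOCK)
    scale = 0.1 + 0.05 * np.tanh(params[:BLOCK])
    shift = 0.01 * params[BLOCK:]
    return scale * noise + shift

text_encoding = rng.normal(size=COND_DIM)       # pretend encoder output
history = np.zeros(COND_DIM)                    # summary of past blocks
blocks = []
for _ in range(N_BLOCKS):
    conditioning = text_encoding + history      # condition on text + past blocks
    z = rng.normal(size=BLOCK)                  # base noise for this block
    block = toy_flow(z, conditioning)           # samples within a block in parallel
    blocks.append(block)
    history = 0.9 * history + 0.1 * np.resize(block, COND_DIM)

waveform = np.concatenate(blocks)
print(waveform.shape)                           # (N_BLOCKS * BLOCK,)
```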
15. The Cost of Dichotomizing Continuous Labels for Binary Classification Problems: Deriving a Bayesian-Optimal Classifier
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Structured support vector machine, Computer science, business.industry, 05 social sciences, 050401 social sciences methods, Pattern recognition, Bayes classifier, Quadratic classifier, Machine learning, computer.software_genre, 050105 experimental psychology, Human-Computer Interaction, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, 0504 sociology, Binary classification, Margin classifier, Maximum a posteriori estimation, 0501 psychology and cognitive sciences, Artificial intelligence, business, Classifier (UML), computer, Software
- Abstract
- Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. While regression models are suitable for these learning tasks, these labels are often discretized into binary classes to formulate the problem as a conventional classification task (e.g., classes with low versus high values). This methodology brings intrinsic limitations on the classification performance. The continuous labels are typically normally distributed, with many samples close to the boundary threshold, resulting in poor classification rates. Previous studies only use the discretized labels to train binary classifiers, neglecting the original, continuous labels. This study demonstrates that, even in binary classification problems, exploiting the original labels before splitting the classes can lead to better classification performance. This work proposes an optimal classifier based on the Bayesian maximum a posteriori (MAP) criterion for these problems, which effectively utilizes the real-valued labels. We derive the theoretical average performance of this classifier, which can be considered as the expected upper bound performance for the task. Experimental evaluations on synthetic and real data sets show the improvement achieved by the proposed classifier, in contrast to conventional classifiers trained with binary labels. These evaluations clearly demonstrate the optimality of the proposed classifier, and the precision of the expected upper bound obtained by our derivation. (A small numerical sketch of this idea follows this entry.)
- Published
- 2017
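A small numerical sketch of the general idea in the abstract above, under assumptions not taken from the paper (linear-Gaussian synthetic data, a linear regressor, Gaussian residuals): instead of discarding the continuous labels, model p(y | x) and predict the positive class when the posterior probability that y exceeds the dichotomization threshold is at least 0.5.

```python
# Hedged sketch, not the paper's exact derivation: keep the continuous labels,
# model p(y | x), and classify by comparing P(y >= threshold | x) with 0.5.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w = rng.normal(size=5)
y = X @ w + 0.5 * rng.normal(size=2000)    # continuous labels (e.g., ratings)
t = np.median(y)                           # dichotomization threshold
y_bin = (y >= t).astype(int)

X_tr, X_te = X[:1500], X[1500:]
y_tr, ybin_tr, ybin_te = y[:1500], y_bin[:1500], y_bin[1500:]

# Conventional approach: discard the continuous labels, train on binary ones.
clf = LogisticRegression(max_iter=1000).fit(X_tr, ybin_tr)

# Continuous-label-aware approach: regress y, assume Gaussian residuals,
# then threshold the posterior probability P(y >= t | x).
reg = LinearRegression().fit(X_tr, y_tr)
sigma = np.std(y_tr - reg.predict(X_tr))
p_pos = 1.0 - norm.cdf(t, loc=reg.predict(X_te), scale=sigma)

print("binary-trained accuracy:", np.mean(clf.predict(X_te) == ybin_te))
print("MAP-style accuracy:     ", np.mean((p_pos >= 0.5) == ybin_te))
```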
16. Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
- Author
- Eric Battenberg, David T. H. Kao, Tom Bagby, Soroosh Mariooryad, RJ Skerry-Ryan, Daisy Stanton, and Matt Shannon
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Computation and Language, Mechanism (biology), Computer science, Speech recognition, 020206 networking & telecommunications, Speech synthesis, 02 engineering and technology, computer.software_genre, Computer Science - Sound, Machine Learning (cs.LG), 030507 speech-language pathology & audiology, 03 medical and health sciences, Consistency (database systems), Audio and Speech Processing (eess.AS), 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), FOS: Electrical engineering, electronic engineering, information engineering, 0305 other medical science, computer, Computation and Language (cs.CL), Energy (signal processing), Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
- Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms. We suggest simple modifications to GMM-based attention that allow it to align quickly and consistently during training, and introduce a new location-relative attention mechanism to the additive energy-based family, called Dynamic Convolution Attention (DCA). We compare the various mechanisms in terms of alignment speed and consistency during training, naturalness, and ability to generalize to long utterances, and conclude that GMM attention and DCA can generalize to very long utterances, while preserving naturalness for shorter, in-domain utterances. Comment: Accepted to ICASSP 2020. (A minimal sketch of GMM attention follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
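The sketch below illustrates the location-relative GMM attention idea from the abstract above in plain numpy. The paper's parameterizations differ, and here the mixture parameters are random placeholders rather than outputs of a decoder network; the point is that each component mean can only move forward, so the alignment advances monotonically regardless of content.

```python
# Minimal, illustrative GMM-attention step (not the paper's exact variants).
import numpy as np

def gmm_attention_step(prev_means, deltas, sigmas, weights, num_enc_steps):
    """One decoder step: advance the Gaussian means, then score encoder positions."""
    means = prev_means + np.exp(deltas)              # exp keeps the step positive -> monotonic
    pos = np.arange(num_enc_steps)[None, :]          # (1, T) encoder indices
    gauss = np.exp(-0.5 * ((pos - means[:, None]) / sigmas[:, None]) ** 2)
    comp = weights[:, None] * gauss / (sigmas[:, None] * np.sqrt(2 * np.pi))
    alpha = comp.sum(axis=0)                         # mixture evaluated at each index
    return means, alpha / (alpha.sum() + 1e-8)       # normalized attention weights

rng = np.random.default_rng(0)
K, T = 5, 120                          # mixture components, encoder length
means = np.zeros(K)
for step in range(3):                  # pretend decoder steps; a real model predicts the params
    means, alpha = gmm_attention_step(means,
                                      deltas=rng.normal(size=K) * 0.1,
                                      sigmas=np.full(K, 3.0),
                                      weights=np.full(K, 1.0 / K),
                                      num_enc_steps=T)
    print(step, alpha.argmax())        # attended encoder position creeps forward
```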
17. Facial Expression Recognition in the Presence of Speech Using Blind Lexical Compensation
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Facial expression, Face hallucination, Speech recognition, Phonetic transcription, 02 engineering and technology, Facial recognition system, Human-Computer Interaction, 030507 speech-language pathology & audiology, 03 medical and health sciences, Face (geometry), 0202 electrical engineering, electronic engineering, information engineering, Three-dimensional face recognition, 020201 artificial intelligence & image processing, Transcription (software), 0305 other medical science, Articulation (phonetics), Psychology, Software
- Abstract
- During spontaneous conversations the articulation process as well as the internal emotional states influence the facial configurations. Inferring the conveyed emotions from the information presented in facial expressions requires decoupling the linguistic and affective messages in the face. Normalizing and compensating for the underlying lexical content have shown improvement in recognizing facial expressions. However, this requires the transcription and phoneme alignment information, which is not available in a broad range of applications. This study uses the asymmetric bilinear factorization model to perform the decoupling of linguistic and affective information when they are not given. The emotion recognition evaluations on the IEMOCAP database show the capability of the proposed approach in separating these factors in facial expressions, yielding statistically significant performance improvements. The achieved improvement is similar to the case when the ground truth phonetic transcription is known. Similarly, experiments on the SEMAINE database using image-based features demonstrate the effectiveness of the proposed technique in practical scenarios. (An illustrative factorization sketch follows this entry.)
- Published
- 2016
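The abstract above relies on an asymmetric bilinear (style/content) factorization to separate lexical and affective factors. The snippet below is a generic SVD-based sketch of that kind of factorization in the spirit of Tenenbaum and Freeman, with random data standing in for facial features; it is not the authors' model, and the dimensions are arbitrary.

```python
# Rough sketch of an asymmetric bilinear style/content factorization
# (illustrative only; random data stands in for facial features).
import numpy as np

rng = np.random.default_rng(0)
S, C, D, J = 4, 10, 30, 5   # styles (e.g., emotions), contents (e.g., phonemes), feature dim, model dim

# Mean observation y[s, c] of facial features for each style/content pair.
Y = rng.normal(size=(S, C, D))

# Stack styles vertically: rows indexed by (style, feature dim), columns by content.
Y_stacked = Y.transpose(0, 2, 1).reshape(S * D, C)

U, sing, Vt = np.linalg.svd(Y_stacked, full_matrices=False)
content = Vt[:J, :]                                # content factors b_c, shape (J, C)
A = (U[:, :J] * sing[:J]).reshape(S, D, J)         # style-specific linear maps A^s

# Reconstruction: y[s, c] is approximated by A[s] @ content[:, c].
recon = np.einsum('sdj,jc->scd', A, content)
print("relative error:", np.linalg.norm(recon - Y) / np.linalg.norm(Y))
```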
18. Correcting Time-Continuous Emotional Labels by Modeling the Reaction Lag of Evaluators
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- business.industry, Emotion classification, Lag, Speech recognition, Feature extraction, Mutual information, Machine learning, computer.software_genre, Human-Computer Interaction, Artificial intelligence, Emotion recognition, Valence (psychology), Psychology, business, computer, Software
- Abstract
- An appealing scheme to characterize expressive behaviors is the use of emotional dimensions such as activation (calm versus active) and valence (negative versus positive). These descriptors offer many advantages to describe the wide spectrum of emotions. Due to the continuous nature of fast-changing expressive vocal and gestural behaviors, it is desirable to continuously track these emotional traces, capturing subtle and localized events (e.g., with FEELTRACE). However, time-continuous annotations introduce challenges that affect the reliability of the labels. In particular, an important issue is the evaluators' reaction lag caused by observing, appraising, and responding to the expressive behaviors. An empirical analysis demonstrates that this delay varies from 1 to 6 seconds, depending on the annotator, expressive dimension, and actual behaviors. Our experiments show accuracy improvements even with fixed delays (1-3 seconds). This paper proposes to compensate for this reaction lag by finding the time-shift that maximizes the mutual information between the expressive behaviors and the time-continuous annotations. The approach is implemented by making different assumptions about the evaluators' reaction lag. The benefits of compensating for the delay are demonstrated with emotion classification experiments. On average, the classifiers trained with facial and speech features show more than 7 percent relative improvements over baseline classifiers trained and tested without shifting the time-continuous annotations. (A short sketch of the lag search follows this entry.)
- Published
- 2015
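A short sketch of the lag search described above, assuming one feature stream and one annotation stream at a common frame rate. Mutual information is estimated here with scikit-learn's nearest-neighbor estimator rather than whatever the authors used, so treat it purely as an illustration of the idea: shift the annotations back by candidate delays and keep the delay that maximizes MI.

```python
# Minimal sketch of MI-based reaction-lag compensation (illustrative only).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def best_reaction_lag(feature, annotation, max_lag, rate_hz):
    """Return the delay (seconds) maximizing MI between the feature stream and
    the annotation stream shifted back by that delay."""
    scores = []
    for lag in range(max_lag + 1):
        f = feature[: len(feature) - lag]
        a = annotation[lag:]                    # annotation lags the behavior
        mi = mutual_info_regression(f.reshape(-1, 1), a, random_state=0)[0]
        scores.append(mi)
    return int(np.argmax(scores)) / rate_hz, scores

# Synthetic check: the annotation is a noisy, 2-second-delayed copy of the feature.
rng = np.random.default_rng(0)
rate = 10                                       # 10 frames per second
x = np.convolve(rng.normal(size=600), np.ones(5) / 5, mode="same")
y = np.roll(x, 2 * rate) + 0.1 * rng.normal(size=600)
delay, _ = best_reaction_lag(x, y, max_lag=6 * rate, rate_hz=rate)
print("estimated reaction lag (s):", delay)     # expected around 2.0
```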
19. Compensating for speaker or lexical variabilities in speech for emotion recognition
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Normalization (statistics), Linguistics and Language, Computer science, business.industry, Communication, Speech recognition, Mutual information, Speaker recognition, computer.software_genre, Language and Linguistics, Computer Science Applications, Speaker diarisation, Modeling and Simulation, Feature (machine learning), Computer Vision and Pattern Recognition, Artificial intelligence, Set (psychology), business, computer, Software, Human voice, Natural language processing, Uncertainty reduction theory
- Abstract
- Affect recognition is a crucial requirement for future human-machine interfaces to effectively respond to nonverbal behaviors of the user. Speech emotion recognition systems analyze acoustic features to deduce the speaker's emotional state. However, the human voice conveys a mixture of information including speaker, lexical, cultural, physiological and emotional traits. The presence of these communication aspects introduces variabilities that affect the performance of an emotion recognition system. Therefore, building robust emotional models requires careful considerations to compensate for the effect of these variabilities. This study aims to factorize speaker characteristics, verbal content and expressive behaviors in various acoustic features. The factorization technique consists of building phoneme-level trajectory models for the features. We propose a metric to quantify the dependency between acoustic features and communication traits (i.e., speaker, lexical and emotional factors). This metric, which is motivated by the mutual information framework, estimates the uncertainty reduction in the trajectory models when a given trait is considered. The analysis provides important insights on the dependency between the features and the aforementioned factors. Motivated by these results, we propose a feature normalization technique based on the whitening transformation that aims to compensate for speaker and lexical variabilities. The benefit of employing this normalization scheme is validated with the presented factor analysis method. The emotion recognition experiments show that the normalization approach can attenuate the variability imposed by the verbal content and speaker identity, yielding 4.1% and 2.4% relative performance improvements on a selected set of features, respectively. (A whitening sketch follows this entry.)
- Published
- 2014
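As a toy illustration of whitening-based feature normalization (the paper builds its scheme around phoneme-level trajectory models; here each hypothetical speaker's features are simply whitened with that speaker's own statistics), the sketch below maps features to zero mean and identity covariance per speaker.

```python
# Sketch of per-speaker whitening normalization (simplified assumption,
# not the paper's exact scheme).
import numpy as np

def whiten(features, eps=1e-6):
    """Zero-mean, identity-covariance transform of (n_frames, n_dims) features."""
    mu = features.mean(axis=0)
    cov = np.cov(features - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T     # Sigma^(-1/2)
    return (features - mu) @ W

# Two hypothetical speakers with different feature means and scales.
rng = np.random.default_rng(0)
per_speaker = {spk: rng.normal(loc=rng.normal(size=8),
                               scale=rng.uniform(0.5, 2.0, size=8),
                               size=(500, 8))
               for spk in ["spk1", "spk2"]}
normalized = {spk: whiten(f) for spk, f in per_speaker.items()}

# After whitening, each speaker's covariance is (numerically) the identity.
print(np.allclose(np.cov(normalized["spk1"], rowvar=False), np.eye(8), atol=1e-3))
```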
20. Iterative Feature Normalization Scheme for Automatic Emotion Detection from Speech
- Author
- Angeliki Metallinou, Soroosh Mariooryad, Carlos Busso, and Shrikanth S. Narayanan
- Subjects
- Normalization (statistics), business.industry, Speech recognition, Feature extraction, Emotion detection, Pattern recognition, Human-Computer Interaction, Emotion recognition, Affine transformation, Artificial intelligence, Natural approach, Psychology, business, Software
- Abstract
- The externalization of emotion is intrinsically speaker-dependent. A robust emotion recognition system should be able to compensate for these differences across speakers. A natural approach is to normalize the features before training the classifiers. However, the normalization scheme should not affect the acoustic differences between emotional classes. This study presents the iterative feature normalization (IFN) framework, which is an unsupervised front-end, especially designed for emotion detection. The IFN approach aims to reduce the acoustic differences between the neutral speech across speakers, while preserving the inter-emotional variability in expressive speech. This goal is achieved by iteratively detecting neutral speech for each speaker, and using this subset to estimate the feature normalization parameters. Then, an affine transformation is applied to both neutral and emotional speech. This process is repeated until the results from the emotion detection system are consistent between consecutive iterations. The IFN approach is exhaustively evaluated using the IEMOCAP database and a data set obtained under free uncontrolled recording conditions with different evaluation configurations. The results show that the systems trained with the IFN approach achieve better performance than systems trained either without normalization or with global normalization. (A structural sketch of the IFN loop follows this entry.)
- Published
- 2013
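A structural sketch of the IFN loop described above. The neutral/emotional detector is a crude stand-in (a median split on feature energy), whereas the paper uses a trained emotion detection system; the loop structure is the point: detect neutral frames, re-estimate normalization parameters from them, re-normalize, and repeat until the decisions stop changing.

```python
# Structural sketch of iterative feature normalization (IFN); the detector and
# data are placeholders, not the authors' setup.
import numpy as np

rng = np.random.default_rng(0)
features = {spk: rng.normal(loc=rng.uniform(-1, 1), scale=rng.uniform(0.8, 1.5),
                            size=(200, 6))
            for spk in ["spk_a", "spk_b"]}

def detect_neutral(norm_feats):
    """Stand-in detector: call the lower-energy half of the frames 'neutral'."""
    energy = np.linalg.norm(norm_feats, axis=1)
    return energy <= np.median(energy)

def ifn(feats, max_iters=10):
    neutral = np.ones(len(feats), dtype=bool)          # start from all frames
    for _ in range(max_iters):
        mu = feats[neutral].mean(axis=0)               # params from detected-neutral subset
        sd = feats[neutral].std(axis=0) + 1e-8
        normalized = (feats - mu) / sd                 # affine transform on everything
        new_neutral = detect_neutral(normalized)
        if np.array_equal(new_neutral, neutral):       # converged: decisions stable
            return normalized
        neutral = new_neutral
    return normalized

normalized = {spk: ifn(f) for spk, f in features.items()}
print({spk: np.round(f.mean(), 3) for spk, f in normalized.items()})
```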
21. Exploring Cross-Modality Affective Reactions for Audiovisual Emotion Recognition
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Communication, Facial expression, Modalities, business.industry, Mutual information, Facial recognition system, Multimodal interaction, Human-Computer Interaction, Dialog box, business, Psychology, psychological phenomena and processes, Software, Human communication, Cognitive psychology, Gesture
- Abstract
- Psycholinguistic studies on human communication have shown that during human interaction individuals tend to adapt their behaviors, mimicking the spoken style, gestures, and expressions of their conversational partners. This synchronization pattern is referred to as entrainment. This study investigates the presence of entrainment at the emotion level in cross-modality settings and its implications on multimodal emotion recognition systems. The analysis explores the relationship between acoustic features of the speaker and facial expressions of the interlocutor during dyadic interactions. The analysis shows that 72 percent of the time the speakers displayed similar emotions, indicating strong mutual influence in their expressive behaviors. We also investigate the cross-modality, cross-speaker dependence using a mutual information framework. The study reveals a strong relation between the facial and acoustic features of one subject and the emotional state of the other subject. It also shows strong dependence between heterogeneous modalities across conversational partners. These findings suggest that the expressive behaviors from one dialog partner provide complementary information to recognize the emotional state of the other dialog partner. The analysis motivates classification experiments exploiting cross-modality, cross-speaker information. The study presents emotion recognition experiments using the IEMOCAP and SEMAINE databases. The results demonstrate the benefit of exploiting this emotional entrainment effect, showing statistically significant improvements.
- Published
- 2013
22. Generating Human-Like Behaviors Using Joint, Speech-Driven Models for Conversational Agents
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Facial expression, Visual perception, Acoustics and Ultrasonics, Computer science, Speech recognition, Animation, computer.software_genre, Speech processing, Electrical and Electronic Engineering, Dialog system, computer, Dynamic Bayesian network, Computer facial animation, Gesture
- Abstract
- During human communication, every spoken message is intrinsically modulated within different verbal and nonverbal cues that are externalized through various aspects of speech and facial gestures. These communication channels are strongly interrelated, which suggests that generating human-like behavior requires a careful study of their relationship. Neglecting the mutual influence of different communicative channels in the modeling of natural behavior for a conversational agent may result in unrealistic behaviors that can affect the intended visual perception of the animation. This relationship exists both between audiovisual information and within different visual aspects. This paper explores the idea of using joint models to preserve the coupling not only between speech and facial expression, but also within facial gestures. As a case study, the paper focuses on building a speech-driven facial animation framework to generate natural head and eyebrow motions. We propose three dynamic Bayesian networks (DBNs), which make different assumptions about the coupling between speech, eyebrow and head motion. Synthesized animations are produced based on the MPEG-4 facial animation standard, using the audiovisual IEMOCAP database. The experimental results based on perceptual evaluations reveal that the proposed joint models (speech/eyebrow/head) outperform audiovisual models that are separately trained (speech/head and speech/eyebrow).
- Published
- 2012
23. Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora
- Author
- Reza Lotfian, Carlos Busso, and Soroosh Mariooryad
- Subjects
- Computer science, business.industry, media_common.quotation_subject, Speech recognition, Speech corpus, Speech processing, computer.software_genre, Task (project management), Phone, Natural (music), Conversation, Artificial intelligence, Affective computing, business, computer, Natural language processing, media_common
- Abstract
- A key element in affective computing is to have large corpora of genuine emotional samples collected during natural conversations. Recording natural interactions through the telephone is an appealing approach to build emotional databases. However, collecting real conversational data with expressive reactions is a challenging task, especially if the recordings are to be shared with the community (e.g., privacy concerns). This study explores a novel approach consisting of retrieving emotional reactions from existing spontaneous speech databases collected for general speech processing problems. Although most of the recordings in these databases are expected to have non-emotional expressions, given the naturalness of the interactions, the flow of the conversation can lead to emotional responses from conversation partners, which we aim to retrieve. We use the IEMOCAP and SEMAINE databases to build emotion detector systems. We use these classifiers to identify emotional behaviors from the FISHER database, which is a large conversational speech corpus recorded over the phone. Subjective evaluations over the retrieved samples demonstrate the potential of the proposed scheme to build a naturalistic emotional speech database. Index Terms: emotion recognition, expressive speech, information retrieval, emotional databases. (A retrieval sketch follows this entry.)
- Published
- 2014
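A schematic sketch of the retrieval pipeline in the abstract above, with synthetic stand-ins for the labeled corpora and the large conversational pool; the feature dimensions, the logistic-regression detector, and the top-k selection rule are assumptions for illustration, not the authors' setup.

```python
# Sketch: train an emotion detector on labeled data, score a large unlabeled
# pool, and keep only the most confidently "emotional" segments for annotation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for acoustic features from an emotionally labeled corpus.
X_lab = rng.normal(size=(1000, 12))
y_lab = (X_lab[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)  # 1 = emotional

detector = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Stand-in for a much larger conversational pool (e.g., telephone speech segments).
X_pool = rng.normal(size=(50_000, 12))
scores = detector.predict_proba(X_pool)[:, 1]

# Retrieve the top-scoring segments as candidate expressive samples.
top_k = 500
candidates = np.argsort(scores)[::-1][:top_k]
print("retrieved", len(candidates), "segments; min score:",
      scores[candidates].min().round(3))
```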
24. Analysis and Compensation of the Reaction Lag of Evaluators in Continuous Emotional Annotations
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Artificial neural network, Contextual image classification, business.industry, Speech recognition, Lag, Mutual information, Stimulus (physiology), computer.software_genre, Annotation, Emotion recognition, Artificial intelligence, Affective computing, Psychology, business, GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries), computer, Natural language processing
- Abstract
- Defining useful emotional descriptors to characterize expressive behaviors is an important research area in affective computing. Recent studies have shown the benefits of using continuous emotional evaluations to annotate spontaneous corpora. Instead of assigning global labels per segment, this approach captures the temporal dynamic evolution of the emotions. A challenge of continuous assessments is the inherent reaction lag of the evaluators. During the annotation process, an observer needs to sense the stimulus, perceive the emotional message, and define his/her judgment, all this in real time. As a result, we expect a reaction lag between the annotation and the underlying emotional content. This paper uses mutual information to quantify and compensate for this reaction lag. Classification experiments on the SEMAINE database demonstrate that the performance of emotion recognition systems improves when the evaluator reaction lag is considered. We explore annotator-dependent and annotator-independent compensation schemes.
- Published
- 2013
25. Factorizing speaker, lexical and emotional variabilities observed in facial expressions
- Author
- Carlos Busso and Soroosh Mariooryad
- Subjects
- Facial expression, Computer science, Speech recognition, Metric (mathematics), Feature extraction, Mutual information, Affect (psychology), Speaker recognition, Facial recognition system, TRACE (psycholinguistics)
- Abstract
- An effective human-computer interaction system should be equipped with mechanisms to recognize and respond to the affective state of the user. However, a spoken message conveys different communicative aspects such as the verbal content, emotional state and idiosyncrasy of the speaker. Each of these aspects introduces variability that will affect the performance of an emotion recognition system. If the models used to capture the expressive behaviors are constrained by the lexical content and speaker identity, it is expected that the observed uncertainty in the channel will decrease, improving the accuracy of the system. Motivated by these observations, this study aims to quantify and localize the speaker, lexical and emotional variabilities observed in the face during human interaction. A metric inspired by mutual information theory is proposed to quantify the dependency of facial features on these factors. This metric uses the trace of the covariance matrix of facial motion trajectories to measure the uncertainty. The experimental results confirm the strong influence of the lexical information in the lower part of the face. For this facial region, the results demonstrate the benefit of constraining the emotional model on the lexical content. The ultimate goal of this research is to utilize this information to constrain the emotional models on the underlying lexical units to improve the accuracy of emotion recognition systems. (A sketch of the trace-based metric follows this entry.)
- Published
- 2012
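A hypothetical sketch of the covariance-trace metric mentioned above, assuming that "uncertainty" is the trace of the covariance of facial-feature trajectories and that "dependency on a factor" is the relative reduction of that trace when the data are grouped by that factor. The synthetic features below are constructed to depend on a lexical factor but not on an emotional one, so the two printed dependencies should differ sharply.

```python
# Illustrative covariance-trace dependency metric (assumed formulation,
# not necessarily the paper's exact definition).
import numpy as np

def uncertainty_reduction(trajectories, factor_labels):
    """1 - E_f[trace(cov | factor = f)] / trace(cov); larger means stronger dependency."""
    total = np.trace(np.cov(trajectories, rowvar=False))
    labels = np.unique(factor_labels)
    conditional = np.mean([np.trace(np.cov(trajectories[factor_labels == f], rowvar=False))
                           for f in labels])
    return 1.0 - conditional / total

rng = np.random.default_rng(0)
n, d = 3000, 10
phoneme = rng.integers(0, 8, size=n)                  # hypothetical lexical factor
emotion = rng.integers(0, 4, size=n)                  # hypothetical emotional factor

# Lower-face-like features driven mostly by the lexical factor.
centers = rng.normal(size=(8, d)) * 2.0
feats = centers[phoneme] + 0.5 * rng.normal(size=(n, d))

print("dependency on phoneme:", round(uncertainty_reduction(feats, phoneme), 3))  # high
print("dependency on emotion:", round(uncertainty_reduction(feats, emotion), 3))  # near 0
```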