Author: "Shao-Yen Tseng" / Topic: computer - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shao-Yen Tseng"' showing total 5 results

Start Over Author "Shao-Yen Tseng" Topic computer

5 results on '"Shao-Yen Tseng"'

1. Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

Author: Shao-Yen Tseng, Panayiotis G. Georgiou, and Shrikanth S. Narayanan
Subjects: Context model, Computer science, business.industry, Applied Mathematics, Feature extraction, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, Paralanguage, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Word usage, Language model, Artificial intelligence, Electrical and Electronic Engineering, business, computer, Natural language processing, Word (computer architecture), Spoken language
Abstract: Word embeddings such as ELMo and BERT have been shown to model word usage in language with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant performance improvement across many natural language processing tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of a parallel stream to the bidirectional language model. This multimodal language model is trained on spoken language data that includes both text and audio modalities. We show that embeddings extracted from this model integrate paralinguistic cues into word meanings and can provide vital affective information by applying these multimodal embeddings to the task of speaker emotion recognition.
Published: 2021

2. 'Honey, I Learned to Talk'

Author: Panayiotis G. Georgiou, Haoqi Li, Shao-Yen Tseng, and Brian R. Baucom
Subjects: Multimodal fusion, Modality (human–computer interaction), Modalities, business.industry, Computer science, media_common.quotation_subject, Significant difference, Mean absolute error, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Expression (mathematics), Perception, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, psychological phenomena and processes, media_common
Abstract: In this work we analyze the importance of lexical and acoustic modalities in behavioral expression and perception. We demonstrate that this importance relates to the amount of therapy, and hence communication training, that a person received. It also exhibits some relationship to gender. We proceed to provide an analysis on couple therapy data by splitting the data into clusters based on gender or stage in therapy. Our analysis demonstrates the significant difference between optimal modality weights per cluster and relationship to therapy stage. Given this finding we propose the use of communication-skill aware fusion models to account for these differences in modality importance. The fusion models operate on partitions of the data according to the gender of the speaker or the therapy stage of the couple. We show that while most multimodal fusion methods can improve mean absolute error of behavioral estimates, the best results are given by a model that considers the degree of communication training among the interlocutors.
Published: 2018

3. Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection

Author: Samarjit Das, Shao-Yen Tseng, Yun Wang, Florian Metze, Joseph Szurley, and Juncheng Li
Subjects: FOS: Computer and information sciences, Sound (cs.SD), business.industry, Computer science, Event (relativity), Deep learning, Small footprint, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Computer Science - Sound, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets where fine resolution annotations are too expensive to obtain. In this paper, we propose a small-footprint multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed MIL framework uses audio embeddings extracted from a pre-trained convolutional neural network as input features. We show that by using audio embeddings the MIL framework can be implemented using a simple DNN with performance comparable to recurrent neural networks. We evaluate our approach by training an audio tagging system using a subset of AudioSet, which is a large collection of weakly labeled YouTube video excerpts. Combined with a late-fusion approach, we improve the F1 score of a baseline audio tagging system by 17%. We show that audio embeddings extracted by the convolutional neural networks significantly boost the performance of all MIL models. This framework reduces the model complexity of the AED system and is suitable for applications where computational resources are limited., 5 pages, 3 figures
Published: 2017

4. Unsupervised online multitask learning of behavioral sentence embeddings

Author: Brian R. Baucom, Panayiotis G. Georgiou, and Shao-Yen Tseng
Subjects: FOS: Computer and information sciences, General Computer Science, Computer science, Process (engineering), Multi-task learning, 02 engineering and technology, computer.software_genre, Unsupervised learning, lcsh:QA75.5-76.95, Emotional embeddings, 030507 speech-language pathology & audiology, 03 medical and health sciences, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Computer Science - Computation and Language, Sentence embeddings, business.industry, Natural Language and Speech, Behavior analysis, ComputingMethodologies_PATTERNRECOGNITION, Transformation (function), Embedding, 020201 artificial intelligence & image processing, Couples therapy, Emotion recognition, lcsh:Electronic computers. Computer science, Artificial intelligence, 0305 other medical science, Transfer of learning, business, Computation and Language (cs.CL), computer, Sentence, Natural language processing, Word (computer architecture)
Abstract: Appropriate embedding transformation of sentences can aid in downstream tasks such as NLP and emotion and behavior analysis. Such efforts evolved from word vectors which were trained in an unsupervised manner using large-scale corpora. Recent research, however, has shown that sentence embeddings trained using in-domain data or supervised techniques, often through multitask learning, perform better than unsupervised ones. Representations have also been shown to be applicable in multiple tasks, especially when training incorporates multiple information sources. In this work we aspire to combine the simplicity of using abundant unsupervised data with transfer learning by introducing an online multitask objective. We present a multitask paradigm for unsupervised learning of sentence embeddings which simultaneously addresses domain adaption. We show that embeddings generated through this process increase performance in subsequent domain-relevant tasks. We evaluate on the affective tasks of emotion recognition and behavior analysis and compare our results with state-of-the-art general-purpose supervised sentence embeddings. Our unsupervised sentence embeddings outperform the alternative universal embeddings in both identifying behaviors within couples therapy and in emotion recognition.
Published: 2019

5. Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models

Author: Sandeep Nallan Chakravarthula, Shao-Yen Tseng, Panayiotis G. Georgiou, and Brian R. Baucom
Subjects: Low resource, Computer science, business.industry, 010501 environmental sciences, computer.software_genre, 01 natural sciences, 03 medical and health sciences, Annotation, 0302 clinical medicine, 030212 general & internal medicine, Language model, Artificial intelligence, business, computer, Natural language processing, 0105 earth and related environmental sciences
Published: 2016

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

5 results on '"Shao-Yen Tseng"'

1. Multimodal Embeddings From Language Models for Emotion Recognition in the Wild

2. 'Honey, I Learned to Talk'

3. Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection

4. Unsupervised online multitask learning of behavioral sentence embeddings

5. Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

Publisher

5 results on '"Shao-Yen Tseng"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources