Author: "Chen, Hsuan-Jui" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chen, Hsuan-Jui"' showing total 3 results

Start Over Author "Chen, Hsuan-Jui" Database OpenAIRE

3 results on '"Chen, Hsuan-Jui"'

1. Once-for-All Sequence Compression for Self-Supervised Speech Models

Author: Chen, Hsuan-Jui, Meng, Yen, and Lee, Hung-yi
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The sequence length along the time axis is often the dominant factor of the computation in speech processing. Works have been proposed to reduce the sequence length for lowering the computational cost in self-supervised speech models. However, different downstream tasks have different tolerance of sequence compressing, so a model that produces a fixed compressing rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compressing rates. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants with a smooth performance-efficiency trade-off. We further explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search., Accepted to ICASSP 2023
Published: 2022

2. DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

Author: Lin, Guan-Ting, Chuang, Yung-Sung, Chung, Ho-Lam, Yang, Shu-wen, Chen, Hsuan-Jui, Dong, Shuyan, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, and Lee, Lin-shan
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Spoken Question Answering (SQA) is to find the answer from a spoken document given a question, which is crucial for personal assistants when replying to the queries from the users. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time and cost-prohibitive to collect for low-resourced languages, but more importantly, very often the answers to the questions include name entities or out-of-vocabulary words that cannot be recognized correctly. Also, ASR aims to minimize recognition errors equally over all words, including many function words irrelevant to the SQA task. Therefore, SQA without ASR transcripts (textless) is always highly desired, although known to be very difficult. This work proposes Discrete Spoken Unit Adaptive Learning (DUAL), leveraging unlabeled data for pre-training and fine-tuned by the SQA downstream task. The time intervals of spoken answers can be directly predicted from spoken documents. We also release a new SQA benchmark corpus, NMSQA, for data with more realistic scenarios. We empirically showed that DUAL yields results comparable to those obtained by cascading ASR and text QA model and robust to real-world data. Our code and model will be open-sourced., Comment: Accepted by Interspeech 2022
Published: 2022

3. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Author: Tsai, Hsiang-Sheng, Chang, Heng-Jui, Huang, Wen-Chin, Huang, Zili, Lakhotia, Kushal, Yang, Shu-wen, Dong, Shuyan, Liu, Andy T., Lai, Cheng-I Jeff, Shi, Jiatong, Chang, Xuankai, Hall, Phil, Chen, Hsuan-Jui, Li, Shang-Wen, Watanabe, Shinji, Mohamed, Abdelrahman, and Lee, Hung-yi
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all researchers, and encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation., Comment: ACL 2022 main conference
Published: 2022
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

3 results on '"Chen, Hsuan-Jui"'

1. Once-for-All Sequence Compression for Self-Supervised Speech Models

2. DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

3. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Publication Year Range

Language

Database

Publisher

3 results on '"Chen, Hsuan-Jui"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources