A Large-Scale Evaluation of Speech Foundation Models

Authors :
Yang, Shu-wen
Chang, Heng-Jui
Huang, Zili
Liu, Andy T.
Lai, Cheng-I
Wu, Haibin
Shi, Jiatong
Chang, Xuankai
Tsai, Hsiang-Sheng
Huang, Wen-Chin
Feng, Tzu-hsun
Chi, Po-Han
Lin, Yist Y.
Chuang, Yung-Sung
Huang, Tzu-Hsien
Tseng, Wei-Cheng
Lakhotia, Kushal
Li, Shang-Wen
Mohamed, Abdelrahman
Watanabe, Shinji
Lee, Hung-yi
Publication Year :
2024

Abstract

The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol, and the statistical significance and robustness of the benchmark.

Comment: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred.
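To illustrate the framework the abstract describes, here is a minimal sketch of the weighted-sum protocol: a learnable weight per frozen hidden layer is softmax-normalized and used to pool layer representations into a single feature, which a lightweight head then maps to task outputs. Shapes, dimensions, and the single-linear-projection head are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def weighted_sum(layer_feats, layer_weights):
    """Pool frozen hidden states with softmax-normalized learnable
    layer weights (the weighted-sum benchmarking protocol).

    layer_feats: (num_layers, time, dim) array of frozen features.
    layer_weights: (num_layers,) unnormalized learnable scalars.
    Returns a (time, dim) pooled representation.
    """
    w = np.exp(layer_weights - np.max(layer_weights))  # stable softmax
    w = w / w.sum()
    return np.tensordot(w, layer_feats, axes=1)

# Hypothetical sizes: 12 layers, 50 frames, 256-dim features, 10 classes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((12, 50, 256))
layer_weights = np.zeros(12)            # uniform pooling at initialization
rep = weighted_sum(feats, layer_weights)

# A task-specialized lightweight head (here a single linear projection on
# mean-pooled frames); only the head and layer_weights would be trained,
# while the foundation model stays frozen.
head = rng.standard_normal((256, 10))
logits = rep.mean(axis=0) @ head
```

With zero (i.e., uniform after softmax) layer weights, the pooled representation is simply the mean over layers; training lets each task emphasize the layers whose information it needs, which is also what makes the layer weights useful for analyzing information flow across tasks inside the model.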

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1438547357
Document Type :
Electronic Resource