Author: "Lin, Yi-An" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lin, Yi-An"' showing total 33,281 results

1. BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights

Author: Hsu, Chan-Jan, Lin, Yi-Cheng, Lin, Chia-Chun, Chen, Wei-Chih, Chung, Ho Lam, Li, Chen-An, Chen, Yi-Chang, Yu, Chien-Yu, Lee, Ming-Ji, Chen, Chien-Cheng, Huang, Ru-Heng, Lee, Hung-yi, and Shiu, Da-Shan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present BreezyVoice, a Text-to-Speech (TTS) system specifically adapted for Taiwanese Mandarin, highlighting phonetic control abilities to address the unique challenges of polyphone disambiguation in the language. Building upon CosyVoice, we incorporate a $S^{3}$ tokenizer, a large language model (LLM), an optimal-transport conditional flow matching model (OT-CFM), and a grapheme to phoneme prediction model, to generate realistic speech that closely mimics human utterances. Our evaluation demonstrates BreezyVoice's superior performance in both general and code-switching contexts, highlighting its robustness and effectiveness in generating high-fidelity speech. Additionally, we address the challenges of generalizability in modeling long-tail speakers and polyphone disambiguation. Our approach significantly enhances performance and offers valuable insights into the workings of neural codec TTS systems.
Published: 2025

2. Holographic thread game, kinematic space and quantum circuit

Author: Lin, Yi-Yu, Zhang, Jun, Zhang, Xinyu, and Fu, Rong
Subjects: High Energy Physics - Theory, Quantum Physics
Abstract: In this paper, we present a holographic "thread game", which visually represents quantum entanglement with thread-like objects, based on some natural assumptions. By studying their trajectories in the holographic bulk, we ultimately find that these threads should correspond precisely to perfect geodesics in the bulk, making kinematic space a tailor-made language for describing the thread game. Furthermore, it turns out that the thread game has a quantum circuit interpretation, where each thread plays the role of a wire. Therefore, kinematic space can be regarded as the "input board" of the holographic quantum circuit characterizing the entanglement structure of the spacetime. Based on this understanding, we also present a nice explanation of holographic complexity in the language of kinematic space.
Comment: 54 pages, 16 figures
Published: 2025

3. Emergence of Painting Ability via Recognition-Driven Evolution

Author: Lin, Yi, Gu, Lin, Cui, Ziteng, Su, Shenghan, Hao, Yumo, Tian, Yingtao, Harada, Tatsuya, and Yang, Jianfei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: From Paleolithic cave paintings to Impressionism, human painting has evolved to depict increasingly complex and detailed scenes, conveying more nuanced messages. This paper attempts to emerge this artistic capability by simulating the evolutionary pressures that enhance visual communication efficiency. Specifically, we present a model with a stroke branch and a palette branch that together simulate human-like painting. The palette branch learns a limited colour palette, while the stroke branch parameterises each stroke using Bézier curves to render an image, subsequently evaluated by a high-level recognition module. We quantify the efficiency of visual communication by measuring the recognition accuracy achieved with machine vision. The model then optimises the control points and colour choices for each stroke to maximise recognition accuracy with minimal strokes and colours. Experimental results show that our model achieves superior performance in high-level recognition tasks, delivering artistic expression and aesthetic appeal, especially in abstract sketches. Additionally, our approach shows promise as an efficient bit-level image compression technique, outperforming traditional methods.
Published: 2025
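
The stroke parameterisation mentioned in the abstract above can be illustrated with a minimal sketch of Bézier curve evaluation via de Casteljau's algorithm. The function names, the 2-D control points, and the sampling density are illustrative assumptions; the record does not state the curve degree or how the paper renders strokes.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    points = [tuple(p) for p in control_points]
    # Repeatedly interpolate between neighbouring points until one remains.
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sample_stroke(control_points, num_samples=16):
    """Sample a stroke as a polyline (hypothetical helper, num_samples >= 2)."""
    return [
        de_casteljau(control_points, i / (num_samples - 1))
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A quadratic stroke with three hypothetical control points.
    stroke = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
    print(de_casteljau(stroke, 0.5))
```

In an optimisation setting such as the one the abstract describes, the control points would be the quantities being adjusted; here they are fixed by hand.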

4. Merging Context Clustering with Visual State Space Models for Medical Image Segmentation

Author: Zhu, Yun, Zhang, Dong, Lin, Yi, Feng, Yifei, and Tang, Jinhui
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Medical image segmentation demands the aggregation of global and local feature representations, posing a challenge for current methodologies in handling both long-range and short-range feature interactions. Recently, vision mamba (ViM) models have emerged as promising solutions for addressing model complexities by excelling in long-range feature iterations with linear complexity. However, existing ViM approaches overlook the importance of preserving short-range local dependencies by directly flattening spatial tokens and are constrained by fixed scanning patterns that limit the capture of dynamic spatial context information. To address these challenges, we introduce a simple yet effective method named context clustering ViM (CCViM), which incorporates a context clustering module within the existing ViM models to segment image tokens into distinct windows for adaptable local clustering. Our method effectively combines long-range and short-range feature interactions, thereby enhancing spatial contextual representations for medical image segmentation tasks. Extensive experimental evaluations on diverse public datasets, i.e., Kumar, CPM17, ISIC17, ISIC18, and Synapse demonstrate the superior performance of our method compared to current state-of-the-art methods. Our code can be found at https://github.com/zymissy/CCViM.
Comment: Our paper has been accepted by the IEEE Transactions on Medical Imaging.
Published: 2025

5. The Calderón problem for the logarithmic Schrödinger equation

Author: Harrach, Bastian, Lin, Yi-Hsuan, and Weth, Tobias
Subjects: Mathematics - Analysis of PDEs
Abstract: We study the Calderón problem for a logarithmic Schrödinger type operator of the form $L_{\Delta} +q$, where $L_{\Delta}$ denotes the logarithmic Laplacian, which arises as formal derivative $\frac{d}{ds} \big|_{s=0}(-\Delta)^s$ of the family of fractional Laplacian operators. This operator enjoys remarkable nonlocal properties, such as the unique continuation and Runge approximation. Based on these tools, we can uniquely determine bounded potentials using the Dirichlet-to-Neumann map. Additionally, we can build a constructive uniqueness result by utilizing the monotonicity method. Our results hold for any space dimension.
Comment: 19 pages. All comments are welcome
Published: 2024
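
The formal derivative named in the abstract above can be written out on the Fourier side. This is a standard computation rather than something taken from the record, and it assumes the usual convention that $(-\Delta)^s$ acts as multiplication by $|\xi|^{2s}$ in Fourier variables:

```latex
\frac{d}{ds}\Big|_{s=0} |\xi|^{2s}
  = \frac{d}{ds}\Big|_{s=0} e^{2s\log|\xi|}
  = 2\log|\xi| ,
\qquad\text{so}\qquad
\widehat{L_{\Delta} u}(\xi) = 2\log|\xi|\,\widehat{u}(\xi).
```

Under this convention the logarithmic Laplacian has symbol $2\log|\xi|$, which grows more slowly than any positive power of $|\xi|$.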

6. Entanglement principle for the fractional Laplacian with applications to inverse problems

Author: Feizmohammadi, Ali and Lin, Yi-Hsuan
Subjects: Mathematics - Analysis of PDEs, Mathematics - Complex Variables
Abstract: We prove an entanglement principle for fractional Laplace operators on $\mathbb R^n$ for $n\geq 2$ as follows: if different fractional powers of the Laplace operator acting on several distinct functions on $\mathbb R^n$, which vanish on some nonempty open set $O$, are known to be linearly dependent on $O$, then all the functions must be globally zero. This remarkable principle was recently discovered to be true for smooth functions on compact Riemannian manifolds without boundary [FKU24]. Our main result extends the principle to the noncompact Euclidean space stated for tempered distributions under suitable decay conditions at infinity. We also present applications of this principle to solve new inverse problems for recovering anisotropic principal terms as well as zeroth order coefficients in fractional polyharmonic equations. Our proof of the entanglement principle uses the heat semigroup formulation of fractional Laplacian to establish connections between the principle and the study of several topics including interpolation properties for holomorphic functions under certain growth conditions at infinity, meromorphic extensions of holomorphic functions from a subdomain, as well as support theorems for spherical mean transforms on $\mathbb R^n$ that are defined as averages of functions over spheres.
Comment: 33 pages
Published: 2024

7. Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism

Author: Lin, Yi-Chien, Kwon, Woosuk, Pineda, Ronald, and Paravecino, Fanny Nina
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Serving Large Language Models (LLMs) efficiently has become crucial. LLMs are often served with multiple devices using techniques like data, pipeline, and tensor parallelisms. Each parallelism presents trade-offs between computation, memory, and communication overhead, making it challenging to determine the optimal parallel execution plan. Moreover, input workloads also impact parallelism strategies. Tasks with long prompts like article summarization are compute-intensive, while tasks with long generation lengths like code generation are often memory-intensive; these differing characteristics result in distinct optimal execution plans. Since searching for the optimal plan via actual deployment is prohibitively expensive, we propose APEX, an LLM serving system simulator that efficiently identifies an optimal parallel execution plan. APEX captures the complex characteristics of iteration-level batching, a technique widely used in SOTA LLM serving systems. APEX leverages the repetitive structure of LLMs to reduce design space, maintaining a similar simulation overhead, even when scaling to trillion scale models. APEX supports a wide range of LLMs, device clusters, etc., and it can be easily extended through its high-level templates. We run APEX simulations using a CPU and evaluate the identified optimal plans using 8 H100 GPUs, encompassing a wide range of LLMs and input workloads. We show that APEX can find optimal execution plans that are up to 4.42x faster than heuristic plans in terms of end-to-end serving latency. APEX also reports a set of metrics used in LLM serving systems, such as time per output token and time to first token. Furthermore, APEX can identify an optimal parallel execution plan within 15 minutes using a CPU. This is 71x faster and 1234x more cost-effective than actual deployment on a GPU cluster using cloud services. APEX will be open-sourced upon acceptance.
Published: 2024
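
The design space that the abstract above describes can be illustrated with a small sketch. It assumes, purely for illustration, that an execution plan is a triple of data, pipeline, and tensor parallel degrees whose product equals the device count; the function name and this plan representation are hypothetical and are not APEX's interface.

```python
from itertools import product


def enumerate_plans(num_devices):
    """Return all (data, pipeline, tensor) degree triples using every device."""
    plans = []
    for dp, pp in product(range(1, num_devices + 1), repeat=2):
        # Keep the pair only if the remaining factor is a whole number.
        if num_devices % (dp * pp) == 0:
            tp = num_devices // (dp * pp)
            plans.append((dp, pp, tp))
    return plans


if __name__ == "__main__":
    for plan in enumerate_plans(8):
        print(plan)
```

Even this toy enumeration grows quickly with the device count, which is consistent with the abstract's point that evaluating every candidate by actual deployment is expensive.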

8. Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Author: Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, and Lee, Hung-yi
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
Comment: Work in progress
Published: 2024

9. Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Author: Huang, Chien-yu, Chen, Wei-Chih, Yang, Shu-wen, Liu, Andy T., Li, Chen-An, Lin, Yu-Xiang, Tseng, Wei-Cheng, Diwan, Anuj, Shih, Yi-Jen, Shi, Jiatong, Chen, William, Chen, Xuanjun, Hsiao, Chi-Yuan, Peng, Puyuan, Wang, Shih-Heng, Kuan, Chun-Yi, Lu, Ke-Han, Chang, Kai-Wei, Yang, Chih-Kai, Ritter-Gutierrez, Fabian, Chuang, Ming To, Huang, Kuan-Po, Arora, Siddhant, Lin, You-Kuan, Yeo, Eunjung, Chang, Kalvin, Chien, Chung-Ming, Choi, Kwanghee, Hsieh, Cheng-Hsiu, Lin, Yi-Cheng, Yu, Chee-En, Chiu, I-Hsiang, Guimarães, Heitor R., Han, Jionghao, Lin, Tzu-Quan, Lin, Tzu-Yuan, Chang, Homu, Chang, Ting-Wu, Chen, Chun Wei, Chen, Shou-Jen, Chen, Yu-Hua, Cheng, Hsi-Chun, Dhawan, Kunal, Fang, Jia-Lin, Fang, Shi-Xin, Chiang, Kuan-Yu Fang, Fu, Chi An, Hsiao, Hsien-Fu, Hsu, Ching Yu, Huang, Shao-Syuan, Wei, Lee Chen, Lin, Hsi-Che, Lin, Hsuan-Hao, Lin, Hsuan-Ting, Lin, Jian-Ren, Liu, Ting-Chun, Lu, Li-Chun, Pai, Tsung-Min, Pasad, Ankita, Kuan, Shih-Yun Shan, Shon, Suwon, Tang, Yuxun, Tsai, Yun-Shao, Wei, Jui-Chiang, Wei, Tzu-Chieh, Wu, Chengxi, Wu, Dien-Ruei, Yang, Chao-Han Huck, Yang, Chieh-Chi, Yip, Jia Qi, Yuan, Shao-Xiang, Noroozi, Vahid, Chen, Zhehuai, Wu, Haibin, Livescu, Karen, Harwath, David, Watanabe, Shinji, and Lee, Hung-yi
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
Published: 2024

10. YourSkatingCoach: A Figure Skating Video Benchmark for Fine-Grained Element Analysis

Author: Chen, Wei-Yi, Lin, Yi-Ling, Su, Yu-An, Yeh, Wei-Hsin, and Ku, Lun-Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Combining sports and machine learning involves leveraging ML algorithms and techniques to extract insight from sports-related data such as player statistics, game footage, and other relevant information. However, datasets related to figure skating in the literature focus primarily on element classification and are currently unavailable or exhibit only limited access, which greatly raises the entry barrier to developing visual sports technology for it. Moreover, when using such data to help athletes improve their skills, we find they are very coarse-grained: they work for learning what an element is, but they are poorly suited to learning whether the element is good or bad. Here we propose air time detection, a novel motion analysis task, the goal of which is to accurately detect the duration of the air time of a jump. We present YourSkatingCoach, a large, novel figure skating dataset which contains 454 videos of jump elements, the detected skater skeletons in each video, along with the gold labels of the start and ending frames of each jump, together as a video benchmark for figure skating. In addition, although this type of task is often viewed as classification, we cast it as a sequential labeling problem and propose a Transformer-based model to calculate the duration. Experimental results show that the proposed model yields favorable results for a strong baseline. To further verify the generalizability of the fine-grained labels, we apply the same process to other sports as cross-sports tasks but for coarse-grained task action classification. Here we fine-tune the classification to demonstrate that figure skating, as it contains the essential body movements, constitutes a strong foundation for adaptation to other sports.
Published: 2024
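
The sequential-labeling view of air time in the abstract above can be sketched as follows. The binary label scheme (1 for an airborne frame, 0 otherwise), the frame rate, and the function name are assumptions made for illustration; the record does not describe the dataset's actual label format.

```python
def air_time_seconds(frame_labels, fps):
    """Duration of the longest contiguous airborne run, in seconds."""
    longest = current = 0
    for label in frame_labels:
        # Extend the current run on an airborne frame, otherwise reset it.
        current = current + 1 if label == 1 else 0
        longest = max(longest, current)
    return longest / fps


if __name__ == "__main__":
    # Hypothetical per-frame predictions for a clip recorded at 30 frames per second.
    labels = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
    print(air_time_seconds(labels, fps=30))
```

Taking the longest run rather than the total count of airborne frames is one possible design choice; it ignores isolated misclassified frames outside the jump.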

11. The fractional anisotropic Calderón problem for a nonlocal parabolic equation on closed Riemannian manifolds

Author: Lin, Yi-Hsuan
Subjects: Mathematics - Analysis of PDEs, Mathematics - Differential Geometry
Abstract: We consider the fractional anisotropic Calderón problem for the nonlocal parabolic equation $(\partial_t -\Delta_g)^s u=f$ ($0
Published: 2024

12. Kirillov's conjecture on Hecke-Grothendieck polynomials

Author: Brubaker, Ben, Dasher, A. Suki, Hu, Michael, Jain, Nupur, Li, Yifan, Lin, Yi, Mihaila, Maria, Tran, Van, and Ünel, I. Deniz
Subjects: Mathematics - Combinatorics, Mathematical Physics, Mathematics - Representation Theory
Abstract: We use algebraic methods in statistical mechanics to represent a multi-parameter class of polynomials in several variables as partition functions of a new family of solvable lattice models. The class of polynomials, defined by A.N. Kirillov, is derived from the largest class of divided difference operators satisfying the braid relations of Cartan type $A$. It includes as specializations Schubert, Grothendieck, and dual-Grothendieck polynomials among others. In particular, our results prove positivity conjectures of Kirillov for the subfamily of Hecke--Grothendieck polynomials, while the larger family is shown to exhibit rare instances of negative coefficients.
Published: 2024

13. Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats

Author: Chou, Kai-Hsiang, Lin, Yi-Min, Wang, Yi-An, Li, Jonathan Weiping, Kim, Tiffany Hyun-Jin, and Hsiao, Hsu-Chun
Subjects: Computer Science - Cryptography and Security
Abstract: New privacy concerns arise with chatbots on group messaging platforms. Chatbots may access information beyond their intended functionalities, such as messages unintended for chatbots or senders' identities. Chatbot operators may exploit such information to infer personal information and link users across groups, potentially leading to personal data breaches, pervasive tracking, and targeted advertising. Our analysis of conversation datasets shows that (1) chatbots often access far more messages than needed, and (2) when a user joins a new group with chatbots, there is a 3.4% chance that at least one of the chatbots can recognize and associate the user with their previous interactions in other groups. Although state-of-the-art group messaging protocols provide robust end-to-end security and some platforms have implemented policies to limit chatbot access, no platforms successfully combine these features. This paper introduces SnoopGuard, a secure group messaging protocol that ensures user privacy against chatbots while maintaining strong end-to-end security. Our method offers selective message access, preventing chatbots from accessing unrelated messages, and ensures sender anonymity within the group. SnoopGuard achieves $O(\log n + m)$ message-sending complexity for a group of $n$ users and $m$ chatbots, compared to $O(\log(n + m))$ in state-of-the-art protocols, with acceptable overhead for enhanced privacy. Our prototype implementation shows that sending a message in a group of 50 users and 10 chatbots takes about 30 milliseconds when integrated with Message Layer Security (MLS).
Comment: 18 pages, 5 figures
Published: 2024

14. Attentive-based Multi-level Feature Fusion for Voice Disorder Diagnosis

Author: Shen, Lipeng, Xiong, Yifan, Guo, Dongyue, Mo, Wei, Yu, Lingyu, Yang, Hui, and Lin, Yi
Subjects: Computer Science - Sound, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Voice disorders negatively impact the quality of daily life in various ways. However, accurately recognizing the category of pathological features from raw audio remains a considerable challenge due to the limited dataset. A promising method to handle this issue is extracting multi-level pathological information from speech in a comprehensive manner by fusing features in the latent space. In this paper, a novel framework is designed to explore the way of high-quality feature fusion for effective and generalized detection performance. Specifically, the proposed model follows a two-stage training paradigm: (1) ECAPA-TDNN and Wav2vec 2.0, which have shown remarkable effectiveness in various domains, are employed to learn the universal pathological information from raw audio; (2) an attentive fusion module is designed to establish the interaction between pathological features projected by ECAPA-TDNN and Wav2vec 2.0 respectively and to guide the multi-layer fusion; the entire model is jointly fine-tuned from pre-trained features by the automatic voice pathology detection task. Finally, comprehensive experiments on the FEMH and SVD datasets demonstrate that the proposed framework outperforms the competitive baselines and achieves accuracies of 90.51% and 87.68%.
Published: 2024

15. Revisiting Deep Ensemble Uncertainty for Enhanced Medical Anomaly Detection

Author: Gu, Yi, Lin, Yi, Cheng, Kwang-Ting, and Chen, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical anomaly detection (AD) is crucial in pathological identification and localization. Current methods typically rely on uncertainty estimation in deep ensembles to detect anomalies, assuming that ensemble learners should agree on normal samples while exhibiting disagreement on unseen anomalies in the output space. However, these methods may suffer from inadequate disagreement on anomalies or diminished agreement on normal samples. To tackle these issues, we propose D2UE, a Diversified Dual-space Uncertainty Estimation framework for medical anomaly detection. To effectively balance agreement and disagreement for anomaly detection, we propose Redundancy-Aware Repulsion (RAR), which uses a similarity kernel that remains invariant to both isotropic scaling and orthogonal transformations, explicitly promoting diversity in learners' feature space. Moreover, to accentuate anomalous regions, we develop Dual-Space Uncertainty (DSU), which utilizes the ensemble's uncertainty in input and output spaces. In input space, we first calculate gradients of reconstruction error with respect to input images. The gradients are then integrated with reconstruction outputs to estimate uncertainty for inputs, enabling effective anomaly discrimination even when output space disagreement is minimal. We conduct a comprehensive evaluation of five medical benchmarks with different backbones. Experimental results demonstrate the superiority of our method to state-of-the-art methods and the effectiveness of each component in our framework. Our code is available at https://github.com/Rubiscol/D2UE.
Comment: Early accepted by MICCAI2024
Published: 2024
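
The invariance property named in the abstract above can be checked numerically on a small example. The sketch below uses linear centered kernel alignment (CKA), which is one well-known similarity measure that is invariant to isotropic scaling and orthogonal transformations; whether the paper's kernel is CKA is an assumption, and all function names and data here are illustrative.

```python
import math


def _center(X):
    """Subtract the per-feature mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def _cross_fro_sq(X, Y):
    """Squared Frobenius norm of X^T Y for row-aligned feature matrices."""
    total = 0.0
    for i in range(len(X[0])):
        for j in range(len(Y[0])):
            s = sum(X[k][i] * Y[k][j] for k in range(len(X)))
            total += s * s
    return total


def linear_cka(X, Y):
    """Linear CKA between two sets of features for the same samples."""
    Xc, Yc = _center(X), _center(Y)
    denom = math.sqrt(_cross_fro_sq(Xc, Xc) * _cross_fro_sq(Yc, Yc))
    return _cross_fro_sq(Xc, Yc) / denom


def scale_and_rotate_2d(X, scale, theta):
    """Apply an isotropic scaling and a 2-D rotation to every row."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * (c * x - s * y), scale * (s * x + c * y)] for x, y in X]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.5]]
    B = [[0.5, 1.0], [1.5, -1.0], [2.0, 0.0], [-1.0, 3.0]]
    print(linear_cka(A, B))
    print(linear_cka(A, scale_and_rotate_2d(B, 3.0, 0.7)))
```

The two printed values should agree up to floating-point error, since scaling and rotating one learner's features leaves this similarity unchanged.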

16. Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Author: Wu, Haibin, Chen, Xuanjun, Lin, Yi-Cheng, Chang, Kaiwei, Du, Jiawei, Lu, Ke-Han, Liu, Alexander H., Chung, Ho-Lam, Wu, Yuan-Kuei, Yang, Dongchao, Liu, Songxiang, Wu, Yi-Chiao, Tan, Xu, Glass, James, Watanabe, Shinji, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, speaker characteristics, and audio information even at low bitrates. Recently, numerous advanced neural codec models have been proposed. However, codec models are often tested under varying experimental conditions. As a result, we introduce the Codec-SUPERB challenge at SLT 2024, designed to facilitate fair and lightweight comparisons among existing codec models and inspire advancements in the field. This challenge brings together representative speech applications and objective metrics, and carefully selects license-free datasets, sampling them into small sets to reduce evaluation computation costs. This paper presents the challenge's rules, datasets, five participant systems, results, and findings.
Published: 2024

17. Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection

Author: Lin, Hsi-Che, Lin, Yi-Cheng, Chou, Huang-Cheng, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. However, building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese. In this paper, we propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages. Specifically, we employ expressive Speech-to-Speech translation (S2ST) combined with a novel bootstrapping data selection pipeline to generate labeled data in the target language. Extensive experiments demonstrate that our method is both effective and generalizable across different upstream models and languages. Our results suggest that this approach can facilitate the development of more scalable and robust multilingual SER systems.
Comment: 5 pages, 2 figures, Accepted to ICASSP 2025
Published: 2024

18. Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Author: Ren, Wenze, Wu, Haibin, Lin, Yi-Cheng, Chen, Xuanjun, Chao, Rong, Hung, Kuo-Hsuan, Li, You-Jin, Ting, Wen-Yuan, Wang, Hsin-Min, and Tsao, Yu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and sub-band spectral and spatial features. However, these approaches face limitations in fully modeling complex temporal dependencies, especially in dynamic acoustic environments. To overcome these challenges, we modify the current advanced model McNet by introducing an improved version of Mamba, a state-space model, and further propose MCMamba. MCMamba has been completely reengineered to integrate full-band and narrow-band spatial information with sub-band and full-band spectral features, providing a more comprehensive approach to modeling spatial and spectral information. Our experimental results demonstrate that MCMamba significantly improves the modeling of spatial and spectral features in multichannel speech enhancement, outperforming McNet and achieving state-of-the-art performance on the CHiME-3 dataset. Additionally, we find that Mamba performs exceptionally well in modeling spectral information.
Comment: Accepted by ICASSP 2025
Published: 2024

19. Absence of itinerant ferromagnetism in a cobalt-based oxypnictide

Author: Li, Hua-Xun, Jiang, Hao, Lin, Yi-Qiang, Li, Jia-Xin, Song, Shi-Jie, Zhu, Qin-Qing, Ren, Zhi, and Cao, Guang-Han
Subjects: Condensed Matter - Materials Science, Condensed Matter - Strongly Correlated Electrons
Abstract: We report a layered transition-metal-ordered oxypnictide Sr$_{2}$CrCoAsO$_{3}$. The new material was synthesized by solid-state reactions under vacuum. It has an intergrowth structure with a perovskite-like Sr$_3$Cr$_2$O$_6$ unit and ThCr$_2$Si$_2$-type SrCo$_2$As$_2$ block stacking coherently along the crystallographic $c$ axis. The measurements of electrical resistivity, magnetic susceptibility, and specific heat indicate metallic conductivity from the CoAs layers and short-range antiferromagnetic ordering in the CrO$_{2}$ planes. No itinerant-electron ferromagnetism expected in CoAs layers is observed. This result, combined with the first-principles calculations and the previous reports of other CoAs-layer-based materials, suggests that the Co$-$Co bond length plays a crucial role in the emergence of itinerant ferromagnetism.
Comment: 9 pages, 7 figures
Published: 2024

20. Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

Author: Liu, Andy T., Lin, Yi-Cheng, Wu, Haibin, Winkler, Stefan, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget. We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size. Our goal is to make analytical steps toward understanding the training dynamics of speech foundation models. We benchmark SSL objectives in an entirely comparable setting and find that other factors contribute more significantly to the success of SSL. Our results show that slimmer model architectures outperform common small architectures under the same compute and parameter budget. We demonstrate that the size of the pre-training data remains crucial, even with data augmentation during SSL training, as performance suffers when iterating over limited data. Finally, we identify a trade-off between model size and data size, highlighting an optimal model size for a given compute budget.
Comment: To appear in SLT 2024
Published: 2024

21. FIF-UNet: An Efficient UNet Using Feature Interaction and Fusion for Medical Image Segmentation

Author: Gou, Xiaolin, Liao, Chuanlin, Zhou, Jizhe, Ye, Fengshuo, and Lin, Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Nowadays, pre-trained encoders are widely used in medical image segmentation because of their ability to capture complex feature representations. However, the existing models fail to effectively utilize the rich features obtained by the pre-trained encoder, resulting in suboptimal segmentation results. In this work, a novel U-shaped model, called FIF-UNet, is proposed to address the above issue, including three plug-and-play modules. A channel spatial interaction module (CSI) is proposed to obtain informative features by establishing the interaction between encoder stages and corresponding decoder stages. A cascaded conv-SE module (CoSE) is designed to enhance the representation of critical features by adaptively assigning importance weights on different feature channels. A multi-level fusion module (MLF) is proposed to fuse the multi-scale features from the decoder stages, ensuring accurate and robust final segmentation. Comprehensive experiments on the Synapse and ACDC datasets demonstrate that the proposed FIF-UNet outperforms existing state-of-the-art methods, which achieves the highest average DICE of 86.05% and 92.58%, respectively.
Published: 2024

22. A local uniqueness theorem for the fractional Schrödinger equation on closed Riemannian manifolds

Author: Lin, Yi-Hsuan
Subjects: Mathematics - Analysis of PDEs
Abstract: We show that a potential $V$ in the fractional Schrödinger equation $( (-\Delta_g )^s +V ) u=f$ can be recovered locally by using the local source-to-solution map on smooth connected closed Riemannian manifolds. To achieve this goal, we derive a related new Runge approximation property. Comment: 6 pages
Published: 2024

23. Zinc finger nuclease-mediated gene editing in hematopoietic stem cells results in reactivation of fetal hemoglobin in sickle cell disease.

Author: Lessard, Samuel, Rimmelé, Pauline, Ling, Hui, Moran, Kevin, Vieira, Benjamin, Lin, Yi-Dong, Rajani, Gaurav, Hong, Vu, Reik, Andreas, Boismenu, Richard, Hsu, Ben, Chen, Michael, Cockroft, Bettina, Uchida, Naoya, Tisdale, John, Alavi, Asif, Krishnamurti, Lakshmanan, Abedi, Mehrdad, Galeon, Isobelle, Reiner, David, Wang, Lin, Ramezi, Anne, Rendo, Pablo, Walters, Mark, Levasseur, Dana, Peters, Robert, Harris, Timothy, and Hicks, Alexandra
Subjects: Anemia, Sickle Cell, Fetal Hemoglobin, Humans, Gene Editing, Hematopoietic Stem Cells, Zinc Finger Nucleases, Female, Male, Adult, Hematopoietic Stem Cell Transplantation, Animals, Mice, Repressor Proteins
Abstract: BIVV003 is a gene-edited autologous cell therapy in clinical development for the potential treatment of sickle cell disease (SCD). Hematopoietic stem cells (HSC) are genetically modified with mRNA encoding zinc finger nucleases (ZFN) that target and disrupt a specific regulatory GATAA motif in the BCL11A erythroid enhancer to reactivate fetal hemoglobin (HbF). We characterized ZFN-edited HSC from healthy donors and donors with SCD. Results of preclinical studies show that ZFN-mediated editing is highly efficient, with enriched biallelic editing and high frequency of on-target indels, producing HSC that are capable of long-term multilineage engraftment in vivo and that express HbF in erythroid progeny. Interim results from the Phase 1/2 PRECIZN-1 study demonstrated that BIVV003 was well-tolerated in seven participants with SCD, of whom five of the six with more than 3 months of follow-up displayed increased total hemoglobin and HbF, and no severe vaso-occlusive crises. Our data suggest BIVV003 represents a compelling and novel cell therapy for the potential treatment of SCD.
Published: 2024

24. Ambivalence

Author: Lin, Yi Hsuan
Abstract: This music score was submitted for Resonate 2024: An Open Access Call for Scores by the UCLA Music Library.
Published: 2024

25. Aligning Medical Images with General Knowledge from Large Language Models

Author: Fang, Xiao, Lin, Yi, Zhang, Dong, Cheng, Kwang-Ting, and Chen, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Pre-trained large vision-language models (VLMs) like CLIP have revolutionized visual representation learning using natural language as supervisions, and demonstrated promising generalization ability. In this work, we propose ViP, a novel visual symptom-guided prompt learning framework for medical image analysis, which facilitates general knowledge transfer from CLIP. ViP consists of two key components: a visual symptom generator (VSG) and a dual-prompt network. Specifically, VSG aims to extract explicable visual symptoms from pre-trained large language models, while the dual-prompt network utilizes these visual symptoms to guide the training on two learnable prompt modules, i.e., context prompt and merge prompt, which effectively adapts our framework to medical image analysis via large VLMs. Extensive experimental results demonstrate that ViP can outperform state-of-the-art methods on two challenging datasets.
Published: 2024

26. Optimal Runge approximation for nonlocal wave equations and unique determination of polyhomogeneous nonlinearities

Author: Lin, Yi-Hsuan, Tyni, Teemu, and Zimmermann, Philipp
Subjects: Mathematics - Analysis of PDEs, Primary 35R30, Secondary 26A33, 42B37
Abstract: The main purpose of this article is to establish the Runge-type approximation in $L^2(0,T;\widetilde{H}^s(\Omega))$ for solutions of linear nonlocal wave equations. To achieve this, we extend the theory of very weak solutions for classical wave equations to our nonlocal framework. This strengthened Runge approximation property allows us to extend the existing uniqueness results for Calderón problems of linear and nonlinear nonlocal wave equations in our earlier works. Furthermore, we prove unique determination results for the Calderón problem of nonlocal wave equations with polyhomogeneous nonlinearities. Comment: 38 pages
Published: 2024

27. Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

Author: Li, Weijia, Yu, Jinhua, Chen, Dairong, Lin, Yi, Dong, Runmin, Zhang, Xiang, He, Conghui, and Fu, Haohuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we propose a geometry-aware semi-supervised framework for fine-grained building function recognition, utilizing geometric relationships among multi-source data to enhance pseudo-label accuracy in semi-supervised learning, broadening its applicability to various building function categorization systems. Firstly, we design an online semi-supervised pre-training stage, which facilitates the precise acquisition of building facade location information in street-view images. In the second stage, we propose a geometry-aware coarse annotation generation module. This module effectively combines GIS data and street-view data based on the geometric relationships, improving the accuracy of pseudo annotations. In the third stage, we combine the newly generated coarse annotations with the existing labeled dataset to achieve fine-grained functional recognition of buildings across multiple cities at a large scale. Extensive experiments demonstrate that our proposed framework exhibits superior performance in fine-grained functional recognition of buildings. Within the same categorization system, it achieves improvements of 7.6% and 4.8% compared to fully-supervised methods and state-of-the-art semi-supervised methods, respectively. Additionally, our method also performs well in cross-city scenarios, i.e., extending the model trained on OmniCity (New York) to new cities (i.e., Los Angeles and Boston) with different building function categorization systems. This study offers a new solution for large-scale multi-city applications with minimal annotation requirements, facilitating more efficient data updates and resource allocation in urban management. Comment: This paper is currently under review
Published: 2024

28. Ultrafast creation of a light induced semimetallic state in strongly excited 1T-TiSe$_2$

Author: Huber, Maximilian, Lin, Yi, Marini, Giovanni, Moreschini, Luca, Jozwiak, Chris, Bostwick, Aaron, Calandra, Matteo, and Lanzara, Alessandra
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
Abstract: Screening, a ubiquitous phenomenon associated with the shielding of electric fields by surrounding charges, has been widely adopted as a means to modify a material's properties. While so far most studies have relied on static changes of screening through doping or gating, here we demonstrate that screening can also drive the onset of distinct quantum states on the ultrafast timescale. By using time and angle-resolved photoemission spectroscopy we show that intense optical excitation can drive 1T-TiSe$_2$, a prototypical charge density wave material, almost instantly from a gapped into a semimetallic state. By systematically comparing changes in bandstructure over time and excitation strength with theoretical calculations we find that the appearance of this state is likely caused by a dramatic reduction of the screening length. In summary, this work showcases how optical excitation enables the screening driven design of a non-equilibrium semimetallic phase in TiSe$_2$, possibly providing a general pathway into highly screened phases in other strongly correlated materials.
Published: 2024

29. The Calderón problem for the Schrödinger equation in transversally anisotropic geometries with partial data

Author: Lin, Yi-Hsuan, Nakamura, Gen, and Zimmermann, Philipp
Subjects: Mathematics - Analysis of PDEs
Abstract: We study the partial data Calderón problem for the anisotropic Schrödinger equation $(-\Delta_{\widetilde{g}}+V)u=0\text{ in }\Omega\times (0,\infty)$, where $\Omega\subset\mathbb{R}^n$ is a bounded smooth domain, $\widetilde{g}=g_{ij}(x)dx^{i}\otimes dx^j+dy\otimes dy$ and $V$ is translationally invariant in the $y$ direction. Our goal is to recover both the metric $g$ and the potential $V$ from the (partial) Neumann-to-Dirichlet (ND) map on $\Gamma\times \{0\}$ with $\Gamma\Subset \Omega$. Our approach can be divided into three steps: Step 1. Boundary determination. We establish a novel boundary determination to identify $(g,V)$ on $\Gamma$ with the help of suitable approximate solutions for the Schrödinger equation with inhomogeneous Neumann boundary condition. Step 2. Relation to a nonlocal elliptic inverse problem. We relate inverse problems for the Schrödinger equation with the nonlocal elliptic equation $(-\Delta_g+V)^{1/2}v=f\text{ in }\Omega$ via the Caffarelli--Silvestre type extension, where the measurements are encoded in the source-to-solution map. The nonlocality of this inverse problem allows us to recover the associated heat kernel. Step 3. Reduction to an inverse problem for a wave equation. Combining the knowledge of the heat kernel with the Kannai type transmutation formula, we transfer the inverse problem for the nonlocal equation to an inverse problem for the wave equation $(\partial_t^2-\Delta_g+V)w=F\text{ in }\Omega\times (0,\infty)$, where the measurement operator is also the source-to-solution map. We can finally determine $(g,V)$ on $\Omega\setminus\Gamma$ by solving the inverse problem for the wave equation. Comment: 54 pages. All comments are welcome
Published: 2024

30. Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

Author: Lin, Yi-Cheng, Chen, Wei-Chih, and Lee, Hung-yi
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
Published: 2024

31. SkyDiffusion: Ground-to-Aerial Image Synthesis with Diffusion Models and BEV Paradigm

Author: Ye, Junyan, He, Jun, Li, Weijia, Lv, Zhutao, Lin, Yi, Yu, Jinhua, Yang, Haote, and He, Conghui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images while maintaining consistent content layout, simulating a top-down view. The significant viewpoint difference leads to domain gaps between views, and dense urban scenes limit the visible range of street views, making this cross-view generation task particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images, utilizing a diffusion model and the Bird's-Eye View (BEV) paradigm. The Curved-BEV method in SkyDiffusion converts street-view images into a BEV perspective, effectively bridging the domain gap, and employs a "multi-to-one" mapping strategy to address occlusion issues in dense urban scenes. Next, SkyDiffusion uses a BEV-guided diffusion model to generate content-consistent and realistic aerial images. Additionally, we introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications, including disaster scene aerial synthesis, historical high-resolution satellite image synthesis, and low-altitude UAV image synthesis tasks. Experimental results demonstrate that SkyDiffusion outperforms state-of-the-art methods on cross-view datasets across natural (CVUSA), suburban (CVACT), urban (VIGOR-Chicago), and various application scenarios (G2A-3), achieving realistic and content-consistent aerial image generation. More results and dataset information can be found at https://opendatalab.github.io/skydiffusion/. Comment: 10 pages, 7 figures
Published: 2024

32. ProSpec RL: Plan Ahead, then Execute

Author: Liu, Liangliang, Guan, Yi, Wang, BoRan, Shen, Rujia, Lin, Yi, Kong, Chaoran, Yan, Lian, and Jiang, Jingchi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Imagining potential outcomes of actions before execution helps agents make more informed decisions, a prospective thinking ability fundamental to human cognition. However, mainstream model-free Reinforcement Learning (RL) methods lack the ability to proactively envision future scenarios, plan, and guide strategies. These methods typically rely on trial and error to adjust policy functions, aiming to maximize cumulative rewards or long-term value, even if such high-reward decisions place the environment in extremely dangerous states. To address this, we propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk optimal decisions by imagining future n-stream trajectories. Specifically, ProSpec employs a dynamic model to predict future states (termed "imagined states") based on the current state and a series of sampled actions. Furthermore, we integrate the concept of Model Predictive Control and introduce a cycle consistency constraint that allows the agent to evaluate and select the optimal actions from these trajectories. Moreover, ProSpec employs cycle consistency to mitigate two fundamental issues in RL: augmenting state reversibility to avoid irreversible events (low risk) and augmenting actions to generate numerous virtual trajectories, thereby improving data efficiency. We validated the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements. Code will be open-sourced upon acceptance.
Published: 2024

33. Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

Author: Ma, Jiabo, Guo, Zhengrui, Zhou, Fengtao, Wang, Yihui, Xu, Yingxue, Cai, Yu, Zhu, Zhengjie, Jin, Cheng, Lin, Yi, Jiang, Xinrui, Han, Anjia, Liang, Li, Chan, Ronald Cheong Kin, Wang, Jiguang, Cheng, Kwang-Ting, and Chen, Hao
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established a most comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 39 specific tasks. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, a Generalizable Pathology Foundation Model (GPFM) is pretrained on a large-scale dataset consisting of 190 million images from around 86,000 public H&E whole slides across 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.36, with 29 tasks ranked 1st, while the second-best model, UNI, attains an average rank of 2.96, with only 4 tasks ranked 1st. The superior generalization of GPFM demonstrates its exceptional modeling capabilities across a wide range of clinical tasks, positioning it as a new cornerstone for feature representation in CPath.
Published: 2024

34. EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations

Author: Ren, Wenze, Lin, Yi-Cheng, Chou, Huang-Cheng, Wu, Haibin, Wu, Yi-Chiao, Lee, Chi-Chun, Lee, Hung-yi, and Tsao, Yu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The neural codec model reduces speech data transmission delay and serves as the foundational tokenizer for speech language models (speech LMs). Preserving emotional information in codecs is crucial for effective communication and context understanding. However, there is a lack of studies on emotion loss in existing codecs. This paper evaluates neural and legacy codecs using subjective and objective methods on emotion datasets like IEMOCAP. Our study identifies which codecs best preserve emotional information under various bitrate scenarios. We found that training codec models with both English and Chinese data had limited success in retaining emotional information in Chinese. Additionally, resynthesizing speech through these codecs degrades the performance of speech emotion recognition (SER), particularly for emotions like sadness, depression, fear, and disgust. Human listening tests confirmed these findings. This work guides future speech technology developments to ensure new codecs maintain the integrity of emotional information in speech.
Published: 2024

35. Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models

Author: Lin, Yi-Cheng, Lin, Tzu-Quan, Yang, Chih-Kai, Lu, Ke-Han, Chen, Wei-Chih, Kuan, Chun-Yi, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Speech Integrated Large Language Models (SILLMs) combine large language models with speech perception to perform diverse tasks, ranging from emotion recognition to speaker verification, demonstrating universal audio understanding capability. However, these models may amplify biases present in training data, potentially leading to biased access to information for marginalized groups. This work introduces a curated spoken bias evaluation toolkit and corresponding dataset. We evaluate gender bias in SILLMs across four semantic-related tasks: speech-to-text translation (STT), spoken coreference resolution (SCR), spoken sentence continuation (SSC), and spoken question answering (SQA). Our analysis reveals that bias levels are language-dependent and vary with different evaluation methods. Our findings emphasize the necessity of employing multiple approaches to comprehensively assess biases in SILLMs, providing insights for developing fairer SILLM systems.
Published: 2024

36. DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

Author: Hei, Zijian, Liu, Weiling, Ou, Wenjie, Qiao, Juyi, Jiao, Junming, Song, Guowen, Tian, Ting, and Lin, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Retrieval-Augmented Generation (RAG) has recently demonstrated the performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context by incorporating external knowledge bases to enhance the response accuracy. However, it would be inefficient to access LLMs multiple times for each query and unreliable to retrieve all the relevant documents by a single query. We have found that even though there is low relevance between some critical documents and the query, it is possible to retrieve the remaining documents by combining parts of the documents with the query. To mine the relevance, a two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers while maintaining efficiency. Additionally, a compact classifier is applied to two different selection strategies to determine the contribution of the retrieved documents to answering the query and retrieve the relatively relevant documents. Meanwhile, DR-RAG calls the LLMs only once, which significantly improves the efficiency of the experiment. The experimental results on multi-hop QA datasets show that DR-RAG can significantly improve the accuracy of the answers and achieve new progress in QA systems.
Published: 2024

37. Approximation and uniqueness results for the nonlocal diffuse optical tomography problem

Author: Lin, Yi-Hsuan and Zimmermann, Philipp
Subjects: Mathematics - Analysis of PDEs, 35R30, 26A33, 35J10, 35J70
Abstract: We investigate the inverse problem of recovering the diffusion and absorption coefficients $(\sigma,q)$ in the nonlocal diffuse optical tomography equation $(-\text{div}( \sigma \nabla))^s u+q u =0 \text{ in }\Omega$ from the nonlocal Dirichlet-to-Neumann (DN) map $\Lambda^s_{\sigma,q}$. The purpose of this article is to establish the following approximation and uniqueness results. - Approximation: We show that solutions to the conductivity equation $ \text{div}( \sigma \nabla v)=0 \text{ in }\Omega$ can be approximated in $H^1(\Omega)$ by solutions to the nonlocal diffuse optical tomography equation and the DN map $\Lambda_\sigma$ related to the conductivity equation can be approximated by the nonlocal DN map $\Lambda_{\sigma,q}^s$. - Local uniqueness: We prove that the absorption coefficient $q$ can be determined in a neighborhood $\mathcal{N}$ of the boundary $\partial\Omega$ provided $\sigma$ is already known in $\mathcal{N}$. - Global uniqueness: Under the same assumptions as for the local uniqueness result, and if one of the potentials vanishes in $\Omega$, then, with the help of the approximation result, one can turn the local determination into a global uniqueness result. It is worth mentioning that the approximation result relies on the Caffarelli--Silvestre type extension technique and the geometric form of the Hahn--Banach theorem. Comment: 37 pages
Published: 2024

38. Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition

Author: Lin, Yi-Cheng, Wu, Haibin, Chou, Huang-Cheng, Lee, Chi-Chun, and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The rapid growth of Speech Emotion Recognition (SER) has diverse global applications, from improving human-computer interactions to aiding mental health diagnostics. However, SER models might contain social bias toward gender, leading to unfair outcomes. This study analyzes gender bias in SER models trained with Self-Supervised Learning (SSL) at scale, exploring factors influencing it. SSL-based SER models are chosen for their cutting-edge performance. Our research pioneers the study of gender bias in SER from both upstream model and data perspectives. Our findings reveal that females exhibit slightly higher overall SER performance than males. Modified CPC and XLS-R, two well-known SSL models, notably exhibit significant bias. Moreover, models trained with Mandarin datasets display a pronounced bias toward valence. Lastly, we find that gender-wise emotion distribution differences in training data significantly affect gender bias, while upstream model representation has a limited impact. Comment: Accepted by INTERSPEECH 2024
Published: 2024

39. On the social bias of speech self-supervised models

Author: Lin, Yi-Cheng, Lin, Tzu-Quan, Lin, Hsi-Che, Liu, Andy T., and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning
Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by automating discriminatory patterns and reinforcing inequitable systems. This work reveals that prevalent SSL models inadvertently acquire biased associations. We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within these models. Finally, we explore the efficacy of debiasing SSL models through regularization techniques, specifically via model compression. Our findings reveal that employing techniques such as row-pruning and training wider, shallower models can effectively mitigate social bias within SSL models. Comment: Accepted by INTERSPEECH 2024
Published: 2024

40. Taxane/anthracycline combinations reduced incidence of breast cancer recurrence in young women across molecular subtypes: a real-world evidence of Taiwan from 2011 to 2019

Author: Chien, Yu-Ning, Lin, Li-Yin, Lin, Yi-Chun, Hsieh, Yi-Chen, Tu, Shih-Hsin, and Chiou, Hung-Yi
Published: 2025

41. International myeloma working group immunotherapy committee recommendation on sequencing immunotherapy for treatment of multiple myeloma

Author: Costa, Luciano J., Banerjee, Rahul, Mian, Hira, Weisel, Katja, Bal, Susan, Derman, Benjamin A., Htut, Maung M., Nagarajan, Chandramouli, Rodriguez, Cesar, Richter, Joshua, Frigault, Matthew J., Ye, Jing C., van de Donk, Niels W. C. J., Voorhees, Peter M., Puliafito, Benjamin, Bahlis, Nizar, Popat, Rakesh, Chng, Wee Joo, Ho, P. Joy, Kaur, Gurbakhash, Kapoor, Prashant, Du, Juan, Schjesvold, Fredrik, Berdeja, Jesus, Einsele, Hermann, Cohen, Adam D., Mikhael, Joseph, Biru, Yelak, Rajkumar, S. Vincent, Lin, Yi, Martin, Thomas G., and Chari, Ajai
Published: 2025

42. A comprehensive review of eutrophication simulation and assessment in surface water: analysis and visualization

Author: Liu, Hua, Zhang, Yalei, Lin, Yi, and Zhou, Xuefei
Published: 2025

43. Physicians’ knowledge, awareness and instructions of home blood pressure monitoring: Asia HBPM survey in Taiwan

Author: Lin, Yi-Syuan, Lin, Hung-Ju, and Wang, Tzung-Dau
Published: 2025

44. Experimental and Numerical Investigations of Carbon-based Nanoparticle Reinforcement on Microstructure and Mechanical Properties of Epoxy Coatings

Author: Xu, Lu-Yang, Wang, Xing-Yu, Lin, Yi-Zhou, Huang, Ying, Tao, Cheng-Cheng, and Zhang, Da-Wei
Published: 2025

45. Fabrication of natural enzyme-covered / amino-modified Pd-Pt bimetallic-doped zeolitic imidazolate framework for ultrasensitive detection of metabolites

Author: Bai, Chen-Chen, Lang, Jin-ye, Wang, Xin-yu, Zhao, Jia-meng, Dong, Lin-Yi, Liu, Jun-Jie, and Wang, Xian-Hua
Published: 2025

46. Capillary leak phenotype as a major cause of death in patients with POEMS syndrome

Author: Lee, Kenzie, Kourelis, Taxiarchis, Tschautscher, Marcella, Warsame, Rahma, Buadi, Francis, Gertz, Morie, Muchtar, Eli, Dingli, David, Hayman, Suzanne, Go, Ronald, Hwa, Lisa, Fonder, Amie, Gonsalves, Wilson, Hobbs, Miriam, Kyle, Robert, Kapoor, Prashant, Leung, Nelson, Binder, Moritz, Cook, Joselle, Lin, Yi, Rogers, Michelle, Rajkumar, S. Vincent, Kumar, Shaji, and Dispenzieri, Angela
Published: 2024

47. DOCK1 deficiency drives placental trophoblast cell dysfunction by influencing inflammation and oxidative stress, hallmarks of preeclampsia

Author: Xu, Yichi, Qin, Xiaoli, Zeng, Weihong, Wu, Fan, Wei, Xiaowei, Li, Qian, and Lin, Yi
Published: 2024

48. Epidemic risks of measles and rubella in China: a systematic review and meta-analysis

Author: Lin, Yi-Tong, Gao, Yi-Xuan, Zhang, Yan, Cui, Ai-Li, Wang, Hui-Ling, Zhu, Zhen, and Mao, Nai-Ying
Published: 2024

49. The association of the comorbidity status of metabolic syndrome and cognitive dysfunction with health-related quality of life

Author: Lin, Yi-Hsuan, Chang, Hsiao-Ting, Wang, Yen-Feng, Fuh, Jong-Ling, Wang, Shuu-Jiun, Chen, Harn-Shen, Li, Sih-Rong, Lin, Ming-Hwai, Chen, Tzeng-Ji, and Hwang, Shinn-Jang
Published: 2024

50. Increase of PCSK9 expression in diabetes promotes VEGFR2 ubiquitination to inhibit endothelial function and skin wound healing

Author: Gao, Jian-Jun, Wu, Fang-Yuan, Liu, Yu-Jia, Li, Le, Lin, Yi-Jun, Kang, Yue-Ting, Peng, Yue-Ming, Liu, Yi-Fang, Wang, Chen, Ma, Zhen-Sheng, Cao, Yang, Cao, Hong-Yu, Mo, Zhi-Wei, Li, Yan, Ou, Jing-Song, and Ou, Zhi-Jun
Published: 2024
