Author: "Wu, Yihan" - Searchworks@Jio Institute Digital Library Search Results

931 results on '"Wu, Yihan"'

1. Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization

Author: Wu, Yihan, Lu, Yichen, Peng, Yifan, Wang, Xihua, Song, Ruihua, and Watanabe, Shinji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: Audiovisual Automatic Speech Recognition (AV-ASR) aims to improve speech recognition accuracy by leveraging visual signals. It is particularly challenging in unconstrained real-world scenarios across various domains due to noisy acoustic environments, spontaneous speech, and the uncertain use of visual information. Most previous works fine-tune audio-only ASR models on audiovisual datasets, optimizing them for conventional ASR objectives. However, they often neglect visual features and common errors in unconstrained video scenarios. In this paper, we propose using a preference optimization strategy to improve speech recognition accuracy for real-world videos. First, we create preference data via simulating common errors that occurred in AV-ASR from two focals: manipulating the audio or vision input and rewriting the output transcript. Second, we propose BPO-AVASR, a Bifocal Preference Optimization method to improve AV-ASR models by leveraging both input-side and output-side preference. Extensive experiments demonstrate that our approach significantly improves speech recognition accuracy across various domains, outperforming previous state-of-the-art models on real-world video speech recognition.
Comment: Accepted by AAAI 2025
Published: 2024

2. De-mark: Watermark Removal in Large Language Models

Author: Chen, Ruibo, Wu, Yihan, Guo, Junfeng, and Huang, Heng
Subjects: Computer Science - Computation and Language
Abstract: Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models (LMs). However, the robustness of the watermarking schemes has not been well explored. In this paper, we present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark and identifying the red-green list within the n-gram watermark. Experiments on popular LMs, such as Llama3 and ChatGPT, demonstrate the efficiency and effectiveness of De-mark in watermark removal and exploitation tasks.
Published: 2024

3. A Watermark for Order-Agnostic Language Models

Author: Chen, Ruibo, Wu, Yihan, Chen, Yanshuo, Liu, Chenxi, Guo, Junfeng, and Huang, Heng
Subjects: Computer Science - Computation and Language
Abstract: Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns. Correspondingly, we propose a statistical pattern-based detection algorithm that recovers the key sequence during detection and conducts statistical tests based on the count of high-frequency patterns. Our extensive evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness, positioning it as a superior watermarking technique for order-agnostic LMs.
Published: 2024

4. ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Author: Shi, Jiatong, Tian, Jinchuan, Wu, Yihan, Jung, Jee-weon, Yip, Jia Qi, Masuyama, Yoshiki, Chen, William, Wu, Yuning, Tang, Yuxun, Baali, Massa, Alharhi, Dareen, Zhang, Dong, Deng, Ruifan, Srivastava, Tejes, Wu, Haibin, Liu, Alexander H., Raj, Bhiksha, Jin, Qin, Song, Ruihua, and Watanabe, Shinji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with autoregressive language models. However, as extensive downstream applications are investigated, challenges have arisen in ensuring fair comparisons across diverse applications. To address these issues, we present a new open-source platform ESPnet-Codec, which is built on ESPnet and focuses on neural codec training and evaluation. ESPnet-Codec offers various recipes in audio, music, and speech for training and evaluation using several widely adopted codec models. Together with ESPnet-Codec, we present VERSA, a standalone evaluation toolkit, which provides a comprehensive evaluation of codec performance over 20 audio evaluation metrics. Notably, we demonstrate that ESPnet-Codec can be integrated into six ESPnet tasks, supporting diverse applications.
Comment: Accepted by SLT
Published: 2024

5. LoVA: Long-form Video-to-Audio Generation

Author: Cheng, Xin, Wang, Xihua, Wu, Yihan, Wang, Yuyue, and Song, Ruihua
Subjects: Computer Science - Sound, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Video-to-audio (V2A) generation is important for video editing and post-processing, enabling the creation of semantics-aligned audio for silent video. However, most existing methods focus on generating short-form audio for short video segments (less than 10 seconds), while giving little attention to the scenario of long-form video inputs. For current UNet-based diffusion V2A models, an inevitable problem when handling long-form audio generation is the inconsistencies within the final concatenated audio. In this paper, we first highlight the importance of the long-form V2A problem. We then propose LoVA, a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA proves to be more effective at generating long-form audio compared to existing autoregressive models and UNet-based diffusion models. Extensive objective and subjective experiments demonstrate that LoVA achieves comparable performance on the 10-second V2A benchmark and outperforms all other baselines on a benchmark with long-form video input.
Comment: Accepted by ICASSP 2025
Published: 2024

6. SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

Author: Jung, Jee-weon, Wu, Yihan, Wang, Xin, Kim, Ji-Hoon, Maiti, Soumi, Matsunaga, Yuta, Shim, Hye-jin, Tian, Jinchuan, Evans, Nicholas, Chung, Joon Son, Zhang, Wangyou, Um, Seyun, Takamichi, Shinnosuke, and Watanabe, Shinji
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems require speech data recorded in varied acoustic environments with different levels of noise to be trained. However, existing datasets typically include clean, high-quality recordings (bona fide data) due to the requirements for TTS training; studio-quality or well-recorded read speech is typically necessary to train TTS models. Existing SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. We present SpoofCeleb, which leverages a fully automated pipeline that processes the VoxCeleb1 dataset, transforming it into a suitable form for TTS training. We subsequently train 23 contemporary TTS systems. The resulting SpoofCeleb dataset comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We provide baseline results for both SDD and SASV tasks. All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.
Comment: 9 pages, 2 figures, 8 tables
Published: 2024

7. Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Author: Wu, Yihan, Peng, Yifan, Lu, Yichen, Chang, Xuankai, Song, Ruihua, and Watanabe, Shinji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Sound
Abstract: Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across diverse video scenarios, presenting a significant challenge. In this paper, we introduce EVA, leveraging the mixture-of-Experts for audioVisual ASR to perform robust speech recognition for "in-the-wild" videos. Specifically, we first encode visual information into a sequence of visual tokens and map them into the speech space by a lightweight projection. Then, we build EVA upon a robust pretrained speech recognition model, ensuring its generalization ability. Moreover, to incorporate visual information effectively, we inject visual information into the ASR model through a mixture-of-experts module. Experiments show our model achieves state-of-the-art results on three benchmarks, which demonstrates the generalization ability of EVA across diverse video domains.
Comment: 6 pages, 2 figures, accepted by IEEE Spoken Language Technology Workshop 2024
Published: 2024

8. Text-To-Speech Synthesis In The Wild

Author: Jung, Jee-weon, Zhang, Wangyou, Maiti, Soumi, Wu, Yihan, Wang, Xin, Kim, Ji-Hoon, Matsunaga, Yuta, Um, Seyun, Tian, Jinchuan, Shim, Hye-jin, Evans, Nicholas, Chung, Joon Son, Takamichi, Shinnosuke, and Watanabe, Shinji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms. The recent literature nonetheless shows efforts to train TTS systems using data collected in the wild. While this approach allows for the use of massive quantities of natural speech, until now, there are no common datasets. We introduce the TTS In the Wild (TITW) dataset, the result of a fully automated pipeline, in this case, applied to the VoxCeleb1 dataset commonly used for speaker recognition. We further propose two training sets. TITW-Hard is derived from the transcription, segmentation, and selection of VoxCeleb1 source data. TITW-Easy is derived from the additional application of enhancement and additional data selection based on DNSMOS. We show that a number of recent TTS models can be trained successfully using TITW-Easy, but that it remains extremely challenging to produce similar results using TITW-Hard. Both the dataset and protocols are publicly available and support the benchmarking of TTS systems trained using TITW data.
Comment: 5 pages, submitted to ICASSP 2025 as a conference paper
Published: 2024

9. YuLan: An Open-source Large Language Model

Author: Zhu, Yutao, Zhou, Kun, Mao, Kelong, Chen, Wentong, Sun, Yiding, Chen, Zhipeng, Cao, Qian, Wu, Yihan, Chen, Yushuo, Wang, Feng, Zhang, Lei, Li, Junyi, Wang, Xiaolei, Wang, Lei, Zhang, Beichen, Dong, Zican, Cheng, Xiaoxue, Chen, Yuhan, Tang, Xinyu, Hou, Yupeng, Ren, Qiangqiang, Pang, Xincheng, Xie, Shufang, Zhao, Wayne Xin, Dou, Zhicheng, Mao, Jiaxin, Lin, Yankai, Song, Ruihua, Xu, Jun, Chen, Xu, Yan, Rui, Wei, Zhewei, Hu, Di, Huang, Wenbing, Gao, Ze-Feng, Chen, Yueguo, Lu, Weizheng, and Wen, Ji-Rong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billion parameters. The base model of YuLan is pre-trained on approximately 1.7T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was finished in January 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.
Published: 2024

10. The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Author: Chang, Xuankai, Shi, Jiatong, Tian, Jinchuan, Wu, Yuning, Tang, Yuxun, Wu, Yihan, Watanabe, Shinji, Adi, Yossi, Chen, Xie, and Jin, Qin
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge, which focuses on new speech processing benchmarks using discrete units. It encompasses three pivotal tasks, namely multilingual automatic speech recognition, text-to-speech, and singing voice synthesis, and aims to assess the potential applicability of discrete units in these tasks. This paper outlines the challenge designs and baseline descriptions. We also collate baseline and selected submission systems, along with preliminary findings, offering valuable contributions to future research in this evolving field.
Comment: This manuscript has been accepted by Interspeech 2024
Published: 2024

11. Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions

Author: Wu, Yihan, Chen, Ruibo, Hu, Zhengmian, Chen, Yanshuo, Guo, Junfeng, Zhang, Hongyang, and Huang, Heng
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation quality. However, one notable limitation of pseudo-random sampling compared to true-random sampling is that, under the same watermark keys (i.e., key collision), the results of pseudo-random sampling exhibit correlations. This limitation could potentially undermine the distortion-free property. Our studies reveal that key collisions are inevitable due to the limited availability of watermark keys, and existing distortion-free watermarks exhibit a significant distribution bias toward the original LM distribution in the presence of key collisions. Moreover, achieving a perfect distortion-free watermark is impossible as no statistical signal can be embedded under key collisions. To reduce the distribution bias caused by key collisions, we introduce a new family of distortion-free watermarks--beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
Published: 2024

12. Explainable deep learning and virtual evolution identifies antimicrobial peptides with activity against multidrug-resistant human pathogens

Author: Wang, Beilun, Lin, Peijun, Zhong, Yuwei, Tan, Xiao, Shen, Yangyang, Huang, Yi, Jin, Kai, Zhang, Yan, Zhan, Ying, Shen, Dian, Wang, Meng, Yu, Zhou, and Wu, Yihan
Published: 2025

13. Wheat germ agglutinin modified mixed micelles overcome the dual barrier of mucus/enterocytes for effective oral absorption of shikonin and gefitinib

Author: Hou, Xuefeng, Ai, Xinyi, Liu, Zhenda, Yang, Jiayi, Wu, Yihan, Zhang, Di, and Feng, Nianping
Published: 2025

14. The impact of disgust on moral judgment in individuals with varying disgust propensities

Author: Wu, Yihan, Zheng, Ronglian, Xing, Huili, Kou, Yining, Wang, Yufeng, Zou, Feng, Wu, Xin, Liu, Fan, Luo, Yanyan, and Zhang, Meng
Published: 2024

15. Analyzing the Impact of Tall Building Geometries on Wind Environment in a Hypothetical Urban Context: A Typological and Parametric Study

Author: Wu, Yihan, Li, Weifeng, Zeng, Ningyi, and Bai, Xiaoxia
Series Editors: di Prisco, Marco, Chen, Sheng-Hong, Vayas, Ioannis, Kumar Shukla, Sanjay, Sharma, Anuj, Kumar, Nagesh, Wang, Chien Ming, Cui, Zhen-Dong, and Lu, Xinzheng
Editors: He, Bao-Jie, Prasad, Deo, Yan, Li, Cheshmehzangi, Ali, and Pignatta, Gloria
Published: 2025

16. Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Author: Wu, Yihan, Chang, Tao, Chen, Siliang, Niu, Xiaodong, Li, Yu, Fang, Yuan, Yang, Lei, Zong, Yixuan, Yang, Yaoxin, Li, Yuehua, Wang, Mengsong, Yang, Wen, Wu, Yixuan, Fu, Chen, Fang, Xia, Quan, Yuxin, Peng, Xilin, Sun, Qiang, Van Hulle, Marc M., Liu, Yanhui, Jiang, Ning, Farina, Dario, Yang, Yuan, He, Jiayuan, and Mao, Qing
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.
Published: 2024

17. Lambda: Learning Matchable Prior For Entity Alignment with Unlabeled Dangling Cases

Author: Yin, Hang, Xiang, Liyao, Ding, Dong, He, Yuheng, Wu, Yihan, Wang, Xinbing, and Zhou, Chenghu
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval, I.2.4, H.3.3
Abstract: We investigate the entity alignment (EA) problem with unlabeled dangling cases, meaning that partial entities have no counterparts in the other knowledge graph (KG), and this type of entity remains unlabeled. To address this challenge, we propose the framework Lambda for dangling detection and then entity alignment. Lambda features a GNN-based encoder called KEESA with spectral contrastive learning for EA and a positive-unlabeled learning algorithm for dangling detection called iPULE. iPULE offers theoretical guarantees of unbiasedness, uniform deviation bounds, and convergence. Experimental results demonstrate that each component contributes to overall performances that are superior to baselines, even when baselines additionally exploit 30% of dangling entities labeled for training.
Comment: Accepted in NeurIPS 2024 as a poster
Published: 2024

18. Few-Shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt

Author: Liu, Chenxi, Wang, Zhenyi, Xiong, Tianyi, Chen, Ruibo, Wu, Yihan, Guo, Junfeng, and Huang, Heng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-Shot Class-Incremental Learning (FSCIL) models aim to incrementally learn new classes with scarce samples while preserving knowledge of old ones. Existing FSCIL methods usually fine-tune the entire backbone, leading to overfitting and hindering the potential to learn new classes. On the other hand, recent prompt-based CIL approaches alleviate forgetting by training prompts with sufficient data in each task. In this work, we propose a novel framework named Attention-aware Self-adaptive Prompt (ASP). ASP encourages task-invariant prompts to capture shared knowledge by reducing specific information from the attention aspect. Additionally, self-adaptive task-specific prompts in ASP provide specific information and transfer knowledge from old classes to new classes with an Information Bottleneck learning objective. In summary, ASP prevents overfitting on the base task and does not require enormous data in few-shot incremental tasks. Extensive experiments on three benchmark datasets validate that ASP consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods in terms of both learning new classes and mitigating forgetting.
Comment: ECCV 2024
Published: 2024

19. Reactive Fluorescent and Colorimetric Probe for Highly Selective and Sensitive Detection of Hg2+ in Real Water Samples

Author: Yang, Jiarui, Zhang, Kaiqiang, Zhao, Yong, Song, Yanxi, Wu, Yihan, and Li, Hongqi
Published: 2024

20. Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection

Author: Chen, Ruibo, Wu, Yihan, Chen, Lichang, Liu, Guodong, He, Qi, Xiong, Tianyi, Liu, Chenxi, Guo, Junfeng, and Huang, Heng
Subjects: Computer Science - Computation and Language
Abstract: Data selection in instruction tuning emerges as a pivotal process for acquiring high-quality data and training instruction-following large language models (LLMs), but it is still a new and unexplored research area for vision-language models (VLMs). Existing data selection approaches on LLMs either rely on single unreliable scores, or use downstream tasks for selection, which is time-consuming and can lead to potential over-fitting on the chosen evaluation datasets. To address this challenge, we introduce a novel dataset selection method, Self-Filter, that utilizes the VLM itself as a filter. This approach is inspired by the observation that VLMs benefit from training with the most challenging instructions. Self-Filter operates in two stages. In the first stage, we devise a scoring network to evaluate the difficulty of training instructions, which is co-trained with the VLM. In the second stage, we use the trained score net to measure the difficulty of each instruction, select the most challenging samples, and penalize similar samples to encourage diversity. Comprehensive experiments on LLaVA and MiniGPT-4 show that Self-Filter can reach better results compared to full data settings with merely about 15% of the samples, and can achieve superior performance against competitive baselines.
Comment: 9 pages, 3 figures, 4 tables
Published: 2024

21. SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Author: Wu, Yihan, Maiti, Soumi, Peng, Yifan, Zhang, Wangyou, Li, Chenda, Wang, Yuyue, Wang, Xihua, Watanabe, Shinji, and Song, Ruihua
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advancements in language models have significantly enhanced performance in multiple speech-related tasks. Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model. However, this design omits the intrinsic connections between different speech tasks, which can potentially boost the performance of each task. In this work, we propose a novel decoder-only speech language model, SpeechComposer, that can unify common speech tasks by composing a fixed set of prompt tokens. Built upon four primary tasks -- speech synthesis, speech recognition, speech language modeling, and text language modeling -- SpeechComposer can easily extend to more speech tasks via compositions of well-designed prompt tokens, like voice conversion and speech enhancement. The unification of prompt tokens also makes it possible for knowledge sharing among different speech tasks in a more structured manner. Experimental results demonstrate that our proposed SpeechComposer can improve the performance of both primary tasks and composite tasks, showing the effectiveness of the shared prompt tokens. Remarkably, the unified decoder-only model achieves a comparable and even better performance than the baselines which are expert models designed for single tasks.
Comment: 11 pages, 2 figures
Published: 2024

22. The cyclophilin D (CypD) of Toxoplasma gondii is involved in the parasite’s response to oxidative stress damage

Author: Ying, Zhu, Wu, Yihan, Sun, Zhepeng, Liu, Jing, and Liu, Qun
Published: 2024

23. A preliminary study on the effects of Xiang Shao granules on reproductive endocrinology in drugged ovariectomised rats

Author: Jia, Qiucheng, Tang, Huimin, Zhong, Xiangmei, Chen, Wanying, Wu, Yihan, Wei, Weiwei, Zheng, Hong, and Chen, Jiming
Published: 2024

24. Single cell expression and chromatin accessibility of the Toxoplasma gondii lytic cycle identifies AP2XII-8 as an essential ribosome regulon driver

Author: Lou, Jingjing, Rezvani, Yasaman, Arriojas, Argenis, Wu, Yihan, Shankar, Nachiket, Degras, David, Keroack, Caroline D., Duraisingh, Manoj T., Zarringhalam, Kourosh, and Gubbels, Marc-Jan
Published: 2024

25. The cuproptosis-related signature predicts the prognosis and immune microenvironments of primary diffuse gliomas: a comprehensive analysis

Author: Chang, Tao, Wu, Yihan, Niu, Xiaodong, Guo, Zhiwei, Gan, Jiahao, Wang, Xiang, Liu, Yanhui, Pan, Qi, Mao, Qing, and Yang, Yuan
Published: 2024

26. Unravelling the veil of appearance anxiety: exploring social media use among Chinese young people

Author: Wu, Yihan, Xue, Ying, Zhao, Xiaohan, Han, Sijia, and Wu, Weiyun
Published: 2024

27. The root canal morphology of mandibular anterior teeth and its correlation with the occurrence of three-rooted mandibular first molars

Author: Zhu, Peng, Gu, Yongchun, Wu, Yihan, and Xu, Xiaoming
Subjects: mandibular anterior teeth, three-rooted mandibular first molar, root canal system, cone beam computed tomography, Dentistry, RK1-715, Other systems of medicine, RZ201-999
Abstract: Objective To study the root canal morphology of permanent mandibular anterior teeth and explore its correlation with the occurrence of three-rooted mandibular molars using cone beam computed tomographic (CBCT) imaging. Methods CBCT image data of 200 subjects were randomly collected from dental clinics. The root canal morphology of the mandibular anterior teeth was identified and classified by Vertucci’s classification, and the root length and labio-lingual dimension at the tooth neck level were measured. The occurrence of three-rooted mandibular first molars was examined as well. The concurrence rates of double-canaled anterior teeth and three-rooted mandibular first molars at each side, and concurrence rates of bilateral double-canaled anterior teeth and three-rooted mandibular first molars were calculated. Spearman correlation tests were applied to analyze the correlation between the double-canaled anterior teeth and three-rooted mandibular first molars, as well as the bilateral antimetric teeth. Results The incidence of double-canaled system was 10.4%, 18.6% and 6.5% in mandibular central incisors, lateral incisors and canines, respectively; the bilateral concurrence rates were 5.7%, 11.1% and 3.0%, respectively, and Spearman correlation coefficients (rho) were 0.487, 0.505 and 0.440 (P0.05). The frequency of three-rooted mandibular first molars was 24.6%; gender difference was not detected (P>0.05), while the incidence was significantly higher at the right side (29.0%) than the left side (20.3%)(P0.05) could be detected between them. Conclusion The lateral incisors exhibited the highest incidence of two root canals among the mandibular anterior teeth. Additionally, there was no significant correlation between three-rooted mandibular first molars and double-canaled mandibular anterior teeth.
Published: 2025

28. Local environment-based machine learning for molecular adsorption energy prediction

Author: Li, Yifan, Wu, Yihan, Han, Yuhang, Lyu, Qujie, Wu, Hao, Zhang, Xiuying, and Shen, Lei
Subjects: Condensed Matter - Materials Science
Abstract: Most machine learning (ML) models in Materials Science are developed by global geometric features, often falling short in describing localized characteristics, like molecular adsorption on materials. In this study, we introduce a local environment framework that extracts local features from crystal structures to portray the environment surrounding specific adsorption sites. Upon OC20 database (~20,000 3D entries), we apply our local environment framework on several ML models, such as random forest, convolutional neural network, and graph neural network. It is found that our framework achieves remarkable prediction accuracy in predicting molecular adsorption energy, significantly outperforming other examined global-environment-based models. Moreover, the employment of this framework reduces data requirements and augments computational speed, specifically for deep learning algorithms. Finally, we directly apply our Local Environment ResNet (LERN) on a small 2DMatPedia database (~2,000 2D entries), which also achieves highly accurate prediction, demonstrating the model transferability and remarkable data efficiency. Overall, the prediction accuracy, data-utilization efficiency, and transferability of our local-environment-based ML framework hold a promising high applicability across a broad molecular adsorption field, such as catalysis and sensor technologies.
Comment: 6 figures
Published: 2023

29. GPT-4 Vision on Medical Image Classification -- A Case Study on COVID-19 Dataset

Author: Chen, Ruibo, Xiong, Tianyi, Wu, Yihan, Liu, Guodong, Hu, Zhengmian, Chen, Lichang, Chen, Yanshuo, Liu, Chenxi, and Huang, Heng
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: This technical report delves into the application of GPT-4 Vision (GPT-4V) in the nuanced realm of COVID-19 image classification, leveraging the transformative potential of in-context learning to enhance diagnostic processes.
Published: 2023

30. A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

Author: Wu, Yihan, Hu, Zhengmian, Guo, Junfeng, Zhang, Hongyang, and Huang, Heng
Subjects: Computer Science - Cryptography and Security, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models. A challenge in the domain lies in preserving the distribution of original generated content after watermarking. Our research extends and improves upon existing watermarking frameworks, placing emphasis on the importance of a Distribution-Preserving (DiP) watermark. Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens prior to the generation of a word, then modifying the token distribution through a distribution-preserving reweight function to enhance the probability of these selected tokens during the sampling process. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it an effective solution for watermarking tasks that demand impeccable quality preservation., Comment: ICML 2024
Published: 2023

31. Shielding the Unseen: Privacy Protection through Poisoning NeRF with Spatial Deformation

Author: Wu, Yihan, Feng, Brandon Y., and Huang, Heng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we introduce an innovative method of safeguarding user privacy against the generative capabilities of Neural Radiance Fields (NeRF) models. Our novel poisoning attack method induces changes to observed views that are imperceptible to the human eye, yet potent enough to disrupt NeRF's ability to accurately reconstruct a 3D scene. To achieve this, we devise a bi-level optimization algorithm incorporating a Projected Gradient Descent (PGD)-based spatial deformation. We extensively test our approach on two common NeRF benchmark datasets consisting of 29 real-world scenes with high-quality images. Our results compellingly demonstrate that our privacy-preserving method significantly impairs NeRF's performance across these benchmark datasets. Additionally, we show that our method is adaptable and versatile, functioning across various perturbation strengths and NeRF architectures. This work offers valuable insights into NeRF's vulnerabilities and emphasizes the need to account for such potential privacy risks when developing robust 3D scene reconstruction algorithms. Our study contributes to the larger conversation surrounding responsible AI and generative machine learning, aiming to protect user privacy and respect creative ownership in the digital age.
Published: 2023

32. Unbiased Watermark for Large Language Models

Author: Hu, Zhengmian, Chen, Lichang, Wu, Xidong, Wu, Yihan, Zhang, Hongyang, and Huang, Heng
Subjects: Computer Science - Cryptography and Security
Abstract: The recent advancements in large language models (LLMs) have sparked a growing apprehension regarding the potential misuse. One approach to mitigating this risk is to incorporate watermarking techniques into LLMs, allowing for the tracking and attribution of model outputs. This study examines a crucial aspect of watermarking: how significantly watermarks impact the quality of model-generated outputs. Previous studies have suggested a trade-off between watermark strength and output quality. However, our research demonstrates that it is possible to integrate watermarks without affecting the output probability distribution with appropriate implementation. We refer to this type of watermark as an unbiased watermark. This has significant implications for the use of LLMs, as it becomes impossible for users to discern whether a service provider has incorporated watermarks or not. Furthermore, the presence of watermarks does not compromise the performance of the model in downstream tasks, ensuring that the overall utility of the language model is preserved. Our findings contribute to the ongoing discussion around responsible AI development, suggesting that unbiased watermarks can serve as an effective means of tracking and attributing model outputs without sacrificing output quality.
Published: 2023

33. Markov Chain-Guided Graph Construction and Sampling Depth Optimization for EEG-Based Mental Disorder Detection

Author: Wu, Yihan, Chang, Tao, Xu, Peng, and Zhang, Yangsong
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Graph Neural Networks (GNNs) have received considerable attention since their introduction. They have been widely applied in various fields due to their ability to represent graph-structured data. However, the application of GNNs is constrained by two main issues. Firstly, the "over-smoothing" problem restricts the use of deeper network structures. Secondly, GNNs' applicability is greatly limited when nodes and edges are not clearly defined and expressed, as is the case with EEG data. In this study, we proposed an innovative approach that harnesses the distinctive properties of the graph structure's Markov Chain to optimize the sampling depth of deep graph convolution networks. We introduced a tailored method for constructing graph structures specifically designed for analyzing EEG data, alongside the development of a vertex-level GNN classification model for precise detection of mental disorders. In order to verify the method's performance, we conduct experiments on two disease datasets using a subject-independent experiment scenario. For the Schizophrenia (SZ) data, our method achieves an average accuracy of 100% using only the first 300 seconds of data from each subject. Similarly, for Major Depressive Disorder (MDD) data, the method yields average accuracies of over 99%. These experiments demonstrate the method's ability to effectively distinguish between healthy control (HC) subjects and patients with mental disorders. We believe this method shows great promise for clinical diagnosis., Comment: 5 figures, 4 tables
Published: 2023

34. Characterizing normal perinatal development of the human brain structural connectivity

Author: Wu, Yihan, Vasung, Lana, Calixto, Camilo, Gholipour, Ali, and Karimi, Davood
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Early brain development is characterized by the formation of a highly organized structural connectome. The interconnected nature of this connectome underlies the brain's cognitive abilities and influences its response to diseases and environmental factors. Hence, quantitative assessment of structural connectivity in the perinatal stage is useful for studying normal and abnormal neurodevelopment. However, estimation of the connectome from diffusion MRI data involves complex computations. For the perinatal period, these computations are further challenged by the rapid brain development and imaging difficulties. Combined with high inter-subject variability, these factors make it difficult to chart the normal development of the structural connectome. As a result, there is a lack of reliable normative baselines of structural connectivity metrics at this critical stage in brain development. In this study, we developed a computational framework, based on spatio-temporal averaging, for determining such baselines. We used this framework to analyze the structural connectivity between 33 and 44 postmenstrual weeks using data from 166 subjects. Our results unveiled clear and strong trends in the development of structural connectivity in perinatal stage. Connection weighting based on fractional anisotropy and neurite density produced the most consistent results. We observed increases in global and local efficiency, a decrease in characteristic path length, and widespread strengthening of the connections within and across brain lobes and hemispheres. We also observed asymmetry patterns that were consistent between different connection weighting approaches. The new computational method and results are useful for assessing normal and abnormal development of the structural connectome early in life.
Published: 2023

35. Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets

Author: Wang, Yimu, Zhang, Dinghuai, Wu, Yihan, Huang, Heng, and Zhang, Hongyang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Despite incredible advances, deep learning has been shown to be susceptible to adversarial attacks. Numerous approaches have been proposed to train robust networks both empirically and certifiably. However, most of them defend against only a single type of attack, while recent work takes steps forward in defending against multiple attacks. In this paper, to understand multi-target robustness, we view this problem as a bargaining game in which different players (adversaries) negotiate to reach an agreement on a joint direction of parameter updating. We identify a phenomenon named player domination in the bargaining game, namely that the existing max-based approaches, such as MAX and MSD, do not converge. Based on our theoretical analysis, we design a novel framework that adjusts the budgets of different adversaries to avoid any player dominance. Experiments on standard benchmarks show that employing the proposed framework to the existing approaches significantly advances multi-target robustness.
Published: 2023

36. ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios

Author: Wang, Yuyue, Xiao, Huan, Wu, Yihan, and Song, Ruihua
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Text to Speech (TTS) models can generate natural and high-quality speech, but they are not expressive enough when synthesizing speech with dramatic expressiveness, such as stand-up comedies. Considering comedians have diverse personal speech styles, including personal prosody, rhythm, and fillers, this task requires real-world datasets and strong speech style modeling capabilities, which brings challenges. In this paper, we construct a new dataset and develop ComedicSpeech, a TTS system tailored for stand-up comedy synthesis in low-resource scenarios. First, we extract prosody representation by the prosody encoder and condition it to the TTS model in a flexible way. Second, we enhance the personal rhythm modeling by a conditional duration predictor. Third, we model the personal fillers by introducing comedian-related special tokens. Experiments show that ComedicSpeech achieves better expressiveness than baselines with only ten-minute training data for each comedian. The audio samples are available at https://xh621.github.io/stand-up-comedy-demo/, Comment: 5 pages, 4 tables, 2 figures
Published: 2023

37. ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

Author: Chen, Zehua, Wu, Yihan, Leng, Yichong, Chen, Jiawei, Liu, Haohe, Tan, Xu, Cui, Yang, Wang, Ke, He, Lei, Zhao, Sheng, Bian, Jiang, and Mandic, Danilo
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Signal Processing
Abstract: Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed for DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/., Comment: 13 pages, 5 figures
Published: 2022

38. Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks

Author: Wu, Yihan, Bojchevski, Aleksandar, and Huang, Heng
Subjects: Computer Science - Machine Learning
Abstract: A lot of theoretical and empirical evidence shows that the flatter local minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is an emerging technique to efficiently and effectively find such minima. In AWP we minimize the loss w.r.t. a bounded worst-case perturbation of the model parameters thereby favoring local minima with a small loss in a neighborhood around them. The benefits of AWP, and more generally the connections between flatness and generalization, have been extensively studied for i.i.d. data such as images. In this paper, we extensively study this phenomenon for graph data. Along the way, we first derive a generalization bound for non-i.i.d. node classification tasks. Then we identify a vanishing-gradient issue with all existing formulations of AWP and we propose a new Weighted Truncated AWP (WT-AWP) to alleviate this issue. We show that regularizing graph neural networks with WT-AWP consistently improves both natural and robust generalization across many different graph learning tasks and models., Comment: AAAI 2023 (oral)
Published: 2022

39. VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Author: Wu, Yihan, Guo, Junliang, Tan, Xu, Zhang, Chen, Li, Bohan, Song, Ruihua, He, Lei, Zhao, Sheng, Menezes, Arul, and Bian, Jiang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure the translated speech to be well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech, which requires strict length control. Previous works usually control the number of words or characters generated by the machine translation model to be similar to the source sentence, without considering the isochronicity of speech as the speech duration of words/characters in different languages varies. In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. Specifically, we control the speech length of generated sentence by guiding the prediction of each word with the duration information, including the speech duration of itself as well as how much duration is left for the remaining words. We design experiments on four language directions (German -> English, Spanish -> English, Chinese <-> English), and the results show that the proposed method achieves better length control ability on the generated speech than baseline methods. To make up the lack of real-world datasets, we also construct a real-world test set collected from films to provide comprehensive evaluations on the video dubbing task., Comment: AAAI 2023 camera version
Published: 2022

40. PromptTTS: Controllable Text-to-Speech with Text Descriptions

Author: Guo, Zhifang, Leng, Yichong, Wu, Yihan, Zhao, Sheng, and Tan, Xu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: Using a text description as prompt to guide the generation of text or images (e.g., GPT-3 or DALLE-2) has drawn wide attention recently. Beyond text and image generation, in this work, we explore the possibility of utilizing text descriptions to guide speech synthesis. Thus, we develop a text-to-speech (TTS) system (dubbed as PromptTTS) that takes a prompt with both style and content descriptions as input to synthesize the corresponding speech. Specifically, PromptTTS consists of a style encoder and a content encoder to extract the corresponding representations from the prompt, and a speech decoder to synthesize speech according to the extracted style and content representations. Compared with previous works in controllable TTS that require users to have acoustic knowledge to understand style factors such as prosody and pitch, PromptTTS is more user-friendly since text descriptions are a more natural way to express speech style (e.g., ''A lady whispers to her friend slowly''). Given that there is no TTS dataset with prompts, to benchmark the task of PromptTTS, we construct and release a dataset containing prompts with style and content information and the corresponding speech. Experiments show that PromptTTS can generate speech with precise style control and high speech quality. Audio samples and our dataset are publicly available., Comment: Submitted to ICASSP 2023
Published: 2022

41. Towards Robust Dataset Learning

Author: Wu, Yihan, Li, Xinda, Kerschbaum, Florian, Huang, Heng, and Zhang, Hongyang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often slow. In this paper, we study the problem of learning a robust dataset such that any classifier naturally trained on the dataset is adversarially robust. Such a dataset benefits the downstream tasks as natural training is much faster than adversarial training, and demonstrates that the desired property of robustness is transferable between models and data. In this work, we propose a principled, tri-level optimization to formulate the robust dataset learning problem. We show that, under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset. Extensive experiments on MNIST, CIFAR10, and TinyImageNet demonstrate the effectiveness of our algorithm with different network initializations and architectures.
Published: 2022

42. Visual representations in the human brain are aligned with large language models

Author: Doerig, Adrien, Kietzmann, Tim C, Allen, Emily, Wu, Yihan, Naselaris, Thomas, Kay, Kendrick, and Charest, Ian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
Abstract: The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative approach for studying this information remains elusive. Here, we test whether the contextual information encoded in large language models (LLMs) is beneficial for modelling the complex visual information extracted by the brain from natural scenes. We show that LLM embeddings of scene captions successfully characterise brain activity evoked by viewing the natural scenes. This mapping captures selectivities of different brain areas, and is sufficiently robust that accurate scene captions can be reconstructed from brain activity. Using carefully controlled model comparisons, we then proceed to show that the accuracy with which LLM representations match brain representations derives from the ability of LLMs to integrate complex information contained in scene captions beyond that conveyed by individual words. Finally, we train deep neural network models to transform image inputs into LLM representations. Remarkably, these networks learn representations that are better aligned with brain representations than a large number of state-of-the-art alternative models, despite being trained on orders-of-magnitude less data. Overall, our results suggest that LLM embeddings of scene captions provide a representational format that accounts for complex information extracted by the brain from visual inputs.
Published: 2022

43. Lattice matching enables construction of CaS@NaYF4 heterostructure with synergistically enhanced water resistance and luminescence for antibiotic detection

Author: Wang, Yao, Chen, Huadong, Zhao, Tonghan, Wang, Jing, Wu, Yihan, Liu, Jinliang, Zhang, Yong, and Zhu, Xiaohui
Published: 2024
Full Text: View/download PDF

44. Contribution of freefall and rock mass structure to post-fragmentation spreading of rockslides

Author: Zhu, Zhiyuan, Wu, Yihan, Bi, Yuzhang, Zheng, Lu, Chen, Fei, Wu, Wei, and Zhang, Hong
Published: 2024
Full Text: View/download PDF

45. Analysis of anthropogenic disturbance and spatial and temporal changes of bird communities in plateau wetlands fusing bird survey and nighttime light remote sensing data

Author: Zhang, Xingyi, Zhong, Zhenhua, Zhang, Maolin, Zhao, Fei, Wu, Yihan, Sun, Yongqi, Luo, Jinxuan, Zhang, Yiyang, Wang, Xinrui, Cai, Jingzhi, Zhao, Xiaoqing, Xiong, Yinhong, Zhang, Sujin, and An, Tingbo
Published: 2025
Full Text: View/download PDF

46. Elucidating metabolite and pH variations in stroke through guanidino, amine and amide CEST MRI: A comparative multi-field study at 9.4T and 3T

Author: Wang, Kexin, Ju, Licheng, Qiao, Guanda, Liang, Yajie, Wu, Yihan, Chu, Chengyan, Rogers, Joshua, Li, Yuguo, Cao, Suyi, Dawson, Valina L., Dawson, Ted M, Walczak, Piotr, and Xu, Jiadi
Published: 2025
Full Text: View/download PDF

47. State-dependent inter-network functional connectivity development in neonatal brain from the developing human connectome project

Author: Zhao, Zhiyong, Li, Ruolin, Wu, Yihan, Li, Mingyang, and Wu, Dan
Published: 2025
Full Text: View/download PDF

48. Synthesizing the quantitative impact of urban neighborhood morphology on pedestrian wind environment- A meta-analysis approach

Author: Wu, Yihan, Wang, Mu, Kong, Fanyi, Kong, Junqiao, and Liu, Huimin
Published: 2025
Full Text: View/download PDF

49. Schizophrenia detection based on EEG using Recurrent Auto-Encoder framework

Author: Wu, Yihan, Xia, Min, Wang, Xiuzhu, and Zhang, Yangsong
Subjects: Quantitative Biology - Neurons and Cognition, Electrical Engineering and Systems Science - Signal Processing
Abstract: Schizophrenia (SZ) is a serious mental disorder that could seriously affect the patient's quality of life. In recent years, detection of SZ based on deep learning (DL) using electroencephalogram (EEG) has received increasing attention. In this paper, we proposed an end-to-end recurrent auto-encoder (RAE) model to detect SZ. In the RAE model, the raw data were input into one auto-encoder block, and the reconstructed data were recurrently input into the same block. The code extracted by the auto-encoder block simultaneously served as an input of a classifier block to discriminate SZ patients from healthy controls (HC). Evaluated on a dataset containing 14 SZ patients and 14 HC subjects, the proposed method achieved an average classification accuracy of 81.81% in a subject-independent experiment scenario. This study demonstrated that the structure of RAE is able to capture the differential features between SZ patients and HC subjects.
Published: 2022

50. Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

Author: Wu, Yihan, Wang, Xi, Zhang, Shaofei, He, Lei, Song, Ruihua, and Nie, Jian-Yun
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Expressive speech synthesis, like audiobook synthesis, is still challenging for style representation learning and prediction. Deriving from reference audio or predicting style tags from text requires a huge amount of labeled data, which is costly to acquire and difficult to define and annotate accurately. In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner. It leverages an emotion lexicon and uses contrastive learning and deep clustering. We further integrate the style representation as a conditioned embedding in a multi-style Transformer TTS. Comparing with multi-style TTS by predicting style tags trained on the same dataset but with human annotations, our method achieves improved results according to subjective evaluations on both in-domain and out-of-domain test sets in audiobook speech. Moreover, with implicit context-aware style representation, the emotion transition of synthesized audio in a long paragraph appears more natural. The audio samples are available on the demo web., Comment: Accepted by Interspeech 2022
Published: 2022
