Author: "Zhang, Chunlei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Chunlei"' showing total 2,662 results

Start Over Author "Zhang, Chunlei"

2,662 results on '"Zhang, Chunlei"'

1. Preference Alignment Improves Language Model-Based TTS

Author: Tian, Jinchuan, Zhang, Chunlei, Shi, Jiatong, Zhang, Hao, Yu, Jianwei, Watanabe, Shinji, and Yu, Dong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer competitive performance to their counterparts. Further optimization can be achieved through preference alignment algorithms, which adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content. This study presents a thorough empirical evaluation of how preference alignment algorithms, particularly Direct Preference Optimization (DPO), enhance LM-based TTS. With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores, with the latter two metrics surpassing even human speech in certain evaluations. We also show preference alignment is applicable to low-resource scenarios and effectively generalized to out-of-domain applications.
Published: 2024

2. Advancing Multi-talker ASR Performance with Large Language Models

Author: Shi, Mohan, Jin, Zengrui, Xu, Yaoxun, Xu, Yong, Zhang, Shi-Xiong, Wei, Kun, Shao, Yiwen, Zhang, Chunlei, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence
Abstract: Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problem for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with the idea of concatenating transcriptions from multiple speakers according to the emission times of their speech for training. However, SOT-style transcriptions, derived from concatenating multiple related utterances in a conversation, depend significantly on modeling long contexts. Therefore, compared to traditional methods that primarily emphasize encoder performance in attention-based encoder-decoder (AED) architectures, a novel approach utilizing large language models (LLMs) that leverages the capabilities of pre-trained decoders may be better suited for such complex and challenging scenarios. In this paper, we propose an LLM-based SOT approach for multi-talker ASR, leveraging pre-trained speech encoder and LLM, fine-tuning them on multi-talker dataset using appropriate strategies. Experimental results demonstrate that our approach surpasses traditional AED-based methods on the simulated dataset LibriMix and achieves state-of-the-art performance on the evaluation set of the real-world dataset AMI, outperforming the AED model trained with 1000 times more supervised data in previous works., Comment: 8 pages, accepted by IEEE SLT 2024
Published: 2024

3. TokSing: Singing Voice Synthesis based on Discrete Tokens

Author: Wu, Yuning, zhang, Chunlei, Shi, Jiatong, Tang, Yuxun, Yang, Shan, and Jin, Qin
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis(SVS), achieving higher levels of melody expression poses a great challenge for utilizing discrete tokens. In this paper, we introduce TokSing, a discrete-based SVS system equipped with a token formulator that offers flexible token blendings. We observe a melody degradation during discretization, prompting us to integrate a melody signal with the discrete token and incorporate a specially-designed melody enhancement strategy in the musical encoder. Extensive experiments demonstrate that our TokSing achieves better performance against the Mel spectrogram baselines while offering advantages in intermediate representation space cost and convergence speed., Comment: Accepted by Interspeech 2024
Published: 2024

4. Diagnosis and Treatment of Myxoid Liposarcoma

Author: Qu, Guoxin, Zhang, Chunlei, Tian, Zhichao, and Yao, Weitao
Published: 2024
Full Text: View/download PDF

5. Efficacy of Ixekizumab in Chinese Patients with Moderate-to-Severe Psoriasis and Special Body Area Involvement: Sub-analysis of a Randomized, Double-Blind, Multicenter Phase 3 Study

Author: Li, Xia, Ding, Yangfeng, Zhang, Chunlei, Lu, Yan, Li, Fuqiu, Pan, Weili, Guo, Shuping, Li, Jinnan, Zhao, Bilian, and Zheng, Jie
Published: 2024
Full Text: View/download PDF

6. Crack Propagation and Strength Characteristics of Rock Mass Containing Cross-Cracks in Immovable Stone Cultural Relics

Author: Zhang, Chunlei, Arif, Arifuggaman, Zhang, Zhenhua, Feng, Ruimin, Sajib, Mahabub Hasan, Peng, Ningbo, Zhuang, Wei, Feng, Mingjie, Yuan, Meng, and Zhang, Ye
Published: 2024
Full Text: View/download PDF

7. Aβ-Aggregation-Generated Blue Autofluorescence Illuminates Senile Plaques as well as Complex Blood and Vascular Pathologies in Alzheimer’s Disease

Author: Fu, Hualin, Li, Jilong, Zhang, Chunlei, Du, Peng, Gao, Guo, Ge, Qiqi, Guan, Xinping, and Cui, Daxiang
Published: 2024
Full Text: View/download PDF

8. uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

Author: Yang, Muqiao, Zhang, Chunlei, Xu, Yong, Xu, Zhongweiyang, Wang, Heming, Raj, Bhiksha, and Yu, Dong
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions including self-supervised learning embeddings and proper text prompts to the score-based diffusion model, we can enable controllable generation of the unified speech enhancement and editing model to perform corresponding actions on the source speech. Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform speech editing given desired environmental sound text description, signal-to-noise ratios (SNR), and room impulse responses (RIR). Demos of the generated speech are available at https://muqiaoy.github.io/usee.
Published: 2023

9. Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Author: Wang, Heming, Yu, Meng, Zhang, Hao, Zhang, Chunlei, Xu, Zhongweiyang, Yang, Muqiao, Zhang, Yixuan, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep learning based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering listening experiences. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynthesize clean, anechoic speech from degraded inputs. This study leverages pre-trained vocoder or codec models to synthesize high-quality speech while enhancing robustness in challenging scenarios. Generative methods effectively handle information loss in speech signals, resulting in regenerated speech that has improved fidelity and reduced artifacts. By harnessing the capabilities of pre-trained models, we achieve faithful reproduction of the original speech in adverse conditions. Experimental evaluations on both simulated datasets and realistic samples demonstrate the effectiveness and robustness of our proposed methods. Especially by leveraging codec, we achieve superior subjective scores for both simulated and realistic recordings. The generated speech exhibits enhanced audio quality, reduced background noise, and reverberation. Our findings highlight the potential of pre-trained generative techniques in speech processing, particularly in scenarios where traditional methods falter. Demos are available at https://whmrtm.github.io/SoundResynthesis., Comment: Paper in submission
Published: 2023

10. Non-coding RNAs in the spotlight of the pathogenesis, diagnosis, and therapy of cutaneous T cell lymphoma

Author: He, Xiao, Zhang, Qian, Wang, Yimeng, Sun, Jiachen, Zhang, Ying, and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

11. ALKBH5 regulates chicken adipogenesis by mediating LCAT mRNA stability depending on m6A modification

Author: Chao, Xiaohuan, Guo, Lijin, Ye, Chutian, Liu, Aijun, Wang, Xiaomeng, Ye, Mao, Fan, Zhexia, Luan, Kang, Chen, Jiahao, Zhang, Chunlei, Liu, Manqing, Zhou, Bo, Zhang, Xiquan, Li, Zhenhui, and Luo, Qingbin
Published: 2024
Full Text: View/download PDF

12. Association between metal exposures and periodontitis among U.S. adults: the potential mediating role of biological aging

Author: Dai, Zhida, Fu, Yingyin, Tan, Yuxuan, Yu, Xinyuan, Cao, Yixi, Xia, Yian, Jing, Chunxia, and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

13. Oroxin A alleviates early brain injury after subarachnoid hemorrhage by regulating ferroptosis and neuroinflammation

Author: Chen, Junhui, Shi, Zhonghua, Zhang, Chunlei, Xiong, Kun, Zhao, Wei, and Wang, Yuhai
Published: 2024
Full Text: View/download PDF

14. Harnessing strong aromatic conjugation in low-dimensional perovskite heterojunctions for high-performance photovoltaic devices

Author: Li, Bo, Liu, Qi, Gong, Jianqiu, Li, Shuai, Zhang, Chunlei, Gao, Danpeng, Chen, Zhongwei, Li, Zhen, Wu, Xin, Zhao, Dan, Yu, Zexin, Li, Xintong, Wang, Yan, Lu, Haipeng, Zeng, Xiao Cheng, and Zhu, Zonglong
Published: 2024
Full Text: View/download PDF

15. Experimental and Numerical Studies of a Shield Twin Tunnel Undercrossing the Existing High-Speed Railway Tunnel

Author: Fei, Ruizhen, Peng, Limin, Zhang, Chunlei, Zhang, Jiqing, and Zhang, Peng
Published: 2024
Full Text: View/download PDF

16. Research on Interference Signal Recognition in P Wave Pickup and Magnitude Estimation

Author: Yin, Deyu, Chen, Yadong, Yang, Yushun, Cheng, Yongzhen, and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

17. Experimental and Numerical Analysis on Bending Behavior of Hybrid Bridge Deck System Composed of Transversely Connected OSD and Composite Deck

Author: Dai, Changyuan, Su, Qingtian, Shao, Changyu, and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

18. Effectiveness of Ixekizumab in Chinese Patients with Moderate-Severe Plaque Psoriasis with Special Area Involvement: Subanalysis of a Prospective, Multicenter, Observational Real-World Study

Author: Li, Ying, Lv, Chengzhi, Dang, Lin, Lin, Bingjiang, Tao, Juan, Zhang, Chunlei, Zhou, Xiaoyong, Ma, Han, Lu, Yi, Chen, Rong, Li, Jinnan, Dou, Guanshen, Liang, Yunsheng, Liang, Yanhua, and Shi, Yuling
Published: 2024
Full Text: View/download PDF

19. Fabrication, Microstructure and Mechanical Properties of in situ GNPs Reinforced Magnesium Matrix Composites

Author: Zhao, Peitang, Li, Xuejian, Shi, Hailong, Hu, Xiaoshi, Zhang, Chunlei, Xu, Chao, and Wang, Xiaojun
Published: 2024
Full Text: View/download PDF

20. Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Author: Huang, Rongjie, Zhang, Chunlei, Wang, Yongqi, Yang, Dongchao, Liu, Luping, Ye, Zhenhui, Jiang, Ziyue, Weng, Chao, Zhao, Zhou, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common. In addition, the majority of voice synthesis models currently rely on annotated audio data, but it is crucial to scale them to self-supervised datasets in order to effectively capture the wide range of acoustic variations present in human voice, including speaker identity, emotion, and prosody. In this work, we propose Make-A-Voice, a unified framework for synthesizing and manipulating voice signals from discrete representations. Make-A-Voice leverages a "coarse-to-fine" approach to model the human voice, which involves three stages: 1) semantic stage: model high-level transformation between linguistic content and self-supervised semantic tokens, 2) acoustic stage: introduce varying control signals as acoustic conditions for semantic-to-acoustic modeling, and 3) generation stage: synthesize high-fidelity waveforms from acoustic tokens. Make-A-Voice offers notable benefits as a unified voice synthesis framework: 1) Data scalability: the major backbone (i.e., acoustic and generation stage) does not require any annotations, and thus the training data could be scaled up. 2) Controllability and conditioning flexibility: we investigate different conditioning mechanisms and effectively handle three voice synthesis applications, including text-to-speech (TTS), voice conversion (VC), and singing voice synthesis (SVS) by re-synthesizing the discrete voice representations with prompt guidance. Experimental results demonstrate that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models. Audio samples are available at https://Make-A-Voice.github.io
Published: 2023

21. International Experience and Implications of Continuous Monitoring of CO2 Emissions from Thermal Power Plants

Author: Zhang Chunlei, Zhao Liang, Liu Xiao, Huang Haiwei, Zhao Yong, Peng Silong, and Shu Yinbiao
Subjects: carbon emission statistics and accounting system, thermal power plant, CO2 emissions, continuous monitoring, continuous emission monitoring systems, carbon market, Engineering (General). Civil engineering (General), TA1-2040
Abstract: To promote the development and application of continuous monitoring technologies for CO2 emissions from thermal power plants in China and support the establishment of a carbon emission statistics and accounting system in the country, this study analyzes the international experience in the continuous monitoring of CO2 emissions from thermal power plants using literature research, expert discussion, and technical interpretation, with a focus on the United States and the European Union. The analysis covers eight aspects: types of units applying the technology, formulation of implementation standards, measures for dealing with failures, selection of monitoring technologies, research on key technologies, support for the low-carbon development of thermal power plants, verification of monitoring reports, and evaluation of measurement accuracy. Drawing on international beneficial experience and considering the current status of continuous monitoring of CO2 emissions from thermal power plants in China, the study proposes the following suggestions: (1) formulating supporting policies and regulations while considering the characteristics of technology application and the actual situation of thermal power plants in China, (2) accelerating the improvement in technical implementation standards and specifications to enhance the guiding and supporting role of standards, (3) strengthening the statistical analysis of the measurements using existing flue gas flow meters to provide references for relevant work, (4) enhancing research on flue gas flow measurement in thermal power plants to improve the technical level of continuous monitoring of CO2 emissions, (5) promoting the application of digital technologies in the verification of monitoring reports to support closed-loop management of continuous monitoring of CO2 emissions, and (6) improving the evaluation standards system for measurement accuracy of continuous emission monitoring systems to enhance the international recognition of China’s carbon emission data.
Published: 2024
Full Text: View/download PDF

22. C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

Author: Zhang, Chunlei and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Self-supervised learning (SSL) has drawn an increased attention in the field of speech processing. Recent studies have demonstrated that contrastive learning is able to learn discriminative speaker embeddings in a self-supervised manner. However, base contrastive self-supervised learning (CSSL) assumes that the pairs generated from a view of anchor instance and any view of other instances are all negative, which introduces many false negative pairs in constructing the loss function. The problem is referred as $class$-$collision$, which remains as one major issue that impedes the CSSL based speaker verification (SV) systems from achieving better performances. In the meanwhile, studies reveal that negative sample free SSL frameworks perform well in learning speaker or image representations. In this study, we investigate SSL techniques that lead to an improved SV performance. We first analyse the impact of false negative pairs in the CSSL systems. Then, a multi-stage Class-Collision Correction (C3) method is proposed, which leads to the state-of-the-art CSSL based speaker embedding system. On the basis of the pretrained CSSL model, we further propose to employ a negative sample free SSL objective (i.e., DINO) to fine-tune the speaker embedding network. The resulting speaker embedding system (C3-DINO) achieves 2.5% EER with a simple Cosine Distance Scoring method on Voxceleb1 test set, which outperforms the previous SOTA SSL system (4.86%) by a significant +45% relative improvement. With speaker clustering and pseudo labeling on Voxceleb2 training set, a LDA/CDS back-end applying on the C3-DINO speaker embeddings is able to further push the EER to 2.2%. Comprehensive experimental investigations of the Voxceleb benchmarks and our internal dataset demonstrate the effectiveness of our proposed methods, and the performance gap between the SSL SV and the supervised counterpart narrows further., Comment: Accepted to IEEE Journal of Selected Topics in Signal Processing
Published: 2022
Full Text: View/download PDF

23. Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE

Author: Lian, Jiachen, Zhang, Chunlei, Anumanchipalli, Gopala Krishna, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: In this paper, we propose a novel unsupervised text-to-speech acoustic model training scheme, named UTTS, which does not require text-audio pairs. UTTS is a multi-speaker speech synthesizer that supports zero-shot voice cloning, it is developed from a perspective of disentangled speech representation learning. The framework offers a flexible choice of a speaker's duration model, timbre feature (identity) and content for TTS inference. We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development. Specifically, we employ our recently formulated Conditional Disentangled Sequential Variational Auto-encoder (C-DSVAE) as the backbone UTTS AM, which offers well-structured content representations given unsupervised alignment (UA) as condition during training. For UTTS inference, we utilize a lexicon to map input text to the phoneme sequence, which is expanded to the frame-level forced alignment (FA) with a speaker-dependent duration model. Then, we develop an alignment mapping module that converts FA to UA. Finally, the C-DSVAE, serving as the self-supervised TTS AM, takes the predicted UA and a target speaker embedding to generate the mel spectrogram, which is ultimately converted to waveform with a neural vocoder. We show how our method enables speech synthesis without using a paired TTS corpus in AM development stage. Experiments demonstrate that UTTS can synthesize speech of high naturalness and intelligibility measured by human and objective evaluations. Audio samples are available at our demo page https://neurtts.github.io/utts\_demo/., Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing ( Volume: 31)
Published: 2022
Full Text: View/download PDF

24. LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Author: Tian, Jinchuan, Yu, Jianwei, Zhang, Chunlei, Weng, Chao, Zou, Yuexian, and Yu, Dong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Despite the rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech using a unified ASR system remains highly challenging. Previous works on multilingual speech recognition mainly focus on two directions: recognizing multiple monolingual speech or recognizing code-switched speech that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by the shared block while the language-specific blocks are used to extract specific representations for each language. To learn language-specific information discriminatively, a language-aware training method is proposed to optimize the language-specific blocks in LAE. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level and shows superior performance on both monolingual and multilingual ASR tasks. With either a real-recorded or simulated code-switched dataset, the proposed LAE achieves statistically significant improvements on both CTC and neural transducer systems. Code is released
Published: 2022

25. Investigation of the Viability of Cells upon Co-Exposure to Gold and Iron Oxide Nanoparticles

Author: Zhang, Qian, primary, Lai, Weien, additional, Yin, Ting, additional, Zhang, Chunlei, additional, Yue, Caixia, additional, Cheng, Jin, additional, Wang, Kan, additional, Yang, Yuming, additional, Cui, Daxiang, additional, and Parak, Wolfgang J., additional
Published: 2024
Full Text: View/download PDF

26. NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

Author: Yu, Meng, Xu, Yong, Zhang, Chunlei, Zhang, Shi-Xiong, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back. In this paper, we present an all-deep-learning framework that implicitly estimates the second order statistics of echo/noise and target speech, and jointly solves echo and noise suppression through an attention based recurrent neural network. The proposed model outperforms the state-of-the-art joint echo cancellation and speech enhancement method F-T-LSTM in terms of objective speech quality metrics, speech recognition accuracy and model complexity. We show that this model can work with speaker embedding for better target speech enhancement and furthermore develop a branch for automatic gain control (AGC) task to form an all-in-one front-end speech enhancement system., Comment: Submitted to INTERSPEECH 2022
Published: 2022

27. Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

Author: Lian, Jiachen, Zhang, Chunlei, Anumanchipalli, Gopala Krishna, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Disentangling content and speaking style information is essential for zero-shot non-parallel voice conversion (VC). Our previous study investigated a novel framework with disentangled sequential variational autoencoder (DSVAE) as the backbone for information decomposition. We have demonstrated that simultaneous disentangling content embedding and speaker embedding from one utterance is feasible for zero-shot VC. In this study, we continue the direction by raising one concern about the prior distribution of content branch in the DSVAE baseline. We find the random initialized prior distribution will force the content embedding to reduce the phonetic-structure information during the learning process, which is not a desired property. Here, we seek to achieve a better content embedding with more phonetic information preserved. We propose conditional DSVAE, a new model that enables content bias as a condition to the prior modeling and reshapes the content embedding sampled from the posterior distribution. In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline., Comment: Accepted to 2022 Interspeech. Demo link is here https://jlian2.github.io/Improved-Voice-Conversion-with-Conditional-DSVAE/
Published: 2022

28. EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

Author: Maiti, Soumi, Ueda, Yushi, Watanabe, Shinji, Zhang, Chunlei, Yu, Meng, Zhang, Shi-Xiong, and Xu, Yong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convolutional layer architecture for estimating the separation masks corresponding to a flexible number of speakers and a fusion technique for refining the separated speech signal with obtained speaker diarization information to improve the joint framework. Experiments using the LibriMix dataset show that our proposed method outperforms the single-task baselines in both diarization and separation metrics for fixed and flexible numbers of speakers and improves speaker counting performance for flexible numbers of speakers. All materials will be open-sourced and reproducible in ESPnet toolkit., Comment: Accepted in SLT 2022
Published: 2022

29. Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

Author: Lian, Jiachen, Zhang, Chunlei, and Yu, Dong
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Signal Processing
Abstract: Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good voice conversion quality is obtained by exploring better alignment modules or expressive mapping functions. In this study, we investigate zero-shot VC from a novel perspective of self-supervised disentangled speech representation learning. Specifically, we achieve the disentanglement by balancing the information flow between global speaker representation and time-varying content representation in a sequential variational autoencoder (VAE). A zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder. Besides that, an on-the-fly data augmentation training strategy is applied to make the learned representation noise invariant. On TIMIT and VCTK datasets, we achieve state-of-the-art performance on both objective evaluation, i.e., speaker verification (SV) on speaker embedding and content embedding, and subjective evaluation, i.e., voice naturalness and similarity, and remains to be robust even with noisy source/target utterances., Comment: Accepted to 2022 ICASSP
Published: 2022

30. LH-stimulated periodic lincRNA HEOE regulates follicular dynamics and influences estrous cycle and fertility via miR-16-ZMAT3 and PGF2α in pigs

Author: Liu, Mingzheng, Chen, Jiahao, Liu, Shuhan, Zhang, Chunlei, Chao, Xiaohuan, Yang, Huan, Xu, Qinglei, Wang, Tianshuo, Bi, Hongwei, Ding, Yuan, Wang, Ziming, Muhammad, Asim, Muhammad, Mubashir, Schinckel, Allan P., and Zhou, Bo
Published: 2024
Full Text: View/download PDF

31. New method for the preparation of hierarchical nanotube K0.3xMnxCe1-xOδ catalysts and their excellent catalytic performance for soot combustion

Author: Zhang, Chunlei, Gao, Siyu, Zhou, Shengran, Yu, Di, Wang, Lanyi, Fan, Xiaoqiang, Yu, Xuehua, Liu, Bing, and Zhao, Zhen
Published: 2024
Full Text: View/download PDF

32. Phytic acid-loaded polyvinyl alcohol hydrogel promotes wound healing of injured corneal epithelium through inhibiting ferroptosis

Author: Gong, Danni, Wu, Nianxuan, Chen, Huan, Zhang, Weijie, Yan, Chenxi, Zhang, Chunlei, Fu, Yao, and Sun, Hao
Published: 2024
Full Text: View/download PDF

33. Using multimodal resources to design EFL classroom lead-ins—A multimodal pedagogical stylistics perspective

Author: Lei, Qian and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

34. Evaluation of forage quality in various soybean varieties and high-yield cultivation techniques

Author: Sun, Baiquan, Yuan, Shan, Naser, Mahmoud, Zhou, Yanfeng, Jia, Hongchang, Yu, Yang, Yao, Xiangyu, Wu, Tingting, Song, Wenwen, Jiang, Bingjun, Dong, Hongxia, Zhang, Chunlei, Sapey, Enoch, Wang, Peiguo, Sun, Yanhui, Zhang, Junquan, Zhang, Lixin, Li, Qimeng, Xu, Cailong, Jia, Xin, Wu, Cunxiang, Yu, Lijie, Sun, Shi, Guo, Changhong, and Han, Tianfu
Published: 2024
Full Text: View/download PDF

35. Examining generative AI user addiction from a C-A-C perspective

Author: Zhou, Tao and Zhang, Chunlei
Published: 2024
Full Text: View/download PDF

36. Cell membrane-camouflaged nanoarchitectonics of photosensitizer nanoparticles for enhanced phototherapy in surgery

Author: Xu, Xia, Sun, Jiachen, Zhao, Jie, Yu, Fanchen, Xu, Yang, Zhang, Chunlei, and Li, Junbai
Published: 2025
Full Text: View/download PDF

37. Development of an online monitoring system for water body γ radioactivity based on 4G-RTU

Author: ZHANG Sheng, WANG Yongjun, WANG Ruijun, HUANG Qingbo, ZHANG Chunlei, and WU Mingyang
Subjects: 4g-rtu, water body, γ radioactivity, online monitoring, Nuclear engineering. Atomic power, TK9001-9401
Abstract: BackgroundWith the rapid development of nuclear energy and the wide application of nuclear technology, the radioactivity level of bodies of water has become a highly concerning issue for the public and governments, especially after the Fukushima nuclear accident in Japan.PurposeThis study aims to develop an online γ radioactivitymonitoring system based on 4G Remote Terminal Unit (4G-RTU) to meet the needs of online and emergency monitoring of water radioactivity.MethodsFirst, the γ-ray monitoring device based on sodium iodide detector, 4G-RTU, an integrated power supply waterproof and compression resistant floating device and corresponding software were employed to compose an online monitoring. Second, Qt programming control software was used to realize the remote control of the system, real-time radioactivity monitoring, and data upload. Finally, the original data obtained by the system were used to test the performance indicators, applicability, accuracy, and software functions to verify the practicability of the system.ResultsWithin the coverage of the 4G network, the system realizes remote control of equipment, real-time online monitoring, and data upload throughout the day. The detectable energy range of the system is 30~3 000 keV, and the energy resolution of the system for 137Cs at 662 keV is 7.3% with minimum detectable activity of 0.75 Bq∙L-1. The spectral drift for 208Tl at 2 614 keV is 0.33%, and the linearity of the spectral energy is 0.999 970. The maximum value of energy spectrum stability is 2.28% for 7 h continuous operating, and the minimum value is -2.36%. The operating temperature range of the system is in the range of -5 ℃ to +50 ℃.ConclusionsThe on-line monitoring system meets the application demand and achieves the expected function. It has important popularization value and application prospects in the field of online radioactive monitoring of bodies of water such as oceans, lakes, and rivers.
Published: 2024
Full Text: View/download PDF

38. Dictionary cache transformer for hyperspectral image classification

Author: Zhou, Heng, Zhang, Xin, Zhang, Chunlei, Ma, Qiaoyu, and Jiang, Yanan
Published: 2023
Full Text: View/download PDF

39. Solid-state injection locking microwave amplifier.

Author: Lu, Chenyang, Kim, Mun, Zhang, Chunlei, and Hu, Can-Ming
Subjects: PHASE noise, MICROWAVE amplifiers, CAVITY resonators, BANDWIDTHS, NOISE
Abstract: In this paper, we present the design and performance evaluation of a prototype device based on gain-driven polariton, which can function as an amplifier. This device can outperform a single microwave cavity in terms of phase noise and stability. Employed as a standalone amplifier, it demonstrates consistency between input and output phase noise, achieving a gain of up to 35 dB with a typical output of 5 dBm at 4.26 GHz. Despite its limited bandwidth, the device provides an effective solution for small signal amplification, ensuring minimal noise introduction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Simultaneous modification of redox and acidic properties of FeOx catalysts derived from MIL-100(Fe) via HPW incorporation for NH3-SCR

Author: Jiang, Miao, Yan, Zidi, Zhang, Yanshuang, Zhang, Chunlei, Chang, Chuang, Xiao, Min, Ruan, Luna, Yan, Yong, Yu, Yunbo, and He, Hong
Published: 2024
Full Text: View/download PDF

41. Investigation of sodium–manganese oxides with various crystal phases for the efficient catalytic removal of diesel soot particles

Author: Yu, Di, Ren, Yu, Zhang, Yinguang, Gao, Siyu, Zhang, Xinyu, Chen, Xinyu, Chen, Siyuan, Wang, Lanyi, Zhang, Chunlei, Yu, Xuehua, Fan, Xiaoqiang, and Zhao, Zhen
Published: 2024
Full Text: View/download PDF

42. Competition between microstructural factors affecting growth of abnormally large grains in thin Cu foils

Author: Guo, Jing, Zhang, Chunlei, Zöllner, Dana, Li, Xin, Wu, Guilin, Huang, Tianlin, Pantleon, Wolfgang, Huang, Xiaoxu, and Juul Jensen, Dorte
Published: 2024
Full Text: View/download PDF

43. LatLBP: Spatial-spectral latent local binary pattern for hyperspectral image classification

Author: Zhang, Xin, Jiang, Yanan, Zhang, Chunlei, and Zhang, Zitong
Published: 2024
Full Text: View/download PDF

44. Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Author: Yan, Brian, Zhang, Chunlei, Yu, Meng, Zhang, Shi-Xiong, Dalmia, Siddharth, Berrebbi, Dan, Weng, Chao, Watanabe, Shinji, and Yu, Dong
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
Published: 2021

45. Characterizing the 2D single atom solutions to capture CO2 by the digital twin model

Author: Wu, Yuansheng, Zhou, Chenyang, Li, Yuan, Zhang, Chunlei, Yu, Yunsong, and Wang, Geoff
Published: 2024
Full Text: View/download PDF

46. Coordinated management of oxygen excess ratio and cathode pressure for PEMFC based on synthesis variable-gain robust predictive control

Author: Zhang, Xin, Zhang, Chunlei, Zhang, Zhijin, Gao, Sen, and Li, He
Published: 2024
Full Text: View/download PDF

47. Designing a Ce/In-CHA OXZEO catalyst for high-efficient selective catalytic reduction of nitrogen oxide with methane

Author: Zhang, Chunlei, Xu, Guangyan, Zhang, Yanshuang, Chang, Chuang, Jiang, Miao, Ruan, Luna, Xiao, Min, Yan, Zidi, Yu, Yunbo, and He, Hong
Published: 2024
Full Text: View/download PDF

48. Research progress on preparation of cerium-based oxide catalysts with specific morphology and their application for purification of diesel engine exhaust

Author: Zhang, Chunlei, Gao, Siyu, Yu, Di, Zhou, Shengran, Wang, Lanyi, Yu, Xuehua, and Zhao, Zhen
Published: 2024
Full Text: View/download PDF

49. Bismuth nanocluster loaded on N-doped porous carbon with “memory catalysis” effect for efficient photocatalytic H2 generation and uranium(VI) reduction

Author: Qin, Weidi, Xiao, Yang, Zhang, Chunlei, Gong, Haiyi, Zhang, Qingsong, and Zeng, Qingyi
Published: 2024
Full Text: View/download PDF

50. Simultaneously uranium reduction and organics degradation by a drivingpowers enhanced photocatalytic fuel cell based on a UiO-66-NH2 derived zirconia/N-dopped porous carbon cathode

Author: Zhang, Qingsong, Deng, Qimou, Zhang, Yaoyao, Xiao, Yang, Zhang, Chunlei, Gong, Haiyi, Zeng, Qingming, Zhang, Qingyan, and Zeng, Qingyi
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

2,662 results on '"Zhang, Chunlei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources