Search

Your search for "Wu, Xixin" returned 69 results

Search Constraints

You searched for: Author: "Wu, Xixin"; Topic: electrical engineering and systems science - audio and speech processing
Search Results

1. Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

2. Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

3. DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions

4. Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

5. Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition

6. Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech

7. AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions

8. Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

9. Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

10. Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

11. SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

12. SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

13. Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

14. Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

15. Autoregressive Speech Synthesis without Vector Quantization

16. Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

17. UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

18. CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

19. Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

20. SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

21. Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

22. CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

23. Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

24. UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

25. Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

26. StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

27. UniAudio: An Audio Foundation Model Toward Universal Audio Generation

28. Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

29. QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

30. Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

31. MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

32. Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

33. A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

34. Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

35. A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

36. Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

37. Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

38. Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $\beta$-VAE

39. A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

40. Exploring linguistic feature and model combination for speech recognition based automatic AD detection

41. Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

42. Neural Architecture Search for Speech Emotion Recognition

43. Spoofing-Aware Speaker Verification by Multi-Level Fusion

44. A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

45. Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

46. The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

47. Characterizing the adversarial vulnerability of speech self-supervised learning

48. Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

49. VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis

50. Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization
