Search

Author search for "Liu, Shujie": 1,450 results

Search Results

1. ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

2. NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization

3. Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

4. Autoregressive Speech Synthesis without Vector Quantization

5. VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

6. VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

7. TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

8. CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

9. RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

10. WavLLM: Towards Robust and Adaptive Speech Large Language Model

11. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

13. Boosting Large Language Model for Speech Synthesis: An Empirical Study

17. COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

18. DDTSE: Discriminative Diffusion Model for Target Speech Extraction

19. WavMark: Watermarking for Audio Generation

20. SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

21. On decoder-only architecture for speech-to-text and large language model integration

22. OpenNDD: Open Set Recognition for Neurodevelopmental Disorders Detection

23. Accelerating Transducers through Adjacent Token Merging

24. Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

30. VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

31. ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

32. Code-Switching Text Generation and Injection in Mandarin-English ASR

33. Target Sound Extraction with Variable Cross-modality Clues

34. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

35. Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

36. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

37. BEATs: Audio Pre-Training with Acoustic Tokenizers

38. Game Engine Technology in Cultural Heritage Digitization Application Prospect–Taking the Digital Cave of the Mogao Caves in China as an Example

39. Research on Hydrate Formation Risk in the Wellbore of Deepwater Dual-Source Co-production

40. Numerical Simulation of Breathing Effect Induced by Drilling in Deep-Water Shallow Formations

41. VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

42. Exploring WavLM on Speech Enhancement

43. LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

44. LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

45. Two-Stream Network for Sign Language Recognition and Translation

46. Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

47. SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

48. SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

50. Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
