Your search for author "Yu, Kai" returned 15,165 results.

Search Results

1. Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

2. A Survey on Speech Large Language Models

3. LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

4. MobA: A Two-Level Agent System for Efficient Mobile Task Automation

5. Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models

6. SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

7. F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

8. 3D UAV Trajectory Planning for IoT Data Collection via Matrix-Based Evolutionary Computation

9. AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference

10. TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

11. SciDFM: A Large Language Model with Mixture-of-Experts for Science

12. FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator

13. ChemDFM-X: Towards Large Multimodal Model for Chemistry

14. vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

15. BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding

16. SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

17. Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

18. A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning

19. UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling

20. Masked EEG Modeling for Driving Intention Prediction

21. Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings

22. A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing

23. DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

24. Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

25. Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter

26. On the Effectiveness of Acoustic BPE in Decoder-Only TTS

27. IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation

28. Text-aware Speech Separation for Multi-talker Keyword Spotting

29. GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

30. Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

31. FakeSound: Deepfake General Audio Detection

32. Evolving Subnetwork Training for Large Language Models

33. Sparsity-Accelerated Training for Large Language Models

34. BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

35. Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation

36. MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving

37. Performance Analysis of Uplink/Downlink Decoupled Access in Cellular-V2X Networks

38. AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

39. CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions

40. Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

41. StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

42. The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

43. Cell-Free Multi-User MIMO Equalization via In-Context Learning

44. Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind

45. Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

46. TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

47. A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

48. ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary

49. Enhancing Audio Generation Diversity with Visual Information
