Search

Your search for author "Yan, Zhijie" returned 333 results

Search Results

1. CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

2. FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

3. TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

4. Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving

6. Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures

7. Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

8. LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

9. The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

10. Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

11. M²Sim: A Long-Term Interactive Driving Simulator

12. Long-Term Interactive Driving Simulation: MPC to the Rescue

17. MUG: A General Meeting Understanding and Generation Benchmark

18. Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

19. Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

20. M²Sim: A Long-Term Interactive Driving Simulator

21. Long-Term Interactive Driving Simulation: MPC to the Rescue

22. MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

23. Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

25. Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

26. Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

27. ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

28. Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

31. M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

33. BeamTransformer: Microphone Array-based Overlapping Speech Detection

34. A Real-time Speaker Diarization System Based on Spatial Spectrum

38. Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

39. Effect of P and Ti on the agglomeration behavior of Al2O3 inclusions in Fe–P–Ti alloys

40. Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

48. Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

49. Surface Defect Detection and Classification Based on Fusing Multiple Computer Vision Techniques
