Search keyword "Yan, Zhijie": 365 results in total

Search Results

1. Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

2. Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study

3. CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

4. FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

5. TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

6. Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving

7. Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures

9. Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

10. LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

11. The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

13. Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

14. MUG: A General Meeting Understanding and Generation Benchmark

15. Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

16. Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

17. MSim: A Long-Term Interactive Driving Simulator

18. Long-Term Interactive Driving Simulation: MPC to the Rescue

19. MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

20. Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

23. Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

24. M²Sim: A Long-Term Interactive Driving Simulator

25. Long-Term Interactive Driving Simulation: MPC to the Rescue

26. Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

27. ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

28. Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

30. M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

31. BeamTransformer: Microphone Array-based Overlapping Speech Detection

35. A Real-time Speaker Diarization System Based on Spatial Spectrum

38. Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

44. Effect of P and Ti on the agglomeration behavior of Al2O3 inclusions in Fe–P–Ti alloys

45. Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

47. Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

49. Surface Defect Detection and Classification Based on Fusing Multiple Computer Vision Techniques
