393 results for "Zhou Zhao"
Search Results
2. WIA-LD2ND: Wavelet-Based Image Alignment for Self-supervised Low-Dose CT Denoising.
3. Spatial-Aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image.
4. MoreStyle: Relax Low-Frequency Constraint of Fourier-Based Image Reconstruction in Generalizable Medical Image Segmentation.
5. Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation.
6. Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays.
7. Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer.
8. EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
9. Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey.
10. Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt.
11. Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
12. MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
13. Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment.
14. Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment.
15. AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension.
16. Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion.
17. TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
18. Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation.
19. Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition.
20. Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
21. Robust Singing Voice Transcription Serves Synthesis.
22. TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
23. Language Model is a Branch Predictor for Simultaneous Machine Translation.
24. AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments.
25. MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer.
26. Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations.
27. StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.
28. AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
29. Non-confusing Generation of Customized Concepts in Diffusion Models.
30. UniAudio: Towards Universal Audio Generation with Large Language Models.
31. FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.
32. InstructSpeech: Following Speech Editing Instructions via Large Language Models.
33. Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
34. Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
35. Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding.
36. MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
37. Exploring Group Video Captioning with Efficient Relational Approximation.
38. Open-Vocabulary Object Detection With an Open Corpus.
39. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.
40. 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding.
41. ART: rule bAsed futuRe-inference deducTion.
42. DATE: Domain Adaptive Product Seeker for E-Commerce.
43. ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos.
44. WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding.
45. Gloss Attention for Gloss-free Sign Language Translation.
46. MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer.
47. Unsupervised Domain Adaptation for Referring Semantic Segmentation.
48. Rethinking Missing Modality Learning from a Decoding Perspective.
49. UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.
50. Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.
Discovery Service for Jio Institute Digital Library