45 results for "Anna, Rohrbach"
Search Results
2. Object-based (yet Class-agnostic) Video Domain Adaptation.
3. Simple Token-Level Confidence Improves Caption Correctness.
4. Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens.
5. Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly.
6. The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning.
7. Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022.
8. ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension.
9. On Guiding Visual Attention with Language Specification.
10. K-LITE: Learning Transferable Visual Models with External Knowledge.
11. TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency.
12. Shape-Guided Diffusion with Inside-Outside Attention.
13. Focus! Relevant and Sufficient Context Selection for News Image Captioning.
14. G^3: Geolocation via Guidebook Grounding.
15. Using Language to Extend to Unseen Domains.
16. Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation.
17. Object-Region Video Transformers.
18. NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media.
19. CLIP-It! Language-Guided Video Summarization.
20. DETReg: Unsupervised Pretraining with Region Priors for Object Detection.
21. How Much Can CLIP Benefit Vision-and-Language Tasks?
22. Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion.
23. More Control for Free! Image Synthesis with Semantic Diffusion Guidance.
24. Identity-Aware Multi-Sentence Video Description.
25. Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
26. Language-Conditioned Graph Networks for Relational Reasoning.
27. Viewpoint Invariant Change Captioning.
28. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
29. Textual Explanations for Self-Driving Vehicles.
30. Women also Snowboard: Overcoming Bias in Captioning Models.
31. Object Hallucination in Image Captioning.
32. Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract).
33. Speaker-Follower Models for Vision-and-Language Navigation.
34. Video Object Segmentation with Language Referring Expressions.
35. Adversarial Inference for Multi-Sentence Video Description.
36. Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract).
37. Can you fool AI with adversarial examples on a visual Turing test?
38. Gradient-free Policy Architecture Search and Adaptation.
39. Generating Descriptions with Grounded and Co-Referenced People.
40. Movie Description.
41. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
42. Recognizing Fine-Grained and Composite Activities using Hand-Centric Features and Script Data.
43. A Dataset for Movie Description.
44. The Long-Short Story of Movie Description.
45. Grounding of Textual Phrases in Images by Reconstruction.
Discovery Service for Jio Institute Digital Library