Search

Search for author "Bitton, Yonatan": 33 results in total

Search Results

1. Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

2. Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis

3. Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

4. Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

5. DataComp-LM: In search of the next generation of training sets for language models

6. VideoPhy: Evaluating Physical Commonsense for Video Generation

7. Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks

8. TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

9. ImageInWords: Unlocking Hyper-Detailed Image Descriptions

10. DOCCI: Descriptions of Connected and Contrasting Images

11. ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies

12. A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

13. Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

14. VideoCon: Robust Video-Language Alignment via Contrast Captions

15. VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

16. OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

17. Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

18. Transferring Visual Attributes from Natural Language to Verified Image Generation

19. What You See is What You Read? Improving Text-Image Alignment Evaluation

20. q2d: Turning Questions into Dialogs to Teach Models How to Search

21. DataComp: In search of the next generation of multimodal datasets

22. IRFL: Image Recognition of Figurative Language

23. Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

24. VASR: Visual Analogies of Situation Recognition

25. WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

26. Data Efficient Masked Language Modeling for Vision and Language

27. Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
