Search

Your search for author "Piergiovanni, AJ" returned 130 results

Search Results

1. What's in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

2. Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

3. Diversifying Joint Vision-Language Tokenization Learning

4. Joint Adaptive Representations for Image-Language Learning

5. PaLI-X: On Scaling up a Multilingual Vision and Language Model

6. MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

7. Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

8. Compound Tokens: Channel Fusion for Vision-Language Representation Learning

9. F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

10. PaLI: A Jointly-Scaled Multilingual Language-Image Model

11. Pre-training image-language transformers for open-vocabulary tasks

12. Video Question Answering with Iterative Video-Text Co-Tokenization

13. Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

14. FindIt: Generalized Localization with Natural Language Queries

15. 4D-Net for Learned Multi-Modal Alignment

16. Unsupervised Discovery of Actions in Instructional Videos

17. TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

18. Unsupervised Action Segmentation for Instructional Videos

19. Adaptive Intermediate Representations for Video Understanding

20. Recognizing Actions in Videos from Unseen Viewpoints

21. AssembleNet++: Assembling Modality Representations via Attention Connections

22. Adversarial Generative Grammars for Human Activity Prediction

23. AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

24. AViD Dataset: Anonymized Videos from Diverse Countries

25. Evolving Losses for Unsupervised Video Representation Learning

26. Tiny Video Networks

27. Model-based Behavioral Cloning with Future Image Similarity Learning

28. Evolving Losses for Unlabeled Video Representation Learning

29. AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

30. Early Detection of Injuries in MLB Pitchers from Video

31. Differentiable Grammars for Videos

32. Evolving Space-Time Neural Architectures for Videos

33. Representation Flow for Action Recognition

34. Learning Multimodal Representations for Unseen Activities

35. Learning Real-World Robot Policies by Dreaming

36. Fine-grained Activity Recognition in Baseball Videos

37. Temporal Gaussian Mixture Layer for Videos

38. Learning Latent Super-Events to Detect Multiple Activities in Videos

39. Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

43. Computational principles underlying people’s behavior explanations
