Your search for the author "Arnab, Anurag" returned 173 results.

Search Results

1. Towards Optimal Adapter Placement for Efficient Transfer Learning

2. Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

3. Mixture of Nested Experts: Adaptive Processing of Visual Tokens

4. Planted: a dataset for planted forest identification from multi-satellite time series

5. Streaming Dense Video Captioning

6. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

7. Time-, Memory- and Parameter-Efficient Visual Adaptation

8. Pixel Aligned Language Models

9. Video Summarization: Towards Entity-Aware Captions

10. UnLoc: A Unified Framework for Video Localization Tasks

11. Does Visual Pretraining Help End-to-End Reasoning?

12. Dense Video Object Captioning from Disjoint Supervision

13. How can objects help action recognition?

14. Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

15. PaLI-X: On Scaling up a Multilingual Vision and Language Model

16. End-to-End Spatio-Temporal Action Localisation with Video Transformers

17. VicTR: Video-conditioned Text Representations for Activity Recognition

18. CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

19. Scaling Vision Transformers to 22 Billion Parameters

20. Adaptive Computation with Elastic Input Sequence

21. Audiovisual Masked Autoencoders

22. Token Turing Machines

23. Dynamic Graph Message Passing Networks for Visual Recognition

24. Beyond Transfer Learning: Co-finetuning for Action Localisation

25. M&M Mix: A Multimodal Multiview Transformer Ensemble

26. Simple Open-Vocabulary Object Detection with Vision Transformers

27. Learning with Neighbor Consistency for Noisy Labels

28. End-to-end Generative Pretraining for Multimodal Video Captioning

29. Multiview Transformers for Video Recognition

30. PolyViT: Co-training Vision Transformers on Images, Videos and Audio

31. The Efficiency Misnomer

32. SCENIC: A JAX Library for Computer Vision Research and Beyond

33. Compressive Visual Representations

34. Attention Bottlenecks for Multimodal Fusion

35. TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

36. ViViT: A Video Vision Transformer

37. Unified Graph Structured Models for Video Understanding

39. Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

40. Dual Graph Convolutional Network for Semantic Segmentation

41. Dynamic Graph Message Passing Networks

42. Exploiting temporal context for 3D human pose estimation in the wild

43. Meta Learning Deep Visual Words for Fast Video Object Segmentation

44. Simple Open-Vocabulary Object Detection

45. Pixel-level scene understanding with deep structured models

46. Weakly- and Semi-Supervised Panoptic Segmentation

47. On the Robustness of Semantic Segmentation Models to Adversarial Attacks

48. Holistic, Instance-Level Human Parsing

49. Pixelwise Instance Segmentation with a Dynamically Instantiated Network

50. A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
