Search

Your search for Author "Feris, Rogerio" returned 667 results.

Search Constraints

You searched for: Author "Feris, Rogerio"

Search Results

1. Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation

2. Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

3. State-Space Large Audio Language Models

4. Teaching VLMs to Localize Specific Objects from In-context Examples

5. GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

6. Scaling Granite Code Models to 128K Context

7. DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

8. Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

9. Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

10. Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

11. Comparison Visual Instruction Tuning

12. ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

13. $\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

14. Adaptive Memory Replay for Continual Learning

15. CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

16. Large Scale Generative AI Text Applied to Sports and Music

17. Learning Human Action Recognition Representations Without Real Humans

18. LangNav: Language as a Perceptual Representation for Navigation

19. Self-Specialization: Uncovering Latent Expertise within Large Language Models

20. TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

21. Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

22. LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

23. Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

24. Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

25. Going Beyond Nouns With Vision & Language Models Using Synthetic Data

26. What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

27. Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection

28. MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

29. Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

30. Learning to Grow Pretrained Models for Efficient Transformer Training

31. Synthetic Pre-Training Tasks for Neural Machine Translation

32. Procedural Image Programs for Representation Learning

33. Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation

34. CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

35. Teaching Structured Vision&Language Concepts to Vision&Language Models

36. ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

37. C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

38. FETA: Towards Specializing Foundation Models for Expert Task Applications

39. VALHALLA: Visual Hallucination for Machine Translation

40. SimVQA: Exploring Simulated Environments for Visual Question Answering

41. Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

42. Unsupervised Domain Generalization by Learning a Bridge Across Domains

43. Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data

44. Targeted Supervised Contrastive Learning for Long-Tailed Recognition

45. Cascaded Multilingual Audio-Visual Learning from Videos

46. Dynamic Network Quantization for Efficient Video Inference

47. Separating Skills and Concepts for Novel Visual Question Answering

48. IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers

49. Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

50. AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
