Author: "Vani, Ankit" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Vani, Ankit"' returned 11 results.

1. Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

Author: Vani, Ankit, Tung, Frederick, Oliveira, Gabriel L., and Sharifi-Noghabi, Hossein
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) does not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM's training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model's outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
Comment: Published as a conference paper at ICML 2024. 9 pages main, 15 pages total including references and appendix
Published: 2024

2. SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Author: Vani, Ankit, Nguyen, Bac, Lavoie, Samuel, Krishna, Ranjay, and Courville, Aaron
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.
Comment: Conference paper at ECCV 2024. 11 pages main, 23 pages total including references and appendix
Published: 2024

3. On the Compositional Generalization Gap of In-Context Learning

Author: Hosseini, Arian, Vani, Ankit, Bahdanau, Dzmitry, Sordoni, Alessandro, and Courville, Aaron
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks, even when they are only conditioned on a few examples of the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning. In the ID settings, the demonstrations are from the same split (test or train) that the model is being evaluated on, and in the OOD settings, they are from the other split. We look at how the relative generalization gap of in-context learning evolves as models are scaled up. We evaluate four model families (OPT, BLOOM, CodeGen, and Codex) on three semantic parsing datasets (CFQ, SCAN, and GeoQuery) with different numbers of exemplars, and observe a trend of decreasing relative generalization gap as models are scaled up.
Published: 2022

4. Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

Author: Lavoie, Samuel, Tsirigotis, Christos, Schwarzer, Max, Vani, Ankit, Noukhovitch, Michael, Kawaguchi, Kenji, and Courville, Aaron
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation. This procedure conditions the representation onto a constrained space during pretraining and imparts an inductive bias for group sparsity. For downstream classification, we formally prove that the SEM representation leads to better generalization than an unnormalized representation. Furthermore, we empirically demonstrate that SSL methods trained with SEMs have improved generalization on natural image datasets such as CIFAR-100 and ImageNet. Finally, when used in a downstream classification task, we show that SEM features exhibit emergent semantic coherence where small groups of learned features are distinctly predictive of semantically-relevant classes.
Comment: 30 pages, 8 figures, Preprint
Published: 2022

5. Fortuitous Forgetting in Connectionist Networks

Author: Zhou, Hattie, Vani, Ankit, Larochelle, Hugo, and Courville, Aaron
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.
Comment: ICLR Camera Ready
Published: 2022

6. Iterated learning for emergent systematicity in VQA

Author: Vani, Ankit, Schwarzer, Max, Lu, Yuchen, Dhekane, Eeshan, and Courville, Aaron
Subjects: Computer Science - Machine Learning, I.2.6
Abstract: Although neural module networks have an architectural bias towards compositionality, they require gold standard layouts to generalize systematically in practice. When instead learning layouts and modules jointly, compositionality does not arise automatically and an explicit pressure is necessary for the emergence of layouts exhibiting the right structure. We propose to address this problem using iterated learning, a cognitive science theory of the emergence of compositional languages in nature that has primarily been applied to simple referential games in machine learning. Considering the layouts of module networks as samples from an emergent language, we use iterated learning to encourage the development of structure within this language. We show that the resulting layouts support systematic generalization in neural agents solving the more complex task of visual question-answering. Our regularized iterated learning method can outperform baselines without iterated learning on SHAPES-SyGeT (SHAPES Systematic Generalization Test), a new split of the SHAPES dataset we introduce to evaluate systematic generalization, and on CLOSURE, an extension of CLEVR also designed to test systematic generalization. We demonstrate superior performance in recovering ground-truth compositional program structure with limited supervision on both SHAPES-SyGeT and CLEVR.
Comment: Published as a conference paper at ICLR 2021. 9 pages main, 21 pages total including references and appendix
Published: 2021

7. GAIT: A Geometric Approach to Information Theory

Author: Gallego-Posada, Jose, Vani, Ankit, Schwarzer, Max, and Lacoste-Julien, Simon
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.
Comment: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. 19 pages
Published: 2019

8. Grounded Recurrent Neural Networks

Author: Vani, Ankit, Jernite, Yacine, and Sontag, David
Subjects: Statistics - Machine Learning, Computer Science - Computation and Language, Computer Science - Learning, Computer Science - Neural and Evolutionary Computing
Abstract: In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process "grounding"). The approach is particularly well-suited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare of understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient's diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines.
Published: 2017

9. On the Compositional Generalization Gap of In-Context Learning

Author: Hosseini, Arian, Vani, Ankit, Bahdanau, Dzmitry, Sordoni, Alessandro, and Courville, Aaron
Published: 2022
Full Text: View/download PDF

10. Categorising videos using a personalised category catalogue

Author: Bairi, Ramakrishna B., Vani, Ankit, Ahuja, Pooja, and Ramakrishnan, Ganesh
Published: 2015
Full Text: View/download PDF

11. Categorising videos using a personalised category catalogue

Author: Bairi, Ramakrishna B., Vani, Ankit, Ahuja, Pooja, and Ramakrishnan, Ganesh
Published: 2015
Full Text: View/download PDF
