Author: "Hu, Shell Xu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hu, Shell Xu"' showing total 28 results

1. MobileQuant: Mobile-friendly Quantization for On-device Language Models

Author: Tan, Fuwen, Lee, Royson, Dudziak, Łukasz, Hu, Shell Xu, Bhattacharya, Sourav, Hospedales, Timothy, Tzimiropoulos, Georgios, and Martinez, Brais
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have revolutionized language processing, delivering outstanding results across multiple applications. However, deploying LLMs on edge devices poses several challenges with respect to memory, energy, and compute costs, limiting their widespread use in devices such as mobile phones. A promising solution is to reduce the number of bits used to represent weights and activations. While existing works have found partial success at quantizing LLMs to lower bitwidths, e.g. 4-bit weights, quantizing activations beyond 16 bits often leads to large computational overheads due to poor on-device quantization support, or a considerable accuracy drop. Yet, 8-bit activations are very attractive for on-device deployment as they would enable LLMs to fully exploit mobile-friendly hardware, e.g. Neural Processing Units (NPUs). In this work, we make a first attempt to facilitate the on-device deployment of LLMs using integer-only quantization. We first investigate the limitations of existing quantization methods for on-device deployment, with a special focus on activation quantization. We then address these limitations by introducing a simple post-training quantization method, named MobileQuant, that extends previous weight equivalent transformation works by jointly optimizing the weight transformation and activation range parameters in an end-to-end manner. MobileQuant demonstrates superior capabilities over existing methods by 1) achieving near-lossless quantization on a wide range of LLM benchmarks, 2) reducing latency and energy consumption by 20%-50% compared to current on-device quantization strategies, 3) requiring a limited compute budget, and 4) being compatible with mobile-friendly compute units, e.g. NPUs. Comment: EMNLP 2024 Findings. Code and models available: https://github.com/saic-fi/MobileQuant
Published: 2024
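
Illustration (not taken from the paper): the abstract above refers to integer quantization of activations with learned "activation range parameters". The sketch below shows only the generic asymmetric integer grid that such a range defines, assuming the range [lo, hi] contains zero. The function name make_quantizer and its arguments are hypothetical, and nothing here reproduces MobileQuant's end-to-end optimization.

```python
def make_quantizer(lo, hi, bits=8):
    """Build an asymmetric integer quantizer for the range [lo, hi].

    Hypothetical helper for illustration; assumes lo <= 0 <= hi.
    Returns (quantize, dequantize, scale).
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    qmax = 2 ** bits - 1              # largest integer code, e.g. 255 for 8 bits
    scale = (hi - lo) / qmax          # real-valued width of one integer step
    zero_point = round(-lo / scale)   # integer code that represents 0.0

    def quantize(x):
        # Map a real value to the nearest integer code, clamped to the grid.
        q = round(x / scale) + zero_point
        return max(0, min(qmax, q))

    def dequantize(q):
        # Map an integer code back to its real-valued approximation.
        return (q - zero_point) * scale

    return quantize, dequantize, scale


quantize, dequantize, scale = make_quantizer(-1.0, 1.0, bits=8)
code = quantize(0.3)
approx = dequantize(code)  # within half a step of 0.3
```

A tighter range gives a smaller step and lower rounding error for values inside it, at the cost of clamping values outside it, which is why the choice of range matters for accuracy.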

2. BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

Author: Lin, Xinna, Ma, Siqi, Shan, Junjie, Zhang, Xiaojing, Hu, Shell Xu, Guo, Tiannan, Li, Stan Z., and Yu, Kaicheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or evaluate in a biomedical experimental manner. How to precisely benchmark biomedical agents from an AI Scientist perspective remains largely unexplored. To this end, we draw inspiration from one of the most important abilities of scientists, understanding the literature, and introduce BioKGBench. In contrast to traditional evaluation benchmarks that only focus on factual QA, where the LLMs are known to have hallucination issues, we first disentangle "Understanding Literature" into two atomic abilities: i) "Understanding" the unstructured text from research papers by performing scientific claim verification, and ii) the ability to interact with structured Knowledge-Graph Question-Answering (KGQA) as a form of "Literature" grounding. We then formulate a novel agent task, dubbed KGCheck, using KGQA and domain-based Retrieval-Augmented Generation (RAG) to identify the factual errors of existing large-scale knowledge graph databases. We collect over two thousand data points for the two atomic tasks and 225 high-quality annotated data points for the agent task. Surprisingly, we discover that state-of-the-art agents, both those for daily scenarios and biomedical ones, either fail or perform poorly on our benchmark. We then introduce a simple yet effective baseline, dubbed BKGAgent. On a widely used knowledge graph, we discover over 90 factual errors, which provide scenarios for agents to make discoveries and demonstrate the effectiveness of our approach. The code and data are available at https://github.com/westlake-autolab/BioKGBench.
Published: 2024

3. Recurrent Early Exits for Federated Learning with Heterogeneous Clients

Author: Lee, Royson, Fernandez-Marques, Javier, Hu, Shell Xu, Li, Da, Laskaridis, Stefanos, Dudziak, Łukasz, Hospedales, Timothy, Huszár, Ferenc, and Lane, Nicholas D.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities; clients have differing compute and memory requirements. To tackle this challenge, recent state-of-the-art approaches leverage early exits. Nonetheless, these approaches fall short of mitigating the challenges of jointly learning multiple exit classifiers, often relying on hand-picked heuristic solutions for knowledge distillation among classifiers and/or utilizing additional layers for weaker classifiers. In this work, instead of utilizing multiple classifiers, we propose a recurrent early exit approach named ReeFL that fuses features from different sub-models into a single shared classifier. Specifically, we use a transformer-based early-exit module shared among sub-models to i) better exploit multi-layer feature representations for task-specific prediction and ii) modulate the feature representation of the backbone model for subsequent predictions. We additionally present a per-client self-distillation approach where the best sub-model is automatically selected as the teacher of the other sub-models at each client. Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works. Comment: Accepted at the 41st International Conference on Machine Learning (ICML 2024)
Published: 2024

4. EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

Author: Basu, Samyadeep, Saberi, Mehrdad, Bhardwaj, Shweta, Chegini, Atoosa Malemir, Massiceti, Daniela, Sanjabi, Maziar, Hu, Shell Xu, and Feizi, Soheil
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark for quantitatively evaluating text-guided image editing methods. EditVal consists of a curated dataset of images, a set of editable attributes for each image drawn from 13 possible edit types, and an automated evaluation pipeline that uses pre-trained vision-language models to assess the fidelity of generated images for each edit type. We use EditVal to benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic and Instruct-Pix2Pix. We complement this with a large-scale human study where we show that EditVal's automated evaluation pipeline is strongly correlated with human preferences for the edit types we considered. From both the human study and automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text and SINE are the top-performing methods averaged across different edit types, however only Instruct-Pix2Pix and Null-Text are able to preserve original image properties; (ii) most of the editing methods fail at edits involving spatial operations (e.g., changing the position of an object); (iii) there is no 'winner' method which ranks the best individually across a range of different edit types. We hope that our benchmark can pave the way to developing more reliable text-guided image editing tools in the future. We will publicly release EditVal, and all associated code and human-study templates to support these research directions at https://deep-ml-research.github.io/editval/.
Published: 2023

5. Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP

Author: Basu, Samyadeep, Hu, Shell Xu, Sanjabi, Maziar, Massiceti, Daniela, and Feizi, Soheil
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image-text contrastive models like CLIP have wide applications in zero-shot classification, image-text retrieval, and transfer learning. However, they often struggle on compositional visio-linguistic tasks (e.g., attribute-binding or object-relationships) where their performance is no better than random chance. To address this, we introduce SDS-CLIP, a lightweight and sample-efficient distillation method to enhance CLIP's compositional visio-linguistic reasoning. Our approach fine-tunes CLIP using a distillation objective borrowed from large text-to-image generative models like Stable-Diffusion, which are known for their strong visio-linguistic reasoning abilities. On the challenging Winoground benchmark, SDS-CLIP improves the visio-linguistic performance of various CLIP models by up to 7%, while on the ARO dataset, it boosts performance by up to 3%. This work underscores the potential of well-designed distillation objectives from generative models to enhance contrastive image-text models with improved visio-linguistic reasoning capabilities. Comment: Short paper
Published: 2023

6. Strong Baselines for Parameter Efficient Few-Shot Fine-tuning

Author: Basu, Samyadeep, Massiceti, Daniela, Hu, Shell Xu, and Feizi, Soheil
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters. While these methods have shown promise, inconsistencies in experimental conditions make it difficult to disentangle their advantage from other experimental factors including the feature extractor architecture, pre-trained initialization and fine-tuning algorithm, amongst others. In our paper, we conduct a large-scale, experimentally consistent, empirical analysis to study PEFTs for few-shot image classification. Through a battery of over 1.8k controlled experiments on large-scale few-shot benchmarks including Meta-Dataset (MD) and ORBIT, we uncover novel insights on PEFTs that cast light on their efficacy in fine-tuning ViTs for few-shot classification. Through our controlled empirical study, we have two main findings: (i) Fine-tuning just the LayerNorm parameters (which we call LN-Tune) during few-shot adaptation is an extremely strong baseline across ViTs pre-trained with both self-supervised and supervised objectives, (ii) For self-supervised ViTs, we find that simply learning a set of scaling parameters for each attention matrix (which we call AttnScale) along with a domain-residual adapter (DRA) module leads to state-of-the-art performance (while being ~9× more parameter-efficient) on MD. Our extensive empirical findings set strong baselines and call for rethinking the current design of PEFT methods for FSC.
Published: 2023

7. Federated Learning for Inference at Anytime and Anywhere

Author: Liu, Zicheng, Li, Da, Fernandez-Marques, Javier, Laskaridis, Stefanos, Gao, Yan, Dudziak, Łukasz, Li, Stan Z., Hu, Shell Xu, and Hospedales, Timothy
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated learning has been predominantly concerned with collaborative training of deep networks from scratch, and especially the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapter by FL leads to fast and communication-efficient learning even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST and SpeechCommandsv2 demonstrate that this simple framework provides fast and accurate FL while supporting heterogeneous device capabilities, efficient personalization, and scalable-cost anytime inference. Comment: 14 pages, 3 figures
Published: 2022

8. TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut

Author: Wang, Yangtao, Shen, Xi, Yuan, Yuan, Du, Yuming, Li, Maomao, Hu, Shell Xu, Crowley, James L, and Vaufreydaz, Dominique
Subjects: Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features learned by the transformer. Detection and segmentation of salient objects is then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6%, respectively, when tested with the VOC07, VOC12, and COCO20K datasets. For the unsupervised saliency detection task in images, this method improves the score for Intersection over Union (IoU) by 4.4%, 5.6% and 5.2% when tested with the ECSSD, DUTS, and DUT-OMRON datasets, respectively, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets. Comment: arXiv admin note: text overlap with arXiv:2202.11539
Published: 2022

9. Feed-Forward Latent Domain Adaptation

Author: Bohdal, Ondrej, Li, Da, Hu, Shell Xu, and Hospedales, Timothy
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study a new highly-practical problem setting that enables resource-constrained edge devices to adapt a pre-trained model to their local data distributions. Recognizing that a device's data are likely to come from multiple latent domains that include a mixture of unlabelled domain-relevant and domain-irrelevant examples, we focus on the comparatively under-studied problem of latent domain adaptation. Considering limitations of edge devices, we aim to only use a pre-trained model and adapt it in a feed-forward way, without using back-propagation and without access to the source data. Modelling these realistic constraints brings us to the novel and practically important problem setting of feed-forward latent domain adaptation. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvements over strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection. Comment: Accepted at WACV 2024. Project page: https://ondrejbohdal.github.io/cxda
Published: 2022

10. Compressing Features for Learning with Noisy Labels

Author: Chen, Yingyi, Hu, Shell Xu, Shen, Xi, Ai, Chunrong, and Suykens, Johan A. K.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this paper, we focus on the problem of learning with noisy labels and introduce compression inductive bias to network architectures to alleviate this over-fitting problem. More precisely, we revisit one classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint for its feature dropping mechanism, while Nested Dropout further learns ordered feature representations w.r.t. feature importance. Moreover, the trained models with compression regularization are further combined with Co-teaching for performance boost. Theoretically, we conduct bias-variance decomposition of the objective function under compression regularization. We analyze it for both single model and Co-teaching. This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives explanations on the performance boost brought by incorporating compression regularization into Co-teaching. Experiments show that our simple approach can have comparable or even better performance than the state-of-the-art methods on benchmarks with real-world label noise including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/. Comment: Accepted to TNNLS 2022. Project page: https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/
Published: 2022

11. Fisher SAM: Information Geometry and Sharpness Aware Minimisation

Author: Kim, Minyoung, Li, Da, Hu, Shell Xu, and Hospedales, Timothy M.
Subjects: Computer Science - Machine Learning
Abstract: Recent sharpness-aware minimisation (SAM) is known to find flat minima, which are beneficial for better generalisation with improved robustness. SAM essentially modifies the loss function by reporting the maximum loss value within the small neighborhood around the current iterate. However, it uses the Euclidean ball to define the neighborhood, which can be inaccurate since loss functions for neural networks are typically defined over probability distributions (e.g., class predictive probabilities), rendering the parameter space non-Euclidean. In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM's Euclidean balls with ellipsoids induced by the Fisher information. Our approach, dubbed Fisher SAM, defines more accurate neighborhood structures that conform to the intrinsic metric of the underlying statistical manifold. For instance, SAM may probe the worst-case loss value at either a too nearby or inappropriately distant point due to the ignorance of the parameter space geometry, which is avoided by our Fisher SAM. Another recent Adaptive SAM approach stretches/shrinks the Euclidean ball in accordance with the scale of the parameter magnitudes. This might be dangerous, potentially destroying the neighborhood structure. We demonstrate improved performance of the proposed Fisher SAM on several benchmark datasets/tasks.
Published: 2022
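
Illustration (not taken from the paper): the abstract above contrasts a Euclidean ball with an ellipsoid induced by the Fisher information. Under a first-order approximation of the loss, the worst-case step inside a ball of radius rho is rho * g / ||g||, while inside the ellipsoid {e : e^T F e <= rho^2} it is rho * F^{-1} g / sqrt(g^T F^{-1} g). The sketch below assumes a diagonal, strictly positive Fisher estimate supplied by the caller; how that estimate is obtained, and both function names, are assumptions rather than the paper's procedure.

```python
import math


def sam_perturbation(grad, rho):
    """First-order worst-case step inside a Euclidean ball of radius rho."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [rho * g / norm for g in grad]


def fisher_perturbation(grad, fisher_diag, rho):
    """First-order worst-case step inside the ellipsoid e^T F e <= rho^2.

    fisher_diag is an assumed diagonal Fisher estimate with positive entries.
    """
    # F^{-1} g for a diagonal F is an element-wise division.
    scaled = [g / f for g, f in zip(grad, fisher_diag)]
    # sqrt(g^T F^{-1} g) normalises the step onto the ellipsoid boundary.
    denom = math.sqrt(sum(g * s for g, s in zip(grad, scaled)))
    if denom == 0.0:
        return [0.0 for _ in grad]
    return [rho * s / denom for s in scaled]


grad = [3.0, 4.0]
euclidean_step = sam_perturbation(grad, rho=0.5)
fisher_step = fisher_perturbation(grad, fisher_diag=[4.0, 1.0], rho=0.5)
```

With an identity Fisher matrix the two steps coincide; a larger Fisher entry shrinks the step along that coordinate.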

12. Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference

Author: Hu, Shell Xu, Li, Da, Stühmer, Jan, Kim, Minyoung, and Hospedales, Timothy M.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three-stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of a novel task is taken for fine-tuning. We investigate questions such as: (1) How pre-training on external data benefits FSL? (2) How state-of-the-art transformer architectures can be exploited? and (3) How fine-tuning mitigates domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf. Comment: Accepted by CVPR2022
Published: 2022

13. AvaTr: One-Shot Speaker Extraction with Transformers

Author: Hu, Shell Xu, Arefin, Md Rifat, Nguyen, Viet-Nhat, Dipani, Alish, Pitkow, Xaq, and Tolias, Andreas Savas
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: To extract the voice of a target speaker when mixed with a variety of other sounds, such as white and ambient noises or the voices of interfering speakers, we extend the Transformer network to attend to the most relevant information with respect to the target speaker given the characteristics of his or her voice as a form of contextual information. The idea has a natural interpretation in terms of the selective attention theory. Specifically, we propose two models to incorporate the voice characteristics in Transformer based on different insights of where the feature selection should take place. Both models yield excellent performance, on par with or better than published state-of-the-art models on the speaker extraction task, including separating speech of novel speakers not seen during training. Comment: 6 pages, 4 main figures, 2 supplemental figures
Published: 2021

14. Boosting Co-teaching with Compression Regularization for Label Noise

Author: Chen, Yingyi, Shen, Xi, Hu, Shell Xu, and Suykens, Johan A. K.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we study the problem of learning image classification models in the presence of label noise. We revisit a simple compression regularization named Nested Dropout. We find that Nested Dropout, though originally proposed to perform fast information retrieval and adaptive data compression, can properly regularize a neural network to combat label noise. Moreover, owing to its simplicity, it can be easily combined with Co-teaching to further boost the performance. Our final model remains simple yet effective: it achieves comparable or even better performance than the state-of-the-art approaches on two real-world datasets with label noise which are Clothing1M and ANIMAL-10N. On Clothing1M, our approach obtains 74.9% accuracy which is slightly better than that of DivideMix. On ANIMAL-10N, we achieve 84.1% accuracy while the best public result by PLC is 83.4%. We hope that our simple approach can serve as a strong baseline for learning with label noise. Our implementation is available at https://github.com/yingyichen-cyy/Nested-Co-teaching. Comment: Accepted by CVPR Workshop 2021. Project page: https://github.com/yingyichen-cyy/Nested-Co-teaching
Published: 2021
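
Illustration (not taken from the paper): Nested Dropout, as described in the abstract above, learns ordered feature representations. A common way to picture it is to keep a random-length prefix of the feature vector and zero the remainder, so earlier features are trained to carry the most information. The sketch below samples the prefix length uniformly, which is an assumption made for simplicity; the function name nested_dropout is hypothetical and the code is not the authors' implementation.

```python
import random


def nested_dropout(features, rng):
    """Keep the first k features and zero the rest, with k drawn at random.

    Uniform sampling of k over 1..len(features) is an illustrative assumption.
    Returns the masked feature list and the sampled k.
    """
    if not features:
        raise ValueError("features must not be empty")
    k = rng.randint(1, len(features))  # randint includes both endpoints
    masked = list(features[:k]) + [0.0] * (len(features) - k)
    return masked, k


rng = random.Random(0)  # seeded only to make the example repeatable
masked, k = nested_dropout([0.9, -0.4, 0.2, 0.7], rng)
```

Because every sampled mask retains feature 1, and feature i is retained only when k >= i, lower-indexed features are seen more often during training, which is what induces the ordering.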

15. Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Author: Hu, Shell Xu, Moreno, Pablo G., Xiao, Yang, Shen, Xi, Obozinski, Guillaume, Lawrence, Neil D., and Damianou, Andreas
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, although we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. Besides, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient. Comment: ICLR 2020
Published: 2020

16. Variational Information Distillation for Knowledge Transfer

Author: Ahn, Sungsoo, Hu, Shell Xu, Damianou, Andreas, Lawrence, Neil D., and Dai, Zhenwen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms the state-of-the-art methods and it achieves similar performance to the CNN with a single convolutional layer. Comment: To appear at CVPR 2019
Published: 2019

17. Feed-Forward Latent Domain Adaptation

Author: Bohdal, Ondrej, Li, Da, Hu, Shell Xu, and Hospedales, Timothy
Published: 2024

18. Exploring weight symmetry in deep neural networks

Author: Hu, Shell Xu, Zagoruyko, Sergey, and Komodakis, Nikos
Published: 2019

19. TokenCut: Segmenting Objects in Images and Videos With Self-Supervised Transformer and Normalized Cut

Author: Wang, Yangtao, Shen, Xi, Yuan, Yuan, Du, Yuming, Li, Maomao, Hu, Shell Xu, Crowley, James L., and Vaufreydaz, Dominique
Published: 2023

20. Augmenting CLIP with Improved Visio-Linguistic Reasoning

Author: Basu, Samyadeep, Sanjabi, Maziar, Massiceti, Daniela, Hu, Shell Xu, and Feizi, Soheil
Abstract: Image-text contrastive models such as CLIP are useful for a variety of downstream applications including zero-shot classification, image-text retrieval and transfer learning. However, these contrastively trained vision-language models often fail on compositional visio-linguistic tasks such as Winoground with performance equivalent to random chance. In our paper, we address this issue and propose a sample-efficient light-weight method called SDS-CLIP to improve the compositional visio-linguistic reasoning capabilities of CLIP. The core idea of our method is to use differentiable image parameterizations to fine-tune CLIP with a distillation objective from large text-to-image generative models such as Stable-Diffusion which are relatively good at visio-linguistic reasoning tasks. On the challenging Winoground compositional reasoning benchmark, our method improves the absolute visio-linguistic performance of different CLIP models by up to 7%, while on the ARO dataset, our method improves the visio-linguistic performance by up to 3%. As a byproduct of inducing visio-linguistic reasoning into CLIP, we also find that the zero-shot performance improves marginally on a variety of downstream datasets. Our method reinforces that carefully designed distillation objectives from generative models can be leveraged to extend existing contrastive image-text models with improved visio-linguistic reasoning capabilities.
Published: 2023

21. Compressing Features for Learning With Noisy Labels

Author: Chen, Yingyi, Hu, Shell Xu, Shen, Xi, Ai, Chunrong, and Suykens, Johan A. K.
Abstract: Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this article, we focus on the problem of learning with noisy labels and introduce compression inductive bias to network architectures to alleviate this overfitting problem. More precisely, we revisit one classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint for its feature dropping mechanism, while Nested Dropout further learns ordered feature representations with respect to feature importance. Moreover, the trained models with compression regularization are further combined with co-teaching for performance boost. Theoretically, we conduct bias variance decomposition of the objective function under compression regularization. We analyze it for both single model and co-teaching. This decomposition provides three insights: 1) it shows that overfitting is indeed an issue in learning with noisy labels; 2) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; and 3) it gives explanations on the performance boost brought by incorporating compression regularization into co-teaching. Experiments show that our simple approach can have comparable or even better performance than the state-of-the-art methods on benchmarks with real-world label noise including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
Published: 2024
Full Text: View/download PDF

22. Feed-Forward Source-Free Latent Domain Adaptation via Cross-Attention

Author: Bohdal, Ondrej, Li, Da, Hu, Shell Xu, and Hospedales, Timothy
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: We study the highly practical but comparatively under-studied problem of latent-domain adaptation, where a source model should be adapted to a target dataset that contains a mixture of unlabelled domain-relevant and domain-irrelevant examples. Furthermore, motivated by the requirements for data privacy and the need for embedded and resource-constrained devices of all kinds to adapt to local data distributions, we focus on the setting of feed-forward source-free domain adaptation, where adaptation should not require access to the source dataset, and also be backpropagation-free. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvement on strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection.
Comment: Shorter version accepted at the First Workshop of Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022
Published: 2022

23. Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference

Author: Hu, Shell Xu, Li, Da, Stuhmer, Jan, Kim, Minyoung, and Hospedales, Timothy M.
Published: 2022
Full Text: View/download PDF

24. Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Author: Wang, Yangtao, Shen, Xi, Hu, Shell Xu, Yuan, Yuan, Crowley, James L., and Vaufreydaz, Dominique
Published: 2022
Full Text: View/download PDF

25. Compressing Features for Learning With Noisy Labels

Author: Chen, Yingyi, Hu, Shell Xu, Shen, Xi, Ai, Chunrong, and Suykens, Johan A. K.
Published: 2022
Full Text: View/download PDF

26. AvaTr: One-Shot Speaker Extraction with Transformers

Author: Hu, Shell Xu, Arefin, Md. Rifat, Nguyen, Viet-Nhat, Dipani, Alish, Pitkow, Xaq, and Tolias, Andreas Savas
Published: 2021
Full Text: View/download PDF

27. Boosting Co-teaching with Compression Regularization for Label Noise

Author: Chen, Yingyi, Shen, Xi, Hu, Shell Xu, and Suykens, Johan A. K.
Published: 2021
Full Text: View/download PDF

28. Variational Information Distillation for Knowledge Transfer

Author: Ahn, Sungsoo, Hu, Shell Xu, Damianou, Andreas, Lawrence, Neil D., and Dai, Zhenwen
Published: 2019
Full Text: View/download PDF
