Author: "Zhao, Bingchen" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Zhao, Bingchen"' returned 151 results in total.

1. CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Author: Liu, Yanqing, Li, Xianhang, Wang, Zeyu, Zhao, Bingchen, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Previous works show that noisy, web-crawled image-text pairs may limit vision-language pretraining like CLIP and propose learning with synthetic captions as a promising alternative. Our work continues this effort, introducing two simple yet effective designs to better leverage richly described synthetic captions. Firstly, having observed a strong inverse effect in learning with synthetic captions (short synthetic captions can generally lead to much higher performance than full-length ones), we feed only partial synthetic captions to the text encoder. Secondly, we incorporate an autoregressive captioner to mimic the recaptioning process: by conditioning on the paired image input and web-crawled text description, the captioner learns to predict the full-length synthetic caption generated by advanced MLLMs. Experiments show that our framework significantly improves zero-shot performance in cross-modal retrieval tasks, setting new SOTA results on MSCOCO and Flickr30K. Moreover, vision encoders trained this way can enhance the visual capability of LLaVA, showing strong improvements on a range of MLLM benchmarks. Our project page is https://ucsc-vlaa.github.io/CLIPS/.
Comment: 12 pages
Published: 2024
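
The first CLIPS design, feeding only part of a long synthetic caption to the text encoder, can be illustrated with a minimal Python sketch. The abstract does not say how the partial caption is chosen, so the rule below (keep the leading `num_sentences` sentences, then cap at `max_tokens` whitespace tokens) is an assumption, and `partial_caption` is a hypothetical helper rather than the authors' code.

```python
import re


def partial_caption(caption: str, num_sentences: int = 1, max_tokens: int = 32) -> str:
    """Return a shortened view of a long synthetic caption.

    Assumed rule (illustrative only): keep the leading `num_sentences`
    sentences, then truncate to at most `max_tokens` whitespace tokens.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    prefix = " ".join(sentences[:num_sentences])
    # Whitespace tokens stand in for the text encoder's real tokenizer.
    return " ".join(prefix.split()[:max_tokens])


caption = (
    "A brown dog runs across a grassy field. "
    "In the background, a red barn stands under a cloudy sky. "
    "The dog's ears flap as it moves."
)
print(partial_caption(caption))  # A brown dog runs across a grassy field.
```

In a training loop, the shortened caption would go to the text encoder while the full-length caption remains available as the target for the captioner described in the second design.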

2. AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

Author: Wang, Zijun, Tu, Haoqin, Mei, Jieru, Zhao, Bingchen, Wang, Yisen, and Xie, Cihang
Subjects: Computer Science - Computation and Language
Abstract: This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy. We first observe a positive correlation between the effectiveness of attacks and the internal behaviors of the models. For instance, attacks tend to be less effective when models pay more attention to system prompts designed to ensure LLM safety alignment. Building on this discovery, we introduce an enhanced method that manipulates models' attention scores to facilitate LLM jailbreaking, which we term AttnGCG. Empirically, AttnGCG shows consistent improvements in attack efficacy across diverse LLMs, achieving an average increase of ~7% in the Llama-2 series and ~10% in the Gemma series. Our strategy also demonstrates robust attack transferability against both unseen harmful goals and black-box LLMs like GPT-3.5 and GPT-4. Moreover, we note our attention-score visualization is more interpretable, allowing us to gain better insights into how our targeted attention manipulation facilitates more effective jailbreaking. We release the code at https://github.com/UCSC-VLAA/AttnGCG-attack.
Published: 2024

3. A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

Author: Xie, Yunfei, Wu, Juncheng, Tu, Haoqin, Yang, Siwei, Zhao, Bingchen, Zong, Yongshuo, Jin, Qiao, Xie, Cihang, and Zhou, Yuyin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have exhibited remarkable capabilities across various domains and tasks, pushing the boundaries of our knowledge in learning and cognition. The latest model, OpenAI's o1, stands out as the first LLM with an internalized chain-of-thought technique using reinforcement learning strategies. While it has demonstrated surprisingly strong capabilities on various general language tasks, its performance in specialized fields such as medicine remains unknown. To this end, this report provides a comprehensive exploration of o1 in different medical scenarios, examining 3 key aspects: understanding, reasoning, and multilinguality. Specifically, our evaluation encompasses 6 tasks using data from 37 medical datasets, including two newly constructed and more challenging question-answering (QA) tasks based on professional medical quizzes from the New England Journal of Medicine (NEJM) and The Lancet. These datasets offer greater clinical relevance compared to standard medical QA benchmarks such as MedQA, translating more effectively into real-world clinical utility. Our analysis of o1 suggests that the enhanced reasoning ability of LLMs may (significantly) benefit their capability to understand various medical instructions and reason through complex clinical scenarios. Notably, o1 surpasses the previous GPT-4 in accuracy by an average of 6.2% and 6.6% across 19 datasets and two newly created complex QA scenarios. At the same time, we identify several weaknesses in both the model capability and the existing evaluation protocols, including hallucination, inconsistent multilingual ability, and discrepant metrics for evaluation. We release our raw data and model outputs at https://ucsc-vlaa.github.io/o1_medicine/ for future research.
Comment: The first four authors contributed equally, project page available at https://ucsc-vlaa.github.io/o1_medicine/
Published: 2024

4. Contextuality Helps Representation Learning for Generalized Category Discovery

Author: Luo, Tingzhang, Du, Mingxuan, Shi, Jiatao, Chen, Xinxiang, Zhao, Bingchen, and Huang, Shaoguang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces a novel approach to Generalized Category Discovery (GCD) by leveraging the concept of contextuality to enhance the identification and classification of categories in unlabeled datasets. Drawing inspiration from human cognition's ability to recognize objects within their context, we propose a dual-context based method. Our model integrates two levels of contextuality: instance-level, where nearest-neighbor contexts are utilized for contrastive learning, and cluster-level, employing prototypical contrastive learning based on category prototypes. The integration of the contextual information effectively improves the feature learning and thereby the classification accuracy of all categories, allowing the model to better handle real-world datasets. Unlike traditional semi-supervised learning and novel category discovery techniques, our model focuses on a more realistic and challenging scenario where both known and novel categories are present in the unlabeled data. Extensive experimental results on several benchmark datasets demonstrate that the proposed model outperforms state-of-the-art methods. Code is available at: https://github.com/Clarence-CV/Contexuality-GCD
Published: 2024
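
The instance-level context described above, where nearest neighbours act as extra positives in a contrastive loss, can be sketched in NumPy. This is an illustrative assumption of how such a loss might look (cosine similarity, top-`k` neighbours, temperature `tau`), not the code from the linked repository; `knn_positive_mask` and `multi_positive_info_nce` are hypothetical names.

```python
import numpy as np


def knn_positive_mask(features: np.ndarray, k: int = 1) -> np.ndarray:
    """Boolean (N, N) mask marking each sample's k nearest neighbours by cosine similarity."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbour
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    mask[np.arange(len(z))[:, None], nn_idx] = True
    return mask


def multi_positive_info_nce(features: np.ndarray, pos_mask: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE-style loss averaged over every positive marked in pos_mask."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity from the softmax
    # Numerically stable log-softmax over each row.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    per_anchor = np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return float(-per_anchor.mean())


feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
mask = knn_positive_mask(feats, k=1)
loss = multi_positive_info_nce(feats, mask)
print(mask.astype(int))
print(round(loss, 4))
```

Pulling each sample towards its neighbours, not just its own augmented view, is one plausible way to inject the "context" the abstract refers to; the cluster-level prototype term would be a separate loss.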

5. PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery

Author: Cendra, Fernando Julio, Zhao, Bingchen, and Han, Kai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We tackle the problem of Continual Category Discovery (CCD), which aims to automatically discover novel categories in a continuous stream of unlabeled data while mitigating the challenge of catastrophic forgetting, an open problem that persists even in conventional, fully supervised continual learning. To address this challenge, we propose PromptCCD, a simple yet effective framework that utilizes a Gaussian Mixture Model (GMM) as a prompting method for CCD. At the core of PromptCCD lies the Gaussian Mixture Prompting (GMP) module, which acts as a dynamic pool that updates over time to facilitate representation learning and prevent forgetting during category discovery. Moreover, GMP enables on-the-fly estimation of the number of categories, allowing PromptCCD to discover categories in unlabeled data without prior knowledge of how many there are. We extend the standard evaluation metric for Generalized Category Discovery (GCD) to CCD and benchmark state-of-the-art methods on diverse public datasets. PromptCCD significantly outperforms existing methods, demonstrating its effectiveness. Project page: https://visual-ai.github.io/promptccd
Comment: ECCV 2024
Published: 2024

6. Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Author: Zhao, Bingchen, Zong, Yongshuo, Zhang, Letian, and Hospedales, Timothy
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The advancement of large language models (LLMs) has significantly broadened the scope of applications in natural language processing, with multi-modal LLMs extending these capabilities to integrate and interpret visual data. However, existing benchmarks for visual language models (VLMs) predominantly focus on single-image inputs, neglecting the crucial aspect of multi-image understanding. In this paper, we introduce the Multi-Image Relational Benchmark (MIRB), designed to evaluate VLMs' ability to compare, analyze, and reason across multiple images. Our benchmark encompasses four categories: perception, visual world knowledge, reasoning, and multi-hop reasoning. Through a comprehensive evaluation of a wide range of open-source and closed-source models, we demonstrate that while open-source VLMs approach the performance of GPT-4V on single-image tasks, a significant performance gap remains in multi-image reasoning tasks. Our findings also reveal that even the state-of-the-art GPT-4V model struggles with our benchmark, underscoring the need for further research and development in this area. We believe our contribution of MIRB could serve as a testbed for developing the next-generation multi-modal models.
Comment: First three authors contributed equally. Dataset: https://huggingface.co/datasets/VLLMs/MIRB
Published: 2024

7. What If We Recaption Billions of Web Images with LLaMA-3?

Author: Li, Xianhang, Tu, Haoqin, Hui, Mude, Wang, Zeyu, Zhao, Bingchen, Xiao, Junfei, Ren, Sucheng, Mei, Jieru, Liu, Qing, Zheng, Huangjie, Zhou, Yuyin, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community effort, leveraging the powerful and open-sourced LLaMA-3, a GPT-4 level LLM. Our recaptioning pipeline is simple: first, we fine-tune a LLaMA-3-8B powered LLaVA-1.5 and then employ it to recaption 1.3 billion images from the DataComp-1B dataset. Our empirical results confirm that this enhanced dataset, Recap-DataComp-1B, offers substantial benefits in training advanced vision-language models. For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks. For generative models like text-to-image Diffusion Transformers, the generated images exhibit a significant improvement in alignment with users' text instructions, especially in following complex queries. Our project page is https://www.haqtu.me/Recap-Datacomp-1B/
Comment: First five authors contributed equally
Published: 2024

8. Labeled Data Selection for Category Discovery

Author: Zhao, Bingchen, Lang, Nico, Belongie, Serge, and Mac Aodha, Oisin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Category discovery methods aim to find novel categories in unlabeled visual data. At training time, a set of labeled and unlabeled images is provided, where the labels correspond to the categories present in the images. The labeled data provides guidance during training by indicating what types of visual properties and features are relevant for performing discovery in the unlabeled data. As a result, changing the categories present in the labeled set can have a large impact on what is ultimately discovered in the unlabeled set. Despite its importance, the impact of labeled data selection has not been explored in the category discovery literature to date. We show that changing the labeled data can significantly impact discovery performance. Motivated by this, we propose two new approaches for automatically selecting the most suitable labeled data based on the similarity between the labeled and unlabeled data. Our observation is that, unlike in conventional supervised transfer learning, the best labeled data is neither too similar nor too dissimilar to the unlabeled categories. Our resulting approaches obtain state-of-the-art discovery performance across a range of challenging fine-grained benchmark datasets.
Published: 2024
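
The observation that the best labeled data is neither too similar nor too dissimilar to the unlabeled categories can be turned into a toy selection rule. The sketch below is a hypothetical heuristic, not either of the paper's two approaches: it scores each candidate labeled set by the cosine similarity between mean embeddings and picks the candidate that ranks in the middle. The names `set_similarity` and `pick_intermediate` are illustrative.

```python
import numpy as np


def set_similarity(labeled_feats: np.ndarray, unlabeled_feats: np.ndarray) -> float:
    """Cosine similarity between the mean embeddings of two feature sets."""
    a = labeled_feats.mean(axis=0)
    b = unlabeled_feats.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pick_intermediate(candidates: dict, unlabeled_feats: np.ndarray) -> str:
    """Hypothetical rule: choose the candidate whose similarity ranks in the middle."""
    scores = {name: set_similarity(f, unlabeled_feats) for name, f in candidates.items()}
    ranked = sorted(scores, key=scores.get)  # least to most similar
    return ranked[len(ranked) // 2]


unlabeled = np.array([[1.0, 0.0], [1.0, 0.2]])
candidates = {
    "near": np.array([[1.0, 0.1]]),
    "mid": np.array([[1.0, 1.0]]),
    "far": np.array([[0.0, 1.0]]),
}
print(pick_intermediate(candidates, unlabeled))  # mid
```

Picking the median candidate is the simplest way to encode "neither extreme"; a real selector would likely use richer per-class similarity than a single mean embedding.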

9. What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights

Author: Wen, Xin, Zhao, Bingchen, Chen, Yilun, Pang, Jiangmiao, and Qi, Xiaojuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present in training. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, which are inaccessible to supervised learning. Our study not only uncovers the mechanisms behind CLIP's generalizability beyond data imbalance but also provides transferable insights for the research community. The findings are validated in both supervised and self-supervised learning, enabling models trained on imbalanced data to achieve CLIP-level performance on diverse recognition tasks. Code and data are available at: https://github.com/CVMI-Lab/clip-beyond-tail.
Comment: Accepted at NeurIPS 2024
Published: 2024
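
The "dynamic classification" view can be illustrated by sampling one contrastive batch from a long-tailed label distribution: the in-batch softmax only ranks the texts that happen to be in the batch, so most classes are absent from any given step. The Zipf-like frequencies, 1,000 classes and batch size of 256 below are assumptions chosen for illustration, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 1000
# Zipf-like long-tailed class frequencies (assumed for illustration).
freqs = 1.0 / np.arange(1, num_classes + 1)
probs = freqs / freqs.sum()


def classes_in_batch(batch_size: int) -> np.ndarray:
    """Classes that appear (and so compete in the in-batch softmax) for one sampled batch."""
    labels = rng.choice(num_classes, size=batch_size, p=probs)
    return np.unique(labels)


present = classes_in_batch(256)
print(f"{len(present)} of {num_classes} classes appear in this batch")
```

Each step therefore solves a smaller classification problem over whichever classes were sampled, which is the mechanism the abstract credits with isolating the bias from dominant classes.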

10. HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Author: Hui, Mude, Yang, Siwei, Zhao, Bingchen, Shi, Yichun, Wang, Heng, Wang, Peng, Zhou, Yuyin, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create high-quality diptychs featuring input and output images with detailed text prompts, followed by precise alignment ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an HQ-Edit-finetuned InstructPix2Pix can attain state-of-the-art image editing performance, even surpassing those models fine-tuned with human-annotated data. The project page is https://thefllood.github.io/HQEdit_web.
Published: 2024

11. Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery

Author: Wang, Ye, Wang, Yaxiong, Wu, Yujiao, Zhao, Bingchen, and Qian, Xueming
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning concepts by prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. Firstly, the estimated cluster number is often smaller than the ground truth, making the existing methods suffer from the lack of prototypes for comprehensive concept learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes (centers). As there is no ground truth for the potential prototype, we develop a self-supervised prototype learning framework to optimize the potential prototype in an end-to-end fashion. Secondly, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we opt to cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to quickly explore novel classes. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of 9.7% on the Stanford Cars dataset and achieves 12x clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at https://github.com/xjtuYW/PNP.git.
Comment: 9 pages, 7 figures
Published: 2024
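
The efficiency idea in this abstract (cluster only the unlabelled instances, then append extra "potential" prototypes) can be sketched as below. Plain k-means and a jittered-copy initialisation are assumptions for illustration; the paper optimises the potential prototypes with a self-supervised objective, which this sketch does not implement. `kmeans` and `expand_prototypes` are hypothetical helpers.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns (k, d) centres."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old centre if a cluster empties
                centers[j] = x[assign == j].mean(axis=0)
    return centers


def expand_prototypes(centers: np.ndarray, num_potential: int, scale: float = 0.01, seed: int = 0) -> np.ndarray:
    """Append hypothetical 'potential' prototypes, initialised as jittered copies of existing centres."""
    rng = np.random.default_rng(seed)
    base = centers[rng.integers(0, len(centers), size=num_potential)]
    extra = base + scale * rng.standard_normal(base.shape)
    return np.vstack([centers, extra])


data_rng = np.random.default_rng(1)
unlabelled = data_rng.standard_normal((200, 16))
centers = kmeans(unlabelled, k=5)           # clustering touches unlabelled data only
prototypes = expand_prototypes(centers, 3)  # 5 estimated + 3 potential prototypes
assignments = ((unlabelled[:, None] - prototypes[None]) ** 2).sum(-1).argmin(axis=1)
print(prototypes.shape)  # (8, 16)
```

Skipping the labelled instances shrinks the clustering input, and the extra prototypes give the model room to represent classes that the estimated cluster count would otherwise miss.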

12. Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Author: Peng, Bo, Goldstein, Daniel, Anthony, Quentin, Albalak, Alon, Alcaide, Eric, Biderman, Stella, Cheah, Eugene, Du, Xingjian, Ferdinan, Teddy, Hou, Haowen, Kazienko, Przemysław, GV, Kranthi Kiran, Kocoń, Jan, Koptyra, Bartłomiej, Krishna, Satyapriya, McClelland Jr., Ronald, Lin, Jiaju, Muennighoff, Niklas, Obeid, Fares, Saito, Atsushi, Song, Guangyu, Tu, Haoqin, Wirawan, Cahya, Woźniak, Stanisław, Zhang, Ruichong, Zhao, Bingchen, Zhao, Qihang, Zhou, Peng, Zhu, Jian, and Zhu, Rui-Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV. Training code at: https://github.com/RWKV/RWKV-LM. Inference code at: https://github.com/RWKV/ChatRWKV. Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer.
Published: 2024
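
The matrix-valued state can be illustrated for a single head with a simplified recurrence, S_t = diag(w_t) S_{t-1} + k_t^T v_t, with output o_t = r_t (diag(u) k_t^T v_t + S_{t-1}). This is a sketch based on our reading of the Eagle/Finch design that omits token shift, gating, normalisation and the split into multiple heads; a fixed decay `w` corresponds to the Eagle-style case, and a per-step `w` of shape (T, D) mimics Finch's data-dependent (dynamic) recurrence. Refer to the linked RWKV-LM code for the actual implementation.

```python
import numpy as np


def matrix_state_recurrence(r, k, v, w, u):
    """Single-head recurrence with a (D x D) matrix-valued state.

    r, k, v: (T, D) receptance, key and value vectors.
    w: decay in (0, 1), shape (D,) for a fixed decay or (T, D) for a per-step decay.
    u: (D,) bonus applied to the current token's key-value outer product.
    """
    T, D = k.shape
    w = np.broadcast_to(w, (T, D))
    S = np.zeros((D, D))  # decayed sum of k_i^T v_i over past tokens
    out = np.empty((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])              # k_t^T v_t
        out[t] = r[t] @ (u[:, None] * kv + S)  # r_t (diag(u) k_t^T v_t + S_{t-1})
        S = w[t][:, None] * S + kv             # S_t = diag(w_t) S_{t-1} + k_t^T v_t
    return out


rng = np.random.default_rng(0)
T, D = 6, 4
r, k, v = (rng.standard_normal((T, D)) for _ in range(3))
u = rng.standard_normal(D)
w_fixed = rng.uniform(0.5, 0.99, D)
eagle_like = matrix_state_recurrence(r, k, v, w_fixed, u)                         # fixed decay
finch_like = matrix_state_recurrence(r, k, v, rng.uniform(0.5, 0.99, (T, D)), u)  # per-step decay
print(eagle_like.shape, finch_like.shape)  # (6, 4) (6, 4)
```

Because the state has fixed size D x D, each step costs the same regardless of sequence length, which is the RNN-style inference efficiency the abstract refers to.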

13. Beyond the Known: Novel Class Discovery for Open-world Graph Learning

Author: Jin, Yucheng, Xiong, Yun, Fang, Juncheng, Wu, Xixi, He, Dongxiao, Jia, Xing, Zhao, Bingchen, and Yu, Philip
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Node classification on graphs is of great importance in many applications. Due to the limited labeling capability and evolution in real-world open scenarios, novel classes can emerge on unlabeled testing nodes. However, little attention has been paid to novel class discovery on graphs. Discovering novel classes is challenging as novel and known class nodes are correlated by edges, which makes their representations indistinguishable when applying message passing GNNs. Furthermore, the novel classes lack labeling information to guide the learning process. In this paper, we propose a novel method Open-world gRAph neuraL network (ORAL) to tackle these challenges. ORAL first detects correlations between classes through semi-supervised prototypical learning. Inter-class correlations are subsequently eliminated by the prototypical attention network, leading to distinctive representations for different classes. Furthermore, to fully explore multi-scale graph features for alleviating label deficiencies, ORAL generates pseudo-labels by aligning and ensembling label estimations from multiple stacked prototypical attention networks. Extensive experiments on several benchmark datasets show the effectiveness of our proposed method.
Published: 2024

14. AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

Author: Yang, Siwei, Zhao, Bingchen, and Xie, Cihang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS). The key feature of our evaluation benchmark lies in its interactive evaluation protocol: for example, in DFS, the availability of each node's connected edge is contingent upon the model's traversal to that node, thereby necessitating the LLM's ability to effectively remember visited nodes and strategize subsequent moves. We comprehensively build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search, and use it to evaluate the sequential reasoning ability of 12 different LLMs. Our investigations reveal several interesting findings: (1) Closed-source models like GPT-4 and Gemini generally show strong sequential reasoning ability, significantly outperforming open-source LLMs. (2) Naively providing interactive examples may inadvertently hurt few-shot performance. (3) A very limited number of predecessor steps following the optimal policy can substantially boost small models' performance. (4) The scaling correlation between performance and model size is not always significant, sometimes even showcasing an inverse trend. We hope our study can catalyze future work on advancing the understanding and enhancement of LLMs' capabilities in sequential reasoning. The code is available at https://github.com/UCSC-VLAA/AQA-Bench.
Published: 2024
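
In an interactive protocol the model learns about the hidden target only through feedback on its own moves. A minimal version of the binary-search setting might look like the sketch below; `GuessNumberEnv` and `binary_search_agent` are hypothetical names, and in the benchmark the agent would be an LLM queried turn by turn rather than the optimal policy hard-coded here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuessNumberEnv:
    """Hypothetical interactive environment: the agent sees only feedback, never the target."""
    low: int
    high: int
    target: int
    history: List[int] = field(default_factory=list)

    def step(self, guess: int) -> str:
        self.history.append(guess)
        if guess < self.target:
            return "higher"
        if guess > self.target:
            return "lower"
        return "correct"


def binary_search_agent(env: GuessNumberEnv, max_steps: int = 50) -> int:
    """Optimal policy: halve the feasible interval after each piece of feedback."""
    lo, hi = env.low, env.high
    for _ in range(max_steps):
        guess = (lo + hi) // 2
        feedback = env.step(guess)
        if feedback == "correct":
            return len(env.history)  # number of turns used
        if feedback == "higher":
            lo = guess + 1
        else:
            hi = guess - 1
    return -1  # gave up


env = GuessNumberEnv(low=0, high=127, target=100)
print(binary_search_agent(env), env.history)  # 7 [63, 95, 111, 103, 99, 101, 100]
```

Swapping the hard-coded agent for a model that must remember `history` across turns is what makes this a test of sequential reasoning rather than one-shot question answering.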

15. Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Author: Zhao, Bingchen, Tu, Haoqin, Wei, Chen, Mei, Jieru, and Xie, Cihang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreover, when benchmarked against other tuning approaches like full parameter finetuning or LoRA, its efficiency benefits are substantial. For example, compared to LoRA at the 13B model scale, it improves performance by an average of over 20% across five multi-modal tasks while reducing trainable parameters by 41.9% and GPU memory usage by 17.6%. On top of this LayerNorm strategy, we showcase that selectively tuning only with conversational data can improve efficiency further. Beyond these empirical outcomes, we provide a comprehensive analysis to explore the role of LayerNorm in adapting LLMs to the multi-modal domain and improving the expressive power of the model.
Comment: The first two authors contributed equally
Published: 2023

16. What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models

Author: Zhang, Letian, Zhai, Xiaotong, Zhao, Zhongkai, Zong, Yongshuo, Wen, Xin, and Zhao, Bingchen
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal large language models. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models. Code and dataset are publicly available at https://bzhao.me/C-VQA/.
Published: 2023

17. Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

Author: Zong, Yongshuo, Yu, Tingyang, Chavhan, Ruchika, Zhao, Bingchen, and Hospedales, Timothy
Subjects: Computer Science - Machine Learning
Abstract: Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code is available at https://github.com/ys-zong/FoolyourVLLMs.
Comment: ICML 2024; v3 fix typo
Published: 2023
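
Permutation sensitivity can be probed by re-ordering the answer options and checking whether the model's chosen answer (by content, not by position) changes. The sketch below searches all orderings exhaustively, which is feasible for four options (24 permutations); the `model` callable and the two toy models are hypothetical stand-ins for an LLM or VLM.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence

# A model maps (question, options) to the index of the option it picks.
Model = Callable[[str, Sequence[str]], int]


def find_adversarial_permutation(
    model: Model, question: str, options: Sequence[str], answer_idx: int
) -> Optional[List[int]]:
    """Return the first option ordering that makes the model answer wrongly, or None."""
    correct = options[answer_idx]
    for perm in permutations(range(len(options))):
        shuffled = [options[i] for i in perm]
        if shuffled[model(question, shuffled)] != correct:
            return list(perm)
    return None


def position_biased_model(question: str, options: Sequence[str]) -> int:
    return 0  # always picks the first option, regardless of content


def content_model(question: str, options: Sequence[str]) -> int:
    return list(options).index("4")  # picks the right answer wherever it is placed


opts = ["4", "3", "5", "1"]
print(find_adversarial_permutation(position_biased_model, "What is 2 + 2?", opts, 0))  # [1, 0, 2, 3]
print(find_adversarial_permutation(content_model, "What is 2 + 2?", opts, 0))  # None
```

A permutation-invariant model returns `None` for every question; any non-`None` result is a concrete reordering that flips the model's answer.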

18. Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

Author: Tu, Haoqin, Zhao, Bingchen, Wei, Chen, and Xie, Cihang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Multi-modal large language models (MLLMs) are trained based on large language models (LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual responses. While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. In this study, we get out of the box and unveil an intriguing characteristic of MLLMs -- our preliminary results suggest that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment in the pure NLP context. For example, a visual-instruction-tuned LLaMA2 7B model surpasses the performance of the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on TruthfulQA-mc and Ethics benchmarks. Further analysis reveals that the improved alignment can be attributed to the superior instruction quality inherent to visual-text data. In releasing our code at github.com/UCSC-VLAA/Sight-Beyond-Text, we aspire to foster further exploration into the intrinsic value of visual-text synergies and, in a broader scope, multi-modal interactions in alignment research.
Published: 2023

19. Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery

Author: Zhao, Bingchen, Wen, Xin, and Han, Kai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we address the problem of generalized category discovery (GCD), i.e., given a set of images where part of them are labelled and the rest are not, the task is to automatically cluster the images in the unlabelled data, leveraging the information from the labelled data, while the unlabelled data contain images from the labelled classes and also new ones. GCD is similar to semi-supervised learning (SSL) but is more realistic and challenging, as SSL assumes all the unlabelled images are from the same classes as the labelled ones. We also do not assume the class number in the unlabelled data is known a priori, making the GCD problem even harder. To tackle the problem of GCD without knowing the class number, we propose an EM-like framework that alternates between representation learning and class number estimation. We propose a semi-supervised variant of the Gaussian Mixture Model (GMM) with a stochastic splitting and merging mechanism to dynamically determine the prototypes by examining the cluster compactness and separability. With these prototypes, we leverage prototypical contrastive learning for representation learning on the partially labelled data subject to the constraints imposed by the labelled data. Our framework alternates between these two steps until convergence. The cluster assignment for an unlabelled instance can then be retrieved by identifying its nearest prototype. We comprehensively evaluate our framework on both generic image classification datasets and challenging fine-grained object recognition datasets, achieving state-of-the-art performance.
Comment: This paper is accepted at ICCV 2023
Published: 2023
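
The semi-supervised GMM can be illustrated by a single EM step in which labelled points keep fixed (one-hot) responsibilities while unlabelled points receive soft ones. This is a simplified sketch assuming an isotropic, shared variance and equal mixture weights; the paper's stochastic splitting and merging and its prototypical contrastive learning are not shown, and `semi_supervised_em_step` is a hypothetical name. With equal variances and weights, the argmax of the responsibilities is the nearest prototype, matching the retrieval rule in the abstract.

```python
import numpy as np


def semi_supervised_em_step(x, labels, means, var=1.0):
    """One EM step for an equal-weight, isotropic GMM with labelled points clamped.

    x: (N, d) features; labels: (N,) ints, -1 for unlabelled; means: (K, d).
    """
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logits = -sq_dist / (2.0 * var)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)  # E-step: soft assignments
    labelled = labels >= 0
    resp[labelled] = np.eye(len(means))[labels[labelled]]  # labelled points keep their class
    new_means = (resp.T @ x) / resp.sum(axis=0)[:, None]  # M-step: weighted means
    return resp, new_means


x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, -1, 1, -1])
resp, means = semi_supervised_em_step(x, labels, means=np.array([[0.0, 0.0], [5.0, 5.0]]))
print(resp.argmax(axis=1))  # [0 0 1 1]
```

Clamping the labelled responsibilities is what makes the mixture "semi-supervised": known classes anchor their components, while unlabelled points are free to join any component.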

20. Incremental Generalized Category Discovery

Author: Zhao, Bingchen and Mac Aodha, Oisin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task. In our experiments we outperform existing related methods.
Comment: This paper is accepted at ICCV 2023
Published: 2023

21. OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

Author: Zhao, Bingchen, Wang, Jiahao, Ma, Wufei, Jesslen, Artur, Yang, Siwei, Yu, Shaozuo, Zendel, Oliver, Theobalt, Christian, Yuille, Alan, and Kortylewski, Adam
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) Some nuisance factors have a much stronger negative effect on the performance compared to others, also depending on the vision task. 2) Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3) We do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich test bed to study robustness and will help push forward research in this area. Our dataset can be accessed from https://bzhao.me/OOD-CV/, Comment: arXiv admin note: substantial text overlap with arXiv:2111.14341
Published: 2023

22. Vision Learners Meet Web Image-Text Pairs

Author: Zhao, Bingchen, Cui, Quan, Wu, Hao, Yoshie, Osamu, Yang, Cheng, and Mac Aodha, Oisin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Many self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy web-sourced image-text paired data. First, we conduct a benchmark study of representative self-supervised pre-training methods on large-scale web data in a like-for-like setting. We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text contrastive training. We observe that existing multi-modal methods do not outperform their single-modal counterparts on vision transfer learning tasks. We derive an information-theoretical view to explain these benchmark results, which provides insight into how to design a novel vision learner. Inspired by this insight, we present a new visual representation pre-training method, MUlti-modal Generator (MUG), that learns from scalable web-sourced image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and demonstrates promising scaling properties. Pre-trained models and code will be made public upon acceptance., Comment: Project page: https://bzhao.me/MUG/
Published: 2023

23. Constraining Light Scalar Field with Torsion-Balance Gravity Experiments

Author: Qin, ChengGang, Lu, XiaoYu, Zhao, BingChen, Ke, Jun, Du, AnBin, Luo, Jie, Tan, YuJie, and Shao, ChengGang
Subjects: General Relativity and Quantum Cosmology
Abstract: A light scalar field coupled to Standard Model particles provides a possible source of dark matter, long-range Yukawa forces, or violation of the weak equivalence principle, which can potentially be explored by precision gravity experiments. We describe searches for such light scalar fields with three types of gravity experiments: $G$-measurement experiments, Inverse-Square Law (ISL) experiments, and equivalence principle experiments. We investigate the potential influence of the scalar field as a function of its mass, and focus on the experimental constraints from torsion-balance gravity experiments. The HUST-18 $G$-measurement torsion-balance experiments place bounds on the photon and electron couplings of up to $\Lambda_{\gamma}=7\times10^{17}$ GeV and $\Lambda_{e}=1\times10^{17}$ GeV in the mass range $10^{-9}$ to $10^{-4}$ eV. Results from the ISL experiments by the Universities of Washington, Stanford, IUPUI, HUST, Colorado, Irvine, Yale and others allow us to set limits on the photon and electron couplings of up to $\Lambda_{\gamma}=5\times10^{17}$ GeV and $\Lambda_{e}=3\times10^{16}$ GeV for scalar field masses between $10^{-5}$ and $10^{-1}$ eV. We also discuss the limits from equivalence principle experiments: the final MICROSCOPE result updates the constraints on the coupling parameters to up to $\Lambda_{\gamma}=7\times10^{22}$ GeV and $\Lambda_{e}=4\times10^{21}$ GeV for masses $\lesssim 10^{-13}$ eV. These results contribute experimental constraints to relatively unexplored mass regions of the light scalar field parameter space and improve upon previous limits in some mass ranges. This work paves the way for probing long-range Yukawa forces mediated by light scalar fields in future high-precision gravity experiments., Comment: 21 pages, 6 figures
Published: 2022

24. Parametric Classification for Generalized Category Discovery: A Baseline Study

Author: Wen, Xin, Zhao, Bingchen, and Qi, Xiaojuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. Previous studies argued that parametric classifiers are prone to overfitting to seen categories, and endorsed using a non-parametric classifier formed with semi-supervised k-means. However, in this study, we investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. We hope the investigation and proposed simple framework can serve as a strong baseline to facilitate future studies in this field. Our code is available at: https://github.com/CVMI-Lab/SimGCD., Comment: v3: ICCV'23 version; v4: updated the dataset table
Published: 2022

25. One Venue, Two Conferences: The Separation of Chinese and American Citation Networks

Author: Zhao, Bingchen, Gu, Yuling, Forde, Jessica Zosa, and Saphra, Naomi
Subjects: Computer Science - Digital Libraries, Computer Science - Machine Learning
Abstract: At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously. We build a citation graph to quantify this divide, compare it to European connectivity, and discuss the causes and consequences of the separation., Comment: Workshop on Cultures of AI and AI for Culture @ NeurIPS 2022
Published: 2022

26. XCon: Learning with Experts for Fine-grained Category Discovery

Author: Fei, Yixin, Zhao, Zhongkai, Yang, Siwei, and Zhao, Bingchen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We address the problem of generalized category discovery (GCD) in this paper, i.e., clustering the unlabeled images by leveraging the information from a set of seen classes, where the unlabeled images could contain both seen and unseen classes. The seen classes can be regarded as an implicit criterion of classes, which makes this setting different from unsupervised clustering, where the cluster criteria may be ambiguous. We are mainly concerned with discovering categories within a fine-grained dataset, since it is one of the most direct applications of category discovery, i.e., helping experts discover novel concepts within an unlabeled dataset using the implicit criterion set forth by the seen classes. State-of-the-art methods for generalized category discovery leverage contrastive learning to learn the representations, but the large inter-class similarity and intra-class variance pose a challenge for these methods: the negative examples may contain cues irrelevant to recognizing a category, so the algorithms may converge to a local minimum. We present a novel method called Expert-Contrastive Learning (XCon) to help the model mine useful information from the images by first partitioning the dataset into sub-datasets using k-means clustering and then performing contrastive learning on each of the sub-datasets to learn fine-grained discriminative features. Experiments on fine-grained datasets show clearly improved performance over the previous best methods, indicating the effectiveness of our method.
Published: 2022

27. Self-Supervised Visual Representation Learning with Semantic Grouping

Author: Wen, Xin, Zhao, Bingchen, Zheng, Anlin, Zhang, Xiangyu, and Qi, Xiaojuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together. Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation. Code is available at: https://github.com/CVMI-Lab/SlotCon., Comment: Accepted at NeurIPS 2022
Published: 2022

28. Fabrication and building energy-saving performance evaluation of polyethylene glycol/polymethyl methacrylate/expanded graphite thermal enhanced shape-stable phase change material

Author: Wei, Hanze, Zheng, Ziao, Xu, Xiaoling, Zheng, Chunyuan, Li, Bin, Zhao, Bingchen, Wei, Ziqing, and Zhai, Xiaoqiang
Published: 2024

29. Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Author: Cui, Quan, Zhao, Bingchen, Chen, Zhao-Min, Zhao, Borui, Song, Renjie, Liang, Jiajun, Zhou, Boyan, and Yoshie, Osamu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties: discriminability keeps increasing as training progresses, while transferability diminishes sharply in the later training period. From the perspective of information-bottleneck theory, we reveal that the incompatibility between discriminability and transferability is attributed to the over-compression of input information. More importantly, we investigate why and how the InfoNCE loss can alleviate the over-compression, and further present a learning framework, named contrastive temporal coding (CTC), to counteract the over-compression and alleviate the incompatibility. Extensive experiments validate that CTC successfully mitigates the incompatibility, yielding discriminative and transferable representations. Noticeable improvements are achieved on the image classification task and challenging transfer learning tasks. We hope that this work will highlight the significance of the transferability property in the conventional supervised learning setting. Code is available at https://github.com/DTennant/dt-tradeoff., Comment: Accepted by ECCV 2022, Quan Cui and Bingchen Zhao contributed equally to this work
Published: 2022

30. OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

Author: Zhao, Bingchen, Yu, Shaozuo, Ma, Wufei, Yu, Mingxin, Mei, Shenxiao, Wang, Angtian, He, Ju, Yuille, Alan, and Kortylewski, Adam
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions, and enables benchmarking models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1. Some nuisance factors have a much stronger negative effect on the performance compared to others, also depending on the vision task. 2. Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3. We do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich testbed to study robustness and will help push forward research in this area., Comment: Project webpage: http://bzhao.me/OOD-CV/, this work is accepted as Oral at ECCV 2022
Published: 2021

31. Improving Contrastive Learning by Visualizing Feature Transformation

Author: Zhu, Rui, Zhao, Bingchen, Liu, Jingen, Sun, Zhenglong, and Chen, Chang Wen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive learning, which aims at minimizing the distance between positive pairs while maximizing that between negative ones, has been widely and successfully applied in unsupervised feature learning, where the design of positive and negative (pos/neg) pairs is one of its keys. In this paper, we attempt to devise a feature-level data manipulation, differing from data augmentation, to enhance generic contrastive self-supervised learning. To this end, we first design a visualization scheme for the distribution of pos/neg scores (the cosine similarities of pos/neg pairs), which enables us to analyze, interpret and understand the learning process. To our knowledge, this is the first attempt of its kind. More importantly, leveraging this tool, we make several significant observations, which inspire our novel Feature Transformation proposals, including the extrapolation of positives. This operation creates harder positives to boost learning, because hard positives make the model more view-invariant. Besides, we propose the interpolation among negatives, which provides diversified negatives and makes the model more discriminative. It is the first attempt to deal with both challenges simultaneously. Experimental results show that our proposed Feature Transformation improves accuracy by at least 6.0% on ImageNet-100 over the MoCo baseline, and by about 2.0% on ImageNet-1K over the MoCoV2 baseline. Transfer to downstream tasks demonstrates that our model is less task-biased. Visualization tools and code are available at https://github.com/DTennant/CL-Visualizing-Feature-Transformation., Comment: ICCV 2021 (Oral), supplementary materials included. Code and visualization tools: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Published: 2021

32. Novel Visual Category Discovery with Dual Ranking Statistics and Mutual Knowledge Distillation

Author: Zhao, Bingchen and Han, Kai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we tackle the problem of novel visual category discovery, i.e., grouping unlabelled images from new classes into different semantic partitions by leveraging a labelled dataset that contains images from other different but relevant categories. This is a more realistic and challenging setting than conventional semi-supervised learning. We propose a two-branch learning framework for this problem, with one branch focusing on local part-level information and the other branch focusing on overall characteristics. To transfer knowledge from the labelled data to the unlabelled, we propose using dual ranking statistics on both branches to generate pseudo labels for training on the unlabelled data. We further introduce a mutual knowledge distillation method to allow information exchange and encourage agreement between the two branches for discovering new categories, allowing our model to enjoy the benefits of global and local features. We comprehensively evaluate our method on public benchmarks for generic object classification, as well as the more challenging datasets for fine-grained visual recognition, achieving state-of-the-art performance.
Published: 2021

33. Predictive thermal performance analysis of T-wall based adsorption thermal battery for solar building heating

Author: Zeng, Ziya, Zhao, Bingchen, Yang, Xinge, Chen, Zhihui, Yu, Jiaqi, Chua, Kian Jon Ernest, and Wang, Ruzhu
Published: 2024

34. Analyzing excess heat factors of convective/radiant terminals: Balancing beginning and steady stage

Author: Yang, Zixu, Chi, Junjie, Luo, Bin, Liu, Shurong, Zhao, Bingchen, Li, Yujie, Sun, Hongli, Lin, Borong, and Shi, Wenxing
Published: 2024

35. Rail-5k: a Real-World Dataset for Rail Surface Defects Detection

Author: Zhang, Zihao, Yu, Shaozuo, Yang, Siwei, Zhou, Yu, and Zhao, Bingchen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents the Rail-5k dataset for benchmarking the performance of visual algorithms in a real-world application scenario, namely the rail surface defect detection task. We collected over 5k high-quality images from railways across China and annotated 1,100 images with the help of railway experts to identify the 13 most common types of rail defects. The dataset can be used in two settings, each with unique challenges. The first is the fully supervised setting, which uses the 1k+ labeled images for training; the fine-grained nature and long-tailed distribution of the defect classes make it hard for visual algorithms to tackle. The second is the semi-supervised learning setting facilitated by the 4k unlabeled images; these images are uncurated, may contain image corruptions, and exhibit domain shift relative to the labeled images, which previous semi-supervised learning methods cannot easily handle. We believe our dataset could be a valuable benchmark for evaluating the robustness and reliability of visual algorithms.
Published: 2021

36. Reducing the feature divergence of RGB and near-infrared images using Switchable Normalization

Author: Yang, Siwei, Yu, Shaozuo, Zhao, Bingchen, and Wang, Yin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual pattern recognition over agricultural areas is an important application of aerial image processing. In this paper, we consider the multi-modality nature of agricultural aerial images and show that naively combining different modalities without taking the feature divergence into account can lead to sub-optimal results. Thus, we apply a Switchable Normalization block to our DeepLabV3 segmentation model to alleviate the feature divergence. Using the popular symmetric Kullback-Leibler divergence measure, we show that our model can greatly reduce the divergence between RGB and near-infrared channels. Together with a hybrid loss function, our model achieves nearly 10% improvement in mean IoU over the previously published baseline., Comment: CVPR2020 AgriVision workshop
Published: 2021

37. Temporal Context Aggregation for Video Retrieval with Contrastive Learning

Author: Shao, Jie, Wen, Xin, Zhao, Bingchen, and Xue, Xiangyang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: The current research focus on Content-Based Video Retrieval requires higher-level video representations that describe the long-range semantic dependencies of relevant incidents, events, etc. However, existing methods commonly process the frames of a video as individual images or short clips, making the modeling of long-range semantic dependencies difficult. In this paper, we propose TCA (Temporal Context Aggregation for Video Retrieval), a video representation learning framework that incorporates long-range temporal information between frame-level features using the self-attention mechanism. To train it on video retrieval datasets, we propose a supervised contrastive learning method that performs automatic hard negative mining and utilizes the memory bank mechanism to increase the capacity of negative samples. Extensive experiments are conducted on multiple video retrieval tasks, such as CC_WEB_VIDEO, FIVR-200K, and EVVE. The proposed method shows a significant performance advantage (~17% mAP on FIVR-200K) over state-of-the-art methods with video-level features, and delivers competitive results with 22x faster inference than methods using frame-level features.
Published: 2020

38. Distilling Visual Priors from Self-Supervised Learning

Author: Zhao, Bingchen and Wen, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Convolutional Neural Networks (CNNs) are prone to overfitting small training datasets. We present a novel two-phase pipeline that leverages self-supervised learning and knowledge distillation to improve the generalization ability of CNN models for image classification under the data-deficient setting. The first phase is to learn a teacher model which possesses rich and generalizable visual representations via self-supervised learning, and the second phase is to distill the representations into a student model in a self-distillation manner, and meanwhile fine-tune the student model for the image classification task. We also propose a novel margin loss for the self-supervised contrastive learning proxy task to better learn the representation under the data-deficient scenario. Together with other tricks, we achieve competitive performance in the VIPriors image classification challenge., Comment: This is the 2nd place tech report for VIPriors Image Classification Challenge ECCVW2020
Published: 2020

39. The 1st Agriculture-Vision Challenge: Methods and Results

Author: Chiu, Mang Tik, Xu, Xingqian, Wang, Kai, Hobbs, Jennifer, Hovakimyan, Naira, Huang, Thomas S., Shi, Honghui, Wei, Yunchao, Huang, Zilong, Schwing, Alexander, Brunner, Robert, Dozier, Ivan, Dozier, Wyatt, Ghandilyan, Karen, Wilson, David, Park, Hyunseong, Kim, Junhee, Kim, Sungho, Liu, Qinghui, Kampffmeyer, Michael C., Jenssen, Robert, Salberg, Arnt B., Barbosa, Alexandre, Trevisan, Rodrigo, Zhao, Bingchen, Yu, Shaozuo, Yang, Siwei, Wang, Yin, Sheng, Hao, Chen, Xiao, Su, Jingyi, Rajagopal, Ram, Ng, Andrew, Huynh, Van Thong, Kim, Soo-Hyung, Na, In-Seop, Baid, Ujjwal, Innani, Shubham, Dutande, Prasad, Baheti, Bhakti, Talbar, Sanjay, and Tang, Jianyu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries competed to achieve state-of-the-art results in aerial agriculture semantic segmentation. The challenge employed the Agriculture-Vision Challenge Dataset, which comprises 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will remain open for researchers who are interested in this challenge dataset and task., Comment: CVPR 2020 Workshop
Published: 2020

40. Strategies of stable thermal output and humidity dual control for a packed-bed adsorption thermal battery

Author: Zeng, Ziya, Zhao, Bingchen, Chen, Weidong, Ernest Chua, Kian Jon, and Wang, Ruzhu
Published: 2023

41. High-power-density packed-bed thermal energy storage using form-stable expanded graphite-based phase change composite

Author: Zeng, Ziya, Zhao, Bingchen, and Wang, Ruzhu
Published: 2023

42. OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

Author: Zhao, Bingchen, Yu, Shaozuo, Ma, Wufei, Yu, Mingxin, Mei, Shenxiao, Wang, Angtian, He, Ju, Yuille, Alan, and Kortylewski, Adam
Series founding editors: Goos, Gerhard and Hartmanis, Juris
Series editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Volume editors: Avidan, Shai, Brostow, Gabriel, Cissé, Moustapha, Farinella, Giovanni Maria, and Hassner, Tal
Published: 2022

43. Water based adsorption thermal battery: Sorption mechanisms and applications

Author: Zeng, Ziya, Zhao, Bingchen, and Wang, Ruzhu
Published: 2023

44. Thermal performance investigation of large-scale latent heat storage for heat supply

Author: Zhao, Bingchen and Wang, Ruzhu
Published: 2022

45. Highly stretchable and conformal electromagnetic interference shielding armor with strain sensing ability

Author: Xu, Jiandong, Chang, Hao, Zhao, Bingchen, Li, Ruisong, Cui, Tianrui, Jian, Jinming, Yang, Yi, Tian, He, Zhang, Sheng, and Ren, Tian-Ling
Published: 2022

46. Deciphering the Selectivity of CBL-B Inhibitors Using All-Atom Molecular Dynamics and Machine Learning

Author: Zhou, Feng, Du, Haolin, Wang, Yang, Fu, Weiqiang, Zhao, Bingchen, Zhou, Jielong, and Zhang, Yingsheng J.
Published: 2024

47. Characterizing Robotic and Organic Query in SPARQL Search Sessions

Author: Zhang, Xinyue, Wang, Meng, Zhao, Bingchen, Liu, Ruyang, Zhang, Jingyuan, and Yang, Han
Series founding editors: Goos, Gerhard and Hartmanis, Juris
Series editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti
Volume editors: Wang, Xin, Zhang, Rui, Lee, Young-Koo, Sun, Le, and Moon, Yang-Sae
Published: 2020

48. Materials for Thermal Energy Storage: Classification, Selection and Characterization

Author: Zhao, Bingchen, Zhang, Yannan, and Wang, Ruzhu
Published: 2022

49. Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Author: Cui, Quan, Zhao, Bingchen, Chen, Zhao-Min, Zhao, Borui, Song, Renjie, Zhou, Boyan, Liang, Jiajun, and Yoshie, Osamu
Published: 2022

50. OOD-CV-v2 : An Extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

Author: Zhao, Bingchen, Wang, Jiahao, Ma, Wufei, Jesslen, Artur, Yang, Siwei, Yu, Shaozuo, Zendel, Oliver, Theobalt, Christian, Yuille, Alan L., and Kortylewski, Adam
Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) Some nuisance factors have a much stronger negative effect on the performance compared to others, also depending on the vision task. 2) Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3) We do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich test bed to study robustness and will help push forward research in this area.
Published: 2024
