Author: "Xiao, Xuefeng" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Xiao, Xuefeng" returned 319 results.

1. IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Author: Ji, Yatai, Zhang, Shilong, Wu, Jie, Sun, Peize, Chen, Weifeng, Xiao, Xuefeng, Yang, Sidi, Yang, Yujiu, and Luo, Ping
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and intricate plots. Towards movie understanding, a critical initial step for LVLMs is to unleash the potential of character identities memory and recognition across multiple visual scenarios. To achieve the goal, we propose visual instruction tuning with ID reference and develop an ID-Aware Large Vision-Language Model, IDA-VLM. Furthermore, our research introduces a novel benchmark MM-ID, to examine LVLMs on instance IDs memory and recognition across four dimensions: matching, location, question-answering, and captioning. Our findings highlight the limitations of existing LVLMs in recognizing and associating instance identities with ID reference. This paper paves the way for future artificial intelligence systems to possess multi-identity visual inputs, thereby facilitating the comprehension of complex visual narratives like movies.
Published: 2024

2. ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Author: Chen, Weifeng, Zhang, Jiacheng, Wu, Jie, Wu, Hefeng, Xiao, Xuefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) in particular has received significant attention due to its wide range of application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to maintain the identity characteristics of reference portraits accurately, (2) the generated images lack aesthetic appeal, especially when identity retention is enforced, and (3) existing methods cannot be compatible with LoRA-based and Adapter-based methods simultaneously. To address these issues, we present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance. To address the loss of identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project Page: https://idaligner.github.io/
Published: 2024

3. Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Author: Ren, Yuxi, Xia, Xin, Lu, Yanzuo, Zhang, Jiacheng, Wu, Jie, Xie, Pan, Wang, Xing, and Xiao, Xuefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in 1-step inference.
Comment: Accepted by NeurIPS 2024 (Camera-Ready Version). Project Page: https://hyper-sd.github.io/
Published: 2024

4. ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Author: Li, Ming, Yang, Taojiannan, Kuang, Huafeng, Wu, Jie, Wang, Zhaoning, Xiao, Xuefeng, and Chen, Chen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: To enhance the controllability of text-to-image diffusion models, existing efforts like ControlNet incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Specifically, for an input conditional control, we use a pre-trained discriminative reward model to extract the corresponding condition of the generated images, and then optimize the consistency loss between the input conditional control and extracted condition. A straightforward implementation would be generating images from random noise and then calculating the consistency loss, but such an approach requires storing gradients for multiple sampling timesteps, leading to considerable time and memory costs. To address this, we introduce an efficient reward strategy that deliberately disturbs the input images by adding noise, and then uses the single-step denoised images for reward fine-tuning. This avoids the extensive costs associated with image sampling, allowing for more efficient reward fine-tuning. Extensive experiments show that ControlNet++ significantly improves controllability under various conditional controls. For example, it achieves improvements over ControlNet by 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE, respectively, for segmentation mask, line-art edge, and depth conditions. All the code, models, demo and organized data have been open sourced on our GitHub repo.
Comment: Camera Ready Version. Project Page: https://liming-ai.github.io/ControlNet_Plus_Plus; Code & Data: https://github.com/liming-ai/ControlNet_Plus_Plus
Published: 2024

5. UniFL: Improve Stable Diffusion via Unified Feedback Learning

Author: Zhang, Jiacheng, Wu, Jie, Ren, Yuxi, Xia, Xin, Kuang, Huafeng, Xie, Pan, Li, Jiashi, Xiao, Xuefeng, Zheng, Min, Fu, Lean, and Li, Guanbin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solution in sight. To address these challenges, we present UniFL, a unified framework that leverages feedback learning to enhance diffusion models comprehensively. UniFL stands out as a universal, effective, and generalizable solution applicable to various diffusion models, such as SD1.5 and SDXL. Notably, UniFL incorporates three key components: perceptual feedback learning, which enhances visual quality; decoupled feedback learning, which improves aesthetic appeal; and adversarial feedback learning, which optimizes inference speed. In-depth experiments and extensive user studies validate the superior performance of our proposed method in enhancing both the quality of generated models and their acceleration. For instance, UniFL surpasses ImageReward by 17% user preference in terms of generation quality and outperforms LCM and SDXL Turbo by 57% and 20% in 4-step inference. Moreover, we have verified the efficacy of our approach in downstream tasks, including Lora, ControlNet, and AnimateDiff.
Published: 2024

6. ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Author: Ren, Yuxi, Wu, Jie, Lu, Yanzuo, Kuang, Huafeng, Xia, Xin, Wang, Xionghui, Wang, Qianqian, Zhu, Yixing, Xie, Pan, Wang, Shiyin, Xiao, Xuefeng, Wang, Yitong, Zheng, Min, and Fu, Lean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we present ByteEdit, an innovative feedback learning framework meticulously designed to Boost, Comply, and Accelerate Generative Image Editing tasks. ByteEdit seamlessly integrates image reward models dedicated to enhancing aesthetics and image-text alignment, while also introducing a dense, pixel-level reward model tailored to foster coherence in the output. Furthermore, we propose a pioneering adversarial and progressive feedback learning strategy to expedite the model's inference speed. Through extensive large-scale user evaluations, we demonstrate that ByteEdit surpasses leading generative image editing products, including Adobe, Canva, and MeiTu, in both generation quality and consistency. ByteEdit-Outpainting exhibits a remarkable enhancement of 388% and 135% in quality and consistency, respectively, when compared to the baseline model. Experiments also verified that our accelerated models maintain excellent performance in terms of quality and consistency.
Published: 2024

7. AffineQuant: Affine Transformation Quantization for Large Language Models

Author: Ma, Yuexiao, Li, Huixia, Zheng, Xiawu, Ling, Feng, Xiao, Xuefeng, Wang, Rui, Wen, Shilei, Chao, Fei, and Ji, Rongrong
Subjects: Computer Science - Machine Learning
Abstract: The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training. Existing PTQ methods for LLMs limit the optimization scope to scaling transformations between pre- and post-quantization weights. In this paper, we advocate for direct optimization using equivalent Affine transformations in PTQ (AffineQuant). This approach extends the optimization scope and thus significantly reduces quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence between the pre- and post-quantization outputs of PTQ, thereby maintaining its efficiency and generalization capabilities. To ensure the invertibility of the transformation during optimization, we further introduce a gradual mask optimization method. This method initially focuses on optimizing the diagonal elements and gradually extends to the other elements. Such an approach aligns with the Levy-Desplanques theorem, theoretically ensuring invertibility of the transformation. As a result, significant performance improvements are evident across different LLMs on diverse datasets. To illustrate, we attain a C4 perplexity of 15.76 (2.26 lower vs 18.02 in OmniQuant) on the LLaMA2-7B model with W4A4 quantization without overhead. On zero-shot tasks, AffineQuant achieves an average accuracy of 58.61 (1.98 higher vs 56.63 in OmniQuant) when using 4/4-bit quantization for LLaMA-30B, setting a new state-of-the-art benchmark for PTQ in LLMs.
Comment: ICLR 2024
Published: 2024
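
Note on record 7: the invertibility argument in the abstract can be illustrated with a small sketch. The Levy-Desplanques theorem states that a strictly diagonally dominant matrix is nonsingular. The code below is not the authors' implementation; the band-shaped mask schedule, the function names, and the 4x4 example values are all assumptions chosen only to show how releasing off-diagonal entries step by step can keep a transform strictly diagonally dominant.

```python
def band_mask(n, bandwidth):
    """Return an n x n 0/1 mask that is 1 where |i - j| <= bandwidth.

    bandwidth 0 keeps only the diagonal; larger values release more entries.
    (The band shape is an assumed schedule, not taken from the paper.)
    """
    return [[1 if abs(i - j) <= bandwidth else 0 for j in range(n)]
            for i in range(n)]


def apply_mask(matrix, mask):
    """Zero out the entries that the current mask has not yet released."""
    return [[value * keep for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(matrix, mask)]


def is_strictly_diagonally_dominant(matrix):
    """Row-wise test: |a_ii| > sum of |a_ij| over j != i, for every row i."""
    for i, row in enumerate(matrix):
        off_diagonal = sum(abs(v) for j, v in enumerate(row) if j != i)
        if abs(row[i]) <= off_diagonal:
            return False
    return True


# Hypothetical 4x4 transform: unit diagonal, small off-diagonal entries.
n = 4
candidate = [[1.0 if i == j else 0.1 for j in range(n)] for i in range(n)]

# Widen the band one step at a time and re-check dominance at each stage.
stages = []
for bandwidth in range(n):
    masked = apply_mask(candidate, band_mask(n, bandwidth))
    stages.append(is_strictly_diagonally_dominant(masked))
```

In this toy setting every stage stays strictly diagonally dominant because the off-diagonal entries are small relative to the diagonal; a real optimizer would have to keep that property while updating the entries.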

8. VmambaIR: Visual State Space Model for Image Restoration

Author: Shi, Yuan, Xia, Bin, Jin, Xiaoyu, Wang, Xing, Zhao, Tianyu, Xia, Xin, Xiao, Xuefeng, and Yang, Wenming
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range dependencies. DMs require large prior models and computationally intensive denoising steps. Transformers have powerful modeling capabilities but face challenges due to quadratic complexity with input image size. To address these challenges, we propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks. We utilize a Unet architecture to stack our proposed Omni Selective Scan (OSS) blocks, consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional modeling limitation of SSMs by efficiently modeling image information flows in all six directions. Furthermore, we conducted a comprehensive evaluation of our VmambaIR across multiple image restoration tasks, including image deraining, single image super-resolution, and real-world image super-resolution. Extensive experimental results demonstrate that our proposed VmambaIR achieves state-of-the-art (SOTA) performance with much fewer computational resources and parameters. Our research highlights the potential of state space models as promising alternatives to the transformer and CNN architectures in serving as foundational frameworks for next-generation low-level visual tasks.
Comment: 23 pages
Published: 2024

9. ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Author: Cheng, Jiaxiang, Xie, Pan, Xia, Xin, Li, Jiashi, Wu, Jie, Ren, Yuxi, Li, Huixia, Xiao, Xuefeng, Zheng, Min, and Fu, Lean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in text-to-image models (e.g., Stable Diffusion) and corresponding personalized technologies (e.g., DreamBooth and LoRA) enable individuals to generate high-quality and imaginative images. However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter), a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods that process images of static resolution with complex post-process operations, ResAdapter directly generates images with dynamic resolution. In particular, after learning a deep understanding of pure resolution priors, ResAdapter, trained on a general dataset, generates resolution-free images with personalized diffusion models while preserving their original style domain. Comprehensive experiments demonstrate that ResAdapter with only 0.5M can process images with flexible resolutions for arbitrary diffusion models. More extended experiments demonstrate that ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter and LCM-LoRA) for image generation across a broad range of resolutions, and can be integrated into other multi-resolution models (e.g., ElasticDiffusion) for efficiently generating higher-resolution images. Project link is https://res-adapter.github.io
Comment: 21 pages, 16 figures
Published: 2024

10. ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Author: Ren, Yuxi, Wu, Jie, Lu, Yanzuo, Kuang, Huafeng, Xia, Xin, Wang, Xionghui, Wang, Qianqian, Zhu, Yixing, Xie, Pan, Wang, Shiyin, Xiao, Xuefeng, Wang, Yitong, Zheng, Min, and Fu, Lean
Editor: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Leonardis, Aleš (editor), Ricci, Elisa (editor), Roth, Stefan (editor), Russakovsky, Olga (editor), Sattler, Torsten (editor), and Varol, Gül (editor)
Published: 2025

11. ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback: Project Page: liming-ai.github.io/ControlNet_Plus_Plus

Author: Li, Ming, Yang, Taojiannan, Kuang, Huafeng, Wu, Jie, Wang, Zhaoning, Xiao, Xuefeng, and Chen, Chen
Editor: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Leonardis, Aleš (editor), Ricci, Elisa (editor), Roth, Stefan (editor), Russakovsky, Olga (editor), Sattler, Torsten (editor), and Varol, Gül (editor)
Published: 2025

12. AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Author: Li, Lijiang, Li, Huixia, Zheng, Xiawu, Wu, Jie, Xiao, Xuefeng, Wang, Rui, Zheng, Min, Pan, Xin, Chao, Fei, and Ji, Rongrong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such a tedious process, reducing steps uniformly is considered an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optimal time steps for different models. Therefore, we propose to search the optimal time step sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two-stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ the FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i) training-free, obtaining the optimal time steps and model architecture without any training process; (ii) orthogonal to most advanced diffusion samplers, so it can be integrated with them to gain better sample quality; and (iii) generalized, where the searched time steps and architectures can be directly applied to different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g., a 17.86 FID score on ImageNet 64 × 64 with only four steps, compared to 138.66 with DDIM. The code is available at https://github.com/lilijiangg/AutoDiffusion.
Published: 2023

13. UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Author: Ren, Yuxi, Wu, Jie, Zhang, Peng, Zhang, Manlin, Xiao, Xuefeng, He, Qian, Wang, Rui, Zheng, Min, and Pan, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.
Published: 2023

14. DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Author: Zhang, Manlin, Wu, Jie, Ren, Yuxi, Li, Ming, Qin, Jie, Xiao, Xuefeng, Liu, Wei, Wang, Rui, Zheng, Min, and Ma, Andy J.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.
Comment: Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine
Published: 2023

15. DLIP: Distilling Language-Image Pre-training

Author: Kuang, Huafeng, Wu, Jie, Zheng, Xiawu, Li, Ming, Xiao, Xuefeng, Wang, Rui, Zheng, Min, and Ji, Rongrong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Vision-Language Pre-training (VLP) shows remarkable progress with the assistance of extremely heavy parameters, which challenges deployment in real applications. Knowledge distillation is well recognized as the essential procedure in model compression. However, existing knowledge distillation techniques lack an in-depth investigation and analysis of VLP, and practical guidelines for VLP-oriented distillation are still not yet explored. In this paper, we present DLIP, a simple yet efficient Distilling Language-Image Pre-training framework, through which we investigate how to distill a light VLP model. Specifically, we dissect the model distillation from multiple dimensions, such as the architecture characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights on distilling a light but performant VLP model. Experimental results reveal that DLIP can achieve a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, e.g., image-text retrieval, image captioning and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.
Published: 2023

16. AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

Author: Li, Ming, Wu, Jie, Wang, Xionghui, Chen, Chen, Qin, Jie, Xiao, Xuefeng, Wang, Rui, Zheng, Min, and Pan, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.
Comment: Camera Ready Version on ICCV 2023. Code and Models are publicly available. Project Page: https://liming-ai.github.io/AlignDet
Published: 2023

17. Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning

Author: Chen, Weifeng, Ji, Yatai, Wu, Jie, Wu, Hefeng, Xie, Pan, Li, Jiashi, Xia, Xin, Xiao, Xuefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods often struggling to produce high-quality and motion-consistent videos. In this work, we introduce Control-A-Video, a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps. To tackle video quality and motion consistency issues, we propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process. Specifically, we employ a first-frame condition scheme to transfer video generation from the image domain. Additionally, we introduce residual-based and optical flow-based noise initialization to infuse motion priors from reference videos, promoting relevance among frame latents for reduced flickering. Furthermore, we present a Spatio-Temporal Reward Feedback Learning (ST-ReFL) algorithm that optimizes the video diffusion model using multiple reward models for video quality and motion consistency, leading to superior outputs. Comprehensive experiments demonstrate that our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation
Published: 2023

18. FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

Author: Qin, Jie, Wu, Jie, Yan, Pengxiang, Li, Ming, Ren, Yuxi, Xiao, Xuefeng, Wang, Yitong, Wang, Rui, Wen, Shilei, Pan, Xin, and Wang, Xingang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories of text-based descriptions, which extends segmentation systems to more general-purpose application scenarios. However, existing methods are devoted to designing specialized architectures or parameters for specific segmentation tasks. These customized design paradigms lead to fragmentation between various segmentation tasks, thus hindering the uniformity of segmentation models. Hence in this paper, we propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation. FreeSeg optimizes an all-in-one network via one-shot training and employs the same architecture and parameters to handle diverse segmentation tasks seamlessly in the inference procedure. Additionally, adaptive prompt learning helps the unified model capture task-aware and category-sensitive concepts, improving model robustness in multi-task and varied scenarios. Extensive experimental results demonstrate that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks, outperforming the best task-specific architectures by a large margin: 5.5% mIoU on semantic segmentation, 17.6% mAP on instance segmentation, and 20.1% PQ on panoptic segmentation for the unseen class on COCO.
Comment: Accepted by CVPR 2023; camera-ready version
Published: 2023

19. Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Author: Ma, Yuexiao, Li, Huixia, Zheng, Xiawu, Xiao, Xuefeng, Wang, Rui, Wen, Shilei, Pan, Xin, Chao, Fei, and Ji, Rongrong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods in practice, benefitting from its data privacy and low computation costs. We argue that oscillation is an overlooked problem in PTQ methods. In this paper, we take the initiative to explore and present a theoretical proof to explain why such a problem is essential in PTQ. We then try to solve this problem by introducing a principled and generalized theoretical framework. In particular, we first formulate the oscillation in PTQ and prove the problem is caused by the difference in module capacity. To this end, we define the module capacity (ModCap) under data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting top-k differentials, in which the corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and is generalized to different neural networks and PTQ methods. For example, with 2/4 bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. It becomes more significant on small model quantization, e.g., it surpasses the BRECQ method by 6.61% on MobileNetV2*0.5.
Comment: Accepted by CVPR 2023
Published: 2023

20. Multi-Objective Evolutionary for Object Detection Mobile Architectures Search

Author: Zhang, Haichao, Li, Jiashi, Xia, Xin, Hao, Kuangrong, and Xiao, Xuefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Recently, neural architecture search (NAS) has achieved great success on classification tasks for mobile devices. The backbone network for object detection is usually obtained on the image classification task. However, an architecture searched through the classification task is sub-optimal because of the gap between the image classification and object detection tasks. Meanwhile, work that focuses on backbone network architecture search for mobile device object detection is limited, mainly because the backbone always requires expensive ImageNet pre-training. Accordingly, it is necessary to study the approach of network architecture search for mobile device object detection without expensive pre-training. In this work, we propose a mobile object detection backbone network architecture search algorithm, which is an evolutionary optimization method based on non-dominated sorting for NAS scenarios. It can quickly search to obtain the backbone network architecture within certain constraints. It better solves the problem of a suboptimal linear combination of accuracy and computational cost. The proposed approach can search backbone networks with different depths, widths, or expansion sizes via a technique of weight mapping, making it possible to use NAS for mobile device detection tasks much more efficiently. In our experiments, we verify the effectiveness of the proposed approach on YoloX-Lite, a lightweight version of the target detection framework. Under similar computational complexity, the accuracy of the backbone network architecture we search for is 2.0% mAP higher than MobileDet. Our improved backbone network can reduce the computational effort while improving the accuracy of the object detection network. To prove its effectiveness, a series of ablation studies have been carried out and the working mechanism has been analyzed in detail.
Published: 2022

21. Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Author: Qin, Jie, Wu, Jie, Li, Ming, Xiao, Xuefeng, Zheng, Min, and Wang, Xingang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Albeit with varying degrees of progress in the field of Semi-Supervised Semantic Segmentation, most of its recent successes involve unwieldy models, and the lightweight solution is still not yet explored. We find that existing knowledge distillation techniques pay more attention to pixel-level concepts from labeled data, which fails to take more informative cues within unlabeled data into account. Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-levels loss setting. Specifically, MGD is formulated as a labeled-unlabeled data cooperative distillation scheme, which helps to take full advantage of diverse data characteristics that are essential in the semi-supervised setting. Image-level semantic-sensitive loss, region-level content-aware loss, and pixel-level consistency loss are set up to enrich hierarchical distillation abstraction via structurally complementary teachers. Experimental results on PASCAL VOC2012 and Cityscapes reveal that MGD can outperform the competitive approaches by a large margin under diverse partition protocols. For example, the performance of ResNet-18 and MobileNet-v2 backbone is boosted by 11.5% and 4.6% respectively under 1/16 partition protocol on Cityscapes. Although the FLOPs of the model backbone are compressed by 3.4-5.3x (ResNet-18) and 38.7-59.6x (MobileNetv2), the model manages to achieve satisfactory segmentation results.
Comment: Accepted by ECCV2022
Published: 2022

22. Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios

Author: Li, Jiashi, Xia, Xin, Li, Wei, Li, Huixia, Wang, Xing, Xiao, Xuefeng, Wang, Rui, Zheng, Min, and Pan, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Due to the complex attention mechanisms and model design, most existing vision Transformers (ViTs) cannot perform as efficiently as convolutional neural networks (CNNs) in realistic industrial deployment scenarios, e.g. TensorRT and CoreML. This poses a distinct challenge: Can a visual neural network be designed to infer as fast as CNNs and perform as powerfully as ViTs? Recent works have tried to design CNN-Transformer hybrid architectures to address this issue, yet the overall performance of these works is far from satisfactory. To this end, we propose a next generation vision Transformer for efficient deployment in realistic industrial scenarios, namely Next-ViT, which dominates both CNNs and ViTs from the perspective of latency/accuracy trade-off. In this work, the Next Convolution Block (NCB) and Next Transformer Block (NTB) are respectively developed to capture local and global information with deployment-friendly mechanisms. Then, Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks. Extensive experiments show that Next-ViT significantly outperforms existing CNNs, ViTs and CNN-Transformer hybrid architectures with respect to the latency/accuracy trade-off across various vision tasks. On TensorRT, Next-ViT surpasses ResNet by 5.5 mAP (from 40.4 to 45.9) on COCO detection and 7.7% mIoU (from 38.8% to 46.5%) on ADE20K segmentation under similar latency. Meanwhile, it achieves comparable performance with CSWin, while the inference speed is accelerated by 3.6x. On CoreML, Next-ViT surpasses EfficientFormer by 4.6 mAP (from 42.6 to 47.2) on COCO detection and 3.5% mIoU (from 45.1% to 48.6%) on ADE20K segmentation under similar latency. Our code and models are made public at: https://github.com/bytedance/Next-ViT
Published: 2022

23. Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

Author: Li, Ming, Wu, Jie, Cai, Jinhang, Qin, Jie, Ren, Yuxi, Xiao, Xuefeng, Zheng, Min, Wang, Rui, and Pan, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recently, Synthetic data-based Instance Segmentation has become an exceedingly favorable optimization paradigm since it leverages simulation rendering and physics to generate high-quality image-annotation pairs. In this paper, we propose a Parallel Pre-trained Transformers (PPT) framework to accomplish the synthetic data-based Instance Segmentation task. Specifically, we leverage the off-the-shelf pre-trained vision Transformers to alleviate the gap between natural and synthetic data, which helps to provide good generalization in the downstream synthetic data scene with few samples. Swin-B-based CBNet V2, Swin-L-based CBNet V2 and Swin-L-based Uniformer are employed for parallel feature learning, and the results of these three models are fused by pixel-level Non-maximum Suppression (NMS) algorithm to obtain more robust results. The experimental results reveal that PPT ranks first in the CVPR2022 AVA Accessibility Vision and Autonomy Challenge, with a 65.155% mAP.
Comment: The solution of 1st Place in AVA Accessibility Vision and Autonomy Challenge on CVPR 2022 workshop. Website: https://accessibility-cv.github.io/
Published: 2022

24. MoCoViT: Mobile Convolutional Vision Transformer

Author: Ma, Hailong, Xia, Xin, Wang, Xing, Xiao, Xuefeng, Li, Jiashi, and Zheng, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, Transformer networks have achieved impressive results on a variety of vision tasks. However, most of them are computationally expensive and not suitable for real-world mobile applications. In this work, we present Mobile Convolutional Vision Transformer (MoCoViT), which improves in performance and efficiency by introducing transformer into mobile convolutional networks to leverage the benefits of both architectures. Different from recent works on vision transformer, the mobile transformer block in MoCoViT is carefully designed for mobile devices and is very lightweight, accomplished through two primary modifications: the Mobile Self-Attention (MoSA) module and the Mobile Feed Forward Network (MoFFN). MoSA simplifies the calculation of the attention map through Branch Sharing scheme while MoFFN serves as a mobile version of MLP in the transformer, further reducing the computation by a large margin. Comprehensive experiments verify that our proposed MoCoViT family outperforms state-of-the-art portable CNNs and transformer neural architectures on various vision tasks. On ImageNet classification, it achieves 74.5% top-1 accuracy at 147M FLOPs, gaining 1.2% over MobileNetV3 with fewer computations. On the COCO object detection task, MoCoViT outperforms GhostNet by 2.1 AP in RetinaNet framework.
Comment: After evaluation, the relevant technical details are temporarily inconvenient to be disclosed, so the manuscript is temporarily withdrawn. We will wait for the right time to reopen
Published: 2022

25. TRT-ViT: TensorRT-oriented Vision Transformer

Author: Xia, Xin, Li, Jiashi, Wu, Jie, Wang, Xing, Xiao, Xuefeng, Zheng, Min, and Wang, Rui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We revisit the existing excellent Transformers from the perspective of practical application. Most of them are not even as efficient as the basic ResNets series and deviate from the realistic deployment scenario. This may be because the current criteria for measuring computation efficiency, such as FLOPs or parameters, are one-sided, sub-optimal, and hardware-insensitive. Thus, this paper directly treats the TensorRT latency on the specific hardware as an efficiency metric, which provides more comprehensive feedback involving computational capacity, memory cost, and bandwidth. Based on a series of controlled experiments, this work derives four practical guidelines for TensorRT-oriented and deployment-friendly network design, e.g., early CNN and late Transformer at stage-level, early Transformer and late CNN at block-level. Accordingly, a family of TensorRT-oriented Transformers is presented, abbreviated as TRT-ViT. Extensive experiments demonstrate that TRT-ViT significantly outperforms existing ConvNets and vision Transformers with respect to the latency/accuracy trade-off across diverse visual tasks, e.g., image classification, object detection and semantic segmentation. For example, at 82.7% ImageNet-1k top-1 accuracy, TRT-ViT is 2.7× faster than CSWin and 2.0× faster than Twins. On the MS-COCO object detection task, TRT-ViT achieves comparable performance with Twins, while the inference speed is increased by 2.8×.
Published: 2022

26. SepViT: Separable Vision Transformer

Author: Li, Wei, Wang, Xing, Xia, Xin, Wu, Jie, Li, Jiashi, Xiao, Xuefeng, Zheng, Min, and Wen, Shiping
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision Transformers have witnessed prevailing success in a series of vision tasks. However, these Transformers often rely on extensive computational costs to achieve high performance, which is burdensome to deploy on resource-constrained devices. To alleviate this issue, we draw lessons from depthwise separable convolution and imitate its ideology to design an efficient Transformer backbone, i.e., Separable Vision Transformer, abbreviated as SepViT. SepViT helps to carry out the local-global information interaction within and among the windows in sequential order via a depthwise separable self-attention. The novel window token embedding and grouped self-attention are employed to compute the attention relationship among windows with negligible cost and establish long-range visual interactions across multiple windows, respectively. Extensive experiments on general-purpose vision benchmarks demonstrate that SepViT can achieve a state-of-the-art trade-off between performance and latency. Among them, SepViT achieves 84.2% top-1 accuracy on ImageNet-1K classification while decreasing the latency by 40%, compared to the ones with similar accuracy (e.g., CSWin). Furthermore, SepViT achieves 51.0% mIoU on ADE20K semantic segmentation task, 47.9 AP on the RetinaNet-based COCO detection task, 49.4 box AP and 44.6 mask AP on Mask R-CNN-based COCO object detection and instance segmentation tasks.
Published: 2022

27. ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer

Author: Yang, Rui, Ma, Hailong, Wu, Jie, Tang, Yansong, Xiao, Xuefeng, Zheng, Min, and Li, Xiu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance in general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification.
Comment: This paper appears at ECCV2022
Published: 2022

28. Highly dispersed CeO2 nanocubics supported on hydrogen substituted graphyne sheets for highly NH3 gas sensing detection and humidity independent at room temperature

Author: Zhang, Chuantao, Yu, Lingmin, Li, Senlin, Cao, Lei, Nan, Ning, Xue, Rushun, Gong, Man, Zhang, Yaxuan, Zhang, Hao, Xiao, Xuefeng, Yang, Shanglin, Fan, Xinhui, and Shi, Peichang
Published: 2025
Full Text: View/download PDF

29. Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation

Author: Qin, Jie, Wu, Jie, Xiao, Xuefeng, Li, Lujun, and Wang, Xingang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image-level weakly supervised semantic segmentation (WSSS) is a fundamental yet challenging computer vision task facilitating scene understanding and automatic driving. Most existing methods resort to classification-based Class Activation Maps (CAMs) to play as the initial pseudo labels, which tend to focus on the discriminative image regions and lack customized characteristics for the segmentation task. To alleviate this issue, we propose a novel activation modulation and recalibration (AMR) scheme, which leverages a spotlight branch and a compensation branch to obtain weighted CAMs that can provide recalibration supervision and task-specific concepts. Specifically, an attention modulation module (AMM) is employed to rearrange the distribution of feature importance from the channel-spatial sequential perspective, which helps to explicitly model channel-wise interdependencies and spatial encodings to adaptively modulate segmentation-oriented activation responses. Furthermore, we introduce a cross pseudo supervision for dual branches, which can be regarded as a semantic similar regularization to mutually refine two branches. Extensive experiments show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with the image-level of supervision but also some methods relying on stronger supervision, such as saliency label. Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
Comment: Accepted by AAAI2022
Published: 2021

30. Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

Author: Li, Shaojie, Wu, Jie, Xiao, Xuefeng, Chao, Fei, Mao, Xudong, and Ji, Rongrong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, a series of algorithms have been explored for GAN compression, which aims to reduce tremendous computational overhead and memory usage when deploying GANs on resource-constrained edge devices. However, most of the existing GAN compression work only focuses on how to compress the generator, while failing to take the discriminator into account. In this work, we revisit the role of discriminator in GAN compression and design a novel generator-discriminator cooperative compression scheme for GAN compression, termed GCC. Within GCC, a selective activation discriminator automatically selects and activates convolutional channels according to a local capacity constraint and a global coordination constraint, which help maintain the Nash equilibrium with the lightweight generator during the adversarial training and avoid mode collapse. The original generator and discriminator are also optimized from scratch, to play as a teacher model to progressively refine the pruned generator and the selective activation discriminator. A novel online collaborative distillation scheme is designed to take full advantage of the intermediate feature of the teacher generator and discriminator to further boost the performance of the lightweight generator. Extensive experiments on various GAN-based generation tasks demonstrate the effectiveness and generalization of GCC. Among them, GCC contributes to reducing 80% computational costs while maintaining comparable performance in image translation tasks. Our code and models are available at https://github.com/SJLeo/GCC.
Comment: Accepted by NeurIPS2021 (The 35th Conference on Neural Information Processing Systems)
Published: 2021

31. Transfer and transformation characteristics of Zn and Cd in soil-rotation plant (Brassica napus L and Oryza sativa L) system and its influencing factors

Author: Yan, Qiuxiao, Fang, Hui, Wang, Daoping, Xiao, Xuefeng, Deng, Tingfei, Li, Xiangying, Wei, Fuxiao, Liu, Jiming, and Lin, Changhu
Published: 2023
Full Text: View/download PDF

32. Online Multi-Granularity Distillation for GAN Compression

Author: Ren, Yuxi, Wu, Jie, Xiao, Xuefeng, and Yang, Jianchao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generative Adversarial Networks (GANs) have witnessed prevailing success in yielding outstanding images; however, they are burdensome to deploy on resource-constrained devices due to ponderous computational costs and hulking memory usage. Although recent efforts on compressing GANs have acquired remarkable results, potential model redundancies still exist and the models can be further compressed. To solve this issue, we propose a novel online multi-granularity distillation (OMGD) scheme to obtain lightweight GANs, which contributes to generating high-fidelity images with low computational demands. We offer the first attempt to popularize single-stage online distillation for GAN-oriented compression, where the progressively promoted teacher generator helps to refine the discriminator-free based student generator. Complementary teacher generators and network layers provide comprehensive and multi-granularity concepts to enhance visual fidelity from diverse dimensions. Experimental results on four benchmark datasets demonstrate that OMGD succeeds in compressing 40x MACs and 82.5x parameters on Pix2Pix and CycleGAN, without loss of image quality. It reveals that OMGD provides a feasible solution for the deployment of real-time image translation on resource-constrained devices. Our code and models are made public at: https://github.com/bytedance/OMGD.
Comment: Accepted by ICCV2021
Published: 2021

33. Network pharmacology and experimental verification to explore the anti-superficial thrombophlebitis mechanism of Mailuo shutong pill

Author: Li, Shirong, Xiao, He, Liu, Mingfei, Wang, Qingguo, Sun, Chenghong, Yao, Jingchun, Cao, Ningning, Zhang, Haifang, Zhang, Guimin, and Xiao, Xuefeng
Published: 2024
Full Text: View/download PDF

34. Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report

Author: Ignatov, Andrey, Malivenko, Grigory, Timofte, Radu, Chen, Sheng, Xia, Xin, Liu, Zhaoyan, Zhang, Yuwei, Zhu, Feng, Li, Jiashi, Xiao, Xuefeng, Tian, Yuan, Wu, Xinglong, Kyrkou, Christos, Chen, Yixin, Zhang, Zexin, Peng, Yunbo, Lin, Yue, Dutta, Saikat, Das, Sourya Dipta, Shah, Nisarg A., Kumar, Himanshu, Ge, Chao, Wu, Pei-Lin, Du, Jin-Hua, Batutin, Andrew, Federico, Juan Pablo, Lyda, Konrad, Khojoyan, Levon, Thanki, Abhishek, Paul, Sayak, and Siddiqui, Shahid
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate a real-time performance on smartphones and IoT platforms. For this, the participants were provided with a large-scale CamSDD dataset consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform that can be found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators and can demonstrate more than 100-200 FPS on the majority of recent smartphone platforms while achieving a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.
Comment: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.08630; text overlap with arXiv:2105.07825, arXiv:2105.07809, arXiv:2105.08629
Published: 2021

35. Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search

Author: Xia, Xin, Xiao, Xuefeng, Wang, Xing, and Zheng, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural Architecture Search (NAS) has attracted growing interest. To reduce the search cost, recent work has explored weight sharing across models and made major progress in One-Shot NAS. However, it has been observed that a model with higher one-shot model accuracy does not necessarily perform better when stand-alone trained. To address this issue, in this paper, we propose Progressive Automatic Design of search space, named PAD-NAS. Unlike previous approaches where the same operation search space is shared by all the layers in the supernet, we formulate a progressive search strategy based on operation pruning and build a layer-wise operation search space. In this way, PAD-NAS can automatically design the operations for each layer and achieve a trade-off between search space quality and model diversity. During the search, we also take the hardware platform constraints into consideration for efficient neural network model deployment. Extensive experiments on ImageNet show that our method can achieve state-of-the-art performance.
Comment: 10 pages, 7 figures
Published: 2020

36. Intestinal flora, intestinal metabolism, and intestinal immunity changes in complete Freud's adjuvant-rheumatoid arthritis C57BL/6 mice

Author: Liu, Mingfei, Li, Shirong, Cao, Ningning, Wang, Qingguo, Liu, Yuhao, Xu, Qianqian, Zhang, Lin, Sun, Chenghong, Xiao, Xuefeng, and Yao, Jingchun
Published: 2023
Full Text: View/download PDF

37. Investigating the use of polymers for eliminating the petroleum pollutants via molecular dynamics method: Analyze the interaction between the formaldehyde as a commonly chemical pollutant and the polypropylene

Author: Ma, Chao, Lei, Yuxi, Li, Weiyin, Xiao, Xuefeng, and Han, Han
Published: 2023
Full Text: View/download PDF

38. Jingfang granules ameliorate inflammation and immune disorders in mice exposed to low temperature and high humidity by restoring the dysregulation of gut microbiota and fecal metabolites

Author: Li, Shirong, Wu, Jieyi, Cao, Ningning, Wang, Qingguo, Zhang, Yuanyuan, Yang, Tianye, Miao, Yu, Pan, Lihong, Xiao, He, Liu, Mingfei, Sun, Chenghong, Yao, Jingchun, and Xiao, Xuefeng
Published: 2023
Full Text: View/download PDF

39. An Empirical Study of Propagation-based Methods for Video Object Segmentation

Author: Guo, Hengkai, Wang, Wenji, Guo, Guanjun, Li, Huaxia, Liu, Jiachen, He, Qian, and Xiao, Xuefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: While propagation-based approaches have achieved state-of-the-art performance for video object segmentation, the literature lacks a fair comparison of different methods using the same settings. In this paper, we carry out an empirical study of propagation-based methods. We view these approaches from a unified perspective and conduct a detailed ablation study for core methods, input cues, multi-object combination and training strategies. With careful designs, our improved end-to-end memory networks achieve a global mean of 76.1 on DAVIS 2017 val set.
Comment: The 2019 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Published: 2019

40. Pharmacokinetics, tissue distribution and excretion of six bioactive components from total glucosides picrorhizae rhizoma, as simultaneous determined by a UHPLC-MS/MS method

Author: Wu, Jieyi, Song, Zhaohui, Cai, Nan, Cao, Ningning, Wang, Qingguo, Xiao, Xuefeng, Yang, Xiaokun, He, Yi, and Zou, Shuxuan
Published: 2023
Full Text: View/download PDF

41. Icaritin induces resolution of inflammation by targeting cathepsin B to prevents mice from ischemia-reperfusion injury

Author: Sun, Chenghong, Cao, Ningning, Wang, Qingguo, Liu, Ning, Yang, Tianye, Li, Shirong, Pan, Lihong, Yao, Jingchun, Zhang, Li, Liu, Mingfei, Zhang, Guimin, Xiao, Xuefeng, and Liu, Changxiao
Published: 2023
Full Text: View/download PDF

42. Preparation, electrical, thermal and mechanical properties of near-stoichiometric lithium tantalate wafers

Author: Xiao, Xuefeng, Xu, Qingyan, Liang, Shuaijie, Zhang, Huan, Ma, Lingling, Hai, Lian, and Zhang, Xuefeng
Published: 2022
Full Text: View/download PDF

43. One-step construction of cubic-like NiS2@MoS2 nanocrystals for improved electrocatalytic performance

Author: Lei, Yuxi, Li, Weiyin, Xiao, Xuefeng, Zhang, Huan, Ma, Tianpeng, and Ma, Chao
Published: 2022
Full Text: View/download PDF

44. Design of a Very Compact CNN Classifier for Online Handwritten Chinese Character Recognition Using DropWeight and Global Pooling

Author: Xiao, Xuefeng, Yang, Yafeng, Ahmad, Tasweer, Jin, Lianwen, and Chang, Tianhai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Currently, owing to the ubiquity of mobile devices, online handwritten Chinese character recognition (HCCR) has become one of the suitable choices for feeding input to cell phones and tablet devices. Over the past few years, larger and deeper convolutional neural networks (CNNs) have extensively been employed for improving character recognition performance. However, their substantial storage requirement is a significant obstacle in deploying such networks into portable electronic devices. To circumvent this problem, we propose a novel technique called DropWeight for pruning redundant connections in the CNN architecture. It is revealed that the proposed method not only treats streamlined architectures such as AlexNet and VGGNet well but also exhibits remarkable performance for deep residual network and inception network. We also demonstrate that global pooling is a better choice for building very compact online HCCR systems. Experiments were performed on the ICDAR-2013 online HCCR competition dataset using our proposed network, and it is found that the proposed approach requires only 0.57 MB for storage, whereas state-of-the-art CNN-based methods require up to 135 MB; meanwhile the performance is decreased only by 0.91%.
Comment: 5 pages, 2 figures, 2 tables
Published: 2017

45. Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition

Author: Xiao, Xuefeng, Jin, Lianwen, Yang, Yafeng, Yang, Weixin, Sun, Jun, and Chang, Tianhai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Like other problems in computer vision, offline handwritten Chinese character recognition (HCCR) has achieved impressive results using convolutional neural network (CNN)-based methods. However, larger and deeper networks are needed to deliver state-of-the-art results in this domain. Such networks intuitively appear to incur high computational cost, and require the storage of a large number of parameters, which renders them unfeasible for deployment in portable devices. To solve this problem, we propose a Global Supervised Low-rank Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to solve the problems of speed and storage capacity. We design a nine-layer CNN for HCCR consisting of 3,755 classes, and devise an algorithm that can reduce the network's computational cost by nine times and compress the network to 1/18 of the original size of the baseline model, with only a 0.21% drop in accuracy. In tests, the proposed algorithm surpassed the best single-network performance reported thus far in the literature while requiring only 2.3 MB for storage. Furthermore, when integrated with our effective forward implementation, the recognition of an offline character image took only 9.7 ms on a CPU. Compared with the state-of-the-art CNN model for HCCR, our approach is approximately 30 times faster, yet 10 times more cost efficient.
Comment: 15 pages, 7 figures, 5 tables
Published: 2017

46. Flocculation of combined contaminants of dye and heavy metal by nano-chitosan flocculants

Author: Sun, Yongjun, Li, Deng, Lu, Xi, Sheng, Jinwei, Zheng, Xing, and Xiao, Xuefeng
Published: 2021
Full Text: View/download PDF

47. Flocculation of heavy metal by functionalized starch-based bioflocculants: Characterization and process evaluation

Author: Xiao, Xuefeng, Sun, Yongjun, Liu, Jianwen, and Zheng, Huaili
Published: 2021
Full Text: View/download PDF

48. Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search

Author: Xia, Xin, primary, Xiao, Xuefeng, additional, and Wang, Xing, additional
Published: 2022
Full Text: View/download PDF

49. ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

Author: Yang, Rui, primary, Ma, Hailong, additional, Wu, Jie, additional, Tang, Yansong, additional, Xiao, Xuefeng, additional, Zheng, Min, additional, and Li, Xiu, additional
Published: 2022
Full Text: View/download PDF

50. Multi-granularity Distillation Scheme Towards Lightweight Semi-supervised Semantic Segmentation

Author: Qin, Jie, primary, Wu, Jie, additional, Li, Ming, additional, Xiao, Xuefeng, additional, Zheng, Min, additional, and Wang, Xingang, additional
Published: 2022
Full Text: View/download PDF
