Author: "Shen, Chunhua" / Publisher: springer nature / Topic: semantic segmentation - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shen, Chunhua"' showing total 5 results

Start Over Author "Shen, Chunhua" Topic semantic segmentation Publisher springer nature

5 results on '"Shen, Chunhua"'

1. Scaling Up Multi-domain Semantic Segmentation with Sentence Embeddings.

Author: Yin, Wei, Liu, Yifan, Shen, Chunhua, Sun, Baichuan, and van den Hengel, Anton
Subjects: COMPUTER vision, GENERALIZATION, ANNOTATIONS, PARAGRAPHS, TAXONOMY
Abstract: The state-of-the-art semantic segmentation methods have achieved impressive performance on predefined close-set individual datasets, but their generalization to zero-shot domains and unseen categories is limited. Labeling a large-scale dataset is challenging and expensive, Training a robust semantic segmentation model on multi-domains has drawn much attention. However, inconsistent taxonomies hinder the naive merging of current publicly available annotations. To address this, we propose a simple solution to scale up the multi-domain semantic segmentation dataset with less human effort. We replace each class label with a sentence embedding, which is a vector-valued embedding of a sentence describing the class. This approach enables the merging of multiple datasets from different domains, each with varying class labels and semantics. We merged publicly available noisy and weak annotations with the most finely annotated data, over 2 million images, which enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom. Instead of manually tuning a consistent label space, we utilized a vector-valued embedding of short paragraphs to describe the classes. By fine-tuning the model on standard semantic segmentation datasets, we also achieve a significant improvement over the state-of-the-art supervised segmentation on NYUD-V2 (Silberman et al., in: European conference on computer vision, Springer, pp 746–760, 2012) and PASCAL-context (Everingham et al. in Int J Comput Visi 111(1):98–136, 2015) at 60 % and 65 % mIoU, respectively. Our method can segment unseen labels based on the closeness of language embeddings, showing strong generalization to unseen image domains and labels. Additionally, it enables impressive performance improvements in some adaptation applications, such as depth estimation and instance segmentation. Code is available at https://github.com/YvanYin/SSIW. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers.

Author: Zhang, Bowen, Liu, Liyang, Phan, Minh Hieu, Tian, Zhi, Shen, Chunhua, and Liu, Yifan
Subjects: TRANSFORMER models, PLAINS, MACHINE learning, VIDEO coding, AUTOMATED teller machines
Abstract: This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder–decoder framework and introduce SegViTv2. In this study, we introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViT. The proposed ATM converts the global attention map into semantic masks for high-quality segmentation results. Our decoder outperforms popular decoder UPerNet using various ViT backbones while consuming only about 5 % of the computational cost. For the encoder, we address the concern of the relatively high computational cost in the ViT-based encoders and propose a Shrunk++ structure that incorporates edge-aware query-based down-sampling (EQD) and query-based up-sampling (QU) modules. The Shrunk++ structure reduces the computational cost of the encoder by up to 50 % while maintaining competitive performance. Furthermore, we propose to adapt SegViT for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge. Experiments show that our proposed SegViTv2 surpasses recent segmentation methods on three popular benchmarks including ADE20k, COCO-Stuff-10k and PASCAL-Context datasets. The code is available through the following link: https://github.com/zbwxp/SegVit. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data.

Author: Yan, Yan, Shu, Ying, Chen, Si, Xue, Jing-Hao, Shen, Chunhua, and Wang, Hanzi
Subjects: DEEP learning, ROTATIONAL motion
Abstract: Existing deep learning-based facial attribute recognition (FAR) methods rely heavily on large-scale labeled training data. Unfortunately, in many real-world applications, only limited labeled data are available, resulting in the performance deterioration of these methods. To address this issue, we propose a novel spatial-semantic patch learning network (SPL-Net), consisting of a multi-branch shared subnetwork (MSS), three auxiliary task subnetworks (ATS), and an FAR subnetwork, for attribute classification with limited labeled data. Considering the diversity of facial attributes, MSS includes a task-shared branch and four region branches, each of which contains cascaded dual cross attention modules to extract region-specific features. SPL-Net involves a two-stage learning procedure. In the first stage, MSS and ATS are jointly trained to perform three auxiliary tasks (i.e., a patch rotation task (PRT), a patch segmentation task (PST), and a patch classification task (PCT)), which exploit the spatial-semantic relationship on large-scale unlabeled facial data from various perspectives. Specifically, PRT encodes the spatial information of facial images based on self-supervised learning. PST and PCT respectively capture the pixel-level and image-level semantic information of facial images by leveraging a facial parsing model. Thus, a well-pretrained MSS is obtained. In the second stage, based on the pre-trained MSS, an FAR model is easily fine-tuned to predict facial attributes by requiring only a small amount of labeled data. Experimental results on challenging facial attribute datasets (including CelebA, LFWA, and MAAD) show the superiority of SPL-Net over several state-of-the-art methods in the case of limited labeled data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. Structured Binary Neural Networks for Image Recognition.

Author: Zhuang, Bohan, Shen, Chunhua, Tan, Mingkui, Chen, Peng, Liu, Lingqiao, and Reid, Ian
Subjects: *DEEP learning, *IMAGE recognition (Computer vision), *OBJECT recognition (Computer vision), *CONVOLUTIONAL neural networks, *MOBILE learning
Abstract: In this paper, we propose to train binarized convolutional neural networks (CNNs) that are of significant importance for deploying deep learning to mobile devices with limited power capacity and computing resources. Previous works on quantizing CNNs often seek to approximate the floating-point information of weights and/or activations using a set of discrete values. Such methods, termed value approximation here, typically are built on the same network architecture of the full-precision counterpart. Instead, we take a new "structured approximation" view for network quantization — it is possible and valuable to exploit flexible architecture transformation when learning low-bit networks, which can achieve even better performance than the original networks in some cases. In particular, we propose a "group decomposition" strategy, termed GroupNet, which divides a network into desired groups. Interestingly, with our GroupNet strategy, each full-precision group can be effectively reconstructed by aggregating a set of homogeneous binary branches. We also propose to learn effective connections among groups to improve the representation capability. To improve the model capacity, we propose to dynamically execute sparse binary branches conditioned on input features while preserving the computational cost. More importantly, the proposed GroupNet shows strong flexibility for a few vision tasks. For instance, we extend the GroupNet for accurate semantic segmentation by embedding the rich context into the binary structure. The proposed GroupNet also shows strong performance on object detection. Experiments on image classification, semantic segmentation, and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature. Moreover, the speedup and runtime memory cost evaluation comparing with related quantization strategies is analyzed on GPU platforms, which serves as a strong benchmark for further research. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

5. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation.

Author: Yu, Changqian, Gao, Changxin, Wang, Jingbo, Yu, Gang, Shen, Chunhua, and Sang, Nong
Subjects: SEMANTICS, DEEP learning
Abstract: Low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, leading to a considerable decrease in accuracy. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation. For this purpose, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves the following: (i) A detail branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) A semantics branch, with narrow channels and deep layers to obtain high-level semantic context. The detail branch has wide channel dimensions and shallow layers, while the semantics branch has narrow channel dimensions and deep layers. Due to the reduction in the channel capacity and the use of a fast-downsampling strategy, the semantics branch is lightweight and can be implemented by any efficient model. We design a guided aggregation layer to enhance mutual connections and fuse both types of feature representation. Moreover, a booster training strategy is designed to improve the segmentation performance without any extra inference cost. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture shows favorable performance compared to several state-of-the-art real-time semantic segmentation approaches. Specifically, for a 2048 × 1024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy. The code and trained models are available online at https://git.io/BiSeNet. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

5 results on '"Shen, Chunhua"'

1. Scaling Up Multi-domain Semantic Segmentation with Sentence Embeddings.

2. SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers.

3. SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data.

4. Structured Binary Neural Networks for Image Recognition.

5. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

5 results on '"Shen, Chunhua"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources