
End-to-end weakly supervised semantic segmentation with reliable region mining.

Authors :
Zhang, Bingfeng
Xiao, Jimin
Wei, Yunchao
Huang, Kaizhu
Luo, Shan
Zhao, Yao
Source :
Pattern Recognition, Vol. 128, August 2022.
Publication Year :
2022

Abstract

Highlights:

• We extend our previous work and design a more powerful end-to-end network for weakly supervised semantic segmentation.
• We propose two new loss functions for utilizing the reliable labels: a new dense energy loss and a batch-based class distance loss. The former relies on shallow features, whilst the latter focuses on distinguishing high-level semantic features of different classes.
• We design a new attention module to extract comprehensive global information. Using a re-weighting technique, it suppresses dominant or noisy attention values and aggregates sufficient global information.
• Our approach achieves a new state-of-the-art performance for weakly supervised semantic segmentation.

Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision yet produces pixel-level predictions at test time. To address it, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, such two-step approaches suffer from high complexity and are hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into tiny reliable object/background regions. Such reliable regions then directly serve as ground-truth labels for the segmentation branch, where both global-information and local-information sub-branches are used to generate accurate pixel-level predictions. Furthermore, a new joint loss is proposed that considers both shallow and high-level features.
Despite its apparent simplicity, our end-to-end solution achieves competitive mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC compared with its two-step counterparts. By extending our one-step method to two steps, we obtain a new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM. [ABSTRACT FROM AUTHOR]
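The core idea the abstract describes — pruning class activation maps into tiny reliable object/background regions that then serve as pseudo ground truth — can be sketched roughly as follows. This is a minimal illustration of the general CAM-thresholding pattern, not the authors' implementation; the function name `reliable_regions` and the threshold values are assumptions for demonstration.

```python
# Hedged sketch: splitting a class activation map (CAM) into reliable
# foreground, reliable background, and an ignored (uncertain) region.
# Thresholds are illustrative, not values from the paper.
import numpy as np

def reliable_regions(cam, fg_thresh=0.7, bg_thresh=0.1):
    """Return a per-pixel pseudo-label map from a single-class CAM:
    1 = reliable object, 0 = reliable background, -1 = ignored."""
    # Normalize activations to [0, 1] so fixed thresholds are meaningful.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    labels = np.full(cam.shape, -1, dtype=np.int64)  # default: uncertain, ignored by the loss
    labels[cam >= fg_thresh] = 1                     # confidently activated -> object pixel
    labels[cam <= bg_thresh] = 0                     # confidently inactive -> background pixel
    return labels

cam = np.array([[0.0, 0.5],
                [0.9, 1.0]])
print(reliable_regions(cam))
```

Only the high- and low-activation pixels receive labels; the uncertain middle band stays at -1 so a segmentation loss can skip it, which is what makes the mined regions "reliable".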

Details

Language :
English
ISSN :
00313203
Volume :
128
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
156629223
Full Text :
https://doi.org/10.1016/j.patcog.2022.108663