2,569 results on '"Huang, Thomas S."'
Search Results
2. Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection
- Author
-
Li, Jiachen, Cheng, Bowen, Feris, Rogerio, Xiong, Jinjun, Huang, Thomas S., Hwu, Wen-Mei, and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union~(IoU) metric. In this paper, we present \textbf{Pseudo-Intersection-over-Union~(Pseudo-IoU)}: a simple metric that brings more standardized and accurate assignment rule into anchor-free object detection frameworks without any additional computational cost or extra parameters for training and testing, making it possible to further improve anchor-free object detection by utilizing training samples of good quality under effective assignment rules that have been previously applied in anchor-based methods. By incorporating Pseudo-IoU metric into an end-to-end single-stage anchor-free object detection framework, we observe consistent improvements in their performance on general object detection benchmarks such as PASCAL VOC and MSCOCO. Our method (single-model and single-scale) also achieves comparable performance to other recent state-of-the-art anchor-free methods without bells and whistles. Our code is based on mmdetection toolbox and will be made publicly available at https://github.com/SHI-Labs/Pseudo-IoU-for-Anchor-Free-Object-Detection., Comment: CVPR 2021 Workshop
- Published
- 2021
3. CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation
- Author
-
Fu, Yang, Yang, Linjie, Liu, Ding, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects and they suffer in the video scenario due to several distinct challenges such as motion blur and drastic appearance change. To eliminate ambiguities introduced by only using single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a siamese design by incorporating both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of proposed CompFeat. Our code will be available at https://github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation., Comment: Accepted to AAAI 2021
- Published
- 2020
4. Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation
- Author
-
Yu, Hanchao, Chen, Xiao, Shi, Humphrey, Chen, Terrence, Huang, Thomas S., and Sun, Shanhui
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardium strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion field. We then use a novel cyclic teacher-student training strategy to make the inference end-to-end and further improve the tracking performance. Our teacher model provides more accurate motion estimation as supervision through progressive motion compensations. Our student model learns from the teacher model to estimate motion in a single step while maintaining accuracy. The teacher-student knowledge distillation is performed in a cyclic way for a further performance boost. Our proposed method outperforms a strong baseline model on two public available clinical datasets significantly, evaluated by a variety of metrics and the inference time. New evaluation metrics are also proposed to represent errors in a clinically meaningful manner., Comment: Accepted by MICCAI2020
- Published
- 2020
5. Neural Sparse Representation for Image Restoration
- Author
-
Fan, Yuchen, Yu, Jiahui, Mei, Yiqun, Zhang, Yulun, Fu, Yun, Liu, Ding, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints upon hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and attachable to convolution layers in various networks. Sparsity in neurons enables computation saving by only operating on non-zero components without hurting accuracy. Meanwhile, our method can magnify representation dimensionality and model capacity with negligible additional computation cost. Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifacts removal. Code is available at https://github.com/ychfan/nsr
- Published
- 2020
6. Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
- Author
-
Mei, Yiqun, Fan, Yuchen, Zhou, Yuqian, Huang, Lichao, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing ,Statistics - Machine Learning - Abstract
Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module by setting new state-of-the-arts on multiple SISR benchmarks., Comment: CVPR2020
- Published
- 2020
7. NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results
- Author
-
Zhang, Kai, Gu, Shuhang, Timofte, Radu, Shang, Taizhang, Dai, Qiuju, Zhu, Shengchen, Yang, Tong, Guo, Yandong, Jo, Younghyun, Yang, Sejong, Kim, Seon Joo, Zha, Lin, Jiang, Jiande, Gao, Xinbo, Lu, Wen, Liu, Jing, Yoon, Kwangjin, Jeon, Taegyun, Akita, Kazutoshi, Ooba, Takeru, Ukita, Norimichi, Luo, Zhipeng, Yao, Yuehan, Xu, Zhenyu, He, Dongliang, Wu, Wenhao, Ding, Yukang, Li, Chao, Li, Fu, Wen, Shilei, Li, Jianwei, Yang, Fuzhi, Yang, Huan, Fu, Jianlong, Kim, Byung-Hoon, Baek, JaeHyun, Ye, Jong Chul, Fan, Yuchen, Huang, Thomas S., Lee, Junyeop, Lee, Bokyeung, Min, Jungki, Kim, Gwantae, Lee, Kanghyu, Park, Jaihyun, Mykhailych, Mykola, Zhong, Haoyu, Shi, Yukai, Yang, Xiaojun, Yang, Zhijing, Lin, Liang, Zhao, Tongtong, Peng, Jinjia, Wang, Huibing, Jin, Zhi, Wu, Jiahao, Chen, Yifu, Shang, Chenming, Zhang, Huanrong, Min, Jeongki, S, Hrishikesh P, Puthussery, Densen, and C V, Jiji
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable to produce high resolution results with the best perceptual quality and similar to the ground truth. The track had 280 registered participants, and 19 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution., Comment: CVPRW 2020
- Published
- 2020
8. Pyramid Attention Networks for Image Restoration
- Author
-
Mei, Yiqun, Fan, Yuchen, Zhang, Yulun, Yu, Jiahui, Zhou, Yuqian, Liu, Ding, Fu, Yun, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing ,Statistics - Machine Learning - Abstract
Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales. However, recent advanced deep convolutional neural network based methods for image restoration do not take full advantage of self-similarities by relying on self-attention neural modules that only process information at the same scale. To solve this problem, we present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences from a multi-scale feature pyramid. Inspired by the fact that corruptions, such as noise or compression artifacts, drop drastically at coarser image scales, our attention module is designed to be able to borrow clean signals from their "clean" correspondences at the coarser levels. The proposed pyramid attention module is a generic building block that can be flexibly integrated into various neural architectures. Its effectiveness is validated through extensive experiments on multiple image restoration tasks: image denoising, demosaicing, compression artifact reduction, and super resolution. Without any bells and whistles, our PANet (pyramid attention module with simple network backbones) can produce state-of-the-art results with superior accuracy and visual quality. Our code will be available at https://github.com/SHI-Labs/Pyramid-Attention-Networks
- Published
- 2020
9. The 1st Agriculture-Vision Challenge: Methods and Results
- Author
-
Chiu, Mang Tik, Xu, Xingqian, Wang, Kai, Hobbs, Jennifer, Hovakimyan, Naira, Huang, Thomas S., Shi, Honghui, Wei, Yunchao, Huang, Zilong, Schwing, Alexander, Brunner, Robert, Dozier, Ivan, Dozier, Wyatt, Ghandilyan, Karen, Wilson, David, Park, Hyunseong, Kim, Junhee, Kim, Sungho, Liu, Qinghui, Kampffmeyer, Michael C., Jenssen, Robert, Salberg, Arnt B., Barbosa, Alexandre, Trevisan, Rodrigo, Zhao, Bingchen, Yu, Shaozuo, Yang, Siwei, Wang, Yin, Sheng, Hao, Chen, Xiao, Su, Jingyi, Rajagopal, Ram, Ng, Andrew, Huynh, Van Thong, Kim, Soo-Hyung, Na, In-Seop, Baid, Ujjwal, Innani, Shubham, Dutande, Prasad, Baheti, Bhakti, Talbar, Sanjay, and Tang, Jianyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries compete to achieve state-of-the-art in aerial agriculture semantic segmentation. The Agriculture-Vision Challenge Dataset was employed, which comprises of 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will continue to open for researchers that are interested in this challenge dataset and task; the link can be found here., Comment: CVPR 2020 Workshop
- Published
- 2020
10. Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation
- Author
-
Wang, Zhonghao, Wei, Yunchao, Feris, Rogerior, Xiong, Jinjun, Hwu, Wen-Mei, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Learning segmentation from synthetic data and adapting to real data can significantly relieve human efforts in labelling pixel-level masks. A key challenge of this task is how to alleviate the data distribution discrepancy between the source and target domains, i.e. reducing domain shift. The common approach to this problem is to minimize the discrepancy between feature distributions from different domains through adversarial training. However, directly aligning the feature distribution globally cannot guarantee consistency from a local view (i.e. semantic-level), which prevents certain semantic knowledge learned on the source domain from being applied to the target domain. To tackle this issue, we propose a semi-supervised approach named Alleviating Semantic-level Shift (ASS), which can successfully promote the distribution consistency from both global and local views. Specifically, leveraging a small number of labeled data from the target domain, we directly extract semantic-level feature representations from both the source and the target domains by averaging the features corresponding to same categories advised by pixel-level masks. We then feed the produced features to the discriminator to conduct semantic-level adversarial learning, which collaborates with the adversarial learning from the global view to better alleviate the domain shift. We apply our ASS to two domain adaptation tasks, from GTA5 to Cityscapes and from Synthia to Cityscapes. Extensive experiments demonstrate that: (1) ASS can significantly outperform the current unsupervised state-of-the-arts by employing a small number of annotated samples from the target domain; (2) ASS can beat the oracle model trained on the whole target dataset by over 3 points by augmenting the synthetic source data with annotated samples from the target domain without suffering from the prevalent problem of overfitting to the source domain., Comment: CVPRW 2020
- Published
- 2020
11. Laplacian Denoising Autoencoder
- Author
-
Jiao, Jianbo, Bao, Linchao, Wei, Yunchao, He, Shengfeng, Shi, Honghui, Lau, Rynson, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale. Therefore, learning robust representations with unlabeled data is critical in relieving human effort and vital for many downstream tasks. Recent advances in unsupervised and self-supervised learning approaches for visual data have benefited greatly from domain knowledge. Here we are interested in a more generic unsupervised learning framework that can be easily generalized to other domains. In this paper, we propose to learn data representations with a novel type of denoising autoencoder, where the noisy input data is generated by corrupting latent clean data in the gradient domain. This can be naturally generalized to span multiple scales with a Laplacian pyramid representation of the input data. In this way, the agent learns more robust representations that exploit the underlying data structures across multiple scales. Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach, compared to its counterpart with single-scale corruption and other approaches. Furthermore, we also demonstrate that the learned representations perform well when transferring to other downstream vision tasks.
- Published
- 2020
12. Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation
- Author
-
Wang, Zhonghao, Yu, Mo, Wei, Yunchao, Feris, Rogerio, Xiong, Jinjun, Hwu, Wen-mei, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work. State-of-the-art approaches prove that performing semantic-level alignment is helpful in tackling the domain shift issue. Based on the observation that stuff categories usually share similar appearances across images of different domains while things (i.e. object instances) have much larger differences, we propose to improve the semantic-level alignment with different strategies for stuff regions and for things: 1) for the stuff categories, we generate feature representation for each class and conduct the alignment operation from the target domain to the source domain; 2) for the thing categories, we generate feature representation for each individual instance and encourage the instance in the target domain to align with the most similar one in the source domain. In this way, the individual differences within thing categories will also be considered to alleviate over-alignment. In addition to our proposed method, we further reveal the reason why the current adversarial loss is often unstable in minimizing the distribution discrepancy and show that our method can help ease this issue by minimizing the most similar stuff and instance features between the source and the target domains. We conduct extensive experiments in two unsupervised domain adaptation tasks, i.e. GTA5 to Cityscapes and SYNTHIA to Cityscapes, and achieve the new state-of-the-art segmentation accuracy., Comment: CVPR 2020
- Published
- 2020
13. Deep Affinity Net: Instance Segmentation via Affinity
- Author
-
Xu, Xingqian, Chiu, Mang Tik, Huang, Thomas S., and Shi, Honghui
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Most of the modern instance segmentation approaches fall into two categories: region-based approaches in which object bounding boxes are detected first and later used in cropping and segmenting instances; and keypoint-based approaches in which individual instances are represented by a set of keypoints followed by a dense pixel clustering around those keypoints. Despite the maturity of these two paradigms, we would like to report an alternative affinity-based paradigm where instances are segmented based on densely predicted affinities and graph partitioning algorithms. Such affinity-based approaches indicate that high-level graph features other than regions or keypoints can be directly applied in the instance segmentation task. In this work, we propose Deep Affinity Net, an effective affinity-based approach accompanied with a new graph partitioning algorithm Cascade-GAEC. Without bells and whistles, our end-to-end model results in 32.4% AP on Cityscapes val and 27.5% AP on test. It achieves the best single-shot result as well as the fastest running time among all affinity-based models. It also outperforms the region-based method Mask R-CNN.
- Published
- 2020
14. AlignSeg: Feature-Aligned Segmentation Networks
- Author
-
Huang, Zilong, Wei, Yunchao, Wang, Xinggang, Liu, Wenyu, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most of the current popular network architectures tend to ignore the misalignment issues during the feature aggregation process caused by 1) step-by-step downsampling operations, and 2) indiscriminate contextual information fusion. In this paper, we explore the principles in addressing such feature misalignment issues and inventively propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment issue caused by multiresolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to choose private custom contextual information in an adaptive manner, making the contextual embeddings aligned better to provide appropriate guidance. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6% and 45.95%, respectively. Our source code will be made available., Comment: Accepted by TPAMI 2021
- Published
- 2020
15. Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis
- Author
-
Chiu, Mang Tik, Xu, Xingqian, Wei, Yunchao, Huang, Zilong, Schwing, Alexander, Brunner, Robert, Khachatrian, Hrant, Karapetyan, Hovnatan, Dozier, Ivan, Rose, Greg, Wilson, David, Tudor, Adrian, Hovakimyan, Naira, Huang, Thomas S., and Shi, Honghui
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computers and Society ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
The success of deep learning in visual recognition tasks has driven advancements in multiple fields of research. Particularly, increasing attention has been drawn towards its application in agriculture. Nevertheless, while visual pattern recognition on farmlands carries enormous economic values, little progress has been made to merge computer vision and crop sciences due to the lack of suitable agricultural image datasets. Meanwhile, problems in agriculture also pose new challenges in computer vision. For example, semantic segmentation of aerial farmland images requires inference over extremely large-size images with extreme annotation sparsity. These challenges are not present in most of the common object datasets, and we show that they are more challenging than many other aerial image datasets. To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns. We collected 94,986 high-quality aerial images from 3,432 farmlands across the US, where each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel. We annotate nine types of field anomaly patterns that are most important to farmers. As a pilot study of aerial agricultural semantic segmentation, we perform comprehensive experiments using popular semantic segmentation models; we also propose an effective model designed for aerial agricultural pattern recognition. Our experiments demonstrate several challenges Agriculture-Vision poses to both the computer vision and agriculture communities. Future versions of this dataset will include even more aerial images, anomaly patterns and image channels. More information at https://www.agriculture-vision.com., Comment: CVPR 2020
- Published
- 2020
16. Scale-wise Convolution for Image Restoration
- Author
-
Fan, Yuchen, Yu, Jiahui, Liu, Ding, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep networks based image restoration. Naively applying those scale-invariant techniques (e.g. multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling scale-invariance into neural networks can bring significant benefits to image restoration performance. Inspired from spatial-wise convolution for shift-invariance, "scale-wise convolution" is proposed to convolve across multiple scales for scale-invariance. In our scale-wise convolutional network (SCN), we first map the input image to the feature space and then build a feature pyramid representation via bi-linear down-scaling progressively. The feature pyramid is then passed to a residual network with scale-wise convolutions. The proposed scale-wise convolution learns to dynamically activate and aggregate features from different input scales in each residual building block, in order to exploit contextual information on multiple scales. In experiments, we compare the restoration accuracy and parameter efficiency among our model and many different variants of multi-scale neural networks. The proposed network with scale-wise convolution achieves superior performance in multiple image restoration tasks including image super-resolution, image denoising and image compression artifacts removal. Code and models are available at: https://github.com/ychfan/scn_sr, Comment: AAAI 2020
- Published
- 2019
17. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
- Author
-
Cheng, Bowen, Collins, Maxwell D., Zhu, Yukun, Liu, Ting, Huang, Thomas S., Adam, Hartwig, and Chen, Liang-Chieh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025x2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1 PQ% on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation., Comment: CVPR 2020
- Published
- 2019
18. Any-Precision Deep Neural Networks
- Author
-
Yu, Haichao, Li, Haoxiang, Shi, Honghui, Huang, Thomas S., and Hua, Gang
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
We present any-precision deep neural networks (DNNs), which are trained with a new method that allows the learned DNNs to be flexible in numerical precision during inference. The same model in runtime can be flexibly and directly set to different bit-widths, by truncating the least significant bits, to support dynamic speed and accuracy trade-off. When all layers are set to low-bits, we show that the model achieved accuracy comparable to dedicated models trained at the same precision. This nice property facilitates flexible deployment of deep learning models in real-world applications, where in practice trade-offs between model accuracy and runtime efficiency are often sought. Previous literature presents solutions to train models at each individual fixed efficiency/accuracy trade-off point. But how to produce a model flexible in runtime precision is largely unexplored. When the demand of efficiency/accuracy trade-off varies from time to time or even dynamically changes in runtime, it is infeasible to re-train models accordingly, and the storage budget may forbid keeping multiple models. Our proposed framework achieves this flexibility without performance degradation. More importantly, we demonstrate that this achievement is agnostic to model architectures and applicable to multiple vision tasks. Our code is released at https://github.com/SHI-Labs/Any-Precision-DNNs., Comment: AAAI 2021
- Published
- 2019
19. Panoptic-DeepLab
- Author
-
Cheng, Bowen, Collins, Maxwell D., Zhu, Yukun, Liu, Ting, Huang, Thomas S., Adam, Hartwig, and Chen, Liang-Chieh
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing ,Statistics - Machine Learning - Abstract
We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state-of-art at all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set, and advances results on the other challenging Mapillary Vistas., Comment: This work is presented at ICCV 2019 Joint COCO and Mapillary Recognition Challenge Workshop
- Published
- 2019
20. One-to-one Mapping for Unpaired Image-to-image Translation
- Author
-
Shen, Zengming, Zhou, S. Kevin, Chen, Yifan, Georgescu, Bogdan, Liu, Xuqi, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Recently image-to-image translation has attracted significant interests in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function by simply augmenting the training samples by swapping inputs and outputs during training and with separated cycle consistency loss for each mapping direction. The outcome of such learning is a proven one-to-one mapping function. Our extensive experiments on a variety of datasets, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear improvement over the CycleGAN method both qualitatively and quantitatively. Especially our proposed method reaches the state-of-the-art result on the cityscapes benchmark dataset for the label to photo unpaired directional image translation., Comment: Accepted by WACV 2020
- Published
- 2019
21. Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation
- Author
-
Shen, Zengming, Chen, Yifan, Zhou, S. Kevin, Georgescu, Bogdan, Liu, Xuqi, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
The one-to-one mapping is necessary for many bidirectional image-to-image translation applications, such as MRI image synthesis as MRI images are unique to the patient. State-of-the-art approaches for image synthesis from domain X to domain Y learn a convolutional neural network that meticulously maps between the domains. A different network is typically implemented to map along the opposite direction, from Y to X. In this paper, we explore the possibility of only wielding one network for bi-directional image synthesis. In other words, such an autonomous learning network implements a self-inverse function. A self-inverse network shares several distinct advantages: only one network instead of two, better generalization and more restricted parameter space. Most importantly, a self-inverse function guarantees a one-to-one mapping, a property that cannot be guaranteed by earlier approaches that are not self-inverse. The experiments on three datasets show that, compared with the baseline approaches that use two separate models for the image synthesis along two directions, our self-inverse network achieves better synthesis results in terms of standard metrics. Finally, our sensitivity analysis confirms the feasibility of learning a self-inverse function for the bidirectional image translation., Comment: 10 pages, 9 figures
- Published
- 2019
22. HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
- Author
-
Cheng, Bowen, Xiao, Bin, Wang, Jingdong, Shi, Honghui, Huang, Thomas S., and Zhang, Lei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small person. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium person on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scene. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation., Comment: CVPR 2020
- Published
- 2019
23. FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
- Author
-
Yang, Yingzhen, Yu, Jiahui, Jojic, Nebojsa, Huan, Jun, and Huang, Thomas S.
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
We present a novel method of compression of deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. The proposed method reduces the number of parameters of each convolutional layer by learning a 1D vector termed Filter Summary (FS). The convolutional filters are located in FS as overlapping 1D segments, and nearby filters in FS share weights in their overlapping regions in a natural way. The resultant neural network based on such weight sharing scheme, termed Filter Summary CNNs or FSNet, has a FS in each convolution layer instead of a set of independent filters in the conventional convolution layer. FSNet has the same architecture as that of the baseline CNN to be compressed, and each convolution layer of FSNet has the same number of filters from FS as that of the basline CNN in the forward process. With compelling computational acceleration ratio, the parameter space of FSNet is much smaller than that of the baseline CNN. In addition, FSNet is quantization friendly. FSNet with weight quantization leads to even higher compression ratio without noticeable performance loss. We further propose Differentiable FSNet where the way filters share weights is learned in a differentiable and end-to-end manner. Experiments demonstrate the effectiveness of FSNet in compression of CNNs for computer vision tasks including image classification and object detection, and the effectiveness of DFSNet is evidenced by the task of Neural Architecture Search., Comment: published at ICLR 2020
- Published
- 2019
24. An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity
- Author
-
Yang, Yingzhen, Yu, Jiahui, Li, Xingjian, Huan, Jun, and Huang, Thomas S.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Regularization of Deep Neural Networks (DNNs) for the sake of improving their generalization capability is important and challenging. The development in this line benefits theoretical foundation of DNNs and promotes their usability in different areas of artificial intelligence. In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel regularizer rooted in Local Rademacher Complexity (LRC). While Rademacher complexity is well known as a distribution-free complexity measure of function class that help boost generalization of statistical learning methods, extensive study shows that LRC, its counterpart focusing on a restricted function class, leads to sharper convergence rates and potential better generalization given finite training sample. Our LRC based regularizer is developed by estimating the complexity of the function class centered at the minimizer of the empirical loss of DNNs. Experiments on various types of network architecture demonstrate the effectiveness of LRC regularization in improving generalization. Moreover, our method features the state-of-the-art result on the CIFAR-$10$ dataset with network architecture found by neural architecture search., Comment: Updated the link to the open source PaddlePaddle code of LRC Regularization as well as the author list
- Published
- 2019
25. CCNet: Criss-Cross Attention for Semantic Segmentation
- Author
-
Huang, Zilong, Wang, Xinggang, Wei, Yunchao, Huang, Lichao, Shi, Humphrey, Liu, Wenyu, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet is with the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention significantly reduces FLOPs by about 85% of the non-local block. 3) The state-of-the-art performance. We conduct extensive experiments on semantic segmentation benchmarks including Cityscapes, ADE20K, human parsing benchmark LIP, instance segmentation benchmark COCO, video segmentation benchmark CamVid. In particular, our CCNet achieves the mIoU scores of 81.9%, 45.76% and 55.47% on the Cityscapes test set, the ADE20K validation set and the LIP validation set respectively, which are the new state-of-the-art results. The source codes are available at \url{https://github.com/speedinghzl/CCNet}., Comment: IEEE TPAMI 2020 & ICCV 2019
- Published
- 2018
26. A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization
- Author
-
Cheng, Bowen, Wei, Yunchao, Yu, Jiahui, Chang, Shiyu, Xiong, Jinjun, Hwu, Wen-Mei, Huang, Thomas S., and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
While training on samples drawn from independent and identical distribution has been a de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner and on the selected examples progressively. Driven by this fact, we investigate the training paradigms where the samples are not drawn from independent and identical distribution. We propose a data sampling strategy, named Drop-and-Refresh (DaR), motivated by the learning behaviors of humans that selectively drop easy samples and refresh them only periodically. We show in our experiments that the proposed DaR strategy can maintain (and in many cases improve) the predictive accuracy even when the training cost is reduced by 15% on various datasets (CIFAR 10, CIFAR 100 and ImageNet) and with different backbone architectures (ResNets, DenseNets and MobileNets). Furthermore and perhaps more importantly, we find the ImageNet pre-trained models using our DaR sampling strategy achieves better transferability for the downstream tasks including object detection (+0.3 AP), instance segmentation (+0.3 AP), scene parsing (+0.5 mIoU) and human pose estimation (+0.6 AP). Our investigation encourages people to rethink the connections between the sampling strategy for training and the transferability of its learned features for pre-training ImageNet models., Comment: Technical report
- Published
- 2018
27. Connecting Image Denoising and High-Level Vision Tasks via Deep Learning
- Author
-
Liu, Ding, Wen, Bihan, Jiao, Jianbo, Liu, Xianming, Wang, Zhangyang, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image denoising and high-level vision tasks are usually handled independently in the conventional practice of computer vision, and their connection is fragile. In this paper, we cope with the two jointly and explore the mutual influence between them with the focus on two questions, namely (1) how image denoising can help improving high-level vision tasks, and (2) how the semantic information from high-level vision tasks can be used to guide image denoising. First for image denoising we propose a convolutional neural network in which convolutions are conducted in various spatial resolutions via downsampling and upsampling operations in order to fuse and exploit contextual information on different scales. Second we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and use the joint loss for updating only the denoising network via back-propagation. We experimentally show that on one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising, Comment: arXiv admin note: text overlap with arXiv:1706.04284
- Published
- 2018
28. Non-Local Recurrent Network for Image Restoration
- Author
-
Liu, Ding, Wen, Bihan, Fan, Yuchen, Loy, Chen Change, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. However, it remains unclear and challenging to make use of this intrinsic property via deep networks. In this paper, we propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. The main contributions of this work are: (1) Unlike existing methods that measure self-similarity in an isolated manner, the proposed non-local module can be flexibly integrated into existing deep networks for end-to-end training to capture deep feature correlation between each location and its neighborhood. (2) We fully employ the RNN structure for its parameter efficiency and allow deep feature correlation to be propagated along adjacent recurrent states. This new design boosts robustness against inaccurate correlation estimation due to severely degraded images. (3) We show that it is essential to maintain a confined neighborhood for computing deep feature correlation given degraded images. This is in contrast to existing practice that deploys the whole image. Extensive experiments on both image denoising and super-resolution tasks are conducted. Thanks to the recurrent non-local operations and correlation propagation, the proposed NLRN achieves superior results to state-of-the-art methods with much fewer parameters., Comment: NIPS 2018
- Published
- 2018
29. Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation
- Author
-
Wei, Yunchao, Xiao, Huaxin, Shi, Honghui, Jie, Zequn, Feng, Jiashi, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Despite the remarkable progress, weakly supervised segmentation approaches are still inferior to their fully supervised counterparts. We obverse the performance gap mainly comes from their limitation on learning to produce high-quality dense object localization maps from image-level supervision. To mitigate such a gap, we revisit the dilated convolution [1] and reveal how it can be utilized in a novel way to effectively overcome this critical limitation of weakly supervised segmentation approaches. Specifically, we find that varying dilation rates can effectively enlarge the receptive fields of convolutional kernels and more importantly transfer the surrounding discriminative information to non-discriminative object regions, promoting the emergence of these regions in the object localization maps. Then, we design a generic classification network equipped with convolutional blocks of different dilated rates. It can produce dense and reliable object localization maps and effectively benefit both weakly- and semi- supervised semantic segmentation. Despite the apparent simplicity, our proposed approach obtains superior performance over state-of-the-arts. In particular, it achieves 60.8% and 67.6% mIoU scores on Pascal VOC 2012 test set in weakly- (only image-level labels are available) and semi- (1,464 segmentation masks are available) supervised settings, which are the new state-of-the-arts., Comment: Accepted by CVPR 2018 as Spotlight
- Published
- 2018
30. Image Super-Resolution via Dual-State Recurrent Networks
- Author
-
Han, Wei, Chang, Shiyu, Liu, Ding, Yu, Mo, Witbrock, Michael, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks. Inspired by these recent discoveries, we note that many state-of-the-art deep SR architectures can be reformulated as a single-state recurrent neural network (RNN) with finite unfoldings. In this paper, we explore new structures for SR based on this compact RNN view, leading us to a dual-state design, the Dual-State Recurrent Network (DSRN). Compared to its single state counterparts that operate at a fixed spatial resolution, DSRN exploits both low-resolution (LR) and high-resolution (HR) signals jointly. Recurrent signals are exchanged between these states in both directions (both LR to HR and HR to LR) via delayed feedback. Extensive quantitative and qualitative evaluations on benchmark datasets and on a recent challenge demonstrate that the proposed DSRN performs favorably against state-of-the-art algorithms in terms of both memory consumption and predictive accuracy.
- Published
- 2018
31. Generative Image Inpainting with Contextual Attention
- Author
-
Yu, Jiahui, Lin, Zhe, Yang, Jimei, Shen, Xiaohui, Lu, Xin, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics - Abstract
Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feed-forward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting., Comment: Accepted in CVPR 2018; add CelebA-HQ results; open sourced; interactive demo available: http://jhyu.me/demo
- Published
- 2018
32. Enhance Visual Recognition under Adverse Conditions via Deep Networks
- Author
-
Liu, Ding, Cheng, Bowen, Wang, Zhangyang, Zhang, Haichao, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous existence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited in the techniques of low-quality image restoration and high-quality image recognition tasks respectively, few studies have been done on the important problem of recognition from very low-quality images. This paper proposes a deep learning based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalizes conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of results.
- Published
- 2017
- Full Text
- View/download PDF
33. Dilated Recurrent Neural Networks
- Author
-
Chang, Shiyu, Zhang, Yang, Han, Wei, Yu, Mo, Guo, Xiaoxiao, Tan, Wei, Cui, Xiaodong, Witbrock, Michael, Hasegawa-Johnson, Mark, and Huang, Thomas S.
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Learning - Abstract
Learning with recurrent neural networks (RNNs) on long sequences is a notoriously difficult task. There are three major challenges: 1) complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DilatedRNN, which simultaneously tackles all of these challenges. The proposed architecture is characterized by multi-resolution dilated recurrent skip connections and can be combined flexibly with diverse RNN cells. Moreover, the DilatedRNN reduces the number of parameters needed and enhances training efficiency significantly, while matching state-of-the-art performance (even with standard RNN cells) in tasks involving very long-term dependencies. To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures. We rigorously prove the advantages of the DilatedRNN over other recurrent neural architectures. The code for our method is publicly available at https://github.com/code-terminator/DilatedRNN, Comment: Accepted by NIPS 2017
- Published
- 2017
34. Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach
- Author
-
Cheng, Bowen, Wang, Zhangyang, Zhang, Zhaobin, Li, Zhu, Liu, Ding, Yang, Jianchao, Huang, Shuai, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Learning - Abstract
Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, the inadequate network bandwidth often limits the spatial resolution of the transmitted video, which will heavily degrade the recognition reliability. We develop a novel framework to achieve robust emotion recognition from low bit rate video. While video frames are downsampled at the encoder side, the decoder is embedded with a deep network model for joint super-resolution (SR) and recognition. Notably, we propose a novel max-mix training strategy, leading to a single "One-for-All" model that is remarkably robust to a vast range of downsampling factors. That makes our framework well adapted for the varied bandwidths in real transmission scenarios, without hampering scalability or efficiency. The proposed framework is evaluated on the AVEC 2016 benchmark, and demonstrates significantly improved stand-alone recognition performance, as well as rate-distortion (R-D) performance, than either directly recognizing from LR frames, or separating SR and recognition., Comment: Accepted by the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII2017)
- Published
- 2017
35. On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation
- Author
-
Yang, Yingzhen, Feng, Jiashi, Jojic, Nebojsa, Yang, Jianchao, and Huang, Thomas S.
- Subjects
Mathematics - Optimization and Control ,Computer Science - Learning - Abstract
We study the proximal gradient descent (PGD) method for $\ell^{0}$ sparse approximation problem as well as its accelerated optimization with randomized algorithms in this paper. We first offer theoretical analysis of PGD showing the bounded gap between the sub-optimal solution by PGD and the globally optimal solution for the $\ell^{0}$ sparse approximation problem under conditions weaker than Restricted Isometry Property widely used in compressive sensing literature. Moreover, we propose randomized algorithms to accelerate the optimization by PGD using randomized low rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduces the computation cost of the original PGD for the $\ell^{0}$ sparse approximation problem, and the resultant sub-optimal solution still enjoys provable suboptimality, namely, the sub-optimal solution to the reduced problem still has bounded gap to the globally optimal solution to the original problem.
- Published
- 2017
36. Discriminative Similarity for Clustering and Semi-Supervised Learning
- Author
-
Yang, Yingzhen, Liang, Feng, Jojic, Nebojsa, Yan, Shuicheng, Feng, Jiashi, and Huang, Thomas S.
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns classifier from each hypothetical labeling, and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labeling. Kernel classifier is employed in our framework. By generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from hypothetical labeling is expressed as the sum of pairwise similarity between the data from different classes, parameterized by the weights of the kernel classifier. Such pairwise similarity serves as the discriminative similarity for the purpose of clustering and semi-supervised learning, and discriminative similarity with similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods.
- Published
- 2017
37. When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach
- Author
-
Liu, Ding, Wen, Bihan, Liu, Xianming, Wang, Zhangyang, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we cope with the two jointly and explore the mutual influence between them. First we propose a convolutional neural network for image denoising which achieves the state-of-the-art performance. Second we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and use the joint loss for updating only the denoising network via back-propagation. We demonstrate that on one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network can generate more visually appealing results. To the best of our knowledge, this is the first work investigating the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online https://github.com/Ding-Liu/DeepDenoising., Comment: the 27th International Joint Conference on Artificial Intelligence (2018)
- Published
- 2017
38. Fast Generation for Convolutional Autoregressive Models
- Author
-
Ramachandran, Prajit, Paine, Tom Le, Khorrami, Pooya, Babaeizadeh, Mohammad, Chang, Shiyu, Zhang, Yang, Hasegawa-Johnson, Mark A., Campbell, Roy H., and Huang, Thomas S.
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a na\"{i}ve fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environments. In this work, we describe a method to speed up generation in convolutional autoregressive models. The key idea is to cache hidden states to avoid redundant computation. We apply our fast generation method to the Wavenet and PixelCNN++ models and achieve up to $21\times$ and $183\times$ speedups respectively., Comment: Accepted at ICLR 2017 Workshop
- Published
- 2017
39. Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation
- Author
-
Liu, Xianming, Zhang, Amy, Tiecke, Tobias, Gros, Andreas, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Learning from weakly-supervised data is one of the main challenges in machine learning and computer vision, especially for tasks such as image semantic segmentation where labeling is extremely expensive and subjective. In this paper, we propose a novel neural network architecture to perform weakly-supervised learning by suppressing irrelevant neuron activations. It localizes objects of interest by learning from image-level categorical labels in an end-to-end manner. We apply this algorithm to a practical challenge of transforming satellite images into a map of settlements and individual buildings. Experimental results show that the proposed algorithm achieves superior performance and efficiency when compared with various baseline models., Comment: 9 pages, 4 figures
- Published
- 2016
40. Fast Wavenet Generation Algorithm
- Author
-
Paine, Tom Le, Khorrami, Pooya, Chang, Shiyu, Zhang, Yang, Ramachandran, Prajit, Hasegawa-Johnson, Mark A., and Huang, Thomas S.
- Subjects
Computer Science - Sound ,Computer Science - Data Structures and Algorithms ,Computer Science - Learning - Abstract
This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advantages of our fast implementation over a naive one. While this method is presented for Wavenet, the same scheme can be applied anytime one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available., Comment: Technical Report
- Published
- 2016
41. Deep Double Sparsity Encoder: Learning to Sparsify Not Only Features But Also Parameters
- Author
-
Wang, Zhangyang and Huang, Thomas S.
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper emphasizes the significance to jointly exploit the problem structure and the parameter structure, in the context of deep modeling. As a specific and interesting example, we describe the deep double sparsity encoder (DDSE), which is inspired by the double sparsity model for dictionary learning. DDSE simultaneously sparsities the output features and the learned model parameters, under one unified framework. In addition to its intuitive model interpretation, DDSE also possesses compact model size and low complexity. Extensive simulations compare DDSE with several carefully-designed baselines, and verify the consistently superior performance of DDSE. We further apply DDSE to the novel application domain of brain encoding, with promising preliminary results achieved.
- Published
- 2016
42. Stacked Approximated Regression Machine: A Simple Deep Learning Approach
- Author
-
Wang, Zhangyang, Chang, Shiyu, Ling, Qing, Huang, Shuai, Hu, Xia, Shi, Honghui, and Huang, Thomas S.
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
With the agreement of my coauthors, I Zhangyang Wang would like to withdraw the manuscript "Stacked Approximated Regression Machine: A Simple Deep Learning Approach". Some experimental procedures were not included in the manuscript, which makes a part of important claims not meaningful. In the relevant research, I was solely responsible for carrying out the experiments; the other coauthors joined in the discussions leading to the main algorithm. Please see the updated text for more details., Comment: This manuscript has been withdrawn by the authors. Please see the updated text for details
- Published
- 2016
43. Streaming Recommender Systems
- Author
-
Chang, Shiyu, Zhang, Yang, Tang, Jiliang, Yin, Dawei, Chang, Yi, Hasegawa-Johnson, Mark A., and Huang, Thomas S.
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Information Retrieval ,Computer Science - Learning - Abstract
The increasing popularity of real-world recommender systems produces data continuously and rapidly, and it becomes more realistic to study recommender systems under streaming scenarios. Data streams present distinct properties such as temporally ordered, continuous and high-velocity, which poses tremendous challenges to traditional recommender systems. In this paper, we investigate the problem of recommendation with stream inputs. In particular, we provide a principled framework termed sRec, which provides explicit continuous-time random process models of the creation of users and topics, and of the evolution of their interests. A variational Bayesian approach called recursive meanfield approximation is proposed, which permits computationally efficient instantaneous on-line inference. Experimental results on several real-world datasets demonstrate the advantages of our sRec over other state-of-the-arts.
- Published
- 2016
44. Learning A Deep $\ell_\infty$ Encoder for Hashing
- Author
-
Wang, Zhangyang, Yang, Yingzhen, Chang, Shiyu, Ling, Qing, and Huang, Thomas S.
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
We investigate the $\ell_\infty$-constrained representation which demonstrates robustness to quantization errors, utilizing the tool of deep learning. Based on the Alternating Direction Method of Multipliers (ADMM), we formulate the original convex minimization problem as a feed-forward neural network, named \textit{Deep $\ell_\infty$ Encoder}, by introducing the novel Bounded Linear Unit (BLU) neuron and modeling the Lagrange multipliers as network biases. Such a structural prior acts as an effective network regularization, and facilitates the model initialization. We then investigate the effective use of the proposed model in the application of hashing, by coupling the proposed encoders under a supervised pairwise loss, to develop a \textit{Deep Siamese $\ell_\infty$ Network}, which can be optimized from end to end. Extensive experiments demonstrate the impressive performances of the proposed model. We also provide an in-depth analysis of its behaviors against the competitors., Comment: To be presented at IJCAI'16
- Published
- 2016
45. Seq-NMS for Video Object Detection
- Author
-
Han, Wei, Khorrami, Pooya, Paine, Tom Le, Ramachandran, Prajit, Babaeizadeh, Mohammad, Shi, Honghui, Li, Jianan, Yan, Shuicheng, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances for doing object detection in a single image. These methods typically contain three phases: (i) object proposal generation (ii) object classification and (iii) post-processing. We propose a modification of the post-processing phase that uses high-scoring object detections from nearby frames to boost scores of weaker detections within the same clip. We show that our method obtains superior results to state-of-the-art single image object detection techniques. Our method placed 3rd in the video object detection (VID) task of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015)., Comment: Technical Report for Imagenet VID Competition 2015
- Published
- 2016
46. How Deep Neural Networks Can Improve Emotion Recognition on Video Data
- Author
-
Khorrami, Pooya, Paine, Tom Le, Brady, Kevin, Dagli, Charlie, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods., Comment: Accepted at ICIP 2016. Fixed typo in Experiments section
- Published
- 2016
47. Studying Very Low Resolution Recognition Using Deep Networks
- Author
-
Wang, Zhangyang, Chang, Shiyu, Yang, Yingzhen, Liu, Ding, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Learning - Abstract
Visual recognition research often assumes a sufficient resolution of the region of interest (ROI). That is usually violated in practice, inspiring us to explore the Very Low Resolution Recognition (VLRR) problem. Typically, the ROI in a VLRR problem can be smaller than $16 \times 16$ pixels, and is challenging to be recognized even by human experts. We attempt to solve the VLRR problem using deep learning methods. Taking advantage of techniques primarily in super resolution, domain adaptation and robust regression, we formulate a dedicated deep learning method and demonstrate how these techniques are incorporated step by step. Any extra complexity, when introduced, is fully justified by both analysis and simulation results. The resulting \textit{Robust Partially Coupled Networks} achieves feature enhancement and recognition simultaneously. It allows for both the flexibility to combat the LR-HR domain mismatch, and the robustness to outliers. Finally, the effectiveness of the proposed models is evaluated on three different VLRR tasks, including face identification, digit recognition and font recognition, all of which obtain very impressive performances.
- Published
- 2016
48. $\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images
- Author
-
Wang, Zhangyang, Liu, Ding, Chang, Shiyu, Ling, Qing, Yang, Yingzhen, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Learning - Abstract
In this paper, we design a Deep Dual-Domain ($\mathbf{D^3}$) based fast restoration model to remove artifacts of JPEG compressed images. It leverages the large learning capacity of deep networks, as well as the problem-specific expertise that was hardly incorporated in the past design of deep architectures. For the latter, we take into consideration both the prior knowledge of the JPEG compression scheme, and the successful practice of the sparsity-based dual-domain approach. We further design the One-Step Sparse Inference (1-SI) module, as an efficient and light-weighted feed-forward approximation of sparse coding. Extensive experiments verify the superiority of the proposed $D^3$ model over several state-of-the-art methods. Specifically, our best model is capable of outperforming the latest deep model for around 1 dB in PSNR, and is 30 times faster.
- Published
- 2016
49. Brain-Inspired Deep Networks for Image Aesthetics Assessment
- Author
-
Wang, Zhangyang, Chang, Shiyu, Dolcos, Florin, Beck, Diane, Liu, Ding, and Huang, Thomas S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
Image aesthetics assessment has been challenging due to its subjective nature. Inspired by the scientific advances in the human visual perception and neuroaesthetics, we design Brain-Inspired Deep Networks (BDN) for this task. BDN first learns attributes through the parallel supervised pathways, on a variety of selected feature dimensions. A high-level synthesis network is trained to associate and transform those attributes into the overall aesthetics rating. We then extend BDN to predicting the distribution of human ratings, since aesthetics ratings are often subjective. Another highlight is our first-of-its-kind study of label-preserving transformations in the context of aesthetics assessment, which leads to an effective data augmentation approach. Experimental results on the AVA dataset show that our biological inspired and task-specific BDN model gains significantly performance improvement, compared to other state-of-the-art models with the same or higher parameter capacity.
- Published
- 2016
50. Learning with $\ell^{0}$-Graph: $\ell^{0}$-Induced Sparse Subspace Clustering
- Author
-
Yang, Yingzhen, Feng, Jiashi, Yang, Jianchao, and Huang, Thomas S.
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Sparse subspace clustering methods, such as Sparse Subspace Clustering (SSC) \cite{ElhamifarV13} and $\ell^{1}$-graph \cite{YanW09,ChengYYFH10}, are effective in partitioning the data that lie in a union of subspaces. Most of those methods use $\ell^{1}$-norm or $\ell^{2}$-norm with thresholding to impose the sparsity of the constructed sparse similarity graph, and certain assumptions, e.g. independence or disjointness, on the subspaces are required to obtain the subspace-sparse representation, which is the key to their success. Such assumptions are not guaranteed to hold in practice and they limit the application of sparse subspace clustering on subspaces with general location. In this paper, we propose a new sparse subspace clustering method named $\ell^{0}$-graph. In contrast to the required assumptions on subspaces for most existing sparse subspace clustering methods, it is proved that subspace-sparse representation can be obtained by $\ell^{0}$-graph for arbitrary distinct underlying subspaces almost surely under the mild i.i.d. assumption on the data generation. We develop a proximal method to obtain the sub-optimal solution to the optimization problem of $\ell^{0}$-graph with proved guarantee of convergence. Moreover, we propose a regularized $\ell^{0}$-graph that encourages nearby data to have similar neighbors so that the similarity graph is more aligned within each cluster and the graph connectivity issue is alleviated. Extensive experimental results on various data sets demonstrate the superiority of $\ell^{0}$-graph compared to other competing clustering methods, as well as the effectiveness of regularized $\ell^{0}$-graph.
- Published
- 2015
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.