708 results for "Huang, Thomas S."
Search Results
2. CDTD: A Large-Scale Cross-Domain Benchmark for Instance-Level Image-to-Image Translation and Domain Adaptive Object Detection
- Author
- Shen, Zhiqiang, Huang, Mingyang, Shi, Jianping, Liu, Zechun, Maheshwari, Harsh, Zheng, Yutong, Xue, Xiangyang, Savvides, Marios, and Huang, Thomas S.
- Published
- 2021
- Full Text
- View/download PDF
3. Semi-online Multi-people Tracking by Re-identification
- Author
- Lan, Long, Wang, Xinchao, Hua, Gang, Huang, Thomas S., and Tao, Dacheng
- Published
- 2020
- Full Text
- View/download PDF
4. Subspace Learning by ℓ0-Induced Sparsity
- Author
- Yang, Yingzhen, Feng, Jiashi, Jojic, Nebojsa, Yang, Jianchao, and Huang, Thomas S.
- Published
- 2018
- Full Text
- View/download PDF
5. Multi-metric learning for multi-sensor fusion based classification
- Author
- Zhang, Yanning, Zhang, Haichao, Nasrabadi, Nasser M., and Huang, Thomas S.
- Published
- 2013
- Full Text
- View/download PDF
6. Large-scale supervised similarity learning in networks
- Author
- Chang, Shiyu, Qi, Guo-Jun, Yang, Yingzhen, Aggarwal, Charu C., Zhou, Jiayu, Wang, Meng, and Huang, Thomas S.
- Published
- 2016
- Full Text
- View/download PDF
7. RankCompete: Simultaneous ranking and clustering of information networks
- Author
- Cao, Liangliang, Jin, Xin, Yin, Zhijun, Del Pozo, Andrey, Luo, Jiebo, Han, Jiawei, and Huang, Thomas S.
- Published
- 2012
- Full Text
- View/download PDF
8. Information Retrieval beyond the Text Document.
- Author
- Rui, Yong, Ortega, Michael, and Huang, Thomas S.
- Abstract
Reports some of the progress made over the years toward exploring information beyond the text domain. Describes the Multimedia Analysis and Retrieval System (MARS), developed to increase access to non-textual information. Addresses the following aspects of MARS: (1) visual feature extraction; (2) retrieval models; (3) query reformulation techniques; (4) efficient execution speed performance; and (5) user interface considerations.
- Published
- 1999
9. Continuation Techniques for a Certain Class of Analytic Functions
- Author
- Huang, Thomas S.
- Published
- 1984
10. CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation
- Author
- Fu, Yang, Yang, Linjie, Liu, Ding, Huang, Thomas S., and Shi, Humphrey
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, General Medicine
- Abstract
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video scenario due to several distinct challenges such as motion blur and drastic appearance change. To eliminate ambiguities introduced by only using single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a siamese design by incorporating both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat. Our code will be available at https://github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation. (Accepted to AAAI 2021)
- Published
- 2020
11. Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval
- Author
- Fu, Yun, Li, Zhu, Huang, Thomas S., and Katsaggelos, Aggelos K.
- Published
- 2008
- Full Text
- View/download PDF
12. Models for Patch-Based Image Restoration
- Author
- Das Gupta, Mithun, Rajaram, Shyamsundar, Petrovic, Nemanja, and Huang, Thomas S.
- Published
- 2009
- Full Text
- View/download PDF
13. Enhancing bilinear subspace learning by element rearrangement
- Author
- Xu, Dong, Yan, Shuicheng, Lin, Stephen, Huang, Thomas S., and Chang, Shih-Fu
- Subjects
Algorithm, Algorithms
- Published
- 2009
14. A fast 2D shape recovery approach by fusing features and appearance
- Author
- Zhu, Jianke, Lyu, Michael R., and Huang, Thomas S.
- Subjects
Algorithm, Algorithms -- Usage, Image processing -- Methods, Machine vision -- Methods, Mathematical optimization
- Abstract
In this paper, we present a fusion approach to solve the nonrigid shape recovery problem, which takes advantage of both the appearance information and the local features. We have two major contributions. First, we propose a novel progressive finite Newton optimization scheme for the feature-based nonrigid surface detection problem, which is reduced to only solving a set of linear equations. The key is to formulate the nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Second, we propose a deformable Lucas-Kanade algorithm that triangulates the template image into small patches and constrains the deformation through the second-order derivatives of the mesh vertices. We formulate it into a sparse regularized least squares problem, which is able to reduce the computational cost and the memory requirement. The inverse compositional algorithm is applied to efficiently solve the optimization problem. We have conducted extensive experiments for performance evaluation on various environments, whose promising results show that the proposed algorithm is both efficient and effective. Index Terms--Image processing and computer vision, nonrigid detection, real-time deformable registration, nonrigid augmented reality, medical image registration.
- Published
- 2009
15. Hierarchical space-time model enabling efficient search for human actions
- Author
- Ning, Huazhong, Han, Tony X., Walther, Dirk B., Liu, Ming, and Huang, Thomas S.
- Subjects
Algorithms -- Usage, Image processing -- Methods, Object recognition (Computers) -- Methods, Pattern recognition -- Methods, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a features point of view, both invariance and selectivity are desirable characteristics, which seem to contradict each other. To make these characteristics coexist, we introduce a coarse-to-fine search and verification scheme for action searching, based on the HSTM model. Because going through layers of the hierarchy corresponds to progressively turning the knob between invariance and selectivity, this strategy enables search for human actions ranging from rapid movements of sports to subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations feature makes the search for actions go smoothly across the hierarchical layers of the HSTM model. Efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three selected challenging video sequences and on the KTH human action database, and it achieved improvements over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions. Index Terms--Action recognition, action search, hierarchical space-time model (HSTM), Histogram of Gabor Orientations (HIGO).
- Published
- 2009
16. Mode-kn factor analysis for image ensembles
- Author
- Yan, Shuicheng, Wang, Huan, Tu, Jilin, Tang, Xiaoou, and Huang, Thomas S.
- Subjects
Discriminant analysis -- Usage, Factor analysis -- Usage, Image processing -- Analysis, Tensors (Mathematics) -- Usage, Business, Computers, Electronics, Electronics and electrical industries
- Published
- 2009
17. Ubiquitously supervised subspace learning
- Author
- Yang, Jianchao, Yan, Shuicheng, and Huang, Thomas S.
- Subjects
Machine vision -- Analysis, Dimensional analysis -- Usage, Business, Computers, Electronics, Electronics and electrical industries
- Published
- 2009
18. Synchronized submanifold embedding for person-independent pose estimation and beyond
- Author
- Yan, Shuicheng, Wang, Huan, Fu, Yun, Yan, Jun, Tang, Xiaoou, and Huang, Thomas S.
- Subjects
Facial expression -- Analysis, Image coding -- Analysis, User interface -- Analysis, User interface, Business, Computers, Electronics, Electronics and electrical industries
- Published
- 2009
19. Correspondence propagation with weak priors
- Author
- Wang, Huan, Yan, Shuicheng, Liu, Jianzhuang, Tang, Xiaoou, and Huang, Thomas S.
- Subjects
Machine vision -- Analysis, Image processing -- Research, Business, Computers, Electronics, Electronics and electrical industries
- Published
- 2009
20. Locating nose-tips and estimating head poses in images by Tensorposes
- Author
- Tu, Jilin, Fu, Yun, and Huang, Thomas S.
- Subjects
Image processing -- Methods, Principal components analysis -- Usage, Decomposition (Mathematics) -- Methods, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
This paper introduces a head pose estimation system that automatically localizes the nose-tips of faces and estimates head poses in images simultaneously. In the training stage, the nose-tips of the faces are first manually labeled. The appearance variations caused by head pose changes are then characterized by a Tensorposes model. Given an image with unknown head pose and nose-tip location, the nose-tip of the face is automatically localized in a coarse-to-fine fashion after skin color segmentation. The head pose is estimated simultaneously. The performance of our system is evaluated on the Pointing'04 head pose image data set. We first evaluate the classification performance of the Tensorposes models with image patches of the faces cropped according to the manually labeled nose-tip locations of the faces in the Pointing'04 data set. Using a leave-one-person-out evaluation strategy, we obtain the optimal parameters of the Tensorposes model, and evaluate the discriminative power of the Tensorposes models built on high-order singular value decomposition (HOSVD) and multilinear independent component analysis (MICA), and of naive principal component analysis (PCA) subspace models. It is shown that the Tensorposes models based on HOSVD and MICA decomposition perform similarly well, and both much better than the naive PCA subspace models. The Tensorposes model is then utilized to automatically localize the nose-tip location in the testing image and to simultaneously estimate the head pose. The nose-tip localization and pose estimation accuracy of the proposed system are evaluated against the ground truth. Finally, a cross-database evaluation of the performance of our system is carried out on the Pointing'04 database, a selected subset of the CMU PIE database, and some pictures from the CLEAR'07 head pose evaluation database. The experiments show that our system generalizes reasonably well to real-world scenarios. Index Terms--Face image, head pose estimation, nose-tip, Pointing'04, Tensorposes.
- Published
- 2009
21. A Survey of affect recognition methods: audio, visual, and spontaneous expressions
- Author
- Zeng, Zhihong, Pantic, Maja, Roisman, Glenn I., and Huang, Thomas S.
- Subjects
Form perception -- Methods, Form perception -- Surveys
- Abstract
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology. Index Terms--Evaluation/methodology, human-centered computing, affective computing, introductory, survey.
- Published
- 2009
22. Convergent 2-D subspace learning with Null Space analysis
- Author
- Xu, Dong, Yan, Shuicheng, Lin, Stephen, and Huang, Thomas S.
- Subjects
Topological spaces -- Evaluation, Convergence (Mathematics) -- Evaluation, Algorithms -- Usage, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Recent research has demonstrated the success of supervised dimensionality reduction algorithms 2DLDA and 2DMFA, which are based on the image-as-matrix representation, in small sample size cases. To solve the convergence problem in 2DLDA and 2DMFA, we propose in this work two new schemes, called Null Space based 2DLDA (NS2DLDA) and Null Space based 2DMFA (NS2DMFA), and apply them to the challenging multi-view face recognition task. First, we convert each 2-D face image (matrix) into a vector and compute the first projection matrix P1 from the null space of the intra-class scatter matrix, such that the samples from the same class are projected to the same point. Then the data are projected and reconstructed with P1. Finally, we re-organize the reconstructed datum into a matrix and then compute the second projection direction P2, in the form of a Kronecker product of two matrices, by maximizing the inter-class scatter. A proof of algorithmic convergence is provided. The experiments on two benchmark multi-view face databases, the CMU PIE and FERET databases, demonstrate that NS2DLDA outperforms Fisherface, Null Space LDA (NSLDA) and 2DLDA. Additionally, NS2DMFA is also demonstrated to be more accurate than MFA and 2DMFA for face recognition. Index Terms--LDA, MFA, multiview face recognition, null space LDA, 2DLDA, 2DMFA.
- Published
- 2008
23. Locality versus globality: query-driven localized linear models for facial image computing
- Author
- Fu, Yun, Li, Zhu, Yuan, Junsong, Wu, Ying, and Huang, Thomas S.
- Subjects
Machine learning -- Research, Linear models (Statistics) -- Usage, Linear regression models -- Usage, Algorithms -- Usage, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Conventional subspace learning or recent feature extraction methods consider globality as the key criterion to design discriminative algorithms for image classification. We demonstrate in this paper that applying the local manner in sample space, feature space, and learning space via linear subspace learning can sufficiently boost the discriminating power, as measured by the discriminating power coefficient (DPC). The proposed solution achieves good classification accuracy gains and is computationally efficient. In particular, we approximate the global nonlinearity through a multimodal localized piecewise subspace learning framework, in which three locality criteria can work individually or jointly for any new subspace learning algorithm design. It turns out that most existing subspace learning methods can be unified in such a common framework embodying either the global or local learning manner. On the other hand, we address the problem of numerical difficulty in the large-size pattern classification case, where many local variations cannot be adequately handled by a single global model. By localizing the modeling, the classification error rate estimation is also localized and thus appears to be more robust and flexible for model selection among different model candidates. As a new algorithm design based on the proposed framework, the query-driven locally adaptive (QDLA) mixture-of-experts model for robust face recognition and head pose estimation is presented. Experiments demonstrate the local approach to be effective, robust, and fast for large-size, multiclass, and multivariance data sets. Index Terms--Discriminating power coefficient (DPC), face recognition, globality, head pose estimation, human-centered computing (HCC), locality, mixture-of-experts model, subspace learning.
- Published
- 2008
24. Correlation metric for generalized feature extraction
- Author
- Fu, Yun, Yan, Shuicheng, and Huang, Thomas S.
- Subjects
Object recognition (Computers) -- Analysis, Pattern recognition -- Analysis, Correlation (Statistics) -- Analysis, Principal components analysis -- Usage
- Abstract
Beyond conventional linear and kernel-based feature extraction, we present a more generalized formulation for feature extraction in this paper. Two representative algorithms using the correlation metric are proposed based on this formulation. Correlation Embedding Analysis (CEA), which incorporates both correlational mapping and discriminant analysis, boosts the discriminating power by mapping the data from a high-dimensional hypersphere onto another low-dimensional hypersphere and preserving the neighboring relations with local-sensitive graph modeling. Correlational Principal Component Analysis (CPCA) generalizes the Principal Component Analysis (PCA) algorithm to the case with data distributed on a high-dimensional hypersphere. Their advantages stem from two facts: 1) directly working on normalized data, which are often the outputs from data preprocessing, and 2) directly designed with the correlation metric, which is shown to be generally better than euclidean distance for classification purpose in many real-world applications. Extensive visual recognition experiments compared with existing feature extraction algorithms demonstrate the effectiveness of the proposed algorithms. Index Terms--Feature extraction, graph embedding, correlation embedding analysis, correlational principal component analysis, face recognition.
- Published
- 2008
25. Matrix-variate factor analysis and its applications
- Author
- Xie, Xianchao, Yan, Shuicheng, Kwok, James T., and Huang, Thomas S.
- Subjects
Algorithms -- Usage, Discriminant analysis -- Methods, Factor analysis -- Methods, Neural networks -- Research, Algorithm, Neural network, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Factor analysis (FA) seeks to reveal the relationship between an observed vector variable and a latent variable of reduced dimension. It has been widely used in many applications involving high-dimensional data, such as image representation and face recognition. An intrinsic limitation of FA lies in its potentially poor performance when the data dimension is high, a problem known as curse of dimensionality. Motivated by the fact that images are inherently matrices, we develop, in this brief, an FA model for matrix-variate variables and present an efficient parameter estimation algorithm. Experiments on both toy and real-world image data demonstrate that the proposed matrix-variant FA model is more efficient and accurate than the classical FA approach, especially when the observed variable is high-dimensional and the samples available are limited. Index Terms--Conditional expectation maximization (EM), face recognition, factor analysis (FA), matrix.
- Published
- 2008
26. Image-based human age estimation by manifold learning and locally adjusted robust regression
- Author
- Guo, Guodong, Fu, Yun, Dyer, Charles R., and Huang, Thomas S.
- Subjects
Human-computer interaction -- Research, Image processing -- Analysis, Regression analysis -- Usage, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
The article proposes locally adjusted robust regressor (LARR), a novel method for robust learning and prediction of aging patterns. The efficiency of LARR in age estimation is observed.
- Published
- 2008
27. Real-time multimodal human--avatar interaction
- Author
- Fu, Yun, Li, Renxiang, Huang, Thomas S., and Danielsen, Mike
- Subjects
Human-computer interaction -- Research, Real-time control -- Design and construction, Real-time systems -- Design and construction, Mobile communication systems -- Research, Wireless communication systems -- Research, Three-dimensional display systems -- Usage, Visual communication -- Technology application, Technology application, Real-time system, Wireless technology, 3D technology, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
This paper presents a novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC). The framework is designed for both mobile and desktop avatar-based human-machine or human-human visual communications in real-world scenarios. Using 3-D components stored in the Java mobile 3-D (M3G) file format, the avatar models can be flexibly constructed and customized on the fly on any mobile devices or systems that support the M3G standard. For the RAC head tracker, we propose a 2-D real-time face detection/tracking strategy through an interactive loop, in which the detection and tracking complement each other for efficient and reliable face localization, tolerating extreme user movement. With the face location robustly tracked, the RAC head tracker selects a main user and estimates the user's head rolling, tilting, yawing, scaling, horizontal, and vertical motion in order to generate avatar animation parameters. The animation parameters can be used either locally or remotely and can be transmitted through socket over the network. In addition, it integrates audio-visual analysis and synthesis modules to realize multichannel and runtime animations, visual TTS and real-time viseme detection and rendering. The framework is recognized as an effective design for future realistic industrial products of humanoid kiosk and human-to-human mobile communication. Index Terms--Avatar, DAZ3D, head tracking, human-computer interaction, mobile 3-D (M3G), multimodal system, real-time multimodal human-avatar interaction (RTM-HAI), TTS, visual communication.
- Published
- 2008
28. Image classification using correlation tensor analysis
- Author
- Fu, Yun, and Huang, Thomas S.
- Subjects
Correlation (Statistics) -- Analysis, Embedded systems -- Usage, Image coding -- Analysis, Discriminant analysis, Factor analysis, Embedded system, System on a chip, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Correlation tensor analysis (CTA) is used for appearance-based discriminant subspace learning. Efficient dimensionality reduction, effective discriminative embedding and low computational cost are the main advantages of the CTA.
- Published
- 2008
29. Ranking with uncertain labels and its applications
- Author
- Yan, Shuicheng, Wang, Huan, Liu, Jianzhuang, Tang, Xiaoou, and Huang, Thomas S.
- Published
- 2007
- Full Text
- View/download PDF
30. Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis
- Author
- Xu, Dong, Yan, Shuicheng, Zhang, Lei, Lin, Stephen, Zhang, Hong-Jiang, and Huang, Thomas S.
- Subjects
Algorithms -- Usage, Functional analysis -- Methods, Principal components analysis -- Methods, Electronic data processing -- Research, Algorithm, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
Principal Components Analysis (PCA) has traditionally been utilized with data expressed in the form of 1-D vectors, but there exists much data such as gray-level images, video sequences, Gabor-filtered images and so on, that are intrinsically in the form of second or higher order tensors. For representations of image objects in their intrinsic form and order rather than concatenating all the object data into a single vector, we propose in this paper a new optimal object reconstruction criterion with which the information of a high-dimensional tensor is represented as a much lower dimensional tensor computed from projections to multiple concurrent subspaces. In each of these subspaces, correlations with respect to one of the tensor dimensions are reduced, enabling better object reconstruction performance. Concurrent subspaces analysis (CSA) is presented to efficiently learn these subspaces in an iterative manner. In contrast to techniques such as PCA which vectorize tensor data, CSA's direct use of data in tensor form brings an enhanced ability to learn a representative subspace and an increased number of available projection directions. These properties enable CSA to outperform traditional algorithms in the common case of small sample sizes, where CSA can be effective even with only a single sample per class. Extensive experiments on images of faces and digital numbers encoded as second or third order tensors demonstrate that the proposed CSA outperforms PCA-based algorithms in object reconstruction and object recognition. Index Terms--Concurrent subspaces analysis (CSA), dimensionality reduction, object reconstruction, object representation, principal components analysis (PCA).
- Published
- 2008
31. Joint face and head tracking inside multi-camera smart rooms
- Author
- Zhang, Zhenqiu, Potamianos, Gerasimos, Senior, Andrew W., and Huang, Thomas S.
- Published
- 2007
- Full Text
- View/download PDF
32. Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation
- Author
- Yu, Hanchao, Chen, Xiao, Shi, Humphrey, Chen, Terrence, Huang, Thomas S., and Sun, Shanhui
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Physics::Physics Education, Electrical Engineering and Systems Science - Image and Video Processing, Machine Learning (cs.LG)
- Abstract
Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardium strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion field. We then use a novel cyclic teacher-student training strategy to make the inference end-to-end and further improve the tracking performance. Our teacher model provides more accurate motion estimation as supervision through progressive motion compensations. Our student model learns from the teacher model to estimate motion in a single step while maintaining accuracy. The teacher-student knowledge distillation is performed in a cyclic way for a further performance boost. Our proposed method significantly outperforms a strong baseline model on two publicly available clinical datasets, evaluated by a variety of metrics and the inference time. New evaluation metrics are also proposed to represent errors in a clinically meaningful manner. (Accepted by MICCAI 2020)
- Published
- 2020
33. Neural Sparse Representation for Image Restoration
- Author
- Fan, Yuchen, Yu, Jiahui, Mei, Yiqun, Zhang, Yulun, Fu, Yun, Liu, Ding, and Huang, Thomas S.
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints upon hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and attachable to convolution layers in various networks. Sparsity in neurons enables computation saving by only operating on non-zero components without hurting accuracy. Meanwhile, our method can magnify representation dimensionality and model capacity with negligible additional computation cost. Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifacts removal. Code is available at https://github.com/ychfan/nsr
- Published
- 2020
34. Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation
- Author
- Wang, Zhonghao, Wei, Yunchao, Feris, Rogerio, Xiong, Jinjun, Hwu, Wen-Mei, Huang, Thomas S., and Shi, Humphrey
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
- Abstract
Learning segmentation from synthetic data and adapting to real data can significantly relieve human efforts in labelling pixel-level masks. A key challenge of this task is how to alleviate the data distribution discrepancy between the source and target domains, i.e. reducing domain shift. The common approach to this problem is to minimize the discrepancy between feature distributions from different domains through adversarial training. However, directly aligning the feature distribution globally cannot guarantee consistency from a local view (i.e. semantic-level), which prevents certain semantic knowledge learned on the source domain from being applied to the target domain. To tackle this issue, we propose a semi-supervised approach named Alleviating Semantic-level Shift (ASS), which can successfully promote the distribution consistency from both global and local views. Specifically, leveraging a small number of labeled data from the target domain, we directly extract semantic-level feature representations from both the source and the target domains by averaging the features corresponding to the same categories advised by pixel-level masks. We then feed the produced features to the discriminator to conduct semantic-level adversarial learning, which collaborates with the adversarial learning from the global view to better alleviate the domain shift. We apply our ASS to two domain adaptation tasks, from GTA5 to Cityscapes and from Synthia to Cityscapes. Extensive experiments demonstrate that: (1) ASS can significantly outperform the current unsupervised state-of-the-art methods by employing a small number of annotated samples from the target domain; and (2) ASS can beat the oracle model trained on the whole target dataset by over 3 points by augmenting the synthetic source data with annotated samples from the target domain, without suffering from the prevalent problem of overfitting to the source domain. (CVPRW 2020)
- Published
- 2020
35. Laplacian Denoising Autoencoder
- Author
- Jiao, Jianbo, Bao, Linchao, Wei, Yunchao, He, Shengfeng, Shi, Honghui, Lau, Rynson, and Huang, Thomas S.
- Subjects
FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
- Abstract
While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale. Therefore, learning robust representations with unlabeled data is critical in relieving human effort and vital for many downstream tasks. Recent advances in unsupervised and self-supervised learning approaches for visual data have benefited greatly from domain knowledge. Here we are interested in a more generic unsupervised learning framework that can be easily generalized to other domains. In this paper, we propose to learn data representations with a novel type of denoising autoencoder, where the noisy input data is generated by corrupting latent clean data in the gradient domain. This can be naturally generalized to span multiple scales with a Laplacian pyramid representation of the input data. In this way, the agent learns more robust representations that exploit the underlying data structures across multiple scales. Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach, compared to its counterpart with single-scale corruption and other approaches. Furthermore, we also demonstrate that the learned representations perform well when transferring to other downstream vision tasks.
- Published
- 2020
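The multi-scale idea in the abstract above can be illustrated on a 1-D signal with a toy Laplacian pyramid. This is a hedged sketch, not the paper's method: the paper corrupts latent clean data in the gradient domain, while this sketch simply adds Gaussian noise to the pyramid's band-pass levels to show how a corruption can span scales.

```python
import random

def down(x):
    # average adjacent pairs (assumes even length)
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def up(x):
    # nearest-neighbour upsampling back to double length
    out = []
    for v in x:
        out.extend([v, v])
    return out

def laplacian_pyramid(x, levels):
    bands, cur = [], x
    for _ in range(levels):
        low = down(cur)
        bands.append([a - b for a, b in zip(cur, up(low))])  # band-pass detail
        cur = low
    bands.append(cur)  # coarsest residual
    return bands

def reconstruct(bands):
    cur = bands[-1]
    for band in reversed(bands[:-1]):
        cur = [a + b for a, b in zip(up(cur), band)]
    return cur

def corrupt(bands, sigma, rng):
    # add Gaussian noise to every band-pass level, keep the coarse residual
    noisy = [[v + rng.gauss(0.0, sigma) for v in band] for band in bands[:-1]]
    return noisy + [list(bands[-1])]

rng = random.Random(0)
signal = [float(i % 4) for i in range(16)]
bands = laplacian_pyramid(signal, 2)
clean = reconstruct(bands)                      # exact reconstruction
noisy = reconstruct(corrupt(bands, 0.1, rng))   # multi-scale corrupted input
```

A denoising autoencoder in this setting would be trained to map `noisy` back to `signal`; the pyramid guarantees the uncorrupted bands reconstruct the input exactly.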
36. Formulating face verification with semidefinite programming
- Author
-
Yan, Shuicheng, Liu, Jianzhuang, Tang, Xiaoou, and Huang, Thomas S.
- Subjects
Algorithms -- Usage ,Bayesian statistical decision theory -- Usage ,Computer programming -- Research ,Dimensional analysis -- Usage ,Algorithm ,Computer programming ,Business ,Computers ,Electronics ,Electronics and electrical industries - Abstract
Affine subspace for verification (ASV) is a novel algorithm for face verification built on subspace learning techniques. Findings reveal the effectiveness of ASV in achieving encouraging face verification accuracy.
- Published
- 2007
37. Guest editors' introduction: human-centered computing - toward a human revolution
- Author
-
Jaimes, Alejandro, Gatica-Perez, Daniel, Sebe, Nicu, and Huang, Thomas S.
- Subjects
Computers and civilization -- Evaluation ,Human-computer interaction - Published
- 2007
38. Integrating discriminant and descriptive information for dimension reduction and classification
- Author
-
Yu, Jie, Tian, Qi, Rui, Ting, and Huang, Thomas S.
- Subjects
Artificial intelligence -- Usage ,Information storage and retrieval -- Analysis ,Image processing -- Analysis ,Object recognition (Computers) -- Analysis ,Pattern recognition -- Analysis ,Artificial intelligence ,Business ,Computers ,Electronics ,Electronics and electrical industries - Abstract
In this paper, a novel hybrid dimension reduction technique for classification is proposed based on the hybrid analysis of principal component analysis (PCA) and linear discriminant analysis (LDA). LDA is known for capturing the most discriminant features of the data in the projected space while PCA is known for preserving the most descriptive ones after projection. Our hybrid technique integrates discriminant and descriptive information and finds a richer set of alternatives beyond LDA and PCA in a 2-D parametric space, which fits a specific classification task and data distribution better. Theoretical study shows that our technique also alleviates the singularity problem of scatter matrix, which is caused by small training set, and increases the effective dimension of the projected subspace. In order to find the hybrid features adaptively and avoid exhaustive parameter searching, we further propose a boosted hybrid analysis method that incorporates a nonlinear boosting process to enhance a set of hybrid classifiers and combine them into a more accurate one. Compared with the other techniques that aim at combining PCA and LDA, our approaches are novel because our method finds alternatives to LDA and PCA in a 2-D parameter space and the boosting process provides enhancement and robust combination of the classifiers. Extensive experiments are conducted on benchmark and real image databases to compare our proposed methods with the state-of-the-art linear and nonlinear discriminant analysis techniques. The results show the superior performance of our hybrid analysis methods. Index Terms--Artificial intelligence, image classification, information retrieval, pattern recognition.
- Published
- 2007
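The 2-D parametric space between LDA and PCA described in the abstract above can be illustrated with a toy interpolation of scatter matrices. This is a sketch under an assumed parametrization, not the paper's exact formulation; the ridge term here only illustrates how regularization alleviates the scatter-matrix singularity caused by small training sets.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def power_iteration(M, steps=200):
    # dominant eigenvector of a symmetric positive matrix
    v = [1.0] * len(M)
    for _ in range(steps):
        w = mat_vec(M, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def hybrid_scatter(S_between, S_total, alpha, beta):
    """Blend discriminant (between-class) and descriptive (total) scatter.

    alpha interpolates between an LDA-like criterion (1.0) and a PCA-like
    one (0.0); beta adds a ridge term against singularity. Illustrative
    parametrization only.
    """
    n = len(S_total)
    return [[alpha * S_between[i][j] + (1 - alpha) * S_total[i][j]
             + (beta if i == j else 0.0)
             for j in range(n)] for i in range(n)]

Sb = [[4.0, 0.0], [0.0, 0.5]]   # toy between-class scatter
St = [[1.0, 0.0], [0.0, 3.0]]   # toy total scatter
w = power_iteration(hybrid_scatter(Sb, St, alpha=1.0, beta=0.01))
# with alpha = 1.0 the criterion is dominated by Sb, so w is close to [1, 0]
```

Sweeping `alpha` (and a second mixing parameter in the paper) traces out the family of projections between the two classical criteria.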
39. Panoptic-DeepLab
- Author
-
Cheng, Bowen, Collins, Maxwell D., Zhu, Yukun, Liu, Ting, Huang, Thomas S., Adam, Hartwig, and Chen, Liang-Chieh
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch follows the typical design of a semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state-of-the-art on all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set, and advances results on the other challenging Mapillary Vistas dataset., Comment: This work is presented at the ICCV 2019 Joint COCO and Mapillary Recognition Challenge Workshop
- Published
- 2019
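The class-agnostic instance grouping via center regression mentioned in the abstract above can be sketched as follows. This assumes instance centers have already been detected, and is an illustrative sketch rather than the released implementation.

```python
def group_pixels(pixels, offsets, centers):
    """Assign each pixel to the nearest instance center.

    pixels:  list of (y, x) coordinates
    offsets: predicted (dy, dx) offset from each pixel to its instance center
    centers: list of detected instance-center coordinates
    Returns a list of center indices, one per pixel.
    """
    ids = []
    for (y, x), (dy, dx) in zip(pixels, offsets):
        cy, cx = y + dy, x + dx            # regressed center location
        best = min(range(len(centers)),
                   key=lambda k: (centers[k][0] - cy) ** 2
                                + (centers[k][1] - cx) ** 2)
        ids.append(best)
    return ids

centers = [(2.0, 2.0), (8.0, 8.0)]
pixels  = [(1, 1), (3, 2), (7, 9)]
offsets = [(1.0, 1.0), (-1.0, 0.0), (1.0, -1.0)]
assert group_pixels(pixels, offsets, centers) == [0, 0, 1]
```

Because the grouping depends only on geometry, no per-class machinery is needed; semantic labels are merged in afterwards to form the panoptic output.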
40. Learning probabilistic classifiers for human–computer interaction applications
- Author
-
Sebe, Nicu, Cohen, Ira, Cozman, Fabio G., Gevers, Theo, and Huang, Thomas S.
- Published
- 2005
- Full Text
- View/download PDF
41. Adaptive Video Fast Forward
- Author
-
Petrovic, Nemanja, Jojic, Nebojsa, and Huang, Thomas S.
- Published
- 2005
- Full Text
- View/download PDF
42. Total variation models for variable lighting face recognition
- Author
-
Chen, Terrence, Yin, Wotao, Zhou, Xiang Sean, Comaniciu, Dorin, Sr., and Huang, Thomas S.
- Subjects
Biometric technology ,Biometry -- Research ,Machine vision -- Research ,Object recognition (Computers) -- Research ,Pattern recognition -- Research - Abstract
In this paper, we present the logarithmic total variation (LTV) model for face recognition under varying illumination, including natural lighting conditions, where we rarely know the strength, direction, or number of light sources. The proposed LTV model can factorize a single face image and obtain the illumination-invariant facial structure, which is then used for face recognition. Our model is inspired by the SQI model but has better edge-preserving ability and simpler parameter selection. The merit of this model is that it requires neither a lighting assumption nor any training. The LTV model reaches very high recognition rates in tests using both the Yale and CMU PIE face databases as well as a face database containing 765 subjects under outdoor lighting conditions. Index Terms--Face and gesture recognition, signal processing, image processing and computer vision, pattern analysis.
- Published
- 2006
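The factorization the abstract above describes is, in sketch form, a TV-L1 decomposition of the log image (notation assumed; $\lambda$ controls the scale threshold):

```latex
\log f = u + v, \qquad
u^{\ast} = \arg\min_{u} \int_{\Omega} |\nabla u| \, dx
  \; + \; \lambda \int_{\Omega} |u - \log f| \, dx, \qquad
v^{\ast} = \log f - u^{\ast}
```

The large-scale component $u$ absorbs the illumination field, while the small-scale residual $v$ serves as the illumination-invariant facial structure used for recognition.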
43. Multicue HMM-UKF for real-time contour tracking
- Author
-
Chen, Yunqiang, Rui, Yong, and Huang, Thomas S.
- Subjects
Object recognition (Computers) -- Research ,Pattern recognition -- Research - Abstract
We propose an HMM for contour detection based on multiple visual cues in the spatial domain and improve it by joint probabilistic matching to reduce background clutter. It is further integrated with an unscented Kalman filter to exploit object dynamics in nonlinear systems for robust contour tracking. Index Terms--Parametric contour, HMM, unscented Kalman filters, joint probabilistic matching.
- Published
- 2006
44. Motion analysis of articulated objects from monocular images
- Author
-
Zhang, Xiaoyun, Liu, Yuncai, and Huang, Thomas S.
- Subjects
Object recognition (Computers) -- Methods ,Pattern recognition -- Methods - Abstract
This paper presents a new method of motion analysis of articulated objects from feature point correspondences over monocular perspective images without imposing any constraints on motion. An articulated object is modeled as a kinematic chain consisting of joints and links, and its 3D joint positions are estimated up to a scale factor using the connection relationship of two links over two or three images. Then, twists and exponential maps are employed to represent the motion of each link, including the general motion of the base link and the rotation of other links around their joints. Finally, constraints from image point correspondences, which are similar to that of the essential matrix in rigid motion, are developed to estimate the motion. In the algorithm, the characteristic of articulated motion, i.e., motion correlation among links, is applied to decrease the complexity of the problem and improve the robustness. A point pattern matching algorithm for articulated objects is also discussed in this paper. Simulations and experiments on real images show the correctness and efficiency of the algorithms. Index Terms--Articulated object, kinematic chain, motion estimation, exponential map, point pattern matching.
- Published
- 2006
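The exponential-map representation of a link's rotation mentioned in the abstract above reduces, for a pure rotation, to Rodrigues' formula; a minimal pure-Python sketch:

```python
import math

def exp_so3(axis, theta):
    """Rotation matrix from an axis-angle twist via Rodrigues' formula:
    R = I + sin(theta) K + (1 - cos(theta)) K^2, where K = [axis]_x.
    """
    n = math.sqrt(sum(a * a for a in axis))
    wx, wy, wz = (a / n for a in axis)          # unit rotation axis
    K = [[0.0, -wz, wy],
         [wz, 0.0, -wx],
         [-wy, wx, 0.0]]                        # cross-product matrix
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]

# Quarter turn about the z-axis maps the x-axis to the y-axis
R = exp_so3((0.0, 0.0, 1.0), math.pi / 2)
x = [1.0, 0.0, 0.0]
y = [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]
# y is approximately [0, 1, 0]
```

In the paper each link's rotation about its joint is parameterized this way, which keeps the motion constraints differentiable in the twist coordinates.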
45. Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning
- Author
-
Wu, Ying and Huang, Thomas S.
- Published
- 2004
- Full Text
- View/download PDF
46. Visualization and User-Modeling for Browsing Personal Photo Libraries
- Author
-
Moghaddam, Baback, Tian, Qi, Lesh, Neal, Shen, Chia, and Huang, Thomas S.
- Published
- 2004
- Full Text
- View/download PDF
47. Analyzing and capturing articulated hand motion in image sequences
- Author
-
Wu, Ying, Lin, John, and Huang, Thomas S.
- Subjects
Algorithm ,Biometric technology ,Technology application ,Algorithms -- Research ,Algorithms -- Technology application ,Machine vision -- Research ,Object recognition (Computers) -- Research ,Pattern recognition -- Research ,Biometry -- Research - Abstract
Capturing the human hand motion from video involves the estimation of the rigid global hand pose as well as the nonrigid finger articulation. The complexity induced by the high degrees of freedom of the articulated hand challenges many visual tracking techniques. For example, the particle filtering technique is plagued by the demanding requirement of a huge number of particles and the phenomenon of particle degeneracy. This paper presents a novel approach to tracking the articulated hand in video by learning and integrating natural hand motion priors. To cope with the finger articulation, this paper proposes a powerful sequential Monte Carlo tracking algorithm based on importance sampling techniques, where the importance function is based on an initial manifold model of the articulation configuration space learned from motion-captured data. In addition, this paper presents a divide-and-conquer strategy that decouples the hand poses and finger articulations and integrates them in an iterative framework to reduce the complexity of the problem. Our experiments show that this approach is effective and efficient for tracking the articulated hand. This approach can be extended to track other articulated targets. Index Terms--Motion, tracking, video analysis, statistical computing, probabilistic algorithms, face and gesture recognition.
- Published
- 2005
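The importance-sampling step of a sequential Monte Carlo tracker, as described in the abstract above, can be sketched generically as follows. This is a toy 1-D sketch with a flat prior and a constant proposal density, not the paper's learned articulation-manifold proposal.

```python
import math
import random

def smc_step(particles, proposal_sample, proposal_pdf, prior_pdf,
             likelihood, rng):
    """One sequential Monte Carlo step with an importance-sampling proposal.

    Particles are drawn from a proposal (e.g. a learned motion prior),
    weighted by likelihood * prior / proposal, and resampled to counter
    particle degeneracy.
    """
    drawn = [proposal_sample(p, rng) for p in particles]
    weights = [likelihood(x) * prior_pdf(x) / proposal_pdf(x) for x in drawn]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [drawn[_pick(weights, rng)] for _ in drawn]  # multinomial resampling

def _pick(weights, rng):
    r, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= r:
            return i
    return len(weights) - 1

# Toy 1-D tracker: particles start at 0 and an observation sits at 2.0
rng = random.Random(1)
parts = [0.0] * 200
for _ in range(30):
    parts = smc_step(parts,
                     proposal_sample=lambda p, r: p + r.gauss(0.0, 0.5),
                     proposal_pdf=lambda x: 1.0,   # constant for the sketch
                     prior_pdf=lambda x: 1.0,      # flat prior for the sketch
                     likelihood=lambda x: math.exp(-(x - 2.0) ** 2),
                     rng=rng)
mean = sum(parts) / len(parts)
# the particle mean drifts toward the observation at 2.0
```

A better proposal (such as the paper's learned manifold of hand configurations) concentrates particles where the posterior has mass, which is what keeps the particle count manageable for a high-degree-of-freedom hand.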
48. Relevance feedback in image retrieval: A comprehensive review
- Author
-
Zhou, Xiang Sean and Huang, Thomas S.
- Published
- 2003
- Full Text
- View/download PDF
49. Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction
- Author
-
Cohen, Ira, Cozman, Fabio G., Sebe, Nicu, Cirelo, Marcelo C., and Huang, Thomas S.
- Subjects
Machine vision -- Research - Abstract
Automatic classification is one of the basic tasks required in any pattern recognition and human computer interaction application. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that, if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis to a specific type of probabilistic classifiers, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection. Index Terms--Semisupervised learning, generative models, facial expression recognition, face detection, unlabeled data, Bayesian network classifiers.
- Published
- 2004
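How unlabeled data can refine a generative classifier's parameters when the model assumptions hold, as discussed in the abstract above, can be sketched with a toy EM procedure for two 1-D Gaussian classes. This is an illustrative sketch with shared unit variance and no class priors, not the paper's Bayesian-network structure-learning algorithm.

```python
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_em(labeled, unlabeled, iters=50):
    """EM for a two-class 1-D Gaussian classifier using labeled + unlabeled data.

    labeled:   list of (x, y) pairs with y in {0, 1}
    unlabeled: list of x values
    Returns the two class means (shared unit variance, no class priors).
    """
    mu = [sum(x for x, y in labeled if y == c) /
          max(1, sum(1 for _, y in labeled if y == c)) for c in (0, 1)]
    for _ in range(iters):
        # E-step: soft class-1 responsibility for each unlabeled point
        resp = []
        for x in unlabeled:
            p0, p1 = gauss(x, mu[0], 1.0), gauss(x, mu[1], 1.0)
            resp.append(p1 / (p0 + p1))
        # M-step: update means over hard (labeled) + soft (unlabeled) assignments
        for c in (0, 1):
            num = sum(x for x, y in labeled if y == c)
            den = float(sum(1 for _, y in labeled if y == c))
            for x, r in zip(unlabeled, resp):
                w = r if c == 1 else 1.0 - r
                num += w * x
                den += w
            mu[c] = num / den
    return mu

labeled = [(-2.2, 0), (2.1, 1)]            # one labeled point per class
unlabeled = [-2.5, -1.9, -2.1, 1.8, 2.3, 2.0]
mu0, mu1 = semi_em(labeled, unlabeled)
# the unlabeled points pull each mean toward its true cluster center
```

The paper's analysis concerns exactly this mechanism: when the generative model matches the data, the soft assignments improve the estimates, but under model mismatch the same updates can degrade classification.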
50. A fused hidden Markov model with application to bimodal speech processing
- Author
-
Pan, Hao, Levinson, Stephen E., Huang, Thomas S., and Liang, Zhi-Pei
- Subjects
Voice recognition -- Research ,Signal processing -- Research ,Markov processes -- Analysis ,Digital signal processor ,Business ,Computers ,Electronics ,Electronics and electrical industries - Abstract
A novel fused hidden Markov model (HMM) is presented for integrating tightly coupled time series, such as the audio and visual features of speech. HMMs are first trained on the individual time series and then fused together using a probabilistic fusion model, which is optimal according to the maximum entropy principle and a maximum mutual information criterion.
- Published
- 2004