21 results on '"Huang, Thomas"'
Search Results
2. AlignSeg: Feature-Aligned Segmentation Networks.
- Author
- Huang, Zilong, Wei, Yunchao, Wang, Xinggang, Liu, Wenyu, Huang, Thomas S., and Shi, Humphrey
- Subjects
- *COMPUTER architecture, *LEARNING strategies, *IMAGE segmentation, *INTERPOLATION, *PIXELS
- Abstract
Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most of the current popular network architectures tend to ignore the misalignment issues during the feature aggregation process caused by step-by-step downsampling operations and indiscriminate contextual information fusion. In this paper, we explore the principles in addressing such feature misalignment issues and inventively propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment issue caused by multi-resolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to choose private custom contextual information adaptively, making the contextual embeddings be better aligned. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6 and 45.95 percent, respectively. Our source code is available at https://github.com/speedinghzl/AlignSeg. [ABSTRACT FROM AUTHOR]
- Published
- 2022
3. A Fast 2D Shape Recovery Approach by Fusing Features and Appearance.
- Author
- Jianke Zhu, Lyu, Michael R., and Huang, Thomas S.
- Subjects
- *MATHEMATICAL optimization, *COMPUTER vision, *IMAGE processing, *MATHEMATICAL statistics, *ESTIMATION theory, *ALGORITHMS
- Abstract
In this paper, we present a fusion approach to solve the nonrigid shape recovery problem, which takes advantage of both the appearance information and the local features. We have two major contributions. First, we propose a novel progressive finite Newton optimization scheme for the feature-based nonrigid surface detection problem, which is reduced to only solving a set of linear equations. The key is to formulate the nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Second, we propose a deformable Lucas-Kanade algorithm that triangulates the template image into small patches and constrains the deformation through the second-order derivatives of the mesh vertices. We formulate it into a sparse regularized least squares problem, which is able to reduce the computational cost and the memory requirement. The inverse compositional algorithm is applied to efficiently solve the optimization problem. We have conducted extensive experiments for performance evaluation on various environments, whose promising results show that the proposed algorithm is both efficient and effective. [ABSTRACT FROM AUTHOR]
- Published
- 2009
4. Correlation Metric for Generalized Feature Extraction.
- Author
- Yun Fu, Shuicheng Yan, and Huang, Thomas S.
- Subjects
- *FEATURE extraction, *ELECTRONIC data processing, *PATTERN recognition systems, *IMAGE processing, *HUMAN facial recognition software, *STATISTICAL correlation
- Abstract
Beyond conventional linear and kernel-based feature extraction, we present a more generalized formulation for feature extraction in this paper. Two representative algorithms using the correlation metric are proposed based on this formulation. Correlation Embedding Analysis (CEA), which incorporates both correlational mapping and discriminant analysis, boosts the discriminating power by mapping the data from a high-dimensional hypersphere onto another low-dimensional hypersphere and preserving the neighboring relations with local-sensitive graph modeling. Correlational Principal Component Analysis (CPCA) generalizes the Principal Component Analysis (PCA) algorithm to the case with data distributed on a high-dimensional hypersphere. Their advantages stem from two facts: 1) directly working on normalized data, which are often the outputs from data preprocessing, and 2) directly designed with the correlation metric, which is shown to be generally better than euclidean distance for classification purpose in many real-world applications. Extensive visual recognition experiments compared with existing feature extraction algorithms demonstrate the effectiveness of the proposed algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2008
5. Multicue HMM-UKF for Real-Time Contour Tracking.
- Author
- Yunqiang Chen, Yong Rui, and Huang, Thomas S.
- Subjects
- *CONTOURS (Cartography), *TRACKING shot (Cinematography), *KALMAN filtering, *NONLINEAR systems, *ESTIMATION theory, *PROBABILITY theory
- Abstract
We propose an HMM model for contour detection based on multiple visual cues in spatial domain and improve it by joint probabilistic matching to reduce background clutter. It is further integrated with unscented Kalman filter to exploit object dynamics in nonlinear systems for robust contour tracking. [ABSTRACT FROM AUTHOR]
- Published
- 2006
6. Motion Analysis of Articulated Objects from Monocular Images.
- Author
- Xiaoyun Zhang, Yuncai Liu, and Huang, Thomas S.
- Subjects
- *ALGORITHMS, *MONOCULARS, *SIMULATION methods & models, *KINEMATICS, *CONSTRAINT satisfaction, *ARTIFICIAL intelligence
- Abstract
This paper presents a new method of motion analysis of articulated objects from feature point correspondences over monocular perspective images without imposing any constraints on motion. An articulated object is modeled as a kinematic chain consisting of joints and links, and its 3D joint positions are estimated within a scale factor using the connection relationship of two links over two or three images. Then, twists and exponential maps are employed to represent the motion of each link, including the general motion of the base link and the rotation of other links around their joints. Finally, constraints from image point correspondences, which are similar to that of the essential matrix in rigid motion, are developed to estimate the motion. In the algorithm, the characteristic of articulated motion, i.e., motion correlation among links, is applied to decrease the complexity of the problem and improve the robustness. A point pattern matching algorithm for articulated objects is also discussed in this paper. Simulations and experiments on real images show the correctness and efficiency of the algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2006
7. Analyzing and Capturing Articulated Hand Motion in Image Sequences.
- Author
- Ying Wu, John Lin, and Huang, Thomas S.
- Subjects
- *HUMAN mechanics, *MONTE Carlo method, *DEGREES of freedom, *OPTICAL images, *ALGORITHMS, *VIDEOS
- Abstract
Capturing the human hand motion from video involves the estimation of the rigid global hand pose as well as the nonrigid finger articulation. The complexity induced by the high degrees of freedom of the articulated hand challenges many visual tracking techniques. For example, the particle filtering technique is plagued by the demanding requirement of a huge number of particles and the phenomenon of particle degeneracy. This paper presents a novel approach to tracking the articulated hand in video by learning and integrating natural hand motion priors. To cope with the finger articulation, this paper proposes a powerful sequential Monte Carlo tracking algorithm based on importance sampling techniques, where the importance function is based on an initial manifold model of the articulation configuration space learned from motion-captured data. In addition, this paper presents a divide-and-conquer strategy that decouples the hand poses and finger articulations and integrates them in an iterative framework to reduce the complexity of the problem. Our experiments show that this approach is effective and efficient for tracking the articulated hand. This approach can be extended to track other articulated targets. [ABSTRACT FROM AUTHOR]
- Published
- 2005
8. Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes.
- Author
- Qi, Guo-Jun, Liu, Wei, Aggarwal, Charu, and Huang, Thomas
- Subjects
- *OPTICAL images, *LEARNING, *MODAL logic, *SEMANTICS, *IMAGE databases
- Abstract
In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than the labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability, and they are often more directly related to class labels. On the contrary, the image features are not directly related to concepts inherent in class labels. One of our goals in this paper is to develop a model for revealing the functional relationships between text and image features as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate the labels between two multimodal spaces. However, the intermodal label transfers could be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring the image labels instead when relevant text is absent from the source corpus. In addition, we generalize the inter-modal label transfer to zero-shot learning scenario where there are only text examples available to label unseen classes of images without any positive image examples. We evaluate our algorithm on an image classification task and show the effectiveness with respect to the other compared algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2017
9. Guest Editors' Introduction to the Special Section on Graphical Models in Computer Vision.
- Author
- Rehg, James M., Pavlovic, Vladimir, Huang, Thomas S., and Freeman, William T.
- Subjects
- *COMPUTER vision, *GRAPH theory, *HUMAN mechanics, *IMAGE reconstruction
- Abstract
Introduces articles about graphical models in computer vision published in the July 2003 issue of the periodical 'IEEE Transactions on Pattern Analysis and Machine Intelligence.' Stereo matching using belief propagation; Unsupervised learning of human motion; Decision making and uncertainty management in a three-dimensional reconstruction system.
- Published
- 2003
10. Enhancing Bilinear Subspace Learning by Element Rearrangement.
- Author
- Dong Xu, Shuicheng Yan, Lin, Stephen, Huang, Thomas S., and Shih-Fu Chang
- Subjects
- *ALGORITHMS, *ITERATIVE methods (Mathematics), *STATISTICAL correlation, *INTEGER programming, *DATA compression
- Abstract
The success of bilinear subspace learning heavily depends on reducing correlations among features along rows and columns of the data matrices. In this work, we study the problem of rearranging elements within a matrix in order to maximize these correlations so that information redundancy in matrix data can be more extensively removed by existing bilinear subspace learning algorithms. An efficient iterative algorithm is proposed to tackle this essentially integer programming problem. In each step, the matrix structure is refined with a constrained Earth Mover's Distance procedure that incrementally rearranges matrices to become more similar to their low-rank approximations, which have high correlation among features along rows and columns. In addition, we present two extensions of the algorithm for conducting supervised bilinear subspace learning. Experiments in both unsupervised and supervised bilinear subspace learning demonstrate the effectiveness of our proposed algorithms in improving data compression performance and classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2009
11. Guest Editors' Introduction to the Special Section on Probabilistic Graphical Models.
- Author
- Qiang Ji, Luo, Jiebo, Metaxas, Dimitris, Torralba, Antonio, Huang, Thomas S., and Sudderth, Erik B.
- Subjects
- *PATTERN recognition systems, *THREE-dimensional imaging
- Abstract
The article discusses various reports published within the issue including one on the introduction of an integrated system for recognizing actions and objects, another on a probabilistic framework for three-dimensional (3D) visual object representation and one on the detection of repetitive patterns within deformed 2D lattices.
- Published
- 2009
12. A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions.
- Author
- Zhihong Zeng, Pantic, Maja, Roisman, Glenn I., and Huang, Thomas S.
- Subjects
- *HUMAN behavior, *ARTIFICIAL intelligence, *EMOTIONS, *PATTERN recognition systems, *PATTERN perception
- Abstract
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology. [ABSTRACT FROM AUTHOR]
- Published
- 2009
13. Total Variation Models for Variable Lighting Face Recognition.
- Author
- Chen, Terrence, Wotao Yin, Xiang Sean Zhou, Comaniciu Sr., Dorin, and Huang, Thomas S.
- Subjects
- *LOGARITHMIC functions, *FACE perception, *PERCEPTUAL-motor processes, *SIGNAL processing, *LIGHT sources, *AVAILABLE light photography
- Abstract
In this paper, we present the logarithmic total variation (LTV) model for face recognition under varying illumination, including natural lighting conditions, where we rarely know the strength, direction, or number of light sources. The proposed LTV model has the ability to factorize a single face image and obtain the illumination invariant facial structure, which is then used for face recognition. Our model is inspired by the SQI model but has better edge-preserving ability and simpler parameter selection. The merit of this model is that neither does it require any lighting assumption nor does it need any training. The LTV model reaches very high recognition rates in the tests using both Yale and CMU PIE face databases as well as a face database containing 765 subjects under outdoor lighting conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2006
14. Semisupervised Learning of Classifiers: Theory, Algorithms, and Their Application to Human-Computer Interaction.
- Author
- Cohen, Ira, Cozman, Fabio G., Sebe, Nicu, Cirelo, Marcelo C., and Huang, Thomas S.
- Subjects
- *AUTOMATIC classification, *CLASSIFICATION, *COMPUTERS, *ALGORITHMS, *FACIAL expression, *ARTIFICIAL intelligence
- Abstract
Automatic classification is one of the basic tasks required in any pattern recognition and human-computer interaction application. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that, if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis to a specific type of probabilistic classifiers, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection. [ABSTRACT FROM AUTHOR]
- Published
- 2004
15. Age Synthesis and Estimation via Faces: A Survey.
- Author
- Fu, Yun, Guo, Guodong, and Huang, Thomas S.
- Abstract
Human age, as an important personal trait, can be directly inferred by distinct patterns emerging from the facial appearance. Derived from rapid advances in computer graphics and machine vision, computer-based age synthesis and estimation via faces have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as forensic art, electronic customer relationship management, security control and surveillance monitoring, biometrics, entertainment, and cosmetology. Age synthesis is defined to rerender a face image aesthetically with natural aging and rejuvenating effects on the individual face. Age estimation is defined to label a face image automatically with the exact age (year) or the age group (year range) of the individual face. Because of their particularity and complexity, both problems are attractive yet challenging to computer-based application system designers. Large efforts from both academia and industry have been devoted in the last few decades. In this paper, we survey the complete state-of-the-art techniques in the face image-based age synthesis and estimation topics. Existing models, popular algorithms, system performances, technical difficulties, popular face aging databases, evaluation protocols, and promising future directions are also provided with systematic discussions. [ABSTRACT FROM PUBLISHER]
- Published
- 2010
16. Constrained Nonnegative Matrix Factorization for Image Representation.
- Author
- Liu, Haifeng, Wu, Zhaohui, Cai, Deng, and Huang, Thomas S.
- Subjects
- *NONNEGATIVE matrices, *FACTORIZATION, *IMAGE processing, *MATHEMATICAL decomposition, *STOCHASTIC convergence, *PRINCIPAL components analysis, *MACHINE learning
- Abstract
Nonnegative matrix factorization (NMF) is a popular technique for finding parts-based, linear representations of nonnegative data. It has been successfully applied in a wide range of applications such as pattern recognition, information retrieval, and computer vision. However, NMF is essentially an unsupervised method and cannot make use of label information. In this paper, we propose a novel semi-supervised matrix decomposition method, called Constrained Nonnegative Matrix Factorization (CNMF), which incorporates the label information as additional constraints. Specifically, we show how explicitly combining label information improves the discriminating power of the resulting matrix decomposition. We explore the proposed CNMF method with two cost function formulations and provide the corresponding update solutions for the optimization problems. Empirical experiments demonstrate the effectiveness of our novel algorithm in comparison to the state-of-the-art approaches through a set of evaluations based on real-world applications. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
17. Partially Supervised Speaker Clustering.
- Author
- Tang, Hao, Chu, Stephen, Hasegawa-Johnson, Mark, and Huang, Thomas
- Subjects
- *SPEECH perception, *MEASUREMENT, *HEARING, *SPEECH, *FEATURE extraction, *STREAMING audio
- Abstract
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance. [ABSTRACT FROM AUTHOR]
- Published
- 2012
18. Exploring Context and Content Links in Social Media: A Latent Space Method.
- Author
- Qi, Guo-Jun, Aggarwal, Charu, Tian, Qi, Ji, Heng, and Huang, Thomas
- Subjects
- *MULTIMEDIA communications, *SEMANTICS, *LARGE scale integration of circuits, *VISUALIZATION, *SOCIAL media, *ALGORITHMS
- Abstract
Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2012
19. Introduction to the Special Section on Real-World Face Recognition.
- Author
- Hua, Gang, Yang, Ming-Hsuan, Learned-Miller, Erik, Ma, Yi, Turk, Matthew, Kriegman, David J., and Huang, Thomas S.
- Subjects
- *HUMAN facial recognition software, *ALGORITHMS, *IMAGING systems, *TELEVISION in security systems, *WEBSITES, *DIGITAL images, *FACIAL expression
- Published
- 2011
20. Active Learning Based on Locally Linear Reconstruction.
- Author
- Zhang, Lijun, Chen, Chun, Bu, Jiajun, Cai, Deng, He, Xiaofei, and Huang, Thomas S.
- Subjects
- *ACTIVE learning, *LEARNING problems, *EXPERIMENTAL design, *PARAMETER estimation, *ALGORITHMS, *CONVEX functions, *MATHEMATICAL optimization, *MANIFOLDS (Mathematics)
- Abstract
We consider the active learning problem, which aims to select the most representative points. Out of many existing active learning techniques, optimum experimental design (OED) has received considerable attention recently. The typical OED criteria minimize the variance of the parameter estimates or predicted value. However, these methods see only global euclidean structure, while the local manifold structure is ignored. For example, I-optimal design selects those data points such that other data points can be best approximated by linear combinations of all the selected points. In this paper, we propose a novel active learning algorithm which takes into account the local structure of the data space. That is, each data point should be approximated by the linear combination of only its neighbors. Given the local reconstruction coefficients for every data point and the coordinates of the selected points, a transductive learning algorithm called Locally Linear Reconstruction (LLR) is proposed to reconstruct every other point. The most representative points are thus defined as those whose coordinates can be used to best reconstruct the whole data set. The sequential and convex optimization schemes are also introduced to solve the optimization problem. The experimental results have demonstrated the effectiveness of our proposed method. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
21. Graph Regularized Nonnegative Matrix Factorization for Data Representation.
- Author
- Cai, Deng, He, Xiaofei, Han, Jiawei, and Huang, Thomas S.
- Subjects
- *GRAPH theory, *NONNEGATIVE matrices, *FACTORIZATION, *DATA analysis, *COMPUTER vision, *INFORMATION retrieval, *PATTERN perception, *MANIFOLDS (Mathematics), *EMBEDDINGS (Mathematics)
- Abstract
Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation, which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information and we seek a matrix factorization, which respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems. [ABSTRACT FROM AUTHOR]
- Published
- 2011
Catalog
Discovery Service for Jio Institute Digital Library