40 results for "Changhu Wang"
Search Results
2. Research on PMSM Control System with LC Filter
- Author
-
Zhiqiang Zhang, Hejin Xiong, Changhu Wang, and Yu Cao
- Published
- 2021
3. MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
- Author
-
Jiaxin Li, Zijian Feng, Qi She, Henghui Ding, Changhu Wang, and Gim Hee Lee
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Graphics (cs.GR), Machine Learning (cs.LG)
- Abstract
In this paper, we propose MINE to perform novel view synthesis and depth estimation via dense 3D reconstruction from a single image. Our approach is a continuous-depth generalization of Multiplane Images (MPI), obtained by introducing NEural radiance fields (NeRF). Given a single image as input, MINE predicts a 4-channel image (RGB and volume density) at arbitrary depth values to jointly reconstruct the camera frustum and fill in occluded content. The reconstructed and inpainted frustum can then be easily rendered into novel RGB or depth views using differentiable rendering. Extensive experiments on RealEstate10K, KITTI, and Flowers Light Fields show that MINE outperforms the state-of-the-art by a large margin in novel view synthesis. We also achieve competitive results in depth estimation on iBims-1 and NYU-v2 without annotated depth supervision. Our source code is available at https://github.com/vincentfung13/MINE. (ICCV 2021)
- Published
- 2021
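The MPI-with-NeRF representation described above is rendered by standard back-to-front alpha compositing of the per-plane (RGB, density) predictions. A minimal NumPy sketch of that compositing step; array shapes, plane spacing, and function names here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def composite_planes(rgb, sigma, depths):
    """Alpha-composite a stack of fronto-parallel depth planes.

    rgb:    (D, H, W, 3) per-plane colour
    sigma:  (D, H, W)    per-plane volume density
    depths: (D,)         plane depth values, ascending (front to back)
    Returns the composited RGB image and an expected-depth map.
    """
    # Convert density to per-plane alpha via the spacing between planes.
    deltas = np.diff(depths, append=depths[-1] + (depths[-1] - depths[-2]))
    alpha = 1.0 - np.exp(-sigma * deltas[:, None, None])        # (D, H, W)
    # Transmittance: how much light survives the planes in front.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = alpha * trans                                     # (D, H, W)
    image = (weights[..., None] * rgb).sum(axis=0)              # (H, W, 3)
    depth_map = (weights * depths[:, None, None]).sum(axis=0)   # (H, W)
    return image, depth_map
```

With an opaque front plane the composite reduces to that plane's colour, which is a quick sanity check on the weights.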
4. Domain-Invariant Disentangled Network for Generalizable Object Detection
- Author
-
Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, and Jianfei Cai
- Published
- 2021
5. Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
- Author
-
Wei Wang, Haochen Zhang, Zehuan Yuan, and Changhu Wang
- Published
- 2021
6. Learning the Best Pooling Strategy for Visual Semantic Embedding
- Author
-
Hexiang Hu, Hao Wu, Jiacheng Chen, Yuning Jiang, and Changhu Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Pooling, Semantics, Machine learning, Data modeling, Visualization, Feature (machine learning), Embedding, Artificial intelligence
- Abstract
Visual Semantic Embedding (VSE) is a dominant approach for vision-language retrieval, which aims at learning a deep embedding space such that visual data are embedded close to their semantic text labels or descriptions. Recent VSE models use complex methods to better contextualize and aggregate multi-modal features into holistic embeddings. However, we discover that surprisingly simple (but carefully selected) global pooling functions (e.g., max pooling) outperform those complex models, across different feature extractors. Despite its simplicity and effectiveness, seeking the best pooling function for each data modality and feature extractor is costly and tedious, especially when the size of features varies (e.g., text, video). Therefore, we propose a Generalized Pooling Operator (GPO), which learns to automatically adapt itself to the best pooling strategy for different features, requiring no manual tuning while staying effective and efficient. We extend the VSE model using this proposed GPO and denote it as VSE$\infty$. Without bells and whistles, VSE$\infty$ outperforms previous VSE methods significantly on image-text retrieval benchmarks across popular feature extractors. With a simple adaptation, variants of VSE$\infty$ further demonstrate its strength by achieving the new state of the art on two video-text retrieval datasets. Comprehensive experiments and visualizations confirm that GPO always discovers the best pooling strategy and can be a plug-and-play feature aggregation module for standard VSE models. Code and pre-trained models are available at https://vse-infty.github.io. (CVPR 2021 oral)
- Published
- 2021
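The pooling search that GPO automates can be pictured as a weighted sum over rank-sorted feature elements, with max and mean pooling as special cases of the weights. A toy sketch under that reading; in the actual model the coefficients are predicted by a small sequence model so they adapt to variable-length inputs, whereas `theta` here is just a fixed illustrative parameter vector:

```python
import numpy as np

def generalized_pooling(features, theta):
    """Pool a set of local feature vectors with per-rank weights.

    features: (N, d) N local features of dimension d
    theta:    (N,)   per-rank coefficients, softmax-normalised here.
    A one-hot weight on the top rank recovers max pooling; uniform
    weights recover mean pooling.
    """
    weights = np.exp(theta) / np.exp(theta).sum()   # normalise coefficients
    ranked = np.sort(features, axis=0)[::-1]        # each dim sorted descending
    return (weights[:, None] * ranked).sum(axis=0)  # (d,)
```

Because the weights act on ranks rather than positions, the same parameterisation covers max, mean, and k-max pooling without changing the code.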
7. Cross Media Routing and Clustering Algorithm for Autonomous Marine Systems
- Author
-
Changhu Wang, Lin Shuisheng, Haifen Yang, Ding Jiannan, and Meiqiu Jiang
- Subjects
Routing protocol, Node (networking), Real-time computing, Cross media, Underwater, Cluster analysis, Data transmission, Network simulation
- Abstract
In the air-sea environment, unmanned aerial vehicles (UAVs), unmanned surface vehicles (USVs), and autonomous underwater vehicles (AUVs) constitute autonomous marine systems (AMS). The challenge is that the performance of underwater acoustic networks is much lower than that of overwater networks. Existing clustering algorithms do not adapt well to AMS, and existing routing protocols have not considered exploiting the performance advantages of overwater networks to improve the transmission performance of the underwater acoustic networks. In this paper, we propose a LEACH-based cross-media clustering algorithm (LEACH-CM) and a vector-based cross-media routing protocol (VBCM), both of which are applicable to the AMS environment. LEACH-CM ensures the success rate of data transmission in a high-density node environment, and VBCM effectively selects a lower-delay link based on estimated link delays. We verify the effectiveness of LEACH-CM and VBCM through NS3 network simulation.
- Published
- 2021
8. Moflowgan: Video Generation With Flow Guidance
- Author
-
Changhu Wang, Zehuan Yuan, Xiangzhong Fang, and Wei Li
- Subjects
Facial expression, Contextual image classification, Feature extraction, Optical flow, Facial recognition system, Feature (computer vision), Computer vision, Artificial intelligence, Pose
- Abstract
In recent years, video generation has attracted a lot of attention in the computer vision community. Unlike image generation, which focuses only on appearance, video generation requires modeling both content information and motion dynamics. In this work, we propose MoFlowGAN, which explicitly models motion dynamics through a content-motion decomposition architecture with an additional flow generator. The decomposition architecture models content and motion separately and is instantiated by a compact variant of BigGAN [1]. The flow generator produces optical flow directly from high-level feature maps of adjacent frames as a strong supervision signal, so the search space of motion patterns is greatly reduced. Our proposed MoFlowGAN achieves state-of-the-art results on both the MUG facial expression and UCF-101 datasets.
- Published
- 2020
9. Improving Convolutional Networks With Self-Calibrated Convolutions
- Author
-
Jiashi Feng, Jiang-Jiang Liu, Ming-Ming Cheng, Qibin Hou, and Changhu Wang
- Subjects
Object detection, Convolution, Discriminative model, Kernel (image processing), Computer engineering, Segmentation, Artificial intelligence, Feature learning
- Abstract
Recent advances on CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present novel self-calibrated convolutions that explicitly expand the field-of-view of each convolutional layer through internal communications and hence enrich the output features. In particular, unlike standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3x3), self-calibrated convolutions adaptively build long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. They thus help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when self-calibrated convolutions are applied to different backbones, our networks significantly improve the baseline models on a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change the network architectures. We hope this work provides a promising direction for future research in designing novel convolutional feature transformations for improving convolutional networks. Code is available on the project page.
- Published
- 2020
10. Unsupervised Teacher-Student Model for Large-Scale Video Retrieval
- Author
-
Yei-Wei Chen, Dong Liang, Changhu Wang, Jie Shao, Rui Wang, and Lanfen Lin
- Subjects
Event (computing), Frame (networking), Machine learning, Pipeline (software), Domain (software engineering), Feature (computer vision), Social media, Artificial intelligence
- Abstract
With the growth of video-sharing platforms and social media applications, video retrieval plays an important role in many areas, such as copyright infringement detection, event classification, and personalized recommendation. Content-based video retrieval presents two main challenges: (i) distribution inconsistency of feature representations from the source domain to the target domain; (ii) the difficulty of aggregating videos while sufficiently incorporating frame-based information. In this paper, we propose an unsupervised teacher-student model (UTS Net) to improve the performance of content-based video retrieval: (i) a teacher-student model maintaining the global consistency of feature representations across domains while retaining the local inconsistency within intra-batch data; (ii) a simple but effective video retrieval pipeline integrating frame-level binarized features. Our proposed framework experimentally outperforms the state-of-the-art approach on the DSVR, CSVR, and ISVR tasks of the FIVR datasets, achieving mean average precisions of 76%, 72%, and 61%, respectively.
- Published
- 2019
11. Temporal Feature Augmented Network for Video Instance Segmentation
- Author
-
Minghui Dong, Changhu Wang, Yuanyuan Huang, Jie Shao, Shiping Wen, Kai Su, Dongdong Yu, Kaihui Zhou, and Jian Wang
- Subjects
Feature (computer vision), Motion blur, Computer vision, Segmentation, Image segmentation, Artificial intelligence, Object (computer science)
- Abstract
In this paper, we propose a temporal feature augmented network for video instance segmentation. The video instance segmentation task can be split into two subtasks: instance segmentation and tracking. Similar to previous work, a track head is added to an instance segmentation network to track object instances across frames, so that the network can perform detection, segmentation, and tracking simultaneously. We choose Cascade R-CNN as the basic instance segmentation network. In addition, to make better use of the rich information contained in the video, a temporal feature augmented module is introduced to the network. When performing instance segmentation on a single frame, information from other frames in the same video is included, which effectively improves instance segmentation performance. Moreover, experiments show that the temporal feature augmented module can effectively alleviate the problems of motion blur and pose variation.
- Published
- 2019
12. Generative Dual Adversarial Network for Generalized Zero-Shot Learning
- Author
-
Changhu Wang, He Huang, Philip S. Yu, and Chang-Dong Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Deep learning, Pattern recognition, Semantic mapping, Categorization, Feature (computer vision), Metric (mathematics), Embedding, Artificial intelligence
- Abstract
This paper studies the problem of generalized zero-shot learning, which requires the model to train on image-label pairs from seen classes and then classify new images from both seen and unseen classes. Most previous models try to learn a fixed one-directional mapping between visual and semantic space, while some recently proposed generative methods generate image features for unseen classes so that zero-shot learning becomes a traditional fully-supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embeddings as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under a combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in classifying images from unseen classes.
- Published
- 2019
13. Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information
- Author
-
Kai Su, Dongdong Yu, Xin Geng, Zhenqi Xu, and Changhu Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Deep learning, Feature extraction, Context (language use), Feature (computer vision), Pyramid (image processing), Benchmark (computing), Computer vision, Artificial intelligence, Spatial analysis, Pose, Gesture, Communication channel
- Abstract
Multi-person pose estimation is an important but challenging problem in computer vision. Although current approaches have achieved significant progress by fusing multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps. In this paper, we propose two novel modules to perform this enhancement for multi-person pose estimation. First, a Channel Shuffle Module (CSM) is proposed to apply the channel shuffle operation to feature maps at different levels, promoting cross-channel information communication among the pyramid feature maps. Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) is designed to boost the original residual unit with an attention mechanism, adaptively highlighting the information of the feature maps in both the spatial and channel-wise context. The effectiveness of our proposed modules is evaluated on the COCO keypoint benchmark, and experimental results show that our approach achieves state-of-the-art results. (CVPR 2019)
- Published
- 2019
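The Channel Shuffle Module above builds on the generic channel shuffle operation popularized by ShuffleNet: split the channels into groups, then interleave them so information mixes across groups. A minimal NumPy sketch of that operation; the paper applies it across pyramid feature maps, while this shows only the core reshape-transpose trick:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave the channels of `groups` channel groups.

    x: (N, C, H, W) feature map, with C divisible by `groups`.
    Reshape to (N, g, C//g, H, W), swap the two channel axes,
    then flatten back to (N, C, H, W).
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(n, groups, c // groups, h, w)
              .transpose(0, 2, 1, 3, 4)
              .reshape(n, c, h, w))
```

For example, six channels in two groups `[0 1 2 | 3 4 5]` come out interleaved as `[0 3 1 4 2 5]`.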
14. Jersey Number Recognition with Semi-Supervised Spatial Transformer Network
- Author
-
Li Lei, Changhu Wang, Shikun Xu, Li Gen, and Xiang Liu
- Subjects
Artificial neural network, Detector, Feature extraction, Pattern recognition, Object detection, Data modeling, Task analysis, Artificial intelligence, Transformer (machine learning model)
- Abstract
It is still a challenging task to recognize the jersey numbers of players on the court in soccer match videos, as the jersey numbers are very small for the object detection task and annotated data are not easy to collect. Based on the object detection results for all the players on the court, a CNN model is first introduced to classify the numbers on the detected players' images. To localize the jersey number more precisely without involving another digit detector and extra consumption, we then improve the former network into an end-to-end framework by fusing it with a spatial transformer network (STN). To further improve accuracy, we bring extra supervision to the STN and upgrade the model to a semi-supervised multi-task learning system by labeling a small portion of the number areas in the dataset with quadrangles. Extensive experiments illustrate the effectiveness of the proposed framework.
- Published
- 2018
15. Image segmentation using contour, surface, and depth cues
- Author
-
C.-C. Jay Kuo, Jian Li, Chen Chen, Xiang Fu, and Changhu Wang
- Subjects
Segmentation, Computer vision, Image segmentation, Artificial intelligence, Depth perception
- Abstract
We aim to solve the problem of automatic image segmentation. Although 1D contour and 2D surface cues have been widely utilized in existing work, the 3D depth information of an image, a necessary cue according to human visual perception, is overlooked in automatic image segmentation. In this paper, we study how to fully utilize 1D contour, 2D surface, and 3D depth cues for image segmentation. First, three elementary segmentation modules are developed for these cues respectively. The proposed 3D depth cue is able to segment differently textured regions even with similar color, and also to merge similarly textured areas, which cannot be achieved using state-of-the-art approaches. Then, a content-dependent spectral (CDS) graph is proposed for layered affinity models to produce the final segmentation. CDS is designed to build a more reliable relationship between neighboring surface nodes based on the three elementary cues in the spectral graph. Extensive experiments not only show the superior performance of the proposed algorithm over state-of-the-art approaches, but also verify the necessity of these three cues in image segmentation.
- Published
- 2017
16. Surveillance Video Parsing with Single Frame Supervision
- Author
-
Han Yu, Si Liu, Renda Bao, Changhu Wang, Yao Sun, and Ruihe Qian
- Subjects
Parsing, Frame (networking), Feature extraction, Optical flow, Image segmentation, Object detection, Optical flow estimation, Video tracking, Computer vision, Artificial intelligence, Reference frame, Block-matching algorithm
- Abstract
Surveillance video parsing, which segments video frames into several labels, e.g., face, pants, left-leg, has wide applications [41, 8]. However, annotating all frames pixel-wise is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (i) roughly parses the frames within the video segment, (ii) estimates the optical flow between frames, and (iii) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation, and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over the state-of-the-art. The collected video parsing datasets can be downloaded via http://liusi-group.com/projects/SVP for further study.
- Published
- 2017
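Step (iii) of SVP fuses rough parsing results after warping them with the estimated optical flow. A small NumPy sketch of the warping step alone, under illustrative conventions (backward flow, nearest-neighbour sampling so labels stay discrete) that are assumptions rather than details from the paper:

```python
import numpy as np

def warp_labels(labels, flow):
    """Warp a per-pixel label map with a backward optical-flow field.

    labels: (H, W) integer labels of a reference frame
    flow:   (H, W, 2) flow (dy, dx) mapping each target pixel back to
            its source coordinate in the reference frame.
    Nearest-neighbour sampling keeps the labels discrete.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return labels[src_y, src_x]
```

In a full pipeline the warped maps from several preceding frames would then be fused (e.g., by voting or a learned module) with the current frame's rough parse.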
17. Barycentric coordinates based soft assignment for object classification
- Author
-
Chang Wen Chen, Changhu Wang, and Tao Wei
- Subjects
Fisher kernel, Visual Word, Algorithm, Barycentric coordinates
- Abstract
For object classification, soft assignment (SA) is capable of improving the bag-of-visual-words (BoVW) model and has the advantage of conceptual simplicity. However, the performance of soft assignment is inferior to recently developed encoding schemes. In this paper, we propose a novel scheme called barycentric coordinates based soft assignment (BCSA) for the classification of object images. While maintaining conceptual simplicity, this scheme is shown to outperform most existing encoding schemes, including sparse and local coding schemes. Furthermore, with only single-scale features, it achieves performance comparable or even superior to the current state-of-the-art Fisher kernel (FK) encoding scheme. In particular, the proposed BCSA scheme enjoys the following properties: 1) preservation of linear order precision in encoding, which makes BCSA robust to linear transform distortions; 2) natural inheritance of visual word uncertainty, which leads to a more expressive model; 3) generation of linearly classifiable codes that can be learned with significantly less computational cost and storage. Extensive experiments on the widely used Caltech-101 and Caltech-256 datasets demonstrate the effectiveness of the proposed BCSA scheme in both performance and simplicity.
- Published
- 2016
18. Robust Image Segmentation Using Contour-Guided Color Palettes
- Author
-
Chien-Yi Wang, C.-C. Jay Kuo, Chen Chen, Xiang Fu, and Changhu Wang
- Subjects
Color histogram, Color image, Segmentation-based object categorization, Scale-space segmentation, Pattern recognition, Image segmentation, Image texture, Region growing, Computer vision, Artificial intelligence, Histogram equalization
- Abstract
The contour-guided color palette (CCP) is proposed for robust image segmentation. It efficiently integrates the contour and color cues of an image. To find representative colors of an image, color samples along long contours between regions, similar in spirit to machine learning methodologies that focus on samples near decision boundaries, are collected, followed by the mean-shift (MS) algorithm in the sampled color space to obtain an image-dependent color palette. This color palette provides a preliminary segmentation in the spatial domain, which is further fine-tuned by post-processing techniques such as leakage avoidance, fake boundary removal, and small-region merging. Segmentation performances of CCP and MS are compared and analyzed. While CCP offers an acceptable standalone segmentation result, it can be further integrated into the framework of layered spectral segmentation to produce a more robust segmentation. The superior performance of the CCP-based segmentation algorithm is demonstrated by experiments on the Berkeley Segmentation Dataset.
- Published
- 2015
19. Trip Mining and Recommendation from Geo-tagged Photos
- Author
-
Lei Zhang, Changhu Wang, Nenghai Yu, and Huagang Yin
- Subjects
Metadata, World Wide Web, Search engine, Similarity (geometry), Recommender system, Tourism
- Abstract
Trip planning is generally a very time-consuming task due to complex trip requirements and the lack of convenient tools/systems to assist the planning. In this paper, we propose a travel path search system based on geo-tagged photos to facilitate tourists' trip planning, covering not only where to visit but also how to visit. The large-scale geo-tagged photos publicly available on the web make this system possible, as geo-tagged photos encode rich travel-related metadata and can be used to mine travel paths from previous tourists. In this work, about 20 million geo-tagged photos were crawled from Panoramio.com, and a substantial number of travel paths were mined from them. A search system was then built to index and search the paths, and the Sparse Chamfer Distance is proposed to measure the similarity of two paths. The search system supports various types of queries, including (1) a destination name, (2) a user-specified region on the map, and (3) user-preferred locations. Users can interact with the system by specifying a region or several points of interest on the map to find paths. Extensive experiments show the effectiveness of the proposed framework.
- Published
- 2012
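The Sparse Chamfer Distance above is a variant of the classic symmetric Chamfer distance between point sets. The standard form, which the paper adapts, can be sketched as follows (the function name and 2D-coordinate convention are illustrative, not from the paper):

```python
import numpy as np

def chamfer_distance(path_a, path_b):
    """Symmetric Chamfer distance between two paths given as point sets.

    path_a: (m, 2) and path_b: (n, 2) arrays of (lat, lon)-like points.
    For each point, take the distance to its nearest neighbour in the
    other path, then average those distances in both directions.
    """
    diff = path_a[:, None, :] - path_b[None, :, :]   # (m, n, 2)
    d = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Because it matches each point to its nearest neighbour rather than aligning indices, the measure tolerates paths sampled at different rates or with different numbers of photos.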
20. The scale of edges
- Author
-
Changhu Wang, Xianming Liu, Hongxun Yao, and Lei Zhang
- Subjects
Parsing, Open problem, Scale-invariant feature transform, Edge (geometry), Edge detection, Visualization, Computer vision, Artificial intelligence, Algorithm
- Abstract
Although the scale of isotropic visual elements such as blobs and interest points, e.g., SIFT [12], has been well studied and adopted in various applications, how to determine the scale of anisotropic elements such as edges is still an open problem. In this paper, we study the scale of edges and try to answer two questions: 1) what is the scale of edges, and 2) how can it be calculated? From the standpoints of human cognition and physical interpretation, we illustrate the existence of the scale of edges and provide a quantitative definition. Then, an automatic edge scale selection approach is proposed. Finally, a cognitive experiment is conducted to validate the rationality of the detected scales. Moreover, the importance of identifying the scale of edges is also shown in applications such as boundary detection and hierarchical edge parsing.
- Published
- 2012
21. Shape-based web image clustering for unsupervised object detection?
- Author
-
Xilin Chen, Changhu Wang, and Wei Zheng
- Subjects
Boosting (machine learning), Detector, Pattern recognition, Object detection, Support vector machine, Object-class detection, Discriminative model, Viola–Jones object detection framework, Computer vision, Artificial intelligence, Cluster analysis
- Abstract
Automatic object detection for an arbitrary class is an important but very challenging problem, due to the countless kinds of objects in the world and the large amount of labeling work required for each object. In this work, we aim to solve the problem of automatic object detection for an arbitrary class without laborious human effort. Motivated by the explosive growth of Web images and the phenomenal success of search techniques, we develop an unsupervised object detection framework that automatically trains the object detector on the top returns of an image search engine queried with the name of the object class. In order to automatically isolate the objects from the Web images for training, only clipart images with simple backgrounds are used, which preserve most of the shape information of the objects. A two-stage shape-based clustering algorithm is proposed to mine typical shapes of the object, in which the intra-class variance of object shapes is considered and undesired images are filtered out. In order to reduce the gap between clipart images and real-world images, we introduce an efficient algorithm to synthesize real-world images from clipart images, and only shape features are used in detector training. Finally, the synthetic images can be used to train object detectors with an off-the-shelf discriminative algorithm, e.g., boosting or SVM. Extensive experiments show the effectiveness of the proposed framework on objects with simple and representative shapes, and the framework can be considered a good starting point for solving this challenging problem.
- Published
- 2011
22. Edgel index for large-scale sketch-based image search
- Author
-
Changhu Wang, Yang Cao, Lei Zhang, and Liqing Zhang
- Subjects
Information retrieval, Search engine indexing, Sketch, Search engine, Feature (computer vision), Computer vision, Artificial intelligence, Image retrieval
- Abstract
Retrieving images that match a hand-drawn sketch query is a highly desired feature, especially with the popularity of devices with touch screens. Although query-by-sketch has been extensively studied since the 1990s, it is still very challenging to build a real-time sketch-based image search engine on a large-scale database due to the lack of effective and efficient matching/indexing solutions. The explosive growth of web images and the phenomenal success of search techniques have encouraged us to revisit this problem and target web-scale sketch-based image retrieval. In this work, a novel index structure and the corresponding raw contour-based matching algorithm are proposed to calculate the similarity between a sketch query and natural images, making sketch-based image retrieval scalable to millions of images. The proposed solution simultaneously considers storage cost, retrieval accuracy, and efficiency, based on which we have developed a real-time sketch-based image search engine indexing more than 2 million images. Extensive experiments on various retrieval tasks (basic shape search, specific image search, and similar image search) show better accuracy and efficiency than state-of-the-art methods.
- Published
- 2011
23. Robust semantic sketch based specific image retrieval
- Author
-
Bo Zhang, Dong Wang, Xiaobing Liu, Cailiang Liu, Changhu Wang, and Lei Zhang
- Subjects
Information retrieval ,Robustness (computer science) ,Computer science ,Search algorithm ,Histogram ,LabelMe ,Image retrieval ,Episodic memory ,Blossom algorithm ,Sketch ,Semantic gap - Abstract
Specific images are images one has a certain episodic memory of, e.g., a picture one has seen before. Specific image retrieval is a frequent daily information need, and episodic memory is the key to finding a specific image. In this paper, we propose a novel semantic sketch-based interface that incorporates episodic memory for specific image retrieval. The interface allows a user to specify the semantic category and the rough area/color of the objects in his or her memory. To bridge the semantic gap between the query sketch and database images, in the back end a sampling method selects exemplars from a reference dataset containing many object instances with user-provided tags and bounding boxes. After that, an exemplar matching algorithm ranks images to retrieve the target image matching the user's memory. In practice, we have observed that query sketches are usually error-prone: the position or the color of an object may not be accurate. Meanwhile, the annotations in the reference dataset are also noisy. Thus, the search algorithm has to handle two kinds of errors: 1) label noise in the reference dataset, and 2) user sketch errors in position or scale. For the former, we propose a robust sampling method; for the latter, we derive an efficient spatial reranking algorithm that tolerates inaccurate user sketches. Detailed experimental results on the LabelMe dataset show that the proposed approach is robust to both kinds of errors.
- Published
- 2010
24. Probabilistic models for supervised dictionary learning
- Author
-
Bao-Liang Lu, Zhiwei Li, Lei Zhang, Changhu Wang, and Xiaochen Lian
- Subjects
K-SVD ,Contextual image classification ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Probabilistic logic ,Statistical model ,Pattern recognition ,Mixture model ,Logistic regression ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Categorization ,Computer Science::Computer Vision and Pattern Recognition ,Histogram ,Pyramid ,Pyramid (image processing) ,Artificial intelligence ,business ,computer - Abstract
Dictionary generation is a core technique of bag-of-visual-words (BOV) models applied to image categorization. Most previous approaches generate dictionaries with unsupervised clustering techniques, e.g., k-means. However, the features obtained with such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) that seamlessly combines an unsupervised model (a Gaussian mixture model) and a supervised model (a logistic regression model) in one probabilistic framework. In the model, image category information directly affects the generation of the dictionary. A dictionary obtained by this approach is a trade-off between minimizing the distortion of clusters and maximizing the discriminative power of image-wise representations, i.e., histogram representations of images. We further extend the model to incorporate spatial information during dictionary learning, in a spatial-pyramid-matching-like manner. We extensively evaluated the two models on various benchmark datasets and obtained promising results.
- Published
- 2010
25. Spatial-bag-of-features
- Author
-
Changhu Wang, Lei Zhang, Yang Cao, Zhiwei Li, and Liqing Zhang
- Subjects
Computer science ,business.industry ,Search engine indexing ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Feature selection ,Pattern recognition ,Automatic image annotation ,Image texture ,Discriminative model ,Histogram ,Computer vision ,Visual Word ,Artificial intelligence ,Representation (mathematics) ,business ,Image retrieval ,Image resolution ,Feature detection (computer vision) - Abstract
In this paper, we study the problem of large-scale image retrieval by developing a new class of bag-of-features that encodes the geometric information of objects within an image. Going beyond the existing orderless bag-of-features, local features of an image are first projected onto different directions or points to generate a series of ordered bags-of-features, based on which different families of spatial bag-of-features are designed to capture invariance to object translation, rotation, and scaling. The most representative features are then selected by a boosting-like method to generate a new bag-of-features-like vector representation of the image. The proposed framework works well in image retrieval owing to three properties: 1) the encoding of objects' geometric information, capturing their spatial transformations; 2) the supervised feature selection and combination strategy, enhancing discriminative power; and 3) the bag-of-features representation, enabling effective image matching and indexing at large scale. Extensive experiments on 5,000 Oxford building images and 1 million Panoramio images show the effectiveness and efficiency of the proposed features and retrieval framework.
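The projection step described above can be sketched in a few lines, under simplifying assumptions: local features are taken as already quantized to visual-word ids, and a single linear projection direction with uniform bins stands in for the paper's families of projections. The function name and parameters are illustrative:

```python
import numpy as np

def spatial_bof(positions, words, direction, n_bins, vocab_size):
    """Project feature positions onto a direction, split the projected range
    into n_bins ordered segments, and build one visual-word histogram per
    segment; concatenating them yields an ordered bag-of-features vector."""
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words)
    proj = positions @ np.asarray(direction, dtype=float)
    # Uniform bin edges along the projection axis (tiny epsilon keeps the
    # maximum projection inside the last bin).
    edges = np.linspace(proj.min(), proj.max() + 1e-9, n_bins + 1)
    bins = np.clip(np.searchsorted(edges, proj, side="right") - 1, 0, n_bins - 1)
    hist = np.zeros((n_bins, vocab_size))
    for b, w in zip(bins, words):
        hist[b, w] += 1
    return hist.reshape(-1)
```

Unlike an orderless histogram, two images with the same words in different spatial layouts now produce different vectors, which is the geometric information the abstract refers to.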
- Published
- 2010
26. Multi-label sparse coding for automatic image annotation
- Author
-
Shuicheng Yan, Lei Zhang, Hong-Jiang Zhang, and Changhu Wang
- Subjects
Contextual image classification ,business.industry ,Computer science ,Dimensionality reduction ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Iterative reconstruction ,Image segmentation ,Sparse approximation ,Mixture model ,Automatic image annotation ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Artificial intelligence ,Neural coding ,business ,Image retrieval ,Subspace topology - Abstract
In this paper, we present a multi-label sparse coding framework for feature extraction and classification within the context of automatic image annotation. First, each image is encoded into a so-called supervector, derived from universal Gaussian mixture models over orderless image patches. Then, a label-sparse-coding-based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction. Finally, a sparse coding method for multi-label data is proposed to propagate the multi-labels of the training images to the query image via the sparse l1 reconstruction coefficients. Extensive image annotation experiments on the Corel5k and Corel30k databases show the superior performance of the proposed multi-label sparse coding framework over state-of-the-art algorithms.
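The label-propagation step can be sketched generically: reconstruct the query feature as a sparse combination of training features, then transfer labels weighted by the coefficients. This sketch uses plain ISTA for the l1 problem and a simple normalized score; the function names, the use of ISTA, and the scoring rule are assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding operator, the proximal map of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def propagate_labels(F, Y, f_query, lam=0.05, iters=1000):
    """F: (d, n) training feature matrix; Y: (L, n) binary multi-label matrix.
    Sparsely reconstruct the query feature from training features via ISTA,
    then transfer labels with the reconstruction coefficients."""
    lr = 1.0 / np.linalg.norm(F, 2) ** 2           # step size from spectral norm
    w = np.zeros(F.shape[1])
    for _ in range(iters):                          # min 0.5||Fw-f||^2 + lam||w||_1
        w = soft(w - lr * (F.T @ (F @ w - f_query)), lr * lam)
    scores = Y @ np.abs(w)                          # label mass from sparse weights
    return scores / (scores.sum() + 1e-12)          # normalized relevance per label
```

Because only a few coefficients are nonzero, labels are inherited from a handful of visually similar training images rather than from the whole database.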
- Published
- 2009
27. A lexica family with small semantic gap
- Author
-
Jiemin Liu, Xiaokang Yang, Changhu Wang, Qi Tian, Lei Zhang, Shipeng Li, and Yijuan Lu
- Subjects
Consistency (database systems) ,Information retrieval ,Text mining ,business.industry ,Computer science ,Feature extraction ,Feature selection ,Space (commercial competition) ,Lexicon ,business ,Construct (philosophy) ,Image retrieval ,Semantic gap - Abstract
Defining a lexicon of high-level concepts is the first step for data collection and model construction in concept-based image retrieval. Differences of semantic gaps among concepts are well worth considering. By measuring consistency in visual space and textual space, concepts with small semantic gap can be obtained. Considering so many diverse concepts in large-scale image dataset, we construct a lexica family of high-level concepts with small semantic gap based on different low-level features and different consistency measurements. In this lexica family, the lexica are independent to each other and mutually complementary. It provides helpful suggestions about data collection, feature selection and search model construction for large-scale image retrieval.
- Published
- 2009
28. Multiplicative nonnegative graph embedding
- Author
-
Shuicheng Yan, Hong-Jiang Zhang, Changhu Wang, Zheng Song, and Lei Zhang
- Subjects
Combinatorics ,Discrete mathematics ,Factorization ,Graph embedding ,Multiplicative function ,MathematicsofComputing_NUMERICALANALYSIS ,Embedding ,Graph theory ,Tensor ,Nonnegative matrix ,MathematicsofComputing_DISCRETEMATHEMATICS ,Matrix decomposition ,Mathematics - Abstract
In this paper, we study the problem of nonnegative graph embedding, originally investigated in [J. Yang et al., 2008] to reap the benefits of both nonnegative data factorization and the specific purposes characterized by the intrinsic and penalty graphs. Our contributions are twofold. On the one hand, we present a multiplicative iterative procedure for nonnegative graph embedding, which significantly reduces the computational cost compared with the iterative procedure in [14], which involves the matrix inverse of an M-matrix. On the other hand, the nonnegative graph embedding framework is expressed more generally by encoding each datum as a tensor of arbitrary order, which yields a group of byproducts, e.g., a nonnegative discriminative tensor factorization algorithm with admissible time and memory cost. Extensive experiments against state-of-the-art algorithms for nonnegative data factorization, graph embedding, and tensor representation demonstrate the algorithm's computation speed, sparsity, discriminative power, and robustness to realistic image occlusions.
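The paper's specific update rules are not reproduced here; as a generic illustration of the "multiplicative iterative procedure" flavor of algorithm, the classic Lee-Seung updates for nonnegative matrix factorization show how elementwise multiplicative steps preserve nonnegativity without projections or matrix inverses:

```python
import numpy as np

def nmf_multiplicative(V, k, iters=300, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for V ~= W @ H with W, H >= 0.
    Each factor is scaled elementwise by a ratio of nonnegative terms,
    so nonnegativity is maintained automatically at every iteration."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1                   # positive initialization
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # update H, then W
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The embedding algorithm in the paper follows the same principle but with numerator and denominator terms derived from the intrinsic and penalty graphs rather than from a plain reconstruction objective.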
- Published
- 2009
29. Large scale natural image classification by sparsity exploration
- Author
-
Changhu Wang, Shuicheng Yan, and Hong-Jiang Zhang
- Subjects
Scale (ratio) ,Contextual image classification ,Computer science ,business.industry ,WordNet ,Sample (statistics) ,Sparse approximation ,computer.software_genre ,Machine learning ,k-nearest neighbors algorithm ,Discriminative model ,Robustness (computer science) ,Data mining ,Artificial intelligence ,business ,computer - Abstract
We consider in this paper the problem of large-scale natural image classification. With the explosion and popularity of images on the Internet, there is increasing interest in utilizing millions or even billions of these images to aid image-related research. Beyond the opportunities brought by nearly unlimited data, a great challenge is how to design more effective classification methods for these large-scale scenarios. Most existing attempts are based on the k-nearest-neighbor method. However, despite its optimistic performance in some tasks, this strategy suffers from the fact that a single fixed global parameter k is not robust across object classes from different semantic levels. In this paper, we propose an alternative method, called l1-nearest-neighbor, based on a sparse representation computed by l1-minimization. We first treat a testing sample as a sparse linear combination of all training samples, and then consider the related samples as the nearest neighbors of the testing sample. Finally, we classify the testing sample based on the majority class among these neighbors. We conduct extensive experiments on a 1.6 million natural image database at different semantic levels defined with WordNet, which demonstrate that the proposed l1-nearest-neighbor algorithm outperforms k-nearest-neighbor in two aspects: 1) robustness of parameter selection across semantic levels, and 2) discriminative capability for large-scale image classification.
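The l1-nearest-neighbor idea can be sketched directly from the description: solve an l1-regularized reconstruction of the test sample over the training columns, then vote with the classes carrying the reconstruction weight. This sketch uses a basic ISTA solver and weights votes by coefficient magnitude; the solver choice and the weighted vote are assumptions, since the abstract specifies neither:

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(A, b, lam=0.05, iters=1000):
    """ISTA for min_x 0.5||Ax - b||^2 + lam ||x||_1 over training columns A."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2           # safe step from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - lr * (A.T @ (A @ x - b)), lr * lam)
    return x

def l1_nearest_neighbor(A, labels, b, lam=0.05):
    """Classify b by the class whose training samples (columns of A) carry
    the most l1 reconstruction weight: the sparse nonzero entries play the
    role of the neighbors, with no global k to tune."""
    x = sparse_code(A, b, lam)
    classes = np.unique(labels)
    votes = [np.abs(x[labels == c]).sum() for c in classes]
    return classes[int(np.argmax(votes))]
```

The point of the method is visible here: the number of "neighbors" adapts to the sample via sparsity, instead of being fixed by a global k.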
- Published
- 2009
30. Content-Based Image Annotation Refinement
- Author
-
Changhu Wang, Hong-Jiang Zhang, Feng Jing, and Lei Zhang
- Subjects
Information retrieval ,business.industry ,Computer science ,Markov process ,Set (abstract data type) ,Support vector machine ,symbols.namesake ,Annotation ,Automatic image annotation ,Feature (computer vision) ,symbols ,business ,Hidden Markov model ,Image retrieval ,Content management - Abstract
Automatic image annotation has been an active research topic due to its great importance in image retrieval and management. However, the results of state-of-the-art image annotation methods are often unsatisfactory. Besides continuous efforts to invent new annotation algorithms, it would be advantageous to have a dedicated approach that refines imprecise annotations. In this paper, such an approach to automatically refining the original annotations of images is proposed. For a query image, an existing image annotation method is first employed to obtain a set of candidate annotations. Then, the candidate annotations are re-ranked and only the top ones are retained as the final annotations. By formulating the refinement process as a Markov process, with the candidate annotations as the states of a Markov chain, a content-based image annotation refinement (CIAR) algorithm is proposed to re-rank the candidate annotations. It leverages both corpus information and the content features of the query image. Experimental results on a typical Corel dataset show not only the validity of the refinement but also the superiority of the proposed algorithm over existing ones.
- Published
- 2007
31. Realistic 3D Face Modeling by Fusing Multiple 2D Images
- Author
-
Shuicheng Yan, Hong-Jiang Zhang, Wei-Ying Ma, and Changhu Wang
- Subjects
2d images ,business.industry ,Estimation theory ,Computer science ,Efficient algorithm ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Texture model ,Pattern recognition ,Texture (geology) ,Face (geometry) ,Computer vision ,Artificial intelligence ,Quaternion ,business ,Representation (mathematics) ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In this paper, we propose a fully automatic and efficient algorithm for realistic 3D face reconstruction by fusing multiple 2D face images. First, an efficient multi-view 2D face alignment algorithm is used to localize the facial points in the face images; then the intrinsic shape and texture models are inferred by the proposed Syncretized Shape Model (SSM) and Syncretized Texture Model (STM), respectively. Compared with related work, the proposed algorithm has the following characteristics: 1) the inferred shape and texture are more realistic, owing to the constraints and co-enhancement among the multiple images; 2) it is fully automatic, without any user interaction; and 3) shape and pose parameter estimation is efficient, via an EM approach and a unit-quaternion-based pose representation, and robust, thanks to the dynamic correspondence approach. Experimental results show the effectiveness of the proposed algorithm for 3D face reconstruction.
- Published
- 2005
32. Edgel index for large-scale sketch-based image search.
- Author
-
Yang Cao, Changhu Wang, Liqing Zhang, and Lei Zhang
- Published
- 2011
- Full Text
- View/download PDF
33. Shape-based web image clustering for unsupervised object detection
- Author
-
Wei Zheng, Changhu Wang, and Xilin Chen
- Published
- 2011
- Full Text
- View/download PDF
34. Spatial-bag-of-features.
- Author
-
Yang Cao, Changhu Wang, Zhiwei Li, Liqing Zhang, and Lei Zhang
- Published
- 2010
- Full Text
- View/download PDF
35. Probabilistic models for supervised dictionary learning.
- Author
-
Xiao-Chen Lian, Zhiwei Li, Changhu Wang, Bao-Liang Lu, and Lei Zhang
- Published
- 2010
- Full Text
- View/download PDF
36. Robust semantic sketch based specific image retrieval.
- Author
-
Cailiang Liu, Dong Wang, Xiaobing Liu, Changhu Wang, Lei Zhang, and Bo Zhang
- Published
- 2010
- Full Text
- View/download PDF
37. Multi-label sparse coding for automatic image annotation.
- Author
-
Changhu Wang, Shuicheng Yan, Lei Zhang, and Hong-Jiang Zhang
- Published
- 2009
- Full Text
- View/download PDF
38. Multiplicative nonnegative graph embedding.
- Author
-
Changhu Wang, Zheng Song, Shuicheng Yan, Lei Zhang, and Hong-Jiang Zhang
- Published
- 2009
- Full Text
- View/download PDF
39. Realistic 3D Face Modeling by Fusing Multiple 2D Images.
- Author
-
Changhu Wang, Shuicheng Yan, Hongjiang Zhang, and Weiying Ma
- Published
- 2005
- Full Text
- View/download PDF
40. Viewpoint-Aware Representation for Sketch-Based 3D Model Retrieval.
- Author
-
Changqing Zou, Changhu Wang, Yafei Wen, Lei Zhang, and Jianzhuang Liu
- Subjects
IMAGE retrieval ,THREE-dimensional modeling ,DESCRIPTOR systems ,EDGE detection (Image processing) ,QUERY (Information retrieval system) - Abstract
We study the problem of sketch-based 3D model retrieval and propose a solution powered by a new query-to-model distance metric and a powerful feature descriptor based on the bag-of-features framework. The main idea of the query-to-model distance metric is to represent a query sketch using a compact set of sample views (called basic views) of each model and to rank the models in ascending order of representation error. To better differentiate between relevant and irrelevant models, the representation is constrained to be essentially a combination of basic views with similar viewpoints. In addition, we propose a mid-level descriptor (called BOF-JESC), which robustly characterizes the edge information within junction-centered patches, to extract salient shape features from sketches and model views. The combination of the query-to-model distance metric and the BOF-JESC descriptor achieves effective results on two recent benchmark datasets.
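The core of the query-to-model distance can be sketched as follows: fit the query sketch feature as a linear combination of each model's basic-view features and rank models by the residual. This sketch drops the paper's constraint that the combination use views with similar viewpoints and assumes plain least squares; all names are illustrative:

```python
import numpy as np

def rank_models(query, model_views):
    """Rank 3D models by how well a compact set of basic-view feature
    vectors linearly represents the query sketch feature: a smaller
    least-squares residual means a better-matching model."""
    errors = []
    for name, views in model_views:
        V = np.asarray(views, dtype=float).T          # (d, n_views) view matrix
        coef, *_ = np.linalg.lstsq(V, query, rcond=None)
        errors.append((name, float(np.linalg.norm(V @ coef - query))))
    return sorted(errors, key=lambda kv: kv[1])       # ascending representation error
```

The viewpoint-consistency constraint in the paper matters because an unconstrained fit can mix views from opposite sides of a model; this sketch shows only the representation-error ranking itself.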
- Published
- 2014
- Full Text
- View/download PDF