77 results for "HanQing Lu"
Search Results
2. ROSE: Robust Caches for Amazon Product Search
- Author
-
Chen Luo, Vihan Lakshman, Anshumali Shrivastava, Tianyu Cao, Sreyashi Nag, Rahul Goutam, Hanqing Lu, Yiwei Song, and Bing Yin
- Published
- 2022
- Full Text
- View/download PDF
3. QUEACO
- Author
-
Qiang Yang, Yiwei Song, Tianyu Cao, Danqing Zhang, Bing Yin, Hanqing Lu, Chen Luo, Zheng Li, Tony Wu, and Tuo Zhao
- Subjects
Computation and Language (cs.CL), Information Retrieval (cs.IR), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Named-entity recognition, Normalization, Canonical form, Data mining - Abstract
We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface-form attribute values and then transform them into formally canonical forms. The problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works focus only on the NER phase and neglect the equally important AVN. To bridge this gap, this paper proposes QUEACO, a unified query attribute value extraction system for e-commerce search that covers both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve extraction performance at a lower supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted based on the student's performance on strongly-labeled data to maximally denoise the noisy supervision from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface-form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale e-commerce dataset demonstrate the effectiveness of QUEACO. (Comment: The 30th ACM International Conference on Information and Knowledge Management, CIKM 2021, Applied Research Track.) A minimal code sketch of the teacher-student refinement follows this entry.
- Published
- 2021
- Full Text
- View/download PDF
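The teacher-student refinement described in the QUEACO abstract can be illustrated with a small sketch. This is not the authors' code: the tagger stub, data shapes, and loss weight below are hypothetical placeholders, and only the pseudo-labeling flow summarized in the abstract is reproduced.

```python
# Minimal sketch of teacher-student NER refinement with weak labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_tags, hidden = 9, 128

class TaggerStub(nn.Module):
    """Stand-in for a real NER encoder (e.g., a BiLSTM or BERT tagger)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(hidden, num_tags)
    def forward(self, token_feats):          # (batch, seq, hidden)
        return self.proj(token_feats)        # (batch, seq, num_tags)

teacher, student = TaggerStub(), TaggerStub()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_step(strong_x, strong_y, weak_x, pseudo_weight=0.5):
    # 1) Teacher refines the weakly-labeled batch into pseudo tag sequences.
    with torch.no_grad():
        pseudo_y = teacher(weak_x).argmax(-1)
    # 2) Student learns from strong labels plus refined pseudo labels.
    loss = F.cross_entropy(student(strong_x).flatten(0, 1), strong_y.flatten())
    loss = loss + pseudo_weight * F.cross_entropy(
        student(weak_x).flatten(0, 1), pseudo_y.flatten())
    opt.zero_grad(); loss.backward(); opt.step()
    # 3) In the paper the teacher is further adapted using the student's
    #    performance on strongly-labeled data; that feedback loop is omitted here.
    return loss.item()

strong_x = torch.randn(4, 16, hidden)
strong_y = torch.randint(0, num_tags, (4, 16))
weak_x = torch.randn(4, 16, hidden)
print(train_step(strong_x, strong_y, weak_x))
```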
4. Dual Hierarchical Temporal Convolutional Network with QA-Aware Dynamic Normalization for Video Story Question Answering
- Author
-
Richang Hong, Hanqing Lu, Xinxin Zhu, Fei Liu, and Jing Liu
- Subjects
Question answering, Multimodal interaction, Temporal scales, Normalization (image processing), Machine learning - Abstract
Video story question answering (video story QA) is a challenging problem, as it requires a joint understanding of diverse data sources (i.e., video, subtitle, question, and answer choices). Existing approaches for video story QA share several common defects: (1) a single temporal scale; (2) static and coarse multimodal interaction; and (3) insufficient (or shallow) exploitation of both question and answer choices. In this paper, we propose a novel framework named Dual Hierarchical Temporal Convolutional Network (DHTCN) to address these defects together. The proposed DHTCN explores multiple temporal scales by building a hierarchical temporal convolutional network. In each temporal convolutional layer, two key components, AttLSTM and QA-Aware Dynamic Normalization, are introduced to capture temporal dependency and multimodal interaction in a dynamic and fine-grained manner. To enable sufficient exploitation of both question and answer choices, we increase the depth of QA pairs with a stack of non-linear layers and exploit QA pairs in each layer of the network. Extensive experiments are conducted on two widely used datasets, TVQA and MovieQA, demonstrating the effectiveness of DHTCN. Our model obtains state-of-the-art results on both datasets.
- Published
- 2020
- Full Text
- View/download PDF
5. Gate-based Bidirectional Interactive Decoding Network for Scene Text Recognition
- Author
-
Hanqing Lu, Yunze Gao, Yingying Chen, and Jinqiao Wang
- Subjects
Text recognition, Encoder, Decoding methods, Context (language use), Image - Abstract
Scene text recognition has attracted rapidly increasing attention from the research community. Recent dominant approaches typically follow an attention-based encoder-decoder framework that uses a unidirectional decoder to decode in a left-to-right manner, ignoring the equally important right-to-left grammar information. In this paper, we propose a novel Gate-based Bidirectional Interactive Decoding Network (GBIDN) for scene text recognition. First, the backward decoder decodes from right to left and generates the reverse language context. The forward decoder then simultaneously utilizes the visual context from the image encoder and the reverse language context from the backward decoder through two attention modules. In this way, the bidirectional decoders interact effectively to fully fuse the bidirectional grammar information and further improve decoding quality. In addition, to relieve the adverse effect of noise, we devise a gated context mechanism to adaptively make use of the visual context and the reverse language context. Extensive experiments on various challenging benchmarks demonstrate the effectiveness of our method.
- Published
- 2019
- Full Text
- View/download PDF
6. Erasing-based Attention Learning for Visual Question Answering
- Author
-
Hanqing Lu, Fei Liu, Richang Hong, and Jing Liu
- Subjects
Question answering, Discriminative model, Margin (machine learning), Feature (machine learning), Constraint (information theory), Inference - Abstract
Attention learning for visual question answering remains a challenging task, and most existing methods treat the attention and the non-attention parts in isolation. In this paper, we propose to enforce the correlation between the attention and the non-attention parts as a constraint for attention learning. We first adopt an attention-guided erasing scheme to obtain the attention and the non-attention parts respectively, and then learn to separate them by an appropriate distance margin in a feature embedding space. Furthermore, we associate a typical classification loss with the above distance constraint to learn a more discriminative attention map for answer prediction. The proposed approach introduces no extra model parameters or inference complexity and can be combined with any attention-based model. Extensive ablation experiments validate the effectiveness of our method, and new state-of-the-art or competitive results are achieved on four publicly available datasets. (An illustrative sketch of the erasing-based margin constraint follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
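Below is an illustrative sketch of the distance-margin idea from the abstract: the attended and erased (non-attended) features are pushed apart in an embedding space while a standard classification loss predicts the answer. The shapes, the margin value, and the erasing rule (dropping the single most-attended region) are assumptions, not the paper's exact formulation.

```python
# Sketch: erasing-guided margin constraint plus answer classification loss.
import torch
import torch.nn.functional as F

def erasing_margin_loss(region_feats, attn, answer_logits, answer, margin=1.0):
    # region_feats: (B, R, D), attn: (B, R) softmax weights over image regions
    attended = (attn.unsqueeze(-1) * region_feats).sum(1)            # (B, D)
    # "Erase" the most-attended region and pool what remains.
    erased_attn = attn.clone()
    erased_attn.scatter_(1, attn.argmax(1, keepdim=True), 0.0)
    erased_attn = erased_attn / erased_attn.sum(1, keepdim=True).clamp_min(1e-6)
    non_attended = (erased_attn.unsqueeze(-1) * region_feats).sum(1)
    # Keep the two parts at least `margin` apart (hinge on their distance).
    dist = F.pairwise_distance(attended, non_attended)
    separation = F.relu(margin - dist).mean()
    classification = F.cross_entropy(answer_logits, answer)
    return classification + separation

B, R, D, A = 8, 36, 512, 1000
loss = erasing_margin_loss(torch.randn(B, R, D),
                           torch.softmax(torch.randn(B, R), 1),
                           torch.randn(B, A), torch.randint(0, A, (B,)))
print(loss.item())
```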
7. Enhancing Visual Question Answering Using Dropout
- Author
-
Yong Li, Jing Liu, Zhiwei Fang, Hanqing Lu, Yanyuan Qiao, and Qu Tang
- Subjects
Question answering, Dropout (neural networks), Overfitting, Variance, Inference, Machine learning - Abstract
Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, in multi-path networks, the usual way of applying dropout may cause two problems: the co-adaptation of neurons and the explosion of output variance. In this paper, we propose coherent dropout and siamese dropout to solve these two problems, respectively. Specifically, in coherent dropout, all relevant dropout layers in multiple paths are forced to work coherently to maximize the ability to prevent neuron co-adaptation. We show that coherent dropout is simple to implement yet very effective in overcoming overfitting. As for the explosion of output variance, we develop a siamese dropout mechanism to explicitly minimize the difference between the two output vectors produced from the same input data during the training phase. This mechanism reduces the gap between the training and inference phases and makes the VQA model more robust. Extensive experiments are conducted to verify the effectiveness of coherent dropout and siamese dropout, and the results show that our methods bring additional improvements to state-of-the-art VQA models. (A rough sketch of the two dropout variants follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
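The following is a rough sketch of the two dropout variants named in the abstract. "Coherent dropout" is read here as sharing one dropout mask across parallel paths, and "siamese dropout" as penalizing the difference between two stochastic forward passes of the same input; the rates, shapes, and MSE penalty are assumptions rather than the paper's implementation.

```python
# Sketch: coherent dropout (shared mask) and a siamese-dropout consistency penalty.
import torch
import torch.nn.functional as F

def coherent_dropout(paths, p=0.3, training=True):
    """Apply the SAME Bernoulli mask to every path of a multi-path network."""
    if not training:
        return paths
    mask = (torch.rand_like(paths[0]) > p).float() / (1.0 - p)
    return [x * mask for x in paths]

def siamese_dropout_penalty(model, x):
    """Two dropout-perturbed passes of the same input should agree."""
    out1, out2 = model(x), model(x)      # dropout is stochastic inside `model`
    return F.mse_loss(out1, out2)

# Tiny usage example with made-up tensors.
visual, textual = torch.randn(4, 256), torch.randn(4, 256)
v, t = coherent_dropout([visual, textual], p=0.3)
model = torch.nn.Sequential(torch.nn.Dropout(0.3), torch.nn.Linear(256, 10))
model.train()
print(siamese_dropout_penalty(model, visual).item())
```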
8. Pseudo Label based Unsupervised Deep Discriminative Hashing for Image Retrieval
- Author
-
Jiaxiang Wu, Jian Cheng, Hanqing Lu, Qinghao Hu, and Lifang Wu
- Subjects
Hash function, Locality-sensitive hashing, Feature hashing, Quantization (signal processing), Discriminative model, Image retrieval, Pattern recognition - Abstract
Hashing methods play an important role in large-scale image retrieval. Traditional hashing methods use hand-crafted features to learn hash functions, which cannot capture high-level semantic information. Deep hashing algorithms use deep neural networks to learn feature representations and hash functions simultaneously. Most of these algorithms exploit supervised information to train the deep network; however, supervised information is expensive to obtain. In this paper, we propose a pseudo-label based unsupervised deep discriminative hashing algorithm. First, we cluster images via K-means and treat the cluster labels as pseudo labels. Then we train a deep hashing network with the pseudo labels by minimizing a classification loss and a quantization loss. Experiments on two datasets demonstrate that our unsupervised deep discriminative hashing method outperforms state-of-the-art unsupervised hashing methods. (A toy sketch of this pipeline follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
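A toy sketch of the pipeline described in the abstract: cluster pre-extracted features with K-means, treat cluster ids as pseudo labels, then train a hashing layer with a classification loss plus a quantization loss pushing codes toward ±1. The single-layer network, bit length, and loss weight are illustrative choices, not the paper's architecture.

```python
# Sketch: pseudo-label (K-means) based unsupervised deep hashing.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

feat_dim, n_bits, n_clusters = 512, 48, 10
features = torch.randn(1000, feat_dim)                 # pre-extracted image features

pseudo_labels = torch.as_tensor(
    KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    .fit_predict(features.numpy())).long()

hash_layer = nn.Linear(feat_dim, n_bits)
classifier = nn.Linear(n_bits, n_clusters)
opt = torch.optim.Adam(list(hash_layer.parameters()) + list(classifier.parameters()),
                       lr=1e-3)

for _ in range(5):                                      # a few toy epochs
    codes = torch.tanh(hash_layer(features))            # relaxed binary codes in (-1, 1)
    cls_loss = F.cross_entropy(classifier(codes), pseudo_labels)
    quant_loss = (codes.abs() - 1.0).pow(2).mean()      # push activations toward ±1
    loss = cls_loss + 0.1 * quant_loss
    opt.zero_grad(); loss.backward(); opt.step()

binary_codes = torch.sign(torch.tanh(hash_layer(features)))   # final retrieval codes
print(binary_codes.shape)
```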
9. RSVP
- Author
-
Han Yu, Ruihe Qian, Guanghui Ren, Hanqing Lu, Changhu Wang, Si Liu, and Yao Sun
- Subjects
Parsing, Segmentation, Video tracking, Deep learning - Abstract
In this demo, we present a real-time surveillance video parsing (RSVP) system. Surveillance video parsing, which aims to segment video frames into several labeled regions, e.g., face, pants, left leg, has wide applications, especially in the security field. However, it is very tedious and time-consuming to annotate all the frames in a video. Our RSVP system parses surveillance videos in real time and requires only one labeled frame in the training stage. It jointly considers the segmentation of preceding frames when parsing a particular frame within the video, and it proves effective and efficient in real applications.
- Published
- 2017
- Full Text
- View/download PDF
10. Sketch-based Image Retrieval using Generative Adversarial Networks
- Author
-
Longteng Guo, Hanqing Lu, Wei Wen, Yuhang Wang, Jing Liu, and Zhonghua Luo
- Subjects
Sketch, Real image, Feature (computer vision), Encoder, Consistency, Image retrieval - Abstract
For sketch-based image retrieval (SBIR), we propose a generative adversarial network trained on a large number of sketches and their corresponding real images. To imitate the human search process, we attempt to match candidate images with the imaginary image in the user's mind rather than with the sketch query itself, i.e., not only the shape information of sketches but also their possible content information is considered in SBIR. Specifically, a conditional generative adversarial network (cGAN) is employed to enrich the content information of sketches and recover the imaginary images, and two VGG-based encoders, which work on real and imaginary images respectively, are used to constrain their perceptual consistency from the view of feature representations. During SBIR, we first generate an imaginary image from a given sketch via the cGAN, and then take the output of the learned encoder for imaginary images as the feature of the query sketch. Finally, we build an interactive SBIR system that shows encouraging performance. (A retrieval-time sketch of this pipeline follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
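A retrieval-time sketch of the pipeline in the abstract: a cGAN generator turns the query sketch into an "imaginary" image, a learned encoder embeds it, and gallery images are ranked by feature similarity. The generator/encoder stubs and the cosine ranking below are assumptions standing in for the trained models.

```python
# Sketch: SBIR query flow — sketch -> imaginary image -> feature -> ranked gallery.
import torch
import torch.nn.functional as F

class GeneratorStub(torch.nn.Module):        # stands in for the trained cGAN generator
    def forward(self, sketch):               # (B, 1, 64, 64) -> (B, 3, 64, 64)
        return sketch.repeat(1, 3, 1, 1)

class EncoderStub(torch.nn.Module):          # stands in for the VGG-based encoder
    def forward(self, image):                # (B, 3, 64, 64) -> (B, 128)
        return image.flatten(1)[:, :128]

def retrieve(sketch, gallery_feats, generator, encoder, topk=5):
    imaginary = generator(sketch)                        # enrich sketch content
    query = F.normalize(encoder(imaginary), dim=1)       # feature of the query
    sims = query @ F.normalize(gallery_feats, dim=1).T   # cosine similarity
    return sims.topk(topk, dim=1).indices                # ranked gallery indices

sketch = torch.rand(1, 1, 64, 64)
gallery = torch.randn(1000, 128)                         # pre-computed image features
print(retrieve(sketch, gallery, GeneratorStub(), EncoderStub()))
```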
11. Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion
- Author
-
Fei Wu, Jun Xiao, Min Yang, Hanqing Lu, Zhou Zhao, Yueting Zhuang, and Qifan Yang
- Subjects
Point of interest, Multimedia network, Social relation, Margin (machine learning), Metric (mathematics), Web service, Mobile device, Feature learning - Abstract
With the rapid development of mobile devices, point-of-interest (POI) suggestion has become a popular online web service that provides attractive and interesting locations to users. To provide interesting POIs, many existing POI recommendation works learn latent representations of users and POIs from users' past visits, which suffer from the sparsity of POI data. In this paper, we consider the problem of POI suggestion from the viewpoint of learning geosocial multimedia network representations. We propose a novel max-margin metric geosocial multimedia network representation learning framework that exploits users' check-in behavior and their social relations, and we develop a random-walk based learning method with max-margin metric network embedding. We evaluate the performance of our method on a large-scale geosocial multimedia network dataset and show that it outperforms other state-of-the-art solutions.
- Published
- 2017
- Full Text
- View/download PDF
12. Deep learning driven hypergraph representation for image-based emotion recognition
- Author
-
Hanqing Lu and Yuchi Huang
- Subjects
Hypergraph, Deep learning, Convolutional neural network, Ground truth, Feature (machine learning), Representation - Abstract
In this paper, we propose a two-stage framework for image-based emotion recognition that combines the advantages of deep convolutional neural networks (D-CNN) and hypergraphs. To exploit the representational power of the D-CNN, we remodel its last hidden feature layer as an 'attribute' layer in which each hidden unit produces a probability for a specific semantic attribute. To describe the high-order relationships among facial images, each face is assigned to various hyperedges according to the computed probabilities on different D-CNN attributes. In this way, we tackle emotion prediction with a transductive learning approach, which tends to assign the same label to faces that share many incidental hyperedges (attributes), under the constraint that the predicted labels of training samples should be similar to their ground-truth labels. We compared the proposed approach to state-of-the-art methods, and its effectiveness was demonstrated by extensive experimentation.
- Published
- 2016
- Full Text
- View/download PDF
13. Objectness-aware Semantic Segmentation
- Author
-
Junjie Yan, Hanqing Lu, Yuhang Wang, Jing Liu, and Yong Li
- Subjects
Convolutional neural network, Artificial neural network, Upsampling, Segmentation, Test data - Abstract
Recent advances in semantic segmentation are driven by the success of fully convolutional networks (FCN). However, the coarse label map produced by the network and its limited object discrimination ability weaken the performance of FCN-based models. To address these issues, we propose an objectness-aware semantic segmentation framework (OA-Seg) that jointly learns an object proposal network (OPN) and a lightweight deconvolutional neural network (Light-DCNN). First, the OPN is learned with a fully convolutional architecture to simultaneously predict object bounding boxes and their objectness scores. Second, we design a Light-DCNN to provide a finer upsampling path than FCN. The Light-DCNN is constructed from the convolutional layers of VGG-net and their mirrored deconvolutional structure, with all fully-connected layers removed, and hierarchical classification layers are added to multi-scale deconvolutional features to introduce more contextual information for pixel-wise label prediction. Compared with previous works, our approach markedly reduces model size and convergence time. Thorough evaluations are performed on the PASCAL VOC 2012 benchmark, and our model yields impressive results on its validation data (70.3% mean IoU) and test data (74.1% mean IoU).
- Published
- 2016
- Full Text
- View/download PDF
14. Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization
- Author
-
Hanqing Lu, Yueting Zhuang, Xiaofei He, Cai Deng, and Zhou Zhao
- Subjects
Sparse approximation, Neural coding, Regularization (mathematics), Similarity (network science), Modality (human–computer interaction) - Abstract
Multi-modal sparse coding has played an important role in many multimedia applications, where data usually come with multiple modalities. Recently, various multi-modal sparse coding approaches have been proposed to learn sparse codes of multi-modal data, which assume that data appear in all modalities, or at least that there is one modality containing all data. However, in real applications it is often the case that some modalities of the data suffer from missing information, resulting in partial multi-modality data. In this paper, we propose to solve the partial multi-modal sparse coding problem via multi-modal similarity structure regularization. Specifically, we propose a partial multi-modal sparse coding framework termed Adaptive Partial Multi-Modal Similarity Structure Regularization for Sparse Coding (AdaPM2SC), which preserves the similarity structure within the same modality and between different modalities. Experimental results on two real-world datasets demonstrate that AdaPM2SC significantly outperforms state-of-the-art methods under the partial multi-modality scenario.
- Published
- 2016
- Full Text
- View/download PDF
15. Deep People Counting with Faster R-CNN and Correlation Tracking
- Author
-
Jinqiao Wang, Hanqing Lu, Baocai Yin, Zhiqiang Li, Yikai Fang, Huazhong Xu, and La Zhang
- Subjects
Crowd counting, Detector, Correlation filter, Convolutional neural network, Deep learning, Minimum bounding box - Abstract
Crowd counting is a key problem for many computer vision tasks, yet most existing methods count people by regression with hand-crafted features. Recently, the fast development of deep learning has produced many promising detectors for generic object classes. In this paper, to effectively leverage the discriminability of convolutional neural networks, we propose a people counting method based on Faster R-CNN [9] head-shoulder detection and correlation tracking. First, we train a Faster R-CNN head-shoulder detector with the Zeiler model to detect people with multiple poses and views. Next, we employ a kernelized correlation filter (KCF) [7] to track the people and obtain their trajectories. Considering the results of both detection and tracking, we fuse the two sets of bounding boxes to obtain continuous and stable trajectories. Extensive experiments and comparisons show the promise of the proposed approach. (A simplified sketch of the detection-tracking box fusion follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
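Below is a simplified sketch of fusing per-frame head-shoulder detections with tracked boxes, in the spirit of the abstract: a tracked box is kept when no detection overlaps it, otherwise the detection takes precedence. The IoU threshold and the (x1, y1, x2, y2) box format are assumptions, not the paper's exact fusion rule.

```python
# Sketch: merge detector boxes and tracker boxes by IoU de-duplication.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_boxes(detections, tracked, thr=0.5):
    """Union of detections and tracked boxes, de-duplicated by IoU."""
    fused = list(detections)
    for t in tracked:
        if all(iou(t, d) < thr for d in detections):
            fused.append(t)          # tracker bridges a missed detection
    return fused

dets = [(10, 10, 50, 60), (100, 40, 140, 100)]
trks = [(12, 11, 52, 62), (200, 50, 240, 110)]   # second track has no matching detection
print(len(fuse_boxes(dets, trks)))               # -> 3 people in this frame
```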
16. Object-aware Deep Network for Commodity Image Retrieval
- Author
-
Yong Li, Jing Liu, Hanqing Lu, Song Hang, Jinhui Tang, Zhiwei Fang, and Yuhang Wang
- Subjects
Commodity, Image retrieval, Object detection, Feature learning, Ranking (information retrieval), Information retrieval - Abstract
In recent years, with the development of e-commerce and the popularity of mobile phones, image-based commodity retrieval has attracted much attention. This paper proposes a deep framework for commodity image retrieval (CMIR) that matches commodities of the same design. Our framework captures as many design details as possible by exploring object detection and ranking-sensitive feature learning, where the former is performed with Faster R-CNN and the latter is learned with a multi-task Siamese network. We also optimize the processing speed of the framework to make it a live system. The framework is implemented as an Android application with a client/server architecture whose server response time is about 150 ms per query.
- Published
- 2016
- Full Text
- View/download PDF
17. Personalized Recommendation Meets Your Next Favorite
- Author
-
Ting Yuan, Hanqing Lu, Jian Cheng, and Qiang Song
- Subjects
Topic model, Recommender system, Collaborative filtering, Ranking (information retrieval), Personalization, Graph (abstract data type) - Abstract
A comprehensive understanding of users' item selection behavior is not only essential to many scientific disciplines but also has a profound business impact on online recommendation. Recent research has found that a user's favorites can be divided into two categories, long-term and short-term, and that item selection behavior is a mixed decision of the two. In this paper, we propose a unified model, the States Transition pAir-wise Ranking model (STAR), to mine users' favorites for sequential-set recommendation. Our method utilizes a transition graph for collaborative filtering that mines a user's short-term favorites, combined with a generative topic model that expresses the user's long-term favorites. Furthermore, a user-specific prior is introduced into the unified model to better capture personalization. Technically, we develop a pair-wise ranking loss function for parameter learning. Empirically, we measure the effectiveness of our method on two real-world datasets, and the results show that it outperforms state-of-the-art methods.
- Published
- 2015
- Full Text
- View/download PDF
18. Exclusive Constrained Discriminative Learning for Weakly-Supervised Semantic Segmentation
- Author
-
Hanqing Lu, Songde Ma, Jin Liu, and Peng Ying
- Subjects
Segmentation, Discriminative learning, Constraint (information theory), Smoothing, LabelMe - Abstract
How to use image-level labels as weak supervision to direct the region-level labeling task is the core problem of weakly-supervised semantic segmentation. In this paper, we focus on designing an effective yet simple weakly-supervised constraint and propose an exclusive constrained discriminative learning model for image semantic segmentation. Specifically, we employ a discriminative linear regression model to assign subsets of superpixels with different labels. During the assignment, we construct an exclusive weakly-supervised constraint term to suppress the labeling responses of each superpixel on labels outside its parent image-level label set. In addition, a spectral smoothing term is integrated to encourage both visually and semantically similar superpixels to have similar labels. Combining these terms, we formulate the problem as a convex objective function, which can be easily optimized via alternating iterations. Extensive experiments on the MSRC-21 and LabelMe datasets demonstrate the effectiveness of the proposed model.
- Published
- 2015
- Full Text
- View/download PDF
19. Semi- and Weakly- Supervised Semantic Segmentation with Deep Convolutional Neural Networks
- Author
-
Hanqing Lu, Yong Li, Jing Liu, and Yuhang Wang
- Subjects
Semi-supervised learning, Convolutional neural network, Segmentation, Labeled data, LabelMe - Abstract
Successful semantic segmentation methods typically rely on training datasets containing a large number of pixel-wise labeled images. To alleviate the dependence on such fully annotated training data, in this paper we propose a semi- and weakly-supervised learning framework that explores images mostly with only image-level labels and very few with pixel-level labels, in which two stages of Convolutional Neural Network (CNN) training are included. First, a pixel-level supervised CNN is trained on the very few fully annotated images. Second, given a large number of images with only image-level labels available, a collaborative-supervised CNN is designed to jointly perform the pixel-level and image-level classification tasks, with the pixel-level labels predicted by the fully-supervised network from the first stage. The collaborative-supervised network retains the discriminative ability of the fully-supervised model learned with fully labeled images and further enhances performance by importing more weakly labeled data. Our experiments on two challenging datasets, i.e., PASCAL VOC 2007 and LabelMe LMO, demonstrate the satisfactory performance of our approach, nearly matching the results achieved when all training images have pixel-level labels.
- Published
- 2015
- Full Text
- View/download PDF
20. Spatio-Temporal Triangular-Chain CRF for Activity Recognition
- Author
-
Yifan Zhang, Congqi Cao, and Hanqing Lu
- Subjects
Activity recognition, CRFs, Hierarchical model - Abstract
Understanding human activities in video is a fundamental problem in computer vision. In real life, human activities are composed of temporal and spatial arrangements of actions. Understanding such complex activities requires recognizing not only each individual action but, more importantly, capturing their spatio-temporal relationships. This paper addresses the problem of complex activity recognition with a unified hierarchical model. We expand triangular-chain CRFs (TriCRFs) to the spatial dimension. The proposed architecture can be perceived as a spatio-temporal version of the TriCRFs, in which the labels of actions and activities are modeled jointly and their complex dependencies are exploited. Experiments show that our model generates promising results, outperforming competing methods significantly. The framework can also be applied to model other structured sequential data.
- Published
- 2015
- Full Text
- View/download PDF
21. Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios
- Author
-
Zheng-Jun Zha, Hanqing Lu, Haiyun Guo, Min Xu, and Jinqiao Wang
- Subjects
Hash function, Convolutional neural network, Discriminative model, Feature (computer vision), Object (computer science), Representation, Visual objects - Abstract
With the explosive growth of surveillance videos, object retrieval has become a significant task for security monitoring. However, visual objects in surveillance videos are usually of small size with complex lighting conditions, view changes and partial occlusions, which increases the difficulty of efficiently retrieving objects of interest in a large-scale dataset. Although deep features have achieved promising results on object classification and retrieval and have been verified to contain rich semantic structure information, they lack adequate color information, which is as crucial as structure information for effective object representation. In this paper, we propose to leverage a discriminative Convolutional Neural Network (CNN) to learn deep structure and color features to form an efficient multi-view object representation. Specifically, we utilize a CNN trained on ImageNet to abstract rich semantic structure information. Meanwhile, we propose a CNN model supervised by 11 color names to extract deep color features. Compared with traditional color descriptors, deep color features can capture the common color property across different illumination conditions. Then, the complementary multi-view deep features are encoded into short binary codes by Locality-Sensitive Hashing (LSH) and fused to retrieve objects. Retrieval experiments are performed on a dataset of 100k objects extracted from multi-camera surveillance videos. Comparison results with several popular visual descriptors show the effectiveness of the proposed approach.
- Published
- 2015
- Full Text
- View/download PDF
22. Incremental Matrix Factorization via Feature Space Re-learning for Recommender System
- Author
-
Qiang Song, Hanqing Lu, and Jian Cheng
- Subjects
Matrix factorization, Matrix decomposition, Recommender system, Feature vector - Abstract
Matrix factorization is widely used in recommender systems. Although existing incremental matrix factorization methods are effective in reducing time complexity, they simply assume that the similarity between items or users is invariant. For instance, they keep the item feature matrix unchanged and just update the user matrix without re-training the entire model. However, as new users arrive continuously, the fitting error accumulates because the extra distribution information of items is not utilized. In this paper, we present an alternative and reasonable approach, with the relaxed assumption that the similarity between items (users) is relatively stable after updating. Concretely, utilizing the prediction error on the new data as auxiliary features, our method updates both feature matrices simultaneously, so users' preferences can be modeled better than by merely adjusting one corresponding feature matrix. Our method also keeps the feature dimension small by taking advantage of matrix sketching. Experimental results show that our proposal outperforms existing incremental matrix factorization methods.
- Published
- 2015
- Full Text
- View/download PDF
23. 60 Hz self-tuning background modeling
- Author
-
Hanqing Lu, Jun Luo, La Zhang, Jinqiao Wang, Yingying Chen, and Huazhong Xu
- Subjects
Background subtraction, Change detection, Pixel, Self-tuning, Preprocessor - Abstract
Background modeling, or change detection, is often used as a preprocessing step in many computer vision tasks, especially intelligent surveillance. Although various methods have been proposed for this problem, they often involve complex parameter settings and adapt poorly to scene changes. In this paper, we propose a fast and robust approach for background modeling with self-adaptive ability. As in ViBe [7], each pixel model is represented by a set of historical samples based on sample consensus. To adapt to various changes in complex scenes, a flexible feedback scheme is presented to automatically adjust the model parameters. Moreover, a selective diffusion method is employed to overcome problems such as incomplete foregrounds or false detections caused by intermittent moving objects. Experimental results on the ChangeDetection 2014 benchmark show that the proposed approach outperforms state-of-the-art approaches at a speed of 60 fps on a CPU for a 640 × 480 image sequence. (A toy sketch of the sample-consensus model with threshold feedback follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
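A NumPy toy sketch of the sample-consensus idea the abstract builds on (as in ViBe): a pixel is background if it is close to at least a few of its stored samples, and the decision threshold is nudged per pixel as a crude stand-in for the paper's feedback scheme. The sample count, radii, and update rates are illustrative assumptions, not the paper's parameters.

```python
# Sketch: per-pixel sample-consensus background model with a simple feedback rule.
import numpy as np

H, W, N = 120, 160, 20                       # frame size, samples per pixel
rng = np.random.default_rng(0)
samples = rng.integers(0, 256, size=(N, H, W)).astype(np.float32)
radius = np.full((H, W), 20.0, dtype=np.float32)   # per-pixel decision threshold

def segment(frame, min_matches=2, lr=0.05):
    matches = (np.abs(samples - frame[None]) < radius[None]).sum(axis=0)
    foreground = matches < min_matches
    # Feedback: relax the threshold where the scene looks dynamic, tighten elsewhere.
    radius[foreground] *= (1.0 + lr)
    radius[~foreground] = np.maximum(radius[~foreground] * (1.0 - lr), 10.0)
    # Conservative model update: overwrite one random sample at background pixels.
    idx = rng.integers(0, N)
    samples[idx][~foreground] = frame[~foreground]
    return foreground

frame = rng.integers(0, 256, size=(H, W)).astype(np.float32)
print(segment(frame).mean())                 # fraction of pixels flagged as foreground
```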
24. Concurrent group activity classification with context modeling
- Author
-
Wei Fu, Jian Cheng, Hanqing Lu, Jinqiao Wang, Jing Liu, and Chaoyang Zhao
- Subjects
Group activity, Context model, Feature (machine learning), Duration time - Abstract
Group activity classification is the task of identifying activities involving multiple persons, which often requires context information such as person relationships and person interactions. In this paper, we propose a novel approach to jointly model three co-existing cues: activity duration, individual action features, and the context information shared among person interactions. Our approach infers the group activity labels of all persons together with their activity durations, especially in situations where multiple group activities co-exist. Experimental results show that our approach outperforms the state of the art by 10%.
- Published
- 2015
- Full Text
- View/download PDF
25. When Personalization Meets Conformity
- Author
-
Shuang Qiu, Hanqing Lu, Zhenfeng Zhu, Jian Cheng, and Xi Zhang
- Subjects
Recommender system, Personalization, Conformity, Similarity, Context (language use) - Abstract
Existing recommender systems emphasize personalization to achieve promising accuracy. However, in a multi-domain context, users are likely to follow the behaviors of domain authorities. This conformity effect provides a wealth of prior knowledge for multi-domain recommendation, but it has not been fully exploited. In particular, users whose behaviors are significantly similar to the public tastes can be viewed as domain authorities. To detect these users and, at the same time, embed conformity into recommendation, a domain-specific similarity matrix is employed, so that a collective similarity is obtained to combine conformity with personalization. In this paper, we establish a Collective Structure Sparse Representation (CSSR) method for multi-domain recommendation. Based on an adaptive k-Nearest-Neighbor framework, we impose lasso and group-lasso penalties together with a least-squares loss to jointly optimize the collective similarity. Experimental results on real-world data confirm the effectiveness of the proposed method.
- Published
- 2015
- Full Text
- View/download PDF
26. Mobile Media Thumbnailing
- Author
-
Yingying Chen, Jinqiao Wang, Hanqing Lu, and Jing Liu
- Subjects
Thumbnail, Video browsing, Mobile media, Mobile device, Image warping, Display device - Abstract
With the development of multimedia and Internet techniques, rapidly growing visual data, such as images and videos, need to be shown and browsed as thumbnails on various digital display platforms, such as PCs and cell phones. This demonstration presents a grid-based adaptive media thumbnailing approach to maximize the user experience of mobile image and video browsing. After representative frame extraction by spectral clustering and salient region detection, we obtain thumbnails with three resizing operators, cropping, warping and scaling, and adaptively fuse them into a unified grid-based convex programming problem that can be solved efficiently through numerical optimization. Extensive experiments and comparisons on a HUAWEI Honor 6 and a Samsung S5 demonstrate that the proposed method achieves excellent information preservation for thumbnails on mobile devices.
- Published
- 2015
- Full Text
- View/download PDF
27. Exploring Heterogeneity for Multi-Domain Recommendation with Decisive Factors Selection
- Author
-
Hanqing Lu, Shuang Qiu, Xi Zhang, and Jian Cheng
- Subjects
Multi-domain, Consistency, Data mining - Abstract
To address recommendation in multi-domain scenarios, we propose a novel method, HMRec, which models both the consistency and the heterogeneity of users' multiple behaviors in a unified framework. Moreover, the decisive factors of each domain can also be captured by our approach. Experiments on a real multi-domain dataset demonstrate the effectiveness of our model.
- Published
- 2015
- Full Text
- View/download PDF
28. Mask Assisted Object Coding with Deep Learning for Object Retrieval in Surveillance Videos
- Author
-
Hanqing Lu, Min Xu, Jinqiao Wang, and Kezhen Teng
- Subjects
Autoencoder, Deep learning, Background noise, Robustness, Video tracking, Visual objects - Abstract
Retrieving visual objects from a large-scale video dataset is a major focus of multimedia research, but it remains challenging due to imprecise object extraction and partial occlusion. This paper presents a novel approach to efficiently encode and retrieve visual objects that addresses some practical complications in surveillance videos. Specifically, we take advantage of mask information to assist object representation, and develop an encoding method that utilizes the highly nonlinear mapping of a deep neural network. Furthermore, we add occlusion noise into the learning process to enhance robustness to background noise and partial occlusion. A real-life surveillance video dataset containing over 10 million objects is built to evaluate the proposed approach. Experimental results show that our approach significantly outperforms state-of-the-art solutions for object retrieval in large-scale video datasets.
- Published
- 2014
- Full Text
- View/download PDF
29. Supervised Hashing with Soft Constraints
- Author
-
Cong Leng, Jian Cheng, Jiaxiang Wu, Xi Zhang, and Hanqing Lu
- Subjects
Hash function, Hamming distance, Hamming space, Semantic similarity, Regularization (mathematics), Semantic gap, Boosting (machine learning) - Abstract
Due to its ability to preserve semantic similarity in Hamming space, supervised hashing has been extensively studied recently. Most existing approaches encourage two dissimilar samples to have maximum Hamming distance. This may lead to an unexpected consequence: two samples that are not necessarily similar end up with the same code if they are both dissimilar to a third sample. Besides, in existing methods all labeled pairs are treated with equal importance without considering the semantic gap, which is not conducive to thoroughly leveraging the supervised information. We present a general framework for supervised hashing that addresses these two limitations. We do not strictly require a dissimilar pair to have maximum Hamming distance; instead, a soft constraint, which can be viewed as a regularization to avoid over-fitting, is utilized. Moreover, we impose different weights on different training pairs, and these weights can be automatically adjusted during learning. Experiments on two benchmarks show that the proposed method easily outperforms other state-of-the-art methods. (A toy sketch of a weighted, soft-margin pairwise hashing loss follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
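A toy sketch of the two ideas in the abstract: dissimilar pairs are only pushed apart up to a soft margin rather than to maximum Hamming distance, and each labeled pair carries its own weight. The tanh-relaxed codes, margin, and weighting scheme are assumptions, not the paper's formulation.

```python
# Sketch: weighted pairwise hashing loss with a soft constraint on dissimilar pairs.
import torch
import torch.nn.functional as F

def soft_pairwise_hash_loss(codes_a, codes_b, similar, weights, margin=0.0):
    # codes_*: (P, bits) tanh-relaxed codes for the two sides of each pair
    # similar: (P,) 1 for similar pairs, 0 for dissimilar; weights: (P,)
    bits = codes_a.size(1)
    sim = (codes_a * codes_b).sum(1) / bits            # code agreement in [-1, 1]
    pos = 1.0 - sim                                     # pull similar pairs together
    neg = F.relu(sim - margin)                          # push dissimilar apart, softly
    per_pair = torch.where(similar.bool(), pos, neg)
    return (weights * per_pair).mean()

P, bits = 32, 48
a, b = torch.tanh(torch.randn(P, bits)), torch.tanh(torch.randn(P, bits))
similar = torch.randint(0, 2, (P,)).float()
weights = torch.ones(P)                                 # could be adapted during training
print(soft_pairwise_hash_loss(a, b, similar, weights).item())
```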
30. Less is More
- Author
-
Hanqing Lu, Jian Cheng, and Xi Zhang
- Subjects
Preference elicitation, Preference learning, Cold start, Recommender system, MovieLens - Abstract
Cold-start recommendation is a challenging but crucial problem for recommender systems. Preference elicitation, a commonly used approach to this problem, solicits the preferences of cold users by interviewing them with a set of carefully selected items. Selecting as few items as possible while reflecting user preferences as fully as possible is the essential goal of preference elicitation. In this paper, we propose a novel Structured Sparse Representative Selection (SSRS) model to select a sparse set of items based on their representativeness. Moreover, an ℓ2,1-norm is applied to both the loss function and the regularization to make the model insensitive to outliers and to avoid selecting redundant queries, respectively. Empirical results on the benchmark movie rating datasets MovieLens and Flixster verify the promising performance of our preference elicitation method for cold-start recommendation.
- Published
- 2014
- Full Text
- View/download PDF
31. Group latent factor model for recommendation with multiple user behaviors
- Author
-
Jinqiao Wang, Jian Cheng, Ting Yuan, and Hanqing Lu
- Subjects
Collaborative filtering, Recommender system, Matrix decomposition, Linear subspaces - Abstract
Recently, some recommendation methods have tried to relieve the data sparsity problem of collaborative filtering by exploiting data from users' multiple types of behaviors. However, most existing methods mainly model the correlation between different behaviors and ignore their heterogeneity, which may cause improper information to be transferred and harm the recommendation results. To address this problem, we propose a novel recommendation model, named Group Latent Factor Model (GLFM), which learns a factorization of the latent factor space into subspaces that are shared across multiple behaviors and subspaces that are specific to each type of behavior. Thus, the correlation and heterogeneity of multiple behaviors can be modeled by these shared and specific latent factors. Experiments on a real-world dataset demonstrate that our model better integrates users' multiple types of behaviors into recommendation.
- Published
- 2014
- Full Text
- View/download PDF
32. Item group based pairwise preference learning for personalized ranking
- Author
-
Ting Yuan, Cong Leng, Hanqing Lu, Shuang Qiu, and Jian Cheng
- Subjects
Preference learning, Pairwise comparison, Collaborative filtering, Ranking, k-nearest neighbors - Abstract
Collaborative filtering with implicit feedback has been steadily receiving more attention, since abundant implicit feedback is easily collected while explicit feedback is not always available. Several recent works address this problem with pairwise ranking methods under the fundamental assumption that a user prefers items with positive feedback to items without observed feedback, which also implies that items without observed feedback are treated equally without distinction. However, users prefer different items to different degrees, and these degrees can be modeled as a ranking relationship. In this paper, we exploit this prior information about a user's preferences from the nearest-neighbor set via the neighbors' implicit feedback, which splits items into different groups with specific ranking relations. We propose a novel PRIGP (Personalized Ranking with Item Group based Pairwise preference learning) algorithm to integrate item-based pairwise preference and item-group-based pairwise preference into the same framework. Experimental results on three real-world datasets demonstrate that the proposed method outperforms competitive baselines on several ranking-oriented evaluation metrics. (A sketch of a group-based pairwise ranking step follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
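A sketch of group-based pairwise preference in the spirit of the abstract: a user's items are split into ordered groups (own positives > items liked only by nearest neighbors > the rest), and a BPR-style log-sigmoid loss is applied to pairs drawn from a higher versus a lower group. The factor sizes and the grouping rule are illustrative assumptions.

```python
# Sketch: one update of matrix-factorization scores under a group ranking constraint.
import torch

n_users, n_items, dim = 100, 500, 16
U = torch.randn(n_users, dim, requires_grad=True)
V = torch.randn(n_items, dim, requires_grad=True)
opt = torch.optim.SGD([U, V], lr=0.05)

def group_pairwise_step(user, higher_items, lower_items):
    """Every item in the higher group should outscore every item in the lower group."""
    scores_hi = U[user] @ V[higher_items].T                  # (H,)
    scores_lo = U[user] @ V[lower_items].T                   # (L,)
    diff = scores_hi.unsqueeze(1) - scores_lo.unsqueeze(0)   # (H, L) pairwise gaps
    loss = -torch.nn.functional.logsigmoid(diff).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

positives = torch.tensor([3, 17, 42])          # user's own positive feedback
neighbor_liked = torch.tensor([5, 9])          # liked by nearest neighbors only
print(group_pairwise_step(user=7, higher_items=positives, lower_items=neighbor_liked))
```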
33. Random subspace for binary codes learning in large scale image retrieval
- Author
-
Hanqing Lu, Jian Cheng, and Cong Leng
- Subjects
Hash function, Binary code, Nearest neighbor search, Feature vector, Concatenation, Image retrieval - Abstract
Due to their fast query speed and low storage cost, hashing-based approximate nearest neighbor search methods have attracted much attention recently. Many state-of-the-art methods are based on eigenvalue decomposition. In these approaches, the information captured in different dimensions is unbalanced, and most of it is concentrated in the top eigenvectors. We demonstrate that this leads to an unexpected phenomenon: a longer hashing code does not necessarily yield better performance. In this work, we introduce a random subspace strategy to address this limitation. Each time, a small fraction of the whole feature space is randomly sampled to train the hashing algorithm, and only the top eigenvectors are kept to generate one piece of short code. This process is repeated several times, and the resulting pieces of short code are concatenated into one long code. Theoretical analysis and experiments on two benchmarks confirm the effectiveness of the proposed strategy for hashing. (A NumPy sketch of this strategy follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
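A NumPy sketch of the random-subspace strategy described in the abstract: repeatedly sample a fraction of the feature dimensions, keep only the top eigenvectors of that subspace (plain PCA hashing is used here as a simple stand-in for the base hasher), and concatenate the resulting short codes into one long code. The fraction, round count, and bits per round are assumptions.

```python
# Sketch: random-subspace short codes concatenated into one long binary code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 512))          # database features

def random_subspace_codes(X, rounds=8, frac=0.25, bits_per_round=8):
    pieces = []
    for _ in range(rounds):
        dims = rng.choice(X.shape[1], int(frac * X.shape[1]), replace=False)
        sub = X[:, dims] - X[:, dims].mean(0)
        # Top eigenvectors of the subspace covariance -> one piece of short code.
        _, vecs = np.linalg.eigh(np.cov(sub, rowvar=False))
        proj = sub @ vecs[:, -bits_per_round:]
        pieces.append((proj > 0).astype(np.uint8))
    return np.concatenate(pieces, axis=1)     # long code = concatenated short codes

codes = random_subspace_codes(X)
print(codes.shape)                            # (2000, 64) binary code
```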
34. Fusing multi-modal features for gesture recognition
- Author
-
Hanqing Lu, Jian Cheng, Jiaxiang Wu, and Chaoyang Zhao
- Subjects
Gesture recognition, Sign language, Hidden Markov model, Dynamic time warping, Levenshtein distance, Precision and recall - Abstract
This paper proposes a novel multi-modal gesture recognition framework and introduces its application to continuous sign language recognition. A Hidden Markov Model is used to construct the audio feature classifier, and a skeleton feature classifier based on Dynamic Time Warping is trained to provide complementary information. The confidence scores generated by the two classifiers are first normalized and then combined into a weighted sum for the final recognition. Experimental results show that the precision and recall of our multi-modal recognition framework over 20 classes reach 0.8829 and 0.8890, respectively, which indicates that our method can correctly reject false detections made by a single classifier. Our approach scored 0.12756 in mean Levenshtein distance and was ranked 1st in the 2013 Multi-modal Gesture Recognition Challenge. (A minimal sketch of the score fusion follows this entry.)
- Published
- 2013
- Full Text
- View/download PDF
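A minimal sketch of the late fusion described in the abstract: the HMM (audio) and DTW (skeleton) classifiers each produce per-class confidence scores, which are normalized and combined by a weighted sum before taking the best class. The min-max normalization and the 0.6/0.4 weights are assumptions.

```python
# Sketch: normalize two classifiers' confidence scores and fuse by weighted sum.
import numpy as np

def fuse_scores(audio_scores, skeleton_scores, w_audio=0.6):
    def norm(s):                      # min-max normalize each classifier's scores
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    fused = w_audio * norm(audio_scores) + (1 - w_audio) * norm(skeleton_scores)
    return int(fused.argmax()), fused

audio = np.random.rand(20)            # confidence over 20 gesture classes
skeleton = np.random.rand(20)
label, fused = fuse_scores(audio, skeleton)
print(label)
```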
35. Object co-segmentation via discriminative low rank matrix recovery
- Author
-
Yong Li, Jing Liu, Hanqing Lu, Zechao Li, and Yang Liu
- Subjects
Segmentation, Low-rank approximation, Discriminative model, Saliency, Object (computer science) - Abstract
The goal of this paper is to simultaneously segment the object regions appearing in a set of images of the same object class, known as object co-segmentation. Different from typical methods, which simply assume that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds and require object regions to be not only common but also salient among images. To this end, we propose a Discriminative Low Rank matrix Recovery (DLRR) algorithm to divide the over-segmented regions (i.e., superpixels) of a given image set into object and non-object ones. In DLRR, a low-rank matrix recovery term is adopted to detect salient regions in an image, while a discriminative learning term is used to distinguish the object regions from all the superpixels. An additional regularization term is introduced to jointly measure the disagreement between the predicted saliency and the objectness probability of each superpixel in the image set. For the unified learning problem connecting the above three terms, we design an efficient optimization procedure based on block-coordinate descent. Extensive experiments are conducted on two public datasets, MSRC and iCoseg, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.
- Published
- 2013
- Full Text
- View/download PDF
36. TCRec
- Author
-
Jing Liu, Yu Jiang, Zechao Li, Hanqing Lu, and Xi Zhang
- Subjects
Product category, Social trust, Information retrieval - Abstract
In this paper, we develop a novel product recommendation method called TCRec, which takes advantage of consumers' rating histories, the social-trust network and product category information simultaneously. Comparative experiments are conducted on two real-world datasets and outstanding performance is achieved, which demonstrates the effectiveness of TCRec.
- Published
- 2013
- Full Text
- View/download PDF
37. TopRec
- Author
-
Hanqing Lu, Biao Niu, Xi Zhang, Jian Cheng, and Ting Yuan
- Subjects
Topic model, Social network, Collaborative filtering, Recommender system, Preference - Abstract
Traditionally, collaborative filtering assumes that similar users have similar responses to similar items. However, human activities exhibit heterogeneous features across multiple domains, such that users who have similar tastes in one domain may behave quite differently in other domains. Moreover, highly sparse data presents a crucial challenge for preference prediction. Intuitively, if users' domains of interest are captured first, the recommender system is more likely to provide items they enjoy while filtering out uninteresting ones. Therefore, it is necessary to learn preference profiles from the correlated domains instead of the entire user-item matrix. In this paper, we propose a unified framework, TopRec, which detects topical communities to construct interpretable domains for domain-specific collaborative filtering. To mine communities as well as their corresponding topics, a semi-supervised probabilistic topic model is utilized, integrating user guidance with the social network. Experimental results on real-world data from Epinions and Ciao demonstrate the effectiveness of the proposed framework.
- Published
- 2013
- Full Text
- View/download PDF
38. Real-time multiple object instances detection
- Author
-
Jinqiao Wang, Hanqing Lu, Yifan Zhang, and Chengli Xie
- Subjects
Template matching, Pattern recognition, Classifier - Abstract
In this paper, we present a novel real-time system for multiple object instance detection via template matching and pairwise classification. Instance detection aims to find and locate exactly the same object instances as those specified. Our system is composed of two heterogeneous stages: the first adopts instance-specific detection to generate candidates, and the second uses a pairwise classifier across instance categories to test and verify these candidates with respect to the templates. Experiments show the superiority of our approach.
- Published
- 2012
- Full Text
- View/download PDF
39. Hi, magic closet, tell me what to wear!
- Author
-
Shuicheng Yan, Zheng Song, Hanqing Lu, Jiashi Feng, Si Liu, Tianzhu Zhang, and Changsheng Xu
- Subjects
Banquet ,Information retrieval ,Computer science ,business.industry ,media_common.quotation_subject ,Magic (programming) ,Closet ,Clothing ,business ,Magic (paranormal) ,GeneralLiterature_MISCELLANEOUS ,media_common - Abstract
In this paper, we aim at a practical system, magic closet, for automatic occasion-oriented clothing recommendation. Given a user-input occasion, e.g., wedding, shopping or dating, magic closet intelligently suggests the most suitable clothing from the user's own clothing photo album, or automatically pairs user-specified reference clothing (upper-body or lower-body) with the most suitable item from online shops. Two key criteria are explicitly considered in the magic closet system. One criterion is to wear properly, e.g., compared to suit pants, a cocktail dress is more decent for a banquet occasion. The other criterion is to wear aesthetically, e.g., a red T-shirt matches white pants better than green pants. To narrow the semantic gap between the low-level features of clothing and the high-level occasion categories, we adopt middle-level clothing attributes (e.g., clothing category, color, pattern) as a bridge. More specifically, the clothing attributes are treated as latent variables in our proposed latent Support Vector Machine (SVM) based recommendation model. The wearing-properly criterion is captured through a feature-occasion potential and an attribute-occasion potential, while the wearing-aesthetically criterion is expressed by an attribute-attribute potential. To learn a model that generalizes well and to evaluate it comprehensively, we collect a large clothing What-to-Wear (WoW) dataset and thoroughly annotate it with 7 multi-value clothing attributes and 10 occasion categories via Amazon Mechanical Turk. Extensive experiments on the WoW dataset demonstrate the effectiveness of the magic closet system for both occasion-oriented clothing recommendation and pairing.
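A toy sketch of the scoring structure described above: attributes act as latent variables, and the score of an (outfit, occasion) pair is the sum of a feature-occasion, an attribute-occasion and an attribute-attribute potential, maximized over latent attribute assignments. The potentials below are random stand-ins, not the learned latent-SVM weights from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_feat, n_attr_vals, n_occ = 16, 4, 3                       # toy sizes (assumptions)
w_feat_occ = rng.standard_normal((n_occ, n_feat))           # feature-occasion potential
w_attr_occ = rng.standard_normal((n_occ, 2, n_attr_vals))   # attribute-occasion (upper/lower)
w_attr_attr = rng.standard_normal((n_attr_vals, n_attr_vals))  # attribute-attribute

def score(x_upper, x_lower, occasion):
    """Score an outfit for an occasion, maximizing over latent attribute assignments."""
    feat_term = w_feat_occ[occasion] @ np.concatenate([x_upper, x_lower])
    best = -np.inf
    for a_up, a_low in itertools.product(range(n_attr_vals), repeat=2):
        s = (feat_term
             + w_attr_occ[occasion, 0, a_up]      # "wear properly": attribute vs. occasion
             + w_attr_occ[occasion, 1, a_low]
             + w_attr_attr[a_up, a_low])          # "wear aesthetically": attribute matching
        best = max(best, s)
    return best

x_up, x_low = rng.standard_normal(8), rng.standard_normal(8)
print(score(x_up, x_low, occasion=0))
```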
- Published
- 2012
- Full Text
- View/download PDF
40. Social tag alignment with image regions by sparse reconstructions
- Author
-
Yang Liu, Zechao Li, Hanqing Lu, Jing Liu, and Biao Niu
- Subjects
Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer vision ,Artificial intelligence ,Neural coding ,business ,Image retrieval ,Image (mathematics) ,Feature detection (computer vision) ,Task (project management) - Abstract
How to align social tags with image regions without additional human intervention is a challenging but valuable task, since it can provide more detailed image semantic information and improve the accuracy of image retrieval. To this end, we propose a novel tag-to-region method with two phases of sparse reconstruction that explores large-scale user-contributed resources. Given an image with social tags, we first exploit the tagging information of large-scale social images to sparsely reconstruct the label vector of the given image, and use the reconstruction weights as the semantic relevance of those images to the given one. With the top T semantically relevant images, we further employ a group sparse coding algorithm to reconstruct each region of the given image, in which the regions from the social images sharing a common label are treated as one label group. The group sparsity encodes the assumption that one image region corresponds to as few tags as possible. Finally, the region-level tags are predicted based on the reconstruction errors in the corresponding label groups. Extensive experiments on the MSRC and SAIAPR TC-12 datasets demonstrate the encouraging performance of our method in comparison with other baselines.
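A minimal sketch of the first phase only: sparsely reconstruct the query image's tag vector from the tag vectors of the social image collection and read the weights as semantic relevance. An L1-penalized regression (scikit-learn's Lasso) stands in for the paper's sparse coder, and the group sparse coding over regions is omitted; the dictionary here is random toy data.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n_tags, n_social_images = 50, 200
# Columns: binary tag vectors of the large-scale social image collection (toy data).
D = (rng.random((n_tags, n_social_images)) < 0.1).astype(float)
y = (rng.random(n_tags) < 0.1).astype(float)         # tag vector of the query image

# Sparse, non-negative reconstruction; the weights serve as semantic relevance scores.
lasso = Lasso(alpha=0.01, positive=True, max_iter=5000)
lasso.fit(D, y)
relevance = lasso.coef_                               # one weight per social image
top_T = np.argsort(relevance)[::-1][:10]              # the T most relevant images
print(top_T, relevance[top_T])
```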
- Published
- 2012
- Full Text
- View/download PDF
41. Low rank metric learning for social image retrieval
- Author
-
Zechao Li, Yu Jiang, Jinhui Tang, Jing Liu, and Hanqing Lu
- Subjects
Computer science ,business.industry ,Pattern recognition ,Overfitting ,Machine learning ,computer.software_genre ,Contextual design ,Semantic similarity ,Convex optimization ,Metric (mathematics) ,Leverage (statistics) ,Artificial intelligence ,business ,computer ,Image retrieval - Abstract
With the popularity of social media applications, large amounts of social images associated with rich context are available, which is helpful for many applications. In this paper, we propose a Low Rank distance Metric Learning (LRML) algorithm that discovers knowledge from these rich contextual data to boost the performance of CBIR. Different from traditional approaches that often use must-links and cannot-links between images, the proposed method exploits information from both the visual and textual domains. We assume that the visual similarity estimated by the learned metric should be consistent with the semantic similarity in the textual domain. Since tags are often noisy, misspelled or meaningless, we also leverage the preservation of visual structure to prevent overfitting to those noisy tags. In addition, the metric is directly constrained to be low rank. We formulate the problem as a convex optimization with nuclear norm minimization and propose an effective optimization algorithm based on the proximal gradient method. Experimental evaluations of image retrieval with the learned metric on a real-world dataset demonstrate that our approach outperforms related work.
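The nuclear-norm part of such an optimization reduces to singular value soft-thresholding inside a proximal gradient loop. The sketch below shows that generic building block with a placeholder quadratic loss; it is not the paper's actual objective built from textual/visual consistency constraints.

```python
import numpy as np

def prox_nuclear(M, tau):
    """Proximal operator of tau * ||M||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def proximal_gradient_step(M, grad_loss, step, tau):
    """One step of proximal gradient descent for  loss(M) + tau * ||M||_*."""
    return prox_nuclear(M - step * grad_loss(M), step * tau)

# Toy example: pull the matrix toward an arbitrary target (placeholder loss gradient).
rng = np.random.default_rng(5)
target = rng.standard_normal((20, 20))
grad = lambda M: M - target               # gradient of 0.5 * ||M - target||_F^2
M = np.zeros((20, 20))
for _ in range(100):
    M = proximal_gradient_step(M, grad, step=0.5, tau=1.0)
print(np.linalg.matrix_rank(M, tol=1e-6))  # typically much lower than 20
```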
- Published
- 2012
- Full Text
- View/download PDF
42. Point-context descriptor based region search for logo recognition
- Author
-
Hanqing Lu, Jianlong Fu, Yifan Zhang, and Jinqiao Wang
- Subjects
Logo recognition ,Point (typography) ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Mode (statistics) ,Context (language use) ,Logo ,Inverted index ,Image (mathematics) ,Constraint (information theory) ,Computer vision ,Artificial intelligence ,business - Abstract
We propose a novel approach for logo recognition in this paper. Firstly, we adopt a point-context descriptor for region modeling, which is a highly correlated integration of three kinds of features: point, shape, and patch. Secondly, after the query image is segmented into region trees, an asymmetric region-to-image search approach is utilized for visual logo recognition. A weak region-based geometric constraint is encoded into the inverted file structure to accelerate the search. We then apply global features to refine the results in the re-ranking stage. Finally, we combine the region scores in both max-response and accumulate-response modes to obtain the final results. To evaluate the performance, we test the proposed approach on both our challenging logo dataset and the Flickr_Logos dataset. Experiments and comparisons show that our approach is superior to state-of-the-art approaches.
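A toy sketch of an inverted file keyed by visual word, where each posting also carries the region it came from so that region-level checks can be applied at query time. The visual word ids, positions and scoring are placeholders; the point-context descriptor and quantizer are not reproduced.

```python
from collections import defaultdict

# Posting list: visual_word -> list of (image_id, region_id, keypoint_position).
inverted_index = defaultdict(list)

def index_region(image_id, region_id, visual_words_with_pos):
    for word, pos in visual_words_with_pos:
        inverted_index[word].append((image_id, region_id, pos))

def query(region_words):
    """Vote for (image, region) pairs sharing visual words with the query region."""
    votes = defaultdict(int)
    for word, _ in region_words:
        for image_id, region_id, _ in inverted_index[word]:
            votes[(image_id, region_id)] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Toy usage with made-up visual word ids and 2-D keypoint positions.
index_region("img1", 0, [(3, (10, 12)), (7, (40, 8))])
index_region("img2", 1, [(3, (5, 5)), (9, (22, 30))])
print(query([(3, (0, 0)), (9, (1, 1))]))
```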
- Published
- 2012
- Full Text
- View/download PDF
43. Chat with illustration
- Author
-
Hanqing Lu, Changsheng Xu, Zechao Li, Yu Jiang, and Jing Liu
- Subjects
User studies ,Scheme (programming language) ,World Wide Web ,Service (systems architecture) ,Index (publishing) ,Computer science ,First language ,Language barrier ,Context (language use) ,computer ,Sentence ,computer.programming_language - Abstract
Traditional instant messaging services mainly transfer textual messages, while visual messages are ignored to a great extent. In this paper, we propose a novel instant messaging scheme with visual aids named Chat with Illustration (CWI), which automatically presents users with visual messages associated with the chat content. When users start their chat, the system first identifies meaningful keywords from the dialogue content and analyzes context relations. CWI then performs keyword-based image search in an image database with a cluster-based index. Finally, according to the context relations, CWI assembles these images properly and presents an optimal visual message for each dialogue sentence. With the combination of textual and visual messages, users can enjoy a more interesting and vivid communication experience. Especially for speakers of different native languages, CWI can help cross the language barrier to some degree. In-depth user studies demonstrate the effectiveness of our approach.
- Published
- 2012
- Full Text
- View/download PDF
44. Multiple features fusion for crowd density estimation
- Author
-
Hanqing Lu, Zhenchong Wang, Jinqiao Wang, and Zi Ye
- Subjects
Fusion ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Wavelet transform ,Pattern recognition ,Rule-based system ,Support vector machine ,Background noise ,Robustness (computer science) ,Computer vision ,Artificial intelligence ,Crowd density ,business ,Statistic - Abstract
Crowd density estimation is valuable for intelligent crowd monitoring. Traditional approaches based on static texture analysis of single frames do not cope well with complex backgrounds, and rule-based statistical approaches lack robustness to background noise. In this paper, we propose a crowd density estimation approach that fuses statistical features with texture analysis. After extracting foreground objects with frame differencing, we learn SVM classifiers on GLCM and statistical features. The experimental results show the superiority of the proposed method, which can be applied in complex environments.
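A minimal sketch of the texture half of such a pipeline: GLCM (gray-level co-occurrence matrix) statistics per frame patch fed to an SVM. Foreground extraction by frame differencing and the fusion with crowd-count statistics are omitted, and the training patches and labels are synthetic.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # named 'greycomatrix' in older scikit-image
from sklearn.svm import SVC

def glcm_features(patch):
    """Contrast/homogeneity/energy/correlation of an 8-bit grayscale patch."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

# Synthetic patches stand in for low/medium/high crowd density frames.
rng = np.random.default_rng(6)
X, y = [], []
for label, scale in enumerate([10, 60, 120]):
    for _ in range(20):
        patch = (rng.random((64, 64)) * scale).astype(np.uint8)
        X.append(glcm_features(patch))
        y.append(label)

clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
print(clf.predict([glcm_features((rng.random((64, 64)) * 60).astype(np.uint8))]))
```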
- Published
- 2012
- Full Text
- View/download PDF
45. Ordinal preserving projection
- Author
-
Jing Liu, Changsheng Xu, Hanqing Lu, Yan Liu, Changsheng Li, and Qingshan Liu
- Subjects
Computer science ,business.industry ,Dimensionality reduction ,Pattern recognition ,computer.software_genre ,Image (mathematics) ,Data set ,Ranking ,Learning to rank ,Artificial intelligence ,Data mining ,Projection (set theory) ,business ,computer ,Subspace topology ,Curse of dimensionality - Abstract
Learning to rank has been demonstrated as a powerful tool for image ranking, but the "curse of dimensionality" is a key challenge when learning a ranking model from a large image database. This paper proposes a novel dimensionality reduction algorithm named ordinal preserving projection (OPP) for learning to rank. We first define two matrices, which work in the row direction and the column direction respectively, aiming to leverage the global structure of the data set and the ordinal information of the observations. By maximizing the corresponding objective functions, we obtain two optimal projection matrices that map the original data points into a low-dimensional subspace in which both the global structure and the ordinal information are preserved. Experiments are conducted on the publicly available MSRA-MM image data set and the "Web Queries" image data set, and the experimental results demonstrate the effectiveness of the proposed method.
- Published
- 2012
- Full Text
- View/download PDF
46. News contextualization with geographic and visual information
- Author
-
Changsheng Xu, Jing Liu, Meng Wang, Zechao Li, and Hanqing Lu
- Subjects
Set (abstract data type) ,Contextualization ,Information retrieval ,Computer science ,Reading (process) ,media_common.quotation_subject ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Relevance (information retrieval) ,Matrix decomposition ,Image (mathematics) ,media_common - Abstract
In this paper, we investigate the contextualization of news documents with geographic and visual information. We propose a matrix factorization approach to analyze the location relevance for each news document. We also propose a method to enrich the document with a set of web images. For location relevance analysis, we first perform toponym extraction and expansion to obtain a toponym list from news documents. We then propose a matrix factorization method to estimate the location-document relevance scores while simultaneously capturing the correlation of locations and documents. For image enrichment, we propose a method to generate multiple queries from each news document for image search and then employ an intelligent fusion approach to collect a set of images from the search results. Based on the location relevance analysis and image enrichment, we introduce a news browsing system named NewsMap which can support users in reading news via browsing a map and retrieving news with location queries. The news documents with the corresponding enriched images are presented to help users quickly get information. Extensive experiments demonstrate the effectiveness of our approaches.
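The paper's factorization model is not reproduced here; as a rough stand-in, the sketch below completes a sparse location-document relevance matrix with an off-the-shelf non-negative matrix factorization, so that documents sharing correlated locations reinforce each other. The matrix, its interpretation and all parameters are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(7)
n_locations, n_docs = 30, 100
# Sparse initial relevance scores (e.g., toponym match counts); zeros mean "unknown".
R = rng.poisson(0.2, (n_locations, n_docs)).astype(float)

model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(R)               # location factors
H = model.components_                    # document factors
R_hat = W @ H                            # dense, smoothed location-document relevance
print(R_hat[:, 0].argsort()[::-1][:3])   # top locations for document 0
```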
- Published
- 2011
- Full Text
- View/download PDF
47. Snap & play
- Author
-
Changsheng Xu, Si Liu, Hanqing Lu, Jian Dong, Shuicheng Yan, and Qiang Chen
- Subjects
Entertainment ,Game design ,Video game development ,Multimedia ,Computer science ,ComputingMilieux_PERSONALCOMPUTING ,computer.software_genre ,Game Developer ,Adaptation (computer science) ,computer ,Mobile device - Abstract
According to the Entertainment Software Association's 2010 report [5], 42% of USA heads of households reported playing games on mobile devices, rising quickly from 20% in 2002 and creating a huge market for mobile games. In this paper, taking the popular game Find-the-Difference (FiDi) as a concrete example, we explore new mobile game design principles and techniques for enhancing the player's gaming experience in personalized, automatic, and dynamic aspects. Unlike the traditional FiDi game, where image pairs (source image vs. target image) with M different patches are manually produced by the game developer and players may feel bored or cheat after practicing all image pairs, our proposed Personalized FiDi (P-FiDi) mobile game can be played in a new Snap & Play mode. The player first takes photos with one's mobile device (or selects from one's own albums). These photos serve as source images, and the P-FiDi system automatically generates the counterpart target images by sequential operations of aesthetic image quality enhancement, joint selection of image patches and differentiating styles, music adaptation, dynamic difficulty level determination, and finally automatic image editing with a rich set of popular differentiating styles used in the traditional FiDi game. The player thus enjoys unique gaming with one's own (instant) photos and music, and the freedom to obtain new gaming image pairs at any time. User studies show that the P-FiDi mobile game is satisfying in terms of player experience.
- Published
- 2011
- Full Text
- View/download PDF
48. Specific vehicle detection and tracking in road environment
- Author
-
Yang Zhang, Hanqing Lu, Jinqiao Wang, Wei Fu, and Huazhong Xu
- Subjects
Background subtraction ,Svm classifier ,Kernel (image processing) ,Computer science ,business.industry ,Vehicle detection ,Histogram ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer vision ,Artificial intelligence ,business - Abstract
In this paper, we propose a real-time method to detect and track specific vehicles, toward monitoring abnormal activities in the traffic environment. Firstly, a novel background subtraction approach is used to obtain accurate foreground segmentation with shadow suppression. Then an HIK (Histogram Intersection Kernel) based SVM classifier is trained to recognize whether a vehicle is suspicious. Finally, CamShift-based tracking is used to rapidly track the specific vehicles. Experiments in a real traffic scenario show the promise of the proposed approach.
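The HIK SVM piece can be reproduced with scikit-learn's support for callable kernels, as sketched below on synthetic histogram features. The background subtraction and CamShift stages (available in OpenCV as createBackgroundSubtractorMOG2 and CamShift) are not shown, and the labels are toy data rather than vehicle annotations.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection(X, Y):
    """Gram matrix of the histogram intersection kernel: sum_k min(x_k, y_k)."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# Synthetic normalized histograms standing in for vehicle appearance features.
rng = np.random.default_rng(8)
X = rng.random((120, 32))
X /= X.sum(axis=1, keepdims=True)
y = rng.integers(0, 2, 120)              # 1 = suspicious vehicle (toy labels)

clf = SVC(kernel=histogram_intersection).fit(X, y)
print(clf.predict(X[:5]))
```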
- Published
- 2011
- Full Text
- View/download PDF
49. Feature selection under learning to rank model for multimedia retrieval
- Author
-
Ling Shao, Changsheng Xu, Hanqing Lu, and Changsheng Li
- Subjects
Multimedia ,business.industry ,Computer science ,Dimensionality reduction ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Feature extraction ,Pattern recognition ,Feature selection ,computer.software_genre ,Feature model ,Feature (computer vision) ,Ranking SVM ,Learning to rank ,Artificial intelligence ,business ,Feature learning ,computer - Abstract
Most multimedia retrieval problems can be described by a ranking model, i.e., the images in the database are ranked according to their similarity to the query image. Existing ranking models generally use features that are pre-defined by experts. This paper uses machine learning techniques to automatically select useful features for ranking. We first generate a set of feature subsets by putting each feature into an individual subset. Then we sort these feature subsets according to their ranking performance. Third, neighboring feature subsets in the ranked order are paired to generate new feature subsets, which are again sorted by ranking performance. This process iterates until a pre-defined stopping point is reached. Experimental results on the .gov dataset and the Caltech101 development set show the effectiveness and efficiency of the proposed algorithm.
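A toy sketch of the greedy procedure described above: score each feature subset with a ranking metric (NDCG here), sort the subsets, merge neighbours in the ranked order, and repeat. The scoring rule (summed feature values as the ranking score), the data and the stopping point are all simplifications, not the paper's setup.

```python
import numpy as np
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(9)
n_docs, n_feats = 200, 12
X = rng.random((n_docs, n_feats))
relevance = rng.integers(0, 4, n_docs)              # graded relevance labels (toy)

def subset_ndcg(subset):
    """Rank documents by the summed feature values of the subset and score with NDCG."""
    scores = X[:, list(subset)].sum(axis=1)
    return ndcg_score(relevance.reshape(1, -1), scores.reshape(1, -1))

subsets = [frozenset([f]) for f in range(n_feats)]  # start: one subset per feature
for _ in range(3):                                  # placeholder stopping point
    subsets.sort(key=subset_ndcg, reverse=True)
    # Pair neighbouring subsets in the ranked order to form the next generation.
    subsets = [subsets[i] | subsets[i + 1] for i in range(len(subsets) - 1)]

best = max(subsets, key=subset_ndcg)
print(sorted(best), subset_ndcg(best))
```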
- Published
- 2010
- Full Text
- View/download PDF
50. Multi-modal multi-correlation person-centric news retrieval
- Author
-
Jing Liu, Zechao Li, Xiaobin Zhu, and Hanqing Lu
- Subjects
Correlation ,Information retrieval ,Modal ,Ranking ,Event (computing) ,Computer science ,Face (geometry) ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Natural language ,Image (mathematics) ,Ranking (information retrieval) ,Visualization - Abstract
In this paper, we propose a framework for multi-modal multi-correlation person-centric news retrieval, which integrates news event correlations, news entity correlations, and event-entity correlations simultaneously by exploring both text and image information. The proposed framework is confined to person-name queries and enables a more vivid and informative person-centric news retrieval by providing two views of result presentation, namely a query-oriented multi-correlation map and a ranking list of news items with necessary descriptions, including the news image, news title and summary, central entities and relevant news events. First, we pre-process news articles with natural language techniques, and initialize the three correlations by statistical analysis of events and entities in the news articles and face images. Second, a Multi-correlation Probabilistic Matrix Factorization (MPMF) algorithm is proposed to complete and refine the three correlations. Different from traditional Probabilistic Matrix Factorization (PMF), the proposed MPMF additionally considers the event correlations and the entity correlations, as well as the event-entity correlations, during the factor analysis. Third, result ranking and visualization are conducted to present search results relevant to a target news topic. Experimental results on a news dataset collected from multiple news websites demonstrate the attractive performance of the proposed solution for news retrieval.
- Published
- 2010
- Full Text
- View/download PDF