80 results for "Nie, Liqiang"
Search Results
2. CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning
- Author
- Wu, Jianlong, Yang, Haozhe, Gan, Tian, Ding, Ning, Jiang, Feijun, and Nie, Liqiang
- Published
- 2023
- Full Text
- View/download PDF
3. CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset
- Author
- Gan, Tian, Wang, Qing, Dong, Xingning, Ren, Xiangyuan, Nie, Liqiang, and Guo, Qingpei
- Published
- 2023
- Full Text
- View/download PDF
4. Modeling Product’s Visual and Functional Characteristics for Recommender Systems (Extended Abstract)
- Author
- Wu, Bin, He, Xiangnan, Chen, Yu, Nie, Liqiang, Zheng, Kai, and Ye, Yangdong
- Published
- 2023
- Full Text
- View/download PDF
5. Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
- Author
- Dong, Xingning, Gan, Tian, Song, Xuemeng, Wu, Jianlong, Cheng, Yuan, and Nie, Liqiang
- Published
- 2022
- Full Text
- View/download PDF
6. Win The Lottery Ticket Via Fourier Analysis: Frequencies Guided Network Pruning
- Author
- Shang, Yuzhang, Duan, Bin, Zong, Ziliang, Nie, Liqiang, and Yan, Yan
- Published
- 2022
- Full Text
- View/download PDF
7. Learning Dual Low-Rank Representation for Multi-Label Micro-Video Classification.
- Author
- Lu, Wei, Li, Desheng, Nie, Liqiang, Jing, Peiguang, and Su, Yuting
- Abstract
Currently, with the rapid development of the mobile Internet, micro-video has become a prevailing format of user-generated content (UGC) on various social media platforms. Several studies have been conducted towards understanding high-level micro-video semantics, such as venue categorization, memorability, and popularity. However, these approaches support tasks with only a single output, which limits their use for tasks with multiple outputs, especially multi-label micro-video classification. To tackle this problem, in this paper, we propose a dual multi-modal low-rank decomposition (DMLRD) method for multi-label micro-video classification. To learn more comprehensive micro-video representations, we first learn low-rank-regularized modality-specific and modality-shared components by considering the consistency and the complementarity among modalities simultaneously. Meanwhile, the limited descriptive power of each individual modality, caused by its inherent properties, can be mitigated to a certain extent. To obtain unseen label representations, we next construct a sparsity-regularized multi-matrix normal estimation term to jointly encode the latent relationship structures among labels and dimensions. Experiments on two datasets demonstrate the effectiveness of our proposed method over state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
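The DMLRD abstract above builds on low-rank-regularized components. As an illustration only (not the authors' solver), the Eckart-Young truncated SVD gives the best low-rank approximation of a feature matrix, which is the basic operation behind such components; the toy feature sizes below are assumptions:

```python
import numpy as np

def truncated_svd_approx(X, rank):
    """Best rank-r approximation of X (Eckart-Young), the building block
    behind low-rank-regularized representation components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

# Toy visual features of 6 micro-videos: rank-2 shared structure plus noise
rng = np.random.default_rng(0)
shared = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))
visual = shared + 0.01 * rng.normal(size=(6, 8))

low_rank = truncated_svd_approx(visual, rank=2)
# The rank-2 approximation recovers almost all of the signal.
err = np.linalg.norm(visual - low_rank) / np.linalg.norm(visual)
```

Because the residual of the best rank-2 approximation is bounded by the noise, the relative error stays small, which is why a low-rank constraint can denoise a modality with weak descriptive power.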
8. Lipschitz Continuity Guided Knowledge Distillation
- Author
- Shang, Yuzhang, Duan, Bin, Zong, Ziliang, Nie, Liqiang, and Yan, Yan
- Published
- 2021
- Full Text
- View/download PDF
9. Graph Contrastive Clustering
- Author
- Zhong, Huasong, Wu, Jianlong, Chen, Chong, Huang, Jianqiang, Deng, Minghua, Nie, Liqiang, Lin, Zhouchen, and Hua, Xian-Sheng
- Published
- 2021
- Full Text
- View/download PDF
10. An Attribute-Aware Attentive GCN Model for Attribute Missing in Recommendation.
- Author
- Liu, Fan, Cheng, Zhiyong, Zhu, Lei, Liu, Chenghao, and Nie, Liqiang
- Subjects
- MISSING data (Statistics), MAXIMUM likelihood detection, RECOMMENDER systems, MESSAGE passing (Computer science)
- Abstract
As important side information, attributes have been widely exploited in existing recommender systems for better performance. However, in real-world scenarios, it is common that some attributes of items/users are missing (e.g., some movies lack genre data). Prior studies usually use a default value (i.e., "other") to represent the missing attribute, resulting in sub-optimal performance. To address this problem, in this paper, we present an attribute-aware attentive graph convolution network (A²-GCN). In particular, we first construct a graph where users, items, and attributes are three types of nodes and their associations are edges. Thereafter, we leverage the graph convolution network to characterize the complicated interactions among <users, items, attributes>. Furthermore, to learn the node representation, we adopt the message-passing strategy to aggregate the messages passed from the other directly linked types of nodes (e.g., a user or an attribute). Towards this end, we are capable of incorporating associated attributes to strengthen the user and item representation learning, and thus naturally solve the attribute missing problem. Given that, for different users, the attributes of an item have different influences on their preference for this item, we design a novel attention mechanism to filter the message passed from an item to a target user by considering the attribute information. Extensive experiments have been conducted on several publicly accessible datasets, demonstrating that our model outperforms several state-of-the-art methods and confirming the effectiveness of our attention method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
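The attention-filtered message passing described in the A²-GCN abstract can be sketched in a few lines. The dot-product scoring below is an assumed simplification, not the paper's exact attention, and the embeddings are toy values:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate(target, neighbor_msgs):
    """Attention-weighted message passing for one target node: score each
    directly linked node's message against the target embedding, then take
    the softmax-weighted sum of the messages."""
    weights = softmax(neighbor_msgs @ target)
    return weights @ neighbor_msgs

user = np.array([1.0, 0.0])          # target user embedding
msgs = np.array([[1.0, 0.0],         # attribute aligned with the user's preference
                 [0.0, 1.0]])        # unrelated attribute
out = aggregate(user, msgs)
```

The aligned attribute receives a higher weight, so its message dominates the aggregated representation; this is the mechanism that lets present attributes compensate for missing ones.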
11. Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching.
- Author
- Dong, Xinfeng, Zhang, Huaxiang, Zhu, Lei, Nie, Liqiang, and Liu, Li
- Subjects
- SEMANTICS, MODAL logic, GRAPH algorithms, IMAGE reconstruction, FEATURE extraction, PARTICLES
- Abstract
In order to carry out more accurate retrieval across image-text modalities, some scholars use fine-grained features to align images and text. Most of them directly use an attention mechanism to align image regions with words in the sentence, and ignore the fact that the semantics related to an object are abstract and cannot be accurately expressed by object information alone. To overcome this weakness, we propose a hierarchical feature aggregation algorithm based on graph convolutional networks (GCN) to promote object semantic integrity by hierarchically integrating the attributes of an object and the relations between objects in both image and text modalities. To eliminate the semantic gap between modalities, we propose a cross-modal feature fusion method based on the transformer to generate modality-specific feature representations by integrating both the object feature and the global feature from the other modality. We then map the fused feature into a common space. Experimental results on the most frequently used datasets, MSCOCO and Flickr30K, show the effectiveness of the proposed model compared with the latest methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Enhancing Factorization Machines With Generalized Metric Learning.
- Author
- Guo, Yangyang, Cheng, Zhiyong, Jing, Jiazheng, Lin, Yanpeng, Nie, Liqiang, and Wang, Meng
- Subjects
- FACTORIZATION, ARTIFICIAL neural networks, SECONDHAND trade, EUCLIDEAN distance
- Abstract
Factorization Machines (FMs) are effective in incorporating side information to overcome the cold-start and data sparsity problems in recommender systems. Traditional FMs adopt the inner product to model the second-order interactions between different attributes, which are represented via feature vectors. The problem is that the inner product violates the triangle inequality property of feature vectors. As a result, it cannot well capture fine-grained attribute interactions, resulting in sub-optimal performance. Recently, the Euclidean distance has been exploited in FMs to replace the inner product and has delivered better performance. However, previous FM methods, including the ones equipped with the Euclidean distance, all focus on attribute-level interaction modeling, ignoring the critical intrinsic feature correlations inside attributes. Thereby, they fail to model the complex and rich interactions exhibited in real-world data. To tackle this problem, in this paper, we propose an FM framework equipped with generalized metric learning techniques to better capture these feature correlations. In particular, based on this framework, we present a Mahalanobis distance method and a deep neural network (DNN) method, which can effectively model the linear and non-linear correlations between features, respectively. Besides, we design an efficient approach for simplifying the model functions. Experiments on several benchmark datasets demonstrate that our proposed framework outperforms several state-of-the-art baselines by a large margin. Moreover, we collect a new large-scale dataset on second-hand trading to justify the effectiveness of our method over cold-start and data sparsity problems in recommender systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
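The inner-product-versus-distance contrast at the heart of this abstract is easy to make concrete. The sketch below shows the classic FM second-order term next to a distance-based variant; it is a minimal illustration under toy embeddings, not the paper's full generalized-metric model:

```python
import numpy as np

def fm_inner(V):
    """Classic FM second-order term: sum over pairwise inner products."""
    n = len(V)
    return float(sum(V[i] @ V[j] for i in range(n) for j in range(i + 1, n)))

def fm_distance(V):
    """Metric-learning variant: swap the inner product for the negated
    squared Euclidean distance, which respects the triangle inequality
    that the inner product violates."""
    n = len(V)
    return -float(sum(np.sum((V[i] - V[j]) ** 2)
                      for i in range(n) for j in range(i + 1, n)))

# Embeddings of three active attributes (toy values)
V = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
```

Replacing the inner product with a true metric means identical attribute embeddings score as maximally similar (distance zero), a guarantee the inner product does not give.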
13. Loss Re-Scaling VQA: Revisiting the Language Prior Problem From a Class-Imbalance View.
- Author
- Guo, Yangyang, Nie, Liqiang, Cheng, Zhiyong, Tian, Qi, and Zhang, Min
- Subjects
- IMAGE recognition (Computer vision), COMPUTER vision, VISUAL learning, SCIENTIFIC community, BAYES' estimation
- Abstract
Recent studies have pointed out that many well-developed Visual Question Answering (VQA) models are heavily affected by the language prior problem, which refers to making predictions based on co-occurrence patterns between textual questions and answers instead of reasoning over visual content. To tackle this problem, most existing methods focus on strengthening visual feature learning to reduce the influence of this text shortcut on model decisions. However, few efforts have been devoted to analyzing its inherent cause and providing an explicit interpretation. The research community thus lacks good guidance for moving forward in a purposeful way, resulting in perplexity over model construction when attempting to overcome this non-trivial problem. In this paper, we propose to interpret the language prior problem in VQA from a class-imbalance view. Concretely, we design a novel interpretation scheme whereby the losses of mis-predicted frequent and sparse answers from the same question type are distinctly exhibited during the late training phase. It explicitly reveals why a VQA model tends to produce a frequent yet obviously wrong answer to a given question whose right answer is sparse in the training set. Based upon this observation, we further propose a novel loss re-scaling approach to assign different weights to each answer according to the training data statistics for estimating the final loss. We apply our approach to six strong baselines, and the experimental results on two VQA-CP benchmark datasets clearly demonstrate its effectiveness. In addition, we also justify the validity of the class-imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
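The loss re-scaling idea in this abstract follows the usual class-imbalance recipe. The sketch below uses balanced inverse-frequency weighting as one plausible instantiation; the paper derives its weights from training statistics in its own way, and the answer strings are invented:

```python
from collections import Counter
import math

def answer_weights(train_answers):
    """Balanced-class weighting: each answer's weight is inversely
    proportional to its training-set frequency, so sparse-but-correct
    answers are not drowned out by frequent ones."""
    counts = Counter(train_answers)
    total = sum(counts.values())
    return {a: total / (len(counts) * c) for a, c in counts.items()}

def rescaled_nll(prob_of_truth, answer, weights):
    """Negative log-likelihood scaled by the ground-truth answer's weight."""
    return weights[answer] * -math.log(prob_of_truth)

# "yes" is the frequent answer for this question type, "2" the sparse one
answers = ["yes"] * 8 + ["2"] * 2
w = answer_weights(answers)
```

Under this weighting, mis-predicting the sparse answer "2" costs four times as much as mis-predicting "yes", pushing the model away from the frequent-answer shortcut.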
14. Partially Supervised Compatibility Modeling.
- Author
- Guan, Weili, Wen, Haokun, Song, Xuemeng, Wang, Chun, Yeh, Chung-Hsing, Chang, Xiaojun, and Nie, Liqiang
- Subjects
- SUPERVISED learning, IMAGE color analysis, MODELS (Persons), SOURCE code, FACTOR structure
- Abstract
Fashion Compatibility Modeling (FCM), which aims to automatically evaluate whether a given set of fashion items makes a compatible outfit, has attracted increasing research attention. Recent studies have demonstrated the benefits of conducting item representation disentanglement for FCM. Although these efforts have achieved prominent progress, they still perform unsatisfactorily, as they mainly investigate the visual content of fashion items while overlooking the semantic attributes of items (e.g., color and pattern), which could largely boost the model performance and interpretability. To address this issue, we propose to comprehensively explore the visual content and attributes of fashion items for FCM. This problem is non-trivial considering the following challenges: a) how to utilize the irregular attribute labels of items to partially supervise the attribute-level representation learning of fashion items; b) how to ensure the intact disentanglement of attribute-level representations; and c) how to effectively sew together the multiple granularities (i.e., coarse-grained item-level and fine-grained attribute-level) of information to enable performance improvement and interpretability. To address these challenges, in this work, we present a partially supervised outfit compatibility modeling scheme (PS-OCM). In particular, we first devise a partially supervised attribute-level embedding learning component to disentangle the fine-grained attribute embeddings from the entire visual feature of each item. We then introduce a disentangled completeness regularizer to prevent information loss during disentanglement. Thereafter, we design a hierarchical graph convolutional network, which seamlessly integrates attribute- and item-level compatibility modeling and enables explainable compatibility reasoning. Extensive experiments on a real-world dataset demonstrate that our PS-OCM significantly outperforms the state-of-the-art baselines.
We have released our source codes and well-trained models to benefit other researchers (https://site2750.wixsite.com/ps-ocm). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. Modeling Product’s Visual and Functional Characteristics for Recommender Systems.
- Author
- Wu, Bin, He, Xiangnan, Chen, Yun, Nie, Liqiang, Zheng, Kai, and Ye, Yangdong
- Subjects
- RECOMMENDER systems, MATRIX decomposition, PRODUCT attributes, MACHINE learning, IMPLICIT learning, OFFICE equipment & supplies
- Abstract
An effective recommender system can significantly help customers find desired products and assist business owners in earning more income. Nevertheless, the decision-making process of users is highly complex, not only dependent on the personality and preference of a user, but also complicated by the characteristics of a specific product. For example, for products of different domains (e.g., clothing versus office products), the product aspects that affect a user's decision are very different. As such, traditional collaborative filtering methods that model only user-item interaction data deliver unsatisfactory recommendation results. In this work, we focus on fine-grained modeling of product characteristics to improve recommendation quality. Specifically, we first divide a product's characteristics into visual and functional aspects, i.e., the visual appearance and functionality of the product. One insight is that the visual characteristic is very important for products of visually-aware domains (e.g., clothing), while the functional characteristic plays a more crucial role for visually non-aware domains (e.g., office products). We then contribute a novel probabilistic model, named Visual and Functional Probabilistic Matrix Factorization (VFPMF), to unify the two factors to estimate user preferences on products. Nevertheless, such an expressive model poses an efficiency challenge in parameter learning from implicit feedback. To address this technical challenge, we devise a computationally efficient learning algorithm based on alternating least squares. Furthermore, we provide an online updating procedure for the algorithm, shedding some light on how to adapt our method to real-world recommendation scenarios where data continuously streams in. Extensive experiments on four real-world datasets demonstrate the effectiveness of our method with both offline and online protocols. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
16. Adversarial Graph Convolutional Network for Cross-Modal Retrieval.
- Author
- Dong, Xinfeng, Liu, Li, Zhu, Lei, Nie, Liqiang, and Zhang, Huaxiang
- Subjects
- MODAL logic, REPRESENTATIONS of graphs, GENERATIVE adversarial networks
- Abstract
The completeness of semantic expression plays an important role in cross-modal retrieval tasks, as it helps to align cross-modal data and thus narrow the modality gap. However, because semantics are abstract, the same topic may have several aspects that need to be described, so a single sample may express its semantics incompletely. In order to obtain semantically complementary information and strengthen similar information for samples with the same semantics, we utilize a graph convolutional network (GCN) to reconstruct the sample representation based on the adjacency relationship between the sample itself and its neighborhoods. We construct a local graph for each instance, and propose a novel Graph Feature Generator based on a GCN and a fully-connected network to reconstruct node features from the local graph and map the features of the two modalities into a common space. The Graph Feature Generator and Graph Feature Discriminator adopt a minimax game strategy to generate modality-invariant graph feature representations. Experiments on three benchmark datasets demonstrate the superiority of our proposed model compared with several state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
17. Principal Component Analysis on Graph-Hessian
- Author
- Pan, Yichen, Zhou, Yicong, Liu, Weifeng, and Nie, Liqiang
- Published
- 2019
- Full Text
- View/download PDF
18. BATCH: A Scalable Asymmetric Discrete Cross-Modal Hashing.
- Author
- Wang, Yongxin, Luo, Xin, Nie, Liqiang, Song, Jingkuan, Zhang, Wei, and Xu, Xin-Shun
- Subjects
- BINARY codes, MATRIX decomposition, SPARSE matrices, REDUNDANCY in engineering
- Abstract
Supervised cross-modal hashing has attracted much attention. However, some challenges remain, e.g., how to effectively embed the label information into binary codes, how to avoid using a large similarity matrix so that a model scales to large datasets, and how to efficiently solve the binary optimization problem. To address these challenges, in this paper, we present a novel supervised cross-modal hashing method, i.e., scalaBle Asymmetric discreTe Cross-modal Hashing, BATCH for short. It leverages collective matrix factorization to learn a common latent space for the labels and different modalities, and embeds the labels into binary codes by minimizing a distance-distance difference problem. Furthermore, it builds a connection between the common latent space and the hash codes by an asymmetric strategy. In the light of this, it can perform cross-modal retrieval and embed more similarity information into the binary codes. In addition, it introduces a quantization minimization term and orthogonal constraints into the optimization problem, and generates the binary codes discretely. Therefore, the quantization error and redundancy can be greatly reduced. Moreover, it is a two-step method, making the optimization simple and scalable to large-scale datasets. Extensive experimental results on three benchmark datasets demonstrate that BATCH outperforms some state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
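The asymmetric strategy mentioned in the BATCH abstract, i.e., scoring a real-valued representation against binary database codes, can be illustrated with toy values. The codes and query below are invented and only stand in for the quantities BATCH actually learns:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two ±1 hash codes."""
    return int(np.sum(a != b))

def asymmetric_score(real_query, binary_code):
    """Asymmetric relevance: a real-valued query representation is scored
    against a binary database code, so only the database side is quantized."""
    return float(real_query @ binary_code)

item_a = np.array([ 1,  1, -1,  1, -1, -1,  1,  1])   # 8-bit code for item A
item_b = np.array([-1, -1,  1,  1,  1,  1, -1, -1])   # 8-bit code for item B

# Query from the other modality whose latent signs match item A's code
query = np.array([0.9, 0.8, -0.7, 0.6, -0.9, -0.4, 0.8, 0.7])
```

Keeping the query real-valued preserves magnitude information that a symmetric binary-to-binary comparison would quantize away, which is one reason asymmetric schemes embed more similarity information into the codes.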
19. Cooperation Learning From Multiple Social Networks: Consistent and Complementary Perspectives.
- Author
- Guan, Weili, Song, Xuemeng, Gan, Tian, Lin, Junyu, Chang, Xiaojun, and Nie, Liqiang
- Abstract
A GWI survey has highlighted the flourishing use of multiple social networks: the average number of social media accounts per Internet user is 5.54, and among them, 2.82 are used actively. Indeed, users tend to express their views on more than one social media site. Hence, merging social signals of the same user across different social networks, if available, can facilitate downstream analyses. Previous work has paid little attention to modeling the cooperation among the following factors when fusing data from multiple social networks: 1) as data from different sources characterize the same social user, source consistency merits attention; 2) due to their different functional emphases, some aspects of the same user captured by different social networks can be complementary, resulting in source complementarity; and 3) different sources can contribute differently to the user characterization, leading to different source confidence. Toward this end, we propose a novel unified model, which co-regularizes source consistency, complementarity, and confidence to boost the learning performance with multiple social networks. In addition, we derive its theoretical solution and verify the model with the real-world application of user interest inference. Extensive experiments against several state-of-the-art competitors have justified the superiority of our model (http://tinyurl.com/zk6kgc9). [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
20. Conversational Image Search.
- Author
- Nie, Liqiang, Jiao, Fangkai, Wang, Wenjie, Wang, Yinglong, and Tian, Qi
- Subjects
- IMAGE representation, IMAGE color analysis, KNOWLEDGE representation (Information theory), DATA release
- Abstract
Conversational image search, a revolutionary search mode, is able to interactively elicit user responses to clarify their intent step by step. Several efforts have been dedicated to the conversation part, namely automatically asking the right question at the right time for user preference elicitation, while few studies focus on the image search part given the well-prepared conversational query. In this paper, we work towards conversational image search, which is much more difficult than the traditional image search task, due to the following challenges: 1) understanding complex user intents from a multimodal conversational query; 2) utilizing multi-form knowledge associated with images from a memory network; and 3) enhancing the image representation with distilled knowledge. To address these problems, we present a novel contextuaL imAge seaRch sCHeme (LARCH for short), consisting of three components. In the first component, we design a multimodal hierarchical graph-based neural network, which learns the conversational query embedding for better user intent understanding. As to the second one, we devise a multi-form knowledge embedding memory network to unify heterogeneous knowledge structures into a homogeneous base that greatly facilitates relevant knowledge retrieval. In the third component, we learn the knowledge-enhanced image representation via a novel gated neural network, which selects the useful knowledge from the retrieved relevant entries. Extensive experiments have shown that our LARCH yields significant performance gains on an extended benchmark dataset. As a side contribution, we have released the data, codes, and parameter settings to facilitate other researchers in the conversational image search community. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization.
- Author
- Hu, Yupeng, Nie, Liqiang, Liu, Meng, Wang, Kun, Wang, Yinglong, and Hua, Xian-Sheng
- Subjects
- VIDEO compression, CONTENT analysis, TASK analysis
- Abstract
Video moment localization, as an important branch of video content analysis, has attracted extensive attention in recent years. However, it is still in its infancy due to the following challenges: cross-modal semantic alignment and localization efficiency. To address these impediments, we present a cross-modal semantic alignment network. To be specific, we first design a video encoder to generate moment candidates, learn their representations, as well as model their semantic relevance. Meanwhile, we design a query encoder for diverse query intention understanding. Thereafter, we introduce a multi-granularity interaction module to deeply explore the semantic correlation between multi-modalities. Thereby, we can effectively complete target moment localization via sufficient cross-modal semantic understanding. Moreover, we introduce a semantic pruning strategy to reduce cross-modal retrieval overhead, improving localization efficiency. Experimental results on two benchmark datasets have justified the superiority of our model over several state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
22. Video Moment Localization via Deep Cross-Modal Hashing.
- Author
- Hu, Yupeng, Liu, Meng, Su, Xiaobin, Gao, Zan, and Nie, Liqiang
- Subjects
- VIDEO compression, HAMMING distance, STREAMING video & television, COMPACT spaces (Topology), VIDEO coding
- Abstract
Due to the continuous booming of surveillance and Web videos, video moment localization, as an important branch of video content analysis, has attracted wide attention from both industry and academia in recent years. It is, however, a non-trivial task due to the following challenges: temporal context modeling, intelligent moment candidate generation, and the necessary efficiency and scalability in practice. To address these impediments, we present a deep end-to-end cross-modal hashing network. To be specific, we first design a video encoder relying on a bidirectional temporal convolutional network to simultaneously generate moment candidates and learn their representations. Considering that the video encoder characterizes temporal contextual structures at multiple scales of time windows, we can thus obtain enhanced moment representations. As a counterpart, we design an independent query encoder for user intention understanding. Thereafter, a cross-modal hashing module is developed to project these two heterogeneous representations into a shared isomorphic Hamming space for compact hash code learning. After that, we can efficiently estimate the relevance score of each "moment-query" pair via the Hamming distance. Besides effectiveness, our model is far more efficient and scalable, since the hash codes of videos can be learned offline. Experimental results on real-world datasets have justified the superiority of our model over several state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
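Scoring "moment-query" pairs via Hamming distance is fast precisely because it reduces to XOR plus popcount on packed bit codes. The sketch below uses invented 16-bit codes to show the ranking step; it is not the paper's learned encoder:

```python
def popcount_hamming(a, b):
    """Hamming distance between two packed bit codes: XOR the integers,
    then count the set bits. With the video codes precomputed offline,
    each query-moment comparison is a couple of machine operations."""
    return bin(a ^ b).count("1")

# 16-bit codes (toy values) for a query and two moment candidates
query    = 0b1011001110001101
moment_1 = 0b1011001110001111   # 1 bit away from the query
moment_2 = 0b0100110001110010   # bitwise complement of the query

# Rank candidates by ascending Hamming distance to the query
ranked = sorted([moment_1, moment_2],
                key=lambda m: popcount_hamming(query, m))
```

On Python 3.10+ the same popcount is available as `(a ^ b).bit_count()`, which avoids the string round-trip.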
23. Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification.
- Author
- Liu, Meng, Qu, Leigang, Nie, Liqiang, Liu, Maofu, Duan, Lingyu, and Chen, Baoquan
- Subjects
- ITERATIVE learning control, VIDEO surveillance, RECOMMENDER systems, INFORMATION filtering, VIDEOS, LEARNING
- Abstract
Video person re-identification (video Re-ID) plays an important role in surveillance video analysis and has gained increasing attention recently. However, existing supervised methods require vast numbers of labeled identities across cameras. Although some unsupervised approaches have been explored for video Re-ID, they are still in their infancy due to the complex nature of learning discriminative features from unlabelled data. In this article, we focus on one-shot video Re-ID and present an iterative local-global collaboration learning approach to learn robust and discriminative person representations. Specifically, it jointly considers the global video information and local frame sequence information to better capture the diverse appearance of the person for feature learning and pseudo-label estimation. Moreover, as the cross-entropy loss may induce the model to focus on identity-irrelevant factors, we introduce the variational information bottleneck as a regularization term in training. It helps filter undesirable information and characterize subtle differences among persons. Since accuracy cannot always be guaranteed for pseudo-labels, we adopt a dynamic selection strategy to select the portion of pseudo-labeled data with higher confidence to update the training set and re-train the learning model. During training, our method iteratively executes the feature learning, pseudo-label estimation, and dynamic sample selection until all the unlabeled data have been seen. Extensive experiments on two public datasets, i.e., DukeMTMC-VideoReID and MARS, have verified the superiority of our model over several cutting-edge competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
24. Model Optimization Boosting Framework for Linear Model Hash Learning.
- Author
- Liu, Xingbo, Nie, Xiushan, Zhou, Quan, Nie, Liqiang, and Yin, Yilong
- Subjects
- NEIGHBORHOOD characteristics, INFORMATION retrieval
- Abstract
Efficient hashing techniques have attracted extensive research interest for both storage and retrieval of high-dimensional data, such as images and videos. Existing hashing methods commonly utilize a linear model owing to its efficiency. To obtain better accuracy, linear-based hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider the inherent characteristics and neighborhood information of samples. Differing from existing hashing methods, in this study, we propose a self-improvement framework called Model Boost (MoBoost) to improve model parameter optimization for linear-based hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, for a linear-based hashing method, we first repeatedly execute the hashing method to obtain several sets of hash codes for the training samples. Then, utilizing two novel fusion strategies, these codes are fused into a single set. We also propose two new criteria to evaluate the goodness of hash bits during the fusion process. Based on the fused set of hash codes, we learn new parameters for the linear hash function that can significantly improve accuracy. In general, the proposed MoBoost can be adopted by existing linear-based hashing methods, achieving more precise and stable performance than the original methods while incurring negligible additional time and space costs. To evaluate the proposed MoBoost, we performed extensive experiments on four benchmark datasets, and the results demonstrate superior performance. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
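The fusion step described in the MoBoost abstract, i.e., collapsing hash codes from several runs into a single set, can be sketched with a per-bit majority vote. This is one plausible fusion strategy for illustration only, not necessarily either of the two strategies the paper proposes, and the run codes are toy values:

```python
import numpy as np

def majority_fuse(code_sets):
    """Fuse hash codes from several runs by per-bit majority vote over
    ±1 bits; using an odd number of runs avoids sign(0) ties."""
    stacked = np.stack(code_sets)        # shape: (runs, samples, bits)
    return np.sign(stacked.sum(axis=0))

# Three runs of a linear hashing method on two samples, 3 bits each
run1 = np.array([[ 1, -1,  1], [ 1,  1, -1]])
run2 = np.array([[ 1, -1, -1], [ 1,  1, -1]])
run3 = np.array([[ 1,  1,  1], [-1,  1, -1]])
fused = majority_fuse([run1, run2, run3])
```

Bits that are stable across runs survive the vote, while bits that flip run-to-run (the unreliable ones) are settled by the majority, which matches MoBoost's intuition of keeping only "good" hash bits.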
25. Scalable Deep Hashing for Large-Scale Social Image Retrieval.
- Author
- Cui, Hui, Zhu, Lei, Li, Jingjing, Yang, Yang, and Nie, Liqiang
- Subjects
- IMAGE retrieval, ARTIFICIAL neural networks, SUPERVISED learning, DEEP learning, TAGS (Metadata), IMAGE representation, BINARY codes
- Abstract
Recent years have witnessed the wide application of hashing for large-scale image retrieval, because of its high computation efficiency and low storage cost. Particularly, benefiting from current advances in deep learning, supervised deep hashing methods have greatly boosted retrieval performance under the strong supervision of large amounts of manually annotated semantic labels. However, their performance is highly dependent upon the supervised labels, which significantly limits scalability. In contrast, unsupervised deep hashing without label dependence enjoys the advantage of good scalability. Nevertheless, due to the relaxed hash optimization and, more importantly, the lack of semantic guidance, existing methods suffer from limited retrieval performance. In this paper, we propose a SCAlable Deep Hashing (SCADH) method to learn enhanced hash codes for social image retrieval. We formulate a unified scalable deep hash learning framework which explores the weak but free supervision of the discriminative user tags that commonly accompany social images. It jointly learns image representations and hash functions with deep neural networks, and simultaneously enhances the discriminative capability of image hash codes with the refined semantics from the accompanying social tags. Further, instead of simple relaxed hash optimization, we propose a discrete hash optimization method based on the Augmented Lagrangian Multiplier method to directly solve for the hash codes and avoid the binary quantization information loss. Experiments on two standard social image datasets demonstrate the superiority of the proposed approach compared with state-of-the-art shallow and deep hashing techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
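As a rough illustration of the retrieval setting described in the abstract above (not SCADH itself), the sketch below quantizes projected features into binary codes with a naive sign step and ranks a database by Hamming distance. The projection `W` is a hypothetical stand-in for learned hash functions; SCADH's ALM-based discrete optimization exists precisely to avoid the quantization loss this sign step incurs.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))   # toy image features
W = rng.normal(size=(16, 8))     # hypothetical learned projection
B = np.sign(X @ W)               # relaxed codes quantized by sign
B[B == 0] = 1                    # guard against exact zeros

q = B[0]
order = hamming_rank(q, B)       # the query's own code ranks first
```

The stable sort guarantees that, among items at equal Hamming distance, the lower index comes first, so the query always retrieves itself at rank 0 in this toy setup.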
26. Neural Compatibility Modeling With Probabilistic Knowledge Distillation.
- Author
-
Han, Xianjing, Song, Xuemeng, Yao, Yiyang, Xu, Xin-Shun, and Nie, Liqiang
- Subjects
FASHION ,MODELS (Persons) ,MODERN society ,HUMAN beings ,SCIENTIFIC community - Abstract
In modern society, clothing matching plays a pivotal role in people’s daily life, as suitable outfits can beautify their appearance directly. Nevertheless, how to make a suitable outfit has become a daily headache for many people, especially those who do not have much sense of aesthetics. In the light of this, many research efforts have been dedicated to the task of complementary clothing matching and have achieved great success relying on the advanced data-driven neural networks. However, most existing methods overlook the rich valuable knowledge accumulated by our human beings in the fashion domain, especially the rules regarding clothing matching, like “coats go with dresses” and “silk tops cannot go with chiffon bottoms”. Towards this end, in this work, we propose a knowledge-guided neural compatibility modeling scheme, which is able to incorporate the rich fashion domain knowledge to enhance the performance of the compatibility modeling in the context of clothing matching. To better integrate the huge and implicit fashion domain knowledge into the data-driven neural networks, we present a probabilistic knowledge distillation (PKD) method, which is able to encode vast knowledge rules in a probabilistic manner. Extensive experiments on two real-world datasets have verified the guidance of rules from different sources and demonstrated the effectiveness and portability of our model. As a byproduct, we released the codes and involved parameters to benefit the research community. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
27. SCRATCH: A Scalable Discrete Matrix Factorization Hashing Framework for Cross-Modal Retrieval.
- Author
-
Chen, Zhen-Duo, Li, Chuan-Xiang, Luo, Xin, Nie, Liqiang, Zhang, Wei, and Xu, Xin-Shun
- Subjects
MATRIX decomposition ,BINARY codes ,LEARNING strategies ,HASHING - Abstract
In this paper, we present a novel supervised cross-modal hashing framework, namely Scalable disCRete mATrix faCtorization Hashing (SCRATCH). First, it utilizes collective matrix factorization on the original features, together with label semantic embedding, to learn latent representations in a shared latent space. Thereafter, it generates binary hash codes based on the latent representations. During optimization, it avoids using a large $n\times n$ similarity matrix and generates hash codes discretely. Besides, based on different objective functions, learning strategies, and features, we further present three models in this framework, i.e., SCRATCH-o, SCRATCH-t, and SCRATCH-d. The first is a one-step method, learning the hash functions and the binary codes in the same optimization problem. The second is a two-step method, which first generates the binary codes and then learns the hash functions based on the learned codes. The third is a deep version of SCRATCH-t, which utilizes deep neural networks as hash functions. Extensive experiments on two widely used benchmark datasets demonstrate that SCRATCH-o and SCRATCH-t outperform some state-of-the-art shallow hashing methods for cross-modal retrieval. SCRATCH-d also outperforms some state-of-the-art deep hashing models. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
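The collective-matrix-factorization core mentioned in the SCRATCH abstract can be sketched with plain alternating least squares: two modality feature matrices share one latent factor `V`, which is then binarized. This is only the factorization skeleton; the actual SCRATCH models add label semantic embedding and discrete (non-relaxed) code optimization, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2, k = 50, 20, 30, 8
V_true = rng.normal(size=(n, k))
X1 = V_true @ rng.normal(size=(k, d1))   # "image" modality features
X2 = V_true @ rng.normal(size=(k, d2))   # "text" modality features

# Alternating least squares for X1 ≈ V U1 and X2 ≈ V U2 with a shared V.
V = rng.normal(size=(n, k))
for _ in range(20):
    U1 = np.linalg.lstsq(V, X1, rcond=None)[0]
    U2 = np.linalg.lstsq(V, X2, rcond=None)[0]
    # Update V by fitting both modalities jointly (stack the systems).
    A = np.vstack([U1.T, U2.T])                  # (d1+d2, k)
    b = np.hstack([X1, X2]).T                    # (d1+d2, n)
    V = np.linalg.lstsq(A, b, rcond=None)[0].T   # (n, k)

B = np.where(V >= 0, 1, -1)   # binary codes from the shared latent space
```

Because both modalities are explained by the same `V`, items close in either feature space end up with similar codes, which is what makes cross-modal lookup possible.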
28. Low-Rank Regularized Multi-Representation Learning for Fashion Compatibility Prediction.
- Author
-
Jing, Peiguang, Ye, Shu, Nie, Liqiang, Liu, Jing, and Su, Yuting
- Abstract
The currently flourishing fashion-oriented community websites and the continuous pursuit of fashion have attracted the increased research interest of the fashion analysis community. Many studies show that predicting the compatibility of fashion outfits is a nontrivial task due to the difficulty in capturing the implicit patterns affecting fashion compatibility prediction and the complex relationships presented by raw data. To address these problems, in this paper, we propose a transductive low-rank hypergraph regularizer multiple-representation learning framework (LHMRL), whereby we formulate the processes of feature representation and fashion compatibility prediction in a joint framework. Specifically, we first introduce a low-rank regularized multiple-representation learning framework, in which the lowest-rank multiple representations of samples can be learned to characterize samples from different perspectives. In this framework, we maximize the total difference among multiple representations based on Grassmann manifold theory and incorporate a common hypergraph regularizer to naturally encode the complex relationships between fashion items and an outfit. To enhance the representation ability of our model, we then develop a supervised learning term by exploiting two types of supervision information from labeled data. Experiments on a publicly available large-scale dataset demonstrate the effectiveness of our proposed model over the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
29. Learning the Traditional Art of Chinese Calligraphy via Three-Dimensional Reconstruction and Assessment.
- Author
-
Jian, Muwei, Dong, Junyu, Gong, Maoguo, Yu, Hui, Nie, Liqiang, Yin, Yilong, and Lam, Kin-Man
- Abstract
The traditional art of Chinese calligraphy, reflecting the wisdom of the grass-roots community, is the soul of Chinese culture. Just like many other types of craftsmanship, it is part of the historical heritage and is worth conserving, from generation to generation. Since the movements of an ink brush are in a 3D style when Chinese calligraphy is written, they embody “The Power of Beauty,” comprising various reflectance properties and rough-surface geometry. To truly understand the powerful significance and beauty of the art of Chinese calligraphy, in this paper, a 3D calligraphy reconstruction method, based on Photometric Stereo, is designed to capture the detailed appearance of the calligraphy's 3D surface geometry. For assessment, an Iterative Closest Point (ICP) algorithm is applied for registration of 3D intrinsic shapes between the Chinese calligraphy and the calligraphy fans’ handwriting. Through matching these two sets of calligraphy characters, the designed system can give a score to the handwriting of a user. Experiments have been performed on Chinese calligraphy from different historical dynasties to evaluate the effectiveness of the proposed scheme, and experimental results show that the developed system is useful and provides a convenient method of calligraphy appreciation and assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
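The assessment step in the calligraphy paper above uses Iterative Closest Point (ICP) to register two 3D shapes. A minimal 2D rigid ICP, sketched under the usual assumptions (brute-force nearest-neighbour matching, closed-form Kabsch/SVD transform per iteration), looks as follows; it is a generic textbook ICP, not the paper's implementation.

```python
import numpy as np

def icp_2d(src, dst, iters=20):
    """Minimal rigid ICP: iteratively align src points to dst (both (n,2))."""
    src = src.copy()
    for _ in range(iters):
        # 1) Nearest-neighbour correspondences (brute force for the sketch).
        d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
        match = dst[d.argmin(axis=1)]
        # 2) Closed-form rigid transform via SVD (Kabsch algorithm).
        mu_s, mu_m = src.mean(0), match.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (match - mu_m))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:   # keep a proper rotation, no reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        src = (src - mu_s) @ R.T + mu_m
    return src

rng = np.random.default_rng(2)
pts = rng.normal(size=(40, 2))
theta = 0.3
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = pts @ Rot.T + np.array([0.5, -0.2])   # rotated + translated copy

aligned = icp_2d(moved, pts)
err_before = np.linalg.norm(moved - pts, axis=1).mean()
err_after = np.linalg.norm(aligned - pts, axis=1).mean()
```

After registration, a per-point residual like `err_after` is exactly the kind of matching score the paper's system could turn into a handwriting grade.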
30. Graph Convolutional Network Hashing.
- Author
-
Zhou, Xiang, Shen, Fumin, Liu, Li, Liu, Wei, Nie, Liqiang, Yang, Yang, and Shen, Heng Tao
- Abstract
Recently, graph-based hashing that learns similarity-preserving binary codes via an affinity graph has been extensively studied for large-scale image retrieval. However, most graph-based hashing methods resort to intractable binary quadratic programs, making them unscalable to massive data. In this paper, we propose a novel graph convolutional network-based hashing framework, dubbed GCNH, which directly carries out spectral convolution operations on both an image set and an affinity graph built over the set, naturally yielding similarity-preserving binary embedding. GCNH fundamentally differs from conventional graph hashing methods which adopt an affinity graph as the only learning guidance in an objective function to pursue the binary embedding. As the core ingredient of GCNH, we introduce an intuitive asymmetric graph convolutional (AGC) layer to simultaneously convolve the anchor graph, input data, and convolutional filters. By virtue of the AGC layer, GCNH well addresses the issues of scalability and out-of-sample extension when leveraging affinity graphs for hashing. As a use case of our GCNH, we particularly study the semisupervised hashing scenario in this paper. Comprehensive image retrieval evaluations on the CIFAR-10, NUS-WIDE, and ImageNet datasets demonstrate the consistent advantages of GCNH over the state-of-the-art methods given limited labeled data. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
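GCNH's asymmetric graph convolutional (AGC) layer jointly convolves an anchor graph, the input data, and the filters; as a rough sketch of the underlying mechanism only, the snippet below applies one generic symmetrically normalized graph convolution and binarizes the result. The filter `W` is a hypothetical learned parameter, and this is not the paper's AGC layer.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 6, 4, 3
X = rng.normal(size=(n, d))

A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)       # symmetric affinity graph
np.fill_diagonal(A, 1.0)     # self loops keep each node's own features

# One propagation step: H = D^{-1/2} A D^{-1/2} X W
Dinv = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
W = rng.normal(size=(d, k))  # hypothetical learned filter
H = Dinv @ A @ Dinv @ X @ W  # neighbours' features are mixed into each node
B = np.where(H >= 0, 1, -1)  # sign binarization into hash-like codes
```

Because each node's embedding mixes in its graph neighbours before binarization, similar (connected) items tend toward similar codes, which is the similarity-preserving property the abstract describes.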
31. Neural Multimodal Cooperative Learning Toward Micro-Video Understanding.
- Author
-
Wei, Yinwei, Wang, Xiang, Guan, Weili, Nie, Liqiang, Lin, Zhouchen, and Chen, Baoquan
- Subjects
GROUP work in education ,VIDEOS ,MODAL logic ,FEATURE extraction - Abstract
The prevailing characteristics of micro-videos result in the less descriptive power of each modality. Several pioneering efforts on micro-video representation implicitly explore the consistency between different modalities but ignore their complementarity. In this paper, we focus on how to explicitly separate the consistent features and the complementary features from the mixed information and harness their combination to improve the expressiveness of each modality. Toward this end, we present a neural multimodal cooperative learning (NMCL) model that splits the consistent component and the complementary component with a novel relation-aware attention mechanism. Specifically, the computed attention score measures the correlation between the features extracted from different modalities. Then, a threshold is learned for each modality to distinguish the consistent and complementary features according to the score. Thereafter, we integrate the consistent parts to enhance the representations and supplement the complementary ones to reinforce the information in each modality. As to redundant information, which may cause overfitting and is hard to distinguish, we devise an attention network to dynamically capture the features closely related to the category and output a discriminative representation for prediction. The experimental results on a real-world micro-video dataset show that NMCL outperforms the state-of-the-art methods. Further studies verify the effectiveness and cooperative effects brought by the attention mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval.
- Author
-
Li, Chuan-Xiang, Yan, Ting-Kun, Luo, Xin, Nie, Liqiang, and Xu, Xin-Shun
- Abstract
Hashing-based approximate nearest neighbor search reduces storage cost and improves query speed, and has therefore attracted much attention in recent years. Moreover, some hashing methods have been proposed for cross-modal retrieval tasks. However, several issues remain to be addressed. For example, some methods only construct a simple similarity matrix when learning hash functions or binary codes, which may lose useful information. Others solve the hard discrete optimization problem by relaxing the binary constraints and quantizing the solution to obtain the final results, which may generate large quantization errors. To address these challenges, we present a new supervised cross-modal hashing method, named supervised robust discrete multimodal hashing (SRDMH). Specifically, it incorporates full label information into hash function learning to preserve the similarity in the original space. In addition, instead of relaxing the binary constraints, it learns the binary codes and hash functions simultaneously. Moreover, it adopts a flexible $\ell _{2,p}$ loss with nonlinear kernel embedding and introduces an intermediate representation of the binary codes. In light of this, it becomes more robust and easier to solve with the iterative algorithm presented in this paper. To evaluate its performance, we conduct extensive experiments on three benchmark datasets. The results verify that SRDMH outperforms seven state-of-the-art cross-modal hashing methods. In addition, we also extend it to the classification task. Compared with other hashing methods, SRDMH also obtains better results when its binary codes are used for classification. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
33. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning
- Author
-
Chen, Long, primary, Zhang, Hanwang, additional, Xiao, Jun, additional, Nie, Liqiang, additional, Shao, Jian, additional, Liu, Wei, additional, and Chua, Tat-Seng, additional
- Published
- 2017
- Full Text
- View/download PDF
34. Distribution-Oriented Aesthetics Assessment With Semantic-Aware Hybrid Network.
- Author
-
Cui, Chaoran, Liu, Huihui, Lian, Tao, Nie, Liqiang, Zhu, Lei, and Yin, Yilong
- Abstract
Image aesthetics assessment has emerged as a hot topic in recent years due to its potential in numerous high-level vision applications. In this paper, distinguished from existing studies relying on a single label, we propose quantifying image aesthetics by a distribution over multiple quality levels. The distribution-based representation characterizes the disagreement among users’ aesthetic preferences regarding the same image, and is also compatible with the traditional task of aesthetic label prediction. Our framework is developed based on fully convolutional networks and enables inputs of varying sizes. In this way, we circumvent the fixed-size constraint of prevalent convolutional neural networks, and avoid the risk of impairing the intrinsic aesthetic appeal of images. Moreover, given the fact that aesthetic perceiving is coupled with semantic understanding, we present a novel semantic-aware hybrid NEtwork (SANE), which harvests the information from object categorization and scene recognition to enhance image aesthetics assessment. Experiments on two benchmark datasets have well verified the effectiveness of our approach in both scenarios of aesthetic distribution prediction and aesthetic label prediction, and highlighted the benefits of input preserving as well as semantic understanding for images. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
35. Exploring Web Images to Enhance Skin Disease Analysis Under A Computer Vision Framework.
- Author
-
Xia, Yingjie, Zhang, Luming, Meng, Lei, Yan, Yan, Nie, Liqiang, and Li, Xuelong
- Abstract
To benefit skin care, this paper aims to design an automatic and effective visual analysis framework that recognizes a skin disease from a given image of the affected skin surface. This task is nontrivial, since it is hard to collect sufficient well-labeled samples. To address this problem, we present a novel transfer learning model, which is able to incorporate external knowledge obtained from the rich and relevant Web images contributed by grassroots users. In particular, we first construct a target domain by crawling a small set of images from vertical, professional dermatological websites. We then construct a source domain by collecting a large set of skin disease related images from commercial search engines. To reinforce the learning performance in the target domain, we initially build a learning model in the target domain, and then seamlessly leverage the training samples in the source domain to enhance this model. The distribution gap between the two domains is bridged by a linear combination of Gaussian kernels. Instead of training models with low-level features, we resort to deep models to learn succinct, invariant, and high-level image representations. Different from previous efforts that focus on a few types of skin diseases with a small and confidential set of images generated from hospitals, this paper targets thousands of commonly seen skin diseases with publicly accessible Web images. Hence, the proposed model is easily repeatable by other researchers and extendable to other disease types. Extensive experiments on a real-world dataset have demonstrated the superiority of our proposed method over state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
36. Low-Rank Multi-View Embedding Learning for Micro-Video Popularity Prediction.
- Author
-
Jing, Peiguang, Su, Yuting, Nie, Liqiang, Bai, Xu, Liu, Jing, and Wang, Meng
- Subjects
INTERNET content management systems ,REGRESSION analysis ,SOCIAL media ,VIDEOS ,GENERALIZATION - Abstract
Recently, a prevailing trend of user-generated content (UGC) on social media sites is the emerging micro-video. Micro-videos afford many potential opportunities ranging from network content caching to online advertising, yet little effort has been dedicated to research on micro-video understanding. In this paper, we focus on popularity prediction of micro-videos by presenting a novel low-rank multi-view embedding learning framework. We name it transductive low-rank multi-view regression (TLRMVR); it boosts the performance of micro-video popularity prediction by jointly considering the intrinsic representations of the source and target samples. In particular, TLRMVR integrates low-rank multi-view embedding and regression analysis into a unified framework such that the lowest-rank representation shared by all views not only captures the global structure of all views, but also meets the regression requirements. The framework is formulated as a regression model that seeks a set of view-specific projection matrices with low-rank constraints to map multi-view features into a common subspace. In addition, a multi-graph regularization term is constructed to improve the generalization capability and further prevent overfitting. Extensive experiments conducted on a publicly available dataset demonstrate that our proposed method achieves promising results compared with state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
37. An Adaptive Semisupervised Feature Analysis for Video Semantic Recognition.
- Author
-
Luo, Minnan, Chang, Xiaojun, Nie, Liqiang, Yang, Yi, Hauptmann, Alexander G., and Zheng, Qinghua
- Abstract
Video semantic recognition usually suffers from the curse of dimensionality and the absence of enough high-quality labeled instances; thus, semisupervised feature selection has gained increasing attention for its efficiency and comprehensibility. Most previous methods assume that videos with close distance (neighbors) have similar labels and characterize the intrinsic local structure through a predetermined graph of both labeled and unlabeled data. However, besides the parameter-tuning problem underlying the construction of the graph, the affinity measurement in the original feature space usually suffers from the curse of dimensionality. Additionally, the predetermined graph is separated from the procedure of feature selection, which might lead to downgraded performance for video semantic recognition. In this paper, we exploit a novel semisupervised feature selection method from a new perspective. The primary assumption underlying our model is that instances with similar labels should have a larger probability of being neighbors. Instead of using a predetermined similarity graph, we incorporate the exploration of the local structure into the procedure of joint feature selection so as to learn the optimal graph simultaneously. Moreover, an adaptive loss function is exploited to measure the label fitness, which significantly enhances the model's robustness to videos with a small or substantial loss. We propose an efficient alternating optimization algorithm to solve the proposed challenging problem, together with analyses of its convergence and computational complexity in theory. Finally, extensive experimental results on benchmark datasets illustrate the effectiveness and superiority of the proposed approach on video semantic recognition related tasks. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
38. Multiview Physician-Specific Attributes Fusion for Health Seeking.
- Author
-
Nie, Liqiang, Zhang, Luming, Yan, Yan, Chang, Xiaojun, Liu, Maofu, and Shao, Ling
- Abstract
Community-based health services have risen as important online resources for resolving users' health concerns. Despite their value, the gap between what health seekers with specific needs require and what busy physicians with specific attitudes and expertise can offer is widening. To bridge this gap, we present a question routing scheme that is able to connect health seekers to the right physicians. In this scheme, we first bridge the expertise matching gap via a probabilistic fusion of the physician-expertise distribution and the expertise-question distribution. The distributions are calculated by hypergraph-based learning and kernel density estimation. We then measure physicians' attitudes toward answering general questions from the perspectives of activity, responsibility, reputation, and willingness. At last, we adaptively fuse the expertise modeling and attitude modeling by considering the personal needs of the health seekers. Extensive experiments have been conducted on a real-world dataset to validate our proposed scheme. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
39. Rank-Constrained Spectral Clustering With Flexible Embedding.
- Author
-
Li, Zhihui, Nie, Feiping, Chang, Xiaojun, Nie, Liqiang, Zhang, Huaxiang, and Yang, Yi
- Subjects
ARTIFICIAL neural networks ,CLUSTER analysis (Statistics) ,MACHINE learning - Abstract
Spectral clustering (SC) has been proven to be effective in various applications. However, the learning scheme of SC is suboptimal in that it learns the cluster indicator from a fixed graph structure, which usually requires a rounding procedure to further partition the data. Also, the obtained cluster number cannot reflect the ground truth number of connected components in the graph. To alleviate these drawbacks, we propose a rank-constrained SC with flexible embedding framework. Specifically, an adaptive probabilistic neighborhood learning process is employed to recover the block-diagonal affinity matrix of an ideal graph. Meanwhile, a flexible embedding scheme is learned to unravel the intrinsic cluster structure in low-dimensional subspace, where the irrelevant information and noise in high-dimensional data have been effectively suppressed. The proposed method is superior to previous SC methods in that: 1) the block-diagonal affinity matrix, learned simultaneously with the adaptive graph construction process, more explicitly induces the cluster membership without further discretization; 2) the number of clusters is guaranteed to converge to the ground truth via a rank constraint on the Laplacian matrix; and 3) the mismatch between the embedded feature and the projected feature allows more freedom for finding the proper cluster structure in the low-dimensional subspace as well as learning the corresponding projection matrix. Experimental results on both synthetic and real-world data sets demonstrate the promising performance of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
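The rank constraint in point 2) of the abstract above rests on a standard spectral-graph fact: the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components, so forcing rank(L) = n - c forces exactly c components. The toy check below verifies that fact numerically; it illustrates the property, not the paper's optimization.

```python
import numpy as np

def num_components(A, tol=1e-8):
    """Count connected components as zero eigenvalues of the Laplacian."""
    D = np.diag(A.sum(axis=1))
    L = D - A                        # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric PSD
    return int(np.sum(eigvals < tol))

# Two disconnected triangles: L has exactly two zero eigenvalues.
tri = np.ones((3, 3)) - np.eye(3)
A = np.zeros((6, 6))
A[:3, :3] = tri
A[3:, 3:] = tri
c = num_components(A)
```

For a block-diagonal affinity matrix like `A`, each diagonal block contributes one zero eigenvalue, which is exactly why a rank-constrained Laplacian "more explicitly induces the cluster membership without further discretization".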
40. Large-Scale Tracking for Images With Few Textures.
- Author
-
Lu, Guoyu, Nie, Liqiang, Sorensen, Scott, and Kambhamettu, Chandra
- Abstract
Image tracking provides crucial insight into image motion, which generates essential information for incremental structure-from-motion reconstruction and camera pose estimation. Typical usages, such as 3D reconstruction and visual odometry, all rely on robust and accurate local feature tracking through consecutive images. Current algorithms realize feature tracking by matching features extracted from discriminant textures in the images, for which distinctive image content is required to obtain accurate feature matching. For images with few textures, usually an insufficient number of features are extracted to perform reliable tracking across a series of sequential images. We propose a method that makes use of a limited number of discriminative features to explore other features without strong discriminant power. We develop a feature that integrates the distribution of surrounding salient points, raw pixel values, and coordinate information to discover a significant number of features in weakly textured areas of an image. We also incorporate epipolar geometry into the feature correspondence calculation by taking into account the distance from a matching candidate to its corresponding point's epipolar line. To reduce the number of unreliable features, we project the estimated 3D points back onto the images. The reprojection error is standardized according to each 3D point's depth, which reduces the bias introduced by the object's distance to the camera. We conduct experiments on a large dataset of Arctic sea ice images, mainly composed of planes of ice and sea water. The experimental results demonstrate that our method can perform fast and accurate tracking in weakly textured images. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
41. Data-Driven Answer Selection in Community QA Systems.
- Author
-
Nie, Liqiang, Wei, Xiaochi, Zhang, Dongxiang, Wang, Xiang, Gao, Zhipeng, and Yang, Yi
- Subjects
DATA analysis ,FACILITATED learning ,ONLINE education ,INTERNET in education ,DATA modeling - Abstract
Finding similar questions from historical archives has been applied to question answering, with solid theoretical underpinnings and great practical success. Nevertheless, each question in the returned candidate pool often associates with multiple answers, and hence users have to painstakingly browse a lot before finding the correct one. To alleviate this problem, we present a novel scheme to rank answer candidates via pairwise comparisons. In particular, it consists of one offline learning component and one online search component. In the offline learning component, we first automatically establish the positive, negative, and neutral training samples in terms of preference pairs, guided by our data-driven observations. We then present a novel model to jointly incorporate these three types of training samples. The closed-form solution of this model is derived. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by leveraging the offline trained model to judge the preference orders. Extensive experiments on real-world vertical and general community-based question answering datasets have comparatively demonstrated its robustness and promising performance. Also, we have released the codes and data to facilitate other researchers. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
42. Predicting Image Memorability Through Adaptive Transfer Learning From External Sources.
- Author
-
Jing, Peiguang, Su, Yuting, Nie, Liqiang, and Gu, Huimin
- Abstract
Remembering images is an innate human capability. Camera images are captured by different people under varying environmental conditions, which leads to highly diverse image memorability scores. However, the factors that make an image more or less memorable are unclear, and it remains unknown how we can more accurately predict image memorability by using such factors. In this paper, we propose a novel framework called multiview transfer learning from external sources (MTLES) to predict image memorability. In this framework, we simultaneously leverage different types of visual feature sets and multiple types of predefined image attributes derived from external sources. In particular, to enhance representation ability of visual features, we construct connections between visual feature sets and higher level image attributes by transferring attribute knowledge from external sources. MTLES integrates weak learning through external sources, transfer learning, and multiview consistency loss with different types of feature sets into a joint framework. To better solve this joint optimization problem, we further develop an alternating iterative algorithm to deal with it. Experiments performed on the publicly available LaMem dataset demonstrate the effectiveness of the proposed scheme. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
43. I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base.
- Author
-
Wei, Xiaochi, Huang, Heyan, Nie, Liqiang, Zhang, Hanwang, Mao, Xian-Ling, and Chua, Tat-Seng
- Subjects
ELECTRONIC data processing ,PREDICTIVE text entry software ,SEMANTIC computing ,NATURAL language processing ,DATA mining - Abstract
Sentence auto-completion is an important feature that saves users many keystrokes in typing the entire sentence by providing suggestions as they type. Despite its value, the existing sentence auto-completion methods, such as query completion models, can hardly be applied to solving the object completion problem in sentences with the form of (subject, verb, object), due to the complex natural language description and the data deficiency problem. Towards this goal, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object), and cast the sentence object completion problem as an element inference problem. These elements in all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages the external knowledge base to strengthen the representation learning performance. With such representations, we can provide reliable candidates for the desired missing element by a linear model. Extensive experiments on a real-world dataset have well-validated our model. Meanwhile, we have successfully applied our proposed model to factoid question answering systems for answer candidate selection, which further demonstrates the applicability of the TRANSFER model. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
44. Weakly Supervised Multilabel Clustering and its Applications in Computer Vision.
- Author
-
Xia, Yingjie, Nie, Liqiang, Zhang, Luming, Yang, Yi, Hong, Richang, and Li, Xuelong
- Abstract
Clustering is a useful statistical tool in computer vision and machine learning. It is generally accepted that introducing supervised information brings remarkable performance improvement to clustering. However, assigning accurate labels is expensive when the amount of training data is huge. Existing supervised clustering methods handle this problem by transferring the bag-level labels into the instance-level descriptors. However, the assumption that each bag has a single label limits the application scope seriously. In this paper, we propose weakly supervised multilabel clustering, which allows assigning multiple labels to a bag. Based on this, the instance-level descriptors can be clustered with the guidance of bag-level labels. The key technique is a weakly supervised random forest that infers the model parameters. Thereby, a deterministic annealing strategy is developed to optimize the nonconvex objective function. The proposed algorithm is efficient in both the training and the testing stages. We apply it to three popular computer vision tasks: 1) image clustering; 2) semantic image segmentation; and 3) multiple objects localization. Impressive performance on the state-of-the-art image data sets is achieved in our experiments. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
45. Perceptual Attributes Optimization for Multivideo Summarization.
- Author
-
Nie, Liqiang, Hong, Richang, Zhang, Luming, Xia, Yingjie, Tao, Dacheng, and Sebe, Nicu
- Abstract
Nowadays, many consumer videos are captured by portable devices such as the iPhone. Unlike constrained videos produced by professionals, e.g., those for broadcast, summarizing multiple handheld videos of the same scene is a challenging task. This is because: 1) these videos have dramatic semantic and style variances, making it difficult to extract representative key frames; 2) the handheld videos exhibit different degrees of shakiness, yet existing summarization techniques cannot alleviate this problem adaptively; and 3) it is difficult to develop a quality model for evaluating a video summary, due to the subjectiveness of video quality assessment. To solve these problems, we propose perceptual multiattribute optimization, which jointly refines multiple perceptual attributes (i.e., video aesthetics, coherence, and stability) in a multivideo summarization process. In particular, a weakly supervised learning framework is designed to discover the semantically important regions in each frame. Then, a few key frames are selected based on their contributions to covering the multivideo semantics. Thereafter, a probabilistic model is proposed to dynamically fit the key frames into an aesthetically pleasing video summary, wherein the frames are stabilized adaptively. Experiments on consumer videos taken of sceneries throughout the world demonstrate the descriptiveness, aesthetics, coherence, and stability of the generated summaries. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
46. A Biologically Inspired Automatic System for Media Quality Assessment.
- Author
-
Zhang, Luming, Hong, Richang, Nie, Liqiang, and Hong, Chaoqun
- Subjects
ARTIFICIAL intelligence ,COMPUTER vision ,FEATURE extraction ,SUPERVISED learning ,ALGORITHMS - Abstract
Photo aesthetic quality evaluation is a challenging task in artificial intelligence systems. In this paper, we propose a biologically inspired aesthetic descriptor that mimics how humans sequentially perceive visually/semantically salient regions in a photo. (In general, visually salient regions are perceived through low-level visual features, such as high contrast between foreground and background objects, while semantically salient regions are perceived through high-level visual features such as human faces.) In particular, a weakly supervised learning paradigm is developed to project the local image descriptors into a low-dimensional semantic space. Then, each graphlet can be described by multiple types of visual features, both low-level and high-level. Since humans usually perceive only a few salient regions in a photo, a sparsity-constrained graphlet ranking algorithm is proposed that seamlessly integrates both the low-level and the high-level visual cues. Top-ranked graphlets are the visually/semantically prominent local aesthetic descriptors in a photo. They are sequentially linked into a path that simulates the human active viewing process. Finally, we learn a probabilistic aesthetic measure based on such active viewing paths (AVPs) from the training photos. Experimental results show that: 1) the AVPs are 87.65% consistent with real human gaze shifting paths, as verified by eye-tracking data, and 2) our aesthetic measure outperforms many of its competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
47. Detecting Densely Distributed Graph Patterns for Fine-Grained Image Categorization.
- Author
-
Zhang, Luming, Yang, Wang, Meng, Hong, Richang, Nie, Liqiang, and Li, Xuelong
- Subjects
GRAPH theory ,IMAGE processing ,DATA mining ,ALGORITHMS ,IMAGE recognition (Computer vision) - Abstract
Fine-grained image categorization is a challenging task aiming to distinguish objects belonging to the same basic-level category, e.g., leaf or mushroom. It is a useful technique that can be applied to species recognition, face verification, and so on. Most existing methods either have difficulty detecting discriminative object components automatically, or suffer from the limited amount of training data in each sub-category. To solve these problems, this paper proposes a new fine-grained image categorization model. The key is a dense graph mining algorithm that hierarchically localizes discriminative object parts in each image. More specifically, to mimic the human hierarchical perception mechanism, a superpixel pyramid is generated for each image, and graphlets from each layer are constructed to seamlessly capture object components. Intuitively, graphlets representative of each super-/sub-category are densely distributed in their feature space. Thus, a dense graph mining algorithm is developed to discover the graphlets representative of each super-/sub-category. Finally, the discovered graphlets from pairwise images are integrated into an image kernel for fine-grained recognition. Theoretically, the learned kernel generalizes several state-of-the-art image kernels. Experiments on nine image sets demonstrate the advantage of our method. Moreover, the discovered graphlets from each sub-category accurately capture tiny discriminative object components, e.g., bird claws, heads, and bodies. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
48. Modeling Disease Progression via Multisource Multitask Learners: A Case Study With Alzheimer’s Disease.
- Author
-
Nie, Liqiang, Zhang, Luming, Meng, Lei, Song, Xuemeng, Chang, Xiaojun, and Li, Xuelong
- Subjects
- *
DISEASE progression , *ALZHEIMER'S disease - Abstract
Understanding the progression of chronic diseases can empower sufferers to take proactive care. Various machine learning approaches have been proposed to predict disease status at future time points, but few of them jointly consider the dual heterogeneities of chronic disease progression. In particular, the prediction task at each time point has features from multiple sources, and the tasks are related to each other in chronological order. To tackle this problem, we propose a novel and unified scheme that co-regularizes the prior knowledge of source consistency and temporal smoothness. We theoretically prove that our proposed model is a linear model. Before training our model, we adopt a matrix factorization approach to address the missing-data problem. Extensive evaluations on a real-world Alzheimer's disease data set have demonstrated the effectiveness and efficiency of our model. It is worth mentioning that our model is generally applicable to a wide range of chronic diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
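The temporal-smoothness co-regularization described in the abstract above can be sketched as a joint least-squares objective in which each time point is one task and adjacent tasks are penalized for diverging. This is a hedged simplification of the general idea (a single feature source, plain gradient descent); the dimensions, regularization weights, and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: T time points (tasks) sharing the same d features.
T, n, d = 4, 50, 10
X = [rng.normal(size=(n, d)) for _ in range(T)]
y = [rng.normal(size=n) for _ in range(T)]

def objective(W, lam_smooth=1.0, lam_reg=0.1):
    """Per-task least squares + temporal smoothness + ridge penalty."""
    fit = sum(np.sum((X[t] @ W[t] - y[t]) ** 2) for t in range(T))
    # Temporal smoothness: adjacent time points should have similar models.
    smooth = sum(np.sum((W[t] - W[t + 1]) ** 2) for t in range(T - 1))
    return fit + lam_smooth * smooth + lam_reg * np.sum(W ** 2)

def gradient(W, lam_smooth=1.0, lam_reg=0.1):
    G = np.zeros_like(W)
    for t in range(T):
        G[t] = 2 * X[t].T @ (X[t] @ W[t] - y[t]) + 2 * lam_reg * W[t]
        if t > 0:
            G[t] += 2 * lam_smooth * (W[t] - W[t - 1])
        if t < T - 1:
            G[t] += 2 * lam_smooth * (W[t] - W[t + 1])
    return G

W = rng.normal(size=(T, d))          # one weight vector per time point
loss_before = objective(W)
for _ in range(200):
    W -= 1e-3 * gradient(W)
loss_after = objective(W)
```

Because all tasks are coupled only through the smoothness term, the joint objective stays convex and linear in the model, consistent with the linearity claim in the abstract.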
49. Retargeting Semantically-Rich Photos.
- Author
-
Zhang, Luming, Wang, Meng, Nie, Liqiang, Hong, Liang, Rui, Yong, and Tian, Qi
- Abstract
Semantically-rich photos contain a rich variety of semantic objects (e.g., pedestrians and bicycles). Retargeting these photos is a challenging task, since each semantic object has fixed geometric characteristics and shrinking these objects simultaneously during retargeting is prone to distortion. In this paper, we propose to retarget semantically-rich photos by detecting photo semantics from image tags, which are predicted by a multi-label SVM. The key technique is a generative model termed latent stability discovery (LSD). It can robustly localize various semantic objects in a photo by making use of the predicted noisy image tags. Based on LSD, a feature fusion algorithm is proposed to detect salient regions at both the low level and the high level. These salient regions are linked sequentially into a path to simulate human visual perception. Finally, we learn the prior distribution of such paths from aesthetically pleasing training photos. The prior enforces the path of a retargeted photo to be maximally similar to those from the training photos. In the experiment, we collect 217 photos of 1600 × 1200 resolution, each containing over seven salient objects. Comprehensive user studies demonstrate the competitiveness of our method. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
50. Disease Inference from Health-Related Questions via Sparse Deep Learning.
- Author
-
Nie, Liqiang, Wang, Meng, Zhang, Luming, Yan, Shuicheng, Zhang, Bo, and Chua, Tat-Seng
- Subjects
- *
MACHINE learning , *INFERENCE engines (Computer science) , *MEDICAL informatics , *INTERNET in medicine , *MEDICAL consultation - Abstract
Automatic disease inference is important for bridging the gap between what online health seekers with unusual symptoms need and what busy human doctors with biased expertise can offer. However, accurately and efficiently inferring diseases is non-trivial, especially for community-based health services, due to the vocabulary gap, incomplete information, correlated medical concepts, and limited high-quality training samples. In this paper, we first report a user study on the information needs of health seekers in terms of questions, and then select those that ask for the possible diseases of their manifested symptoms for further analysis. We next propose a novel deep learning scheme to infer the possible diseases given the questions of health seekers. The proposed scheme comprises two key components. The first globally mines discriminant medical signatures from raw features. The second deems the raw features and their signatures as input nodes in one layer and hidden nodes in the subsequent layer, respectively; meanwhile, it learns the inter-relations between these two layers via pre-training with pseudo-labeled data. Following that, the hidden nodes serve as raw features for more abstract signature mining. By incrementally and alternately repeating these two components, our scheme builds a sparsely connected deep architecture with three hidden layers. Overall, it fits specific tasks well with fine-tuning. Extensive experiments on a real-world dataset labeled by online doctors show the significant performance gains of our scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
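The layer-by-layer construction described in the abstract above can be sketched with a greedy pre-training loop, where each new layer is pre-trained as a small tied-weight autoencoder on the previous layer's activations before being stacked. This is a generic sketch of greedy layer-wise pre-training, not the paper's sparse scheme; the layer sizes and learning rate are assumptions, and the sparsity penalty is only noted in a comment:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(H, n_hidden, epochs=50, lr=0.1):
    """Pre-train one layer as a tied-weight autoencoder on activations H."""
    n, d = H.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))
    for _ in range(epochs):
        Z = sigmoid(H @ W)   # encode
        R = Z @ W.T          # decode with tied weights (linear output)
        err = R - H
        # Gradient of 0.5*||R - H||^2 through both encoder and decoder paths.
        gW = H.T @ (err @ W * Z * (1 - Z)) + err.T @ Z
        W -= lr * gW / n
        # (A sparsity penalty on Z would be added here, as in the abstract.)
    return W

# Greedy stacking: each layer's activations feed the next layer's pre-training.
X = rng.normal(size=(100, 30))
weights, H = [], X
for n_hidden in (20, 10, 5):
    W = pretrain_layer(H, n_hidden)
    weights.append(W)
    H = sigmoid(H @ W)
```

After this pre-training pass, the stacked weights would normally be fine-tuned end-to-end on the labeled task, matching the fine-tuning step mentioned in the abstract.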