96 results for "Jianping Fan"
Search Results
2. nGIA: A novel Greedy Incremental Alignment based algorithm for gene sequence clustering
- Author
-
Zhen Ju, Huiling Zhang, Jintao Meng, Jingjing Zhang, Jianping Fan, Yi Pan, Weiguo Liu, Xuelei Li, and Yanjie Wei
- Subjects
Computer Networks and Communications, Hardware and Architecture, Software - Published
- 2022
3. Improvement of cross-efficiency based on TODIM method
- Author
-
Meiqin Wu, Xiaoqing Hou, and Jianping Fan
- Subjects
Geometry and Topology, Software, Theoretical Computer Science - Published
- 2022
4. Contour-enhanced CycleGAN framework for style transfer from scenery photos to Chinese landscape paintings
- Author
-
Xianlin Peng, Shenglin Peng, Qiyao Hu, Jinye Peng, Jiaxin Wang, Xinyu Liu, and Jianping Fan
- Subjects
Artificial Intelligence, Software - Published
- 2022
5. Semisupervised image classification by mutual learning of multiple self‐supervised models
- Author
-
Jian Zhang, Jianing Yang, Jun Yu, and Jianping Fan
- Subjects
Human-Computer Interaction, Artificial Intelligence, Software, Theoretical Computer Science - Published
- 2022
6. Social Image-text Sentiment Classification With Cross-Modal Consistency and Knowledge Distillation
- Author
-
Huan Liu, Ke Li, Jianping Fan, Caixia Yan, Tao Qin, and Qinghua Zheng
- Subjects
Human-Computer Interaction, Software - Published
- 2022
7. Guided Filter Network for Semantic Image Segmentation
- Author
-
Xiang Zhang, Wanqing Zhao, Wei Zhang, Jinye Peng, and Jianping Fan
- Subjects
Computer Graphics and Computer-Aided Design, Software - Abstract
The existing publicly available datasets with pixel-level labels contain limited categories, and it is difficult to generalize to the real world, which contains thousands of categories. In this paper, we propose an approach to generate object masks with detailed pixel-level structures/boundaries automatically, enabling semantic image segmentation of thousands of targets in the real world without manual labelling. A Guided Filter Network (GFN) is first developed to learn segmentation knowledge from an existing dataset, and this GFN then transfers the learned segmentation knowledge to generate initial coarse object masks for the target images. These coarse object masks are treated as pseudo labels to self-optimize the GFN iteratively on the target images. Our experiments on six image sets have demonstrated that our proposed approach can generate object masks with detailed pixel-level structures/boundaries, whose quality is comparable to the manually-labelled ones. Our proposed approach also achieves better performance on semantic image segmentation than most existing weakly-supervised, semi-supervised, and domain adaptation approaches under the same experimental conditions.
- Published
- 2022
8. Adaptive Selection of Reference Frames for Video Object Segmentation
- Author
-
Lingyi Hong, Wei Zhang, Liangyu Chen, Wenqiang Zhang, and Jianping Fan
- Subjects
Computer Graphics and Computer-Aided Design, Software - Abstract
Video object segmentation is a challenging task in computer vision because the appearances of target objects might change drastically over time in the video. To solve this problem, space-time memory (STM) networks are exploited to make use of the information from all the intermediate frames between the first frame and the current frame in the video. However, fully using the information from all the memory frames may make STM impractical for long videos. To overcome this issue, a novel method is developed in this paper to select the reference frames adaptively. First, an adaptive selection criterion is introduced to choose the reference frames with similar appearance and precise mask estimation, which can efficiently capture the rich information of the target object and overcome the challenges of appearance changes, occlusion, and model drift. Secondly, bi-matching (bi-scale and bi-direction) is conducted to obtain more robust correlations for objects of various scales and to prevent multiple similar objects in the current frame from being mismatched with the same target object in the reference frame. Thirdly, a novel edge refinement technique is designed by using an edge detection network to obtain smooth edges from the outputs of edge confidence maps, where the edge confidence is quantized into ten sub-intervals to generate smooth edges step by step. Experimental results on the challenging benchmark datasets DAVIS-2016, DAVIS-2017, YouTube-VOS, and a Long-Video dataset have demonstrated the effectiveness of our proposed approach to video object segmentation.
- Published
- 2022
9. GGD-GAN: Gradient-Guided dual-Branch adversarial networks for relic sketch generation
- Author
-
Jun Wang, Erlei Zhang, Shan Cui, Jiaxin Wang, Qunxi Zhang, Jianping Fan, and Jinye Peng
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software - Published
- 2023
10. Imitating targets from all sides: an unsupervised transfer learning method for person re-identification
- Author
-
Jianping Fan, Jiajie Tian, Baopeng Zhang, Zhu Teng, and Yanxue Wang
- Subjects
Computer science, Pattern recognition, Computational intelligence, Discriminative model, Artificial Intelligence, Pairwise comparison, Computer Vision and Pattern Recognition, Transfer learning, Software - Abstract
Person re-identification (Re-ID) models usually present a limited performance when they are trained on one dataset and tested on another due to the inter-dataset bias (e.g. completely different identities and backgrounds) and the intra-dataset difference (e.g. camera and pose changes). In other words, the absence of identity labels (who the person is) and pairwise labels (whether a pair of images belongs to the same person or not) leads to failures in the unsupervised person Re-ID problem. We argue that synchronous consideration of these two aspects can improve the performance of an unsupervised person Re-ID model. In this work, we introduce a Classification and Latent Commonality (CLC) method based on transfer learning for the unsupervised person Re-ID problem. Our method has three characteristics: (1) proposing an imitate model to generate an imitated target domain with estimated identity labels and create a pseudo target domain to compensate for the missing pairwise labels across camera views; (2) formulating a dual classification loss on both the source domain and the imitated target domain to learn a discriminative representation and diminish the inter-domain bias; (3) investigating latent commonality and reducing the intra-domain difference by constraining a triplet loss on the source domain, the imitated target domain, and the pairwise-label target domain (composed of the pseudo target domain and the target domain). Extensive experiments are conducted on three widely employed benchmarks, including Market-1501, DukeMTMC-reID and MSMT17, and the experimental results demonstrate that the proposed method achieves competitive performance against other state-of-the-art unsupervised Re-ID approaches.
- Published
- 2021
11. Import Vertical Characteristic of Rain Streak for Single Image Deraining
- Author
-
Zhexin Zhang, Jiajun Ding, Jun Yu, Yiming Yuan, and Jianping Fan
- Subjects
Computer Networks and Communications, Hardware and Architecture, Media Technology, Software, Information Systems - Abstract
Recently, deep convolutional neural networks have shown good results for single-image deraining. These networks usually adopt conventional convolution to extract features, which may neglect the characteristic shape of rain streaks. A novel vertical module is proposed to focus on the vertical characteristic of rain streaks. The module uses a 1 x X convolution kernel to extract the vertical information of rain streaks and an X x X convolution kernel to keep relative location information. Using this module at the front of the deraining network better separates rain streaks from the background. In addition, contrastive learning is employed to improve the performance of the model. Extensive experimental results demonstrate the superiority of deraining methods equipped with the proposed module over the base ones.
- Published
- 2022
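The asymmetric-kernel idea in the abstract above, a tall narrow kernel that responds strongly to vertical streaks while a square kernel averages the whole neighbourhood, can be illustrated with a toy convolution. This is a hedged sketch in plain Python, not the paper's network; the kernel shapes (3x1 vs 3x3) and the image values are illustrative assumptions.

```python
def conv2d_valid(img, ker):
    """Plain 2-D cross-correlation with 'valid' padding over nested lists."""
    kh, kw = len(ker), len(ker[0])
    H, W = len(img), len(img[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = sum(img[i + a][j + b] * ker[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 5x5 patch with a bright vertical "rain streak" in column 2.
img = [[1.0 if c == 2 else 0.0 for c in range(5)] for r in range(5)]

vert = [[1 / 3] for _ in range(3)]     # 3x1 kernel: averages along the vertical axis
sq   = [[1 / 9] * 3 for _ in range(3)]  # 3x3 kernel: averages a square neighbourhood

v = conv2d_valid(img, vert)  # 3x5 response map
s = conv2d_valid(img, sq)    # 3x3 response map
```

The vertical kernel yields a full-strength response on the streak column, while the square kernel dilutes the streak across its neighbourhood, which is why an asymmetric kernel is better matched to vertical streaks.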
12. Hierarchical bilinear convolutional neural network for image classification
- Author
-
Jianping Fan, Long Chen, Ziyu Guan, Xiang Zhang, Sheng Zhong, Lei Tang, Chao Zhao, Jinye Peng, and Hangzai Luo
- Subjects
Contextual image classification, Computer science, Bilinear interpolation, Computer software, Computer Vision and Pattern Recognition, Artificial intelligence, Convolutional neural network, Software - Abstract
Image classification is one of the mainstream tasks of computer vision. However, most existing methods use labels of the same granularity level for training, which ignores the hierarchy that may help to differentiate visual objects better. Embedding hierarchical information into convolutional neural networks (CNNs) can effectively regulate the semantic space and thus reduce the ambiguity of prediction. To this end, a multi-task learning framework, named Hierarchical Bilinear Convolutional Neural Network (HB-CNN), is developed by seamlessly integrating CNNs with multi-task learning over hierarchical visual concept structures. Specifically, labels with a tree structure are used as supervision to hierarchically train multiple branch networks. In this way, the model can not only learn additional information (e.g. context information) as coarse-level category features, but also focus the learned fine-level category features on the object properties. To smoothly pass hierarchical conceptual information and encourage feature reuse, a connectivity pattern is proposed to connect features at different levels. Furthermore, a bilinear module is embedded to generalise various orderless texture feature descriptors so that our model can capture more discriminative features. The proposed method is extensively evaluated on the CIFAR-10, CIFAR-100, and 'Orchid' Plant image sets. The experimental results show the effectiveness and superiority of our method.
- Published
- 2021
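The bilinear module mentioned in the abstract pools second-order statistics: the outer product of two feature vectors is summed over spatial locations and flattened. A minimal sketch of that pooling step in plain Python; the feature values and dimensions are made up for illustration, and normalisation steps used in practice are omitted.

```python
def bilinear_pool(feat_a, feat_b):
    """Sum of outer products over spatial locations, then flatten.

    feat_a, feat_b: lists of per-location feature vectors (one entry per
    spatial location; the two streams must cover the same locations)."""
    da, db = len(feat_a[0]), len(feat_b[0])
    pooled = [[0.0] * db for _ in range(da)]
    for fa, fb in zip(feat_a, feat_b):
        for i in range(da):
            for j in range(db):
                pooled[i][j] += fa[i] * fb[j]
    # Flatten the da x db matrix into one orderless descriptor.
    return [x for row in pooled for x in row]

feat_a = [[1.0, 2.0], [0.0, 1.0]]  # two spatial locations, 2-d features (stream A)
feat_b = [[3.0], [1.0]]            # same locations, 1-d features (stream B)
vec = bilinear_pool(feat_a, feat_b)
```

Because the pooling sums over locations, the descriptor is orderless, which is what lets it generalise texture-style feature descriptors.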
13. Deep Multiple Instance Hashing for Fast Multi-Object Image Search
- Author
-
Hangzai Luo, Jinye Peng, Jianping Fan, Wanqing Zhao, and Ziyu Guan
- Subjects
Computer science, Hash function, Full-text search, Pattern recognition, Inverted index, Computer Graphics and Computer-Aided Design, Object detection, Spatial relation, Artificial intelligence, Image retrieval, Software - Abstract
Multi-keyword queries are widely supported in text search engines. However, the analogue in image retrieval systems, the multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple separate steps. In this work, we propose a weakly-supervised Deep Multiple Instance Hashing (DMIH) approach for multi-object image retrieval. Our DMIH approach, which leverages a popular CNN model to build the end-to-end relation between a raw image and the binary hash codes of its multiple objects, can support multi-object queries effectively and integrate object detection with hashing learning seamlessly. We treat object detection as a binary multiple instance learning (MIL) problem, and such instances are automatically extracted from multi-scale convolutional feature maps. We also design a conditional random field (CRF) module to capture both the semantic and spatial relations among different class labels. For hashing training, we sample image pairs to learn their semantic relationships in terms of hash codes of the most probable proposals for owned labels, as guided by object predictors. The two objectives benefit each other in a multi-task learning scheme. Finally, a two-level inverted index method is proposed to further speed up the retrieval of multi-object queries. Our DMIH approach outperforms state-of-the-art methods on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
- Published
- 2021
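The inverted-index idea in the final sentence can be sketched at its first level: each object label maps to a posting list of image ids, and a multi-object query intersects the lists. This is a simplified sketch; the second level (hash codes for Hamming-distance re-ranking) is omitted, and the example data are illustrative.

```python
from collections import defaultdict

def build_index(image_labels):
    """First level: object label -> set of image ids containing that object."""
    index = defaultdict(set)
    for img_id, labels in image_labels.items():
        for lab in labels:
            index[lab].add(img_id)
    return index

def multi_object_query(index, labels):
    """Candidates must contain ALL queried objects: intersect posting lists,
    starting from the shortest list to keep the running intersection small."""
    postings = sorted((index.get(l, set()) for l in labels), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

imgs = {
    1: {"dog", "ball"},
    2: {"dog", "cat"},
    3: {"cat", "ball"},
    4: {"dog", "cat", "ball"},
}
idx = build_index(imgs)
hits = multi_object_query(idx, ["dog", "cat"])
```

In a full system each surviving candidate would then be re-ranked by Hamming distance between its objects' hash codes and the query's.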
14. AirCargoChain: A Distributed and Scalable Data Sharing Approach of Blockchain for Air Cargo
- Author
-
Qifeng Gu, Qiang Qu, Qingshan Jiang, Gejun Le, and Jianping Fan
- Subjects
Distributed computing, Network architecture, Computer Networks and Communications, Computer science, Contract management, Air cargo, Data sharing, Hardware and Architecture, Scalability, Computer data storage, Software, Information Systems, Computer network - Abstract
Air cargo involves large-scale data and multiple stakeholders, i.e., airports, airlines, agents, and clients. Enabling stakeholders to share data in a secure way is essential, since it improves the efficiency of various processes among them. This paper proposes AirCargoChain, a blockchain-based data sharing approach for air cargo, which has the following advantages: (a) secure: we propose a blockchain-based cooperative network architecture, Cooperative Network (CN), to allow mutually distrusted stakeholders to manage data collaboratively; (b) scalable: we design a storage scheme, Off-Chain Storage (OCS), based on IPFS to support large-scale data storage; (c) user-friendly: we provide an effective communication mechanism and a convenient contract name service in Node Communication (NC) and Contract Management (CM), respectively. A comprehensive evaluation offers insight into the applicability and effectiveness of AirCargoChain.
- Published
- 2020
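The Off-Chain Storage (OCS) idea, keeping bulky payloads in a content-addressed store and recording only the content identifier on-chain, can be sketched as follows. This is a toy stand-in for IPFS using SHA-256; the class, method, and record names are illustrative assumptions, not AirCargoChain's API.

```python
import hashlib
import json

class OffChainStore:
    """Toy content-addressed store: data lives off-chain, keyed by its digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()  # stand-in for an IPFS CID
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Integrity check: the content must still hash to its identifier,
        # so tampering with the off-chain copy is detectable.
        assert hashlib.sha256(data).hexdigest() == cid
        return data

store = OffChainStore()
waybill = json.dumps({"flight": "CZ1234", "pieces": 12}).encode()
cid = store.put(waybill)
# Only the small identifier needs to be recorded on-chain:
onchain_record = {"waybill_cid": cid}
```

Because the identifier is the hash of the content, the on-chain record commits to the off-chain data without storing it, which is what makes the scheme scale to large payloads.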
15. Discriminative Fast Hierarchical Learning for Multiclass Image Classification
- Author
-
Jianping Fan, Ji Zhang, Xinbo Gao, and Yu Zheng
- Subjects
Contextual image classification, Computer Networks and Communications, Computer science, Deep learning, Multi-task learning, Pattern recognition, Computer Science Applications, Visualization, Support vector machine, Stochastic gradient descent, Discriminative model, Artificial Intelligence, Software - Abstract
In this article, a discriminative fast hierarchical learning algorithm is developed for supporting multiclass image classification, where a visual tree is seamlessly integrated with multitask learning to achieve fast training of the tree classifier hierarchically (i.e., a set of structural node classifiers over the visual tree). By partitioning a large number of categories hierarchically in a coarse-to-fine fashion, a visual tree is first constructed and further used to handle data imbalance and identify the interrelated learning tasks automatically (e.g., the tasks for learning the node classifiers for the sibling child nodes under the same parent node are strongly interrelated), and a multitask SVM classifier is trained for each nonleaf node to achieve more effective separation of its sibling child nodes at the next level of the visual tree. Both the internode visual similarities and the interlevel visual correlations are utilized to train more discriminative multitask SVM classifiers and control the interlevel error propagation effectively, and a stochastic gradient descent (SGD) algorithm is developed for learning such multitask SVM classifiers with higher efficiency. Our experimental results have demonstrated that our fast hierarchical learning algorithm can achieve very competitive results on both the classification accuracy rates and the computational efficiency.
- Published
- 2020
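The SGD training of a single linear SVM node classifier, the building block of the tree classifier above, can be sketched with a Pegasos-style update on the hinge loss. This is a hedged sketch on a toy 2-D task; the multitask coupling across sibling nodes described in the abstract is omitted, and the hyperparameters are illustrative.

```python
import random

def svm_sgd(data, dim, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD for a linear SVM: minimise
    lam/2 * ||w||^2 + mean hinge loss.  data = [(x, y)] with y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]   # regularisation shrink
            if margin < 1:                  # hinge active: take its gradient too
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

# A linearly separable toy task standing in for one node classifier.
data = [([2.0, 1.0], +1), ([1.5, 2.0], +1),
        ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
w = svm_sgd(data, dim=2)
```

In the paper's setting one such classifier is trained per non-leaf node, with an extra multitask term tying sibling nodes together; the per-sample update above is what makes the hierarchical training fast.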
16. A concept ontology triplet network for learning discriminative representations of fine-grained classes
- Author
-
Qiqi Zhang, Guiqing He, Yuelei Xu, Haixi Zhang, and Jianping Fan
- Subjects
Computer Networks and Communications, Computer science, Ontology, Discriminative model, Hardware and Architecture, Media Technology, Semantic memory, Artificial intelligence, Software - Abstract
Triplet networks are an efficient method for metric learning, but as the number of fine-grained images and sample categories grows, training them becomes increasingly challenging. To address this problem, this paper proposes an algorithm that combines a concept ontology structure with a triplet network trained with a two-layer ontology loss. It not only uses semantic knowledge to guide the concept ontology structure of the network, but also exploits the relationships between layers to help the network select triplets more effectively, which enhances the separability of the learned features. At the same time, a bilinear function trained jointly with the triplet network is used to enhance image details, further improving the performance of the network. Finally, the effectiveness of the proposed algorithm is demonstrated by classification experiments on the fine-grained image databases Orchid and Fashion60.
- Published
- 2020
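The triplet loss underlying the network, and one possible reading of a two-layer ontology loss, can be sketched as follows. The two-level combination (a larger margin for negatives from a different coarse concept than for negatives sharing the anchor's parent concept) is an assumption, since the abstract does not spell out the exact loss; the embeddings and margins are illustrative.

```python
def dist2(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, dist2(anchor, positive) - dist2(anchor, negative) + margin)

def two_level_loss(anchor, pos, neg_same_parent, neg_other_parent,
                   fine_margin=1.0, coarse_margin=2.0):
    """Hypothetical two-layer ontology loss: negatives from a different coarse
    class must be pushed further away than negatives that share the anchor's
    parent concept in the ontology."""
    return (triplet_loss(anchor, pos, neg_same_parent, fine_margin)
            + triplet_loss(anchor, pos, neg_other_parent, coarse_margin))

a, p = [0.0, 0.0], [0.1, 0.0]
n_near = [0.5, 0.5]   # negative sharing the anchor's parent concept
n_far = [3.0, 3.0]    # negative from a different coarse concept
loss = two_level_loss(a, p, n_near, n_far)
```

Here the distant cross-concept negative already satisfies its (larger) margin and contributes zero loss, while the near same-parent negative still drives the update, which is the intended effect of layering the margins.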
17. A decentralised approach for link inference in large signed graphs
- Author
-
Romana Talat, Jianping Fan, Qiang Qu, Muhammad Muzammal, and Faima Abbasi
- Subjects
Theoretical computer science, Computer Networks and Communications, Computer science, Probabilistic logic, Inference, Graph, Hardware and Architecture, Signed graph, Software - Abstract
Social networks as large graphs have interesting information embedded within them. The presence of links between nodes characterises the underlying relationships between the nodes. Link inference is an interesting problem and has been studied for unsigned and signed graphs. Whilst signed links give more insight into node relationships, the class imbalance and the limited availability of signed graph datasets obstruct studies in this domain. Furthermore, studies in the literature usually consider a single large graph and ignore the potentially different sub-graphs underlying the original graph. In this work, we consider signed graphs for link inference with a focus on negative links and adopt a decentralised approach to learn the graph and sub-graph embeddings, i.e., we consider sub-graphs of the original signed graph for link inference. As we focus on negative links, the problem becomes more challenging due to the class imbalance and the sparsity of the sub-graphs. For the input graph, we employ a decentralised approach to learn the latent factors in the sub-graphs using probabilistic matrix factorisation. We perform an extensive experimental study using real datasets to assess the applicability and effectiveness of the approach. The results show that the decentralised approach is a promising consideration and gives encouraging results for the performance and scalability of the solution.
- Published
- 2020
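The latent-factor step, probabilistic matrix factorisation fitted by SGD on observed signed links and then used to score an unobserved link, can be sketched on a toy signed graph. This is a hedged sketch, not the paper's decentralised pipeline: the hyperparameters are illustrative, and the Gaussian priors of PMF are reduced to their L2-regularisation effect.

```python
import random

def pmf_sgd(edges, n, k=2, lr=0.05, reg=0.01, epochs=300, seed=1):
    """SGD matrix factorisation: approximate the signed adjacency by U V^T.
    edges = [(i, j, sign)] with sign in {-1, +1}; reg plays the role of the
    Gaussian priors in PMF."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        for i, j, s in edges:
            pred = sum(U[i][f] * V[j][f] for f in range(k))
            err = s - pred
            for f in range(k):
                ui, vj = U[i][f], V[j][f]
                U[i][f] += lr * (err * vj - reg * ui)
                V[j][f] += lr * (err * ui - reg * vj)
    return U, V

# Two hostile camps {0, 1} and {2, 3}: within-camp links +1, cross-camp links -1.
edges = [(0, 1, 1), (2, 3, 1), (0, 2, -1), (1, 3, -1), (0, 3, -1)]
U, V = pmf_sgd(edges, n=4)
# Score the held-out link (1, 2); it crosses camps, so a negative sign is expected.
score = sum(U[1][f] * V[2][f] for f in range(2))
pred01 = sum(U[0][f] * V[1][f] for f in range(2))  # fitted positive link
```

The sign of the score is the inferred link sign; in the paper this factorisation is applied per sub-graph rather than to one global matrix.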
18. Deep Spatial and Temporal Network for Robust Visual Object Tracking
- Author
-
Qiang Wang, Baopeng Zhang, Jianping Fan, Junliang Xing, and Zhu Teng
- Subjects
Computer science, Deep learning, Computer Graphics and Computer-Aided Design, Visualization, Video tracking, Computer vision, Artificial intelligence, Software - Abstract
There are two key components that can be leveraged for visual tracking: (a) object appearances; and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking due to its superior representation power and strong learning ability; most of them exploit object appearances but few exploit object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics along multiple frames in a video, such that it can seamlessly integrate the object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed into a tracking pipeline in a coarse-to-fine form, can perceive the subtle differences in spatial and temporal variations of the target (the object being tracked), and thus it benefits from both off-line training and online fine-tuning. We have also conducted experiments on four of the largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and our experimental results have demonstrated that our DSTN method can achieve competitive performance compared with state-of-the-art techniques. The source code, trained models, and all the experimental results of this work will be made publicly available to facilitate further studies on this problem.
- Published
- 2020
19. An end-to-end identity association network based on geometry refinement for multi-object tracking
- Author
-
Rui Li, Baopeng Zhang, Zhu Teng, and Jianping Fan
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software - Published
- 2022
20. Multi-attribute group decision-making method based on weighted partitioned Maclaurin symmetric mean operator and a novel score function under neutrosophic cubic environment
- Author
-
Jianping Fan, Shanshan Zhai, and Meiqin Wu
- Subjects
Geometry and Topology ,Software ,Theoretical Computer Science - Abstract
Neutrosophic cubic set (NCS) is the generalized version of neutrosophic sets and interval neutrosophic sets. It can deal with complex information by combining the neutrosophic set (NS) and the cubic set (CS). The partitioned Maclaurin symmetric mean (PMSM) operator can reflect the interrelationships among attributes in the same partition, while attributes in different partitions are treated as irrelevant. To effectively aggregate neutrosophic cubic information, we extend the PMSM operator to the neutrosophic cubic environment and define the neutrosophic cubic partitioned Maclaurin symmetric mean (NCPMSM) operator and the neutrosophic cubic weighted partitioned Maclaurin symmetric mean (NCWPMSM) operator. We then define a novel score function for NCSs which overcomes the drawbacks of existing score functions. Next, based on the NCWPMSM operator and the novel score function, we develop a multi-attribute group decision-making (MAGDM) method. Finally, we give an example of supplier selection to illustrate the usefulness of the proposed MAGDM method, and a comparative analysis shows the effectiveness and advantages of the proposed method over existing methods.
- Published
- 2021
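For crisp (real-valued) inputs, the partitioned Maclaurin symmetric mean applies the MSM within each partition, where attributes interrelate, and averages across partitions, which are treated as independent. A sketch on plain numbers; the neutrosophic cubic extension in the paper operates on NCS membership structures, which this toy does not model, and the example values are illustrative.

```python
from itertools import combinations
from math import comb

def msm(values, k):
    """Maclaurin symmetric mean of order k:
    ((1/C(n,k)) * sum over all k-subsets of their product) ** (1/k)."""
    n = len(values)
    total = 0.0
    for subset in combinations(values, k):
        prod = 1.0
        for v in subset:
            prod *= v
        total += prod
    return (total / comb(n, k)) ** (1.0 / k)

def pmsm(partitions, k=2):
    """Partitioned MSM: MSM inside each partition, arithmetic mean across
    partitions (cross-partition attributes are assumed irrelevant)."""
    return sum(msm(p, k) for p in partitions) / len(partitions)

# Two attribute partitions; interrelationships exist only within a partition.
score = pmsm([[0.6, 0.8, 0.7], [0.5, 0.9]], k=2)
```

Like any mean, the MSM is idempotent (equal inputs return that value), which is one of the properties the neutrosophic cubic operators are required to preserve.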
21. On spatio-temporal blockchain query processing
- Author
-
Muhammad Muzammal, Qiang Qu, Christian S. Jensen, Ildar Nurgaliev, and Jianping Fan
- Subjects
Authenticated data structure, Random graph, Blockchain, Computer Networks and Communications, Computer science, Data management, Distributed computing, Data structure, Sequential access, Spatio-temporal data, Block-DAG, Hardware and Architecture, Data integrity, Software - Abstract
Recent advances in blockchain technology suggest that the technology has potential for use in applications in a variety of new domains including spatio-temporal data management. The reliability and immutability of blockchains combined with the support for decentralized, trustless data processing offer new opportunities for applications in such domains. However, current blockchain proposals do not support spatio-temporal data processing, and the block-based sequential access in blockchain hinders efficient query processing. We propose spatio-temporal blockchain technology that supports fast query processing. More specifically, we propose blockchain technology that records time and location attributes for the transactions, maintains data integrity, and supports fast spatial queries by the introduction of a cryptographically signed tree data structure, the Merkle Block Space Index (BSI), which is a modification of the Merkle KD-tree. We consider Bitcoin-like near-uniform block generation, and we process temporal queries by means of a block-DAG data structure, called Temporal Graph Search (TGS), without the need for temporal indexes. To enable the experiments, we propose a random graph model to generate a block-DAG topology for an abstract peer-to-peer network. We perform a comprehensive evaluation to offer insight into the applicability and effectiveness of the proposed technology. The evaluation indicates that TGS-BSI is a promising solution for efficient spatio-temporal query processing on blockchains.
- Published
- 2019
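The Merkle Block Space Index combines a KD-style spatial split with Merkle hashing, so a single root digest authenticates the whole spatial index. A toy sketch of that combination (SHA-256 over a recursive median split of 2-D points); the real BSI's node layout, proofs, and blockchain integration are more involved than this.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_kd_root(points, depth=0):
    """Merkle digest over a KD-style recursive split of 2-D points:
    sort by x at even depths and by y at odd depths, split at the median,
    and hash the two child digests together (the Merkle part)."""
    if len(points) == 1:
        x, y = points[0]
        return h(f"leaf:{x}:{y}".encode())
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    left = merkle_kd_root(pts[:mid], depth + 1)
    right = merkle_kd_root(pts[mid:], depth + 1)
    return h(b"node:" + left + right)

pts = [(3, 1), (1, 4), (2, 2), (5, 0)]
root = merkle_kd_root(pts)
# Tampering with any point changes the root digest:
tampered = merkle_kd_root([(3, 1), (1, 4), (2, 2), (5, 9)])
```

Because the tree is rebuilt from sorted coordinates, the digest is independent of input order, and any change to a stored point is detectable from the root alone, which is what makes the index "authenticated".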
22. A novel generative adversarial net for calligraphic tablet images denoising
- Author
-
Jianping Fan, Jiulong Zhang, and Mingtao Guo
- Subjects
Computer Networks and Communications, Computer science, Noise reduction, Super-resolution, Hardware and Architecture, Media Technology, Computer vision, Artificial intelligence, Software - Abstract
Chinese calligraphic images have important historical and artistic value, but natural weathering and man-made decay severely damage these works, so image denoising is an important topic to be addressed. Traditional denoising methods still leave room for improvement. In this paper, image denoising is modeled as the generation of a clean image using a GAN (Goodfellow et al., Advances in Neural Information Processing Systems, 2014) with embedded residual dense blocks (Zhang et al., CVPR, 2018), which were formerly used for super-resolution reconstruction. Meanwhile, a new type of noise is defined to simulate real noise and to compensate for unpaired data in the GAN training set. The new structure, used with suitable preprocessing and training methods, yields satisfactory results compared to known denoising methods.
- Published
- 2019
23. Learning multi-layer coarse-to-fine representations for large-scale image classification
- Author
-
Ji Zhang, Kuizhi Mei, Yu Zheng, and Jianping Fan
- Subjects
Contextual image classification, Computer science, Deep learning, Multi-task learning, Pattern recognition, Discriminative model, Artificial Intelligence, Signal Processing, Ontology, Computer Vision and Pattern Recognition, Software - Abstract
Recent studies on large-scale image classification mainly focus on categorizing images into 1000 object classes, all of which are atomic and mutually exclusive in the semantic space. However, for a much larger set of image categories (such as the ImageNet 10k dataset), some of them may come from the high-level (non-leaf) nodes of the concept ontology and could semantically contain other lower-level categories. Research that classifies images into large numbers of image categories with such inter-category subsumption correlations has received little attention. In this paper, a Visual-Semantic Tree is learned to organize 10k image categories hierarchically in a coarse-to-fine fashion, where both the inter-category visual similarities and inter-category semantic correlations are seamlessly integrated for tree construction. Additionally, a deep learning method is developed by integrating the Visual-Semantic Tree with deep CNNs to learn more discriminative tree classifiers for large-scale image classification. Our experimental results have demonstrated that the proposed Visual-Semantic Tree can effectively organize large-scale structural image categories and significantly boost the classification accuracy rates for both atomic image categories and high-level image categories.
- Published
- 2019
24. Plant recognition via leaf shape and margin features
- Author
-
Jinye Peng, Wanqing Zhao, Hangzai Luo, Long Chen, Xiang Zhang, and Jianping Fan
- Subjects
Computer Networks and Communications, Computer science, Generalization, Pattern recognition, Leaf margin, Hardware and Architecture, Media Technology, Artificial intelligence, Software - Abstract
Botanists and foresters empirically determine plant categories mainly via visual features of leaves, e.g. leaf shape, leaf margin, leaf arrangement, and leaf venation. The leaf shape and leaf margin can be captured easily with cheap devices, so automatic plant recognition is generally based on leaf shape or margin features. In this paper, a set of features that describe leaf shape and margin is proposed to improve the performance of plant recognition. The proposed margin features use an area ratio to quantify the convexity/concavity of each contour point at different scales, and such margin features are effective in capturing both global information and contour details. The area ratio is the ratio of the part of a disk centred at a contour point that lies inside the contour to the whole disk. The proposed shape features use a combination of morphological features to characterize the global shape of the leaf, which has the merit of preserving the geometric properties of the leaf shape. Additionally, a series of multi-grained fusion methods that combine the margin features and the global shape features are proposed as a better representation of a leaf. To validate effectiveness and generalization, we evaluate our methods on two public datasets: the Swedish Leaf dataset and the ICL Leaf dataset. The experimental results show the superiority of our methods over state-of-the-art shape methods.
- Published
- 2019
25. A generalized multi-dictionary least squares framework regularized with multi-graph embeddings
- Author
-
Timothy Apasiba Abeo, Jianping Fan, Zheng-Jun Zha, Bing-Kun Bao, and Xiang-Jun Shen
- Subjects
Computer science ,Dimensionality reduction ,Feature extraction ,02 engineering and technology ,01 natural sciences ,Least squares ,Manifold ,Graph ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,010306 general physics ,Cluster analysis ,Algorithm ,Software - Abstract
Dimensionality reduction in high-dimensional multi-view datasets is an important research topic: it keeps the essential features and thereby improves performance in subsequent tasks such as classification and clustering. This paper proposes a generalized framework, which extends the PCA idea of minimizing least-squares reconstruction errors, to include the data distribution and multiple dictionaries for preserving outlier-free global structures in multi-view datasets. To also preserve local manifold structures, multiple local graphs are incorporated. Finally, two models within the Multi-dictionary Least Squares Framework regularized with Multi-graph Embeddings (MD-MGE) are proposed for preserving both global and local structures. Extensive experimental results on four multi-view datasets show that both methods outperform the existing comparative methods, and their accuracy improvements are statistically significant in all cases at the 0.05 significance level.
- Published
- 2019
26. A novel CNN structure for fine-grained classification of Chinese calligraphy styles
- Author
-
Mingtao Guo, Jiulong Zhang, and Jianping Fan
- Subjects
Structure (mathematical logic) ,Computer science ,business.industry ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Convolutional neural network ,Computer Science Applications ,Calligraphy ,Pattern recognition (psychology) ,Softmax function ,Font ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Layer (object-oriented design) ,business ,computer ,Software ,Natural language processing - Abstract
Chinese calligraphy is a valuable cultural heritage shared by the world; it is loved by many people, and our aim is to support its study and preservation with technical means. The automatic recognition of calligraphy styles by image processing techniques is important in art collection and auction, among other applications. Traditional feature operators have drawbacks that leave room for modern methods such as convolutional neural networks (CNNs). However, most studies focus on classifying the five basic fonts, which is somewhat different from classifying styles. In this paper, four styles belonging to the standard font are classified with a novel CNN structure in which two squeeze-and-excitation modules, which emphasize informative feature maps and suppress useless ones, are embedded after the convolution layers, and a Haar transform layer that fuses the features is placed before the softmax layer. Experimental results show that the proposed structure outperforms other networks in both font and style classification.
- Published
- 2019
27. An Efficient Greedy Incremental Sequence Clustering Algorithm
- Author
-
Huiling Zhang, Yi Pan, Zhen Ju, Yanjie Wei, Xuelei Li, Jingjing Zhang, Weiguo Liu, Jianping Fan, and Jingtao Meng
- Subjects
Biological data ,ComputingMethodologies_PATTERNRECOGNITION ,Software ,Computer science ,business.industry ,Filter (signal processing) ,Function (mathematics) ,Data packing ,Cluster analysis ,business ,Algorithm ,Word (computer architecture) ,Sequence clustering - Abstract
Gene sequence clustering is basic and important in computational biology and bioinformatics, e.g., for the study of phylogenetic relationships and gene function prediction. With the rapid growth in the amount of biological data (gene/protein sequences), clustering faces growing challenges in both efficiency and precision. For example, gene databases contain many redundant sequences that provide no additional information but consume computing resources. Widely used greedy incremental clustering tools improve efficiency at the cost of precision. To design a balanced gene clustering algorithm that is both fast and precise, we propose a modified greedy incremental sequence clustering tool that introduces a pre-filter, a modified short-word filter, a new data packing strategy, and GPU acceleration. Experimental evaluations on four independent datasets show that the proposed tool clusters datasets with a precision of 99.99%. Compared with the results of CD-HIT, Uclust, and Vsearch, the number of redundant sequences left by the proposed method is four orders of magnitude smaller. In addition, on the same hardware platform, our tool is 40% faster than the second-fastest tool. The software is available at https://github.com/SIAT-HPCC/gene-sequence-clustering.
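The greedy incremental scheme the tool builds on can be sketched as follows. This is a toy CD-HIT-style illustration (the crude `identity` measure and all names are ours), omitting the pre-filter, short-word filter, data packing, and GPU parts: sequences are processed longest-first, and each either joins the first cluster whose representative it matches above a threshold or founds a new cluster.

```python
def identity(a, b):
    # Crude similarity: matched leading positions over the shorter length.
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering: longest sequences become
    representatives first; later sequences join the first cluster
    they match, otherwise start their own."""
    clusters = []  # list of (representative, members)
    for s in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(s, rep) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters

seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT"]
out = greedy_cluster(seqs, 0.9)
print(len(out))  # 2
```

The precision/efficiency trade-off the abstract mentions lives in this loop: a looser threshold or weaker filter merges more sequences per comparison but risks absorbing non-redundant ones.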
- Published
- 2021
28. A Survey of Visual Transformers
- Author
-
Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, and Zhiqiang He
- Subjects
FOS: Computer and information sciences ,Artificial Intelligence ,Computer Networks and Communications ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Software ,Computer Science Applications - Abstract
Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, pioneering works have recently employed Transformer-like architectures in the computer vision (CV) field, demonstrating their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements over multiple benchmarks compared with modern convolutional neural networks (CNNs). In this survey, we comprehensively review over one hundred different visual Transformers according to three fundamental CV tasks and different data stream types, and propose a taxonomy that organizes the representative methods according to their motivations, structures, and application scenarios. Because of their differences in training settings and dedicated vision tasks, we also evaluate and compare all these existing visual Transformers under different configurations. Furthermore, we reveal a series of essential but unexploited aspects that may empower visual Transformers to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between the visual Transformers and the sequential ones. Finally, three promising research directions are suggested for future investigation. We will continue to update the latest articles and their released source code at https://github.com/liuyang-ict/awesome-visual-transformers., Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
- Published
- 2021
29. Panchromatic and multi-spectral image fusion for new satellites based on multi-channel deep model
- Author
-
Jianping Fan, Qingqing Huang, Guiqing He, Siyuan Xing, and Zhaoqiang Xia
- Subjects
Image fusion ,business.industry ,Computer science ,0211 other engineering and technologies ,02 engineering and technology ,Convolutional neural network ,Computer Science Applications ,Image (mathematics) ,Panchromatic film ,Range (mathematics) ,Hardware and Architecture ,Computer Science::Computer Vision and Pattern Recognition ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Fuse (electrical) ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Randomness ,021101 geological & geomatics engineering - Abstract
With the launch and rapid development of new satellites such as WorldView-3, the number of bands in multi-spectral images from new satellites has greatly increased. However, the spectral matching between the panchromatic image and the multi-spectral images deteriorates with existing image fusion methods. In this paper, a novel method based on a multi-channel deep model is proposed to fuse images for new satellites. The deep model is implemented with convolutional neural networks and trained on each band to reduce the impact of spectral-range mismatch. The proposed method also preserves the detailed information in multi-spectral images, which is ignored by traditional methods. It effectively alleviates the difficulty of obtaining remote sensing images through data augmentation, and it reduces the randomness of manually set parameters through parameter self-learning. Visual and quantitative assessments of the fusion results show that the proposed method clearly improves fusion quality compared to state-of-the-art methods.
- Published
- 2018
30. Integrating multi-level deep learning and concept ontology for large-scale visual recognition
- Author
-
Baopeng Zhang, Zongmin Li, Zhenzhong Kuang, Jianping Fan, and Jun Yu
- Subjects
business.industry ,Computer science ,Deep learning ,Multi-task learning ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Visual recognition ,Discriminative model ,Artificial Intelligence ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Software ,0105 earth and related environmental sciences - Abstract
To support large-scale visual recognition (i.e., recognizing thousands or even tens of thousands of object classes), a multi-level deep learning algorithm is developed to learn multiple deep networks and a tree classifier jointly, where a concept ontology is constructed to organize large numbers of object classes hierarchically in a coarse-to-fine fashion and determine the inter-related learning tasks automatically. Our multi-level deep learning algorithm can: (a) train multiple deep networks simultaneously to achieve more discriminative representations of both coarse-grained groups and fine-grained object classes at different levels of the concept ontology (i.e., learning multiple sets of deep features simultaneously for different tasks); (b) leverage multi-task learning to train more discriminative classifiers for the fine-grained object classes in the same group to enhance their separability significantly and enable inter-class knowledge transferring; and (c) learn multiple deep networks and the tree classifier jointly in an end-to-end fashion. Our experimental results on three image sets have demonstrated that our multi-level deep learning algorithm can achieve very competitive results on both the accuracy rates and the computational efficiency for large-scale visual recognition.
- Published
- 2018
31. TOP-SIFT: the selected SIFT descriptor based on dictionary learning
- Author
-
Deng Yu, Yujie Liu, Xiaoming Chen, Jianping Fan, and Zongmin Li
- Subjects
business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-invariant feature transform ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Sparse approximation ,Computer Graphics and Computer-Aided Design ,Small set ,Image (mathematics) ,Constraint (information theory) ,Computer Science::Computer Vision and Pattern Recognition ,Scalability ,Simulated annealing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Selection algorithm ,Software - Abstract
The large number of SIFT descriptors in an image and the high dimensionality of each SIFT descriptor pose problems for large-scale image databases in terms of speed and scalability. In this paper, we present a descriptor selection algorithm based on dictionary learning that removes redundant features and retains only a small set of features, which we refer to as TOP-SIFTs. In our experiments, we discovered the close relationship between the descriptor selection problem and dictionary learning in sparse representation, and then recast our problem as dictionary learning. We designed a new dictionary learning method adapted to our problem and employed simulated annealing to obtain an optimal solution. During learning, we added a sparsity constraint and the spatial distribution characteristics of SIFT points, and finally selected a small, representative feature set with good spatial distribution. Compared with earlier methods, our method neither relies on the database nor loses important information, and the experiments show that our algorithm can substantially reduce memory usage and improve time efficiency while maintaining accuracy.
- Published
- 2018
32. Graph-regularized multi-view semantic subspace learning
- Author
-
Jinye Peng, Jianping Fan, Ziyu Guan, and Peng Luo
- Subjects
Modalities ,Theoretical computer science ,Multiple kernel learning ,Computer science ,Graph embedding ,Complex system ,Computational intelligence ,02 engineering and technology ,External Data Representation ,Compact space ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Software ,Subspace topology - Abstract
Many real-world datasets are represented by multiple features or modalities which often provide compatible and complementary information to each other. In order to obtain a good data representation that synthesizes multiple features, researchers have proposed different multi-view subspace learning algorithms. Although label information has been exploited for guiding multi-view subspace learning, previous approaches did not capture the underlying semantic structure in the data well. In this paper, we propose a new multi-view subspace learning algorithm called multi-view semantic learning (MvSL). MvSL learns a nonnegative latent space and tries to capture the semantic structure of data through a novel graph embedding framework, where an affinity graph characterizing intra-class compactness and a penalty graph characterizing inter-class separability are defined. The intuition is to keep intra-class items near each other while keeping inter-class items away from each other in the learned common subspace across multiple views. We explore three specific definitions of the graphs and compare them analytically and empirically. To properly assess nearest neighbors in the multi-view context, we develop a multiple kernel learning method for obtaining an optimal kernel combination from multiple features. In addition, we encourage each latent dimension to be associated with a subset of views via sparseness constraints. In this way, MvSL is able to capture flexible conceptual patterns hidden in multi-view features. Experiments on three real-world datasets demonstrate the effectiveness of MvSL.
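The affinity/penalty graph construction can be illustrated with a small sketch (our single-view simplification, without kernels): each point links to its nearest same-class neighbors in the affinity graph (intra-class compactness) and to its nearest different-class neighbors in the penalty graph (inter-class separability).

```python
def build_graphs(points, labels, k=2):
    """Build symmetric 0/1 adjacency matrices: `affinity` connects each
    point to its k nearest same-class neighbors, `penalty` to its k
    nearest different-class neighbors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    n = len(points)
    affinity = [[0] * n for _ in range(n)]
    penalty = [[0] * n for _ in range(n)]
    for i in range(n):
        same = sorted((j for j in range(n) if j != i and labels[j] == labels[i]),
                      key=lambda j: dist(points[i], points[j]))[:k]
        diff = sorted((j for j in range(n) if labels[j] != labels[i]),
                      key=lambda j: dist(points[i], points[j]))[:k]
        for j in same:
            affinity[i][j] = affinity[j][i] = 1
        for j in diff:
            penalty[i][j] = penalty[j][i] = 1
    return affinity, penalty

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
labs = [0, 0, 1, 1]
A, P = build_graphs(pts, labs, k=1)
print(A[0][1], P[0][2])  # 1 1
```

A subspace learner then pulls together pairs connected in `A` and pushes apart pairs connected in `P`, which is the intuition the abstract describes.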
- Published
- 2017
33. Hierarchical learning of multi-task sparse metrics for large-scale image classification
- Author
-
Yu Zheng, Ji Zhang, Xinbo Gao, and Jianping Fan
- Subjects
02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Image (mathematics) ,Discriminative model ,Artificial Intelligence ,Node (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,0105 earth and related environmental sciences ,Mathematics ,Incremental decision tree ,Contextual image classification ,business.industry ,Pattern recognition ,Tree (data structure) ,Signal Processing ,Metric (mathematics) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Scale (map) ,business ,computer ,Software - Abstract
An enhanced hierarchical visual tree is constructed to organize large numbers of image categories and automatically identify the inter-related tasks for multi-task sparse metric learning. A new objective function is defined for multi-task sparse metric learning. A top-down approach is developed to support hierarchical learning of a tree of multi-task sparse metrics over the enhanced visual tree. In this paper, a novel approach is developed to learn a tree of multi-task sparse metrics hierarchically over a visual tree, yielding a fast solution to large-scale image classification; an enhanced visual tree is first learned to organize large numbers of image categories hierarchically in a coarse-to-fine fashion. Over the visual tree, a tree of multi-task sparse metrics is learned hierarchically by: (a) performing multi-task sparse metric learning over the sibling child nodes under the same parent node to explicitly separate their commonly-shared metric from their node-specific metrics; and (b) propagating the node-specific metric of the parent node to its sibling child nodes (at the next level of the visual tree), so that more discriminative metrics can be learned to control inter-level error propagation effectively. We have evaluated our hierarchical multi-task sparse metric learning algorithm on three different image sets, and the experimental results demonstrate that it obtains better performance than state-of-the-art algorithms on large-scale image classification.
- Published
- 2017
34. HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition
- Author
-
Yu Zheng, Ji Zhang, Tianyi Zhao, Jinye Peng, Jianping Fan, Zhenzhong Kuang, and Jun Yu
- Subjects
business.industry ,Computer science ,Deep learning ,Feature extraction ,Multi-task learning ,Pattern recognition ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,Convolutional neural network ,Deep belief network ,Discriminative model ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Classifier (UML) ,Feature learning ,computer ,Software ,0105 earth and related environmental sciences - Abstract
In this paper, a hierarchical deep multi-task learning (HD-MTL) algorithm is developed to support large-scale visual recognition (e.g., recognizing thousands or even tens of thousands of atomic object classes automatically). First, multiple sets of multi-level deep features are extracted from different layers of deep convolutional neural networks (deep CNNs), and they are used to achieve more effective accomplishment of the coarse-to-fine tasks for hierarchical visual recognition. A visual tree is then learned by assigning the visually-similar atomic object classes with similar learning complexities into the same group, which can provide a good environment for determining the interrelated learning tasks automatically. By leveraging the inter-task relatedness (inter-class similarities) to learn more discriminative group-specific deep representations, our deep multi-task learning algorithm can train more discriminative node classifiers for distinguishing the visually-similar atomic object classes effectively. Our hierarchical deep multi-task learning (HD-MTL) algorithm can integrate two discriminative regularization terms to control the inter-level error propagation effectively, and it can provide an end-to-end approach for jointly learning more representative deep CNNs (for image representation) and more discriminative tree classifier (for large-scale visual recognition) and updating them simultaneously. Our incremental deep learning algorithms can effectively adapt both the deep CNNs and the tree classifier to the new training images and the new object classes. Our experimental results have demonstrated that our HD-MTL algorithm can achieve very competitive results on improving the accuracy rates for large-scale visual recognition.
- Published
- 2017
35. Locally linear spatial pyramid hash for large-scale image search
- Author
-
Hangzai Luo, Jianping Fan, Jinye Peng, and Wanqing Zhao
- Subjects
Computer Networks and Communications ,Computer science ,Nearest neighbor search ,Hash function ,02 engineering and technology ,Rolling hash ,K-independent hashing ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Pyramid (image processing) ,Hamming space ,Universal hashing ,business.industry ,Dynamic perfect hashing ,Pattern recognition ,Hash table ,Hardware and Architecture ,Computer Science::Computer Vision and Pattern Recognition ,020201 artificial intelligence & image processing ,Feature hashing ,Artificial intelligence ,business ,Perfect hash function ,Software ,Double hashing - Abstract
Hash-based methods can achieve fast similarity search by representing high-dimensional data with compact binary codes. However, the spatial structure of raw images was usually lost in previous methods. In this paper, a novel Locally Linear Spatial Pyramid Hash (LLSPH) algorithm is developed for fast image retrieval. Unlike conventional approaches, our method exploits the spatial extent of image features: the spatial pyramid structure is used both to construct binary hash codes and to increase the discriminability of the description. To generate interpretable binary codes, the proposed LLSPH method captures the spatial characteristics of the original SPM and generates a low-dimensional sparse representation using multi-dictionary Locality-constrained Linear Coding (MD_LLC). LLSPH then converts the low-dimensional data into Hamming space with a TF-IDF binarization rule. Our experimental results show that LLSPH can outperform several state-of-the-art hashing algorithms on the Caltech256 and ImageNet-500 datasets.
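One plausible reading of the TF-IDF binarization step can be sketched as follows; this is our toy interpretation, not the paper's exact rule: weight each sparse-code coefficient by an IDF term over the whole collection, then set a bit whenever the weight exceeds that image's mean weight.

```python
import math

def tfidf_binarize(codes):
    """Turn real-valued sparse codes into binary bits with a
    TF-IDF-style rule: a bit is set when the coefficient's IDF-weighted
    value is above the image's own mean weight."""
    n_docs = len(codes)
    dims = len(codes[0])
    # Document frequency: in how many images each dimension is active.
    df = [sum(1 for c in codes if c[d] > 0) for d in range(dims)]
    idf = [math.log((n_docs + 1) / (df[d] + 1)) + 1 for d in range(dims)]
    bits = []
    for c in codes:
        w = [c[d] * idf[d] for d in range(dims)]
        mean = sum(w) / dims
        bits.append([1 if x > mean else 0 for x in w])
    return bits

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

codes = [[0.9, 0.0, 0.1], [0.8, 0.1, 0.0], [0.0, 0.9, 0.8]]
b = tfidf_binarize(codes)
print(hamming(b[0], b[1]) <= hamming(b[0], b[2]))  # True
```

The point of the binarization is exactly what the small example shows: similar sparse codes land close together in Hamming space, so retrieval reduces to cheap bit comparisons.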
- Published
- 2016
36. MapReduce-based clustering for near-duplicate image identification
- Author
-
Hangzai Luo, Wanqing Zhao, Jianping Fan, and Jinye Peng
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Process (computing) ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,computer.software_genre ,Image (mathematics) ,Discriminative model ,Hardware and Architecture ,Computer Science::Computer Vision and Pattern Recognition ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,Cluster analysis ,business ,computer ,Software ,Selection (genetic algorithm) ,Feature detection (computer vision) - Abstract
In this paper, an effective algorithm is developed for tackling the problem of near-duplicate image identification from large-scale image sets, where the LLC (locality-constrained linear coding) method is seamlessly integrated with the maxIDF cut model to achieve more discriminative representations of images. By incorporating MapReduce framework for image clustering and pairwise merging, the near duplicates of images can be identified effectively from large-scale image sets. An intuitive strategy is also introduced to guide the process for parameter selection. Our experimental results on large-scale image sets have revealed that our algorithm can achieve significant improvement on both the accuracy rates and the computation efficiency as compared with other baseline methods.
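The map/reduce split for near-duplicate grouping can be sketched with plain Python stand-ins for the two phases; the coarse `sig` signature here is our toy substitute for the LLC + maxIDF representation used in the paper.

```python
from collections import defaultdict

def map_phase(images, signature):
    """Map step: emit (signature, image_id) pairs; visually similar
    images should share a signature."""
    return [(signature(img), name) for name, img in images]

def reduce_phase(pairs):
    """Reduce step: group image ids by signature; buckets with more
    than one member are near-duplicate candidates."""
    buckets = defaultdict(list)
    for key, name in pairs:
        buckets[key].append(name)
    return {k: v for k, v in buckets.items() if len(v) > 1}

# Toy "images" as pixel tuples; the signature is a coarse quantization.
def sig(img):
    return tuple(round(p / 10) for p in img)

images = [("a.jpg", (101, 52)), ("b.jpg", (99, 48)), ("c.jpg", (10, 200))]
dupes = reduce_phase(map_phase(images, sig))
print(list(dupes.values()))  # [['a.jpg', 'b.jpg']]
```

In a real MapReduce deployment the map output is shuffled by key across machines, so each reducer sees one candidate group at a time; the pairwise merging the abstract mentions would then refine these coarse buckets.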
- Published
- 2016
37. A novel framework for semantic entity identification and relationship integration in large scale text data
- Author
-
Dingxian Wang, Xiao Liu, Hangzai Luo, and Jianping Fan
- Subjects
Computer Networks and Communications ,Computer science ,02 engineering and technology ,computer.software_genre ,Semantics ,Semantic data model ,Semantic equivalence ,Semantic similarity ,Explicit semantic analysis ,020204 information systems ,Semantic computing ,0202 electrical engineering, electronic engineering, information engineering ,Semantic integration ,Semantic compression ,Information retrieval ,Probabilistic latent semantic analysis ,business.industry ,Semantic search ,Semantic interoperability ,Semantic grid ,Hardware and Architecture ,Semantic technology ,020201 artificial intelligence & image processing ,Weak entity ,Artificial intelligence ,business ,computer ,Software ,Natural language processing - Abstract
Semantic entities carry the most important semantics of text data. Therefore, the identification and the relationship integration of semantic entities are very important for applications requiring semantics of text data. However, current strategies are still facing many problems such as semantic entity identification, new word identification and relationship integration among semantic entities. To address these problems, a two-phase framework for semantic entity identification with relationship integration in large scale text data is proposed in this paper. In the first semantic entities identification phase, we propose a novel strategy to extract unknown text semantic entities by integrating statistical features, Decision Tree (DT), and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective in detecting semantic entities and more sensitive to new entities that just appear in the fresh data. After extracting the semantic entities, the second phase of our framework is for the integration of Semantic Entities Relationships (SER) which can help to cluster the semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results have shown that our framework can beat state-of-the-art strategies in semantic entity identification and discover over 80% relationship pairs among related semantic entities in large scale text data.
- Published
- 2016
38. An automatic image-text alignment method for large-scale web image retrieval
- Author
-
Baopeng Zhang, Jinye Peng, Jianping Fan, and Yanyun Qu
- Subjects
Information retrieval ,Computer Networks and Communications ,Text alignment ,Computer science ,business.industry ,Search engine indexing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,02 engineering and technology ,Partition (database) ,Web image ,Hardware and Architecture ,020204 information systems ,Web page ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,020201 artificial intelligence & image processing ,Artificial intelligence ,Cluster analysis ,business ,Image retrieval ,Software - Abstract
To reduce the huge uncertainty in the relatedness between web images and their auxiliary text terms, an automatic image-text alignment algorithm is developed to achieve more accurate indexing and retrieval of large-scale web images by assigning the web images to their most relevant text terms precisely. First, large-scale web pages are crawled, and the informative images and their most relevant auxiliary text blocks are extracted. Second, parallel image clustering is performed to partition large-scale informative web images into a large number of clusters. By grouping visually-similar web images into the same cluster, our parallel image clustering algorithm can significantly reduce the huge uncertainty in the relatedness between the web images and their auxiliary text terms, providing a good starting point for automatic image-text alignment. Finally, a relevance re-ranking algorithm is developed to identify the most relevant text terms for characterizing the semantics of the visually-similar web images in the same cluster, i.e., assigning the web images to their most relevant text terms. Our experiments on large-scale web images have obtained very positive results.
- Published
- 2016
39. Exploiting Related and Unrelated Tasks for Hierarchical Metric Learning and Image Classification
- Author
-
Ji Zhang, Xinbo Gao, Jianping Fan, and Yu Zheng
- Subjects
Propagation of uncertainty ,Contextual image classification ,Computer science ,business.industry ,Deep learning ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Visualization ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Software - Abstract
In multi-task learning, multiple interrelated tasks are jointly learned to achieve better performance. In many cases, if we can identify which tasks are related, we can also clearly identify which tasks are unrelated. In the past, most researchers emphasized exploiting correlations among interrelated tasks while completely ignoring the unrelated tasks that may provide valuable prior knowledge for multi-task learning. In this paper, a new approach is developed to hierarchically learn a tree of multi-task metrics by leveraging prior knowledge about both the related tasks and unrelated tasks. First, a visual tree is constructed to hierarchically organize large numbers of image categories in a coarse-to-fine fashion. Over the visual tree, a multi-task metric classifier is learned for each node by exploiting both the related and unrelated tasks, where the learning tasks for training the classifiers for the sibling child nodes under the same parent node are treated as the interrelated tasks, and the others are treated as the unrelated tasks. In addition, the node-specific metric for the parent node is propagated to its sibling child nodes to control inter-level error propagation. Our experimental results demonstrate that our hierarchical metric learning algorithm achieves better results than other state-of-the-art algorithms.
- Published
- 2019
40. Three-step action search networks with deep Q-learning for real-time object tracking
- Author
-
Jianping Fan, Zhu Teng, and Baopeng Zhang
- Subjects
Computer science ,business.industry ,Q-learning ,Collaborative learning ,02 engineering and technology ,Object (computer science) ,Tracking (particle physics) ,01 natural sciences ,Convolutional neural network ,Artificial Intelligence ,Video tracking ,Sliding window protocol ,0103 physical sciences ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Reinforcement learning ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,010306 general physics ,business ,Software - Abstract
Sliding window and candidate sampling are two widely used search strategies for visual object tracking, but both fall far short of real-time performance. By treating tracking as a three-step decision-making process, a novel tracking network, which explores only three small subsets of candidate regions, is developed to achieve faster (real-time) localization of the target object across the frames of a video. A convolutional neural network agent is formulated to interact with a video over time, and two action-value functions are exploited to learn a favorable policy offline that determines the best action for visual object tracking. Our model is trained in a collaborative learning manner using action classification and cumulative reward approximation in reinforcement learning. We have evaluated our proposed tracker against a number of state-of-the-art ones on three popular tracking benchmarks: OTB-2013, OTB-2015, and VOT2017. The experimental results demonstrate that our proposed method achieves very competitive performance in real-time object tracking.
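The action-value idea behind the tracker can be illustrated with a tabular toy (our drastic simplification: a 1-D window with three shift actions instead of a CNN agent over image regions). Q-learning learns which shift action moves the window onto the target, so at test time the tracker queries only a handful of actions instead of sampling many candidate regions.

```python
import random

def train_q(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning toy: the agent shifts a 1-D window
    (actions: left, stay, right) to cover a fixed target position;
    reward is 1 whenever the window lands on the target."""
    random.seed(seed)
    positions, actions = 5, (-1, 0, 1)
    Q = {(s, a): 0.0 for s in range(positions) for a in actions}
    target = 2
    for _ in range(episodes):
        s = random.randrange(positions)
        for _ in range(10):
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda x: Q[(s, x)]))
            s2 = min(positions - 1, max(0, s + a))
            r = 1.0 if s2 == target else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                                  - Q[(s, a)])
            s = s2
    return Q

Q = train_q()
# From state 0 (window left of the target) the learned policy moves right.
best = max((-1, 0, 1), key=lambda a: Q[(0, a)])
print(best)  # 1
```

The epsilon-greedy choice plus the bootstrapped update is the generic off-policy recipe; the paper's contribution is plugging a CNN in place of the table and restricting each decision step to a small candidate subset.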
- Published
- 2020
41. A generalized least-squares approach regularized with graph embedding for dimensionality reduction
- Author
-
Jianping Fan, Bing-Kun Bao, Si-Xing Liu, Chun-Hong Pan, Xiang-Jun Shen, and Zheng-Jun Zha
- Subjects
Graph embedding ,Computer science ,Dimensionality reduction ,02 engineering and technology ,Generalized least squares ,01 natural sciences ,Linear subspace ,Least squares ,Projection (linear algebra) ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,Principal component analysis ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,010306 general physics ,Algorithm ,Software ,Subspace topology - Abstract
In current graph embedding methods, low-dimensional projections are obtained by preserving either the global or the local geometrical structure of the data. In this paper, the PCA (Principal Component Analysis) idea of minimizing least-squares reconstruction error is regularized with graph embedding to unify various local manifold embedding methods within a generalized framework that preserves both global and local low-dimensional subspace structure. Unlike the classical PCA method, our proposed generalized least-squares approach considers the data distribution together with an instance penalty at each data point; in this way, PCA is viewed as a special instance of the proposed generalized least-squares framework that preserves global projections. By applying a graph embedding regularizer, we obtain projections that preserve both the intrinsic geometrical structure and the global structure of the data. Experimental results on a variety of face and handwritten digit recognition tasks show that the proposed method outperforms state-of-the-art graph embedding methods in both the quality of the learned low-dimensional subspaces and classification accuracy.
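The least-squares view of PCA that this framework generalizes can be sketched in two dimensions: the principal direction is exactly the one minimizing the mean squared reconstruction error of the centered data (a toy sketch; the paper's instance penalties and graph-embedding regularizer are omitted):

```python
import math

def principal_direction(points):
    """Leading eigenvector of the 2x2 covariance matrix, in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Larger eigenvalue of [[sxx, sxy], [sxy, syy]]; eigenvector (sxy, lam - sxx).
    lam = 0.5 * (sxx + syy) + math.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy) or 1.0
    return (vx / norm, vy / norm)

def reconstruction_error(points, direction):
    """Mean squared distance from each centered point to its projection."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    ux, uy = direction
    err = 0.0
    for x, y in points:
        cx, cy = x - mx, y - my
        t = cx * ux + cy * uy  # least-squares projection coefficient
        err += (cx - t * ux) ** 2 + (cy - t * uy) ** 2
    return err / n

points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
direction = principal_direction(points)
```

For this correlated toy set, the closed-form principal direction beats either coordinate axis on reconstruction error, which is precisely the property the generalized framework retains as a special case.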
- Published
- 2020
42. Deep Mixture of Diverse Experts for Large-Scale Visual Recognition
- Author
-
Wei Zhang, Tianyi Zhao, Qiuyu Chen, Jianping Fan, Jun Yu, and Zhenzhong Kuang
- Subjects
Computer science ,business.industry ,Applied Mathematics ,Message passing ,Pattern recognition ,02 engineering and technology ,Object (computer science) ,Residual neural network ,Visualization ,Task (project management) ,Set (abstract data type) ,Computational Theory and Mathematics ,Discriminative model ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Layer (object-oriented design) ,business ,Software - Abstract
In this paper, a deep mixture of diverse experts algorithm is developed to achieve more efficient learning of a huge (mixture) network for large-scale visual recognition applications. First, a two-layer ontology is constructed to assign large numbers of atomic object classes into a set of task groups according to the similarities of their learning complexities, where certain degrees of inter-group task overlap are allowed to enable sufficient inter-group message passing. Second, one base deep CNN with M+1 outputs is learned for each task group to recognize its M atomic object classes and identify one special class of "not-in-group", where the network structure (number of layers and units in each layer) of well-designed deep CNNs (such as AlexNet, VGG, GoogleNet, and ResNet) is directly used to configure such base deep CNNs. To enhance the separability of the atomic object classes in the same task group, two approaches are developed to learn more discriminative base deep CNNs: (a) a deep multi-task learning algorithm that can effectively exploit inter-class visual similarities; (b) a two-layer network cascade approach that can improve the accuracy rates for the hard object classes to a certain degree while effectively maintaining the high accuracy rates for the easy ones. Finally, all these complementary base deep CNNs with diverse but overlapped outputs are seamlessly combined to generate a mixture network with larger outputs for recognizing tens of thousands of atomic object classes. Our experimental results have demonstrated that our deep mixture of diverse experts algorithm can achieve very competitive results on large-scale visual recognition.
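One simple way such overlapping group experts might be merged can be sketched as follows (a hypothetical fusion rule for illustration, not the paper's exact scheme): drop each base network's "not-in-group" rejection output and average the scores of classes shared by several groups.

```python
def combine_group_predictions(group_outputs):
    """Merge per-group class probabilities into one prediction.

    group_outputs: list of dicts mapping class name -> probability; each dict
    also carries a special "not-in-group" rejection entry from one base network.
    Classes shared by overlapping groups get their scores averaged.
    """
    totals, counts = {}, {}
    for probs in group_outputs:
        for cls, p in probs.items():
            if cls == "not-in-group":
                continue  # the rejection output is dropped when merging
            totals[cls] = totals.get(cls, 0.0) + p
            counts[cls] = counts.get(cls, 0) + 1
    merged = {cls: totals[cls] / counts[cls] for cls in totals}
    return max(merged, key=merged.get), merged

# Two hypothetical task groups overlapping on "dog".
group1 = {"cat": 0.7, "dog": 0.2, "not-in-group": 0.1}
group2 = {"dog": 0.1, "car": 0.3, "not-in-group": 0.6}
best, merged = combine_group_predictions([group1, group2])
```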
- Published
- 2018
43. Embedding Visual Hierarchy with Deep Networks for Large-Scale Visual Recognition
- Author
-
Jun Yu, Baopeng Zhang, Jianping Fan, Wei Zhang, Tianyi Zhao, Ming He, and Ning Zhou
- Subjects
FOS: Computer and information sciences ,business.industry ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Pattern recognition ,02 engineering and technology ,010501 environmental sciences ,Mixture model ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,Electronic mail ,Visualization ,Text mining ,Discriminative model ,0202 electrical engineering, electronic engineering, information engineering ,Embedding ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Visual hierarchy ,Classifier (UML) ,Software ,0105 earth and related environmental sciences - Abstract
In this paper, a level-wise mixture model (LMM) is developed by embedding a visual hierarchy with deep networks to support large-scale visual recognition (i.e., recognizing thousands or even tens of thousands of object classes), and a Bayesian approach is used to adapt a pre-trained visual hierarchy automatically to the improvements of the deep features (used for image and object class representation) as more representative deep networks are learned over time. Our LMM model provides an end-to-end approach for jointly learning: (a) the deep networks, to extract more discriminative deep features for image and object class representation; (b) the tree classifier, for recognizing large numbers of object classes hierarchically; and (c) the visual hierarchy adaptation, for achieving more accurate indexing of large numbers of object classes hierarchically. By supporting joint learning of the tree classifier, the deep networks, and the visual hierarchy adaptation, our LMM algorithm provides an effective approach for controlling inter-level error propagation, and thus achieves better accuracy rates on large-scale visual recognition. Our experiments are carried out on the ImageNet1K and ImageNet10K image sets, and our LMM algorithm achieves very competitive results in both accuracy and computational efficiency as compared with the baseline methods.
- Published
- 2018
44. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering
- Author
-
Chenchao Xiang, Dacheng Tao, Jianping Fan, Jun Yu, and Zhou Yu
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Computer Networks and Communications ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Pooling ,Feature extraction ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Machine learning ,computer.software_genre ,020901 industrial engineering & automation ,Discriminative model ,Knowledge extraction ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Question answering ,Artificial Intelligence & Image Processing ,Artificial neural network ,business.industry ,Computer Science Applications ,Visualization ,Feature (computer vision) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Software ,Natural language - Abstract
Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multi-modal feature fusion that is able to capture the complex interactions between multi-modal features; and 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a "co-attention" mechanism is developed using a deep neural network architecture to jointly learn the attentions for both the image and the question, which allows us to reduce irrelevant features effectively and obtain more discriminative features for image and question representations. For multi-modal feature fusion, a generalized Multi-modal Factorized High-order pooling approach (MFH) is developed to achieve more effective fusion of multi-modal features by sufficiently exploiting their correlations, which further results in superior VQA performance as compared with the state-of-the-art approaches. For answer prediction, the KL (Kullback-Leibler) divergence is used as the loss function to achieve a precise characterization of the complex correlations between multiple diverse answers with the same or similar meaning, which allows us to achieve a faster convergence rate and slightly better accuracy on answer prediction. A deep neural network architecture is designed to integrate all these modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve state-of-the-art performance on the large-scale VQA datasets and finish as the runner-up in the VQA Challenge 2017. (Comment: 13 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:1708.01471.)
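The factorized bilinear pooling idea underlying MFH can be sketched in a few lines: project both modalities into a (k*o)-dimensional space, multiply element-wise, and sum-pool every k entries. This is a bare sketch; the projection matrices below are hypothetical toy values, and the learned parameters, normalization, and co-attention of the actual model are omitted.

```python
def factorized_bilinear_pool(x, y, U, V, k):
    """MFB-style fusion: project x and y into a (k*o)-dimensional space,
    take the element-wise product, then sum-pool every k consecutive
    entries into one output dimension (o outputs in total)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    px, py = matvec(U, x), matvec(V, y)
    joint = [a * b for a, b in zip(px, py)]   # element-wise interaction
    o = len(joint) // k
    return [sum(joint[i * k:(i + 1) * k]) for i in range(o)]

# Tiny hypothetical projections: 2-d features fused into o=2 outputs, k=2 factors.
U = [[1, 0], [0, 1], [1, 1], [1, -1]]
V = [[1, 0], [0, 1], [1, 0], [0, 1]]
fused = factorized_bilinear_pool([1, 2], [3, 4], U, V, k=2)
```

The rank-k factorization is what keeps the bilinear interaction tractable: the full outer product is never materialized, only its two low-rank projections.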
- Published
- 2018
45. Identifying and Analyzing Popular Phrases Multi-Dimensionally in Social Media Data
- Author
-
Jun Luo, Shengzhong Feng, Yong Zhang, Jianping Fan, Joshua Zhexue Huang, Zhongying Zhao, and Chao Li
- Subjects
Focus (computing) ,Hardware and Architecture ,Order (exchange) ,Computer science ,Perspective (graphical) ,Social media ,Data science ,Social network analysis ,Software ,Decision tree model - Abstract
With the success of social media, social network analysis has become a prominent research topic and has attracted much attention over the last decade. Most studies focus on analyzing the whole network from the perspective of topology or content. However, no systematic model has yet been proposed for multi-dimensional analysis of big social media data, and little work has been done on identifying emerging popular phrases and analyzing them multi-dimensionally. In this paper, the authors first propose an interactive systematic framework. To detect emerging popular phrases effectively and efficiently, they present an N-Pat Tree model together with several filtering mechanisms. They also propose an algorithm to find and analyze new popular phrases multi-dimensionally. Experiments on one year of Tencent-Microblogs data have demonstrated the effectiveness of their work and produced many meaningful results.
- Published
- 2015
46. Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection
- Author
-
Ling Gao, Jinye Peng, Kuizhi Mei, Ji Zhang, and Jianping Fan
- Subjects
Incremental decision tree ,Contextual image classification ,Computer science ,business.industry ,Stability (learning theory) ,ID3 algorithm ,Pattern recognition ,Semi-supervised learning ,Machine learning ,computer.software_genre ,Generalization error ,Tree structure ,Discriminative model ,Artificial Intelligence ,Signal Processing ,Incremental learning ,Leverage (statistics) ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Software - Abstract
In this paper, a cost-sensitive learning algorithm is developed to train hierarchical tree classifiers for large-scale image classification applications (i.e., categorizing large numbers of images into thousands of object classes). A visual tree is first constructed to organize large numbers of object classes hierarchically and to identify inter-related learning tasks automatically. The fine-grained object classes at the sibling leaf nodes share significant common visual properties but still contain subtle visual differences, so a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly and enhance their discrimination power. For the coarse-grained categories (i.e., groups of visually similar object classes) at the sibling non-leaf nodes, a hierarchical learning algorithm is developed to leverage the tree structure (by adding two inter-level constraints) to train their inter-related classifiers jointly and control inter-level error propagation effectively. To achieve more robust detection of large numbers of object classes, a visual forest is learned by combining multiple visual trees (for different configurations) and their hierarchical tree classifiers. By penalizing different types of misclassification errors differently, a cost-sensitive learning approach is further developed to detect the appearance of new object classes accurately, and an incremental learning algorithm is developed to achieve more effective training of the discriminative classifiers for new object classes. Our experimental results have demonstrated that our cost-sensitive hierarchical learning algorithm can achieve very competitive results on both classification accuracy and computational efficiency as compared with other state-of-the-art techniques.
Highlights:
- Visual tree to organize large-scale object classes hierarchically and determine inter-related learning tasks automatically.
- Multi-task structural learning for joint classifier training to enhance their discrimination power significantly.
- Hierarchical learning to leverage inter-level constraints for classifier training and limiting inter-level error propagation.
- Task and tree parallelism to scale up our hierarchical learning algorithm for large-scale image classification.
- Cost-sensitive learning and incremental learning for training and detecting new object classes more effectively.
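The cost-sensitive decision rule behind this kind of training can be sketched as follows (the cost values are hypothetical; the point is that an asymmetric cost matrix can flip the decision toward a novel category even when its posterior is lower):

```python
def min_cost_decision(probs, cost):
    """Choose the label with the lowest expected misclassification cost.
    cost[pred][true] is the penalty for predicting `pred` when the true
    class is `true`; probs maps each class to its posterior probability."""
    def expected(pred):
        return sum(cost[pred][true] * p for true, p in probs.items())
    return min(cost, key=expected)

# Hypothetical costs: missing a novel category is five times worse than a
# false alarm on a known one.
probs = {"known": 0.6, "novel": 0.4}
asymmetric = {
    "known": {"known": 0.0, "novel": 5.0},  # calling a novel image "known" is costly
    "novel": {"known": 1.0, "novel": 0.0},
}
uniform = {
    "known": {"known": 0.0, "novel": 1.0},
    "novel": {"known": 1.0, "novel": 0.0},
}
```

Under the asymmetric costs the rule predicts "novel" (expected cost 0.6 vs. 2.0), while under uniform costs it falls back to the higher-posterior "known" class.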
- Published
- 2015
47. Training more discriminative multi-class classifiers for hand detection
- Author
-
Kuizhi Mei, Guohui Li, Ji Zhang, Jianping Fan, Bao Xi, and Nanning Zheng
- Subjects
Boosting (machine learning) ,Computer science ,business.industry ,Pattern recognition ,Machine learning ,computer.software_genre ,Random subspace method ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Discriminant ,Artificial Intelligence ,Signal Processing ,Classifier (linguistics) ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Software - Abstract
In this paper, an effective algorithm is developed to learn more discriminative multi-class classifiers for achieving more accurate hand detection. At each round of boosting, a set of shared stump classifiers with relatively low discrimination power are selected by using a "slowest error growth" discriminant, and they are further combined to generate a multi-class classifier with high discrimination power. For the learned multi-class classifier, all of its shared stump classifiers can jointly cover all the potential situations (i.e., various classes of hand postures) sufficiently and discriminate each class of hand postures more effectively. In addition, multiple thresholds are set for each stump classifier to enhance its discrimination power. Finally, optional mask images are further used to reduce both the feature dimensions and the computational cost of searching for appropriate features. The experimental results on both our hand dataset and NUS hand posture dataset-II have demonstrated the effectiveness and efficiency of our algorithm.
Highlights:
- A set of shared stumps are combined to strengthen the discrimination power of weak classifiers.
- A "slowest error growth" discriminant determines the optimal combination of stumps.
- Multiple thresholds are leveraged in shared stumps to fit different classes.
- Effective features are associated with different classes of hands, and mixed-type features are employed.
- Compared with JointBoost, our classifier obtains better classification performance with less runtime cost.
- Published
- 2015
48. Automatic image–text alignment for large-scale web image indexing and retrieval
- Author
-
Ning Zhou and Jianping Fan
- Subjects
Phrase ,Information retrieval ,Computer science ,Search engine indexing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Semantics ,Set (abstract data type) ,Artificial Intelligence ,Signal Processing ,Web page ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Relevance (information retrieval) ,Computer Vision and Pattern Recognition ,Cluster analysis ,Image retrieval ,Software - Abstract
In this paper, an automatic image-text alignment algorithm is developed to achieve more effective indexing and retrieval of large-scale web images by aligning web images with their most relevant auxiliary text terms or phrases. First, a large number of cross-media web pages (which contain web images and their auxiliary texts) are crawled and segmented into a set of image-text pairs (informative web images and their associated text terms or phrases). Second, near-duplicate image clustering is used to group large-scale web images into a set of clusters of near-duplicate images according to their visual similarities. The near-duplicate web images in the same cluster share similar semantics and are simultaneously associated with the same or a similar set of auxiliary text terms or phrases that co-occur frequently in the relevant text blocks; performing near-duplicate image clustering can therefore significantly reduce the uncertainty about the relatedness between the semantics of web images and their auxiliary text terms or phrases. Finally, a random walk is performed over a phrase correlation network to achieve more precise image-text alignment by refining the relevance scores between the web images and their auxiliary text terms or phrases. Our evaluation experiments have achieved very positive results on large-scale cross-media web pages.
Highlights:
- An image-text alignment algorithm was developed for web image indexing and retrieval.
- Image clustering was used to better align the semantics of web images and text.
- A phrase-correlation network was constructed to characterize their relationship.
- Random walk was performed to achieve more precise image-text alignment.
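The random-walk refinement step can be sketched as a damped score-propagation iteration over the phrase correlation graph (a PageRank-style sketch under the assumption of row-normalized edge weights; the phrases and weights below are hypothetical):

```python
def random_walk_refine(scores, correlation, alpha=0.85, iters=50):
    """Refine initial image-phrase relevance scores by a damped random walk:
    each phrase repeatedly mixes its own initial score with the refined
    scores of its correlated neighbors. correlation[p] maps neighbor -> edge
    weight, assumed row-normalized (outgoing weights sum to at most 1)."""
    refined = dict(scores)
    for _ in range(iters):
        refined = {
            p: (1 - alpha) * scores[p]
               + alpha * sum(w * refined[q] for q, w in correlation[p].items())
            for p in scores
        }
    return refined

# Hypothetical phrase graph: "beach" and "sand" reinforce each other, while
# "bank" is uncorrelated with the image and stays at zero relevance.
scores = {"beach": 1.0, "sand": 0.0, "bank": 0.0}
correlation = {"beach": {"sand": 1.0}, "sand": {"beach": 1.0}, "bank": {}}
refined = random_walk_refine(scores, correlation)
```

After refinement, "sand" inherits relevance from its correlated neighbor "beach", while the uncorrelated "bank" remains at zero, which is the intended smoothing effect of the walk.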
- Published
- 2015
49. Hierarchical clustering algorithm for categorical data using a probabilistic rough set model
- Author
-
Shaobo Deng, Lei Wang, Shengzhong Feng, Jianping Fan, and Min Li
- Subjects
DBSCAN ,Clustering high-dimensional data ,Information Systems and Management ,Fuzzy clustering ,Computer science ,Correlation clustering ,Single-linkage clustering ,Conceptual clustering ,computer.software_genre ,Fuzzy logic ,Management Information Systems ,Biclustering ,Artificial Intelligence ,CURE data clustering algorithm ,Consensus clustering ,Cluster analysis ,Categorical variable ,k-medians clustering ,Brown clustering ,business.industry ,Constrained clustering ,Probabilistic logic ,Pattern recognition ,Hierarchical clustering ,Determining the number of clusters in a data set ,ComputingMethodologies_PATTERNRECOGNITION ,Data stream clustering ,Canopy clustering algorithm ,FLAME clustering ,Affinity propagation ,Artificial intelligence ,Rough set ,Data mining ,Hierarchical clustering of networks ,business ,Algorithm ,computer ,Software - Abstract
Several clustering analysis techniques exist to divide similar categorical-data objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of this technique, together with the concept of granularity, we derive a new clustering algorithm for categorical data, MTMDP (Maximum Total Mean Distribution Precision). The MTMDP algorithm is a robust clustering algorithm that handles uncertainty in the process of clustering categorical data. We compare the MTMDP algorithm with the MMR (Min-Min-Roughness) algorithm, the most relevant clustering algorithm, as well as with other unstable clustering algorithms such as k-modes, fuzzy k-modes, and fuzzy centroids. The experimental results indicate that the MTMDP algorithm can be successfully used to analyze grouped categorical data because it produces better clustering results.
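The rough-set machinery that attribute-selection measures such as TMDP build on can be sketched via the lower and upper approximations of a target cluster with respect to the partition induced by one categorical attribute (a generic rough-set sketch; the TMDP criterion itself is not reproduced here):

```python
def approximations(rows, attr, target):
    """Rough-set lower/upper approximations of a target set of row indices,
    with respect to the partition induced by one categorical attribute."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(row[attr], set()).add(i)  # equivalence classes
    target = set(target)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block certainly inside the target
            lower |= block
        if block & target:    # block possibly inside the target
            upper |= block
    return lower, upper

# Toy categorical data: the attribute "color" partitions rows into {0,1} and {2}.
rows = [{"color": "red"}, {"color": "red"}, {"color": "blue"}]
lower, upper = approximations(rows, "color", {0, 2})
```

The gap between the upper and lower approximations quantifies how roughly the attribute describes the target set, which is the quantity such partitioning-attribute measures score.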
- Published
- 2014
50. Jointly Learning Visually Correlated Dictionaries for Large-Scale Visual Recognition Applications
- Author
-
Ning Zhou and Jianping Fan
- Subjects
Contextual image classification ,Computer science ,business.industry ,Applied Mathematics ,Pattern recognition ,Visualization ,Tree (data structure) ,Text mining ,Computational Theory and Mathematics ,Categorization ,Discriminative model ,Artificial Intelligence ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Set (psychology) ,Representation (mathematics) ,business ,Cluster analysis ,Software - Abstract
Learning discriminative dictionaries for image content representation plays a critical role in visual recognition. In this paper, we present a joint dictionary learning (JDL) algorithm which exploits inter-category visual correlations to learn more discriminative dictionaries. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. The problem of JDL is formulated as a joint optimization with a discrimination promotion term according to the Fisher discrimination criterion. A visual tree method is developed to cluster a large number of categories into a set of disjoint groups, so that each of them contains a reasonable number of visually correlated categories. The process of image category clustering helps JDL learn better dictionaries for classification by ensuring that the categories in the same group have strong visual correlations. It also makes JDL computationally affordable in large-scale applications. Three classification schemes are adopted to make full use of the dictionaries learned by JDL for visual content representation in the task of image categorization. The effectiveness of the proposed algorithms has been evaluated using two image databases containing 17 and 1,000 categories, respectively.
- Published
- 2014