1,625 results
Search Results
2. A method of fingermark anti-counterfeiting for forensic document identification
- Author
-
Yongliang Zhang, Keyi Zhu, Yufan Lv, Chenhao Gao, and Zhiwei Li
- Subjects
Computer science, Feature extraction, Pattern recognition, Convolutional neural network, Forensic identification, Fingerprint, Artificial intelligence, Signal processing, Computer vision and pattern recognition, Software - Abstract
Automatic fingerprint identification is a key technology for biometric attribute measurement, but because it is sensitive to fake-fingerprint attacks, its security has drawn wide concern. The fingermark is an important piece of evidence in forensic document identification, and the existence of fake fingermarks seriously threatens the fairness and legitimacy of the forensic identification process. In view of the security risks in fingermark identification for forensic documents, this paper introduces a database named JLW-FM-DB for detecting genuine and fake fingermarks; it consists of two sub-databases, signed and unsigned, each of which covers common fake-fingermark materials. Based on this database, the paper proposes a fingermark anti-counterfeiting method built on a convolutional neural network (CNN). A Patch-Label training strategy is proposed, which uses a unified label as the supervision signal for the class heatmap output by the last convolution layer; this imposes stronger local supervision on the input fingermark image and enhances the local coding ability of the CNN feature extraction (a sketch of the idea follows this entry). The experiments show that the methods are suitable for detecting genuine and fake fingermarks, achieving 97.999% average accuracy on the signed fingermark database, and they confirm the effectiveness of the Patch-Label training strategy. Moreover, fusing different models further improves detection, reaching 98.496% average accuracy.
- Published
- 2021
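The Patch-Label strategy described above supervises every spatial cell of the final class heatmap with the single image-level label. A minimal PyTorch sketch of that idea follows; the backbone, heatmap head, and all layer sizes are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchLabelNet(nn.Module):
    """Toy CNN whose last conv layer emits a 2-class (genuine/fake) heatmap."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(            # illustrative backbone
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, num_classes, 1)  # class heatmap

    def forward(self, x):
        return self.head(self.backbone(x))         # (N, C, H', W')

def patch_label_loss(heatmap, labels):
    """Broadcast the image-level label to every heatmap cell (Patch-Label idea)."""
    n, _, h, w = heatmap.shape
    targets = labels.view(n, 1, 1).expand(n, h, w)  # same label at each cell
    return F.cross_entropy(heatmap, targets)

# usage sketch
model = PatchLabelNet()
imgs = torch.randn(4, 1, 64, 64)                  # stand-in fingermark patches
labels = torch.tensor([0, 1, 1, 0])               # 0 = genuine, 1 = fake
patch_label_loss(model(imgs), labels).backward()
```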
3. Imbalanced image classification with complement cross entropy
- Author
-
Younkwan Lee, Moongu Jeon, and Yechan Kim
- Subjects
Computer science, Contextual image classification, Deep learning, Machine learning, Cross entropy, Softmax function, Artificial intelligence, Signal processing, Computer Vision and Pattern Recognition (cs.CV), Software - Abstract
Recently, deep learning models have achieved great success in computer vision applications by relying on large-scale, class-balanced datasets. However, imbalanced class distributions still limit the wide applicability of these models because of the resulting degradation in performance. To address this problem, this paper concentrates on cross entropy, which mostly ignores the output scores on incorrect classes. The work finds that neutralizing the predicted probabilities on incorrect classes improves prediction accuracy for imbalanced image classification, and proposes a simple but effective loss, named complement cross entropy, based on this finding. The proposed loss makes the ground-truth class overwhelm the other classes in terms of softmax probability by neutralizing the probabilities of incorrect classes, without additional training procedures (see the sketch after this entry). The loss also helps the models learn key information, especially from samples of minority classes, yielding more accurate and robust classification on imbalanced distributions. Extensive experiments on imbalanced datasets demonstrate the effectiveness of the proposed method. (8 pages; accepted to Pattern Recognition Letters, August 2021.)
- Published
- 2021
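A minimal sketch of complement cross entropy follows, assuming the commonly cited balanced formulation (cross entropy plus a complement-entropy term over incorrect classes, weighted by γ = -1); the exact normalization may differ from the paper's.

```python
import torch
import torch.nn.functional as F

def complement_cross_entropy(logits, targets, gamma: float = -1.0):
    """Cross entropy plus a negated complement-entropy term that flattens
    the softmax mass spread over the incorrect classes."""
    num_classes = logits.size(1)
    probs = F.softmax(logits, dim=1)
    true_probs = probs.gather(1, targets.unsqueeze(1))           # p_g
    # complement distribution over incorrect classes: p_j / (1 - p_g)
    comp = probs / (1.0 - true_probs + 1e-7)
    comp = comp.scatter(1, targets.unsqueeze(1), 0.0)            # zero out g
    comp_entropy = -(comp * torch.log(comp + 1e-7)).sum(dim=1)   # >= 0
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (ce + gamma * comp_entropy / (num_classes - 1)).mean()

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
complement_cross_entropy(logits, targets).backward()
```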
4. Multimodal grid features and cell pointers for scene text visual question answering
- Author
-
Dimosthenis Karatzas, Ernest Valveny, Marçal Rusiñol, Ali Furkan Biten, Lluis Gomez, Andres Mafla, and Rubèn Tito
- Subjects
Information retrieval, Computer science, Deep learning, Inference, Question answering, Artificial intelligence, Signal processing, Computer Vision and Pattern Recognition (cs.CV), Software - Abstract
This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding the scene text present in it. The proposed model is based on an attention mechanism that attends to multi-modal features conditioned on the question, allowing it to reason jointly about the textual and visual modalities in the scene. The output weights of this attention module over the grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer to the given question. Our experiments demonstrate competitive performance on two standard datasets. Furthermore, this paper provides a novel analysis of the ST-VQA dataset based on a human performance study. (Comment: this paper is under consideration at Pattern Recognition Letters.)
- Published
- 2021
5. Improving user verification in human-robot interaction from audio or image inputs through sample quality assessment
- Author
-
Modesto Castrillón-Santana, Pedro A. Marín-Reyes, Adrian Penate-Sanchez, Kevin Rosales-Santana, David Freire-Obregón, and Javier Lorenzo-Navarro
- Subjects
Audio signal, Biometrics, Computer science, Deep learning, Human-robot interaction, Robot, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
In this paper, we tackle the task of improving biometric verification in the context of Human-Robot Interaction (HRI). A robot that needs to identify a specific person in order to provide a service can do so either by image verification or, if lighting conditions are not favourable, through voice verification. Our approach takes advantage of a robot's ability to keep recovering further data until it is sure of the person's identity. The key contribution is that we select, from both the image and audio signals, the parts that give the highest confidence. For images, we use a system that looks at each person's face and selects frames in which the confidence is high, while keeping those frames separated in time to avoid using very similar facial appearances. For audio, our approach finds the parts of the signal that contain a person talking and, by segmenting the signal, avoids those in which noise is present. Once the parts of interest are found, each input is described with an independent deep learning architecture that obtains a descriptor for each kind of input (face/voice). We also present fusion methods that improve performance by combining the features from face and voice; validating results are shown for each independent input and for the fusion methods.
- Published
- 2021
6. Beyond visual semantics: Exploring the role of scene text in image understanding
- Author
-
Ernest Valveny, Gaurav Harit, Arka Ujjal Dey, and Suman K. Ghosh
- Subjects
Computer science, Vocabulary, Semantic interpretation, Text recognition, Embedding, Natural language processing, Artificial intelligence, Signal processing, Computer Vision and Pattern Recognition (cs.CV), Software - Abstract
Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use the scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues, but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment our learned text-visual semantic representation with scene text cues to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous recognition of scene text, we also apply query-based attention to our text channel. We show how the multi-channel approach, involving visual semantics and scene text, improves upon the state of the art. (Comment: the paper is under consideration at Pattern Recognition Letters.)
- Published
- 2021
7. Recognition of 3D emotional facial expression based on handcrafted and deep feature combination
- Author
-
Walid Hariri and Nadir Farah
- Subjects
Facial expression, Computer science, Deep learning, Pattern recognition, Covariance, Convolutional neural network, Support vector machine, Radial basis function kernel, Artificial intelligence, Signal processing, Software - Abstract
Facial emotion recognition (FER) methods have mainly been proposed using 2D images. These methods suffer from many problems caused by the difficult conditions of unconstrained environments, such as lighting conditions and view variations. In this paper, we aim to recognize emotional facial expressions independently of identity using 3D data and 2D depth images. Since 3D FER is a very fine-grained recognition task, mapping the 3D images into 2D depth images may lose some geometric characteristics of the expressive face and degrade FER performance. Convolutional Neural Networks (CNN), however, have been successfully applied to 2D depth images and have improved upon handcrafted methods in computer vision and pattern recognition applications. For this reason, we combine two types of features, handcrafted and deep, and demonstrate their complementarity for 3D FER. Covariance descriptors have proven very effective at combining features of different types into a compact representation; we therefore use covariance matrices of the features (handcrafted and deep) instead of the features independently. Since covariance matrices lie on the manifold formed by SPD (Symmetric Positive Definite) matrices, we mainly focus on generalizing the RBF kernel to this manifold for 3D FER using supervised SVM classification (see the kernel sketch after this entry). The performance achieved by the proposed method on the Bosphorus and BU-3DFE datasets outperforms comparable state-of-the-art methods.
- Published
- 2021
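One standard way to build an RBF kernel on SPD covariance matrices is the log-Euclidean distance, k(X, Y) = exp(-γ‖log(X) - log(Y)‖²_F); whether the paper uses exactly this metric is an assumption here. A sketch with scikit-learn's precomputed-kernel SVM:

```python
import numpy as np
from scipy.linalg import logm
from sklearn.svm import SVC

def covariance_descriptor(features, eps=1e-6):
    """Covariance of per-sample feature vectors, regularized to stay SPD."""
    c = np.cov(features, rowvar=False)
    return c + eps * np.eye(c.shape[0])

def log_euclidean_rbf(spd_mats, gamma=0.1):
    """Gram matrix of exp(-gamma * ||logm(X) - logm(Y)||_F^2)."""
    logs = [logm(m).real for m in spd_mats]
    n = len(logs)
    gram = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            d2 = np.linalg.norm(logs[i] - logs[j], "fro") ** 2
            gram[i, j] = gram[j, i] = np.exp(-gamma * d2)
    return gram

# usage sketch with random stand-ins for stacked handcrafted + deep features
rng = np.random.default_rng(0)
descs = [covariance_descriptor(rng.normal(size=(50, 8))) for _ in range(20)]
y = rng.integers(0, 2, size=20)
svm = SVC(kernel="precomputed").fit(log_euclidean_rbf(descs), y)
```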
8. Building crack identification and total quality management method based on deep learning
- Author
-
Xinhua Wu and Xiujie Liu
- Subjects
Contextual image classification, Computer science, Deep learning, Image processing, Image segmentation, Convolutional neural network, Artificial intelligence, Signal processing, Software - Abstract
The existence of cracks affects the stability of a building, so it is very important to identify and deal with cracks in time to ensure the building's safety and stability. Against this background, this paper studies a building crack recognition and total quality management method based on deep learning. The paper focuses on computer vision technology within artificial intelligence, studies image classification and semantic segmentation algorithms based on deep learning, and applies them to building crack image analysis. A deep convolutional neural network is used to design a building crack classification model and a segmentation model, realizing the identification and analysis of building cracks, and a building crack analysis system is built that can significantly improve the efficiency of crack detection. Then, based on image processing technology, quantitative analysis of the crack segmentation results is carried out: through basic morphological methods such as erosion, dilation, and opening and closing operations, the segmentation mark map, skeleton map, and geometric parameter information of the crack are obtained, which further provides a basis for maintenance and judgment by professional engineers (a morphology sketch follows this entry). The experimental results show that, compared with FCN, the accuracy of RFCN-A is improved by 5.98%, the precision by 6.07%, and the recall and F-score by 3.11% and 6.01%, respectively.
- Published
- 2021
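The morphological post-processing described above (erosion, dilation, opening, closing) maps directly onto standard OpenCV calls; the kernel size, file name, and the crack-width estimate below are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

# binary crack mask, e.g. from a segmentation model (255 = crack pixel)
mask = cv2.imread("crack_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
kernel = np.ones((3, 3), np.uint8)

opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckles
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel) # bridge small gaps
eroded = cv2.erode(closed, kernel)
dilated = cv2.dilate(closed, kernel)

# simple geometric parameters from the cleaned mask
area = int(np.count_nonzero(closed))
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
perimeter = sum(cv2.arcLength(c, True) for c in contours)
mean_width = 2.0 * area / perimeter if perimeter else 0.0  # rough estimate
print(f"area={area}px, perimeter={perimeter:.1f}px, width~{mean_width:.2f}px")
```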
9. Large group activity security risk assessment and risk early warning based on random forest algorithm
- Author
-
Yanyu Chen, Yimiao Huang, Wenbo Li, and Wenzhe Zheng
- Subjects
Risk analysis, Warning system, Computer science, Deep learning, Machine learning, Random forest, Risk index, Risk assessment, Artificial intelligence, Signal processing, Software - Abstract
With the continuous development of artificial intelligence, machine learning, the essential route to artificial intelligence, is also constantly improving, and deep learning is one of its branches. The purpose of this paper is to evaluate and provide early warning of the security risk of large-scale group activities based on the random forest algorithm. The paper computes the importance that the random forest algorithm assigns to each variable, applies a weighting formula to the security risk indices, and combines a model-parameter optimization experiment with a random forest training experiment for the risk analysis (a sketch of the variable-importance step follows this entry). The classification accuracy reaches a maximum of 0.86, which leads to the conclusion that the random forest algorithm has good predictive ability for the risk assessment of large-scale group activities. The paper takes an international youth environmental protection festival as an example for analysis, verifying the feasibility and effectiveness of the approach.
- Published
- 2021
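Variable importance in a random forest, and its normalization into index weights, can be obtained directly from scikit-learn; the indicators and data below are made-up placeholders, not the paper's features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# hypothetical risk indicators for past events: crowd size, venue capacity
# ratio, weather score, security staffing level, ...
X = rng.normal(size=(200, 6))
y = rng.integers(0, 2, size=200)           # 1 = incident occurred

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

importance = rf.feature_importances_       # impurity-based importance
weights = importance / importance.sum()    # normalize into index weights
for i, w in enumerate(weights):
    print(f"risk index {i}: weight {w:.3f}")
```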
10. Violence detection and face recognition based on deep learning
- Author
-
En Fan, Peng Wang, and Pin Wang
- Subjects
Computer science, Deep learning, Pattern recognition, Facial recognition system, Convolutional neural network, Violence detection, Artificial intelligence, Signal processing, Software - Abstract
With the emergence of the concept of a "safe city", security construction has gradually been valued by various cities, and video surveillance technology has been continuously developed and applied. However, as the functional requirements of practical applications become more diverse, video surveillance systems also need to be more intelligent. The purpose of this article is to study methods of violence detection and face recognition based on deep learning. Aiming at the problem of abnormal behavior detection, and especially the low efficiency and low accuracy of violence detection, a violence detection method based on the combination of a convolutional neural network and trajectories is proposed. This method uses handcrafted features and deep features, extracting the spatiotemporal features of the video through a convolutional neural network and combining them with the trajectory features. To address the problem that low-resolution face images in surveillance video cannot be accurately recognized, two models are proposed: a multi-foot input CNN model and an SPP-based CNN model. Testing the proposed violence detection method shows accuracies as high as 92% and 97.6% on the Crowd and Hockey datasets, respectively. The experimental results show that the proposed method improves the accuracy of violence detection in video.
- Published
- 2021
11. Object-oriented remote sensing image information extraction method based on multi-classifier combination and deep learning algorithm
- Author
-
Jiping Hu, Qulin Tan, Bin Guo, Xiaofeng Dong, and Jun Hu
- Subjects
Object-oriented programming, Computer science, Deep learning, Active appearance model, Information extraction, Tree structure, Classifier, Remote sensing, Artificial intelligence, Signal processing, Software - Abstract
In recent years, high-spatial-resolution remote sensing technology has made significant progress, and high-resolution remote sensing satellites provide great convenience for high-quality image acquisition. In order to adapt to changes in the appearance of a target, mainstream tracking algorithms often use pattern recognition methods to build a target appearance model with learning capability, and use the image frames acquired during tracking to update the appearance model. This paper mainly studies an object-oriented remote sensing image information extraction method based on multi-classifier combination and a deep learning algorithm. We use the splitting mechanism of a tree structure to retain a diverse set of appearance models and, through an ensemble learning integration strategy, collaboratively predict the target position. Comparative analysis on the OTB and VOT platforms shows that the algorithm works well when the tracking requirements are low (precision threshold greater than 20 pixels and success threshold below 0.4). The experimental results show that, compared with other advanced classification methods, the proposed method generalizes better in terms of accuracy, recall, F-measure, G-mean, and AUC.
- Published
- 2021
12. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning
- Author
-
Peng Wang, En Fan, and Pin Wang
- Subjects
Computer science, Image processing, Machine learning, Convolutional neural network, Contextual image classification, Deep learning, Support vector machine, Feedforward neural network, MNIST database, Artificial intelligence, Signal processing, Software - Abstract
Image classification is a hot research topic in today's society and an important direction in the field of image processing research. SVM is a very powerful classification model in machine learning; CNN is a type of feedforward neural network that includes convolution computation and has a deep structure, and it is one of the representative algorithms of deep learning. Taking SVM and CNN as examples, this paper compares and analyzes traditional machine learning and deep learning image classification algorithms. The study found that on the large-sample MNIST dataset, the accuracy of SVM is 0.88 and the accuracy of CNN is 0.98; on the small-sample COREL1000 dataset, the accuracy of SVM is 0.86 and the accuracy of CNN is 0.83 (a comparison sketch follows this entry). The experimental results show that traditional machine learning solves small-sample datasets better, while deep learning frameworks achieve higher recognition accuracy on large-sample datasets.
- Published
- 2021
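The SVM half of such a comparison is a few lines in scikit-learn; the sketch below uses the small built-in digits set as a stand-in for MNIST, so the numbers it prints will not match the paper's.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# load_digits is an 8x8 stand-in for MNIST, enough to show the protocol
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0)

svm = SVC(kernel="rbf", C=10.0).fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))
# the CNN side of the comparison would be trained on the identical split in a
# deep learning framework and scored with the same accuracy metric
```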
13. Gradient clustering algorithm based on deep learning aerial image detection
- Author
-
Bin Guo, Ning Liu, Xinju Li, and Xiangyu Min
- Subjects
Computer science, Deep learning, Forest management, Parallel processing, Cluster analysis, Aerial image, Artificial intelligence, Signal processing, Software - Abstract
In recent years, computer vision, and especially deep learning, has been widely used in many fields. Automatic recognition with a deep-learning-based gradient clustering algorithm for aerial image detection overcomes the limitations of manual photography: it can capture a panoramic view of a specific area from high altitude and provide a more comprehensive solution. Traditional forest resource management relies mainly on forestry personnel carrying out large numbers of field surveys; this approach not only consumes substantial manpower and material resources but also lacks real-time capability, making it difficult to deal with the various problems of forest management and causing unnecessary losses. In this regard, this paper proposes an aerial image change detection algorithm based on H-KFCM and designs experiments to verify and demonstrate its performance. We conduct a parallelization study of the deep learning gradient clustering algorithm for aerial image processing, using CUDA (Compute Unified Device Architecture) for large-scale parallel processing of aerial data, which greatly shortens the time needed to obtain results and improves the efficiency of the personnel involved. The results show that the parallelized deep learning program implemented in this paper computes faster and takes less time on high-resolution images, with a good speedup ratio compared to the CPU.
- Published
- 2021
14. Simultaneous 3D hand detection and pose estimation using single depth images
- Author
-
Siya Mi, Jianxin Wu, Yu Zhang, and Xin Geng
- Subjects
Computer science, 3D pose estimation, Pose, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
In this paper, we investigate 3D hand pose estimation using single depth images. On the one hand, accurate hand localization is a crucial factor for pose estimation; on the other hand, multi-task learning methods have achieved great success in visual recognition tasks. Therefore, we propose to simultaneously detect the hand location and estimate its 3D pose in a multi-task learning framework, using 3D region proposals that search possible hand locations in 3D space. In the experimental part, the proposed method is evaluated on several benchmark datasets and shown to be comparable to most existing 3D hand pose estimation methods.
- Published
- 2020
15. Speech emotion recognition model based on Bi-GRU and Focal Loss
- Author
-
Yi Hu, Zijiang Zhu, Junshan Li, and Weihuang Dai
- Subjects
Computer science, Deep learning, Speech recognition, Confusion matrix, Recurrent neural network, Emotion recognition, Artificial intelligence, Signal processing, Software - Abstract
To address the inconsistent sample durations and imbalanced sample categories in speech emotion corpora, this paper proposes a speech emotion recognition model based on Bi-GRU (Bidirectional Gated Recurrent Unit) and Focal Loss. The model improves on a deep CRNN (Convolutional Recurrent Neural Network): within the CRNN, Bi-GRU is used to effectively lengthen short-duration speech samples, and the Focal Loss function deals with the classification difficulties caused by the imbalance of emotional categories among the samples (a Focal Loss sketch follows this entry). Different methods are compared experimentally, with weighted average recall (WAR), unweighted average recall (UAR), and the confusion matrix (CM) as evaluation indices. The experimental results show that the proposed model improves recognition accuracy and mitigates the sample imbalance of the IEMOCAP database, and they demonstrate that the improvement in speech emotion recognition performance is not due to tuning model parameters or changing the model topology.
- Published
- 2020
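Focal Loss down-weights well-classified samples so rare emotion classes dominate the gradient; a minimal multi-class sketch follows, with γ and α set to the values popularized by the original detection paper rather than anything reported here.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0, alpha: float = 0.25):
    """FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), computed from the
    per-sample cross entropy so it works for any number of classes."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
    p_t = torch.exp(-ce)                                     # recover p_t
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(16, 4, requires_grad=True)   # 4 emotion classes
targets = torch.randint(0, 4, (16,))
focal_loss(logits, targets).backward()
```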
16. The classification of gliomas based on a Pyramid dilated convolution resnet model
- Author
-
Zhenyu Lu, Shan-Shan Lu, Shuihua Wang, Yanzhong Bai, Chun-Qiu Su, Yi Chen, Xunning Hong, and Tianming Zhan
- Subjects
Computer science, Convolutional neural network, Residual neural network, Convolution, Glioma, Pyramid (image processing), Deep learning, Magnetic resonance imaging, Pattern recognition, Artificial intelligence, Signal processing, Software - Abstract
Gliomas are characterized by high morbidity and high mortality among primary tumors. Identifying the glioma type helps radiologists make correct medical judgments and give patients a better prognosis. In order to avoid the harm a biopsy causes to patients, radiologists attempt to classify Magnetic Resonance Images (MRI) using deep learning methods. In the present paper, we propose a deep convolutional neural network, a ResNet based on pyramid dilated convolution, for glioma classification. The pyramid dilated convolution is integrated into the bottom of the ResNet to enlarge the receptive field of the original network's underlying convolutions and improve classification accuracy (a module sketch follows this entry). A clinical dataset is used to test the proposed pyramid dilated convolution ResNet model, and the experimental results demonstrate that the method can effectively improve glioma classification performance.
- Published
- 2020
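A pyramid of parallel dilated convolutions is a standard way to widen the receptive field without losing resolution; a PyTorch sketch follows, where the dilation rates and channel widths are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PyramidDilatedConv(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates, concatenated
    and fused by a 1x1 convolution, to enlarge the receptive field."""
    def __init__(self, in_ch: int, branch_ch: int = 64,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# usage sketch: drop the module in after a ResNet's last convolutional stage
feats = torch.randn(2, 512, 14, 14)        # e.g. ResNet stage-4 features
out = PyramidDilatedConv(512)(feats)       # same spatial size, wider context
```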
17. Automated detection and classification of fundus diabetic retinopathy images using synergic deep learning model
- Author
-
Deepak Gupta, Hari Mohan Pandey, Abdul Rahaman Wahab Sait, Ashish Khanna, K. Shankar, and S. K. Lakshmanaprabu
- Subjects
Computer science, Deep learning, Pattern recognition, Diabetic retinopathy, Fundus (eye), Histogram, Segmentation, Artificial intelligence, Signal processing, Software - Abstract
In recent years, the incidence of Diabetic Retinopathy (DR) has become high, affecting the eyes because of a drastic increase in the blood glucose level. Globally, almost half of the people under the age of 70 are severely affected by diabetes. In the absence of early recognition and proper medication, DR patients tend to lose their vision. Once the warning signs are tracked down, the severity level of the disease has to be assessed in order to make decisions about the appropriate further treatment. The current research paper focuses on classifying DR fundus images by severity level using a deep learning model, proposing an automated detection and classification pipeline that involves preprocessing, segmentation, and classification. The method begins with a preprocessing stage in which unnecessary noise at the edges is removed. Next, histogram-based segmentation extracts the useful regions from the image. Then, a Synergic Deep Learning (SDL) model is applied to classify the DR fundus images into severity levels. The presented SDL model is evaluated on the Messidor DR dataset, and the experimental results indicate that it offers better classification than the existing models.
- Published
- 2020
18. Multiple Instance Learning with Genetic Pooling for medical data analysis
- Author
-
Kamanasish Bhattacharjee, Millie Pant, Suresh Chandra Satapathy, and Yudong Zhang
- Subjects
Computer science, Pooling, Initialization, Machine learning, Genetic algorithm, Artificial neural network, Supervised learning, Artificial intelligence, Signal processing, Software - Abstract
Multiple Instance Learning is a weakly supervised learning technique particularly well suited to medical data analysis, where class labels are often not available at the desired granularity. Multiple Instance Learning through deep neural networks is a relatively new paradigm in machine learning, and its most important component is a trainable pooling function that determines the instance-to-bag relationship. In this paper, we propose a Multiple Instance pooling technique based on a Genetic Algorithm, called Genetic Pooling, in which the instance labels inside a bag are optimized by minimizing bag-level losses. The main contribution of the paper is that the bag-level pooling layer that generates attention weights for bag instances is replaced by random initialization of the attention weights, which are then optimized with a Genetic Algorithm (a GA sketch follows this entry).
- Published
- 2020
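A toy version of evolving bag-level attention weights with a genetic algorithm might look like the following; the encoding, fitness, and operators are generic GA choices made up for illustration, not necessarily the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_loss(weights, instance_scores, bag_label):
    """Bag-level loss: attention-weighted mean of instance scores vs. label."""
    w = np.exp(weights) / np.exp(weights).sum()      # softmax attention
    pred = float(w @ instance_scores)                # bag prediction in [0, 1]
    return (pred - bag_label) ** 2

def genetic_pooling(instance_scores, bag_label, pop=30, gens=50, sigma=0.3):
    """Evolve attention-weight vectors by selection, crossover and mutation."""
    n = len(instance_scores)
    population = rng.normal(size=(pop, n))           # random initialization
    for _ in range(gens):
        fitness = np.array([bag_loss(ind, instance_scores, bag_label)
                            for ind in population])
        parents = population[np.argsort(fitness)[: pop // 2]]  # selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n) < 0.5               # uniform crossover
            children.append(np.where(mask, a, b)
                            + rng.normal(scale=sigma, size=n))  # mutation
        population = np.vstack([parents, children])
    best = min(population,
               key=lambda ind: bag_loss(ind, instance_scores, bag_label))
    return np.exp(best) / np.exp(best).sum()         # final attention weights

scores = rng.random(12)          # hypothetical per-instance positivity scores
print(genetic_pooling(scores, bag_label=1.0))
```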
19. Self-paced Learning for K-means Clustering Algorithm
- Author
-
Cong Lei, Guoqiu Wen, Wei Zheng, Jiangzhang Gan, and Hao Yu
- Subjects
Computer science, Generalization, k-means clustering, Pattern recognition, Outlier, Cluster analysis, Artificial intelligence, Signal processing, Software - Abstract
The traditional K-means clustering algorithm is easily affected by noise and outliers and is prone to falling into local optima. This paper proposes a K-means clustering algorithm based on self-paced learning. First, a best training subset is selected to construct the initial cluster model based on self-paced learning theory; then the generalization ability of the initial clustering model is enhanced by adding the next-best subsets of samples one by one, until the model performance is optimal or all training samples are used up (a curriculum sketch follows this entry). Analysis of the experimental results shows that the proposed clustering algorithm achieves better performance than the compared algorithms on five real datasets.
- Published
- 2020
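A toy easy-to-hard curriculum for K-means follows, assuming "easiness" is measured by distance to the current centroids; the paper's self-paced regularizer is more principled than this heuristic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# "easiness" proxy: distance to the nearest current centroid
order = np.argsort(km.transform(X).min(axis=1))

for frac in np.linspace(0.2, 1.0, 5):               # easy -> hard curriculum
    subset = X[order[: int(frac * len(X))]]
    km = KMeans(n_clusters=3, init=km.cluster_centers_, n_init=1).fit(subset)
    order = np.argsort(km.transform(X).min(axis=1))  # re-rank all samples

labels = km.predict(X)
```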
20. Sample reduction using farthest boundary point estimation (FBPE) for support vector data description (SVDD)
- Author
-
Shamshe Alam, Sonali Agarwal, Sanjay Kumar Sonbhadra, Muhammad Tanveer, and P. Nagabhushan
- Subjects
Training set, Computer science, Boundary, Pattern recognition, Support vector machine, Credit card, MNIST database, Artificial intelligence, Signal processing, Software - Abstract
The objective of this paper is to design an algorithm that maximizes the learning ability and knowledge about the target class while minimizing the number of training samples for support vector data description (SVDD). With this motivation, a novel training-sample reduction algorithm is proposed that selects the most promising boundary data points as the training set. The proposed approach uses the local geometry of the distribution to estimate the farthest boundary points (also known as extreme points). The legitimacy of the proposed algorithm is verified via experiments on the MNIST, Iris, UCI default credit card, svmguide, and Indian Pines datasets.
- Published
- 2020
21. Multi-label chest X-ray image classification via category-wise residual attention learning
- Author
-
Qingji Guan and Yaping Huang
- Subjects
Computer science, Residual, Convolutional neural network, Contextual image classification, Pattern recognition, Embedding, Artificial intelligence, Signal processing, Software - Abstract
This paper considers the problem of multi-label thorax disease classification on chest X-ray images, where identifying one or more pathologies from an image is often hindered by pathologies unrelated to the targets. We address this problem by proposing a category-wise residual attention learning (CRAL) framework, which predicts the presence of multiple pathologies in a class-specific attentive view. It aims to suppress the obstacles posed by irrelevant classes by giving small weights to the corresponding feature representations, while relevant features are strengthened with larger weights (see the attention sketch after this entry). Specifically, the proposed framework consists of two modules: a feature embedding module and an attention learning module. The feature embedding module learns high-level features with a convolutional neural network (CNN), while the attention learning module explores the weight-assignment scheme for the different categories; the attention module can be flexibly integrated into any feature embedding network with end-to-end training. Comprehensive experiments on the ChestX-ray14 dataset show that CRAL yields an average AUC score of 0.816, a new state of the art.
- Published
- 2020
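One generic reading of category-wise residual attention is a per-class spatial map that re-weights the shared feature map before that class's classifier; the residual "1 + a" form and all sizes below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CategoryWiseAttention(nn.Module):
    """For each of K categories, learn a spatial attention map that re-weights
    the CNN feature map before that category's classifier (residual form)."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.att = nn.Conv2d(channels, num_classes, 1)      # one map per class
        self.cls = nn.ModuleList(
            [nn.Linear(channels, 1) for _ in range(num_classes)])

    def forward(self, feats):                   # feats: (N, C, H, W)
        maps = torch.sigmoid(self.att(feats))   # (N, K, H, W) in [0, 1]
        logits = []
        for k, head in enumerate(self.cls):
            a = maps[:, k : k + 1]                          # (N, 1, H, W)
            attended = feats * (1.0 + a)                    # residual attention
            pooled = attended.mean(dim=(2, 3))              # (N, C)
            logits.append(head(pooled))
        return torch.cat(logits, dim=1)         # multi-label logits (N, K)

feats = torch.randn(2, 512, 7, 7)               # backbone features
logits = CategoryWiseAttention(512, 14)(feats)  # 14 thorax pathologies
```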
22. Semi-supervised cross-modal common representation learning with vector-valued manifold regularization
- Author
-
Ting Wang, Gang Dai, and Hong Zhang
- Subjects
Computer science, Feature vector, Hilbert space, Pattern recognition, Kernel method, Manifold regularization, Feature learning, Reproducing kernel Hilbert space, Artificial intelligence, Signal processing, Software - Abstract
While cross-media data such as text, image, audio, video, and 3D models has become the main form of big data, there is still a dearth of research on cross-media retrieval. In this paper, we focus on learning a common representation of heterogeneous data, which is a key challenge for cross-media retrieval. Most existing approaches linearly project the original low-level features into a joint feature space for isomorphic data representation; however, a linear projection cannot capture the most complex cross-modal correlations, which are highly nonlinear. We therefore propose a novel feature learning algorithm, semi-supervised cross-modal vector-valued manifold regularization (SCVM), to learn a common representation of heterogeneous data. SCVM jointly explores low-level feature correlation and semantic information in a unified framework: based on manifold regularization, we learn cross-media features from vector-valued reproducing kernel Hilbert spaces (RKHS) by kernel transformation on both labeled and unlabeled samples, and we impose smoothness constraints on the possible solutions to improve retrieval accuracy (the standard objective is recalled after this entry). Comprehensive experimental results on two public datasets show the superior performance of our SCVM compared with the current state-of-the-art approaches; the method is also more robust and stable when extended from two media types to five, which is very attractive in practical applications.
- Published
- 2020
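For reference, the scalar manifold regularization objective that SCVM generalizes to the vector-valued case has the standard form below (Belkin et al.); the vector-valued extension replaces the scalar RKHS norm with one induced by an operator-valued kernel.

```latex
\min_{f \in \mathcal{H}_K} \;
  \frac{1}{l} \sum_{i=1}^{l} V\bigl(x_i, y_i, f\bigr)
  \;+\; \gamma_A \, \lVert f \rVert_K^2
  \;+\; \gamma_I \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f} = \bigl(f(x_1), \dots, f(x_{l+u})\bigr)^{\top}
```

Here V is the loss on the l labeled samples, γ_A controls ambient smoothness via the RKHS norm, and γ_I weights the intrinsic smoothness term built from the graph Laplacian L over all l + u labeled and unlabeled samples.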
23. Learning a strong detector for action localization in videos
- Author
-
Bernard Ghanem, Yancheng Bai, Mingli Ding, Yongqiang Zhang, and Dandan Liu
- Subjects
Computer science, Detector, Pipeline (computing), Normalization, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
We address the problem of spatio-temporal action localization in videos. Current state-of-the-art methods for this challenging task rely on an object detector to localize actors at the frame level first and then link or track the detections across time; most of them pay attention to leveraging the temporal context of videos for action detection while ignoring the importance of the object detector itself. In this paper, we demonstrate the importance of the object detector in the action localization pipeline and propose a strong object detector, based on the single shot multibox detector (SSD) framework, for better action localization in videos. Unlike SSD, we introduce an anchor refinement branch at the end of the backbone network to refine the input anchors, and we add a batch normalization layer before concatenating the intermediate feature maps at the frame level and after stacking feature maps at the clip level. The proposed strong detector makes two contributions: (1) it reduces missed target objects at the frame level; and (2) it generates deformable anchor cuboids for modeling temporally dynamic actions. Extensive experiments on UCF-Sports, J-HMDB, and UCF-101 validate our claims: we outperform the previous state-of-the-art methods by a large margin in terms of frame-mAP and video-mAP, especially at higher overlap thresholds.
- Published
- 2019
24. 3D Reconstruction system for collaborative scanning based on multiple RGB-D cameras
- Author
-
Lei Yu, Shumin Fei, Junyi Hou, and Haonan Xu
- Subjects
Computer science, 3D reconstruction, Bundle adjustment, Virtual reality, Data acquisition, RGB color model, Pose, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
Current 3D reconstruction systems easily run into insufficient memory and low efficiency when reconstructing complex large scenes, and the reconstructed models drift considerably. To solve these problems, a 3D reconstruction system for collaborative scanning based on multiple RGB-D cameras (Xtion sensors) is presented. Without running pose estimation, the system obtains initial camera poses for the image sequences from an image acquisition platform that has been calibrated in advance, which copes well with the problems of current 3D reconstruction systems. The main innovations of this paper are as follows: (1) given the large amount of human-computer interaction and the demanding data acquisition that current 3D reconstruction systems require, a collaborative scanning 3D reconstruction system based on multiple Xtion sensors (RGB-D cameras) is developed that needs no human-computer interaction and captures high-quality image sequences fully automatically; (2) given the memory shortages and low efficiency of current 3D reconstruction systems on complex large-scale scenes, the system obtains camera poses from the advance calibration without camera pose estimation; (3) for camera pose optimization, a segmented bundle adjustment method is presented to obtain high-precision camera poses. Extensive experiments demonstrate that the proposed system effectively solves the existing problems and obtains high-precision 3D models of various complex large scenes, with wide applications in human-computer interaction, virtual reality, and other fields.
- Published
- 2019
25. Ground and aerial meta-data integration for localization and reconstruction: A review
- Author
-
Zhiheng Wang, Shuhan Shen, Xiang Gao, and Zhanyi Hu
- Subjects
Computer science, Point cloud, Metadata, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
Localization and reconstruction are two highly related research areas, and both have developed rapidly in recent years. With the help of ground and aerial meta-data integration, the performance of both can go a step further: for localization, aerial meta-data provides a global reference by which a ground query can achieve absolute localization free of cumulative error; for reconstruction, a complete and detailed model can be obtained by integrating ground and aerial meta-data. Despite these advantages, the integration itself is non-trivial, because it is difficult to obtain ground-to-aerial correspondences in either 2D or 3D: (1) the differences between ground and aerial images in viewpoint, scale, illumination, etc. are notable; and (2) the discrepancies between ground and aerial point clouds in terms of point density, accuracy, noise level, etc. are very large. Many methods have been proposed recently to deal with these problems. In this paper, the methods for integrating ground and aerial meta-data are reviewed for localization and for reconstruction respectively. Although many intermediate results of high quality have been achieved, we hope that, inspired by the methods reviewed here, more thorough methods and impressive results will emerge.
- Published
- 2019
26. Biopen–Fusing password choice and biometric interaction at presentation level
- Author
-
Genoveffa Tortora, Federico Ponzi, Federico Scozzafava, and Maria De Marsico
- Subjects
Dynamic time warping, Spoofing attack, Biometrics, Biometric authentication, Computer science, Augmented pen, Dynamic writing recognition, Handwriting, Password, Passphrase, Computer vision, Artificial intelligence, Signal processing, Software - Abstract
The paper presents experiments with a home-made, low-cost prototype of a sensor-equipped pen for handwriting-based biometric authentication. The pen captures the dynamics of the user writing on normal paper while producing a kind of password (passphrase) chosen in advance. Using a word of any length instead of the user's signature makes the approach more robust to spoofing, since there is no repetitive pattern to steal; moreover, if the template is compromised, this is much less harmful than the theft of a signature. The sensors are an accelerometer-gyroscope pair and a pressure sensor. The aim is a natural yet precise interaction that recognizes the user from the signals recorded while producing a specific word chosen during enrollment (and possibly changed later). The pen can be exploited in many applications requiring user recognition, while relieving users of the need to learn complex procedures or undergo critical capture operations. The approach fuses a kind of password, though not necessarily as complex as those requested by traditional approaches, with biometric recognition. The novelty with respect to most proposals in the literature is the combination of three elements at once: matching any handwritten text instead of the user's signature, on-line capture of seven sensor signals to recognize handwriting dynamics (three from the accelerometer, three from the gyroscope, and one from the pressure sensor), and the use of normal paper instead of a digitizing tablet. The presented experiments test two different recognition techniques, implemented by two modules that can be alternately plugged into the system. An SVM-based verification module extracts the most relevant features from the writing dynamics and requires a sufficient amount of enrollment data (30 samples per user) to train an SVM for each user. A pure Dynamic Time Warping (DTW) verification module requires no such training (a DTW sketch follows this entry) and is tested with either a gallery containing the same number of templates per user as used for SVM training or a gallery containing far fewer templates per user (namely 5). The results encourage further investigation of lightweight strategies for recognizing written password dynamics.
- Published
- 2019
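DTW verification amounts to comparing a probe signal with each enrolled template and thresholding the best alignment cost. The sketch below is the textbook algorithm; the 7 channels, sequence lengths, and threshold are placeholders.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic dynamic time warping between two multivariate sequences,
    shaped (T, d) - e.g. d = 7 pen-sensor channels per time step."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(D[n, m])

def verify(probe, gallery, threshold):
    """Accept if the best DTW distance to any enrolled template is small."""
    return min(dtw_distance(probe, t) for t in gallery) < threshold

rng = np.random.default_rng(1)
gallery = [rng.normal(size=(80, 7)) for _ in range(5)]  # 5 templates per user
probe = rng.normal(size=(90, 7))
print(verify(probe, gallery, threshold=100.0))
```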
27. A biometric system based on Gabor feature extraction with SVM classifier for Finger-Knuckle-Print
- Author
-
A. Kavipriya and A. Muthukumar
- Subjects
Biometrics, Computer science, Feature extraction, Gabor filter, Knuckle, Hamming distance, Pattern recognition, Support vector machine, Artificial intelligence, Signal processing, Software - Abstract
An authentic personal identification infrastructure helps control access in order to secure data and information. Biometric technology is mainly based on the physiological or behavioural characteristics of the human body. This paper elucidates a Finger Knuckle Print (FKP) biometric system based on a feature extraction methodology using short and long Gabor features. The FKP authentication system involves all the basic processes: pre-processing, feature extraction, and classification. Feature extraction is done by Gabor filters, which extract the important features from the FKP dataset, and the Gabor features of a query FKP are matched against the enrolled template using the Hamming distance (HD); a sketch follows this entry. Finally, the paper proposes FKP recognition based on Support Vector Machines with score-level fusion to improve the recognition performance by integrating the Gabor features. The main aim is to exploit the pattern recognition and classification ability of Support Vector Machines (SVM) together with the Hamming distance, which helps improve the False Acceptance Rate (FAR) and Genuine Acceptance Rate (GAR). This new double-instance combination of FKP gives better results, 96.01% for the Max rule and 92.33% for the Min rule, than the single-instance performance of 89.11%, showing good results for Finger Knuckle Print recognition.
- Published
- 2019
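A common way to realize Gabor-plus-Hamming matching is to binarize the signs of a small Gabor filter bank's responses into a bit code; the bank parameters and image sizes below are illustrative, not the paper's.

```python
import cv2
import numpy as np

def gabor_code(img: np.ndarray, thetas=(0, np.pi / 4, np.pi / 2)) -> np.ndarray:
    """Filter the image with a small Gabor bank and binarize the responses,
    yielding a bit code that can be compared with the Hamming distance."""
    bits = []
    for theta in thetas:
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0.0)
        resp = cv2.filter2D(img.astype(np.float32), cv2.CV_32F, kern)
        bits.append(resp > 0)                   # sign of response as one bit
    return np.concatenate([b.ravel() for b in bits])

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> float:
    return float(np.mean(code_a != code_b))     # normalized to [0, 1]

rng = np.random.default_rng(2)
enrolled = gabor_code(rng.random((64, 128)))    # stand-in FKP images
query = gabor_code(rng.random((64, 128)))
print("HD:", hamming_distance(enrolled, query))
```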
28. Shape-aware label fusion for multi-atlas frameworks
- Author
-
Olof Enqvist, Johannes Ulén, Viktor Larsson, Fredrik Kahl, Matilda Landgren, and Jennifer Alvén
- Subjects
Fusion, Computer science, Brain atlas, Multi-atlas, Pattern recognition, Image segmentation, Voxel, Segmentation, Artificial intelligence, Signal processing, Software - Abstract
Despite having no explicit shape model, multi-atlas approaches to image segmentation have proved to be top performers for several diverse datasets and imaging modalities. In this paper, we show how one can directly incorporate shape regularization into the multi-atlas framework. Unlike traditional multi-atlas methods, our proposed approach does not rely on label fusion at the voxel level; instead, each registered atlas is viewed as an estimate of the position of a shape model. We evaluate and compare our method on two public benchmarks: (i) the VISCERAL Grand Challenge on multi-organ segmentation of whole-body CT images and (ii) the Hammers brain atlas of MR images for segmenting the hippocampus and the amygdala. For this wide spectrum of both easy and hard segmentation tasks, our experimental quantitative results are on par with or better than the state of the art. More importantly, we obtain qualitatively better segmentation boundaries, for instance, preserving topology and fine structures.
- Published
- 2019
29. Grayification: A meaningful grayscale conversion to improve handwritten historical documents analysis
- Author
-
Rolf Ingold, Manuel Bouillon, and Marcus Liwicki
- Subjects
Computer science, Pattern recognition, Luminance, Grayscale, Handwriting recognition, Segmentation, Historical document, Artificial intelligence, Signal processing, Software - Abstract
This paper presents an improvement to handwriting binarization techniques for colored historical documents. We introduce a novel preprocessing step into the usual document image analysis (DIA) workflow: before binarization, we propose a grayification step that enhances the input image with a new grayscale conversion algorithm, namely the grayification algorithm. This algorithm uses luminance and color information to improve the contrast between the foreground and the background. Especially on documents with non-black ink, and moreover with diverse colors, e.g., illuminations in historical manuscripts, we expect increased performance. Binarization then gives better results on this enhanced grayscale image; in particular, colored text is binarized as well as black text. In fact, by adding a preprocessing step that enhances the input grayscale image, the results on all subsequent tasks of the analysis chain should improve. This modification of the usual workflow of historical document analysis eases the binarization task as well as later tasks such as layout analysis, line segmentation, OCR, etc. We demonstrate the effects of our novel preprocessing technique on a set of challenging historical documents, which we make publicly available for research purposes, and on two publicly available datasets. The improvement is illustrated in this paper on the binarization task, where the results of four different binarization methods are successfully improved.
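The abstract does not specify the exact conversion; as a minimal sketch of the idea, assume a hypothetical blend of luminance with a crude chroma estimate, so that colored ink stays dark after conversion (not the paper's exact algorithm):

    import numpy as np

    def grayify(rgb, alpha=0.5):
        # Toy grayification (hypothetical): darken saturated pixels so colored
        # ink keeps foreground/background contrast in the grayscale image.
        rgb = rgb.astype(np.float64) / 255.0
        luminance = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
        chroma = rgb.max(axis=-1) - rgb.min(axis=-1)   # crude colorfulness estimate
        gray = luminance * (1.0 - alpha * chroma)      # colored strokes come out darker
        return (255.0 * np.clip(gray, 0.0, 1.0)).astype(np.uint8)

    page = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)  # stand-in document image
    gray = grayify(page)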
- Published
- 2019
30. A multi-feature selection approach for gender identification of handwriting based on kernel mutual information
- Author
-
Jun Tan, Ching Y. Suen, Nicola Nobile, and Ning Bi
- Subjects
Chain code ,business.industry ,Computer science ,Feature selection ,Mutual information ,Handwriting segmentation ,computer.software_genre ,Support vector machine ,Identification (information) ,Kernel method ,Artificial Intelligence ,Handwriting ,Kernel (statistics) ,Signal Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Software ,Natural language processing - Abstract
This paper presents a new, flexible approach to predicting the gender of writers from their handwriting samples. Handwriting features such as slant, curvature, line separation, chain code, character shapes, and more can be extracted by different methods. The resulting multi-feature sets therefore contain irrelevant and redundant features, and conflicts among features in the sets affect classification accuracy and computing cost. This paper proposes an approach named kernel mutual information (KMI) that focuses on feature selection. The KMI approach can reduce redundancies and conflicts; in addition, it extracts an optimal subset of features from the writing samples produced by male and female writers. To enable KMI to handle the various features, this paper also describes the handwriting segmentation and handwritten text recognition technology used. Classification is carried out with a Support Vector Machine (SVM) on two databases. The first comes from the ICDAR 2013 competition on gender prediction, which provides samples in both Arabic and English. The other is the Registration-Document-Form (RDF) database in Chinese. The proposed and compared methods were evaluated on both databases, and the results highlight the importance of feature selection for gender prediction from handwriting.
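KMI itself is kernel-based and not fully specified by the abstract; as a rough stand-in, a sketch that ranks handwriting features by plain mutual information with the gender label before SVM classification (all names and data here are illustrative):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.svm import SVC

    def select_by_mutual_info(X, y, k):
        # Rank features by mutual information with the label and keep the top k
        # (plain MI as a stand-in for the paper's kernel mutual information).
        mi = mutual_info_classif(X, y, random_state=0)
        return np.argsort(mi)[::-1][:k]

    X = np.random.rand(200, 40)        # handwriting feature vectors (toy data)
    y = np.random.randint(0, 2, 200)   # 0 = male, 1 = female (toy labels)
    keep = select_by_mutual_info(X, y, k=10)
    clf = SVC().fit(X[:, keep], y)     # SVM on the selected subset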
- Published
- 2019
31. A review of Convolutional-Neural-Network-based action recognition
- Author
-
Tao Lei, Jiandan Zhong, and Guangle Yao
- Subjects
Computer science ,business.industry ,Deep learning ,Search engine indexing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,Convolutional neural network ,Image (mathematics) ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,Core (graph theory) ,0202 electrical engineering, electronic engineering, information engineering ,Action recognition ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,010306 general physics ,business ,computer ,Software - Abstract
Video action recognition is widely applied in video indexing, intelligent surveillance, multimedia understanding, and other fields. Recently, it has been greatly improved by learning deep representations with Convolutional Neural Networks (CNNs). This motivated us to review the notable CNN-based action recognition works. Because a CNN is primarily designed to extract 2D spatial features from still images, while videos are naturally viewed as 3D spatiotemporal signals, the core issue in extending CNNs from images to video is the exploitation of temporal information. We divide the solutions for exploiting temporal information into three strategies: 1) 3D CNNs; 2) taking motion-related information as the CNN input; and 3) fusion. In this paper, we present a comprehensive review of CNN-based action recognition methods according to these strategies. We also discuss action recognition performance on recent large-scale benchmarks as well as the limitations and future research directions of CNN-based action recognition. This paper offers an objective and clear review of CNN-based action recognition and provides a guide for future research.
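Of the three strategies, the 3D-CNN one is the simplest to illustrate: a clip is treated as a 5-D tensor and convolved jointly over space and time. A minimal PyTorch sketch (the architecture is illustrative, not taken from any reviewed paper):

    import torch
    import torch.nn as nn

    # A video clip as (batch, channels, time, height, width); Conv3d slides its
    # kernel over space and time jointly (strategy 1 in the review).
    clip = torch.randn(2, 3, 16, 112, 112)
    net = nn.Sequential(
        nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        nn.Linear(64, 101),   # e.g. a 101-way action classification head
    )
    logits = net(clip)        # shape (2, 101)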
- Published
- 2019
32. Convolutional kernel networks based on a convex combination of cosine kernels
- Author
-
Mohammad Reza Mohammadnia-Qaraei, Kamaledin Ghiasi-Shirazi, and Reza Monsefi
- Subjects
Computer science ,Gaussian ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Convolutional neural network ,Convolution ,symbols.namesake ,Kernel (linear algebra) ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Convex combination ,0105 earth and related environmental sciences ,business.industry ,Deep learning ,Euclidean distance ,Kernel method ,Kernel (image processing) ,Signal Processing ,symbols ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Algorithm ,Software ,MNIST database - Abstract
Convolutional Kernel Networks (CKNs) are efficient multilayer kernel machines constructed by approximating a convolution kernel with a mapping based on Gaussian functions. In this paper, we introduce a new approximation of the same convolution kernel based on a convex combination of cosine kernels. CKNs are structurally similar to Convolutional Neural Networks (CNNs), but the convolution operation in CKNs is based on the Euclidean distance, which is not common in convolutional networks. We show that the CKN model obtained by the proposed approximation leads to the ordinary convolution operation, which is based on the inner product. From this point of view, the proposed model is a step towards bridging the gap between kernel methods and deep learning. We use two methods for learning the filters of the proposed CKN: Random Fourier Features (RFF), a randomized, data-independent method for approximating shift-invariant kernels, and a novel method based on minimizing the sum of squared errors of approximating shift-invariant kernels. Although the RFF method is much faster than the ordinary CKN, it requires a large number of random features to reach acceptable accuracy. To overcome this problem, we propose the second method, in which the filters are learned in a data-dependent fashion. We evaluate the proposed model on the visual recognition datasets MNIST, CIFAR-10, C-Cube, and FERET. Our experiments show that the proposed model surpasses ordinary CKNs in terms of accuracy. Specifically, on CIFAR-10, the accuracy of the proposed method is 1.7% higher than that of the ordinary CKN.
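The convex-combination-of-cosines view is consistent with the standard random-features identity for the Gaussian kernel (Bochner's theorem); a worked form of the approximation the abstract describes:

    k(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)
            = \mathbb{E}_{w \sim \mathcal{N}(0,\,\sigma^{-2} I)}\!\left[\cos\!\big(w^\top (x - y)\big)\right]
      \approx \sum_{i=1}^{m} \alpha_i \cos\!\big(w_i^\top (x - y)\big),
    \qquad \alpha_i \ge 0,\quad \sum_{i=1}^{m} \alpha_i = 1,

where the frequencies w_i are sampled in the RFF variant, while the data-dependent variant instead fits the combination by least squares.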
- Published
- 2018
33. Influence of emotion and cognitive demand on frame effect in crisis decision-making
- Author
-
Huizhang Shen, Xiao Han, and Xiaomin Gong
- Subjects
Value (ethics) ,Knowledge management ,Emergency management ,business.industry ,Computer science ,Process (engineering) ,Frame (networking) ,Information Dissemination ,Cognition ,02 engineering and technology ,01 natural sciences ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,Quantitative research ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,010306 general physics ,business ,Software - Abstract
As an important branch of emergency management, research on decision-making methods for network-forum crisis information dissemination has important theoretical and practical value. At present, China lacks effective quantitative research methods for such decision-making. Based on multi-attribute uncertainty decision-making, this paper presents a quantitative research method for decision-making about forum crisis information dissemination, and uses example data to evaluate a multi-attribute case involving multiple decision makers and decision-attribute weights. The results show that the proposed method can play an effective role in the decision-making process for forum crisis information dissemination.
- Published
- 2018
34. Multimodal vehicle detection: fusing 3D-LIDAR and color camera data
- Author
-
Cristiano Premebida, Paulo Peixoto, Alireza Asvadi, Luis Garrote, and Urbano Nunes
- Subjects
Modality (human–computer interaction) ,Artificial neural network ,Color image ,Computer science ,business.industry ,Deep learning ,010401 analytical chemistry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,01 natural sciences ,Convolutional neural network ,Object detection ,0104 chemical sciences ,Lidar ,Artificial Intelligence ,Minimum bounding box ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
Most current successful object detection approaches are based on a class of deep learning models called Convolutional Neural Networks (ConvNets). While most existing object detection research focuses on using ConvNets with color image data, emerging fields of application such as Autonomous Vehicles (AVs), which integrate a diverse set of sensors, require processing of multisensor and multimodal information to provide a more comprehensive understanding of the real-world environment. This paper proposes a multimodal vehicle detection system integrating data from a 3D-LIDAR and a color camera. Data from the LIDAR and camera, in the form of three modalities, are the inputs of ConvNet-based detectors that are later combined to improve vehicle detection. The modalities are: (i) an up-sampled representation of the sparse LIDAR range data called the dense Depth Map (DM), (ii) a high-resolution map from the LIDAR reflectance data, hereinafter called the Reflectance Map (RM), and (iii) the RGB image from a monocular color camera calibrated with respect to the LIDAR. Bounding Box (BB) detections in each of these modalities are jointly learned and fused by an Artificial Neural Network (ANN) late-fusion strategy to improve the detection performance of each modality. The contribution of this paper is two-fold: (1) probing and evaluating 3D-LIDAR modalities for vehicle detection (specifically the depth and reflectance map modalities), and (2) joint learning and fusion of the independent ConvNet-based vehicle detectors (one per modality) using an ANN to obtain more accurate vehicle detection. The results demonstrate that DM and RM are very promising modalities for vehicle detection, and that the proposed fusion strategy achieves higher accuracy than each modality alone at all levels of increasing difficulty (easy, moderate, hard) in the KITTI object detection dataset.
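A minimal sketch of the ANN late-fusion idea, assuming each candidate bounding box carries one confidence score per modality (the network size and inputs are illustrative, not the paper's exact design):

    import torch
    import torch.nn as nn

    # Late fusion: per-candidate detection scores from the DM, RM and RGB
    # detectors are concatenated and re-scored by a small network.
    fusion = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
    scores = torch.tensor([[0.9, 0.4, 0.8]])   # [depth map, reflectance map, RGB]
    fused = fusion(scores)                     # fused vehicle confidence in (0, 1)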
- Published
- 2018
35. Combining CNN streams of RGB-D and skeletal data for human activity recognition
- Author
-
Javed Imran, Praveen Kumar, and Pushpajit A. Khaire
- Subjects
business.industry ,Computer science ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,Convolutional neural network ,Motion (physics) ,Activity recognition ,Artificial Intelligence ,Signal Processing ,Softmax function ,0202 electrical engineering, electronic engineering, information engineering ,RGB color model ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Representation (mathematics) ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Inspired by the success of deep learning methods for human activity recognition based on individual vision cues, this paper presents a ConvNets-based approach to activity recognition that combines multiple vision cues. Moreover, a new method of creating skeleton images from skeleton joint sequences, representing motion information, is presented. Motion representation images, namely Motion History Images (MHI), Depth Motion Maps (DMMs), and skeleton images, are constructed from the RGB, depth, and skeletal data of an RGB-D sensor. These images are then trained separately on ConvNets, and the respective softmax scores are fused at the decision level. The combination of these distinct vision cues leads to complete utilization of the data available from an RGB-D sensor. To evaluate the effectiveness of the proposed 5-CNNs approach, we conduct experiments on three well-known and challenging RGB-D datasets: CAD-60, SBU Kinect Interaction, and UTD-MHAD. Results show that the proposed approach of combining multiple cues by decision-level fusion is competitive with other state-of-the-art methods.
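The abstract does not state the fusion rule; a sketch assuming simple averaging of the five streams' softmax scores:

    import numpy as np

    def fuse_decisions(stream_probs):
        # Average the per-stream softmax vectors and take the top class
        # (averaging is an assumption; the paper only states decision-level fusion).
        probs = np.mean(np.stack(stream_probs, axis=0), axis=0)
        return int(np.argmax(probs))

    # softmax outputs of the five ConvNet streams over, say, 10 activity classes
    streams = [np.random.dirichlet(np.ones(10)) for _ in range(5)]
    label = fuse_decisions(streams)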
- Published
- 2018
36. Distributed electricity load forecasting model mining based on hybrid gene expression programming and cloud computing
- Author
-
Changan Yuan, Liping Zhang, Song Deng, and Lechan Yang
- Subjects
Mathematical optimization ,Speedup ,Computer science ,business.industry ,020209 energy ,Crossover ,Swarm behaviour ,Cloud computing ,02 engineering and technology ,computer.software_genre ,Artificial Intelligence ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Power quality ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Electricity ,Data mining ,Gene expression programming ,business ,computer ,Software - Abstract
Load forecasting is an important part of power grid management. Accurate and timely load forecasting is of great significance for formulating economical and reasonable power allocation plans, improving the safety and economy of power grid operation, and improving power quality. In this paper, in order to find an electricity load forecasting model, we propose an electricity load forecasting function mining algorithm based on artificial fish swarm and gene expression programming (ELFFM-AFSGEP). On this basis, distributed load forecasting model mining based on hybrid gene expression programming and cloud computing (DLFMM-HGEPCloud) is proposed to solve the problem of massive electricity load forecasting. To better solve for a global electricity load forecasting model, an error-minimization crossover is introduced into DLFMM-HGEPCloud. The performance of the proposed algorithm is evaluated on a real-world dataset and compared with GEP and several published algorithms on the same dataset. Experimental results show that our algorithm has an advantage in average time consumption, average number of generations to convergence, and forecasting accuracy, and exhibits excellent parallel performance in speedup and scaleup.
- Published
- 2018
37. Deep generative video prediction
- Author
-
Chunhong Pan, Lingfeng Wang, Tingzhao Yu, Shiming Xiang, and Huxiang Gu
- Subjects
Deblurring ,Computer science ,business.industry ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,01 natural sciences ,Motion (physics) ,Discriminative model ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Color filter array ,Computer Vision and Pattern Recognition ,Artificial intelligence ,010306 general physics ,business ,Encoder ,Software - Abstract
Video prediction plays a fundamental role in video analysis and pattern recognition. However, the generated future frames are often blurred, which limits their usefulness for further analysis. To overcome this obstacle, this paper proposes a new deep generative video prediction network under the framework of generative adversarial nets. The network consists of three components: a motion encoder, a frame generator, and a frame discriminator. The motion encoder receives multiple frame differences (also known as Eulerian motion) as input and outputs a global video motion representation. The frame generator is a pseudo-reverse two-stream network that generates the future frame. The frame discriminator is a discriminative 3D convolution network that determines whether a given frame is drawn from the true future-frame distribution. The frame generator and frame discriminator are trained jointly in an adversarial manner until reaching a Nash equilibrium. Motivated by theories on color filter arrays, this paper also designs a novel cross channel color gradient (3CG) loss as a guide for deblurring. Experiments on two state-of-the-art datasets demonstrate that the proposed network is promising.
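The generator/discriminator game follows the standard GAN minimax objective, written here in generic form (the paper adds the 3CG loss on top of this adversarial term):

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
      + \mathbb{E}_{\hat{x} \sim p_G}\big[\log\big(1 - D(\hat{x})\big)\big],

where x is a true future frame, \hat{x} a generated one, and D the 3D-convolutional frame discriminator.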
- Published
- 2018
38. Back projection: An effective postprocessing method for GAN-based face sketch synthesis
- Author
-
Zha Wenjin, Xinbo Gao, Jie Li, and Nannan Wang
- Subjects
021110 strategic, defence & security studies ,business.industry ,Computer science ,0211 other engineering and technologies ,Probabilistic logic ,Pattern recognition ,02 engineering and technology ,Translation (geometry) ,Convolutional neural network ,Sketch ,Task (computing) ,Artificial Intelligence ,Face (geometry) ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Noise (video) ,business ,Software - Abstract
We consider image transformation problems in this paper, where an input face photo is transformed into a sketch, i.e., face sketch synthesis. This task plays an important role in video-surveillance-based law enforcement. Recent methods for such problems typically train feed-forward convolutional neural networks (CNNs) or graphical probabilistic models. In this paper, inspired by the recent success of generative adversarial networks (GANs) in generating images, we employ a GAN to perform this task. However, along with the fine textures generated by the GAN model, noise appears in the generated results. We propose a back projection method to reconstruct the synthesized results. Extensive experiments on public face databases illustrate the effectiveness and superiority of the proposed method compared with state-of-the-art methods. The proposed back projection strategy can be extended to other GAN-based image-to-image translation problems. The data and implementation code for this paper are available online at www.ihitworld.com/WNN/Back_Projection.zip .
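The abstract does not spell out the update rule; classical iterative back projection refines an estimate by feeding the reconstruction residual back, e.g. (a hedged, generic form; the paper's operators may differ):

    \hat{s}^{(t+1)} = \hat{s}^{(t)} + \lambda \, B\big(p - F(\hat{s}^{(t)})\big),

where \hat{s} is the synthesized sketch, p the input photo, F a forward (sketch-to-photo) mapping, B a back-projection operator, and \lambda a step size.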
- Published
- 2018
39. Predictive complex event processing based on evolving Bayesian networks
- Author
-
Yongheng Wang, Guidan Chen, and Hui Gao
- Subjects
Computer science ,Big data ,Inference ,Complex event processing ,02 engineering and technology ,Machine learning ,computer.software_genre ,Bayesian inference ,Artificial Intelligence ,0502 economics and business ,Expectation–maximization algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Intelligent transportation system ,050210 logistics & transportation ,business.industry ,Event (computing) ,05 social sciences ,Bayesian network ,Mixture model ,Variable-order Bayesian network ,Signal Processing ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Data mining ,Artificial intelligence ,business ,computer ,Wireless sensor network ,Software - Abstract
In the Big Data era, large volumes of data are continuously and rapidly generated by sensor networks, social networks, the Internet, etc. Prediction over online event streams is an important task, since users usually need to anticipate future states and take actions in advance. Many applications need online prediction models that can evolve automatically with drift in the data distribution, and algorithms that support single-pass processing of data; both still face many challenges. In this paper, the authors propose a predictive complex event processing method based on evolving Bayesian networks. The Bayesian model is designed over event type and time, with inference based on a Gaussian mixture model and the EM algorithm. When learning the structure of a Bayesian network from event streams, the method supports incremental calculation of the score metric when new data arrives or edges in the network are changed. Evolving the Bayesian network structure is supported via a hill-climbing method. The system continuously monitors the Bayesian network model and modifies it if it is found to be inappropriate for newly arriving data. The method is evaluated in the road traffic domain with both real application data and data produced by a simulated transportation system. The total percentage error is 8.12% on real data and 7.78% on simulated data, while the best result among the other methods is 11.79% on real data and 14.59% on simulated data. The experimental evaluations show that this method is effective for predictive complex event processing and that it outperforms other popular methods for traffic prediction in intelligent transportation systems.
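A bare-bones sketch of score-based hill climbing over network edges, with a toy scoring function standing in for the paper's incrementally computed score metric:

    def hill_climb(score, edges, candidates, max_steps=100):
        # Greedy structure search: apply any single edge addition or removal
        # that improves score(edges); stop when no move helps.
        best = score(edges)
        for _ in range(max_steps):
            improved = False
            for e in list(edges) + list(candidates - edges):
                trial = edges - {e} if e in edges else edges | {e}
                s = score(trial)
                if s > best:
                    best, edges, improved = s, trial, True
            if not improved:
                break
        return edges, best

    nodes = ["A", "B", "C"]
    candidates = {(u, v) for u in nodes for v in nodes if u != v}
    target = {("A", "B"), ("B", "C")}
    toy_score = lambda es: -len(es ^ target)   # toy: closeness to a "true" structure
    structure, s = hill_climb(toy_score, set(), candidates)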
- Published
- 2018
40. Weighted kappa loss function for multi-class classification of ordinal data in deep learning
- Author
-
Jordi de la Torre, Aida Valls, and Domenec Puig
- Subjects
Ordinal data ,Computer science ,Stability (learning theory) ,Linear classifier ,02 engineering and technology ,Machine learning ,computer.software_genre ,Ordinal regression ,030218 nuclear medicine & medical imaging ,Multiclass classification ,03 medical and health sciences ,0302 clinical medicine ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,business.industry ,Deep learning ,Supervised learning ,Pattern recognition ,Signal Processing ,Ordinal number ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Software ,Kappa - Abstract
Highlights: proposal of weighted kappa as a loss function for ordinal regression in deep learning; derivation of the equations required for applying first-order optimization algorithms; solution of three real-world ordinal classification problems; performance comparison of models trained with log loss and with kappa loss; stability check of the kappa loss function on different types of data.
Weighted kappa is a reference index used in many diagnosis systems to compare the agreement between different raters. This index can also be used to evaluate the performance of automatic classification methods against a gold standard given by an expert (or by the consensus of an expert group). In recent years, deep learning has gained great importance as a machine learning method. The usual loss function used in deep learning for multi-class classification is the logarithmic loss. In this paper we explore the direct use of a weighted kappa loss function for multi-class classification of ordinal data, also known as ordinal regression. Three classification problems are solved using these two loss functions. The results confirm that better classification is achieved when the model is trained by optimizing kappa instead of the logarithmic loss.
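A minimal differentiable sketch of a quadratic-weighted-kappa-style loss in PyTorch; the weighting scheme, normalization, and epsilon are assumptions rather than the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def weighted_kappa_loss(probs, target, num_classes):
        # Quadratic weights penalize predictions far from the true ordinal class;
        # minimizing the observed/expected ratio drives kappa toward 1.
        idx = torch.arange(num_classes, dtype=probs.dtype)
        w = (idx.view(-1, 1) - idx.view(1, -1)) ** 2 / (num_classes - 1) ** 2
        onehot = F.one_hot(target, num_classes).to(probs.dtype)
        observed = onehot.t() @ probs                      # soft confusion matrix
        expected = torch.outer(onehot.sum(0), probs.sum(0)) / probs.shape[0]
        return (w * observed).sum() / ((w * expected).sum() + 1e-8)

    probs = torch.softmax(torch.randn(8, 5), dim=1)
    target = torch.randint(0, 5, (8,))
    loss = weighted_kappa_loss(probs, target, num_classes=5)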
- Published
- 2018
41. A new feature selection method based on a validity index of feature subset
- Author
-
Martin Konan, Wenyong Wang, Chuan Liu, Qiang Zhao, and Xiaoming Shen
- Subjects
business.industry ,Computer science ,020206 networking & telecommunications ,Pattern recognition ,Feature selection ,02 engineering and technology ,computer.software_genre ,Support vector machine ,Feature (computer vision) ,Search algorithm ,Artificial Intelligence ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,Computer Vision and Pattern Recognition ,business ,Classifier (UML) ,computer ,Software - Abstract
The wrapper feature selection method can achieve high classification accuracy. However, the cross-validation scheme used in the wrapper method's evaluation phase is very expensive in terms of computing resources. In this paper, we propose a new statistical measure, named the LW-index, that can replace the expensive cross-validation scheme for evaluating feature subsets. A new feature selection method combining the proposed LW-index with a Sequential Forward Search algorithm (SFS-LW) is then presented. Through extensive experiments on nine UCI datasets, we show that the proposed method obtains classification accuracy similar to the wrapper method with a centroid-based classifier or a support vector machine, while its computation cost is close to that of the compared filter methods.
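A sketch of sequential forward search with a pluggable subset score; the toy score below merely stands in for the LW-index, whose definition the abstract does not give:

    import numpy as np

    def sfs(X, y, score, k):
        # Greedy sequential forward search: repeatedly add the feature that
        # most improves score(X[:, subset], y).
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < k:
            best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
            selected.append(best)
            remaining.remove(best)
        return selected

    def toy_score(Xs, y):
        # Toy stand-in (NOT the LW-index): negative mean within-class variance.
        return -np.mean([Xs[y == c].var() for c in np.unique(y)])

    X, y = np.random.rand(100, 12), np.random.randint(0, 2, 100)
    subset = sfs(X, y, toy_score, k=4)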
- Published
- 2017
42. Enhancing classification performance using attribute-oriented functionally expanded data
- Author
-
Joo Roberto Bertini Junior and Maria do Carmo Nicoletti
- Subjects
0209 industrial biotechnology ,Training set ,Artificial neural network ,Computer science ,business.industry ,Pattern recognition ,02 engineering and technology ,Extension (predicate logic) ,Machine learning ,computer.software_genre ,Support vector machine ,Search engine ,Statistical classification ,020901 industrial engineering & automation ,Artificial Intelligence ,Signal Processing ,Genetic algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Raw data ,business ,computer ,Software - Abstract
Highlights: functionally expanded data enhance the performance of various types of classifiers; attribute-oriented expansions are used instead of a single expansion for all attributes; the set of attribute-oriented expansions and corresponding expansion sizes is found using a GA; the proposed method consistently enhanced classification performance across all domains; attribute-oriented expansions give superior results to a single expansion or to the raw data.
There are many data pre-processing techniques that aim at enhancing the quality of classifiers induced by machine learning algorithms. Functional expansions (FE) are one such technique, originally proposed to aid neural-network-based classification. Despite being successfully employed, the works reported in the literature use the same functional expansion, with the same expansion size (ES), applied to every attribute describing the training data. In this paper it is argued that the FE and ES can be attribute-oriented: by choosing the most suitable FE-ES pair for each attribute, the input data representation improves and, as a consequence, learning algorithms can induce better classifiers. This paper proposes, as a pre-processing step for learning algorithms, a method that uses a genetic algorithm to search for a suitable FE-ES pair for each data attribute, aiming at producing functionally expanded training data. Experimental results using functionally expanded training sets and four classification algorithms (KNN, CART, SVM, and RBNN) confirm the hypothesis: the proposed attribute-oriented search for FE-ES pairs yields statistically significantly better results than learning from the original data or from the single best FE-ES pair applied to all attributes.
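Functional expansion in the functional-link sense passes each attribute through a set of basis functions; a sketch using a trigonometric expansion, where the per-attribute choice of expansion type and size is exactly what the GA searches over (the basis shown is illustrative):

    import numpy as np

    def trig_expand(x, size):
        # Trigonometric functional expansion of one attribute:
        # x -> [x, sin(pi x), cos(pi x), sin(2 pi x), ...] up to `size` harmonics.
        out = [x]
        for n in range(1, size + 1):
            out += [np.sin(n * np.pi * x), np.cos(n * np.pi * x)]
        return np.stack(out, axis=1)

    # Attribute-oriented expansion: each column may get its own (FE, ES) pair.
    X = np.random.rand(50, 3)
    X_expanded = np.hstack([trig_expand(X[:, j], size=2) for j in range(X.shape[1])])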
- Published
- 2017
43. Scene conditional background update for moving object detection in a moving camera
- Author
-
Kimin Yun, Jongin Lim, and Jin Young Choi
- Subjects
Scheme (programming language) ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Object detection ,Motion (physics) ,Artificial Intelligence ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Software ,ComputingMethodologies_COMPUTERGRAPHICS ,computer.programming_language - Abstract
Highlights: a moving object detection algorithm that adapts to various scene changes in a moving camera; estimation of three scene condition variables (background motion, foreground motion, and illumination changes); adaptive construction of the background model according to the scene condition variables; the method adapts to dynamic scene changes and outperforms state-of-the-art methods.
This paper proposes a moving object detection algorithm that adapts to various scene changes in a moving camera. In a moving-camera scene, both the background and the objects are moving, while the illumination level generally varies frequently. To handle these scene changes, we propose a scene-conditional background update scheme that adaptively builds the background according to how the scene changes. First, we estimate three scene condition variables, background motion, foreground motion, and illumination change, for an awareness of the scene condition. We then compensate for the camera movement and update the background model in different ways according to the scene condition. Lastly, we propose a new foreground decision method that uses a foreground likelihood map, two thresholds, and a watershed algorithm to generate a spatially connected foreground region. We validate the effectiveness of our method quantitatively and qualitatively on ten videos covering various scene conditions. The experimental results show that our method adapts to dynamic scene changes and outperforms state-of-the-art methods.
- Published
- 2017
44. A saliency-modulated just-noticeable-distortion model with non-linear saliency modulation functions
- Author
-
Hadi Hadizadeh
- Subjects
Auditory masking ,Computer science ,Image quality ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020206 networking & telecommunications ,02 engineering and technology ,Nonlinear system ,Artificial Intelligence ,Salience (neuroscience) ,Distortion ,Signal Processing ,Modulation (music) ,Human visual system model ,0202 electrical engineering, electronic engineering, information engineering ,Discrete cosine transform ,Visual attention ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Block (data storage) ,Visual saliency - Abstract
Highlights: a saliency-modulated just-noticeable-distortion (JND) model for static images; two non-linear modulation functions that elevate JND thresholds based on visual saliency; high accuracy in JND estimation together with a high distortion-hiding capacity.
It is well known that the human visual system (HVS) cannot sense small variations of visual signals below the so-called just-noticeable distortion (JND) thresholds, due to its underlying spatial/temporal masking properties. It is also known that the visual attention mechanism of the human brain can enhance or reduce visual sensitivity; in other words, visual attention has a modulatory effect on JND thresholds. Current knowledge further holds that visual attention is mainly driven by visual saliency in an automatic and involuntary manner. In this paper we present a saliency-modulated JND (SJND) model for static images in the discrete cosine transform (DCT) domain. In the proposed model, the JND thresholds of each block in a given image are elevated by two non-linear modulation functions using the visual saliency of the block. The parameters of the saliency modulation functions are obtained through an optimization framework that utilizes a state-of-the-art saliency-based objective image quality assessment method. To evaluate the proposed SJND model, two subjective experiments were conducted. The experimental results demonstrate that the proposed method achieves high accuracy in JND estimation and also provides a high distortion-hiding capacity.
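One plausible form of the modulation described in the abstract, written for block k (the functional form and parameters are assumptions; the paper fits its modulation functions by optimization):

    T_{\mathrm{SJND}}(k) = T_{\mathrm{JND}}(k) \cdot f\big(s(k)\big),
    \qquad f(s) = a + b\,(1 - s)^{c},

where s(k) \in [0, 1] is the normalized saliency of block k, so less salient blocks receive larger threshold elevation, and a, b, c > 0 are fitted parameters.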
- Published
- 2016
45. Linear discrimination dictionary learning for shape descriptors
- Author
-
Jin Xie, Meng Wang, Fan Zhu, and Yi Fang
- Subjects
K-SVD ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Linear discriminant analysis ,Discriminative model ,Artificial Intelligence ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Invariant (mathematics) ,business ,Dictionary learning ,Software - Abstract
Highlights: learning a structured dictionary with a shared sub-dictionary; based on the structured dictionary, learning a discriminative 3D shape descriptor; the learnt descriptors achieve good performance.
The complexity and variation of 3D models pose many challenges in the 3D shape retrieval area, for example, the invariant representation and retrieval of non-rigid and noisy 3D shapes. This paper proposes a supervised dictionary learning scheme called Linear Discrimination Dictionary Learning (LDDL), which learns shape representations that are insensitive to 3D shape deformations within a category while remaining distinct across categories. It can also capture the subtle differences between fine-grained 3D shapes. Specifically, category-specific dictionaries are learnt to encode the subtle visual differences among shapes of different categories, and a shared dictionary is learnt to encode patterns common to shapes of all categories; with a Linear Discriminant Analysis (LDA) constraint on the learnt descriptors, the new descriptors have small within-class scatter and large between-class scatter. Our method is efficient in training and obtains promising shape retrieval performance on representative shape benchmark datasets.
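The LDA constraint can be read as a standard Fisher-style criterion on the learnt descriptors z_i (a generic form; the paper's exact regularizer may differ):

    S_w = \sum_{c} \sum_{i \in c} (z_i - \mu_c)(z_i - \mu_c)^\top,
    \qquad
    S_b = \sum_{c} n_c\, (\mu_c - \mu)(\mu_c - \mu)^\top,

with the dictionary trained so that \operatorname{tr}(S_w) is small and \operatorname{tr}(S_b) is large, where \mu_c is the mean descriptor of class c and \mu the global mean.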
- Published
- 2016
46. Estimating accurate water levels for rivers and reservoirs by using SAR products: A multitemporal analysis
- Author
-
Fabio A. M. Cappabianco, Thiago L. M. Barreto, and Jurandy Almeida
- Subjects
Water mass ,010504 meteorology & atmospheric sciences ,Computer science ,business.industry ,0211 other engineering and technologies ,Image processing ,02 engineering and technology ,01 natural sciences ,Field (computer science) ,Flooding (computer networking) ,Artificial Intelligence ,Signal Processing ,Interferometric synthetic aperture radar ,Graph (abstract data type) ,Coherence (signal processing) ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Digital elevation model ,business ,Software ,021101 geological & geomatics engineering ,0105 earth and related environmental sciences ,Remote sensing - Abstract
Remote sensing applied to flooding and natural disasters has been the subject of many research papers, generally aiming at detecting water masses, measuring their depth, and even determining their dynamics over time. This information is important for monitoring, warning about, and preventing hazardous situations. Through image products acquired from sensors, it is possible to extract information automatically, replacing fieldwork in areas of difficult access. A well-known acquisition technique is airborne SAR/InSAR (Interferometric Synthetic Aperture Radar), which yields images of phase measurements (height), digital elevation models, signal strength (amplitude), and coherence for a given area. As new technologies are developed, it becomes possible to capture higher-resolution images, which require more sophisticated image processing tools. In this paper, we present a novel approach for estimating water levels from SAR/InSAR products. The proposed method is based on a graph framework known as the Image Foresting Transform (IFT). Here, we adapt this framework to detect the margins of rivers and reservoirs, enabling us to accurately estimate their water levels over time. A rigorous analysis with real-world data is conducted and discussed. Experimental results show that our approach reliably predicts water levels from SAR/InSAR products and provides estimates that correspond precisely to fieldwork measurements.
- Published
- 2016
47. Eye movements during scene understanding for biometric identification
- Author
-
Usman Saeed
- Subjects
Biometrics ,Computer science ,Feature vector ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Video camera ,02 engineering and technology ,law.invention ,Artificial Intelligence ,law ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Computer vision ,050207 economics ,Cluster analysis ,business.industry ,05 social sciences ,Eye movement ,medicine.anatomical_structure ,Signal Processing ,Eye tracking ,020201 artificial intelligence & image processing ,Human eye ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
The human eye is rich in physical and behavioral attributes that can be used for biometric identification. Eye movement is a behavioral attribute that can be collected non-intrusively. Usually, a task-oriented visual stimulus is presented to the subject while the eyes are tracked with a video camera; the recorded movements are then used for biometric identification. The most common visual stimuli are a moving object and free viewing. In this paper I experiment with a novel task-oriented visual stimulus: scene understanding, in which observers are instructed beforehand that they must perform a task based on the contents of the image/video to be presented. A biometric identification system has been developed based on the eye movements extracted during scene understanding. A compact and easy-to-extract feature vector based on clustering of eye movements is proposed and tested on several publicly available databases with two classification schemes. The results, with a correct identification rate of 85.72%, are quite promising. Furthermore, I also provide comparative results by implementing three commonly used feature vectors for eye movements. Highlights: eye movements during scene understanding can be used for biometric identification; identification results for the scene-understanding task are similar to those for other tasks; clustering of eye movements is an effective feature extraction method; both saccades and fixations contribute to improving the identification result.
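The abstract does not define the clustering-based feature vector; a hypothetical sketch that clusters fixation points and concatenates cluster centers with occupancy rates:

    import numpy as np
    from sklearn.cluster import KMeans

    def gaze_cluster_features(fixations, k=5):
        # Illustrative gaze descriptor: cluster the (x, y) fixation points and
        # concatenate the sorted cluster centers with their occupancy rates.
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(fixations)
        order = np.argsort(km.cluster_centers_[:, 0])
        centers = km.cluster_centers_[order]
        rates = np.bincount(km.labels_, minlength=k)[order] / len(fixations)
        return np.concatenate([centers.ravel(), rates])

    fixations = np.random.rand(120, 2)            # normalized on-screen coordinates
    feature = gaze_cluster_features(fixations)    # length 5*2 + 5 = 15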
- Published
- 2016
48. Towards demographic categorization using gaze analysis
- Author
-
Harry Wechsler, Marco Porta, Michele Nappi, Chiara Galdi, and Virginio Cantoni
- Subjects
Biometrics ,Computer science ,Adaboost ,SVM ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Gender and age categorization ,050801 communication & media studies ,02 engineering and technology ,Machine learning ,computer.software_genre ,0508 media and communications ,Artificial Intelligence ,Gaze analysis ,0202 electrical engineering, electronic engineering, information engineering ,GANT ,AdaBoost ,business.industry ,K-fold cross validation ,Software ,1707 ,Signal Processing ,05 social sciences ,Gaze ,Support vector machine ,Categorization ,Eye tracking ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer - Abstract
Highlights: we studied the applicability of gaze analysis to gender and age categorization; we employed a publicly available gaze recording database consisting of the scan paths of 112 different observers; the stimuli employed in the gaze recording process are female and male face pictures; we used one protocol, K-fold cross validation, and two learning methods, namely AdaBoost and support vector machines.
Current use of gaze analysis, mostly restricted to eye gaze tracking as an augmentative and alternative communication (AAC) medium, can benefit people afflicted with amyotrophic lateral sclerosis (ALS). This paper advances the use of gaze analysis for biometric purposes related to gender and age demographics, to benefit applications in retail spaces (targeted advertising), behavioral biometrics for health care, and surveillance. Towards that end, this paper expands the recently introduced Gaze ANalysis Technique (GANT) for human identification to combine the length of time spent observing patterns of interest and the scanning patterns into a biometric representation, with AdaBoost and support vector machines (SVM) subsequently used for biometric categorization. The experiments conducted show that, while the initial results are promising, further innovation and development are required to make gaze analysis a viable alternative for demographic categorization, on its own or together with other biometrics. Further performance improvements are expected from the derivation, extraction, and use of alternative and novel gaze-driven features. These will include, among others, information already available about the arc features connecting the fixation points and the dynamics of the roving gaze that they encode.
- Published
- 2016
49. Iris liveness detection using regional features
- Author
-
Konstantinos Sirlantzis, Gareth Howells, and Yang Hu
- Subjects
Computer science ,business.industry ,Liveness ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Measure (mathematics) ,Convolution ,Distribution (mathematics) ,Kernel (image processing) ,Artificial Intelligence ,Feature (computer vision) ,Signal Processing ,Pyramid ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Pyramid (image processing) ,Artificial intelligence ,business ,Software - Abstract
Highlights: regional features capturing both low-level features and high-level feature distributions; intensity and local descriptors as low-level features; a spatial pyramid model capturing the feature distribution in regions of varying size; a relational measure expressing the feature distribution in regions of varying shape; experiments on both NIR and colour datasets.
In this paper, we exploit regional features for iris liveness detection. Regional features are designed based on the relationships among features in neighbouring regions; they essentially capture the feature distribution across neighbouring regions. We construct the regional features via two models, a spatial pyramid and a relational measure, which seek the feature distributions in regions of varying size and shape, respectively. The spatial pyramid model extracts features from coarse-to-fine grid regions, modeling a local-to-global feature distribution: the local distribution captures local feature variations, while the global distribution includes information that is more robust to translational transforms. The relational measure is based on a feature-level convolution operation defined in this paper; by varying the shape of the convolution kernel, we obtain the feature distribution in regions of different shapes. To combine the feature distribution information from regions of varying size and shape, we fuse the results of the two models at the score level. Experimental results on benchmark datasets demonstrate that the proposed method achieves improved performance compared to state-of-the-art features.
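A minimal sketch of the spatial-pyramid idea, assuming simple mean pooling of a scalar feature map over coarse-to-fine grids (the paper's pooling and features are richer):

    import numpy as np

    def spatial_pyramid(feature_map, levels=(1, 2, 4)):
        # Pool a 2-D feature map over coarse-to-fine grids and concatenate the
        # per-region means: 1 + 4 + 16 = 21 values for the default levels.
        h, w = feature_map.shape
        pooled = []
        for g in levels:
            for i in range(g):
                for j in range(g):
                    region = feature_map[i * h // g:(i + 1) * h // g,
                                         j * w // g:(j + 1) * w // g]
                    pooled.append(region.mean())
        return np.array(pooled)

    iris_map = np.random.rand(64, 64)        # stand-in local feature response map
    descriptor = spatial_pyramid(iris_map)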
- Published
- 2016
50. Cost-Sensitive Large margin Distribution Machine for classification of imbalanced data
- Author
-
Jing Zhang, Fanyong Cheng, and Cuihong Wen
- Subjects
Margin distribution ,Training set ,business.industry ,Computer science ,Cost sensitive ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,computer.software_genre ,Imbalanced data ,Support vector machine ,Artificial Intelligence ,Margin (machine learning) ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,business ,computer ,Classifier (UML) ,Software - Abstract
Highlights: the Large margin Distribution Machine (LDM) is not satisfactory on imbalanced training data; a cost-sensitive margin distribution is introduced to design a balanced classifier; cost-sensitive LDM (CS-LDM) has very strong generalization performance; CS-LDM can gradually improve the detection rate of the minority class; CS-LDM obtains a balanced detection rate at the balance point.
This paper proposes a new method for designing a balanced classifier on imbalanced training data based on margin distribution theory. Recently, the Large margin Distribution Machine (LDM) was put forward, obtaining superior classification performance compared with the Support Vector Machine (SVM) and many state-of-the-art methods. However, one deficiency of LDM is that, on imbalanced data, it easily leads to a lower detection rate for the minority class than for the majority class, which conflicts with the need for a high minority-class detection rate in real applications. In this paper, the Cost-Sensitive Large margin Distribution Machine (CS-LDM) is put forward to improve the detection rate of the minority class by introducing a cost-sensitive margin mean and a cost-sensitive penalty. Theoretical and experimental results show that CS-LDM gradually improves the detection rate of the minority class as the cost parameter increases, and obtains a balanced classifier when the cost parameter reaches a certain value. CS-LDM is superior to several popular cost-sensitive methods and can be used in many applications.
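As an illustration of a cost-sensitive margin mean (a hedged form; the paper's exact objective may differ), weight each sample's margin by a class-dependent cost:

    \bar{\gamma}_{c} = \frac{1}{\sum_{i=1}^{n} c_i} \sum_{i=1}^{n} c_i\, y_i\, w^\top \phi(x_i),
    \qquad c_i = \begin{cases} C_{+}, & x_i \text{ in the minority class},\\
                               C_{-}, & \text{otherwise}, \end{cases}
    \quad C_{+} > C_{-},

so that maximizing \bar{\gamma}_{c} pushes the decision boundary further away from minority-class samples.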
- Published
- 2016