285 results
Search Results
252. Automatic Skin Lesion Segmentation Using Deep Fully Convolutional Networks With Jaccard Distance.
- Author
- Yuan, Yading, Chao, Ming, and Lo, Yeh-Chi
- Subjects
- *IMAGE segmentation, *ARTIFICIAL neural networks, *ARTIFICIAL intelligence, *LESION (Canon law), *CANON law
- Abstract
Automatic skin lesion segmentation in dermoscopic images is a challenging task due to the low contrast between the lesion and the surrounding skin, the irregular and fuzzy lesion borders, the existence of various artifacts, and varying image acquisition conditions. In this paper, we present a fully automatic method for skin lesion segmentation by leveraging a 19-layer deep convolutional neural network that is trained end-to-end and does not rely on prior knowledge of the data. We propose a set of strategies to ensure effective and efficient learning with limited training data. Furthermore, we design a novel loss function based on the Jaccard distance to eliminate the need for sample re-weighting, a typical procedure when cross entropy is used as the loss function for image segmentation, owing to the strong imbalance between the numbers of foreground and background pixels. We evaluated the effectiveness, efficiency, and generalization capability of the proposed framework on two publicly available databases: one from the ISBI 2016 skin lesion analysis towards melanoma detection challenge, and the other the PH2 database. Experimental results showed that the proposed method outperformed other state-of-the-art algorithms on these two databases. Our method is general and needs only minimal pre- and post-processing, which allows its adoption in a variety of medical image segmentation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
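The Jaccard-distance loss highlighted in result 252 can be sketched as a differentiable "soft" Jaccard over foreground probabilities; the exact formulation in the paper may differ, so treat this as an illustrative assumption:

```python
def soft_jaccard_loss(pred, target, eps=1e-7):
    """Soft Jaccard-distance loss: 1 - |P ∩ T| / |P ∪ T|.

    pred:   flat list of foreground probabilities in [0, 1]
    target: flat list of binary ground-truth labels {0, 1}
    """
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 - inter / (union + eps)

# A perfect prediction gives a loss near 0; predicting all
# background on a mostly-background image is still fully
# penalized because the intersection term vanishes.
perfect = soft_jaccard_loss([0, 0, 1, 1], [0, 0, 1, 1])
all_bg = soft_jaccard_loss([0, 0, 0, 0], [0, 0, 1, 1])
```

Because the loss is a ratio of overlap to union, adding more background pixels does not change it, which is why no sample re-weighting is needed, unlike per-pixel cross entropy.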
253. Robust Large Margin Deep Neural Networks.
- Author
- Sokolic, Jure, Rodrigues, Miguel R. D., Giryes, Raja, and Sapiro, Guillermo
- Subjects
- *DEEP learning, *ARTIFICIAL neural networks, *GENERALIZATION, *JACOBIAN matrices, *ROBUST control, *INVERSE scattering transform
- Abstract
The generalization error of deep neural networks via their classification margin is studied in this paper. Our approach is based on the Jacobian matrix of a deep neural network and can be applied to networks with arbitrary nonlinearities and pooling layers, and to networks with different architectures such as feed forward networks and residual networks. Our analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well. This is a significant improvement over the current bounds in the literature, which imply that the generalization error grows with either the width or the depth of the network. Moreover, it shows that the recently proposed batch normalization and weight normalization reparametrizations enjoy good generalization properties, and leads to a novel network regularizer based on the network's Jacobian matrix. The analysis is supported with experimental results on the MNIST, CIFAR-10, LaRED, and ImageNet datasets. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
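Result 253's key quantity, the spectral norm of the network's Jacobian near a training sample, can be checked numerically. The sketch below (NumPy assumed; the tiny network and the finite-difference estimate are illustrative, not the paper's analysis) estimates it for a two-layer ReLU net:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

def net(x):
    # Tiny feed-forward net: linear -> ReLU -> linear.
    return W2 @ np.maximum(W1 @ x, 0.0)

def jacobian_spectral_norm(x, h=1e-6):
    """Largest singular value of the network Jacobian at x,
    estimated by central finite differences."""
    J = np.empty((net(x).size, x.size))
    for i in range(x.size):
        e = np.zeros(x.size); e[i] = h
        J[:, i] = (net(x + e) - net(x - e)) / (2 * h)
    return np.linalg.norm(J, 2)  # spectral norm

x = rng.normal(size=3)
print(jacobian_spectral_norm(x))
```

For a fixed ReLU activation pattern the Jacobian is W2 · diag(mask) · W1, so its spectral norm is bounded by the product of the layer norms, matching the intuition that controlling it near the training samples controls the margin-based generalization bound.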
254. Tiny hand gesture recognition without localization via a deep convolutional network.
- Author
- Bao, Peijun, Maqueda, Ana I., del-Blanco, Carlos R., and García, Narciso
- Subjects
- *ARTIFICIAL neural networks, *DETECTORS, *IMAGE processing, *ENGINEERING instruments, *ROBOTICS
- Abstract
Visual hand-gesture recognition is increasingly desired for human-computer interaction interfaces. In many applications, the hands occupy only about 10% of the image, while most of it contains the background, the user's face, and the user's body. Spatial localization of the hands in such scenarios can be a challenging task, and ground-truth bounding boxes need to be provided for training, which are usually not available. However, the location of the hand is not a requirement when the criterion is simply the recognition of a gesture to command a consumer electronics device, such as mobile phones and TVs. In this paper, a deep convolutional neural network is proposed to directly classify hand gestures in images without any segmentation or detection stage that would discard the irrelevant non-hand areas. The designed hand-gesture recognition network can classify seven kinds of hand gestures in a user-independent manner and in real time, achieving an accuracy of 97.1% on a dataset with simple backgrounds and 85.3% on a dataset with complex backgrounds. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
255. Deep Recurrent Neural Networks for Hyperspectral Image Classification.
- Author
- Mou, Lichao, Ghamisi, Pedram, and Zhu, Xiao Xiang
- Subjects
- *ARTIFICIAL neural networks, *HYPERSPECTRAL imaging systems, *MACHINE learning, *IMAGE reconstruction algorithms, *PIXELS
- Abstract
In recent years, vector-based machine learning algorithms, such as random forests, support vector machines, and 1-D convolutional neural networks, have shown promising results in hyperspectral image classification. Such methodologies, nevertheless, can lead to information loss in representing hyperspectral pixels, which intrinsically have a sequence-based data structure. A recurrent neural network (RNN), an important branch of the deep learning family, is mainly designed to handle sequential data. Can a sequence-based RNN be an effective method for hyperspectral image classification? In this paper, we propose a novel RNN model that can effectively analyze hyperspectral pixels as sequential data and then determine information categories via network reasoning. As far as we know, this is the first time that an RNN framework has been proposed for hyperspectral image classification. Specifically, our RNN makes use of a newly proposed activation function, parametric rectified tanh (PRetanh), for hyperspectral sequential data analysis instead of the popular tanh or rectified linear unit. The proposed activation function makes it possible to use fairly high learning rates without the risk of divergence during the training procedure. Moreover, a modified gated recurrent unit, which uses PRetanh for hidden representation, is adopted to construct the recurrent layer in our network to efficiently process hyperspectral data and reduce the total number of parameters. Experimental results on three airborne hyperspectral images suggest the competitive performance of the proposed model. In addition, the proposed network architecture opens a new window for future research, showcasing the huge potential of deep recurrent networks for hyperspectral data analysis. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
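The PRetanh activation of result 255 can be sketched roughly as follows. The negative-branch form here is an assumption by analogy with PReLU (a learnable slope scaling the negative side); consult the paper for the exact definition and how the coefficient is learned per channel:

```python
import math

def pretanh(x, a=0.25):
    """Parametric rectified tanh (illustrative form, assumed by
    analogy with PReLU; `a` stands in for the learnable slope).

    The output stays in (-1, 1), which helps keep activations
    bounded even under fairly high learning rates.
    """
    return math.tanh(x) if x >= 0 else a * math.tanh(x)
```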
256. Object Detection Networks on Convolutional Feature Maps.
- Author
- Ren, Shaoqing, He, Kaiming, Girshick, Ross, Zhang, Xiangyu, and Sun, Jian
- Subjects
- *OBJECT recognition (Computer vision), *MULTILAYER perceptrons, *ARTIFICIAL neural networks, *DEEP learning, *HISTOGRAMS
- Abstract
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them “Networks on Convolutional feature maps” (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier. We show by experiments that despite the effective ResNets and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in ImageNet and MS COCO challenges 2015. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
257. DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.
- Author
- Ouyang, Wanli, Zeng, Xingyu, Wang, Xiaogang, Qiu, Shi, Luo, Ping, Tian, Yonglong, Li, Hongsheng, Yang, Shuo, Wang, Zhe, Li, Hongyang, Wang, Kun, Yan, Junjie, Loy, Chen-Change, and Tang, Xiaoou
- Subjects
- *ARTIFICIAL neural networks, *DEEP learning, *COMPUTER vision, *GEOMETRIC topology, *SUPPORT vector machines
- Abstract
In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation-constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraints and penalties. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures and training strategies, and by adding and removing key components in the detection pipeline, a set of models with large diversity is obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean averaged precision obtained by RCNN [1], which was the state-of-the-art at 31 percent on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1 percent. Detailed component-wise analysis is also provided through extensive experimental evaluation, which offers a global view for understanding the deep learning object detection pipeline. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
258. Regularized Deep Belief Network for Image Attribute Detection.
- Author
- Wu, Fei, Wang, Zhuhao, Lu, Weiming, Li, Xi, Yang, Yi, Luo, Jiebo, and Zhuang, Yueting
- Subjects
- *BOLTZMANN machine, *DEEP learning, *ARTIFICIAL neural networks, *BACK propagation, *MACHINE learning
- Abstract
In general, an image attribute is a human-nameable visual property that has a semantic connotation. Appropriate modeling of the intrinsic contextual correlations among attributes plays a fundamental role in attribute detection. In this paper, we consider image attribute detection from the perspective of regularized deep learning. In particular, we propose a regularized deep belief network (rDBN) to perform the image attribute detection task, which is composed of two parts: 1) a detection DBN (dDBN) that models the joint distribution of images and their corresponding attributes, acting as an attribute detector, and 2) a contextual restricted Boltzmann machine that explicitly models the correlations among attributes, acting as a regularizer that restrains the detection output of the dDBN to meet the contextual prior of the attributes. Furthermore, we propose an efficient fine-tuning scheme that can further optimize the performance of the dDBN by backpropagation. Experimental results show that the proposed rDBN obtains improvements over the state-of-the-art methods for attribute detection on the benchmark data sets. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
259. Deep Learning Based Approach for Bearing Fault Diagnosis.
- Author
- He, Miao and He, David
- Subjects
- *BEARINGS (Machinery), *FAULT location (Engineering), *DEEP learning, *DATA analysis, *ACOUSTIC emission, *ARTIFICIAL neural networks, *FOURIER transforms
- Abstract
Bearings are among the most critical components in most electrical and power drives. Effective bearing fault diagnosis is important for keeping electrical and power drives safe and operating normally. In the age of the Internet of Things and Industry 4.0, massive real-time data are collected from bearing health monitoring systems. Mechanical big data have the characteristics of large volume, diversity, and high velocity. There are two major problems in using the existing methods for bearing fault diagnosis with big data: features are extracted manually, relying on substantial prior knowledge of signal processing techniques and diagnostic expertise, and the models used have shallow architectures, limiting their capability in fault diagnosis. Effectively mining features from big data and accurately identifying bearing health conditions with new advanced methods have become new issues. This paper presents a deep learning-based approach for bearing fault diagnosis. The presented approach preprocesses sensor signals using the short-time Fourier transform (STFT). Based on a simple spectrum matrix obtained by STFT, an optimized deep learning structure, the large memory storage retrieval (LAMSTAR) neural network, is built to diagnose the bearing faults. Acoustic emission signals acquired from a bearing test rig are used to validate the presented method. The validation results show accurate classification performance on various bearing faults under different working conditions. The performance of the presented method is also compared with other effective bearing fault diagnosis methods reported in the literature. The comparison results show that the presented method gives much better diagnostic performance, even at relatively low rotating speeds. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
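The STFT preprocessing step in result 259, turning a raw vibration or acoustic-emission signal into a "spectrum matrix," can be sketched as follows (NumPy assumed; the frame length and hop are illustrative placeholders, not the paper's settings):

```python
import numpy as np

def stft_matrix(signal, frame_len=64, hop=32):
    """Spectrum matrix via a short-time Fourier transform:
    Hann-windowed frames taken along the time axis, with the
    magnitude spectrum of each frame as one column."""
    window = np.hanning(frame_len)
    frames = [signal[s:s + frame_len] * window
              for s in range(0, len(signal) - frame_len + 1, hop)]
    # rfft keeps the non-redundant half of the spectrum.
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

# A pure tone concentrates its energy in a single frequency bin,
# producing one bright row in the resulting matrix.
t = np.arange(1024) / 1024.0
spec = stft_matrix(np.sin(2 * np.pi * 50 * t))
print(spec.shape)  # (frame_len // 2 + 1, n_frames) = (33, 31)
```

A matrix like this is what a 2-D network such as LAMSTAR (or a CNN) can then consume directly, instead of hand-crafted spectral features.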
260. A Convolutional Neural Network for Fault Classification and Diagnosis in Semiconductor Manufacturing Processes.
- Author
- Lee, Ki Bum, Cheon, Sejune, and Kim, Chang Ouk
- Subjects
- *SEMICONDUCTOR manufacturing, *FAULT tolerance (Engineering), *ARTIFICIAL neural networks, *MANUFACTURING processes, *FEATURE extraction
- Abstract
Many studies on the prediction of manufacturing results using sensor signals have been conducted in the field of fault detection and classification (FDC) for semiconductor manufacturing processes. However, fault diagnosis used to find clues as to root causes remains a challenging area. In particular, process monitoring using neural networks has been employed to only a limited extent because it is a black box model, making the relationships between input data and output results difficult to interpret in actual manufacturing settings, despite its high classification performance. In this paper, we propose a convolutional neural network (CNN) model, named FDC-CNN, in which a receptive field tailored to multivariate sensor signals slides along the time axis, to extract fault features. This approach enables the association of the output of the first convolutional layer with the structural meaning of the raw data, making it possible to locate the variable and time information that represents process faults. In an experiment on a chemical vapor deposition process, the proposed method outperformed other deep learning models. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
261. Deformable Patterned Fabric Defect Detection With Fisher Criterion-Based Deep Learning.
- Author
- Li, Yundong, Zhao, Weigang, and Pan, Jiahao
- Subjects
- *SURFACE defects, *DEFORMATIONS (Mechanics), *DEEP learning, *IMAGE reconstruction, *ARTIFICIAL neural networks
- Abstract
In this paper, we propose a discriminative representation for patterned fabric defect detection when only limited negative samples are available. Fabric patches are efficiently classified into defectless and defective categories by Fisher criterion-based stacked denoising autoencoders (FCSDA). First, fabric images are divided into patches of the same size, and both defective and defectless samples are utilized to train FCSDA. Second, test patches are classified through FCSDA into defective and defectless categories. Finally, the residual between the reconstructed image and defective patch is calculated, and the defect is located by thresholding. Experimental results demonstrate the effectiveness of the proposed scheme in the defect detection for periodic patterned fabric and more complex jacquard warp-knitted fabric. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
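The final residual-thresholding step of result 261 can be sketched as below; the reconstruction would come from the trained FCSDA, and the threshold value here is a hypothetical placeholder:

```python
def locate_defects(patch, reconstruction, thresh=0.3):
    """Residual-based defect localization: flag pixels where the
    autoencoder reconstruction differs from the input patch by
    more than a threshold."""
    return [[abs(p - r) > thresh for p, r in zip(prow, rrow)]
            for prow, rrow in zip(patch, reconstruction)]

# A model trained largely on defect-free fabric reproduces the
# periodic texture but not the anomaly, so the residual peaks
# exactly at the defective pixels.
patch = [[0.1, 0.9], [0.1, 0.1]]   # bright pixel = defect
recon = [[0.1, 0.1], [0.1, 0.1]]   # defect-free reconstruction
mask = locate_defects(patch, recon)
```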
262. Hyperspectral Image Classification Using Deep Pixel-Pair Features.
- Author
- Li, Wei, Wu, Guodong, Zhang, Fan, and Du, Qian
- Subjects
- *ARTIFICIAL neural networks, *HYPERSPECTRAL imaging systems, *DEEP learning, *FISHER discriminant analysis, *REMOTE sensing devices
- Abstract
The deep convolutional neural network (CNN) has attracted great interest recently. It can provide excellent performance in hyperspectral image classification when the number of training samples is sufficiently large. In this paper, a novel pixel-pair method is proposed to significantly increase that number, ensuring that the advantage of the CNN can actually be realized. For a test pixel, pixel-pairs, constructed by combining the center pixel and each of the surrounding pixels, are classified by the trained CNN, and the final label is then determined by a voting strategy. The proposed method, utilizing a deep CNN to learn pixel-pair features, is expected to have more discriminative power. Experimental results based on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than the conventional deep learning-based method. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
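The pairing-plus-voting scheme of result 262 can be sketched as follows; the stub classifier stands in for the trained pair CNN and is purely illustrative:

```python
from collections import Counter

def classify_pixel(center, neighbors, pair_classifier):
    """Pixel-pair voting: each (center, neighbor) pair is labeled
    by the trained classifier, and the final label for the center
    pixel is decided by majority vote."""
    votes = [pair_classifier(center, nb) for nb in neighbors]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stub for illustration only: the real pair features
# come from a trained CNN, not from thresholding the pair mean.
stub = lambda c, n: "vegetation" if (c + n) / 2 > 0.5 else "soil"
label = classify_pixel(0.8, [0.9, 0.7, 0.2, 0.6], stub)
```

Note how the pairing step also multiplies the effective number of training samples: every labeled pixel yields one pair per neighbor, which is the point of the method.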
263. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks.
- Author
- Volpi, Michele and Tuia, Devis
- Subjects
- *LABELING theory, *ARTIFICIAL neural networks, *DEEP learning, *IMAGE processing, *DESCRIPTOR systems
- Abstract
Semantic labeling (or pixel-level land-cover classification) in ultrahigh-resolution imagery (<10 cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional neural networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper, we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including: 1) state-of-the-art numerical accuracy; 2) improved geometric accuracy of predictions; and 3) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam subdecimeter resolution data sets, involving the semantic labeling of aerial images of 9- and 5-cm resolution, respectively. These data sets are composed of many large and fully annotated tiles, allowing an unbiased evaluation of models making use of spatial information. We do so by comparing two standard CNN architectures with the proposed one: standard patch classification, prediction of local label patches by employing only convolutions, and full patch labeling by employing deconvolutions. All the systems compare favorably to or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, also showing a very appealing inference time. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
264. Deep Metric Learning for Visual Tracking.
- Author
- Hu, Junlin, Lu, Jiwen, and Tan, Yap-Peng
- Subjects
- *OBJECT tracking (Computer vision), *AUTOMATIC tracking, *ARTIFICIAL neural networks, *MACHINE learning, *VIDEO processing
- Abstract
In this paper, we propose a deep metric learning (DML) approach for robust visual tracking under the particle filter framework. Unlike most existing appearance-based visual trackers, which use hand-crafted similarity metrics, our DML tracker learns a nonlinear distance metric to classify the target object and background regions using a feed-forward neural network architecture. Since there are usually large variations in visual objects caused by varying deformations, illuminations, occlusions, motions, rotations, scales, and cluttered backgrounds, conventional linear similarity metrics cannot work well in such scenarios. To address this, our proposed DML tracker first learns a set of hierarchical nonlinear transformations in the feed-forward neural network to project both the template and particles into the same feature space where the intra-class variations of positive training pairs are minimized and the interclass variations of negative training pairs are maximized simultaneously. Then, the candidate that is most similar to the template in the learned deep network is identified as the true target. Experiments on the benchmark data set including 51 challenging videos show that our DML tracker achieves a very competitive performance with the state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
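The pairwise objective described in result 264, minimizing intra-class distances of positive pairs while maximizing inter-class distances of negative pairs, resembles a contrastive loss; the margin form below is an illustrative assumption, not the paper's exact objective:

```python
def dml_objective(pos_dists, neg_dists, margin=1.0):
    """Contrastive-style metric-learning objective on distances
    already computed in the learned feature space: pull positive
    (template/target) pairs together, and push negative
    (template/background) pairs at least `margin` apart.
    The margin value is an illustrative placeholder."""
    pull = sum(d * d for d in pos_dists)
    push = sum(max(0.0, margin - d) ** 2 for d in neg_dists)
    return pull + push

# A well-separated embedding (positives close, negatives far)
# scores lower than a poorly separated one.
good = dml_objective([0.1], [2.0])
bad = dml_objective([1.0], [0.2])
```

At tracking time, each candidate particle is projected through the learned network and the one closest to the template under this metric is taken as the target.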
265. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks.
- Author
- Chen, Yushi, Jiang, Hanlu, Li, Chunyang, Jia, Xiuping, and Ghamisi, Pedram
- Subjects
- *HYPERSPECTRAL imaging systems, *FEATURE extraction, *SIGNAL convolution, *ARTIFICIAL neural networks, *DATA modeling
- Abstract
Due to the advantages of deep learning, in this paper, a regularized deep feature extraction (FE) method is presented for hyperspectral image (HSI) classification using a convolutional neural network (CNN). The proposed approach employs several convolutional and pooling layers to extract deep features from HSIs, which are nonlinear, discriminant, and invariant. These features are useful for image classification and target detection. Furthermore, in order to address the common issue of imbalance between high dimensionality and limited availability of training samples for the classification of HSI, a few strategies such as L2 regularization and dropout are investigated to avoid overfitting in class data modeling. More importantly, we propose a 3-D CNN-based FE model with combined regularization to extract effective spectral–spatial features of hyperspectral imagery. Finally, in order to further improve the performance, a virtual sample enhanced method is proposed. The proposed approaches are carried out on three widely used hyperspectral data sets: Indian Pines, University of Pavia, and Kennedy Space Center. The obtained results reveal that the proposed models with sparse constraints provide competitive results to state-of-the-art methods. In addition, the proposed deep FE opens a new window for further research. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
266. Hybrid Deep Learning for Face Verification.
- Author
- Sun, Yi, Wang, Xiaogang, and Tang, Xiaoou
- Subjects
- *HUMAN facial recognition software, *FACE perception, *BOLTZMANN machine, *ARTIFICIAL neural networks, *MULTILAYER perceptrons
- Abstract
This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification. A key contribution of this work is to learn high-level relational visual features with rich identity similarity information. The deep ConvNets in our model start by extracting local relational visual features from two face images in comparison, which are further processed through multiple layers to extract high-level and global relational features. To keep enough discriminative information, we use the last hidden layer neuron activations of the ConvNet as features for face verification instead of those of the output layer. To characterize face similarities from different aspects, we concatenate the features extracted from different face region pairs by different deep ConvNets. The resulting high-dimensional relational features are classified by an RBM for face verification. After pre-training each ConvNet and the RBM separately, the entire hybrid network is jointly optimized to further improve the accuracy. Various aspects of the ConvNet structures, relational features, and face verification classifiers are investigated. Our model achieves the state-of-the-art face verification performance on the challenging LFW dataset under both the unrestricted protocol and the setting when outside data is allowed to be used for training. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
267. Scale-Aware Pixelwise Object Proposal Networks.
- Author
- Jie, Zequn, Liang, Xiaodan, Feng, Jiashi, Lu, Wen Feng, Tay, Eng Hock Francis, and Yan, Shuicheng
- Subjects
- *COMPUTER networks, *PIXELS, *IMAGE segmentation, *PASCAL (Computer program language), *ARTIFICIAL neural networks
- Abstract
Object proposal is essential for current state-of-the-art object detection pipelines. However, the existing proposal methods generally fail to produce results with satisfying localization accuracy. The case is even worse for small objects, which are nevertheless quite common in practice. In this paper, we propose a novel scale-aware pixelwise object proposal network (SPOP-net) to tackle these challenges. The SPOP-net can generate proposals with high recall rate and average best overlap, even for small objects. In particular, in order to improve the localization accuracy, a fully convolutional network is employed which predicts locations of object proposals for each pixel. The produced ensemble of pixelwise object proposals significantly enhances the chance of hitting the object without incurring heavy extra computational cost. To solve the challenge of localizing objects at small scale, two localization networks, specialized for localizing objects at different scales, are introduced, following the divide-and-conquer philosophy. Location outputs of these two networks are then adaptively combined to generate the final proposals by a large-/small-size weighting network. Extensive evaluations on PASCAL VOC 2007 and COCO 2014 show the SPOP network is superior to the state-of-the-art models. The high-quality proposals from SPOP-net also significantly improve the mean average precision of object detection with the Fast R-CNN (Fast Regions with CNN features) framework. Finally, the SPOP-net (trained on PASCAL VOC) shows great generalization performance when tested on the ILSVRC 2013 validation set. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
268. Factors of Transferability for a Generic ConvNet Representation.
- Author
- Azizpour, Hossein, Razavian, Ali Sharif, Sullivan, Josephine, Maki, Atsuto, and Carlsson, Stefan
- Subjects
- *ARTIFICIAL neural networks, *DEEP learning, *IMAGE recognition (Computer vision), *CLASSIFIERS (Linguistics), *COMPUTER vision software
- Abstract
Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
269. Learning Stacked Image Descriptor for Face Recognition.
- Author
- Lei, Zhen, Yi, Dong, and Li, Stan Z.
- Subjects
- *HUMAN facial recognition software, *IMAGE recognition (Computer vision), *DEEP learning, *KNOWLEDGE representation (Information theory), *ARTIFICIAL neural networks, *MACHINE learning
- Abstract
Learning-based face descriptors have constantly improved face recognition performance. Compared with hand-crafted features, learning-based features are considered able to exploit information with better discriminative ability for specific tasks. Motivated by the recent success of deep learning, in this paper, we extend the original shallow face descriptors to deep discriminant face features by introducing a stacked image descriptor (SID). With a deep structure, more complex facial information can be extracted, and the discriminative power and compactness of the feature representation can be improved. The SID is learned in a forward optimization manner, which is computationally efficient compared with deep learning. Extensive experiments on various face databases are conducted to show that the SID is able to achieve high face recognition performance with a compact face representation, compared with other state-of-the-art descriptors. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
270. Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.
- Author
- Wu, Di, Pigou, Lionel, Kindermans, Pieter-Jan, Le, Nam Do-Hoang, Shao, Ling, Dambre, Joni, and Odobez, Jean-Marc
- Subjects
- *ARTIFICIAL neural networks, *DEEP learning, *HIDDEN Markov models, *GESTURE, *IMAGE segmentation
- Abstract
This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
271. Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks.
- Author
- Zuo, Zhen, Shuai, Bing, Wang, Gang, Liu, Xiao, Wang, Xingxing, Wang, Bing, and Chen, Yushi
- Subjects
- *MACHINE learning, *ARTIFICIAL neural networks, *IMAGE processing, *LOGIC circuits, *NATURAL language processing
- Abstract
Deep convolutional neural networks (CNNs) have shown great success on image classification. CNNs mainly consist of convolutional and pooling layers, both of which operate on local image areas without considering the dependence among different image regions. However, such dependence is very important for generating explicit image representations. In contrast, recurrent neural networks (RNNs) are well known for their ability to encode contextual information in sequential data, and they require only a limited number of network parameters. Thus, we propose hierarchical RNNs (HRNNs) to encode the contextual dependence in image representation. In HRNNs, each RNN layer focuses on modeling spatial dependence among image regions from the same scale but different locations, while cross-scale RNN connections model scale dependencies among regions from the same location but different scales. Specifically, we propose two RNN models: 1) the hierarchical simple recurrent network (HSRN), which is fast and has low computational cost, and 2) the hierarchical long short-term memory recurrent network, which performs better than HSRN at the price of higher computational cost. In this paper, we integrate CNNs with HRNNs, and develop end-to-end convolutional hierarchical RNNs (C-HRNNs) for image classification. C-HRNNs not only utilize the discriminative representation power of CNNs, but also utilize the contextual dependence learning ability of our HRNNs. On four of the most challenging object/scene image classification benchmarks, our C-HRNNs achieve state-of-the-art results on Places 205, SUN 397, and MIT indoor, and competitive results on ILSVRC 2012. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
272. Automatic Segmentation of MR Brain Images With a Convolutional Neural Network.
- Author
-
Moeskops, Pim, Viergever, Max A., Mendrik, Adrienne M., de Vries, Linda S., Benders, Manon J. N. L., and Isgum, Ivana
- Subjects
- *
MAGNETIC resonance imaging of the brain , *IMAGE segmentation , *ARTIFICIAL neural networks , *MATHEMATICAL convolutions , *KERNEL (Mathematics) , *DIAGNOSTIC imaging - Abstract
Automatic segmentation in MR brain images is important for quantitative analysis in large-scale studies with images acquired at all ages. This paper presents a method for the automatic segmentation of MR brain images into a number of tissue classes using a convolutional neural network. To ensure that the method obtains accurate segmentation details as well as spatial consistency, the network uses multiple patch sizes and multiple convolution kernel sizes to acquire multi-scale information about each voxel. The method is not dependent on explicit features, but learns to recognise the information that is important for the classification based on training data. The method requires a single anatomical MR image only. The segmentation method is applied to five different data sets: coronal T2-weighted images of preterm infants acquired at 30 weeks postmenstrual age (PMA) and 40 weeks PMA, axial T2-weighted images of preterm infants acquired at 40 weeks PMA, axial T1-weighted images of ageing adults acquired at an average age of 70 years, and T1-weighted images of young adults acquired at an average age of 23 years. The method obtained the following average Dice coefficients over all segmented tissue classes for each data set, respectively: 0.87, 0.82, 0.84, 0.86, and 0.91. The results demonstrate that the method obtains accurate segmentations in all five sets, and hence demonstrates its robustness to differences in age and acquisition protocol. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
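The multi-scale sampling idea described in the abstract above, extracting several patch sizes around the same voxel so the network sees both local detail and spatial context, can be sketched in NumPy. The `multiscale_patches` helper and the patch sizes are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def multiscale_patches(image, center, sizes=(25, 51, 75)):
    """Extract square patches of several sizes centred on one voxel.

    Illustrative sketch of multi-scale sampling; reflect-padding keeps
    patches near the border full-sized.
    """
    r, c = center
    patches = []
    for s in sizes:
        h = s // 2
        padded = np.pad(image, h, mode="reflect")
        # The original (r, c) maps to (r + h, c + h) in the padded array,
        # so the centred patch is padded[r:r+s, c:c+s] for odd s.
        patches.append(padded[r:r + s, c:c + s])
    return patches

slice_ = np.random.rand(128, 128)
patches = multiscale_patches(slice_, (10, 10))
```

Each patch would then be fed to its own convolutional branch, giving the network multi-scale information about the voxel being classified.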
273. Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network.
- Author
-
Anthimopoulos, Marios, Christodoulidis, Stergios, Ebner, Lukas, Christe, Andreas, and Mougiakakou, Stavroula
- Subjects
- *
INTERSTITIAL lung diseases , *DEEP learning , *ARTIFICIAL neural networks , *COMPUTER-aided design , *DIFFERENTIAL diagnosis , *RADIOLOGISTS , *DIAGNOSIS - Abstract
Automated tissue characterization is one of the most crucial components of a computer-aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN) designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2×2 kernels and LeakyReLU activations, followed by average pooling with size equal to the size of the final feature maps, and three dense layers. The last dense layer has 7 outputs, corresponding to the classes considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing, and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches, derived from 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for this specific problem. A comparative analysis proved the effectiveness of the proposed CNN against previous methods on a challenging dataset. The classification performance (~85.5%) demonstrated the potential of CNNs in analyzing lung patterns. Future work includes extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
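Two building blocks named in the abstract above, the LeakyReLU activation and average pooling with window equal to the final feature-map size (i.e. one scalar per channel), can be illustrated in NumPy. The array shapes are made-up examples, not the network's actual dimensions:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU: identity for positive inputs, small slope for negatives."""
    return np.where(x > 0, x, alpha * x)

def global_average_pool(feature_maps):
    """Average pooling whose window equals the feature-map size:
    collapses each channel of a channels-first array to one scalar."""
    return feature_maps.mean(axis=(1, 2))

fmaps = np.random.randn(32, 6, 6)  # 32 channels of 6x6 maps (illustrative)
pooled = global_average_pool(leaky_relu(fmaps))
print(pooled.shape)  # (32,)
```

The pooled vector would then feed the three dense layers described in the abstract.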
274. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?
- Author
-
Tajbakhsh, Nima, Shin, Jae Y., Gurudu, Suryakanth R., Hurst, R. Todd, Kendall, Christopher B., Gotway, Michael B., and Liang, Jianming
- Subjects
- *
ARTIFICIAL neural networks , *DIAGNOSTIC imaging , *STOCHASTIC convergence , *DEEP learning , *KNOWLEDGE transfer , *MATHEMATICAL convolutions - Abstract
Training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and a great deal of expertise to ensure proper convergence. A promising alternative is to fine-tune a CNN that has been pre-trained using, for instance, a large set of labeled natural images. However, the substantial differences between natural and medical images may advise against such knowledge transfer. In this paper, we seek to answer the following central question in the context of medical image analysis: Can the use of pre-trained deep CNNs with sufficient fine-tuning eliminate the need for training a deep CNN from scratch? To address this question, we considered four distinct medical imaging applications in three specialties (radiology, cardiology, and gastroenterology) involving classification, detection, and segmentation from three different imaging modalities, and investigated how the performance of deep CNNs trained from scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner. Our experiments consistently demonstrated that 1) the use of a pre-trained CNN with adequate fine-tuning outperformed or, in the worst case, performed as well as a CNN trained from scratch; 2) fine-tuned CNNs were more robust to the size of training sets than CNNs trained from scratch; 3) neither shallow tuning nor deep tuning was the optimal choice for a particular application; and 4) our layer-wise fine-tuning scheme could offer a practical way to reach the best performance for the application at hand based on the amount of available data. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
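The layer-wise fine-tuning scheme described above can be sketched as a search over unfreezing depths: tune only the last layer, then progressively unfreeze deeper layers, keeping the depth that validates best. `train_fn`, `val_fn`, and the layer names below are hypothetical placeholders standing in for a real training loop and validation metric, not the authors' API:

```python
def layerwise_finetune_schedule(layers, train_fn, val_fn):
    """Try unfreezing the last 1, 2, ..., N layers of a pre-trained
    network; return the depth with the best validation score."""
    best_score, best_depth = float("-inf"), 0
    for depth in range(1, len(layers) + 1):
        trainable = layers[-depth:]        # unfreeze the last `depth` layers
        train_fn(trainable)                # fine-tune only those layers
        score = val_fn()                   # measure validation performance
        if score > best_score:
            best_score, best_depth = score, depth
    return best_depth, best_score

# Toy run: pretend each depth yields a fixed validation score.
layers = ["conv1", "conv2", "conv3", "fc"]
scores = {1: 0.70, 2: 0.82, 3: 0.90, 4: 0.88}
state = {"depth": 0}
depth, score = layerwise_finetune_schedule(
    layers,
    train_fn=lambda t: state.__setitem__("depth", len(t)),
    val_fn=lambda: scores[state["depth"]],
)
```

This mirrors the abstract's finding that neither shallow nor deep tuning is universally optimal; the best depth depends on the application and the available data.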
275. Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images.
- Author
-
Pereira, Sergio, Pinto, Adriano, Alves, Victor, and Silva, Carlos A.
- Subjects
- *
BRAIN tumors , *IMAGE segmentation , *ARTIFICIAL neural networks , *LIFE expectancy , *KERNEL (Mathematics) , *QUALITY of life , *HEALTH status indicators , *MAGNETIC resonance imaging - Abstract
Among brain tumors, gliomas are the most common and aggressive, leading to a very short life expectancy in their highest grade. Thus, treatment planning is a key stage in improving the quality of life of oncological patients. Magnetic resonance imaging (MRI) is a widely used imaging technique to assess these tumors, but the large amount of data produced by MRI prevents manual segmentation in a reasonable time, limiting the use of precise quantitative measurements in clinical practice. Automatic and reliable segmentation methods are therefore required; however, the large spatial and structural variability among brain tumors makes automatic segmentation a challenging problem. In this paper, we propose an automatic segmentation method based on Convolutional Neural Networks (CNN), exploring small 3×3 kernels. The use of small kernels allows designing a deeper architecture, besides having a positive effect against overfitting, given the smaller number of weights in the network. We also investigated the use of intensity normalization as a pre-processing step, which, though not common in CNN-based segmentation methods, proved together with data augmentation to be very effective for brain tumor segmentation in MRI images. Our proposal was validated on the Brain Tumor Segmentation Challenge 2013 database (BRATS 2013), simultaneously obtaining the first position for the complete, core, and enhancing regions in the Dice Similarity Coefficient metric (0.88, 0.83, 0.77) for the Challenge data set. It also obtained the overall first position on the online evaluation platform. We also participated in the on-site BRATS 2015 Challenge using the same model, obtaining second place, with Dice Similarity Coefficient metrics of 0.78, 0.65, and 0.75 for the complete, core, and enhancing regions, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
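The intensity-normalization pre-processing step mentioned above can be sketched in NumPy: clip extreme percentiles (MRI intensities are not on a standard scale and contain outliers), then standardise to zero mean and unit variance. The percentile bounds are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def normalize_intensity(image, low=0.5, high=99.5):
    """Clip intensities to the [low, high] percentile range, then
    standardise the image to zero mean and unit variance."""
    lo, hi = np.percentile(image, [low, high])
    clipped = np.clip(image, lo, hi)
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)
```

Applying the same normalization to every scan puts images from different scanners and protocols on a comparable intensity scale before they reach the CNN.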
276. Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks.
- Author
-
Dou, Qi, Chen, Hao, Yu, Lequan, Zhao, Lei, Qin, Jing, Wang, Defeng, Mok, Vincent CT, Shi, Lin, and Heng, Pheng-Ann
- Subjects
- *
CEREBROVASCULAR disease diagnosis , *ARTIFICIAL neural networks , *THREE-dimensional imaging in biology , *BIOMARKERS , *COGNITION disorders diagnosis , *RADIOLOGISTS - Abstract
Cerebral microbleeds (CMBs) are small haemorrhages near blood vessels. They have been recognized as important diagnostic biomarkers for many cerebrovascular diseases and cognitive dysfunctions. In current clinical routine, CMBs are manually labelled by radiologists, but this procedure is laborious, time-consuming, and error-prone. In this paper, we propose a novel automatic method to detect CMBs from magnetic resonance (MR) images by exploiting the 3D convolutional neural network (CNN). Compared with previous methods that employed either low-level hand-crafted descriptors or 2D CNNs, our method can take full advantage of spatial contextual information in MR volumes to extract more representative high-level features for CMBs, and hence achieve a much better detection accuracy. To further improve the detection performance while reducing the computational cost, we propose a cascaded framework under 3D CNNs for the task of CMB detection. We first exploit a 3D fully convolutional network (FCN) strategy to retrieve the candidates with high probabilities of being CMBs, and then apply a well-trained 3D CNN discrimination model to distinguish CMBs from hard mimics. Compared with the traditional sliding-window strategy, the proposed 3D FCN strategy can remove massive redundant computations and dramatically speed up the detection process. We constructed a large dataset with 320 volumetric MR scans and performed extensive experiments to validate the proposed method, which achieved a high sensitivity of 93.16% with an average of 2.74 false positives per subject, outperforming previous methods using low-level descriptors or 2D CNNs by a significant margin. The proposed method, in principle, can be adapted to other biomarker detection tasks from volumetric medical data. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
277. Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images.
- Author
-
van Grinsven, Mark J. J. P., van Ginneken, Bram, Hoyng, Carel B., Theelen, Thomas, and Sanchez, Clara I.
- Subjects
- *
HEMORRHAGE diagnosis , *FUNDUS oculi , *ARTIFICIAL neural networks , *EYE color , *COMPUTER vision , *DEEP learning - Abstract
Convolutional neural networks (CNNs) are deep learning network architectures that have pushed forward the state-of-the-art in a range of computer vision applications and are increasingly popular in medical image analysis. However, training of CNNs is time-consuming and challenging. In medical image analysis tasks, the majority of training examples are easy to classify and therefore contribute little to the CNN learning process. In this paper, we propose a method to improve and speed up CNN training for medical image analysis tasks by dynamically selecting misclassified negative samples during training. Training samples are heuristically sampled based on the classification performance of the CNN in its current state. Weights are assigned to the training samples, and informative samples are more likely to be included in the next CNN training iteration. We evaluated and compared our proposed method by training a CNN with (SeS) and without (NSeS) the selective sampling method. We focus on the detection of hemorrhages in color fundus images. A decreased training time from 170 epochs to 60 epochs with an increased performance, on par with two human experts, was achieved, with areas under the receiver operating characteristic curve of 0.894 and 0.972 on two data sets. The SeS CNN statistically outperformed the NSeS CNN on an independent test set. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
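The selective-sampling idea above, weighting training samples by how badly the current CNN classifies them so hard samples are more likely to enter the next iteration, can be sketched as follows. The `select_informative` helper and the toy probabilities are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_informative(probs, labels, n):
    """Sample n training indices, weighted by the current model's
    misclassification degree, so informative (hard) samples are
    favoured in the next training iteration."""
    errors = np.abs(probs - labels)      # how wrong the model is per sample
    weights = errors / errors.sum()      # normalise to a distribution
    return rng.choice(len(labels), size=n, replace=False, p=weights)

labels = np.array([0, 0, 0, 0, 1])
probs = np.array([0.05, 0.90, 0.10, 0.80, 0.95])  # indices 1, 3 are hard negatives
picked = select_informative(probs, labels, n=2)
```

Repeating the selection over many iterations concentrates training effort on the hard negatives, which is what speeds up convergence in the abstract's experiments.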
278. Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images.
- Author
-
Sirinukunwattana, Korsuk, Raza, Shan E Ahmed, Tsang, Yee-Wah, Snead, David R. J., Cree, Ian A., and Rajpoot, Nasir M.
- Subjects
- *
COLON cancer diagnosis , *DEEP learning , *CELL nuclei , *ARTIFICIAL neural networks , *PROBABILITY theory ,CANCER histopathology - Abstract
Detection and classification of cell nuclei in histopathology images of cancerous tissue stained with the standard hematoxylin and eosin stain is a challenging task due to cellular heterogeneity. Deep learning approaches have been shown to produce encouraging results on histopathology images in various studies. In this paper, we propose a Spatially Constrained Convolutional Neural Network (SC-CNN) to perform nucleus detection. SC-CNN regresses the likelihood of a pixel being the center of a nucleus, where high probability values are spatially constrained to lie in the vicinity of the centers of nuclei. For classification of nuclei, we propose a novel Neighboring Ensemble Predictor (NEP) coupled with a CNN to more accurately predict the class label of detected cell nuclei. The proposed approaches for detection and classification do not require segmentation of nuclei. We have evaluated them on a large dataset of colorectal adenocarcinoma images, consisting of more than 20,000 annotated nuclei belonging to four different classes. Our results show that the joint detection and classification of the proposed SC-CNN and NEP produces the highest average F1 score as compared to other recently published approaches. Prospectively, the proposed methods could offer benefit to pathology practice in terms of quantitative analysis of tissue constituents in whole-slide images, and potentially lead to a better understanding of cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
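The spatial constraint described above, a probability map that is high only near annotated nucleus centers and falls to zero with distance, can be sketched as a regression target in NumPy. The `proximity_target` helper and its linear fall-off are hypothetical illustrations; the paper's exact target function may differ:

```python
import numpy as np

def proximity_target(shape, centers, radius=4.0):
    """Build a map whose values peak at nucleus centres and decay
    linearly to zero at `radius` pixels away: high probabilities are
    spatially constrained to the vicinity of the centres."""
    rows, cols = np.indices(shape)
    target = np.zeros(shape)
    for (r, c) in centers:
        d = np.sqrt((rows - r) ** 2 + (cols - c) ** 2)
        target = np.maximum(target, np.clip(1.0 - d / radius, 0.0, 1.0))
    return target

t = proximity_target((32, 32), [(8, 8), (20, 25)])
```

A network regressed against such a target yields detections by local-maximum search, with no nucleus segmentation required.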
279. A CNN Regression Approach for Real-Time 2D/3D Registration.
- Author
-
Miao, Shun, Wang, Z. Jane, and Liao, Rui
- Subjects
- *
SIGNAL convolution , *ARTIFICIAL neural networks , *IMAGE registration , *MEDICAL imaging systems , *THREE-dimensional imaging , *REGRESSION analysis , *IMAGE quality analysis - Abstract
In this paper, we present a Convolutional Neural Network (CNN) regression approach to address the two major limitations of existing intensity-based 2-D/3-D registration technology: 1) slow computation and 2) small capture range. Different from optimization-based methods, which iteratively optimize the transformation parameters over a scalar-valued metric function representing the quality of the registration, the proposed method exploits the information embedded in the appearances of the digitally reconstructed radiograph and X-ray images, and employs CNN regressors to directly estimate the transformation parameters. An automatic feature extraction step is introduced to calculate 3-D pose-indexed features that are sensitive to the variables to be regressed while robust to other factors. The CNN regressors are then trained for local zones and applied in a hierarchical manner to break down the complex regression task into multiple simpler sub-tasks that can be learned separately. Weight sharing is furthermore employed in the CNN regression model to reduce the memory footprint. The proposed approach has been quantitatively evaluated on 3 potential clinical applications, demonstrating its significant advantage in providing highly accurate real-time 2-D/3-D registration with a significantly enlarged capture range when compared to intensity-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
280. Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators.
- Author
-
Du, Zidong, Lingamneni, Avinash, Chen, Yunji, Palem, Krishna V., Temam, Olivier, and Wu, Chengyong
- Subjects
- *
ARTIFICIAL neural networks , *DEEP learning , *ARTIFICIAL intelligence , *NEURAL chips , *NEURO-controllers - Abstract
In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for slashing energy consumption in many applications that can tolerate a certain degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy in return for significant resource savings in the energy consumed, the (critical path) delay, and the (silicon) area, this approach has so far been limited to application-specific integrated circuits (ASICs). These ASIC realizations have a narrow application scope and, as currently designed, are often rigid in their tolerance to inaccuracy, which often determines the extent of resource savings achieved. In this paper, we propose to improve the application scope, error resilience, and energy savings of inexact computing by combining it with hardware neural networks. These neural networks are fast emerging as popular candidate accelerators for future heterogeneous multicore platforms and have flexible error resilience limits owing to their ability to be trained. Our results in 65-nm technology demonstrate that the proposed inexact neural network accelerator could achieve 1.78–2.67× savings in energy consumption (with corresponding delay and area savings of 1.23× and 1.46×, respectively) when compared to the existing baseline neural network implementation, at the cost of a small accuracy loss (mean squared error increases from 0.14 to 0.20 on average). [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
281. A Single-Shot Generalized Device Placement for Large Dataflow Graphs.
- Author
-
Zhou, Yanqi, Roy, Sudip, Abdolrashidi, Amirali, Wong, Daniel Lin-Kit, Ma, Peter, Xu, Qiumin, Mirhoseini, Azalia, and Laudon, James
- Subjects
- *
RECURRENT neural networks , *ARTIFICIAL neural networks , *DEEP learning - Abstract
With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable graph partitioning and device placement strategy is challenging. There have been prior attempts at learned approaches for solving device placement, but these approaches are computationally expensive, unable to handle large graphs consisting of over 50,000 nodes, and do not generalize well to unseen graphs. To address all these limitations, we propose an efficient single-shot, generalized deep RL method (SGDP) based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, our method achieves on average a 20% improvement over human placement and an 18% improvement over the prior art, with 15× faster convergence. We are the first to demonstrate superhuman performance on an 8-layer recurrent neural network language model and 8-layer GNMT consisting of over 50,000 nodes, on 8 GPUs. We provide rationales and a sensitivity study on model architecture selections. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
282. Compressed-Domain Ship Detection on Spaceborne Optical Image Using Deep Neural Network and Extreme Learning Machine.
- Author
-
Tang, Jiexiong, Deng, Chenwei, Huang, Guang-Bin, and Zhao, Baojun
- Subjects
- *
ARTIFICIAL neural networks , *DEEP learning , *MACHINE learning , *SPACE-based radar , *OPTICAL images , *AUTOMATIC detection in radar - Abstract
Ship detection on spaceborne images has attracted great interest in the applications of maritime security and traffic control. Optical images stand out from other remote sensing images in object detection due to their higher resolution and more visualized contents. However, most of the popular techniques for ship detection from optical spaceborne images have two shortcomings: 1) Compared with infrared and synthetic aperture radar images, their results are affected by weather conditions, like clouds and ocean waves, and 2) the higher resolution results in larger data volume, which makes processing more difficult. Most of the previous works mainly focus on solving the first problem by improving segmentation or classification with complicated algorithms. These methods face difficulty in efficiently balancing performance and complexity. In this paper, we propose a ship detection approach to solving the aforementioned two issues using wavelet coefficients extracted from JPEG2000 compressed domain combined with deep neural network (DNN) and extreme learning machine (ELM). Compressed domain is adopted for fast ship candidate extraction, DNN is exploited for high-level feature representation and classification, and ELM is used for efficient feature pooling and decision making. Extensive experiments demonstrate that, in comparison with the existing relevant state-of-the-art approaches, the proposed method requires less detection time and achieves higher detection accuracy. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
283. Wound-Rotor Induction Generator Inter-Turn Short-Circuits Diagnosis Using a New Digital Neural Network.
- Author
-
Toma, Samuel, Capocchi, Laurent, and Capolino, Gerard-Andre
- Subjects
- *
BACK propagation , *ARTIFICIAL neural networks software , *DEEP learning , *INDUCTION generators , *ALTERNATING current generators - Abstract
This paper deals with a new transformation and fusion of digital input patterns used to train and test a feedforward neural network for winding short-circuit diagnosis in a wound-rotor three-phase induction machine. The single type of short-circuit tested by the proposed approach is the turn-to-turn fault, which is known as the first stage of insulation degradation. The input/output data have been binary-coded in order to reduce computational complexity. A new procedure, namely addition and mean of the set of same rank, has been implemented to eliminate the redundancy due to the periodic character of the input signals. However, this approach has a great impact on the statistical properties of the processed data in terms of richness and statistical distribution. The proposed neural network has been trained and tested with experimental signals coming from six current sensors implemented around a setup with a prime mover and a 5.5 kW wound-rotor three-phase induction generator. Both stator and rotor windings have been modified in order to sort out the first and last turns in each phase. The experimental results highlight the superiority of using this new procedure in both training and testing modes. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
284. Representation Learning: A Review and New Perspectives.
- Author
-
Bengio, Yoshua, Courville, Aaron, and Vincent, Pascal
- Subjects
- *
MACHINE learning , *FEATURE extraction , *MANIFOLDS (Mathematics) , *ARTIFICIAL neural networks , *AUTOMATIC speech recognition , *BOLTZMANN machine - Abstract
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
285. 3D Convolutional Neural Networks for Human Action Recognition.
- Author
-
Ji, Shuiwang, Xu, Wei, Yang, Ming, and Yu, Kai
- Subjects
- *
ARTIFICIAL neural networks , *THREE-dimensional imaging , *ARTIFICIAL intelligence , *VIDEO surveillance , *TELEVISION in security systems , *CLOSED-circuit television - Abstract
We consider the automated recognition of human actions in surveillance videos. Most current methods build classifiers based on complex handcrafted features computed from the raw inputs. Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw inputs. However, such models are currently limited to handling 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels. To further boost the performance, we propose regularizing the outputs with high-level features and combining the predictions of a variety of different models. We apply the developed models to recognize human actions in the real-world environment of airport surveillance videos, and they achieve superior performance in comparison to baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
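The core operation in the abstract above, a 3D convolution whose kernel slides over frames as well as rows and columns, can be sketched naively in NumPy. As in common CNN practice this is actually cross-correlation, and the shapes below are illustrative:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution over a (frames, height, width)
    volume: each output value mixes a spatial AND temporal
    neighbourhood, capturing motion across adjacent frames."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

frames = np.random.rand(7, 60, 40)   # 7 stacked video frames
feat = conv3d_valid(frames, np.ones((3, 3, 3)) / 27.0)
```

In a real 3D CNN this triple loop is replaced by an optimized library kernel, but the indexing is the same: the temporal extent of the kernel is what lets the model encode motion from multiple adjacent frames.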