15 results for "IMAGE processing"
Search Results
2. Normalized Non-Negative Sparse Encoder for Fast Image Representation.
- Authors: Zhang, Shizhou; Wang, Jinjun; Shi, Weiwei; Gong, Yihong; Xia, Yong; Zhang, Yanning
- Subjects: IMAGE representation; ARTIFICIAL neural networks; FEATURE extraction; IMAGE processing
- Abstract:
Image representation based on sparse coding generalizes the bag-of-words model. Although it reduces the reconstruction error for local features and achieves state-of-the-art image classification performance, its large computational cost hinders the application of sparse coding-based image features. In this paper, we propose approximating a sparse code using the output of a simple neural network. The resulting parameter learning model for the neural network automatically incorporates non-negative and shift-invariant constraints, leading to an efficient encoder for normalized non-negative sparse coding (N3SC). Without the traditional iterative process used to solve the sparse coding objective, the encoder directly “converts” each local feature into a sparse code. We also introduce a training method for the encoder based on the auto-encoder framework. In addition, we formally propose the corresponding sparse coding scheme, N3SC, which enforces both the non-negative and the shift-invariant constraints in addition to the traditional sparse coding criteria. As demonstrated by several experiments, the obtained N3SC encoder requires only 3%–10% of the processing time for image feature extraction compared with the standard sparse coding scheme. At the same time, features extracted using the exact solutions of the N3SC coding scheme and the N3SC encoder offer superior image classification accuracy compared with many existing sparse coding-based representations. [ABSTRACT FROM AUTHOR]
- Published: 2019
- Full Text: View/download PDF
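A minimal NumPy sketch of the feed-forward encoding idea described in result 2, assuming a single affine layer with a ReLU for the non-negative constraint and L2 normalization as the normalization step; the weights below are random placeholders, not parameters learned by the paper's auto-encoder-based training.

```python
import numpy as np

def n3sc_encode(x, W, b):
    """Feed-forward approximation of a non-negative sparse code.

    A single affine map followed by ReLU enforces non-negativity;
    L2 normalization is a stand-in for the paper's normalization step.
    """
    z = np.maximum(0.0, W @ x + b)          # non-negative activation
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z      # normalized sparse code

# Toy usage: 128-D local descriptor -> 1024-D code.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 128)) * 0.01  # placeholder weights (the paper trains these)
b = np.zeros(1024)
x = rng.standard_normal(128)
code = n3sc_encode(x, W, b)
print(code.shape, float((code > 0).mean()))  # code size and fraction of active units
```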
3. Long-Short-Term Features for Dynamic Scene Classification.
- Authors: Huang, Yuanjun; Cao, Xianbin; Wang, Qi; Zhang, Baochang; Zhen, Xiantong; Li, Xuelong
- Subjects: ARTIFICIAL intelligence; ARTIFICIAL neural networks; IMAGE processing; SUPPORT vector machines; MACHINE learning
- Abstract:
Dynamic scene classification has been extensively studied in computer vision due to its widespread applications. The key to dynamic scene classification lies in jointly characterizing spatial appearance and temporal dynamics to achieve an informative representation, which remains an open challenge in the literature. In this paper, we propose a unified framework to extract spatial and temporal features for dynamic scene representation. More specifically, we deploy two variants of deep convolutional neural networks to encode spatial appearance and short-term dynamics into short-term deep features (STDF). Based on the STDF, we propose using the autoregressive moving average (ARMA) model to extract long-term frequency features (LTFF). By combining STDF and LTFF, we establish the long–short-term feature (LSTF) representation of dynamic scenes. The LSTF characterizes both spatial and temporal patterns of dynamic scenes for a comprehensive and informative representation that enables more accurate classification. Extensive experiments on three dynamic scene classification benchmarks show that the proposed LSTF achieves high performance and substantially surpasses the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published: 2019
- Full Text: View/download PDF
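The long-term branch of result 3 can be illustrated with a simple autoregressive fit: assuming per-frame feature trajectories (random stand-ins for the paper's CNN-derived STDF), the sketch below fits a low-order AR model per feature dimension via least squares and uses the coefficients as long-term features. The paper uses a full ARMA model; this sketch keeps only the AR part.

```python
import numpy as np

def ar_coefficients(series, order=3):
    """Least-squares AR(order) fit; the coefficients summarize long-term dynamics."""
    T = len(series)
    X = np.stack([series[i:T - order + i] for i in range(order)], axis=1)
    y = series[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# stdf: (num_frames, feat_dim) short-term deep features (placeholders here).
rng = np.random.default_rng(0)
stdf = rng.standard_normal((60, 16))
ltff = np.concatenate([ar_coefficients(stdf[:, d]) for d in range(stdf.shape[1])])
lstf = np.concatenate([stdf.mean(axis=0), ltff])   # fuse short- and long-term parts
print(lstf.shape)
```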
4. Video Compression Based on Spatio-Temporal Resolution Adaptation.
- Authors: Afonso, Mariana; Zhang, Fan; Bull, David R.
- Subjects: VIDEO compression; VIDEO coding; SPATIOTEMPORAL processes; IMAGE quality analysis; IMAGE compression; ARTIFICIAL neural networks; IMAGE processing; SIGNAL convolution
- Abstract:
A video compression framework based on spatio-temporal resolution adaptation (ViSTRA) is proposed, which dynamically resamples the input video spatially and temporally during encoding, based on a quantisation-resolution decision, and reconstructs the full resolution video at the decoder. Temporal upsampling is performed using frame repetition, whereas a convolutional neural network super-resolution model is employed for spatial resolution upsampling. ViSTRA has been integrated into the high efficiency video coding reference software (HM 16.14). Experimental results verified via an international challenge show significant improvements, with BD-rate gains of 15% based on PSNR and an average MOS difference of 0.5 based on subjective visual quality tests. [ABSTRACT FROM AUTHOR]
- Published: 2019
- Full Text: View/download PDF
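Result 4's decoder-side reconstruction can be sketched in a few lines, assuming a fixed 2x spatio-temporal downsampling at the encoder: temporal upsampling is frame repetition as stated in the abstract, while nearest-neighbour interpolation stands in for the paper's CNN super-resolution model.

```python
import numpy as np

def temporal_upsample(frames):
    """Double the frame rate by frame repetition, as in ViSTRA."""
    return np.repeat(frames, 2, axis=0)

def spatial_upsample(frames, factor=2):
    """Nearest-neighbour stand-in for the CNN super-resolution model."""
    return np.repeat(np.repeat(frames, factor, axis=1), factor, axis=2)

# Toy decoded clip: 8 frames at reduced resolution (90x160, grayscale).
decoded = np.zeros((8, 90, 160), dtype=np.uint8)
restored = spatial_upsample(temporal_upsample(decoded))
print(restored.shape)   # (16, 180, 320): full spatio-temporal resolution
```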
5. Once for All: A Two-Flow Convolutional Neural Network for Visual Tracking.
- Authors: Chen, Kai; Tao, Wenbing
- Subjects: OBJECT tracking (Computer vision); ARTIFICIAL neural networks; FEATURE extraction; VISUALIZATION; IMAGE processing
- Abstract:
The main challenges of visual object tracking arise from the arbitrary appearance of the objects that need to be tracked. Most existing algorithms try to solve this problem by training a new model to regenerate or classify each tracked object; as a result, the model needs to be initialized and retrained for each new object. In this paper, we propose to track different objects in an object-independent manner with a novel two-flow convolutional neural network (YCNN). The YCNN takes two inputs (an object image patch and a larger search image patch) and outputs a response map which predicts how likely, and where, the object will appear in the search patch. Unlike object-specific approaches, the YCNN is trained to measure the similarity between the two image patches, so the model is not limited to any specific object. Furthermore, the network is trained end-to-end to extract both shallow and deep convolutional features dedicated to visual tracking. Once properly trained, the YCNN can track all kinds of objects without further training or updating, allowing our algorithm to run at a very high speed of 45 frames per second. The effectiveness of the proposed algorithm is further demonstrated by experiments on two popular data sets: OTB-100 and VOT-2014. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
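A minimal PyTorch sketch of the two-flow idea from result 5: one shared feature extractor processes both patches, and the template features are used as a correlation kernel over the search features to produce a response map. The layer sizes are placeholders, and cross-correlation is a simple stand-in for whatever learned fusion layers the actual YCNN uses to measure similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoFlowTracker(nn.Module):
    """Two-flow similarity network: a shared feature extractor for the
    object and search patches, then cross-correlation (a stand-in for
    YCNN's fusion layers) yields a response map."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        )

    def forward(self, obj_patch, search_patch):
        f_obj = self.features(obj_patch)        # (1, C, h, w) template features
        f_search = self.features(search_patch)  # (1, C, H, W) search features
        # Use the template features as a correlation kernel over the search area.
        return F.conv2d(f_search, f_obj)        # response map: peak = likely object location

net = TwoFlowTracker()
response = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 128, 128))
print(response.shape)
```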
6. An End-to-End Compression Framework Based on Convolutional Neural Networks.
- Authors: Jiang, Feng; Tao, Wen; Liu, Shaohui; Ren, Jie; Guo, Xun; Zhao, Debin
- Subjects: IMAGE processing; IMAGE compression; COMPUTER vision software; ARTIFICIAL neural networks; DEEP learning; IMAGE encryption; ENCODING
- Abstract:
Deep learning, e.g., convolutional neural networks (CNNs), has achieved great success in image processing and computer vision, especially in high-level vision applications such as recognition and understanding. However, it is rarely used to solve low-level vision problems such as image compression, the subject of this paper. Here, we move a step forward and propose a novel compression framework based on CNNs. To achieve high-quality image compression at low bit rates, two CNNs are seamlessly integrated into an end-to-end compression framework. The first CNN, named the compact convolutional neural network (ComCNN), learns an optimal compact representation from an input image, which preserves the structural information and is then encoded by an image codec (e.g., JPEG, JPEG2000, or BPG). The second CNN, named the reconstruction convolutional neural network (RecCNN), reconstructs the decoded image with high quality at the decoder end. To make the two CNNs collaborate effectively, we develop a unified end-to-end learning algorithm that learns ComCNN and RecCNN simultaneously, which facilitates accurate reconstruction of the decoded image by RecCNN. This design also makes the proposed compression framework compatible with existing image coding standards. Experimental results validate that the proposed framework greatly outperforms several compression frameworks that pair existing image coding standards with state-of-the-art deblocking or denoising post-processing methods. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
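A sketch of result 6's two-network layout, with placeholder layer configurations: ComCNN here halves the resolution to produce the compact representation, and RecCNN upsamples back. The standard codec that sits between them (JPEG/JPEG2000/BPG) is non-differentiable and is omitted in this toy pipeline, which is precisely the difficulty the paper's unified learning algorithm addresses.

```python
import torch
import torch.nn as nn

class ComCNN(nn.Module):
    """Produces a compact representation (here: half resolution) to feed a codec."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, stride=2, padding=1),   # downsample 2x
        )
    def forward(self, x):
        return self.net(x)

class RecCNN(nn.Module):
    """Reconstructs the full-resolution image from the decoded compact image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

com, rec = ComCNN(), RecCNN()
x = torch.randn(1, 3, 64, 64)
compact = com(x)                # in the paper this passes through JPEG/JPEG2000/BPG;
reconstructed = rec(compact)    # the codec is omitted here for differentiability
loss = nn.functional.mse_loss(reconstructed, x)   # joint end-to-end objective (sketch)
print(compact.shape, reconstructed.shape, float(loss))
```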
7. Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition.
- Authors: Zhang, Shiqing; Zhang, Shiliang; Huang, Tiejun; Gao, Wen; Tian, Qi
- Subjects: EMOTION recognition; DEEP learning; AUDIOVISUAL materials; FEATURE extraction; IMAGE processing; ARTIFICIAL neural networks
- Abstract:
Emotion recognition is challenging due to the emotional gap between emotions and audio–visual features. Motivated by the powerful feature learning ability of deep neural networks, this paper proposes to bridge the emotional gap with a hybrid deep model, which first produces audio–visual segment features with Convolutional Neural Networks (CNNs) and a 3D-CNN, and then fuses the audio–visual segment features with a Deep Belief Network (DBN). The proposed method is trained in two stages. First, CNN and 3D-CNN models pre-trained on corresponding large-scale image and video classification tasks are fine-tuned on emotion recognition tasks to learn audio and visual segment features, respectively. Second, the outputs of the CNN and 3D-CNN models are combined in a fusion network built on a DBN model. The fusion network is trained to jointly learn a discriminative audio–visual segment feature representation. After average-pooling the segment features learned by the DBN to form a fixed-length global video feature, a linear Support Vector Machine is used for video emotion classification. Experimental results on three public audio–visual emotion databases, including the acted RML database, the acted eNTERFACE05 database, and the spontaneous BAUM-1s database, demonstrate the promising performance of the proposed method. To the best of our knowledge, this is an early work fusing audio and visual cues with a CNN, a 3D-CNN, and a DBN for audio–visual emotion recognition. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
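The fusion-and-classification stage of result 7 can be sketched as follows, with random features standing in for the CNN/3D-CNN segment outputs and an untrained single-layer map standing in for the DBN fusion network; average-pooling over segments and the linear SVM follow the abstract.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fuse(audio_feat, visual_feat, W):
    """One nonlinear layer as a stand-in for the paper's DBN fusion network."""
    joint = np.concatenate([audio_feat, visual_feat], axis=-1)
    return np.maximum(0.0, joint @ W)          # fused segment feature

# Toy data: 40 videos, 10 segments each, 64-D audio + 64-D visual features.
W = rng.standard_normal((128, 32)) * 0.1       # untrained placeholder weights
videos, labels = [], []
for v in range(40):
    segs = fuse(rng.standard_normal((10, 64)), rng.standard_normal((10, 64)), W)
    videos.append(segs.mean(axis=0))           # average-pool segments -> global feature
    labels.append(v % 6)                       # 6 emotion classes (toy labels)

clf = LinearSVC(max_iter=5000).fit(np.array(videos), labels)
print(clf.predict(np.array(videos)[:5]))
```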
8. Improved Object Detection With Iterative Localization Refinement in Convolutional Neural Networks.
- Authors: Cheng, Kai-Wen; Chen, Yie-Tarng; Fang, Wen-Hsien
- Subjects: OBJECT recognition (Computer vision); ITERATIVE refinement; ARTIFICIAL neural networks; PROBABILITY theory; IMAGE processing
- Abstract:
To facilitate object localization, existing convolutional neural network (CNN)-based object detectors often require an object proposal method, which, however, may produce inaccurate region proposals and thus degrade performance. To overcome this setback, this paper presents a novel iterative localization refinement (ILR) method which, operating at a mid-layer of a CNN architecture, progressively refines a subset of region proposals to match the ground truth as closely as possible. In each iteration, the refinement task is cast into a probabilistic framework based on an ingeniously devised probability function. To expedite the computation of the probability function, a divide-and-conquer paradigm is developed using the law of total probability. Moreover, an approximate variant based on a refined sampling strategy is also presented to further reduce the complexity. The proposed ILR method is not only data-driven and learning-free, but can also be incorporated into many existing CNN-based object detection algorithms, such as Faster R-CNN, to enhance detection accuracy without changing their configurations. Simulations show that the proposed method improves leading state-of-the-art detectors on the PASCAL VOC 2007, PASCAL VOC 2012, and YouTube-Objects data sets. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
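Result 8's refine-and-select loop admits a toy illustration: each box is repeatedly perturbed and the highest-scoring candidate kept. The hill-climbing scheme and the distance-based score below are illustrative assumptions only; the paper instead scores candidates with its devised probability function computed at a mid-layer of the CNN.

```python
import numpy as np

def refine_proposals(boxes, score_fn, iters=3, num_samples=16, sigma=4.0, rng=None):
    """Hill-climbing refinement: perturb each box, keep the highest-scoring
    candidate. score_fn stands in for the paper's probability function."""
    rng = rng or np.random.default_rng(0)
    boxes = boxes.astype(float)
    for _ in range(iters):
        for i, box in enumerate(boxes):
            candidates = box + rng.normal(0.0, sigma, size=(num_samples, 4))
            candidates = np.vstack([box, candidates])
            scores = np.array([score_fn(c) for c in candidates])
            boxes[i] = candidates[scores.argmax()]
    return boxes

# Toy score: proximity to a hidden ground-truth box (a real system would use
# a CNN mid-layer response instead).
gt = np.array([50.0, 50.0, 100.0, 100.0])
score = lambda b: -np.abs(b - gt).sum()
print(refine_proposals(np.array([[30, 40, 80, 120]]), score))
```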
9. Adaptive Resolution Optimization and Tracklet Reliability Assessment for Efficient Multi-Object Tracking.
- Authors: Yu, Ruixing; Cheng, Irene; Zhu, Bing; Bedmutha, Sweta; Basu, Anup
- Subjects: IMAGE processing; ARTIFICIAL neural networks; OBJECT tracking (Computer vision); ARTIFICIAL intelligence; TRACKING algorithms
- Abstract:
Recent digital acquisition systems can acquire high-resolution videos, generating a large amount of dynamic data and leading to higher computational cost in online target tracking and learning, especially for complex scenes. We introduce an efficient and robust approach to improve the performance of multi-object online tracking and learning. Prior methods saved on computational cost by scaling down each video frame to a fixed smaller resolution, without considering the image features. Our algorithm computes the optimal image resolution adaptively by exploiting the correlation between an image’s gray-value distribution and resolution. This dimensionality reduction step significantly improves the time performance in subsequent online tracking and learning, while preserving high tracking accuracy. Since a small detection error in one frame can cause cumulative error in the video sequence leading to incorrect labeling and tracking, we introduce a new tracklet reliability assessment metric to eliminate incorrect samples. Experimental results show that our approach can successfully track multiple objects in real time with both high precision and recall. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
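One way to realize result 9's resolution selection, sketched under assumptions: downsample by increasing strides and keep the coarsest resolution whose gray-value histogram still correlates strongly with the full-resolution histogram. The stride-based subsampling and the 0.98 threshold are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def histogram(img, bins=64):
    h, _ = np.histogram(img, bins=bins, range=(0, 256), density=True)
    return h

def optimal_scale(gray, threshold=0.98, max_level=4):
    """Pick the coarsest stride whose gray-level distribution still
    correlates with the full-resolution one above `threshold`."""
    ref = histogram(gray)
    best = 1
    for level in range(1, max_level + 1):
        stride = 2 ** level
        sub = gray[::stride, ::stride]
        corr = np.corrcoef(ref, histogram(sub))[0, 1]
        if corr < threshold:
            break
        best = stride
    return best  # downsampling factor for subsequent tracking and learning

gray = np.random.default_rng(0).integers(0, 256, size=(720, 1280)).astype(np.uint8)
print(optimal_scale(gray))
```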
10. Nonlinear Structural Hashing for Scalable Video Search.
- Authors: Chen, Zhixiang; Lu, Jiwen; Feng, Jianjiang; Zhou, Jie
- Subjects: VIDEOS; INFORMATION retrieval; MACHINE learning; IMAGE processing; ARTIFICIAL neural networks
- Abstract:
In this paper, we propose a nonlinear structural hashing approach to learn compact binary codes for scalable video search. Unlike most existing video hashing methods which consider image frames within a video separately for binary code learning, we develop a multi-layer neural network to learn compact and discriminative binary codes by exploiting both the structural information between different frames within a video and the nonlinear relationship between video samples. To be specific, we learn these binary codes under two different constraints at the output of our network: 1) the distance between the learned binary codes for frames within the same scene is minimized and 2) the distance between the learned binary matrices for a video pair with the same label is less than a threshold and that for a video pair with different labels is larger than a threshold. To better measure the structural information of the scenes from videos, we employ a subspace clustering method to cluster frames into different scenes. Moreover, we design multiple hierarchical nonlinear transformations to preserve the nonlinear relationship between videos. Experimental results on three video data sets show that our method outperforms state-of-the-art hashing approaches on the scalable video search task. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
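A compact PyTorch sketch of the hashing network in result 10: a multi-layer nonlinear map with tanh outputs as a continuous relaxation of binary codes, plus a toy version of the first constraint (frames of the same scene should share a code). The layer sizes, 48-bit code length, and loss form are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HashNet(nn.Module):
    """Multi-layer nonlinear mapping to relaxed binary codes: tanh keeps
    outputs in (-1, 1); sign() at test time yields the compact codes."""
    def __init__(self, dim=128, bits=48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.Tanh(),
            nn.Linear(256, bits), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

net = HashNet()
frames = torch.randn(10, 128)        # frame features from one scene (placeholders)
codes = net(frames)
# Constraint 1 (sketch): frames within the same scene should share a code.
within_scene = (codes - codes.mean(dim=0)).pow(2).sum(dim=1).mean()
binary = torch.sign(net(frames)).detach()   # test-time binary codes
print(within_scene.item(), binary.shape)
```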
11. Pushing the Limits of Deep CNNs for Pedestrian Detection.
- Authors: Hu, Qichang; Wang, Peng; Shen, Chunhua; van den Hengel, Anton; Porikli, Fatih
- Subjects: ARTIFICIAL neural networks; ARTIFICIAL intelligence; ALGORITHMS; IMAGE processing; BIG data
- Abstract:
Compared with other applications in computer vision, convolutional neural networks (CNNs) have underperformed on pedestrian detection. A breakthrough was made very recently using sophisticated deep CNN (DCNN) models combined with handcrafted features or explicit occlusion handling mechanisms. In this paper, we show that by reusing the convolutional feature maps of a DCNN model as image features to train an ensemble of boosted decision models, we are able to achieve the best reported accuracy without using specially designed learning algorithms. We empirically identify and disclose important implementation details. We also show that pixel labeling may simply be combined with a detector to boost detection performance. By adding complementary handcrafted features such as optical flow, the DCNN-based detector can be further improved. We advance the state-of-the-art results by lowering the log-average miss rate from 11.7% to 8.9% on the Caltech data set and from 11.2% to 8.6% on the INRIA data set. We also achieve a result comparable to state-of-the-art approaches on the KITTI data set. [ABSTRACT FROM AUTHOR]
- Published: 2018
- Full Text: View/download PDF
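Result 11's core recipe, sketched with scikit-learn: pooled convolutional feature-map responses (random placeholders here) are used to train boosted decision models for pedestrian/background classification. GradientBoostingClassifier stands in for whatever boosting variant the paper employs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Placeholder for pooled DCNN feature-map responses of candidate windows:
# rows = windows, columns = pooled channel activations.
X = rng.standard_normal((500, 256))
y = rng.integers(0, 2, size=500)          # 1 = pedestrian, 0 = background (toy labels)

# Boosted decision models over the reused convolutional features.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=2).fit(X, y)
scores = clf.predict_proba(X[:5])[:, 1]   # detection confidences
print(scores.round(3))
```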
12. No-Reference Video Quality Assessment With 3D Shearlet Transform and Convolutional Neural Networks.
- Authors: Li, Yuming; Po, Lai-Man; Cheung, Chun-Ho; Xu, Xuyuan; Feng, Litong; Yuan, Fang; Cheung, Kwok-Wai
- Subjects: VIDEOS; ARTIFICIAL neural networks; LOGISTIC regression analysis; ALGORITHMS; IMAGE processing; SIGNAL denoising
- Abstract:
In this paper, we propose an efficient general-purpose no-reference (NR) video quality assessment (VQA) framework based on the 3D shearlet transform and a convolutional neural network (CNN). Taking video blocks as input, simple and efficient primary spatiotemporal features are extracted by the 3D shearlet transform, which are capable of capturing natural scene statistics properties. CNN and logistic regression are then concatenated to exaggerate the discriminative parts of the primary features and predict a perceptual quality score. The resulting algorithm, which we name shearlet- and CNN-based NR VQA (SACONVA), is tested on the well-known VQA databases from the Laboratory for Image & Video Engineering (LIVE), the Image & Video Processing Laboratory (IVP), and CSIQ. The results demonstrate that SACONVA performs well in predicting video quality and is competitive with current state-of-the-art full-reference VQA methods and general-purpose NR-VQA algorithms. Furthermore, SACONVA is extended to classify different video distortion types in these three databases and achieves excellent classification accuracy. We also demonstrate that SACONVA can be applied directly in real applications such as blind video denoising. [ABSTRACT FROM PUBLISHER]
- Published: 2016
- Full Text: View/download PDF
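A loose sketch of result 12's pipeline: a zero-mean 3D filter stands in for the 3D shearlet transform to produce primary spatiotemporal features, and logistic regression performs the distortion-type classification mentioned in the abstract. The kernel, feature statistics, and synthetic labels are all illustrative assumptions, not the paper's design.

```python
import numpy as np
from scipy.ndimage import convolve
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3))
kernel -= kernel.mean()                      # zero-mean, band-pass-like 3D kernel

def spatiotemporal_features(block):
    """3D filtering as a crude stand-in for the 3D shearlet transform;
    simple response statistics serve as the primary features."""
    resp = convolve(block, kernel, mode='reflect')
    return np.array([resp.mean(), resp.std(), np.abs(resp).mean()])

# Toy video blocks (frames x height x width); label = synthetic distortion level.
labels = rng.integers(0, 3, size=60)
X = np.array([spatiotemporal_features(rng.standard_normal((8, 16, 16)) * (1 + l))
              for l in labels])
clf = LogisticRegression(max_iter=1000).fit(X, labels)   # distortion-type classifier
print(clf.predict(X[:5]))
```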
13. A Robust Object Segmentation System Using a Probability-Based Background Extraction Algorithm.
- Authors: Chung-Cheng Chiu; Min-Yu Ku; Li-Wey Liang
- Subjects: COMPUTER vision; VIDEO surveillance; COMPUTATIONAL complexity; ELECTRONIC data processing; IMAGE processing; ARTIFICIAL neural networks
- Abstract:
A video-based monitoring system must be capable of continuous operation under various weather and illumination conditions. Background subtraction is a critical part of surveillance applications for successful segmentation of objects from video sequences, and the accuracy, computational complexity, and memory requirements of the initial background extraction are crucial in any background subtraction method. This paper proposes a probability-based algorithm for extracting initial color backgrounds from surveillance videos. With the proposed algorithm, the initial background can be extracted accurately and quickly while using relatively little memory. Intrusive objects can then be segmented quickly and correctly by a robust object segmentation algorithm, which analyzes the background-subtraction threshold values from the previous frame to obtain good quality while minimizing execution time and maximizing detection accuracy. The color background images can be extracted efficiently and quickly from color image sequences and updated in real time to overcome variations in illumination conditions. Experimental results for various environmental sequences and a quantitative evaluation demonstrate the robustness, accuracy, effectiveness, and memory economy of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published: 2010
- Full Text: View/download PDF
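The initial background extraction of result 13 can be sketched as a per-pixel temporal mode: over the first frames, count quantized gray values at each pixel and take the most probable one as the background. The 32-bin quantization is an assumption; the paper's probability model and update scheme are more elaborate and operate on color sequences.

```python
import numpy as np

def extract_background(frames, bins=32):
    """Per-pixel most-probable value over time: quantize gray levels,
    take the temporal mode, and use it as the initial background."""
    q = (frames.astype(np.uint16) * bins // 256).astype(np.uint8)  # quantize
    T, H, W = q.shape
    counts = np.zeros((bins, H, W), dtype=np.int32)
    for t in range(T):
        np.add.at(counts, (q[t], np.arange(H)[:, None], np.arange(W)[None, :]), 1)
    mode_bin = counts.argmax(axis=0)
    return ((mode_bin + 0.5) * 256 / bins).astype(np.uint8)   # bin centre as background

# Toy sequence: static background (value 120) plus a moving bright object.
frames = np.full((30, 48, 64), 120, dtype=np.uint8)
for t in range(30):
    frames[t, 10:20, 2 * t:2 * t + 8] = 250        # transient foreground
bg = extract_background(frames)
print(int(np.abs(bg.astype(int) - 120).mean()))    # small error: background recovered
```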
14. IEEE Transactions on Circuits and Systems for Video Technology information for authors.
- Subjects: ELECTRIC circuits; INFORMATION technology; IMAGE analysis; IMAGE processing; VIDEO compression; SIGNAL processing; ARTIFICIAL neural networks
- Published: 2011
- Full Text: View/download PDF
15. IEEE Transactions on Circuits and Systems for Video Technology information for authors.
- Subjects: PUBLISHING; AUTHORS; INFORMATION technology; IMAGE analysis; IMAGE processing; SIGNAL processing; ARTIFICIAL neural networks; COPYRIGHT; MANUSCRIPTS; PUBLICATIONS
- Published: 2011
- Full Text: View/download PDF