141 results for "Predictive coding"
Search Results
2. Deep Pixel Restoration Loop Coding Network [Application Notes].
- Author
-
Wang, Tianxiang, Dai, Qun, and Yuan, Muxuan
- Abstract
This paper presents a solution to the video prediction problem based on the deep learning paradigm. Predicted frames generated by existing video prediction models are often blurry and have difficulty maintaining accuracy in multi-step prediction. To overcome these limitations, this paper presents a deep learning model, named the deep pixel restoration loop coding network (DPR-LC-Net), which employs the concept of predictive coding and reuses pixels from the real frames, allowing it to generate clear predicted frames with few errors even in long-term prediction. The calculation process of DPR-LC-Net is multi-sequential: it proceeds in an approximate loop from top to bottom and then from left to right. After predicting subsequent steps, DPR-LC-Net calculates the prediction errors and removes them from the sequential prediction. Finally, the model includes a unique pixel restoration module that operates efficiently on pixels from the preceding real frames to generate predicted frames, thereby improving their clarity. Extensive experiments on four video datasets show that the prediction performance of DPR-LC-Net is superior to that of state-of-the-art models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
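The predictive-coding loop described in result 2 (predict, measure the error, feed the error back) can be illustrated with a toy one-dimensional sketch. This is not the paper's DPR-LC-Net; the trend-update rule and all names here are invented for illustration:

```python
def predict_sequence(xs, lr=0.5):
    """Toy predictive-coding loop: predict the next sample from an estimated
    trend, then feed the prediction error back to correct the trend."""
    trend, errors = 0.0, []
    for prev, cur in zip(xs, xs[1:]):
        pred = prev + trend      # top-down prediction of the next sample
        err = cur - pred         # bottom-up prediction error
        trend += lr * err        # error feedback refines the internal model
        errors.append(abs(err))
    return errors

ramp = [2 * t for t in range(20)]  # constant-slope signal
errs = predict_sequence(ramp)
print(errs[0] > errs[-1])          # errors shrink as the loop adapts
```

On a constant-slope signal the error halves each step, mirroring (in miniature) how an error-removing loop sharpens multi-step prediction.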
3. Learning-Free Hyperspectral Anomaly Detection With Unpredictive Frequency Residual Priors
- Author
-
Shichao Zhou, Wenzheng Wang, and Chentao Gao
- Subjects
Fast detection ,hyperspectral frequency domain analysis ,learning-free anomaly detection ,predictive coding ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Hyperspectral anomaly detection aims to quickly and reliably find nontrivial candidate targets without prior knowledge, which has become an increasingly pressing need as imagery swath and resolution grow rapidly. Relevant state-of-the-art learning-based anomaly detection approaches have benefited from data-driven hierarchical feature embeddings that typically model the geometric distribution of spectral vectors. However, most of these techniques are incompatible with resource-constrained applications: 1) huge computational costs caused by feedforward convolution operations cannot be supported with limited computation resources and storage (e.g., in-orbit processing); 2) detection accuracy relies on large-scale training datasets, which entails labor-expensive imagery collection, pixel-level labeling, and a time-consuming learning procedure. To address these issues, we advocate a learning-free, frequency-domain anomaly detection method combined with predictive coding, an intriguing heuristic prior from the human visual system. Technically, 1) the inherently efficient frequency transformation can be implemented with existing image compression modules (e.g., JPEG or JPEG2000 codecs), which improves the utilization of computational resources; 2) the predictive coding mechanism is exploited to suppress frequently occurring information represented in the low-entropy frequency domain, such that the “unpredictive” subject (i.e., anomaly spectra) can “pop out” with naive residuals. Incorporating this heuristic prior into the computational model reduces dependence on large-scale training sets. Experiments on real-world hyperspectral datasets confirm the efficacy of our model. Besides, the low computational cost of the proposed anomaly detector (fast frequency transformation and analytical solutions) facilitates straightforward sliding-window verification in high-resolution imagery.
- Published
- 2022
- Full Text
- View/download PDF
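The "suppress the predictable, let the residual pop out" idea in result 3 can be sketched in a few lines. This is a loose illustration, not the paper's method: a naive DCT stands in for the codec's frequency transform, and the scene-mean coefficients stand in for the "predictable" low-entropy content:

```python
import math

def dct2(x):
    """Naive DCT-II (stand-in for a codec's frequency transform)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def anomaly_scores(spectra):
    """Score each pixel spectrum by the energy of its frequency-domain
    residual after subtracting the scene-mean coefficients (the
    frequently occurring, 'predictable' part)."""
    coeffs = [dct2(s) for s in spectra]
    mean = [sum(c[k] for c in coeffs) / len(coeffs) for k in range(len(coeffs[0]))]
    return [math.sqrt(sum((c[k] - mean[k]) ** 2 for k in range(len(mean))))
            for c in coeffs]

# Nine flat background spectra and one spiky anomaly.
background = [[1.0] * 8 for _ in range(9)]
anomaly = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
scores = anomaly_scores(background + [anomaly])
print(scores.index(max(scores)))  # the anomaly at index 9 pops out
```

No training is involved: the background suppresses itself, which is the learning-free property the abstract emphasizes.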
4. Communication-Efficient Federated Learning via Predictive Coding.
- Author
-
Yue, Kai, Jin, Richeng, Wong, Chau-Wai, and Dai, Huaiyu
- Abstract
Federated learning can enable remote workers to collaboratively train a shared machine learning model while allowing training data to be kept locally. In the use case of wireless mobile devices, the communication overhead is a critical bottleneck due to limited power and bandwidth. Prior work has utilized various data compression tools such as quantization and sparsification to reduce the overhead. In this paper, we propose a predictive coding based compression scheme for federated learning. The scheme has shared prediction functions among all devices and allows each worker to transmit a compressed residual vector derived from the reference. In each communication round, we select the predictor and quantizer based on the rate–distortion cost, and further reduce the redundancy with entropy coding. Extensive simulations reveal that the communication cost can be reduced up to 99% with even better learning performance when compared with other baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Lossless Image Compression by Joint Prediction of Pixel and Context Using Duplex Neural Networks
- Author
-
Hochang Rhee, Yeong Il Jang, Seyun Kim, and Nam Ik Cho
- Subjects
Lossless image compression ,predictive coding ,neural network ,adaptive arithmetic coding ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper presents a new lossless image compression method based on the learning of pixel values and contexts through multilayer perceptrons (MLPs). The prediction errors and contexts obtained by MLPs are forwarded to adaptive arithmetic encoders, like the conventional lossless compression schemes. The MLP-based prediction has long been attempted for lossless compression, and recently convolutional neural networks (CNNs) are also adopted for the lossy/lossless coding. While the existing MLP-based lossless compression schemes focused only on accurate pixel prediction, we jointly predict the pixel values and contexts. We also adopt and design channel-wise progressive learning, residual learning, and duplex network in this MLP-based framework, which leads to improved coding gain compared to the conventional methods. Experiments show that the proposed method performs better than the conventional non-learning algorithms and also recent learning-based compression methods with practical computation time.
- Published
- 2021
- Full Text
- View/download PDF
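The predictive lossless pipeline underlying result 5 can be shown with a fixed causal predictor in place of the paper's MLP (and without the adaptive arithmetic coder that would follow). The averaging rule here is a classic stand-in, not the paper's network:

```python
def predict(img, r, c):
    """Causal prediction from already-decoded neighbors (left and top);
    a learned MLP would replace this fixed rule."""
    left = img[r][c - 1] if c > 0 else 0
    top = img[r - 1][c] if r > 0 else 0
    return (left + top) // 2

def encode(img):
    """Residuals = pixel minus prediction; these feed the entropy coder."""
    return [[img[r][c] - predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]

def decode(residuals):
    """Decoder repeats the same predictions, so reconstruction is exact."""
    h, w = len(residuals), len(residuals[0])
    img = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            img[r][c] = residuals[r][c] + predict(img, r, c)
    return img

image = [[10, 12, 13], [11, 12, 14], [12, 13, 15]]
print(decode(encode(image)) == image)  # perfectly lossless
```

The scheme is lossless because encoder and decoder compute identical predictions from identical causal contexts; better predictors only shrink the residuals.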
6. Binocular Rivalry Oriented Predictive Autoencoding Network for Blind Stereoscopic Image Quality Measurement.
- Author
-
Xu, Jiahua, Zhou, Wei, Chen, Zhibo, Ling, Suiyi, and Le Callet, Patrick
- Subjects
- *
DEEP learning , *BINOCULAR rivalry , *STEREO image processing , *BINOCULAR vision , *IMAGE fusion , *CODING theory - Abstract
Stereoscopic image quality measurement (SIQM) has become increasingly important for guiding stereo image processing and communication systems due to the widespread usage of 3-D contents. Compared with conventional methods that rely on handcrafted features, deep-learning-oriented measurements have achieved remarkable performance in recent years. However, most existing deep SIQM evaluators are not specifically built for stereoscopic contents and consider little prior domain knowledge of the 3-D human visual system (HVS) in network design. In this article, we develop a Predictive Auto-encoDing Network (PAD-Net) for blind/no-reference SIQM. In the first stage, inspired by the predictive coding theory that the cognition system tries to match bottom–up visual signals with top–down predictions, we adopt the encoder–decoder architecture to reconstruct the distorted inputs. Besides, motivated by the binocular rivalry phenomenon, we leverage the likelihood and prior maps generated from the predictive coding process in the Siamese framework for assisting SIQM. In the second stage, a quality regression network is applied to the fusion image for acquiring the perceptual quality prediction. The performance of PAD-Net has been extensively evaluated on three benchmark databases and the superiority has been well validated on both symmetrically and asymmetrically distorted stereoscopic images under various distortion types. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. Hierarchical Predictive Coding-Based JND Estimation for Image Compression.
- Author
-
Wang, Hongkui, Yu, Li, Liang, Junhui, Yin, Haibing, Li, Tiansong, and Wang, Shengwei
- Subjects
- *
IMAGE compression , *CODING theory , *UNCERTAINTY (Information theory) , *VISUAL perception , *BIT rate , *VIDEO coding - Abstract
The human visual system (HVS) is a hierarchical system, in which visual signals are processed hierarchically. In this paper, the HVS is modeled as a three-level communication system and visual perception is divided into three stages according to the hierarchical predictive coding theory. Then, a novel just noticeable distortion (JND) estimation scheme is proposed. In visual perception, the input signals are predicted constantly and spontaneously in each hierarchy, and the neural response is evoked by the central residue and inhibited by surrounding residues. These two types of residues are regarded as the positive and negative visual incentives, which cause positive and negative perception effects, respectively. In neuroscience, the effect of an incentive on an observer is measured by the surprise of this incentive. Thus, we propose a surprise-based measurement method to measure both perception effects. Specifically, considering the biased competition of visual attention, we define the product of the residue self-information (i.e., surprise) and the competition biases as the perceptual surprise to measure the positive perception effect. As for the negative perception effect, it is measured by the average surprise (i.e., the local Shannon entropy). The JND threshold of each stage is estimated individually by considering both perception effects. The total JND threshold is finally obtained by non-linear superposition of the three stage thresholds. Furthermore, the proposed JND estimation scheme is incorporated into the codec of Versatile Video Coding for image compression. Experimental results show that the proposed JND model outperforms the relevant existing ones, and over 16% of the bit rate can be reduced without jeopardizing the perceptual quality. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
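The two surprise measures used in result 7 (self-information of a central residue, and local Shannon entropy as the average surprise) are easy to compute from an empirical distribution. A minimal sketch, with made-up residual blocks rather than the paper's actual residues:

```python
import math
from collections import Counter

def local_entropy(block):
    """Average surprise (local Shannon entropy) of a residual block,
    used in the paper as the negative perception effect."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def self_information(block, value):
    """Surprise of one central residue under the block's empirical distribution."""
    p = Counter(block)[value] / len(block)
    return -math.log2(p)

flat = [0, 0, 0, 0, 0, 0, 0, 1]   # smooth region: rare residues are very surprising
busy = [0, 1, 2, 3, 4, 5, 6, 7]   # textured region: high average surprise
print(local_entropy(flat) < local_entropy(busy))
```

High average surprise (busy regions) masks distortion, raising the JND threshold, while a surprising central residue in a flat region lowers it.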
8. Multiple Description Coding Based on Convolutional Auto-Encoder
- Author
-
Hongfei Li, Lili Meng, Jia Zhang, Yanyan Tan, Yuwei Ren, and Huaxiang Zhang
- Subjects
Convolutional auto-encoder (CAE) ,multiple description coding (MDC) ,predictive coding ,quality metric ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Deep learning methods, such as convolutional neural networks, have achieved great success in image processing, computer vision tasks, and image compression. This paper designs a multiple description coding framework based on a symmetric convolutional auto-encoder, which can achieve high-quality image reconstruction. First, the image is input into the convolutional auto-encoder, and the extracted features are obtained. Then, the extracted features are encoded by multiple description coding and split into two descriptions for transmission to the decoder. We can get the side information by the side decoder and the central information by the central decoder. Finally, the side information and the central information are deconvolved by the convolutional auto-encoder. The experimental results validate that the proposed scheme outperforms the state-of-the-art methods.
- Published
- 2019
- Full Text
- View/download PDF
9. Optimal Reference Selection for Random Access in Predictive Coding Schemes.
- Author
-
Pham, Mai-Quyen, Roumy, Aline, Maugey, Thomas, Dupraz, Elsa, and Kieffer, Michel
- Subjects
- *
LINEAR programming , *VIDEO coding , *INTEGER programming , *ALGORITHMS , *FORECASTING , *VIDEO recording - Abstract
Data acquired over long periods of time, like High Definition (HD) videos or records from a sensor over long time intervals, have to be efficiently compressed to reduce their size. The compression also has to allow efficient access to random parts of the data upon request from the users. Efficient compression is usually achieved with prediction between data points at successive time instants. However, this creates dependencies between the compressed representations, which is contrary to the idea of random access. Prediction methods rely in particular on reference data points, used to predict other data points. The placement of these references balances compression efficiency and random access. Existing solutions to position the references use ad hoc methods. In this paper, we study this joint problem of compression efficiency and random access. We introduce the storage cost as a measure of the compression efficiency and the transmission cost for the random access ability. We express the reference placement problem that trades storage with transmission cost as an integer linear programming problem. Considering additional assumptions on the sources and coding methods reduces the complexity of the search space of the optimization problem. Moreover, we show that the classical periodic placement of the references is optimal when the encoding costs of each data point are equal and when requests of successive data points are made. In this particular case, a closed-form expression of the optimal period is derived. Finally, the proposed optimal placement strategy is compared with an ad hoc method, where the references correspond to sources for which prediction does not significantly reduce the encoding cost. The proposed optimal algorithm achieves a bit saving of 20% with respect to the ad hoc method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
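The storage/transmission trade-off studied in result 9 can be made concrete with a brute-force search over periodic reference placements. The cost model below is a simplification invented for illustration (equal per-point costs, sequential decoding from the last reference), not the paper's ILP formulation:

```python
def costs(n, period, c_ref, c_pred):
    """Storage and average random-access transmission cost when every
    `period`-th data point is coded as a reference (intra)."""
    storage, access = 0, []
    for t in range(n):
        is_ref = (t % period == 0)
        storage += c_ref if is_ref else c_pred
        # accessing point t requires decoding from its last reference onward
        access.append(c_ref + (t % period) * c_pred)
    return storage, sum(access) / n

def best_period(n, c_ref, c_pred, lam):
    """Brute-force the period minimizing storage + lam * transmission."""
    def objective(p):
        s, a = costs(n, p, c_ref, c_pred)
        return s + lam * a
    return min(range(1, n + 1), key=objective)

p = best_period(100, c_ref=10.0, c_pred=1.0, lam=5.0)
print(1 < p < 100)  # an intermediate period beats all-intra and one reference
```

All-intra coding (period 1) wastes storage; a single reference (period 100) makes random access expensive; the optimum sits in between, which is the trade-off the paper characterizes in closed form.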
10. Graph-Based Transforms for Video Coding.
- Author
-
Egilmez, Hilmi E., Chao, Yung-Hsuan, and Ortega, Antonio
- Subjects
- *
VIDEO compression , *RANDOM fields , *IMAGE compression , *DESIGN techniques , *SYMMETRIC matrices , *COVARIANCE matrices , *VIDEO coding - Abstract
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of transforms called graph-based transforms (GBTs) for video compression, and proposes two different techniques to design GBTs. In the first technique, we formulate an optimization problem to learn graphs from data and provide solutions for optimal separable and nonseparable GBT designs, called GL-GBTs. The optimality of the proposed GL-GBTs is also theoretically analyzed based on Gaussian-Markov random field (GMRF) models for intra and inter predicted block signals. The second technique develops edge-adaptive GBTs (EA-GBTs) in order to flexibly adapt transforms to block signals with image edges (discontinuities). The advantages of EA-GBTs are both theoretically and empirically demonstrated. Our experimental results show that the proposed transforms can significantly outperform the traditional Karhunen-Loeve transform (KLT). [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
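A small, checkable fact behind graph-based transforms such as those in result 10: the GFT of a uniform-weight path graph is exactly the DCT-II, since the DCT-II basis vectors are eigenvectors of the path-graph Laplacian. A sketch verifying this numerically (not taken from the paper, which designs more general graph learning and edge-adaptive transforms):

```python
import math

def path_laplacian(n, w=1.0):
    """Combinatorial Laplacian of an n-node path graph (uniform edge weight w)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += w; L[i + 1][i + 1] += w
        L[i][i + 1] -= w; L[i + 1][i] -= w
    return L

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

n = 8
L = path_laplacian(n)
for k in range(n):
    # DCT-II basis vector and its known Laplacian eigenvalue
    u = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
    lam = 4 * math.sin(math.pi * k / (2 * n)) ** 2
    Lu = matvec(L, u)
    assert all(abs(Lu[i] - lam * u[i]) < 1e-9 for i in range(n))
print("path-graph GFT = DCT-II basis")
```

Changing the edge weights (e.g., weakening an edge at an image discontinuity, as EA-GBTs do) changes the eigenvectors, and hence the transform, away from the DCT.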
11. 3D Point Cloud Attribute Compression Using Geometry-Guided Sparse Representation.
- Author
-
Gu, Shuai, Hou, Junhui, Zeng, Huanqiang, Yuan, Hui, and Ma, Kai-Kuang
- Subjects
- *
POINT cloud , *IMAGE compression , *VIDEO compression , *IMAGE color analysis - Abstract
3D point clouds associated with attributes are considered a promising paradigm for immersive communication. However, the corresponding compression schemes for this medium are still in their infancy. Moreover, in contrast to conventional image/video compression, compressing 3D point cloud data is a more challenging task owing to its irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, an input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, which is formulated as an $\ell_{0}$-norm regularized optimization problem. Also, an inter-block prediction scheme is applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the resulting transform coefficients by GSR, an effective entropy coding strategy that is tailored to our GSR is developed to generate the bitstream. Experimental results over various benchmark datasets show that the proposed compression scheme is able to achieve better rate-distortion performance and visual quality, compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
12. Dealing With Large-Scale Spatio-Temporal Patterns in Imitative Interaction Between a Robot and a Human by Using the Predictive Coding Framework.
- Author
-
Hwang, Jungsik, Kim, Jinhyung, Ahmadi, Ahmadreza, Choi, Minkyu, and Tani, Jun
- Subjects
- *
MIRROR neurons , *SOCIAL perception , *ARTIFICIAL neural networks , *MENTAL imagery , *GENERATING functions , *ROBOTS - Abstract
This paper aims to investigate how adequate cognitive functions for recognizing, predicting, and generating a variety of actions can be developed through iterative learning of action-caused dynamic perceptual patterns. Particularly, we examined the capabilities of mental simulation of one’s own actions as well as the inference of others’ intention because they play a crucial role, especially in social cognition. We propose a dynamic neural network model based on predictive coding which can generate and recognize dynamic visuo-proprioceptive patterns. The proposed model was examined by conducting a set of robotic simulation experiments in which a robot was trained to imitate visually perceived gesture patterns of human subjects in a simulation environment. The experimental results showed that the proposed model was able to develop a predictive model of imitative interaction through iterative learning of large-scale spatio-temporal patterns in visuo-proprioceptive input streams. Also, the experiment verified that the model was able to generate mental imagery of dynamic visuo-proprioceptive patterns without feeding the external inputs. Furthermore, the model was able to recognize the intention of others by minimizing prediction error in the observations of the others’ action patterns in an online manner. These findings suggest that the error minimization principle in predictive coding could provide a primal account for the mirror neuron functions for generating actions as well as recognizing those generated by others in a social cognitive context. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
13. Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations.
- Author
-
Ororbia, Alexander, Mali, Ankur, Giles, C. Lee, and Kifer, Daniel
- Subjects
- *
RECURRENT neural networks , *MACHINE learning , *LINEAR network coding , *NEURAL codes , *COMPUTER architecture - Abstract
Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models often relies on backpropagation through time (BPTT), which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of backpropagation itself does not permit the use of nondifferentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the parallel temporal neural coding network (P-TNCN), a biologically inspired model trained by the learning algorithm we call local representation alignment. It aims to resolve the difficulties and problems that plague recurrent networks trained by BPTT. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure with other BPTT alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these alternatives on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new data sets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN’s ability to conduct zero-shot adaptation and online continual sequence modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
14. Efficient Shape Coding for Object-Based 3D Video Applications.
- Author
-
Zhu, Zhongjie, Wang, Yuer, Jiang, Gangyi, and Yang, Yueping
- Subjects
- *
VIDEOS , *VIDEO coding - Abstract
Shape is a popular way to define objects and shape coding is a key technique for object-based 3D video applications. In this paper, the issue of efficient shape coding for object-based 3D video applications is addressed, and a novel contour-based and chain-represented scheme is proposed. For a given 3D shape video, contour extraction and preprocessing are first implemented followed by chain-based representation. Then, to achieve high coding efficiency, a chain-based prediction and compensation technique is developed based on joint motion-compensated prediction and disparity-compensated prediction to effectively exploit the intra-view temporal correlation and the inter-view spatial correlation. Experiments are conducted, and the results demonstrate that the proposed scheme is more efficient than the existing methods, including state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
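The chain-based contour representation underlying result 14 is the classic Freeman chain code: a start point plus a sequence of direction symbols. A minimal sketch (the paper's motion- and disparity-compensated prediction of chains is not shown):

```python
# 8-direction Freeman chain code: (dx, dy) per symbol 0..7
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_encode(points):
    """Represent a contour as a start point plus direction symbols."""
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        code.append(DIRS.index((x1 - x0, y1 - y0)))
    return points[0], code

def chain_decode(start, code):
    """Walk the directions from the start point to recover the contour."""
    pts = [start]
    for s in code:
        dx, dy = DIRS[s]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
start, code = chain_encode(square)
print(chain_decode(start, code) == square)
```

Since each symbol needs only 3 bits (before any prediction or entropy coding), chains are already a compact shape representation; predicting one view's chain from another, as the paper does, removes the remaining redundancy.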
15. A Novel Video Coding Framework Using a Self-Adaptive Dictionary.
- Author
-
Xue, Yuanyi and Wang, Yao
- Subjects
- *
VIDEO coding , *DATA dictionaries , *RATE distortion theory , *LEAST squares , *L1-minimization , *PARALLEL algorithms , *DISCRETE cosine transforms - Abstract
In this paper, we propose to use a self-adaptive redundant dictionary, consisting of all possible inter and intra prediction candidates, to directly represent the frame blocks in a video sequence. The self-adaptive dictionary generalizes the conventional predictive coding approach by allowing adaptive linear combinations of prediction candidates, which is solved by a rate-distortion-aware L0-norm minimization problem using orthogonal least squares (OLS). To overcome the inefficiency in quantizing and coding coefficients corresponding to correlated chosen atoms, we orthonormalize the chosen atoms recursively as part of the OLS process. We further propose a two-stage video coding framework, in which a second stage codes the residual from the chosen atoms using a modified discrete cosine transform (DCT) dictionary that is adaptively orthonormalized with respect to the subspace spanned by the first stage atoms. To determine the transition from the first stage to the second stage, we propose a rate-distortion (RD) aware adaptive switching algorithm. The proposed framework is further extended to accommodate variable block sizes ($16\times 16$, $8\times 8$, and $4\times 4$), and the partition mode is derived by a fast partition mode decision algorithm. A context-adaptive binary arithmetic entropy coder is designed to code the symbols of the proposed coding framework. The proposed coder shows competitive, and in some cases better, RD performance compared with the HEVC video coding standard for P-frames. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
16. Multilevel Split Regression Wavelet Analysis for Lossless Compression of Remote Sensing Data.
- Author
-
Alvarez-Cortes, Sara, Bartrina-Rapesta, Joan, and Serra-Sagrista, Joan
- Abstract
Spectral redundancy is a key element to be exploited in compression of remote sensing data. Combined with an entropy encoder, it can achieve competitive lossless coding performance. One of the latest techniques to decorrelate the spectral signal is the regression wavelet analysis (RWA). RWA applies a wavelet transform in the spectral domain and estimates the detail coefficients through the approximation coefficients using linear regression. RWA was originally coupled with JPEG 2000. This letter introduces a novel coding approach, where RWA is coupled with the predictor of CCSDS-123.0-B-1 standard and a lightweight contextual arithmetic coder. In addition, we also propose a smart strategy to select the number of RWA decomposition levels that maximize the coding performance. Experimental results indicate that, on average, the obtained coding gains vary between 0.1 and 1.35 bits-per-pixel-per-component compared with the other state-of-the-art coding techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
17. Leading or Following? Dyadic Robot Imitative Interaction Using the Active Inference Framework
- Author
-
Jun Tani and Nadine Wirkuttis
- Subjects
Predictive coding ,active inference ,free energy principle ,cognitive robotics ,adaptive control ,activity recognition ,humanoid robots ,inference algorithms ,synchronization ,Robotics (cs.RO) - Abstract
This study investigated how social interaction among robotic agents changes dynamically depending on the individual belief of action intention. In a set of simulation studies, we examine dyadic imitative interactions of robots using a variational recurrent neural network model. The model is based on the free energy principle such that a pair of interacting robots find themselves in a loop, attempting to predict and infer each other's actions using active inference. We examined how regulating the complexity term to minimize free energy determines the dynamic characteristics of networks and interactions. When one robot trained with tighter regulation and another trained with looser regulation interact, the latter tends to lead the interaction by exerting stronger action intention, while the former tends to follow by adapting to its observations. The study confirms that the dyadic imitative interaction becomes successful by achieving a high synchronization rate when a leader and a follower are determined by developing action intentions with strong belief and weak belief, respectively.
- Published
- 2021
18. A Truncated Prediction Framework for Streaming Over Erasure Channels.
- Author
-
Etezadi, Farrokh, Khisti, Ashish, and Chen, Jun
- Subjects
- *
BINARY erasure channels (Telecommunications) , *RATE distortion theory , *COMBINED source-channel coding , *ENCODING , *INDEXES - Abstract
We propose a new coding technique for sequential transmission of a stream of Gauss–Markov sources over erasure channels under a zero decoding delay constraint. Our proposed scheme is a combination (hybrid) of predictive coding with truncated memory, and quantization-and-binning. We study the optimality of our proposed scheme using an information theoretic model. In our setup, the encoder observes a stream of source vectors that are spatially independent and identically distributed (i.i.d.) and temporally sampled from a first-order stationary Gauss–Markov process. The channel introduces an erasure burst of a certain maximum length $B$, starting at an arbitrary time, not known to the transmitter. The reconstruction of each source vector at the destination must be with zero delay and satisfy a quadratic distortion constraint with an average distortion of $D$. The decoder is not required to reconstruct those source vectors that belong to the period spanning the erasure burst and a recovery window of length $W$ following it. We study the minimum compression rate $R(B,W,D)$ in this setup. As our main result, we establish upper and lower bounds on the compression rate. The upper bound (achievability) is based on our hybrid scheme. It achieves significant gains over baseline schemes such as (leaky) predictive coding, memoryless binning, a separation-based scheme, and a group-of-pictures-based scheme. The lower bound is established by observing a connection to a network source coding problem. The bounds simplify in the high resolution regime, where we provide explicit expressions whenever possible, and identify conditions when the proposed scheme is close to optimal. We finally discuss the interplay between the parameters of our burst erasure channel and the statistical channel models and explain how the bounds in the former model can be used to derive insights into the simulation results involving the latter.
In particular, our proposed scheme outperforms the baseline schemes over the i.i.d. erasure channel and the Gilbert–Elliott channel, and achieves performance close to a lower bound in some regimes. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
19. Fast and Lightweight Rate Control for Onboard Predictive Coding of Hyperspectral Images.
- Author
-
Valsesia, Diego and Magli, Enrico
- Abstract
Predictive coding is attractive for compression of hyperspectral images onboard spacecraft in light of the excellent rate-distortion performance and low complexity of recent schemes. In this letter, we propose a rate control algorithm and integrate it into a lossy extension of the CCSDS-123 lossless compression recommendation. The proposed rate control algorithm overhauls our previous scheme by being orders of magnitude faster and simpler to implement, while still providing the same accuracy in terms of output rate and comparable or better image quality. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
20. Predictive Lossless Compression of Regions of Interest in Hyperspectral Images With No-Data Regions.
- Author
-
Shen, Hongda, Pan, W. David, and Wu, Dongsheng
- Subjects
- *
HYPERSPECTRAL imaging systems , *CODING theory , *LOSSY data compression , *REMOTE sensing , *SUPPORT vector machines - Abstract
This paper addresses the problem of efficient predictive lossless compression of the regions of interest (ROIs) in hyperspectral images with no-data regions. We propose a two-stage prediction scheme, where a context-similarity-based weighted average prediction is followed by recursive least square filtering to decorrelate the hyperspectral images for compression. We then propose to apply separate Golomb–Rice codes for coding the prediction residuals of the full-context pixels and boundary pixels, respectively. To study the coding gains of this separate coding scheme, we introduce a mixture geometric model to represent the residuals associated with various combinations of the full-context pixels and boundary pixels. Both information-theoretic analysis and simulations on synthetic data confirm the advantage of the separate coding scheme over the conventional coding method based on a single underlying geometric distribution. We apply the aforementioned prediction and coding methods to four publicly available hyperspectral image data sets, attaining significant improvements over several other state-of-the-art methods, including the shape-adaptive JPEG 2000 method. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
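The separate Golomb–Rice residual coding described in entry 20 can be illustrated with a minimal encoder sketch; the zigzag mapping of signed residuals and the per-class choice of the parameter `k` are standard practice, not details taken from the paper:

```python
def golomb_rice_encode(residual: int, k: int) -> str:
    """Encode a signed prediction residual with Golomb-Rice parameter k."""
    # Zigzag-map the signed residual to a non-negative integer.
    value = 2 * residual if residual >= 0 else -2 * residual - 1
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    # Unary part: `quotient` ones and a terminating zero,
    # followed by the k least-significant bits of the value.
    bits = "1" * quotient + "0"
    if k:
        bits += format(remainder, f"0{k}b")
    return bits
```

Per the abstract, one would fit a separate `k` to the residual statistics of full-context pixels and of boundary pixels.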
21. Exploiting Spatio-Temporal Structure With Recurrent Winner-Take-All Networks.
- Author
-
Santana, Eder, Emigh, Matthew S., Zegers, Pablo, and Principe, Jose C.
- Subjects
- *
ARTIFICIAL neural networks , *OBJECT recognition (Computer vision) - Abstract
We propose a convolutional recurrent neural network (ConvRNN) with winner-take-all (WTA) dropout for high-dimensional unsupervised feature learning in multidimensional time series. We apply the proposed method to object recognition using temporal context in videos and obtain better results than comparable methods in the literature, including the deep predictive coding networks (DPCNs) previously proposed by Chalasani and Principe. Our contributions can be summarized as a scalable reinterpretation of the DPCNs trained end-to-end with backpropagation through time, an extension of the previously proposed WTA autoencoders to sequences in time, and a new technique for initializing and regularizing ConvRNNs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
22. Kalman filter-based prediction refinement and quality enhancement for geometry-based point cloud compression
- Author
-
Wang, Lu, Sun, Jian, Yuan, Hui, Hamzaoui, Raouf, and Wang, Xiaohui
- Subjects
Predictive coding ,Kalman filter ,Point clouds - Abstract
A point cloud is a set of points representing a three-dimensional (3D) object or scene. To compress a point cloud, the Motion Picture Experts Group (MPEG) geometry-based point cloud compression (G-PCC) scheme may use three attribute coding methods: region-adaptive hierarchical transform (RAHT), predicting transform (PT), and lifting transform (LT). To improve the coding efficiency of PT, we propose to use a Kalman filter to refine the predicted attribute values. We also apply a Kalman filter to improve the quality of the reconstructed attribute values at the decoder side. Experimental results show that the combination of the two proposed methods can achieve average Bjontegaard delta bitrates of -0.48%, -5.18%, and -6.27% for the Luma, Chroma Cb, and Chroma Cr components, respectively, compared with a recent version of the G-PCC reference software.
- Published
- 2021
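The Kalman-filter refinement idea in entry 22 can be sketched with a scalar filter; the state model, the noise variances `q` and `r`, and the blending of codec predictions with observed values are illustrative assumptions rather than the G-PCC implementation:

```python
def kalman_refine(predictions, measurements, q=1e-3, r=1e-1):
    """Refine a sequence of predicted attribute values with noisy
    measurements using a scalar Kalman filter (an illustrative sketch,
    not the G-PCC code).

    q: assumed process-noise variance, r: measurement-noise variance.
    """
    p = 1.0                                  # initial estimate variance
    refined = []
    for pred, z in zip(predictions, measurements):
        # Predict step: take the codec's prediction, inflate uncertainty.
        x, p = pred, p + q
        # Update step: blend in the measurement with the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        refined.append(x)
    return refined
```

Each refined value lies between the codec's prediction and the measurement, weighted by the gain `k`.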
23. Colored-Gaussian Multiple Descriptions: Spectral and Time-Domain Forms.
- Author
-
Ostergaard, Jan, Kochman, Yuval, and Zamir, Ram
- Subjects
- *
RATE distortion theory , *GAUSSIAN channels , *GAUSSIAN distribution , *GAUSSIAN function , *PULSE-code modulation - Abstract
It is well known that Shannon’s rate-distortion function (RDF) in the colored quadratic Gaussian (QG) case can be parametrized via a single Lagrangian variable (the water level in the reverse water filling solution). In this paper, we show that the symmetric colored QG multiple description (MD) RDF in the case of two descriptions can be parametrized in the spectral domain via two Lagrangian variables, which control the tradeoff between the side distortion, the central distortion, and the coding rate. This spectral-domain analysis is complemented by a time-domain scheme-design approach: we show that the symmetric colored QG MD RDF can be achieved by combining ideas of delta–sigma modulation and differential pulse-code modulation. In particular, two source prediction loops, one for each description, are embedded within a common noise-shaping loop, whose parameters are explicitly found from the spectral-domain characterization. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
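For reference, the single-Lagrangian-variable parametrization mentioned in entry 23 is the classical reverse water-filling solution for a stationary Gaussian source with power spectral density S(ω) and water level θ:

```latex
R(\theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \max\!\left\{0,\ \log \frac{S(\omega)}{\theta}\right\} \mathrm{d}\omega,
\qquad
D(\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \min\left\{\theta,\ S(\omega)\right\} \mathrm{d}\omega .
```

The MD extension in the paper replaces the single water level θ with two Lagrangian variables that trade off the side distortions, the central distortion, and the rate.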
24. Generalized Independent Component Analysis Over Finite Alphabets.
- Author
-
Painsky, Amichai, Rosset, Saharon, and Feder, Meir
- Subjects
- *
CODING theory , *INDEPENDENT component analysis , *DEEP learning , *ARTIFICIAL neural networks , *DATA compression - Abstract
Independent component analysis (ICA) is a statistical method for transforming an observable multi-dimensional random vector into components that are as statistically independent as possible from each other. Usually, the ICA framework assumes a model according to which the observations are generated (such as a linear transformation with additive noise). ICA over finite fields is a special case of ICA in which both the observations and the independent components are over a finite alphabet. In this paper, we consider a generalization of this framework in which an observation vector is decomposed to its independent components (as much as possible) with no prior assumption on the way it was generated. This generalization is also known as Barlow’s minimal redundancy representation problem and is considered an open problem. We propose several theorems and show that this hard problem can be accurately solved with a branch and bound search tree algorithm, or tightly approximated with a series of linear problems. Our contribution provides the first efficient set of solutions to Barlow’s problem. The minimal redundancy representation (also known as factorial code) has many applications, mainly in the fields of neural networks and deep learning. The binary ICA is also shown to have applications in several domains, including medical diagnosis, multi-cluster assignment, network tomography, and internet resource management. In this paper, we show that this formulation further applies to multiple disciplines in source coding, such as predictive coding, distributed source coding, and coding of large alphabet sources. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
25. Leading or Following? Dyadic Robot Imitative Interaction Using the Active Inference Framework
- Author
-
Wirkuttis, Nadine and Tani, Jun
- Abstract
This study investigated how social interaction among robotic agents changes dynamically depending on the individual belief of action intention. In a set of simulation studies, we examine dyadic imitative interactions of robots using a variational recurrent neural network model. The model is based on the free energy principle such that a pair of interacting robots find themselves in a loop, attempting to predict and infer each other's actions using active inference. We examined how regulating the complexity term to minimize free energy determines the dynamic characteristics of networks and interactions. When one robot trained with tighter regulation and another trained with looser regulation interact, the latter tends to lead the interaction by exerting stronger action intention, while the former tends to follow by adapting to its observations. The study confirms that the dyadic imitative interaction becomes successful, achieving a high synchronization rate, when the leader and follower roles emerge from action intentions developed with strong and weak beliefs, respectively. Source: https://ieeexplore.ieee.org/document/9457162
- Published
- 2021
26. The Performance of PredNet Using Predictive Coding in the Visual Cortex: An Empirical Analysis
- Author
-
Michael W. Totaro and Sai Ranganath Mikkilineni
- Subjects
Predictive coding ,Computer science ,business.industry ,Convolutional neural network ,Visualization ,Visual processing ,Visual cortex ,medicine.anatomical_structure ,Research based ,medicine ,Criticism ,Frame (artificial intelligence) ,Artificial intelligence ,business - Abstract
PredNet is a deep recurrent convolutional neural network developed by Lotter et al. The architecture drew inspiration from the hierarchical neuroscience model of visual processing described and demonstrated by Rao and Ballard. In 2020, Rane et al. published a critical review of PredNet, citing its poor performance on the task of next-frame prediction in videos from a crowd-sourced action classification dataset. While their criticism is largely coherent, it appears dubious when weighed against the findings reported by Rao and Ballard. In this paper, we reevaluate their review using the two primary datasets employed by Lotter et al. and by Rane et al. We address gaps in their analysis, drawing on the findings reported by Rao and Ballard, and thereby provide a more comprehensive picture for future research based on predictive coding theory.
- Published
- 2021
27. Learning to Perceive the World as Probabilistic or Deterministic via Interaction With Others: A Neuro-Robotics Experiment.
- Author
-
Murata, Shingo, Yamashita, Yuichi, Arie, Hiroaki, Ogata, Tetsuya, Sugano, Shigeki, and Tani, Jun
- Subjects
- *
NEURAL circuitry , *ROBOTICS , *PREDICTION (Psychology) - Abstract
We suggest that different behavior generation schemes, such as sensory reflex behavior and intentional proactive behavior, can be developed by a newly proposed dynamic neural network model, named stochastic multiple timescale recurrent neural network (S-MTRNN). The model learns to predict subsequent sensory inputs, generating both their means and their uncertainty levels in terms of variance (or inverse precision) by utilizing its multiple timescale property. This model was employed in robotics learning experiments in which one robot controlled by the S-MTRNN was required to interact with another robot under the condition of uncertainty about the other’s behavior. The experimental results show that self-organized and sensory reflex behavior—based on probabilistic prediction—emerges when learning proceeds without a precise specification of initial conditions. In contrast, intentional proactive behavior with deterministic predictions emerges when precise initial conditions are available. The results also show that, in situations where unanticipated behavior of the other robot was perceived, the behavioral context was revised adequately by adaptation of the internal neural dynamics to respond to sensory inputs during sensory reflex behavior generation. On the other hand, during intentional proactive behavior generation, an error regression scheme by which the internal neural activity was modified in the direction of minimizing prediction errors was needed for adequately revising the behavioral context. These results indicate that two different ways of treating uncertainty about perceptual events in learning, namely, probabilistic modeling and deterministic modeling, contribute to the development of different dynamic neuronal structures governing the two types of behavior generation schemes. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
28. Multiple Description Coding Based on Convolutional Auto-Encoder
- Author
-
Lili Meng, Yuwei Ren, Yanyan Tan, Jia Zhang, Huaxiang Zhang, and Hongfei Li
- Subjects
General Computer Science ,Computer science ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,02 engineering and technology ,Data_CODINGANDINFORMATIONTHEORY ,Convolutional neural network ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,predictive coding ,Transform coding ,business.industry ,Deep learning ,Multiple description coding ,General Engineering ,020206 networking & telecommunications ,Pattern recognition ,multiple description coding (MDC) ,Convolutional code ,quality metric ,020201 artificial intelligence & image processing ,Artificial intelligence ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Convolutional auto-encoder (CAE) ,business ,lcsh:TK1-9971 ,Decoding methods ,Image compression - Abstract
Deep learning methods, such as convolutional neural networks, have achieved great success in image processing, computer vision tasks, and image compression. This paper designs a multiple description coding framework based on a symmetric convolutional auto-encoder, which can achieve high-quality image reconstruction. First, the image is input into the convolutional auto-encoder and the extracted features are obtained. Then, the extracted features are encoded by multiple description coding and split into two descriptions for transmission to the decoder. The side decoder yields the side information and the central decoder yields the central information. Finally, the side and central information are deconvolved by the convolutional auto-encoder. The experimental results validate that the proposed scheme outperforms the state-of-the-art methods.
- Published
- 2019
29. Visual Hull-Based Geometric Data Compression of a 3-D Object.
- Author
-
Hwang, Sung Soo, Kim, Wook-Joong, Yoo, Jisung, and Kim, Seong Dae
- Subjects
- *
DATA compression , *DATA visualization , *IMAGE compression , *IMAGE encryption , *THREE-dimensional modeling , *SHAPE theory (Topology) - Abstract
As image-based 3-D modeling is used in a variety of applications, the compression of 3-D object geometry represented by multiple images becomes an important task. This paper presents a model-based approach to predicting the geometric structure of an object using its visual hull. A visual hull is a geometric entity generated by shape-from-silhouette (SFS), and consequently it largely follows the overall shape of the object. The construction of a visual hull is computationally inexpensive, and a visual hull can be encoded with a relatively small number of bits because it can be represented with 2-D silhouette images. Therefore, when it comes to the predictive compression of an object's geometric data, the visual hull should be an effective predictor. In the proposed method, the geometric structure of an object is represented by a layered depth image (LDI), and a visual hull is computed from the LDI data via silhouette generation and SFS. The geometry of the object is predicted with the computed visual hull, and the visual hull data with its prediction errors are encoded. Simulation results show that the proposed predictive coding based on the visual hull outperforms the previous image-based methods and the partial surface-based method. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
30. Efficient depth map coding using linear residue approximation and a flexible prediction framework.
- Author
-
Lucas, Luis F. R., Rodrigues, Nuno M. M., Pagliari, Carla L., da Silva, Eduardo A. B., and de Faria, Sergio M. M.
- Abstract
The recent market growth of 3D video equipment and associated services motivates the development of more efficient 3D and multiview data representation algorithms. One of the most investigated formats is video+depth, which uses depth-image-based rendering (DIBR) to combine texture and depth information in order to create an arbitrary number of views at the decoder. This approach requires that depth information be accurately encoded. However, methods usually employed to encode texture are not well suited to depth map coding. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
31. Efficient quantization parameter coding based on intra/inter prediction for visual quality conscious video coders.
- Author
-
Aoki, Hirofumi
- Abstract
This paper presents an efficient method for coding quantization parameters (QPs) in video coding. Modern real-world video coders modulate QPs within a frame depending on local perceptual sensitivity of the human visual system, aiming to improve subjective quality. Frequent QP modulation increases the code amount of QPs and can counteract the subjective quality improvement, so efficient QP coding is important. Based on the fact that similar textures have similar perceptual sensitivities for the human visual system, the proposed method predicts the QP of each coding block from the spatially/temporally neighboring blocks referenced in intra/inter prediction. By leveraging coded intra/inter prediction information, the proposed method effectively predicts a probable QP from neighboring blocks without any additional bits. Experimental results show that the proposed method improves the coding efficiency of QPs by 16% to 20%, whereas the improvement with the conventional method is 0.2% to 6.9%. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
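The neighbor-based QP prediction in entry 31 can be sketched as follows; the median rule and the function names are illustrative assumptions, since the abstract does not state the exact predictor:

```python
def predict_block_qp(neighbor_qps):
    """Predict a block's QP from the QPs of the spatially/temporally
    neighboring blocks referenced in intra/inter prediction.
    The median is one plausible predictor; the paper does not
    specify this exact rule."""
    qps = sorted(neighbor_qps)
    return qps[len(qps) // 2]

def qp_residual(actual_qp, neighbor_qps):
    """Only the prediction residual needs to be transmitted; a good
    predictor keeps it small and cheap to entropy-code."""
    return actual_qp - predict_block_qp(neighbor_qps)
```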
32. Context Modeling and Correction of Quantization Errors in Prediction Loop.
- Author
-
Zhou, Jiantao and Wu, Xiaolin
- Abstract
In lossy predictive coding of Differential Pulse Code Modulation (DPCM) type, quantization performed in the prediction loop induces propagation of quantization errors, resulting in biased predictions of the subsequent samples. In this work, we aim to alleviate the negative effect of quantization errors on the robustness of prediction. We propose some practical techniques for context modeling of quantization errors and cancelation of estimation biases in the DPCM reconstruction. The resulting refined estimates are fed into the prediction to improve coding efficiency. When applied to 1D audio and 2D image signals, the proposed techniques can reduce the bit rate and at the same time improve the PSNR performance significantly. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
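As background for entry 32, a minimal closed-loop DPCM coder (a generic sketch, not the authors' refined scheme) predicts each sample from the previous reconstruction, so encoder and decoder stay synchronized even though quantization error still biases subsequent predictions:

```python
def dpcm_encode_decode(samples, step):
    """Closed-loop DPCM with a uniform quantizer.

    Predicting from the reconstructed (not original) previous sample
    keeps encoder and decoder in sync; the residual quantization error
    still propagates into later predictions, which is the bias the
    paper's context modeling aims to correct.
    """
    recon_prev = 0.0
    indices, recon = [], []
    for x in samples:
        residual = x - recon_prev            # prediction error
        q = round(residual / step)           # quantizer index (transmitted)
        recon_prev = recon_prev + q * step   # decoder-side reconstruction
        indices.append(q)
        recon.append(recon_prev)
    return indices, recon
```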
33. A multiresolution algorithm for focal-plane compression.
- Author
-
Wang, Hsuan-Tsung and Leon-Salas, Walter D.
- Abstract
An image compression algorithm suitable for focal-plane integration and its hardware implementation are presented. In this approach, an image is progressively decomposed into images of lower resolution. The low-resolution images are then used as predictors of the higher-resolution images. The prediction residuals are entropy encoded and compressed. This compression approach can provide lossless or lossy compression, and the resulting bitstream is a fully embedded code. A switched-capacitor circuit is proposed to implement the required operations. A prototype has been implemented in a 0.5 µm CMOS process. Simulation and measurement results validating the proposed approach are reported. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
34. 3D spatial reconstruction and communication from vision field.
- Author
-
Cao, Xun, Wang, Qifei, Ji, Xiangyang, and Dai, Qionghai
- Abstract
Vision field describes real-world visual information by summarizing the seven-dimensional plenoptic function into three domains: view, light, and time, from which a better understanding of previous 3D capture and reconstruction systems can be gained. In this paper, we first show how to reconstruct 3D spatial information from all three attributes of the vision field, namely full-space vision field reconstruction. Then, based on Laplacian iterative geometry prediction, a 3D mesh coding algorithm with cascaded quantization is presented to facilitate the communication of the 3D models reconstructed from the vision field. Finally, experimental results of both the 3D spatial reconstruction and the 3D mesh coding are demonstrated. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
35. Audio coding with power spectral density preserving quantization.
- Author
-
Li, Minyue, Klejsa, Janusz, Ozerov, Alexey, and Kleijn, W. Bastiaan
- Abstract
The coding of audio-visual signals is generally based on different paradigms for high and low rates. At high rates the signal is approximated directly and at low rates only signal features are transmitted. The recently introduced distribution preserving quantization (DPQ) paradigm provides a seamless transition between these two regimes. In this paper we present a simplified scheme that preserves the power spectral density (PSD) rather than the probability distribution. In a practical system the PSD must be estimated. We show that both forward adaptive and backward adaptive PSD estimation are possible. Our experimental results confirm that preservation of PSD at finite precision leads to a unified coding paradigm that provides effective coding at both high and low rates. An audio coding application shows the perceptual benefits of PSD preserving quantization. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
36. A cloud computing-based image codec system for lossless compression of images on a Cortex-A8 platform.
- Author
-
Kau, Lih-Jen, Chung, Cheng-Chang, and Chen, Ming-Sian
- Abstract
To provide an efficient architecture for the storage and processing of medical images, a cloud computing-based system is proposed in this paper. In addition, given the greatly increased computational ability of embedded systems, an ARM-based Cortex-A8 embedded platform serves as the kernel of the proposed architecture for the lossless compression, decompression, and processing of medical images. All generated images are first sent to the embedded platform for lossless compression before being stored in the image database server. Conversely, retrieved images are first sent to the platform for decompression and then transmitted via the network to the client for viewing. In the proposed system, the decompressed images are transmitted over HTTP, which means only a browser is needed on the client. Moreover, the client can perform image processing, e.g., image sharpening and histogram equalization, on the selected image in the browser; no other application program is required. Since only a browser is needed, the proposed system architecture is very easy to maintain, and the computational burden traditionally imposed on the client is greatly alleviated. Furthermore, as the experiments show, the coding algorithm used is very effective and efficient for the lossless compression of images. The cloud computing-based architecture, in conjunction with the highly efficient coding algorithm, makes the proposed system very feasible for practical use. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
37. Research on a quasi-lossless compression algorithm based on huffman coding.
- Author
-
Ren Weizheng, Wang Haobo, Xu Lianming, and Cui Yansong
- Abstract
A quasi-lossless compression algorithm for location information within the civilian global positioning system (GPS) is proposed, based on an analysis and comparison of the performance of Huffman coding and arithmetic coding, to improve the compression ratio and compression speed of location data within civilian GPS accuracy. The algorithm organically combines predictive coding and Huffman coding, and removes redundant information through compression preprocessing and secondary quantization; predictive coding improves the coding efficiency. Tests on an MSP430 microcontroller showed a compression ratio of 87.1% and a processing time of 31.4 s for 668 kB of data, which coincided well with simulation results. Experimental results indicate that, after optimization, the proposed algorithm has low hardware-resource requirements and achieves an improved compression ratio and fast coding speed, saving storage resources and communication cost for data transmission. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
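The combination of predictive coding and Huffman coding described in entry 37 can be illustrated with a toy sketch: first-order differences followed by a standard Huffman code over the residuals. The preprocessing and secondary quantization steps of the paper are omitted, and the helper names are my own:

```python
import heapq
from collections import Counter

def delta_encode(values):
    """First-order predictive coding: each sample is predicted by its
    predecessor and only the residual is kept."""
    residuals, prev = [], 0
    for v in values:
        residuals.append(v - prev)
        prev = v
    return residuals

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bitstring) for the residuals."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries carry a tie-breaking index so dicts are never compared.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def compress(values):
    residuals = delta_encode(values)
    code = huffman_code(residuals)
    return "".join(code[r] for r in residuals), code
```

Because position samples change slowly, the residuals concentrate on a few small values, which is exactly the skewed distribution Huffman coding exploits.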
38. Initialization, Limitation, and Predictive Coding of the Depth and Texture Quadtree in 3D-HEVC.
- Author
-
Mora, Elie Gabriel, Jung, Joel, Cagnazzo, Marco, and Pesquet-Popescu, Beatrice
- Subjects
- *
VIDEO coding , *ENCODING , *QUADTREES , *RUN time systems (Computer science) , *BIT error rate - Abstract
The 3D video extension of High Efficiency Video Coding (3D-HEVC) exploits texture-depth redundancies in 3D videos using intercomponent coding tools. It also inherits the same quadtree coding structure as HEVC for both components. The current software implementation of 3D-HEVC includes encoder shortcuts that speed up the quadtree construction process, but those are always accompanied by coding losses. Furthermore, since the texture and its associated depth represent the same scene, at the same time instant and view point, their quadtrees are closely linked. In this paper, an intercomponent tool is proposed in which this link is exploited to save both runtime and bits through a joint coding of the quadtrees. If depth is coded before the texture, the texture quadtree is initialized from the coded depth quadtree. Otherwise, the depth quadtree is limited to the coded texture quadtree. A 31% encoder runtime saving, a −0.3% gain for coded and synthesized views and a −1.8% gain for coded views are reported for the second method. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
39. Lossless to Lossy Dual-Tree BEZW Compression for Hyperspectral Images.
- Author
-
Kai-jen Cheng and Dill, Jeffrey
- Subjects
- *
HYPERSPECTRAL imaging systems , *LOSSLESS data compression , *LOSSY data compression , *WAVELET transforms , *WAVELETS (Mathematics) - Abstract
This paper proposes a lossless to lossy compression scheme for hyperspectral images based on a dual-tree Binary Embedded Zerotree Wavelet (BEZW) algorithm. The algorithm adapts the Karhunen-Loève transform and the discrete wavelet transform to achieve a 3-D integer reversible hybrid transform and decorrelate spectral and spatial data. Since the statistics of hyperspectral images are not symmetrical, an asymmetrical dual-tree structure is introduced. The 3-D BEZW algorithm compresses hyperspectral images by implementing progressive bitplane coding. The lossless and lossy compression performance is compared with other state-of-the-art predictive and transform-based coding algorithms on Airborne Visible/Infrared Imaging Spectrometer images. Results show that the 3-D BEZW lossless compression performance is comparable with that of the best predictive algorithms, while its computational cost is comparable with those of transform-based algorithms. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
40. Multiple Description Coding With Randomly and Uniformly Offset Quantizers.
- Author
-
Meng, Lili, Liang, Jie, Samarawickrama, Upul, Zhao, Yao, Bai, Huihui, and Kaup, Andre
- Subjects
- *
IMAGE compression , *QUANTUM groups , *MATHEMATICAL optimization , *SIGNAL quantization , *IMAGE reconstruction , *ITERATIVE methods (Mathematics) - Abstract
In this paper, two multiple description coding schemes are developed, based on prediction-induced randomly offset quantizers and unequal-deadzone-induced near-uniformly offset quantizers, respectively. In both schemes, each description encodes one source subset with a small quantization stepsize, and the other subsets are predictively coded with a large quantization stepsize. In the first method, due to predictive coding, the quantization bins that a coefficient belongs to in different descriptions are randomly overlapped. The optimal reconstruction is obtained by finding the intersection of all received bins. In the second method, joint dequantization is also used, but near-uniform offsets are created among the different low-rate quantizers by quantizing the predictions and by employing unequal deadzones. By generalizing the recently developed random quantization theory, a closed-form expression of the expected distortion is obtained for the first method, and a lower bound is obtained for the second method. The schemes are then applied to lapped transform-based multiple description image coding. The closed-form expressions enable the optimization of the lapped transform, and an iterative algorithm is developed to facilitate the optimization. Theoretical analyses and image coding results show that both schemes achieve better performance than other methods in this category. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
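The central reconstruction rule in entry 40, intersecting the quantization bins received from the two descriptions, can be sketched with two uniform quantizers at a fixed offset (a toy illustration, not the paper's optimized design with prediction-induced random offsets):

```python
def bin_interval(x, step, offset):
    """Quantization bin [low, high) containing x for a uniform
    quantizer with the given step size and offset."""
    idx = (x - offset) // step
    low = offset + idx * step
    return low, low + step

def joint_reconstruct(x, step1, off1, step2, off2):
    """Central decoder: intersect the bins reported by the two
    descriptions and reconstruct at the midpoint of the intersection
    (a toy version of the joint dequantization idea)."""
    lo1, hi1 = bin_interval(x, step1, off1)
    lo2, hi2 = bin_interval(x, step2, off2)
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    return (lo + hi) / 2
```

With offset quantizers the intersection is narrower than either bin, so the central distortion is lower than either side distortion.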
41. A Coarse Representation of Frames Oriented Video Coding By Leveraging Cuboidal Partitioning of Image Data
- Author
-
Manoranjan Paul, David Taubman, Manzur Murshed, and Ashek Ahmmed
- Subjects
Predictive coding ,Cuboid ,Coding algorithm ,Computational complexity theory ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Video sequence ,030229 sport sciences ,02 engineering and technology ,03 medical and health sciences ,0302 clinical medicine ,Bit rate ,0202 electrical engineering, electronic engineering, information engineering ,Codec ,020201 artificial intelligence & image processing ,Algorithm ,Coding (social sciences) - Abstract
Video coding algorithms attempt to minimize the significant commonality that exists within a video sequence. Each new video coding standard contains tools that can perform this task more efficiently compared to its predecessors. In this work, we form a coarse representation of the current frame by minimizing commonality within that frame while preserving important structural properties of the frame. The building blocks of this coarse representation are rectangular regions called cuboids, which are computationally simple and has a compact description. Then we propose to employ the coarse frame as an additional source for predictive coding of the current frame. Experimental results show an improvement in bit rate savings over a reference codec for HEVC, with minor increase in the codec computational complexity.
- Published
- 2020
42. Generative Pre-Training for Speech with Autoregressive Predictive Coding
- Author
-
Yu-An Chung and James Glass
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,Computer Science - Machine Learning ,Computer science ,Speech recognition ,010501 environmental sciences ,01 natural sciences ,Computer Science - Sound ,Machine Learning (cs.LG) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Audio and Speech Processing (eess.AS) ,Speech translation ,FOS: Electrical engineering, electronic engineering, information engineering ,0105 earth and related environmental sciences ,Transformer (machine learning model) ,Predictive coding ,Computer Science - Computation and Language ,Autoregressive model ,Spectrogram ,0305 other medical science ,Transfer of learning ,Computation and Language (cs.CL) ,Feature learning ,Generative grammar ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging. In this paper we propose to use autoregressive predictive coding (APC), a recently proposed self-supervised objective, as a generative pre-training approach for learning meaningful, non-specific, and transferable speech representations. We pre-train APC on large-scale unlabeled data and conduct transfer learning experiments on three speech applications that require different information about speech characteristics to perform well: speech recognition, speech translation, and speaker identification. Extensive experiments show that APC not only outperforms surface features (e.g., log Mel spectrograms) and other popular representation learning methods on all three tasks, but is also effective at reducing downstream labeled data size and model parameters. We also investigate the use of Transformers for modeling APC and find them superior to RNNs. Accepted to ICASSP 2020. Code and pre-trained models are available at https://github.com/iamyuanchung/Autoregressive-Predictive-Coding
- Published
- 2020
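The APC objective in entry 42, predicting the frame n steps ahead and minimizing the L1 error, can be sketched model-agnostically; the `predict` callable here is a stand-in for the autoregressive model (an RNN or Transformer over log Mel features in the paper):

```python
def apc_l1_loss(frames, predict, n):
    """Autoregressive predictive coding objective: given the frames up
    to time t, predict the frame n steps ahead and accumulate the L1
    error against the ground truth."""
    total = 0.0
    for t in range(len(frames) - n):
        pred = predict(frames[: t + 1])      # model sees only the past
        target = frames[t + n]               # n-step-ahead ground truth
        total += sum(abs(p - x) for p, x in zip(pred, target))
    return total
```

A trivial "repeat the last frame" predictor gives a baseline loss that any learned model should beat.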
43. A Fixed Point Approach to Analysis and Optimization of Motion Compensated Predictive Coding.
- Author
-
Li, Xin
- Abstract
In this paper, we propose a fixed point theoretical analysis of motion compensated predictive coding and demonstrate its potential in encoder optimization. By viewing the encoder-decoder pair as a nonlinear dynamical system and inquiring about its convergence properties, we demonstrate the feasibility of approximating the fixed point through recursive coding, both theoretically and experimentally. Such a recursive coding approach to encoder optimization admits an interpretation of finding a more compact representation through local perturbation on the perceptual manifold. Experimental results show that our approach can achieve bit savings of 5-40% without sacrificing visual quality when tested on the KTA implementation of H.264 (JM14.2). [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
44. Context Dependent Encoding Using Convolutional Dynamic Networks.
- Author
-
Chalasani, Rakesh and Principe, Jose C.
- Subjects
- *
ENCODING , *SYMBOLISM in communication , *DECODERS & decoding , *SIGNS & symbols , *SYMBOLISM - Abstract
Perception of sensory signals is strongly influenced by their context, both in space and time. In this paper, we propose a novel hierarchical model, called convolutional dynamic networks, that effectively utilizes this contextual information while inferring the representations of the visual inputs. We build this model on a predictive coding framework and use the idea of empirical priors to incorporate recurrent and top-down connections. These connections endow the model with temporal context as well as abstract knowledge from higher layers. To perform inference efficiently in this hierarchical model, we rely on a novel scheme based on a smoothing proximal gradient method. When trained on unlabeled video sequences, the model learns a hierarchy of stable attractors, representing low-level to high-level parts of the objects. We demonstrate that the model effectively utilizes contextual information to produce robust and stable representations for object recognition in video sequences, even in the case of highly corrupted inputs. [ABSTRACT FROM AUTHOR]
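The core inference step behind such models is proximal-gradient sparse coding. The paper uses a smoothing proximal gradient method inside a convolutional hierarchy with temporal and top-down priors; the flat, single-layer ISTA sketch below only illustrates the basic proximal-gradient step (gradient descent on the reconstruction term, soft-thresholding for the L1 prior). Dictionary and signal are synthetic.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, n_iters=200):
    """Proximal-gradient inference of sparse causes z for input x:
    minimize 0.5*||x - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(2)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
z_true = np.zeros(32)
z_true[[3, 17]] = [1.5, -2.0]            # two active causes
x = D @ z_true

z = ista(D, x)
print(np.count_nonzero(np.abs(z) > 1e-3))
```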
- Published
- 2015
- Full Text
- View/download PDF
45. Gaussian Robust Sequential and Predictive Coding.
- Author
-
Song, Lin, Chen, Jun, Wang, Jia, and Liu, Tie
- Subjects
- *
GAUSSIAN distribution , *CODING theory , *MARKOV processes , *MATHEMATICS theorems , *HYPERPLANES - Abstract
We introduce two new source coding problems: robust sequential coding and robust predictive coding. For the Gauss–Markov source model with the mean squared error distortion measure, we characterize certain supporting hyperplanes of the rate region of these two coding problems. Our investigation also reveals an information-theoretic minimax theorem and the associated extremal inequalities. [ABSTRACT FROM PUBLISHER]
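A concrete instance of predictive coding for the Gauss–Markov source is closed-loop DPCM: quantize the prediction residual against the reconstructed past, so quantization errors do not accumulate, and spend rate on the low-entropy residual rather than the raw signal. This sketch is a standard illustration of that setup, not the paper's robust-coding scheme; parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
a, T, step = 0.9, 5000, 0.25

# Gauss-Markov source: x_t = a*x_{t-1} + w_t, w_t ~ N(0, 1).
w = rng.normal(size=T)
x = np.empty(T)
x[0] = w[0]
for t in range(1, T):
    x[t] = a * x[t - 1] + w[t]

def entropy_bits(indices):
    """Empirical first-order entropy of quantizer indices (bits/sample)."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Direct quantization of the raw source.
idx_direct = np.round(x / step).astype(int)

# Closed-loop predictive (DPCM) coding: quantize the residual against the
# reconstruction, so the per-sample error stays bounded by step/2.
xhat = np.empty(T)
idx_pred = np.empty(T, dtype=int)
idx_pred[0] = int(np.round(x[0] / step))
xhat[0] = idx_pred[0] * step
for t in range(1, T):
    pred = a * xhat[t - 1]
    idx_pred[t] = int(np.round((x[t] - pred) / step))
    xhat[t] = pred + idx_pred[t] * step

print(entropy_bits(idx_direct), entropy_bits(idx_pred))
```

At the same quantizer step, both schemes achieve the same distortion, but the residual indices have markedly lower entropy, which is the source of the predictive coding rate gain.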
- Published
- 2013
- Full Text
- View/download PDF
46. Efficient Fine-Granular Scalable Coding of 3D Mesh Sequences.
- Author
-
Ahn, Jae-Kyun, Koh, Yeong Jun, and Kim, Chang-Su
- Abstract
An efficient fine-granular scalable coding algorithm of 3-D mesh sequences for low-latency streaming applications is proposed in this work. First, we decompose a mesh sequence into spatial and temporal layers to support scalable decoding. To support the finest-granular spatial scalability, we decimate only a single vertex at each layer to obtain the next layer. Then, we predict the coordinates of decimated vertices spatially and temporally based on a hierarchical prediction structure. Finally, we quantize and transmit the spatio-temporal prediction residuals using an arithmetic coder. We propose an efficient context model for the arithmetic coding. Experimental results show that the proposed algorithm provides significantly better compression performance than the conventional algorithms, while supporting finer-granular spatial scalability. [ABSTRACT FROM PUBLISHER]
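The spatial prediction step can be sketched on a toy 1-D "mesh": each decimated vertex is predicted from the average of its two surviving neighbours, and only the small residual needs quantizing and entropy coding. For illustration the sketch evaluates this predictor at every interior vertex of a smooth polyline; the paper's hierarchical spatio-temporal predictor and context-modelled arithmetic coder are not reproduced.

```python
import numpy as np

# Toy "mesh": a smooth polyline; a decimated vertex is predicted from the
# average of its two surviving neighbours (spatial prediction only; the
# paper additionally predicts temporally from previous frames).
t = np.linspace(0.0, 1.0, 64)
verts = np.stack([t, np.sin(2 * np.pi * t)], axis=1)     # (x, y) per vertex

# Spatial prediction residuals at every interior vertex.
pred = 0.5 * (verts[:-2] + verts[2:])
residuals = verts[1:-1] - pred

# The residual is what gets quantized and arithmetic-coded; for a smooth
# mesh it is tiny compared to the raw coordinates.
step = 1e-3
coded = np.round(residuals / step).astype(int)
recon = pred + coded * step
print(np.abs(residuals).max())
```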
- Published
- 2013
- Full Text
- View/download PDF
47. Analyzing the Optimality of Predictive Transform Coding Using Graph-Based Models.
- Author
-
Zhang, Cha and Florencio, Dinei
- Subjects
GAUSSIAN Markov random fields ,CODING theory ,MATRICES (Mathematics) ,PROBABILITY theory - Abstract
In this letter, we provide a theoretical analysis of optimal predictive transform coding based on the Gaussian Markov random field (GMRF) model. It is shown that the eigen-analysis of the precision matrix of the GMRF model is optimal in decorrelating the signal. The resulting graph transform degenerates to the well-known 2-D discrete cosine transform (DCT) for a particular 2-D first order GMRF, although it is not a unique optimal solution. Furthermore, we present an optimal scheme to perform predictive transform coding based on conditional probabilities of a GMRF model. Such an analysis can be applied to both motion prediction and intra-frame predictive coding, and may lead to improvements in coding efficiency in the future. [ABSTRACT FROM AUTHOR]
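The letter's central observation can be checked numerically in a few lines: for a 1-D first-order GMRF on a path graph, the precision matrix is the tridiagonal graph Laplacian, and its eigenvectors coincide (up to sign) with the DCT-II basis. The sketch below builds that precision matrix directly, which is a simplification of the paper's general GMRF setting.

```python
import numpy as np

# Precision matrix Q of a 1-D first-order GMRF on a path graph: the graph
# Laplacian, tridiagonal with degree-1 boundary rows.
n = 8
Q = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Q[0, 0] = Q[-1, -1] = 1.0

# Eigen-analysis of the precision matrix gives the optimal decorrelating
# (graph) transform; eigenvalues 2 - 2*cos(pi*k/n) come out in ascending
# order, matching DCT frequency order.
evals, U = np.linalg.eigh(Q)

# Orthonormal DCT-II basis, rows indexed by frequency k.
k = np.arange(n)
C = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
C *= np.sqrt(2.0 / n)
C[0] /= np.sqrt(2.0)

# Each eigenvector matches a DCT-II vector up to sign.
print(np.abs(U.T @ C.T).round(3))
```

The absolute inner-product matrix printed above is the identity, confirming that the graph transform of this GMRF degenerates to the 2-D DCT's 1-D separable factor, as the letter states.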
- Published
- 2013
- Full Text
- View/download PDF
48. Segmentation of Source Symbols for Adaptive Arithmetic Coding.
- Author
-
Zhang, Liang, Wang, Demin, and Zheng, Dong
- Subjects
- *
IMAGE segmentation , *ADAPTIVE computing systems , *VIDEO coding , *DATA transmission systems , *ENCODING , *VIDEO compression , *VECTOR quantization - Abstract
Adaptive arithmetic coding is a general technique for coding source symbols of a stochastic process based on an adaptive model. The adaptive model provides measures of the statistics of source symbols and is updated, along with the encoding/decoding processes, as more encoded/decoded symbols are fed to it as samples. The coding performance depends on how well the adaptive model fits the statistics of source symbols. If the number of source symbols is large and the number of samples is small, the adaptive model may not be able to provide valid measures of the statistics, which results in inefficient coding by the adaptive arithmetic coder. To address this problem, this paper presents segmentation of source symbols to improve the performance of the adaptive arithmetic coder. Each source symbol is divided into several segments. Each segment is separately coded with an adaptive arithmetic coder. With this division, possible values of each segment are concentrated within a small range. Given the limited number of samples, this concentration leads to a better fit of the adaptive model to the statistics of source symbols and therefore to an improvement of the coding efficiency. The proposed coding algorithm is applied to lossless motion vector coding for video transmission as an application example to show its performance improvement and coding gains. [ABSTRACT FROM PUBLISHER]
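The benefit of segmentation can be demonstrated without a full arithmetic coder by computing the ideal code length under an adaptive (add-one smoothed) frequency model: with few samples and a large alphabet, coding each 8-bit symbol as two 4-bit segments with separate models costs fewer bits. The nibble split and the smoothing rule are illustrative assumptions, not the paper's exact segmentation.

```python
import numpy as np

def adaptive_code_length(symbols, alphabet_size):
    """Ideal code length (bits) of an adaptive arithmetic coder with an
    add-one-smoothed frequency model updated after each symbol."""
    counts = np.ones(alphabet_size)      # add-one prior over the alphabet
    bits = 0.0
    for s in symbols:
        bits += -np.log2(counts[s] / counts.sum())
        counts[s] += 1
    return bits

# A short stream of 8-bit symbols: few samples, large alphabet.
data = [0x12, 0x17, 0x12, 0x15, 0x12, 0x17, 0x15, 0x12]

joint_bits = adaptive_code_length(data, 256)

# Segment each symbol into two 4-bit halves, coded with separate models.
hi_bits = adaptive_code_length([s >> 4 for s in data], 16)
lo_bits = adaptive_code_length([s & 0xF for s in data], 16)
print(joint_bits, hi_bits + lo_bits)
```

The segmented models see only 16 possible values each, so their counts become informative after a handful of samples, while the 256-symbol joint model is still near its uniform prior.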
- Published
- 2012
- Full Text
- View/download PDF
49. A New Encoder for Continuous-Time Gaussian Signals With Fixed Rate and Reconstruction Delay.
- Author
-
Marelli, Damián, Mahata, Kaushik, and Fu, Minyue
- Subjects
- *
BAYESIAN analysis , *MONTE Carlo method , *QUANTIZATION methods (Quantum mechanics) , *STATE-space methods , *DIGITAL electronics - Abstract
In this paper, we propose a method for encoding continuous-time Gaussian signals subject to the usual data rate constraint and, more importantly, a reconstruction delay constraint. We first apply a Karhunen-Loève decomposition to reparameterize the continuous-time signal as a discrete sequence of vectors. We then study the optimal recursive quantization of this sequence of vectors. Since the optimal scheme turns out to have a very cumbersome design, we consider a simplified method, for which a numerical example suggests that the incurred performance loss is negligible. In this simplified method, we first build a state space model for the vector sequence and then use Bayesian tracking to sequentially encode each vector. The tracking task is performed using particle filtering. Numerical experiments show that the proposed approach offers visible advantages over other available approaches, especially when the reconstruction delay is small. [ABSTRACT FROM AUTHOR]
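The Karhunen-Loève reparameterization step can be sketched by discretizing a continuous-time Gaussian signal's covariance on a grid and eigendecomposing it; the leading eigenvalues show how much signal energy a few KL coefficients capture, which is what makes the resulting vector sequence attractive to quantize. The Wiener-process covariance K(s, t) = min(s, t) is an illustrative choice, not the paper's source model.

```python
import numpy as np

# Karhunen-Loève decomposition of a continuous-time Gaussian signal,
# discretized on a grid: here a Wiener process, K(s, t) = min(s, t).
m = 200
t = np.linspace(1.0 / m, 1.0, m)         # avoid t = 0 (degenerate row)
K = np.minimum.outer(t, t)

evals, phi = np.linalg.eigh(K)
evals, phi = evals[::-1], phi[:, ::-1]   # sort eigenpairs descending

# Fraction of signal energy captured by the first k KL coefficients;
# these coefficients form the vector sequence the encoder quantizes.
frac = np.cumsum(evals) / evals.sum()
print(frac[:5].round(3))
```

For the Wiener process the first KL coefficient alone carries roughly 80% of the energy, so truncating and recursively quantizing a short coefficient vector loses little.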
- Published
- 2012
- Full Text
- View/download PDF
50. Predictive Quantization of Range-Focused SAR Raw Data.
- Author
-
Ikuma, T., Naraghi-Pour, M., and Lewis, T.
- Subjects
- *
SYNTHETIC aperture radar , *SIGNAL quantization , *DIFFERENTIAL pulse code modulation , *REMOTE sensing by radar , *SIGNAL-to-noise ratio - Abstract
Synthetic aperture radar (SAR) systems generate massive amounts of data that require substantial resources for transmission or storage. The limited capacity of the downlink channel demands efficient onboard compression of SAR data. However, SAR raw data exhibit very little correlation that can be exploited in a compression algorithm. Range focusing is shown to increase the data correlation by exposing some of the distinctive features of the scene under surveillance. In this paper, we first present an analysis of spotlight-mode SAR to show the source of the increased correlation in the range-focused data. Next, we propose two algorithms, transform-domain block predictive quantization (TD-BPQ) and transform-domain block predictive trellis-coded quantization (TD-BPTCQ), for the compression of the range-focused data. Experimental results indicate that, at the rate of 1 bit/sample, and for similar or lower computational complexity, TD-BPQ and TD-BPTCQ outperform the best method proposed in the literature by 1.5 and 2.3 dB in signal-to-quantization-noise ratio, respectively. Similar improvements are observed for the rate of 2 bits/sample. [ABSTRACT FROM AUTHOR]
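The core of block predictive quantization, quantizing each block's residual against the previously reconstructed block with a residual-sized step, can be sketched on synthetic correlated rows standing in for range-focused SAR data. This omits the transform domain and the trellis-coded variant entirely; all data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for range-focused SAR rows: strongly correlated blocks.
n_blocks, blk = 64, 32
base = rng.normal(size=blk) * 5.0
rows = np.array([base + rng.normal(size=blk) * 0.3 for _ in range(n_blocks)])

def quantize(v, n_levels, vmax):
    """Uniform mid-tread quantizer with n_levels spanning [-vmax, vmax]."""
    step = 2.0 * vmax / n_levels
    idx = np.clip(np.round(v / step), -(n_levels // 2), n_levels // 2 - 1)
    return idx * step

levels = 8                               # ~3 bits/sample

# Direct quantization: step sized to the raw data range.
direct = quantize(rows, levels, np.abs(rows).max())

# Block predictive quantization: quantize the residual against the
# previously *reconstructed* block (closed loop); the first block is
# stored at high fidelity, later blocks use a residual-sized step.
recon = np.empty_like(rows)
recon[0] = quantize(rows[0], 256, np.abs(rows).max())
rmax = 2.5                               # assumed residual range
for b in range(1, n_blocks):
    resid = rows[b] - recon[b - 1]
    recon[b] = recon[b - 1] + quantize(resid, levels, rmax)

mse_direct = np.mean((rows - direct) ** 2)
mse_pred = np.mean((rows - recon) ** 2)
print(mse_direct, mse_pred)
```

Because the residual range is an order of magnitude smaller than the raw range, the same number of levels buys a much finer step, which is exactly where the reported SQNR gain over direct quantization comes from.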
- Published
- 2012
- Full Text
- View/download PDF