Overview of the Neural Network Compression and Representation (NNR) Standard
- Authors
Hamed Rezazadegan-Tavakoli, Wojciech Samek, Werner Bailer, Paul Haase, Karsten Müller, Swayambhoo Jain, Francesco Cricri, Miska Hannuksela, Shan Liu, Emre Aksu, Wei Jiang, Shahab Hamidi-Rad, Fabien Racapé, Heiner Kirchhoffer, and Wei Wang
- Subjects
Artificial neural network, Computer science, Quantization (signal processing), Encoding (memory), Media Technology, Coding and information theory, Pruning (decision trees), Electrical and Electronic Engineering, Representation (mathematics), Bitstream format, Algorithm, Decoding methods, Coding (social sciences)
- Abstract
Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods that can be combined into coding pipelines. It can be used either as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. To provide the highest degree of flexibility, the compression methods operate per parameter tensor, so that proper decoding is ensured even when no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as its core encoding and decoding technologies, as well as neural network parameter pre-processing methods such as sparsification, pruning, low-rank decomposition, unification, local scaling, and batch-norm folding. NNR achieves a compression efficiency of more than 97% in transparent coding cases, i.e., without degrading classification quality such as top-1 or top-5 accuracy. This paper provides an overview of the technical features and characteristics of NNR.
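To make the per-tensor pipeline described above concrete, here is a minimal Python sketch, assuming only NumPy. The helper names (prune_tensor, quantize_tensor) are hypothetical, magnitude pruning and uniform scalar quantization merely stand in for NNR's richer toolbox, and the DeepCABAC entropy-coding stage is omitted entirely; this is an illustration of the per-tensor design, not the standard's actual algorithms.

```python
import numpy as np

def prune_tensor(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Magnitude-based sparsification: zero out the smallest weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize_tensor(w: np.ndarray, bits: int = 8):
    """Uniform scalar quantization of a single parameter tensor."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_tensor(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float tensor from quantized values."""
    return q.astype(np.float32) * scale

# Each tensor is processed independently, mirroring NNR's per-tensor design:
# decoding one tensor requires no structural information about the others.
rng = np.random.default_rng(0)
weights = {"conv1.weight": rng.normal(size=(16, 3, 3, 3)).astype(np.float32)}

coded = {}
for name, w in weights.items():
    q, scale = quantize_tensor(prune_tensor(w), bits=8)
    coded[name] = (q, scale)  # real NNR entropy-codes q with DeepCABAC

recon = {name: dequantize_tensor(q, s) for name, (q, s) in coded.items()}
err = np.max(np.abs(recon["conv1.weight"] - weights["conv1.weight"]))
print(f"max reconstruction error: {err:.4f}")
```

Because every tensor carries its own quantization parameters, a decoder can reconstruct any single tensor in isolation, which is the property the abstract refers to as ensuring proper decoding without structure information.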
- Published
- 2022