7,559 results for "Video Compression"
Search Results
2. Exploring Lottery Ticket Hypothesis in Neural Video Representations
- Author
- Chen, Jiacong, Mao, Qingyu, Liu, Shuai, Meng, Fanyang, Yi, Shuangyan, and Liang, Yongsheng
- Published
- 2025
- Full Text
- View/download PDF
3. A novel deep learning-based approach for video quality enhancement
- Author
- Zilouchian Moghaddam, Parham, Modarressi, Mehdi, and Sadeghi, Mohammad Amin
- Published
- 2025
- Full Text
- View/download PDF
4. High performance holographic video compression using spatio-temporal phase unwrapping
- Author
- Gonzalez, Sorayda Trejos, Velez-Zea, Alejandro, and Barrera-Ramírez, John Fredy
- Published
- 2024
- Full Text
- View/download PDF
5. Lightweight Motion-Aware Video Super-Resolution for Compressed Videos
- Author
- Kwon, Ilhwan, Li, Jun, Shah, Rajiv Ratn, Prasad, Mukesh, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ide, Ichiro, editor, Kompatsiaris, Ioannis, editor, Xu, Changsheng, editor, Yanai, Keiji, editor, Chu, Wei-Ta, editor, Nitta, Naoko, editor, Riegler, Michael, editor, and Yamasaki, Toshihiko, editor
- Published
- 2025
- Full Text
- View/download PDF
6. Parameter-Efficient Instance-Adaptive Neural Video Compression
- Author
- Oh, Seungjun, Yang, Hyunmo, Park, Eunbyung, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cho, Minsu, editor, Laptev, Ivan, editor, Tran, Du, editor, Yao, Angela, editor, and Zha, Hongbin, editor
- Published
- 2025
- Full Text
- View/download PDF
7. Machine Vision-Aware Quality Metrics for Compressed Image and Video Assessment
- Author
- Dremin, Mikhail, Kozhemyakov, Konstantin, Molodetskikh, Ivan, Malakhov, Kirill, Sagitov, Artur, Vatolin, Dmitriy, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
- Full Text
- View/download PDF
8. Long-Term Temporal Context Gathering for Neural Video Compression
- Author
- Qi, Linfeng, Jia, Zhaoyang, Li, Jiahao, Li, Bin, Li, Houqiang, Lu, Yan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
9. A Survey on Video Diffusion Models.
- Author
- Xing, Zhen, Feng, Qijun, Chen, Haoran, Dai, Qi, Hu, Han, Xu, Hang, Wu, Zuxuan, and Jiang, Yu-Gang
- Subjects
- ARTIFICIAL neural networks, HUMAN activity recognition, LANGUAGE models, PATTERN recognition systems, AUDITORY scene analysis, VIDEO compression, PROBABILISTIC generative models, VIDEO coding, POSE estimation (Computer vision)
- Published
- 2025
- Full Text
- View/download PDF
10. FR-IBC: Flipping and Rotation Intra Block Copy for Versatile Video Coding.
- Author
- Han, Heeji, Gwon, Daehyeok, Seo, Jeongil, and Choi, Haechul
- Subjects
- VIDEO compression, COMPUTATIONAL complexity, CLOUD computing, WIRELESS Internet, CAMERAS, VIDEO coding
- Abstract
Screen content has become increasingly important in multimedia applications owing to the growth of remote desktops, Wi-Fi displays, and cloud computing. However, these applications generate large amounts of data, and their limited bandwidth necessitates efficient video coding. While existing video coding standards have been optimized for natural videos originally captured by cameras, screen content has unique characteristics such as large homogeneous areas and repeated patterns. In this paper, we propose an enhanced intra block copy (IBC) method for screen content coding (SCC) in versatile video coding (VVC) named flipping and rotation intra block copy (FR-IBC). The proposed method improves the prediction accuracy by using flipped and rotated versions of the reference blocks as additional references. To reduce the computational complexity, hash maps of these blocks are constructed on a 4 × 4 block size basis. Moreover, we modify the block vectors and block vector predictor candidates of IBC merge and IBC advanced motion vector prediction to indicate the locations within the available reference area at all times. The experimental results show that our FR-IBC method outperforms existing SCC tools in VVC. Bjøntegaard-Delta rate gains of 0.66% and 2.30% were achieved under the All Intra and Random Access conditions for Class F, respectively, while corresponding values of 0.40% and 2.46% were achieved for Class SCC, respectively. [ABSTRACT FROM AUTHOR]
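The following is a minimal Python sketch of the hash-based matching idea described in this abstract: 4 × 4 reference blocks and their flipped/rotated variants are indexed in a hash map so a current block can be matched in near-constant time. The 4 × 4 basis follows the abstract; the hashing scheme, variant labels, and exact-match search are illustrative assumptions, not the VVC/VTM implementation.

```python
# Hash-based IBC search with flipped/rotated reference variants (sketch).
import numpy as np

BS = 4  # hash blocks on a 4x4 basis, as described in the abstract

def block_hash(block: np.ndarray) -> bytes:
    return block.astype(np.uint8).tobytes()

def build_reference_hash_map(ref: np.ndarray) -> dict:
    """Map hashes of 4x4 reference blocks (and their flipped/rotated
    versions) to (y, x, transform) candidates."""
    table = {}
    h, w = ref.shape
    for y in range(0, h - BS + 1):
        for x in range(0, w - BS + 1):
            blk = ref[y:y + BS, x:x + BS]
            variants = {
                "identity": blk,
                "hflip": np.fliplr(blk),
                "vflip": np.flipud(blk),
                "rot90": np.rot90(blk),
                "rot180": np.rot90(blk, 2),
                "rot270": np.rot90(blk, 3),
            }
            for name, v in variants.items():
                table.setdefault(block_hash(v), []).append((y, x, name))
    return table

def find_ibc_candidates(cur_block: np.ndarray, table: dict):
    """Return reference positions whose (possibly flipped/rotated) content
    matches the current block exactly; a real encoder would then pick the
    candidate with the cheapest block vector to signal."""
    return table.get(block_hash(cur_block), [])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
    cur = np.fliplr(ref[8:12, 4:8])  # a horizontally flipped copy
    table = build_reference_hash_map(ref)
    print(find_ibc_candidates(cur, table))  # should include (8, 4, 'hflip')
```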
- Published
- 2025
- Full Text
- View/download PDF
11. Object detection and classification from compressed video streams.
- Author
- Joshi, Suvarna, Ojo, Stephen, Yadav, Sangeeta, Gulia, Preeti, Gill, Nasib Singh, Alsberi, Hassan, Rizwan, Ali, and Hassan, Mohamed M.
- Subjects
- OBJECT recognition (Computer vision), STREAMING video & television, DATA analytics, DEEP learning, CLASSIFICATION, VIDEO compression
- Abstract
Video analytics is widely used by internet-based platforms to manage the mass consumption of videos. Traditionally, it is carried out on the decoded format of the videos, which requires the analytics server to perform both decoding and analytics computation. This process can be made faster and more efficient if performed on the compressed format of the videos, as it reduces the decoding load on the analytics server. The field of video analytics from binarized formats using modern deep learning techniques is still emerging and needs further exploration. The proposed work is based on this notion. In this work, two analytics tasks, namely classification and object detection, are carried out from the binarized videos. The binarized formats are produced by an already-designed end-to-end video compression network. The experiments have been carried out on standard datasets. The proposed MobileNetv2-based classification network shows an accuracy of 66% on the YouTube UGC dataset, and the YOLOX-S-based detection network shows an mAP of 45% on the ImageNet dataset. The proposed work shows competitiveness and improvement in detection outcomes on compressed data and also provides further motivation for the adoption of deep learning-based video compression in practical analytics domains. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
12. A Smart Contract-Based Encoding-Level Video Security Notarization Scheme.
- Author
- 郭冉, 王奎, 徐衍胜, 张守军, 潘晓刚, 佟雨镪, 王京, 何云华, and 焦泽政
- Subjects
- VIDEO compression, DIGITAL watermarking, ELECTRONIC paper, VIDEO processing, DISPUTE resolution, VIDEO coding
- Abstract
The short video application market has grown quickly thanks to H.264/H.265 video compression technology. However, copyright conflicts and video security concerns have gained more attention as a result of this expansion. Current copyright notarization systems are plagued by issues with video traceability, opaque verification processes, and a lack of trust in the verifying parties. This paper presents a smart contract-based encoding-level video security notarization scheme as a solution to these problems. The scheme embeds copyright information by exploiting the encoding properties of H.264/H.265, inserting the copyright owner's information as a watermark during the video encoding process. The embedded information has a minimal effect on video quality while guaranteeing the watermark's durability and security. Furthermore, the scheme standardizes the processes of identity notarization, copyright verification, and dispute settlement through blockchain and smart contract technology, which raises the legitimacy and transparency of the execution process. According to experimental results, the similarity comparison algorithm employed for video copyright notarization outperforms the current best solution, with an F₁-score approximately 2% higher. A further test demonstrates that the smart contracts have a manageable overhead, indicating their viability in real-world use. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
13. Bit-rate aware effective inter-layer motion prediction using multi-loop encoding structure.
- Author
- Siddaramappa, Sandeep Gowdra and Mamatha, Gowdra Shivanandappa
- Subjects
- VIDEO compression, INTERNET content, DEEP learning, STREAMING video & television, VIDEO coding, CODECS
- Abstract
Recently, there has been a notable increase in the use of video content on the internet, leading to the creation of improved codecs such as versatile video coding (VVC) and high-efficiency video coding (HEVC). It is important to note that these video coding techniques still exhibit quality degradation and noise throughout the decoded frames. A number of deep learning (DL)-based network structures have been developed by experts to tackle this problem; nevertheless, because many of these solutions use in-loop filtering, extra bits must be sent between the encoding and decoding layers. Moreover, because they used fewer reference frames, they were unable to extract significant features by taking advantage of the temporal connection between frames. Hence, this paper introduces inter-layer motion prediction aware multi-loop video coding (ILMPA-MLVC) techniques. ILMPA-MLVC first designs a multi-loop adaptive encoder (MLAE) architecture to enhance the inter-layer motion prediction and optimization process; second, this work designs a multi-loop probabilistic bitrate-aware compression (MLPBAC) model to attain improved bitrate efficiency with minimal overhead. The training of ILMPA-MLVC is done through a novel distortion loss function using the UVG dataset. The results show that the proposed ILMPA-MLVC attains improved peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) performance in comparison with existing video coding techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
14. Video-Wise Just-Noticeable Distortion Prediction Model for Video Compression with a Spatial–Temporal Network.
- Author
- Liu, Huanhua, Liu, Shengzong, Xiao, Jianyu, Xu, Dandan, and Fan, Xiaoping
- Subjects
- CONVOLUTIONAL neural networks, VIDEO compression, DATABASES, DECISION making, PREDICTION models, DEEP learning
- Abstract
Just-Noticeable Difference (JND) in an image/video refers to the maximum difference that the human visual system cannot perceive, and it has been widely applied in perception-guided image/video compression. In this work, we propose a Binary Decision-based Video-Wise Just-Noticeable Difference Prediction Method (BD-VW-JND-PM) with deep learning. Firstly, we model the VW-JND prediction problem as a binary decision process to reduce the inference complexity. Then, we propose a Perceptually Lossy/Lossless Predictor for Compressed Video (PLLP-CV) to identify whether the distortion can be perceived or not. In the PLLP-CV, a Spatial–Temporal Network-based Perceptually Lossy/Lossless predictor (ST-Network-PLLP) is proposed for key frames by learning the spatial and temporal distortion features, and a threshold-based integration strategy is proposed to obtain the final results. Experimental results evaluated on the VideoSet database show that the mean prediction accuracy of PLLP-CV is about 95.6%, and the mean JND prediction error is 1.46 in QP and 0.74 in Peak Signal-to-Noise Ratio (PSNR), achieving improvements of 15% and 14.9%, respectively. [ABSTRACT FROM AUTHOR]
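As a worked example of the binary decision process described above, the sketch below binary-searches for the largest QP that a perceptually lossy/lossless predictor still judges imperceptible. The predictor here is a stand-in threshold rule; the paper's PLLP-CV is a learned spatial-temporal network.

```python
# Casting VW-JND prediction as a binary decision over QP (sketch).

def find_jnd_qp(is_lossless, qp_min=0, qp_max=51):
    """Binary-search the largest QP for which is_lossless(qp) is True.
    Assumes perceptual losslessness is monotone in QP."""
    lo, hi = qp_min, qp_max
    jnd = qp_min
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_lossless(mid):
            jnd = mid          # still imperceptible; try coarser quantization
            lo = mid + 1
        else:
            hi = mid - 1       # perceptible distortion; back off
    return jnd

if __name__ == "__main__":
    true_jnd = 33  # hypothetical ground truth for one video
    predictor = lambda qp: qp <= true_jnd
    print(find_jnd_qp(predictor))  # -> 33, with ~6 predictor calls instead of 52
```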
- Published
- 2024
- Full Text
- View/download PDF
15. Generative Adversarial Network-Based Distortion Reduction Adapted to Peak Signal-to-Noise Ratio Parameters in VVC.
- Author
- Deng, Weihao and Yang, Zhenglong
- Subjects
- IMAGE stabilization, VIDEO coding, IMAGE reconstruction, SIGNAL-to-noise ratio, GENERATIVE adversarial networks, VIDEO compression
- Abstract
In order to address the issues of image quality degradation and distortion that arise in video transmission coding and decoding, a method based on an enhanced version of CycleGAN is put forth. A lightweight attention module is integrated into the residual block of the generator structure, facilitating the extraction of image details and motion compensation. Furthermore, the perceptual LPIPS loss is added to align the image restoration effect more closely with human perception. Additionally, the network training method is modified: the original image is divided into 128 × 128 blocks for training, enhancing the network's accuracy in restoring details. The experimental results demonstrate that the algorithm attains an average PSNR value of 30.1147 on the publicly accessible YUV sequence dataset, YUV Trace Dataset, a 9.02% enhancement compared to the original network. Additionally, the LPIPS value reaches 0.2639, representing a 10.42% reduction, effectively addressing the issue of image quality deterioration during transmission. [ABSTRACT FROM AUTHOR]
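A minimal sketch of the training setup described above: frames are cut into 128 × 128 patches and the restoration generator is trained with a pixel loss plus an LPIPS perceptual term. The torch/lpips usage and the loss weight are illustrative assumptions, not the paper's code.

```python
# 128x128 patch preparation and an L1 + LPIPS objective (sketch).
import torch
import lpips  # pip install lpips; provides the learned perceptual metric

def to_patches(img: torch.Tensor, size: int = 128) -> torch.Tensor:
    """Split a (C, H, W) image into non-overlapping (N, C, size, size) patches
    (edges that do not fill a full patch are dropped in this sketch)."""
    c, h, w = img.shape
    img = img[:, : h - h % size, : w - w % size]
    patches = img.unfold(1, size, size).unfold(2, size, size)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, size, size)

perceptual = lpips.LPIPS(net="alex")  # expects inputs scaled to [-1, 1]

def restoration_loss(restored: torch.Tensor, target: torch.Tensor,
                     lambda_p: float = 0.1) -> torch.Tensor:
    """Pixel fidelity plus perceptual similarity over a batch of patches."""
    l1 = torch.nn.functional.l1_loss(restored, target)
    lp = perceptual(restored * 2 - 1, target * 2 - 1).mean()
    return l1 + lambda_p * lp
```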
- Published
- 2024
- Full Text
- View/download PDF
16. A Novel Video Compression Approach Based on Two-Stage Learning.
- Author
- Shao, Dan, Wang, Ning, Chen, Pu, Liu, Yu, and Lin, Lin
- Subjects
- IMAGE compression, OPTICAL flow, CONTINUOUS groups, VIDEO coding, VIDEO compression, VIDEOS
- Abstract
In recent years, the rapid growth of video data posed challenges for storage and transmission. Video compression techniques provided a viable solution to this problem. In this study, we proposed a bidirectional coding video compression model named DeepBiVC, which was based on two-stage learning. Firstly, we conducted preprocessing on the video data by segmenting the video flow into groups of continuous image frames, with each group comprising five frames. Then, in the first stage, we developed an image compression module based on an invertible neural network (INN) model to compress the first and last frames of each group. In the second stage, we designed a video compression module that compressed the intermediate frames using bidirectional optical flow estimation. Experimental results indicated that DeepBiVC outperformed other state-of-the-art video compression methods regarding PSNR and MS-SSIM metrics. Specifically, on the UVG dataset at bpp = 0.3, DeepBiVC achieved a PSNR of 37.16 and an MS-SSIM of 0.98. [ABSTRACT FROM AUTHOR]
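A minimal sketch of DeepBiVC's two-stage scheduling: five-frame groups whose first and last frames go to an image codec, with intermediate frames coded by bidirectional interpolation. The codec callables are placeholders, and the assumption that adjacent groups share a boundary keyframe belongs to this sketch, not necessarily to the paper.

```python
# Two-stage frame scheduling for groups of five frames (sketch).

def schedule_groups(num_frames: int, group: int = 5):
    """Yield (key_a, key_b, middles) index triples; consecutive groups share a
    boundary keyframe so every frame is covered exactly once."""
    step = group - 1
    for start in range(0, num_frames - 1, step):
        end = min(start + step, num_frames - 1)
        yield start, end, list(range(start + 1, end))

def compress_video(frames, image_codec, flow_codec):
    coded = {}
    for a, b, mids in schedule_groups(len(frames)):
        coded.setdefault(a, image_codec(frames[a]))      # stage 1: keyframes
        coded.setdefault(b, image_codec(frames[b]))
        for m in mids:                                    # stage 2: in-betweens
            coded[m] = flow_codec(frames[m], frames[a], frames[b])
    return [coded[i] for i in range(len(frames))]

if __name__ == "__main__":
    fake = list(range(13))  # stand-in frames
    out = compress_video(fake, image_codec=lambda f: ("I", f),
                         flow_codec=lambda f, a, b: ("B", f))
    print([tag for tag, _ in out])  # I at 0, 4, 8, 12; B elsewhere
```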
- Published
- 2024
- Full Text
- View/download PDF
17. Left-turn queue spillback identification based on single-section license plate recognition data.
- Author
- Wu, Hao, Yao, Jiarong, Cao, Yumin, and Tang, Keshuang
- Subjects
- AUTOMOBILE license plates, MARKOV chain Monte Carlo, INTELLIGENT transportation systems, VIDEO compression, IDENTIFICATION
- Abstract
Left-turn queue spillback occurs when the queue of left-turn or adjacent through traffic exceeds the channelized section's capacity, resulting in discharge flow breakdown and inefficient use of green time. License plate recognition (LPR) systems, which offer lane-based data on individual vehicle discharging, present an ideal data source for the identification of queue spillback. Therefore, this paper proposes a method for left-turn queue spillback identification using single-section LPR data. The proposed method involves defining 10 types of left-turn spillbacks based on the queuing status of the left-turn and adjacent through lanes. Then, a lightweight algorithm is proposed to estimate lane-based queue lengths. At last, a rule-based identification algorithm is developed to specify spillback types by considering the left-turn phasing, estimated queue lengths, and lane capacity. The empirical experiment demonstrates a promising accuracy of 87.28%. Additionally, simulations with five left-turn phasing schemes further validate the proposed method, averaging nearly 90% accuracy in identifying spillbacks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. OptiRet-Net: An Optimized Low-Light Image Enhancement Technique for CV-Based Applications in Resource-Constrained Environments.
- Author
- HUSSAIN, HANAN, TAMIZHARASAN, P. S., and YADAV, PRAVEEN KUMAR
- Subjects
- IMAGE recognition (Computer vision), VIDEO compression, IMAGE intensifiers, VIDEO coding, MATHEMATICAL optimization, DEEP learning
- Abstract
The illumination of images can significantly impact computer-vision applications such as image classification, multiple object detection, and tracking, leading to a significant decline in detection and tracking accuracy. Recent advancements in deep learning techniques have been applied to Low-Light Image Enhancement (LLIE) to combat this issue. Retinex theory-based methods following a decomposition-adjustment pipeline for LLIE have performed well in various aspects. Despite their success, current research on Retinex-based deep learning still needs to improve in terms of optimization techniques and complicated convolution connections, which can be computationally intensive for end-device deployment. We propose an Optimized Retinex-Based CNN (OptiRet-Net) deep learning framework to address these challenges for the LLIE problem. Our results demonstrate that the proposed method outperforms existing state-of-the-art models in terms of full reference metrics with a PSNR of 21.87, SSIM of 0.80, LPIPS of 0.16, and zero reference metrics with a NIQE of 3.4 and PIQE of 56.6. Additionally, we validate our approach using a comprehensive evaluation comprising five datasets and nine prior methods. Furthermore, we assess the efficacy of our proposed model combining low-light multiple object tracking applications using YOLOX and ByteTrack in Versatile Video Coding (VVC/H.266) across various quantization parameters. Our findings reveal that LLIE-enhanced frames surpass their tracking results with a MOTA of 80.6% and a remarkable precision rate of 96%. Our model also achieves minimal file sizes by effectively compressing the enhanced low-light images while maintaining their quality, making it suitable for resource-constrained environments where storage or bandwidth limitations are a concern. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. A ViT‐Based Adaptive Recurrent Mobilenet With Attention Network for Video Compression and Bit‐Rate Reduction Using Improved Heuristic Approach Under Versatile Video Coding.
- Author
- Padmapriya, D. and Roseline A, Ameelia
- Subjects
- VIDEO compression standards, TRANSFORMER models, BIT rate, VIDEO compression, DEEP learning, VIDEO processing, VIDEO coding
- Abstract
Video compression has received attention from the video processing and deep learning communities. Modern learning-aided mechanisms use a hybrid coding approach to reduce redundancy in pixel space across time and space, improving motion compensation accuracy. Video compression experiments have shown important improvements in recent years. Versatile Video Coding (VVC), also referred to as H.266, is the primary evolving video compression standard. The VVC codec is a block-assisted hybrid codec, making it highly capable and complex. Video coding effectively compresses data while reducing compression artifacts, enhancing the quality and functionality of AI video technologies. However, traditional models suffer from incorrect motion compression and ineffective motion compensation frameworks, leading to compression faults with a minimal rate-distortion trade-off. This work implements an automated and effective video compression task under VVC using a deep learning approach. Motion estimation is conducted using a Motion Vector (MV) encoder-decoder model to track movements in the video. Based on these MVs, frame reconstruction is carried out to compensate for the motion. The residual images are obtained using the Vision Transformer-based Adaptive Recurrent MobileNet with Attention Network (ViT-ARMAN). The parameters of the ViT-ARMAN are optimized using the Opposition-based Golden Tortoise Beetle Optimizer (OGTBO). Entropy coding is used in the training phase of the developed work to find the bit rate of the residual images. Extensive experiments were conducted to demonstrate the effectiveness of the developed deep learning-based method for video compression and bit rate reduction under VVC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Performance analysis of multiview video compression based on MIV and VVC multilayer.
- Author
- Lee, Jinho, Bang, Gun, Kang, Jungwon, Teratani, Mehrdad, Lafruit, Gauthier, and Choi, Haechul
- Subjects
- VIDEO coding, VIDEO compression, MPEG (Video coding standard), CAMERAS, PICTURES
- Abstract
To represent immersive media providing six degree‐of‐freedom experience, moving picture experts group (MPEG) immersive video (MIV) was developed to compress multiview videos. Meanwhile, the state‐of‐the‐art versatile video coding (VVC) also supports multilayer (ML) functionality, enabling the coding of multiview videos. In this study, we designed experimental conditions to assess the performance of these two state‐of‐the‐art standards in terms of objective and subjective quality. We observe that their performances are highly dependent on the conditions of the input source, such as the camera arrangement and the ratio of input views to all views. VVC‐ML is efficient when the input source is captured by a planar camera arrangement and many input views are used. Conversely, MIV outperforms VVC‐ML when the camera arrangement is non‐planar and the ratio of input views to all views is low. In terms of the subjective quality of the synthesized view, VVC‐ML causes severe rendering artifacts such as holes when occluded regions exist among the input views, whereas MIV reconstructs the occluded regions correctly but induces rendering artifacts with rectangular shapes at low bitrates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Occupancy Map Guided Attributes Artifacts Removal for Video-Based Point Cloud Compression.
- Author
- Chen, Peilin, Wang, Shiqi, and Li, Zhu
- Subjects
- VIDEO compression, POINT cloud, SIGNALS & signaling
- Abstract
Point clouds offer realistic 3D representations of objects and scenes at the expense of large data volumes. To represent such data compactly in real-world applications, Video-Based Point Cloud Compression (V-PCC) converts their texture into 2D attributes and occupancy maps before applying lossy video compression. Unfortunately, the coding artifacts introduced in the decoded attribute maps eventually degrade the quality of the reconstructed point cloud, thereby influencing its immersive experience. This article proposes a deep learning-based attribute map enhancement method that fully leverages the occupancy map's guidance. The design philosophy is that the cross-modality guidance from occupancy can be leveraged as critical information to enhance the attribute. Therefore, instead of treating attribute and occupancy as two separate sources of signals, occupancy serves as an indispensable auxiliary, such that the proposed framework explicitly provides the model with abundant clues by conducting local feature modification and global dependencies aggregation. In particular, the proposed framework is compatible with existing V-PCC bitstreams and can be feasibly incorporated into the standardized decoder pipeline. Extensive evaluations show the effectiveness of the proposed framework in attribute enhancement, with equivalently 6.0% Bjontegaard Delta-rate (BD-rate) savings obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Upsampling Algorithm for V-PCC-Coded 3D Point Clouds.
- Author
- Lin, Ting-Lan, Su, Bing-Wei, Shen, Po-Cheng, Chen, Ding-Yuan, Liang, Chi-Fu, Chen, Yan-Cheng, Wen, Yangming, and Shahid, Mohammad
- Subjects
- MPEG (Video coding standard), VIDEO coding, TIME complexity, VIDEO compression, POINT cloud
- Abstract
Point cloud (PC) compression is crucial to immersive visual applications such as autonomous vehicles classifying objects on the road. The Motion Picture Experts Group (MPEG) standardization group has achieved a notable compression efficiency, called video-based PC compression (V-PCC), which consists of an encoder and a decoder. The V-PCC encoder takes original 3D PC data and projects them onto multiple 2D planes to generate several 2D feature images. These images are then compressed using the well-established High-Efficiency Video Coding (HEVC) method. The V-PCC decoder uses compressed information and decoding techniques to reconstruct the 3D PC. However, the PCs produced by V-PCC are often sparse, non-uniform, and contain artifacts. In many practical applications, it is necessary to recover complete PCs from partial ones in real time. This article presents a method for enhancing decoded PCs as a post-processing step in V-PCC with reduced computational time. Our approach involves a 2D upsampling for the V-PCC occupancy image, which increases the density of the PC, and a 2D high-resolution auxiliary information modification algorithm for the 2D-3D conversion of high-resolution 3D PCs, which improves uniformity and reduces noise in the PC. The 3D high-resolution PC is further enhanced using the developed 3D outlier removal and point regeneration algorithm. Our proposed work can significantly simplify state-of-the-art super-resolution methods for PCs and reduce the time complexity by 61–75% while maintaining a high level of quality in the PCs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Video compression based on zig-zag 3D DCT and run-length encoding for multimedia communication systems
- Author
- Chutke, Sravanthi, N.M., Nandhitha, and Lendale, Praveen Kumar
- Published
- 2024
- Full Text
- View/download PDF
24. Deepfake Detection: A Comprehensive Survey from the Reliability Perspective.
- Author
- Wang, Tianyi, Liao, Xin, Chow, Kam Pui, Lin, Xiaodong, and Wang, Yinglong
- Subjects
- ARTIFICIAL neural networks, PATTERN recognition systems, ARTIFICIAL intelligence, CONVOLUTIONAL neural networks, NATURAL language processing, VIDEO compression, DEEP learning
- Published
- 2025
- Full Text
- View/download PDF
25. An intelligent framework of VVC-based video compression and bit rate reduction using vision transformer-based adaptive residual attention densenet.
- Author
- Padmapriya, D. and Ameelia Roseline, A.
- Abstract
The Versatile Video Coding (VVC) standard shows better performance by combining various functions and features for high dynamic range and better spatial resolution, achieving better bit-rate savings than existing video coding models. Although VVC maintains better-quality compressed video using extra encoding functions, VVC is still undergoing continuous enhancement, and experts are continuously suggesting new technologies to enhance its coding performance. In addition, the traditional mechanisms utilize more resources and encoding time to perform the task. Since conventional models are usually complex and have a very large number of parameters, a lightweight model for VVC is needed. Hence, a new mechanism is suggested in this work for video compression and bit-rate minimization based on VVC, leveraging deep learning models. At first, by employing the Motion Vector (MV) encoder-decoder task, the motion is estimated in the suggested work. Moreover, with the assistance of these MVs, frame reformation is carried out to conduct motion compensation. The compression process is performed and the residual images are achieved by adopting the Vision Transformer-based Adaptive Residual Attention DenseNet (ViT-ARADNet), where the parameters included in this network are optimally tuned by Random Value Enhanced Pelican Optimization (RVEPO). Further, the bit rate of the residual image is determined by entropy coding in the presented work's training phase. Subsequently, video quality assessment metrics such as Visual Information Fidelity (VIF) and predicted Differential Mean Opinion Score (DMOSp) are measured to enrich the model functionality. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
26. Learned scalable video coding for humans and machines
- Author
- Hadi Hadizadeh and Ivan V. Bajić
- Subjects
- Video compression, Video analytics, Scalable coding, Deep learning, Coding for machines, Electronics, TK7800-8360
- Abstract
Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.
- Published
- 2024
- Full Text
- View/download PDF
27. Deep learning-guided video compression for machine vision tasks
- Author
- Aro Kim, Seung-taek Woo, Minho Park, Dong-hwi Kim, Hanshin Lim, Soon-heung Jung, Sangwoon Kwak, and Sang-hyo Park
- Subjects
- Video compression, Video coding for machines, Deep learning, Electronics, TK7800-8360
- Abstract
In the video compression industry, video compression tailored to machine vision tasks has recently emerged as a critical area of focus. Given the unique characteristics of machine vision, the current practice of directly employing conventional codecs reveals inefficiency, since regions unnecessary for the task are compressed as well. In this paper, we propose a framework that more aptly encodes video regions distinguished by machine vision to enhance coding efficiency. To that end, the proposed framework consists of deep learning-based adaptive switch networks that guide the efficient coding tool for video encoding. Through experiments, it is demonstrated that the proposed framework has superiority over the latest standardization project, the video coding for machines (VCM) benchmark, achieving a Bjontegaard delta (BD)-rate gain of 5.91% on average and up to a 19.51% BD-rate gain.
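For readers unfamiliar with the BD-rate figures quoted here, the sketch below computes the standard Bjontegaard delta rate between two rate-distortion curves (cubic fit of log-rate over the shared quality range). It follows the widely used recipe, not code from the paper.

```python
# Bjontegaard delta rate (BD-rate) between two RD curves (sketch).
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)   # log-rate as cubic in PSNR
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))    # overlapping quality interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), [lo, hi])
    int_test = np.polyval(np.polyint(p_test), [lo, hi])
    avg_diff = ((int_test[1] - int_test[0]) - (int_ref[1] - int_ref[0])) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100        # negative = bitrate savings

if __name__ == "__main__":
    anchor = ([1000, 1800, 3200, 6000], [32.0, 34.5, 36.8, 38.9])
    test = ([900, 1600, 2900, 5500], [32.1, 34.6, 36.9, 39.0])
    print(f"BD-rate: {bd_rate(*anchor, *test):+.2f}%")
```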
- Published
- 2024
- Full Text
- View/download PDF
28. Efficient Video Compression Using Afterimage Representation.
- Author
- Jeon, Minseong and Cheoi, Kyungjoo
- Subjects
- LANGUAGE models, DATA compression, OPTICAL flow, VIDEO processing, VIDEO compression, CLASSIFICATION
- Abstract
Recent advancements in large-scale video data have highlighted the growing need for efficient data compression techniques to enhance video processing performance. In this paper, we propose an afterimage-based video compression method that significantly reduces video data volume while maintaining analytical performance. The proposed approach utilizes optical flow to adaptively select the number of keyframes based on scene complexity, optimizing compression efficiency. Additionally, object movement masks extracted from keyframes are accumulated over time using alpha blending to generate the final afterimage. Experiments on the UCF-Crime dataset demonstrated that the proposed method achieved a 95.97% compression ratio. In binary classification experiments on normal/abnormal behaviors, the compressed videos maintained performance comparable to the original videos, while in multi-class classification, they outperformed the originals. Notably, classification experiments focused exclusively on abnormal behaviors exhibited a significant 4.25% improvement in performance. Moreover, further experiments showed that large language models (LLMs) can interpret the temporal context of original videos from single afterimages. These findings confirm that the proposed afterimage-based compression technique effectively preserves spatiotemporal information while significantly reducing data size. [ABSTRACT FROM AUTHOR]
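A minimal sketch of the accumulation step described above: per-keyframe motion masks are alpha-blended over time so recent motion dominates the afterimage. The frame-difference mask and decay factor are illustrative assumptions, not the paper's extraction pipeline.

```python
# Afterimage construction by alpha-blended mask accumulation (sketch).
import numpy as np

def build_afterimage(keyframes, alpha=0.7, thresh=25):
    """keyframes: list of (H, W) grayscale uint8 arrays, in temporal order."""
    acc = np.zeros_like(keyframes[0], dtype=np.float64)
    for prev, cur in zip(keyframes, keyframes[1:]):
        mask = (np.abs(cur.astype(np.int16) - prev.astype(np.int16)) > thresh)
        acc = alpha * acc + (1.0 - alpha) * mask * 255.0  # blend newest on top
    return acc.astype(np.uint8)

if __name__ == "__main__":
    frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(6)]
    for t, f in enumerate(frames):        # a block moving left to right
        f[20:30, 5 + 8 * t: 15 + 8 * t] = 200
    print(build_afterimage(frames).max())  # recent motion dominates the image
```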
- Published
- 2024
- Full Text
- View/download PDF
29. Learned scalable video coding for humans and machines.
- Author
- Hadizadeh, Hadi and Bajić, Ivan V.
- Subjects
- ARTIFICIAL neural networks, COMPUTER vision, VIDEO codecs, STREAMING video & television, TRAFFIC monitoring, VIDEO coding, VIDEO compression
- Abstract
Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. A video compression-cum-classification network for classification from compressed video streams.
- Author
- Yadav, Sangeeta, Gulia, Preeti, Gill, Nasib Singh, Yahya, Mohammad, Shukla, Piyush Kumar, Pareek, Piyush Kumar, and Shukla, Prashant Kumar
- Subjects
- ARTIFICIAL neural networks, STREAMING video & television, VIDEO compression, USER-generated content, DEEP learning, VIDEO coding
- Abstract
Video analytics can achieve increased speed and efficiency by operating directly on the compressed video format, thereby alleviating the decoding burden on the analytics server. The encoded video streams are rich in semantic binary information, and this information can be utilized more efficiently to train classifiers. Motivated by this notion, a deep learning-based video compression-cum-classification network is proposed. In the proposed work, the binary-coded semantic information is extracted using an autoencoder-based video compression component and fed to the MobileNetv2-based classifier for classification of the given video streams based on their content. Using large-scale user-generated content from the YouTube UGC dataset, it has been demonstrated that using deep neural networks for compression not only provides compression results on par with traditional methods but also makes analytical processing of these videos faster. Video content tagging of the YouTube UGC dataset has been used as the analytics task. The proposed DLVCC approach performs 10× faster with 30× fewer parameters than MobileNetv2 in video tagging of compressed video with no loss in accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Efficient Block Matching Motion Estimation Using Variable-Size Blocks and Predictive Tools.
- Author
- Mirjalili, Milad and Mousavinia, Amir
- Subjects
- VIDEO compression, SIGNAL-to-noise ratio, SEARCH algorithms, ALGORITHMS
- Abstract
In this research paper, we introduce an adaptive block-matching motion estimation algorithm to improve the accuracy and efficiency of motion estimation (ME). First, we present a block generation system that creates blocks of varying sizes based on the detected motion location. Second, we incorporate predictive tools such as early termination and variable window size to optimize our block-matching algorithm. Furthermore, we propose two distinct search patterns to achieve maximum quality and efficiency. We evaluated the proposed algorithms on 20 videos and compared the results with known algorithms, including the full search algorithm (FSA), which is a benchmark for ME accuracy. Our proposed quality-based algorithm shows an improvement of 0.27 dB in peak signal-to-noise ratio (PSNR) on average for reconstructed frames compared to FSA, along with a reduction of 71.66% in searched blocks. Similarly, our proposed efficiency-based method results in a 0.07 dB increase in PSNR and a 97.93% reduction in searched blocks compared to FSA. These findings suggest that our proposed method has the potential to improve the performance of ME in video coding. [ABSTRACT FROM AUTHOR]
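A minimal sketch of block matching with the predictive tools mentioned above: candidates are visited nearest-first, a SAD threshold triggers early termination, and the window size is a per-block parameter. The scan order and threshold value are illustrative assumptions, not the paper's exact algorithm.

```python
# Block-matching motion estimation with early termination (sketch).
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def match_block(cur, ref, y, x, bs=16, window=8, early_stop=64):
    """Search ref around (y, x) for the block cur[y:y+bs, x:x+bs].
    Candidates are visited nearest-first so early termination fires sooner."""
    block = cur[y:y + bs, x:x + bs]
    offsets = sorted(((dy, dx) for dy in range(-window, window + 1)
                      for dx in range(-window, window + 1)),
                     key=lambda o: abs(o[0]) + abs(o[1]))
    best, best_cost = (0, 0), float("inf")
    for dy, dx in offsets:
        ry, rx = y + dy, x + dx
        if not (0 <= ry <= ref.shape[0] - bs and 0 <= rx <= ref.shape[1] - bs):
            continue
        cost = sad(block, ref[ry:ry + bs, rx:rx + bs])
        if cost < best_cost:
            best, best_cost = (dy, dx), cost
            if cost <= early_stop:     # good enough: stop searching
                break
    return best, best_cost
```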
- Published
- 2024
- Full Text
- View/download PDF
32. A Flexible Algorithm Design of Spatial Scalability for Real-time Surveillance Applications.
- Author
- Zhe Zheng, Jinghua Liu, Darui Sun, Jinghui Lu, Song Qiu, Yanwei Xiong, Rui Liu, and Wenpeng Cu
- Abstract
The Surveillance Video and Audio Coding (SVAC) working group is currently developing the third-generation video compression standard, SVAC 3.0. To extend this standard, this paper proposes a spatially scalable coding framework, Scalable Surveillance Video Coding (SSVC), so that the SVAC 3.0 video stream can gracefully adapt to different transmission bandwidth limitations and decoding hardware requirements, while keeping the quality of the reconstructed image without degradation. To achieve scalability in hardware implementations, SSVC designs a flexible reference frame marking and usage scheme, so that enhancement-layer coding does not directly depend on the base layer, effectively reducing the coding coupling between layers. SSVC improves the motion vector prediction method, effectively utilizes the encoding information of the base layer, and is largely compatible with SVAC 3.0 syntax structures. In addition, SSVC provides several operational modes to adapt to various application scenarios and achieve an optimal trade-off between coding efficiency and complexity. Performance comparisons between simulcast streams and SSVC, and between single-layer streams and the SSVC enhancement layer, as well as experimental data for the different operational modes, are provided. The coding efficiency and computational complexity are also analyzed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Human-Machine Collaborative Image and Video Compression: A Survey.
- Author
- Li, Huanyang, Zhang, Xinfeng, Wang, Shiqi, Wang, Shanshe, Pan, Jingshan, Gao, Wei, and Kwong, Sam
- Subjects
- COMPUTER vision, IMAGE compression, VIDEO compression, BINARY sequences, VIDEO coding
- Abstract
Traditional image and video compression methods are designed to maintain the quality of human visual perception, which makes it necessary to reconstruct the image or video before machine analysis. Compression methods oriented toward machine vision tasks make it possible to use the bit stream directly for machine vision tasks, but they struggle to decode high-quality images. To bridge the gap between machine vision tasks and signal-level representation, researchers have presented numerous human-machine collaborative compression methods. To provide researchers with a comprehensive understanding of this field and to promote the development of image and video compression, we present this survey. In this work, we give a problem definition and explore the relationships and application scenarios of different methods. In addition, we provide a comparative analysis of existing methods on compression and machine vision task performance. Finally, we discuss several directions that are most promising for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Remote physiological signal recovery with efficient spatio-temporal modeling.
- Author
- Bochao Zou, Yu Zhao, Xiaocheng Hu, Changyu He, and Tianwa Yang
- Subjects
- AFFECTIVE computing, VIDEO compression, PEARSON correlation (Statistics), RESPIRATORY measurements, DEEP learning
- Abstract
Contactless physiological signal measurement has great applications in various fields, such as affective computing and health monitoring. Physiological measurements based on remote photoplethysmography (rPPG) are realized by capturing the weak periodic color changes. The changes are caused by the variation in the light absorption of skin surface during systole and diastole stages of a functioning heart. This measurement mode has advantages of contactless measurement, simple operation, low cost, etc. In recent years, several deep learning-based rPPG measurement methods have been proposed. However, the features learned by deep learning models are vulnerable to motion and illumination artefacts, and are unable to fully exploit the intrinsic temporal characteristics of the rPPG. This paper presents an efficient spatiotemporal modeling-based rPPG recovery method for physiological signal measurements. First, two modules are utilized in the rPPG task: 1) 3D central difference convolution for temporal context modeling with enhanced representation and generalization capacity, and 2) Huber loss for robust intensity-level rPPG recovery. Second, a dual branch structure for both motion and appearance modeling and a soft attention mask are adapted to take full advantage of the central difference convolution. Third, a multi-task setting for joint cardiac and respiratory signals measurements is introduced to benefit from the internal relevance between two physiological signals. Last, extensive experiments performed on three public databases show that the proposed method outperforms prior state-of-the-art methods with the Pearson's correlation coefficient higher than 0.96 on all three datasets. The generalization ability of the proposed method is also evaluated by cross-database and video compression experiments. The effectiveness and necessity of each module are confirmed by ablation studies. [ABSTRACT FROM AUTHOR]
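A minimal sketch of the 3D central difference convolution (CDC) used above, following the usual CDC formulation: a vanilla 3D convolution minus a theta-weighted term built from the kernel's aggregated response at the center position. The PyTorch details are illustrative, not the authors' code.

```python
# 3D central difference convolution for temporal context modeling (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Central difference term: each kernel collapsed to a 1x1x1 sum,
        # applied at the center position only.
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_center = F.conv3d(x, kernel_sum)
        return out - self.theta * out_center

if __name__ == "__main__":
    layer = CDC3d(3, 8)
    clip = torch.randn(1, 3, 16, 64, 64)   # (N, C, T, H, W) video clip
    print(layer(clip).shape)                # torch.Size([1, 8, 16, 64, 64])
```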
- Published
- 2024
- Full Text
- View/download PDF
35. Efficient immersive video coding using specular detection for high rendering quality.
- Author
- Choi, Yongho, Van Le, The, Bang, Gun, Lee, Jinho, and Lee, Jin Young
- Subjects
- MIXED reality, VIRTUAL reality, VIDEO coding, VIDEO compression
- Abstract
Previous 2D video coding standards obtain efficient compression of traditional 2D color images. However, because new services, such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), have been recently introduced, an immersive video coding standard that compresses view information captured at many viewpoints is being actively developed for high immersion of VR, AR, and MR. This video coding standard generates patches, which represent non-overlapping areas among different views. In general, the patches give a high impact on rendering of specular areas in virtual viewpoints, but it is very difficult to accurately find them. Therefore, this paper proposes an efficient immersive video coding method using specular detection for high rendering quality, which generates additional specular patches. Experimental results demonstrate that the proposed method improves the rendering quality in terms of specularity with a negligible change in coding performance. In particular, subjective assessments clearly show the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. CoFFEE: a codec-based forensic feature extraction and evaluation software for H.264 videos.
- Author
- Bertazzini, Giulia, Baracchi, Daniele, Shullani, Dasara, Iuliani, Massimo, and Piva, Alessandro
- Subjects
- VIDEO codecs, FEATURE extraction, DIGITAL video, SOCIAL networks, SCIENTIFIC community, VIDEO compression
- Abstract
The forensic analysis of digital videos is becoming increasingly relevant to deal with forensic cases, propaganda, and fake news. The research community has developed numerous forensic tools to address various challenges, such as integrity verification, manipulation detection, and source characterization. Each tool exploits characteristic traces to reconstruct the video life-cycle. Among these traces, a significant source of information is provided by the specific way in which the video has been encoded. While several tools are available to analyze codec-related information for images, a similar approach has been overlooked for videos, since video codecs are extremely complex and involve the analysis of a huge amount of data. In this paper, we present a new tool designed for extracting and parsing a plethora of video compression information from H.264 encoded files, including macroblocks structure, prediction residuals, and motion vectors. We demonstrate how the extracted features can be effectively exploited to address various forensic tasks, such as social network identification, source characterization, and double compression detection. We provide a detailed description of the developed software, which is released free of charge to enable its use by the research community to create new tools for forensic analysis of video files. [ABSTRACT FROM AUTHOR]
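CoFFEE itself is released separately; the sketch below shows how similar codec-level features can be surfaced with stock FFmpeg tooling (the real +export_mvs flag and codecview filter, and ffprobe's per-frame pict_type), with placeholder file paths. It is not the CoFFEE API.

```python
# Surfacing H.264 codec-level features with FFmpeg/ffprobe (sketch).
import subprocess

def render_motion_vectors(src: str, dst: str) -> None:
    """Overlay motion vectors of P/B frames onto the decoded video."""
    subprocess.run([
        "ffmpeg", "-flags2", "+export_mvs", "-i", src,
        "-vf", "codecview=mv=pf+bf+bb", "-y", dst,
    ], check=True)

def dump_frame_types(src: str) -> str:
    """List per-frame coding types (I/P/B), one macro-level codec feature."""
    out = subprocess.run([
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "frame=pict_type", "-of", "csv=p=0", src,
    ], check=True, capture_output=True, text=True)
    return out.stdout

if __name__ == "__main__":
    render_motion_vectors("input.mp4", "mv_overlay.mp4")
    print(dump_frame_types("input.mp4")[:80])
```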
- Published
- 2024
- Full Text
- View/download PDF
37. iHELP: a model for instant learning of video coding in VR/AR real-time applications.
- Author
- Sharrab, Yousef O., Alsmirat, Mohammad A., Eljinini, Mohammad Ali H., and Sarhan, Nabil J.
- Subjects
- ARTIFICIAL intelligence, SMART structures, STREAMING video & television, COMPUTATIONAL complexity, VIDEO coding, VIDEO compression, AUGMENTED reality
- Abstract
Virtual and augmented reality (VR/AR), teleoperation, and telepresence technologies heavily depend on video streaming and playback to enable immersive user experiences. However, the substantial bandwidth requirements and file sizes associated with VR/AR and 360-degree video content present significant challenges for efficient transmission and storage. Modern video coding standards, including HEVC, AV1, VP9, VVC, and EVC, have been designed to address these issues by enhancing coding efficiency while maintaining video quality on par with the H.264 standard. Nonetheless, the adaptive block structures inherent to these video coding standards introduce increased computational complexity, necessitating additional intra-prediction modes. The integration of AI in video coding has the potential to substantially improve video compression efficiency, reduce file sizes, and enhance video quality, making it a crucial area of research and development within the video coding domain. As AI systems can execute a wide array of tasks and adapt to new challenges, their incorporation into video coding may result in even more advanced compression techniques and innovative solutions to meet the ever-evolving demands of the industry. In this study, we introduce a state-of-the-art adaptive instant learning-based model, named iHELP, developed to address the computational complexity arising from encoders' adaptive block structures. The iHELP model achieves outstanding coding efficiency and quality while considerably improving encoding speed. It has been tested on HEVC, but it applies to other encoders with similar adaptive block structures. The model employs entropy-based block similarity to predict the splitting decision of the LCU, determining whether to divide the block based on the correlation between the block content and previously encoded adjacent blocks in both spatial and temporal dimensions. Our methodology has been rigorously evaluated using the HEVC standard's common test conditions, and the results indicate that iHELP serves as an effective solution for efficient video coding in bandwidth-constrained situations, making it suitable for real-time video applications. The proposed method achieves an 80% reduction in encoding time while maintaining comparable PSNR performance relative to the RDO approach. The exceptional potential of the iHELP model calls for further exploration, as no other existing methods have demonstrated such a high level of performance. [ABSTRACT FROM AUTHOR]
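A minimal sketch of an entropy-based block-similarity test in the spirit of iHELP's split prediction: an LCU is split only when its intensity entropy differs from that of already-encoded neighbors. The threshold and neighbor set are illustrative assumptions, not the paper's trained rule.

```python
# Entropy-based LCU split prediction (sketch).
import numpy as np

def block_entropy(block: np.ndarray) -> float:
    hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def predict_split(cur: np.ndarray, neighbors, tol=0.25) -> bool:
    """Skip splitting when the LCU's entropy matches its encoded neighbors."""
    e = block_entropy(cur)
    diffs = [abs(e - block_entropy(n)) for n in neighbors]
    return min(diffs, default=float("inf")) > tol   # dissimilar -> split

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    flat = np.full((64, 64), 120, dtype=np.uint8)
    busy = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(predict_split(busy, [flat, flat]))  # True: likely needs splitting
    print(predict_split(flat, [flat, busy]))  # False: reuse neighbor decision
```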
- Published
- 2024
- Full Text
- View/download PDF
38. Fast Subpixel Motion Estimation Based on Human Visual System.
- Author
- Hosseini Avashanagh, Dadvar, Nooshyar, Mehdi, Barghandan, Saeed, Ghandchi, Majid, and Ortale, Riccardo
- Subjects
- VISUAL perception, STATISTICS, ALGORITHMS, VIDEO compression, VIDEOS, ENCODING, VIDEO coding
- Abstract
More than 80% of video coding time is consumed by motion estimation calculations, the most complex aspect of the process. Motion estimation eliminates temporal redundancies in a video sequence to achieve maximum compression. Numerous efforts have been made to bring the calculations closer to real time, yielding fruitful results. This study proposes a fast subpixel motion estimation algorithm for video encoding with fewer search points. The method employs the capabilities of the human visual system (HVS), the physical motion characteristics of real-world objects, and image information from successive frames. The number of search points (NSP) is reduced using statistical data on the movement of blocks across the frames of video sequences, lowering the computational load on the system while maintaining image quality. Therefore, by accurately modeling this algorithm, it is possible to approach fast, real-time calculations instead of time-consuming algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Fast CU Partition Decision Algorithm Based on Bayesian and Texture Features.
- Author
- Tian, Erlin, Yang, Yifan, and Zhang, Qiuwen
- Subjects
- BLOCK codes, VIDEO coding, INTERNET speed, VIDEO processing, VIDEO compression, PARALLEL algorithms
- Abstract
As internet speeds increase and user demands for video quality grow, video coding standards continue to evolve. H.266/Versatile Video Coding (VVC), as the new generation of video coding standards, further improves compression efficiency but also brings higher computational complexity. Despite the significant advancements VVC has made in compression ratio and video quality, the introduction of new coding techniques and complex coding unit (CU) partitioning methods has also increased encoding complexity. This complexity not only extends encoding time but also increases hardware resource consumption, limiting the application of VVC in real-time video processing and on low-power devices. To alleviate the encoding complexity of VVC, this paper puts forward a Bayesian and texture-feature-based fast splitting algorithm for VVC intraframe coding blocks, which aims to reduce unnecessary computational steps, enhance encoding efficiency, and maintain video quality as much as possible. In the rapid coding stage, the video frames are coded by the original VVC test model (VTM), and the Joint Rough Mode Decision (JRMD) evaluation cost is used to update the parameters in the Bayesian algorithm and to set two thresholds that judge whether the current coding block should continue to be split. Then, for coding blocks larger than those satisfying the above threshold conditions, the predominant direction of the texture within the coding block is ascertained by calculating the standard deviations along the horizontal and vertical axes, so as to skip some unnecessary split patterns for the current coding block. The findings from our experiments demonstrate that our proposed approach improves the encoding rate by 1.40% on average, and the execution time of the encoder is reduced by 49.50%. The overall algorithm optimizes VVC intraframe coding and reduces the coding complexity of VVC. [ABSTRACT FROM AUTHOR]
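A minimal sketch of the texture-direction test described above: the standard deviations of column means and row means reveal the predominant texture direction, letting the encoder skip split modes orthogonal to it. The decision margin is an illustrative assumption.

```python
# Texture-direction test from horizontal/vertical standard deviations (sketch).
import numpy as np

def dominant_texture_direction(block: np.ndarray, margin: float = 1.5) -> str:
    col_std = float(np.std(block.mean(axis=0)))  # variation across columns
    row_std = float(np.std(block.mean(axis=1)))  # variation across rows
    if col_std > margin * row_std:
        return "vertical"    # intensity changes left-to-right: vertical edges
    if row_std > margin * col_std:
        return "horizontal"  # intensity changes top-to-bottom: horizontal edges
    return "none"            # no clear direction: keep all split candidates

if __name__ == "__main__":
    stripes = np.tile(np.arange(32) % 2 * 255, (32, 1))  # vertical stripes
    print(dominant_texture_direction(stripes))            # -> "vertical"
```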
- Published
- 2024
- Full Text
- View/download PDF
40. Information Bottleneck Driven Deep Video Compression—IBOpenDVCW.
- Author
- Leiderman, Timor and Ben Ezra, Yosef
- Subjects
- WAVELETS (Mathematics), VIDEO coding, LINEAR network coding, CODECS, VIDEO compression, MOTHERS
- Abstract
Video compression remains a challenging task despite significant advancements in end-to-end optimized deep networks for video coding. This study, inspired by information bottleneck (IB) theory, introduces a novel approach that combines IB theory with wavelet transform. We perform a comprehensive analysis of information and mutual information across various mother wavelets and decomposition levels. Additionally, we replace the conventional average pooling layers with a discrete wavelet transform creating more advanced pooling methods to investigate their effects on information and mutual information. Our results demonstrate that the proposed model and training technique outperform existing state-of-the-art video compression methods, delivering competitive rate-distortion performance compared to the AVC/H.264 and HEVC/H.265 codecs. [ABSTRACT FROM AUTHOR]
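A minimal sketch of the pooling substitution described above: a 2D discrete wavelet transform halves spatial resolution like average pooling, with the approximation band kept as the pooled output and the mother wavelet as the swept parameter. Uses PyWavelets; how the layer is wired into the network is omitted here.

```python
# DWT approximation band as a drop-in for average pooling (sketch).
import numpy as np
import pywt

def dwt_pool(feature_map: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Replace 2x2 average pooling with the DWT approximation band (cA)."""
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, wavelet)
    return cA

if __name__ == "__main__":
    x = np.arange(64, dtype=np.float64).reshape(8, 8)
    for w in ["haar", "db2", "bior2.2"]:   # a few mother wavelets to compare
        print(w, dwt_pool(x, w).shape)      # haar halves 8x8 to 4x4
```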
- Published
- 2024
- Full Text
- View/download PDF
41. MEDUSA: A Dynamic Codec Switching Approach in HTTP Adaptive Streaming.
- Author
- Lorenzi, Daniele, Tashtarian, Farzad, Hellwagner, Hermann, and Timmerer, Christian
- Subjects
- VIDEO codecs, VIDEO on demand, SET functions, CODECS, PERCEIVED quality, VIDEO compression
- Abstract
HTTP Adaptive Streaming (HAS) solutions utilize various Adaptive BitRate (ABR) algorithms to dynamically select appropriate video representations, aiming to adapt to fluctuations in network bandwidth. However, current ABR implementations have a limitation in that they are designed to function with one set of video representations, i.e., the bitrate ladder, which differ in bitrate and resolution but are encoded with the same video codec. When multiple codecs are available, current ABR algorithms select one of them prior to the streaming session and stick to it throughout the entire session. Although newer codecs are generally preferred over older ones, their compression efficiencies differ depending on the content's complexity, which varies over time. Therefore, it is necessary to select the appropriate codec for each video segment to reduce the requested data while delivering the highest possible quality. In this article, we first provide a practical example comparing the compression efficiencies of different codecs on a set of video sequences. Based on this analysis, we formulate the optimization problem of selecting the appropriate codec for each user and video segment (on a per-segment basis in the extreme case), refining the selection of the ABR algorithms by exploiting key metrics, such as the perceived segment quality and size. Subsequently, to address the scalability issues of this centralized model, we introduce a novel distributed plug-in ABR algorithm for Video on Demand (VoD) applications called MEDUSA, to be deployed on top of existing ABR algorithms. MEDUSA enhances the user's Quality of Experience (QoE) by utilizing a multi-objective function that considers the quality and size of video segments when selecting the next representation. Using quality information and segment size from the modified Media Presentation Description (MPD), MEDUSA utilizes buffer occupancy to prioritize quality or size by assigning specific weights in the objective function. To show the impact of MEDUSA, we compare the proposed plug-in approach on top of state-of-the-art techniques with their original implementations and analyze the results for different network traces, video content, and buffer capacities. According to the experimental findings, MEDUSA improves QoE for various test videos and scenarios, revealing an impressive improvement in the QoE score of up to 42% according to the ITU-T P.1203 model (mode 0). Additionally, MEDUSA can reduce the transmitted data volume by more than 40% while achieving a QoE similar to the compared techniques, reducing streaming service providers' delivery costs. [ABSTRACT FROM AUTHOR]
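A minimal sketch of a MEDUSA-style per-segment decision: candidate representations (possibly from different codecs) are scored by a weighted combination of perceived quality and segment size, with the weight steered toward quality when the playback buffer is full and toward size when it is draining. The weight schedule, score form, and all numbers are illustrative assumptions, not the article's objective function.

```python
# Buffer-aware multi-objective representation selection (sketch).

def pick_representation(candidates, buffer_level, buffer_capacity):
    """candidates: list of dicts with 'codec', 'quality' (0-100, e.g. a
    P.1203-like score) and 'size' (bytes). Returns the best candidate."""
    w_quality = buffer_level / buffer_capacity     # full buffer -> chase quality
    max_size = max(c["size"] for c in candidates)

    def score(c):
        size_term = 1.0 - c["size"] / max_size     # smaller segment -> higher
        return w_quality * (c["quality"] / 100.0) + (1 - w_quality) * size_term

    return max(candidates, key=score)

if __name__ == "__main__":
    reps = [
        {"codec": "avc", "quality": 70, "size": 650_000},
        {"codec": "hevc", "quality": 78, "size": 700_000},
        {"codec": "av1", "quality": 84, "size": 950_000},
    ]
    # Draining buffer favors the smaller segment; full buffer favors quality.
    print(pick_representation(reps, buffer_level=4, buffer_capacity=20)["codec"])   # avc
    print(pick_representation(reps, buffer_level=18, buffer_capacity=20)["codec"])  # av1
```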
- Published
- 2024
- Full Text
- View/download PDF
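The abstract does not specify MEDUSA's multi-objective function, so the following Python sketch only illustrates the general idea of weighting segment quality against segment size as a function of buffer occupancy; the linear objective, the weight schedule, and all names are illustrative assumptions:

```python
# Sketch of a buffer-aware multi-objective representation choice in the
# spirit of MEDUSA. The linear objective and the weight schedule are
# illustrative assumptions, not the paper's exact formulation.
from dataclasses import dataclass

@dataclass
class Representation:
    codec: str        # e.g. "avc", "hevc", "av1"; candidates may differ per segment
    quality: float    # perceived segment quality, normalized to [0, 1]
    size_mb: float    # segment size in megabytes (from the modified MPD)

def pick_representation(candidates, buffer_level, buffer_capacity):
    """Prioritize quality when the buffer is full, size when it is draining."""
    fill = buffer_level / buffer_capacity            # 0 (empty) .. 1 (full)
    w_quality, w_size = fill, 1.0 - fill             # assumed weight schedule
    max_size = max(r.size_mb for r in candidates)
    def score(r):
        return w_quality * r.quality - w_size * (r.size_mb / max_size)
    return max(candidates, key=score)

segment = [
    Representation("hevc", quality=0.78, size_mb=2.1),
    Representation("av1",  quality=0.80, size_mb=1.6),
]
print(pick_representation(segment, buffer_level=4.0, buffer_capacity=30.0).codec)
```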
42. Preprocessing for Multi-Dimensional Enhancement and Reconstruction in Neural Video Compression.
- Author
-
Wang, Jiajia, Zhang, Qi, Zhao, Haiwu, Wang, Guozhong, and Shang, Xiwu
- Subjects
CONVOLUTIONAL neural networks, FEATURE extraction, VIDEO compression, CODECS, VIDEO coding, VIDEO on demand, PIXELS - Abstract
The surge in ultra-high-definition video content has intensified the demand for advanced video compression techniques. Video encoding preprocessing can improve coding efficiency while remaining highly compatible with existing codecs. Existing preprocessing methods, however, are limited in their ability to fully exploit redundant features in video data and to recover high-frequency details, and their network architectures often lack compatibility with neural video encoders. To address these challenges, we propose a Multi-Dimensional Enhancement and Reconstruction (MDER) preprocessing method to improve the efficiency of deep learning-based neural video encoders. Firstly, our approach integrates a degradation compensation module to mitigate encoding noise and boost feature extraction efficiency. Secondly, a lightweight fully convolutional neural network is employed, which uses residual learning and knowledge distillation to refine features and suppress irrelevant ones across the spatial and channel dimensions. Furthermore, to maximize the use of redundant information, we incorporate Dense Blocks, which enhance and reconstruct important features in the video data during preprocessing (a Dense Block sketch follows this entry). Finally, the preprocessed frames are mapped from pixel space to feature space through the Dense Feature-Enhanced Video Compression (DFVC) module, which improves the accuracy of motion estimation and compensation. The experimental results demonstrate that, compared to neural video encoders without preprocessing, the MDER method reduces bits per pixel (Bpp) by 0.0714 and 0.0536 under equivalent PSNR and MS-SSIM conditions, respectively. These results show significant improvements in compression efficiency and reconstruction quality, highlighting the effectiveness of the MDER preprocessing method and its compatibility with neural video codec workflows. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
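The MDER abstract mentions Dense Blocks for feature enhancement without architectural detail; the sketch below is a generic DenseNet-style block in PyTorch, with layer count and channel widths chosen arbitrarily for illustration:

```python
# Sketch of a DenseNet-style block of the kind the MDER abstract mentions
# for feature enhancement. Layer counts and channel widths are assumptions.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch: int, growth: int = 16, layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch += growth  # each layer sees all previous feature maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

x = torch.randn(1, 32, 64, 64)
print(DenseBlock(32)(x).shape)  # torch.Size([1, 96, 64, 64])
```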
43. A Model for Estimating the Informativeness of the Spectral-Parametric Description of Transformed Video Fragments.
- Author
-
Цімура, Ю. В., Юдін, О. К., Мельников, О. Є., Коляденко, Ю. Ю., and Гуржій, П. М.
- Subjects
IMAGE compression, STREAMING video & television, VIDEO compression, VISUAL perception, INFORMATION organization - Abstract
The article substantiates that, under modern conditions, infocommunication technologies based on wireless data transmission are the most developed, and shows how such technologies can organize information exchange with unmanned platforms, increasing the efficiency of information processes in decision-support systems. At the same time, the article identifies a problematic imbalance between the intensity of the information flow that must be transmitted from the unmanned system and the throughput of its on-board telecommunications facilities. To resolve this imbalance, the article proposes compressing the video information streams. A study of modern methods for compressing video information streams is carried out, showing that they rely on platforms based on the encoding of transformed video fragments (VFR). One tested direction is the formation of a spectral-parametric description of transformants, built from two structural components: the vector of spectral-band lengths and the vector of their significant levels. However, existing image compression technologies based on the JPEG platform do not deliver compressed video streams efficiently enough for a given level of visual quality after decompression. A model is therefore developed for estimating the informativeness of a transformant represented by a spectral-parametric description with two components: the vector of spectral subband (SSB) lengths and the vector of significant SSB levels. The model accounts for the limited number of permissible states of the spectral-parametric transformant (SPT) description by tracking the current power of the alphabets of its two structural components. It is substantiated that restricting these structural components creates conditions for reducing redundancy, and that the reducible redundancy grows as the number of spectral subbands decreases and as the current power of the alphabet of significant SSB components decreases (a counting sketch of this informativeness bound follows this entry). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
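The abstract bounds a transformant's informativeness by the number of permissible states of its two component vectors. A hedged counting sketch, assuming a simple model in which each of n subbands draws its length and significant level independently from fixed alphabets; the paper's actual derivation is not reproduced here:

```python
# Sketch: an upper bound on the information content of a spectral-parametric
# transformant description from the alphabet sizes of its two components
# (band-length vector and significant-level vector). The counting model is
# an illustrative assumption, not the paper's exact derivation.
import math

def info_upper_bound_bits(n_bands: int, len_alphabet: int, level_alphabet: int) -> float:
    """log2 of the number of admissible (length, level) vector pairs."""
    states = (len_alphabet ** n_bands) * (level_alphabet ** n_bands)
    return math.log2(states)

# Restricting alphabets / reducing subbands shrinks the bound (less redundancy):
print(info_upper_bound_bits(n_bands=8, len_alphabet=16, level_alphabet=32))  # 72.0
print(info_upper_bound_bits(n_bands=6, len_alphabet=8,  level_alphabet=16))  # 42.0
```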
44. Adaptive QP algorithm for depth range prediction and encoding output in virtual reality video encoding process.
- Author
-
Yang, Hui, Liu, Qiuming, and Song, Chao
- Subjects
*VIRTUAL reality, *ELECTRONIC data processing, *VIDEO processing, *VIDEO coding, *ENCODING, *DECISION making, *VIDEO compression - Abstract
To reduce encoding complexity and stream size while improving encoding and compression performance, this paper studies depth-prediction partition encoding. For mode selection, an optimization analysis based on fast decision methods ensures comprehensive data processing. In the design of the adaptive strategy, different adaptive quantization parameter (QP) adjustment strategies are adopted for the equatorial and polar regions, reflecting the different levels of user attention across a 360-degree virtual reality video (a latitude-based QP sketch follows this entry). The goal is an optimal balance between distortion and stream size, managing the output stream size while maintaining video quality. The results showed that this strategy reduced bit rate by up to 2.92% and by 1.76% on average, saved 39.28% of coding time on average, and kept the average reconstruction-quality difference at 0.043, with almost no quality loss perceptible to viewers. The model also performed well on 4K, 6K, and 8K sequences. The proposed depth-partitioning adaptive strategy significantly improves video encoding quality and efficiency, raising encoding efficiency while preserving video quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
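The abstract does not give the equatorial and polar QP rules; the sketch below illustrates one plausible latitude-dependent QP offset for an equirectangular frame, with band boundaries and offset values chosen arbitrarily:

```python
# Sketch of a latitude-dependent QP offset for 360-degree (equirectangular)
# video: smaller QP near the equator (high attention), larger toward the
# poles. The band boundaries and offset values are illustrative assumptions.
def adaptive_qp(base_qp: int, row: int, frame_height: int) -> int:
    latitude = abs((row + 0.5) / frame_height - 0.5) * 180.0  # 0 at equator, 90 at poles
    if latitude < 30.0:        # equatorial band: preserve quality
        offset = -2
    elif latitude < 60.0:      # mid band
        offset = 0
    else:                      # polar band: spend fewer bits
        offset = 4
    return max(0, min(51, base_qp + offset))  # clamp to the usual HEVC QP range

for row in (0, 540, 1079):     # top, middle, bottom of a 1080-row frame
    print(row, adaptive_qp(32, row, 1080))
```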
45. SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC.
- Author
-
Linck, Iris, Gómez, Arthur Tórgo, and Alaghband, Gita
- Subjects
CONVOLUTIONAL neural networks, RECURSIVE partitioning, VIDEO coding, PROBABILITY theory, FORECASTING, VIDEO compression - Abstract
High Efficiency Video Coding (HEVC) offers superior compression rates, but its adoption increases coding complexity because it relies on a recursive quad-tree to partition frames into blocks of varying sizes; this quad-tree process remains central to upcoming video coding standards. Our paper presents a novel framework, SVG-CNN, which integrates three shallow Convolutional Neural Networks (CNNs) inspired by VGGNet. Each CNN is designed for an individual quad-tree level to predict the Coding Unit (CU) partition in HEVC, reducing intra-frame coding time. SVG-CNN has an inherent capability for early termination, feeding the CNNs sequentially based on quad-tree level probabilities and halting the process when further refinement seems unlikely (a cascade sketch follows this entry). To enhance the model's efficacy, we crafted three specialized datasets, each focusing on a distinct quad-tree level and quantization parameter (QP) context, so that each CNN in the framework undergoes targeted training. Our study shows that performance, in terms of accuracy and F1 metrics, depends strongly on QP settings: lower QPs yield better results, while higher QPs diminish performance due to the potential loss of critical features. To tune the model, we addressed hyperparameter selection with Grid Search Cross-Validation and determined CU split thresholds by assessing multiple thresholds across selected videos. The model has moderate complexity, with over 328,000 parameters across 18 layers, ensuring memory efficiency. It achieves a prediction time of 0.05 ms and reduces HEVC encoding time by 61.64%, while slightly improving rate-distortion performance (-0.24% BD-BR), indicating better compression without notable PSNR loss. Significantly, our approach outperforms other CNN-based quad-tree partitioning methods that reduce HEVC coding complexity but sacrifice compression performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
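The exact per-level networks and the early-termination rule are not given in the abstract; the following PyTorch sketch shows the cascade pattern it describes, with a toy per-level classifier and an assumed 0.5 split-probability threshold:

```python
# Sketch of the cascaded per-level split prediction with early termination
# described for SVG-CNN. The tiny per-level network and the 0.5 threshold
# are illustrative assumptions.
import torch
import torch.nn as nn

def make_level_cnn(block: int) -> nn.Sequential:
    """A shallow VGG-style classifier for one quad-tree level (block x block input)."""
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 1), nn.Sigmoid(),  # P(split this CU)
    )

nets = {64: make_level_cnn(64), 32: make_level_cnn(32), 16: make_level_cnn(16)}

def predict_partition(cu: torch.Tensor, size: int, thr: float = 0.5):
    """Recursively predict splits; stop early when P(split) < thr."""
    p_split = nets[size](cu.unsqueeze(0)).item()
    if size == 16 or p_split < thr:           # early termination
        return {"size": size, "split": False}
    half = size // 2
    quads = [cu[:, r:r + half, c:c + half]    # feed the four sub-CUs downward
             for r in (0, half) for c in (0, half)]
    return {"size": size, "split": True,
            "children": [predict_partition(q, half) for q in quads]}

cu64 = torch.randn(1, 64, 64)
print(predict_partition(cu64, 64))
```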
46. Deep learning-guided video compression for machine vision tasks.
- Author
-
Kim, Aro, Woo, Seung-taek, Park, Minho, Kim, Dong-hwi, Lim, Hanshin, Jung, Soon-heung, Kwak, Sangwoon, and Park, Sang-hyo
- Subjects
COMPUTER vision, VIDEO compression, SWITCHING systems (Telecommunication), CODECS, COMPUTER programming education, VIDEO coding, DEEP learning - Abstract
In the video compression industry, compression tailored to machine vision tasks has recently emerged as a critical area of focus. Given the unique characteristics of machine vision, directly employing conventional codecs is inefficient, since bits are spent compressing regions that the task does not need. In this paper, we propose a framework that encodes video regions distinguished by machine vision more aptly, enhancing coding efficiency. The framework consists of deep learning-based adaptive switch networks that guide the choice of coding tool for video encoding (a region-wise sketch follows this entry). The experiments demonstrate that the proposed framework outperforms the benchmark of the latest standardization project, Video Coding for Machines (VCM), achieving an average Bjontegaard delta (BD)-rate gain of 5.91% and up to a 19.51% BD-rate gain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
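The abstract leaves the switch networks' outputs unspecified; as one concrete reading, the sketch below builds a per-block QP map that spends more bits on machine-vision-relevant regions. The block size, offsets, and mask policy are assumptions:

```python
# Sketch of region-wise coding guided by machine-vision relevance, in the
# spirit of the adaptive switch idea: blocks covered by task-relevant
# detections get finer quantization. The switch policy is an assumption.
import numpy as np

def qp_map_from_detections(frame_h, frame_w, boxes, base_qp=35,
                           roi_offset=-6, block=64):
    """Build a per-64x64-block QP map; boxes are (x0, y0, x1, y1) in pixels."""
    rows, cols = -(-frame_h // block), -(-frame_w // block)  # ceil division
    qp = np.full((rows, cols), base_qp, dtype=np.int32)
    for x0, y0, x1, y1 in boxes:
        r0, r1 = y0 // block, -(-y1 // block)
        c0, c1 = x0 // block, -(-x1 // block)
        qp[r0:r1, c0:c1] = base_qp + roi_offset   # spend more bits on the ROI
    return qp

print(qp_map_from_detections(720, 1280, [(400, 200, 700, 500)]))
```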
47. ULSR-UV: an ultra-lightweight super-resolution networks for UAV video.
- Author
-
Yang, Xin, Wu, Lingxiao, and Wang, Xiangchen
- Subjects
*DRONE aircraft, *NETWORK performance, *BLOCK designs, *VIDEO processing, *GENERALIZATION, *VIDEO compression - Abstract
Existing lightweight video super-resolution network architectures are often structurally simple and lack generalization ability when dealing with the complex, varied real scenes found in unmanned aerial vehicle (UAV) aerial videos. Moreover, these networks can introduce issues such as checkerboard artifacts and loss of texture information when processing drone videos. To address these challenges, we propose ULSR-UV, an ultra-lightweight video super-resolution reconstruction network based on convolutional pyramids and progressive residual blocks. ULSR-UV significantly reduces model redundancy and achieves an extremely lightweight design by incorporating a 3D lightweight spatial pyramid structure and more efficient residual block designs (a residual block sketch follows this entry). The network uses a dedicated optimizer to process drone videos efficiently along both multi-frame and single-frame dimensions. Additionally, ULSR-UV incorporates a multidimensional feature loss module that enhances network performance and significantly improves the reconstruction quality of drone aerial videos. Extensive experimental verification demonstrates ULSR-UV's outstanding performance in drone video super-resolution reconstruction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
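The abstract names progressive residual blocks and a lightweight design without specifics; the PyTorch sketch below shows one lightweight residual block (the depthwise-separable convolutions are an assumption) inside a toy x2 upscaler, using PixelShuffle upsampling, which is commonly used to mitigate checkerboard artifacts:

```python
# Sketch of a lightweight progressive residual block of the kind the
# ULSR-UV abstract describes; widths and depth are assumptions.
import torch
import torch.nn as nn

class ProgressiveResidualBlock(nn.Module):
    def __init__(self, ch: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depthwise: keeps it light
            nn.Conv2d(ch, ch, 1),                        # pointwise mixing
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual learning

class TinySR(nn.Module):
    """Single-frame x2 upscaler: a few residual blocks + PixelShuffle."""
    def __init__(self, ch: int = 16, blocks: int = 3, scale: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ProgressiveResidualBlock(ch) for _ in range(blocks)])
        self.tail = nn.Sequential(nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
                                  nn.PixelShuffle(scale))  # sub-pixel upsampling

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))

lr = torch.randn(1, 3, 64, 64)
print(TinySR()(lr).shape)  # torch.Size([1, 3, 128, 128])
```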
48. Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec.
- Author
-
Gwun, Woowoen, Choi, Kiho, and Park, Gwang Hoon
- Subjects
*CONVOLUTIONAL neural networks, *STREAMING media, *VIDEO coding, *VIDEO compression, *BANDWIDTHS, *VIDEOS - Abstract
Over the past few years, there has been substantial interest and research activity around applying Convolutional Neural Networks (CNNs) to post-filtering in video coding. Most efforts have used CNNs with various kernel sizes and have concentrated on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration of such techniques in other video coding standards such as AV1, developed by the Alliance for Open Media, whose excellent compression efficiency reduces bandwidth usage and improves video quality, making it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework (a minimal attention-layer sketch follows this entry). Applied to the AV1 codec, the proposed method significantly improves video quality, demonstrating the potential of self-attention mechanisms to advance post-filtering beyond the limits of purely convolutional methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and of 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showing substantial artifact reduction and detail enhancement. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
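The paper's three self-attention layer types are not described in the abstract; the sketch below shows one generic way to insert a spatial self-attention layer into a small CNN post-filter, as a stand-in only:

```python
# Sketch of inserting one spatial self-attention layer into a CNN
# post-filter (the paper combines three different attention types; their
# exact designs are not given here, so this is a generic stand-in).
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, ch: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)
        return x + out.transpose(1, 2).reshape(b, c, h, w)  # residual connection

post_filter = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
    SpatialSelfAttention(32),
    nn.Conv2d(32, 3, 3, padding=1),
)
x = torch.randn(1, 3, 32, 32)
print(post_filter(x).shape)  # torch.Size([1, 3, 32, 32])
```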
49. Robust Reversible Watermarking Scheme in Video Compression Domain Based on Multi-Layer Embedding.
- Author
-
Meng, Yifei, Niu, Ke, Zhang, Yingnan, Liang, Yucheng, and Hu, Fangmeng
- Subjects
DIGITAL watermarking, VIDEO compression, DISCRETE cosine transforms, WATERMARKS, INFORMATION filtering - Abstract
Most existing research on video watermarking focuses on improving robustness. However, in application scenarios such as judicial forensics and telemedicine, the distortion that watermark embedding introduces into the original video is unacceptable. To solve this problem, this paper proposes a robust reversible watermarking (RRW) scheme based on multi-layer embedding in the video compression domain. Firstly, the watermarking data are divided into several sub-secrets using Shamir's (t, n)-threshold secret sharing. Then, the chroma sub-block with the most complex texture information is selected in the I-frame of each group of pictures (GOP), and a sub-secret is embedded in that frame by modifying the discrete cosine transform (DCT) coefficients within the sub-block (a coefficient-embedding sketch follows this entry). Finally, the auxiliary information required to recover the coefficients is embedded into the motion vectors of the P-frames of each GOP by a reversible steganography algorithm. In the absence of an attack, the receiver can recover the DCT coefficients by extracting the auxiliary information from the vectors, ultimately recovering the video exactly. The scheme remains strongly robust even under malicious attacks such as recompression and requantization. The experimental results demonstrate that the proposed watermarking scheme is reversible, preserves high visual quality, and surpasses comparable methods in the robustness tests. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
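The paper's coefficient-modification rule is not given in the abstract; the sketch below illustrates the general DCT-domain embedding step with quantization index modulation (QIM), a standard technique used here only as a stand-in. The coefficient position and step size are arbitrary, and the Shamir sharing and reversible P-frame embedding are omitted:

```python
# Sketch: embedding one watermark bit in an 8x8 block by quantization index
# modulation (QIM) of a mid-frequency DCT coefficient. This illustrates the
# "modify DCT coefficients" step only; the paper's exact rule, the Shamir
# (t, n) secret sharing, and the reversible P-frame step are not reproduced.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")
def idct2(b): return idct(idct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

POS, STEP = (3, 4), 8.0   # assumed mid-frequency position and quantizer step

def embed_bit(block: np.ndarray, bit: int) -> np.ndarray:
    coeffs = dct2(block.astype(np.float64))
    q = np.floor(coeffs[POS] / STEP)
    if int(q) % 2 != bit:      # force quantizer-index parity to encode the bit
        q += 1
    coeffs[POS] = (q + 0.5) * STEP
    return idct2(coeffs)

def extract_bit(block: np.ndarray) -> int:
    return int(np.floor(dct2(block.astype(np.float64))[POS] / STEP)) % 2

blk = np.random.randint(0, 256, (8, 8))
for bit in (0, 1):
    assert extract_bit(embed_bit(blk, bit)) == bit
print("bit embedding/extraction round-trip OK")
```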
50. Efficient Lossy Compression of Video Sequences of Automotive High-Dynamic Range Image Sensors for Advanced Driver-Assistance Systems and Autonomous Vehicles.
- Author
-
Pawłowski, Paweł and Piniarski, Karol
- Subjects
IMAGE compression, IMAGE color analysis, VIDEO compression, IMAGE processing software, GRAYSCALE model - Abstract
In this paper, we introduce an efficient lossy coding procedure specifically tailored to video sequences from automotive high-dynamic-range (HDR) image sensors in advanced driver-assistance systems (ADASs) and autonomous vehicles. Today, mainly for security reasons, the automotive industry uses lossless compression, which offers very low compression ratios. To obtain higher ratios, we suggest using lossy codecs, especially when testing image processing algorithms under software-in-the-loop (SiL) or hardware-in-the-loop (HiL) conditions. Our approach leverages the high-quality VP9 codec, operating in two distinct modes: grayscale image compression for automatic image analysis and color (RGB) image compression for manual analysis. In both modes, images are acquired from the automotive-specific RCCC (red, clear, clear, clear) image sensor. The codec is configured to achieve controlled image quality and state-of-the-art compression ratios while remaining feasible in real time. In automotive applications, the inherent data loss of lossy codecs poses challenges, particularly in rapidly changing scenes with intricate details. To address this, we propose configuring the lossy codec in variable bitrate (VBR) mode with a constrained quality (CQ) parameter; by adjusting the quantization parameter, users can tailor the codec's behavior to their application requirements (an invocation sketch follows this entry). In this context, we present a detailed analysis of the quality of lossy-compressed images in terms of the structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR), from which we identified the codec parameters with the greatest impact on video quality preservation and compression ratio. The proposed settings are very efficient: compression ratios range from 51 to 7765 in grayscale mode and from 4.51 to 602.6 in RGB mode, depending on the specified output quality. We reached 129 frames per second (fps) for compression and 315 fps for decompression in grayscale mode, and 102 fps for compression and 121 fps for decompression in RGB mode. These results make it possible to achieve much higher compression ratios than lossless compression while maintaining control over image quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
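The paper's exact VP9 settings are not listed in the abstract; the Python sketch below shows how a constrained-quality VBR encode can be invoked through ffmpeg's libvpx-vp9 (a nonzero bitrate cap together with -crf selects constrained quality). File names and parameter values are placeholders, and grayscale mode is approximated here with the format=gray filter:

```python
# Sketch: invoking a VP9 encode in constrained-quality (CQ) mode via ffmpeg's
# libvpx-vp9, i.e. VBR with a CRF quality target capped by a maximum bitrate.
# File names and the CRF/bitrate values are placeholders, not the paper's
# settings.
import subprocess

def encode_vp9_cq(src: str, dst: str, crf: int = 30,
                  max_bitrate: str = "4M", grayscale: bool = False) -> None:
    cmd = ["ffmpeg", "-y", "-i", src,
           "-c:v", "libvpx-vp9",
           "-crf", str(crf),          # quality target
           "-b:v", max_bitrate]       # nonzero cap -> constrained quality
    if grayscale:
        cmd += ["-vf", "format=gray"]
    cmd.append(dst)
    subprocess.run(cmd, check=True)

# Example (hypothetical input file):
# encode_vp9_cq("camera_rccc_demosaiced.mp4", "out.webm", crf=30, grayscale=True)
```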