
High-order relational generative adversarial network for video super-resolution.

Authors:
Chen, Rui
Mu, Yang
Zhang, Yan
Source:
Pattern Recognition. Feb 2024, Vol. 146.
Publication Year:
2024

Abstract

• Both unstable motion compensation and unstable feature aggregation often undermine video SR results. Existing methods consider only the global spatio-temporal dependencies of same-scale features as first-order relations, which makes it difficult to handle complex motions and context variations. To deal with these issues, we argue that high-order relations should be modelled and applied in the generative strategy to exploit feature alignment and long-range information fusion. Here, high-order relations mean that the dependencies of features among pixel positions are mined not only from global feature representations but also through local patch interactions. Moreover, stronger relations can be revealed across different scales, from which aggregation weight matrices with high-order statistics are adaptively determined. A video can be regarded as the complementary fusion of motion and context, so simultaneously capturing the underlying relations of both is the key to video SR. In this paper, we propose a high-order relational generative adversarial network (HOR-GAN) for accurate video SR, which effectively alleviates erroneous motion compensation and merges useful information across consecutive frames.
• In summary, the main contributions of this paper are as follows. We propose the HOR-GAN framework for highly accurate and realistic video SR; by exploiting high-order relations of feature patches through the construction of pyramid graphs, HOR-GAN produces better feature alignment and fuses more spatio-temporal information. We adopt dual discriminators to provide spatially coherent feedback to the generator, making the generator focus more on fine-grained features. The effectiveness of the proposed method is justified through extensive experiments.
• We design a motion-aware relation module to accurately align neighboring frames with the reference ones. In this module, a patch-wise matching strategy first builds cross-scale correspondences between similar patches (see the sketch after these highlights). Graph attention layers then adaptively aggregate the local patch features to further decrease the alignment error. We develop a context-aware relation module to make full use of high-order dependencies among all warped feature patches. We introduce multi-scale graph convolution layers to mine contextual interaction relations for better fusion of the spatio-temporal features, in which position information is encoded to aid detail recovery. Finally, each pixel is enhanced via global self-attention.
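The patch-wise matching step described in the highlights can be pictured with a minimal PyTorch sketch: for each reference-frame patch, the k most similar neighbor-frame patches are selected by cosine similarity and fused with softmax attention weights. The function name `patch_match_topk` and the parameters `patch_size` and `k` are illustrative assumptions, not the authors' implementation, which additionally operates across pyramid scales with graph attention layers.

```python
import torch
import torch.nn.functional as F

def patch_match_topk(ref_feat, nbr_feat, patch_size=3, k=4):
    """Illustrative sketch (hypothetical names): align a neighboring
    frame's features to the reference frame by patch-wise matching.
    ref_feat, nbr_feat: (B, C, H, W); returns (B, C*patch_size**2, H*W).
    """
    B, C, H, W = ref_feat.shape
    pad = patch_size // 2
    # Unfold both feature maps into overlapping patch vectors: (B, D, N)
    ref_p = F.unfold(ref_feat, patch_size, padding=pad)
    nbr_p = F.unfold(nbr_feat, patch_size, padding=pad)
    D, N = ref_p.shape[1], ref_p.shape[2]
    # Cosine similarity between every (reference, neighbor) patch pair
    sim = torch.bmm(F.normalize(ref_p, dim=1).transpose(1, 2),
                    F.normalize(nbr_p, dim=1))            # (B, N, N)
    # Keep the k most similar neighbor patches per reference position
    topv, topi = sim.topk(k, dim=2)                       # (B, N, k)
    w = F.softmax(topv, dim=2)                            # aggregation weights
    # Gather the k selected neighbor patches for every reference position
    gathered = torch.gather(
        nbr_p.unsqueeze(2).expand(B, D, N, N), 3,
        topi.unsqueeze(1).expand(B, D, N, k))             # (B, D, N, k)
    # Attention-weighted fusion of the matched patches
    return (gathered * w.unsqueeze(1)).sum(dim=3)         # (B, D, N)
```

The softmax over top-k similarities plays the role of the adaptive aggregation weights mentioned above; the full method replaces this fixed weighting with learned graph attention.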
Video super-resolution aims to reconstruct a sequence of high-resolution frames with temporally consistent contents from the corresponding low-resolution sequence. The key challenge for this task is how to effectively utilize both inter-frame temporal relations and intra-frame spatial relations. Existing methods for super-resolving videos commonly estimate optical flow to align the features of multiple frames based on temporal correlations. However, motion estimation is often error-prone and hence largely hinders the recovery of plausible details. Moreover, high-order contextual dependencies in the feature space are rarely exploited to further enhance spatio-temporal information fusion. To this end, we propose a novel generative adversarial network to super-resolve low-resolution videos, which makes full use of patch embeddings and is effective in exploring high-order spatio-temporal relations of the feature patches. Specifically, a motion-aware relation module is designed to handle the alignment between neighboring frames and the reference frame. Using a patch-matching strategy that adaptively selects the most similar patches, a cross-scale graph is constructed over a feature pyramid to reliably aggregate these patches. Based on the multi-scale graph structure, a context-aware relation module is developed to capture high-order dependencies among the resulting warped patches, better leveraging long-range complementary contexts. To further enhance reconstruction ability, the temporal position information of the video sequence is also encoded into this module. Dual discriminators with cycle-consistency constraints are adopted to provide more informative feedback to the generator while maintaining global coherence. Extensive experiments have demonstrated the effectiveness of the proposed method in terms of quantitative and qualitative evaluation metrics. [ABSTRACT FROM AUTHOR]
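As a companion sketch, the temporal fusion the abstract describes (temporal position encoding plus per-pixel attention across aligned frames) might look as follows. The class name `TemporalFusion`, the sinusoidal encoding, and the use of `nn.MultiheadAttention` are assumptions for illustration; the paper's context-aware relation module additionally employs multi-scale graph convolutions.

```python
import math
import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Illustrative sketch (hypothetical design): warped per-frame
    features receive a sinusoidal temporal position encoding and are
    fused by self-attention over the temporal axis at each pixel.
    Assumes `channels` is even and divisible by `num_heads`."""
    def __init__(self, channels, num_heads=4, max_frames=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Precompute sinusoidal encodings for frame indices 0..max_frames-1
        pos = torch.arange(max_frames).unsqueeze(1)
        div = torch.exp(torch.arange(0, channels, 2)
                        * (-math.log(10000.0) / channels))
        pe = torch.zeros(max_frames, channels)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, warped):
        # warped: (B, T, C, H, W) features already aligned to the reference
        B, T, C, H, W = warped.shape
        # Treat each pixel as a length-T sequence of C-dim tokens
        x = warped.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        x = x + self.pe[:T]                   # encode temporal position
        fused, _ = self.attn(x, x, x)         # attend across the T frames
        # Keep the fused representation at the reference (center) frame
        ref = fused[:, T // 2].reshape(B, H, W, C).permute(0, 3, 1, 2)
        return ref
```

Attending over the temporal axis per pixel lets each reference-frame position draw complementary context from every aligned frame, which is the long-range fusion role the abstract assigns to the context-aware relation module.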

Details

Language:
English
ISSN:
00313203
Volume:
146
Database:
Academic Search Index
Journal:
Pattern Recognition
Publication Type:
Academic Journal
Accession number:
173416099
Full Text:
https://doi.org/10.1016/j.patcog.2023.110059