Author: "Lai, Wei-Sheng" - Searchworks@Jio Institute Digital Library Search Results

153 results on '"Lai, Wei-Sheng"'

1. High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion

Author: Hur, Junhwa, Herrmann, Charles, Saxena, Saurabh, Kontkanen, Janne, Lai, Wei-Sheng, Shih, Yichang, Rubinstein, Michael, Fleet, David J., and Sun, Deqing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the recent progress, existing frame interpolation methods still struggle with processing extremely high-resolution input and handling challenging cases such as repetitive textures, thin objects, and large motion. To address these issues, we introduce a patch-based cascaded pixel diffusion model for frame interpolation, HiFI, that excels in these scenarios while achieving competitive performance on standard benchmarks. Cascades, which generate a series of images from low to high resolution, can help significantly with large or complex motion that requires both global context for a coarse solution and detailed context for high-resolution output. However, contrary to prior work on cascaded diffusion models, which perform diffusion at increasingly large resolutions, we use a single model that always performs diffusion at the same resolution and upsamples by processing patches of the inputs and the prior solution. We show that this technique drastically reduces memory usage at inference time and also allows us to use a single model at test time for both frame interpolation and spatial upsampling, which saves training cost. We show that HiFI helps significantly with high resolution and complex repeated textures that require global context. HiFI demonstrates performance comparable to or beyond the state of the art on multiple benchmarks (Vimeo, Xiph, X-Test, SEPE-8K). On our newly introduced dataset, which focuses on particularly challenging cases, HiFI also significantly outperforms other baselines. Please visit our project page for video results: https://hifi-diffusion.github.io, Comment: Project page: https://hifi-diffusion.github.io/
Published: 2024

2. Efficient Hybrid Zoom using Camera Fusion on Mobile Phones

Author: Wu, Xiaotong, Lai, Wei-Sheng, Shih, YiChang, Herrmann, Charles, Krainin, Michael, Sun, Deqing, and Liang, Chia-Kai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: DSLR cameras can achieve multiple zoom levels by shifting lens distances or swapping lens types. However, these techniques are not possible on smartphones due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground truths for supervised training. Our method generates a 12-megapixel image in 500 ms on a mobile platform and compares favorably against state-of-the-art methods under extensive evaluation on real-world scenarios., Comment: Accepted to SIGGRAPH Asia 2023 (ACM TOG). Project website: https://www.wslai.net/publications/fusion_zoom
Published: 2024

3. Face Deblurring using Dual Camera Fusion on Mobile Phones

Author: Lai, Wei-Sheng, Shih, YiChang, Chu, Lun-Cheng, Wu, Xiaotong, Tsai, Sung-Fang, Krainin, Michael, Sun, Deqing, and Liang, Chia-Kai
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Motion blur of fast-moving subjects is a longstanding problem in photography and is very common on mobile phones due to limited light collection efficiency, particularly in low-light conditions. While we have witnessed great progress in image deblurring in recent years, most methods require significant computational power and have limitations in processing high-resolution photos with severe local motions. To this end, we develop a novel face deblurring system based on the dual camera fusion technique for mobile phones. The system detects subject motion to dynamically enable a reference camera, e.g., the ultrawide-angle camera commonly available on recent premium phones, and captures an auxiliary photo with faster shutter settings. While the main shot is low-noise but blurry, the reference shot is sharp but noisy. We learn ML models to align and fuse these two shots and output a clear photo without motion blur. Our algorithm runs efficiently on Google Pixel 6, adding 463 ms of overhead per shot. Our experiments demonstrate the advantage and robustness of our system against alternative single-image, multi-frame, face-specific, and video deblurring algorithms as well as commercial products. To the best of our knowledge, our work is the first mobile solution for face motion deblurring that works reliably and robustly over thousands of images in diverse motion and lighting conditions., Comment: Accepted to SIGGRAPH 2022 (ACM TOG). Project website: https://www.wslai.net/publications/fusion_deblur/
Published: 2022

4. Vision Transformer for NeRF-Based View Synthesis from a Single Input Image

Author: Lin, Kai-En, Lin, Yen-Chen, Lai, Wei-Sheng, Lin, Tsung-Yi, Shih, Yi-Chang, and Ramamoorthi, Ravi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Although neural radiance fields (NeRF) have shown impressive advances for novel view synthesis, most methods require multiple input images of the same scene with accurate camera poses. In this work, we seek to substantially reduce the inputs to a single unposed image. Existing approaches condition on local image features to reconstruct a 3D object, but often render blurry predictions at viewpoints that are far away from the source view. To address this issue, we propose to leverage both global and local features to form an expressive 3D representation. The global features are learned from a vision transformer, while the local features are extracted from a 2D convolutional network. To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering. This novel 3D representation allows the network to reconstruct unseen regions without enforcing constraints like symmetry or canonical coordinate systems. Our method can render novel views from only a single input image and generalize across multiple object categories using a single model. Quantitative and qualitative evaluations demonstrate that the proposed method achieves state-of-the-art performance and renders richer details than existing approaches., Comment: WACV 2023. Project website: https://cseweb.ucsd.edu/~viscomp/projects/VisionNeRF/
Published: 2022

5. Deep Image Deblurring: A Survey

Author: Zhang, Kaihao, Ren, Wenqi, Luo, Wenhan, Lai, Wei-Sheng, Stenger, Bjorn, Yang, Ming-Hsuan, and Li, Hongdong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image. Advances in deep learning have led to significant progress in solving this problem, and a large number of deblurring networks have been proposed. This paper presents a comprehensive and timely survey of recently published deep-learning based image deblurring approaches, aiming to serve the community as a useful literature review. We start by discussing common causes of image blur, introduce benchmark datasets and performance metrics, and summarize different problem formulations. Next, we present a taxonomy of methods using convolutional neural networks (CNN) based on architecture, loss function, and application, offering a detailed review and comparison. In addition, we discuss some domain-specific deblurring applications including face images, text, and stereo image pairs. We conclude by discussing key challenges and future research directions., Comment: To appear in International Journal of Computer Vision (IJCV)
Published: 2022

6. Correcting Face Distortion in Wide-Angle Videos

Author: Lai, Wei-Sheng, Shih, YiChang, Liang, Chia-Kai, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Video blogs and selfies are popular social media formats, which are often captured by wide-angle cameras to show human subjects along with an expanded background. Unfortunately, due to perspective projection, faces near corners and edges exhibit apparent distortions that stretch and squish the facial features, resulting in poor video quality. In this work, we present a video warping algorithm to correct these distortions. Our key idea is to apply stereographic projection locally on the facial regions. We formulate a mesh warp problem using spatial-temporal energy minimization and minimize background deformation using a line-preservation term to maintain the straight edges in the background. To address temporal coherency, we constrain the temporal smoothness on the warping meshes and facial trajectories through the latent variables. For performance evaluation, we develop a wide-angle video dataset with a wide range of focal lengths. The user study shows that 83.9% of users prefer our algorithm over other alternatives based on perspective projection., Comment: Project website: https://www.wslai.net/publications/video_face_correction/
Published: 2021

7. Toward Real-World Super-Resolution via Adaptive Downsampling Models

Author: Son, Sanghyun, Kim, Jaeha, Lai, Wei-Sheng, Yang, Ming-Hsuan, and Lee, Kyoung Mu
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Most image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs that are constructed by a predetermined operation, e.g., bicubic downsampling. As existing methods typically learn an inverse mapping of that specific function, they produce blurry results when applied to real-world images whose exact formulation is different and unknown. Therefore, several methods attempt to synthesize much more diverse LR samples or learn a realistic downsampling model. However, due to restrictive assumptions on the downsampling process, they are still biased and less generalizable. This study proposes a novel method to simulate an unknown downsampling process without imposing restrictive prior knowledge. We propose a generalizable low-frequency loss (LFL) in the adversarial training framework to imitate the distribution of target LR images without using any paired examples. Furthermore, we design an adaptive data loss (ADL) for the downsampler, which can be adaptively learned and updated from the data during the training loops. Extensive experiments validate that our downsampling model can help existing SR methods perform more accurate reconstructions on various synthetic and real-world examples than the conventional approaches., Comment: Accepted at TPAMI
Published: 2021

8. Stylizing 3D Scene via Implicit Representation and HyperNetwork

Author: Chiang, Pei-Ze, Tsai, Meng-Shiun, Tseng, Hung-Yu, Lai, Wei-Sheng, and Chiu, Wei-Chen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we aim to address the 3D scene stylization problem: generating stylized images of the scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, which often leads to blurry results or inconsistent appearance. Inspired by the high-quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style. Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance fields model, and a hypernetwork to transfer the style information into the scene representation. In particular, our implicit representation model disentangles the scene into geometry and appearance branches, and the hypernetwork learns to predict the parameters of the appearance branch from the reference style image. To alleviate the training difficulties and memory burden, we propose a two-stage training procedure and a patch sub-sampling approach to optimize the style and content losses with the neural radiance fields model. After optimization, our model is able to render consistent novel views at arbitrary view angles with arbitrary style. Both quantitative evaluation and a human subject study demonstrate that the proposed method generates faithful stylization results with consistent appearance across different views., Comment: Accepted to WACV 2022; Project page: https://ztex08010518.github.io/3dstyletransfer/
Published: 2021

9. Hybrid Neural Fusion for Full-frame Video Stabilization

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in a smaller field of view. In this work, we present a frame synthesis algorithm to achieve full-frame video stabilization. We first estimate dense warp fields from neighboring frames and then synthesize the stabilized frame by fusing the warped contents. Our core technical novelty lies in the learning-based hybrid-space fusion that alleviates artifacts caused by optical flow inaccuracy and fast-moving objects. We validate the effectiveness of our method on the NUS, selfie, and DeepStab video datasets. Extensive experimental results demonstrate the merits of our approach over prior video stabilization methods., Comment: ICCV 2021. Project page: https://alex04072000.github.io/FuSta/ Code: https://github.com/alex04072000/FuSta
Published: 2021

10. Deep Online Fused Video Stabilization

Author: Shi, Zhenmei, Shi, Fuhao, Lai, Wei-Sheng, Liang, Chia-Kai, and Liang, Yingyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning. The network fuses optical flow with real/virtual camera pose histories into a joint motion representation. Next, an LSTM block infers the new virtual camera pose, and this virtual pose is used to generate a warping grid that stabilizes the frame. A novel relative motion representation and a multi-stage training process are presented to optimize our model without any supervision. To the best of our knowledge, this is the first DNN solution that adopts both sensor data and images for stabilization. We validate the proposed framework through ablation studies and demonstrate that the proposed method outperforms state-of-the-art alternative solutions via quantitative evaluations and a user study., Comment: 9 pages. Project page: https://zhmeishi.github.io/dvs/
Published: 2021

11. Portrait Neural Radiance Fields from a Single Image

Author: Gao, Chen, Shih, Yichang, Lai, Wei-Sheng, Liang, Chia-Kai, and Huang, Jia-Bin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a method for estimating Neural Radiance Fields (NeRF) from a single headshot portrait. While NeRF has demonstrated high-quality view synthesis, it requires multiple images of static scenes and is thus impractical for casual captures and moving subjects. In this work, we propose to pretrain the weights of a multilayer perceptron (MLP), which implicitly models the volumetric density and colors, with a meta-learning framework using a light stage portrait dataset. To improve the generalization to unseen faces, we train the MLP in the canonical coordinate space approximated by 3D face morphable models. We quantitatively evaluate the method using controlled captures and demonstrate the generalization to real portrait images, showing favorable results against state-of-the-art methods., Comment: Project webpage: https://portrait-nerf.github.io/
Published: 2020

12. Real-time Localized Photorealistic Video Style Transfer

Author: Xia, Xide, Xue, Tianfan, Lai, Wei-Sheng, Sun, Zheng, Chang, Abby, Kulis, Brian, and Chen, Jiawen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel algorithm for transferring artistic styles of semantically meaningful local regions of an image onto local regions of a target video while preserving its photorealism. Local regions may be selected either fully automatically from an image, using video segmentation algorithms, or from casual user guidance such as scribbles. Our method, based on a deep neural network architecture inspired by recent work in photorealistic style transfer, is real-time and works on arbitrary inputs without runtime optimization once trained on a diverse dataset of artistic styles. By augmenting our video dataset with noisy semantic labels and jointly optimizing over style, content, mask, and temporal losses, our method can cope with a variety of imperfections in the input and produce temporally coherent videos without visual artifacts. We demonstrate our method on a variety of style images and target videos, including the ability to transfer different styles onto multiple objects simultaneously and to smoothly transition between styles over time., Comment: 16 pages, 15 figures
Published: 2020

13. Learning to See Through Obstructions with Layered Decomposition

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera. Our method leverages motion differences between the background and obstructing elements to recover both layers. Specifically, we alternate between estimating dense optical flow fields of the two layers and reconstructing each layer from the flow-warped images via a deep convolutional neural network. This learning-based layer reconstruction module accommodates potential errors in the flow estimation and brittle assumptions, such as brightness consistency. We show that the proposed approach, trained on synthetically generated data, generalizes well to real images. Experimental results on numerous challenging scenarios of reflection and fence removal demonstrate the effectiveness of the proposed method., Comment: Project page: https://alex04072000.github.io/SOLD/ Code: https://github.com/alex04072000/SOLD Extension of the CVPR 2020 paper: arXiv:2004.01180
Published: 2020

14. Learning to See Through Obstructions

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera. Our method leverages the motion differences between the background and the obstructing elements to recover both layers. Specifically, we alternate between estimating dense optical flow fields of the two layers and reconstructing each layer from the flow-warped images via a deep convolutional neural network. The learning-based layer reconstruction allows us to accommodate potential errors in the flow estimation and brittle assumptions such as brightness consistency. We show that training on synthetically generated data transfers well to real images. Our results on numerous challenging scenarios of reflection and fence removal demonstrate the effectiveness of the proposed method., Comment: CVPR 2020. Project page: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/ObstructionRemoval Code: https://github.com/alex04072000/ObstructionRemoval
Published: 2020

15. Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Chen, Yu-Sheng, Kao, Yi-Lung, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recovering a high dynamic range (HDR) image from a single low dynamic range (LDR) input image is challenging due to missing details in under-/over-exposed regions caused by quantization and saturation of camera sensors. In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model. We model the HDR-to-LDR image formation pipeline as (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization. We then propose to learn three specialized CNNs to reverse these steps. By decomposing the problem into specific sub-tasks, we impose effective physical constraints to facilitate the training of individual sub-networks. Finally, we jointly fine-tune the entire model end-to-end to reduce error accumulation. With extensive quantitative and qualitative experiments on diverse image datasets, we demonstrate that the proposed method performs favorably against state-of-the-art single-image HDR reconstruction algorithms., Comment: CVPR 2020. Project page: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR Code: https://github.com/alex04072000/SingleHDR
Published: 2020

16. Gated Fusion Network for Degraded Image Super Resolution

Author: Zhang, Xinyi, Dong, Hang, Hu, Zhe, Lai, Wei-Sheng, Wang, Fei, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Single-image super-resolution aims to enhance image quality with respect to spatial content and is a fundamental task in computer vision. In this work, we address the task of single-frame super-resolution in the presence of image degradation, e.g., blur, haze, or rain streaks. Due to the limitations of frame capturing and formation processes, image degradation is inevitable, and the artifacts would be exacerbated by super-resolution methods. To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately. The base features contain local and global information of the input image. On the other hand, the recovered features focus on the degraded regions and are used to remove the degradation. These features are then fused through a recursive gate module to obtain sharp features for super-resolution. By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process by avoiding learning the mixed degradation all-in-one and thus enhance the final high-resolution prediction results. We evaluate the proposed method in three degradation scenarios. Experiments on these scenarios demonstrate that the proposed method performs more efficiently and favorably against the state-of-the-art approaches on benchmark datasets., Comment: Accepted by IJCV. The code will be publicly available at https://github.com/BookerDeWitt/GFN-IJCV
Published: 2020

17. Exploiting Semantics for Face Image Deblurring

Author: Shen, Ziyi, Lai, Wei-Sheng, Xu, Tingfa, Kautz, Jan, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks. As human faces are highly structured and share unified facial components (e.g., eyes and mouths), such semantic information provides a strong prior for restoration. We incorporate face semantic labels as input priors and propose an adaptive structural loss to regularize facial local structures within an end-to-end deep convolutional neural network. Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image. We then adopt a parsing network to extract the semantic features from the coarse deblurred image. Finally, the fine deblurring network utilizes the semantic information to restore a clear face image. We train the network with perceptual and adversarial losses to generate photo-realistic results. The proposed method restores sharp images with more accurate facial features and details. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm performs favorably against the state-of-the-art methods in terms of restoration quality, face recognition, and execution speed., Comment: Submitted to International Journal of Computer Vision (IJCV). arXiv admin note: text overlap with arXiv:1803.03345
Published: 2020

18. Visual Question Answering on 360° Images

Author: Chou, Shih-Han, Chao, Wei-Lun, Lai, Wei-Sheng, Sun, Min, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we introduce VQA 360, a novel task of visual question answering on 360 images. Unlike a normal field-of-view image, a 360 image captures the entire visual content around the optical center of a camera, demanding more sophisticated spatial understanding and reasoning. To address this problem, we collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types. We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions. We demonstrate that the cubemap-based model with multi-level fusion and attention diffusion performs favorably against other variants and the equirectangular-based models. Nevertheless, the gap between the humans' and machines' performance reveals the need for more advanced VQA 360 algorithms. We, therefore, expect our dataset and studies to serve as the benchmark for future development in this challenging task. Dataset, code, and pre-trained models are available online., Comment: Accepted to WACV 2020
Published: 2020

19. Video Stitching for Linear Camera Arrays

Author: Lai, Wei-Sheng, Gallo, Orazio, Gu, Jinwei, Sun, Deqing, Yang, Ming-Hsuan, and Kautz, Jan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts. In this work, we propose a wide-baseline video stitching algorithm for linear camera arrays that is temporally stable and tolerant to strong parallax. Our key insight is that stitching can be cast as a problem of learning a smooth spatial interpolation between the input videos. To solve this problem, inspired by pushbroom cameras, we introduce a fast pushbroom interpolation layer and propose a novel pushbroom stitching network, which learns a dense flow field to smoothly align the multiple input videos for spatial interpolation. Our approach outperforms the state of the art by a significant margin, as we show with a user study, and has immediate applications in many areas such as virtual reality, immersive telepresence, autonomous driving, and video surveillance., Comment: This work was accepted to BMVC 2019. Project website: http://vllab.ucmerced.edu/wlai24/video_stitching/
Published: 2019

20. Depth-Aware Video Frame Interpolation

Author: Bao, Wenbo, Lai, Wei-Sheng, Ma, Chao, Zhang, Xiaoyun, Gao, Zhiyong, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video frame interpolation aims to synthesize nonexistent frames in between the original frames. While significant advances have been made by recent deep convolutional neural networks, the quality of interpolation is often reduced due to large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferably sample closer objects than farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets., Comment: This work was accepted to CVPR 2019. The source code and pre-trained model are available at https://github.com/baowenbo/DAIN
Published: 2019

21. MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement

Author: Bao, Wenbo, Lai, Wei-Sheng, Zhang, Xiaoyun, Gao, Zhiyong, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Motion estimation (ME) and motion compensation (MC) have been widely used in classical video frame interpolation systems over the past decades. Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed. However, existing learning-based methods typically estimate either flow or compensation kernels, thereby limiting both computational efficiency and interpolation accuracy. In this work, we propose a motion estimation and compensation driven neural network for video frame interpolation. A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable such that both the flow and kernel estimation networks can be optimized jointly. The proposed model benefits from the advantages of motion estimation and compensation methods without using hand-crafted features. Compared to existing methods, our approach is computationally efficient and able to generate more visually appealing results. Furthermore, the proposed MEMC-Net can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against the state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets., Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
Published: 2018

22. Learning Blind Video Temporal Consistency

Author: Lai, Wei-Sheng, Huang, Jia-Bin, Wang, Oliver, Shechtman, Eli, Yumer, Ersin, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Applying image processing algorithms independently to each frame of a video often leads to undesired inconsistent results over time. Developing temporally consistent video-based extensions, however, requires domain knowledge for individual tasks and does not generalize to other applications. In this paper, we present an efficient end-to-end approach based on a deep recurrent network for enforcing temporal consistency in a video. Our method takes the original unprocessed and per-frame processed videos as inputs to produce a temporally consistent video. Consequently, our approach is agnostic to the specific image processing algorithm applied to the original video. We train the proposed network by minimizing both short-term and long-term temporal losses as well as the perceptual loss to strike a balance between temporal stability and perceptual similarity with the processed frames. At test time, our model does not require computing optical flow and thus achieves real-time speed even for high-resolution videos. We show that our single model can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition. Extensive objective evaluations and a subjective user study demonstrate that the proposed approach performs favorably against state-of-the-art methods on various types of videos., Comment: This work is accepted in ECCV 2018. Project website: http://vllab.ucmerced.edu/wlai24/video_consistency/
Published: 2018

23. Gated Fusion Network for Joint Image Deblurring and Super-Resolution

Author: Zhang, Xinyi, Dong, Hang, Hu, Zhe, Lai, Wei-Sheng, Wang, Fei, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Single-image super-resolution is a fundamental vision task that enhances image quality with respect to spatial resolution. If the input image contains degraded pixels, super-resolution methods can amplify the artifacts caused by the degradation. Image blur is a common source of degradation: images captured by moving or still cameras are inevitably affected by motion blur due to relative movement between sensors and objects. In this work, we focus on the super-resolution task in the presence of motion blur. We propose a deep gated fusion convolutional neural network to generate a clear high-resolution frame from a single natural image with severe blur. By decomposing the feature extraction step into two task-independent streams, the dual-branch design facilitates training by avoiding learning the mixed degradation all at once, and thus improves the final high-resolution prediction. Extensive experiments demonstrate that our method generates sharper super-resolved images from low-resolution inputs with high computational efficiency., Comment: Accepted as an oral presentation at BMVC 2018
Published: 2018

24. Deep Semantic Face Deblurring

Author: Shen, Ziyi, Lai, Wei-Sheng, Xu, Tingfa, Kautz, Jan, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we present an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks (CNNs). As face images are highly structured and share several key semantic components (e.g., eyes and mouths), the semantic information of a face provides a strong prior for restoration. As such, we propose to incorporate global semantic priors as input and impose local structure losses to regularize the output within a multi-scale deep CNN. We train the network with perceptual and adversarial losses to generate photo-realistic results and develop an incremental training strategy to handle random blur kernels in the wild. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm restores sharp images with more facial details and performs favorably against state-of-the-art methods in terms of restoration quality, face recognition and execution speed., Comment: This work is accepted in CVPR 2018. The project website is on https://sites.google.com/site/ziyishenmi/cvpr18_face_deblur
Published: 2018

25. Learning a Discriminative Prior for Blind Image Deblurring

Author: Li, Lerenhan, Pan, Jinshan, Lai, Wei-Sheng, Gao, Changxin, Sang, Nong, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present an effective blind image deblurring method based on a data-driven discriminative prior. Our work is motivated by the fact that a good image prior should favor clear images over blurred images. In this work, we formulate the image prior as a binary classifier implemented by a deep convolutional neural network (CNN). The learned prior is able to distinguish whether an input image is clear or not. Embedded into the maximum a posteriori (MAP) framework, it helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, it is difficult to optimize the deblurring method with the learned image prior, as it involves a non-linear CNN. Therefore, we develop an efficient numerical approach based on the half-quadratic splitting method and the gradient descent algorithm to solve the proposed model. Furthermore, the proposed model can be easily extended to non-uniform deblurring. Both qualitative and quantitative experimental results show that our method performs favorably against state-of-the-art algorithms as well as domain-specific image deblurring approaches., Comment: This paper is accepted by CVPR 2018 as a poster
Published: 2018

26. Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks

Author: Lai, Wei-Sheng, Huang, Jia-Bin, Ahuja, Narendra, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. However, existing methods often require a large number of network parameters and entail heavy computational loads at runtime to generate high-accuracy super-resolution results. In this paper, we propose the deep Laplacian Pyramid Super-Resolution Network for fast and accurate image super-resolution. The proposed network progressively reconstructs the sub-band residuals of high-resolution images at multiple pyramid levels. In contrast to existing methods that use bicubic interpolation for pre-processing (which results in large feature maps), the proposed method directly extracts features from the low-resolution input space and thereby entails low computational loads. We train the proposed network with deep supervision using robust Charbonnier loss functions and achieve high-quality image reconstruction. Furthermore, we use recursive layers to share parameters across as well as within pyramid levels, and thus drastically reduce the number of parameters. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against state-of-the-art methods in terms of run-time and image quality., Comment: The code and datasets are available at http://vllab.ucmerced.edu/wlai24/LapSRN/
Published: 2017

27. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

Author: Lai, Wei-Sheng, Huang, Jia-Bin, Ahuja, Narendra, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require bicubic interpolation as a pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against state-of-the-art methods in terms of speed and accuracy., Comment: This work is accepted in CVPR 2017. The code and datasets are available on http://vllab.ucmerced.edu/wlai24/LapSRN/
Published: 2017

28. Semantic-driven Generation of Hyperlapse from $360^\circ$ Video

Author: Lai, Wei-Sheng, Huang, Yujia, Joshi, Neel, Buehler, Chris, Yang, Ming-Hsuan, and Kang, Sing Bing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a system for converting a fully panoramic ($360^\circ$) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience. Our system exploits visual saliency and semantics to non-uniformly sample in space and time when generating hyperlapses. In addition, users can optionally choose objects of interest to customize the hyperlapses. We first stabilize an input $360^\circ$ video by smoothing the rotation between adjacent frames and then compute regions of interest and saliency scores. An initial hyperlapse is generated by optimizing saliency and motion smoothness, followed by saliency-aware frame selection. We further smooth the result using an efficient 2D video stabilization approach that adaptively selects the motion model to generate the final hyperlapse. We validate the design of our system by showing results for a variety of scenes and comparing against the state-of-the-art method through a user study., Comment: This work is accepted in Transactions on Visualization and Computer Graphics (TVCG)
Published: 2017

29. Learning Fully Convolutional Networks for Iterative Non-blind Deconvolution

Author: Zhang, Jiawei, Pan, Jinshan, Lai, Wei-Sheng, Lau, Rynson, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a fully convolutional network for iterative non-blind deconvolution. We decompose the non-blind deconvolution problem into image denoising and image deconvolution. We train an FCNN to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step. In contrast to existing deep neural network based methods, we iteratively deconvolve the blurred images in a multi-stage framework. The proposed method is able to learn an adaptive image prior, which keeps both local (details) and global (structures) information. Both quantitative and qualitative evaluations on benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms in terms of quality and speed.
Published: 2016

30. Efficient Hybrid Zoom Using Camera Fusion on Mobile Phones

Author: Wu, Xiaotong, Lai, Wei-Sheng, Shih, Yichang, Herrmann, Charles, Krainin, Michael, Sun, Deqing, and Liang, Chia-Kai
Published: 2023
Full Text: View/download PDF

31. Blind Image Deblurring via Deep Discriminative Priors

Author: Li, Lerenhan, Pan, Jinshan, Lai, Wei-Sheng, Gao, Changxin, Sang, Nong, and Yang, Ming-Hsuan
Published: 2019
Full Text: View/download PDF

32. Learning Spatial and Temporal Visual Enhancement

Author: Lai, Wei-Sheng
Subjects: Computer science, deep networks, super-resolution, temporal consistency, video stitching, visual enhancement
Abstract: Visual enhancement is concerned with improving the visual quality and viewing experience of images and videos. Researchers have been actively working in this area due to its theoretical and practical interest. However, obtaining high visual quality often comes at the cost of computational efficiency. With the growth of mobile applications and cloud services, it is crucial to develop effective and efficient algorithms for generating visually attractive images and videos. In this thesis, we address visual enhancement problems in three aspects: the spatial, temporal, and joint spatial-temporal domains. We propose efficient algorithms based on deep convolutional neural networks for solving various visual enhancement problems.

First, we address the problem of spatial enhancement for single-image super-resolution. We propose a deep Laplacian Pyramid Network to reconstruct a high-resolution image from a low-resolution input in a coarse-to-fine manner. Our model directly extracts features from input LR images and progressively reconstructs the sub-band residuals. We train the proposed model with multi-scale training, deep supervision, and robust loss functions to achieve state-of-the-art performance. Furthermore, we exploit the recursive learning technique to share parameters across and within pyramid levels to significantly reduce the number of model parameters. As most of the operations are performed in a low-resolution space, our model requires less memory and runs faster than state-of-the-art methods.

Second, we address the temporal enhancement problem by learning temporal consistency in videos. Given an input video and a per-frame processed video (processed by an existing image-based algorithm), we learn a recurrent network to reduce temporal flickering and generate a temporally consistent video. We train the proposed network by minimizing both short-term and long-term temporal losses as well as a perceptual loss to strike a balance between temporal coherence and perceptual similarity with the processed frames. At test time, our model does not require computing optical flow and thus runs at 400+ FPS on GPU for high-resolution videos. Our model is task-independent: a single model can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition.

Third, we address the spatial-temporal enhancement problem for video stitching. Inspired by pushbroom cameras, we cast stitching as a spatial interpolation problem. We propose a pushbroom stitching network to learn dense flow fields to smoothly align the input videos. The stitched videos can be generated from an efficient pushbroom interpolation layer. Our approach generates more temporally stable and visually pleasing results than existing video stitching approaches and commercial software. Furthermore, our algorithm has immediate applications in many areas such as virtual reality, immersive telepresence, autonomous driving, and video surveillance.
Published: 2019

33. Learning Blind Video Temporal Consistency

Author: Lai, Wei-Sheng, Huang, Jia-Bin, Wang, Oliver, Shechtman, Eli, Yumer, Ersin, and Yang, Ming-Hsuan
Published: 2018
Full Text: View/download PDF

34. Vision Transformer for NeRF-Based View Synthesis from a Single Input Image

Author: Lin, Kai-En, Lin, Yen-Chen, Lai, Wei-Sheng, Lin, Tsung-Yi, Shih, Yi-Chang, and Ramamoorthi, Ravi
Published: 2023
Full Text: View/download PDF

35. High Quality Image Deblurring Scheme Using the Pyramid Hyper-Laplacian L2 Norm Priors Algorithm

Author: Chen, Yu, Ding, Jian-Jiun, Lai, Wei-Sheng, Chen, Ying-Jou, Chang, Chir-Weei, Chang, Chuan-Chung, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Huet, Benoit, editor, Ngo, Chong-Wah, editor, Tang, Jinhui, editor, Zhou, Zhi-Hua, editor, Hauptmann, Alexander G., editor, and Yan, Shuicheng, editor
Published: 2013
Full Text: View/download PDF

36. Toward Real-World Super-Resolution via Adaptive Downsampling Models

Author: Son, Sanghyun, Kim, Jaeha, Lai, Wei-Sheng, Yang, Ming-Hsuan, and Lee, Kyoung Mu
Published: 2022
Full Text: View/download PDF

37. Learning to See Through Obstructions With Layered Decomposition

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Published: 2022
Full Text: View/download PDF

38. Face deblurring using dual camera fusion on mobile phones

Author: Lai, Wei-Sheng, Shih, Yichang, Chu, Lun-Cheng, Wu, Xiaotong, Tsai, Sung-Fang, Krainin, Michael, Sun, Deqing, and Liang, Chia-Kai
Published: 2022
Full Text: View/download PDF

39. Deep Image Deblurring: A Survey

Author: Zhang, Kaihao, Ren, Wenqi, Luo, Wenhan, Lai, Wei-Sheng, Stenger, Björn, Yang, Ming-Hsuan, and Li, Hongdong
Published: 2022
Full Text: View/download PDF

40. Dynamic Enhanced Inter-cell Interference Coordination Strategy with Quality of Service Guarantees for Heterogeneous Networks

Author: Lai, Wei-Sheng, Chang, Tsung-Hui, Yeh, Kuan-Hsuan, and Lee, Ta-Sung
Published: 2016
Full Text: View/download PDF

41. Deep Online Fused Video Stabilization

Author: Shi, Zhenmei, Shi, Fuhao, Lai, Wei-Sheng, Liang, Chia-Kai, and Liang, Yingyu
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Abstract: We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning. The network fuses optical flow with real/virtual camera pose histories into a joint motion representation. Next, the LSTM block infers the new virtual camera pose, and this virtual pose is used to generate a warping grid that stabilizes the frame. A novel relative motion representation as well as a multi-stage training process are presented to optimize our model without any supervision. To the best of our knowledge, this is the first DNN solution that adopts both sensor data and image content for stabilization. We validate the proposed framework through ablation studies and demonstrate that the proposed method outperforms state-of-the-art alternative solutions via quantitative evaluations and a user study., Comment: 9 pages. Project page: https://zhmeishi.github.io/dvs/
Published: 2022

42. Correcting Face Distortion in Wide-Angle Videos

Author: Lai, Wei-Sheng, Shih, Yichang, Liang, Chia-Kai, and Yang, Ming-Hsuan
Published: 2022
Full Text: View/download PDF

43. Stylizing 3D Scene via Implicit Representation and HyperNetwork

Author: Chiang, Pei-Ze, Tsai, Meng-Shiun, Tseng, Hung-Yu, Lai, Wei-Sheng, and Chiu, Wei-Chen
Published: 2022
Full Text: View/download PDF

44. Hybrid Neural Fusion for Full-frame Video Stabilization

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Published: 2021
Full Text: View/download PDF

45. MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement

Author: Bao, Wenbo, Lai, Wei-Sheng, Zhang, Xiaoyun, Gao, Zhiyong, and Yang, Ming-Hsuan
Published: 2021
Full Text: View/download PDF

46. Real-time Localized Photorealistic Video Style Transfer

Author: Xia, Xide, Xue, Tianfan, Lai, Wei-Sheng, Sun, Zheng, Chang, Abby, Kulis, Brian, and Chen, Jiawen
Published: 2021
Full Text: View/download PDF

47. Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Chen, Yu-Sheng, Kao, Yi-Lung, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Published: 2020
Full Text: View/download PDF

48. Learning to See Through Obstructions

Author: Liu, Yu-Lun, Lai, Wei-Sheng, Yang, Ming-Hsuan, Chuang, Yung-Yu, and Huang, Jia-Bin
Published: 2020
Full Text: View/download PDF

49. Exploiting Semantics for Face Image Deblurring

Author: Shen, Ziyi, Lai, Wei-Sheng, Xu, Tingfa, Kautz, Jan, and Yang, Ming-Hsuan
Published: 2020
Full Text: View/download PDF

50. Visual Question Answering on 360° Images

Author: Chou, Shih-Han, Chao, Wei-Lun, Lai, Wei-Sheng, Sun, Min, and Yang, Ming-Hsuan
Published: 2020
Full Text: View/download PDF
