Author: "Gharbi, Michaël" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Gharbi, Michaël"' showing total 45 results

1. Image Neural Field Diffusion Models

Author: Chen, Yinbo, Wang, Oliver, Zhang, Richard, Shechtman, Eli, Wang, Xiaolong, and Gharbi, Michael
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse problems without extra training. However, most diffusion models learn the distribution of fixed-resolution images. We propose to learn the distribution of continuous images by training diffusion models on image neural fields, which can be rendered at any resolution, and show its advantages over fixed-resolution models. To achieve this, a key challenge is to obtain a latent space that represents photorealistic image neural fields. We propose a simple and effective method, inspired by several recent techniques but with key changes to make the image neural fields photorealistic. Our method can be used to convert existing latent diffusion autoencoders into image neural field autoencoders. We show that image neural field diffusion models can be trained using mixed-resolution image datasets, outperform fixed-resolution diffusion models followed by super-resolution models, and can solve inverse problems with conditions applied at different scales efficiently., Comment: Project page: https://yinboc.github.io/infd/
Published: 2024

2. Improved Distribution Matching Distillation for Fast Image Synthesis

Author: Yin, Tianwei, Gharbi, Michaël, Park, Taesung, Zhang, Richard, Shechtman, Eli, Durand, Fredo, and Freeman, William T.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model, and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting, by simulating inference-time generator samples during training time. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500X reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods., Comment: Code, model, and dataset are available at https://tianweiy.github.io/dmd2
Published: 2024

3. Editable Image Elements for Controllable Synthesis

Author: Mu, Jiteng, Gharbi, Michaël, Zhang, Richard, Shechtman, Eli, Vasconcelos, Nuno, Wang, Xiaolong, and Park, Taesung
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we learn to encode an input into "image elements" that can faithfully reconstruct an input image. These elements can be intuitively edited by a user, and are decoded by a diffusion model into realistic images. We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition., Comment: Project page: https://jitengmu.github.io/Editable_Image_Elements/
Published: 2024

4. Lazy Diffusion Transformer for Interactive Image Editing

Author: Nitzan, Yotam, Wu, Zongze, Zhang, Richard, Shechtman, Eli, Cohen-Or, Daniel, Park, Taesung, and Gharbi, Michaël
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the current canvas and user mask to produce a compact global context tailored to the region to generate. Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a "lazy" fashion, i.e., it only generates the masked region. This contrasts with previous works that either regenerate the full canvas, wasting time and computation, or confine processing to a tight rectangular crop around the mask, ignoring the global image context altogether. Our decoder's runtime scales with the mask size, which is typically small, while our encoder introduces negligible overhead. We demonstrate that our approach is competitive with state-of-the-art inpainting methods in terms of quality and fidelity while providing a 10x speedup for typical user interactions, where the editing mask represents 10% of the image.
Published: 2024

5. Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

Author: Alzayer, Hadi, Xia, Zhihao, Zhang, Xuaner, Shechtman, Eli, Huang, Jia-Bin, and Gharbi, Michael
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. Our method transfers fine details from the original image and preserves the identity of its parts, yet adapts them to the lighting and context defined by the new layout. Our key insight is that videos are a powerful source of supervision for this task: objects and camera motions provide many observations of how the world changes with viewpoint, lighting, and physical interactions. We construct an image dataset in which each sample is a pair of source and target frames extracted from the same video at randomly chosen time intervals. We warp the source frame toward the target using two motion models that mimic the expected test-time user edits. We supervise our model to translate the warped image into the ground truth, starting from a pretrained diffusion model. Our model design explicitly enables fine detail transfer from the source frame to the generated image, while closely following the user-specified layout. We show that by using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input while addressing second-order effects like harmonizing the lighting and physical interactions between edited objects., Comment: Project page: https://magic-fixup.github.io/
Published: 2024

6. Learning Subject-Aware Cropping by Outpainting Professional Photos

Author: Hong, James, Yuan, Lu, Gharbi, Michaël, Fisher, Matthew, and Fatahalian, Kayvon
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: How to frame (or crop) a photo often depends on the image subject and its context; e.g., a human portrait. Recent works have defined the subject-aware image cropping task as a nuanced and practical version of image cropping. We propose a weakly-supervised approach (GenCrop) to learn what makes a high-quality, subject-aware crop from professional stock images. Unlike supervised prior work, GenCrop requires no new manual annotations beyond the existing stock image collection. The key challenge in learning from this data, however, is that the images are already cropped and we do not know what regions were removed. Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model. The stock image collection provides diversity and its images serve as pseudo-labels for a good crop, while the text-image diffusion model is used to out-paint (i.e., outward inpainting) realistic uncropped images. Using this procedure, we are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model. Despite being weakly-supervised, GenCrop is competitive with state-of-the-art supervised methods and significantly better than comparable weakly-supervised baselines on quantitative and qualitative evaluation metrics., Comment: AAAI 24. Extended version with supplemental materials
Published: 2023

7. VecFusion: Vector Font Generation with Diffusion

Author: Thamizharasan, Vikas, Liu, Difan, Agarwal, Shantanu, Fisher, Matthew, Gharbi, Michael, Wang, Oliver, Jacobson, Alec, and Kalogerakis, Evangelos
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: We present VecFusion, a new neural architecture that can generate vector fonts with varying topological structures and precise control point positions. Our approach is a cascaded diffusion model which consists of a raster diffusion model followed by a vector diffusion model. The raster model generates low-resolution, rasterized fonts with auxiliary control point information, capturing the global style and shape of the font, while the vector model synthesizes vector fonts conditioned on the low-resolution raster fonts from the first stage. To synthesize long and complex curves, our vector diffusion model uses a transformer architecture and a novel vector representation that enables the modeling of diverse vector geometry and the precise prediction of control points. Our experiments show that, in contrast to previous generative models for vector graphics, our new cascaded vector diffusion model generates higher quality vector fonts, with complex structures and diverse styles.
Published: 2023

8. One-step Diffusion with Distribution Matching Distillation

Author: Yin, Tianwei, Gharbi, Michaël, Zhang, Richard, Shechtman, Eli, Durand, Fredo, Freeman, William T., and Park, Taesung
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level, by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware., Comment: CVPR 2024, Project page: https://tianweiy.github.io/dmd/
Published: 2023

9. Editable Image Elements for Controllable Synthesis

Author: Mu, Jiteng, Gharbi, Michaël, Zhang, Richard, Shechtman, Eli, Vasconcelos, Nuno, Wang, Xiaolong, and Park, Taesung
Editor: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
Series: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

10. Materialistic: Selecting Similar Materials in Images

Author: Sharma, Prafull, Philip, Julien, Gharbi, Michaël, Freeman, William T., Durand, Fredo, and Deschaintre, Valentin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO features coupled with a proposed Cross-Similarity module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, which we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.
Published: 2023

11. Semi-supervised Parametric Real-world Image Harmonization

Author: Wang, Ke, Gharbi, Michaël, Zhang, He, Xia, Zhihao, and Shechtman, Eli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning-based image harmonization techniques are usually trained to undo synthetic random global transformations applied to a masked foreground in a single ground truth photo. This simulated data does not model many of the important appearance mismatches (illumination, object boundaries, etc.) between foreground and background in real composites, leading to models that do not generalize well and cannot model complex local changes. We propose a new semi-supervised training strategy that addresses this problem and lets us learn complex local appearance harmonization from unpaired real composites, where foreground and background come from different images. Our model is fully parametric. It uses RGB curves to correct the global colors and tone and a shading map to model local variations. Our method outperforms previous work on established benchmarks and real composites, as shown in a user study, and processes high-resolution images interactively., Comment: 19 pages, 16 figures, 5 tables
Published: 2023

12. Domain Expansion of Image Generators

Author: Nitzan, Yotam, Gharbi, Michaël, Zhang, Richard, Park, Taesung, Zhu, Jun-Yan, Cohen-Or, Daniel, and Shechtman, Eli
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent space. Is it possible to minimally perturb this hard-earned representation, while maximally representing the new domains? Interestingly, we find that the latent space offers unused, "dormant" directions, which do not affect the output. This provides an opportunity: By "repurposing" these directions, we can represent new domains without perturbing the original representation. In fact, we find that pretrained generators have the capacity to add several - even hundreds - of new domains! Using our expansion method, one "expanded" model can supersede numerous domain-specific models, without expanding the model size. Additionally, a single expanded generator natively supports smooth transitions between domains, as well as composition of domains. Code and project page available at https://yotamnitzan.github.io/domain-expansion/., Comment: Project Page and code are available at https://yotamnitzan.github.io/domain-expansion/. CVPR 2023 Camera-Ready
Published: 2023

13. Spotting Temporally Precise, Fine-Grained Events in Video

Author: Hong, James, Zhang, Haotian, Gharbi, Michaël, Fisher, Matthew, and Fatahalian, Kayvon
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted from the video action detection, segmentation, and spotting literature to the precise spotting task. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting., Comment: ECCV 2022; Website URL: https://jhong93.github.io/projects/spot.html
Published: 2022

14. Differentiable Rendering of Neural SDFs through Reparameterization

Author: Bangaru, Sai Praveen, Gharbi, Michaël, Li, Tzu-Mao, Luan, Fujun, Sunkavalli, Kalyan, Hašan, Miloš, Bi, Sai, Xu, Zexiang, Bernstein, Gilbert, and Durand, Frédo
Subjects: Computer Science - Graphics, Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a method to automatically compute correct gradients with respect to geometric scene parameters in neural SDF renderers. Recent physically-based differentiable rendering techniques for meshes have used edge-sampling to handle discontinuities, particularly at object silhouettes, but SDFs do not have a simple parametric form amenable to sampling. Instead, our approach builds on area-sampling techniques and develops a continuous warping function for SDFs to account for these discontinuities. Our method leverages the distance to surface encoded in an SDF and uses quadrature on sphere tracer points to compute this warping function. We further show that this can be done by subsampling the points to make the method tractable for neural SDFs. Our differentiable renderer can be used to optimize neural shapes from multi-view images and produces comparable 3D reconstructions to recent SDF-based inverse rendering methods, without the need for 2D segmentation masks to guide the geometry optimization and no volumetric approximations to the geometry.
Published: 2022

15. Any-resolution Training for High-resolution Image Synthesis

Author: Chai, Lucy, Gharbi, Michael, Shechtman, Eli, Isola, Phillip, and Zhang, Richard
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. To take advantage of varied-size data, we introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show that our method can take advantage of multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains, and birds, and demonstrate arbitrary scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/., Comment: ECCV 2022 camera ready version; project page https://chail.github.io/anyres-gan/
Published: 2022

16. Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

Author: Hong, James, Fisher, Matthew, Gharbi, Michaël, and Fatahalian, Kayvon
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose. End-to-end learning circumvents pose, but requires more labels to generalize. We introduce Video Pose Distillation (VPD), a weakly-supervised technique to learn features for new video domains, such as individual sports that challenge pose estimation. Under VPD, a student network learns to extract robust pose features from RGB frames in the sports video, such that, whenever pose is considered reliable, the features match the output of a pretrained teacher pose detector. Our strategy retains the best of both pose and end-to-end worlds, exploiting the rich visual patterns in raw video frames, while learning features that agree with the athletes' pose and motion in the target video domain to avoid over-fitting to patterns unrelated to athletes' motion. VPD features improve performance on few-shot, fine-grained action recognition, retrieval, and detection tasks in four real-world sports video datasets, without requiring additional ground-truth pose annotations., Comment: ICCV 2021 (poster)
Published: 2021

17. Free-viewpoint Indoor Neural Relighting from Multi-view Stereo

Author: Philip, Julien, Morgenthaler, Sébastien, Gharbi, Michaël, and Drettakis, George
Subjects: Computer Science - Graphics, Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a neural relighting algorithm for captured indoor scenes that allows interactive free-viewpoint navigation. Our method allows illumination to be changed synthetically, while coherently rendering cast shadows and complex glossy materials. We start with multiple images of the scene and a 3D mesh obtained by multi-view stereo (MVS) reconstruction. We assume that lighting is well-explained as the sum of a view-independent diffuse component and a view-dependent glossy term concentrated around the mirror reflection direction. We design a convolutional network around input feature maps that facilitate learning of an implicit representation of scene materials and illumination, enabling both relighting and free-viewpoint navigation. We generate these input maps by exploiting the best elements of both image-based and physically-based rendering. We sample the input views to estimate diffuse scene irradiance, and compute the new illumination caused by user-specified light sources using path tracing. To facilitate the network's understanding of materials and synthesize plausible glossy reflections, we reproject the views and compute mirror images. We train the network on a synthetic dataset where each scene is also reconstructed with MVS. We show results of our algorithm relighting real indoor scenes and performing free-viewpoint navigation with complex and realistic glossy reflections, which so far remained out of reach for view-synthesis techniques.
Published: 2021

18. MarioNette: Self-Supervised Sprite Learning

Author: Smirnov, Dmitriy, Gharbi, Michael, Fisher, Matthew, Guizilini, Vitor, Efros, Alexei A., and Solomon, Justin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Artists and video game designers often construct 2D animations using libraries of sprites -- textured patches of objects and characters. We propose a deep learning approach that decomposes sprite-based video animations into a disentangled representation of recurring graphic elements in a self-supervised manner. By jointly learning a dictionary of possibly transparent patches and training a network that places them onto a canvas, we deconstruct sprite-based content into a sparse, consistent, and explicit representation that can be easily used in downstream tasks, like editing or analysis. Our framework offers a promising approach for discovering recurring visual patterns in image collections without supervision., Comment: Accepted to NeurIPS 2021
Published: 2021

19. Modulated Periodic Activations for Generalizable Local Functional Representations

Author: Mehta, Ishit, Gharbi, Michaël, Barnes, Connelly, Shechtman, Eli, Ramamoorthi, Ravi, and Chandraker, Manmohan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Multi-Layer Perceptrons (MLPs) make powerful functional representations for sampling and reconstruction problems involving low-dimensional signals like images, shapes and light fields. Recent works have significantly improved their ability to represent high-frequency content by using periodic activations or positional encodings. This often came at the expense of generalization: modern methods are typically optimized for a single signal. We present a new representation that generalizes to multiple instances and achieves state-of-the-art fidelity. We use a dual-MLP architecture to encode the signals. A synthesis network creates a functional mapping from a low-dimensional input (e.g. pixel-position) to the output domain (e.g. RGB color). A modulation network maps a latent code corresponding to the target signal to parameters that modulate the periodic activations of the synthesis network. We also propose a local-functional representation which enables generalization. The signal's domain is partitioned into a regular grid, with each tile represented by a latent code. At test time, the signal is encoded with high-fidelity by inferring (or directly optimizing) the latent code-book. Our approach produces generalizable functional representations of images, videos and shapes, and achieves higher reconstruction quality than prior works that are optimized for a single signal., Comment: Project Page at https://ishit.github.io/modsine/
Published: 2021

20. Im2Vec: Synthesizing Vector Graphics without Vector Supervision

Author: Reddy, Pradyumna, Gharbi, Michael, Lukac, Michal, and Mitra, Niloy J.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Vector graphics are widely used to represent fonts, logos, digital artworks, and graphic designs. But, while a vast body of work has focused on generative algorithms for raster images, only a handful of options exist for vector graphics. One can always rasterize the input graphic and resort to image-based generative approaches, but this negates the advantages of the vector representation. The current alternative is to use specialized models that require explicit supervision on the vector graphics representation at training time. This is not ideal because large-scale high quality vector-graphics datasets are difficult to obtain. Furthermore, the vector representation for a given design is not unique, so models that supervise on the vector representation are unnecessarily constrained. Instead, we propose a new neural network that can generate complex vector graphics with varying topologies, and only requires indirect supervision from readily-available raster training images (i.e., with no vector counterparts). To enable this, we use a differentiable rasterization pipeline that renders the generated vector shapes and composites them together onto a raster canvas. We demonstrate our method on a range of datasets, and provide comparison with state-of-the-art SVG-VAE and DeepSVG, both of which require explicit vector graphics supervision. Finally, we also demonstrate our approach on the MNIST dataset, for which no ground-truth vector representation is available. Source code, datasets, and more results are available at geometry.cs.ucl.ac.uk/projects/2021/Im2Vec/
Published: 2021

21. Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments

Author: Xia, Zhihao, Gharbi, Michaël, Perazzi, Federico, Sunkavalli, Kalyan, and Chakrabarti, Ayan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a neural network-based method to denoise pairs of images taken in quick succession, with and without a flash, in low-light environments. Our goal is to produce a high-quality rendering of the scene that preserves the color and mood from the ambient illumination of the noisy no-flash image, while recovering surface texture and detail revealed by the flash. Our network outputs a gain map and a field of kernels, the latter obtained by linearly mixing elements of a per-image low-rank kernel basis. We first apply the kernel field to the no-flash image, and then multiply the result with the gain map to create the final output. We show our network effectively learns to produce high-quality images by combining a smoothed out estimate of the scene's ambient appearance from the no-flash image, with high-frequency albedo details extracted from the flash input. Our experiments show significant improvements over alternative captures without a flash, and baseline denoisers that use flash no-flash pairs. In particular, our method produces images that are both noise-free and contain accurate ambient colors without the sharp shadows or strong specular highlights visible in the flash image., Comment: CVPR 2021. Project page at https://www.cse.wustl.edu/~zhihao.xia/deepfnf/
Published: 2020

22. Spatially-Adaptive Pixelwise Networks for Fast Image Translation

Author: Shaham, Tamar Rott, Gharbi, Michael, Zhang, Richard, Shechtman, Eli, and Michaeli, Tomer
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying so they can represent a broader function class than simple 1x1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image with a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18x faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
Published: 2020

23. Basis Prediction Networks for Effective Burst Denoising with Large Kernels

Author: Xia, Zhihao, Perazzi, Federico, Gharbi, Michaël, Sunkavalli, Kalyan, and Chakrabarti, Ayan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Bursts of images exhibit significant self-similarity across both time and space. This motivates a representation of the kernels as linear combinations of a small set of basis elements. To this end, we introduce a novel basis prediction network that, given an input burst, predicts a set of global basis kernels -- shared within the image -- and the corresponding mixing coefficients -- which are specific to individual pixels. Compared to state-of-the-art techniques that output a large tensor of per-pixel spatiotemporal kernels, our formulation substantially reduces the dimensionality of the network output. This allows us to effectively exploit comparatively larger denoising kernels, achieving both significant quality improvements (over 1dB PSNR) and faster run-times over state-of-the-art methods., Comment: CVPR 2020. Project website at https://www.cse.wustl.edu/~zhihao.xia/bpn/
Published: 2019

24. A Dataset of Multi-Illumination Images in the Wild

Author: Murmann, Lukas, Gharbi, Michael, Aittala, Miika, and Durand, Fredo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. To fill this gap, we introduce a new multi-illumination dataset of more than 1000 real scenes, each captured under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance., Comment: ICCV 2019
Published: 2019

25. Learning to optimize halide with tree search and random programs

Author: Adams, Andrew, Ma, Karima, Anderson, Luke, Baghdadi, Riyadh, Li, Tzu-Mao, Gharbi, Michaël, Steiner, Benoit, Johnson, Steven, Fatahalian, Kayvon, Durand, Frédo, and Ragan-Kelley, Jonathan
Subjects: optimizing compilers, Halide, Artificial Intelligence and Image Processing, Information Systems, Software Engineering
Abstract: We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules. We define a parameterization of possible schedules much larger than prior methods and use a variant of beam search to search over it. The search optimizes runtime predicted by a cost model based on a combination of new derived features and machine learning. We train the cost model by generating and featurizing hundreds of thousands of random programs and schedules. We show that this approach operates effectively with or without autotuning. It produces schedules which are on average almost twice as fast as the existing Halide autoscheduler without autotuning, or more than twice as fast with, and is the first automatic scheduling algorithm to significantly outperform human experts on average.
Published: 2019

26. Deep Bilateral Learning for Real-Time Image Enhancement

Author: Gharbi, Michaël, Chen, Jiawen, Barron, Jonathan T., Hasinoff, Samuel W., and Durand, Frédo
Subjects: Computer Science - Graphics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher., Comment: 12 pages, 14 figures, Siggraph 2017
Published: 2017
Full Text: View/download PDF

27. Convolutional Neural Network for Earthquake Detection and Location

Author: Perol, Thibaut, Gharbi, Michaël, and Denolle, Marine
Subjects: Physics - Geophysics
Abstract: The recent evolution of induced seismicity in Central United States calls for exhaustive catalogs to improve seismic hazard assessment. Over the last decades, the volume of seismic data has increased exponentially, creating a need for efficient algorithms to reliably detect and locate earthquakes. Today's most elaborate methods scan through the plethora of continuous seismic records, searching for repeating seismic signals. In this work, we leverage the recent advances in artificial intelligence and present ConvNetQuake, a highly scalable convolutional neural network for earthquake detection and location from a single waveform. We apply our technique to study the induced seismicity in Oklahoma (USA). We detect 20 times more earthquakes than previously cataloged by the Oklahoma Geological Survey. Our algorithm is orders of magnitude faster than established methods.
Published: 2017

28. Spotting Temporally Precise, Fine-Grained Events in Video

Author: Hong, James, primary, Zhang, Haotian, additional, Gharbi, Michaël, additional, Fisher, Matthew, additional, and Fatahalian, Kayvon, additional
Published: 2022
Full Text: View/download PDF

29. Any-Resolution Training for High-Resolution Image Synthesis

Author: Chai, Lucy, primary, Gharbi, Michaël, additional, Shechtman, Eli, additional, Isola, Phillip, additional, and Zhang, Richard, additional
Published: 2022
Full Text: View/download PDF

30. Shadow Harmonization for Realistic Compositing

Author: Valença, Lucas, primary, Zhang, Jinsong, additional, Gharbi, Michaël, additional, Hold-Geoffroy, Yannick, additional, and Lalonde, Jean-François, additional
Published: 2023
Full Text: View/download PDF

31. Discontinuity-Aware 2D Neural Fields

Author: Belhe, Yash, primary, Gharbi, Michaël, additional, Fisher, Matthew, additional, Georgiev, Iliyan, additional, Ramamoorthi, Ravi, additional, and Li, Tzu-Mao, additional
Published: 2023
Full Text: View/download PDF

32. Materialistic: Selecting Similar Materials in Images

Author: Sharma, Prafull, primary, Philip, Julien, additional, Gharbi, Michaël, additional, Freeman, Bill, additional, Durand, Fredo, additional, and Deschaintre, Valentin, additional
Published: 2023
Full Text: View/download PDF

33. Domain Expansion of Image Generators

Author: Nitzan, Yotam, primary, Gharbi, Michaël, additional, Zhang, Richard, additional, Park, Taesung, additional, Zhu, Jun-Yan, additional, Cohen-Or, Daniel, additional, and Shechtman, Eli, additional
Published: 2023
Full Text: View/download PDF

34. Free-viewpoint Indoor Neural Relighting from Multi-view Stereo

Author: Philip, Julien, primary, Morgenthaler, Sébastien, additional, Gharbi, Michaël, additional, and Drettakis, George, additional
Published: 2021
Full Text: View/download PDF

35. Interactive Monte Carlo denoising using affinity of neural features

Author: Işik, Mustafa, primary, Mullia, Krishna, additional, Fisher, Matthew, additional, Eisenmann, Jonathan, additional, and Gharbi, Michaël, additional
Published: 2021
Full Text: View/download PDF

36. Interactive Monte Carlo denoising using affinity of neural features

Author: Işık, Mustafa, primary, Mullia, Krishna, additional, Fisher, Matthew, additional, Eisenmann, Jonathan, additional, and Gharbi, Michaël, additional
Published: 2021
Full Text: View/download PDF

37. Deep joint demosaicking and denoising

Author: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Gharbi, Michaël, Chaurasia, Gaurav, Paris, Sylvain, and Durand, Frédo
Abstract: © 2016 ACM. SA'16 Technical Papers, December 05-08, 2016, Macao. Demosaicking and denoising are the key first stages of the digital imaging pipeline but they are also a severely ill-posed problem that infers three color values per pixel from a single noisy measurement. Earlier methods rely on hand-crafted filters or priors and still exhibit disturbing visual artifacts in hard cases such as moiré or thin edges. We introduce a new data-driven approach for these challenges: we train a deep neural network on a large corpus of images instead of using hand-tuned filters. While deep learning has shown great success, its naive application using existing training datasets does not give satisfactory results for our problem because these datasets lack hard cases. To create a better training set, we present metrics to identify difficult patches and techniques for mining community photographs for such patches. Our experiments show that this network and training procedure outperform state-of-the-art both on noisy and noise-free data. Furthermore, our algorithm is an order of magnitude faster than the previous best performing techniques.
Published: 2021

38. Differentiable vector graphics rasterization for editing and learning

Author: Li, Tzu-Mao, primary, Lukáč, Michal, additional, Gharbi, Michaël, additional, and Ragan-Kelley, Jonathan, additional
Published: 2020
Full Text: View/download PDF

39. Multi-view relighting using a geometry-aware network

Author: Philip, Julien, primary, Gharbi, Michaël, additional, Zhou, Tinghui, additional, Efros, Alexei A., additional, and Drettakis, George, additional
Published: 2019
Full Text: View/download PDF

40. Sample-based Monte Carlo denoising using a kernel-splatting network

Author: Gharbi, Michaël, primary, Li, Tzu-Mao, additional, Aittala, Miika, additional, Lehtinen, Jaakko, additional, and Durand, Frédo, additional
Published: 2019
Full Text: View/download PDF

41. Differentiable programming for image processing and deep learning in halide

Author: Li, Tzu-Mao, primary, Gharbi, Michaël, additional, Adams, Andrew, additional, Durand, Frédo, additional, and Ragan-Kelley, Jonathan, additional
Published: 2018
Full Text: View/download PDF

42. Convolutional neural network for earthquake detection and location

Author: Perol, Thibaut, primary, Gharbi, Michaël, additional, and Denolle, Marine, additional
Published: 2018
Full Text: View/download PDF

43. Deep bilateral learning for real-time image enhancement

Author: Gharbi, Michaël, primary, Chen, Jiawen, additional, Barron, Jonathan T., additional, Hasinoff, Samuel W., additional, and Durand, Frédo, additional
Published: 2017
Full Text: View/download PDF

44. Deep joint demosaicking and denoising

Author: Gharbi, Michaël, primary, Chaurasia, Gaurav, additional, Paris, Sylvain, additional, and Durand, Frédo, additional
Published: 2016
Full Text: View/download PDF

45. Transform recipes for efficient cloud photo enhancement

Author: Gharbi, Michaël, primary, Shih, YiChang, additional, Chaurasia, Gaurav, additional, Ragan-Kelley, Jonathan, additional, Paris, Sylvain, additional, and Durand, Frédo, additional
Published: 2015
Full Text: View/download PDF
