Author: "Mairal, Julien" - Searchworks@Jio Institute Digital Library Search Results

1. LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes

Author: Marrie, Juliette, Menegaux, Romain, Arbel, Michael, Larlus, Diane, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We address the problem of extending the capabilities of vision foundation models such as DINO, SAM, and CLIP, to 3D tasks. Specifically, we introduce a novel method to uplift 2D image features into 3D Gaussian Splatting scenes. Unlike traditional approaches that rely on minimizing a reconstruction loss, our method employs a simpler and more efficient feature aggregation technique, augmented by a graph diffusion mechanism. Graph diffusion enriches features from a given model, such as CLIP, by leveraging 3D geometry and pairwise similarities induced by another strong model such as DINOv2. Our approach achieves performance comparable to the state of the art on multiple downstream tasks while delivering significant speed-ups. Notably, we obtain competitive segmentation results using generic DINOv2 features, despite DINOv2 not being trained on millions of annotated segmentation masks like SAM. When applied to CLIP features, our method demonstrates strong performance in open-vocabulary object detection tasks, highlighting the versatility of our approach.
Published: 2024

2. MODEL&CO: Exoplanet detection in angular differential imaging by learning across multiple observations

Author: Bodrito, Théo, Flasseur, Olivier, Mairal, Julien, Ponce, Jean, Langlois, Maud, and Lagrange, Anne-Marie
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Earth and Planetary Astrophysics, Computer Science - Computer Vision and Pattern Recognition, Physics - Data Analysis, Statistics and Probability
Abstract: Direct imaging of exoplanets is particularly challenging due to the high contrast between the planet and the star luminosities, and their small angular separation. In addition to tailored instrumental facilities implementing adaptive optics and coronagraphy, post-processing methods combining several images recorded in pupil tracking mode are needed to attenuate the nuisances corrupting the signals of interest. Most of these post-processing methods build a model of the nuisances from the target observations themselves, resulting in strongly limited detection sensitivity at short angular separations due to the lack of angular diversity. To address this issue, we propose to build the nuisance model from an archive of multiple observations by leveraging supervised deep learning techniques. The proposed approach casts the detection problem as a reconstruction task and captures the structure of the nuisance from two complementary representations of the data. Unlike methods inspired by reference differential imaging, the proposed model is highly non-linear and does not resort to explicit image-to-image similarity measurements and subtractions. The proposed approach also encompasses statistical modeling of learnable spatial features. The latter is beneficial to improve both the detection sensitivity and the robustness against heterogeneous data. We apply the proposed algorithm to several datasets from the VLT/SPHERE instrument, and demonstrate a superior precision-recall trade-off compared to the PACO algorithm. Interestingly, the gain is especially important when the diversity induced by ADI is the most limited, thus supporting the ability of the proposed approach to learn information across multiple observations.
Published: 2024

3. Combining statistical learning with deep learning for improved exoplanet detection and characterization

Author: Flasseur, Olivier, Bodrito, Théo, Mairal, Julien, Ponce, Jean, Langlois, Maud, and Lagrange, Anne-Marie
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics
Abstract: In direct imaging at high contrast, the bright glare produced by the host star makes the detection and the characterization of sub-stellar companions particularly challenging. In spite of the use of an extreme adaptive optics system combined with a coronagraphic mask to strongly attenuate the starlight contamination, dedicated post-processing methods combining several images recorded with the pupil tracking mode of the telescope are needed to reach the required contrast. In that context, we recently proposed to combine the statistics-based model of PACO with a deep learning approach in a three-step algorithm. First, the data are centered and whitened locally using the PACO framework to improve the stationarity and the contrast in a preprocessing step. Second, a convolutional neural network (CNN) is trained in a supervised fashion to detect the signature of synthetic sources in the preprocessed science data. Finally, the trained network is applied to the preprocessed observations and delivers a detection map. A second network is trained to infer locally the photometry of detected sources. Both deep models are trained from scratch with a custom data augmentation strategy allowing to generate a large training set from a single spatio-temporo-spectral dataset. This strategy can be applied to process jointly the images of observations conducted with angular, and eventually spectral, differential imaging (A(S)DI). In this proceeding, we present in a unified framework the key ingredients of the deep PACO algorithm both for ADI and ASDI. We apply our method on several datasets from the the IRDIS imager of the VLT/SPHERE instrument. Our method reaches, in average, a better trade-off between precision and recall than the comparative algorithms., Comment: AO4ELT 2023
Published: 2024

4. SpectralEarth: Training Hyperspectral Foundation Models at Scale

Author: Braham, Nassim Ait Ali, Albrecht, Conrad M, Mairal, Julien, Chanussot, Jocelyn, Wang, Yi, and Zhu, Xiao Xiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models leveraging data from the Environmental Mapping and Analysis Program (EnMAP). SpectralEarth comprises 538,974 image patches covering 415,153 unique locations from more than 11,636 globally distributed EnMAP scenes spanning two years of archive. Additionally, 17.5% of these locations include multiple timestamps, enabling multi-temporal HSI analysis. Utilizing state-of-the-art self-supervised learning (SSL) algorithms, we pretrain a series of foundation models on SpectralEarth. We integrate a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation. Experimental results support the versatility of our models, showcasing their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning. The dataset, models, and source code will be made publicly available.
Published: 2024

5. Functional Bilevel Optimization for Machine Learning

Author: Petrulionyte, Ieva, Mairal, Julien, and Arbel, Michael
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks.
Published: 2024

6. On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

Author: Marrie, Juliette, Arbel, Michael, Mairal, Julien, and Larlus, Diane
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large pretrained visual models exhibit remarkable generalization across diverse recognition tasks. Yet, real-world applications often demand compact models tailored to specific problems. Variants of knowledge distillation have been devised for such a purpose, enabling task-specific compact models (the students) to learn from a generic large pretrained one (the teacher). In this paper, we show that the excellent robustness and versatility of recent pretrained models challenge common practices established in the literature, calling for a new set of optimal guidelines for task-specific distillation. To address the lack of samples in downstream tasks, we also show that a variant of Mixup based on stable diffusion complements standard data augmentation. This strategy eliminates the need for engineered text prompts and improves distillation of generic models into streamlined specialized networks.
Published: 2024

7. Fast Semisupervised Unmixing Using Nonconvex Optimization

Author: Rasti, Behnood, Zouaoui, Alexandre, Mairal, Julien, and Chanussot, Jocelyn
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: In this paper, we introduce a novel linear model tailored for semisupervised/library-based unmixing. Our model incorporates considerations for library mismatch while enabling the enforcement of the abundance sum-to-one constraint (ASC). Unlike conventional sparse unmixing methods, this model involves nonconvex optimization, presenting significant computational challenges. We demonstrate the efficacy of Alternating Methods of Multipliers (ADMM) in cyclically solving these intricate problems. We propose two semisupervised unmixing approaches, each relying on distinct priors applied to the new model in addition to the ASC: sparsity prior and convexity constraint. Our experimental results validate that enforcing the convexity constraint outperforms the sparsity prior for the endmember library. These results are corroborated across three simulated datasets (accounting for spectral variability and varying pixel purity levels) and the Cuprite dataset. Additionally, our comparison with conventional sparse unmixing methods showcases considerable advantages of our proposed model, which entails nonconvex optimization. Notably, our implementations of the proposed algorithms-fast semisupervised unmixing (FaSUn) and sparse unmixing using soft-shrinkage (SUnS)-prove considerably more efficient than traditional sparse unmixing methods. SUnS and FaSUn were implemented using PyTorch and provided in a dedicated Python package called Fast Semisupervised Unmixing (FUnmix), which is open-source and available at https://github.com/BehnoodRasti/FUnmix
Published: 2024
Full Text: View/download PDF

8. Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation

Author: Lecouat, Bruno, de Mont-Marin, Yann Dubois, Bodrito, Théo, Mairal, Julien, and Ponce, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces a novel approach to the fine alignment of images in a burst captured by a handheld camera. In contrast to traditional techniques that estimate two-dimensional transformations between frame pairs or rely on discrete correspondences, the proposed algorithm establishes dense correspondences by optimizing both the camera motion and surface depth and orientation at every pixel. This approach improves alignment, particularly in scenarios with parallax challenges. Extensive experiments with synthetic bursts featuring small and even tiny baselines demonstrate that it outperforms the best optical flow methods available today in this setting, without requiring any training. Beyond enhanced alignment, our method opens avenues for tasks beyond simple image restoration, such as depth estimation and 3D reconstruction, as supported by promising preliminary results. This positions our approach as a versatile tool for various burst image processing applications.
Published: 2023

9. Vision Transformers Need Registers

Author: Darcet, Timothée, Oquab, Maxime, Mairal, Julien, and Bojanowski, Piotr
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.
Published: 2023

10. Image Processing and Machine Learning for Hyperspectral Unmixing: An Overview and the HySUPP Python Package

Author: Rasti, Behnood, Zouaoui, Alexandre, Mairal, Julien, and Chanussot, Jocelyn
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Spectral pixels are often a mixture of the pure spectra of the materials, called endmembers, due to the low spatial resolution of hyperspectral sensors, double scattering, and intimate mixtures of materials in the scenes. Unmixing estimates the fractional abundances of the endmembers within the pixel. Depending on the prior knowledge of endmembers, linear unmixing can be divided into three main groups: supervised, semi-supervised, and unsupervised (blind) linear unmixing. Advances in Image processing and machine learning substantially affected unmixing. This paper provides an overview of advanced and conventional unmixing approaches. Additionally, we draw a critical comparison between advanced and conventional techniques from the three categories. We compare the performance of the unmixing techniques on three simulated and two real datasets. The experimental results reveal the advantages of different unmixing categories for different unmixing scenarios. Moreover, we provide an open-source Python-based package available at https://github.com/BehnoodRasti/HySUPP to reproduce the results., Comment: IEEE Transactions on Geoscience and Remote Sensing, 2024
Published: 2023
Full Text: View/download PDF

11. SUnAA: Sparse Unmixing using Archetypal Analysis

Author: Rasti, Behnood, Zouaoui, Alexandre, Mairal, Julien, and Chanussot, Jocelyn
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: This paper introduces a new sparse unmixing technique using archetypal analysis (SUnAA). First, we design a new model based on archetypal analysis. We assume that the endmembers of interest are a convex combination of endmembers provided by a spectral library and that the number of endmembers of interest is known. Then, we propose a minimization problem. Unlike most conventional sparse unmixing methods, here the minimization problem is non-convex. We minimize the optimization objective iteratively using an active set algorithm. Our method is robust to the initialization and only requires the number of endmembers of interest. SUnAA is evaluated using two simulated datasets for which results confirm its better performance over other conventional and advanced techniques in terms of signal-to-reconstruction error. SUnAA is also applied to Cuprite dataset and the results are compared visually with the available geological map provided for this dataset. The qualitative assessment demonstrates the successful estimation of the minerals abundances and significantly improves the detection of dominant minerals compared to the conventional regression-based sparse unmixing methods. The Python implementation of SUnAA can be found at: https://github.com/BehnoodRasti/SUnAA.
Published: 2023
Full Text: View/download PDF

12. GloptiNets: Scalable Non-Convex Optimization with Certificates

Author: Beugnot, Gaspard, Mairal, Julien, and Rudi, Alessandro
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus. Unlike traditional methods that rely on algebraic properties, our algorithm exploits the regularity of the target function intrinsic in the decay of its Fourier spectrum. By defining a tractable family of models, we allow at the same time to obtain precise certificates and to leverage the advanced and powerful computational techniques developed to optimize neural networks. In this way the scalability of our approach is naturally enhanced by parallel computing with GPUs. Our approach, when applied to the case of polynomials of moderate dimensions but with thousands of coefficients, outperforms the state-of-the-art optimization methods with certificates, as the ones based on Lasserre's hierarchy, addressing problems intractable for the competitors., Comment: Edit affiliations and acknowledgments
Published: 2023

13. Combining multi-spectral data with statistical and deep-learning models for improved exoplanet detection in direct imaging at high contrast

Author: Flasseur, Olivier, Bodrito, Théo, Mairal, Julien, Ponce, Jean, Langlois, Maud, and Lagrange, Anne-Marie
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Earth and Planetary Astrophysics, Computer Science - Machine Learning
Abstract: Exoplanet detection by direct imaging is a difficult task: the faint signals from the objects of interest are buried under a spatially structured nuisance component induced by the host star. The exoplanet signals can only be identified when combining several observations with dedicated detection algorithms. In contrast to most of existing methods, we propose to learn a model of the spatial, temporal and spectral characteristics of the nuisance, directly from the observations. In a pre-processing step, a statistical model of their correlations is built locally, and the data are centered and whitened to improve both their stationarity and signal-to-noise ratio (SNR). A convolutional neural network (CNN) is then trained in a supervised fashion to detect the residual signature of synthetic sources in the pre-processed images. Our method leads to a better trade-off between precision and recall than standard approaches in the field. It also outperforms a state-of-the-art algorithm based solely on a statistical framework. Besides, the exploitation of the spectral diversity improves the performance compared to a similar model built solely from spatio-temporal data., Comment: accepted to EUSIPCO 2023
Published: 2023

14. SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

Author: Marrie, Juliette, Arbel, Michael, Larlus, Diane, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Data augmentation is known to improve the generalization capabilities of neural networks, provided that the set of transformations is chosen with care, a selection often performed manually. Automatic data augmentation aims at automating this process. However, most recent approaches still rely on some prior information; they start from a small pool of manually-selected default transformations that are either used to pretrain the network or forced to be part of the policy learned by the automatic data augmentation algorithm. In this paper, we propose to directly learn the augmentation policy without leveraging such prior knowledge. The resulting bilevel optimization problem becomes more challenging due to the larger search space and the inherent instability of bilevel optimization algorithms. To mitigate these issues (i) we follow a successive cold-start strategy with a Kullback-Leibler regularization, and (ii) we parameterize magnitudes as continuous distributions. Our approach leads to competitive results on standard benchmarks despite a more challenging setting, and generalizes beyond natural images., Comment: Accepted to CVPR 2023
Published: 2023

15. Semi-supervised learning made simple with self-supervised clustering

Author: Fini, Enrico, Astolfi, Pietro, Alahari, Karteek, Alameda-Pineda, Xavier, Mairal, Julien, Nabi, Moin, and Ricci, Elisa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods such as SwAV or DINO into semi-supervised learners. More precisely, we introduce a multi-task framework merging a supervised objective using ground-truth labels and a self-supervised objective relying on clustering assignments with a single cross-entropy loss. This approach may be interpreted as imposing the cluster centroids to be class prototypes. Despite its simplicity, we provide empirical evidence that our approach is highly effective and achieves state-of-the-art performance on CIFAR100 and ImageNet., Comment: CVPR 2023 - Code available at https://github.com/pietroastolfi/suave-daino
Published: 2023

16. Self-Attention in Colors: Another Take on Encoding Graph Structure in Transformers

Author: Menegaux, Romain, Jehanno, Emmanuel, Selosse, Margot, and Mairal, Julien
Subjects: Computer Science - Machine Learning
Abstract: We introduce a novel self-attention mechanism, which we call CSA (Chromatic Self-Attention), which extends the notion of attention scores to attention _filters_, independently modulating the feature channels. We showcase CSA in a fully-attentional graph Transformer CGT (Chromatic Graph Transformer) which integrates both graph structural information and edge features, completely bypassing the need for local message-passing components. Our method flexibly encodes graph structure through node-node interactions, by enriching the original edge features with a relative positional encoding scheme. We propose a new scheme based on random walks that encodes both structural and positional information, and show how to incorporate higher-order topological information, such as rings in molecular graphs. Our approach achieves state-of-the-art results on the ZINC benchmark dataset, while providing a flexible framework for encoding graph structure and incorporating higher-order topology.
Published: 2023

17. DINOv2: Learning Robust Visual Features without Supervision

Author: Oquab, Maxime, Darcet, Timothée, Moutakanni, Théo, Vo, Huy, Szafraniec, Marc, Khalidov, Vasil, Fernandez, Pierre, Haziza, Daniel, Massa, Francisco, El-Nouby, Alaaeldin, Assran, Mahmoud, Ballas, Nicolas, Galuba, Wojciech, Howes, Russell, Huang, Po-Yao, Li, Shang-Wen, Misra, Ishan, Rabbat, Michael, Sharma, Vasu, Synnaeve, Gabriel, Xu, Hu, Jegou, Hervé, Mairal, Julien, Labatut, Patrick, Joulin, Armand, and Bojanowski, Piotr
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.
Published: 2023

18. deep PACO: Combining statistical models with deep learning for exoplanet detection and characterization in direct imaging at high contrast

Author: Flasseur, Olivier, Bodrito, Théo, Mairal, Julien, Ponce, Jean, Langlois, Maud, and Lagrange, Anne-Marie
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Electrical Engineering and Systems Science - Image and Video Processing, Physics - Data Analysis, Statistics and Probability
Abstract: Direct imaging is an active research topic in astronomy for the detection and the characterization of young sub-stellar objects. The very high contrast between the host star and its companions makes the observations particularly challenging. In this context, post-processing methods combining several images recorded with the pupil tracking mode of telescope are needed. In previous works, we have presented a data-driven algorithm, PACO, capturing locally the spatial correlations of the data with a multi-variate Gaussian model. PACO delivers better detection sensitivity and confidence than the standard post-processing methods of the field. However, there is room for improvement due to the approximate fidelity of the PACO statistical model to the time evolving observations. In this paper, we propose to combine the statistical model of PACO with supervised deep learning. The data are first pre-processed with the PACO framework to improve the stationarity and the contrast. A convolutional neural network (CNN) is then trained in a supervised fashion to detect the residual signature of synthetic sources. Finally, the trained network delivers a detection map. The photometry of detected sources is estimated by a second CNN. We apply the proposed approach to several datasets from the VLT/SPHERE instrument. Our results show that its detection stage performs significantly better than baseline methods (cADI, PCA), and leads to a contrast improvement up to half a magnitude compared to PACO. The characterization stage of the proposed method performs on average on par with or better than the comparative algorithms (PCA, PACO) for angular separation above 0.5"., Comment: Accepted to Monthly Notices of the Royal Astronomical Society
Published: 2023

19. Sequential Counterfactual Risk Minimization

Author: Zenati, Houssam, Diemert, Eustache, Martin, Matthieu, Mairal, Julien, and Gaillard, Pierre
Subjects: Computer Science - Machine Learning
Abstract: Counterfactual Risk Minimization (CRM) is a framework for dealing with the logged bandit feedback problem, where the goal is to improve a logging policy using offline data. In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. We extend the CRM principle and its theory to this scenario, which we call "Sequential Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual estimator and identify conditions that can improve the performance of CRM in terms of excess risk and regret rates, by using an analysis similar to restart strategies in accelerated optimization methods. We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM., Comment: To appear at ICML23
Published: 2023

20. Learning Reward Functions for Robotic Manipulation by Observing Humans

Author: Alakuijala, Minttu, Dulac-Arnold, Gabriel, Mairal, Julien, Ponce, Jean, and Schmid, Cordelia
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function sufficiently generalizes to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. We propose two methods for scoring states relative to a goal image: through direct temporal regression, and through distances in an embedding space obtained with time-contrastive learning. By conditioning the function on a goal image, we are able to reuse one model across a variety of tasks. Unlike prior work on leveraging human videos to teach robots, our method, Human Offline Learned Distances (HOLD) requires neither a priori data from the robot environment, nor a set of task-specific human demonstrations, nor a predefined notion of correspondence across morphologies, yet it is able to accelerate training of several manipulation tasks on a simulated robot arm compared to using only a sparse reward obtained from task completion.
Published: 2022

21. Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing

Author: Zouaoui, Alexandre, Muhawenayo, Gedeon, Rasti, Behnood, Chanussot, Jocelyn, and Mairal, Julien
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism along with an appropriate model selection procedure that make our method robust to hyper-parameter choices while keeping the computational complexity reasonable. By using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation: https://github.com/inria-thoth/EDAA.
Published: 2022

22. High Dynamic Range and Super-Resolution from Raw Image Bursts

Author: Lecouat, Bruno, Eboli, Thomas, Ponce, Jean, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Photographs captured by smartphones and mid-range cameras have limited spatial resolution and dynamic range, with noisy response in underexposed regions and color artefacts in saturated areas. This paper introduces the first approach (to the best of our knowledge) to the reconstruction of high-resolution, high-dynamic range color images from raw photographic bursts captured by a handheld camera with exposure bracketing. This method uses a physically-accurate model of image formation to combine an iterative optimization algorithm for solving the corresponding inverse problem with a learned image representation for robust alignment and a learned natural image prior. The proposed algorithm is fast, with low memory requirements compared to state-of-the-art learning-based approaches to image restoration, and features that are learned end to end from synthetic yet realistic data. Extensive experiments demonstrate its excellent performance with super-resolution factors of up to $\times 4$ on real photographs taken in the wild with hand-held cameras, and high robustness to low-light conditions, noise, camera shake, and moderate object motion., Comment: Accepted to Siggraph 2022 Technical Papers program
Published: 2022

23. Non-Convex Bilevel Games with Critical Point Selection Maps

Author: Arbel, Michael and Mairal, Julien
Subjects: Mathematics - Optimization and Control, Mathematics - Differential Geometry
Abstract: Bilevel optimization problems involve two nested objectives, where an upper-level objective depends on a solution to a lower-level problem. When the latter is non-convex, multiple critical points may be present, leading to an ambiguous definition of the problem. In this paper, we introduce a key ingredient for resolving this ambiguity through the concept of a selection map which allows one to choose a particular solution to the lower-level problem. Using such maps, we define a class of hierarchical games between two agents that resolve the ambiguity in bilevel problems. This new class of games requires introducing new analytical tools in Morse theory to extend implicit differentiation, a technique used in bilevel optimization resulting from the implicit function theorem. In particular, we establish the validity of such a method even when the latter theorem is inapplicable due to degenerate critical points. Finally, we show that algorithms for solving bilevel problems based on unrolled optimization solve these games up to approximation errors due to finite computational power. A simple correction to these algorithms is then proposed for removing these errors.
Published: 2022

24. Self Supervised Learning for Few Shot Hyperspectral Image Classification

Author: Braham, Nassim Ait Ali, Mou, Lichao, Chanussot, Jocelyn, Mairal, Julien, and Zhu, Xiao Xiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Deep learning has proven to be a very effective approach for Hyperspectral Image (HSI) classification. However, deep neural networks require large annotated datasets to generalize well. This limits the applicability of deep learning for HSI classification, where manually labelling thousands of pixels for every scene is impractical. In this paper, we propose to leverage Self Supervised Learning (SSL) for HSI classification. We show that by pre-training an encoder on unlabeled pixels using Barlow-Twins, a state-of-the-art SSL algorithm, we can obtain accurate models with a handful of labels. Experimental results demonstrate that this approach significantly outperforms vanilla supervised learning., Comment: Accepted in IGARSS 2022
Published: 2022

25. On the Benefits of Large Learning Rates for Kernel Methods

Author: Beugnot, Gaspard, Mairal, Julien, and Rudi, Alessandro
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. First observed in the deep learning literature, we show that a phenomenon can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test objectives, we further explain why it already occurs in classification tasks without assuming any particular mismatch between train and test data distributions., Comment: Accepted paper at Conference COLT 2022. To be published to Proceedings of Machine Learning Research (PMLR)
Published: 2022

26. The Spectral Bias of Polynomial Neural Networks

Author: Choraria, Moulik, Dadi, Leello Tadesse, Chrysos, Grigorios, Mairal, Julien, and Cevher, Volkan
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Polynomial neural networks (PNNs) have been recently shown to be particularly effective at image generation and face recognition, where high-frequency information is critical. Previous studies have revealed that neural networks demonstrate a $\textit{spectral bias}$ towards low-frequency functions, which yields faster learning of low-frequency components during training. Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs. We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of the higher frequencies. We verify the theoretical bias through extensive experiments. We expect our analysis to provide novel insights into designing architectures and learning frameworks by incorporating multiplicative interactions via polynomials., Comment: Accepted at the International Conference on Learning Representations(ICLR) 2022
Published: 2022

27. Efficient Kernel UCB for Contextual Bandits

Author: Zenati, Houssam, Bietti, Alberto, Diemert, Eustache, Mairal, Julien, Martin, Matthieu, and Gaillard, Pierre
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require a O(CT^3) complexity where T is the horizon and the constant C is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nystrom approximations of the joint kernel embedding of contexts and actions. This allows us to achieve a complexity of O(CTm^2) where m is the number of Nystrom points. To recover the same regret as the standard kernelized UCB algorithm, m needs to be of order of the effective dimension of the problem, which is at most O(\sqrt(T)) and nearly constant in some cases., Comment: To appear at AISTATS2022
Published: 2022

28. Self-Supervised Models are Continual Learners

Author: Fini, Enrico, da Costa, Victor G. Turrisi, Alameda-Pineda, Xavier, Ricci, Elisa, Alahari, Karteek, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a Continual Learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for Continual self-supervised visual representation Learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach empirically by training six popular self-supervised models in various CL settings.
Published: 2021

29. Amortized Implicit Differentiation for Stochastic Bilevel Optimization

Author: Arbel, Michael and Mairal, Julien
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: We study a class of algorithms for solving bilevel optimization problems in both stochastic and deterministic settings when the inner-level objective is strongly convex. Specifically, we consider algorithms based on inexact implicit differentiation and we exploit a warm-start strategy to amortize the estimation of the exact gradient. We then introduce a unified theoretical framework inspired by the study of singularly perturbed systems (Habets, 1974) to analyze such amortized algorithms. By using this framework, our analysis shows these algorithms to match the computational complexity of oracle methods that have access to an unbiased estimate of the gradient, thus outperforming many existing results for bilevel optimization. We illustrate these findings on synthetic experiments and demonstrate the efficiency of these algorithms on hyper-parameter optimization experiments involving several thousands of variables.
Published: 2021

30. A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

Author: Bodrito, Théo, Zouaoui, Alexandre, Chanussot, Jocelyn, and Mairal, Julien
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Hyperspectral imaging offers new perspectives for diverse applications, ranging from the monitoring of the environment using airborne or satellite remote sensing, precision farming, food safety, planetary exploration, or astrophysics. Unfortunately, the spectral diversity of information comes at the expense of various sources of degradation, and the lack of accurate ground-truth "clean" hyperspectral signals acquired on the spot makes restoration tasks challenging. In particular, training deep neural networks for restoration is difficult, in contrast to traditional RGB imaging problems where deep models tend to shine. In this paper, we advocate instead for a hybrid approach based on sparse coding principles that retains the interpretability of classical techniques encoding domain knowledge with handcrafted image priors, while allowing to train model parameters end-to-end without massive amounts of data. We show on various denoising benchmarks that our method is computationally efficient and significantly outperforms the state of the art.
Published: 2021

31. Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization

Author: Beugnot, Gaspard, Mairal, Julien, and Rudi, Alessandro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows to derive various regularization schemes that yield faster convergence rates of the excess risk than with Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self concordant loss functions (GSC), which contain, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization, and overcomes the limitation of the classical Tikhonov regularization., Comment: To be published in NeurIPS 2021
Published: 2021

32. Residual Reinforcement Learning from Demonstrations

Author: Alakuijala, Minttu, Dulac-Arnold, Gabriel, Mairal, Julien, Ponce, Jean, and Schmid, Cordelia
Subjects: Computer Science - Machine Learning
Abstract: Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks out of reach for RL from scratch.
Published: 2021

33. GraphiT: Encoding Graph Structure in Transformers

Author: Mialon, Grégoire, Chen, Dexiong, Selosse, Margot, and Mairal, Julien
Subjects: Computer Science - Machine Learning
Abstract: We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture is able to outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length. We thoroughly evaluate these two ideas on many classification and regression tasks, demonstrating the effectiveness of each of them independently, as well as their combination. In addition to performing well on standard benchmarks, our model also admits natural visualization mechanisms for interpreting graph motifs explaining the predictions, making it a potentially strong candidate for scientific applications where interpretation is important. Code available at https://github.com/inria-thoth/GraphiT.
Published: 2021

34. NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Author: Bhat, Goutam, Danelljan, Martin, Timofte, Radu, Akita, Kazutoshi, Cho, Wooyeong, Fan, Haoqiang, Jia, Lanpeng, Kim, Daeshik, Lecouat, Bruno, Li, Youwei, Liu, Shuaicheng, Liu, Ziluan, Luo, Ziwei, Maeda, Takahiro, Mairal, Julien, Micheloni, Christian, Mo, Xuan, Oba, Takeru, Ostyakov, Pavel, Ponce, Jean, Son, Sanghyeok, Sun, Jian, Ukita, Norimichi, Umer, Rao Muhammad, Yan, Youliang, Yu, Lei, Zhussip, Magauiya, and Zou, Xueyi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state-of-the-art for the burst super-resolution task., Comment: NTIRE 2021 Burst Super-Resolution challenge report
Published: 2021

35. Emerging Properties in Self-Supervised Vision Transformers

Author: Caron, Mathilde, Touvron, Hugo, Misra, Ishan, Jégou, Hervé, Mairal, Julien, Bojanowski, Piotr, and Joulin, Armand
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base., Comment: 21 pages
Published: 2021

36. Lucas-Kanade Reloaded: End-to-End Super-Resolution from Raw Image Bursts

Author: Lecouat, Bruno, Ponce, Jean, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: This presentation addresses the problem of reconstructing a high-resolution image from multiple lower-resolution snapshots captured from slightly different viewpoints in space and time. Key challenges for solving this problem include (i) aligning the input pictures with sub-pixel accuracy, (ii) handling raw (noisy) images for maximal faithfulness to native camera data, and (iii) designing/learning an image prior (regularizer) well suited to the task. We address these three challenges with a hybrid algorithm building on the insight from Wronski et al. that aliasing is an ally in this setting, with parameters that can be learned end to end, while retaining the interpretability of classical approaches to inverse problems. The effectiveness of our approach is demonstrated on synthetic and real image bursts, setting a new state of the art on several benchmarks and delivering excellent qualitative results on real raw bursts captured by smartphones and prosumer cameras.
Published: 2021

37. A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding

Author: Lecouat, Bruno, Ponce, Jean, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems, and whose architectures are derived from an optimization algorithm. We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions. This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end. The priors used in this presentation include variants of total variation, Laplacian regularization, bilateral filtering, sparse coding on learned dictionaries, and non-local self similarities. Our models are fully interpretable as well as parameter and data efficient. Our experiments demonstrate their effectiveness on a large diversity of tasks ranging from image denoising and compressed sensing for fMRI to dense stereo matching., Comment: NeurIPS 2020
Published: 2020

38. A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Author: Mialon, Grégoire, Chen, Dexiong, d'Aspremont, Alexandre, and Mairal, Julien
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We address the problem of learning on sets of features, motivated by the need of performing pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or it may be seen as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results for protein fold recognition and detection of chromatin profiles tasks, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding that can be used alone or as a module in larger learning models at https://github.com/claying/OTK., Comment: ICLR 2021
Published: 2020

39. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Author: Caron, Mathilde, Misra, Ishan, Mairal, Julien, Goyal, Priya, Bojanowski, Piotr, and Joulin, Armand
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a swapped prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks., Comment: NeurIPS 2020
Published: 2020

40. Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation

Author: Zenati, Houssam, Bietti, Alberto, Martin, Matthieu, Diemert, Eustache, Gaillard, Pierre, and Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of learning stochastic policies with continuous actions from the viewpoint of counterfactual risk minimization (CRM). While the CRM framework is appealing and well studied for discrete actions, the continuous action case raises new challenges about modelization, optimization, and~offline model selection with real data which turns out to be particularly challenging. Our paper contributes to these three aspects of the CRM estimation pipeline. First, we introduce a modelling strategy based on a joint kernel embedding of contexts and actions, which overcomes the shortcomings of previous discretization approaches. Second, we empirically show that the optimization aspect of counterfactual learning is important, and we demonstrate the benefits of proximal point algorithms and differentiable estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.
Published: 2020

41. Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification

Author: Dvornik, Nikita, Schmid, Cordelia, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Popular approaches for few-shot classification consist of first learning a generic data representation based on a large annotated dataset, before adapting the representation to new classes given only a few labeled samples. In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. First, we obtain a multi-domain representation by training a set of semantically different feature extractors. Then, given a few-shot learning task, we use our multi-domain feature bank to automatically select the most relevant representations. We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training, which leads to state-of-the-art results on MetaDataset and improved accuracy on mini-ImageNet., Comment: ECCV'20
Published: 2020

42. Convolutional Kernel Networks for Graph-Structured Data

Author: Chen, Dexiong, Jacob, Laurent, and Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We introduce a family of multilayer graph kernels and establish new links between graph convolutional neural networks and kernel methods. Our approach generalizes convolutional kernel networks to graph-structured data, by representing graphs as a sequence of kernel feature maps, where each node carries information about local graph substructures. On the one hand, the kernel point of view offers an unsupervised, expressive, and easy-to-regularize data representation, which is useful when limited samples are available. On the other hand, our model can also be trained end-to-end on large-scale data, leading to new types of graph convolutional neural networks. We show that our method achieves competitive performance on several graph classification benchmarks, while offering simple model interpretation. Our code is freely available at https://github.com/claying/GCKN.
Published: 2020

43. Pruning Convolutional Neural Networks with Self-Supervision

Author: Caron, Mathilde, Morcos, Ari, Bojanowski, Piotr, Mairal, Julien, and Joulin, Armand
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on a task while trying to maintain the performance of the pruned network on the same task. However, in self-supervised feature learning, the training objective is agnostic on the representation transferability to downstream tasks. Thus, preserving performance for this objective does not ensure that the pruned subnetwork remains effective for solving downstream tasks. In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e. on self-supervised tasks). We show that pruned masks obtained with or without labels reach comparable performance when re-trained on labels, suggesting that pruning operates similarly for self-supervised and supervised learning. Interestingly, we also find that pruning preserves the transfer performance of self-supervised subnetwork representations.
Published: 2020

44. Cyanure: An Open-Source Toolbox for Empirical Risk Minimization for Python, C++, and soon more

Author: Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Cyanure is an open-source C++ software package with a Python interface. The goal of Cyanure is to provide state-of-the-art solvers for learning linear models, based on stochastic variance-reduced stochastic optimization with acceleration mechanisms. Cyanure can handle a large variety of loss functions (logistic, square, squared hinge, multinomial logistic) and regularization functions (l_2, l_1, elastic-net, fused Lasso, multi-task group Lasso). It provides a simple Python API, which is very close to that of scikit-learn, which should be extended to other languages such as R or Matlab in a near future., Comment: http://julien.mairal.org/cyanure/welcome.html
Published: 2019

45. Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration

Author: Lecouat, Bruno, Ponce, Jean, and Mairal, Julien
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Non-local self-similarity and sparsity principles have proven to be powerful priors for natural image modeling. We propose a novel differentiable relaxation of joint sparsity that exploits both principles and leads to a general framework for image restoration which is (1) trainable end to end, (2) fully interpretable, and (3) much more compact than competing deep learning architectures. We apply this approach to denoising, jpeg deblocking, and demosaicking, and show that, with as few as 100K parameters, its performance on several standard benchmarks is on par or better than state-of-the-art methods that may have an order of magnitude or more parameters., Comment: ECCV 2020
Published: 2019

46. Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions

Author: Mialon, Grégoire, d'Aspremont, Alexandre, and Mairal, Julien
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points., Comment: AISTATS 2020
Published: 2019

47. Recurrent Kernel Networks

Author: Chen, Dexiong, Jacob, Laurent, and Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Substring kernels are classical tools for representing biological sequences or text. However, when large amounts of annotated data are available, models that allow end-to-end training such as neural networks are often preferred. Links between recurrent neural networks (RNNs) and substring kernels have recently been drawn, by formally showing that RNNs with specific activation functions were points in a reproducing kernel Hilbert space (RKHS). In this paper, we revisit this link by generalizing convolutional kernel networks---originally related to a relaxation of the mismatch kernel---to model gaps in sequences. It results in a new type of recurrent neural network which can be trained end-to-end with backpropagation, or without supervision by using kernel approximation techniques. We experimentally show that our approach is well suited to biological sequences, where it outperforms existing methods for protein classification tasks.
Published: 2019

48. A Generic Acceleration Framework for Stochastic Composite Optimization

Author: Kulunchakov, Andrei and Mairal, Julien
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex. Specifically, we extend the Catalyst approach originally designed for deterministic objectives to the stochastic setting. Given an optimization method with mild convergence guarantees for strongly convex problems, the challenge is to accelerate convergence to a noise-dominated region, and then achieve convergence with an optimal worst-case complexity depending on the noise variance of the gradients. A side contribution of our work is also a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed.
Published: 2019

49. On the Inductive Bias of Neural Tangent Kernels

Author: Bietti, Alberto and Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare to other known kernels for similar architectures., Comment: NeurIPS 2019
Published: 2019

50. Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

Author: Kulunchakov, Andrei and Mairal, Julien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov., Comment: short version of preprint arXiv:1901.08788
Published: 2019

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Category

Publication Type

Journal

Database

Publisher

422 results on '"Mairal, Julien"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources