98 results for "Lazaric, A."
Search Results
2. Organizational routines: Evolution in the research landscape of two core communities
- Author
- Giada Baldessarelli, Nathalie Lazaric, and Michele Pezzoni
- Subjects
Economics and Econometrics, General Business, Management and Accounting
- Published
- 2022
3. À quoi servent les théories évolutionnistes ? Réflexions et outils pour faire face aux crises contemporaines [What are evolutionary theories for? Reflections and tools for facing contemporary crises]
- Author
- Nathalie Lazaric
- Published
- 2022
4. Editorial: Alternative building blocks and new recycling routes for polymers: Challenges for circular economy and triggers for innovations
- Author
- Valerie Massardier, Naima Belhaneche-Bensemra, and Nathalie Lazaric
- Subjects
Materials Science (miscellaneous)
- Published
- 2023
5. Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping
- Author
- Mezghani, Lina, Sukhbaatar, Sainbayar, Bojanowski, Piotr, Lazaric, Alessandro, and Alahari, Karteek (Thoth, Inria Grenoble - Rhône-Alpes; Laboratoire Jean Kuntzmann, Université Grenoble Alpes, CNRS, Grenoble INP; Meta AI; ANR-18-CE23-0011 AVENUE)
- Subjects
FOS: Computer and information sciences, Computer Science - Robotics, Computer Science - Machine Learning, Self-Supervised Learning, Artificial Intelligence (cs.AI), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Computer Science - Artificial Intelligence, Goal-Conditioned RL, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Offline RL, Robotics (cs.RO), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Machine Learning (cs.LG)
- Abstract
Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming. Moreover, manually designing reward functions for every single desired skill is prohibitive. Prior works targeted these challenges by learning goal-conditioned policies from offline datasets without manually specified rewards, through hindsight relabelling. These methods suffer from the issue of sparsity of rewards, and fail at long-horizon tasks. In this work, we propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and the dynamics of the model, and shape a dense reward function for learning policies offline. We evaluate our method on three continuous control tasks, and show that our model significantly outperforms existing approaches, especially on tasks that involve long-term planning., Comment: Code: https://github.com/facebookresearch/go-fresh
- Published
- 2023
- Full Text
- View/download PDF
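As a rough illustration of the approach summarized in the record above (the authors' own implementation is at the linked go-fresh repository), the sketch below combines hindsight relabelling with a dense reward derived from a learned state embedding. The embedding `phi`, the trajectory format, and all names are placeholders rather than the paper's code.

```python
import numpy as np

# Placeholder embedding: in the paper this would be learned with a
# self-supervised phase on the offline dataset; here it is the identity.
def phi(state: np.ndarray) -> np.ndarray:
    return state

def shaped_reward(state, goal):
    """Dense surrogate reward: negative embedding distance to the goal,
    replacing the sparse 0/1 goal-reached signal."""
    return -float(np.linalg.norm(phi(state) - phi(goal)))

def relabel_and_shape(trajectory, rng=np.random.default_rng(0)):
    """Hindsight relabelling plus dense shaping over one offline trajectory.

    trajectory: list of (state, action, next_state) tuples.
    Returns (state, action, goal, reward) tuples usable by any offline
    goal-conditioned RL algorithm.
    """
    out = []
    for t, (s, a, s_next) in enumerate(trajectory):
        # Relabel: pick a state reached later in the same trajectory as the goal.
        future = rng.integers(t, len(trajectory))
        goal = trajectory[future][2]
        out.append((s, a, goal, shaped_reward(s_next, goal)))
    return out
```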
6. Solvent‐mediated forces in protein dielectrophoresis
- Author
- Morteza M. Waskasi, Aleksandar Lazaric, and Matthias Heyden
- Subjects
Electrophoresis, Physics, Work (thermodynamics), Clinical Biochemistry, Solvation, Proteins, Water, Context (language use), Molecular Dynamics Simulation, Dielectrophoresis, Biochemistry, Analytical Chemistry, Dipole, Molecular dynamics, Electricity, Chemical physics, Electric field, Solvents, Polarization (electrochemistry)
- Abstract
DEP is an established method to manipulate micrometer-sized particles, but standard continuum theories predict only negligible effects for nanometer-sized proteins despite contrary experimental evidence. A theoretical description of protein DEP needs to consider details on the molecular scale. Previous work toward this goal addressed the role of orientational polarization of static protein dipole moments for dielectrophoretic effects, which successfully predicts the general magnitude of dielectrophoretic forces on proteins but does not readily explain negative DEP forces observed for proteins in some experiments. However, contributions to the protein chemical potential due to protein-water interactions have not yet been considered in this context. Here, we utilize atomistic molecular dynamics simulations to evaluate polarization-induced changes in the protein solvation free energy, which result in a solvent-mediated contribution to dielectrophoretic forces. We quantify solvent-mediated dielectrophoretic forces for two proteins and a small peptide in water, which follow expectations for protein-water dipole-dipole interactions. The magnitude of solvent-mediated dielectrophoretic forces exceeds predictions of nonmolecular continuum theories, but plays a minor role for the total dielectrophoretic force for the simulated proteins due to dominant contributions from the orientational polarization of their static protein dipoles. However, we extrapolate that solvent-mediated contributions to negative protein DEP forces will become increasingly relevant for multidomain proteins, complexes and aggregates with large protein-water interfaces, as well as for high electric field frequencies, which provides a potential mechanism for corresponding experimental observations of negative protein DEP.
- Published
- 2021
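For context, the continuum description that the abstract argues underestimates protein DEP is the textbook time-averaged dielectrophoretic force on a homogeneous sphere of radius $r$ in a medium of permittivity $\varepsilon_m$ (a standard formula, not one quoted from this record):

```latex
\langle \mathbf{F}_{\mathrm{DEP}} \rangle
  = 2\pi \varepsilon_m r^{3}\,
    \mathrm{Re}\!\left[\frac{\varepsilon_p^{*}-\varepsilon_m^{*}}
                            {\varepsilon_p^{*}+2\varepsilon_m^{*}}\right]
    \nabla \lvert \mathbf{E}_{\mathrm{rms}} \rvert^{2},
\qquad
\varepsilon^{*} = \varepsilon - \frac{i\sigma}{\omega}.
```

The $r^{3}$ scaling is why nanometre-sized proteins are expected to feel negligible forces under this description, which is precisely the discrepancy with experiment that the molecular-scale treatment above addresses.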
7. Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
- Author
- Yuan, Rui, Du, Simon S., Gower, Robert M., Lazaric, Alessandro, and Xiao, Lin
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
- Abstract
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size., Comment: This version adds a table of comparison for the literature review. The paper is published as a conference paper at ICLR 2023
- Published
- 2022
- Full Text
- View/download PDF
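A minimal numerical sketch of the setting in the entry above: a log-linear (softmax) policy over state-action features and one NPG/Q-NPG-style update obtained through compatible function approximation, i.e., a weighted least-squares fit of Q-value estimates in the policy's feature space. This illustrates the policy class and the update direction only; it is not the authors' algorithm or step-size schedule.

```python
import numpy as np

def softmax_policy(theta, features):
    """Log-linear policy: pi(a|s) proportional to exp(theta^T phi(s, a)).
    features: array of shape (num_states, num_actions, dim)."""
    logits = features @ theta                       # (S, A)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def q_npg_step(theta, features, q_estimates, d_weights, step_size):
    """One (Q-)NPG update under compatible function approximation:
    fit w by weighted least squares so that w^T phi(s, a) approximates
    Q(s, a), then move the policy parameters in that direction.

    q_estimates, d_weights: arrays of shape (S, A); d_weights is the
    state-action distribution weighting the regression."""
    S, A, dim = features.shape
    X = features.reshape(S * A, dim)
    y = q_estimates.reshape(S * A)
    sw = np.sqrt(d_weights.reshape(S * A))
    w, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta + step_size * w
```

The linear rates in the paper are obtained with a geometrically increasing step size rather than the fixed `step_size` used in this sketch.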
8. Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path
- Author
- Chen, Liyu, Tirinzoni, Andrea, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any algorithm requires at least $\Omega(SAB_{\star}^3/(c_{\min}\epsilon^2))$ samples to return an $\epsilon$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min}=0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this result with lower bounds when prior knowledge of the hitting time of the optimal policy is available and when we restrict optimality by competing against policies with bounded hitting time. Finally, we design an algorithm with matching upper bounds in these cases. This settles the sample complexity of learning $\epsilon$-optimal policies in SSP with generative models. We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general. On the other hand, efficient learning can be made possible if we assume the agent can directly reach the goal state from any state by paying a fixed cost. We then establish the first upper and lower bounds under this assumption. Finally, using similar analytic tools, we prove that horizon-free regret is impossible in SSPs under general costs, resolving an open problem in (Tarbouriech et al., 2021c).
- Published
- 2022
- Full Text
- View/download PDF
9. Contextual bandits with concave rewards, and an application to fair ranking
- Author
- Do, Virginie, Dohmatob, Elvis, Pirotta, Matteo, Lazaric, Alessandro, and Usunier, Nicolas
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computers and Society, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Computers and Society (cs.CY), Machine Learning (stat.ML), Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
- Abstract
We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context. We present the first algorithm with provably vanishing regret for CBCR without restrictions on the policy space, whereas prior works were restricted to finite policy spaces or tabular representations. Our solution is based on a geometric interpretation of CBCR algorithms as optimization algorithms over the convex set of expected rewards spanned by all stochastic policies. Building on Frank-Wolfe analyses in constrained convex optimization, we derive a novel reduction from the CBCR regret to the regret of a scalar-reward bandit problem. We illustrate how to apply the reduction off-the-shelf to obtain algorithms for CBCR with both linear and general reward functions, in the case of non-combinatorial actions. Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives, leading to the first algorithm with regret guarantees for contextual combinatorial bandits with fairness of exposure., Comment: ICLR 2023
- Published
- 2022
- Full Text
- View/download PDF
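The reduction described in the abstract above can be pictured as follows: keep a running average of the observed reward vectors, linearize the concave objective at that average, and hand the resulting scalarized reward to an off-the-shelf contextual bandit. Everything below (the example objective and the `base_bandit`/`env` interfaces) is hypothetical scaffolding for illustration, not the paper's algorithm.

```python
import numpy as np

def concave_objective_grad(z):
    # Example concave trade-off: f(z) = sum_i sqrt(z_i); this is its gradient.
    # (Illustrative choice only, not the fairness objective from the paper.)
    return 0.5 / np.sqrt(np.maximum(z, 1e-8))

def cbcr_reduction(base_bandit, env, num_rounds, num_objectives):
    """Frank-Wolfe-style reduction sketched from the abstract: each round
    becomes a scalar-reward bandit round whose weights are the gradient of
    the concave objective at the current average reward vector.

    base_bandit: hypothetical bandit with .select(context, weights) and
                 .update(context, action, scalar_reward).
    env: hypothetical environment with .context() and
         .rewards(context, action) returning a vector of length num_objectives.
    """
    z = np.zeros(num_objectives)                    # running average of reward vectors
    for t in range(1, num_rounds + 1):
        ctx = env.context()
        weights = concave_objective_grad(z)         # linearize f at z
        action = base_bandit.select(ctx, weights)   # maximize weights . r
        r_vec = env.rewards(ctx, action)
        base_bandit.update(ctx, action, float(weights @ r_vec))
        z += (r_vec - z) / t                        # incremental average
    return z
```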
10. On the Complexity of Representation Learning in Contextual Linear Bandits
- Author
- Tirinzoni, Andrea, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector, thus leading to an online representation learning problem. Existing approaches to representation learning in contextual bandits are either very generic (e.g., model-selection techniques or algorithms for learning with arbitrary function classes) or specialized to particular structures (e.g., nested features or representations with certain spectral properties). As a result, the understanding of the cost of representation learning in contextual linear bandit is still limited. In this paper, we take a systematic approach to the problem and provide a comprehensive study through an instance-dependent perspective. We show that representation learning is fundamentally more complex than linear bandits (i.e., learning with a given representation). In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set, while we show cases where it can be arbitrarily harder. We complement this result with an extensive discussion of how it relates to existing literature and we illustrate positive instances where representation learning is as complex as learning with a fixed representation and where sub-logarithmic regret is achievable.
- Published
- 2022
- Full Text
- View/download PDF
11. Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
- Author
- Tirinzoni, Andrea, Papini, Matteo, Touati, Ahmed, Lazaric, Alessandro, and Pirotta, Matteo
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BanditSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks., Comment: Accepted at Neurips 2022
- Published
- 2022
- Full Text
- View/download PDF
12. Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times
- Author
- Calandriello, Daniele, Carratino, Luigi, Lazaric, Alessandro, Valko, Michal, and Rosasco, Lorenzo
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
Computing a Gaussian process (GP) posterior has a computational cost cubic in the number of historical points. A reformulation of the same GP posterior highlights that this complexity mainly depends on how many \emph{unique} historical points are considered. This can have important implications in active learning settings, where the set of historical points is constructed sequentially by the learner. We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switching only when necessary. Limiting the number of switches also limits the number of unique points in the history of the GP. Thus, the efficient GP reformulation can be used to exactly and cheaply compute the posteriors required to run the GP-Opt algorithms. This approach is especially useful in real-world applications of GP-Opt with high switch costs (e.g. switching chemicals in wet labs, data/model loading in hyperparameter optimization). As examples of this meta-approach, we modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to switch candidates as infrequently as possible, adapting rules from batched GP-Opt. These versions preserve all the theoretical no-regret guarantees while improving practical aspects of the algorithms such as runtime, memory complexity, and the ability to batch candidates and evaluate them in parallel.
- Published
- 2022
- Full Text
- View/download PDF
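The reformulation exploited in the entry above rests on a standard fact about GP regression with i.i.d. Gaussian noise: all evaluations of the same candidate can be collapsed into their mean, observed with noise variance divided by the replicate count, so the exact posterior only requires linear algebra in the number of unique points. A small self-contained sketch (plain GP regression, not the paper's code):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior_unique(X_unique, y_mean, counts, X_test, noise_var=0.1):
    """GP posterior computed from unique inputs only.

    Repeated evaluations at candidate i are folded into their mean y_mean[i],
    observed with effective noise noise_var / counts[i]; for the posterior
    over f this is equivalent to stacking all replicates, so the cost is
    cubic in the number of *unique* points -- the property the paper exploits.
    """
    K = rbf_kernel(X_unique, X_unique) + np.diag(noise_var / counts)
    Ks = rbf_kernel(X_test, X_unique)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(X_unique)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_mean))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V
    return mean, cov
```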
13. Determinants of sustainable consumption in France: the importance of social influence and environmental values
- Author
- Fabrice Le Guel, Sébastien Lavaud, Jean Belin, Vanessa Oltra, Ali Douai, and Nathalie Lazaric (GREDEG, Université Côte d'Azur, CNRS; GREThA, Université de Bordeaux, CNRS; Université Paris-Sud; ESIA)
- Subjects
Consumption (economics), Economics and Econometrics, JEL: D - Microeconomics, Public economics, media_common.quotation_subject, 05 social sciences, 1. No poverty, [SHS.ECO]Humanities and Social Sciences/Economics and Finance, General Business, Management and Accounting, Promotion (rank), JEL: Q - Agricultural and Natural Resource Economics • Environmental and Ecological Economics, 0502 economics and business, Sustainable consumption, Environmental impact assessment, Business, Ordered logit, 050207 economics, Peer pressure, 050203 business & management, Consumer behaviour, Social influence, media_common
- Abstract
Our article provides empirical findings for France related to sustainable consumption and what triggers sustainable behaviour. We investigate various potential key explanatory variables including social influence and environmental values, among others. Our main contribution is to survey and to analyse a set of consumption practices (rather than the examination of single practices as in most of the literature) for a large sample of more than 3,000 households. The survey was conducted in France in 2012. We use cluster analysis to identify and describe the different consumer behaviour profiles. This methodology identifies three clusters of consumers characterized by diverse concerns related to the environmental impact of their consumption. Based on these clusters, ordered logit models are fitted on three levels of sustainable consumption behaviours. Our results emphasize the importance of age, gender, education, environmental concern and peer effects for spurring sustainable consumption. We discuss the role of peer pressure as a major determinant. Learning about sustainable behaviour from peers seems to complement changing environmental values and stimulate pro-environmental behaviour. Our findings show that local externalities clearly outweigh the global consequences related to the promotion of sustainable consumption behaviours; that is, the ability to learn in small networks is critical for the promotion of trust and the exchange of ideas and practices.
- Published
- 2019
14. Cognition and Routine Dynamics
- Author
- Lazaric, Nathalie (GREDEG, Université Côte d'Azur, CNRS; ESIA), with Feldman, M., Pentland, B., D'Adderio, L., Dittrich, K., Rerup, C., and Seidl, D.
- Subjects
Cognition, 4. Education, Routines, organizational dynamics, [SHS.ECO]Humanities and Social Sciences/Economics and Finance, [SHS]Humanities and Social Sciences
- Abstract
Cognition is critical for finding different solutions to problems and providing new, robust patterns of action for the performance of routines. Routine Dynamics research provides significant empirical evidence about patterns and performance, and reveals how practices are permanently co-shaped using the notions of artefacts, reflection, replication of knowledge and intentionality. The notions of reflective action and reflective thinking have been identified as critical for current patterns of interdependent actions, thus offering an opportunity to reshape both cognition and the representation of routines that is far from the original conception of the Carnegie School.
- Published
- 2021
15. Isolating non-additive solvation effects from explicit solvent simulations
- Author
- Matthias Heyden, Kaprao Fuegner, Viren Pattni, and Aleksandar Lazaric
- Published
- 2021
16. Editorial
- Author
- Uwe Cantner, Bart Verspagen, Nathalie Lazaric, Maria Savona, Roberto Fontana, Reinoud Joosten, Andreas Pyka, Andrea Roventini, and Simone Vannuccini (UNU-MERIT and GSBE, Maastricht University)
- Subjects
Economics and Econometrics, General Business, Management and Accounting
- Published
- 2022
17. Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
- Author
- Tarbouriech, Jean, Zhou, Runlong, Du, Simon, Pirotta, Matteo, Valko, Michal, and Lazaric, Alessandro (Facebook AI Research Paris (FAIR); Scool, Inria Lille - Nord Europe, CRIStAL UMR 9189; Tsinghua University; Paul G. Allen School of Computer Science & Engineering, University of Washington; DeepMind Paris)
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Astrophysics::Cosmology and Extragalactic Astrophysics, Astrophysics::Galaxy Astrophysics, Machine Learning (cs.LG)
- Abstract
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to induce an optimistic SSP problem whose associated value iteration scheme is guaranteed to converge. We prove that EB-SSP achieves the minimax regret rate $\tilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions, and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$, which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first (nearly) horizon-free regret bound beyond the finite-horizon MDP setting., Comment: NeurIPS 2021
- Published
- 2021
- Full Text
- View/download PDF
18. A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
- Author
- Tirinzoni, Andrea, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an optimization problem, our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts for the dynamics of the MDP. We provide a characterization of our lower-bound through a series of examples illustrating how different MDPs may have significantly different complexity. 1) We first consider a "difficult" MDP instance, where the novel constraint based on the dynamics leads to a larger lower-bound (i.e., a larger regret) compared to the classical analysis. 2) We then show that our lower-bound recovers results previously derived for specific MDP instances. 3) Finally, we show that, in certain "simple" MDPs, the lower bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all. We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.
- Published
- 2021
- Full Text
- View/download PDF
19. Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching
- Author
- Kamienny, Pierre-Alexandre, Tarbouriech, Jean, Lamprier, Sylvain, Lazaric, Alessandro, and Denoyer, Ludovic
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide a thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with a decoupled structure of a directed skill, trained to reach a specific region, followed by a diffusing part that induces a local coverage. 2) We optimize policies by maximizing their number under the constraint that each of them reaches distinct regions of the environment (i.e., they are sufficiently discriminable) and prove that this serves as a lower bound to the original mutual information objective. 3) Finally, we compose the learned directed skills into a growing tree that adaptively covers the environment. We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines., Comment: ICLR 2022
- Published
- 2021
- Full Text
- View/download PDF
20. A general sample complexity analysis of vanilla policy gradient
- Author
- Yuan, Rui, Gower, Robert M., and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
- Abstract
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex optimization to obtain convergence and sample complexity guarantees for the vanilla policy gradient (PG). Our only assumptions are that the expected return is smooth w.r.t. the policy parameters, that its $H$-step truncated gradient is close to the exact gradient, and a certain ABC assumption. This assumption requires the second moment of the estimated gradient to be bounded by $A\geq 0$ times the suboptimality gap, $B \geq 0$ times the norm of the full batch gradient and an additive constant $C \geq 0$, or any combination of the aforementioned. We show that the ABC assumption is more general than the commonly used assumptions on the policy space to prove convergence to a stationary point. We provide a single convergence theorem that recovers the $\widetilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity of PG to a stationary point. Our results also afford greater flexibility in the choice of hyperparameters such as the step size and the batch size $m$, including the single trajectory case (i.e., $m=1$). When an additional relaxed weak gradient domination assumption is available, we establish a novel global optimum convergence theory of PG with $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity. We then instantiate our theorems in different settings, where we both recover existing results and obtain improved sample complexity, e.g., $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity for the convergence to the global optimum for Fisher-non-degenerate parametrized policies., Comment: Accepted at AISTATS 2022. This version updates references and adds acknowledgement to Matteo Papini who greatly improved our work before the submission
- Published
- 2021
- Full Text
- View/download PDF
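The "ABC assumption" referenced in the abstract above is, in the form used in the related SGD literature (the paper's exact constants and notation may differ), a bound on the second moment of the stochastic policy gradient:

```latex
\mathbb{E}\big[\,\lVert \widehat{\nabla} J(\theta) \rVert^{2}\,\big]
  \;\le\; 2A\,\big(J^{*} - J(\theta)\big)
  \;+\; B\,\lVert \nabla J(\theta) \rVert^{2}
  \;+\; C,
\qquad A,\,B,\,C \ge 0 .
```

Roughly, $A = B = 0$ recovers the usual bounded-variance condition and $C = 0$ gives a relative-growth condition, which is why this assumption subsumes the more specific policy-space assumptions mentioned in the abstract.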
21. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
- Author
- Yarats, Denis, Fergus, Rob, Lazaric, Alessandro, and Pinto, Lerrel
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL. DrQ-v2 is conceptually simple, easy to implement, and provides significantly better computational footprint compared to prior work, with the majority of tasks taking just 8 hours to train on a single GPU. Finally, we publicly release DrQ-v2's implementation to provide RL practitioners with a strong and computationally efficient baseline.
- Published
- 2021
- Full Text
- View/download PDF
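DrQ-style agents such as the one in the entry above rely on simple augmentation of replayed image observations; the sketch below shows the pad-and-random-crop ("random shift") augmentation commonly used for this purpose. It is a generic illustration, not code from the DrQ-v2 release.

```python
import numpy as np

def random_shift(images, pad=4, rng=np.random.default_rng(0)):
    """Random-shift augmentation: replicate-pad each frame by `pad` pixels,
    then take a random crop of the original size.

    images: array of shape (batch, height, width, channels).
    """
    b, h, w, c = images.shape
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(images)
    for i in range(b):
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w, :]
    return out
```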
22. Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations
- Author
- Garcelon, Evrard, Avadhanula, Vashist, Lazaric, Alessandro, and Pirotta, Matteo
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased, \emph{evaluations} of the true reward of each arm and it selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated. First, we show a $\widetilde{O}(T^{2/3})$ regret in the general case when the observation functions are a generalized linear function of the true rewards. On the other hand, we show that an improved $\widetilde{O}(\sqrt{T})$ regret can be derived when the observation functions are noisy linear functions of the true rewards. Finally, we report an empirical validation that confirms our theoretical findings, provides a thorough comparison to alternative approaches, and further supports the interest of this setting in practice.
- Published
- 2021
- Full Text
- View/download PDF
23. Comparison Study of Upper Subcritical Limits Derived Using Sensitivity/Uncertainty Tools: Case Studies of Benchmarks and Applications
- Author
- Kristina Yancey Spencer, Jennifer Louise Alwin, Benjamin Murphy, Forrest B. Brown, and Matthew Lazaric
- Subjects
Comparison study, Sensitivity (control systems), Biological system, Mathematics
- Published
- 2020
24. Meta-learning with Stochastic Linear Bandits
- Author
- Cella, Leonardo, Lazaric, Alessandro, and Pontil, Massimiliano
- Subjects
Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm which works well on average over a class of bandit tasks, that are sampled from a task-distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
- Published
- 2020
- Full Text
- View/download PDF
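The biased-OFUL idea in the abstract above replaces the usual ridge pull toward zero with a pull toward a bias vector estimated across tasks. A minimal sketch of that regularized estimator, plus one plausible (illustrative, not necessarily the paper's) way to estimate the bias from previously solved tasks:

```python
import numpy as np

def biased_ridge(X, y, bias, lam=1.0):
    """Least squares regularized toward a bias vector:
        argmin_theta ||X theta - y||^2 + lam * ||theta - bias||^2
    Closed form: (X^T X + lam I)^{-1} (X^T y + lam * bias).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * bias)

def estimate_bias(task_solutions):
    """Illustrative bias estimate: the average of the per-task solutions seen
    so far (the paper studies two specific estimation strategies)."""
    return np.mean(np.stack(task_solutions), axis=0)
```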
25. A Provably Efficient Sample Collection Strategy for Reinforcement Learning
- Author
- Tarbouriech, Jean, Pirotta, Matteo, Valko, Michal, and Lazaric, Alessandro (Facebook AI Research Paris (FAIR); Scool, Inria Lille - Nord Europe, CRIStAL UMR 9189; DeepMind Paris)
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that (adaptively) prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., a simulator of the environment); 2) An "objective-agnostic" sample collection exploration strategy responsible for generating the prescribed samples as fast as possible. Building on recent methods for exploration in the stochastic shortest path problem, we first provide an algorithm that, given as input the number of samples $b(s,a)$ needed in each state-action pair, requires $\tilde{O}(B D + D^{3/2} S^2 A)$ time steps to collect the $B=\sum_{s,a} b(s,a)$ desired samples, in any unknown communicating MDP with $S$ states, $A$ actions and diameter $D$. Then we show how this general-purpose exploration algorithm can be paired with "objective-specific" strategies that prescribe the sample requirements to tackle a variety of settings -- e.g., model estimation, sparse reward discovery, goal-free cost-free exploration in communicating MDPs -- for which we obtain improved or novel sample complexity guarantees., Comment: NeurIPS 2021
- Published
- 2020
- Full Text
- View/download PDF
26. Learning Near Optimal Policies with Low Inherent Bellman Error
- Author
- Zanette, Andrea, Lazaric, Alessandro, Kochenderfer, Mykel, and Brunskill, Emma
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value iteration. First we relate this condition to other common frameworks and show that it is strictly more general than the low rank (or linear) MDP assumption of prior work. Second we provide an algorithm with a high probability regret bound $\widetilde O(\sum_{t=1}^H d_t \sqrt{K} + \sum_{t=1}^H \sqrt{d_t}\, I\, K)$ where $H$ is the horizon, $K$ is the number of episodes, $I$ is the inherent Bellman error and $d_t$ is the feature dimension at timestep $t$. In addition, we show that the result is unimprovable beyond constants and logs by showing a matching lower bound. This has two important consequences: 1) it shows that exploration is possible using only \emph{batch assumptions} with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting. Finally, the algorithm reduces to the celebrated \textsc{LinUCB} when $H=1$ but with a different choice of the exploration parameter that allows handling misspecified contextual linear bandits. While computational tractability questions remain open for the MDP setting, this enriches the class of MDPs with a linear representation for the action-value function where statistically efficient reinforcement learning is possible., Comment: Bug fixes in appendix; appears in ICML 2020
- Published
- 2020
- Full Text
- View/download PDF
27. Sketched Newton-Raphson
- Author
- Rui Yuan, Alessandro Lazaric, and Robert M. Gower
- Subjects
Optimization and Control (math.OC), Applied Mathematics, G.1.6, FOS: Mathematics, Mathematics - Numerical Analysis, Numerical Analysis (math.NA), 58C15, 90C06, 90C53, 62L20, 46N10, 46N40, 49M15, 68W20, 68W40, 65Y20, Mathematics - Optimization and Control, Software, Theoretical Computer Science
- Abstract
We propose a new globally convergent stochastic second order method. Our starting point is the development of a new Sketched Newton-Raphson (SNR) method for solving large scale nonlinear equations of the form $F(x)=0$ with $F:\mathbb{R}^p \rightarrow \mathbb{R}^m$. We then show how to design several stochastic second order optimization methods by re-writing the optimization problem of interest as a system of nonlinear equations and applying SNR. For instance, by applying SNR to find a stationary point of a generalized linear model (GLM), we derive completely new and scalable stochastic second order methods. We show that the resulting method is very competitive as compared to state-of-the-art variance reduced methods. Furthermore, using a variable splitting trick, we also show that the Stochastic Newton method (SNM) is a special case of SNR, and use this connection to establish the first global convergence theory of SNM. We establish the global convergence of SNR by showing that it is a variant of the stochastic gradient descent (SGD) method, and then leveraging proof techniques of SGD. As a special case, our theory also provides a new global convergence theory for the original Newton-Raphson method under strictly weaker assumptions as compared to the classic monotone convergence theory., Comment: Accepted for SIAM Journal on Optimization. 47 pages, 4 figures
- Published
- 2020
- Full Text
- View/download PDF
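A minimal numerical sketch of a sketch-and-project Newton-Raphson step consistent with the abstract above: the Newton system is compressed by a random sketching matrix and the least-norm solution of the compressed system is used as the step. The sketch distribution and the exact update form here follow the general idea rather than the paper's pseudocode.

```python
import numpy as np

def snr_step(x, F, jac, sketch_dim, rng):
    """One Sketched Newton-Raphson step for F(x) = 0 with F: R^p -> R^m.

    F:   callable returning F(x), shape (m,)
    jac: callable returning the Jacobian DF(x), shape (m, p)
    The full Newton system DF(x) dx = -F(x) is compressed with a Gaussian
    sketch S of shape (m, sketch_dim); dx is the least-norm solution of the
    sketched system S^T DF(x) dx = -S^T F(x).
    """
    Fx, J = F(x), jac(x)
    m = Fx.shape[0]
    S = rng.standard_normal((m, sketch_dim)) / np.sqrt(sketch_dim)
    SJ = S.T @ J                     # (sketch_dim, p)
    rhs = S.T @ Fx                   # (sketch_dim,)
    dx = -SJ.T @ np.linalg.pinv(SJ @ SJ.T) @ rhs
    return x + dx
```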
28. Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization
- Author
- Kamienny, Pierre-Alexandre, Pirotta, Matteo, Lazaric, Alessandro, Lavril, Thibault, Usunier, Nicolas, and Denoyer, Ludovic
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, 68T99, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments, where the task may change over time. While RNN-based policies could in principle represent such strategies, in practice their training time is prohibitive and the learning process often converges to poor solutions. In this paper, we consider the case where the agent has access to a description of the task (e.g., a task id or task parameters) at training time, but not at test time. We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task. This dramatically reduces the sample complexity of training RNN-based policies, without losing their representational power. As a result, our method learns exploration strategies that efficiently balance between gathering information about the unknown and changing task and maximizing the reward over time. We test the performance of our algorithm in a variety of environments where tasks may vary within each episode., Comment: 18 pages
- Published
- 2020
- Full Text
- View/download PDF
29. Concentration Inequalities for Multinoulli Random Variables
- Author
- Qian, Jian, Fruit, Ronan, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics::Applications, Statistics - Machine Learning, Computer Science::Sound, Statistics::Methodology, Machine Learning (stat.ML), Mathematics::Spectral Theory, Statistics::Computation, Machine Learning (cs.LG)
- Abstract
We investigate concentration inequalities for Dirichlet and Multinomial random variables., Comment: Tutorial at ALT'19 on Regret Minimization in Infinite-Horizon Finite Markov Decision Processes
- Published
- 2020
- Full Text
- View/download PDF
30. Improved Algorithms for Conservative Exploration in Bandits
- Author
- Alessandro Lazaric, Evrard Garcelon, Mohammad Ghavamzadeh, and Matteo Pirotta
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Digital marketing, Computer science, business.industry, Process (engineering), Regret, Robotics, Machine Learning (stat.ML), General Medicine, Recommender system, Machine Learning (cs.LG), Constraint (information theory), Statistics - Machine Learning, Production (economics), Artificial intelligence, Baseline (configuration management), business, Algorithm
- Abstract
In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
- Published
- 2020
- Full Text
- View/download PDF
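The conservative constraint discussed in the entry above can be summarized in a few lines: play the optimistic (e.g., LinUCB) action only when a lower confidence bound guarantees that cumulative performance stays within a (1 - alpha) fraction of the baseline policy's cumulative reward, and fall back to the baseline otherwise. The snippet is a generic sketch of that check, not CLUCB2 itself.

```python
def conservative_select(optimistic_action, baseline_action,
                        lcb_cum_if_optimistic, baseline_cum_reward, alpha=0.1):
    """Generic conservative-exploration rule: the optimistic action is played
    only when a lower confidence bound on the resulting cumulative reward is
    at least (1 - alpha) times the baseline's cumulative reward; otherwise
    the baseline action is played."""
    if lcb_cum_if_optimistic >= (1.0 - alpha) * baseline_cum_reward:
        return optimistic_action   # exploration certified safe enough
    return baseline_action
```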
31. Active Model Estimation in Markov Decision Processes
- Author
- Tarbouriech, Jean, Shekhar, Shubhanshu, Pirotta, Matteo, Ghavamzadeh, Mohammad, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics, and provide its sample complexity analysis. While this algorithm enjoys strong guarantees in the large-sample regime, it tends to have a poor performance in early stages of exploration. To address this issue, we propose an algorithm that is based on maximum weighted entropy, a heuristic that stems from common sense and our theoretical analysis. The main idea here is to cover the entire state-action space with the weight proportional to the noise in the transitions. Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm.
- Published
- 2020
- Full Text
- View/download PDF
32. Improved Analysis of UCRL2 with Empirical Bernstein Inequality
- Author
- Fruit, Ronan, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D\Gamma S A T})$., Comment: Document in support of the tutorial at ALT 2019
- Published
- 2020
- Full Text
- View/download PDF
33. Chapter 8 Learning a New Ecology of Space and Looking for New Routines: Experimenting Robotics in a Surgical Team
- Author
- Lea Kiwan and Nathalie Lazaric
- Subjects
Surgical team, business.industry, Computer science, Ecology, media_common.quotation_subject, Ecology (disciplines), Performative utterance, Robotics, Artifact (software development), Space (commercial competition), Interdependence, Artificial intelligence, business, Ostensive definition, media_common
- Abstract
Members of an organization facing change often struggle to adapt and may create new routines. Drawing on insights from a case study of bariatric robotic surgery, the authors illustrate how a new ecology of space transforms the ostensive and performative aspect of a routine during the introduction of a new technological artifact. The authors discuss two types of space: experimental and reflective. The authors show that the reflective space through debriefings enables practitioners to discuss the new patterns of interdependent actions. Practitioners explore the different aspects of the performative struggle with new artifacts and try to integrate new actions and delineate the boundaries of this change during experimental performances. The findings of this study throw light on the role of the reflective space in addition to the experimental space in routine change, and suggest that socio-material ensembles can produce opportunities for reshaping routines.
- Published
- 2019
34. A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning
- Author
- Carion, Nicolas, Synnaeve, Gabriel, Lazaric, Alessandro, and Usunier, Nicolas
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Science - Multiagent Systems, Machine Learning (stat.ML), Machine Learning (cs.LG), Multiagent Systems (cs.MA)
- Abstract
Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model. We propose different combinations of inference procedures and scoring models able to represent coordination patterns of increasing complexity. The resulting assignment policy can be efficiently learned on small problem instances and readily reused in problems with more agents and tasks (i.e., zero-shot generalization). We report experimental results on a toy search and rescue problem and on several target selection scenarios in StarCraft: Brood War, in which our model significantly outperforms strong rule-based baselines on instances with 5 times more agents and tasks than those seen during training.
- Published
- 2019
- Full Text
- View/download PDF
35. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
- Author
- Zanette, Andrea, Brandfonbrener, David, Brunskill, Emma, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where exploration is induced by perturbing the least-squares approximation of the action-value function. Under the assumption that the Markov decision process has low-rank transition dynamics, we prove that the frequentist regret of RLSVI is upper-bounded by $\widetilde O(d^2 H^2 \sqrt{T})$ where $d$ is the feature dimension, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first frequentist regret analysis for randomized exploration with function approximation., Comment: AISTATS 2020; minor bug fix
- Published
- 2019
- Full Text
- View/download PDF
36. Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
- Author
- Calandriello, Daniele, Carratino, Luigi, Lazaric, Alessandro, Valko, Michal, and Rosasco, Lorenzo (Istituto Italiano di Tecnologia (IIT); DIBRIS, University of Genoa; Facebook; SequeL, Inria Lille - Nord Europe, CRIStAL UMR 9189)
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, sparse Gaussian process optimization, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, kernelized linear bandits, variance starvation, sketching, regret, Machine Learning (stat.ML), black-box optimization, Bayesian optimization, Machine Learning (cs.LG)
- Abstract
Gaussian processes (GP) are a well studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(TAd_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity., Comment: Accepted at COLT 2019. Corrected typos and improved comparison with existing methods
- Published
- 2019
- Full Text
- View/download PDF
37. Determinants of energy tracking application use at the city level: Evidence from France
- Author
- Jackie Krafft, Nathalie Lazaric, Amel Attour, and Marco Baudino (GREDEG, Université Nice Sophia Antipolis, Université Côte d'Azur, CNRS; GFI)
- Subjects
JEL: R - Urban, Rural, Regional, Real Estate, and Transportation Economics/R.R2 - Household Analysis, 020209 energy, Energy (esotericism), Ordered probit, 02 engineering and technology, 010501 environmental sciences, Management, Monitoring, Policy and Law, Affect (psychology), 01 natural sciences, 7. Clean energy, Home automation, Smart city, 11. Sustainability, 0202 electrical engineering, electronic engineering, information engineering, Digitization, 0105 earth and related environmental sciences, business.industry, JEL: R - Urban, Rural, Regional, Real Estate, and Transportation Economics/R.R1 - General Regional Economics/R.R1.R11 - Regional Economic Activity: Growth, Development, Environmental Issues, and Changes, Environmental economics, [SHS.ECO]Humanities and Social Sciences/Economics and Finance, General Energy, 13. Climate action, JEL: Q - Agricultural and Natural Resource Economics • Environmental and Ecological Economics/Q.Q3 - Nonrenewable Resources and Conservation/Q.Q3.Q33 - Resource Booms, JEL: R - Urban, Rural, Regional, Real Estate, and Transportation Economics/R.R2 - Household Analysis/R.R2.R20 - General, Level evidence, Tracking (education), business
- Abstract
This paper investigates the determinants of smart energy tracking app usage by citizens residing in French cities. Our framework is inspired by the extant strands of literature on smart cities and smart home technology adoption, but also contributes to them, as smart energy applications reveal specificities that need to be incorporated; the latter include, for instance, the distinction between adoption and frequency of use, or the consideration of additional determinants such as privacy or environmental concerns. For our study, we build an original survey and rely upon citizen-level data, testing a Zero-Inflated Ordered Probit (ZIOP) model which allows us to differentiate between adoption of the smart energy app and its frequency of utilisation. Our empirical findings reveal how the drivers related to smart city characteristics mainly affect the decision of adoption of energy tracking apps. Conversely, the more individual characteristics related to the perceived benefits of using energy tracking apps, dwelling type, and privacy concerns, primarily affect the frequency of utilisation. Our results bear policy implications on the issue of privacy, premising additional research on energy challenges in the utilization of energy apps in smart versus non-smart environments.
- Published
- 2020
38. Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
- Author
-
Qian, Jian, Fruit, Ronan, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses. SCAL$^+$ is a variant of SCAL (Fruit et al., 2018) that performs efficient exploration-exploitation in any unknown weakly-communicating MDP for which an upper bound $C$ on the span of the optimal bias function is known. For an MDP with $S$ states, $A$ actions and $\Gamma \leq S$ possible next states, we prove that SCAL$^+$ achieves the same theoretical guarantees as SCAL (i.e., a high-probability regret bound of $\widetilde{O}(C\sqrt{\Gamma SAT})$) with a much smaller computational complexity. Similarly, C-SCAL$^+$ exploits an exploration bonus to achieve sublinear regret in any undiscounted MDP with continuous state space. We show that C-SCAL$^+$ achieves the same regret bound as UCCRL (Ortner and Ryabko, 2012) while being the first implementable algorithm with regret guarantees in this setting. While optimistic algorithms such as UCRL, SCAL or UCCRL maintain a high-confidence set of plausible MDPs around the true unknown MDP, SCAL$^+$ and C-SCAL$^+$ leverage an exploration bonus to plan directly on the empirically estimated MDP, and are therefore more computationally efficient.
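A minimal sketch of the bonus-based planning idea follows. It is an assumption-laden illustration rather than SCAL$^+$ itself: it uses discounted value iteration instead of the span-constrained average-reward planner, and a generic $1/\sqrt{N(s,a)}$ bonus scaled by the known span bound.

```python
import numpy as np

def bonus_value_iteration(P_hat, R_hat, N, span_bound, gamma=0.99, iters=500):
    """P_hat: (S, A, S) empirical transitions, R_hat: (S, A) empirical rewards,
    N: (S, A) visit counts, span_bound: known bound C on the optimal bias span."""
    # Optimism is paid directly in the reward: bonus of order (C + r_max) / sqrt(N).
    bonus = (span_bound + 1.0) / np.sqrt(np.maximum(N, 1))
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R_hat + bonus + gamma * P_hat @ V    # (S, A) Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

# Toy usage on a random 4-state, 2-action empirical MDP (illustrative only).
rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(4, 2, 4))
P_hat = counts / counts.sum(axis=2, keepdims=True)
R_hat = rng.random((4, 2))
policy, V = bonus_value_iteration(P_hat, R_hat, counts.sum(axis=2), span_bound=5.0)
print("greedy policy:", policy)
```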
- Published
- 2018
- Full Text
- View/download PDF
39. Thompson Sampling for Linear-Quadratic Control Problems
- Author
-
Abeille, Marc, Lazaric, Alessandro, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), and ANR-14-CE24-0010,ExTra-Learn,Extraction et transfert de connaissances dans l'apprentissage par renforcement(2014)
- Subjects
FOS: Computer and information sciences ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Statistics - Machine Learning ,Machine Learning (stat.ML) - Abstract
International audience; We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics are linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS), a.k.a. posterior sampling for reinforcement learning, in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite its empirical and theoretical success in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters against the frequency of switches in the control policy. This results in an overall regret of $O(T^{2/3})$, which is significantly worse than the $O(\sqrt{T})$ regret achieved by the optimism-in-the-face-of-uncertainty algorithm in LQ control problems.
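The trade-off discussed above can be seen in a toy version of the Thompson-sampling loop. The sketch below assumes a scalar LQ system, a Gaussian posterior maintained by least squares, and a fixed resampling period; none of these choices come from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A_true, B_true, Q, R, noise = 0.9, 0.5, 1.0, 1.0, 0.1
V, b = np.eye(2), np.zeros(2)          # Gaussian posterior over theta = (A, B)
x, K = 0.0, 0.0

for t in range(200):
    if t % 10 == 0:                    # infrequent resampling = infrequent policy switches
        while True:
            Ah, Bh = rng.multivariate_normal(np.linalg.solve(V, b), np.linalg.inv(V))
            if abs(Bh) > 0.05:         # crude rejection to keep the sampled system controllable
                break
        P = solve_discrete_are(np.array([[Ah]]), np.array([[Bh]]),
                               np.array([[Q]]), np.array([[R]]))[0, 0]
        K = (Bh * P * Ah) / (R + Bh * P * Bh)      # scalar LQR gain for the sampled model
    u = -K * x
    x_next = A_true * x + B_true * u + noise * rng.normal()
    z = np.array([x, u])               # regressor for the least-squares posterior update
    V += np.outer(z, z)
    b += z * x_next
    x = x_next
print("final gain:", K)
```

Resampling more often would keep the optimism fresh but multiply policy switches; resampling rarely does the opposite. Balancing the two is exactly the tension behind the $O(T^{2/3})$ rate discussed in the abstract.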
- Published
- 2017
- Full Text
- View/download PDF
40. The new challenges of organizing intellectual property in complex industries: A discussion based on the case of Thales
- Author
-
Michel Callois, Liliana Mitkova, Cécile Ayerbe, and Nathalie Lazaric
- Subjects
business.industry ,Management of Technology and Innovation ,General Engineering ,Context (language use) ,Business ,Intellectual property ,Marketing ,Division of labour ,Industrial organization ,Outsourcing - Abstract
The defence industries in France and elsewhere have, in recent years, undergone important technological, organizational and institutional changes that have profoundly altered their architectures. These changes have introduced a new division of labour, bringing new opportunities for interaction and leading to the creation of additional assets. In this context, the issue of protecting innovations and exploiting them has become central. Managing Intellectual Property Rights (IPR) requires industrial groups to draw on additional capabilities. This article analyzes these evolutions and focuses in particular on the new organizational arrangements that have accompanied them. Using the case of Thales, which in 2005 outsourced its Intellectual Property (IP), we answer questions such as: why should IP be outsourced; how should the outsourcing of IP activities be organized; and how should the capabilities involved in this new organizational arrangement be managed? These issues lie at the centre of this research and illustrate new challenges inherent in in-house and outsourced IPR management strategies.
- Published
- 2014
41. Reinforcement Learning of POMDPs using Spectral Methods
- Author
-
Azizzadenesheli, Kamyar, Lazaric, Alessandro, Anandkumar, Animashree, University of California [Irvine] (UC Irvine), University of California (UC), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), ANR-14-CE24-0010,ExTra-Learn,Extraction et transfert de connaissances dans l'apprentissage par renforcement(2014), University of California [Irvine] (UCI), and University of California
- Subjects
FOS: Computer and information sciences ,Computer Science::Machine Learning ,math.OC ,Computer Science - Artificial Intelligence ,cs.LG ,Computer Science - Numerical Analysis ,Machine Learning (stat.ML) ,Numerical Analysis (math.NA) ,cs.AI ,Latent Variable Model ,stat.ML ,Machine Learning (cs.LG) ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Optimization and Control (math.OC) ,Statistics - Machine Learning ,FOS: Mathematics ,Spectral Methods ,Upper Confidence Reinforcement Learning ,Mathematics - Optimization and Control ,cs.NA ,Method of Moments ,Partially Observable Markov Decision Pro-cess - Abstract
International audience; We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have previously been employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging, since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm that runs in episodes: in each episode, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy that maximizes the expected reward under the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of the observation and action spaces.
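To make the "spectral techniques within an episode" step concrete, here is a hedged sketch of one ingredient: building the empirical multi-view moments of consecutive observation triples, grouped by the action taken at the middle step. The trajectory below is random placeholder data, and the tensor decomposition that recovers the POMDP parameters from these moments is deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_act, T = 5, 2, 10_000
obs = rng.integers(0, n_obs, size=T)        # stand-in trajectory of observations
act = rng.integers(0, n_act, size=T)        # stand-in trajectory of actions

def one_hot(idx, n):
    e = np.zeros(n)
    e[idx] = 1.0
    return e

# Second-order moments M2[a] = E[o_{t-1} o_{t+1}^T | a_t = a] and third-order
# moments M3[a] = E[o_{t-1} (x) o_t (x) o_{t+1} | a_t = a], estimated empirically.
M2 = np.zeros((n_act, n_obs, n_obs))
M3 = np.zeros((n_act, n_obs, n_obs, n_obs))
counts = np.zeros(n_act)
for t in range(1, T - 1):
    a = act[t]
    v1, v2, v3 = one_hot(obs[t - 1], n_obs), one_hot(obs[t], n_obs), one_hot(obs[t + 1], n_obs)
    M2[a] += np.outer(v1, v3)
    M3[a] += np.einsum('i,j,k->ijk', v1, v2, v3)
    counts[a] += 1
M2 /= counts[:, None, None]
M3 /= counts[:, None, None, None]
print("per-action second-order moment for action 0:\n", np.round(M2[0], 3))
```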
- Published
- 2016
42. XIX. Sidney G. Winter. Évolutionnisme et management de l’innovation
- Author
-
Nathalie Lazaric
- Published
- 2016
43. Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting
- Author
-
Calandriello, Daniele, Lazaric, Alessandro, and Valko, Michal
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Computer Science - Data Structures and Algorithms ,Machine Learning (stat.ML) ,Data Structures and Algorithms (cs.DS) ,Machine Learning (cs.LG) - Abstract
We derive a new proof showing that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability. We rigorously take into account the dependencies across subsequent resparsifications using martingale inequalities, fixing a flaw in the original analysis.
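For context, the resparsification primitive analysed in the paper boils down to effective-resistance (leverage score) sampling of edges. The snippet below is an illustrative, non-streaming sketch on a random graph: it computes effective resistances via the pseudo-inverse of the Laplacian (the streaming setting would use fast approximations instead) and keeps each edge with probability proportional to its leverage, reweighting survivors so the sparsified Laplacian is unbiased in expectation. The oversampling constant `c` is an assumption.

```python
import numpy as np

def effective_resistances(n, edges, weights):
    """Effective resistance of each edge from the pseudo-inverse of the Laplacian."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    return np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])

rng = np.random.default_rng(0)
n = 30
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3]
weights = rng.uniform(0.5, 2.0, size=len(edges))

R = effective_resistances(n, edges, weights)
c = 4.0                                           # oversampling constant (assumed)
p = np.minimum(1.0, c * np.log(n) * weights * R)  # leverage-score sampling probabilities
keep = rng.random(len(edges)) < p
new_weights = weights[keep] / p[keep]             # importance reweighting of kept edges
print(f"kept {keep.sum()} of {len(edges)} edges")
```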
- Published
- 2016
- Full Text
- View/download PDF
44. LQG for Portfolio Optimization
- Author
-
Alessandro Lazaric, Xavier Brokmann, Marc Abeille, and Emmanuel Sérié
- Subjects
Computer Science::Computer Science and Game Theory ,Mathematical optimization ,Computer science ,Solver ,Linear-quadratic-Gaussian control ,Optimal control ,FOS: Economics and business ,Portfolio Management (q-fin.PM) ,Control theory ,State space ,Uniqueness ,Observability ,Portfolio optimization ,Quantitative Finance - Portfolio Management - Abstract
We introduce a generic solver for dynamic portfolio allocation problems when the market exhibits return predictability, price impact and partial observability. We assume that the price model can be encoded in a linear state-space representation, and we demonstrate how the problem then falls into the LQG framework. We derive the optimal control policy and introduce analytical tools that preserve the intelligibility of the solution. Furthermore, we link the existence and uniqueness of the optimal controller to a dynamic no-arbitrage criterion. Finally, we illustrate our method on a synthetic portfolio allocation problem., Comment: 20 pages, 6 figures, submitted to Quantitative Finance
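The LQG machinery the abstract refers to can be sketched generically: a static LQR gain from the control Riccati equation applied to a Kalman-filtered state estimate (certainty equivalence). The matrices below are made-up stand-ins; the paper's actual mapping from return predictability and price impact to the state-space model is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A = np.array([[0.95, 0.1], [0.0, 0.8]])     # latent factor dynamics (assumed)
B = np.array([[0.0], [1.0]])                # how the control moves the state (assumed)
C = np.array([[1.0, 0.0]])                  # partial observation of the state
Q, R = np.eye(2), 0.1 * np.eye(1)           # quadratic state and control (trading) costs
W, Vn = 0.01 * np.eye(2), 0.05 * np.eye(1)  # process and observation noise covariances

# LQR gain from the control Riccati equation, Kalman gain from the dual (estimation) one.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
S = solve_discrete_are(A.T, C.T, W, Vn)
L = S @ C.T @ np.linalg.inv(C @ S @ C.T + Vn)

x, x_hat = np.zeros(2), np.zeros(2)
for t in range(100):
    u = -K @ x_hat                                   # certainty-equivalent control
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
    y = C @ x + rng.multivariate_normal(np.zeros(1), Vn)
    x_hat = A @ x_hat + B @ u                        # Kalman predict step
    x_hat = x_hat + L @ (y - C @ x_hat)              # Kalman correct step with the innovation
print("LQR gain:", K, "\nKalman gain:", L.ravel())
```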
- Published
- 2016
45. Overcoming inertia: insights from evolutionary economics into improved energy and climate policies
- Author
-
Kevin Maréchal and Nathalie Lazaric
- Subjects
Atmospheric Science ,Global and Planetary Change ,business.industry ,Economic framework ,media_common.quotation_subject ,Energy (esotericism) ,Environmental resource management ,Mainstream economics ,Climate change ,Environmental Science (miscellaneous) ,Management, Monitoring, Policy and Law ,Climate policy ,Individual level ,Inertia ,Microeconomics ,Economics ,Evolutionary economics ,business ,media_common - Abstract
The ‘efficiency paradox’ has generated controversy and suggests that mainstream economics is not neutral in the way it deals with climate change. An alternative economic framework, evolutionary economics, is used to investigate this crucial issue and offer insights into the development of a complementary framework for designing climate policy and for managing the transition to a low-carbon society. The evolutionary framework allows us to identify the presence of two sources of inertia (i.e. at the individual level through ‘habits’ and at the level of socio-technical systems) that mutually reinforce each other in a path-dependent manner. To overcome ‘carbon lock-in’, decision-makers should design measures (e.g. commitment strategies, niche management) that specifically target those change-resisting factors, as they tend to reduce the efficiency of traditional instruments. A series of recommendations for policy-makers is provided.
- Published
- 2010
46. La nouvelle architecture de l’industrie de la Défense en France
- Author
-
Sylvie Rochhia, Nathalie Lazaric, and Valérie Mérindol
- Abstract
The architecture of the defence industry is structured around two key actors that interact closely in the development of complex programmes: systems-integrator firms and the State, which acts as both client and contracting authority for the programmes. During the 1990s and 2000s, technological and institutional changes affected the division of labour and of knowledge. Qualitative interviews conducted between 2000 and 2008 allowed us to observe the origins of these transformations. This article highlights new forms of co-specialisation of assets between public and private partners within the national innovation system. More precisely, it shows how, within this new industrial architecture, the role and competences of the Délégation Générale pour l'Armement (DGA) have evolved from that of "architect contracting authority" to that of "contracting authority for interfaces".
- Published
- 2009
47. Gatekeepers of Knowledge versus Platforms of Knowledge: From Potential to Realized Absorptive Capacity
- Author
-
Nathalie Lazaric, Catherine Thomas, Christian Longhi, Groupe de Recherche en Droit, Economie et Gestion (GREDEG), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), CNRS, Centre National de la Recherche Scientifique (CNRS), and COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Knowledge management ,0211 other engineering and technologies ,02 engineering and technology ,Sociology & anthropology ,High technology Cluster ,Sophia Antipolis ,Absorptive capacity ,0502 economics and business ,Social Sciences & Humanities ,Sociology ,Social sciences, sociology, anthropology ,ComputingMilieux_MISCELLANEOUS ,General Environmental Science ,Systems of innovation ,Sozialwissenschaften, Soziologie ,business.industry ,05 social sciences ,General Social Sciences ,021107 urban & regional planning ,Cognition ,Wirtschafts- und Sozialgeographie ,[SHS.ECO]Humanities and Social Sciences/Economics and Finance ,Economic and Social Geography ,Soziologie, Anthropologie ,Regional studies ,ddc:300 ,Knowledge Gatekeepers ,ddc:301 ,Sociology of Science, Sociology of Technology, Research on Science and Technology ,business ,Wissenschaftssoziologie, Wissenschaftsforschung, Technikforschung, Techniksoziologie ,050203 business & management ,Externality - Abstract
Lazaric N., Longhi C. and Thomas C. Gatekeepers of knowledge versus platforms of knowledge: from potential to realized absorptive capacity, Regional Studies. The development of clusters rests on geographical proximity and cognitive interactions as well as entrepreneurial initiatives. Sophia Antipolis, a multi-technology cluster in Valbonne, France, is a good illustration of the kind of challenges local systems of innovation face in creating positive knowledge externalities. This paper shows that while the presence of 'gatekeepers of knowledge' can generate potential 'absorptive capacity', its effective realization requires additional effort to transfer knowledge into the cluster. The concept of 'platform of knowledge' defined here shows how a knowledge-codification project can generate externalities by creating new opportunities for effectively combining and absorbing knowledge.
- Published
- 2008
48. Capacités d’absorption et d’interaction : une étude de la coopération dans les PME françaises
- Author
-
Nathalie Lazaric and Frédéric Huet
- Subjects
Economics and Econometrics ,capacités d’absorption ,Political science ,coopération ,Industrial relations ,Learning ,Absorptive Capacities ,Cognitive Distance ,Humanities ,apprentissage ,innovation ,distance cognitive - Abstract
Cooperation is frequently considered a key lever for innovation in SMEs, particularly given their limited internal resources. Drawing on a survey of more than 600 French SMEs (combining quantitative and qualitative approaches), we identify the factors related to learning and innovation through cooperation. The econometric results first show that innovation through cooperation relies on the complementarity and articulation of organisational (absorptive) and inter-organisational (interaction) capacities. Second, exploration through cooperation relies less on monitoring and a deliberate strategy than on an active, continually updated search for learning opportunities, which leads to the evolution and redefinition of the boundaries and perimeter of the relationship.
- Published
- 2008
49. Obituary: Steven Klepper
- Author
-
Kenneth I. Carlaw, Luigi Orsenigo, Charles R. McCann, Nathalie Lazaric, Roberto Fontana, Elias Dinopoulos, Horst Hanusch, Uwe Cantner, and Andreas Pyka
- Subjects
Economics and Econometrics ,Entrepreneurship ,Philosophy ,Obituary ,General Business, Management and Accounting ,Management - Published
- 2013
50. Nouveaux enjeux d'organisation de la propriété intellectuelle dans les industries complexes: une discussion à partir du cas Thales
- Author
-
Cécile Ayerbe, Michel Callois, Liliana Mitkova, Nathalie Lazaric, Groupe de Recherche en Droit, Economie et Gestion (GREDEG), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS), Institut de Recherche en Gestion (IRG), Université Paris-Est Marne-la-Vallée (UPEM)-Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), ESIA, Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12)-Université Paris-Est Marne-la-Vallée (UPEM), Lazaric, Nathalie, and Université Nice Sophia Antipolis (1965 - 2019) (UNS)
- Subjects
Economics and Econometrics ,Organisation ,Industrial relations ,Intellectual Property, Complex Industries, Organisation, Outsourcing, Case Study ,Propriété intellectuelle ,Organisation,Externalisation,Etude de cas,Propriété intellectuelle,Industries Complexes ,Etude de cas ,[SHS.ECO] Humanities and Social Sciences/Economics and Finance ,Externalisation ,Industries Complexes ,[SHS.ECO]Humanities and Social Sciences/Economics and Finance - Abstract
The defence industry, in France as well as abroad, has in recent years undergone important technological, organizational and institutional changes. These changes have introduced a new division of labour that has brought about new opportunities for interaction. In this context, the question of protecting innovations and exploiting them has become central, all the more so as the management of Intellectual Property Rights (IPR) requires new competencies from industrial groups. This article analyses these evolutions and pays particular attention to the new organizational arrangements that have accompanied them. The case of Thales, which in 2005 outsourced much of the management of its intellectual property, is representative of these changes. The question is whether we are witnessing the emergence of a new division of labour with the creation of new co-specialized assets, or of a new form of spin-off specializing in protection activities. Our analysis shows that what is happening is less the emergence of a new form of co-specialized assets than that of "entrepreneurial spin-offs".
- Published
- 2011