Author: "Orseau, Laurent" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Orseau, Laurent"' showing total 153 results


1. Super-Exponential Regret for UCT, AlphaGo and Variants

Author: Orseau, Laurent and Munos, Remi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a 'polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.
Published: 2024

2. Learning Universal Predictors

Author: Grau-Moya, Jordi, Genewein, Tim, Hutter, Marcus, Orseau, Laurent, Delétang, Grégoire, Catt, Elliot, Ruoss, Anian, Wenliang, Li Kevin, Mattern, Christopher, Aitchison, Matthew, and Veness, Joel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks via leveraging meta-learning to its limits. We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns. We provide theoretical analysis of the UTM data generation processes and meta-training protocols. We conduct comprehensive experiments with neural architectures (e.g. LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
Comment: 32 pages, 11 figures
Published: 2024

3. Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

Author: Mehrabian, Abbas, Anand, Ankit, Kim, Hyunjik, Sonnerat, Nicolas, Balog, Matej, Comanici, Gheorghe, Berariu, Tudor, Lee, Andrew, Ruoss, Anian, Bulanova, Anna, Toyama, Daniel, Blackwell, Sam, Paredes, Bernardino Romera, Veličković, Petar, Orseau, Laurent, Lee, Joonkyung, Naredla, Anurag Murty, Precup, Doina, and Wagner, Adam Zsolt
Subjects: Computer Science - Artificial Intelligence, Computer Science - Discrete Mathematics, Computer Science - Machine Learning
Abstract: This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method. Using either method, by introducing a curriculum -- jump-starting the search for larger graphs using good graphs found at smaller sizes -- we improve the state-of-the-art lower bounds for several sizes. We also propose a flexible graph-generation environment and a permutation-invariant network architecture for learning to search in the space of graphs.
Comment: To appear in the proceedings of IJCAI 2024. First three authors contributed equally, last two authors made equal senior contribution
Published: 2023

4. Language Modeling Is Compression

Author: Delétang, Grégoire, Ruoss, Anian, Duquenne, Paul-Ambroise, Catt, Elliot, Genewein, Tim, Mattern, Christopher, Grau-Moya, Jordi, Wenliang, Li Kevin, Aitchison, Matthew, Orseau, Laurent, Hutter, Marcus, and Veness, Joel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Theory
Abstract: It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
Published: 2023
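
Illustration for entry 4 (not taken from the paper): the abstract's claim that a predictive model can be turned into a lossless compressor rests on the fact that an arithmetic coder driven by a model spends about -log2 p(symbol | history) bits per symbol. The sketch below only computes that ideal code length, and it assumes a very simple adaptive byte model with add-one (Laplace) smoothing rather than a language model. The function name `ideal_code_length_bits` is hypothetical.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Ideal code length (in bits) of `data` under an adaptive byte model.

    Assumption for illustration: the predictor is an order-0 model with
    add-one smoothing over the 256 byte values. An arithmetic coder driven
    by this predictor would need roughly this many bits (plus a small
    constant overhead).
    """
    counts = Counter()
    total_bits = 0.0
    for t, symbol in enumerate(data):
        # Predictive probability of the next byte given the t bytes seen so far.
        p = (counts[symbol] + 1) / (t + 256)
        total_bits -= math.log2(p)
        # Update the model only after "coding" the symbol, as a decoder would.
        counts[symbol] += 1
    return total_bits


if __name__ == "__main__":
    sample = b"abracadabra" * 50
    bits = ideal_code_length_bits(sample)
    print(f"raw: {8 * len(sample)} bits, model: {bits:.1f} bits")
```

A better predictor assigns higher probability to what actually comes next and therefore yields a shorter code, which is the sense in which prediction quality and compression rate are two views of the same quantity.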

5. Line Search for Convex Minimization

Author: Orseau, Laurent and Hutter, Marcus
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: Golden-section search and bisection search are the two main principled algorithms for 1d minimization of quasiconvex (unimodal) functions. The first one only uses function queries, while the second one also uses gradient queries. Other algorithms exist under much stronger assumptions, such as Newton's method. However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity. We propose two such algorithms: $\Delta$-Bisection is a variant of bisection search that uses (sub)gradient information and convexity to speed up convergence, while $\Delta$-Secant is a variant of golden-section search and uses only function queries. While bisection search reduces the $x$ interval by a factor 2 at every iteration, $\Delta$-Bisection reduces the (sometimes much) smaller $x^*$-gap $\Delta^x$ (the $x$ coordinates of $\Delta$) by at least a factor 2 at every iteration. Similarly, $\Delta$-Secant also reduces the $x^*$-gap by at least a factor 2 every second function query. Moreover, the $y^*$-gap $\Delta^y$ (the $y$ coordinates of $\Delta$) also provides a refined stopping criterion, which can also be used with other algorithms. Experiments on a few convex functions confirm that our algorithms are always faster than their quasiconvex counterparts, often by more than a factor 2. We further design a quasi-exact line search algorithm based on $\Delta$-Secant. It can be used with gradient descent as a replacement for backtracking line search, for which some parameters can be finicky to tune -- and we provide examples to this effect, on strongly-convex and smooth functions. We provide convergence guarantees, and confirm the efficiency of quasi-exact line search on a few single- and multivariate convex functions.
Published: 2023
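
Illustration for entry 5 (not taken from the paper): the abstract names golden-section search as the classical baseline that uses only function queries on a unimodal function. The sketch below is a minimal version of that textbook baseline, not the $\Delta$-Bisection or $\Delta$-Secant algorithms the paper proposes. The function name, the `tol` parameter, and the stopping rule on interval width are assumptions made for the example.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [lo, hi].

    Uses only function queries. Each iteration shrinks the bracketing
    interval by the factor 1/phi (about 0.618) at the cost of one new query.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    # Two interior probe points placed symmetrically in [a, b].
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimizer lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimizer lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


if __name__ == "__main__":
    x_min = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
    print(x_min)
```

The stopping rule here looks only at the width of the $x$ interval; the abstract's point is that convexity allows a tighter gap and a refined stopping criterion, which this baseline does not exploit.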

6. Levin Tree Search with Context Models

Author: Orseau, Laurent, Hutter, Marcus, and Lelis, Levi H. S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories -- guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
Published: 2023

7. Memory-Based Meta-Learning on Non-Stationary Distributions

Author: Genewein, Tim, Delétang, Grégoire, Ruoss, Anian, Wenliang, Li Kevin, Catt, Elliot, Dutordoir, Vincent, Grau-Moya, Jordi, Orseau, Laurent, Hutter, Marcus, and Veness, Joel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.
Published: 2023

8. Isotuning With Applications To Scale-Free Online Learning

Author: Orseau, Laurent and Hutter, Marcus
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast algorithms that depend on as few parameters as possible, in particular they should be anytime and thus not depend on the time horizon. Our first and main tool, isotuning, is a generalization of the idea of designing adaptive learning rates that balance the trade-off of the regret. We provide a simple and versatile theorem that can be applied to a wide range of settings, and competes with the best balancing in hindsight within a factor 2. The second tool is an online correction, which allows us to obtain centered bounds for many algorithms, to prevent the regret bounds from being vacuous when the domain is overly large or only partially constrained. The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret, or even invalid updates. We develop a general theory to combine all these tools and apply it to several standard algorithms. In particular, we (almost entirely) restore the adaptivity to small losses of FTRL for unbounded domains, design and prove scale-free adaptive guarantees for a variant of Mirror Descent (at least when the Bregman divergence is convex in its second argument), extend Adapt-ML-Prod to scale-free guarantees, and provide several additional contributions about Prod, AdaHedge, BOA and Soft-Bayes.
Published: 2021

9. Proving Theorems using Incremental Learning and Hindsight Experience Replay

Author: Aygün, Eser, Orseau, Laurent, Anand, Ankit, Glorot, Xavier, Firoiu, Vlad, Zhang, Lei M., Precup, Doina, and Mourad, Shibl
Subjects: Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science, I.2.3
Abstract: Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in literature either depend on these traditional provers to bootstrap themselves or fall short on reaching comparable performance. In this paper, we propose a general incremental learning algorithm for training domain specific provers for first-order logic without equality, based only on a basic given-clause algorithm, but using a learned clause-scoring function. Clauses are represented as graphs and presented to transformer networks with spectral features. To address the sparsity and the initial lack of training data as well as the lack of a natural curriculum, we adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found. We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset in terms of both quantity and quality of the proofs.
Comment: 16 pages, 2 figures
Published: 2021

10. Goal Misgeneralization in Deep Reinforcement Learning

Author: Langosco, Lauro, Koch, Jack, Sharkey, Lee, Pfau, Jacob, Orseau, Laurent, and Krueger, David
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We formalize this distinction between capability and goal generalization, provide the first empirical demonstrations of goal misgeneralization, and present a partial characterization of its causes.
Comment: Published in ICML 2022. 9 Pages
Published: 2021

11. Policy-Guided Heuristic Search with Guarantees

Author: Orseau, Laurent and Lelis, Levi H. S.
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relates to both the quality of the heuristic and of the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time in all three domains tested.
Published: 2021

12. Training a First-Order Theorem Prover from Synthetic Data

Author: Firoiu, Vlad, Aygun, Eser, Anand, Ankit, Ahmed, Zafarali, Glorot, Xavier, Orseau, Laurent, Zhang, Lei, Precup, Doina, and Mourad, Shibl
Subjects: Computer Science - Artificial Intelligence
Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-based prover. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps, and shows significant transfer to the unseen human-written theorems from the TPTP library, where it solves 72% of first-order problems without equality.
Published: 2021

13. Avoiding Side Effects By Considering Future Tasks

Author: Krakovna, Victoria, Orseau, Laurent, Ngo, Richard, Martic, Miljan, and Legg, Shane
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
Comment: Published in NeurIPS 2020
Published: 2020

14. Logarithmic Pruning is All You Need

Author: Orseau, Laurent, Hutter, Marcus, and Rivasplata, Omar
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network. An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network. This latter result, however, relies on a number of strong assumptions and guarantees a polynomial factor on the size of the large network compared to the target function. In this work, we remove the most limiting assumptions of this previous work while providing significantly tighter bounds: the overparameterized network only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork.
Comment: NeurIPS 2020
Published: 2020

15. Learning to Prove from Synthetic Theorems

Author: Aygün, Eser, Ahmed, Zafarali, Anand, Ankit, Firoiu, Vlad, Glorot, Xavier, Orseau, Laurent, Precup, Doina, and Mourad, Shibl
Subjects: Computer Science - Logic in Computer Science, Computer Science - Machine Learning, I.2.3
Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task.
Comment: 17 pages, 6 figures, submitted to NeurIPS 2020
Published: 2020

16. Pitfalls of learning a reward function online

Author: Armstrong, Stuart, Leike, Jan, Orseau, Laurent, and Legg, Shane
Subjects: Computer Science - Artificial Intelligence
Abstract: In some agent designs like inverse reinforcement learning an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual ("one life") learning approach where the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, "learning" facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is 'uninfluenceability', whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and if the set of possible environments is sufficiently rich, the converse is true too.
Published: 2020

17. Iterative Budgeted Exponential Search

Author: Helmert, Malte, Lattimore, Tor, Lelis, Levi H. S., Orseau, Laurent, and Sturtevant, Nathan R.
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Artificial Intelligence
Abstract: We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound. Existing algorithms that address this problem like B and B' improve this bound to $\Omega(n^2)$. For tree search, IDA* can also require $\Omega(n^2)$ expansions. We describe a new algorithmic framework that iteratively controls an expansion budget and solution cost limit, giving rise to new graph and tree search algorithms for which the number of expansions is $O(n \log C)$, where $C$ is the optimal solution cost. Our experiments show that the new algorithms are robust in scenarios where existing algorithms fail. In the case of tree search, our new algorithms have no overhead over IDA* in scenarios to which IDA* is well suited and can therefore be recommended as a general replacement for IDA*.
Published: 2019

18. Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

Author: Orseau, Laurent, Lelis, Levi H. S., and Lattimore, Tor
Subjects: Computer Science - Artificial Intelligence, Computer Science - Data Structures and Algorithms
Abstract: We introduce and analyze two parameter-free linear-memory tree search algorithms. Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree. Previously, the best guarantee for a linear-memory algorithm under similar assumptions was achieved by IDA*, which in the worst case expands quadratically more nodes than in its last iteration. Empirical results support the theory and demonstrate the practicality and robustness of our algorithms. Furthermore, they are fast and easy to implement.
Comment: This paper and another independent IJCAI 2019 submission have been merged into a single paper that subsumes both of them (Helmert et al., 2019). This paper is placed here only for historical context. Please only cite the subsuming paper
Published: 2019

19. An investigation of model-free planning

Author: Guez, Arthur, Mirza, Mehdi, Gregor, Karol, Kabra, Rishabh, Racanière, Sébastien, Weber, Théophane, Raposo, David, Santoro, Adam, Orseau, Laurent, Eccles, Tom, Wayne, Greg, Silver, David, and Lillicrap, Timothy
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.
Published: 2019

20. Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

Author: Orseau, Laurent, Lattimore, Tor, and Legg, Shane
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time relative to the number of experts in each round. Despite the unbounded nature of the log-loss, we derive a bound that is independent of the largest loss and of the largest gradient, and depends only on the number of experts and the time horizon. Furthermore we give a Bayesian interpretation of Prod and adapt the algorithm to derive a tracking regret.
Published: 2019

21. Single-Agent Policy Tree Search With Guarantees

Author: Orseau, Laurent, Lelis, Levi H. S., Lattimore, Tor, and Weber, Théophane
Subjects: Computer Science - Artificial Intelligence
Abstract: We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to prove an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for 'needle-in-a-haystack' problems. The second algorithm is based on sampling and we prove an upper bound on the expected number of nodes it expands before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide the search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
Published: 2018

22. Penalizing side effects using stepwise relative reachability

Author: Krakovna, Victoria, Orseau, Laurent, Kumar, Ramana, Martic, Miljan, and Legg, Shane
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two components: a baseline state and a measure of deviation from this baseline state. We argue that some of these incentives arise from the choice of baseline, and others arise from the choice of deviation measure. We introduce a new variant of the stepwise inaction baseline and a new deviation measure based on relative reachability of states. The combination of these design choices avoids the given undesirable incentives, while simpler baselines and the unreachability measure fail. We demonstrate this empirically by comparing different combinations of baseline and deviation measure choices on a set of gridworld experiments designed to illustrate possible bad incentives.
Published: 2018

23. Agents and Devices: A Relative Definition of Agency

Author: Orseau, Laurent, McGill, Simon McGregor, and Legg, Shane
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance. Humans tend to find the physical stance more helpful for certain systems, such as planets orbiting a star, and the intentional stance for others, such as living animals. We define a formal counterpart of physical and intentional stances within computational theory: a description of a system as either a device, or an agent, with the key difference being that 'devices' are directly described in terms of an input-output mapping, while 'agents' are described in terms of the function they optimise. Bayes' rule can then be applied to calculate the subjective probability of a system being a device or an agent, based only on its behaviour. We illustrate this using the trajectories of an object in a toy grid-world domain.
Published: 2018

24. AI Safety Gridworlds

Author: Leike, Jan, Martic, Miljan, Krakovna, Victoria, Ortega, Pedro A., Everitt, Tom, Lefrancq, Andrew, Orseau, Laurent, and Legg, Shane
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence
Abstract: We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
Published: 2017

25. Reinforcement Learning with a Corrupted Reward Channel

Author: Everitt, Tom, Krakovna, Victoria, Orseau, Laurent, Hutter, Marcus, and Legg, Shane
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning, Statistics - Machine Learning, I.2.6, I.2.8
Abstract: No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions., Comment: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
Published: 2017

26. Thompson Sampling is Asymptotically Optimal in General Environments

Author: Leike, Jan, Lattimore, Tor, Orseau, Laurent, and Hutter, Marcus
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear., Comment: UAI 2016
Published: 2016

27. Levin Tree Search with Context Models

Author: Orseau, Laurent, primary, Hutter, Marcus, additional, and Lelis, Levi H. S., additional
Published: 2023
Full Text: View/download PDF

28. Teleporting Universal Intelligent Agents

Author: Orseau, Laurent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Tanaka, Yuzuru, editor, Wahlster, Wolfgang, editor, Siekmann, Jörg, editor, Goertzel, Ben, editor, Orseau, Laurent, editor, and Snaider, Javier, editor
Published: 2014
Full Text: View/download PDF

29. The Multi-slot Framework: A Formal Model for Multiple, Copiable AIs

Author: Orseau, Laurent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Tanaka, Yuzuru, editor, Wahlster, Wolfgang, editor, Siekmann, Jörg, editor, Goertzel, Ben, editor, Orseau, Laurent, editor, and Snaider, Javier, editor
Published: 2014
Full Text: View/download PDF

30. Universal Knowledge-Seeking Agents for Stochastic Environments

Author: Orseau, Laurent, Lattimore, Tor, Hutter, Marcus, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Jain, Sanjay, editor, Munos, Rémi, editor, Stephan, Frank, editor, and Zeugmann, Thomas, editor
Published: 2013
Full Text: View/download PDF

31. Space-Time Embedded Intelligence

Author: Orseau, Laurent, Ring, Mark, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Bach, Joscha, editor, Goertzel, Ben, editor, and Iklé, Matthew, editor
Published: 2012
Full Text: View/download PDF

32. Memory Issues of Intelligent Agents

Author: Orseau, Laurent, Ring, Mark, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Bach, Joscha, editor, Goertzel, Ben, editor, and Iklé, Matthew, editor
Published: 2012
Full Text: View/download PDF

33. Universal Knowledge-Seeking Agents

Author: Orseau, Laurent, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Kivinen, Jyrki, editor, Szepesvári, Csaba, editor, Ukkonen, Esko, editor, and Zeugmann, Thomas, editor
Published: 2011
Full Text: View/download PDF

34. Delusion, Survival, and Intelligent Agents

Author: Ring, Mark, Orseau, Laurent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Schmidhuber, Jürgen, editor, Thórisson, Kristinn R., editor, and Looks, Moshe, editor
Published: 2011
Full Text: View/download PDF

35. Self-Modification and Mortality in Artificial Agents

Author: Orseau, Laurent, Ring, Mark, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Schmidhuber, Jürgen, editor, Thórisson, Kristinn R., editor, and Looks, Moshe, editor
Published: 2011
Full Text: View/download PDF

36. Optimality Issues of Universal Greedy Agents with Static Priors

Author: Orseau, Laurent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Hutter, Marcus, editor, Stephan, Frank, editor, Vovk, Vladimir, editor, and Zeugmann, Thomas, editor
Published: 2010
Full Text: View/download PDF

37. Universal knowledge-seeking agents

Author: Orseau, Laurent
Published: 2014
Full Text: View/download PDF

38. Short Term Memories and Forcing the Re-use of Knowledge for Generalization

Author: Orseau, Laurent, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Duch, Włodzisław, editor, Kacprzyk, Janusz, editor, Oja, Erkki, editor, and Zadrożny, Sławomir, editor
Published: 2005
Full Text: View/download PDF

39. Asymptotic non-learnability of universal agents with computable horizon functions

Author: Orseau, Laurent
Published: 2013
Full Text: View/download PDF

40. Proving Theorems using Incremental Learning and Hindsight Experience Replay

Author: Aygün, Eser, Orseau, Laurent, Anand, Ankit, Glorot, Xavier, Firoiu, Vlad, Zhang, Lei M., Precup, Doina, and Mourad, Shibl
Subjects: FOS: Computer and information sciences, Computer Science - Logic in Computer Science, I.2.3, Artificial Intelligence (cs.AI), TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Computer Science - Artificial Intelligence, Logic in Computer Science (cs.LO)
Abstract: Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in literature either depend on these traditional provers to bootstrap themselves or fall short on reaching comparable performance. In this paper, we propose a general incremental learning algorithm for training domain specific provers for first-order logic without equality, based only on a basic given-clause algorithm, but using a learned clause-scoring function. Clauses are represented as graphs and presented to transformer networks with spectral features. To address the sparsity and the initial lack of training data as well as the lack of a natural curriculum, we adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found. We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset in terms of both quantity and quality of the proofs., Comment: 16 pages, 2 figures
Published: 2021

41. Teleporting Universal Intelligent Agents

Author: Orseau, Laurent, primary
Published: 2014
Full Text: View/download PDF

42. The Multi-slot Framework: A Formal Model for Multiple, Copiable AIs

Author: Orseau, Laurent, primary
Published: 2014
Full Text: View/download PDF

43. Policy-Guided Heuristic Search with Guarantees

Author: Orseau, Laurent, primary and Lelis, Levi H. S., additional
Published: 2021
Full Text: View/download PDF

44. Learning to Prove from Synthetic Theorems

Author: Aygün, Eser, Ahmed, Zafarali, Anand, Ankit, Firoiu, Vlad, Glorot, Xavier, Orseau, Laurent, Precup, Doina, and Mourad, Shibl
Subjects: FOS: Computer and information sciences, Computer Science - Logic in Computer Science, Computer Science - Machine Learning, I.2.3, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Logic in Computer Science (cs.LO), Machine Learning (cs.LG)
Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task., Comment: 17 pages, 6 figures, submitted to NeurIPS 2020
Published: 2020

45. Space-Time Embedded Intelligence

Author: Orseau, Laurent, primary and Ring, Mark, additional
Published: 2012
Full Text: View/download PDF

46. Memory Issues of Intelligent Agents

Author: Orseau, Laurent, primary and Ring, Mark, additional
Published: 2012
Full Text: View/download PDF

47. Self-Modification and Mortality in Artificial Agents

Author: Orseau, Laurent, primary and Ring, Mark, additional
Published: 2011
Full Text: View/download PDF

48. Delusion, Survival, and Intelligent Agents

Author: Ring, Mark, primary and Orseau, Laurent, additional
Published: 2011
Full Text: View/download PDF

49. Apprentissage artificiel

Author: Orseau, Laurent, primary
Published: 2011
Full Text: View/download PDF

50. Pitfalls of Learning a Reward Function Online

Author: Armstrong, Stuart, primary, Leike, Jan, additional, Orseau, Laurent, additional, and Legg, Shane, additional
Published: 2020
Full Text: View/download PDF
