Author: "Grzes, Marek" / Topic: q335 - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Grzes, Marek"' showing total 5 results


Author: Grzes, Marek
Subjects: QA75, QA273, Q335
Abstract: Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large-scale problems, leading to high-quality autonomous decision making. It is a matter of time until we see large-scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming because the learning algorithms have to determine the long-term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under an overarching theme of episodic reinforcement learning, this paper shows a unifying analysis of potential-based reward shaping which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning.
Published: 2017
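
Illustration (not part of the catalog record): the abstract above refers to potential-based reward shaping, which is conventionally written as adding F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. The sketch below is a minimal example under stated assumptions: the grid-world states, the `potential` heuristic (negative Manhattan distance to a goal), and the choice to treat the potential of a terminal state as zero are all hypothetical choices made for this illustration, not details taken from the paper.

```python
def potential(state, goal):
    """Hypothetical potential Phi: negative Manhattan distance to the goal."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(reward, state, next_state, goal, gamma=0.99, terminal=False):
    """Return reward + gamma * Phi(next_state) - Phi(state).

    Assumption for this sketch: when the episode ends at next_state,
    its potential is treated as zero.
    """
    phi_s = potential(state, goal)
    phi_next = 0.0 if terminal else potential(next_state, goal)
    return reward + gamma * phi_next - phi_s


goal = (2, 2)
# A step towards the goal earns a positive shaping bonus...
towards = shaped_reward(0.0, (0, 0), (0, 1), goal, gamma=1.0)
# ...and a step away from it earns a penalty of the same size.
away = shaped_reward(0.0, (0, 1), (0, 0), goal, gamma=1.0)
```

With `gamma=1.0` the two shaping terms cancel over the round trip, which is the property that keeps this form of shaping from rewarding cycles.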

Author: Liza, Farhana Ferdousi, Grzes, Marek, Verwer, Sicco, van Zaanen, Menno, and Smetsers, Rick
Subjects: Q335
Abstract: We present methods used in our submission to the Sequence Prediction ChallengE (SPiCe’16). The two methods used to solve the competition tasks were spectral learning and a count-based method. Spectral learning led to better results on most of the problems.
Published: 2016
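
Illustration (not part of the catalog record): the abstract does not describe its count-based method, so the sketch below shows only a generic n-gram counter for next-symbol ranking, as one plausible reading of the term. The function names, the `order` and `k` parameters, and the back-off to shorter contexts are assumptions made for this example, not the authors' submission.

```python
from collections import Counter, defaultdict


def train_counts(sequences, order=2):
    """Count which symbol follows each context of length 0..order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for n in range(order + 1):
                if i - n >= 0:
                    counts[tuple(seq[i - n:i])][symbol] += 1
    return counts


def rank_next(counts, prefix, order=2, k=5):
    """Rank candidate next symbols, backing off to shorter contexts."""
    for n in range(min(order, len(prefix)), -1, -1):
        context = tuple(prefix[len(prefix) - n:])
        if context in counts:
            return [symbol for symbol, _ in counts[context].most_common(k)]
    return []


# Hypothetical training data: integer-coded symbol sequences.
counts = train_counts([[0, 1, 2], [0, 1, 3], [0, 1, 2]])
ranking = rank_next(counts, [0, 1])
```

Here `ranking` lists the symbols seen after the context `(0, 1)`, most frequent first; an unseen context falls back to overall symbol frequencies.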

Author: Grzes, Marek and Poupart, Pascal
Subjects: Q335
Abstract: Partially observable Markov decision processes (POMDPs) provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. The recent proliferation of smart phones and other wearable devices leads to new applications where, unfortunately, energy efficiency becomes an issue. To circumvent energy requirements, finite-state controllers can be applied because they are computationally inexpensive to execute. Additionally, when multi-agent POMDPs (e.g. Dec-POMDPs or I-POMDPs) are taken into account, finite-state controllers become one of the most important policy representations. Online methods scale the best; however, they are energy demanding. Thus methods to optimize finite-state controllers are necessary. In this paper, we present a new, efficient approach to bounded policy iteration (BPI). BPI keeps the size of the controller small, which is a desirable property for applications, especially on small devices. However, finding an optimal or near-optimal finite-state controller of a bounded size poses a challenging combinatorial optimization problem. Exhaustive search methods clearly do not scale to larger problems, whereas local search methods are subject to local optima. Our new approach solves all of the common benchmarks on which local search methods fail, yet it scales to large problems.
Published: 2015
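
Illustration (not part of the catalog record): the abstract notes that finite-state controllers are computationally inexpensive to execute. The sketch below shows why, under the usual reading of a deterministic controller: acting is one table lookup and updating is another, with no belief tracking. The class, the node numbering, and the toy action and observation names are hypothetical and are not taken from the paper.

```python
class FiniteStateController:
    """Deterministic controller: each node selects an action, and each
    (node, observation) pair selects the next node."""

    def __init__(self, node_actions, node_transitions, start_node=0):
        self.node_actions = node_actions          # node -> action
        self.node_transitions = node_transitions  # (node, observation) -> node
        self.node = start_node

    def act(self):
        """Return the action attached to the current node."""
        return self.node_actions[self.node]

    def observe(self, observation):
        """Move to the successor node for the received observation."""
        self.node = self.node_transitions[(self.node, observation)]


# Hypothetical three-node controller: listen twice, then act.
node_actions = {0: "listen", 1: "listen", 2: "open-right"}
node_transitions = {
    (0, "growl-left"): 1, (0, "growl-right"): 0,
    (1, "growl-left"): 2, (1, "growl-right"): 0,
    (2, "growl-left"): 0, (2, "growl-right"): 0,
}

controller = FiniteStateController(node_actions, node_transitions)
first_action = controller.act()
controller.observe("growl-left")
controller.observe("growl-left")
third_action = controller.act()
```

The per-step cost is two dictionary lookups regardless of the size of the underlying state space, which is the property that makes this representation attractive on small devices.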

Author: Grzes, Marek and Poupart, Pascal
Subjects: Q335
Abstract: In planning with partially observable Markov decision processes, pre-compiled policies are often represented as finite state controllers or sets of alpha-vectors, which provide a lower bound on the value of the optimal policy. Some algorithms (e.g., HSVI2, SARSOP, GapMin) also compute an upper bound to guide the search and to offer performance guarantees, but they do not derive a policy from this upper bound for computational reasons. The execution of a policy derived from an upper bound requires a one-step lookahead simulation to determine the next best action, and the evaluation of the upper bound at the reachable beliefs is complicated and costly (i.e., linear programming or sawtooth approximation). The first aim of this paper is to show principled and computationally cheap ways of executing upper bound policies which can be even faster than executing lower bound policies based on alpha vectors. The second complementary contribution is a new method to find better upper bound policies that outperforms those obtained by existing algorithms, such as HSVI2, SARSOP, or GapMin, on a suite of benchmarks. Our approach is based on a novel synthesis of augmented and deterministic POMDPs and it facilitates efficient optimization of upper bound policies.
Published: 2014

Author: Grzes, Marek, Poupart, Pascal, and Hoey, Jesse
Subjects: Q335, Q1
Abstract: The recent proliferation of smart-phones and other wearable devices has led to a surge of new mobile applications. Partially observable Markov decision processes provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. However, given the limited battery life, there is a need to minimize the amount of online computation. This can be achieved by compiling a policy into a finite state controller since there is no need for belief monitoring or online search. In this paper, we propose a new branch and bound technique to search for a good controller. In contrast to many existing algorithms for controllers, our search technique is not subject to local optima. We also show how to reduce the amount of search by avoiding the enumeration of isomorphic controllers and by taking advantage of suitable upper and lower bounds. The approach is demonstrated on several benchmark problems as well as a smart-phone application to assist persons with Alzheimer's to wayfind.
Published: 2013
