Author: "Schlegel, Matthew" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Schlegel, Matthew"' returned 11 results.

1. Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

Author: Zhu, Lingwei, Chen, Zheng, Schlegel, Matthew, and White, Martha
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence, called the Tsallis KL divergence, which uses the $q$-logarithm in its definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporate KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games. Comment: Accepted by NeurIPS 2023
Published: 2023
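
The abstract's key ingredient is the $q$-logarithm. The Python sketch below is illustrative only and is not MVI($q$). It assumes the common Tsallis convention $\ln_q x = (x^{1-q} - 1)/(1 - q)$ and defines the divergence as $D_q(\pi \| \mu) = -\sum_a \pi(a) \ln_q(\mu(a)/\pi(a))$, which may be parameterized differently in the paper. The helper names `log_q` and `tsallis_kl` are hypothetical. The sketch shows how $q = 1$ falls back to the ordinary KL divergence.

```python
import math
from typing import Sequence


def log_q(x: float, q: float) -> float:
    """Tsallis q-logarithm (assumed convention); equals math.log(x) at q = 1."""
    if math.isclose(q, 1.0):
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_kl(p: Sequence[float], r: Sequence[float], q: float) -> float:
    """D_q(p || r) = -sum_a p(a) * log_q(r(a) / p(a)).

    Actions with p(a) = 0 contribute nothing. Assumes r(a) > 0 wherever p(a) > 0.
    """
    total = 0.0
    for pa, ra in zip(p, r):
        if pa > 0.0:
            total -= pa * log_q(ra / pa, q)
    return total
```

For example, `tsallis_kl([0.5, 0.5], [0.25, 0.75], 2.0)` evaluates to 1/3 (up to rounding). Setting `q = 1.0` gives the ordinary KL value of about 0.144.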

2. Continual Auxiliary Task Learning

Author: McLeod, Matthew, Lo, Chunlok, Schlegel, Matthew, Jacobsen, Andrew, Kumaraswamy, Raksha, White, Martha, and White, Adam
Subjects: Computer Science - Machine Learning
Abstract: Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove the separation into learning successor features and rewards provides convergence rate improvements. We conduct an in-depth study into the resulting multi-prediction learning system. Comment: Neural Information Processing Systems 2021
Published: 2022
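
The separation the abstract refers to can be illustrated with a generic tabular successor-feature sketch. This is not the paper's algorithm. It assumes a fixed policy, one-hot state features, and rewards that depend only on the current state, so that $V(s) = \psi(s)^\top w$ with $\psi(s) = \phi(s) + \gamma\,\psi(s')$. All function names are hypothetical. When the reward changes, only $w$ has to be relearned, and the learned $\psi$ is reused unchanged.

```python
from typing import List, Sequence, Tuple


def one_hot(i: int, n: int) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(n)]


def learn_successor_features(transitions: Sequence[Tuple[int, int]], n: int,
                             gamma: float, alpha: float, sweeps: int) -> List[List[float]]:
    """TD learning of psi(s) toward phi(s) + gamma * psi(s') for each (s, s') pair."""
    psi = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for s, s_next in transitions:
            phi = one_hot(s, n)
            for j in range(n):
                target = phi[j] + gamma * psi[s_next][j]
                psi[s][j] += alpha * (target - psi[s][j])
    return psi


def learn_reward_weights(samples: Sequence[Tuple[int, float]], n: int,
                         alpha: float, sweeps: int) -> List[float]:
    """LMS regression of r(s) = phi(s) . w; with one-hot phi, one running average per state."""
    w = [0.0] * n
    for _ in range(sweeps):
        for s, r in samples:
            w[s] += alpha * (r - w[s])
    return w


def value(psi_s: Sequence[float], w: Sequence[float]) -> float:
    """Value of a state as the dot product of its successor features and reward weights."""
    return sum(p * wi for p, wi in zip(psi_s, w))
```

On a two-state cycle 0 → 1 → 0 with $\gamma = 0.5$, $\psi(0)$ converges to $[4/3, 2/3]$. Moving the reward from state 0 to state 1 changes $V(0)$ from 4/3 to 2/3 through $w$ alone.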

3. Meta-descent for Online, Continual Prediction

Author: Jacobsen, Andrew, Schlegel, Matthew, Linke, Cameron, Degris, Thomas, White, Adam, and White, Martha
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update (a vector approximation of the inverse Hessian). Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot. Comment: AAAI Conference on Artificial Intelligence 2019. v2: Correction to Baird's counterexample. A bug in the code led to results being reported for AMSGrad in this experiment, when they were actually results for Adam
Published: 2019
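
AdaGain itself is not reproduced here. As an illustration of the meta-descent family the abstract contrasts with AdaGrad, RMSProp and AMSGrad, the sketch below implements Sutton's (1992) IDBD for a linear least-mean-squares predictor. Each weight has its own log step-size $\beta_i$, which is adapted by a meta-gradient step. The initial step-size, the meta step-size and the demo function are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence


class IDBD:
    """Incremental Delta-Bar-Delta (Sutton, 1992) for a linear predictor y = w . x."""

    def __init__(self, n: int, init_step: float = 0.05, meta_step: float = 0.001):
        self.w = [0.0] * n
        self.beta = [math.log(init_step)] * n  # per-weight log step-sizes
        self.h = [0.0] * n                     # trace of recent weight changes
        self.theta = meta_step                 # meta step-size (assumed value)

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], target: float) -> float:
        delta = target - self.predict(x)
        for i, xi in enumerate(x):
            # Meta-gradient step on the log step-size.
            self.beta[i] += self.theta * delta * xi * self.h[i]
            alpha = math.exp(self.beta[i])
            # Ordinary LMS step using the per-weight step-size.
            self.w[i] += alpha * delta * xi
            # Decay the trace (factor clipped at zero), then add the new step.
            self.h[i] = self.h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
        return delta


def fit_noiseless_linear(true_w: Sequence[float], steps: int = 5000, seed: int = 0) -> List[float]:
    """Hypothetical demo: learn a fixed linear target from uniform random inputs."""
    rng = random.Random(seed)
    learner = IDBD(len(true_w))
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        learner.update(x, sum(wi * xi for wi, xi in zip(true_w, x)))
    return learner.w
```

Working in log space keeps every step-size positive. The trace `h` lets the meta-update raise $\alpha_i$ when successive updates to $w_i$ agree in sign, and lower it when they conflict.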

4. Importance Resampling for Off-policy Prediction

Author: Schlegel, Matthew, Chung, Wesley, Graves, Daniel, Qian, Jian, and White, Martha
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is consistent and unbiased, it can result in high variance updates to the weights for the value function. In this work, we explore a resampling strategy as an alternative to reweighting. We propose Importance Resampling (IR) for off-policy prediction, which resamples experience from a replay buffer and applies standard on-policy updates. The approach avoids using importance sampling ratios in the update, instead correcting the distribution before the update. We characterize the bias and consistency of IR, particularly compared to Weighted IS (WIS). We demonstrate in several microworlds that IR has improved sample efficiency and lower variance updates, as compared to IS and several variance-reduced IS strategies, including variants of WIS and V-trace, which clips IS ratios. We also provide a demonstration showing IR improves over IS for learning a value function from images in a racing car simulator. Comment: Recently published in NeurIPS 2019
Published: 2019
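
The resampling step described above can be sketched as follows. Transitions are drawn from the buffer with probability proportional to $\rho = \pi(a \mid s)/\mu(a \mid s)$, and a plain on-policy TD(0) update is then applied without any ratio. The tabular transition format, the function names, and the choice of TD(0) as the on-policy learner are assumptions made for illustration. The sketch is not the paper's implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical tabular transition format: (state, action, reward, next_state).
Transition = Tuple[int, int, float, int]
Policy = Callable[[int, int], float]  # returns the probability of action a in state s


def importance_resample(buffer: Sequence[Transition], pi: Policy, mu: Policy,
                        k: int, rng: random.Random) -> List[Transition]:
    """Draw k transitions with probability proportional to rho = pi(a|s) / mu(a|s)."""
    weights = [pi(s, a) / mu(s, a) for (s, a, _, _) in buffer]
    return rng.choices(buffer, weights=weights, k=k)


def td0_update(values: List[float], batch: Sequence[Transition],
               alpha: float, gamma: float) -> None:
    """Standard on-policy TD(0) on the resampled batch; no importance ratios appear."""
    for s, _, r, s_next in batch:
        values[s] += alpha * (r + gamma * values[s_next] - values[s])
```

If the target policy never takes an action, that action's transitions get zero weight and are never resampled. The correction happens in the sampling distribution rather than in the size of each update.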

5. Context-Dependent Upper-Confidence Bounds for Directed Exploration

Author: Kumaraswamy, Raksha, Schlegel, Matthew, White, Adam, and White, Martha
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment. Many algorithms use optimism to direct exploration, either through visitation estimates or upper confidence bounds, as opposed to data-inefficient strategies like $\epsilon$-greedy that use random, undirected exploration. Most data-efficient exploration methods require significant computation, typically relying on a learned model to guide exploration. Least-squares methods have the potential to provide some of the data-efficiency benefits of model-based approaches (because they summarize past interactions) with the computation closer to that of model-free approaches. In this work, we provide a novel, computationally efficient, incremental exploration strategy, leveraging this property of least-squares temporal difference learning (LSTD). We derive upper confidence bounds on the action-values learned by LSTD, with context-dependent (or state-dependent) noise variance. Such context-dependent noise focuses exploration on a subset of variable states, and allows for reduced exploration in other states. We empirically demonstrate that our algorithm can converge more quickly than other incremental exploration strategies using confidence estimates on action-values. Comment: Neural Information Processing Systems 2018
Published: 2018

6. General Value Function Networks

Author: Schlegel, Matthew, Jacobsen, Andrew, Abbas, Zaheer, Patterson, Andrew, White, Adam, and White, Martha
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain expertise, which can usually help constrain the function class and so improve trainability, can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates. Comment: Published in the Journal of Artificial Intelligence Research
Published: 2018

7. Flame Retardant Emissions from Spray Polyurethane Foam Insulation

Author: Poppendieck, Dustin, Schlegel, Matthew, Connor, Angelica, and Blickley, Adam
Published: 2017

8. General Value Function Networks

Author: Schlegel, Matthew, Jacobsen, Andrew, Abbas, Zaheer, Patterson, Andrew, White, Adam, and White, Martha
Published: 2021

9. Meta-Descent for Online, Continual Prediction

Author: Jacobsen, Andrew, Schlegel, Matthew, Linke, Cameron, Degris, Thomas, White, Adam, and White, Martha
Published: 2019

10. Leveraging Off-Policy Prediction in Recurrent Networks for Reinforcement Learning

Author: Schlegel, Matthew K
Subjects: Reinforcement Learning, Partial Observability, Recurrent Neural Networks, General Value Functions, General Value Function Networks, Off-policy Prediction
Abstract: Partial observability (when the senses lack enough detail to make an optimal decision) is the reality of any decision-making agent acting in the real world. While an agent could be made to make do with its available senses, taking advantage of the history of senses can provide more context and enable the agent to make better decisions. This thesis investigates recurrent architectures to learn agent state (a summarization of the agent's history), and identifies some modifications, inspired by predictive representations of state, to enable efficient learning in (continual) reinforcement learning. First, I contribute to standard recurrent neural networks trained through back-propagation through time. This contribution provides pragmatic recommendations for incorporating action information into a recurrent architecture, and through extensive empirical investigations shows the trade-offs of several techniques. Second, I develop a recurrent predictive architecture which uses temporal abstractions (predictions in the form of general value functions) as the basis for its state representation. I show advantages of this architecture over standard recurrent networks in a continuing reinforcement learning domain, derive an objective and corresponding learning algorithm, and discuss several added concerns when using this architecture, such as discovery, what types of networks can be constructed, and off-policy prediction.
Published: 2023

11. Litigating tortious interference claims.

Author: Tucker, Donald F. and Schlegel, Matthew W.
Subjects: Tortious interference with contracts -- Cases, Labor relations -- Cases
Published: 1988
