Author: "Dann, Christoph" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Dann, Christoph"' showing total 46 results

Start Over Author "Dann, Christoph"

46 results on '"Dann, Christoph"'

1. Preserving Expert-Level Privacy in Offline Reinforcement Learning

Author: Sharma, Navodita, Vinod, Vishnu, Thakurta, Abhradeep, Agarwal, Alekh, Balle, Borja, Dann, Christoph, and Raghuveer, Aravindan
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) by interacting with an environment. However, the individual experts may be privacy-sensitive in that the learnt policy may retain information about their precise choices. In some domains like personalized retrieval, advertising and healthcare, the expert choices are considered sensitive data. To provably protect the privacy of such experts, we propose a novel consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm. We prove rigorous differential privacy guarantees, while maintaining strong empirical performance. Unlike existing work in differentially private RL, we supplement the theory with proof-of-concept experiments on classic RL environments featuring large continuous state spaces, demonstrating substantial improvements over a natural baseline across multiple tasks.
Published: 2024

2. Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Author: Wang, Kaiwen, Kidambi, Rahul, Sullivan, Ryan, Agarwal, Alekh, Dann, Christoph, Michi, Andrea, Gelmi, Marco, Li, Yunxuan, Gupta, Raghav, Dubey, Avinava, Ramé, Alexandre, Ferret, Johan, Cideron, Geoffrey, Hou, Le, Yu, Hongkun, Ahmed, Amr, Mehta, Aranyak, Hussenot, Léonard, Bachem, Olivier, and Leurent, Edouard
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learn steerable models that effectively trade-off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for multi-objective finetuning., Comment: 40 pages. Findings of EMNLP 2024
Published: 2024

3. Rate-Preserving Reductions for Blackwell Approachability

Author: Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Schneider, Jon, and Sivan, Balasubramanian
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent, in the sense that any algorithm that solves a specific Blackwell approachability instance can be converted to a sublinear regret algorithm for a specific no-regret learning instance, and vice versa. In this paper, we study a more fine-grained form of such reductions, and ask when this translation between problems preserves not only a sublinear rate of convergence, but also preserves the optimal rate of convergence. That is, in which cases does it suffice to find the optimal regret bound for a no-regret learning instance in order to find the optimal rate of convergence for a corresponding approachability instance? We show that the reduction of Abernethy et al. (2011) does not preserve rates: their reduction may reduce a $d$-dimensional approachability instance $I_1$ with optimal convergence rate $R_1$ to a no-regret learning instance $I_2$ with optimal regret-per-round of $R_2$, with $R_{2}/R_{1}$ arbitrarily large (in particular, it is possible that $R_1 = 0$ and $R_{2} > 0$). On the other hand, we show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization we call improper $\phi$-regret minimization (a variant of the $\phi$-regret minimization of Gordon et al. (2008) where the transformation functions may map actions outside of the action set). Finally, we characterize when linear transformations suffice to reduce improper $\phi$-regret minimization problems to standard classes of regret minimization problems in a rate preserving manner. We prove that some improper $\phi$-regret minimization instances cannot be reduced to either subclass of instance in this way, suggesting that approachability can capture some problems that cannot be phrased in the language of online learning.
Published: 2024

4. A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Author: Swamy, Gokul, Dann, Christoph, Kidambi, Rahul, Wu, Zhiwei Steven, and Agarwal, Alekh
Subjects: Computer Science - Machine Learning
Abstract: We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training and is therefore rather simple to implement. Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction. To achieve the preceding qualities, we build upon the concept of a Minimax Winner (MW), a notion of preference aggregation from the social choice theory literature that frames learning from preferences as a zero-sum game between two policies. By leveraging the symmetry of this game, we prove that rather than using the traditional technique of dueling two policies to compute the MW, we can simply have a single agent play against itself while maintaining strong convergence guarantees. Practically, this corresponds to sampling multiple trajectories from a policy, asking a preference or teacher model to compare them, and then using the proportion of wins as the reward for a particular trajectory. We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches while maintaining robustness to the intransitive and stochastic preferences that frequently occur in practice when aggregating human judgments.
Published: 2024

5. Data-Driven Online Model Selection With Regret Guarantees

Author: Pacchiano, Aldo, Dann, Christoph, and Gentile, Claudio
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner. Model selection is performed by regret balancing but, unlike the recent literature on this subject, we do not assume any prior knowledge about the base learners like candidate regret guarantees; instead, we uncover these quantities in a data-driven manner. The meta-learner is therefore able to leverage the realized regret incurred by each base learner for the learning environment at hand (as opposed to the expected regret), and single out the best such regret. We design two model selection algorithms operating with this more ambitious notion of regret and, besides proving model selection guarantees via regret balancing, we experimentally demonstrate the compelling practical benefits of dealing with actual regrets instead of candidate regret bounds.
Published: 2023

6. A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Author: Dann, Christoph, Wei, Chen-Yu, and Zimmert, Julian
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialised potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can simultaneously obtain $O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD) algorithms. We showcase the capability of this reduction by transforming existing algorithms that are only known to achieve worst-case guarantees into new algorithms with best-of-both-worlds guarantees in contextual bandits, graph bandits and tabular Markov decision processes.
Published: 2023

7. Best of Both Worlds Policy Optimization

Author: Dann, Christoph, Wei, Chen-Yu, and Zimmert, Julian
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Policy optimization methods are popular reinforcement learning algorithms in practice. Recent works have built theoretical foundation for them by proving $\sqrt{T}$ regret bounds even when the losses are adversarial. Such bounds are tight in the worst case but often overly pessimistic. In this work, we show that in tabular Markov decision processes (MDPs), by properly designing the regularizer, the exploration bonus and the learning rates, one can achieve a more favorable polylog$(T)$ regret when the losses are stochastic, without sacrificing the worst-case guarantee in the adversarial regime. To our knowledge, this is also the first time a gap-dependent polylog$(T)$ regret bound is shown for policy optimization. Specifically, we achieve this by leveraging a Tsallis entropy or a Shannon entropy regularizer in the policy update. Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log-barrier regularizer.
Published: 2023

8. Pseudonorm Approachability and Applications to Regret Minimization

Author: Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Schneider, Jon, and Sivan, Balasubramanian
Subjects: Computer Science - Machine Learning
Abstract: Blackwell's celebrated approachability theory provides a general framework for a variety of learning problems, including regret minimization. However, Blackwell's proof and implicit algorithm measure approachability using the $\ell_2$ (Euclidean) distance. We argue that in many applications such as regret minimization, it is more useful to study approachability under other distance metrics, most commonly the $\ell_\infty$-metric. But, the time and space complexity of the algorithms designed for $\ell_\infty$-approachability depend on the dimension of the space of the vectorial payoffs, which is often prohibitively large. Thus, we present a framework for converting high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems, thereby resolving such issues. We first show that the $\ell_\infty$-distance between the average payoff and the approachability set can be equivalently defined as a pseudodistance between a lower-dimensional average vector payoff and a new convex set we define. Next, we develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms, showing that it can be achieved via online linear optimization (OLO) over a convex set given by the Fenchel dual of the unit pseudonorm ball. We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$-approachability algorithm whose convergence is independent of the dimension of the original vectorial payoff. We further show that that algorithm admits a polynomial-time complexity, assuming that the original $\ell_\infty$-distance can be computed efficiently. We also give an $\ell_\infty$-approachability algorithm whose convergence is logarithmic in that dimension using an FTRL algorithm with a maximum-entropy regularizer., Comment: To appear at ALT 2023
Published: 2023

9. Learning in POMDPs is Sample-Efficient with Hindsight Observability

Author: Lee, Jonathan N., Agarwal, Alekh, Dann, Christoph, and Zhang, Tong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling, we formulate a Hindsight Observable Markov Decision Process (HOMDP) as a POMDP where the latent states are revealed to the learner in hindsight and only during training. We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities.
Published: 2023

10. A Unified Algorithm for Stochastic Path Problems

Author: Dann, Christoph, Wei, Chen-Yu, and Zimmert, Julian
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards $B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known $B_\star$. We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation.
Published: 2022

11. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

Author: Dann, Christoph, Mohri, Mehryar, Zhang, Tong, and Zimmert, Julian
Subjects: Computer Science - Machine Learning
Abstract: Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings. However, existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs. This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees. We introduce novel proof techniques to show that under suitable conditions, the worst-case regret of our posterior sampling method matches the best known results of optimization based methods. In the linear MDP setting with dimension, the regret of our algorithm scales linearly with the dimension as compared to a quadratic dependence of the existing posterior sampling-based exploration algorithms.
Published: 2022

12. Best of Both Worlds Model Selection

Author: Pacchiano, Aldo, Dann, Christoph, and Gentile, Claudio
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta algorithm plays each base learner according to a schedule that keeps the base learner's candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL algorithm for adversarial environments, but with the additional benefit of achieving high probability regret bounds, specifically in the case of nested adversarial linear bandits. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best of both world (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios., Comment: 10 pages in main, 43 pages appendix
Published: 2022

13. Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Author: Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Sekhari, Ayush, and Sridharan, Karthik
Subjects: Computer Science - Machine Learning
Abstract: Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure called myopic exploration gap, denoted by alpha, that captures a structural property of the MDP, the exploration policy and the given value function class. We show that the sample-complexity of myopic exploration scales quadratically with the inverse of this quantity, 1 / alpha^2. We further demonstrate through concrete examples that myopic exploration gap is indeed favorable in several tasks where myopic exploration succeeds, due to the corresponding dynamics and reward structure., Comment: to appear at ICML 2022
Published: 2022

14. Same Cause; Different Effects in the Brain

Author: Toneva, Mariya, Williams, Jennifer, Bollu, Anand, Dann, Christoph, and Wehbe, Leila
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Applications
Abstract: To study information processing in the brain, neuroscientists manipulate experimental stimuli while recording participant brain activity. They can then use encoding models to find out which brain "zone" (e.g. which region of interest, volume pixel or electrophysiology sensor) is predicted from the stimulus properties. Given the assumptions underlying this setup, when stimulus properties are predictive of the activity in a zone, these properties are understood to cause activity in that zone. In recent years, researchers have used neural networks to construct representations that capture the diverse properties of complex stimuli, such as natural language or natural images. Encoding models built using these high-dimensional representations are often able to significantly predict the activity in large swathes of cortex, suggesting that the activity in all these brain zones is caused by stimulus properties captured in the representation. It is then natural to ask: "Is the activity in these different brain zones caused by the stimulus properties in the same way?" In neuroscientific terms, this corresponds to asking if these different zones process the stimulus properties in the same way. Here, we propose a new framework that enables researchers to ask if the properties of a stimulus affect two brain zones in the same way. We use simulated data and two real fMRI datasets with complex naturalistic stimuli to show that our framework enables us to make such inferences. Our inferences are strikingly consistent between the two datasets, indicating that the proposed framework is a promising new tool for neuroscientists to understand how information is processed in the brain., Comment: Accepted to CLeaR 2022
Published: 2022

15. A Model Selection Approach for Corruption Robust Reinforcement Learning

Author: Wei, Chen-Yu, Dann, Christoph, and Zimmert, Julian
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm achieves a regret bound of $\widetilde{\mathcal{O}}(\min\{\frac{1}{\Delta}, \sqrt{T}\}+C)$ where $T$ is the number of episodes, $C$ is the total amount of corruption, and $\Delta$ is the reward gap between the best and the second-best policy. This is the first worst-case optimal bound achieved without knowledge of $C$, improving previous results of Lykouris et al. (2021); Chen et al. (2021); Wu et al. (2021). For finite-horizon linear MDPs, we develop a computationally efficient algorithm with a regret bound of $\widetilde{\mathcal{O}}(\sqrt{(1+C)T})$, and another computationally inefficient one with $\widetilde{\mathcal{O}}(\sqrt{T}+C)$, improving the result of Lykouris et al. (2021) and answering an open question by Zhang et al. (2021b). Finally, our model selection framework can be easily applied to other settings including linear bandits, linear contextual bandits, and MDPs with general function approximation, leading to several improved or new results.
Published: 2021

16. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

Author: Dann, Christoph, Marinov, Teodor V., Mohri, Mehryar, and Zimmert, Julian
Subjects: Computer Science - Machine Learning
Abstract: We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms can not achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.
Published: 2021

17. Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

Author: Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Sekhari, Ayush, and Sridharan, Karthik
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control
Abstract: There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{O}\left((H^{4d} K^{3d} \log |\Pi|)/\epsilon^2\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $\epsilon>0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting that shows that the exponential dependence on rank is unavoidable, without further assumptions.
Published: 2021

18. Neural Active Learning with Performance Guarantees

Author: Awasthi, Pranjal, Dann, Christoph, Gentile, Claudio, Sekhari, Ayush, and Wang, Zhilei
Subjects: Computer Science - Machine Learning
Abstract: We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop. Since the shape of the label requesting threshold is tightly related to the complexity of the function to be learned, which is a-priori unknown, we also derive a version of the algorithm which is agnostic to any prior knowledge. This algorithm relies on a regret balancing scheme to solve the resulting online model selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and number of requested labels which depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results of the generalization error as a function of the label complexity in a standard statistical learning setting., Comment: 30 pages
Published: 2021

19. Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

Author: Pacchiano, Aldo, Dann, Christoph, Gentile, Claudio, and Bartlett, Peter
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning, Statistics - Other Statistics
Abstract: We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base algorithm comes with a candidate regret bound that may or may not hold during all rounds. In each round, our approach plays a base algorithm to keep the candidate regret bounds of all remaining base algorithms balanced, and eliminates algorithms that violate their candidate bound. We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor. This factor is reasonably small in several applications, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and LinUCB applied to linear bandits with different confidence parameters. We further show that, under a suitable gap-assumption, this factor only scales with the number of base algorithms and not their complexity when the number of rounds is large enough. Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one., Comment: 57 pages
Published: 2020

20. Reinforcement Learning with Feedback Graphs

Author: Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Sekhari, Ayush, and Sridharan, Karthik
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations. Such additional observations are available in a range of tasks through extended sensors or prior knowledge about the environment (e.g., when certain actions yield similar outcome). We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage the additional feedback for more sample-efficient learning. We give a regret bound that, ignoring logarithmic factors and lower-order terms, depends only on the size of the maximum acyclic subgraph of the feedback graph, in contrast with a polynomial dependency on the number of states and actions in the absence of a feedback graph. Finally, we highlight challenges when leveraging a small dominating set of the feedback graph as compared to the bandit setting and propose a new algorithm that can use knowledge of such a dominating set for more sample-efficient learning of a near-optimal policy.
Published: 2020

21. Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Author: Keramati, Ramtin, Dann, Christoph, Tamkin, Alex, and Brunskill, Emma
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.
Published: 2019

22. Policy Certificates: Towards Accountable Reinforcement Learning

Author: Dann, Christoph, Li, Lihong, Wei, Wei, and Brunskill, Emma
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches (and in some settings slightly improves upon) existing minimax regret bounds., Comment: article appearing at ICML 2019; full version including appendix
Published: 2018

23. On Oracle-Efficient PAC RL with Rich Observations

Author: Dann, Christoph, Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, and Schapire, Robert E.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings., Comment: appeared at NeurIPS 18; full paper including appendix; updated style file
Published: 2018

24. Decoupling Learning Rules from Representations

Author: Thomas, Philip S., Dann, Christoph, and Brunskill, Emma
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning, Statistics - Machine Learning
Abstract: In the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. Using most learning rules, these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using artificial neural networks, we present a method for partially decoupling these two decisions for a broad class of learning rules that span unsupervised learning, reinforcement learning, and supervised learning.
Published: 2017

25. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Author: Dann, Christoph, Lattimore, Tor, and Brunskill, Emma
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon., Comment: appears in Neural Information Processing Systems 2017
Published: 2017

26. Sample Efficient Policy Search for Optimal Stopping Domains

Author: Goel, Karan, Dann, Christoph, and Brunskill, Emma
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on 3 domains taken from diverse fields., Comment: To appear in IJCAI-2017
Published: 2017

27. Scaling up behavioral science interventions in online education

Author: Kizilcec, René F., Reich, Justin, Yeomans, Michael, Dann, Christoph, Brunskill, Emma, Lopez, Glenn, Turkay, Selen, Williams, Joseph Jay, and Tingley, Dustin
Published: 2020

28. Memory Lens: How Much Memory Does an Agent Use?

Author: Dann, Christoph, Hofmann, Katja, and Nowozin, Sebastian
Subjects: Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We propose a new method to study the internal memory used by reinforcement learning policies. We estimate the amount of relevant past information by estimating mutual information between behavior histories and the current action of an agent. We perform this estimation in the passive setting, that is, we do not intervene but merely observe the natural behavior of the agent. Moreover, we provide a theoretical justification for our approach by showing that it yields an implementation-independent lower bound on the minimal memory capacity of any agent that implement the observed policy. We demonstrate our approach by estimating the use of memory of DQN policies on concatenated Atari frames, demonstrating sharply different use of memory across 49 games. The study of memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms., Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Published: 2016

29. Thoughts on Massively Scalable Gaussian Processes

Author: Wilson, Andrew Gordon, Dann, Christoph, and Nickisch, Hannes
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Methodology, Statistics - Machine Learning
Abstract: We introduce a framework and early results for massively scalable Gaussian processes (MSGP), significantly extending the KISS-GP approach of Wilson and Nickisch (2015). The MSGP framework enables the use of Gaussian processes (GPs) on billions of datapoints, without requiring distributed inference, or severe assumptions. In particular, MSGP reduces the standard $O(n^3)$ complexity of GP learning and inference to $O(n)$, and the standard $O(n^2)$ complexity per test point prediction to $O(1)$. MSGP involves 1) decomposing covariance matrices as Kronecker products of Toeplitz matrices approximated by circulant matrices. This multi-level circulant approximation allows one to unify the orthogonal computational benefits of fast Kronecker and Toeplitz approaches, and is significantly faster than either approach in isolation; 2) local kernel interpolation and inducing points to allow for arbitrarily located data inputs, and $O(1)$ test time predictions; 3) exploiting block-Toeplitz Toeplitz-block structure (BTTB), which enables fast inference and learning when multidimensional Kronecker structure is not present; and 4) projections of the input space to flexibly model correlated inputs and high dimensional data. The ability to handle many ($m \approx n$) inducing points allows for near-exact accuracy and large scale kernel learning., Comment: 25 pages, 9 figures
Published: 2015

30. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Author: Dann, Christoph and Brunskill, Emma
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$., Comment: 28 pages, appeared in Neural Information Processing Systems (NIPS) 2015, updated version with fixed typos and modified Lemma 1 and Lemma C.5
Published: 2015

31. The Human Kernel

Author: Wilson, Andrew Gordon, Dann, Christoph, Lucas, Christopher G., and Xing, Eric P.
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity. However, automating human expertise remains elusive; for example, Gaussian processes with standard kernels struggle on function extrapolation problems that are trivial for human learners. In this paper, we create function extrapolation problems and acquire human responses, and then design a kernel learning framework to reverse engineer the inductive biases of human learners across a set of behavioral experiments. We use the learned kernels to gain psychological insights and to extrapolate in human-like ways that go beyond traditional stationary and polynomial kernels. Finally, we investigate Occam's razor in human and Gaussian process based function learning., Comment: 11 pages, 5 figures. To appear in Neural Information Processing Systems (NIPS) 2015. Version 2: Figure 2 (i)-(n) now displays the second set of progressive function learning experiments
Published: 2015

32. Bayesian Time-of-Flight for Realtime Shape, Illumination and Albedo

Author: Adam, Amit, Dann, Christoph, Yair, Omer, Mazor, Shai, and Nowozin, Sebastian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera. In contrast to TOF cameras based on phase modulation, our camera enables general exposure profiles. This results in added flexibility and requires novel computational approaches. To address this challenge we propose a generative probabilistic model that accurately relates latent imaging conditions to observed camera responses. While principled, realtime inference in the model turns out to be infeasible, and we propose to employ efficient non-parametric regression trees to approximate the model outputs. As a result we are able to provide, for each pixel, at video frame rate, estimates and uncertainty for depth, effective albedo, and ambient light intensity. These results we present are state-of-the-art in depth imaging. The flexibility of our approach allows us to easily enrich our generative model. We demonstrate that by extending the original single-path model to a two-path model, capable of describing some multipath effects. The new model is seamlessly integrated in the system at no additional computational cost. Our work also addresses the important question of optimal exposure design in pulsed TOF systems. Finally, for benchmark purposes and to obtain realistic empirical priors of multipath and insights into this phenomena, we propose a physically accurate simulation of multipath phenomena.
Published: 2015

33. Automated matching of pipeline corrosion features from in-line inspection data

Author: Dann, Markus R. and Dann, Christoph
Published: 2017
Full Text: View/download PDF

34. Pottics – The Potts Topic Model for Semantic Image Segmentation

Author: Dann, Christoph, Gehler, Peter, Roth, Stefan, Nowozin, Sebastian, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Pinz, Axel, editor, Pock, Thomas, editor, Bischof, Horst, editor, and Leberl, Franz, editor
Published: 2012
Full Text: View/download PDF

35. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

Author: Dann, Christoph, Mohri, Mehryar, Zhang, Tong, Zimmert, Julian, Dann, Christoph, Mohri, Mehryar, Zhang, Tong, and Zimmert, Julian
Abstract: Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings. However, existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs. This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees. We introduce novel proof techniques to show that under suitable conditions, the worst-case regret of our posterior sampling method matches the best known results of optimization based methods. In the linear MDP setting with dimension, the regret of our algorithm scales linearly with the dimension as compared to a quadratic dependence of the existing posterior sampling-based exploration algorithms. © 2021 Neural information processing systems foundation. All rights reserved.
Published: 2021

36. Scaling up behavioral science interventions in online education

Author: Massachusetts Institute of Technology. Program in Comparative Media Studies/Writing, Kizilcec, René F, Reich, Justin, Yeomans, Michael, Dann, Christoph, Brunskill, Emma, Lopez, Glenn, Turkay, Selen, Williams, Joseph Jay, Tingley, Dustin, Massachusetts Institute of Technology. Program in Comparative Media Studies/Writing, Kizilcec, René F, Reich, Justin, Yeomans, Michael, Dann, Christoph, Brunskill, Emma, Lopez, Glenn, Turkay, Selen, Williams, Joseph Jay, and Tingley, Dustin
Abstract: Online education is rapidly expanding in response to rising demand for higher and continuing education, but many online students struggle to achieve their educational goals. Several behavioral science interventions have shown promise in raising student persistence and completion rates in a handful of courses, but evidence of their effectiveness across diverse educational contexts is limited. In this study, we test a set of established interventions over 2.5 y, with one-quarter million students, from nearly every country, across 247 online courses offered by Harvard, the Massachusetts Institute of Technology, and Stanford. We hypothesized that the interventions would produce medium-to-large effects as in prior studies, but this is not supported by our results. Instead, using an iterative scientific process of cyclically preregistering new hypotheses in between waves of data collection, we identified individual, contextual, and temporal conditions under which the interventions benefit students. Self-regulation interventions raised student engagement in the first few weeks but not final completion rates. Value-relevance interventions raised completion rates in developing countries to close the global achievement gap, but only in courses with a global gap. We found minimal evidence that state-of-the-art machine learning methods can forecast the occurrence of a global gap or learn effective individualized intervention policies. Scaling behavioral science interventions across various online learning contexts can reduce their average effectiveness by an order-of-magnitude. However, iterative scientific investigations can uncover what works where for whom.
Published: 2021

37. Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees

Author: Dann, Christoph
Subjects: FOS: Psychology, 170203 Knowledge Representation and Machine Learning
Abstract: Reinforcement learning (RL) focuses on an essential aspect of intelligent behavior – how an agent can learn to make good decisions given experience and rewards in a stochasticworld. Yet popular RL algorithms that have enabled exciting successes in domains with good simulators (Go, Atari, etc) still often fail to learn in other domains because they rely onsimple heuristics for exploration. This provides additional empirical justification for essential questions around RL, specifically around algorithms that learn in a provably efficient manner through strategic exploration in any considered domain. This thesis provides new algorithmsand theory that enable good performance with respect to existing theoretical frameworks for evaluating RL algorithms (specifically, probably approximately correct) and introduces new stronger evaluation criteria, that may be particularly of interest as RL is applied to more real world problems.For the first line of work on probably approximately correct (PAC) RL algorithms, we introduce a series of algorithms for episodic tabular domains with substantially better PACsample complexity bounds that culminate in a new algorithm with close to minimax optimal PAC and regret bounds. Look up tables are required by most sample efficient and computationally tractable algorithms, but cannot represent many practical domains. We therefore also present a new RL algorithm that can learn a good policy in environments with high dimensional observations and hidden deterministic states; unlike predecessors, this algorithm provablyexplores not only in a statistically but also computationally efficient manner assuming access to function classes with efficient optimization oracles. To make progress it is critical to have the right measures of success. While empiricaldemonstrations are quite clear, we find that for theoretical properties, two of the most commonly used learning frameworks, PAC guarantees and regret guarantees, each allow undesirable algorithm behavior (e.g. ignoring new observations that could improve the policy). We presenta new stronger learning framework called Uniform-PAC that unifies the existing frameworks and prevents undesirable algorithm properties. One caveat of all existing learning frameworks is that for any particular episode, we do notknow how well the algorithm will perform. To address this, we introduce the IPOC framework that requires algorithms to provide a certificate before each episode bounding how suboptimal the current policy can be. Such certifications may be of substantial interest in high stakes scenarios when an organization may wish to track or even pause an online RL system should the potential expected performance bound drop below a required expected outcome.
Published: 2020
Full Text: View/download PDF

38. Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Author: Keramati, Ramtin, primary, Dann, Christoph, additional, Tamkin, Alex, additional, and Brunskill, Emma, additional
Published: 2020
Full Text: View/download PDF

39. Sample Efficient Policy Search for Optimal Stopping Domains

Author: Goel, Karan, primary, Dann, Christoph, additional, and Brunskill, Emma, additional
Published: 2017
Full Text: View/download PDF

40. Bayesian Time-of-Flight for Realtime Shape, Illumination and Albedo

Author: Adam, Amit, primary, Dann, Christoph, additional, Yair, Omer, additional, Mazor, Shai, additional, and Nowozin, Sebastian, additional
Published: 2017
Full Text: View/download PDF

41. The Human Kernel

Author: Wilson, Andrew, Dann, Christoph, Lucas, Christopher G., and Xing, Eric P.
Subjects: Computer Science::Machine Learning
Abstract: Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity. However, automating human expertise remains elusive; for example, Gaussian processes with standard kernels struggle on function extrapolation problems that are trivial for human learners. In this paper, we create function extrapolation problems and acquire human responses, and then design a kernel learning framework to reverse engineer the inductive biases of human learners across a set of behavioral experiments. We use the learned kernels to gain psychological insights and to extrapolate in human-like ways that go beyond traditional stationary and polynomial kernels. Finally, we investigate Occam's razor in human and Gaussian process based function learning.
Published: 2015

42. Rlpy: A Value-Function-Based Reinforcement Learning Framework for Education and Research

Author: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems, Geramifard, Alborz, Klein, Robert Henry, How, Jonathan P., Dann, Christoph, Dabney, William, Massachusetts Institute of Technology. Laboratory for Information and Decision Systems, Geramifard, Alborz, Klein, Robert Henry, How, Jonathan P., Dann, Christoph, and Dabney, William
Abstract: RLPy is an object-oriented reinforcement learning software package with a focus on valuefunction-based methods using linear function approximation and discrete actions. The framework was designed for both educational and research purposes. It provides a rich library of fine-grained, easily exchangeable components for learning agents (e.g., policies or representations of value functions), facilitating recently increased specialization in reinforcement learning. RLPy is written in Python to allow fast prototyping, but is also suitable for large-scale experiments through its built-in support for optimized numerical libraries and parallelization. Code profiling, domain visualizations, and data analysis are integrated in a self-contained package available under the Modified BSD License at http://github.com/rlpy/rlpy. All of these properties allow users to compare various reinforcement learning algorithms with little effort.
Published: 2016

43. Policy Evaluation with Temporal Differences: A Survey and Comparison (Extended Abstract)

Author: Dann, Christoph, primary, Neumann, Gerhard, additional, and Peters, Jan, additional
Published: 2015
Full Text: View/download PDF

44. RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research.

Author: Geramifard, Alborz, Dann, Christoph, Klein, Robert H., Dabney, William, and How, Jonathan P.
Subjects: *MATHEMATICAL functions, *REINFORCEMENT learning, *DATA analysis, *COMPUTER software, *RAPID prototyping, *MACHINE learning
Abstract: RLPy is an object-oriented reinforcement learning software package with a focus on valuefunction- based methods using linear function approximation and discrete actions. The framework was designed for both educational and research purposes. It provides a rich library of fine-grained, easily exchangeable components for learning agents (e.g., policies or representations of value functions), facilitating recently increased specialization in reinforcement learning. RLPy is written in Python to allow fast prototyping, but is also suitable for large-scale experiments through its built-in support for optimized numerical libraries and parallelization. Code profiling, domain visualizations, and data analysis are integrated in a self-contained package available under the Modified BSD License at http://github.com/rlpy/rlpy. All of these properties allow users to compare various reinforcement learning algorithms with little effort. [ABSTRACT FROM AUTHOR]
Published: 2015

45. Policy Evaluation with Temporal Differences: A Survey and Comparison.

Author: Dann, Christoph, Neumann, Gerhard, and Peters, Jan
Subjects: *PROBABILITY theory, *COST functions, *SURVEYS, *MATHEMATICAL regularization, *REINFORCEMENT learning, *ESTIMATION theory
Abstract: Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches. This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario as well as on regularization in high dimensional feature spaces. By presenting the first extensive, systematic comparative evaluations comparing TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance. [ABSTRACT FROM AUTHOR]
Published: 2014

46. Neural Active Learning with Performance Guarantees

Author: Awasthi, Pranjal, Dann, Christoph, Gentile, Claudio, Sekhari, Ayush, and Wang, Zhilei
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop. Since the shape of the label requesting threshold is tightly related to the complexity of the function to be learned, which is a-priori unknown, we also derive a version of the algorithm which is agnostic to any prior knowledge. This algorithm relies on a regret balancing scheme to solve the resulting online model selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and number of requested labels which depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results of the generalization error as a function of the label complexity in a standard statistical learning setting., Comment: 30 pages

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

46 results on '"Dann, Christoph"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources