Author: "Yang, Zhuoran" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Yang, Zhuoran" returned 647 results.

201. A modified MERR model for predicting mode II fracture initiation angle considering boundary constraint effect

Author: Xia, Yan, Yao, Chengbin, Zhu, Zhongmeng, Yang, Zhuoran, and Jiang, Han
Published: 2023
Full Text: View/download PDF

202. Cohesive zone model to investigate complex soft adhesive failure: state-of-the-art review

Author: Yang, Zhuoran, Xia, Yan, Zhu, Zhongmeng, Yao, Chengbin, and Jiang, Han
Published: 2023
Full Text: View/download PDF

203. Being Trustworthy is Not Enough: How Untrustworthy Artificial Intelligence (AI) Can Deceive the End-Users and Gain Their Trust

Author: Banovic, Nikola, Yang, Zhuoran, Ramesh, Aditya, and Liu, Alice
Published: 2023
Full Text: View/download PDF

204. Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics

Author: Zheng, Sirui, Wang, Lingxiao, Qiu, Shuang, Fu, Zuyue, Yang, Zhuoran, Szepesvari, Csaba, and Wang, Zhaoran
Published: 2023

205. Effect of thermal aging on the scratch behavior of poly (methyl methacrylate)

Author: Cheng, Qian, Jiang, Chengkai, Zhang, Jianwei, Yang, Zhuoran, Zhu, Zhongmeng, and Jiang, Han
Published: 2016
Full Text: View/download PDF

206. One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

Author: Liu, Zhihan, Lu, Miao, Xiong, Wei, Zhong, Han, Hu, Hao, Zhang, Shenao, Zheng, Sirui, Yang, Zhuoran, and Wang, Zhaoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Optimization and Control (math.OC), Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: In online reinforcement learning (online RL), balancing exploration and exploitation is crucial for finding an optimal policy in a sample-efficient way. To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration. However, in order to cope with general function approximators, most of them involve impractical algorithmic components to incentivize exploration, such as optimization within data-dependent level-sets or complicated sampling procedures. To address this challenge, we propose an easy-to-implement RL framework called Maximize to Explore (MEX), which only needs to unconstrainedly optimize a single objective that integrates the estimation and planning components while balancing exploration and exploitation automatically. Theoretically, we prove that MEX achieves a sublinear regret with general function approximations for Markov decision processes (MDP) and is further extendable to two-player zero-sum Markov games (MG). Meanwhile, we adapt deep RL baselines to design practical versions of MEX, in both model-free and model-based manners, which can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards. Compared with existing sample-efficient online RL algorithms with general function approximations, MEX achieves similar sample efficiency while enjoying a lower computational cost and is more compatible with modern deep RL methods.
Published: 2023

207. False Correlation Reduction for Offline Reinforcement Learning

Author: Deng, Zhihong, Fu, Zuyue, Wang, Lingxiao, Yang, Zhuoran, Bai, Chenjia, Zhou, Tianyi, Wang, Zhaoran, and Jiang, Jing
Abstract: Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions, while we investigate a broader issue: the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty, which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
Published: 2024
Full Text: View/download PDF

208. Understanding Implicit Regularization in Over-Parameterized Single Index Model

Author: Fan, Jianqing, Yang, Zhuoran, and Yu, Mengxin
Subjects: *REGULARIZATION parameter, *LOW-rank matrices, *SYMMETRIC matrices, *MATHEMATICAL regularization, *NONLINEAR functions
Abstract: In this article, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role played by implicit regularization without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and also demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both $\ell_2$-statistical rate and variable selection consistency. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

209. Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning

Author: Ramprasad, Pratik, Li, Yuantong, Yang, Zhuoran, Wang, Zhaoran, Sun, Will Wei, and Cheng, Guang
Subjects: MACHINE learning, INFERENTIAL statistics, STOCHASTIC approximation, ONLINE education, ASYMPTOTIC normality
Abstract: The recent emergence of reinforcement learning (RL) has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for inference in online learning are restricted to settings involving independently sampled observations, while inference methods in RL have so far been limited to the batch setting. The bootstrap is a flexible and efficient approach for statistical inference in online learning algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this article, we study the use of the online bootstrap method for inference in RL policy evaluation. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm across a range of real RL environments. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

210. In vivo evaluation of intravascular lithotripsy in a healthy porcine coronary model

Author: Yin, Jiasheng, Wang, Rui, Chen, Han, Lu, Hao, Yang, Zhuoran, Xu, Fei, Zang, Tongtong, Liu, Chengpeng, Shen, Li, and Ge, Junbo
Published: 2023
Full Text: View/download PDF

211. L‐Arginine‐Modified CoWO4/FeWO4 S‐Scheme Heterojunction Enhances Ferroptosis Against Solid Tumor

Author: Yang, Zhuoran, Yang, Chunyu, Yang, Dan, Zhang, Ye, Yang, Qingzhu, Qu, Fengyu, and Guo, Wei
Published: 2023
Full Text: View/download PDF

212. Comparison of Three-Year Outcomes of Drug-Coated Balloon Angioplasty in Totally Occlusive vs. Non-Occlusive In-Stent Restenosis of Drug-Eluting Stents

Author: Yang, Zhuoran, Yin, Jiasheng, Zhang, Yaqi, Krishnamurthi, Nirupama, Wu, Lingling, Tamis-Holland, Jacqueline E., and Ge, Junbo
Published: 2023
Full Text: View/download PDF

213. Utilization and In-Hospital Outcomes of Percutaneous Left Atrial Appendage Occlusion in Patients with Cancer

Author: Zhang, Yaqi, Yang, Zhuoran, Soon-Shiong, Raquel, Almani, Muhammad Usman, Vardar, Ufuk, Shoura, Sami, Karki, Sadichhya, Liu, Bolun, and Stroger, John H.
Published: 2023
Full Text: View/download PDF

214. Nanomaterials as well their applications and effects in batteries

Author: Yang, Zhuoran
Published: 2023
Full Text: View/download PDF

215. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Author: Xie, Qiaomin, Chen, Yudong, Wang, Zhaoran, and Yang, Zhuoran
Published: 2023
Full Text: View/download PDF

216. A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Author: Hong, Mingyi, Wai, Hoi-To, Wang, Zhaoran, and Yang, Zhuoran
Published: 2023
Full Text: View/download PDF

217. False Correlation Reduction for Offline Reinforcement Learning

Author: Deng, Zhihong, Fu, Zuyue, Wang, Lingxiao, Yang, Zhuoran, Bai, Chenjia, Zhou, Tianyi, Wang, Zhaoran, and Jiang, Jing
Published: 2023
Full Text: View/download PDF

218. Enhanced Interfacial Shear Debonding Resistance of Soft Material Bilayers Based on Mechanical Mismatch

Author: Zhu, Zhongmeng, Yang, Zhuoran, Yang, Fan, Yao, Chengbin, and Jiang, Han
Published: 2023
Full Text: View/download PDF

219. Hollow Nanooxidase Enhanced Phototherapy Against Solid Tumors

Author: Wang, Yuzhu, Jia, Lu, Hu, Tingting, Yang, Zhuoran, Yang, Chunyu, Lin, Huiming, Zhang, Feng, Yu, Kai, Qu, Fengyu, and Guo, Wei
Published: 2022
Full Text: View/download PDF

220. Study of temperature fields and losses in high voltage cables under different layings

Author: Lv, Lixiang, Yang, Zhuoran, Deng, Jingfang, Rao, Huanyu, Teng, Changpeng, Gao, Yuan, and Wang, Han
Published: 2022
Full Text: View/download PDF

221. Accelerate online reinforcement learning for building HVAC control with heterogeneous expert guidances

Author: Xu, Shichao, Fu, Yangyang, Wang, Yixuan, Yang, Zhuoran, O'Neill, Zheng, Wang, Zhaoran, and Zhu, Qi
Published: 2022
Full Text: View/download PDF

222. L‐Arginine‐Modified CoWO4/FeWO4 S‐Scheme Heterojunction Enhances Ferroptosis Against Solid Tumor

Author: Yang, Zhuoran, Yang, Chunyu, Yang, Dan, Zhang, Ye, Yang, Qingzhu, Qu, Fengyu, and Guo, Wei
Published: 2023
Full Text: View/download PDF

223. Computing Independent Variable Sets for Polynomial Ideals

Author: Yang, Zhuoran and Tan, Chang
Published: 2022
Full Text: View/download PDF

224. Time-temperature superposition principle for the shear fracture behaviour of soft adhesive layers: From bulk to interface

Author: Xia, Yan, Zhu, Zhongmeng, Yang, Zhuoran, Sun, Taolin, Yao, Chengbin, and Jiang, Han
Published: 2022
Full Text: View/download PDF

225. Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning

Author: Ramprasad, Pratik, Li, Yuantong, Yang, Zhuoran, Wang, Zhaoran, Sun, Will Wei, and Cheng, Guang
Published: 2022
Full Text: View/download PDF

226. Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

Author: Wu, Jibang, Zhang, Zixuan, Feng, Zhe, Wang, Zhaoran, Yang, Zhuoran, Jordan, Michael I., and Xu, Haifeng
Published: 2022
Full Text: View/download PDF

227. Relationship Between Electrical Treeing Degradation and DCIC-Q(t) Characteristics of XLPE Insulation

Author: Wang, Heyu, Li, Zhonglei, Zhou, Shuofan, Fan, Mingsheng, Wu, You, Du, Boxue, and Yang, Zhuoran
Published: 2022
Full Text: View/download PDF

228. Effect of Polycyclic Aromatic Compounds Content on Electrical Tree and Partial Discharge of XLPE

Author: Wang, Heyu, Li, Zhonglei, Fan, Mingsheng, Zhou, Shuofan, Wu, You, Du, Boxue, and Yang, Zhuoran
Published: 2022
Full Text: View/download PDF

229. Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

Author: Zhan, Wenhao, Lee, Jason D., and Yang, Zhuoran
Abstract: We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents. Our goal is to develop a no-regret online learning algorithm that (i) takes actions based on the local information observed by the agent and (ii) is able to find the best policy in hindsight. For such a problem, the nonstationary state transitions due to the varying opponent pose a significant challenge. In light of a recent hardness result (Liu et al., 2022), we focus on the setting where the opponent's previous policies are revealed to the agent for decision making. With such an information structure, we propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes. Moreover, when all the agents adopt DORIS, we prove that their mixture policy constitutes an approximate coarse correlated equilibrium. In particular, DORIS maintains a hyperpolicy, which is a distribution over the policy space. The hyperpolicy is updated via mirror descent, where the update direction is obtained by an optimistic variant of least-squares policy evaluation. Furthermore, to illustrate the power of our method, we apply DORIS to constrained and vector-valued MDPs, which can be formulated as zero-sum Markov games with a fictitious opponent.
Published: 2022

230. Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Author: Lu, Miao, Min, Yifei, Wang, Zhaoran, and Yang, Zhuoran
Abstract: We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm, which addresses the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of P3O is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that P3O achieves an $n^{-1/2}$-suboptimality, where $n$ is the number of trajectories in the dataset. To the best of our knowledge, P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
Comment: Updates. 52 pages
Published: 2022

231. Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Author: Bai, Chenjia, Wang, Lingxiao, Yang, Zhuoran, Deng, Zhihong, Garg, Animesh, Liu, Peng, and Wang, Zhaoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions. Previous methods tackle this problem by penalizing the Q-values of OOD actions or constraining the trained policy to be close to the behavior policy. Nevertheless, such methods typically prevent the generalization of value functions beyond the offline data and also lack precise characterization of OOD data. In this paper, we propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints. Specifically, PBRL conducts uncertainty quantification via the disagreement of bootstrapped Q-functions, and performs pessimistic updates by penalizing the value function based on the estimated uncertainty. To tackle the extrapolation error, we further propose a novel OOD sampling method. We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL. Extensive experiments on the D4RL benchmark show that PBRL outperforms state-of-the-art algorithms.
Comment: ICLR 2022
Published: 2022

232. Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

Author: Wu, Jibang, Zhang, Zixuan, Feng, Zhe, Wang, Zhaoran, Yang, Zhuoran, Jordan, Michael I., and Xu, Haifeng
Subjects: FOS: Computer and information sciences, FOS: Economics and business, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics, Theoretical Economics (econ.TH), Computer Science and Game Theory (cs.GT), Machine Learning (cs.LG)
Abstract: In today's economy, it has become important for Internet platforms to consider the sequential information design problem in order to align their long-term interests with the incentives of gig service providers. This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), where a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite horizon Markovian environment with varying prior and utility functions. Planning in MPPs thus faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities of the sender. Nevertheless, at the population level where the model is known, it turns out that we can efficiently determine the optimal (resp. $\epsilon$-optimal) policy with finite (resp. infinite) states and outcomes, through a modified formulation of the Bellman equation. Our main technical contribution is to study the MPP under the online reinforcement learning (RL) setting, where the goal is to learn the optimal signaling policy by interacting with the underlying MPP, without knowledge of the sender's utility functions, prior distributions, and the Markov transition kernels. We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles. Our algorithm enjoys sample efficiency by achieving a sublinear $\sqrt{T}$-regret upper bound. Furthermore, both our algorithm and theory can be applied to MPPs with large spaces of outcomes and states via function approximation, and we showcase such a success under the linear setting.
Published: 2022

233. Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

Author: Chen, Xiaoyu, Zhong, Han, Yang, Zhuoran, Wang, Zhaoran, and Wang, Liwei
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer. The goal of the agent is to learn the optimal policy which is most preferred by the human overseer. Despite the empirical successes, the theoretical understanding of preference-based RL (PbRL) is limited to the tabular case. In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. Our algorithm achieves the regret of $\tilde{O} (\operatorname{poly}(d H) \sqrt{K} )$, where $d$ is the complexity measure of the transition and preference model depending on the Eluder dimension and log-covering numbers, $H$ is the planning horizon, $K$ is the number of episodes, and $\tilde O(\cdot)$ omits logarithmic terms. Our lower bound indicates that our algorithm is near-optimal when specialized to the linear setting. Furthermore, we extend the PbRL problem by formulating a novel problem called RL with $n$-wise comparisons, and provide the first sample-efficient algorithm for this new setting. To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.
Published: 2022
Full Text: View/download PDF

234. The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches

Author: Velegkas, Grigoris, Yang, Zhuoran, and Karbasi, Amin
Subjects: Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting. We focus on learning with general function classes and general model classes, and we derive results that scale with the eluder dimension of these classes. In contrast to the existing body of work that mainly establishes instance-independent regret guarantees, we focus on the instance-dependent setting and show that the regret scales logarithmically with the horizon $T$, provided that there is a gap between the best and the second best action in every state. In addition, we show that such a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost (also known as adaptivity complexity). In other words, these algorithms rarely switch their policy during the course of their execution. Finally, we complement our results with lower bounds which show that even in the tabular setting, we cannot hope for regret guarantees lower than $o(\log T)$.
Published: 2022
Full Text: View/download PDF

235. Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Author: Qiu, Shuang, Wei, Xiaohan, Ye, Jieping, Wang, Zhaoran, and Yang, Zhuoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{K})$.
Comment: ICML 2021
Published: 2022
Full Text: View/download PDF

236. Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Author: Lyu, Boxiang, Meng, Qinglin, Qiu, Shuang, Wang, Zhaoran, Yang, Zhuoran, and Jordan, Michael I.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment. We consider the problem where the agents interact with the mechanism designer according to an unknown Markov Decision Process (MDP), where agent rewards and the mechanism designer's state evolve according to an episodic MDP with unknown reward functions and transition kernels. We focus on the online setting with linear function approximation and attempt to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction. A key contribution of our work is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space to estimate prices in the dynamic VCG mechanism. We show that the regret of our proposed method is upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further devise a lower bound to show that our algorithm is efficient, incurring the same $\tilde{\mathcal{O}}(T^{2 / 3})$ regret as the lower bound, where $T$ is the total number of rounds. Our work establishes the regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.
Comment: The first three authors contribute equally and are listed in alphabetical order
Published: 2022
Full Text: View/download PDF

237. Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

Author: Fu, Zuyue, Qi, Zhengling, Yang, Zhuoran, Wang, Zhaoran, and Wang, Lan
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Statistics - Methodology, Machine Learning (cs.LG)
Abstract: Motivated by human-machine interactions such as training chatbots to improve customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) we cannot collect Bob's private information, leading to a confounding bias when using standard RL methods, and (ii) a distributional mismatch between the behavior policy used to collect data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's previous action as an instrumental variable for Alice's current decision making so as to adjust for the unmeasured confounding. We develop a novel identification result and use it to propose a new off-policy evaluation (OPE) method for evaluating policy pairs in this two-player turn-based game. To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob. Finally, we prove that under mild assumptions such as partial coverage of the offline data, the policy pair obtained through our method converges to the optimal one at a satisfactory rate.
Published: 2022
Full Text: View/download PDF

238. Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

Author: Wang, Lingxiao, Cai, Qi, Yang, Zhuoran, and Wang, Zhaoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, Machine Learning (stat.ML), Systems and Control (eess.SY), Electrical Engineering and Systems Science - Systems and Control, Machine Learning (cs.LG)
Abstract: Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent the full history with a low-dimensional embedding, which assembles the per-step feature. We integrate (i) and (ii) in a unified framework that allows a variety of estimators (including maximum likelihood estimators and generative adversarial networks). For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank). Here $\epsilon$ is the optimality gap. To the best of our knowledge, ETC is the first sample-efficient algorithm that bridges representation learning and policy optimization in POMDPs with infinite observation and state spaces.
Published: 2022
Full Text: View/download PDF

239. Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

Author: Min, Yifei, Wang, Tianhao, Xu, Ruitu, Wang, Zhaoran, Jordan, Michael I., and Yang, Zhuoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, FOS: Mathematics, Mathematics - Statistics Theory, Statistics Theory (math.ST), Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market. At each step, the agents are presented with a dynamical context, where the contexts determine the utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopic stable matching at each step. Such a setting captures a range of applications, including ridesharing platforms. We formalize the problem by proposing a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret. Comment: 40 pages
Published: 2022
Full Text: View/download PDF

240. Offline Policy Optimization in RL with Variance Regularization

Author: Islam, Riashat, Sinha, Samarth, Bharadhwaj, Homanga, Arnob, Samin Yeasar, Yang, Zhuoran, Garg, Animesh, Wang, Zhaoran, Li, Lihong, and Precup, Doina
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Learning policies from fixed offline datasets is a key challenge in scaling up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to the mismatch between the dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues when computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm. We show that the regularizer leads to a lower bound on the offline policy optimization objective, which can help avoid over-estimation errors and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms. Comment: Old Draft, Offline RL Workshop, NeurIPS'20
Published: 2022
Full Text: View/download PDF

241. Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Author: Lyu, Boxiang, Wang, Zhaoran, Kolar, Mladen, and Yang, Zhuoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: Dynamic mechanism design has garnered significant attention from both computer scientists and economists in recent years. By allowing agents to interact with the seller over multiple rounds, where agents' reward functions may change with time and are state-dependent, the framework is able to model a rich class of real-world problems. In these works, the interaction between agents and sellers is often assumed to follow a Markov Decision Process (MDP). We focus on the setting where the reward and transition functions of such an MDP are not known a priori, and we attempt to recover the optimal mechanism using a data set collected a priori. In the setting where function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms. Moreover, the learned mechanisms approximately satisfy three key desiderata: efficiency, individual rationality, and truthfulness. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage. Comment: 52 pages
Published: 2022
Full Text: View/download PDF

242. Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

Author: Zhang, Fengzhuo, Liu, Boyi, Wang, Kaixin, Tan, Vincent Y. F., Yang, Zhuoran, and Wang, Zhaoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Science - Multiagent Systems, Machine Learning (stat.ML), Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: The framework of cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound of the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and is applicable to other regression problems related to the transformer beyond MARL.
Published: 2022
Full Text: View/download PDF

243. Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

Author: Fu, Zuyue, Qi, Zhengling, Wang, Zhaoran, Yang, Zhuoran, Xu, Yanxun, and Kosorok, Michael R.
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Methodology, Machine Learning (cs.LG)
Abstract: We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL faces the following two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded MDPs with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and one numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.
Published: 2022
Full Text: View/download PDF

244. Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

Author: Wang, Yixuan, Zhan, Simon Sinong, Jiao, Ruochen, Wang, Zhilu, Jin, Wanxin, Yang, Zhuoran, Wang, Zhaoran, Huang, Chao, and Zhu, Qi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, Systems and Control (eess.SY), Electrical Engineering and Systems Science - Systems and Control, Machine Learning (cs.LG)
Abstract: It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods, such as those based on the Constrained Markov Decision Process (CMDP) paradigm, formulate safety violations in a cost function and try to constrain the expectation of cumulative cost below a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly through such constraints on safety violation costs. In this work, we leverage the notion of barrier functions to explicitly encode the hard safety constraints and, given that the environment is unknown, relax them into our design of generative-model-based soft barrier functions. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate, measured via simulations. Comment: Accepted to ICML 2023
Published: 2022
Full Text: View/download PDF

245. Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

Author: Zhong, Han, Xiong, Wei, Tan, Jiyuan, Wang, Liwei, Zhang, Tong, Wang, Zhaoran, and Yang, Zhuoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state spaces, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving for NEs based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality, which recovers a sublinear rate without assuming uniform coverage of the dataset. We also prove an information-theoretic lower bound, which suggests that the data-dependent term in the upper bound is intrinsic. Our theoretical results also highlight a notion of "relative uncertainty", which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.
Published: 2022
Full Text: View/download PDF

246. Study on Surface Discharge Characteristics of GO-Doped Epoxy Resin–LN2 Composite Insulation

Author: Xing, Yunqi, primary, Chen, Yuanyuan, additional, Yuan, Ruiyi, additional, Yang, Zhuoran, additional, Yao, Tianyi, additional, Li, Jiehua, additional, Zhu, Wenbo, additional, and Wang, Xiaoxue, additional
Published: 2022
Full Text: View/download PDF

247. Electrostatically Controlled ex Situ and in Situ Polymerization of Diacetylene-Containing Peptide Amphiphiles in Living Cells

Author: Lv, Niannian, primary, Yin, Xiaoyan, additional, Yang, Zhuoran, additional, Ma, Teng, additional, Qin, Huimin, additional, Xiong, Bijin, additional, Jiang, Hao, additional, and Zhu, Jintao, additional
Published: 2022
Full Text: View/download PDF

248. Dynamic multifunctional devices enabled by ultrathin metal nanocoatings with optical/photothermal and morphological versatility

Author: Zeng, Songshan, primary, Yang, Zhuoran, additional, Hou, Zaili, additional, Park, Cheonjin, additional, Jones, Michael D., additional, Ding, Hao, additional, Shen, Kuangyu, additional, Smith, Andrew T., additional, Jin, Henry X., additional, Wang, Bing, additional, Jiang, Han, additional, and Sun, Luyi, additional
Published: 2022
Full Text: View/download PDF

249. Investigating Inter/Intralayer Interface-Triggered Toughening Mechanisms of Three-Dimensional Printed Polylactic Acid Using Double-Notch Four-Point-Bending Method

Author: Chen, Kang, primary, Zhu, Zhongmeng, additional, Yang, Zhuoran, additional, Xia, Yan, additional, Sun, Yuzhou, additional, Liu, Tianyuan, additional, Cheng, Qian, additional, Yao, Chengbin, additional, and Jiang, Han, additional
Published: 2022
Full Text: View/download PDF

250. Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Author: Fei, Yingjie, Yang, Zhuoran, Chen, Yudong, and Wang, Zhaoran
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.
Published: 2021
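The abstract of entry 250 does not spell out the transformation it uses. One possible reading, assuming the standard entropic risk measure (1/β) log E[exp(βX)] with risk parameter β ≠ 0, starts from the risk-sensitive backup Q_h(s,a) = r_h(s,a) + (1/β) log E[exp(β V_{h+1}(s'))]. Exponentiating both sides gives exp(β Q_h(s,a)) = E[exp(β (r_h(s,a) + V_{h+1}(s')))]. The sketch below is a minimal tabular check that the two forms produce the same values. The function names, the toy two-state MDP and the finite-horizon setup are hypothetical and not taken from the paper. The sketch also leaves out the paper's exploration mechanism and regret analysis entirely.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete random variable X."""
    if beta == 0:
        # The beta -> 0 limit is the ordinary expectation.
        return sum(p * v for p, v in zip(probs, values))
    return math.log(sum(p * math.exp(beta * v) for p, v in zip(probs, values))) / beta

def log_form_backup(reward, next_values, probs, beta):
    """Risk-sensitive Bellman backup: Q = r + (1/beta) * log E[exp(beta * V')]."""
    return reward + entropic_risk(next_values, probs, beta)

def exp_form_backup(reward, next_values, probs, beta):
    """Exponential form: returns exp(beta * Q) = E[exp(beta * (r + V'))]."""
    return sum(p * math.exp(beta * (reward + v)) for p, v in zip(probs, next_values))

def finite_horizon_values(P, R, horizon, beta, use_exp_form=False):
    """Backward induction on a toy tabular MDP (hypothetical setup).

    P[s][a] is a list of next-state probabilities and R[s][a] a scalar reward.
    Returns V_1 as a list indexed by state, starting from V_{H+1} = 0.
    """
    if use_exp_form and beta == 0:
        raise ValueError("the exponential form needs beta != 0")
    values = [0.0] * len(P)
    for _ in range(horizon):
        new_values = []
        for s in range(len(P)):
            q_values = []
            for a in range(len(P[s])):
                if use_exp_form:
                    # Undo the exponential transformation to recover Q.
                    q = math.log(exp_form_backup(R[s][a], values, P[s][a], beta)) / beta
                else:
                    q = log_form_backup(R[s][a], values, P[s][a], beta)
                q_values.append(q)
            new_values.append(max(q_values))
        values = new_values
    return values

# Toy two-state, two-action MDP (made-up numbers for illustration only).
P = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.3, 0.7]]]
R = [[1.0, 0.8], [0.0, 0.5]]
beta = -0.5  # beta < 0 is risk-averse, beta > 0 is risk-seeking

v_log = finite_horizon_values(P, R, horizon=3, beta=beta)
v_exp = finite_horizon_values(P, R, horizon=3, beta=beta, use_exp_form=True)
print(v_log, v_exp)  # the two forms agree up to floating-point error
```

In the exponential form, the right-hand side is an ordinary expectation, so it is linear in the transition probabilities. The log form nests that expectation inside a logarithm. The sketch only checks that the two forms are equivalent. It does not reproduce the analysis of Bellman backups that the abstract describes.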
