418 results for "Zhou, Xun Yu"
Search Results
2. Reward-Directed Score-Based Diffusion Models via q-Learning
- Author
-
Gao, Xuefeng, Zha, Jiale, and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control - Abstract
We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI to generate samples that maximize reward functions while keeping the generated distributions close to the unknown target data distributions. Different from most existing studies, our formulation does not involve any pretrained model for the unknown score functions of the noise-perturbed data distributions. We present an entropy-regularized continuous-time RL problem and show that the optimal stochastic policy has a Gaussian distribution with a known covariance matrix. Based on this result, we parameterize the mean of Gaussian policies and develop an actor-critic type (little) q-learning algorithm to solve the RL problem. A key ingredient in our algorithm design is to obtain noisy observations from the unknown score function via a ratio estimator. Numerically, we show the effectiveness of our approach by comparing its performance with two state-of-the-art RL methods that fine-tune pretrained models. Finally, we discuss extensions of our RL formulation to probability flow ODE implementation of diffusion models and to conditional diffusion models.
- Published
- 2024
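To make the exploratory sampling in this abstract concrete, here is a minimal sketch of one reverse-time pass in which actions are drawn from a Gaussian policy with a parameterized mean and fixed (known) variance. The linear mean parameterization, the dynamics, and all constants are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(theta, x, t):
    # Hypothetical linear actor; the paper would use a richer parameterization.
    return theta[0] * x + theta[1] * t

def sample_action(theta, x, t, sigma2=0.5):
    # The optimal exploratory policy is Gaussian with a known covariance;
    # sigma2 here is an illustrative stand-in for that covariance.
    return policy_mean(theta, x, t) + np.sqrt(sigma2) * rng.standard_normal()

theta = np.array([0.1, -0.05])
x, t, dt = 1.0, 1.0, 0.01
while t > 0:
    a = sample_action(theta, x, t)
    # One Euler step of a generic controlled reverse-time SDE: dX = a dt + dW.
    x += a * dt + np.sqrt(dt) * rng.standard_normal()
    t -= dt
print("generated sample:", x)
```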
3. Learning to Optimally Stop Diffusion Processes, with Financial Applications
- Author
-
Dai, Min, Sun, Yu, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Mathematics - Optimization and Control, Quantitative Finance - Mathematical Finance, Quantitative Finance - Pricing of Securities - Abstract
We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework developed by Wang et al. (2020), and present applications to option pricing and portfolio choice. By penalizing the corresponding variational inequality formulation, we transform the stopping problem into a stochastic optimal control problem with two actions. We then randomize controls into Bernoulli distributions and add an entropy regularizer to encourage exploration. We derive a semi-analytical optimal Bernoulli distribution, based on which we devise RL algorithms using the martingale approach established in Jia and Zhou (2022a), and prove a policy improvement theorem. We demonstrate the effectiveness of the algorithms in pricing finite-horizon American put options and in solving Merton's problem with transaction costs, and show that both the offline and online algorithms achieve high accuracy in learning the value functions and characterizing the associated free boundaries., Comment: 35 pages, 9 figures
- Published
- 2024
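For two actions {stop, continue}, an entropy-regularized optimal policy is a Gibbs distribution, i.e., a sigmoid in the value gap scaled by the temperature. The snippet below shows only that generic form; the paper derives the exact semi-analytical Bernoulli distribution from the penalized variational inequality.

```python
import numpy as np

def stop_probability(q_stop, q_continue, temperature):
    # Softmax over two actions reduces to a sigmoid of the value gap;
    # higher temperature pushes the policy toward a fair coin flip.
    return 1.0 / (1.0 + np.exp(-(q_stop - q_continue) / temperature))

print(stop_probability(q_stop=1.02, q_continue=1.00, temperature=0.5))   # ~0.51
print(stop_probability(q_stop=1.02, q_continue=1.00, temperature=0.01))  # ~0.88
```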
4. Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems
- Author
-
Huang, Yilie, Jia, Yanwei, and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds., Comment: 44 pages, 4 figures
- Published
- 2024
5. Reinforcement Learning for Jump-Diffusions, with Financial Applications
- Author
-
Gao, Xuefeng, Li, Lingfei, and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Mathematics - Optimization and Control, Quantitative Finance - Mathematical Finance - Abstract
We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean--variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
- Published
- 2024
6. Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration
- Author
-
Dai, Min, Dong, Yuchao, Jia, Yanwei, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management, Computer Science - Machine Learning, Quantitative Finance - Computational Finance - Abstract
We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Based on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the past accumulative amount of exploration. Such a recursive regularization restores the optimality of Gaussian exploration. However, contrary to the existing results, the optimal Gaussian policy turns out to be biased in general, due to the intertwining needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies with a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison to the conventional plug-in method., Comment: 43 pages, 5 figures, 3 tables
- Published
- 2023
7. Robust utility maximization with intractable claims
- Author
-
Li, Yunhong, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, Mathematics - Optimization and Control, Quantitative Finance - Portfolio Management, Quantitative Finance - Risk Management, 91B28, 91G10, 35Q91 - Abstract
We study a continuous-time expected utility maximization problem in which the investor at maturity receives the value of a contingent claim in addition to the investment payoff from the financial market. The investor knows nothing about the claim other than its probability distribution, hence an "intractable claim". In view of the lack of necessary information about the claim, we consider a robust formulation to maximize her utility in the worst scenario. We apply the quantile formulation to solve the problem, expressing the quantile function of the optimal terminal investment income as the solution of certain variational inequalities of ordinary differential equations and obtaining the resulting optimal trading strategy. In the case of an exponential utility, the problem reduces to a (non-robust) rank-dependent utility maximization with probability distortion whose solution is available in the literature. The results can also be used to determine the utility indifference price of the intractable claim.
- Published
- 2023
8. Variable Clustering via Distributionally Robust Nodewise Regression
- Author
-
Wang, Kaizheng, Xu, Xiao, and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Mathematics - Optimization and Control, Quantitative Finance - Computational Finance, Quantitative Finance - Portfolio Management, Quantitative Finance - Statistical Finance - Abstract
We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation. We validate our method in an extensive simulation study. Finally, we propose and apply a variant of our method to stock return data, obtain interpretable clusters that facilitate portfolio selection and compare its out-of-sample performance with other clustering methods in an empirical study., Comment: 34 pages
- Published
- 2022
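As a point of reference for the ADMM step mentioned above, here is the textbook ADMM template for an l1-regularized least-squares problem. The paper's nodewise-regression objective and its data-driven choice of the robustness radius are more involved; this sketch shows only the algorithmic skeleton.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Generic ADMM for  min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)     # cached matrix for the x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z                 # dual update
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true + 0.1 * rng.standard_normal(50)
print(np.round(admm_lasso(A, b, lam=5.0), 2))
```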
9. Naive Markowitz Policies
- Author
-
Chen, Lin and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, 91B28 - Abstract
We study a continuous-time Markowitz mean-variance portfolio selection model in which a naive agent, unaware of the underlying time-inconsistency, continuously reoptimizes over time. We define the resulting naive policies through the limit of discretely naive policies that are committed only in very small time intervals, and derive them analytically and explicitly. We compare naive policies with pre-committed optimal policies and with consistent planners' equilibrium policies in a Black-Scholes market, and find that the former are mean-variance inefficient starting from any given time and wealth, and always take riskier exposure than equilibrium policies.
- Published
- 2022
10. Square-root regret bounds for continuous-time episodic Markov decision processes
- Author
-
Gao, Xuefeng and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Mathematics - Optimization and Control - Abstract
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the inter-transition times of a continuous-time MDP are exponentially distributed with rate parameters depending on the state--action pair at each transition. We present a learning algorithm based on the methods of value iteration and upper confidence bound. We derive an upper bound on the worst-case expected regret for the proposed algorithm, and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
- Published
- 2022
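The defining feature here, exponentially distributed inter-transition times with state--action-dependent rates, is easy to simulate. Below is a toy episode roll-out; the interface and the two-state example are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_episode(horizon, rate, transition, reward, policy, s0=0):
    """Roll out a continuous-time MDP over a finite horizon.

    rate(s, a)       -> exponential rate of the holding time at (s, a)
    transition(s, a) -> next state sampled on transition
    reward(s, a)     -> reward rate while holding in (s, a)
    """
    t, s, total = 0.0, s0, 0.0
    while t < horizon:
        a = policy(s)
        hold = rng.exponential(1.0 / rate(s, a))   # inter-transition time
        hold = min(hold, horizon - t)              # truncate at the horizon
        total += reward(s, a) * hold
        t += hold
        if t < horizon:
            s = transition(s, a)
    return total

# Two-state toy example with a constant policy.
ret = simulate_episode(
    horizon=10.0,
    rate=lambda s, a: 1.0 + s,                     # faster transitions in state 1
    transition=lambda s, a: 1 - s,                 # flip between states 0 and 1
    reward=lambda s, a: float(s == 1),
    policy=lambda s: 0,
)
print("episode return:", ret)
```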
11. Choquet regularization for reinforcement learning
- Author
-
Han, Xia, Wang, Ruodu, and Zhou, Xun Yu
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning, Quantitative Finance - Mathematical Finance - Abstract
We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via maximizing statically a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $\epsilon$-greedy, exponential, uniform and Gaussian.
- Published
- 2022
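For reference, the Choquet integral underlying these regularizers: for a random variable $X$ and a distortion function $h$ (increasing, with $h(0)=0$ and $h(1)=1$),

```latex
\int X \,\mathrm{d}(h \circ \mathbb{P})
  = \int_0^{\infty} h\big(\mathbb{P}(X > x)\big)\,\mathrm{d}x
  + \int_{-\infty}^{0} \left[ h\big(\mathbb{P}(X > x)\big) - 1 \right] \mathrm{d}x .
```

The paper builds its regularizers from such functionals of the exploratory policy's distribution; loosely, suitable distortions reward more spread-out (hence more exploratory) distributions.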
12. $g$-Expectation of Distributions
- Author
-
Xu, Mingyu, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Mathematics - Probability, Mathematics - Optimization and Control, Quantitative Finance - Mathematical Finance, Quantitative Finance - Risk Management - Abstract
We define $g$-expectation of a distribution as the infimum of the $g$-expectations of all the terminal random variables sharing that distribution. We present two special cases for nonlinear $g$ where the $g$-expectation of distributions can be explicitly derived. As a related problem, we introduce the notion of law-invariant $g$-expectation and provide its sufficient conditions. Examples of application in financial dynamic portfolio choice are supplied.
- Published
- 2022
13. q-Learning in Continuous Time
- Author
-
Jia, Yanwei and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Finance - Computational Finance - Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms., Comment: 64 pages, 4 figures
- Published
- 2022
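Schematically, the collapse and its first-order remedy read as follows, with $J$ the value function and $\gamma$ the temperature (a paraphrase of the abstract, not a quotation from the paper):

```latex
Q^{\pi}(t, x, a;\, \Delta t) \;=\; J^{\pi}(t, x) \;+\; q^{\pi}(t, x, a)\,\Delta t \;+\; o(\Delta t),
\qquad
\pi(a \mid t, x) \;\propto\; \exp\!\big( q(t, x, a)/\gamma \big).
```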
14. Logarithmic regret bounds for continuous-time average-reward Markov decision processes
- Author
-
Gao, Xuefeng and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning - Abstract
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
- Published
- 2022
15. Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
- Author
-
Jia, Yanwei and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Computer Science - Computational Engineering, Finance, and Science, Quantitative Finance - Computational Finance, Quantitative Finance - Portfolio Management - Abstract
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integration of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternatingly. The first type is based directly on the aforementioned representation which involves future trajectories and hence is offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions. These conditions are then incorporated using stochastic approximation when updating policies. Finally, we demonstrate the algorithms by simulations in two concrete examples., Comment: 52 pages, 1 figure
- Published
- 2021
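A generic online actor-critic loop in the spirit of the second (online) type: the critic is updated with the temporal-difference increment against its own gradient as test function, and the actor with a score-function update. The linear-quadratic toy dynamics, features, step sizes, and the fixed exploration level are all illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(3)
T, dt, a_lr, c_lr, sig = 1.0, 0.02, 0.02, 0.1, 0.3

def V(psi, t, x):      # critic with hypothetical features [T - t, x^2]
    return psi[0] * (T - t) + psi[1] * x * x

def grad_V(psi, t, x):
    return np.array([T - t, x * x])

theta, psi = 0.0, np.zeros(2)   # actor: Gaussian policy with mean theta * x
for _ in range(300):            # episodes
    t, x = 0.0, 1.0
    while t < T - 1e-9:
        mu = theta * x
        a = mu + sig * rng.standard_normal()              # exploratory action
        r = -(x * x + a * a) * dt                         # running reward
        xn = x + a * dt + 0.2 * np.sqrt(dt) * rng.standard_normal()
        delta = r + V(psi, t + dt, xn) - V(psi, t, x)     # TD / martingale increment
        psi += c_lr * grad_V(psi, t, x) * delta           # critic update
        theta += a_lr * ((a - mu) * x / sig**2) * delta   # actor (score-function) update
        x, t = xn, t + dt
print("learned feedback gain:", theta)
```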
16. Exploratory HJB equations and their convergence
- Author
-
Tang, Wenpin, Zhang, Paul Yuming, and Zhou, Xun Yu
- Subjects
Mathematics - Optimization and Control, Mathematics - Analysis of PDEs, Mathematics - Probability, 35F21, 60J60, 93E15, 93E20 - Abstract
We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space. We establish the well-posedness and regularity of the viscosity solution to the equation, as well as the convergence of the exploratory control problem to the classical stochastic control problem when the level of exploration decays to zero. We then apply the general results to the exploratory temperature control problem, which was introduced by Gao, Xu and Zhou (arXiv:2005.04057, 2020) to design an endogenous temperature schedule for simulated annealing (SA) in the context of non-convex optimization. We derive an explicit rate of convergence for this problem as exploration diminishes to zero, and find that the steady state of the optimally controlled process exists, which is however neither a Dirac mass on the global optimum nor a Gibbs measure., Comment: 31 pages
- Published
- 2021
17. Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
- Author
-
Jia, Yanwei and Zhou, Xun Yu
- Subjects
Computer Science - Machine Learning, Quantitative Finance - Mathematical Finance - Abstract
We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean--square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean--square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with test functions. Solving these equations in different ways recovers various classical TD algorithms, such as TD($\lambda$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero, and we provide the convergence rate. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications., Comment: 58 pages, 12 figures
- Published
- 2021
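A sketch of the first method: minimize the "martingale loss" by regressing a parameterized value function on the realized reward-to-go along simulated paths, which is exactly the gradient Monte-Carlo reading mentioned above. The uncontrolled dynamics and basis functions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

T, dt = 1.0, 0.02
n_steps = int(T / dt)

def features(t, x):
    return np.array([1.0, T - t, x, x * x])   # hypothetical basis

psi, lr = np.zeros(4), 0.05
for _ in range(500):
    # Simulate one path of dX = -X dt + 0.3 dW with reward rate -X^2.
    xs = np.empty(n_steps + 1); xs[0] = 1.0
    for k in range(n_steps):
        xs[k + 1] = xs[k] - xs[k] * dt + 0.3 * np.sqrt(dt) * rng.standard_normal()
    rewards = -(xs[:-1] ** 2) * dt
    G = np.concatenate([np.cumsum(rewards[::-1])[::-1], [0.0]])  # reward-to-go
    for k in range(n_steps + 1):
        t = k * dt
        err = G[k] - features(t, xs[k]) @ psi
        psi += lr * err * features(t, xs[k]) * dt    # descend the martingale loss
print("value-function weights:", psi)
```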
18. Variance insurance contracts
- Author
-
Chi, Yichun, Zhou, Xun Yu, and Zhuang, Sheng Chao
- Published
- 2024
19. Who Are I: Time Inconsistency and Intrapersonal Conflict and Reconciliation
- Author
-
He, Xue Dong and Zhou, Xun Yu
- Subjects
Mathematics - Optimization and Control, Quantitative Finance - Mathematical Finance - Abstract
Time inconsistency is prevalent in dynamic choice problems: a plan of actions to be taken in the future that is optimal for an agent today may not be optimal for the same agent in the future. If the agent is aware of this intra-personal conflict but unable to commit herself in the future to following the optimal plan today, the rational strategy for her today is to reconcile with her future selves, namely to correctly anticipate her actions in the future and then act today accordingly. Such a strategy is named intra-personal equilibrium and has been studied since as early as the 1950s. A rigorous treatment in continuous-time settings, however, had not been available until a decade ago. Since then, the study on intra-personal equilibrium for time-inconsistent problems in continuous time has grown rapidly. In this chapter, we review the classical results and some recent developments in this literature.
- Published
- 2021
20. Asset Selection via Correlation Blockmodel Clustering
- Author
-
Tang, Wenpin, Xu, Xiao, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management, Quantitative Finance - Computational Finance, Quantitative Finance - Statistical Finance - Abstract
We aim to cluster financial assets in order to identify a small set of stocks to approximate the level of diversification of the whole universe of stocks. We develop a data-driven approach to clustering based on a correlation blockmodel in which assets in the same cluster are highly correlated with each other and, at the same time, have the same correlations with all other assets. We devise an algorithm to detect the clusters, with theoretical analysis and practical guidance. Finally, we conduct an empirical analysis to verify the performance of the algorithm., Comment: 46 pages, 9 figures and 8 tables
- Published
- 2021
21. When to Quit Gambling, if You Must!
- Author
-
Hu, Sang, Obloj, Jan, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance - Abstract
We develop an approach to solve Barberis (2012)'s casino gambling model in which a gambler whose preferences are specified by the cumulative prospect theory (CPT) must decide when to stop gambling by a prescribed deadline. We assume that the gambler can assist their decision using an independent randomization, and explain why it is a reasonable assumption. The problem is inherently time-inconsistent due to the probability weighting in CPT, and we study both precommitted and naive stopping strategies. We turn the original problem into a computationally tractable mathematical program, based on which we derive an optimal precommitted rule which is randomized and Markovian. The analytical treatment enables us to make several predictions regarding a gambler's behavior, including that with randomization they may enter the casino even when allowed to play only once, that whether they will play longer once they are granted more bets depends on whether they are in a gain or at a loss, and that it is prevalent that a naive gambler never stops at a loss., Comment: 50 pages, 12 figures
- Published
- 2021
22. Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law
- Author
-
Tang, Wenpin and Zhou, Xun Yu
- Subjects
Mathematics - Probability, Mathematics - Statistics Theory, Statistics - Machine Learning - Abstract
We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f +\delta)$ (resp. $\mathbb{P}(f(x_k) > \min f +\delta)$) decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters. Our argument applies recent developments on functional inequalities for the Gibbs measure at low temperatures -- the Eyring-Kramers law. In the discrete setting, we obtain a condition on the step size to ensure convergence., Comment: 19 pages, 1 figure
- Published
- 2021
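For concreteness, the discretization in question is gradient Langevin dynamics with a cooling schedule. A toy run on a double well follows; the logarithmic schedule and constants are illustrative, whereas the paper works on a cumulative-step-size clock with rates governed by the Eyring--Kramers constants.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):      return (x**2 - 1.0) ** 2          # double well, minima at +/- 1
def grad_f(x): return 4.0 * x * (x**2 - 1.0)

x, eta = 3.0, 0.01
for k in range(1, 20000):
    temp = 1.0 / np.log(np.e + k * eta)          # slow logarithmic cooling
    # Langevin step: gradient descent plus temperature-scaled noise.
    x = x - eta * grad_f(x) + np.sqrt(2.0 * eta * temp) * rng.standard_normal()
print("final iterate:", x, "f(x):", f(x))
```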
23. State-Dependent Temperature Control for Langevin Diffusions
- Author
-
Gao, Xuefeng, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Mathematics - Optimization and Control, Computer Science - Machine Learning - Abstract
We study the temperature control problem for Langevin diffusions in the context of non-convex optimization. The classical optimal control of such a problem is of the bang-bang type, which is overly sensitive to errors. A remedy is to allow the diffusions to explore other temperature values and hence smooth out the bang-bang control. We accomplish this by a stochastic relaxed control formulation incorporating randomization of the temperature control and regularizing its entropy. We derive a state-dependent, truncated exponential distribution, which can be used to sample temperatures in a Langevin algorithm, in terms of the solution to an HJB partial differential equation. We carry out a numerical experiment on a one-dimensional baseline example, in which the HJB equation can be easily solved, to compare the performance of the algorithm with three other available algorithms in search of a global optimum.
- Published
- 2020
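Sampling from a truncated exponential law is a one-line inverse-CDF computation, which is what makes the derived temperature distribution practical inside a Langevin loop. Here theta is a hypothetical stand-in for the state-dependent rate that the paper obtains from the HJB solution.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_truncated_exponential(theta, lo, hi, size=1):
    # Inverse-CDF sampling from density proportional to exp(-theta * tau)
    # restricted to [lo, hi].
    u = rng.random(size)
    a, b = np.exp(-theta * lo), np.exp(-theta * hi)
    return -np.log(a - u * (a - b)) / theta

temps = sample_truncated_exponential(theta=2.0, lo=0.1, hi=1.0, size=5)
print("sampled temperatures:", temps)
# These temperatures could then drive Langevin steps:
#   x <- x - eta * grad_f(x) + sqrt(2 * eta * temp) * noise
```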
24. Variance Contracts
- Author
-
Chi, Yichun, Zhou, Xun Yu, and Zhuang, Sheng Chao
- Subjects
Quantitative Finance - Risk Management - Abstract
We study the design of an optimal insurance contract in which the insured maximizes her expected utility and the insurer limits the variance of his risk exposure while maintaining the principle of indemnity and charging the premium according to the expected value principle. We derive the optimal policy semi-analytically, which is coinsurance above a deductible when the variance bound is binding. This policy automatically satisfies the incentive-compatible condition, which is crucial to rule out ex post moral hazard. We also find that the deductible is absent if and only if the contract pricing is actuarially fair. Focusing on the actuarially fair case, we carry out comparative statics on the effects of the insured's initial wealth and the variance bound on insurance demand. Our results indicate that the expected coverage is always larger for a wealthier insured, implying that the underlying insurance is a normal good, which supports certain recent empirical findings. Moreover, as the variance constraint tightens, the insured who is prudent cedes fewer losses, while the insurer is exposed to less tail risk., Comment: 42 pages, 3 figures
- Published
- 2020
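The contract shape referred to above, coinsurance above a deductible, can be written schematically as follows (the paper pins down the deductible $d$ and the coinsurance structure semi-analytically; the constant rate $k$ below is an illustrative special case):

```latex
I(x) \;=\; k\,(x - d)^{+}, \qquad 0 < k \le 1, \quad d \ge 0 .
```

Since $0 \le I(x_2) - I(x_1) \le x_2 - x_1$ whenever $x_1 \le x_2$, this shape automatically meets the incentive-compatibility condition mentioned in the abstract.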
25. Consistent Investment of Sophisticated Rank-Dependent Utility Agents in Continuous Time
- Author
-
Hu, Ying, Jin, Hanqing, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, Quantitative Finance - Portfolio Management, 91G10, 91B06, 91B16, 91A40 - Abstract
We study portfolio selection in a complete continuous-time market where the preference is dictated by the rank-dependent utility. As such a model is inherently time inconsistent due to the underlying probability weighting, we study the investment behavior of sophisticated consistent planners who seek (subgame perfect) intra-personal equilibrium strategies. We provide sufficient conditions under which an equilibrium strategy is a replicating portfolio of a final wealth. We derive this final wealth profile explicitly, which turns out to be in the same form as in the classical Merton model with the market price of risk process properly scaled by a deterministic function in time. We present this scaling function explicitly through the solution to a highly nonlinear and singular ordinary differential equation, for which the existence of solutions is established. Finally, we give a necessary and sufficient condition for the scaling function to be smaller than 1 corresponding to an effective reduction in risk premium due to probability weighting., Comment: 44 pages, submitted already
- Published
- 2020
26. Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
- Author
-
Wang, Haoran and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Machine Learning, Mathematics - Optimization and Control, 91G10 - Abstract
We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as the exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control based method and a deep neural networks based algorithm by a large margin in our simulations., Comment: 39 pages, 5 figures
- Published
- 2019
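A sketch of what "Gaussian with time-decaying variance" means operationally: sample the exploratory allocation from a normal law whose variance shrinks as $t$ approaches the horizon $T$. The mean and variance shapes below mimic the qualitative form reported in the paper (mean linear in the wealth gap, variance decaying toward maturity); the constants and the target wealth are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def exploratory_allocation(x, t, T, lam=1.0, rho=0.5, sigma=0.2, target=2.0):
    # Mean linear in the wealth gap; variance decaying as t -> T.
    mean = -(rho / sigma) * (x - target)
    var = (lam / (2.0 * sigma**2)) * np.exp(rho**2 * (T - t))
    return mean + np.sqrt(var) * rng.standard_normal()

for t in (0.0, 0.5, 0.99):
    print(t, exploratory_allocation(x=1.0, t=t, T=1.0))
```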
27. Who Are I: Time Inconsistency and Intrapersonal Conflict and Reconciliation
- Author
-
He, Xue Dong, Zhou, Xun Yu, Yin, George, editor, and Zariphopoulou, Thaleia, editor
- Published
- 2022
28. Failure of Smooth Pasting Principle and Nonexistence of Equilibrium Stopping Rules under Time-Inconsistency
- Author
-
Tan, Ken Seng, Wei, Wei, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, Economics - General Economics - Abstract
This paper considers a time-inconsistent stopping problem in which the inconsistency arises from non-constant time preference rates. We show that the smooth pasting principle, the main approach that has been used to construct explicit solutions for conventional time-consistent optimal stopping problems, may fail under time-inconsistency. Specifically, we prove that the smooth pasting principle solves a time-inconsistent problem within the intra-personal game theoretic framework if and only if a certain inequality on the model primitives is satisfied. We show that the violation of this inequality can happen even for very simple non-exponential discount functions. Moreover, we demonstrate that the stopping problem does not admit any intra-personal equilibrium whenever the smooth pasting principle fails. The "negative" results in this paper caution against blindly extending the classical approaches for time-consistent stopping problems to their time-inconsistent counterparts.
- Published
- 2018
29. Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances
- Author
-
Blanchet, Jose, Chen, Lin, and Zhou, Xun Yu
- Subjects
Statistics - Methodology, 91G10, 91G70 - Abstract
We revisit Markowitz's mean-variance portfolio selection model by considering a distributionally robust version, where the region of distributional uncertainty is around the empirical measure and the discrepancy between probability measures is dictated by the so-called Wasserstein distance. We reduce this problem to an empirical variance minimization problem with an additional regularization term. Moreover, we extend recent inference methodology in order to select the size of the distributional uncertainty as well as the associated robust target return rate in a data-driven way., Comment: 20 pages
- Published
- 2018
30. Preface: Special Issue on Optimization, Financial Engineering, Risk and Operations Management
- Author
-
Yao, David, Zhang, Shu-Zhong, and Zhou, Xun-Yu
- Published
- 2022
31. Asset selection via correlation blockmodel clustering
- Author
-
Tang, Wenpin, Xu, Xiao, and Zhou, Xun Yu
- Published
- 2022
32. General Stopping Behaviors of Naive and Non-Committed Sophisticated Agents, with Application to Probability Distortion
- Author
-
Huang, Yu-Jui, Nguyen-Huu, Adrien, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, Economics - General Economics, 60G40, 91B06 - Abstract
We consider the problem of stopping a diffusion process with a payoff functional that renders the problem time-inconsistent. We study stopping decisions of naive agents who reoptimize continuously in time, as well as equilibrium strategies of sophisticated agents who anticipate but lack control over their future selves' behaviors. When the state process is one dimensional and the payoff functional satisfies some regularity conditions, we prove that any equilibrium can be obtained as a fixed point of an operator. This operator represents strategic reasoning that takes the future selves' behaviors into account. We then apply the general results to the case when the agents distort probability and the diffusion process is a geometric Brownian motion. The problem is inherently time-inconsistent as the level of distortion of the same event changes over time. We show how the strategic reasoning may turn a naive agent into a sophisticated one. Moreover, we derive stopping strategies of the two types of agent for various parameter specifications of the problem, illustrating rich behaviors beyond the extreme ones such as "never-stopping" or "never-starting".
- Published
- 2017
33. Naïve Markowitz policies.
- Author
-
Chen, Lin and Zhou, Xun Yu
- Subjects
RISK aversion, EQUILIBRIUM, PLANNERS - Abstract
We study a continuous‐time Markowitz mean–variance portfolio selection model in which a naïve agent, unaware of the underlying time‐inconsistency, continuously reoptimizes over time. We define the resulting naïve policies through the limit of discretely naïve policies that are committed only in very small time intervals, and derive them analytically and explicitly. We compare naïve policies with pre‐committed optimal policies and with consistent planners' equilibrium policies in a Black–Scholes market, and find that the former achieve higher expected terminal returns than originally planned yet are mean–variance inefficient when the risk aversion level is sufficiently small, and always take strictly riskier exposure than equilibrium policies. We finally define an efficiency ratio for comparing return–risk tradeoff with the same original level of risk aversion, and show that naïve policies are always strictly less efficient than pre‐committed and equilibrium policies.
- Published
- 2024
34. Predictable Forward Performance Processes: The Binomial Case
- Author
-
Angoshtari, Bahman, Zariphopoulou, Thaleia, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Mathematical Finance, Quantitative Finance - Portfolio Management, Quantitative Finance - Risk Management - Abstract
We introduce a new class of forward performance processes that are endogenous and predictable with regard to an underlying market information set and, furthermore, are updated at discrete times. We analyze in detail a binomial model whose parameters are random and updated dynamically as the market evolves. We show that the key step in the construction of the associated predictable forward performance process is to solve a single-period inverse investment problem, namely, to determine, period-by-period and conditionally on the current market information, the end-time utility function from a given initial-time value function. We reduce this inverse problem to solving a functional equation and establish conditions for the existence and uniqueness of its solutions in the class of inverse marginal functions.
- Published
- 2016
35. Reinforcement Learning for Jump-Diffusions
- Author
-
Gao, Xuefeng, Li, Lingfei, and Zhou, Xun Yu
- Abstract
We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. Finally, we investigate as an application the mean-variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps.
- Published
- 2024
36. Time-Inconsistent Stochastic Linear--Quadratic Control: Characterization and Uniqueness of Equilibrium
- Author
-
Hu, Ying, Jin, Hanqing, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management, Mathematics - Probability - Abstract
In this paper, we continue our study on a general time-inconsistent stochastic linear--quadratic (LQ) control problem originally formulated in [6]. We derive a necessary and sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we prove that the explicit equilibrium control constructed in [6] is indeed unique. Our proof is based on the derived equivalent condition for equilibria as well as a stochastic version of the Lebesgue differentiation theorem. Finally, we show that the equilibrium strategy is unique for a mean--variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes.
- Published
- 2015
37. Evolution of the Arrow–Pratt measure of risk-tolerance for predictable forward utility processes
- Author
-
Strub, Moris S. and Zhou, Xun Yu
- Published
- 2021
38. Weighted discounting—On group diversity, time-inconsistency, and consequences for investment
- Author
-
Ebert, Sebastian, Wei, Wei, and Zhou, Xun Yu
- Published
- 2020
39. Discrete-time simulated annealing: A convergence analysis via the Eyring–Kramers law
- Author
-
Tang, Wenpin, Wu, Yuhang, and Zhou, Xun Yu
- Published
- 2024
40. Two explicit Skorokhod embeddings for simple symmetric random walk
- Author
-
He, Xue Dong, Hu, Sang, Obłój, Jan, and Zhou, Xun Yu
- Published
- 2019
41. A Note on Indefinite Stochastic Riccati Equations
- Author
-
Qian, Zhongmin and Zhou, Xun Yu
- Subjects
Mathematics - Probability - Abstract
An indefinite stochastic Riccati equation is a matrix-valued, highly nonlinear backward stochastic differential equation together with an algebraic, matrix positive-definiteness constraint. We introduce a new approach to solve a class of such equations (including the existence of solutions) driven by one-dimensional Brownian motion. The idea is to replace the original equation by a system of BSDEs (without involving any algebraic constraint) for which the existence of solutions automatically enforces the original algebraic constraint.
- Published
- 2012
42. Time-Inconsistent Stochastic Linear--Quadratic Control
- Author
-
Hu, Ying, Jin, Hanqing, and Zhou, Xun Yu
- Subjects
Mathematics - Optimization and Control, Mathematics - Dynamical Systems, Mathematics - Probability, Quantitative Finance - Portfolio Management, 93E99, 60H10, 91B28 - Abstract
In this paper, we formulate a general time-inconsistent stochastic linear--quadratic (LQ) control problem. The time-inconsistency arises from the presence of a quadratic term of the expected state as well as a state-dependent term in the objective functional. We define an equilibrium, instead of optimal, solution within the class of open-loop controls, and derive a sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we find an explicit equilibrium control. As an application, we then consider a mean-variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes. Applying the general sufficient condition, we obtain explicit equilibrium strategies when the risk premium is both deterministic and stochastic., Comment: 24 pages. To be submitted to SICON
- Published
- 2011
43. Optimal stopping under probability distortion
- Author
-
Xu, Zuo Quan and Zhou, Xun Yu
- Subjects
Mathematics - Probability, Mathematics - Optimization and Control, Quantitative Finance - Portfolio Management - Abstract
We formulate an optimal stopping problem for a geometric Brownian motion where the probability scale is distorted by a general nonlinear function. The problem is inherently time inconsistent due to the Choquet integration involved. We develop a new approach, based on a reformulation of the problem where one optimally chooses the probability distribution or quantile function of the stopped state. An optimal stopping time can then be recovered from the obtained distribution/quantile function, either in a straightforward way for several important cases or in general via the Skorokhod embedding. This approach enables us to solve the problem in a fairly general manner with different shapes of the payoff and probability distortion functions. We also discuss economic interpretations of the results. In particular, we justify several liquidation strategies widely adopted in stock trading, including those of "buy and hold", "cut loss or take profit", "cut loss and let profit run" and "sell on a percentage of historical high"., Comment: Published at http://dx.doi.org/10.1214/11-AAP838 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
- 2011
44. The premium of dynamic trading
- Author
-
Chiu, Chun Hung and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management - Abstract
It is well established that in a market with inclusion of a risk-free asset the single-period mean-variance efficient frontier is a straight line tangent to the risky region, a fact that is the very foundation of the classical CAPM. In this paper, it is shown that in a continuous-time market where the risky prices are described by Itô processes and the investment opportunity set is deterministic (albeit time-varying), any efficient portfolio must involve allocation to the risk-free asset at any time. As a result, the dynamic mean-variance efficient frontier, though still a straight line, is strictly above the entire risky region. This in turn suggests a positive premium, in terms of the Sharpe ratio of the efficient frontier, arising from dynamic trading. Another implication is that the inclusion of a risk-free asset boosts the Sharpe ratio of the efficient frontier, which again contrasts sharply with the single-period case., Comment: 24 pages, 6 figures
- Published
- 2009
45. Continuous-Time Markowitz's Model with Transaction Costs
- Author
-
Dai, Min, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management - Abstract
A continuous-time Markowitz's mean-variance portfolio selection problem is studied in a market with one stock, one bond, and proportional transaction costs. This is a singular stochastic control problem, inherently in a finite time horizon. With a series of transformations, the problem is turned into a so-called double obstacle problem, a well-studied problem in the physics and partial differential equations literature, featuring two time-varying free boundaries. The two boundaries, which define the buy, sell, and no-trade regions, are proved to be smooth in time. This in turn characterizes the optimal strategy, via a Skorokhod problem, as one that tries to keep a certain adjusted bond-stock position within the no-trade region. Several features of the optimal strategy are revealed that are remarkably different from its no-transaction-cost counterpart. It is shown that there exists a critical length in time, which is dependent on the stock excess return as well as the transaction fees but independent of the investment target and the stock volatility, so that an expected terminal return may not be achievable if the planning horizon is shorter than that critical length (while in the absence of transaction costs any expected return can be reached in an arbitrary period of time). It is further demonstrated that anyone following the optimal strategy should not buy the stock beyond the point when the time to maturity is shorter than the aforementioned critical length. Moreover, the investor would be less likely to buy the stock and more likely to sell the stock when the maturity date is getting closer. These features, while consistent with the widely accepted investment wisdom, suggest that the planning horizon is an integral part of the investment opportunities., Comment: 30 pages, 1 figure
- Published
- 2009
46. Robust utility maximisation with intractable claims
- Author
-
Li, Yunhong, Xu, Zuo Quan, and Zhou, Xun Yu
- Published
- 2023
47. Choquet Regularization for Continuous-Time Reinforcement Learning
- Author
-
Han, Xia, Wang, Ruodu, and Zhou, Xun Yu
- Published
- 2023
48. A Convex Stochastic Optimization Problem Arising from Portfolio Selection
- Author
-
Jin, Hanqing, Xu, Zuo Quan, and Zhou, Xun Yu
- Subjects
Quantitative Finance - Portfolio Management, Mathematics - Numerical Analysis, Mathematics - Optimization and Control, Mathematics - Probability, 49K20 - Abstract
A continuous-time financial portfolio selection model with expected utility maximization typically boils down to solving a (static) convex stochastic optimization problem in terms of the terminal wealth, with a budget constraint. In the literature the latter is solved by assuming a priori that the problem is well-posed (i.e., the supremum value is finite) and a Lagrange multiplier exists (and as a consequence the optimal solution is attainable). In this paper it is first shown, via various counterexamples, that neither of these two assumptions needs to hold, and an optimal solution does not necessarily exist. These anomalies in turn have important interpretations in and impacts on the portfolio selection modeling and solutions. Relations among the non-existence of the Lagrange multiplier, the ill-posedness of the problem, and the non-attainability of an optimal solution are then investigated. Finally, explicit and easily verifiable conditions are derived which lead to finding the unique optimal solution., Comment: 15 pages
- Published
- 2007
49. Continuous-time mean-variance efficiency: the 80% rule
- Author
-
Li, Xun and Zhou, Xun Yu
- Subjects
Mathematics - Probability, Quantitative Finance - Statistical Finance, 90A09 (Primary) 93E20 (Secondary) - Abstract
This paper studies a continuous-time market where an agent, having specified an investment horizon and a targeted terminal mean return, seeks to minimize the variance of the return. The optimal portfolio of such a problem is called mean-variance efficient à la Markowitz. It is shown that, when the market coefficients are deterministic functions of time, a mean-variance efficient portfolio realizes the (discounted) targeted return on or before the terminal date with a probability greater than 0.8072. This number is universal irrespective of the market parameters, the targeted return and the length of the investment horizon., Comment: Published at http://dx.doi.org/10.1214/105051606000000349 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
- 2007
50. Interplay between dividend rate and business constraints for a financial corporation
- Author
-
Choulli, Tahir, Taksar, Michael, and Zhou, Xun Yu
- Subjects
Mathematics - Probability, Quantitative Finance - General Finance, 91B70, 93E20 (Primary) - Abstract
We study a model of a corporation which has the possibility to choose various production/business policies with different expected profits and risks. In the model there are restrictions on the dividend distribution rates as well as restrictions on the risk the company can undertake. The objective is to maximize the expected present value of the total dividend distributions. We outline the corresponding Hamilton-Jacobi-Bellman equation, compute explicitly the optimal return function and determine the optimal policy. As a consequence of these results, the way the dividend rate and business constraints affect the optimal policy is revealed. In particular, we show that under certain relationships between the constraints and the exogenous parameters of the random processes that govern the returns, some business activities might be redundant, that is, under the optimal policy they will never be used in any scenario., Comment: Published at http://dx.doi.org/10.1214/105051604000000909 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
- 2005