Optimal recovery of unsecured debt via interpretable reinforcement learning
- Author
- Michael Mark, Naveed Chehrazi, Huanxi Liu, and Thomas A. Weber
- Subjects
Reinforcement learning, Interpretable machine learning, Deterministic policy gradient, Monotonicity constrained learning, Debt recovery, Control of Hawkes processes, Cybernetics, Q300-390, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This paper addresses the issue of interpretability and auditability of reinforcement-learning agents employed in the recovery of unsecured consumer debt. To this end, we develop a deterministic policy-gradient method that allows for a natural integration of domain expertise into the learning procedure so as to encourage learning of consistent, and thus interpretable, policies. Domain knowledge can often be expressed in terms of policy monotonicity and/or convexity with respect to relevant state inputs. We augment the standard actor–critic policy approximator using a monotonically regularized loss function which integrates domain expertise into the learning. Our formulation overcomes the challenge of learning interpretable policies by constraining the search to policies satisfying structural-consistency properties. The resulting state-feedback control laws can be readily understood and implemented by human decision makers. This new domain-knowledge enhanced learning approach is applied to the problem of optimal debt recovery which features a controlled Hawkes process and an asynchronous action–feedback relationship.
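The abstract describes a monotonicity-regularized actor loss but does not give its form. The following is a minimal sketch of one way such a regularizer could look, not the authors' formulation: it assumes a hinge penalty on finite-difference slopes of the policy along one state input, added to the usual deterministic policy-gradient actor objective. All names (`monotonicity_penalty`, `regularized_actor_loss`, `lam`, `eps`) and the toy critic and policies are hypothetical.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Policy = Callable[[State], float]
Critic = Callable[[State, float], float]


def monotonicity_penalty(policy: Policy, states: List[State], dim: int,
                         direction: int = 1, eps: float = 1e-3) -> float:
    """Average hinge penalty on slopes that violate the desired monotonicity.

    direction=+1 asks for a policy non-decreasing in state[dim],
    direction=-1 for a non-increasing one. (Assumed form, for illustration.)
    """
    total = 0.0
    for s in states:
        bumped = list(s)
        bumped[dim] += eps
        # Forward finite-difference estimate of d(policy)/d(state[dim]).
        slope = (policy(bumped) - policy(s)) / eps
        # Only slopes with the wrong sign contribute to the penalty.
        total += max(0.0, -direction * slope)
    return total / len(states)


def regularized_actor_loss(policy: Policy, critic: Critic, states: List[State],
                           dim: int, direction: int = 1, lam: float = 10.0) -> float:
    """Standard deterministic actor objective (-Q) plus the weighted penalty."""
    base = -sum(critic(s, policy(s)) for s in states) / len(states)
    return base + lam * monotonicity_penalty(policy, states, dim, direction)


# Toy setup (hypothetical): one state input and a quadratic critic.
states = [[0.0], [1.0], [2.0]]
critic = lambda s, a: -(a - 1.0) ** 2
increasing = lambda s: 0.5 * s[0]   # consistent with the assumed domain knowledge
decreasing = lambda s: -0.5 * s[0]  # violates it

print(monotonicity_penalty(increasing, states, dim=0))
print(monotonicity_penalty(decreasing, states, dim=0))
```

In this sketch a policy that already satisfies the structural property incurs no extra loss, so the regularizer only steers the search when the learned policy departs from it. In a gradient-based implementation the finite difference would typically be replaced by an automatic-differentiation gradient of the actor network with respect to its inputs.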
- Published
- 2022