Three Years, Two Papers, One Course Off: Optimal Nonmonetary Reward Policies

Authors :: Shivam Gupta
Wei Chen
Milind Dawande
Ganesh Janakiraman
Source :: Management Science. 69:2852-2869
Publication Year :: 2023
Publisher :: Institute for Operations Research and the Management Sciences (INFORMS), 2023.
Abstract: We consider a principal who periodically offers a fixed and costly nonmonetary reward to agents to incentivize them to invest effort over the long run. An agent’s output, as a function of his effort, is a priori uncertain and is worth a fixed per-unit value to the principal. The principal’s goal is to design an attractive reward policy that specifies how the rewards are to be given to an agent over time based on that agent’s past performance. This problem, which we denote by [Formula: see text], is motivated by practical examples from both academia (e.g., a reduced teaching load) and industry (e.g., “Supplier of the Year” awards). The following “limited-term” (LT) reward policy structure has been quite popular in practice. The principal evaluates each agent periodically; if an agent’s performance over a certain (limited) number of periods in the immediate past exceeds a predefined threshold, then the principal rewards him for a certain (limited) number of periods in the immediate future. When agents’ outputs are deterministic in their efforts, we show that there always exists an optimal policy that is an LT policy, and we also obtain such a policy. When agents’ outputs are stochastic, we show that the class of LT policies may not contain any optimal policy of problem [Formula: see text] but is guaranteed to contain policies that are arbitrarily near optimal. Given any [Formula: see text], we show how to obtain an LT policy whose performance is within ϵ of that of an optimal policy. This guarantee depends crucially on the use of sufficiently long histories of the agents’ outputs. We also analyze LT policies with short histories and derive structural insights on the role played by (i) the length of the available history and (ii) the variability in the random variable governing an agent’s output. We show that the average performance of these policies is within 5% of the optimum, justifying their popularity in practice.
We then introduce and analyze the class of “score-based” reward policies; we show that this class is guaranteed to contain an optimal policy, and we also obtain such a policy. Finally, we analyze a generalization in which the principal has a limited number of rewards in any given period and show that the class of score-based policies, with modifications to accommodate the limited availability of the rewards, continues to contain an optimal solution for the principal. This paper was accepted by Jeannette Song, operations management. Supplemental Material: The online appendix is available at https://doi.org/10.1287/mnsc.2022.4482.