Author: "Wu, Jibang" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Wu, Jibang" returned 18 results.

1. Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

Author: Wu, Jibang, Chen, Siyu, Wang, Mengdi, Wang, Huazheng, and Xu, Haifeng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics
Abstract: The agency problem emerges in today's large-scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design. The problem, termed contractual reinforcement learning, naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent's action policy for their common interests through a set of payment rules contingent on the realization of the next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges, from robust design of contracts to the balance of exploration and exploitation, reducing the complexity analysis to the construction of efficient search algorithms. For several natural classes of problems, we design tailored search algorithms that provably achieve $\tilde{O}(\sqrt{T})$ regret. We also present an algorithm with $\tilde{O}(T^{2/3})$ regret for the general problem that improves on the existing analysis in online contract design under mild technical assumptions.
Published: 2024

2. A Truth Serum for Eliciting Self-Evaluations in Scientific Reviews

Author: Wu, Jibang, Xu, Haifeng, Guo, Yifan, and Su, Weijie
Subjects: Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics, Statistics - Applications
Abstract: This paper designs a simple, efficient and truthful mechanism to elicit self-evaluations about items jointly owned by multiple owners. A key application of this mechanism is to improve the peer review of large scientific conferences, where a paper often has multiple authors and many authors have multiple papers. Our mechanism is designed to generate an entirely new source of review data truthfully elicited from paper owners, and can be used to augment the traditional approach of eliciting review data only from peer reviewers. Our approach starts by partitioning all submissions of a conference into disjoint blocks, each of which shares a common set of co-authors. We then elicit the ranking of the submissions from each author and employ isotonic regression to produce adjusted review scores that align with both the reported ranking and the raw review scores. Under certain conditions, truth-telling by all authors is a Nash equilibrium for any valid partition of the overlapping ownership sets. We prove that to ensure truthfulness for such isotonic regression based mechanisms, partitioning the authors into blocks and eliciting only ranking information independently from each block is necessary. This leaves the optimization of the block partition as the only room for maximizing the estimation efficiency of our mechanism, which is a computationally intractable optimization problem in general. Fortunately, we develop a nearly linear-time greedy algorithm that provably finds a performant partition with appealing robust approximation guarantees. Extensive experiments on both synthetic data and real-world conference review data demonstrate the effectiveness of this owner-assisted calibration mechanism.
Published: 2023
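
The isotonic regression step described above can be illustrated with a minimal sketch, assuming the simplest case of a single author whose reported ranking covers all of their papers. The pool-adjacent-violators algorithm (PAVA) projects the raw review scores, listed in the author's reported order (best first), onto the nearest non-increasing sequence in squared error. The function name `isotonic_decreasing` is hypothetical, and this is not the paper's full mechanism, which also handles block partitions across co-authors.

```python
from typing import List


def isotonic_decreasing(scores: List[float]) -> List[float]:
    """Least-squares fit of `scores` under a non-increasing constraint (PAVA).

    `scores` must be listed in the author's reported order, best paper first.
    """
    blocks = []  # each entry: [sum_of_scores, count]
    for y in scores:
        blocks.append([y, 1])
        # Pool adjacent blocks while a later block's mean exceeds the earlier one's,
        # i.e. while the reported ranking is violated by the current fit.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * int(c))  # every paper in a pooled block gets the block mean
    return fitted
```

For example, if an author ranks papers A > B > C but their raw scores are 6.0, 7.0 and 5.0, the first two scores violate the ranking and are pooled, giving adjusted scores 6.5, 6.5 and 5.0.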

3. Robust Stackelberg Equilibria

Author: Gan, Jiarui, Han, Minbiao, Wu, Jibang, and Xu, Haifeng
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Computational Complexity, Economics - Theoretical Economics
Abstract: This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-$\delta$ suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different -- in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties. We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.
Published: 2023
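
To make the up-to-δ idea concrete, here is a minimal sketch, assuming a finite bimatrix game and a fixed leader mixed strategy: it computes the follower's δ-optimal responses and returns the leader's worst expected payoff among them. The function name and payoff-matrix layout are illustrative assumptions; the paper's formal RSE definition (including how existence is ensured) is not reproduced here. Note that with δ = 0 this worst-case tie-breaking differs from the SSE convention, which breaks follower ties in the leader's favor.

```python
from typing import List, Sequence


def robust_leader_utility(x: Sequence[float], leader_payoff: List[List[float]],
                          follower_payoff: List[List[float]], delta: float) -> float:
    """Leader's worst-case expected payoff when the follower may play any pure
    response within `delta` of its best response to the mixed strategy `x`.

    Payoff matrices are indexed [leader_action][follower_action].
    """
    n, m = len(leader_payoff), len(leader_payoff[0])
    follower_u = [sum(x[i] * follower_payoff[i][j] for i in range(n)) for j in range(m)]
    leader_u = [sum(x[i] * leader_payoff[i][j] for i in range(n)) for j in range(m)]
    best = max(follower_u)
    # All follower actions that are at most `delta` worse than the best response.
    delta_best = [j for j in range(m) if follower_u[j] >= best - delta]
    return min(leader_u[j] for j in delta_best)
```

With the hypothetical game `leader_payoff = [[2, 4], [1, 3]]`, `follower_payoff = [[1, 0], [0, 1]]` and `x = [0.4, 0.6]`, the follower's best response gives the leader 3.4 when δ = 0.1, but once δ = 0.3 the follower's other action also counts as acceptable and the guaranteed payoff drops to 1.4.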

4. Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model

Author: Chen, Siyu, Wu, Jibang, Wu, Yifan, and Yang, Zhuoran
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics, Statistics - Machine Learning
Abstract: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of such a problem from the principal's perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample-efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.
Comment: 35 pages; adds an impossibility result (Lemma 3.2) with its proof in Section D.1.
Published: 2023
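
A scoring rule is proper when reporting one's true belief maximizes the expected payment, which is what lets a principal pay for information through a scoring rule. A minimal sketch, using the quadratic (Brier-style) rule as an assumed example rather than the rule the paper designs: for belief p and report q the expected score equals ‖p‖² − ‖q − p‖², so truthful reporting is optimal. Function names are hypothetical.

```python
from typing import Sequence


def quadratic_score(report: Sequence[float], outcome: int) -> float:
    """Quadratic scoring rule: 2*q[outcome] - sum_k q[k]^2."""
    return 2 * report[outcome] - sum(q * q for q in report)


def expected_score(belief: Sequence[float], report: Sequence[float]) -> float:
    """Expected payment when outcomes follow `belief` but the agent reports `report`."""
    return sum(p * quadratic_score(report, k) for k, p in enumerate(belief))
```

Scanning two-outcome reports on a grid for an agent with belief (0.7, 0.3) confirms that the expected score peaks at the truthful report.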

5. Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality

Author: Wu, Jibang, Shen, Weiran, Fang, Fei, and Xu, Haifeng
Subjects: Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics
Abstract: Optimizing strategic decisions (a.k.a. computing equilibrium) is key to the success of many non-cooperative multi-agent applications. However, in many real-world situations, we may face the exact opposite of this game-theoretic problem -- instead of prescribing the equilibrium of a given game, we may directly observe the agents' equilibrium behaviors but want to infer the underlying parameters of an unknown game. This research question, also known as inverse game theory, has been studied in multiple recent works in the context of Stackelberg games. Unfortunately, existing works exhibit quite negative results, showing statistical and computational hardness under the assumption that the follower behaves perfectly rationally. Our work relaxes the perfect-rationality assumption to the classic quantal response model, a more realistic behavior model of bounded rationality. Interestingly, we show that the smoothness brought by such a bounded rationality model actually leads to provably more efficient learning of the follower utility parameters in general Stackelberg games. Systematic empirical experiments on synthesized games confirm our theoretical results and further suggest their robustness beyond the strict quantal response model.
Published: 2022
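
Under the logit form of the quantal response model (an assumption here; the abstract names only "quantal response"), the follower picks action j with probability proportional to exp(λ·u_j), so log-probability ratios reveal utility differences: log(P_j / P_0) = λ(u_j − u_0). Under perfect rationality only the best response is ever observed, so such differences stay hidden. This minimal sketch shows the inversion with a known λ; the function names are hypothetical, and this is not the paper's estimator.

```python
import math
from typing import List, Sequence


def logit_quantal_response(utilities: Sequence[float], lam: float) -> List[float]:
    """Follower's action distribution: P(j) proportional to exp(lam * u_j)."""
    top = max(utilities)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def utility_gaps(action_probs: Sequence[float], lam: float) -> List[float]:
    """Recover u_j - u_0 from action probabilities by inverting the logit form."""
    return [(math.log(p) - math.log(action_probs[0])) / lam for p in action_probs]
```

In practice the exact probabilities would be replaced by empirical action frequencies, so the recovered gaps carry sampling error.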

6. Generalized Principal-Agency: Contracts, Information, Games and Beyond

Author: Gan, Jiarui, Han, Minbiao, Wu, Jibang, and Xu, Haifeng
Subjects: Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics
Abstract: In the principal-agent problem formulated by Myerson (1982), agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant linear programming solution that relies on the revelation principle. This paper extends Myerson's results to a more general setting where the principal's action space can be infinite and subject to additional design constraints. Our generalized principal-agent model unifies several important design problems, including contract design, information design, and Bayesian Stackelberg games, and encompasses them as special cases. We first extend the revelation principle to this general model, based on which a polynomial-time algorithm is then derived for computing the optimal mechanism for the principal. This algorithm not only implies new efficient solutions simultaneously for all the aforementioned special cases but also significantly simplifies previously known algorithms designed for special cases. Inspired by the recent interest in the algorithmic design of a single contract and menu of contracts, we study such constrained design problems in our general principal-agent model. In contrast to the above unification, our results here illustrate the other facet of diversity among different principal-agent design problems and demonstrate how their different structures can lead to different complexities: some are tractable whereas others are APX-hard. Finally, we reveal an interesting connection of our model to the problem of information acquisition for decision making and study its algorithmic properties in general.
Published: 2022

7. Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

Author: Wu, Jibang, Zhang, Zixuan, Feng, Zhe, Wang, Zhaoran, Yang, Zhuoran, Jordan, Michael I., and Xu, Haifeng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Economics - Theoretical Economics
Abstract: In today's economy, it has become important for Internet platforms to consider the sequential information design problem to align their long-term interests with the incentives of gig service providers. This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs), where a sender, with an informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite-horizon Markovian environment with varying priors and utility functions. Planning in MPPs thus faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities of the sender. Nevertheless, at the population level where the model is known, it turns out that we can efficiently determine the optimal (resp. $\epsilon$-optimal) policy with finite (resp. infinite) states and outcomes, through a modified formulation of the Bellman equation. Our main technical contribution is to study the MPP under the online reinforcement learning (RL) setting, where the goal is to learn the optimal signaling policy by interacting with the underlying MPP, without knowledge of the sender's utility functions, prior distributions, and the Markov transition kernels. We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles. Our algorithm enjoys sample efficiency by achieving a sublinear $\sqrt{T}$-regret upper bound. Furthermore, both our algorithm and theory can be applied to MPPs with large spaces of outcomes and states via function approximation, and we showcase such a success under the linear setting.
Published: 2022
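
The persuasiveness constraint mentioned above is easiest to see in a single static round. A minimal sketch, assuming a binary state (good or bad), a receiver who takes the sender's preferred action only if P(good | signal) ≥ t, and a sender who gains only when the receiver acts. The classic optimal scheme always recommends acting in the good state and, in the bad state, recommends acting just often enough that the posterior lands exactly at t. This is the one-shot building block, not the paper's OP4 algorithm or its Bellman-based planner; the function names are hypothetical.

```python
def bad_state_recommend_prob(prior_good: float, threshold: float) -> float:
    """Probability of recommending 'act' in the bad state under the optimal
    one-shot scheme (the good state always gets the 'act' recommendation).
    Assumes 0 < threshold <= 1."""
    if prior_good >= threshold:
        return 1.0  # the receiver already acts on the prior, so nothing needs revealing
    # Solve prior / (prior + (1 - prior) * pi) = threshold for pi.
    return prior_good * (1 - threshold) / ((1 - prior_good) * threshold)


def posterior_after_act(prior_good: float, pi: float) -> float:
    """Receiver's posterior P(good) after hearing the 'act' recommendation."""
    return prior_good / (prior_good + (1 - prior_good) * pi)
```

With a prior of 0.3 and a threshold of 0.5, the bad state triggers the recommendation with probability 3/7, the posterior after the recommendation is exactly 0.5, and the receiver acts with overall probability 0.6 (the prior divided by the threshold).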

8. Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms

Author: Wu, Jibang, Xu, Haifeng, and Yao, Fan
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Under the uncoupled learning setup, the last-iterate convergence guarantee towards Nash equilibrium is shown to be impossible in many games. This work studies the last-iterate convergence guarantee in general games toward rationalizability, a key solution concept in epistemic game theory that relaxes the stringent belief assumptions in both Nash and correlated equilibrium. This learning task naturally generalizes best arm identification problems, due to the intrinsic connections between rationalizable action profiles and the elimination of iteratively dominated actions. Despite a seemingly simple task, our first main result is a surprisingly negative one; that is, a large and natural class of no regret algorithms, including the entire family of Dual Averaging algorithms, provably take exponentially many rounds to reach rationalizability. Moreover, algorithms with the stronger no swap regret also suffer similar exponential inefficiency. To overcome these barriers, we develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We prove that when all agents run Exp3-DH (a.k.a., self-play in multi-agent learning), all iteratively dominated actions can be eliminated within polynomially many rounds. Our experimental results further demonstrate the efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to reach rationalizability efficiently.
Published: 2021
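
The benchmark here, eliminating iteratively dominated actions, can be computed directly when payoffs are known. A minimal sketch for two-player games, assuming strict dominance by pure actions only; full rationalizability in two-player games also removes actions dominated by mixed strategies, which would need a linear program per check. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def iterated_strict_dominance(row_payoff: List[List[float]],
                              col_payoff: List[List[float]]) -> Tuple[Set[int], Set[int]]:
    """Repeatedly delete pure actions strictly dominated by another surviving
    pure action. Payoffs are indexed [row_action][col_action]."""
    rows = set(range(len(row_payoff)))
    cols = set(range(len(row_payoff[0])))
    changed = True
    while changed:
        changed = False
        for i in sorted(rows):
            # Row i is dominated if some other surviving row beats it against every surviving column.
            if any(all(row_payoff[k][j] > row_payoff[i][j] for j in cols)
                   for k in rows if k != i):
                rows.discard(i)
                changed = True
        for j in sorted(cols):
            if any(all(col_payoff[i][k] > col_payoff[i][j] for i in rows)
                   for k in cols if k != j):
                cols.discard(j)
                changed = True
    return rows, cols
```

In the prisoner's dilemma one round suffices, while a game such as `row_payoff = [[2, 0], [1, 1]]`, `col_payoff = [[1, 0], [1, 0]]` needs two: the column player's second action goes first, and only then is the row player's second action dominated.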

9. Least Square Calibration for Peer Review

Author: Tan, Sijun, Wu, Jibang, Bei, Xiaohui, and Xu, Haifeng
Subjects: Computer Science - Machine Learning, Computer Science - Social and Information Networks
Abstract: Peer review systems such as conference paper review often suffer from the issue of miscalibration. Previous works on peer review calibration usually only use the ordinal information or assume simplistic reviewer scoring functions such as linear functions. In practice, applications like academic conferences often rely on manual methods, such as open discussions, to mitigate miscalibration. It remains an important question to develop algorithms that can handle different types of miscalibrations based on available prior knowledge. In this paper, we propose a flexible framework, namely least square calibration (LSC), for selecting top candidates from peer ratings. Our framework provably performs perfect calibration from noiseless linear scoring functions under mild assumptions, yet also provides competitive calibration results when the scoring function is from broader classes beyond linear functions and with arbitrary noise. On our synthetic dataset, we empirically demonstrate that our algorithm consistently outperforms the baseline that selects top papers based on the highest average ratings.
Published: 2021
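
A minimal sketch of least-squares calibration in its simplest setting, assuming every reviewer's scoring function is a pure additive shift (score = quality + reviewer bias, i.e. a linear function with slope fixed at 1) and no noise. This is a simplified special case, not the paper's LSC framework; the zero-sum constraint on biases and the function name are assumptions made for this illustration. The toy data also contrast the result with the average-rating baseline mentioned in the abstract.

```python
import numpy as np


def calibrate_additive(reviews, n_papers, n_reviewers):
    """Fit score = quality[paper] + bias[reviewer] by least squares.

    `reviews` is a list of (reviewer, paper, score) triples.
    """
    rows, rhs = [], []
    for reviewer, paper, score in reviews:
        row = np.zeros(n_papers + n_reviewers)
        row[paper] = 1.0
        row[n_papers + reviewer] = 1.0
        rows.append(row)
        rhs.append(score)
    # Qualities and biases are only identified up to a common shift,
    # so add one equation forcing the reviewer biases to sum to zero.
    anchor = np.zeros(n_papers + n_reviewers)
    anchor[n_papers:] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution[:n_papers], solution[n_papers:]


# Hypothetical toy data: true qualities (5, 7, 6); reviewer 0 adds +1, reviewer 1 adds -1.
reviews = [(0, 0, 6.0), (0, 1, 8.0), (1, 1, 6.0), (1, 2, 5.0)]
quality, bias = calibrate_additive(reviews, n_papers=3, n_reviewers=2)
calibrated_top2 = set(np.argsort(-quality)[:2].tolist())

# Baseline: rank papers by their average rating.
averages = np.array([np.mean([s for _, p, s in reviews if p == j]) for j in range(3)])
baseline_top2 = set(np.argsort(-averages)[:2].tolist())
```

Here the lenient reviewer inflates paper 0, so the average-rating baseline selects papers {0, 1}, while the calibrated qualities select {1, 2}.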

10. Auctioning with Strategically Reticent Bidders

Author: Wu, Jibang, Badanidiyuru, Ashwinkumar, and Xu, Haifeng
Subjects: Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics
Abstract: We propose and study a novel mechanism design setup where each bidder holds two kinds of private information: (1) type variable, which can be misreported; (2) information variable, which the bidder may want to conceal or partially reveal, but importantly, not to misreport. We refer to bidders with such behaviors as strategically reticent bidders. Among others, one direct motivation of our model is the ad auction in which many ad platforms today elicit from each bidder not only their private value per conversion but also their private information about Internet users (e.g., user activities on the advertiser's websites) in order to improve the platform's estimation of conversion rates. We show that in this new setup, it is still possible to design mechanisms that are both Incentive and Information Compatible (IIC). We develop two different black-box transformations, which convert any mechanism $\mathcal{M}$ for classic bidders to a mechanism $\bar{\mathcal{M}}$ for strategically reticent bidders, based on either outcome of expectation or expectation of outcome, respectively. We identify properties of the original mechanism $\mathcal{M}$ under which the transformation leads to IIC mechanisms $\bar{\mathcal{M}}$. Interestingly, as corollaries of these results, we show that running VCG with bidders' expected values maximizes welfare, whereas the mechanism using expected outcome of Myerson's auction maximizes revenue. Finally, we study how regulation on the auctioneer's usage of information can lead to more robust mechanisms.
Published: 2021
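
For a single ad slot, VCG reduces to a second-price auction. A minimal sketch of running VCG with bidders' expected values, assuming each bidder's expected value is their reported value per conversion times a conversion-rate estimate the platform has already formed from the disclosed information. The estimates, names and numbers are hypothetical, and the paper's black-box transformations are not reproduced here.

```python
from typing import List, Tuple


def vcg_single_slot(expected_values: List[float]) -> Tuple[int, float]:
    """Allocate one slot by VCG: the highest expected value wins and pays the
    externality it imposes on the others, i.e. the second-highest expected value."""
    ranked = sorted(range(len(expected_values)), key=lambda i: expected_values[i], reverse=True)
    winner = ranked[0]
    price = expected_values[ranked[1]] if len(ranked) > 1 else 0.0
    return winner, price


# Hypothetical inputs: reported value per conversion and platform-estimated conversion rates.
value_per_conversion = [2.0, 3.0, 1.5]
conversion_rate = [0.10, 0.05, 0.30]
expected = [v * c for v, c in zip(value_per_conversion, conversion_rate)]
winner, price = vcg_single_slot(expected)
```

The third bidder has the lowest value per conversion but the highest expected value (0.45), so it wins and pays the runner-up's expected value of 0.2.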

11. Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

Author: Wu, Jibang, Cai, Renqin, and Wang, Hongning
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Predicting users' preferences based on their sequential behaviors in history is challenging and crucial for modern recommender systems. Most existing sequential recommendation algorithms focus on transitional structure among the sequential actions, but largely ignore the temporal and context information when modeling the influence of a historical event on the current prediction. In this paper, we argue that the influence from the past events on a user's current action should vary over the course of time and under different contexts. Thus, we propose a Contextualized Temporal Attention Mechanism that learns to weigh historical actions' influence on not only what action it is, but also when and how the action took place. More specifically, to dynamically calibrate the relative input dependence from the self-attention mechanism, we deploy multiple parameterized kernel functions to learn various temporal dynamics, and then use the context information to determine which of these reweighting kernels to follow for each input. In empirical evaluations on two large public recommendation datasets, our model consistently outperformed an extensive set of state-of-the-art sequential recommendation methods.
Comment: Key words: Sequential Recommendation, Self-attention mechanism, Temporal Recommendation
Published: 2020
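
A minimal sketch of the reweighting idea, assuming three hand-picked temporal kernels of the time gap (fast decay, slow decay and time-agnostic), context logits that are already computed, and multiplicative reweighting of softmax attention followed by renormalization. The kernel shapes, time scales and names are hypothetical; in the paper the kernels are parameterized and learned, and the exact combination rule may differ.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical temporal kernels of the time gap dt (in hours) since a past event.
KERNELS: List[Callable[[float], float]] = [
    lambda dt: math.exp(-dt / 24.0),   # fast decay: roughly "today"
    lambda dt: math.exp(-dt / 720.0),  # slow decay: roughly "this month"
    lambda dt: 1.0,                    # time-agnostic
]


def contextual_temporal_weights(attn_scores: Sequence[float], gaps: Sequence[float],
                                context_logits: Sequence[float]) -> List[float]:
    """Reweight self-attention over past events by a context-chosen mix of kernels."""
    attn = softmax(attn_scores)      # content-based attention over the history
    mix = softmax(context_logits)    # how strongly each kernel applies in this context
    scaled = [a * sum(m * k(dt) for m, k in zip(mix, KERNELS))
              for a, dt in zip(attn, gaps)]
    total = sum(scaled)
    return [s / total for s in scaled]
```

With equal attention scores, a context that favors the fast-decay kernel concentrates weight on the most recent events, while a context that favors the time-agnostic kernel leaves the weights nearly uniform.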

12. Robust Stackelberg Equilibria

Author: Gan, Jiarui, Han, Minbiao, Wu, Jibang, and Xu, Haifeng
Published: 2023

13. An Isotonic Mechanism for Overlapping Ownership

Author: Wu, Jibang, Xu, Haifeng, Guo, Yifan, and Su, Weijie
Subjects: FOS: Computer and information sciences, FOS: Economics and business, Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics, Theoretical Economics (econ.TH), Applications (stat.AP), Statistics - Applications, Computer Science and Game Theory (cs.GT)
Abstract: This paper extends the Isotonic Mechanism from the single-owner to multi-owner settings, in an effort to make it applicable to peer review where a paper often has multiple authors. Our approach starts by partitioning all submissions of a machine learning conference into disjoint blocks, each of which shares a common set of co-authors. We then employ the Isotonic Mechanism to elicit a ranking of the submissions from each author and to produce adjusted review scores that align with both the reported ranking and the original review scores. The generalized mechanism uses a weighted average of the adjusted scores on each block. We show that, under certain conditions, truth-telling by all authors is a Nash equilibrium for any valid partition of the overlapping ownership sets. However, we demonstrate that while the mechanism's performance in terms of estimation accuracy depends on the partition structure, optimizing this structure is computationally intractable in general. We develop a nearly linear-time greedy algorithm that provably finds a performant partition with appealing robust approximation guarantees. Extensive experiments on both synthetic data and real-world conference review data demonstrate the effectiveness of this generalized Isotonic Mechanism.
Published: 2023

14. Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

Author: Wu, Jibang, Zhang, Zixuan, Feng, Zhe, Wang, Zhaoran, Yang, Zhuoran, Jordan, Michael I., and Xu, Haifeng
Published: 2022

15. Optimal Coordination in Generalized Principal-Agent Problems: A Revisit and Extensions

Author: Gan, Jiarui, Han, Minbiao, Wu, Jibang, and Xu, Haifeng
Subjects: FOS: Computer and information sciences, FOS: Economics and business, Computer Science - Computer Science and Game Theory, Economics - Theoretical Economics, Theoretical Economics (econ.TH), Computer Science and Game Theory (cs.GT)
Abstract: In the principal-agent problem formulated in [Myerson 1982], agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant solution that relies on the revelation principle, which states that without loss of generality optimal coordination mechanisms of this problem can be assumed to be truthful and direct. Consequently, the problem can be solved by a linear program when the support sets of the action and type spaces are finite. In this paper, we extend Myerson's results to the setting where the principal's action space might be infinite and subject to additional design constraints. This generalized principal-agent model unifies several important design problems -- including contract design, information design, and Bayesian Stackelberg games -- and encompasses them as special cases. We present a revelation principle for this general model, based on which a polynomial-time algorithm is derived for computing the optimal coordination mechanism. This algorithm not only implies new efficient algorithms simultaneously for all the aforementioned special cases but also significantly simplifies previous approaches in the literature.
Published: 2022

16. Category-aware Collaborative Sequential Recommendation

Author: Cai, Renqin, Wu, Jibang, San, Aidan, Wang, Chong, and Wang, Hongning
Published: 2021

17. Multi-Agent Learning for Iterative Dominance Elimination: Formal Barriers and New Algorithms

Author: Wu, Jibang, Xu, Haifeng, and Yao, Fan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Computer Science and Game Theory (cs.GT), Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: Dominated actions are natural (and perhaps the simplest possible) multi-agent generalizations of sub-optimal actions as in standard single-agent decision making. Thus, similar to standard bandit learning, a basic learning question in multi-agent systems is whether agents can learn to efficiently eliminate all dominated actions in an unknown game if they can only observe noisy bandit feedback about the payoff of their played actions. Surprisingly, despite a seemingly simple task, we show a quite negative result; that is, standard no regret algorithms -- including the entire family of Dual Averaging algorithms -- provably take exponentially many rounds to eliminate all dominated actions. Moreover, algorithms with the stronger no swap regret also suffer similar exponential inefficiency. To overcome these barriers, we develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We prove that when all agents run Exp3-DH (a.k.a., self-play in multi-agent learning), all dominated actions can be iteratively eliminated within polynomially many rounds. Our experimental results further demonstrate the efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to eliminate all dominated actions efficiently.
Published: 2021
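
A minimal sketch of the "forgetting history" idea, assuming Exp3 with importance-weighted reward estimates whose running totals are multiplied by a constant discount factor each round. The constant `discount` is a hypothetical stand-in for the carefully tailored rates in Exp3-DH, which the abstract does not specify, so this is not the published algorithm; names and default values are illustrative.

```python
import math
import random
from typing import Callable, List


def exp3_discounted(n_arms: int, rounds: int, reward_fn: Callable[[int, int], float],
                    eta: float = 0.1, explore: float = 0.05, discount: float = 0.99,
                    seed: int = 0) -> List[float]:
    """Exp3 whose cumulative reward estimates decay geometrically, so old rounds
    are gradually forgotten. Rewards are assumed to lie in [0, 1].
    Returns the last sampling distribution used."""
    rng = random.Random(seed)
    totals = [0.0] * n_arms  # discounted sums of importance-weighted rewards
    probs = [1.0 / n_arms] * n_arms
    for t in range(rounds):
        top = max(totals)
        weights = [math.exp(eta * (s - top)) for s in totals]
        z = sum(weights)
        # Mix in uniform exploration so every arm keeps positive probability.
        probs = [(1 - explore) * w / z + explore / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)  # bandit feedback: only the played arm is observed
        totals = [discount * s for s in totals]
        totals[arm] += reward / probs[arm]  # unbiased estimate of the played arm's reward
    return probs
```

With a hypothetical two-arm reward where arm 0 always pays 1 and arm 1 pays 0, the good arm's discounted total settles near 1/(1 − discount), so its sampling probability approaches roughly 1 − explore/2.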

18. Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

Author: Wu, Jibang, Cai, Renqin, and Wang, Hongning
Published: 2020