"Wang, Weixun" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Wang, Weixun" returned 213 results

1. OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Author: Hu, Jian, Wu, Xibin, Wang, Weixun, Xianyu, Zhang, Dehao, and Cao, Yu
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling RLHF for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenRLHF/OpenRLHF.
Published: 2024

2. The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Author: Huang, Shengyi, Noukhovitch, Michael, Hosseini, Arian, Rasul, Kashif, Wang, Weixun, and Tunstall, Lewis
Subjects: Computer Science - Machine Learning
Abstract: This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work. We create an RLHF pipeline from scratch, enumerate over 20 key implementation details, and share key insights during the reproduction. Our RLHF-trained Pythia models demonstrate significant gains in response quality that scale with model size, with our 2.8B, 6.9B models outperforming OpenAI's released 1.3B checkpoint. We publicly release the trained model checkpoints and code to facilitate further research and accelerate progress in the field (https://github.com/vwxyzjn/summarize_from_feedback_details).
Published: 2024

3. MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

Author: Hu, Siyi, Zhong, Yifan, Gao, Minquan, Wang, Weixun, Dong, Hao, Liang, Xiaodan, Li, Zhihui, Chang, Xiaojun, and Yang, Yaodong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues. In this paper, we present MARLlib, a library designed to address the aforementioned challenge by leveraging three key mechanisms: 1) a standardized multi-agent environment wrapper, 2) an agent-level algorithm implementation, and 3) a flexible policy mapping strategy. By utilizing these mechanisms, MARLlib can effectively disentangle the intertwined nature of the multi-agent task and the learning process of the algorithm, with the ability to automatically alter the training strategy based on the current task's attributes. The MARLlib library's source code is publicly accessible on GitHub: https://github.com/Replicable-MARL/MARLlib.
Published: 2022

4. Off-Beat Multi-Agent Reinforcement Learning

Author: Qiu, Wei, Wang, Weixun, Wang, Rundong, An, Bo, Hu, Yujing, Obraztsova, Svetlana, Rabinovich, Zinovi, Hao, Jianye, Chen, Yingfeng, and Fan, Changjie
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent, i.e., all actions have pre-set execution durations. During execution durations, the environment changes are influenced by, but not synchronised with, action execution. Such a setting is ubiquitous in many real-world problems. However, most MARL methods assume actions are executed immediately after inference, which is often unrealistic and can lead to catastrophic failure for multi-agent coordination with off-beat actions. In order to fill this gap, we develop an algorithmic framework for MARL with off-beat actions. We then propose a novel episodic memory, LeGEM, for model-free MARL algorithms. LeGEM builds agents' episodic memories by utilizing agents' individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment problem raised by the off-beat actions via our novel reward redistribution scheme, alleviating the issue of non-Markovian reward. We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks. Empirical results show that LeGEM significantly boosts multi-agent coordination and achieves leading performance and improved sample efficiency.
Comment: Fix typos
Published: 2022

5. A2C is a special case of PPO

Author: Huang, Shengyi, Kanervisto, Anssi, Raffin, Antonin, Wang, Weixun, Ontañón, Santiago, and Dossa, Rousslan Fernand Julien
Subjects: Computer Science - Machine Learning
Abstract: Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-baselines3, showing A2C and PPO produce the exact same models when other settings are controlled.
Published: 2022
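
Note on result 5: the objective-level intuition behind the abstract's claim can be checked numerically. The sketch below is not the paper's experiment and does not use Stable-baselines3; it is a minimal illustration, assuming a softmax policy over three actions and hypothetical helper names (`a2c_objective`, `ppo_objective`, `grad`). It only shows that, when the new policy equals the old policy (probability ratio 1, as on the first gradient step of an update), the gradient of PPO's clipped surrogate coincides with the gradient of A2C's log-probability objective.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_objective(logits, action, advantage):
    # A2C policy term: log pi(a|s) * A.
    return math.log(softmax(logits)[action]) * advantage

def ppo_objective(logits, old_logits, action, advantage, clip_eps=0.2):
    # PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)

def grad(f, logits, h=1e-6):
    # Central finite-difference gradient of f with respect to the logits.
    g = []
    for i in range(len(logits)):
        up = list(logits)
        dn = list(logits)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

# Illustrative values only (assumptions, not data from the paper).
old = [0.3, -0.1, 0.5]
action, adv = 1, 2.0

# Evaluate both gradients at the old parameters, where the ratio is exactly 1.
g_a2c = grad(lambda l: a2c_objective(l, action, adv), old)
g_ppo = grad(lambda l: ppo_objective(l, old, action, adv), old)
print(g_a2c)
print(g_ppo)
```

The two printed gradients agree up to finite-difference error. The abstract's stronger claim (identical trained models) additionally depends on the other settings being controlled, which this sketch does not cover.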

6. Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

Author: Zhao, Jian, Zhao, Youpeng, Wang, Weixun, Yang, Mingyu, Hu, Xunhan, Zhou, Wengang, Hao, Jianye, and Li, Houqiang
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Multi-agent reinforcement learning is difficult to apply in practice, which is partially due to the gap between the simulated and real-world scenarios. One reason for the gap is that the simulated systems always assume that the agents can work normally all the time, while in practice, one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes will destroy the cooperation among agents, leading to performance degradation. In this work, we present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training. We design three coaching strategies and the re-sampling strategy for our coach agent. To the best of our knowledge, this work is the first to study the unexpected crashes in the multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of adaptive strategy compared with the fixed crash rate strategy and curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.
Published: 2022

7. Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

Author: Hao, Xiaotian, Mao, Hangyu, Wang, Weixun, Yang, Yaodong, Li, Dong, Zheng, Yan, Wang, Zhen, and Hao, Jianye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: The state space in Multiagent Reinforcement Learning (MARL) grows exponentially with the agent number. Such a curse of dimensionality results in poor scalability and low sample efficiency, inhibiting MARL for decades. To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space. Our insight is that permuting the order of entities in the factored multiagent state space does not change the information. Specifically, we propose two novel implementations: a Dynamic Permutation Network (DPN) and a Hyper Policy Network (HPN). The core idea is to build separate entity-wise PI input and PE output network modules to connect the entity-factored state space and action space in an end-to-end way. DPN achieves such connections by two separate module selection networks, which consistently assign the same input module to the same input entity (guarantee PI) and assign the same output module to the same entity-related output (guarantee PE). To enhance the representation capability, HPN replaces the module selection networks of DPN with hypernetworks to directly generate the corresponding module weights. Extensive experiments in SMAC, Google Research Football and MPE validate that the proposed methods significantly boost the performance and the learning efficiency of existing MARL algorithms. Remarkably, in SMAC, we achieve 100% win rates in almost all hard and super-hard scenarios (never achieved before).
Published: 2022
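
Note on result 7: the two inductive biases named in the abstract can be illustrated with a toy example. The sketch below is not DPN or HPN; it assumes a hypothetical shared per-entity function `embed` and uses sum pooling, a standard way to obtain permutation invariance (PI), alongside a per-entity output map that is permutation equivariant (PE). All names and values are illustrative assumptions.

```python
import math

def embed(entity):
    # Hypothetical shared per-entity feature map (same weights for every entity).
    x, y = entity
    return (math.tanh(x + 0.5 * y), math.tanh(x - y))

def pi_encode(entities):
    # Permutation-invariant encoding: sum the shared embeddings over entities.
    embs = [embed(e) for e in entities]
    return tuple(sum(col) for col in zip(*embs))

def pe_outputs(entities):
    # Permutation-equivariant outputs: one value per entity, in input order.
    return [embed(e)[0] for e in entities]

entities = [(0.1, 0.2), (1.0, -0.5), (-0.3, 0.7)]
perm = [2, 0, 1]
shuffled = [entities[i] for i in perm]

# PI: reordering the entities leaves the pooled encoding unchanged
# (up to floating-point rounding from the changed summation order).
print(pi_encode(entities), pi_encode(shuffled))

# PE: reordering the entities reorders the per-entity outputs the same way.
print(pe_outputs(shuffled), [pe_outputs(entities)[i] for i in perm])
```

Because the encoding does not depend on entity order, a learner sees one input where it would otherwise see every ordering of the same entities, which is the state-space reduction the abstract refers to.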

8. Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

Author: Zhao, Jian, Zhang, Yue, Hu, Xunhan, Wang, Weixun, Zhou, Wengang, Hao, Jianye, Zhu, Jiangcheng, and Li, Houqiang
Subjects: Computer Science - Artificial Intelligence
Abstract: In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards. In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions of different agents so as to achieve effective cooperation. Recently, the value decomposition paradigm has been widely adopted to realize credit assignment, and QMIX has become the state-of-the-art solution. In this paper, we revisit QMIX from two aspects. First, we propose a new perspective on credit assignment measurement and empirically show that QMIX suffers limited discriminability on the assignment of credits to agents. Second, we propose a gradient entropy regularization with QMIX to realize a discriminative credit assignment, thereby improving the overall performance. The experiments demonstrate that our approach can comparatively improve learning efficiency and achieve better performance.
Published: 2022

9. ASN: action semantics network for multiagent reinforcement learning

Author: Yang, Tianpei, Wang, Weixun, Hao, Jianye, Taylor, Matthew E., Liu, Yong, Hao, Xiaotian, Hu, Yujing, Chen, Yingfeng, Fan, Changjie, Ren, Chunxu, Huang, Ye, Zhu, Jiangcheng, and Gao, Yang
Published: 2023

10. Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

Author: Zhou, Tianze, Zhang, Fubiao, Shao, Kun, Li, Kai, Huang, Wenhan, Luo, Jun, Wang, Weixun, Yang, Yaodong, Mao, Hangyu, Wang, Bin, Li, Dong, Liu, Wulong, and Hao, Jianye
Subjects: Computer Science - Artificial Intelligence
Abstract: Extending transfer learning to cooperative multi-agent reinforcement learning (MARL) has recently received much attention. In contrast to the single-agent setting, the coordination indispensable in cooperative MARL constrains each agent's policy. However, existing transfer methods focus exclusively on agent policy and ignore coordination knowledge. We propose a new architecture that realizes robust coordination knowledge transfer through appropriate decomposition of the overall coordination into several coordination patterns. We use a novel mixing network named level-adaptive QTransformer (LA-QTransformer) to realize agent coordination that considers credit assignment, with appropriate coordination patterns for different agents realized by a novel level-adaptive Transformer (LA-Transformer) dedicated to the transfer of coordination knowledge. In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios. Extensive experiments in StarCraft II micro-management show that LA-QTransformer together with PIT achieves superior performance compared with state-of-the-art baselines.
Comment: 12 pages, 9 figures
Published: 2021

11. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Author: Hu, Yujing, Wang, Weixun, Jia, Hangtian, Wang, Yixiang, Chen, Yingfeng, Hao, Jianye, Wu, Feng, and Fan, Changjie
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL). Existing approaches such as potential-based reward shaping normally make full use of a given shaping reward function. However, since the transformation of human knowledge into numeric reward values is often imperfect due to reasons such as human cognitive bias, completely utilizing the shaping reward function may fail to improve the performance of RL algorithms. In this paper, we consider the problem of adaptively utilizing a given shaping reward function. We formulate the utilization of shaping rewards as a bi-level optimization problem, where the lower level is to optimize policy using the shaping rewards and the upper level is to optimize a parameterized shaping weight function for true reward maximization. We formally derive the gradient of the expected true reward with respect to the shaping weight function parameters and accordingly propose three learning algorithms based on different assumptions. Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards, and meanwhile ignore unbeneficial shaping rewards or even transform them into beneficial ones.
Comment: Accepted by NeurIPS2020
Published: 2020

12. Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising

Author: Hao, Xiaotian, Jin, Junqi, Hao, Jianye, Li, Jin, Wang, Weixun, Ma, Yi, Zheng, Zhenzhe, Li, Han, Xu, Jian, and Gai, Kun
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Bipartite b-matching is fundamental in algorithm design, and has been widely applied to economic markets, labor markets, etc. These practical problems usually exhibit two distinct features: large-scale and dynamic, which require the matching algorithm to be repeatedly executed at regular intervals. However, existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource. To address this issue, we propose NeuSearcher, which leverages the knowledge learned from previous instances to solve new problem instances. Specifically, we design a multichannel graph neural network to predict the threshold of the matched edges weights, by which the search region could be significantly reduced. We further propose a parallel heuristic search algorithm to iteratively improve the solution quality until convergence. Experiments on both open and industrial datasets demonstrate that NeuSearcher can speed up 2 to 3 times while achieving exactly the same matching solution compared with the state-of-the-art approximation approaches.
Comment: accepted by IJCAI 2020
Published: 2020

13. Efficient Deep Reinforcement Learning via Adaptive Policy Transfer

Author: Yang, Tianpei, Hao, Jianye, Meng, Zhaopeng, Zhang, Zongzhang, Hu, Yujing, Cheng, Yingfeng, Fan, Changjie, Wang, Weixun, Liu, Wulong, Wang, Zhaodong, and Peng, Jiajie
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing transfer approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided explorations for the target task. However, how to directly optimize the target policy by alternatively utilizing knowledge from appropriate source policies without explicitly measuring the similarity is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it by modeling multi-policy transfer as the option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.
Comment: Accepted by IJCAI'2020
Published: 2020

14. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning

Author: Yang, Tianpei, Wang, Weixun, Tang, Hongyao, Hao, Jianye, Meng, Zhaopeng, Mao, Hangyu, Li, Dong, Liu, Wulong, Zhang, Chengwei, Hu, Yujing, Chen, Yingfeng, and Fan, Changjie
Subjects: Computer Science - Multiagent Systems
Abstract: Transfer Learning has shown great potential to enhance single-agent Reinforcement Learning (RL) efficiency. Similarly, Multiagent RL (MARL) can also be accelerated if agents can share knowledge with each other. However, how an agent should learn from other agents remains an open problem. In this paper, we propose a novel Multiagent Policy Transfer Framework (MAPTF) to improve MARL efficiency. MAPTF learns which agent's policy is the best to reuse for each agent and when to terminate it by modeling multiagent policy transfer as the option learning problem. Furthermore, in practice, the option module can only collect all agents' local experiences for update due to the partial observability of the environment. In this setting, the agents' experiences may be inconsistent with each other, which may cause inaccuracy and oscillation in the option-value estimation. Therefore, we propose a novel option learning algorithm, the successor representation option learning, to solve it by decoupling the environment dynamics from rewards and learning the option-value under each agent's preference. MAPTF can be easily combined with existing deep RL and MARL approaches, and experimental results show it significantly boosts the performance of existing methods in both discrete and continuous state spaces.
Comment: Accepted by NeurIPS'2021
Published: 2020

15. KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Author: Zhang, Peng, Hao, Jianye, Wang, Weixun, Tang, Hongyao, Ma, Yi, Duan, Yihai, and Zheng, Yan
Subjects: Computer Science - Artificial Intelligence
Abstract: Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally have common sense and use prior knowledge to derive an initial policy and guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up since the initial policy ensures a quick start of learning and intermediate guidance helps avoid unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge and RL, achieves significant improvement on learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
Published: 2020

16. Multi-Agent Game Abstraction via Graph Attention Neural Network

Author: Liu, Yong, Wang, Weixun, Hu, Yujing, Hao, Jianye, Chen, Xingguo, and Gao, Yang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: In large-scale multi-agent systems, the large number of agents and complex game relationships cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with others all the time. Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents. However, the methods cannot be directly used in a large-scale environment due to the difficulty of transforming the complex interactions between agents into rules. In this paper, we model the relationship between agents by a complete graph and propose a novel game abstraction mechanism based on two-stage attention network (G2ANet), which can indicate whether there is an interaction between two agents and the importance of the interaction. We integrate this detection mechanism into graph neural network-based multi-agent reinforcement learning for conducting game abstraction and propose two novel learning algorithms GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results indicate that the proposed methods can simplify the learning process and meanwhile get better asymptotic performance compared with state-of-the-art algorithms.
Comment: Accepted by AAAI2020
Published: 2019

17. Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems

Author: Hao, Xiaotian, Wang, Weixun, Hao, Jianye, and Yang, Yaodong
Subjects: Computer Science - Multiagent Systems
Abstract: Many tasks in practice require the collaboration of multiple agents through reinforcement learning. In general, cooperative multiagent reinforcement learning algorithms can be classified into two paradigms: Joint Action Learners (JALs) and Independent Learners (ILs). In many practical applications, agents are unable to observe other agents' actions and rewards, making JALs inapplicable. In this work, we focus on the independent learning paradigm in which each agent makes decisions based on its local observations only. However, learning is challenging in independent settings due to the local viewpoints of all agents, which perceive the world as a non-stationary environment due to the concurrently exploring teammates. In this paper, we propose a novel framework called Independent Generative Adversarial Self-Imitation Learning (IGASIL) to address the coordination problems in fully cooperative multiagent environments. To the best of our knowledge, we are the first to combine self-imitation learning with generative adversarial imitation learning (GAIL) and apply it to cooperative multiagent systems. Besides, we put forward a Sub-Curriculum Experience Replay mechanism to pick out the past beneficial experiences as much as possible and accelerate the self-imitation learning process. Evaluations conducted in the testbed of StarCraft unit micromanagement and a commonly adopted benchmark show that our IGASIL produces state-of-the-art results and even outperforms JALs in terms of both convergence speed and final performance.
Comment: accepted as a full paper by AAMAS 2019
Published: 2019

18. From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

Author: Wang, Weixun, Yang, Tianpei, Liu, Yong, Hao, Jianye, Hao, Xiaotian, Hu, Yujing, Chen, Yingfeng, Fan, Changjie, and Gao, Yang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: A lot of effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, it is still challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting since their network input sizes are fixed. Therefore, we design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of three transfer mechanisms across curricula through extensive simulations.
Comment: Accepted by AAAI2020
Published: 2019

19. Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

Author: Wang, Weixun, Yang, Tianpei, Liu, Yong, Hao, Jianye, Hao, Xiaotian, Hu, Yujing, Chen, Yingfeng, Fan, Changjie, and Gao, Yang
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
Abstract: In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since each agent's selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the increase in the number of agents. Previous works borrow various multiagent coordination mechanisms into deep learning architecture to facilitate multiagent coordination. However, none of them explicitly consider action semantics between agents, i.e., that different actions have different influences on other agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show ASN significantly improves the performance of state-of-the-art DRL approaches compared with several network architectures.
Comment: accepted by ICLR2020
Published: 2019

20. Method development and optimization for detecting mRNAs of drug metabolizing enzymes and transporters in exosomes derived from human blood

Author: Xu, Shengjie, Gibson, Christopher, Wang, Weixun, Spellman, Daniel, and Zhang, Rena
Published: 2024

21. Learning Adaptive Display Exposure for Real-Time Advertising

Author: Wang, Weixun, Jin, Junqi, Hao, Jianye, Chen, Chunjie, Yu, Chuan, Zhang, Weinan, Wang, Jun, Hao, Xiaotian, Wang, Yixi, Li, Han, Xu, Jian, and Gai, Kun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In E-commerce advertising, where product recommendations and product ads are presented to users simultaneously, the traditional setting is to display ads at fixed positions. However, under such a setting, the advertising system loses the flexibility to control the number and positions of ads, resulting in sub-optimal platform revenue and user experience. Consequently, major e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible ways to display ads. In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit under certain business constraints so that the platform revenue can be increased? More specifically, we consider two types of constraints: request-level constraint ensures user experience for each user visit, and platform-level constraint controls the overall platform monetization rate. We model this problem as a Constrained Markov Decision Process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning approach to decompose the original problem into two relatively independent sub-problems. To accelerate policy learning, we also devise a constrained hindsight experience replay mechanism. Experimental evaluations on industry-scale real-world datasets demonstrate the merits of our approach in both obtaining higher revenue under the constraints and the effectiveness of the constrained hindsight experience replay mechanism.
Comment: accepted by CIKM2019
Published: 2018

22. Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach

Author: Wang, Weixun, Hao, Jianye, Wang, Yixi, and Taylor, Matthew
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Learning, Computer Science - Multiagent Systems
Abstract: The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner's Dilemma (SPD) game to better capture the aforementioned characteristics. In this work, we propose a deep multiagent reinforcement learning approach that investigates the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: an agent adaptively selects its policy based on the detected degree of opponent cooperation. The effectiveness of our approach is demonstrated in two representative SPD 2D games: the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy can avoid being exploited by exploitative opponents and achieve cooperation with cooperative opponents.
Comment: 13 pages, 21 figures
Published: 2018

23. Cooperative Multi-Agent Transfer Learning with Coalition Pattern Decomposition

Author: Zhou, Tianze, Zhang, Fubiao, Shao, Kun, Dai, Zipeng, Li, Kai, Huang, Wenhan, Wang, Weixun, Wang, Bin, Li, Dong, Liu, Wulong, and Hao, Jianye
Published: 2024
Full Text: View/download PDF

24. A Novel Approach for Handling Misbehaving Nodes in Behavior-Aware Mobile Networking

Author: Basu, Kanad, Mitra, Subrata, Mukherjee, Srishti, and Wang, Weixun
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Cryptography and Security
Abstract: Profile-cast is a service paradigm within the communication framework of delay tolerant networks (DTN). Instead of using destination addresses to determine the final destination, it uses a similarity-based forwarding protocol. With the rise in popularity of various wireless networks, the need to make wireless technologies robust and resilient to attacks and failure becomes mandatory. One issue that remains to be addressed in behavioral networks is node cooperation in forwarding packets. Nodes might behave selfishly (due to bandwidth preservation or energy/power constraints) or maliciously, by dropping packets or not forwarding them to other nodes based on profile similarity. In both cases the net result is degradation in the performance of the network. It is our goal to show that the performance of the behavioral network can be improved by employing a self-policing scheme that would detect node misbehavior and then decide how to tackle it, in order to ensure node cooperation or so that the overall performance does not fall below a certain threshold. For this, various existing self-policing techniques that are in use in ad-hoc networks will first be tried on this behavioral scenario. At various stages, simulation would be used to measure the performance of the network under different constraints and after being subjected to different techniques.
Published: 2012

25. Investigating Performance of the SLIM-Based High Resolution Ion Mobility Platform for Separation of Isomeric Phosphatidylcholine Species

Author: Kedia, Komal, primary, Harris, Rachel, additional, Ekroos, Kim, additional, Moser, Kelly W, additional, DeBord, Daniel, additional, Tiberi, Paolo, additional, Goracci, Laura, additional, Zhang, Nanyan Rena, additional, Wang, Weixun, additional, Spellman, Daniel S., additional, and Bateman, Kevin, additional
Published: 2023
Full Text: View/download PDF

26. Quantitation of Super Basic Peptides in Biological Matrices by a Generic Perfluoropentanoic Acid-Based Liquid Chromatography–Mass Spectrometry Method

Author: Wen, Jianzhong, Wang, Weixun, Lee, Keun-Joong, Choi, Bernard K., Harradine, Paul, Salituro, Gino M., and Hittle, Lucinda
Published: 2019
Full Text: View/download PDF

27. Cooperative Multiagent Transfer Learning With Coalition Pattern Decomposition

Author: Zhou, Tianze, Zhang, Fubiao, Shao, Kun, Dai, Zipeng, Li, Kai, Huang, Wenhan, Wang, Weixun, Wang, Bin, Li, Dong, Liu, Wulong, and Hao, Jianye
Abstract: Knowledge transfer in cooperative multiagent reinforcement learning (MARL) has drawn increasing attention in recent years. Unlike generalizing policies in single-agent tasks, it is more important to consider coordination knowledge than individual knowledge in multiagent transfer learning. However, most of the existing methods only focus on knowledge transfer of the individual agent policy, which leads to coordination bias, and finally, affects the final performance in cooperative MARL. In this article, we propose a level-adaptive MARL framework called “LA-QTransformer,” to realize the knowledge transfer on the coordination level via efficiently decomposing the agent coordination into multilevel coalition patterns for different agents. Compatible with centralized training with decentralized execution regime, LA-QTransformer utilizes the level-adaptive transformer to generate suitable coalition patterns, and then, realizes the credit assignment for each agent. Besides, to deal with unexpected changes in the number of agents in the coordination transfer phase, we design a policy network called “population invariant agent with transformer (PIT)” to adapt dynamic observation and action space. We evaluate the LA-QTransformer and PIT in the StarCraft II micromanagement benchmark by comparing them with several state-of-the-art MARL baselines. The experimental results demonstrate the superiority of LA-QTransformer and PIT and verify the feasibility of coordination knowledge transfer.
Published: 2024
Full Text: View/download PDF

28. Rationally Designed Mutations Convert de novo Amyloid-like Fibrils into Monomeric β-Sheet Proteins

Author: Wang, Weixun and Hecht, Michael H.
Published: 2002

29. Self-Assembled Monolayers from a Designed Combinatorial Library of de novo β-Sheet Proteins

Author: Xu, Guofeng, Wang, Weixun, Groves, John T., and Hecht, Michael H.
Published: 2001

30. System-Wide Energy Optimization with DVS and DCR

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

31. Temperature- and Energy-Constrained Scheduling

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

32. Energy Optimization of Cache Hierarchy in Multicore Real-Time Systems

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

33. Energy-Aware Scheduling with Dynamic Voltage Scaling

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

34. Dynamic Cache Reconfiguration in Real-Time Systems

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

35. Modeling of Real-Time and Reconfigurable Systems

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

36. Introduction

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

37. Conclusions

Author: Wang, Weixun, Mishra, Prabhat, and Ranka, Sanjay
Published: 2013
Full Text: View/download PDF

38. MARLlib: A Scalable Multi-agent Reinforcement Learning Library

Author: Hu, Siyi, Zhong, Yifan, Gao, Minquan, Wang, Weixun, Dong, Hao, Li, Zhihui, Liang, Xiaodan, Chang, Xiaojun, and Yang, Yaodong
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: Despite the fast development of multi-agent systems (MAS) and multi-agent reinforcement learning (MARL) algorithms, there is a lack of unified evaluation platforms and commonly-acknowledged baseline implementation. Therefore, an urgent need is to develop an integrated library suite that delivers reliable MARL implementation and replicable evaluation in various benchmarks. To fill such a research gap, in this paper, we propose MARLlib, a comprehensive MARL algorithm library for solving multi-agent problems. With a novel design of agent-level distributed dataflow, MARLlib manages to unify tens of algorithms in a highly composable integration style. Moreover, MARLlib goes beyond current work by integrating diverse environment interfaces and providing flexible parameter sharing strategies; this allows for versatile solutions to cooperative, competitive, and mixed tasks with minimal code modifications for end users. Finally, MARLlib provides easy-to-use APIs and a fully decoupled configuration system to help end users manipulate the learning process. A plethora of experiments is conducted to substantiate the correctness of our implementation, based on which we further derive new insights into the relationship between the performance and the design of algorithmic components. With MARLlib, we expect researchers to be able to tackle broader real-world multi-agent problems with trustworthy solutions. GitHub: \url{https://github.com/Replicable-MARL/MARLlib}
Published: 2022

39. Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents

Author: Zhao, Jian, primary, Zhao, Youpeng, additional, Wang, Weixun, additional, Yang, Mingyu, additional, Hu, Xunhan, additional, Zhou, Wengang, additional, Hao, Jianye, additional, and Li, Houqiang, additional
Published: 2022
Full Text: View/download PDF

40. The 37 Implementation Details of Proximal Policy Optimization

Author: Huang, Shengyi, Dossa, Rousslan Fernand Julien, Raffin, Antonin, Kanervisto, Anssi, and Wang, Weixun
Subjects: reinforcement learning, ppo, policy optimization, implementation
Published: 2022

41. Discovery of Insulin Receptor Partial Agonists MK-5160 and MK-1092 as Novel Basal Insulins with Potential to Improve Therapeutic Index

Author: Pissarnitski, Dmitri A., primary, Kekec, Ahmet, additional, Yan, Lin, additional, Zhu, Yuping, additional, Feng, Danqing D., additional, Huo, Pei, additional, Madsen-Duggan, Christina, additional, Moyes, Christopher R., additional, Nargund, Ravi P., additional, Kelly, Terri, additional, Zhang, Xiaoping, additional, Carballo-Jane, Ester, additional, Gorski, Judith, additional, Zafian, Peter, additional, Qatanani, Mo, additional, Kaarsholm, Niels, additional, Meng, Fanyu, additional, Jia, Xiujuan, additional, Lee, Keun-Joong, additional, Wang, Weixun, additional, Xu, Sherrie, additional, Hohn, Michael J., additional, Iammarino, Michael J., additional, McCoy, Mark A., additional, Okoh, Grace A., additional, Liang, Yingkai, additional, Hollingsworth, Scott A., additional, Erion, Mark D., additional, Kelley, David E., additional, Garbaccio, Robert M., additional, Zhang, Amy, additional, Mu, James, additional, and Lin, Songnian, additional
Published: 2022
Full Text: View/download PDF

42. Development of ProTx-II Analogues as Highly Selective Peptide Blockers of Nav1.7 for the Treatment of Pain

Author: Adams, Gregory L., primary, Pall, Parul S., additional, Grauer, Steven M., additional, Zhou, Xiaoping, additional, Ballard, Jeanine E., additional, Vavrek, Marissa, additional, Kraus, Richard L., additional, Morissette, Pierre, additional, Li, Nianyu, additional, Colarusso, Stefania, additional, Bianchi, Elisabetta, additional, Palani, Anandan, additional, Klein, Rebecca, additional, John, Christopher T., additional, Wang, Deping, additional, Tudor, Matthew, additional, Nolting, Andrew F., additional, Biba, Mirlinda, additional, Nowak, Timothy, additional, Makarov, Alexey A., additional, Reibarkh, Mikhail, additional, Buevich, Alexei V., additional, Zhong, Wendy, additional, Regalado, Erik L., additional, Wang, Xiao, additional, Gao, Qi, additional, Shahripour, Aurash, additional, Zhu, Yuping, additional, de Simone, Daniele, additional, Frattarelli, Tommaso, additional, Pasquini, Nicolo’ Maria, additional, Magotti, Paola, additional, Iaccarino, Roberto, additional, Li, Yuxing, additional, Solly, Kelli, additional, Lee, Keun-Joong, additional, Wang, Weixun, additional, Chen, Feifei, additional, Zeng, Haoyu, additional, Wang, Jixin, additional, Regan, Hilary, additional, Amin, Rupesh P., additional, Regan, Christopher P., additional, Burgey, Christopher S., additional, Henze, Darrell A., additional, Sun, Chengzao, additional, and Tellers, David M., additional
Published: 2021
Full Text: View/download PDF

43. Guiding Chemically Synthesized Peptide Drug Lead Optimization by Derisking Mast Cell Degranulation-Related Toxicities of a NaV1.7 Peptide Inhibitor

Author: Morissette, Pierre, primary, Li, Nianyu, additional, Ballard, Jeanine E, additional, Vavrek, Marissa, additional, Adams, Gregory L, additional, Regan, Chris, additional, Regan, Hillary, additional, Lee, K J, additional, Wang, Weixun, additional, Burton, Aimee, additional, Chen, Feifei, additional, Gerenser, Pamela, additional, Li, Yuxing, additional, Kraus, Richard L, additional, Tellers, David, additional, Palani, Anand, additional, Zhu, Yuping, additional, Sun, Chengzao, additional, Bianchi, Elisabetta, additional, Colarusso, Stefania, additional, De Simone, Daniele, additional, Frattarelli, Tommaso, additional, Pasquini, Nicolo’ Maria, additional, and Amin, Rupesh P, additional
Published: 2021
Full Text: View/download PDF

44. A Series of Novel, Highly Potent, and Orally Bioavailable Next-Generation Tricyclic Peptide PCSK9 Inhibitors

Author: Tucker, Thomas J., primary, Embrey, Mark W., additional, Alleyne, Candice, additional, Amin, Rupesh P., additional, Bass, Alan, additional, Bhatt, Bhavana, additional, Bianchi, Elisabetta, additional, Branca, Danila, additional, Bueters, Tjerk, additional, Buist, Nicole, additional, Ha, Sookhee N., additional, Hafey, Mike, additional, He, Huaibing, additional, Higgins, John, additional, Johns, Douglas G., additional, Kerekes, Angela D., additional, Koeplinger, Kenneth A., additional, Kuethe, Jeffrey T., additional, Li, Nianyu, additional, Murphy, BethAnn, additional, Orth, Peter, additional, Salowe, Scott, additional, Shahripour, Aurash, additional, Tracy, Rodger, additional, Wang, Weixun, additional, Wu, Chengwei, additional, Xiong, Yusheng, additional, Zokian, Hratch J., additional, Wood, Harold B., additional, and Walji, Abbas, additional
Published: 2021
Full Text: View/download PDF

45. Quantification of Intact and Truncated Stromal Cell-Derived Factor-1α in Circulation by Immunoaffinity Enrichment and Tandem Mass Spectrometry

Author: Wang, Weixun, Choi, Bernard K., Li, Wenyu, Lao, Zhege, Lee, Anita Y. H., Souza, Sandra C., Yates, Nathan A., Kowalski, Timothy, Pocai, Alessandro, and Cohen, Lucinda H.
Published: 2014
Full Text: View/download PDF

46. An ultrasensitive method for the quantitation of active and inactive GLP-1 in human plasma via immunoaffinity LC–MS/MS

Author: Chappell, Derek L, Lee, Anita YH, Castro-Perez, Jose, Zhou, Haihong, Roddy, Thomas P, Lassman, Michael E, Shankar, Sudha S, Yates, Nathan A, Wang, Weixun, and Laterza, Omar F
Published: 2014
Full Text: View/download PDF

47. Dynamic Reconfiguration in Real-Time Systems

Author: Wang, Weixun, primary, Mishra, Prabhat, additional, and Ranka, Sanjay, additional
Published: 2013
Full Text: View/download PDF

48. Introduction

Author: Wang, Weixun, primary, Mishra, Prabhat, additional, and Ranka, Sanjay, additional
Published: 2012
Full Text: View/download PDF

49. Dynamic Cache Reconfiguration in Real-Time Systems

Author: Wang, Weixun, primary, Mishra, Prabhat, additional, and Ranka, Sanjay, additional
Published: 2012
Full Text: View/download PDF

50. Temperature- and Energy-Constrained Scheduling

Author: Wang, Weixun, primary, Mishra, Prabhat, additional, and Ranka, Sanjay, additional
Published: 2012
Full Text: View/download PDF
