7 results for "Zhang, Chengwei"
Search Results
2. SCC-rFMQ: a multiagent reinforcement learning method in cooperative Markov games with continuous actions
- Author
- Zhang, Chengwei, Han, Zhuobing, Liu, Bingfu, Xue, Wanli, Hao, Jianye, Li, Xiaohong, An, Dou, and Chen, Rong
- Published
- 2022
- Full Text
- View/download PDF
3. Data Integrity Attack in Dynamic State Estimation of Smart Grid: Attack Model and Countermeasures.
- Author
- An, Dou, Zhang, Feiye, Yang, Qingyu, and Zhang, Chengwei
- Subjects
- Data integrity; Partially observable Markov decision processes; Reinforcement learning
- Abstract
A smart grid integrates advanced sensors, efficient measurement methods, progressive control technologies, and other techniques and devices to achieve safe, efficient, and economical operation of the grid system. However, the diversified and open environment of a smart grid makes its energy and information vulnerable to malicious attacks. As a representative cyber-physical attack, the data integrity attack has an extremely severe impact on grid operation because it can bypass traditional detection mechanisms by adjusting the attack vector. In this paper, we first present the attack strategy against dynamic state estimation of the power grid from the adversary's perspective and formulate the data integrity attack detection problem, which has the character of sequential decision making, as a partially observable Markov decision process. Then, a deep reinforcement learning-based approach is proposed to detect data integrity attacks; it utilizes a Long Short-Term Memory (LSTM) layer to extract the state features of previous time steps when determining whether the system is currently under attack. Moreover, noisy networks are employed to ensure effective agent exploration, which prevents the agent from sticking to a non-optimal policy, and multi-step learning is adopted to increase the estimation accuracy of the Q value. To address the sparse rewards problem, prioritized experience replay is used to increase training efficiency. Simulation results demonstrate that the proposed detection approach surpasses the benchmarks on the comparison metrics: delay error rate and false rate.
Note to Practitioners: In this paper, we present a deep reinforcement learning-based algorithm to defend against data integrity attacks on the smart grid. Most previous works discretized the system states and utilized only the current state information to identify whether the system is under attack. As a result, the detection policy may totally ignore the continuously changing characteristics of the grid states, leading to poor detection performance. Moreover, attacked system states account for only a small part of the entire grid operation states, so the probability of sampling an experience containing an attack state is extremely small, which limits the learning efficiency of previous RL-based detection approaches. To increase detection accuracy, we first present the attack strategy against the power grid's dynamic state estimation from the adversary's perspective and formulate the attack detection problem as a partially observable Markov decision process. We then propose a deep reinforcement learning-based detection approach that combines an LSTM network to extract the system state features of previous time steps in determining whether the system is currently being attacked. To address the sparse rewards problem, prioritized experience replay is used to increase learning efficiency. The experiments demonstrate the effectiveness of the proposed detection scheme compared with benchmarks in terms of detection delay as well as accuracy. In conclusion, the proposed detection scheme helps defend against data integrity attacks without obtaining the opponent's strategy in advance and can be conveniently applied to real-world security management systems of the smart grid. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
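The prioritized experience replay mentioned in this abstract can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the class name, proportional prioritization, and the `alpha`/`beta` hyperparameters are assumptions drawn from the standard formulation of the technique.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.beta = beta          # importance-sampling correction strength
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        # rare attack-state experiences get large TD errors, hence priority
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:   # overwrite the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # importance-sampling weights correct the bias of non-uniform sampling
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize for stability
        return [self.data[i] for i in idxs], weights, idxs
```

High-TD-error transitions (such as the rare attacked states the abstract describes) are sampled more often, which is exactly why the mechanism helps with sparse rewards.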
4. Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control Problem.
- Author
- Zhang, Chengwei, Jin, Shan, Xue, Wanli, Xie, Xiaofei, Chen, Shengyong, and Chen, Rong
- Subjects
- Reinforcement learning; Traffic engineering; Traffic signs & signals; Group work in education; Problem solving; City traffic
- Abstract
The adaptive traffic signal control (ATSC) problem can be modeled as a multiagent cooperative game among urban intersections, where intersections cooperate to manage the city's traffic conditions. Recently, reinforcement learning (RL) has achieved marked successes in sequential decision-making problems, which motivates us to apply RL to the ATSC problem. One of the largest challenges of this problem is that each intersection's observation is typically partial, which limits the learning performance of RL algorithms. Considering the large number of intersections in an urban traffic environment, we use independent RL (IRL) to solve the ATSC problem in this study. We model the ATSC problem as a partially observable weak cooperative traffic model (PO-WCTM). Different from a traditional IRL task that averages the returns of all agents in fully cooperative games, the learning goal of each intersection in PO-WCTM is defined locally to reduce the cooperative difficulty of learning, which is also consistent with the traffic environment hypothesis. To achieve the optimal cooperative strategy of PO-WCTM, we propose an IRL algorithm called Cooperative Important Lenient Double DQN (CIL-DDQN), which extends the Double DQN (DDQN) algorithm with two mechanisms: a forgetful experience mechanism and a lenient weight training mechanism. The former decreases the importance of experiences stored in the experience replay buffer, while the latter increases the weight of experiences with high estimation and 'leniently' trains the DDQN neural network. Experiments in two real traffic scenarios and one simulated traffic scenario show that CIL-DDQN outperforms other methods on almost all performance indicators of ATSC. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
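The leniency idea behind CIL-DDQN can be illustrated with a single tabular update. This is a hedged sketch of the general lenient-learning principle, not the paper's exact mechanism: the function name, the fixed `leniency` factor, and the tabular setting are assumptions (the paper applies leniency as a training weight inside a DDQN).

```python
def lenient_q_update(q, reward, q_next_max, alpha=0.1, gamma=0.95, leniency=0.2):
    """One lenient TD update (illustrative sketch of the lenient-weight idea).

    Positive TD errors are applied at full strength; negative ones are
    down-weighted by `leniency`, so an agent forgives apparent punishments
    that may stem from teammates' exploration rather than its own action.
    """
    td_error = reward + gamma * q_next_max - q
    weight = 1.0 if td_error >= 0 else leniency
    return q + alpha * weight * td_error
```

In a weakly cooperative setting such as traffic control, this asymmetry keeps early mis-coordination from dragging down value estimates of actions that are good under coordinated play.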
5. SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes.
- Author
- Zhang, Chengwei, Li, Xiaohong, Hao, Jianye, Chen, Siqi, Tuyls, Karl, Xue, Wanli, and Feng, Zhiyong
- Subjects
- Reinforcement learning; Social services; Nonlinear analysis; Nash equilibrium; Robust control
- Abstract
In multiagent environments, the capability to learn is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment. From the system designer's perspective, it is desirable for agents to learn to coordinate towards socially optimal outcomes while avoiding being exploited by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) that augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory; SA-IGA is shown to have non-linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that an SA-PGA agent can achieve higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents, reaching Nash equilibrium solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
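The social-awareness idea can be sketched as gradient ascent on a blended objective: each agent climbs v_i = (1 - w) * u_i + w * (u_1 + u_2) / 2, mixing its own payoff with social welfare. Everything concrete below is an illustrative assumption, not the paper's algorithm: the prisoner's-dilemma payoffs, the fixed weight `w`, and the finite-difference gradient are stand-ins for SA-IGA's analytical update.

```python
def pd_payoffs(x, y):
    """Expected payoffs in a prisoner's dilemma; x, y = P(cooperate)."""
    A = [[3.0, 0.0], [5.0, 1.0]]          # row player's payoff matrix
    u1 = (x * y * A[0][0] + x * (1 - y) * A[0][1]
          + (1 - x) * y * A[1][0] + (1 - x) * (1 - y) * A[1][1])
    u2 = (x * y * A[0][0] + (1 - x) * y * A[0][1]
          + x * (1 - y) * A[1][0] + (1 - x) * (1 - y) * A[1][1])
    return u1, u2

def socially_aware_ascent(w=0.8, eta=0.05, steps=300, h=1e-5):
    """Both players climb v_i = (1-w)*u_i + w*(u1+u2)/2 via numeric gradients."""
    x = y = 0.5
    for _ in range(steps):
        def v1(xx):
            u1, u2 = pd_payoffs(xx, y)
            return (1 - w) * u1 + w * (u1 + u2) / 2
        def v2(yy):
            u1, u2 = pd_payoffs(x, yy)
            return (1 - w) * u2 + w * (u1 + u2) / 2
        gx = (v1(x + h) - v1(x - h)) / (2 * h)    # finite-difference gradient
        gy = (v2(y + h) - v2(y - h)) / (2 * h)
        x = min(1.0, max(0.0, x + eta * gx))      # project back onto [0, 1]
        y = min(1.0, max(0.0, y + eta * gy))
    return x, y
```

With a high social-awareness weight the joint policy drifts to mutual cooperation (the socially optimal outcome), while with w = 0 the sketch reduces to plain IGA and both players defect.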
6. Discovering Lin-Kernighan-Helsgaun heuristic for routing optimization using self-supervised reinforcement learning.
- Author
- Wang, Qi, Zhang, Chengwei, and Tang, Chunlei
- Subjects
- Reinforcement learning; Vehicle routing problem; Client satisfaction; Heuristic; Operating costs
- Abstract
Vehicle routing optimization is a crucial responsibility of transportation service providers; it can significantly reduce operating expenses and improve client satisfaction. Learning to tackle routing optimization problems automatically could be the next significant step forward in optimization technology. Despite recent advancements in automatically learned heuristics for routing optimization problems, state-of-the-art traditional methods such as Lin-Kernighan-Helsgaun (LKH) still outperform machine-learning-based approaches. To narrow this gap, we propose a novel technique called self-supervised reinforcement learning (SSRL), which combines self-supervised learning with the LKH heuristic. We provide a node decoder and an edge decoder, corresponding to reinforcement learning and self-supervised learning, for learning node penalties and edge scores, respectively. The self-supervised part, with a cross-entropy loss, offers strong gradient signals for parameter updates, while the reinforcement learning component functions as a regularizer that drives the supervised part toward particular rewards. SSRL learns and replicates all of LKH's significant components, improving the original LKH's generalization and performance. Through experiments on multiple vehicle routing problems, SSRL has demonstrated superior accuracy and efficiency compared to existing methods. Our results provide empirical evidence of SSRL's effectiveness and potential as a promising solution for optimizing complex routing problems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
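The node penalties the abstract refers to are LKH's Held-Karp-style π values: distances are transformed as d'(i, j) = d(i, j) + π_i + π_j, which shifts every closed tour's cost by exactly 2·Σπ (each node has degree 2 in a tour) and therefore preserves the tour ranking while reshaping the edge scores a search heuristic sees. The learning of these penalties is the paper's contribution; the numeric check below is only an illustrative sketch of the transform itself, with a made-up distance matrix.

```python
from itertools import permutations

def tour_cost(dist, tour):
    """Total length of a closed tour over a symmetric distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def penalized(dist, pi):
    """LKH-style transform: d'(i, j) = d(i, j) + pi[i] + pi[j]."""
    n = len(dist)
    return [[dist[i][j] + pi[i] + pi[j] for j in range(n)] for i in range(n)]
```

Because the shift is identical for every tour, penalties can steer which edges look promising without ever changing which tour is optimal.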
7. Adaptive Object Tracking via Multi-Angle Analysis Collaboration.
- Author
- Xue, Wanli, Feng, Zhiyong, Xu, Chao, Meng, Zhaopeng, and Zhang, Chengwei
- Subjects
- Object tracking (Computer vision); Feature selection; Reinforcement learning; Algorithms; Robust control
- Abstract
Although tracking research has achieved excellent performance from a mathematical angle, it is still meaningful to analyze tracking problems from multiple perspectives. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on multi-dimensional state-action space reinforcement learning, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists the former. Notably, the strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from a multi-angle analysis of tracking problems (the observer's attention and the object's motion), and the content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ϵ-greedy exploration strategy, where we adopt a customized reward function to encourage robust object tracking. Extensive comparative evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the improvement in speed and accuracy of the MACT tracker. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
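The agent setup the abstract describes, Q-learning with ϵ-greedy exploration and a customized reward, can be sketched on a toy task. The 1-D "strip" environment, reward shape, and all hyperparameters below are illustrative stand-ins, not the paper's multi-dimensional state-action space or reward function:

```python
import random

def train_tracker(n_pos=5, episodes=400, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on a 1-D strip.

    The 'tracker' starts at position 0 and is rewarded for reaching the
    target at the right end -- a toy stand-in for reward-shaped tracking.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_pos) for a in (-1, +1)}
    for _ in range(episodes):
        s = 0
        while s != n_pos - 1:
            if rng.random() < eps:                       # explore
                a = rng.choice((-1, +1))
            else:                                        # exploit
                a = max((-1, +1), key=lambda act: q[(s, act)])
            s2 = min(n_pos - 1, max(0, s + a))           # move, stay in bounds
            r = 1.0 if s2 == n_pos - 1 else 0.0          # customized reward
            target = r + gamma * max(q[(s2, -1)], q[(s2, +1)])
            q[(s, a)] += alpha * (target - q[(s, a)])    # TD update
            s = s2
    return q
```

After training, the greedy policy moves toward the target from every state; MACT's FSS and MTS strategies would correspond to extra action dimensions on top of this basic loop.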
Discovery Service for Jio Institute Digital Library