429 results for "Policy Gradient"
Search Results
2. The RL Toolkit: A Spectrum of Algorithms
- Author
-
Lin, Baihan, Celebi, Emre, Series Editor, Chen, Jingdong, Series Editor, Gopi, E. S., Series Editor, Neustein, Amy, Series Editor, Liotta, Antonio, Series Editor, Di Mauro, Mario, Series Editor, and Lin, Baihan
- Published
- 2025
- Full Text
- View/download PDF
3. A multi-step on-policy deep reinforcement learning method assisted by off-policy policy evaluation.
- Author
-
Zhang, Huaqing, Ma, Hongbin, Mersha, Bemnet Wondimagegnehu, and Jin, Ying
- Subjects
REINFORCEMENT learning ,DEEP reinforcement learning ,MACHINE learning ,ALGORITHMS - Abstract
On-policy deep reinforcement learning (DRL) has the inherent advantage of using multi-step interaction data for policy learning. However, on-policy DRL still faces challenges in improving the sample efficiency of policy evaluation. Therefore, we propose a multi-step on-policy DRL method assisted by off-policy policy evaluation (abbreviated as MSOAO), which integrates on-policy and off-policy policy evaluation and constitutes a new type of DRL method. We propose a low-pass filtering algorithm for state-values to perform off-policy policy evaluation so that it efficiently assists on-policy policy evaluation. The filtered state-values and the multi-step interaction data are used as the input of the V-trace algorithm. Then, the state-value function is learned by simultaneously approximating the target state-values obtained from the V-trace output and the action-values of the current policy. The action-value function is learned by using the one-step bootstrapping algorithm to approximate the target action-values obtained from the V-trace output. Extensive evaluation results indicate that MSOAO outperforms state-of-the-art on-policy DRL algorithms, and that the simultaneous learning of the state-value function and the action-value function in MSOAO lets the two promote each other, improving the learning capability of the algorithm. [ABSTRACT FROM AUTHOR] (A generic V-trace sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
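The MSOAO abstract above builds on V-trace targets. The record does not reproduce the paper's equations, so the following is a generic, illustrative V-trace sketch (single trajectory, truncated importance weights) rather than the MSOAO implementation; all function and variable names are assumptions.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Generic V-trace state-value targets for one trajectory.

    `rhos` holds the importance ratios pi(a_t|s_t) / mu(a_t|s_t) between
    the target and behavior policies; `bootstrap_value` is V(s_T).
    """
    T = len(rewards)
    clipped_rho = np.minimum(rho_bar, rhos)   # truncated IS weights
    clipped_c = np.minimum(c_bar, rhos)       # trace-cutting coefficients
    next_values = np.append(values[1:], bootstrap_value)
    deltas = clipped_rho * (rewards + gamma * next_values - values)

    # Backward recursion: (vs - V)_t = delta_t + gamma * c_t * (vs - V)_{t+1}
    vs_minus_v = np.zeros(T + 1)
    for t in reversed(range(T)):
        vs_minus_v[t] = deltas[t] + gamma * clipped_c[t] * vs_minus_v[t + 1]
    return values + vs_minus_v[:T]
```

In an MSOAO-style setup, targets of this kind would supervise the state-value function, while the action-value function is regressed toward related targets with one-step bootstrapping, as the abstract describes.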
4. Relabeling and policy distillation of hierarchical reinforcement learning.
- Author
-
Zou, Qijie, Zhao, Xiling, Gao, Bing, Chen, Shuang, Liu, Zhiguo, and Zhang, Zhejie
- Abstract
Hierarchical reinforcement learning (HRL) is a promising method to extend traditional reinforcement learning to solve more complex tasks, addressing long-term reward sparsity and credit assignment. However, existing HRL methods are trained in specific environments on specific target tasks each time, resulting in low sample utilization. In addition, the low-level sub-policies of the agent interfere with each other during transfer, resulting in poor policy stability. To address these issues, this paper proposes an HRL method, Relabeling and Policy Distillation of Hierarchical Reinforcement Learning (R-PD-HRL), that integrates meta-learning, shared reward relabeling, and policy distillation to accelerate learning and improve the policy stability of the agent. During training, a reward relabeling module is introduced to act on the experience buffer: different reward functions relabel the interaction trajectories for training other tasks under the same task distribution. At the low level, policy distillation compresses the sub-policies, reducing interference between policies while preserving the correctness of the original low-level sub-policies. Finally, according to the task at hand, the high-level policy calls the optimal low-level policy to complete the decision. In both continuous and discrete state-action environments, experimental results show that, compared with other methods, the improved sample utilization of this method greatly accelerates learning, with a success rate as high as 0.6. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. HG-search: multi-stage search for heterogeneous graph neural networks.
- Author
-
Sun, Hongmin, Kan, Ao, Liu, Jianhao, and Du, Wei
- Abstract
In recent years, heterogeneous graphs, a complex graph structure that can express multiple types of nodes and edges, have been widely used for modeling various real-world scenarios. As a powerful analysis tool, heterogeneous graph neural networks (HGNNs) can effectively mine the information and knowledge in heterogeneous graphs. However, designing an excellent HGNN architecture requires a lot of domain knowledge and is a time-consuming and laborious task. Inspired by neural architecture search (NAS), some works on homogeneous graph NAS have emerged. However, there are few works on heterogeneous graph NAS. In addition, the hyperparameters related to the HGNN architecture are also important factors affecting its performance in downstream tasks. Manually tuning hyperparameters is also a tedious and inefficient process. To solve the above problems, we propose a novel search (HG-Search for short) algorithm specifically for HGNNs, which achieves fully automatic architecture design and hyperparameter tuning. Specifically, we first design a search space for HG-Search, composed of two parts: HGNN architecture search space and hyperparameter search space. Furthermore, we propose a multi-stage search (MS-Search for short) module and combine it with the policy gradient search (PG-Search for short). Experiments on real-world datasets show that this method can design HGNN architectures comparable to those manually designed by humans and achieve automatic hyperparameter tuning, significantly improving the performance in downstream tasks. The code and related datasets can be found at . [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
6. Experimental Implementation of a TD3 Agent Based Speed Controller for Direct Torque Control of PMSM Drives.
- Author
-
Mastanaiah, Aenugu and Ramesh, Tejavathu
- Subjects
- *
DEEP reinforcement learning , *REINFORCEMENT learning , *PERMANENT magnet motors , *ELECTRIC drives , *DIGITAL signal processing , *TORQUE control - Abstract
This manuscript presents a novel control approach for Permanent Magnet Synchronous Motors (PMSMs) by integrating the widely used Direct Torque Control (DTC) method with Deep Reinforcement Learning (DRL), specifically the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The TD3 algorithm is well-suited to address the challenges posed by high-dimensional state and action spaces in PMSM drives. The conventional Proportional–Integral (PI) controller in the outer loop of DTC is replaced with the DRL based TD3 agent to overcome the limitations of traditional PI controllers, such as model dependency and complex parameter tuning. The DRL technique allows the agent to learn optimal control policies from experience, making it suitable for complex and nonlinear control problems. Extensive simulation studies demonstrate the effectiveness of the TD3-based control in achieving accurate speed tracking under varying operating conditions. Real-time experimental validation using a TMS320F28379D digital signal processor confirms the feasibility of the proposed DRL based approach. The research offers new insights into improving the performance of PMSM drives and paves the way for future advancements in electric drive control. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds.
- Author
-
Paczolay, Gabor, Papini, Matteo, Metelli, Alberto Maria, Harmati, Istvan, and Restelli, Marcello
- Subjects
REINFORCEMENT learning ,COMPUTER simulation ,ALGORITHMS - Abstract
Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved O(ε⁻³) sample complexity to find an ε-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, achieving the same result without any assumption on the variance of the importance weights. We also show that this is not improvable by establishing a matching Ω(ε⁻³) lower bound, and that REINFORCE with its O(ε⁻⁴) sample complexity is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
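Entry 7 hinges on defensive importance sampling, in which trajectories are drawn from a mixture that contains the target policy, so the importance weights are bounded by construction. The sketch below illustrates that idea generically; it is not the DEF-PG algorithm itself, and the mixture choice and all names are assumptions.

```python
import numpy as np

def defensive_is_weights(logp_target, logp_other, alpha=0.5):
    """Importance weights under defensive (mixture) sampling.

    Trajectories come from mu = alpha * pi_target + (1 - alpha) * pi_other,
    so w = pi_target / mu <= 1 / alpha, which removes the need for an
    assumption on the variance of the importance weights.
    """
    p_t, p_o = np.exp(logp_target), np.exp(logp_other)
    return p_t / (alpha * p_t + (1.0 - alpha) * p_o)

def weighted_reinforce_grad(weights, returns, score_functions):
    """Weighted REINFORCE estimator: mean_i w_i * G_i * grad log pi(tau_i)."""
    return np.mean(weights[:, None] * returns[:, None] * score_functions, axis=0)
```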
8. CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION.
- Author
-
Cayci, Semih, He, Niao, and Srikant, R.
- Subjects
- *
REINFORCEMENT learning , *ERROR functions , *APPROXIMATION error , *MARKOV processes , *LEARNING problems - Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of O(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Investigating the Efficacy of Deep Reinforcement Learning Models in Detecting and Mitigating Cyber-attacks: a Novel Approach.
- Author
-
Praveen, S. Phani, Chokka, Anuradha, Sarala, Pappula, Nakka, Rajeswari, Chandolu, Suresh Babu, and Jyothi, V. Esther
- Subjects
REINFORCEMENT learning ,DEEP reinforcement learning ,ARTIFICIAL intelligence ,INTERNET security ,COMPUTER network security ,DENIAL of service attacks ,CYBERTERRORISM - Abstract
Conventional defence components such as rule-based firewalls and signature-based detection are not keeping pace with the ever-increasing complexity and frequency of cyber security threats. The purpose of this work is to explore how deep reinforcement learning (DRL), a subfield of artificial intelligence known for its effectiveness in challenging decision-making settings, can be used to improve cyber security protocols. To simulate and counter hostile cyber-attacks, we present a system that uses DRL. We propose an agent-based model that can learn and adapt continuously in dynamic network security environments. Based on the current state of the network and the rewards it receives for its decisions, the agent decides on the best courses of action. Specifically, we use a policy gradient (PG)-based double deep Q-network (DDQN) model and experiment on three different datasets: NSL-KDD, CIC-IDS, and AWID. Our study demonstrates that DRL can effectively improve the detection of cyber-attacks. Using the policy gradient DDQN model on these datasets, we find notable improvements in cyber security protocols, and targeted parameter adjustments further enhance the effectiveness of the methodology, showing encouraging results across datasets. This research highlights the potential of deep reinforcement learning (DRL) as an effective instrument in the field of cyber security. Our work advances detection techniques and provides an adaptable solution that can be applied to a variety of cyber security concerns by giving a strong basis for modelling and mitigating cyber threats. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. A multi-agent cooperative electronic countermeasure method based on reinforcement learning.
- Author
-
杨洋, 王烨, 康大勇, 陈嘉玉, 李姜, and 赵华栋
- Abstract
Copyright of Journal of Ordnance Equipment Engineering is the property of Chongqing University of Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
11. Anti-conflict AGV path planning in automated container terminals based on multi-agent reinforcement learning.
- Author
-
Hu, Hongtao, Yang, Xurui, Xiao, Shichang, and Wang, Feiyang
- Subjects
REINFORCEMENT learning ,CONTAINER terminals ,AUTOMATED planning & scheduling ,AUTOMATED guided vehicle systems ,INTRAMEDULLARY fracture fixation ,INTEGER programming - Abstract
Conflict-free AGV path planning is a key factor in reducing transportation cost and improving the operational efficiency of container terminals. This paper studies the anti-conflict path planning problem of Automated Guided Vehicles (AGVs) in the horizontal transportation area of Automated Container Terminals (ACTs). According to the characteristics of magnetic-nail-guided AGVs, a node network is constructed. Through the analysis of two conflict situations, namely the opposite (head-on) conflict and the same-point occupation conflict, an integer programming model is established to obtain the shortest path. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method is proposed to solve the problem, and the Gumbel-Softmax strategy is applied to discretize the scenario created by the node network. A series of numerical experiments are conducted to verify the effectiveness and efficiency of the model and the algorithm. [ABSTRACT FROM AUTHOR] (A Gumbel-Softmax sampling sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
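Entry 11 discretizes the MADDPG actor's output for discrete path-planning moves with the Gumbel-Softmax trick, as flagged at the end of the abstract above. The NumPy sketch below is illustrative only and is not the paper's implementation; the straight-through detail would normally live in an autodiff framework.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, hard=True, rng=np.random.default_rng()):
    """Relaxed categorical sample over discrete actions.

    Adding Gumbel noise and applying a temperature-tau softmax gives a
    differentiable relaxation; `hard=True` returns a one-hot action for the
    environment (straight-through trick in an autodiff framework).
    """
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    z = logits + gumbel
    z = z - z.max()                 # numerical stability; cancels in the softmax
    soft = np.exp(z / tau)
    soft /= soft.sum()
    if not hard:
        return soft
    one_hot = np.zeros_like(soft)
    one_hot[np.argmax(soft)] = 1.0
    return one_hot                  # autodiff version: one_hot + (soft - soft.detach())
```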
12. Landscape Analysis of Stochastic Policy Gradient Methods
- Author
-
Liu, Xingtu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bifet, Albert, editor, Davis, Jesse, editor, Krilavičius, Tomas, editor, Kull, Meelis, editor, Ntoutsi, Eirini, editor, and Žliobaitė, Indrė, editor
- Published
- 2024
- Full Text
- View/download PDF
13. Enhancing Adversarial Robustness for Deep Metric Learning via Attention-Aware Knowledge Guidance
- Author
-
Li, Chaofei, Zhu, Ziyuan, Pan, Yuedong, Niu, Ruicheng, Zhao, Yuting, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
14. A Reinforcement Learning Framework for Lung Segmentation of COVID-19 and Pneumonia Affected Chest X-Ray Image
- Author
-
Chakraborty, Soarov, Hasan, K. M. Azharul, Paul, Shourav, Hartmanis, Juris, Founding Editor, Goos, Gerhard, Series Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ghosh, Ashish, editor, King, Irwin, editor, Bhattacharyya, Malay, editor, Sankar Ray, Shubhra, editor, and K. Pal, Sankar, editor
- Published
- 2024
- Full Text
- View/download PDF
15. Enhancing Student Engagement in Online Learning Through Strategy Gradient Reinforcement Learning
- Author
-
Long, Si, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Paas, Fred, editor, Patnaik, Srikanta, editor, and Wang, Taosheng, editor
- Published
- 2024
- Full Text
- View/download PDF
16. Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning
- Author
-
Zhang, Yunchao, Liao, Kewen, Liao, Zhibin, Guo, Longkun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yang, De-Nian, editor, Xie, Xing, editor, Tseng, Vincent S., editor, Pei, Jian, editor, Huang, Jen-Wei, editor, and Lin, Jerry Chun-Wei, editor
- Published
- 2024
- Full Text
- View/download PDF
17. Addressing Coupled Constrained Reinforcement Learning via Interative Iteration Design
- Author
-
Huang, Wei, Zhang, Shichao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Tari, Zahir, editor, Li, Keqiu, editor, and Wu, Hongyi, editor
- Published
- 2024
- Full Text
- View/download PDF
18. Efficient Graph Sequence Reinforcement Learning for Traveling Salesman Problem
- Author
-
Liu, Yiyang, Li, Lin, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tan, Ying, editor, and Shi, Yuhui, editor
- Published
- 2024
- Full Text
- View/download PDF
19. Reinforce Model Tracklet for Multi-Object Tracking
- Author
-
Ouyang, Jianhong, Wang, Shuai, Zhang, Yang, Wu, Yubin, Shen, Jiahao, Sheng, Hao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sheng, Bin, editor, Bi, Lei, editor, Kim, Jinman, editor, Magnenat-Thalmann, Nadia, editor, and Thalmann, Daniel, editor
- Published
- 2024
- Full Text
- View/download PDF
20. List-Based Workflow Scheduling Utilizing Deep Reinforcement Learning
- Author
-
Tseng, Wei-Cheng, Huang, Kuo-Chan, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Park, Ji Su, editor, Takizawa, Hiroyuki, editor, Shen, Hong, editor, and Park, James J., editor
- Published
- 2024
- Full Text
- View/download PDF
21. Obstacle Avoidance Control Method for Robotic Assembly Process Based on Lagrange PPO
- Author
-
Quan, Weixin, Zhu, Wenbo, Lu, Qinghua, Luo, Lufeng, Wang, Kai, Liu, Meng, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Sun, Fuchun, editor, Meng, Qinghu, editor, Fu, Zhumu, editor, and Fang, Bin, editor
- Published
- 2024
- Full Text
- View/download PDF
22. A policy-gradient-based hyper-heuristic algorithm for the capacitated vehicle routing problem.
- Author
-
张景玲, 孙钰粟, 赵燕伟, 余孟凡, and 蒋玉勇
- Abstract
Copyright of Control Theory & Applications / Kongzhi Lilun Yu Yinyong is the property of Editorial Department of Control Theory & Applications and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
23. Reinforcement learning with dynamic convex risk measures.
- Author
-
Coache, Anthony and Jaimungal, Sebastian
- Subjects
REINFORCEMENT learning ,ROBOT control systems ,DYNAMIC programming ,RANDOM variables ,HEDGING (Finance) ,MOBILE robots - Abstract
We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization problems using model‐free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time‐consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Optimization of Reinforcement Learning Using Quantum Computation
- Author
-
Roopa Ravish, Nischal R. Bhat, N. Nandakumar, S. Sagar, Sunil, and Prasad B. Honnavalli
- Subjects
Advantage actor critic ,deep Q network ,policy gradient ,Q learning ,quantum approximate optimization algorithm ,quantum computing ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Exploring the convergence of quantum computing and machine learning, this paper delves into Quantum Reinforcement Learning (QRL) with a specific focus on Variational Quantum Circuits (VQC). Alongside a comprehensive examination of the field, the study presents a concrete implementation of QRL tailored to foundational gym environments. Leveraging principles from Quantum Computing, experiments reveal QRL’s potential in enhancing reinforcement learning tasks within these environments, showcasing notable space optimization for RL models. Quantitative results indicate a reduction in the number of trainable parameters by up to 90% when compared to classical approaches, significantly improving efficiency, mainly due to quantum principles of superposition and entanglement. This highlights the promising implications of integrating quantum computing techniques to address challenges and advance capabilities in fundamental reinforcement learning scenarios. Furthermore, the study discusses the adaptability of QRL algorithms to diverse problem domains, suggesting their potential for scalability and applicability beyond simple environments. Such versatility underscores the robustness and practical relevance of QRL methodologies in real-world scenarios, positioning them as valuable tools for tackling complex reinforcement learning challenges.
- Published
- 2024
- Full Text
- View/download PDF
25. Optimal Power Allocation in Optical GEO Satellite Downlinks Using Model-Free Deep Learning Algorithms.
- Author
-
Kapsis, Theodore T., Lyras, Nikolaos K., and Panagopoulos, Athanasios D.
- Subjects
MACHINE learning ,REINFORCEMENT learning ,DEEP reinforcement learning ,DEEP learning ,HEURISTIC algorithms ,ATMOSPHERIC turbulence - Abstract
Geostationary (GEO) satellites are employed in optical frequencies for a variety of satellite services providing wide coverage and connectivity. Multi-beam GEO high-throughput satellites offer Gbps broadband rates and, jointly with low-Earth-orbit mega-constellations, are anticipated to enable a large-scale free-space optical (FSO) network. In this paper, a power allocation methodology based on deep reinforcement learning (DRL) is proposed for optical satellite systems disregarding any channel statistics knowledge requirements. An all-FSO, multi-aperture GEO-to-ground system is considered and an ergodic capacity optimization problem for the downlink is formulated with transmitted power constraints. A power allocation algorithm was developed, aided by a deep neural network (DNN) which is fed channel state information (CSI) observations and trained in a parameterized on-policy manner through a stochastic policy gradient approach. The proposed method does not require the channels' transition models or fading distributions. To validate and test the proposed allocation scheme, experimental measurements from the European Space Agency's ARTEMIS optical satellite campaign were utilized. It is demonstrated that the predicted average capacity greatly exceeds other baseline heuristic algorithms while strongly converging to the supervised, unparameterized approach. The predicted average channel powers differ only by 0.1 W from the reference ones, while the baselines differ significantly more, about 0.1–0.5 W. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
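Entry 25 trains a DNN power-allocation policy with a stochastic policy gradient and no channel model. A bare-bones version of such an update, with a linear-Gaussian policy standing in for the DNN, might look like the sketch below; the parameterization, reward interpretation, and names are assumptions, not the paper's architecture.

```python
import numpy as np

def gaussian_pg_step(theta, csi, actions, rewards, sigma=0.1, lr=1e-3):
    """REINFORCE step for a linear-Gaussian allocation policy.

    Mean allocation: mu = csi @ theta; actions ~ N(mu, sigma^2 I).
    grad log pi w.r.t. theta for sample i is outer(csi_i, a_i - mu_i) / sigma^2,
    weighted here by the observed reward (e.g. achieved capacity).
    """
    mu = csi @ theta
    grad = csi.T @ ((actions - mu) * rewards[:, None]) / sigma ** 2
    return theta + lr * grad / len(rewards)
```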
26. Adaptive bias-variance trade-off in advantage estimator for actor–critic algorithms.
- Author
-
Chen, Yurou, Zhang, Fengyi, and Liu, Zhiyong
- Subjects
- *
ALGORITHMS , *REINFORCEMENT learning - Abstract
Actor–critic methods lead in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor–critic framework, combine state values from bootstrapped value functions with sampled returns. Different combinations trade off the bias introduced by the state values against the variance of the sampled returns to reduce estimation errors. The bias and variance fluctuate constantly throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for this trade-off when seeking the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses develop a general form of adaptive combinations of state values and sample returns that achieves low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared to previous generalized advantage estimators (GAE). [ABSTRACT FROM AUTHOR] (A standard GAE sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
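For context alongside entry 26, this is the standard GAE(λ) recursion with a fixed λ, the fixed bias-variance combination that the paper's adaptive estimators replace. It is textbook GAE, not the paper's adaptive variant, and the names are illustrative.

```python
import numpy as np

def gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """GAE(lambda): A_t = sum_l (gamma * lam)^l * delta_{t+l},
    with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    T = len(rewards)
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```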
27. Vision-based control in the open racing car simulator with deep and reinforcement learning.
- Author
-
Zhu, Yuanheng and Zhao, Dongbin
- Abstract
After decades of development, computer intelligence has reached a high level; in particular, deep learning (DL) and reinforcement learning (RL) endow computers with perception and decision-making abilities. This paper aims to design a vision-based system that can play The Open Racing Car Simulator (TORCS) from images, like a human player. A DL-trained perception module extracts useful, low-dimensional information from first-person images. Based on that, an RL-trained control module steers the simulated car along the middle of the lane. The two modules are trained separately, so the advantages of both DL and RL are fully utilized. Experiments on different tracks show the promising performance of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. Reinforcement learning-based cost-sensitive classifier for imbalanced fault classification.
- Author
-
Zhang, Xinmin, Fan, Saite, and Song, Zhihuan
- Abstract
Fault classification plays a crucial role in the industrial process monitoring domain. In the datasets collected from real-life industrial processes, the data distribution is usually imbalanced. The datasets contain a large amount of normal data (majority) and only a small amount of faulty data (minority); this phenomenon is also known as the imbalanced fault classification problem. To solve the imbalanced fault classification problem, a novel reinforcement learning (RL)-based cost-sensitive classifier (RLCC) based on policy gradient is proposed in this paper. In RLCC, a novel cost-sensitive learning strategy based on policy gradient and the actor-critic of RL is developed. The novel cost-sensitive learning strategy can adaptively learn the cost matrix and dynamically yield the sample weights. In addition, RLCC uses a newly designed reward to train the sample weight learner and classifier using an alternating iterative approach. The alternating iterative approach makes RLCC highly flexible and effective in solving the imbalanced fault classification problem. The effectiveness and practicability of the proposed RLCC method are verified through its application in a real-world dataset and an industrial process benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. FeMIP: detector-free feature matching for multimodal images with policy gradient.
- Author
-
Di, Yide, Liao, Yun, Zhou, Hao, Zhu, Kaijun, Zhang, Yijia, Duan, Qing, Liu, Junhui, and Lu, Mingyu
- Subjects
IMAGE registration ,DATA augmentation ,PATTERN matching ,REINFORCEMENT learning ,IMAGE processing - Abstract
Feature matching for multimodal images is an important task in image processing. However, most methods perform image feature detection, description, and matching sequentially, resulting in large information loss, low matching accuracy, and slow performance. To tackle these challenges, we propose a detector-free method called FeMIP for feature matching of multimodal images. We design coarse matching and fine regression modules to implement accurate multimodal image feature matching in a coarse-to-fine manner. Furthermore, we add a novel data augmentation method enabling FeMIP to achieve feature matching faster and more accurately. The coarse-to-fine module automatically generates pixel-level labels on the original image, enabling FeMIP to perform pixel-level matching on data with only image-level labels. In addition, we use the principle of reinforcement learning to design a policy gradient method that handles the discreteness of the matching problem. Extensive experiments show that FeMIP generalizes well and achieves excellent matching performance. The code will be released at: https://github.com/LiaoYun0x0/FeMIP. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. An action-stable update algorithm for policy gradient methods in continuous time.
- Author
-
宋江帆 and 李金龙
- Subjects
- *
REINFORCEMENT learning , *PROBLEM solving , *ALGORITHMS , *SAMPLING (Process) , *PROBABILITY theory - Abstract
In reinforcement learning, policy gradient algorithms often need to model a continuous-time process as a discrete-time process through sampling. To model the problem more accurately, the sampling frequency is increased; however, an excessive sampling frequency may reduce training efficiency. To solve this problem, this paper proposes an action-stable update algorithm. The method calculates the probability of action repetition from the change in the output of the policy function, and randomly repeats or changes the action based on this probability. The paper analyzes the performance of the method theoretically, evaluates it in nine different environments, and compares it with existing methods, which it surpasses in six of these environments. The experimental results show that the method can improve the training efficiency of policy gradient algorithms in continuous-time problems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. Combining Neural Networks with Logic Rules.
- Author
-
Zhang, Lujiang
- Subjects
- *
REINFORCEMENT learning , *ARTIFICIAL neural networks , *DEEP reinforcement learning , *SUPERVISED learning , *DEEP learning , *LOGIC - Abstract
How to utilize symbolic knowledge in deep learning is an important problem. Deep neural networks are flexible and powerful, while symbolic knowledge has the virtue of interpretability and intuitiveness. It is necessary to combine the two together to inject symbolic knowledge into neural networks. We propose a novel approach to combine neural networks with logic rules. In this approach, task-specific supervised learning and policy-based reinforcement learning are performed alternately to train a neural model until convergence. The basic idea is to use supervised learning to train a deep model and use reinforcement learning to propel the deep model to meet logic rules. In the process of the policy gradient reinforcement learning, if a predicted output of a deep model meets all logical rules, the deep model is given a positive reward, otherwise, it is given a negative reward. By maximizing the expected rewards, the deep model can be gradually adjusted to meet logical constraints. We conduct experiments on the tasks of named entity recognition. The experimental results demonstrate the effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
32. Modeling limit order trading with a continuous action policy for deep reinforcement learning.
- Author
-
Tsantekidis, Avraam, Passalis, Nikolaos, and Tefas, Anastasios
- Subjects
- *
REINFORCEMENT learning , *DISTRIBUTION (Probability theory) , *PRICE fluctuations , *PRICES , *CONTINUOUS distributions , *MACHINE learning - Abstract
Limit Orders allow buyers and sellers to set a "limit price" they are willing to accept in a trade. On the other hand, market orders allow for immediate execution at any price. Thus, market orders are susceptible to slippage, which is the additional cost incurred due to the unfavorable execution of a trade order. As a result, limit orders are often preferred, since they protect traders from excessive slippage costs due to larger than expected price fluctuations. Despite the price guarantees of limit orders, they are more complex compared to market orders. Orders with overly optimistic limit prices might never be executed, which increases the risk of employing limit orders in Machine Learning (ML)-based trading systems. Indeed, the current ML literature for trading almost exclusively relies on market orders. To overcome this limitation, a Deep Reinforcement Learning (DRL) approach is proposed to model trading agents that use limit orders. The proposed method (a) uses a framework that employs a continuous probability distribution to model limit prices, while (b) provides the ability to place market orders when the risk of no execution is more significant than the cost of slippage. Extensive experiments are conducted with multiple currency pairs, using hourly price intervals, validating the effectiveness of the proposed method and paving the way for introducing limit order modeling in DRL-based trading. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
33. Reinforced mixture learning.
- Author
-
Le, Yuan, Zhou, Fan, and Bai, Yang
- Subjects
- *
FISHER discriminant analysis , *MARKOV processes , *EXPECTATION-maximization algorithms , *SPECTRAL theory , *GRAPH theory - Abstract
In this article, we formulate the standard mixture learning problem as a Markov Decision Process (MDP). We theoretically show that the objective value of the MDP is equivalent to the log-likelihood of the observed data with a slightly different parameter space constrained by the policy. Different from some classic mixture learning methods such as Expectation–Maximization (EM) algorithm, the proposed reinforced algorithm requires no distribution assumptions and can handle the non-convex clustered data by constructing a model-free reward to evaluate the mixture assignment based on the spectral graph theory and Linear Discriminant Analysis (LDA). Extensive experiments on both synthetic and real examples demonstrate that the proposed method is comparable with the EM algorithm when the Gaussian mixture assumption is satisfied, and significantly outperforms it and other clustering methods in most scenarios when the model is misspecified. A Python implementation of our proposed method is available at https://github.com/leyuanheart/Reinforced-Mixture-Learning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Regret Analysis of a Markov Policy Gradient Algorithm for Multiarm Bandits.
- Author
-
Walton, Neil and Denisov, Denis
- Subjects
MARKOV processes ,ROBBERS ,REGRET ,POLICY analysis ,ALGORITHMS - Abstract
We consider a policy gradient algorithm applied to a finite-arm bandit problem with Bernoulli rewards. We allow learning rates to depend on the current state of the algorithm rather than using a deterministic time-decreasing learning rate. The state of the algorithm forms a Markov chain on the probability simplex. We apply Foster–Lyapunov techniques to analyze the stability of this Markov chain. We prove that, if learning rates are well-chosen, then the policy gradient algorithm is a transient Markov chain, and the state of the chain converges on the optimal arm with logarithmic or polylogarithmic regret. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
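Entry 34 analyzes a softmax policy gradient algorithm on a Bernoulli bandit with state-dependent learning rates. The toy loop below shows the basic REINFORCE update such an analysis builds on, with a constant learning rate for brevity; it is illustrative and not the analyzed algorithm, and all names are assumptions.

```python
import numpy as np

def softmax_bandit_pg(arm_means, steps=10_000, lr=0.1, seed=0):
    """Plain REINFORCE on a K-armed Bernoulli bandit with a softmax policy."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(arm_means))
    for _ in range(steps):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        a = rng.choice(len(theta), p=p)
        r = float(rng.random() < arm_means[a])    # Bernoulli reward
        grad_log = -p
        grad_log[a] += 1.0                        # grad of log softmax at arm a
        theta += lr * r * grad_log                # REINFORCE step
    return p                                      # final arm probabilities
```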
35. Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games.
- Author
-
Hou, Yueqi, Liang, Xiaolong, Zhang, Jiaqiang, Yang, Qisong, Yang, Aiwu, and Wang, Ning
- Subjects
STRATEGY games ,REINFORCEMENT learning ,MACHINE learning ,ALGORITHMS - Abstract
Invalid action masking is a practical technique in deep reinforcement learning to prevent agents from taking invalid actions. Existing approaches rely on action masking during policy training and utilization. This study focuses on developing reinforcement learning algorithms that incorporate action masking during training but can be used without action masking during policy execution. The study begins by conducting a theoretical analysis to elucidate the distinction between naive policy gradient and invalid action policy gradient. Based on this analysis, we demonstrate that the naive policy gradient is a valid gradient and is equivalent to the proposed composite objective algorithm, which optimizes both the masked policy and the original policy in parallel. Moreover, we propose an off-policy algorithm for invalid action masking that employs the masked policy for sampling while optimizing the original policy. To compare the effectiveness of these algorithms, experiments are conducted using a simplified real-time strategy (RTS) game simulator called Gym-μRTS. Based on empirical findings, we recommend utilizing the off-policy algorithm for addressing most tasks while employing the composite objective algorithm for handling more complex tasks. [ABSTRACT FROM AUTHOR] (A logit-masking sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
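As flagged at the end of entry 35's abstract, the masking mechanism itself is simple: invalid logits are pushed to minus infinity before the softmax. The sketch below shows that generic mechanism, not the paper's on-policy or off-policy training variants; names are illustrative.

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Action probabilities with invalid actions forced to zero.

    `valid_mask` is a boolean array; masked logits become -inf, so the
    softmax assigns them zero probability (and, in an autodiff framework,
    no gradient flows through the masked entries).
    """
    masked = np.where(valid_mask, logits, -np.inf)
    z = np.exp(masked - masked.max())
    return z / z.sum()
```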
36. BLOCK POLICY MIRROR DESCENT.
- Author
-
Lan, Guanghui, Li, Yan, and Zhao, Tuo
- Subjects
- *
REINFORCEMENT learning , *MIRRORS , *COMPUTATIONAL complexity , *INTERIOR-point methods - Abstract
In this paper, we present a new policy gradient (PG) method, namely, the block policy mirror descent (BPMD) method, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to the traditional PG methods with a batch update rule, which visits and updates the policy for every state, the BPMD method has cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state. Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes and show that BPMD achieves fast linear convergence to the global optimality. In particular, uniform sampling leads to worst-case total computational complexity comparable to batch PG methods. A necessary and sufficient condition for convergence with on-policy sampling is also identified. With a hybrid sampling scheme, we further show that BPMD enjoys potential instance-dependent acceleration, leading to improved dependence on the state space and consequently outperforming batch PG methods. We then extend BPMD methods to the stochastic setting by utilizing stochastic first-order information constructed from samples. With a generative model, O(|S||A|κ) (resp., O(|S||A|κ²)) sample complexities are established for the strongly convex (resp., non-strongly convex) regularizers, where κ denotes the target accuracy. To the best of our knowledge, this is the first time that block coordinate descent methods have been developed and analyzed for policy optimization in reinforcement learning, which provides a new perspective on solving large-scale RL problems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
- Author
-
Chen, Renlong and Tan, Ying
- Subjects
- *
REINFORCEMENT learning , *WARMUP - Abstract
Credit assignment is a crucial issue in multi-agent tasks employing a centralized training and decentralized execution paradigm. While value decomposition has demonstrated strong performance in Q-learning-based approaches and certain Actor–Critic variants, it remains challenging to achieve efficient credit assignment in multi-agent tasks using policy gradient methods due to decomposable value limitations. This paper introduces Predictive Contribution Measurement, an explicit credit assignment method that compares prediction errors among agents and allocates surrogate rewards based on their relevance to global state transitions, with a theoretical guarantee. With multi-agent proximal policy optimization (MAPPO) as a training backend, we propose Predictive Contribution MAPPO (PC-MAPPO). Our experiments demonstrate that PC-MAPPO, with a 10% warm-up phase, outperforms MAPPO, QMIX, and Weighted QMIX on StarCraft multi-agent challenge tasks, particularly in maps requiring heightened cooperation to defeat enemies, such as the map corridor. Employing a pre-trained predictor, PC-MAPPO achieves significantly improved performance on all tested super-hard maps. In parallel training scenarios, PC-MAPPO exhibits superior data efficiency and achieves state-of-the-art performance compared to other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning
- Author
-
Xuhan Liu, Kai Ye, Herman W. T. van Vlijmen, Adriaan P. IJzerman, and Gerard J. P. van Westen
- Subjects
Deep learning ,Reinforcement learning ,Policy gradient ,Drug design ,Transformer ,Multi-objective optimization ,Information technology ,T58.5-58.64 ,Chemistry ,QD1-999 - Abstract
Abstract Rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified due to the large drug-like chemical space available to search for novel drug-like molecules. With the rapid growth of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. Here, a Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules a novel positional encoding for each atom and bond based on an adjacency matrix was proposed, extending the architecture of the Transformer. The graph Transformer model contains growing and connecting procedures for molecule generation starting from a given scaffold based on fragments. Moreover, the generator was trained under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results show that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds.
- Published
- 2023
- Full Text
- View/download PDF
39. Standardising policy and technology responses in the immediate aftermath of a pandemic: a comparative and conceptual framework
- Author
-
Naomi Moy, Marcello Antonini, Mattias Kyhlstedt, Gianluca Fiorentini, and Francesco Paolucci
- Subjects
COVID-19 ,Health research systems ,Policy categorisation ,Public health crisis ,Policy gradient ,Policy interventions ,Public aspects of medicine ,RA1-1270 - Abstract
Abstract Background The initial policy response to the COVID-19 pandemic has differed widely across countries. Such variability in government interventions has made it difficult for policymakers and health research systems to compare what has happened and the effectiveness of interventions across nations. Timely information and analysis are crucial to addressing the lag between the pandemic and government responses to implement targeted interventions to alleviate the impact of the pandemic. Methods To examine the effect government interventions and technological responses have on epidemiological and economic outcomes, this policy paper proposes a conceptual framework that provides a qualitative taxonomy of government policy directives implemented in the immediate aftermath of a pandemic announcement and before vaccines are implementable. This framework assigns a gradient indicating the intensity and extent of the policy measures and applies the gradient to four countries that share similar institutional features but different COVID-19 experiences: Italy, New Zealand, the United Kingdom and the United States of America. Results Using the categorisation framework allows qualitative information to be presented, and more specifically the gradient can show the dynamic impact of policy interventions on specific outcomes. We have observed that the policy categorisation described here can be used by decision-makers to examine the impacts of major viral outbreaks such as SARS-CoV-2 on health and economic outcomes over time. The framework allows for a visualisation of the frequency and comparison of dominant policies and provides a conceptual tool to assess how dominant interventions (and innovations) affect different sets of health and non-health related outcomes during the response phase to the pandemic. Conclusions Policymakers and health researchers should converge toward an optimal set of policy interventions to minimize the costs of the pandemic (i.e., health and economic), and facilitate coordination across governance levels before effective vaccines are produced. The proposed framework provides a useful tool to direct health research system resources and build a policy benchmark for future viral outbreaks where vaccines are not readily available.
- Published
- 2023
- Full Text
- View/download PDF
40. Policy Gradient for Arabic to English Neural Machine Translation
- Author
-
Zouidine, Mohamed, Khalil, Mohammed, Farouk, Abdelhamid Ibn El, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Lazaar, Mohamed, editor, Duvallet, Claude, editor, Touhafi, Abdellah, editor, and Al Achhab, Mohammed, editor
- Published
- 2022
- Full Text
- View/download PDF
41. RLPassGAN: Password Guessing Model Based on GAN with Policy Gradient
- Author
-
Huang, Deng, Wang, Yufei, Chen, Wen, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Shi, Wenbo, editor, Chen, Xiaofeng, editor, and Choo, Kim-Kwang Raymond, editor
- Published
- 2022
- Full Text
- View/download PDF
42. Policy Gradient Reinforcement Learning Method for Backward Motion Control of Tractor-Trailer Mobile Robot
- Author
-
Wang, Qiqi, Cheng, Jin, Zhang, Han, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Liu, Qi, editor, Liu, Xiaodong, editor, Chen, Bo, editor, Zhang, Yiming, editor, and Peng, Jiansheng, editor
- Published
- 2022
- Full Text
- View/download PDF
43. An Open Domain Question Answering System Trained by Reinforcement Learning
- Author
-
Afrae, Bghiel, Mohamed, Ben Ahmed, Abdelhakim, Anouar Boudhir, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Corchado, Juan M., editor, and Trabelsi, Saber, editor
- Published
- 2022
- Full Text
- View/download PDF
44. Robust reinforcement learning algorithm based on pigeon-inspired optimization
- Author
-
Mingying ZHANG, Bing HUA, Yuguang ZHANG, Haidong LI, and Mohong ZHENG
- Subjects
pigeon-inspired optimization algorithm, reinforcement learning, policy gradient, robustness, Electronic computers. Computer science, QA75.5-76.95
Reinforcement learning (RL) is an artificial intelligence approach with the advantages of clear computational logic and easy model extension. By interacting with the environment and maximizing value functions with little or no prior information, RL can optimize the performance of strategies and effectively reduce the complexity introduced by physical models. Policy-gradient-based RL has been successfully applied in many fields such as intelligent image recognition, robot control, and path planning for automated driving. However, RL is highly sample-dependent: the training process needs a large number of samples to converge, and the accuracy of decision making is easily affected by slight disturbances that do not match the simulation environment. In particular, when RL is applied to control, it is difficult to prove the stability of the algorithm because its convergence cannot be guaranteed. Swarm intelligence algorithms can solve complex problems through group cooperation and are self-organizing and highly stable, which makes them an effective way to improve the stability of RL models. Here, the pigeon-inspired optimization algorithm from swarm intelligence is combined with policy-gradient-based RL. An RL algorithm based on pigeon-inspired optimization is proposed to solve for the policy gradient that maximizes long-term future rewards. The fitness function of the pigeon-inspired optimization algorithm and RL are combined to estimate the quality of candidate strategies, avoid the solution falling into an infinite loop, and improve the stability of the algorithm. A nonlinear two-wheeled inverted pendulum robot control system is selected for simulation verification. The simulation results show that the RL algorithm based on pigeon-inspired optimization can improve the robustness of the system, reduce the computational cost, and reduce the algorithm's dependence on the sample database.
- Published
- 2022
- Full Text
- View/download PDF
45. A task allocation algorithm based on reinforcement learning in spatio-temporal crowdsourcing.
- Author
-
Zhao, Bingxu, Dong, Hongbin, Wang, Yingjie, and Pan, Tingwei
- Subjects
CROWDSOURCING ,REINFORCEMENT learning ,BIPARTITE graphs ,SHARING economy ,ALGORITHMS - Abstract
With the pervasiveness of dynamic task allocation in sharing-economy applications, online bipartite graph matching has attracted increasing research attention. In these applications, crowdsourcing platforms need to allocate tasks to workers dynamically, and previous studies achieve low allocation utility. To increase the allocation utility of spatio-temporal crowdsourcing systems, this paper formulates a dynamic delay bipartite matching (DDBM) problem and designs Value Based Task Allocation (VBTA) and Policy Gradient Based Task Allocation (PGTA) frameworks. Based on the current state, VBTA and PGTA enhance the allocation utility by selecting appropriate thresholds. The convergence of the algorithms is proved. Extensive experimental results on two real datasets demonstrate that the proposed algorithms are superior to existing algorithms in effectiveness and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics.
- Author
-
Lin, Mingduo, Zhao, Bo, and Liu, Derong
- Subjects
- *
ZERO sum games , *DYNAMIC programming , *NONLINEAR programming , *ITERATIVE learning control , *WEIGHT training , *ADAPTIVE fuzzy control , *REINFORCEMENT learning , *ELECTRONIC data processing - Abstract
A novel policy gradient (PG) adaptive dynamic programming method is developed to deal with nonlinear discrete-time zero-sum games with unknown dynamics. To facilitate the implementation, a policy iteration algorithm is established to approximate the iterative Q-function, as well as the control and disturbance policies via three neural network (NN) approximators, respectively. Then, the iterative Q-function is exploited to update the control and disturbance policies via PG method. To stabilize the training process and improve the data usage efficiency, the experience replay technique is applied to train the weight vectors of the three NNs by using mini-batch empirical data from replay memory. Furthermore, the convergence in terms of the iterative Q-function is proved. Simulation results of two numerical examples are provided to show the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
47. Decentralized multi-task reinforcement learning policy gradient method with momentum over networks.
- Author
-
Junru, Shi, Qiong, Wang, Muhua, Liu, Zhihang, Ji, Ruijuan, Zheng, and Qingtao, Wu
- Subjects
REINFORCEMENT learning ,LEARNING problems ,CLASSROOM environment - Abstract
To find the optimal policy quickly in reinforcement learning problems, the policy gradient (PG) method is very effective: it parameterizes the policy and updates the policy parameters directly. In addition, momentum methods are commonly employed to improve convergence in the training of centralized deep networks, since they can accelerate training by adjusting the descent direction of the gradients. However, decentralized PG variants with momentum are rarely investigated. For this reason, we propose a Decentralized Policy Gradient algorithm with Momentum, called DPGM, for solving multi-task reinforcement learning problems. Moreover, this article rigorously analyzes the convergence of DPGM, which reaches a rate of O(1/T), where T denotes the number of iterations; this rate matches the state of the art for decentralized PG methods. Furthermore, we provide experimental verification in decentralized reinforcement learning environments to support the theoretical result. [ABSTRACT FROM AUTHOR] (A decentralized momentum-update sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
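Entry 47 combines a local momentum correction with information exchange over a network. The synchronous round below illustrates one common way such methods are written, with a consensus (gossip) step followed by a momentum-corrected local update; the mixing-matrix formulation and all names are assumptions rather than the DPGM specification.

```python
import numpy as np

def decentralized_momentum_pg_round(thetas, grads, momenta, mixing_W,
                                    lr=1e-2, beta=0.9):
    """One synchronous round for n agents with parameters `thetas` (n x d).

    Agents first average parameters with their neighbors through the doubly
    stochastic mixing matrix `mixing_W`, then each applies a momentum-
    corrected ascent step on its own local policy gradient estimate.
    """
    mixed = mixing_W @ thetas                        # consensus / gossip step
    momenta = beta * momenta + (1.0 - beta) * grads  # momentum update
    return mixed + lr * momenta, momenta
```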
48. Multi-label sequence generating model via label semantic attention mechanism.
- Author
-
Zhang, Xiuling, Tan, Xiaofei, Luo, Zhaoci, and Zhao, Jun
- Abstract
In recent years, a new attempt has been made to capture label co-occurrence by applying the sequence-to-sequence (Seq2Seq) model to multi-label text classification (MLTC). However, existing approaches frequently ignore the semantic information contained in the labels themselves. Besides, the Seq2Seq model is susceptible to the negative impact of label sequence order. Furthermore, it has been demonstrated that the traditional attention mechanism underperforms in MLTC. Therefore, we propose a novel Seq2Seq model with a different label semantic attention mechanism (S2S-LSAM), which generates fused information containing label and text information through the interaction of label semantics and text features in the label semantic attention mechanism. With the fused information, our model can select the text features that are most relevant to the labels more effectively. A combination of the cross-entropy loss function and the policy gradient-based loss function is employed to reduce the label sequence order effect. The experiments show that our model outperforms the baseline models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
49. Reinforcement Learning-Based Approach for Minimizing Energy Loss of Driving Platoon Decisions †.
- Author
-
Gu, Zhiru, Liu, Zhongwei, Wang, Qi, Mao, Qiyun, Shuai, Zhikang, and Ma, Ziji
- Subjects
- *
MACHINE learning , *REINFORCEMENT learning , *ENERGY dissipation , *ADAPTIVE control systems , *SUSTAINABLE transportation - Abstract
Reinforcement learning (RL) methods for energy saving and greening have recently appeared in the field of autonomous driving. In inter-vehicle communication (IVC), a feasible and increasingly popular research direction for RL is to obtain optimal action decisions for agents in a specific environment. This paper presents an application of reinforcement learning within the vehicle communication simulation framework (Veins). We explore reinforcement learning algorithms in a green cooperative adaptive cruise control (CACC) platoon, training member vehicles to react appropriately in the event of a severe collision involving the leading vehicle. We seek to reduce collision damage and optimize energy consumption by encouraging behavior that conforms to the platoon's environmentally friendly aim. Our study provides insight into the potential benefits of using reinforcement learning algorithms to improve the safety and efficiency of CACC platoons while promoting sustainable transportation. The policy gradient algorithm used in this paper converges well on the minimum-energy-consumption problem and on finding optimal vehicle behavior. With respect to energy-consumption metrics, the policy gradient algorithm is used for the first time in the IVC field for training on the proposed platoon problem, and it is a feasible decision-planning training algorithm for minimizing the energy consumption caused by decision making in platoon avoidance behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
50. Human Pathogenic Monkeypox Disease Recognition Using Q-Learning Approach.
- Author
-
Velu, Malathi, Dhanaraj, Rajesh Kumar, Balusamy, Balamurugan, Kadry, Seifedine, Yu, Yang, Nadeem, Ahmed, and Rauf, Hafiz Tayyab
- Subjects
- *
MONKEYPOX , *IMAGE recognition (Computer vision) , *CONVOLUTIONAL neural networks , *REINFORCEMENT learning , *FEATURE selection - Abstract
While the world is working quietly to repair the damage caused by COVID-19's widespread transmission, the monkeypox virus threatens to become a global pandemic. Several nations report new monkeypox cases daily, despite the virus being less deadly and contagious than COVID-19. Monkeypox disease may be detected using artificial intelligence techniques. This paper suggests two strategies, based on reinforcement learning and on parameter optimization for multi-layer neural networks, for improving monkeypox image classification precision. The suggested approaches cover feature extraction and classification: the Q-learning algorithm determines the rate at which an action occurs in a particular state, while Malneural networks are binary hybrid algorithms that optimize the parameters of neural networks. The algorithms are evaluated using an openly available dataset, and interpretation criteria were used to analyze the proposed optimization-based feature selection for monkeypox classification. A series of numerical tests was conducted to evaluate the efficiency, significance, and robustness of the suggested algorithms. The model achieved 95% precision, 95% recall, and a 96% F1 score for monkeypox disease, a higher accuracy than traditional learning methods. The overall macro average was around 0.95, and the overall weighted average was around 0.96. When compared to the benchmark algorithms DDQN, Policy Gradient, and Actor–Critic, the Malneural network had the highest accuracy (around 0.985). Overall, the proposed methods were found to be more effective than traditional methods. Clinicians can use this proposal when treating monkeypox patients, and administrative agencies can use it to observe the origin and current status of the disease. [ABSTRACT FROM AUTHOR] (A generic tabular Q-learning sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
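Entry 50 uses Q-learning as one of its building blocks, as noted at the end of its abstract. For readers unfamiliar with it, this is the generic tabular update; the paper itself applies the idea to image feature selection and classification rather than to a small table, so this sketch is background only and the names are illustrative.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on an |S| x |A| array `Q`:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```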