Search Results
4,506 results
2. Deep Reinforcement Learning-Based Optimization for RIS-Based UAV-NOMA Downlink Networks (Invited Paper)
- Author
- Shiyu Jiao, Ximing Xie, and Zhiguo Ding
- Subjects
- non-orthogonal multiple access, reconfigurable intelligent surface, unmanned aerial vehicles, deep reinforcement learning, deep deterministic policy gradient, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This study investigates the application of deep deterministic policy gradient (DDPG) to reconfigurable intelligent surface (RIS)-based unmanned aerial vehicle (UAV)-assisted non-orthogonal multiple access (NOMA) downlink networks. Deploying a UAV equipped with an RIS is important because the UAV significantly increases the flexibility of the RIS, especially for users who have no line-of-sight (LoS) path to the base station (BS). The aim of this study is therefore to maximize the sum-rate by jointly optimizing the power allocation of the BS, the phase shifting of the RIS, and the horizontal position of the UAV. Because the formulated problem is non-convex, the DDPG algorithm is utilized to solve it. Computer simulation results are provided to show the superior performance of the proposed DDPG-based algorithm.
- Published
- 2022
- Full Text
- View/download PDF
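Entry 2's joint optimization maps naturally onto a single continuous action vector. Below is a minimal sketch, not the authors' code, of a DDPG-style actor that outputs the three quantities the abstract names: BS power allocation, RIS phase shifts, and the UAV's horizontal position. The dimensions and squashing choices are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

N_USERS, N_RIS = 4, 16          # assumed numbers of NOMA users / RIS elements

class JointActor(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 128), nn.ReLU())
        self.power = nn.Linear(128, N_USERS)   # BS power split over users
        self.phase = nn.Linear(128, N_RIS)     # one phase shift per RIS element
        self.pos = nn.Linear(128, 2)           # UAV horizontal (x, y)

    def forward(self, obs):
        h = self.body(obs)
        p = torch.softmax(self.power(h), dim=-1)       # fractions of the power budget
        theta = math.pi * torch.tanh(self.phase(h))    # phases in [-pi, pi]
        xy = torch.tanh(self.pos(h))                   # normalized position
        return torch.cat([p, theta, xy], dim=-1)       # one action for the DDPG critic

actor = JointActor(obs_dim=32)
action = actor(torch.randn(1, 32))
```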
3. Guest Editorial: Special Issue on Transdisciplinary Artificial Intelligence.
- Author
- D'Auria, Daniela and Persia, Fabio
- Subjects
- DEEP reinforcement learning, INFORMATION technology, SUPERVISED learning, INTEGRATED circuit design, SEMANTIC computing
- Abstract
This document is a guest editorial from the International Journal of Semantic Computing, discussing a special issue on Transdisciplinary Artificial Intelligence. The special issue includes seven papers, four from the 5th IEEE International Conference on Transdisciplinary AI and three from the 6th International Conference on Artificial Intelligence for Industries. The papers cover a range of topics, including future visioning in AI research, utilizing satellite imagery to predict socioeconomic indicators, smart city wildfire risk analysis, leveraging community health workers for predicting emergency department readmissions, optimization of 2D irregular packing using deep reinforcement learning, real-time sampling strategies for regression, and efficient dynamic IC design analysis using graph-based semi-supervised learning. The papers underwent a rigorous review process and have been extended with new and unpublished materials. The authors present their findings and highlight the potential applications and implications of their research. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
4. Guest Editorial: Operational and structural resilience of power grids with high penetration of renewables.
- Author
- Lei, Shunbo, Zhang, Yichen, Shahidehpour, Mohammad, Hou, Yunhe, Panteli, Mathaios, Chen, Xia, Aydin, Nazli Yonca, Liang, Liang, Wang, Cheng, Wang, Chong, and She, Buxin
- Subjects
- MICROGRIDS, ELECTRIC power distribution grids, CYBER physical systems, MIXED integer linear programming, DEEP reinforcement learning, ARTIFICIAL neural networks, REINFORCEMENT learning, ELECTRIC power
- Published
- 2024
- Full Text
- View/download PDF
5. Integrating human experience in deep reinforcement learning for multi-UAV collision detection and avoidance
- Author
- Wang, Guanzheng, Xu, Yinbo, Liu, Zhihong, Xu, Xin, Wang, Xiangke, and Yan, Jiarun
- Published
- 2022
- Full Text
- View/download PDF
6. Probability Dueling DQN active visual SLAM for autonomous navigation in indoor environment
- Author
- Wen, Shuhuan, Lv, Xiaohan, Lam, Hak Keung, Fan, Shaokang, Yuan, Xiao, and Chen, Ming
- Published
- 2021
- Full Text
- View/download PDF
7. Optimization of news dissemination push mode by intelligent edge computing technology for deep learning.
- Author
- DeGe, JiLe and Sang, Sina
- Subjects
- DEEP reinforcement learning, PATTERN recognition systems, SOCIAL media, NEWS websites, RECOMMENDER systems, DEEP learning, REINFORCEMENT learning
- Abstract
The Internet era is an era of information explosion. By 2022, global Internet users had reached more than 4 billion, and social media users had exceeded 3 billion. People face an enormous amount of news content every day, and it is almost impossible to find interesting information by browsing all of it. Against this background, personalized news recommendation technology has been widely used, but it still needs further optimization and improvement. To better push news content of interest to different readers, users' satisfaction with major news websites should be further improved. This study proposes a new recommendation algorithm based on deep learning and reinforcement learning (RL). Deep learning excels at processing large-scale data and complex pattern recognition, but it often suffers from low sample efficiency in complex decision-making and sequential tasks. RL, by contrast, emphasizes learning optimal strategies through continuous trial and error while interacting with the environment, which makes it better suited to scenarios that require long-term decision-making. By feeding back a reward signal for each action, the system can adapt to unknown environments and complex tasks, compensating for the relative shortcomings of deep learning in these respects. States observed during the news dissemination process are mapped to actions, casting recommendation as a sequential decision problem. To let the recommendation system track dynamic changes in users' interest in news content, the Deep Deterministic Policy Gradient (DDPG) algorithm is applied to the news recommendation scenario; its actor-critic structure combines a deep Q-network (the critic) with a policy network (the actor) so that the two complement each other. On the basis of this analysis, the paper puts forward a mode of intelligent news dissemination and push, and proposes a push process for news dissemination information based on edge computing technology. Finally, a Q-Learning Area Under Curve metric for RL models, derived from the standard Area Under Curve, is proposed; it measures the strengths and weaknesses of RL models efficiently and facilitates model comparison and offline evaluation. The results show that the DDPG algorithm improves the click-through rate by 2.586% compared with the conventional recommendation algorithm, indicating a clear advantage in recommendation accuracy. By optimizing the push mode of intelligent news dissemination, this paper effectively improves the efficiency of news dissemination. The paper also studies the innovative application of intelligent edge technology in news communication, bringing new ideas and practices to the development of news communication methods. Optimizing the push mode not only improves the user experience but also provides strong support for applying intelligent edge technology in this field, which has important practical application prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
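Entry 7's closing metric (a Q-Learning Area Under Curve for offline evaluation of RL recommenders) is not specified in the abstract; as a rough stand-in, the sketch below scores held-out impressions with a policy's Q-values and computes an ordinary ROC-AUC against observed clicks, which is one plausible reading of the idea.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
q_values = rng.normal(size=1000)   # stand-in for Q(s, a) scored per impression
# Synthetic click labels loosely correlated with the scores:
clicks = (q_values + rng.normal(size=1000) > 0).astype(int)

print("offline AUC:", roc_auc_score(clicks, q_values))
```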
8. Adaptive control for circulating cooling water system using deep reinforcement learning.
- Author
- Xu, Jin, Li, Han, and Zhang, Qingxin
- Subjects
- DEEP reinforcement learning, ADAPTIVE control systems, COOLING systems, WATER use, SMART structures, REINFORCEMENT learning
- Abstract
Due to the complex internal working process of circulating cooling water systems, most traditional control methods struggle to achieve stable and precise control. Therefore, this paper presents a novel adaptive control structure based on the Twin Delayed Deep Deterministic Policy Gradient algorithm with a reference trajectory model (TD3-RTM). The structure builds on a Markov decision process formulation of the recirculating cooling water system. Initially, the TD3 algorithm is employed to construct a deep reinforcement learning agent. Subsequently, a state space is selected and a dense reward function is designed, considering the multivariable characteristics of the recirculating cooling water system. The agent updates its networks based on the reward values obtained through interactions with the system, thereby gradually aligning its actions with the optimal policy. The reference trajectory model accelerates the convergence of the agent and reduces oscillations and instability in the control system. Simulation experiments were conducted in MATLAB/Simulink. The results show that compared to PID, fuzzy PID, DDPG, and TD3, the TD3-RTM method shortened the transient time in the flow loop by 6.09 s, 5.29 s, 0.57 s, and 0.77 s, respectively, and decreased the Integral of Absolute Error (IAE) indexes by 710.54, 335.1, 135.97, and 89.96, respectively; in the temperature loop, the transient time improved by 25.84 s, 13.65 s, 15.05 s, and 0.81 s, and the IAE metrics were reduced by 143.9, 59.13, 31.79, and 1.77, respectively. In addition, the overshoot of the TD3-RTM method in the flow loop was reduced by 17.64, 7.79, and 1.29 percent in comparison with the PID, fuzzy PID, and TD3 methods, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
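The reference trajectory model (RTM) in entry 8 rewards tracking a smooth path toward the setpoint rather than the setpoint itself, which is what damps oscillation. A minimal sketch, assuming a first-order reference and an absolute-error reward (both assumptions, not the paper's exact forms):

```python
import numpy as np

def reference_trajectory(y0: float, setpoint: float, t: np.ndarray, tau: float = 5.0):
    """First-order approach from the current output y0 toward the setpoint."""
    return setpoint + (y0 - setpoint) * np.exp(-t / tau)

def dense_reward(y: float, y_ref: float, scale: float = 1.0) -> float:
    """Dense reward: small deviation from the reference earns a reward near 0."""
    return -scale * abs(y - y_ref)

t = np.arange(0.0, 20.0, 1.0)
ref = reference_trajectory(y0=30.0, setpoint=25.0, t=t)
print(dense_reward(y=28.0, y_ref=ref[3]))   # penalty shrinks as y tracks the reference
```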
9. USVs Path Planning for Maritime Search and Rescue Based on POS-DQN: Probability of Success-Deep Q-Network.
- Author
- Liu, Lu, Shan, Qihe, and Xu, Qi
- Subjects
- DEEP reinforcement learning, RESCUE work, AUTONOMOUS vehicles, PROBLEM solving, ALGORITHMS
- Abstract
Efficient maritime search and rescue (SAR) is crucial for responding to maritime emergencies. In traditional SAR, fixed search path planning is inefficient and cannot prioritize high-probability regions, which is a significant limitation. To solve these problems, this paper proposes unmanned surface vehicle (USV) path planning for maritime SAR based on POS-DQN so that USVs can perform SAR tasks reasonably and efficiently. Firstly, the search region is allocated as a whole using an improved task allocation algorithm so that each USV's task region has a priority and no duplication. Secondly, considering the probability of success (POS) of the search environment, a POS-DQN algorithm based on deep reinforcement learning is proposed. This algorithm can adapt to the complex and changing SAR environment: it designs a probability-weight reward function and trains USV agents to obtain the optimal search path. Finally, the simulation results show that, while maintaining complete coverage with obstacle and collision avoidance, the resulting search paths prioritize high-probability regions and improve the efficiency of SAR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
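Entry 9's probability-weight reward can be illustrated with a toy grid; the payout structure below (POS for a new cell, penalties for collisions and revisits) is an assumed form, not the published function.

```python
import numpy as np

pos_map = np.array([[0.1, 0.6], [0.8, 0.2]])   # POS per grid cell (illustrative)
visited = np.zeros_like(pos_map, dtype=bool)

def step_reward(cell: tuple[int, int], collided: bool) -> float:
    if collided:
        return -1.0                       # obstacle / collision penalty
    if visited[cell]:
        return -0.1                       # discourage re-searching a cell
    visited[cell] = True
    return float(pos_map[cell])           # reward weighted by probability of success

print(step_reward((1, 0), collided=False))  # 0.8: agent is pulled toward high-POS cells
```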
10. An FPGA-Accelerated CNN with Parallelized Sum Pooling for Onboard Realtime Routing in Dynamic Low-Orbit Satellite Networks.
- Author
- Kim, Hyeonwoo, Park, Juhyeon, Lee, Heoncheol, Won, Dongshik, and Han, Myonghun
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, CONVOLUTIONAL neural networks, ROUTING algorithms, GATE array circuits, ORBITS of artificial satellites
- Abstract
This paper addresses the problem of real-time onboard routing for dynamic low earth orbit (LEO) satellite networks. It is difficult to apply general routing algorithms to dynamic LEO networks due to the frequent changes in satellite topology caused by the disconnection between moving satellites. Deep reinforcement learning (DRL) models trained by various dynamic networks can be considered. However, since the inference process with the DRL model requires too long a computation time due to multiple convolutional layer operations, it is not practical to apply to a real-time on-board computer (OBC) with limited computing resources. To solve the problem, this paper proposes a practical co-design method with heterogeneous processors to parallelize and accelerate a part of the multiple convolutional layer operations on a field-programmable gate array (FPGA). The proposed method was tested with a real heterogeneous processor-based OBC and showed that the proposed method was about 3.10 times faster than the conventional method while achieving the same routing results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
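Entry 10 accelerates sum pooling on an FPGA; as a software reference for what is being parallelized, here is the operation itself, vectorized with NumPy (a sketch of the arithmetic, not of the FPGA co-design).

```python
import numpy as np

def sum_pool2d(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Non-overlapping k x k sum pooling over a 2D feature map."""
    h, w = x.shape
    x = x[: h - h % k, : w - w % k]                  # crop to a multiple of k
    return x.reshape(h // k, k, w // k, k).sum(axis=(1, 3))

fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
print(sum_pool2d(fmap))   # each output element sums one 2x2 window
```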
11. Deep reinforcement learning-based attitude motion control for humanoid robots with stability constraints
- Author
- Shi, Qun, Ying, Wangda, Lv, Lei, and Xie, Jiajun
- Published
- 2020
- Full Text
- View/download PDF
12. Intelligent Traffic Control Decision-Making Based on Type-2 Fuzzy and Reinforcement Learning.
- Author
- Bi, Yunrui, Ding, Qinglin, Du, Yijun, Liu, Di, and Ren, Shuaihang
- Subjects
- DEEP reinforcement learning, TRAFFIC engineering, FUZZY control systems, TRAFFIC flow, TRAFFIC signs & signals, REINFORCEMENT learning, INTELLIGENT transportation systems
- Abstract
Intelligent traffic control decision-making has long been a crucial issue for improving the efficiency and safety of intelligent transportation systems. The deficiencies of Type-1 fuzzy traffic control systems in dealing with uncertainty reduce their ability to address traffic congestion. Therefore, this paper proposes a Type-2 fuzzy controller for a single intersection. Based on real-time traffic flow information, the green timing of each phase is dynamically determined to minimize average vehicle delay. Additionally, in traffic light control, various factors (such as vehicle delay and queue length) need to be balanced to define an appropriate reward; improper reward design may fail to guide the Deep Q-Network (DQN) algorithm toward the optimal strategy. To address these issues, this paper proposes a deep reinforcement learning traffic control strategy combined with Type-2 fuzzy control. The output action of the Type-2 fuzzy control system replaces the action selected by the max operation over the target network's Q-values in the DQN algorithm, reducing the error caused by that max operation. This approach improves the online learning rate of the agent and increases the reward value of the signal control action. Simulation results on the Simulation of Urban MObility platform show that the proposed traffic signal optimization control achieves significant improvement in traffic flow optimization and congestion alleviation, effectively improving throughput at the signal and the overall operation level of traffic flow. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
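The hybrid target in entry 12 swaps the DQN target's max operator for the fuzzy controller's action choice. A minimal sketch, with `fuzzy_green_time` as a hypothetical stand-in for the paper's Type-2 fuzzy system:

```python
import torch

def fuzzy_green_time(next_state: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real Type-2 fuzzy system would map queue lengths and
    # delays to a discrete green-timing action here.
    return next_state.abs().sum(dim=1).long() % 4

def td_target(r, next_state, target_q_net, gamma=0.99):
    a_fuzzy = fuzzy_green_time(next_state)            # fuzzy action, not argmax
    q_next = target_q_net(next_state)                 # shape: [batch, n_actions]
    q_sel = q_next.gather(1, a_fuzzy.unsqueeze(1)).squeeze(1)
    return r + gamma * q_sel                          # no max() overestimation

q_net = torch.nn.Linear(3, 4)        # toy target network: 3 features -> 4 phases
y = td_target(torch.zeros(2), torch.randn(2, 3), q_net)
```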
13. A Stock Prediction Method Based on Deep Reinforcement Learning and Sentiment Analysis.
- Author
- Du, Sha and Shen, Hailong
- Subjects
- REINFORCEMENT learning, CONVOLUTIONAL neural networks, DEEP reinforcement learning, INVESTORS, SENTIMENT analysis
- Abstract
Featured Application: The model proposed in this paper can help stock investors obtain high returns on newly listed stocks. Most previous stock investing methods cannot handle newly listed stocks because no historical data exist for them. In this paper, we use a Q-learning algorithm based on a convolutional neural network, augmented with sentiment analysis, to establish a prediction method for Chinese stock investment tasks. A total of 118 companies were ranked in the Chinese top-150 list in both 2022 and 2023. We collected all comments posted under the stock bars of these 118 stocks each day from 1 January 2022 to 1 July 2024, totaling nearly 10 million comments. After preprocessing, 90 of the 118 stocks remain and form our dataset. Each stock's closing price, volume, and comment text data are fed together to the agent, and the trained agent outputs investment behaviors that maximize future returns. We apply the trained model to two test sets that are completely disjoint from the training set and compare it with several other methods. Our proposed method, SADQN-S, obtains results of 1.1229 and 1.1054 on the two test sets, achieving higher final total assets than the other methods on both. This shows that our model can help stock investors earn high returns on newly listed stocks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Aortic Pressure Control Based on Deep Reinforcement Learning for Ex Vivo Heart Perfusion.
- Author
- Wang, Shangting, Yang, Ming, Liu, Yuan, and Yu, Junwen
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, PRESSURE control, PID controllers, AEROBIC metabolism, PULSATILE flow
- Abstract
In ex vivo heart perfusion (EVHP), the control of aortic pressure (AoP) is critical for maintaining the heart's physiologic aerobic metabolism. However, the complexity of and variability in cardiac parameters make rapid and accurate regulation of AoP challenging. In this paper, we propose a method of AoP control based on deep reinforcement learning for EVHP in Langendorff mode, which can adapt to variations in cardiac parameters. Firstly, a mathematical model is developed by coupling the coronary artery and pulsatile blood pump models. Subsequently, an aortic pressure control method based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed, which regulates the blood pump and realizes closed-loop control. The control performance of the proposed DDPG method, the traditional proportional–integral–derivative (PID) method, and the fuzzy PID method is compared by simulating single and mixed changes in mean aortic pressure target values and coronary resistance. The proposed method exhibits superior performance under mixed factors, with 68.6% and 66.4% lower settling times and 70.3% and 54.1% lower overshoot values than the PID and fuzzy PID methods, respectively. This study demonstrates that the proposed DDPG-based method can respond more rapidly and accurately to different cardiac conditions than conventional PID controllers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
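Entry 14's closed loop can be caricatured in a few lines: each control period the actor observes the aortic-pressure error and adjusts the pump, and the reward penalizes deviation from the target mean pressure. The observation layout, command range, and reward are illustrative assumptions, not the authors' model.

```python
import numpy as np

def control_step(actor, aop_meas: float, aop_target: float, pump_speed: float):
    obs = np.array([aop_meas, aop_target - aop_meas, pump_speed], dtype=np.float32)
    d_speed = float(actor(obs))                 # actor outputs a speed adjustment
    new_speed = np.clip(pump_speed + d_speed, 0.0, 1.0)   # normalized pump range
    reward = -abs(aop_target - aop_meas)        # track the target mean AoP
    return new_speed, reward

# Usage with a trivial stand-in policy (a trained DDPG actor would go here):
new_speed, r = control_step(lambda o: 0.05 * o[1], aop_meas=62.0,
                            aop_target=70.0, pump_speed=0.4)
```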
15. TD3 Algorithm of Dynamic Classification Replay Buffer Based PID Parameter Optimization.
- Author
- Zhong, Haojun and Wang, Zhenlei
- Abstract
In industrial control, PID controllers are widely used, but their control performance depends heavily on the controller's parameters, whose adjustment is cumbersome and inefficient. Recently, deep reinforcement learning has gradually been introduced into the industrial control field owing to its ability to learn autonomously by interacting with the environment. In this paper, a PID parameter optimization method based on a TD3 algorithm with a dynamic classification replay buffer (DCRB-TD3) is proposed. Through the designed optimization framework, the optimization of the PID parameters is converted into learning the weights of the actor network. To improve the learning efficiency of the reinforcement learning algorithm, avoid dispersion of the control curve, and keep the whole process under continuous closed-loop optimization, the regular TD3 algorithm is improved: a dynamic classification ratio strategy is designed, and a sampling update method for dynamic classification experience replay is proposed. Finally, simulations are performed on various systems, and DCRB-TD3 is compared with a PID parameter optimization method based on the PSO algorithm. The results show that the PID parameters optimized by DCRB-TD3 yield better control performance than the other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
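Entry 15 converts PID tuning into learning the actor's weights, which implies a fixed mapping from actor output to gains. A sketch under stated assumptions (the gain ranges and squashing are invented for illustration):

```python
import numpy as np

KP_MAX, KI_MAX, KD_MAX = 10.0, 2.0, 1.0      # assumed admissible gain ranges

def actor_to_pid(actor_out: np.ndarray) -> tuple[float, float, float]:
    """Map an unbounded actor output onto positive PID gains."""
    squashed = (np.tanh(actor_out) + 1.0) / 2.0          # -> [0, 1]
    kp, ki, kd = squashed * np.array([KP_MAX, KI_MAX, KD_MAX])
    return float(kp), float(ki), float(kd)

def pid_step(e, e_int, e_prev, gains, dt=0.1):
    """One PID control step with the gains chosen by the actor."""
    kp, ki, kd = gains
    u = kp * e + ki * (e_int + e * dt) + kd * (e - e_prev) / dt
    return u, e_int + e * dt

gains = actor_to_pid(np.array([0.3, -0.8, 0.1]))
u, e_int = pid_step(e=1.5, e_int=0.0, e_prev=1.2, gains=gains)
```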
16. A Deep Reinforcement Learning Optimization Method Considering Network Node Failures.
- Author
- Ding, Xueying, Liao, Xiao, Cui, Wei, Meng, Xiangliang, Liu, Ruosong, Ye, Qingshan, and Li, Donghe
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, ELECTRIC power distribution grids, ELECTRIC power, MICROGRIDS
- Abstract
Nowadays, microgrid systems are characterized by diversified power sources and complex network structures. Existing studies on microgrid fault diagnosis and troubleshooting mostly focus on fault detection and operation optimization for a single power device. For increasingly complex microgrid systems, however, it becomes ever more challenging to contain faults within a specific spatiotemporal range, and spreading faults pose great harm to the safety of the microgrid. The deep reinforcement learning-based topology optimization proposed in this paper starts from the power grid as a whole and aims to minimize the overall failure rate of the microgrid by optimizing the grid topology. This approach can confine internal faults to a small range, greatly improving the safety and reliability of microgrid operation. The proposed method can optimize the network topology for both single-node and multi-node faults, reducing the influence range of the node fault by 21% and 58%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Guest Editorial: Special issue on computational methods and artificial intelligence applications in low‐carbon energy systems.
- Author
- Wang, Yishen, Zhou, Fei, Guerrero, Josep M., Baker, Kyri, Chen, Yize, Wang, Hao, Xu, Bolun, Xu, Qianwen, Zhu, Hong, and Agwan, Utkarsha
- Subjects
- ARTIFICIAL intelligence, ARTIFICIAL neural networks, MACHINE learning, REINFORCEMENT learning, DEEP reinforcement learning, DEEP learning
- Abstract
This document is a guest editorial for a special issue on computational methods and artificial intelligence applications in low-carbon energy systems. The editorial highlights the urgent need for advanced computing and artificial intelligence in the clean energy transition to improve system reliability, economics, and sustainability. The special issue includes 19 original research articles covering topics such as energy forecasting, situational awareness, multi-energy system dispatch, and power system operation. The articles present state-of-the-art methods and techniques in these areas, including wind power forecasting, demand-side flexibility, fault diagnosis of photovoltaic strings, and energy management strategies. The authors express their gratitude to the participating authors and anonymous reviewers for their contributions to the special section. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
18. Explainability in Deep Reinforcement Learning: A Review into Current Methods and Applications.
- Author
- Hickling, Thomas, Zenati, Abdelhafid, Aouf, Nabil, and Spencer, Phillippa
- Published
- 2024
- Full Text
- View/download PDF
19. Deep reinforcement learning for adaptive frequency control of island microgrid considering control performance and economy.
- Author
- Du, Wanlin, Huang, Xiangmin, Zhu, Yuanzhe, Wang, Ling, Deng, Wenyang, Yin, Linfei, and Saxena, Sahaj
- Subjects
- DEEP reinforcement learning, ADAPTIVE control systems, MICROGRIDS, REINFORCEMENT learning, MAXIMUM entropy method, INDEPENDENT system operators, ADAPTIVE fuzzy control
- Abstract
To achieve frequency stability and economic efficiency in isolated microgrids, grid operators face a trade-off between multiple performance indicators. This paper introduces a data-driven adaptive load frequency control (DD-ALFC) approach, where the load frequency controller is modeled as an agent that can balance different objectives autonomously. The paper also proposes a priority replay soft actor critic (PR-SAC) algorithm to implement the DD-ALFC method. The PR-SAC algorithm enhances the policy randomness by using entropy regularization and maximization, and improves the learning adaptability and generalization by using priority experience replay. The proposed DD-ALFC method based on the PR-SAC algorithm can achieve higher adaptability and robustness in complex microgrid environments with multiple performance indicators, and improve both the frequency control and the economic efficiency. The paper validates the effectiveness of the proposed method in the Zhuzhou Island microgrid. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
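Entry 19's "entropy regularization and maximization" is the standard soft actor-critic ingredient: the critic target subtracts a scaled log-probability, so the policy is paid for staying random. A generic SAC-style target is sketched below; the paper's priority-replay details are omitted.

```python
import torch

def soft_td_target(r, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    q_min = torch.min(q1_next, q2_next)          # clipped double-Q estimate
    return r + gamma * (1.0 - done) * (q_min - alpha * logp_next)

y = soft_td_target(torch.tensor([0.5]), torch.tensor([0.0]),
                   torch.tensor([1.2]), torch.tensor([1.0]),
                   torch.tensor([-0.7]))         # entropy term raises the target
```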
20. High-Frequency Quantitative Trading of Digital Currencies Based on Fusion of Deep Reinforcement Learning Models with Evolutionary Strategies.
- Author
- Yijun He, Bo Xu, and Xinpu Su
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, DIGITAL currency, MACHINE learning, CRYPTOCURRENCIES, EVOLUTIONARY models
- Abstract
High-frequency quantitative trading in the emerging digital currency market poses unique challenges due to the lack of established methods for extracting trading information. This paper proposes a deep evolutionary reinforcement learning (DERL) model that combines deep reinforcement learning with evolutionary strategies to address these challenges. Reinforcement learning is applied to data cleaning and factor extraction from a high-frequency, microscopic viewpoint to quantitatively explain the supply-demand imbalance and to create trading strategies. To determine whether the algorithm can extract the significant hidden features from large and complex high-frequency factors, this paper trains the agent with three different learning algorithms: Q-learning, evolutionary strategies, and policy gradient. The experimental dataset, covering sharp rises, sharp falls, and sustained oscillation in Bitcoin, spans January-February, September, and November of 2022. According to the experimental results, the evolutionary strategies algorithm achieved returns of 59.18%, 25.14%, and 22.72%, respectively. The results demonstrate that deep reinforcement learning based on evolutionary strategies outperforms Q-learning and policy gradient in terms of risk resistance and return capability. The proposed approach offers a robust and adaptive solution for high-frequency trading in the digital currency market, contributing to the development of effective quantitative trading strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
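The evolutionary-strategies component entry 20 fuses with DRL can be shown with the standard OpenAI-ES estimator; the trading-specific state, factors, and fitness are not reproduced here.

```python
import numpy as np

def es_update(theta, fitness_fn, npop=50, sigma=0.1, lr=0.02, rng=None):
    """One evolution-strategies step: move theta toward reward-weighted noise."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((npop, theta.size))       # parameter perturbations
    rewards = np.array([fitness_fn(theta + sigma * e) for e in eps])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (npop * sigma) * eps.T @ adv

theta = np.zeros(3)
for _ in range(200):
    theta = es_update(theta, lambda w: -np.sum((w - 1.0) ** 2))
print(theta)   # drifts toward the optimum [1, 1, 1]
```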
21. DRL-based Task and Computational Offloading for Internet of Vehicles in Decentralized Computing.
- Author
- Zhang, Ziyang, Gu, Keyu, and Xu, Zijie
- Abstract
This paper focuses on the problem of computation offloading in a high-mobility Internet of Vehicles (IoVs) environment. The goal is to address the challenges related to latency, energy consumption, and payment cost requirements. The approach considers both moving and parked vehicles as fog nodes, which can assist in offloading computational tasks. However, as the number of vehicles increases, the action space for each agent grows exponentially, posing a challenge for decentralised decision-making. The dynamic nature of vehicular mobility further complicates the network dynamics, requiring joint cooperative behaviour from the learning agents to achieve convergence. The traditional deep reinforcement learning (DRL) approach for offloading in IoVs treats each agent as an independent learner. It ignores the actions of other agents during the training process. This paper utilises a cooperative three-layer decentralised architecture called Vehicle-Assisted Multi-Access Edge Computing (VMEC) to overcome this limitation. The VMEC network consists of three layers: the fog, cloudlet, and cloud layers. In the fog layer, vehicles within associated Roadside Units (RSUs) and neighbouring RSUs participate as fog nodes. The middle layer comprises Mobile Edge Computing (MEC) servers, while the top layer represents the cloud infrastructure. To address the dynamic task offloading problem in VMEC, the paper proposes using a Decentralized Framework of Task and Computational Offloading (DFTCO), which utilises the strength of MADRL and NOMA techniques. This approach considers multiple agents making offloading decisions simultaneously and aims to find the optimal matching between tasks and available resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Model-based deep reinforcement learning with heuristic search for satellite attitude control
- Author
- Xu, Ke, Wu, Fengge, and Zhao, Junsuo
- Published
- 2019
- Full Text
- View/download PDF
23. Research on Scheduling Algorithm of Knitting Production Workshop Based on Deep Reinforcement Learning.
- Author
- Sun, Lei, Shi, Weimin, Xuan, Chang, and Zhang, Yongchao
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, PRODUCTION scheduling, METAHEURISTIC algorithms, KNITTING
- Abstract
Intelligent scheduling of knitting workshops is the key to realizing intelligent knitting manufacturing. Given the uncertainty of the workshop environment, existing scheduling algorithms find it difficult to adjust scheduling strategies flexibly. This paper proposes a scheduling algorithm architecture based on deep reinforcement learning (DRL). First, the scheduling problem of knitting intelligent workshops is represented by a disjunctive graph, and a mathematical model is established. Then, a multi-Proximal Policy Optimization (multi-PPO) training algorithm is designed to obtain the optimal strategy, with the job selection policy and the machine selection policy trained at the same time. Finally, a knitting intelligent workshop scheduling experimental platform is built, and the proposed algorithm is compared experimentally with common heuristic rules and metaheuristic algorithms. The results show that the proposed algorithm is superior to heuristic rules in solving the knitting workshop scheduling problem and matches the accuracy of the metaheuristic algorithms. In addition, its response speed is excellent, meeting the production scheduling needs of knitting intelligent workshops and providing useful guidance for advancing intelligent knitting manufacturing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
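Entry 23 trains the job-selection and machine-selection policies at the same time, which in PPO terms means two heads sharing one clipped surrogate. A minimal sketch of that loss (standard PPO; the disjunctive-graph encoder is omitted):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Standard clipped surrogate; adv is the advantage estimate."""
    ratio = torch.exp(logp_new - logp_old)
    return -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

def joint_loss(job_lp_new, job_lp_old, mach_lp_new, mach_lp_old, adv):
    # One advantage signal drives both heads, so they are optimized together.
    return (ppo_clip_loss(job_lp_new, job_lp_old, adv)
            + ppo_clip_loss(mach_lp_new, mach_lp_old, adv))
```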
24. Low-carbon economic dispatch strategy for integrated electrical and gas system with GCCP based on multi-agent deep reinforcement learning.
- Author
- Feng, Wentao, Deng, Bingyan, Zhang, Ziwen, Jiang, He, Zheng, Yanxi, Peng, Xinran, Zhang, Le, Jing, Zhiyuan, Qing, Ke, Xi, Xianpeng, Zhang, Bin, and Li, Mingxuan
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, MACHINE learning, CARBON emissions, NATURAL gas, DEEP learning
- Abstract
With the growing concern for the environment, sustainable development centred on a low-carbon economy has become a unifying pursuit for the energy industry. Integrated energy systems (IES) that combine multiple energy sources such as electricity, heat, and gas are essential to facilitate the consumption of renewable energy and the reduction of carbon emissions. In this paper, a gas turbine (GT), carbon capture and storage (CCS), and a power-to-gas (P2G) device are introduced to construct a new carbon capture coupling device model, GT-CCS-P2G (GCCP), which is applied to the integrated electrical and gas system (IEGS). Multi-agent soft actor critic (MASAC) brings historical trajectory representations, parameter-space techniques, and deep dense network frameworks to reinforcement learning, reducing the detrimental effects of time-series data on the decision procedure. The energy scheduling problem of IEGS is reformulated as a Markov game, which is addressed by a low-carbon economic control framework based on MASAC with minimum operating cost and minimum carbon emission as the optimization objectives. To validate the rationality and effectiveness of the proposed MASAC-based low-carbon economic scheduling model for IEGS, simulations and analyses are conducted on an integrated PJM 5-node power system and a seven-node natural gas system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. UAV Coverage Path Planning With Limited Battery Energy Based on Improved Deep Double Q-network.
- Author
- Ni, Jianjun, Gu, Yu, Gu, Yang, Zhao, Yonghao, and Shi, Pengfei
- Abstract
In response to the increasingly complex problem of patrolling urban areas, using deep reinforcement learning algorithms for autonomous unmanned aerial vehicle (UAV) coverage path planning (CPP) has gradually become a research hotspot. A CPP solution must consider several complex factors, including the landing area, target-area coverage, and limited battery capacity. Consequently, with incomplete environmental information, policies learned by sample-inefficient deep reinforcement learning algorithms are prone to getting trapped in local optima. To enhance the quality of experience data, a novel reward is proposed to guide UAVs in efficiently traversing the target area under battery limitations. Subsequently, to improve the sample efficiency of deep reinforcement learning algorithms, this paper introduces a novel dynamic soft update method, incorporates the prioritized experience replay mechanism, and presents an improved deep double Q-network (IDDQN) algorithm. Finally, simulation experiments conducted on two different grid maps demonstrate that IDDQN outperforms DDQN significantly. Our method simultaneously enhances the algorithm's sample efficiency and safety performance, thereby enabling UAVs to cover a larger number of target areas. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
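Entry 25's dynamic soft update varies the Polyak coefficient instead of fixing it. The update rule below is standard; the schedule is an invented illustration, not the paper's.

```python
import math
import torch

def soft_update(target_net, online_net, tau: float):
    """theta_target <- (1 - tau) * theta_target + tau * theta_online."""
    with torch.no_grad():
        for tp, op in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * op)

def dynamic_tau(step: int, tau_min=1e-3, tau_max=1e-2, decay=1e-4) -> float:
    # Illustrative schedule: track the online network quickly early on,
    # then settle to slow, stable target updates.
    return tau_min + (tau_max - tau_min) * math.exp(-decay * step)

online = torch.nn.Linear(4, 2)
target = torch.nn.Linear(4, 2)
soft_update(target, online, dynamic_tau(step=1000))
```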
26. Computation Offloading with Privacy-Preserving in Multi-Access Edge Computing: A Multi-Agent Deep Reinforcement Learning Approach.
- Author
- Dai, Xiang, Luo, Zhongqiang, and Zhang, Wei
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, EDGE computing, REINFORCEMENT (Psychology), TELECOMMUNICATION, QUALITY of service, INTERNET of things
- Abstract
The rapid development of mobile communication technologies and Internet of Things (IoT) devices has introduced new challenges for multi-access edge computing (MEC). A key issue is how to efficiently manage MEC resources and determine the optimal offloading strategy between edge servers and user devices while protecting user privacy, thereby improving the Quality of Service (QoS). To address this issue, this paper investigates a privacy-preserving computation offloading scheme designed to maximize QoS by comprehensively considering privacy protection, delay, energy consumption, and the task discard rate of user devices. We first formalize the privacy issue by introducing the concept of privacy entropy. Then, based on the quantified indicators, a multi-objective optimization problem is established. To find an optimal solution to this problem, this paper proposes a computation offloading algorithm based on the Twin Delayed Deep Deterministic Policy Gradient (TD3-SN-PER), which integrates clipped double-Q learning, prioritized experience replay, and state normalization techniques. Finally, the proposed method is evaluated through simulation analysis. The experimental results demonstrate that our approach can effectively balance multiple performance metrics to achieve optimal QoS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
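The state-normalization piece of entry 26's TD3-SN-PER is, in generic form, a running-statistics normalizer that keeps delay, energy, and privacy features on comparable scales. A standard Welford-style sketch (the paper's exact variant may differ):

```python
import numpy as np

class RunningNorm:
    def __init__(self, dim: int):
        self.n, self.mean, self.m2 = 0, np.zeros(dim), np.zeros(dim)

    def update(self, x: np.ndarray):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n           # Welford's online mean update
        self.m2 += delta * (x - self.mean)    # running sum of squared deviations

    def normalize(self, x: np.ndarray) -> np.ndarray:
        var = self.m2 / max(self.n - 1, 1)
        return (x - self.mean) / np.sqrt(var + 1e-8)

norm = RunningNorm(3)
for obs in np.random.default_rng(0).normal(5.0, 2.0, size=(100, 3)):
    norm.update(obs)
print(norm.normalize(np.array([5.0, 5.0, 5.0])))  # roughly zero-centered
```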
27. AI Applications to Enhance Resilience in Power Systems and Microgrids—A Review.
- Author
- Zahraoui, Younes, Korõtko, Tarmo, Rosin, Argo, Mekhilef, Saad, Seyedmahmoudian, Mehdi, Stojcevski, Alex, and Alhamrouni, Ibrahim
- Abstract
This paper presents an in-depth exploration of the application of Artificial Intelligence (AI) in enhancing the resilience of microgrids. It begins with an overview of the impact of natural events on power systems and provides data and insights related to power outages and blackouts caused by natural events in Estonia, setting the context for the need for resilient power systems. Then, the paper delves into the concept of resilience and the role of microgrids in maintaining power stability. The paper reviews various AI techniques and methods, and their application in power systems and microgrids. It further investigates how AI can be leveraged to improve the resilience of microgrids, particularly during different phases of an event occurrence time (pre-event, during event, and post-event). A comparative analysis of the performance of various AI models is presented, highlighting their ability to maintain stability and ensure a reliable power supply. This comprehensive review contributes significantly to the existing body of knowledge and sets the stage for future research in this field. The paper concludes with a discussion of future work and directions, emphasizing the potential of AI in revolutionizing power system monitoring and control. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Data-driven active corrective control in power systems: an interpretable deep reinforcement learning approach.
- Author
- Li, Beibei, Liu, Qian, Hong, Yue, He, Yuxiong, Zhang, Lihong, He, Zhihong, Feng, Xiaoze, Gao, Tianlu, Yang, Li, Yan, Ziming, and Zhang, Cong
- Subjects
- DEEP reinforcement learning, ARTIFICIAL intelligence, REINFORCEMENT learning, MARKOV processes, DECISION making
- Abstract
With the successful application of artificial intelligence technology in various fields, deep reinforcement learning (DRL) algorithms have been applied to active corrective control in power systems to improve accuracy and efficiency. However, the "black-box" nature of deep reinforcement learning models reduces their reliability in practical applications, making it difficult for operators to comprehend the decision-making process of these models and thus undermining their credibility. In this paper, a DRL model is constructed based on the Markov decision process (MDP) to effectively address active corrective control issues in a 36-bus system. Furthermore, a feature-importance explainability method is proposed, and validation shows that it enhances the transparency and reliability of the DRL model for active corrective control. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. UAV Path Planning Based on Random Obstacle Training and Linear Soft Update of DRL in Dense Urban Environment.
- Author
- Zhu, Yanfei, Tan, Yingjie, Chen, Yongfa, Chen, Liudan, and Lee, Kwang Y.
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, ENERGY consumption, CONSUMPTION (Economics)
- Abstract
The three-dimensional (3D) path planning problem of an Unmanned Aerial Vehicle (UAV) in a dense city, considering the effect of environmental wind, is investigated in this paper. The mission of the UAV is to fly from its initial position to its destination while ensuring safe flight. Dense-obstacle avoidance and energy consumption in 3D space, often ignored in previous studies, must be considered during the mission. To solve these problems, an improved Deep Reinforcement Learning (DRL) path planning algorithm based on the Double Deep Q-Network (DDQN) is proposed. First, a random obstacle training method is proposed so that the algorithm considers various flight scenarios more globally and comprehensively, improving its robustness and adaptability. Then, a linear soft update strategy is employed to realize smooth neural network parameter updates, which enhances the stability and convergence of training. In addition, wind disturbances are integrated into the energy consumption model and reward function, which effectively describes the wind disturbances during the UAV mission and enables minimum-drag flight. To prevent failed training episodes from disturbing the neural network, a meritocracy mechanism is proposed to enhance the algorithm's stability. The effectiveness and applicability of the proposed method are verified through simulation analysis and comparative studies. A UAV based on this algorithm has good autonomy and adaptability, providing a new way to solve the UAV path planning problem in dense urban scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. A Modular Robotic Arm Configuration Design Method Based on Double DQN with Prioritized Experience Replay.
- Author
- Ding, Ziyan, Tang, Haijun, Wan, Haiying, Zhang, Chengxi, and Sun, Ran
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, ROBOTICS
- Abstract
Modular robotic arms can achieve desired performance in different scenarios through combinations of various modules, and concurrently hold the potential to exhibit geometric symmetry and uniform mass symmetry. Selecting the appropriate combination of modules is therefore crucial for realizing the functions of the robotic arm and ensuring the elegance of the system. To this end, this paper proposes a double deep Q-network (DDQN)-based configuration design algorithm for modular robotic arms, which aims to find the optimal configuration for different tasks. First, a library of small modules for collaborative robotic arms consisting of multiple tandem robotic arms is constructed. These modules are described in a standard format that can be directly imported into simulation software, providing greater convenience and flexibility in the development of modular robotic arms. Subsequently, the DDQN design framework for module selection is established to obtain the optimal robotic arm configuration. The proposed method addresses the overestimation problem of the traditional deep Q-network (DQN) and improves the estimation accuracy of the value function for each module. In addition, the experience replay mechanism is improved based on the SumTree technique, which enables the algorithm to make effective use of historical experience and prevents it from falling into locally optimal solutions. Finally, comparative experiments are carried out on the PyBullet simulation platform to verify the effectiveness and superiority of the developed configuration design method. The simulation results show that the proposed DDQN-based method with the improved experience replay mechanism has higher search efficiency and accuracy than the traditional DQN scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
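Entry 30's improved experience replay rests on the SumTree: leaves hold priorities, internal nodes hold sums, so priority-proportional sampling is O(log n). A compact generic implementation (not the authors' code):

```python
import numpy as np

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
        self.write = 0

    def add(self, priority: float):
        idx = self.write + self.capacity - 1
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx: int, priority: float):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx:                        # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, s: float) -> int:
        idx = 0
        while idx < self.capacity - 1:    # descend toward the chosen leaf
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)  # leaf -> replay-buffer slot

tree = SumTree(4)
for p in (1.0, 2.0, 3.0, 4.0):
    tree.add(p)
print(tree.sample(6.5))   # higher-priority slots are drawn more often
```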
31. Reimagining space layout design through deep reinforcement learning.
- Author
- Kakooee, Reza and Dillenburger, Benjamin
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, COMPUTER-aided design software, ARCHITECTURAL design, GENETIC algorithms
- Abstract
Space layout design is a critical aspect of architectural design, influencing functionality and aesthetics. The inherent combinatorial nature of layout design poses challenges for traditional planning approaches; thus, it demands the exploration of novel methods. This paper presents a novel framework that leverages the potential of deep reinforcement learning (RL) algorithms to optimize space layouts. RL has demonstrated remarkable success in addressing complex decision-making problems, yet its application in the design process remains relatively unexplored. We argue that RL is particularly well-suited for the design process due to its ability to accommodate offline tasks and seamless integration with existing computer-aided design software, effectively acting as a simulator for design exploration. Framing space layout design as an RL problem and employing RL methods allows for the automated exploration of the expansive design space, thereby enhancing the discovery of innovative solutions. This paper also elucidates the synergy between the design process and the RL problem, which opens new avenues for exploring the potential of RL algorithms in design. We aim to foster experimentation and collaboration within the RL and architecture communities. To facilitate our research, we have developed SpaceLayoutGym, an environment specifically designed for space layout design tasks. SpaceLayoutGym serves as a customizable environment that encapsulates the essential elements of the layout design process within an RL framework. To showcase the effectiveness of SpaceLayoutGym and the capabilities of RL as an artificial space layout designer, we employ the Proximal Policy Optimization (PPO) algorithm to train the RL agent in selected design scenarios with both geometrical constraints and topological objectives. The study further extends to contrast the effectiveness of PPO agents with that of genetic algorithms, and also includes a comparative analysis with existing layouts. Our results demonstrate the potential of RL to optimize space layouts, offering a promising direction for the future of artificial intelligence-aided design. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
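For a sense of how a layout task is wrapped for RL, here is a skeletal Gymnasium environment in the spirit of entry 31's SpaceLayoutGym; the spaces, toy objective, and dynamics are made up and much simpler than the published environment.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ToyLayoutEnv(gym.Env):
    """Place N rooms on a grid; reward a toy geometric objective."""
    def __init__(self, grid=8, n_rooms=4):
        self.grid, self.n_rooms = grid, n_rooms
        self.observation_space = spaces.Box(0, n_rooms, shape=(grid, grid),
                                            dtype=np.int32)
        self.action_space = spaces.MultiDiscrete([n_rooms, grid, grid])  # room, x, y

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board = np.zeros((self.grid, self.grid), dtype=np.int32)
        self.placed = 0
        return self.board.copy(), {}

    def step(self, action):
        room, x, y = action
        reward = -1.0 if self.board[x, y] else 1.0   # penalize overlaps (toy objective)
        if not self.board[x, y]:
            self.board[x, y] = room + 1
            self.placed += 1
        terminated = self.placed >= self.n_rooms
        return self.board.copy(), reward, terminated, False, {}

env = ToyLayoutEnv()
obs, _ = env.reset(seed=0)
obs, r, done, trunc, _ = env.step(env.action_space.sample())
```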
32. Editor's introduction.
- Author
- Kou, Gang
- Subjects
- RANDOM walks, REINFORCEMENT learning, MACHINE learning, DEEP reinforcement learning, BANKING industry, VOLATILITY (Securities), CRYPTOCURRENCIES, MOBILE banking industry, RETAIL banking
- Abstract
The 46th issue of Financial Innovation (FIN), Volume 10, No.4 (2024) features 30 papers from authors and co-authors representing 25 countries. The papers are categorized into three sub-themes: FinTech, Asset Pricing, and Risk Management and Analysis. The FinTech papers cover topics such as churn prediction in retail banking, text analysis methodologies in blockchain research, user experience in peer-to-peer payment systems, and factors influencing blockchain adoption in Taiwan's banking sector. The Asset Pricing papers examine topics such as financial performance evaluation, prediction of stock outperformance, pricing variance swaps, and modeling OHLC data in candlestick charts. The Risk Management and Analysis papers cover subjects such as the influence of political stability on stock market returns, measuring longevity and mortality risks for life insurance products, predicting defaults in the invoice-trading market, and the impact of hashrate on Bitcoin network security. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
33. Resource Scheduling in URLLC and eMBB Coexistence Based on Dynamic Selection Numerology.
- Author
- Wang, Lei, Tao, Sijie, Zhao, Lindong, Zhou, Dengyou, Liu, Zhe, and Sun, Yanbing
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, WIRELESS Internet, RESOURCE allocation, SIMULATION software, FEATURE selection
- Abstract
This paper focuses on the resource allocation problem of multiplexing two different service scenarios, enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC), in 5G New Radio, using a dynamic numerology structure, mini-slot scheduling, and puncturing to achieve optimal resource allocation. To obtain the optimal channel resource allocation under URLLC user constraints, this paper establishes a channel model divided into two convex optimization problems: (a) eMBB resource allocation and (b) URLLC scheduling. We also determine the numerology values at the beginning of each time slot with the help of deep reinforcement learning to achieve flexible resource scheduling. The proposed algorithm is verified in simulation software, and the results show that, for the same URLLC packet arrivals, the dynamic selection of numerologies proposed in this paper improves the data transmission rate of eMBB users and reduces the latency of URLLC services compared with a fixed numerology scheme, while the reasonable resource allocation ensures the reliability of URLLC and eMBB communication. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Bioinspired Artificial Intelligence Applications 2023.
- Author
- Wei, Haoran, Tao, Fei, Huang, Zhenghua, and Long, Yanhua
- Subjects
- ARTIFICIAL intelligence, DEEP learning, REINFORCEMENT learning, MACHINE learning, DEEP reinforcement learning, NATURAL language processing
- Abstract
This document discusses the rapid development of Artificial Intelligence (AI) and its bioinspired applications. It highlights the benefits of bioinspired AI, such as increased accuracy in image and speech processing, reduced cost and energy usage through edge devices, and enhanced bio-signal quality. However, it also acknowledges the challenges posed by improper AI utilization, such as the generation of fake news and security issues. The document calls for research papers on bioinspired AI applications to explore its potential and address these challenges. It includes examples of research papers that utilize deep reinforcement learning for robot task sequencing, propose a real-time multi-surveillance pedestrian target detection model, develop an intelligent breast mass classification approach, and introduce a bio-inspired object detection algorithm for remote sensing images. The document concludes by emphasizing the importance of biomimetic artificial intelligence in various fields and promoting further research in this area. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
35. Longitudinal Hierarchical Control of Autonomous Vehicle Based on Deep Reinforcement Learning and PID Algorithm.
- Author
- Ma, Jialu, Zhang, Pingping, Li, Yixian, Gao, Yuhang, Zhao, Jiandong, and Hang, Peng
- Subjects
- DEEP reinforcement learning, MACHINE learning, ACCELERATION (Mechanics), AUTONOMOUS vehicles, LONGITUDINAL method
- Abstract
Longitudinal control of autonomous vehicles (AVs) has long been a prominent subject and challenge. A hierarchical longitudinal control system that integrates the deep deterministic policy gradient (DDPG) and proportional–integral–derivative (PID) control algorithms is proposed in this paper to ensure safe and efficient vehicle operation. First, a hierarchical control structure is employed to devise the longitudinal control algorithm, utilizing a Carsim-based model of the vehicle's longitudinal dynamics. Subsequently, an upper controller combining DDPG and PID is developed, wherein perceptual information such as leading-vehicle speed and distance serves as the input state for the DDPG algorithm, which determines the PID parameters and outputs the desired acceleration of the vehicle. Following this, a lower controller is designed employing a PID-based driving and braking switching strategy: the disparity between the desired and actual accelerations is fed into the PID, which calculates the control acceleration to enact the switching strategy. Finally, the effectiveness of the designed control algorithm is validated through simulation scenarios using Carsim and Simulink. Results demonstrate that the proposed longitudinal control method adeptly manages vehicle speed and following distance, satisfying the safety requirements of AVs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Research on heterogeneous multi-UAV collaborative decision-making method based on improved PPO.
- Author
- Xu, Lin, Zhang, Xinmiao, Xiao, Dong, Liu, Beihong, and Liu, Aixue
- Abstract
To address the difficulty the Proximal Policy Optimization (PPO) algorithm has converging in air-sea battle scenarios with high dynamics, strong interference, and complex state spaces, this paper proposes the Ray-LAPPO algorithm, based on Long Short-Term Memory (LSTM) and an attention mechanism, under the distributed training framework Ray. Firstly, the idea of Centralized Training with Decentralized Execution (CTDE) is adopted to extend the PPO algorithm to the multi-agent setting, and policy entropy is added to the loss function to encourage exploration. Secondly, LSTM networks are added to the actor and critic networks to exploit the temporal relationships among non-independent and identically distributed samples and improve the learning performance of the UAV. In addition, the attention mechanism is introduced to weigh the states at different time steps and establish a weighted differentiation model of the final value function. Finally, simulation experiments in a self-developed heterogeneous UAV collaborative decision-making environment show that Ray-LAPPO achieves state-of-the-art performance in different scenarios and holds potential value for large-scale real-world applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
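Entry 36's recurrent actor-critic can be sketched generically: an LSTM trunk feeds policy and value heads, and the policy entropy is available as the bonus term the abstract adds to the PPO loss. Ray-LAPPO's attention weighting and CTDE plumbing are not shown.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)   # policy logits
        self.critic = nn.Linear(hidden, 1)          # state value

    def forward(self, obs_seq, hc=None):
        out, hc = self.lstm(obs_seq, hc)             # out: [batch, time, hidden]
        return self.actor(out), self.critic(out), hc

net = RecurrentActorCritic(obs_dim=10, n_actions=5)
logits, value, hc = net(torch.randn(2, 8, 10))      # batch of 2, 8 timesteps
dist = torch.distributions.Categorical(logits=logits)
entropy_bonus = dist.entropy().mean()               # added to the PPO loss
```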
37. Integrating Evolutionary Game-Theoretical Methods and Deep Reinforcement Learning for Adaptive Strategy Optimization in User-Side Electricity Markets: A Comprehensive Review.
- Author
- Cheng, Lefeng, Wei, Xin, Li, Manling, Tan, Can, Yin, Meng, Shen, Teng, and Zou, Tao
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, ELECTRICITY markets, GAME theory, MULTIAGENT systems, EVOLUTIONARY algorithms
- Abstract
With the rapid development of smart grids, the strategic behavior evolution in user-side electricity market transactions has become increasingly complex. To explore the dynamic evolution mechanisms in this area, this paper systematically reviews the application of evolutionary game theory in user-side electricity markets, focusing on its unique advantages in modeling multi-agent interactions and dynamic strategy optimization. While evolutionary game theory excels in explaining the formation of long-term stable strategies, it faces limitations when dealing with real-time dynamic changes and high-dimensional state spaces. Thus, this paper further investigates the integration of deep reinforcement learning, particularly the deep Q-learning network (DQN), with evolutionary game theory, aiming to enhance its adaptability in electricity market applications. The introduction of the DQN enables market participants to perform adaptive strategy optimization in rapidly changing environments, thereby more effectively responding to supply–demand fluctuations in electricity markets. Through simulations based on a multi-agent model, this study reveals the dynamic characteristics of strategy evolution under different market conditions, highlighting the changing interaction patterns among participants in complex market environments. In summary, this comprehensive review not only demonstrates the broad applicability of evolutionary game theory in user-side electricity markets but also extends its potential in real-time decision making through the integration of modern algorithms, providing new theoretical foundations and practical insights for future market optimization and policy formulation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Automated Vulnerability Exploitation Using Deep Reinforcement Learning.
- Author
- AlMajali, Anas, Al-Abed, Loiy, Ahmad Yousef, Khalil M., Mohd, Bassam J., Samamah, Zaid, and Abu Shhadeh, Anas
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, REINFORCEMENT (Psychology), MACHINE learning, RISK assessment
- Abstract
The main objective of this paper is to develop a reinforcement learning agent capable of effectively exploiting a specific vulnerability. Automating pentesting can reduce the cost and time of the operation. While existing tools like Metasploit Pro offer automated exploitation capabilities, they often require significant execution time and resources due to their reliance on exhaustive payload testing. In this paper, we have created a deep reinforcement learning agent specifically configured to exploit a targeted vulnerability. Through a training phase, the agent learns and stores payloads along with their corresponding reward values in a neural network. When encountering a specific combination of target operating system and vulnerability, the agent utilizes its neural network to determine the optimal exploitation options. The novelty of this work lies in employing Deep Reinforcement Learning in vulnerability exploitation analysis. To evaluate our proposed methodology, we conducted training and testing on the Metasploitable platform. The training phase was conducted on two use cases: the first has one vulnerability, and the second has four vulnerabilities. Our approach successfully achieved the attacker's primary objective of establishing a reverse shell with a maximum accuracy of 96.6% and 73.6% for use cases one and two, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
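The selection problem the record above describes can be illustrated abstractly. The sketch below is a hypothetical stand-in only: a tabular Q-learner maps anonymous (target OS, vulnerability) states to payload indices whose "success" is a random number, so nothing here resembles real exploit behaviour or the paper's actual agent.

```python
# Abstract toy only: a tabular Q-learner maps (target OS, vulnerability) states
# to payload indices. Success probabilities are random numbers, not real
# exploit behaviour; states and payloads are anonymous integers.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_payloads = 4, 6
p_success = rng.uniform(0.05, 0.9, size=(n_states, n_payloads))  # hidden from agent
Q = np.zeros((n_states, n_payloads))
alpha, eps = 0.2, 0.15

for episode in range(20000):
    s = rng.integers(n_states)                      # random target configuration
    a = rng.integers(n_payloads) if rng.random() < eps else int(Q[s].argmax())
    r = 1.0 if rng.random() < p_success[s, a] else -0.1   # shell obtained vs. failure
    Q[s, a] += alpha * (r - Q[s, a])                # contextual-bandit update

print("learned best payload per state:", Q.argmax(axis=1))
print("true best payload per state:   ", p_success.argmax(axis=1))
```

Against exhaustive payload testing, the learned table answers each (state, payload) query in one lookup, which is the execution-time advantage the record claims.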
39. Enhancing PV Hosting Capacity of Electricity Distribution Networks Using Deep Reinforcement Learning-Based Coordinated Voltage Control.
- Author
-
Suchithra, Jude, Rajabi, Amin, and Robinson, Duane A.
- Subjects
DEEP reinforcement learning ,ELECTRIC power distribution ,VOLTAGE control ,ELECTRICAL load ,PHOTOVOLTAIC power systems ,ELECTRON tube grids - Abstract
Coordinated voltage control enables the active management of voltage levels throughout electricity distribution networks by leveraging the voltage support capabilities of existing grid-connected PV inverters. Efficient management of power flows and precise voltage regulation through coordinated voltage control schemes facilitate the increased adoption of rooftop PV systems and enhance the hosting capacity of electricity distribution networks. The research presented in this paper proposes a coordinated voltage control scheme and evaluates the enhanced hosting capacity using a deep reinforcement learning-based approach. A comparative analysis of the proposed algorithm is presented, with performance benchmarked against existing local voltage control schemes. The proposed scheme is evaluated on a real-world low-voltage electricity distribution network using quasi-static time-series power flow simulations. Furthermore, a discussion reflects on the strengths and limitations of the proposed scheme based on the results of the case study. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
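The control objective in the record above can be sketched with a toy linearized feeder. Everything below is invented for illustration: the sensitivity matrix, PV export levels, and the inverse-model baseline that stands in for a learned coordinated policy.

```python
# Toy linearized feeder (all sensitivities invented): actions are PV-inverter
# reactive-power setpoints, and the reward penalizes total deviation of bus
# voltages from 1.0 p.u. — the quantity a DRL controller would learn to minimize.
import numpy as np

rng = np.random.default_rng(2)
S = np.array([[0.04, 0.01, 0.00],   # dV/dQ sensitivity matrix (p.u./p.u.)
              [0.01, 0.05, 0.02],
              [0.00, 0.02, 0.06]])

def step(q, pv):
    v = 1.0 + 0.08 * pv + S @ q          # voltage rise from PV export plus Q support
    return v, -np.abs(v - 1.0).sum()     # coordinated deviation penalty

pv = rng.uniform(0.5, 1.0, 3)            # per-bus PV export (p.u.)
v0, r0 = step(np.zeros(3), pv)           # no control
q = np.clip(-np.linalg.solve(S, 0.08 * pv), -1.0, 1.0)  # inverse-model baseline
v1, r1 = step(q, pv)                     # crude coordinated-policy stand-in
print(f"uncontrolled V = {v0.round(3)}, reward {r0:.3f}")
print(f"controlled   V = {v1.round(3)}, reward {r1:.3f}")
```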
40. Research on Electric Hydrogen Hybrid Storage Operation Strategy for Wind Power Fluctuation Suppression.
- Author
-
Li, Dongsen, Qian, Kang, Gao, Ciwei, Xu, Yiyue, Xing, Qiang, and Wang, Zhangfan
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,HYDROGEN storage ,WIND power plants ,RENEWABLE energy sources - Abstract
Due to real-time fluctuations in wind farm output, large-scale renewable energy (RE) generation poses significant challenges to power system stability. To address this issue, this paper proposes a deep reinforcement learning (DRL)-based electric hydrogen hybrid storage (EHHS) strategy to mitigate wind power fluctuations (WPFs). First, a wavelet packet power decomposition algorithm based on improved variable-frequency entropy is proposed; it characterizes the energy of the original wind power in different frequency bands. Second, to minimize WPFs and the comprehensive operating cost of the EHHS, an optimization model for suppressing wind power fluctuations in the integrated power and hydrogen system (IPHS) is constructed. Next, considering the real-time and stochastic characteristics of wind power, the wind power smoothing model is transformed into a Markov decision process, and a modified proximal policy optimization (MPPO) algorithm based on wind power deviation is proposed for training and solving it. Based on the DRL agent's real-time perception of wind power energy characteristics and the IPHS operating status, a WPF smoothing strategy is formulated. Finally, a numerical analysis based on a specific wind farm is conducted. The simulation results, obtained with MATLAB R2021b, show that the proposed strategy effectively suppresses WPFs and demonstrates excellent convergence stability. The comprehensive performance of the MPPO is improved by 21.25% compared with the standard proximal policy optimization (PPO) and by 42.52% compared with a second baseline algorithm. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
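The frequency-band split that drives the strategy in the record above can be approximated very simply. The sketch below is a stand-in, not the paper's method: a moving-average low-pass filter plays the role of the wavelet packet decomposition, separating a slow grid-facing component from the fast residual the hybrid storage must absorb; the signal and window size are invented.

```python
# Stand-in for the record's wavelet-packet split (hypothetical): divide wind
# power into a slow, grid-facing component and a fast residual for the hybrid
# storage to absorb. A real implementation would use a wavelet packet transform.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(2000)
wind = 50 + 10 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 3, t.size)  # MW

def moving_average(x, w):
    return np.convolve(x, np.ones(w) / w, mode="same")  # simple low-pass filter

slow = moving_average(wind, 60)   # smoothed output sent to the grid
fast = wind - slow                # fluctuation routed to battery/hydrogen storage
print(f"raw std {wind.std():.2f} MW -> smoothed std {slow.std():.2f} MW, "
      f"storage residual std {fast.std():.2f} MW")
```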
41. Model inductive bias enhanced deep reinforcement learning for robot navigation in crowded environments.
- Author
-
Chen, Man, Huang, Yongjie, Wang, Weiwen, Zhang, Yao, Xu, Lei, and Pan, Zhisong
- Subjects
DEEP reinforcement learning ,REWARD (Psychology) ,ROBOT motion ,ROBOTS ,SOCIAL interaction ,REINFORCEMENT learning ,MOBILE robots - Abstract
Navigating mobile robots in crowded environments poses a significant challenge and is essential for the coexistence of robots and humans in future intelligent societies. As a pragmatic data-driven approach, deep reinforcement learning (DRL) holds promise for addressing this challenge. However, current DRL-based navigation methods leave room for improvement in modeling agent interactions, designing feedback mechanisms, and achieving decision foresight in dynamic environments. This paper introduces the model inductive bias enhanced deep reinforcement learning (MIBE-DRL) method, drawing inspiration from a fusion of data-driven and model-driven techniques. MIBE-DRL incorporates model inductive bias extensively into the deep reinforcement learning framework, enhancing the efficiency and safety of robot navigation. The proposed approach entails a multi-interaction network with three modules that comprehensively capture potential agent interactions in dynamic environments: a pedestrian interaction module that models interactions among humans, and temporal and spatial interaction modules that consider agent interactions in the temporal and spatial dimensions. Additionally, the paper constructs a reward system that fully accounts for the robot's direction and position. The system's directional and positional reward functions are built on artificial potential fields (APF) and navigation rules, respectively, providing principled evaluations of the robot's motion direction and position during training so that it receives comprehensive feedback. Furthermore, the incorporation of Monte-Carlo tree search (MCTS) facilitates a foresighted action strategy, enabling robots to execute actions with long-term planning considerations. Experimental results demonstrate that integrating model inductive bias significantly enhances navigation performance. Compared with state-of-the-art methods, MIBE-DRL achieves the highest success rate in crowded environments and shows advantages in navigation time and in maintaining a safe social distance from humans. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
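The APF-based directional reward mentioned in the record above can be sketched as follows. This is a hypothetical reconstruction under standard APF conventions, not the paper's exact reward: a candidate heading is scored by how well it descends an attractive goal potential combined with repulsive pedestrian potentials; the gains, cutoff distance, and scene are invented.

```python
# Hypothetical sketch of an APF-style directional reward: a heading is scored by
# how well it descends an attractive goal potential combined with repulsive
# pedestrian potentials. Gains, distances, and the scene are invented.
import numpy as np

def apf_direction_reward(pos, heading, goal, pedestrians, d0=2.0):
    grad = (pos - goal) / (np.linalg.norm(pos - goal) + 1e-9)   # attractive term
    for p in pedestrians:
        d = np.linalg.norm(pos - p)
        if d < d0:                                              # repulsion inside d0
            grad += (1.0 / d - 1.0 / d0) * (p - pos) / d**3
    desired = -grad / (np.linalg.norm(grad) + 1e-9)             # downhill direction
    step = np.array([np.cos(heading), np.sin(heading)])
    return float(step @ desired)                                # in [-1, 1]

pos, goal = np.array([0.0, 0.0]), np.array([5.0, 0.0])
peds = [np.array([1.5, 0.2])]
for h in (0.0, 0.5, 1.0):
    print(f"heading {h:.1f} rad -> reward {apf_direction_reward(pos, h, goal, peds):.3f}")
```

A dense directional signal of this kind gives the agent graded feedback on every step, rather than only at collisions or goal arrival, which is the feedback-design gap the record highlights.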
42. A Robust Human–Machine Framework for Project Portfolio Selection.
- Author
-
Chen, Hang, Zhang, Nannan, Dou, Yajie, and Dai, Yulong
- Subjects
DEEP reinforcement learning ,HEURISTIC algorithms ,DEEP learning ,NP-hard problems ,COMBINATORIAL optimization - Abstract
In the project portfolio selection and scheduling problem (PPSS), developing a systematic and scientific project scheduling plan requires comprehensive consideration of individual preferences and multiple realistic constraints, rendering the problem NP-hard. At the same time, accurately and swiftly evaluating the value of a project as a complex entity is a challenging issue that requires urgent attention. This paper introduces a novel qualitative-evaluation-based project value assessment process that significantly reduces the cost and complexity of project value assessment, upon which a preference-based deep reinforcement learning method is presented for computing project subsets and time scheduling plans. The key parameter values of the algorithm are first determined through specific examples. Then, using a controlled-variable method, the algorithm's sensitivity to changes in problem size and dimensionality is explored. Finally, the proposed algorithm is compared with two classical algorithms and two heuristic algorithms across different instances. The experimental results demonstrate that the proposed algorithm exhibits higher effectiveness and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Artificial Intelligence-Based Adaptive Traffic Signal Control System: A Comprehensive Review.
- Author
-
Agrahari, Anurag, Dhabu, Meera M., Deshpande, Parag S., Tiwari, Ashish, Baig, Mogal Aftab, and Sawarkar, Ankush D.
- Subjects
DEEP reinforcement learning ,ARTIFICIAL intelligence ,SUSTAINABLE transportation ,REINFORCEMENT learning ,TRAFFIC signs & signals ,TRAFFIC signal control systems ,INTELLIGENT transportation systems - Abstract
The exponential increase in vehicles, rapid urbanization, and rising demand for transportation are straining the world's road infrastructure today. To achieve a sustainable transportation system under dynamic traffic volumes, an Adaptive Traffic Signal Control (ATSC) system should be contemplated to reduce urban traffic congestion and thus help reduce the carbon footprint/emissions of greenhouse gases. By adapting signal timing settings in real time according to seasonal and short-term variations in traffic demand, an ATSC system can enhance the effectiveness of traffic operations on urban road networks. This paper provides a comprehensive study of the insights, technical lineaments, and status of various research work in ATSC. The ATSC literature is categorized by the number of road intersections (RIs) considered, viz., single-intersection (SI) and multiple-intersection (MI) approaches, and by the techniques used for developing Traffic Signal Control (TSC) systems, viz., Fuzzy Logic (FL), Metaheuristics (MH), Dynamic Programming (DP), Reinforcement Learning (RL), Deep Reinforcement Learning (DRL), and hybrids. The findings from this review demonstrate that modern ATSC systems designed using these techniques offer substantial improvements in managing dynamic traffic flow. Considerable scope remains for research that increases the number of RIs in ATSC designs to suit real-life applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. A Review of Recent Techniques for Human Activity Recognition: Multimodality, Reinforcement Learning, and Language Models.
- Author
-
Oleh, Ugonna, Obermaisser, Roman, and Ahammed, Abu Shad
- Subjects
DEEP reinforcement learning ,LANGUAGE models ,HUMAN activity recognition ,HUMAN behavior ,ALGORITHMS ,REINFORCEMENT learning - Abstract
Human Activity Recognition (HAR) is a rapidly evolving field with the potential to revolutionise how we monitor and understand human behaviour. This survey paper provides a comprehensive overview of the state of the art in HAR, focusing on recent techniques such as multimodal methods, deep reinforcement learning, and large language models. It explores the diverse range of human activities and the sensor technologies employed for data collection, reviews novel HAR algorithms with emphasis on multimodality, deep reinforcement learning, and large language models, surveys multimodal datasets containing physiological data, and examines applications of HAR in healthcare. Additionally, the survey discusses the challenges and future directions in this exciting field, highlighting the need for continued research and development to fully realise the potential of HAR in various real-world applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Deep Reinforcement Learning with Local Attention for Single Agile Optical Satellite Scheduling Problem.
- Author
-
Liu, Zheng, Xiong, Wei, Han, Chi, and Yu, Xiaolan
- Subjects
REINFORCEMENT learning ,DEEP reinforcement learning ,MACHINE learning ,HEURISTIC ,STRUCTURAL frames - Abstract
This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in Earth observation requirements. Owing to the complicated constraints and considerable solution space of this problem, conventional exact and heuristic methods, which are sensitive to problem scale, incur high computational expense. An efficient approach is therefore needed, and this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. A mathematical model is first established to describe the problem; it captures a series of complex constraints and takes the profit ratio of completed tasks as the optimization objective. A neural network with an encoder–decoder structure is then adopted to generate high-quality solutions, with a local attention mechanism designed to improve solution generation. In addition, an adaptive learning rate strategy is proposed to guide the actor–critic training algorithm in dynamically adjusting the learning rate during training, enhancing the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in solution quality, generalization performance, and computational efficiency. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
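A local attention mechanism of the kind the record above describes restricts each query to a neighbourhood of the sequence. The sketch below is a generic reconstruction, not the paper's architecture: standard scaled dot-product attention with a banded mask of assumed window size w; the dimensions are arbitrary.

```python
# Sketch of a local attention layer (hypothetical window size): standard scaled
# dot-product attention where each task may only attend to the w nearest tasks
# in the sequence, rather than the full set.
import numpy as np

def local_attention(Q, K, V, w=2):
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                           # (n, n) compatibilities
    idx = np.arange(n)
    scores[np.abs(idx[:, None] - idx[None, :]) > w] = -1e9  # mask distant pairs
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)           # row-wise softmax
    return weights @ V

rng = np.random.default_rng(4)
n_tasks, d = 8, 16
Q, K, V = (rng.normal(size=(n_tasks, d)) for _ in range(3))
print(local_attention(Q, K, V, w=2).shape)  # (8, 16): each task mixes only neighbours
```

Restricting attention to nearby tasks matches the structure of observation scheduling, where only temporally adjacent candidate tasks compete for the same manoeuvre window.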
46. Reputation-Driven Asynchronous Federated Learning for Optimizing Communication Efficiency in Big Data Labeling Systems.
- Author
-
Sheng, Xuanzhu, Yu, Chao, Zhou, Yang, and Cui, Xiaolong
- Subjects
REINFORCEMENT learning ,FEDERATED learning ,GRAPH neural networks ,DEEP reinforcement learning ,TELECOMMUNICATION systems - Abstract
With the continuous improvement in the performance of artificial intelligence and neural networks, a new computing architecture, edge computing, has emerged. However, as hybrid intelligent edge systems scale up, redundant communications arise between nodes and the parameter server, and their cost cannot be ignored. This paper proposes a reputation-based asynchronous model update scheme and formulates the federated learning scheme as an optimization problem. First, an explainable reputation consensus mechanism for communication in hybrid intelligent labeling systems is proposed. Then, to address the significant consistency, personalization, and privacy-protection challenges that the federated recommendation system poses during local intelligent data annotation, a novel federated recommendation framework utilizing a graph neural network is developed, and information interaction model fusion is adopted to handle data heterogeneity and enhance the uniformity of distributed intelligent annotation. Furthermore, to mitigate communication delays and overhead, an asynchronous federated learning mechanism is devised on top of the proposed reputation consensus mechanism; it leverages deep reinforcement learning to optimize the selection of participating nodes, aiming to maximize system utility and streamline data sharing. Lastly, the learned models are integrated into blockchain technology and validated to ensure the reliability and security of shared data. Numerical findings underscore that the proposed federated learning scheme achieves higher learning accuracy and enhances communication efficiency. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
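The asynchronous, reputation-driven aggregation in the record above can be sketched in a few lines. This is a hypothetical reconstruction: the blending weight, staleness discount, and reputation update rule below are invented, standing in for the paper's consensus mechanism and DRL-based node selection.

```python
# Hypothetical reputation-weighted asynchronous aggregation: each arriving local
# update is blended into the global model with a weight that grows with the
# client's reputation and shrinks with the update's staleness. All numbers toy.
import numpy as np

rng = np.random.default_rng(5)
dim, n_clients = 10, 4
global_model = np.zeros(dim)
reputation = np.full(n_clients, 0.5)   # maintained by the consensus mechanism

def async_merge(g, local, client, staleness, base_lr=0.5):
    w = base_lr * reputation[client] / (1.0 + staleness)   # staleness discount
    return (1 - w) * g + w * local

for step in range(20):
    c = int(rng.integers(n_clients))
    staleness = int(rng.integers(0, 5))                    # rounds behind global
    local = global_model + rng.normal(0, 0.1, dim) + 0.2   # client's drifted model
    global_model = async_merge(global_model, local, c, staleness)
    reputation[c] = min(reputation[c] + 0.02, 1.0)         # toy reputation signal

print(global_model.round(3), reputation.round(2))
```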
47. Method for Bottle Opening with a Dual-Arm Robot.
- Author
-
Naranjo-Campos, Francisco J., Victores, Juan G., and Balaguer, Carlos
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,AUTONOMOUS robots ,REHABILITATION technology ,COMPUTER vision - Abstract
This paper introduces a novel approach to robotic assistance in bottle opening using the dual-arm robot TIAGo++. The solution enhances accessibility by addressing the needs of individuals with injuries or disabilities who may require help with common manipulation tasks. The aim of this paper is to propose a method involving vision, manipulation, and learning techniques to effectively address the task of bottle opening. The process begins with the acquisition of the bottle and cap positions using an RGB-D camera and computer vision. Subsequently, the robot picks up the bottle with one gripper and grips the cap with the other, planning safe trajectories for each arm. The opening procedure is then executed via a position and force control scheme that ensures both grippers follow the unscrewing path defined by the cap thread. Within the control loop, force sensor information is employed to control movement along the vertical axis, while gripper rotation is controlled by a Deep Reinforcement Learning (DRL) algorithm trained to determine the optimal angle increments for rotation. The results demonstrate the successful training of the learning agent, and the experiments confirm the effectiveness of the proposed method for bottle opening with the TIAGo++ robot, showcasing the practical viability of the approach. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
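The split the record above describes, force-regulated vertical motion plus policy-chosen rotation increments, can be mimicked in a toy loop. Everything below is invented: the thread pitch, stiffness, admittance gain, and the random choice that stands in for the trained DRL policy.

```python
# Toy unscrewing loop (all constants invented): vertical motion is regulated by
# the measured axial force while a policy — here a random stand-in for the
# trained DRL agent — chooses each rotation increment; the thread couples both.
import numpy as np

rng = np.random.default_rng(6)
pitch = 0.002                    # m of cap travel per full turn (assumed)
angle, height = 0.0, 0.0
k_stiff, k_admit = 5000.0, 1e-4  # contact stiffness (N/m), admittance gain

for cycle in range(300):
    d_theta = rng.choice([0.05, 0.10, 0.15])      # policy's angle increment (rad)
    angle += d_theta
    thread_height = pitch * angle / (2 * np.pi)   # height the thread dictates
    f_z = k_stiff * (thread_height - height)      # axial force from mismatch (N)
    height += k_admit * f_z                       # force-controlled vertical move
    if angle >= 2 * np.pi * 1.5:                  # assume cap free after 1.5 turns
        break

print(f"cap freed after {cycle + 1} cycles, residual axial force {f_z:.2f} N")
```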
48. Research on Energy Management in Hydrogen–Electric Coupled Microgrids Based on Deep Reinforcement Learning.
- Author
-
Shi, Tao, Zhou, Hangyu, Shi, Tianyu, and Zhang, Minghui
- Subjects
ARTIFICIAL neural networks ,DEEP reinforcement learning ,REINFORCEMENT learning ,HYDROGEN as fuel ,MICROGRIDS ,PHOTOVOLTAIC power generation - Abstract
Hydrogen energy represents an ideal medium for energy storage. By integrating hydrogen power conversion, utilization, and storage technologies with distributed wind and photovoltaic power generation, it is possible to achieve complementary utilization and synergistic operation of multiple energy sources in the form of microgrids. However, the diverse operational mechanisms, varying capacities, and distinct forms of the distributed energy sources within hydrogen-coupled microgrids complicate their operating conditions, making fine-grained scheduling management and economic operation challenging. In response, this paper proposes an energy management method for hydrogen-coupled microgrids based on the deep deterministic policy gradient (DDPG). The method leverages predictive information on photovoltaic generation, load power, and other factors, simulates energy management strategies with deep neural networks, and obtains the optimal strategy through reinforcement learning, ultimately achieving optimized operation of hydrogen-coupled microgrids under complex conditions and uncertainty. The paper analyzes typical case studies and compares the optimization performance of the deep deterministic policy gradient against deep Q-networks, validating the effectiveness and robustness of the proposed method. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
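For readers unfamiliar with DDPG, the algorithm named in the record above, here is a minimal generic skeleton. The state and action dimensions, network sizes, and the fake transition batch are assumptions for illustration; only the update structure (critic regression, deterministic policy gradient, Polyak-averaged targets) reflects standard DDPG.

```python
# Minimal generic DDPG update (hypothetical dimensions): the actor maps a
# microgrid-style state to continuous dispatch actions; the critic scores
# state-action pairs; soft target updates stabilize both networks.
import copy
import torch
import torch.nn as nn

state_dim, action_dim, gamma, tau = 6, 3, 0.99, 0.005
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(s, a, r, s2):
    # Critic: regress Q(s,a) onto the bootstrapped target r + gamma*Q'(s', pi'(s')).
    with torch.no_grad():
        q_target = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    loss_c = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # Actor: ascend the critic's value of its own actions (deterministic PG).
    loss_a = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # Polyak-average the target networks.
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

batch = 32
s, s2 = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
a, r = torch.rand(batch, action_dim) * 2 - 1, torch.randn(batch, 1)
ddpg_step(s, a, r, s2)   # one update on a fake transition batch
```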
49. Research on Self-Learning Control Method of Reusable Launch Vehicle Based on Neural Network Architecture Search.
- Author
-
Xue, Shuai, Wang, Zhaolei, Bai, Hongyang, Yu, Chunmei, and Li, Zian
- Subjects
DEEP reinforcement learning ,MACHINE learning ,LAUNCH vehicles (Astronautics) ,OPTIMIZATION algorithms ,ROCKETS (Aeronautics) ,REINFORCEMENT learning ,PARTICLE swarm optimization - Abstract
Reusable launch vehicles must cope with complex and diverse environments during flight. Designing a rocket recovery control law with traditional deep reinforcement learning (DRL) makes it difficult to obtain a network architecture that adapts to multiple scenarios and multi-parameter uncertainties, and the performance of DRL algorithms depends on manual trial-and-error tuning of hyperparameters. To solve this problem, this paper proposes a self-learning control method for launch vehicle recovery based on neural architecture search (NAS), which decouples the deep network structure search from reinforcement learning hyperparameter optimization. First, using network architecture search based on a multi-objective hybrid particle swarm optimization algorithm, the deep network architecture of the proximal policy optimization algorithm is designed automatically, with a lightweight search space adopted in the process. Second, to further improve the landing accuracy of the launch vehicle, Bayesian optimization (BO) is used to automatically tune the reinforcement learning hyperparameters, and the control law for the landing phase of launch vehicle recovery is obtained through training. Finally, the algorithm is ported to the rocket intelligent learning embedded platform for comparative testing to verify its online deployment capability. The simulation results show that the proposed method satisfies the landing accuracy requirements of the launch vehicle recovery mission; under untrained conditions of model parameter deviation and wind field interference, the control performance is essentially the same as the landing accuracy of the trained rocket model, which verifies the generalization of the proposed method. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
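The architecture-search step in the record above rests on particle swarm optimization. The sketch below is a deliberately simplified stand-in: single-objective PSO over two invented "architecture genes" (width, depth) against a made-up surrogate score, whereas the paper uses a multi-objective hybrid variant searching a real PPO network space.

```python
# Hypothetical stand-in for the record's NAS step: plain PSO over two
# architecture genes (layer width, depth) against an invented surrogate
# objective trading an accuracy proxy against network size.
import numpy as np

rng = np.random.default_rng(7)

def surrogate(x):
    width, depth = x
    quality = -(width - 48) ** 2 / 500 - (depth - 3) ** 2  # proxy peaks at (48, 3)
    cost = 0.01 * width * depth                            # lightweight-design penalty
    return quality - cost                                  # maximize

n, iters = 12, 60
pos = rng.uniform([8, 1], [128, 6], size=(n, 2))           # particle positions
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([surrogate(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, 1)), rng.random((n, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, [8, 1], [128, 6])             # keep genes in bounds
    vals = np.array([surrogate(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print(f"selected width/depth: {gbest.round(1)} (score {surrogate(gbest):.3f})")
```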
50. Autonomous Trajectory Planning Method for Stratospheric Airship Regional Station-Keeping Based on Deep Reinforcement Learning.
- Author
-
Liu, Sitong, Zhou, Shuyu, Miao, Jinggang, Shang, Hai, Cui, Yuxuan, and Lu, Ying
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,AIRSHIPS ,CLASSROOM environment ,LEARNING ability - Abstract
The stratospheric airship, as a near-space vehicle, is increasingly utilized in scientific exploration and Earth observation due to its long endurance and regional observation capabilities. However, the complex characteristics of the stratospheric wind field make trajectory planning for stratospheric airships a significant challenge. Unlike the lower atmosphere, the stratosphere presents a wind field with significant variability in wind speed and direction, which can drastically affect the stability of the airship's trajectory. Recent advances in deep reinforcement learning (DRL) have opened promising avenues for trajectory planning: DRL algorithms can learn complex control strategies autonomously by interacting with the environment. In particular, the proximal policy optimization (PPO) algorithm has shown effectiveness in continuous control tasks and is well suited to the non-linear, high-dimensional problem of trajectory planning in dynamic environments. This paper proposes a trajectory planning method for stratospheric airships based on the PPO algorithm. The primary contributions include establishing a continuous action space model for stratospheric airship motion, enabling more precise control across a broader range of actions; integrating time-varying wind field data into the reinforcement learning environment, enhancing the policy network's adaptability and generalization to varied environmental conditions; and enabling the algorithm to adjust and optimize flight paths automatically in real time using wind speed information, reducing the need for human intervention. Experimental results show that, within its wind resistance capability, the airship can achieve long-duration regional station-keeping, with a maximum station-keeping time ratio (STR) of up to 0.997. [ABSTRACT FROM AUTHOR] (An illustrative sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
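The station-keeping time ratio (STR) reported in the record above is simply the fraction of time spent inside the target region. The toy environment below makes that metric concrete; the dynamics, wind model, region radius, and naive aim-at-center controller are all invented, with the controller standing in for a trained PPO policy.

```python
# Toy station-keeping environment (all dynamics invented): the airship picks a
# heading each step, a time-varying wind drifts it, and the score is the
# station-keeping time ratio (STR) — the fraction of steps inside the region.
import numpy as np

radius, speed, steps = 50.0, 5.0, 500
pos = np.zeros(2)
inside = 0

for t in range(steps):
    wind = 4.0 * np.array([np.cos(t / 80), np.sin(t / 120)])  # time-varying wind
    heading = np.arctan2(-pos[1], -pos[0])                    # naive: aim at center
    pos += speed * np.array([np.cos(heading), np.sin(heading)]) + wind
    inside += np.linalg.norm(pos) <= radius                   # count in-region steps

print(f"STR = {inside / steps:.3f}")
```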