50 results
Search Results
2. Deep and Reinforcement Learning in Virtual Synchronous Generator: A Comprehensive Review.
- Author
- Ding, Xiaoke and Cao, Junwei
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *SYNCHRONOUS generators, *MICROGRIDS, *ARTIFICIAL intelligence, *DEEP learning, *ELECTRIC power distribution grids
- Abstract
The virtual synchronous generator (VSG) is an important concept and primary control method in modern power systems. The penetration of power-electronics-based distributed generators in the power grid introduces uncertainty and reduces the inertia of the system, thus increasing the risk of instability when disturbances occur. The VSG produces virtual inertia by introducing the dynamic characteristics of the synchronous generator, which provides inertia and makes it a grid-forming control method. The disadvantages of the VSG are that there are many parameters to be adjusted and its operation process is complicated. However, with the rapid development of artificial intelligence (AI) technology, the powerful adaptive learning capability of AI algorithms provides potential solutions to this issue. Two research hotspots are deep learning (DL) and reinforcement learning (RL). This paper presents a comprehensive review of these two techniques combined with VSG control in the energy internet (EI). Firstly, the basic principle and classification of the VSG are introduced. Next, the development of DL and RL algorithms is briefly reviewed. Then, recent research on VSG control based on DL and RL algorithms is summarized. Finally, some main challenges and study trends are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
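The virtual-inertia mechanism that this review surveys comes from emulating the synchronous generator's swing equation. A minimal sketch, assuming a simplified per-unit form with illustrative inertia (J) and damping (D) constants that are not taken from the paper:

```python
import math

def vsg_step(omega, p_m, p_e, J=0.2, D=15.0, omega0=2 * math.pi * 50, dt=1e-3):
    """One Euler step of a simplified VSG swing equation:
    J * d(omega)/dt = (P_m - P_e) / omega0 - D * (omega - omega0)."""
    domega = ((p_m - p_e) / omega0 - D * (omega - omega0)) / J
    return omega + dt * domega

# At power balance (P_m == P_e) the virtual rotor holds nominal frequency;
# a mechanical surplus accelerates it -- the inertia the VSG emulates.
omega_bal = vsg_step(2 * math.pi * 50, p_m=1000.0, p_e=1000.0)
omega_up = vsg_step(2 * math.pi * 50, p_m=1200.0, p_e=1000.0)
```

Parameters such as J and D here are exactly the kind of VSG gains that the DL/RL methods surveyed in the paper tune adaptively.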
3. Call for Papers.
- Subjects
- *ARTIFICIAL intelligence, *REINFORCEMENT learning, *MACHINE learning, *DEEP learning, *INTELLIGENT networks
- Abstract
The article reports that with the continued growth of IoT devices and their deployment, manually managing and connecting them is impractical and presents multiple challenges. To that end, Zero Touch Networks, which rely on software-based modules instead of dedicated proprietary hardware, become a viable potential solution. The overall aim of zero-touch networks is for machines to learn how to become more autonomous so that we can delegate complex, mundane tasks to them.
- Published
- 2022
- Full Text
- View/download PDF
4. Call for Papers.
- Subjects
- *ARTIFICIAL intelligence, *REINFORCEMENT learning, *MACHINE learning, *DEEP learning, *INTELLIGENT networks
- Published
- 2022
- Full Text
- View/download PDF
5. Bioinspired Artificial Intelligence Applications 2023.
- Author
- Wei, Haoran, Tao, Fei, Huang, Zhenghua, and Long, Yanhua
- Subjects
- *ARTIFICIAL intelligence, *DEEP learning, *REINFORCEMENT learning, *MACHINE learning, *DEEP reinforcement learning, *NATURAL language processing
- Abstract
This document discusses the rapid development of Artificial Intelligence (AI) and its bioinspired applications. It highlights the benefits of bioinspired AI, such as increased accuracy in image and speech processing, reduced cost and energy usage through edge devices, and enhanced bio-signal quality. However, it also acknowledges the challenges posed by improper AI utilization, such as the generation of fake news and security issues. The document calls for research papers on bioinspired AI applications to explore its potential and address these challenges. It includes examples of research papers that utilize deep reinforcement learning for robot task sequencing, propose a real-time multi-surveillance pedestrian target detection model, develop an intelligent breast mass classification approach, and introduce a bio-inspired object detection algorithm for remote sensing images. The document concludes by emphasizing the importance of biomimetic artificial intelligence in various fields and promoting further research in this area. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
6. Reinforcement Learning Applied to AI Bots in First-Person Shooters: A Systematic Review.
- Author
- Almeida, Pedro, Carvalho, Vitor, and Simões, Alberto
- Subjects
- *ARTIFICIAL intelligence, *MACHINE learning, *REINFORCEMENT learning, *VIDEO games, *DEEP learning, *EDUCATIONAL games
- Abstract
Reinforcement Learning is one of the many machine learning paradigms. With no labelled data, it is concerned with balancing the exploration and exploitation of an environment with one or more agents present in it. Recently, many breakthroughs have been made in the creation of these agents for video game machine learning development, especially in first-person shooters with platforms such as ViZDoom, DeepMind Lab, and Unity's ML-Agents. In this paper, we review the state of the art in the creation of Reinforcement Learning agents for use in multiplayer deathmatch first-person shooters. We selected various platforms, frameworks, and training architectures from various papers and examined each of them, analysing their uses. We compared each platform and training architecture, and then concluded whether machine learning agents can now face off against humans and whether they make for better gameplay than traditional Artificial Intelligence. In the end, we thought about future research and what researchers should keep in mind when exploring and testing this area. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Suppression of Roll Oscillations of a Canard-Configuration Model Using Fluid Effector and Reinforcement Learning.
- Author
- Dong, Yizhang, Shi, Zhiwei, Chen, Kun, and Ge, Zengran
- Subjects
- *FLOW separation, *PARTICLE image velocimetry, *JETS (Fluid dynamics), *REINFORCEMENT learning, *DEEP learning, *OSCILLATIONS, *ARTIFICIAL intelligence, *WIND tunnels
- Abstract
High-angle-of-attack uncommanded roll oscillations are dangerous and can cause significant challenges in flight control. This paper constructs a stability augmented system to suppress roll oscillations with nonzero mean roll angles in a canard-configuration model. To overcome the problem of weak traditional ailerons caused by large-scale flow separations at high angles of attack, spanwise blowing was used as a fluid effector to generate lateral control moments. The control effect and mechanism of spanwise blowing were analyzed through the results of force measurements and of experiments using particle image velocimetry (PIV), respectively. Spanwise blowing generates the control moment by changing the trajectory of the leading-edge vortex and delaying vortex breakdown. Subsequently, virtual flight experiment technology was used to train a policy for the stability augmented system based on real-world data using deep reinforcement learning in the wind tunnel. When testing the agent, the transient flow fields around the model were obtained synchronously using time-resolved particle image velocimetry (TR-PIV). The test results showed that the agent learned to keep the model roll at approximately zero by effectively controlling the flow field using fluid effectors. The rapid development of artificial intelligence (AI) brings new ideas to various industries. Among various AI technologies, deep reinforcement learning is a self-evolving technique suitable for solving complex control and decision-making problems. On the other hand, the complex dynamic characteristics of aircraft at high angles of attack lead to the emergence of uncommanded motions, making flight dangerous. Therefore, this paper focuses on suppressing the uncommanded motion of a canard configuration at high angles of attack using deep reinforcement learning. Jet flow control was used to play the role of the aileron. After sufficient training in the wind tunnel, the AI finally learned how to suppress the uncommanded motion of the aircraft and showed interesting behavioral logic. The results of this paper show that deep reinforcement learning can be used for complex control problems in aerospace science; however, the practicality of deep reinforcement learning in real flight needs further verification. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
8. Enhancing Short Track Speed Skating Performance through Improved DDQN Tactical Decision Model.
- Author
- Yang, Yuanbo, Li, Feimo, and Chang, Hongxing
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *DEEP learning, *ARTIFICIAL intelligence, *SPEED, *OLYMPIC Winter Games
- Abstract
This paper studies a tactical decision-making model for short track speed skating based on deep reinforcement learning, so as to improve the competitive performance of short track speed skaters. Short track speed skating, a traditional discipline in the Winter Olympics since its establishment in 1988, has consistently garnered attention. As artificial intelligence continues to advance, the utilization of deep learning methods to enhance athletes' tactical decision-making capabilities has become increasingly prevalent. Traditional tactical decision techniques often rely on the experience and knowledge of coaches and on video analysis methods that require a lot of time and effort. Consequently, this study proposes a scientific simulation environment for short track speed skating that accurately simulates the physical attributes of the venue, the physiological fitness of the athletes, and the rules of the competition. The Double Deep Q-Network (DDQN) model is enhanced and utilized, with improvements to the reward function and distinct descriptions of four tactics. This enables agents to learn optimal tactical decisions in various competitive states within the simulation environment. Experimental results demonstrate that this approach effectively enhances the competitive performance and physiological fitness allocation of short track speed skaters. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
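The Double DQN at the core of the tactical model above differs from vanilla DQN only in how the bootstrap target is formed. A minimal sketch of that target, with plain value lists standing in for network outputs and illustrative numbers:

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the greedy next action,
    the target network evaluates it, reducing Q-value overestimation."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[a_star]

# The online net prefers action 1, but the target net's estimate for it is used:
y = ddqn_target(1.0, next_q_online=[0.2, 0.9, 0.4], next_q_target=[0.3, 0.5, 0.8])
# y = 1.0 + 0.99 * 0.5 = 1.495
```

The paper's contribution sits on top of this machinery, in the improved reward function and tactic descriptions rather than in the target itself.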
9. Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance.
- Author
- Arranz, Raúl, Carramiñana, David, Miguel, Gonzalo de, Besada, Juan A., and Bernardos, Ana M.
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *DRONE aircraft, *DEEP learning, *SECURITIES industry laws
- Abstract
This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement-learning-based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi-agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground targets, for security and law enforcement applications. The swarm is governed by a central swarm controller responsible for distributing different search and tracking tasks among the cooperating UAVs. Each UAV agent is then controlled by a collection of cooperative sub-agents, whose behaviors have been trained using different deep reinforcement learning models, tailored for the different task types proposed by the swarm controller. More specifically, proximal policy optimization (PPO) algorithms were used to train the agents' behavior. In addition, several metrics to assess the performance of the swarm in this application were defined. The results obtained through simulation show that our system searches the operation area effectively, acquires the targets in a reasonable time, and is capable of tracking them continuously and consistently. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
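The proximal policy optimization used to train the swarm sub-agents above maximizes a clipped surrogate objective that caps how far one update can move the policy. A minimal per-sample sketch with scalar inputs for clarity:

```python
def ppo_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate (to be maximized):
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# A probability ratio of 1.5 with positive advantage is clipped at 1.2,
# removing the incentive to push the policy further in one step:
obj = ppo_objective(1.5, 2.0)   # min(3.0, 1.2 * 2.0) = 2.4
```

Taking the minimum makes the bound pessimistic in both directions, which is what keeps each sub-agent's behavior update conservative.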
10. Deep and Reinforcement Learning Technologies on Internet of Vehicle (IoV) Applications: Current Issues and Future Trends.
- Author
- Elmoiz Alatabani, Lina, Sayed Ali, Elmustafa, Mokhtar, Rania A., Saeed, Rashid A., Alhumyani, Hesham, and Kamrul Hasan, Mohammad
- Subjects
- *REINFORCEMENT learning, *DEEP learning, *ARTIFICIAL intelligence, *MACHINE learning, *INTERNET, *QUALITY of service
- Abstract
Recently, artificial intelligence (AI) technology has received great attention in transportation systems, which has led to the emergence of a new concept known as the Internet of Vehicles (IoV). The IoV has been associated with the IoT revolution and has become an active field of research due to the great need for it and the increase in the various applications of vehicle communication. AI also provides unique solutions to enhance the quality of service (QoS) and performance of IoV systems. In this paper, some concepts related to deep learning networks are discussed as one of the uses of machine learning in IoV systems, in addition to studying the effect of neural networks (NNs) and their types, as well as deep learning mechanisms that help in processing large amounts of unclassified data. Moreover, this paper briefly discusses the classification and clustering approaches in predictive analysis and reviews their abilities to enhance the performance of IoV application systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. LoRaWAN Meets ML: A Survey on Enhancing Performance with Machine Learning.
- Author
- Farhad, Arshad and Pyun, Jae-Young
- Subjects
- *MACHINE learning, *WIDE area networks, *MACHINE performance, *TELECOMMUNICATION, *NETWORK performance
- Abstract
The Internet of Things is rapidly growing with the demand for low-power, long-range wireless communication technologies. Long Range Wide Area Network (LoRaWAN) is one such technology that has gained significant attention in recent years due to its ability to provide long-range communication with low power consumption. One of the main issues in LoRaWAN is the efficient utilization of radio resources (e.g., spreading factor and transmission power) by the end devices. To solve the resource allocation issue, machine learning (ML) methods have been used to improve the LoRaWAN network performance. The primary aim of this survey paper is to study and examine the issue of resource management in LoRaWAN that has been resolved through state-of-the-art ML methods. Further, this survey presents the publicly available LoRaWAN frameworks that could be utilized for dataset collection, discusses the required features for efficient resource management with suggested ML methods, and highlights the existing publicly available datasets. The survey also explores and evaluates the Network Simulator-3-based ML frameworks that can be leveraged for efficient resource management. Finally, future recommendations regarding the applicability of the ML applications for resource management in LoRaWAN are illustrated, providing a comprehensive guide for researchers and practitioners interested in applying ML to improve the performance of the LoRaWAN network. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. An Improved Distributed Sampling PPO Algorithm Based on Beta Policy for Continuous Global Path Planning Scheme.
- Author
- Xiao, Qianhao, Jiang, Li, Wang, Manman, and Zhang, Xin
- Subjects
- *DISTRIBUTED algorithms, *REINFORCEMENT learning, *NAVIGATION in shipping, *ALGORITHMS
- Abstract
Traditional path planning is mainly utilized for path planning in discrete action space, which results in incomplete ship navigation power propulsion strategies during the path search process. Moreover, reinforcement learning suffers from low success rates due to unbalanced sample collection and an unreasonably designed reward function. In this paper, an environment framework is designed, constructed using the Box2D physics engine, which employs a reward function with the distance between the agent and the arrival point as the main term, and the potential field superimposed by the boundary control, obstacles, and arrival point as a supplement. We also employ the state-of-the-art PPO (Proximal Policy Optimization) algorithm as a baseline for global path planning to address the issue of incomplete ship navigation power propulsion strategies. Additionally, a Beta policy-based distributed sample collection PPO algorithm is proposed to overcome the problem of unbalanced sample collection in path planning by dividing sub-regions to achieve distributed sample collection. The experimental results show the following: (1) The distributed sample collection training policy exhibits stronger robustness in the PPO algorithm; (2) The introduced Beta policy for action sampling results in a higher path planning success rate and reward accumulation than the Gaussian policy at the same training time; (3) When planning a path of the same length, the proposed Beta policy-based distributed sample collection PPO algorithm generates a smoother path than traditional path planning algorithms, such as A*, IDA*, and Dijkstra. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
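The appeal of the Beta policy described above is that its support is naturally bounded, so sampled actions can be rescaled to the valid control range without the clipping a Gaussian policy needs. A minimal stdlib-only sampling sketch; the action bounds and Beta parameters are illustrative, and in the actual algorithm a network would output alpha and beta per state:

```python
import random

def beta_action(alpha, beta, low=-1.0, high=1.0):
    """Sample from Beta(alpha, beta) on (0, 1) and rescale to [low, high];
    every sample is in range by construction, unlike a clipped Gaussian."""
    return low + (high - low) * random.betavariate(alpha, beta)

random.seed(0)
actions = [beta_action(2.0, 2.0) for _ in range(1000)]
```

Because no samples are clipped to the boundary, the policy gradient is not biased by probability mass piling up at the action limits.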
13. Deep Neural Networks in Power Systems: A Review.
- Author
- Khodayar, Mahdi and Regan, Jacob
- Subjects
- *ARTIFICIAL neural networks, *DEEP learning, *REINFORCEMENT learning, *ARTIFICIAL intelligence, *TRENDS, *ELECTRIC power distribution grids
- Abstract
Identifying statistical trends for a wide range of practical power system applications, including sustainable energy forecasting, demand response, energy decomposition, and state estimation, is regarded as a significant task given the rapid expansion of power system measurements in terms of scale and complexity. In the last decade, deep learning has arisen as a new kind of artificial intelligence technique that expresses power grid datasets via an extensive hypothesis space, resulting in an outstanding performance in comparison with the majority of recent algorithms. This paper investigates the theoretical benefits of deep data representation in the study of power networks. We examine deep learning techniques described and deployed in a variety of supervised, unsupervised, and reinforcement learning scenarios. We explore different scenarios in which discriminative deep frameworks, such as Stacked Autoencoder networks and Convolution Networks, and generative deep architectures, including Deep Belief Networks and Variational Autoencoders, solve problems. This study's empirical and theoretical evaluation of deep learning encourages long-term studies on improving this modern category of methods to accomplish substantial advancements in the future of electrical systems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Table-Balancing Cooperative Robot Based on Deep Reinforcement Learning.
- Author
- Kim, Yewon, Kim, Dae-Won, and Kang, Bo-Yeong
- Subjects
- *REINFORCEMENT learning, *DEEP learning, *ROBOTS, *ARTIFICIAL intelligence, *HUMAN behavior
- Abstract
Reinforcement learning is an artificial intelligence method that enables robots to judge situations and act on their own by learning to perform tasks. Previous reinforcement learning research has mainly focused on tasks performed by individual robots; however, everyday tasks, such as balancing tables, often require cooperation between two individuals to avoid injury when moving. In this research, we propose a deep reinforcement learning-based technique for robots to perform a table-balancing task in cooperation with a human. The cooperative robot proposed in this paper recognizes human behavior to balance the table. This recognition is achieved by utilizing the robot's camera to take an image of the state of the table, after which the table-balance action is performed. The deep Q-network (DQN) is the deep reinforcement learning technique applied to the cooperative robot. As a result of learning table balancing, the cooperative robot showed, on average, a 90% optimal policy convergence rate over 20 training runs with optimal hyperparameters applied to the DQN-based technique. In the hardware experiment, the trained DQN-based robot achieved an operation precision of 90%, verifying its excellent performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
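A DQN agent like the table-balancing robot above picks actions by trading exploration against exploitation of its learned Q-values; epsilon-greedy selection is the standard mechanism. A sketch with made-up Q-values:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon (exploration),
    otherwise the action with the highest estimated Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is pure exploitation -> index of the max Q-value:
greedy = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
```

During training, epsilon is typically annealed from near 1 toward a small floor so early runs explore and later runs exploit the converged policy.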
15. Terahertz Meets AI: The State of the Art.
- Author
- Farhad, Arshad and Pyun, Jae-Young
- Subjects
- *TERAHERTZ technology, *DEEP learning, *ARTIFICIAL intelligence, *SPECTRUM allocation, *WIRELESS communications, *REINFORCEMENT learning, *DATA transmission systems
- Abstract
Terahertz (THz) is a promising technology for future wireless communication networks, particularly for 6G and beyond. The ultra-wide THz band, ranging from 0.1 to 10 THz, can potentially address the limited capacity and scarcity of spectrum in current wireless systems such as 4G-LTE and 5G. Furthermore, it is expected to support advanced wireless applications requiring high data transmission and quality services, i.e., terabit-per-second backhaul systems, ultra-high-definition streaming, virtual/augmented reality, and high-bandwidth wireless communications. In recent years, artificial intelligence (AI) has been used mainly for resource management, spectrum allocation, modulation and bandwidth classification, interference mitigation, beamforming, and medium access control layer protocols to improve THz performance. This survey paper examines the use of AI in state-of-the-art THz communications, discussing the challenges, potentials, and shortcomings. Additionally, this survey discusses the available platforms, including commercial, testbeds, and publicly available simulators for THz communications. Finally, this survey provides future strategies for improving the existing THz simulators and using AI methods, including deep learning, federated learning, and reinforcement learning, to improve THz communications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Artificial Intelligence and Computational Issues in Engineering Applications.
- Author
- Grabowska, Karolina, Krzywanski, Jaroslaw, Sosnowski, Marcin, and Skrobek, Dorian
- Subjects
- *COMPUTATIONAL intelligence, *ARTIFICIAL intelligence, *DEEP learning, *ENGINEERING, *REINFORCEMENT learning, *FLUIDIZED-bed combustion, *CURVE fitting, *MASS transfer
- Abstract
The experimental results presented in the paper, achieved using real datasets from Shanghai Telecom, indicate that DQN-ESPA outperforms state-of-the-art algorithms such as the simulated annealing placement algorithm, Top-K placement algorithm, K-Means placement algorithm, and random placement algorithm. High-performance supercomputers and emerging computing clusters created in research and development centres are rapidly increasing the available computing power, which scientists are eager to use to implement increasingly advanced computing methods [[1]]. Thus, computationally demanding artificial intelligence algorithms and computational fluid dynamics methods are used more widely to consider complex engineering issues and to verify and provide new information on entropy and information theory concepts [[2]]. As can be seen, original research articles as well as review articles focused on optimization by artificial intelligence (AI) algorithms on computational and entropy issues have been submitted to the Special Issue. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
18. Deep Reinforcement Learning for the Detection of Abnormal Data in Smart Meters.
- Author
- Sun, Shuxian, Liu, Chunyu, Zhu, Yiqun, He, Haihang, Xiao, Shuai, and Wen, Jiabao
- Subjects
- *REINFORCEMENT learning, *SMART meters, *SMART power grids, *REWARD (Psychology), *ARTIFICIAL intelligence, *DEEP learning
- Abstract
The rapidly growing power data in smart grids have created difficulties in security management. The processing of large-scale power data with the use of artificial intelligence methods has become a hotspot research topic. Considering the early warning detection problem of smart meters, this paper proposes an abnormal data detection network based on Deep Reinforcement Learning, which includes a main network and a target network composed of deep learning networks. This work uses the greedy policy algorithm to find the action of the maximum value of Q based on the Q-learning method to obtain the optimal calculation policy. It also uses the reward value and discount factor to optimize the target value. In particular, this study uses the fuzzy c-means method to predict the future state information value, which improves the computational accuracy of the Deep Reinforcement Learning model. The experimental results show that compared with the traditional smart meter data anomaly detection method, the proposed model improves the accuracy of meter data anomaly detection. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
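The greedy target described above, the reward plus the discounted maximum Q-value of the next state, is the standard Q-learning update; the paper's fuzzy c-means prediction then supplies a better next-state estimate. A minimal tabular sketch (toy Q-table, not the paper's network):

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q

q = [[0.0, 0.0], [1.0, 2.0]]
q = q_update(q, state=0, action=0, reward=1.0, next_state=1)
# target = 1.0 + 0.9 * 2.0 = 2.8, so Q(0, 0) moves from 0.0 to 0.1 * 2.8 = 0.28
```

In the deep variant the paper uses, the main network plays the role of `q` and a periodically synchronized target network evaluates `q[next_state]`.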
19. End-to-End Deep Policy Feedback-Based Reinforcement Learning Method for Quantization in DNNs.
- Author
- Logesh Babu, R., Gurumoorthy, Sasikumar, Parameshachari, B. D., Christalin Nelson, S., and Hua, Qiaozhi
- Subjects
- *ARTIFICIAL neural networks, *REINFORCEMENT learning, *ARTIFICIAL intelligence, *DEEP learning, *GRAPHICS processing units
- Abstract
In resource-constrained embedded systems, designing efficient deep neural networks is a challenging process due to the diversity of artificial intelligence applications. Quantization in deep neural networks greatly diminishes storage and computational time by reducing the bit-width of the networks' encoding. To address the problem of accuracy loss, the quantization levels are automatically discovered using the Policy Feedback-based Reinforcement Learning Method (PF-RELEQ). In this paper, the Proximal Policy Optimization with Policy Feedback (PPO-PF) technique is proposed to determine the best design decisions by choosing the optimum hyper-parameters. In order to enhance the sensitivity of the value function to changes of policy and to improve the accuracy of value estimation at the early learning stage, a policy update method is devised based on the clipped discount factor. In addition, the policy loss functions satisfy an unbiased estimation of the trust region. The proposed PF-RELEQ effectively balances quality and speed compared to other deep learning methods like ResNet-1202, ResNet-32, ResNet-110, GoogLeNet and AlexNet. The experimental analysis showed that PF-RELEQ achieved a 20% computational workload reduction compared to existing deep learning methods on the ImageNet, CIFAR-10, CIFAR-100 and tomato leaf disease datasets and an approximately 2% improvement in validation accuracy. Additionally, PF-RELEQ needs only 0.55 Graphics Processing Unit on an NVIDIA GTX-1080Ti to develop DNNs that deliver better accuracy improvement with fewer cycle counts for image classification. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
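The bit-width reduction that PF-RELEQ searches over can be illustrated with plain uniform symmetric quantization; the scheme below is a generic sketch of the idea, not the paper's exact quantizer:

```python
def quantize(weights, bits):
    """Uniformly quantize a weight list to a signed bit-width; fewer bits
    mean less storage and compute, at the cost of rounding error."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 representable magnitudes at 8 bits
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

w8 = quantize([0.5, -1.0, 0.25], bits=8)   # rounding error bounded by scale / 2
w2 = quantize([0.5, -1.0, 0.25], bits=2)   # much coarser grid, larger error
```

What the RL search contributes is choosing `bits` per layer, trading this rounding error against the workload reduction reported in the abstract.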
20. Deep reinforcement learning for six degree-of-freedom planetary landing.
- Author
- Gaudet, Brian, Linares, Richard, and Furfaro, Roberto
- Subjects
- *DEEP learning, *REINFORCEMENT learning, *AUTOMOTIVE navigation systems, *REAL-time control, *DISCOUNT prices
- Abstract
This work develops a deep reinforcement learning based approach for Six Degree-of-Freedom (DOF) planetary powered descent and landing. Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse <5 m radius). This requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The latter is used to learn a policy mapping the lander's estimated state directly to a commanded thrust for each engine, resulting in accurate and almost fuel-optimal trajectories over a realistic deployment ellipse. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment and demonstrate robustness to noise and system parameter uncertainty. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
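The split-discounting contribution above amounts to computing the return with one rate for per-step shaping rewards and another for the terminal reward. A minimal sketch; the specific rates below are illustrative assumptions, not the paper's values:

```python
def split_return(shaping_rewards, terminal_reward, gamma_shape=0.95, gamma_term=1.0):
    """Discounted return with separate rates for shaping rewards (guiding every
    step of the descent) and the terminal reward (scoring the final landing)."""
    shaped = sum(gamma_shape ** t * r for t, r in enumerate(shaping_rewards))
    return shaped + gamma_term ** len(shaping_rewards) * terminal_reward

# Leaving the terminal reward undiscounted (gamma_term = 1.0) keeps the landing
# objective from vanishing over a long descent trajectory:
g = split_return([1.0, 1.0], terminal_reward=10.0, gamma_shape=0.5)
# g = 1.0 + 0.5 + 10.0 = 11.5
```

With a single discount rate, a distant terminal reward would be scaled down by gamma raised to the episode length, which is the optimization problem the two-rate scheme sidesteps.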
21. Intelligent traffic light systems using edge flow predictions.
- Author
- Thahir, Adam Rizvi, Coşkun, Mustafa, Kılıç, Sultan Kübra, and Gungor, Vehbi Cagri
- Subjects
- *SUPERVISED learning, *TRAFFIC flow, *REINFORCEMENT learning, *TRAFFIC signs & signals, *DEEP learning, *ACTIVE learning, *ARTIFICIAL intelligence
- Abstract
In this paper, we propose a novel graph-based semi-supervised learning approach for traffic light management in multiple intersections. Specifically, the basic premise behind our paper is that if we know some of the occupied roads and predict which roads will be congested, we can dynamically change traffic lights at the intersections that are connected to the roads anticipated to be congested. Comparative performance evaluations show that the proposed approach can produce comparable average vehicle waiting time and reduce the training/learning time of learning adequate traffic light configurations for all intersections within a few seconds, while a deep learning-based approach can be trained in a few days for learning similar light configurations.
• With this work we aimed to optimize traffic signal systems by dynamically setting the system to behave as per predicted traffic flow.
• Vehicle flow is predicted using graph-based semi-supervised and active learning on edge flows.
• The systems were created and tested using Simulation Of Urban Mobility (SUMO).
• Vehicle data and routes were also generated using this application.
• The proposed approach was compared against other methods, such as an occupancy-based method, a scoring-based method and reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Coordinated Wide-Area Damping Control Using Deep Neural Networks and Reinforcement Learning.
- Author
- Gupta, Pooja, Pal, Anamitra, and Vittal, Vijay
- Subjects
- *REINFORCEMENT learning, *STATIC VAR compensators, *LINEAR matrix inequalities, *DEEP learning, *ARTIFICIAL intelligence
- Abstract
This paper proposes the design of two coordinated wide-area damping controllers (CWADCs) for damping low frequency oscillations (LFOs), while accounting for the uncertainties present in the power system. The controllers, based on a Deep Neural Network (DNN) and Deep Reinforcement Learning (DRL), respectively, coordinate the operation of different local damping controls such as power system stabilizers (PSSs), static VAr compensators (SVCs), and supplementary damping controllers for DC lines (DC-SDCs). The DNN-CWADC learns to make control decisions using supervised learning; its training dataset consists of polytopic controllers designed with the help of linear matrix inequality (LMI)-based mixed $H_2/H_\infty$ optimization. The DRL-CWADC learns to adapt to the system uncertainties based on its continuous interaction with the power system environment by employing an advanced version of the state-of-the-art deep deterministic policy gradient (DDPG) algorithm, referred to as bounded exploratory control-based DDPG (BEC-DDPG). The studies performed on a 33-machine, 127-bus equivalent model of the Western Electricity Coordinating Council (WECC) system, embedded with different types of damping controls, demonstrate the effectiveness of the proposed CWADCs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning.
- Author
-
Zhang, Yifan, Zhao, Peilin, Wu, Qingyao, Li, Bin, Huang, Junzhou, and Tan, Mingkui
- Subjects
- *
PORTFOLIO management (Investments) , *DEEP learning , *REWARD (Psychology) , *ARTIFICIAL intelligence , *TRANSACTION costs , *REINFORCEMENT learning - Abstract
Portfolio Selection is an important real-world financial task and has attracted extensive attention in artificial intelligence communities. This task, however, has two main difficulties: (i) the non-stationary price series and complex asset correlations make the learning of feature representations very hard; (ii) the practicality principle in financial markets requires controlling both transaction and risk costs. Most existing methods adopt handcrafted features and/or impose no constraints on the costs, which may make them perform unsatisfactorily and fail to control both costs in practice. In this paper, we propose a cost-sensitive portfolio selection method with deep reinforcement learning. Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations, while a new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning. We theoretically analyze the near-optimality of the proposed reward, which shows that the growth rate of the policy regarding this reward function can approach the theoretical optimum. We also empirically evaluate the proposed method on real-world datasets. Promising results demonstrate the effectiveness and superiority of the proposed method in terms of profitability, cost-sensitivity, and representation abilities. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
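The cost-sensitive reward in entry 23 above pairs a return-maximizing term with transaction- and risk-cost penalties. A minimal sketch of that idea follows; the function name, coefficients, and exact penalty forms are our own assumptions for illustration, not the paper's formulation:

```python
import math

def cost_sensitive_reward(prev_w, new_w, price_relatives, c_tx=0.0025, c_risk=0.1):
    """Illustrative cost-sensitive reward (names and forms are assumptions):
    log portfolio growth, penalized by rebalancing turnover (transaction
    cost) and a simple concentration-based risk proxy."""
    # gross log-return of holding new_w over the period
    growth = math.log(sum(w * y for w, y in zip(new_w, price_relatives)))
    # transaction cost proportional to rebalancing turnover
    turnover = sum(abs(a - b) for a, b in zip(new_w, prev_w))
    # risk proxy: concentration of the portfolio weights
    risk = sum(w * w for w in new_w)
    return growth - c_tx * turnover - c_risk * risk
```

With no rebalancing the turnover term vanishes, so only the risk proxy is charged; any rebalance strictly lowers the reward through the turnover penalty.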
24. Learning Channel-Wise Interactions for Binary Convolutional Neural Networks.
- Author
-
Wang, Ziwei, Lu, Jiwen, and Zhou, Jie
- Subjects
- *
CONVOLUTIONAL neural networks , *REINFORCEMENT learning , *ARTIFICIAL intelligence , *DEEP learning , *BINARY operations - Abstract
In this paper, we propose a channel-wise interaction-based binary convolutional neural network (CI-BCNN) approach for efficient inference. Conventional binary convolutional neural networks usually apply xnor and bitcount operations in the binary convolution with notable quantization errors, which produce opposite signs of pixels in binary feature maps compared to their full-precision counterparts and lead to significant information loss. In our proposed CI-BCNN method, we exploit channel-wise interactions with prior knowledge to alleviate the inconsistency of signs in binary feature maps and preserve the information of input samples during inference. Specifically, we mine the channel-wise interactions using a reinforcement learning model and impose channel-wise priors on the intermediate feature maps to correct inconsistent signs through the interacted bitcount. Since CI-BCNN mines the channel-wise interactions in a large search space where each channel may correlate with others, the search deficiency caused by sparse interactions hinders the agent from obtaining the optimal policy. To address this, we further present a hierarchical channel-wise interaction-based binary convolutional neural network (HCI-BCNN) method that shrinks the search space via hierarchical reinforcement learning. Moreover, we propose a denoising interacted bitcount operation in the binary convolution by smoothing the channel-wise interactions, so that noise in the channel-wise priors can be alleviated. Extensive experimental results on the CIFAR-10 and ImageNet datasets demonstrate the effectiveness of the proposed CI-BCNN and HCI-BCNN. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
25. Distribution System Resilience Under Asynchronous Information Using Deep Reinforcement Learning.
- Author
-
Bedoya, Juan Carlos, Wang, Yubo, and Liu, Chen-Ching
- Subjects
- *
REINFORCEMENT learning , *LARGE scale systems , *ARTIFICIAL intelligence , *DEEP learning - Abstract
Resilience of a distribution system can be enhanced by efficient restoration of critical load following a major outage. Existing models include optimization approaches that consider available information without incorporating the inherent asynchrony of data arrival during execution of the restoration plan. Failure to consider the asynchronous nature of information arrival can lead to underutilization of critical resources. Moreover, analytical models become computationally inefficient for large-scale systems. On the other hand, artificial intelligence (AI)-based tools have demonstrated efficient results for power system applications. In this paper, a Reinforcement Learning (RL) model is proposed that learns how to efficiently restore a distribution system after a major outage. The proposed approach is based on a Monte Carlo Tree Search to expedite the training process. The proposed model strategy provides a robust decision-making tool for asynchronous and partial information scenarios. The results, validated with the IEEE 13-bus test feeder and IEEE 8500-node distribution test feeder, demonstrate the effectiveness and scalability of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
26. Method of artificial intelligence algorithm to improve the automation level of Rietveld refinement.
- Author
-
Feng, Zhenjie, Hou, Qiang, Zheng, Yonglei, Ren, Wei, Ge, Jun-Yi, Li, Tao, Cheng, Cheng, Lu, Wencong, Cao, Shixun, Zhang, Jincang, and Zhang, Tongyi
- Subjects
- *
ARTIFICIAL intelligence , *RIETVELD refinement , *MARKOV processes , *REINFORCEMENT learning , *DEEP learning , *X-ray powder diffraction - Abstract
In this paper, an artificial intelligence (AI) algorithm is used in the Rietveld refinement process in place of human decision-making. The program, PowderBot, is developed based on the FullProf engine and demonstrates the effectiveness of AI in Rietveld refinement. In this program, decision-making in the refinement process is modelled as a Markov decision process (MDP) and solved by a reinforcement learning algorithm. PowderBot is designed to be a self-learning system capable of conducting structure refinement without human intervention. The program has already been successfully applied to Rietveld refinements. We hope this paper will encourage more Rietveld programs to become more intelligent with the help of AI algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
27. Autonomous PEV Charging Scheduling Using Dyna-Q Reinforcement Learning.
- Author
-
Wang, Fan, Gao, Jie, Li, Mushu, and Zhao, Lian
- Subjects
- *
FUTURES sales & prices , *DEEP learning , *ENERGY consumption , *MARKOV processes , *ARTIFICIAL intelligence , *REINFORCEMENT learning - Abstract
This paper proposes a demand response method to reduce the long-term charging cost of a single plug-in electric vehicle (PEV) while overcoming obstacles such as the stochastic nature of the user's driving behaviour, traffic conditions, energy usage, and energy price. The problem is formulated as a Markov Decision Process (MDP) with an unknown transition probability matrix and solved using deep reinforcement learning (RL) techniques. The proposed method does not require any initial data on the PEV driver's behaviour and shows improved learning speed compared to a pure model-free reinforcement learning method. A combination of model-based and model-free learning methods called Dyna-Q reinforcement learning is utilized in our strategy. Every time a real experience is obtained, the model is updated, and the RL agent learns from both the real experience and “imagined” experiences from the model. Due to the vast state space, a table-lookup method is impractical, and a value approximation method using deep neural networks is employed for estimating the long-term expected reward of all state-action pairs. An average of historical prices and a long short-term memory (LSTM) network are used to predict future prices. Simulation results demonstrate the effectiveness of this approach and its ability to reach an optimal policy more quickly while avoiding state of charge (SOC) depletion during trips, compared to existing PEV charging schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
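The Dyna-Q loop described in entry 27 above — learn from each real transition, update a learned model, then replay "imagined" transitions from that model — can be sketched in tabular form. This is a generic textbook Dyna-Q sketch under our own toy assumptions (state/action sets, hyperparameters), not the paper's deep-network implementation:

```python
import random
from collections import defaultdict

def dyna_q(env_step, states, actions, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: each real transition also trains a learned model,
    which is replayed for `planning_steps` 'imagined' updates."""
    Q = defaultdict(float)   # Q[(state, action)] -> expected return
    model = {}               # model[(state, action)] -> (reward, next_state)
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(50):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            r, s2 = env_step(s, a)  # real experience
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a_)] for a_ in actions)
                                  - Q[(s, a)])
            model[(s, a)] = (r, s2)  # update the learned model
            for _ in range(planning_steps):  # imagined experience
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, a_)] for a_ in actions)
                                        - Q[(ps, pa)])
            s = s2
    return Q
```

The paper replaces the table with a deep value approximator because the PEV state space is too large for a lookup table; the planning loop is what accelerates learning relative to pure model-free RL.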
28. Learn to Make Decision with Small Data for Autonomous Driving: Deep Gaussian Process and Feedback Control.
- Author
-
Fang, Wenqi, Zhang, Shitian, Huang, Hui, Dang, Shaobo, Huang, Zhejun, Li, Wenfei, Wang, Zheng, Sun, Tianfu, and Li, Huiyun
- Subjects
- *
GAUSSIAN processes , *REAL-time control , *DECISION making , *REINFORCEMENT learning , *ARTIFICIAL intelligence , *DEEP learning - Abstract
Autonomous driving is a popular and promising field in artificial intelligence. Rapid decision of the next action according to the latest few actions and statuses, such as acceleration, braking, and steering angle, is a major concern for autonomous driving. There are learning methods, such as reinforcement learning, that learn such decisions automatically; however, they usually require a large volume of samples. In this paper, to reduce the sample size, we exploit the deep Gaussian process, where a regression model is trained on small sample datasets and captures the most significant features correctly. Besides, to realize real-time, closed-loop control, we combine feedback control into the process. Experimental results on the TORCS simulation engine illustrate that smooth driving on a virtual road can be achieved. Compared with the amount of training data in deep reinforcement learning, our method uses only 0.34% of its size and obtains similar simulation results. It may be useful for real road tests in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
29. Multi-granularity fusion resource allocation algorithm based on dual-attention deep reinforcement learning and lifelong learning architecture in heterogeneous IIoT.
- Author
-
Wang, Ying, Shang, Fengjun, and Lei, Jianjun
- Subjects
- *
RESOURCE allocation , *METACOGNITION , *ARTIFICIAL intelligence , *REINFORCEMENT learning , *LEARNING ability , *ALGORITHMS , *DEEP learning - Abstract
Deep reinforcement learning (DRL) is a promising technology for addressing the resource allocation problem for efficient data transmission in complex network environments. However, most DRL-based resource allocation algorithms suffer from limited feature extraction capabilities and lack scalability and generalization, especially in heterogeneous Industrial Internet of Things (IIoT) environments. In this paper, we develop a lifelong learning architecture that can integrate artificial intelligence (AI) algorithms into the heterogeneous IIoT network for efficient data transmission. Based on this, we propose an intelligent resource allocation algorithm based on dual-attention DRL (DADR) for forwarding node selection and channel access slot allocation in a specific network environment. The proposed DADR algorithm combines the advantages of multi-dimension convolutional attention and multi-head self-attention mechanisms. It can provide local- and global-feature fusion capabilities for distributed nodes while maximizing data transmission performance. Furthermore, we present a lifelong federated meta reinforcement learning (LFMRL) scheme that can effectively utilize prior knowledge and enable the DRL agent to quickly adapt to a new environment. Specifically, LFMRL adopts a federated meta learning-based knowledge fusion algorithm to fuse the knowledge of learned DADR algorithms and iteratively update the shared model, thereby improving the scalability and generalization of the shared model in heterogeneous IIoT environments. In addition, a simple and efficient knowledge transfer mechanism is introduced to accelerate DRL model convergence by transferring the knowledge of the shared model to the new environment. Simulation results demonstrate the effectiveness of the proposed algorithms in terms of energy efficiency, data transmission reliability, and network stability.
Compared to the DADR and FedAvg algorithms, the LFMRL algorithm can further reduce energy consumption, training time, and average forwarding node switching times, while improving the packet delivery rate to 99.2%. • A novel lifelong learning architecture for resource allocation in heterogeneous IIoT. • Dual-attention deep reinforcement learning provides local- and global-feature fusion. • Federated meta learning-based knowledge fusion learns the commonness of models. • Reinforcement learning-based knowledge transfer learns the properties of the model. • Knowledge fusion and knowledge transfer provide lifelong learning ability for the system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. Tracing the evolution of AI in the past decade and forecasting the emerging trends.
- Author
-
Shao, Zhou, Zhao, Ruoyan, Yuan, Sha, Ding, Ming, and Wang, Yongli
- Subjects
- *
DEEP learning , *RECURRENT neural networks , *ARTIFICIAL intelligence , *ECOLOGICAL systems theory , *MACHINE learning , *REINFORCEMENT learning - Abstract
The past decade has witnessed the rapid development of Artificial Intelligence (AI), especially the explosion of deep learning-related connectionist approaches. This study combines traditional literature review, bibliometric methods, and the Science of Science (SciSci) theory to scrutinize the development context of AI in the last decade on AMiner (www.aminer.cn), an academic mining system. With the assistance of AMiner tools and datasets, this paper aims to describe a more explicit context and evolution of AI in the past decade from the development of connectionist approaches. Five aspects of the past decade are highlighted: self-learning and self-coding algorithms, Recurrent Neural Network (RNN) algorithms, reinforcement learning, pre-trained models, and other typical deep learning algorithms, which represent the significant progress of this field. By combining these critical parts, we then summarize the current limitations and the corresponding future AI trends in the next decade and discuss some topics about the next generation of AI. The discoveries in this paper will benefit AI research by promoting understanding of the current critical stage and future trends of AI development, and will support the rapidly ascending AI industry in transforming academic research results and planning its industrial layout. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Sharing Experience for Behavior Generation of Real Swarm Robot Systems Using Deep Reinforcement Learning.
- Author
-
Toshiyuki Yasuda and Kazuhiro Ohkura
- Subjects
- *
DEEP learning , *AGGREGATION (Robotics) , *COLLECTIVE behavior , *ARTIFICIAL intelligence , *NEURAL circuitry - Abstract
Swarm robotic systems (SRSs) are a type of multirobot system in which robots operate without any form of centralized control. The typical design methodology for SRSs comprises a behavior-based approach, where the desired collective behavior is obtained manually by designing the behavior of individual robots in advance. In contrast, in an automatic design approach, a certain general methodology is adopted. This paper presents a deep reinforcement learning approach for collective behavior acquisition of SRSs. The swarm robots are expected to collect information in parallel and share their experience for accelerating their learning. We conducted real swarm robot experiments and evaluated the learning performance of the swarm in a scenario where the robots consecutively traveled between two landmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
32. A deep learning model for intelligent home energy management system using renewable energy.
- Author
-
Ben Slama, Sami and Mahmoud, Marwan
- Subjects
- *
REINFORCEMENT learning , *ENERGY management , *MACHINE learning , *LEARNING Management System , *DEEP learning , *RENEWABLE energy sources , *ARTIFICIAL intelligence - Abstract
Home automation is seen as a potential pillar of the smart city revolution, combining smart mobility, lifestyle, and an ecosystem governed by intelligent sensors connected to the internet. Households can save money and be more comfortable with automated appliances. Electricity cost and user comfort are fundamentally in conflict, so the problem can be presented as a dynamic multi-objective optimization with fluctuating priorities as the customer uses various devices at different times. For this reason, this paper proposes an advanced Intelligent Home Energy Management (IHEM) approach based on reinforcement learning to achieve home demand response (DR) efficiency. The one-hour-ahead energy consumption scheduling problem is formulated as a Markov Decision Process (MDP) with discrete time steps. An efficient Neural Network (NN)-based approach with a Q-learning algorithm is developed to address this problem, enabling the IHEM system to achieve better cost-effective scheduling performance. Accurate electricity price data and the energy supplied by the Photovoltaic (PV) system are analyzed over sliding periods by machine learning for uncertainty prediction. Using the newly developed approach, which has the dual objective of minimizing the electricity bill while maintaining user comfort, it is possible to obtain scheduling decisions for appliances and energy storage. The results show that the proposed optimization method reduces monthly electricity costs by 20% compared to the Integer Linear Programming (ILP)-based HEMS method. • Artificial intelligence (AI) approach to monitor household energy consumption. • An efficient deep reinforcement learning algorithm to control activity recognition in smart homes. • Design and implementation of an intelligent home energy management system. • Converges promptly and considerably reduces operating expenses with the AI-based approach. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
33. When architecture meets AI: A deep reinforcement learning approach for system of systems design.
- Author
-
Lin, Menglong, Chen, Tao, Chen, Honghui, Ren, Bangbang, and Zhang, Mengmeng
- Subjects
- *
ARTIFICIAL neural networks , *REINFORCEMENT learning , *DEEP learning , *ARTIFICIAL intelligence , *HEURISTIC algorithms , *SYSTEM of systems , *COMBINATORIAL optimization - Abstract
How to design a System of Systems (SoS) has attracted wide attention in recent years, especially in military applications. This problem, also known as SoS architecting, can be boiled down to two subproblems: selecting a number of systems from a set of candidates and specifying the tasks to be completed by each selected system. Essentially, such a problem can be reduced to a combinatorial optimization problem. Traditional exact solvers such as the branch-and-bound algorithm are not efficient enough to deal with large-scale cases. Heuristic algorithms are more scalable, but if the input changes, these algorithms have to restart the search process. This re-search may take a long time and interfere with the mission achievement of the SoS in highly dynamic scenarios, e.g., in Mosaic Warfare. In this paper, we combine artificial intelligence with SoS architecting and propose a deep reinforcement learning approach, DRL-SoSDP, for SoS design. Deep neural networks and actor–critic algorithms are used to find the optimal solution under constraints. Evaluation results show that the proposed approach is superior to heuristic algorithms in both solution quality and computation time, especially in large-scale cases. DRL-SoSDP can find good solutions in near real time, showing great potential for cases that require an instant reply. DRL-SoSDP also shows good generalization ability and can find better results than heuristic algorithms even when the scale of the SoS is much larger than that in the training data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
- Author
-
Ren, Zhipeng, Dong, Daoyi, Li, Huaxiong, and Chen, Chunlin
- Subjects
- *
REINFORCEMENT learning , *DEEP learning , *ARTIFICIAL intelligence - Abstract
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with a coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as a coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling networks. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
35. A Multi-Agent Reinforcement Learning Approach to Price and Comfort Optimization in HVAC-Systems.
- Author
-
Blad, Christian, Bøgh, Simon, and Kallesøe, Carsten
- Subjects
- *
REINFORCEMENT learning , *ARTIFICIAL intelligence , *HEATING control , *HEAT pumps , *DEEP learning , *ALGORITHMS - Abstract
This paper addresses the challenge of minimizing training time for the control of Heating, Ventilation, and Air-conditioning (HVAC) systems with online Reinforcement Learning (RL). This is done by developing a novel Multi-Agent Reinforcement Learning (MARL) approach for HVAC systems. In this paper, the environment formed by the HVAC system is formulated as a Markov Game (MG) in a general-sum setting. The MARL algorithm is designed in a decentralized structure, where only relevant states are shared between agents, and actions are shared in a sequence that is sensible from a system's point of view. The simulation environment is a domestic house located in Denmark and designed to resemble an average house. The heat source in the house is an air-to-water heat pump, and the HVAC system is an Underfloor Heating system (UFH). The house is subjected to weather changes from a data set collected in Copenhagen in 2006, spanning the entire year except for June, July, and August, when heat is not required. It is shown that: (1) when comparing Single Agent Reinforcement Learning (SARL) and MARL, training time can be reduced by 70% for a four-temperature-zone UFH system; (2) the agent can learn and generalize over seasons; (3) the cost of heating can be reduced by 19%, or the equivalent of 750 kWh of electric energy per year for an average Danish domestic house, compared to a traditional control method; and (4) oscillations in the room temperature can be reduced by 40% when comparing the RL control methods with a traditional control method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
36. An application of deep reinforcement learning to algorithmic trading.
- Author
-
Théate, Thibaut and Ernst, Damien
- Subjects
- *
DEEP learning , *REINFORCEMENT learning , *SHARPE ratio , *STOCK exchanges - Abstract
• Reinforcement learning (RL) formalization of the algorithmic trading problem. • Novel trading strategy based on deep reinforcement learning (DRL), denominated TDQN. • Rigorous performance assessment methodology for algorithmic trading. • TDQN algorithm delivers promising results surpassing benchmark strategies. This scientific research paper presents an innovative approach based on deep reinforcement learning (DRL) to solve the algorithmic trading problem of determining the optimal trading position at any point in time during a trading activity in the stock market. It proposes a novel DRL trading policy so as to maximise the resulting Sharpe ratio performance indicator on a broad range of stock markets. Denominated the Trading Deep Q-Network algorithm (TDQN), this new DRL approach is inspired by the popular DQN algorithm and significantly adapted to the specific algorithmic trading problem at hand. The training of the resulting reinforcement learning (RL) agent is entirely based on the generation of artificial trajectories from a limited set of stock market historical data. In order to objectively assess the performance of trading strategies, the research paper also proposes a novel, more rigorous performance assessment methodology. Following this new performance assessment approach, promising results are reported for the TDQN algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Reinforcement learning in urban network traffic signal control: A systematic literature review.
- Author
-
Noaeen, Mohammad, Naik, Atharva, Goodman, Liana, Crebo, Jared, Abrar, Taimoor, Abad, Zahra Shakeri Hossein, Bazzan, Ana L.C., and Far, Behrouz
- Subjects
- *
TRAFFIC signs & signals , *TRAFFIC engineering , *REINFORCEMENT learning , *DEEP learning , *TRANSPORTATION engineering , *CITY traffic - Abstract
Improvement of traffic signal control (TSC) efficiency has been found to lead to improved urban transportation and enhanced quality of life. Recently, the use of reinforcement learning (RL) in various areas of TSC has gained significant traction; thus, we conducted a systematic, comprehensive, and reproducible literature review to dissect all the existing research applying RL in the network-level TSC domain, referred to as RL-NTSC for brevity. The review only targeted network-level articles that tested the proposed methods in networks with two or more intersections. This review covers 160 peer-reviewed articles from 30 countries published from 1994 to March 2020. The goal of this study is to provide the research community with statistical and conceptual knowledge, summarize existing evidence, characterize RL applications in NTSC domains, explore all applied methods and major first events in the defined scope, and identify areas for further research based on the explored research problems. We analyzed the extracted data from the included articles in the following seven categories: (i) publication and authors' data, (ii) method identification and analysis, (iii) environment attributes and traffic simulation, (iv) application domains of RL-NTSC, (v) major first events of RL-NTSC and authors' key statements, (vi) code availability, and (vii) evaluation. This paper provides a comprehensive view of the past 26 years of research on applying RL to NTSC. It also reveals the role of advancing deep learning methods in the revival of the research area, the rise of non-commercial microscopic traffic simulators, a lack of interaction between traffic and transportation engineering practitioners and researchers, and a lack of testbeds that could bring different communities together around common goals.
• A review on Reinforcement Learning in the network-scale Traffic Signal Control area. • Presents a comprehensive systematic literature review of 160 included articles. • Consolidates and characterizes the existing research on the defined area. • Explores the methods, applications, domains, and first events in the defined scope. • Identifies past and present trends and directions for further research in the area. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning.
- Author
-
Hua, Jiang, Zeng, Liangcai, Li, Gongfa, Ju, Zhaojie, and Blažič, Sašo
- Subjects
- *
DEEP learning , *ADAPTIVE control systems , *ARTIFICIAL intelligence , *ROBUST control , *REINFORCEMENT learning , *MANIPULATORS (Machinery) - Abstract
Dexterous manipulation is an important part of realizing robot intelligence, but manipulators can currently only perform simple tasks such as sorting and packing in structured environments. In view of this problem, this paper presents a state-of-the-art survey on intelligent robots with the capability of autonomous decision-making and learning. The paper first reviews the main achievements in robotics research, which were mainly based on breakthroughs in automatic control and mechanical hardware. With the evolution of artificial intelligence, much research has made further progress in adaptive and robust control. The survey reveals that the latest research in deep learning and reinforcement learning has paved the way for highly complex tasks to be performed by robots. Furthermore, deep reinforcement learning, imitation learning, and transfer learning in robot control are discussed in detail. Finally, major achievements based on these methods are summarized and analyzed thoroughly, and future research challenges are proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function.
- Author
-
Chen, Jie, Wu, Tao, Shi, Meiping, and Jiang, Wei
- Subjects
- *
REWARD (Psychology) , *DRIVERLESS cars , *ARTIFICIAL neural networks , *REINFORCEMENT learning , *ARTIFICIAL intelligence , *DEEP learning , *MACHINE learning - Abstract
Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) for realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive learning way. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first part is a pre-defined typical reward function on the system state; the second part is modeled as a Deep Neural Network (DNN) representing the driving adjustment intention of the human observer, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as the input and via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL methods. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
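The PORF idea above, a fixed reward term plus a learned adjustment term, can be sketched in a few lines. This is an illustrative stand-in, not the authors' code: the feature names, the linear "learned" term (replacing their DNN), and all constants are assumptions.

```python
def base_reward(speed, lane_offset):
    # Hypothetical pre-defined part of the reward: track a target speed
    # and penalize lateral deviation from the lane center.
    return -abs(speed - 30.0) - 2.0 * abs(lane_offset)

def learned_adjustment(features, weights):
    # Stand-in for the paper's DNN term: a linear model over observation
    # features, whose weights would be fitted from human supervision signals.
    return sum(f * w for f, w in zip(features, weights))

def porf_reward(speed, lane_offset, features, weights):
    # PORF-style composite reward: pre-defined term + learned adjustment.
    return base_reward(speed, lane_offset) + learned_adjustment(features, weights)
```

The point of the split is that the hand-crafted term keeps training stable early on, while the learned term gradually absorbs the human observer's adjustment intention.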
40. Artificial Intelligence Techniques for Power System Transient Stability Assessment.
- Author
-
Sarajcev, Petar, Kunac, Antonijo, Petrovic, Goran, and Despalatovic, Marin
- Subjects
- *
ELECTRIC transients , *ARTIFICIAL intelligence , *DEEP learning , *RENEWABLE energy sources , *REINFORCEMENT learning , *ELECTRONIC data processing - Abstract
The high penetration of renewable energy sources, coupled with decommissioning of conventional power plants, leads to the reduction of power system inertia. This has negative repercussions on the transient stability of power systems. The purpose of this paper is to review the state-of-the-art regarding the application of artificial intelligence to the power system transient stability assessment, with a focus on different machine, deep, and reinforcement learning techniques. The review covers data generation processes (from measurements and simulations), data processing pipelines (features engineering, splitting strategy, dimensionality reduction), model building and training (including ensembles and hyperparameter optimization techniques), deployment, and management (with monitoring for detecting bias and drift). The review focuses, in particular, on different deep learning models that show promising results on standard benchmark test cases. The final aim of the review is to point out the advantages and disadvantages of different approaches, present current challenges with existing models, and offer a view of the possible future research opportunities. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
41. Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework.
- Author
-
Wang, Qi, Hao, Yongsheng, and Cao, Jie
- Subjects
- *
ARTIFICIAL intelligence , *DEEP learning , *VEHICLE routing problem , *TRAVELING salesman problem , *OPERATIONS research , *REINFORCEMENT learning - Abstract
Combinatorial optimization (CO) problems on graphs are core, classic problems in artificial intelligence (AI) and operations research (OR). For example, the Vehicle Routing Problem (VRP) and the Traveling Salesman Problem (TSP) are fascinating NP-hard problems with important significance for existing transportation systems. Traditional methods such as heuristic methods, exact algorithms, and solvers can already find approximate solutions on small-scale graphs. However, they are helpless on large-scale graphs and other problems with similar structures. Moreover, traditional methods often require artificially designed heuristic functions to aid decision-making. In recent years, more and more work has focused on applying deep learning and reinforcement learning (RL) to learn heuristics, which allows the internal structure of the graph to be learned end-to-end and the optimal path to be found under the guidance of heuristic rules. However, most of these approaches still need manual assistance, and the RL methods used suffer from low sampling efficiency and a small searchable space. This paper proposes a novel framework (called OmegaZero) based on AlphaGo Zero, which requires no expert experience or labeled data but is trained through self-play. We divide learning into two stages: in the first stage, we employ a graph attention network (GAT) and a GRU to learn node representations and memorize history trajectories. In the second stage, we employ Monte Carlo tree search (MCTS) and deep RL to search the solution space and train the model. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
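The MCTS selection step that drives self-play frameworks of this kind typically scores children by the UCT rule: mean value plus an exploration bonus. A minimal sketch, where the exploration constant and the infinite score for unvisited nodes are conventional defaults, not values specified in the abstract:

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.4):
    # UCT: exploitation (mean value per visit) + exploration bonus that
    # shrinks as a child accumulates visits relative to its parent.
    if visits == 0:
        return float("inf")  # always try unvisited children first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    # children: list of (action, total_value, visits); pick the max-UCT action.
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits))[0]
```

In an AlphaGo-Zero-style system, the exploration term is additionally weighted by a policy-network prior (the PUCT variant); the plain UCT form above shows the core trade-off.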
42. Fair classification via Monte Carlo policy gradient method.
- Author
-
Petrović, Andrija, Nikolić, Mladen, Jovanović, Miloš, Bijanić, Miloš, and Delibašić, Boris
- Subjects
- *
ARTIFICIAL intelligence , *REINFORCEMENT learning , *MACHINE learning , *GENDER , *CLASSIFICATION algorithms , *HUMAN-artificial intelligence interaction - Abstract
Artificial intelligence is steadily increasing its impact on everyday life. Therefore, the societal issues of artificial intelligence have become an important concern in AI research. The presence of data that reflects human biases towards historically discriminated groups, defined by sensitive features such as race and gender, results in machine learning models that discriminate against these groups. To tackle the impact of bias in data, researchers have developed a variety of specialized machine learning algorithms that can satisfy different fairness constraints imposed on the model. Group fairness constraints do not fit standard machine learning formulations easily due to their non-differentiable nature. In this paper we develop a technique for learning a fair classifier by the Monte Carlo policy gradient method, which naturally deals with such non-differentiable constraints. Our methodology focuses on direct optimization of both the group fairness metric and the predictive performance of the model. In addition, we propose two different variance reduction techniques for gradient estimation. We compare our models to seven other related and state-of-the-art models and demonstrate that they achieve a better trade-off between accuracy and unfairness. To the best of our knowledge, this is the first fair classification algorithm that solves the issue of non-differentiable constraints by reinforcement learning techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
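The Monte Carlo policy gradient (REINFORCE) machinery this approach relies on can be sketched for a Bernoulli "classifier policy": the score function of a sigmoid policy reduces to (a − p)·x, and a non-differentiable fairness penalty can simply be folded into the scalar reward R. The linear policy and batch format below are illustrative assumptions, not the authors' model:

```python
import math

def policy_prob(w, x):
    # Probability of predicting class 1 under a linear sigmoid policy.
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def reinforce_update(w, batch, lr=0.1):
    # batch: list of (x, a, R) -- features, sampled action in {0, 1}, and a
    # scalar reward that may include a non-differentiable fairness penalty.
    # Score-function gradient for a Bernoulli policy: d/dw log pi(a|x) = (a - p) x.
    grad = [0.0] * len(w)
    for x, a, R in batch:
        p = policy_prob(w, x)
        for j, xj in enumerate(x):
            grad[j] += R * (a - p) * xj
    n = len(batch)
    return [wj + lr * gj / n for wj, gj in zip(w, grad)]
```

Because the gradient only needs sampled actions and scalar rewards, R can contain any group-fairness penalty computed over the batch, differentiable or not, which is exactly why the policy-gradient route sidesteps the constraint problem.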
43. Deep replacement: Reinforcement learning based constellation management and autonomous replacement.
- Author
-
Kopacz, Joseph, Roney, Jason, and Herschitz, Roman
- Subjects
- *
REINFORCEMENT learning , *ARTIFICIAL satellites , *MICROSPACECRAFT , *DEEP learning , *ARTIFICIAL intelligence , *ALGORITHMS , *RESEARCH & development - Abstract
The Deep Reinforcement Learning (DRL) algorithm Proximal Policy Optimization (PPO2) is deployed on a custom spacecraft (S/C) build-and-loss model to determine whether an Artificial Intelligence (AI) can learn to monitor satellite constellation health and determine an optimal replacement strategy. A custom environment is created to simulate how S/C are built, launched, generate revenue, and finally decay. The reinforcement learning agent successfully learned an optimal policy for two models: a Simplified Model, where the financial cost of actions is ignored, and an Advanced Model, where the financial cost of actions is a major element. In both models the AI monitors the constellations and takes multiple strategic and tactical actions to replace satellites and maintain constellation performance. The Simplified Model showed that the PPO2 algorithm was able to converge on an optimal solution after ∼200,000 simulations. The Advanced Model was much more difficult for the AI to learn; thus, performance drops during the early episodes but eventually converges to an optimal policy at ∼25,000,000 simulations. With the Advanced Model, the AI takes actions that successfully provide strategies for constellation management and satellite replacement, including the financial implications of these actions. Thus, the methods in this paper provide initial research developments toward a real-world tool and an AI application that can aid aerospace businesses in managing Low Earth Orbit (LEO) constellations. This type of AI application may become imperative for deploying and maintaining small-satellite mega-constellations. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
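The clipped surrogate objective at the heart of PPO2 is compact enough to sketch per sample; ε = 0.2 below is the common default, not a value taken from the paper:

```python
def ppo_clip_term(ratio, advantage, eps=0.2):
    # PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A),
    # where r is the new/old policy probability ratio and A the advantage.
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps],
    # keeping each policy update conservative.
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

This conservatism is what makes PPO2 stable enough to run for the tens of millions of simulations the Advanced Model reportedly needed.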
44. A deep reinforcement learning-based method applied for solving multi-agent defense and attack problems.
- Author
-
Huang, Liwei, Fu, Mingsheng, Qu, Hong, Wang, Siying, and Hu, Shangqian
- Subjects
- *
REINFORCEMENT learning , *REWARD (Psychology) , *ARTIFICIAL intelligence , *MACHINE learning , *DEEP learning , *FUNCTION spaces - Abstract
• The multi-agent defense and attack environment is reconstructed. • Several algorithms are applied to solve the considered problem. • We redefine the state space, the action space, and the reward functions accordingly. • Comparison experiments are conducted to show the performance of the employed models. Learning to cooperate among agents has always been an important research topic in artificial intelligence. Multi-agent defense and attack, one of the important issues in multi-agent cooperation, requires multiple agents in the environment to learn effective strategies to achieve their goals. Deep reinforcement learning (DRL) algorithms have natural advantages in dealing with continuous control problems, especially in situations with dynamic interactions, and have provided new solutions for long-studied multi-agent cooperation problems. In this paper, we start from the deep deterministic policy gradient (DDPG) algorithm and then introduce multi-agent DDPG (MADDPG) to solve the multi-agent defense and attack problem under different situations. We reconstruct the considered environment; redefine the continuous state space, continuous action space, and reward functions accordingly; and then apply deep reinforcement learning algorithms to obtain effective decision strategies. Several experiments considering different confrontation scenarios are conducted to validate the feasibility and effectiveness of the DRL-based methods. Experimental results show that through learning the agents can make better decisions, and that learning with MADDPG achieves superior performance to learning with other DRL-based models, which also demonstrates the importance and necessity of mastering other agents' information. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
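The structural idea behind MADDPG's advantage here, decentralized actors with centralized critics, is that during training each agent's critic conditions on every agent's observation and action, while each actor acts on its own observation alone. A schematic sketch (the flat concatenation and Bellman target are the standard MADDPG recipe, not code from the paper):

```python
def centralized_critic_input(all_obs, all_actions):
    # During training, agent i's critic Q_i(x, a_1..a_N) sees the joint
    # observations and actions of every agent, concatenated into one vector.
    flat = []
    for o in all_obs:
        flat.extend(o)
    for a in all_actions:
        flat.extend(a)
    return flat

def critic_target(reward, gamma, next_q, done):
    # Standard Bellman target for a centralized critic:
    # y_i = r_i + gamma * Q_i'(x', a_1'..a_N'), with zero bootstrap at episode end.
    return reward if done else reward + gamma * next_q
```

This "mastering other agents' information" during training is exactly what the abstract credits for MADDPG's edge over single-agent DRL baselines.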
45. Deep-Reinforcement-Learning-Based Two-Timescale Voltage Control for Distribution Systems.
- Author
-
Zhang, Jing, Li, Yiqi, Wu, Zhi, Rong, Chunyan, Wang, Tao, Zhang, Zhang, and Zhou, Suyang
- Subjects
- *
VOLTAGE control , *ARTIFICIAL intelligence , *DEEP learning , *REINFORCEMENT learning , *ALGORITHMS , *VOLTAGE - Abstract
Because of the high penetration of renewable energies and the installation of new control devices, modern distribution networks face voltage regulation challenges. Recently, the rapid development of artificial intelligence technology has introduced new solutions for optimal control problems with high dimensionality and dynamics. In this paper, a deep reinforcement learning method is proposed to solve the two-timescale optimal voltage control problem. All control variables are assigned to different agents: discrete variables are handled by a deep Q network (DQN) agent, while continuous variables are handled by a deep deterministic policy gradient (DDPG) agent. All agents are trained simultaneously with a specially designed reward aimed at minimizing the long-term average voltage deviation. A case study is executed on a modified IEEE 123-bus system, and the results demonstrate that the proposed algorithm has similar or even better performance than the model-based optimal control scheme, with high computational efficiency and competitive potential for online application. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
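The division of labor described above, a DQN agent for discrete controls and DDPG for continuous ones, rests on the same bootstrapped target in both cases; only the maximization differs. A minimal sketch of the two target computations (the function shapes are illustrative, not taken from the paper):

```python
def dqn_target(reward, gamma, next_q_values, done):
    # DQN (discrete actions): bootstrap with the best next-state action value,
    # y = r + gamma * max_a' Q(s', a').
    return reward if done else reward + gamma * max(next_q_values)

def ddpg_target(reward, gamma, next_q_of_mu, done):
    # DDPG (continuous actions): the target actor mu' supplies the action,
    # y = r + gamma * Q'(s', mu'(s')), so no explicit max over actions is needed.
    return reward if done else reward + gamma * next_q_of_mu
```

The explicit max is only tractable when the action set is small and discrete (e.g., tap positions), which is why the continuous setpoints go to the DDPG agent instead.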
46. Real-Time Autonomous Residential Demand Response Management Based on Twin Delayed Deep Deterministic Policy Gradient Learning.
- Author
-
Ye, Yujian, Qiu, Dawei, Wang, Huiyu, Tang, Yi, and Strbac, Goran
- Subjects
- *
ENERGY management , *ARTIFICIAL intelligence , *PROBABILISTIC generative models , *REINFORCEMENT learning , *HOME economics , *DEEP learning , *SMART power grids - Abstract
With the roll-out of smart meters and the increasing prevalence of distributed energy resources (DERs) at the residential level, end-users rely on home energy management systems (HEMSs) that can harness real-time data and employ artificial intelligence techniques to optimally manage the operation of different DERs, with the goal of minimizing the end-user's energy bill. In this respect, the performance of the conventional model-based demand response (DR) management approach may deteriorate due to inaccuracy of the employed DER operating models and the probabilistic modeling of uncertain parameters. To overcome these drawbacks, this paper develops a novel real-time DR management strategy for a residential household based on the twin delayed deep deterministic policy gradient (TD3) learning approach. This approach is model-free and thus does not rely on knowledge of the distribution of uncertainties or the operating models and parameters of the DERs. It also enables learning of neural-network-based, fine-grained DR management policies in a multi-dimensional action space by exploiting high-dimensional sensory data that encapsulate the uncertainties associated with renewable generation, appliances' operating states, utility prices, and outdoor temperature. The proposed method is applied to the energy management problem for a household with a portfolio of the most prominent types of DERs. Case studies involving a real-world scenario validate the superior performance of the proposed method in reducing the household's energy costs while coping with multi-source uncertainties, through comprehensive comparisons with state-of-the-art deep reinforcement learning (DRL) methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
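Two of TD3's defining ingredients, clipped double-Q targets and target policy smoothing, fit in a few lines. A sketch under standard TD3 conventions; the clip bound, noise, and action limits below are illustrative defaults, not values from the paper:

```python
def td3_target(reward, gamma, q1_next, q2_next, done):
    # Clipped double-Q: bootstrapping with min(Q1', Q2') curbs the
    # overestimation bias a single critic would accumulate.
    return reward if done else reward + gamma * min(q1_next, q2_next)

def smoothed_target_action(mu, noise, noise_clip, low, high):
    # Target policy smoothing: perturb the target actor's action with
    # clipped noise, then clamp to the valid action range.
    eps = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, mu + eps))
```

The third TD3 ingredient, delayed actor updates (updating the policy less often than the critics), is a training-loop scheduling choice rather than a formula, so it is omitted here.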
47. Online reconfiguration scheme of self-sufficient distribution network based on a reinforcement learning approach.
- Author
-
Oh, Seok Hwa, Yoon, Yong Tae, and Kim, Seung Wan
- Subjects
- *
REINFORCEMENT learning , *TELECOMMUNICATION network management , *POWER distribution networks , *RENEWABLE energy sources , *ALGORITHMS , *DEEP learning - Abstract
• Online network reconfiguration is introduced to manage distribution networks. • A Deep Q-learning algorithm is used to find the optimal network topology. • The proposed algorithm is more scalable than other algorithms. • The computation time of the algorithm is low enough for practical applications. • The algorithm can increase DRES utilization without heavy capital investment. With an increasing number of distributed renewable energy sources integrated into power distribution networks, network security issues such as line overloading or bus voltage violations are becoming increasingly common. Traditional capital-intensive system reinforcements could lead to overinvestment. Moreover, active network management solutions, which have emerged as important alternatives, may become a financial burden for distribution system operators, reduce profits for owners of distributed renewable energy sources, or both. To address these limitations, this paper proposes an online network reconfiguration scheme based on a deep reinforcement learning approach. In this scheme, the distribution network operator modifies the network topology to change the power flow when the reliability of the network is threatened. Because the variability of distributed renewable energy is large in self-sufficient distribution networks, the reconfiguration process needs to be performed online within short time intervals, which is difficult to achieve with conventional algorithms. To solve this problem efficiently, a deep Q-learning model is utilized to determine the optimal network topology. The performance of the proposed algorithm and other algorithms was compared on modified CIGRE 14-bus and IEEE 123-bus test networks, with varying penalties for frequent switching operations in consideration of the physical characteristics of the network.
Simulation results demonstrated that the proposed algorithm showed performance almost identical to that of a brute-force search algorithm in both test networks, satisfying network constraints over almost all timespans. Further, the proposed method required very small computation times (under a second per state), and its scalability was verified by comparing the computation times between the two test networks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
48. Control of superheat of organic Rankine cycle under transient heat source based on deep reinforcement learning.
- Author
-
Wang, Xuan, Wang, Rui, Jin, Ming, Shu, Gequn, Tian, Hua, and Pan, Jiaying
- Subjects
- *
HEAT recovery , *REINFORCEMENT learning , *DEEP learning , *RANKINE cycle , *THERMODYNAMIC potentials , *WASTE heat , *THERMODYNAMIC control - Abstract
• DRL-based control methods are proposed for the ORC for engine waste heat recovery. • The DRL control of ORC superheat performs considerably better than PID control. • The DRL-based PID control is more robust than the DRL control. • The switching DRL control performs well and exhibits sufficient robustness. • A useful reference and motivation for applying DRL to thermodynamic systems. The organic Rankine cycle (ORC) is a promising technology for engine waste heat recovery. During real-world operation, the engine working condition varies frequently to satisfy the power demand; thus, the transient nature of engine waste heat presents significant control challenges for the ORC. To control the superheat of the ORC precisely under a transient heat source, several optimal control methods have been used, such as model predictive control and dynamic programming. However, most of them depend strongly on accurate prediction of future disturbances. Deep reinforcement learning (DRL) is an artificial-intelligence algorithm that can overcome this disadvantage, but the potential of DRL in the control of thermodynamic systems has not yet been investigated. Thus, this paper proposes two DRL-based control methods for controlling the superheat of the ORC under a transient heat source. One directly uses the DRL agent to learn the control strategy (DRL control), and the other uses the DRL agent to optimize the parameters of a proportional–integral–derivative (PID) controller (DRL-based PID control). Additionally, a switching mechanism between different DRL controllers is proposed to improve training efficiency and enlarge the operating range of the controller. The results indicate that the DRL agent can satisfactorily perform the control task and optimize the traditional controller under both trained and untrained transient heat sources.
Specifically, the DRL control can track the reference superheat with an average error of only 0.19 K, whereas that of the traditional PID control is 2.16 K. Furthermore, the proposed switching DRL control exhibits excellent tracking performance, with an average error of only 0.21 K, and robustness over a wide range of operating conditions. The successful application of DRL demonstrates its considerable potential for the control of thermodynamic systems, providing a useful reference and motivation for its application to other thermodynamic systems. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
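In the DRL-based PID variant above, the agent's action is a set of controller gains rather than the control signal itself. A sketch of the discrete-time PID step such an agent would be tuning; the gains, time step, and the superheat-tracking framing are illustrative assumptions, not values from the paper:

```python
class PID:
    # Discrete-time PID controller; a DRL agent could output (kp, ki, kd).
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # u = kp*e + ki*integral(e) + kd*de/dt, with rectangular integration
        # and a backward-difference derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Tracking a superheat setpoint: the agent would adjust the gains online
# as the heat-source condition changes, e.g. controller = PID(*agent_action, dt).
controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=1.0)
u = controller.step(1.0)  # control action for a 1 K superheat error
```

Letting the agent retune a PID loop rather than emit raw actuator commands keeps a well-understood controller in the loop, which is plausibly why the paper finds the DRL-based PID variant more robust than direct DRL control.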
49. Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management.
- Author
-
Lu, Renzhi, Li, Yi-Chang, Li, Yuting, Jiang, Junhui, and Ding, Yuemin
- Subjects
- *
DEEP learning , *DISCRETE systems , *ENERGY management , *LITHIUM-ion battery manufacturing , *DETERMINISTIC algorithms , *ALGORITHMS , *REINFORCEMENT learning - Abstract
With advances in smart grid technologies, demand response has played a major role in improving the reliability of grids and reducing costs for customers. Implementing demand response schemes in industry is more necessary than in other sectors, because industrial energy consumption is often the largest. This paper proposes a multi-agent deep reinforcement learning based demand response scheme for energy management of discrete manufacturing systems. In this regard, the industrial manufacturing system is initially formulated as a partially observable Markov game; then, a multi-agent deep deterministic policy gradient algorithm is adopted to obtain the optimal schedule for different machines. A typical lithium-ion battery assembly manufacturing system is used to demonstrate the effectiveness of the proposed scheme. Simulation results show that the presented demand response algorithm can minimize electricity costs and maintain production tasks, compared with a benchmark without demand response. Moreover, the performance of the multi-agent deep reinforcement learning approach is investigated against a mathematical model method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
50. Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics.
- Author
-
Mosavi, Amirhosein, Faghan, Yaser, Ghamisi, Pedram, Duan, Puhong, Ardabili, Sina Faizollahzadeh, Salwana, Ely, and Band, Shahab S.
- Subjects
- *
REINFORCEMENT learning , *DEEP learning , *MATHEMATICAL economics , *ARTIFICIAL intelligence , *SCALABILITY , *DYNAMICAL systems - Abstract
The popularity of deep reinforcement learning (DRL) applications in economics has increased exponentially. DRL, through a wide range of capabilities from reinforcement learning (RL) to deep learning (DL), offers vast opportunities for handling sophisticated dynamic economics systems. DRL is characterized by scalability with the potential to be applied to high-dimensional problems in conjunction with noisy and nonlinear patterns of economic data. In this paper, we initially consider a brief review of DL, RL, and deep RL methods in diverse applications in economics, providing an in-depth insight into the state-of-the-art. Furthermore, the architecture of DRL applied to economic applications is investigated in order to highlight the complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. The survey results indicate that DRL can provide better performance and higher efficiency as compared to the traditional algorithms while facing real economic problems in the presence of risk parameters and the ever-increasing uncertainties. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF