Descriptor: "q-learning" / Topic: 0202 electrical engineering, electronic engineering, information engineering - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"q-learning"' showing total 1,022 results

1. Learning to Delay in Ride-Sourcing Systems: A Multi-Agent Deep Reinforcement Learning Framework

Author: Jintao Ke, Hai Yang, Jieping Ye, and Feng Xiao
Subjects: Matching (statistics), Computer science, business.industry, Q-learning, 02 engineering and technology, Computer Science Applications, Computational Theory and Mathematics, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), Bipartite graph, Combinatorial optimization, Reinforcement learning, Artificial intelligence, business, Information Systems
Abstract: Online matching between idle drivers and waiting passengers is one of the key components in a ride-sourcing system. It is naturally expected that a more effective bipartite matching can be implemented if the platform accumulates more idle drivers and waiting passengers in the matching pool. A specific passenger request can also benefit from a delayed matching since he/she may be matched with closer idle drivers after waiting for a few seconds. Motivated by the potential benefits of delayed matching, this paper establishes a two-stage framework which incorporates combinatorial optimization and multi-agent deep reinforcement learning methods. The multi-agent reinforcement learning methods are used to dynamically determine the delayed time for each passenger request, while the combinatorial optimization conducts an optimal bipartite matching between idle drivers and waiting passengers in the matching pool. Four tailored reinforcement learning methods, delayed multi-agent deep Q learning (Delayed-M-DQN), delayed multi-agent actor-critic (Delayed-M-A2C), delayed multi-agent Proximal Policy Optimization (Delayed-M-PPO), and delayed multi-agent actor-critic with experience replay (Delayed-M-ACER), are developed. Through extensive empirical experiments with a well-designed simulator, we show that the proposed framework is able to remarkably improve system performance by balancing the trade-off among pick-up time, matching time, and successful matching rate.
Published: 2022

2. A Self-Play and Sentiment-Emphasized Comment Integration Framework Based on Deep Q-Learning in a Crowdsourcing Scenario

Author: Mznah Al-Rodhaan, Huan Rong, Tinghuai Ma, Victor S. Sheng, and Yang Zhou
Subjects: Ground truth, Computer science, business.industry, Sentiment analysis, Q-learning, Inference, 02 engineering and technology, Machine learning, computer.software_genre, Crowdsourcing, Field (computer science), Computer Science Applications, Computational Theory and Mathematics, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Redundancy (engineering), Reinforcement learning, Artificial intelligence, business, computer, Information Systems
Abstract: Crowdsourcing is a hotspot research field which can facilitate machine learning by collecting labels to train models. Consequently, the state-of-the-art research efforts in crowdsourcing focus on truth inference or label integration, to remove inconsistent labels or to alleviate biased labeling. In turn, the integrated labels will be used to fine-tune machine learning models. Particularly, in this paper, we change the target of truth inference in crowdsourcing from discrete labels to multiple comments given by online participants, that is, the integration of the crowdsourced comments. For such a goal, we propose a Self-play and Sentiment-Emphasized Comment Integration Framework (SSECIF), based on deep Q-learning, with three unique features. First, our framework SSECIF can generate the comment integration in a totally self-play way, without relying on the ground truth generated by human effort. Second, the integrated comment generated by SSECIF can include salient content with low redundancy. Third, the proposed framework SSECIF has emphasized, with a higher intensity, the sentiment in the integrated comment, in order to reflect the attitude or opinion more obviously. Extensive evaluation on real-world datasets demonstrates that SSECIF has achieved the best overall performance in terms of both effectiveness and efficiency, compared with the state-of-the-art methods. Index Terms: Crowdsourcing; Comment Integration; Reinforcement Learning; Deep Q-Learning; Sentiment Analysis.
Published: 2022

3. An eco-driving algorithm for trains through distributing energy: A Q-Learning approach

Author: Tao Tang, Zixuan Zhang, Shuai Su, Qinghao Tian, Wentao Liu, and Qingyang Zhu
Subjects: Hyperparameter, 0209 industrial biotechnology, Computer science, Applied Mathematics, Computation, 020208 electrical & electronic engineering, Q-learning, 02 engineering and technology, Computer Science Applications, 020901 industrial engineering & automation, Control and Systems Engineering, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Train, Markov decision process, Sensitivity (control systems), Electrical and Electronic Engineering, Instrumentation, Algorithm, Energy (signal processing)
Abstract: The energy-efficient train operation methodology is the focus of this paper, and a Q-Learning-based eco-driving approach is proposed. Firstly, the core idea of the energy-distribution-based method (EDBM), which converts the eco-driving problem to a finite Markov decision process, is presented. Secondly, a Q-Learning approach is proposed to determine the optimal energy distribution policy. Specifically, two different state definitions, i.e., trip-time-relevant (TT) and energy-distribution-relevant (ED) state definitions, are introduced. Finally, the effectiveness of the proposed approach is verified in a deterministic and a stochastic environment. It is also illustrated that the TT-state approach takes about 20 times more computation time than the ED-state approach, while the space complexity of the TT-state approach is nearly constant. The hyperparameter sensitivity analysis demonstrates the robustness of the proposed approach.
Published: 2022
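
Note on entry 3: the abstract relies on the standard tabular Q-learning update over a finite Markov decision process. The sketch below shows only that generic update on a toy chain of states. The function name, the chain environment, the reward and the hyperparameter values are all assumptions made for illustration; it is not the paper's energy-distribution model or its TT/ED state definitions.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: reach the rightmost state."""
    rng = random.Random(seed)
    # q[s][a], with action 0 = move left and action 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection (ties resolve to "right")
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

After training, the greedy action in every non-goal state of this toy chain is "right", since that action's value ends up larger than the value of moving left.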

4. Reinforcement Learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system

Author: Raul-Cristian Roman, Radu-Emil Precup, Emil M. Petriu, and Iuliu Alexandru Zamfirache
Subjects: Information Systems and Management, Fitness function, Artificial neural network, Computer science, Q-learning, Initialization, 02 engineering and technology, Optimal control, 01 natural sciences, 010305 fluids & plasmas, Computer Science Applications, Theoretical Computer Science, Artificial Intelligence, Control and Systems Engineering, Control theory, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, 020201 artificial intelligence & image processing, Algorithm, Metaheuristic, Software
Abstract: This paper presents a novel Reinforcement Learning (RL)-based control approach that uses a combination of a Deep Q-Learning (DQL) algorithm and a metaheuristic Gravitational Search Algorithm (GSA). The GSA is employed to initialize the weights and the biases of the Neural Network (NN) involved in DQL in order to avoid the instability, which is the main drawback of the traditional randomly initialized NNs. The quality of a particular set of weights and biases is measured at each iteration of the GSA-based initialization using a fitness function aiming to achieve the predefined optimal control or learning objective. The data generated during the RL process is used in training a NN-based controller that will be able to autonomously achieve the optimal reference tracking control objective. The proposed approach is compared with other similar techniques which use different algorithms in the initialization step, namely the traditional random algorithm, the Grey Wolf Optimizer algorithm, and the Particle Swarm Optimization algorithm. The NN-based controllers based on each of these techniques are compared using performance indices specific to optimal control as settling time, rise time, peak time, overshoot, and minimum cost function value. Real-time experiments are conducted in order to validate and test the proposed new approach in the framework of the optimal reference tracking control of a nonlinear position servo system. The experimental results show the superiority of this approach versus the other three competing approaches.
Published: 2022

5. Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

Author: Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin, and Jivitesh Sharma
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer science, Q-learning, ComputingMilieux_LEGALASPECTSOFCOMPUTING, Systems and Control (eess.SY), 02 engineering and technology, Overfitting, Machine Learning (cs.LG), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Electrical and Electronic Engineering, VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550, business.industry, 020206 networking & telecommunications, Computer Science Applications, Human-Computer Interaction, Artificial Intelligence (cs.AI), Control and Systems Engineering, Shortest path problem, Emergency evacuation, Computer Science - Systems and Control, 020201 artificial intelligence & image processing, Artificial intelligence, Transfer of learning, business, Software
Abstract: We focus on the important problem of emergency evacuation, which could clearly benefit from reinforcement learning but has been largely unaddressed. Emergency evacuation is a complex task which is difficult to solve with reinforcement learning, since an emergency situation is highly dynamic, with a lot of changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment to train reinforcement learning agents for evacuation planning. The environment is modelled as a graph capturing the building structure. It consists of realistic features like fire spread, uncertainty and bottlenecks. We have implemented the environment in the OpenAI gym format, to facilitate future research. We also propose a new reinforcement learning approach that entails pretraining the network weights of DQN-based agents to incorporate information on the shortest path to the exit. We achieved this by using tabular Q-learning to learn the shortest path on the building model's graph. This information is transferred to the network by deliberately overfitting it on the Q-matrix. Then, the pretrained DQN model is trained on the fire evacuation environment to generate the optimal evacuation path under time-varying conditions. We perform comparisons of the proposed approach with state-of-the-art reinforcement learning algorithms like PPO, VPG, SARSA, A2C and ACKTR. The results show that our method is able to outperform state-of-the-art models by a huge margin, including the original DQN-based models. Finally, we test our model on a large and complex real building consisting of 91 rooms, with the possibility to move to any other room, hence giving 8281 actions. We use an attention-based mechanism to deal with large action spaces. Our model achieves near-optimal performance on the real-world emergency environment. 21 pages, 14 figures, 4 tables.
Published: 2021
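
Note on entry 5: the abstract describes using tabular Q-learning to learn the shortest path to the exit on a building graph before transferring that information to a DQN. A minimal sketch of the tabular step alone is given below. The toy graph `TOY_BUILDING`, the function names, the reward of 1 for reaching the exit, and the choice of a learning rate of 1 (reasonable only because this toy environment is deterministic) are assumptions for illustration; the paper's environment, its 91-room building and the DQN pretraining are not reproduced.

```python
import random

# Hypothetical building graph: room -> adjacent rooms (assumption, not from the paper)
TOY_BUILDING = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "Exit"],
    "D": ["B", "Exit"],
    "Exit": [],
}

def learn_shortest_path_q(graph, exit_room, episodes=300, gamma=0.9, seed=0):
    """Tabular Q-learning with random exploration on a deterministic graph."""
    rng = random.Random(seed)
    q = {room: {nbr: 0.0 for nbr in nbrs} for room, nbrs in graph.items()}
    starts = [room for room in graph if room != exit_room]
    for _ in range(episodes):
        room = rng.choice(starts)
        while room != exit_room:
            nxt = rng.choice(graph[room])  # behave randomly; Q-learning is off-policy
            reward = 1.0 if nxt == exit_room else 0.0
            future = 0.0 if nxt == exit_room else max(q[nxt].values())
            # learning rate 1: overwrite with the one-step target
            q[room][nxt] = reward + gamma * future
            room = nxt
    return q

def greedy_path(q, start, exit_room, max_steps=50):
    """Follow the highest-valued move from each room until the exit is reached."""
    path = [start]
    while path[-1] != exit_room and len(path) <= max_steps:
        moves = q[path[-1]]
        path.append(max(moves, key=moves.get))
    return path
```

Because each step away from the exit discounts the value by `gamma`, the greedy walk over the learned table prefers the route with the fewest moves; on the toy graph, `greedy_path(learn_shortest_path_q(TOY_BUILDING, "Exit"), "A", "Exit")` follows A, C, Exit rather than the longer route through B and D.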

6. Path Following Optimization for an Underactuated USV Using Smoothly-Convergent Deep Reinforcement Learning

Author: Yujiao Zhao, Yong Ma, Xin Qi, Miguel Angel Sotelo, Reza Malekian, and Zhixiong Li
Subjects: Generality, Underactuation, Computer science, business.industry, 020209 energy, Mechanical Engineering, 020208 electrical & electronic engineering, Q-learning, Usability, 02 engineering and technology, Function (mathematics), Computer Science Applications, Vehicle dynamics, Control theory, Automotive Engineering, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Gradient descent, business
Abstract: This paper aims to solve the path following problem for an underactuated unmanned-surface-vessel (USV) based on deep reinforcement learning (DRL). A smoothly-convergent DRL (SCDRL) method is proposed based on the deep Q network (DQN) and reinforcement learning. In this new method, an improved DQN structure was developed as a decision-making network to reduce the complexity of the control law for the path following of a three-degree of freedom USV model. An exploring function was proposed based on the adaptive gradient descent to extract the training knowledge for the DQN from the empirical data. In addition, a new reward function was designed to evaluate the output decisions of the DQN, and hence, to reinforce the decision-making network in controlling the USV path following. Numerical simulations were conducted to evaluate the performance of the proposed method. The analysis results demonstrate that the proposed SCDRL converges more smoothly than the traditional deep Q learning while the path following error of the SCDRL is comparable to existing methods. Thanks to good usability and generality of the proposed method for USV path following, it can be applied to practical applications.
Published: 2021

7. Network slicing for vehicular communications: a multi-agent deep reinforcement learning approach

Author: Soumaya Cherkaoui and Zoubeir Mlika
Subjects: Networking and Internet Architecture (cs.NI), FOS: Computer and information sciences, 021103 operations research, Vehicular ad hoc network, Network packet, business.industry, Computer science, Quality of service, 0211 other engineering and technologies, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, 12. Responsible consumption, Computer Science - Networking and Internet Architecture, 0202 electrical engineering, electronic engineering, information engineering, Resource allocation, Wireless, Reinforcement learning, Markov decision process, Electrical and Electronic Engineering, business, Computer network
Abstract: This paper studies the multi-agent resource allocation problem in vehicular networks using non-orthogonal multiple access (NOMA) and network slicing. Vehicles want to broadcast multiple packets with heterogeneous quality-of-service (QoS) requirements, such as safety-related packets (e.g., accident reports) that require very low latency communication, while raw sensor data sharing (e.g., high-definition map sharing) requires high-speed communication. To ensure heterogeneous service requirements for different packets, we propose a network slicing architecture. We focus on a non-cellular network scenario where vehicles communicate by the broadcast approach via the direct device-to-device interface (i.e., sidelink communication). In such a vehicular network, resource allocation among vehicles is very difficult, mainly due to (i) the rapid variation of wireless channels among highly mobile vehicles and (ii) the lack of a central coordination point. Thus, the possibility of acquiring instantaneous channel state information to perform centralized resource allocation is precluded. The resource allocation problem considered is therefore very complex. It includes not only the usual spectrum and power allocation, but also coverage selection (which target vehicles to broadcast to) and packet selection (which network slice to use). This problem must be solved jointly since selected packets can be overlaid using NOMA and therefore spectrum and power must be carefully allocated for better vehicle coverage. To do so, we first provide a mathematical programming formulation and a thorough NP-hardness analysis of the problem. Then, we model it as a multi-agent Markov decision process. Finally, to solve it efficiently, we use a deep reinforcement learning (DRL) approach and specifically propose a deep Q learning (DQL) algorithm. The proposed DQL algorithm is practical because it can be implemented in an online and distributed manner. It is based on a cooperative learning strategy in which all agents perceive a common reward and thus learn cooperatively and distributively to improve the resource allocation solution through offline training. We show that our approach is robust and efficient when faced with different variations of the network parameters and compared to centralized benchmarks.
Published: 2021

8. Reinforcement Learning Based Efficiency Optimization Scheme for the DAB DC–DC Converter With Triple-Phase-Shift Modulation

Author: Weihao Hu, Jian Xiao, Qi Huang, Zhe Chen, Chen Zhangyong, Frede Blaabjerg, and Yuanhong Tang
Subjects: Reinforcement Learning (RL), Maximum power principle, Computer science, DAB DC-DC converter, 020208 electrical & electronic engineering, 02 engineering and technology, Inductor, Power (physics), Control and Systems Engineering, Modulation, Control theory, Q-learning, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Electrical and Electronic Engineering, power efficiency, optimization, Phase modulation, Electrical efficiency, Voltage
Abstract: Aiming to improve the power efficiency of the dual-active-bridge (DAB) dc–dc converter, an efficiency optimization scheme with triple-phase-shift (TPS) modulation using reinforcement learning (RL) is proposed in this article. More specifically, the Q-learning algorithm, a typical RL algorithm, is applied to train an agent offline to obtain an optimized modulation strategy, and then the trained agent provides control decisions online in real time for the DAB dc–dc converter according to the current operating environment. The main objective is to obtain the optimal phase-shift angles for the DAB dc–dc converter, which can achieve the maximum power efficiency by reducing the power losses. Moreover, all possible operation modes of the TPS modulation are considered during the offline training process of the Q-learning algorithm. Thus, the cumbersome process for selecting the optimal operation mode in the conventional schemes can be circumvented successfully. Based on these merits, the proposed efficiency optimization scheme using RL can achieve excellent performance over the whole range of load conditions and voltage conversion ratios. Finally, a 1.2-kW prototype is built, and the simulation and experimental results demonstrate that the power efficiency can be improved by using the optimization scheme based on RL.
Published: 2021

9. Learning Control for Air Conditioning Systems via Human Expressions

Author: Qinglai Wei, Derong Liu, and Tao Li
Subjects: business.industry, Computer science, 020208 electrical & electronic engineering, Control (management), Q-learning, Image processing, Control engineering, 02 engineering and technology, Optimal control, Grayscale, Control and Systems Engineering, Air conditioning, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Electrical and Electronic Engineering, business
Abstract: In this article, a deep reinforcement learning method is developed to solve air conditioning control problems through human expressions. The main contribution of this article is to design, for the first time, a deep reinforcement learning method for air conditioning control problems with human expressions as the input. The method aims to eliminate human sleepiness and improve people's work efficiency as much as possible. First, the air conditioning system and deep reinforcement learning methods are introduced. Second, the image processing algorithm for human expressions is described. Third, the deep Q-network method is designed to obtain the optimal control policy for air conditioning systems. Finally, simulation results are given to illustrate that the presented method can effectively eliminate sleepiness and improve the work environment of people.
Published: 2021

10. Intelligent Traffic Signal Control Based on Reinforcement Learning with State Reduction for Smart Cities

Author: Honghao Gao, Kemu Li, Jianbo Zheng, and Li Kuang
Subjects: 050210 logistics & transportation, Computer Networks and Communications, Computer science, 05 social sciences, Real-time computing, Q-learning, 020206 networking & telecommunications, Environmental pollution, 02 engineering and technology, Signal timing, Traffic flow, Signal, Reduction (complexity), 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, State space, Reinforcement learning
Abstract: Efficient signal control at isolated intersections is vital for relieving congestion, accidents, and environmental pollution caused by increasing numbers of vehicles. However, most of the existing studies not only ignore the constraint of the limited computing resources available at isolated intersections but also the matching degree between the signal timing and the traffic demand, leading to high complexity and reduced learning efficiency. In this article, we propose a traffic signal control method based on reinforcement learning with state reduction. First, a reinforcement learning model is established based on historical traffic flow data, and we propose a dual-objective reward function that can reduce vehicle delay and improve the matching degree between signal time allocation and traffic demand, allowing the agent to learn the optimal signal timing strategy quickly. Second, the state and action spaces of the model are preliminarily reduced by selecting a proper control phase combination; then, the state space is further reduced by eliminating rare or nonexistent states based on the historical traffic flow. Finally, a simplified Q-table is generated and used to optimize the complexity of the control algorithm. The results of simulation experiments show that our proposed control algorithm effectively improves the capacity of isolated intersections while reducing the time and space costs of the signal control algorithm.
Published: 2021
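
Note on entry 10: the abstract describes shrinking the state space by dropping states that are rare or never occur in historical traffic flow data, then building a simplified Q-table over what remains. The sketch below illustrates that idea only. The function name, the tuple encoding of states, the `min_count` threshold and the action labels are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

def build_reduced_q_table(observed_states, actions, min_count=2):
    """Keep only states seen at least min_count times in the historical data."""
    counts = Counter(observed_states)
    kept = sorted(state for state, n in counts.items() if n >= min_count)
    # One row per retained state, one zero-initialised entry per action
    return {state: {action: 0.0 for action in actions} for state in kept}

# Hypothetical history of discretised (queue_north_south, queue_east_west) states
history = [(0, 1), (0, 1), (2, 3), (0, 1), (1, 1), (1, 1)]
table = build_reduced_q_table(history, actions=["extend_green", "switch_phase"])
```

Here the state `(2, 3)` appears only once in the history and is excluded, so the table holds rows for `(0, 1)` and `(1, 1)` alone. States that never occur in the history are excluded automatically, because rows are created only for observed states.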

11. Optimized control for human-multi-robot collaborative manipulation via multi-player Q-learning

Author: Xinglu Liu, Shuzhi Sam Ge, and Panfeng Huang
Subjects: Computer Science::Computer Science and Game Theory, 0209 industrial biotechnology, Computer Networks and Communications, Computer science, Applied Mathematics, Control (management), Q-learning, 02 engineering and technology, Object (computer science), symbols.namesake, 020901 industrial engineering & automation, Control and Systems Engineering, Control theory, Nash equilibrium, Position (vector), Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, symbols, Robot, 020201 artificial intelligence & image processing, Game theory
Abstract: In this paper, optimized interaction control is investigated for human-multi-robot collaboration control problems, which cannot be described by the traditional impedance controller. To realize global optimized interaction performance, the multi-player non-zero sum game theory is employed to obtain the optimized interaction control of each robot agent. Regarding the game strategies, Nash equilibrium strategy is utilized in this paper. In human-multi-robot collaboration problems, the dynamics parameters of the human arm and the manipulated object are usually unknown. To obviate the dependence on these parameters, the multi-player Q-learning method is employed. Moreover, for the human-multi-robot collaboration problem, the optimized solution is difficult to resolve due to the existence of the desired reference position. A multi-player Nash Q-learning algorithm considering the desired reference position is proposed to deal with the problem. The validity of the proposed method is verified through simulation studies.
Published: 2021

12. Reinforcement Learning-Enabled UAV Itinerary Planning for Remote Sensing Applications in Smart Farming

Author: Ali Cheshmehzangi and Saeid Pourroostaei Ardakani
Subjects: reinforcement learning, Computer engineering. Computer hardware, Data collection, Remote sensing application, Computer science, UAV, 010401 analytical chemistry, Real-time computing, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, QA75.5-76.95, 01 natural sciences, 0104 chemical sciences, Environmental data, TK7885-7895, remote sensing, Remote sensing (archaeology), Robustness (computer science), Electronic computers. Computer science, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Motion planning, path planning
Abstract: UAV path planning for remote sensing aims to find the best-fitted routes to complete a data collection mission. UAVs plan the routes and move through them to remotely collect environmental data from particular target zones by using sensory devices such as cameras. Route planning may utilize machine learning techniques to autonomously find/select cost-effective and/or best-fitted routes and achieve optimized results including: minimized data collection delay, reduced UAV power consumption, decreased flight traversed distance and maximized number of collected data samples. This paper utilizes a reinforcement learning technique (location and energy-aware Q-learning) to plan UAV routes for remote sensing in smart farms. Through this, the UAV avoids heuristically or blindly moving throughout a farm, but this takes the benefits of environment exploration–exploitation to explore the farm and find the shortest and most cost-effective paths into target locations with interesting data samples to collect. According to the simulation results, utilizing the Q-learning technique increases data collection robustness and reduces UAV resource consumption (e.g., power), traversed paths, and remote sensing latency as compared to two well-known benchmarks, IEMF and TBID, especially if the target locations are dense and crowded in a farm.
Published: 2021

13. Distributed Q-Learning Aided Uplink Grant-Free NOMA for Massive Machine-Type Communications

Author: Zhenjiang Shi, Shangwei Zhang, Jiajia Liu, and Nei Kato
Subjects: Computer Networks and Communications, business.industry, Computer science, Q-learning, 020206 networking & telecommunications, Throughput, 02 engineering and technology, Scheduling (computing), Transmission (telecommunications), Telecommunications link, 0202 electrical engineering, electronic engineering, information engineering, Cellular network, Reinforcement learning, Resource management, Electrical and Electronic Engineering, business, Computer network
Abstract: The explosive growth of machine-type communications (MTC) devices poses critical challenges to the existing cellular networks. Therefore, how to support massive MTC devices with limited resources is an urgent problem to be solved. Bursty traffic is an important characteristic of MTC devices, which makes it difficult for agents to learn useful experience and has a negative impact on model convergence. However, most existing reinforcement learning-based studies assume that devices have saturated data. To this end, we propose two distributed Q-learning aided uplink grant-free non-orthogonal multiple access (NOMA) schemes (an all-devices distributed Q-learning (ADDQ) scheme and a portion-devices distributed Q-learning (PDDQ) scheme) to maximize the number of accessible devices, where the bursty traffic of massive MTC devices is carefully considered. In order to reduce the dimension of the scheduling space and mitigate the impact of bursty traffic, the idea of grouping devices as well as transmission resources and the intermittent learning mode are adopted in our schemes. Extensive numerical results demonstrate the advantages of the proposed schemes from multiple perspectives.
Published: 2021

14. Optimizing of Q-Learning Day/Night Energy Strategy for Solar Harvesting Environmental Wireless Sensor Networks Nodes

Author: Jaromir Konecny and Michal Prauzek
Subjects: semi-supervised learning, Computer performance, energy management, Energy management, Computer science, Node (networking), 020208 electrical & electronic engineering, Real-time computing, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, Semi-supervised learning, 7. Clean energy, TK1-9971, Microcontroller, Duty cycle, Microcontrollers, Wireless sensor networks, 0202 electrical engineering, electronic engineering, information engineering, Electrical engineering. Electronics. Nuclear engineering, Electrical and Electronic Engineering, wireless sensor networks, Wireless sensor network, microcontrollers
Abstract: This research article presents the application of the Q-learning algorithm in the operational duty cycle control of solar-powered environmental wireless sensor network (EWSN) nodes. Those nodes are commonly implemented as embedded devices using low-power and low-cost microcontrollers. Therefore, there is a significant need for an effective and easy way to implement a machine learning (ML) algorithm in terms of computer performance. This approach uses a Q-learning-based policy implementing a sleep/run switching algorithm driven by the state of charge. The presented algorithm is based on two modes, daylight and nighttime, which is a suitable solution for solar-powered systems. The study includes the complete process of designing an EWSN node strategy with an optimal reward policy. The presented algorithm was tested and verified on an EWSN node model, and a 5-year data set of solar irradiance values was used for the learning process and its validation. As part of the study, we also present the validation in terms of Q-learning parameters, which include the learning rate and discount factor. The result section shows that the overall performance of the presented solution is more suitable for solar-powered EWSN than state-of-the-art studies. Both day/night experiments reached 828 203 measurement/transmission cycles, which is 12.7% more than in the previous studies using the strategy defined by the state of energy storage.
Published: 2021

15. DEQLFER — A Deep Extreme Q-Learning Firefly Energy Efficient and high performance routing protocol for underwater communication

Author: D. Anitha and R. A. Karthika
Subjects: Routing protocol, Computer Networks and Communications, Computer science, Distributed computing, End-to-end delay, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, Energy consumption, Network topology, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Routing (electronic design automation), Underwater acoustic communication, Efficient energy use
Abstract: With the advent of underwater sensor networks, underwater communication has reached a new dimension of research. These networks are characterized by long end-to-end delay, high energy utilization and, most importantly, dynamic network topologies. Taking these characteristics into account, numerous automated routing algorithms have been proposed to achieve energy-efficient and low-latency data transmission. However, shortcomings still exist due to the above-mentioned characteristics, and more comprehensive routing algorithms are badly needed. In this article, a novel routing scheme based on the Q-learning framework and Deep Extreme Learning Machines, aided by an Adaptive Firefly Routing algorithm, is proposed to address the above-mentioned research constraints, including energy efficiency and network unsteadiness in underwater communication. It uses a hybrid combination of a reward function and adaptive fireflies to determine the optimal routing mechanism. In this algorithm, the traditional Q-learning mechanism is replaced by a Q-deep extreme learning mechanism, which uses an adaptive reward function for the varying underwater environment to boost the packet delivery ratio (PDR) and throughput. The paper also uses a firefly-aided routing mechanism to achieve energy-efficient data transmission and to avoid the void dilemma problems. Extensive experiments were conducted on the proposed algorithm, and it was compared with other state-of-the-art schemes such as the Q deep q-Learning energy aware routing protocol (DQLER), DELR protocols and VBF protocols; the proposed algorithm outperformed the compared existing algorithms in terms of complexity, energy consumption, packet delivery ratio and end-to-end delay.
Published: 2021

16. Distributed Q-Learning Based Joint Relay Selection and Access Control Scheme for IoT-Oriented Satellite Terrestrial Relay Networks

Author: Huining Zhang, Bo Zhao, Guangliang Ren, and Xiaodai Dong
Subjects: Scheme (programming language), Optimization problem, business.industry, Computer science, Q-learning, 020206 networking & telecommunications, Access control, Throughput, 02 engineering and technology, Computer Science Applications, law.invention, Relay, law, Modeling and Simulation, Telecommunications link, 0202 electrical engineering, electronic engineering, information engineering, Resource management, Electrical and Electronic Engineering, business, computer, Computer network, computer.programming_language
Abstract: In this letter, we propose a distributed Q-learning (DQL) based joint relay selection and access control (JRSAC) scheme for Internet of Things (IoT)-oriented satellite terrestrial relay networks (STRNs) with massive IoT devices and multiple relays. Firstly, a semi-random access (SRA) architecture is proposed to improve the learning efficiency of the DQL algorithm. Subsequently, a JRSAC optimization problem is formulated and solved by the proposed DQL algorithm. Simulation results show that the proposed DQL based JRSAC scheme significantly outperforms conventional schemes in terms of the medium access control (MAC) throughput, total access delay, and sum rate.
Published: 2021

17. Autonomous quadrotor obstacle avoidance based on dueling double deep recurrent Q-learning with monocular vision

Author: Ming Zhu, Xiao Guo, Jiajun Ou, and Wenjie Lou
Subjects: FOS: Computer and information sciences, 0209 industrial biotechnology, Computer science, Generalization, business.industry, Cognitive Neuroscience, Q-learning, Systems and Control (eess.SY), 02 engineering and technology, Electrical Engineering and Systems Science - Systems and Control, Computer Science Applications, Computer Science - Robotics, 020901 industrial engineering & automation, Action (philosophy), Artificial Intelligence, Obstacle avoidance, FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Robotics (cs.RO), Monocular vision
Abstract: The rapid development of unmanned aerial vehicles (UAV) puts forward a higher requirement for autonomous obstacle avoidance. Due to the limited payload and power supply, small UAVs such as quadrotors usually carry simple sensors and computation units, which makes traditional methods more challenging to implement. In this paper, a novel framework is demonstrated to control a quadrotor flying through crowded environments autonomously with monocular vision. The framework adopts a two-stage architecture, consisting of a sensing module and a decision module. The sensing module is based on an unsupervised deep learning method, and the decision module uses dueling double deep recurrent Q-learning to eliminate the adverse effects of limited observation capacity of an on-board monocular camera. The framework enables the quadrotor to realize autonomous obstacle avoidance without any prior environment information or labeled datasets for training. The trained model shows a high success rate in the simulation and a good generalization ability for transformed scenarios. 23 pages, 10 figures.
Published: 2021

18. Visual Dialog Agent Based On Deep Q Learning and Memory Module Networks

Author: Shubhangi Srivastava, Ajay Kumar, Arundhati Raj, and Aniruddh Suresh Pillai
Subjects: Memory module, Computer science, Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, Q-learning, 020201 artificial intelligence & image processing, 02 engineering and technology, Dialog box
Abstract: In recent years, there has been an increase in methods for solving problems whose solutions combine Computer Vision and Natural Language Processing. New algorithms and systems are being developed every day to solve such problems. The Visual Dialog Agent is one of them. This kind of system uses both Computer Vision and Natural Language Processing algorithms. Many variants of Visual Dialog Agents have been designed to date, and many dedicated algorithms have been created for them. In this paper we propose a Visual Dialog Agent that uses the present state-of-the-art End-to-End Memory Module Networks along with reinforcement learning policies to answer the questions posed by the user and to understand the user's inclination in the conversation it holds. The goal of the proposed Visual Dialog Agent is to have a more engaging conversation with the highest user inclination.
Published: 2021

19. A Bayesian Q-Learning Game for Dependable Task Offloading Against DDoS Attacks in Sensor Edge Cloud

Author: Shui Yu, Jianhua Liu, Shigen Shen, Xin Wang, Guangxue Yue, and Minglu Li
Subjects: 021110 strategic, defence & security studies, Computer Networks and Communications, Computer science, business.industry, Distributed computing, 0805 Distributed Computing, 1005 Communications Technologies, 0211 other engineering and technologies, Q-learning, 020206 networking & telecommunications, Denial-of-service attack, Cloud computing, 02 engineering and technology, Computer Science Applications, Task (computing), Resource (project management), Hardware and Architecture, Complete information, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Resource allocation, Resource management, business, Information Systems
Abstract: To enhance dependable resource allocation against increasing distributed denial-of-service (DDoS) attacks, in this article, we investigate interactions between a sensor device–edgeVM pair and a DDoS attacker using a game-theoretic framework, under the constraints of the task time, resource budget, and incomplete knowledge of the processing time of machine learning tasks. In this game, the sensor device expects an edgeVM to cooperate and choose its resource allocation strategy with the objective of satisfying the minimum resource requirement of machine learning tasks at the corresponding sensor device. Conversely, the attacker’s objective is to strategically allocate resources so that the resource constraint of the machine learning tasks is not satisfied. Owing to a lack of complete information about the processing time of the machine learning tasks, this strategic resource allocation problem between the two players is modeled as a Bayesian $Q$-learning game, in which the optimal strategies of the sensor device–edgeVM pair and the attacker are analyzed. Furthermore, probability distributions are employed by the corresponding players to model the incomplete nature of the game, and a greedy $Q$-learning algorithm is proposed for dependable resource allocation against DDoS attacks. Numerical simulation results demonstrate that the proposed mechanism is superior to other dependable resource allocation mechanisms under incomplete information for DDoS attacks in the sensor edge cloud.
Published: 2021

20. Latency and Energy Efficient Bio-Inspired Conic Optimized and Distributed Q Learning for D2D Communication in 5G

Author: Sridhar Varadala and S. Emalda Roslin
Subjects: Computer science, business.industry, Device to device, 020208 electrical & electronic engineering, Q-learning, Particle swarm optimization, 020206 networking & telecommunications, 02 engineering and technology, Fifth generation, Computer Science Applications, Theoretical Computer Science, Conic section, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Latency (engineering), business, Computer hardware, 5G, Efficient energy use
Abstract: The next-generation communication, i.e. fifth generation (5G), will be manifesting the advertisers in near future. The Device to Device communication would be a proportion of 5G to provide communic...
Published: 2021

21. Learning to Schedule Network Resources Throughput and Delay Optimally Using Q+-Learning

Author: Song Chong, Joohyun Lee, and Jeongmin Bae
Subjects: Network architecture, Schedule, Mathematical optimization, Optimization problem, Computer Networks and Communications, Computer science, Q-learning, 020206 networking & telecommunications, Throughput, 02 engineering and technology, Upper and lower bounds, Computer Science Applications, Scheduling (computing), Reduction (complexity), Bellman equation, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Markov decision process, Electrical and Electronic Engineering, Software
Abstract: As network architecture becomes complex and the user requirement gets diverse, the role of efficient network resource management becomes more important. However, existing throughput-optimal scheduling algorithms such as the max-weight algorithm suffer from poor delay performance. In this paper, we present reinforcement learning-based network scheduling algorithms for a single-hop downlink scenario which achieve throughput-optimality and converge to minimal delay. To this end, we first formulate the network optimization problem as a Markov decision process (MDP) problem. Then, we introduce a new state-action value function called $Q^{+}$-function and develop a reinforcement learning algorithm called $Q^{+}$-learning with UCB (Upper Confidence Bound) exploration which guarantees small performance loss during a learning process. We also derive an upper bound of the sample complexity in our algorithm, which is more efficient than the best known bound from Q-learning with UCB exploration by a factor of $\gamma^{2}$ where $\gamma$ is the discount factor of the MDP problem. Finally, via simulation, we verify that our algorithm shows a delay reduction of up to 40.8% compared to the max-weight algorithm over various scenarios. We also show that the $Q^{+}$-learning with UCB exploration converges to an $\epsilon$-optimal policy 10 times faster than Q-learning with UCB.
Published: 2021

22. DGA domain detection and botnet prevention using Q-learning for POMDP

Author: Y. V. Bubnov and N. N. Ivanov
Subjects: q-learning, partially observable markov decision process, Domain generation algorithm, TK7800-8360, Artificial neural network, Network security, business.industry, Computer science, Botnet, Q-learning, Partially observable Markov decision process, 020207 software engineering, 02 engineering and technology, computer.software_genre, Recurrent neural network, computer network security, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, domain generation algorithm, recurrent neural network, Data mining, Electronics, business, computer, Block (data storage)
Abstract: An effective method is proposed for preventing computer network nodes from being used to organize a botnet. A botnet is a collection of devices connected via the Internet for the purpose of organizing DDoS attacks, stealing data, sending spam and other malicious actions. The described method detects generated domain names in DNS queries using a neural network with a parallel organization of convolutional and bidirectional recurrent layers. The effectiveness of the method is based on the assumption that generated domain names are used to create a botnet for merging. Experiments confirm that the proposed neural network is superior in accuracy to existing counterparts on the UMUDGA dataset. The quality of recognition of generated domain names is estimated using ROC analysis for the trained neural network. The article also formulates a model for controlling detectors using a partially observable Markov decision process to block infected nodes of a computer network. A search for the optimal policy for the formulated model by means of Q-learning of value agents is proposed. A comparative analysis of the average, minimum and maximum value of actions taken by agents in the process of interacting with the environment is carried out.
Published: 2021

23. EECCRN: Energy Enhancement with CSS Approach Using Q-Learning and Coalition Game Modelling in CRN

Author: A. Suresh, Y. Harold Robinson, L. Kalaivani, Seifedine Kadry, Vimal Shanmuganathan, and Lim Sangsoon
Subjects: Computer science, Distributed computing, Probabilistic logic, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, Bidding, Computer Science Applications, Bayesian game, Cognitive radio, Control and Systems Engineering, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Stackelberg competition, Overhead (computing), False alarm, Electrical and Electronic Engineering
Abstract: The cognitive radio (CR) network is a widespread technology in which secondary users are assumed to be the winning users that acquire the spectrum by reducing the false alarm probability, and false detection of a user assumed to be the original user is restricted through the use of spectrum monitoring agents. Collaborative spectrum sensing (CSS) is an approach that identifies false intruders in CR networks; here, an Enhanced Q-Learning model with a Coalition Game approach (EQLCG) is proposed to outline the energy enhancement. In addition, a greedy bidding approach is used to allocate the spectrum to the winning secondary user (SU) based on the idle primary user, in order to strengthen spectrum sensing. The winning secondary user establishes communication with neighbouring SUs to eliminate the miss-detection probability through group-level cooperation. The simulation experiment analyses cluster-level security with energy monitoring, performed by analysing interference with coalition game theory modelling. The information obscured by the attacker is reduced by the enhanced Q-learning, and the results show that overhead is substantially monitored. The proposed approach enhances physical-layer security with energy conservation and maintains spectrum usage for application purposes. Compared with Stackelberg and Bayesian game models, the proposed simulation approach reduces the miss-detection and false-alarm probabilities.
Published: 2021

24. DDQP: A Double Deep Q-Learning Approach to Online Fault-Tolerant SFC Placement

Author: Lei Wang, Jin Zhao, Yuedong Xu, and Weixi Mao
Subjects: Speedup, Computer Networks and Communications, Computer science, Distributed computing, Reliability (computer networking), Q-learning, 020206 networking & telecommunications, Fault tolerance, 02 engineering and technology, computer.software_genre, Stateful firewall, Virtual machine, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Routing (electronic design automation), Virtual network, computer
Abstract: Since Network Function Virtualization (NFV) decouples network functions (NFs) from the underlying dedicated hardware and realizes them in the form of software called Virtual Network Functions (VNFs), they are enabled to run in any resource-sufficient virtual machines. A service function chain (SFC) is composed of a sequential set of VNFs. As VNFs are vulnerable to various faults such as software failures, we consider how to deploy both active and standby SFC instances. Given the complexity and unpredictability of the network state, we propose a double deep Q-networks based online SFC placement scheme, DDQP. Specifically, DDQP uses deep neural networks to deal with the large continuous network state space. In the case of stateful VNFs, we offer constantly generated state updates from active instances to standby instances to guarantee seamless redirection after failures. With the goal of balancing the waste of resources and ensuring service reliability, we introduce five progressive schemes of resource reservations to meet different customer needs. Our experimental results demonstrate that DDQP responds rapidly to arriving requests and reaches near-optimal performance. Specifically, DDQP outperforms the state-of-the-art method by 16.30% and 38.51% higher acceptance ratio under different schemes with 82x speedup on average. In order to enhance the integrity of the SFC state transition, we further propose DDQP+, which extends DDQP by adding the delayed placement mechanism. Compared with DDQP, the design of the DDQP+ algorithm is more reasonable and comprehensive. The experiment results also show that DDQP+ achieves further improvement in multiple performance indicators.
Published: 2021

25. Deploying SDN Control in Internet of UAVs: Q-Learning-Based Edge Scheduling

Author: Chaofeng Zhang, Mianxiong Dong, and Kaoru Ota
Subjects: Data collection, Computer Networks and Communications, business.industry, Computer science, Q-learning, 020206 networking & telecommunications, Throughput, Cloud computing, 02 engineering and technology, Scheduling (computing), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), The Internet, Enhanced Data Rates for GSM Evolution, Electrical and Electronic Engineering, business, Computer network
Abstract: Nowadays, wilderness monitoring provides massive data output for supporting agricultural production, environmental protection, and disaster monitoring. However, smart upgrading alone for these wireless nodes cannot meet today's softwarized network needs, which relate to the explosion of multi-dimensional data and multi-species equipment. In this article, we present a comprehensive solution for the UAV based data collection strategy in an “air-to-ground” intelligent softwarized collection system. The innovation in this article is that after using the IoT nodes to complete the data collection process through the proposed bandwidth-weighted traffic pushing optimization (BWPTO) algorithm, the system infers the future changes according to the current network state using a deep Q-learning (DQL) network. Then, by developing the proposed AIIPO (Air-to-Ground Intelligent Information Pushing Optimization) algorithm, the entire network can push the uploaded information in a “forward-looking” manner to nodes that are potentially idle in the future, thus achieving optimized system performance. Through the final mathematical experiments, we prove the optimality of our proposed routing algorithm and forwarding strategy, which are more applicable in the dynamic “air-to-ground” distributed data collection system than other benchmark solutions.
Published: 2021

26. Q-Learning-Based Spectrum Access for Multimedia Transmission Over Cognitive Radio Networks

Author: Yu-Xuan Li, Xiao-Wei Tang, Xin-Lin Huang, and Yu Gao
Subjects: Computer Networks and Communications, Computer science, Wireless network, business.industry, Q-learning, 020206 networking & telecommunications, Throughput, 02 engineering and technology, Spectral efficiency, Idle, Cognitive radio, Artificial Intelligence, Hardware and Architecture, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, Electrical efficiency, Information exchange, Computer network
Abstract: In order to meet the dramatic wireless bandwidth demands of emerging multimedia applications, cognitive radio has been proposed as one of the promising solutions to improve the spectrum efficiency. This article aims at pursuing high spectrum efficiency via accessing the idle spectrum intelligently without information exchange among users. Different from infrastructure-based wireless networks, users in cognitive radio networks tend to compete with each other to access limited idle spectrum, thus leading to a dynamically heterogeneous radio environment. In this article, a Q-learning based spectrum access scheme is proposed to adaptively allocate multimedia data over multiple idle spectrum holes. Taking into consideration the rigorous delay and throughput performance requirements of multimedia applications, we integrate these two indicators into the definition of the reward function in the proposed Q-learning algorithm. The simulation results show that the proposed scheme can quickly converge to a stable state in terms of throughput, power efficiency, and collision probability. Furthermore, the proposed learning rate adjustment strategy makes the spectrum access algorithm converge the fastest and takes only 78% of the time to achieve the targeted collision probability, i.e., 0.1, compared with two other typical parameter adjustment strategies.
Published: 2021

27. Deep Q-learning-based resource allocation for solar-powered users in cognitive radio networks

Author: Insoo Koo, Hoang Thi Huong Giang, and Pham Duy Thanh
Subjects: Computer Networks and Communications, Computer science, Q-learning, Time division multiple access, Power allocation, Throughput, 02 engineering and technology, Base station, Deep Q-learning, Artificial Intelligence, Telecommunications link, Computer Science::Networking and Internet Architecture, 0202 electrical engineering, electronic engineering, information engineering, Throughput maximization, lcsh:T58.5-58.64, Energy harvesting, lcsh:Information technology, business.industry, 020208 electrical & electronic engineering, NOMA, 020206 networking & telecommunications, Cognitive radio, Hardware and Architecture, Resource allocation, Channel (broadcasting), business, Software, Information Systems, Computer network
Abstract: This paper considers uplink solar-powered cognitive radio networks (CRNs) where multiple secondary users (SUs) transmit data to a secondary base station (SBS) by sharing a licensed channel of a primary system. A deep Q-learning (DQL) algorithm, which combines non-orthogonal multiple access (NOMA) and time division multiple access (TDMA) techniques, is proposed to maximize the long-term throughput of the system. By using our scheme, the agent (i.e. the SBS) can obtain the optimal decision by interacting with the environment to learn about system dynamics. Simulation results validate the superiority of the performance under the proposed scheme, compared with traditional schemes.
Published: 2021

28. High-resolution multi-beam tracking with low overhead for mmWave beamforming system

Author: Girim Kwon, Seonyong Kim, and Hyuncheol Park
Subjects: Beamforming, Scheme (programming language), Computer Networks and Communications, Computer science, Millimeter wave communication, 02 engineering and technology, Tracking (particle physics), Interference (wave propagation), Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Electronic engineering, Auxiliary beam pair, Overhead (computing), computer.programming_language, Mobility, lcsh:T58.5-58.64, lcsh:Information technology, 020208 electrical & electronic engineering, 020206 networking & telecommunications, Beam tracking, Hardware and Architecture, Extremely high frequency, Q-learning, Key (cryptography), computer, Software, Beam (structure), Information Systems
Abstract: In millimeter wave communication, the beamforming technique with accurate angle information plays a key role in overcoming the high path loss and mitigating the interference. Particularly with multiple mobile stations (MSs), accurate multi-beam tracking without any knowledge of the dynamic model is challenging. In this regard, we propose a model-free multi-beam tracking algorithm that combines Q-learning with auxiliary beam pair-based angle estimation in a multi-MS environment. The proposed scheme benefits from low pilot overhead and high-resolution angle estimation. Simulation results show that the proposed scheme outperforms the conventional schemes in terms of the effective sum-rate.
Published: 2021

29. Dynamic feature selection algorithm based on Q-learning mechanism

Author: Zhongliang Yang, Lifang Yang, Zhigang Shang, Ruohao Xu, Kangjia Qiao, and Mengmeng Li
Subjects: Computer science, business.industry, Q-learning, Feature selection, Pattern recognition, 02 engineering and technology, Set (abstract data type), Data visualization, Discriminant, Discriminant function analysis, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Selection algorithm, Selection (genetic algorithm)
Abstract: Feature selection is a technique to improve the classification accuracy of classifiers and a convenient data visualization method. As an incremental, task-oriented, and model-free learning algorithm, Q-learning is suitable for feature selection. This study proposes a dynamic feature selection algorithm that combines feature selection and Q-learning into one framework. First, Q-learning is used to construct the discriminant functions for each class of the data. Next, the feature ranking is obtained by considering the discriminant function vectors of all classes together, and the ranking is performed while the discriminant function vectors are being updated. Finally, experiments are designed to compare the performance of the proposed algorithm with four feature selection algorithms. The experimental results on the benchmark data set verify the effectiveness of the proposed algorithm: its classification performance is better than that of the other feature selection algorithms, and it also performs well in removing redundant features. Experiments on the effect of learning rates on the algorithm demonstrate that the selection of its parameters is very simple.
Published: 2021

30. Imitation and Transfer Q-Learning-Based Parameter Identification for Composite Load Modeling

Author: Jian Xie, Zixiao Ma, Yishen Wang, Di Shi, Zhaoyu Wang, Ruisheng Diao, and Kaveh Dehghanpour
Subjects: Mathematical optimization, General Computer Science, Computer science, business.industry, Process (engineering), 020209 energy, Dimensionality reduction, Stability (learning theory), Q-learning, 02 engineering and technology, Content-addressable memory, Action selection, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Local search (optimization), business, Transfer of learning
Abstract: Fast and accurate load parameter identification has a large impact on power systems operation and stability analysis. This article proposes a novel Imitation and Transfer Q-learning (ITQ)-based method to identify parameters of composite constant impedance-current-power (ZIP) and induction motor (IM) load models. Firstly, an imitation learning process is introduced to improve the exploitation and exploration processes. Then, a transfer learning method is employed to overcome the challenge of time-consuming optimization when dealing with new identification tasks. An associative memory is designed to realize dimension reduction, knowledge learning and transfer between different identification tasks. Agents can exploit the optimal knowledge from source tasks to accelerate the search rate in new tasks and improve solution accuracy. A greedy action selection rule is adopted for agents to balance the global and local search. The performance of the proposed ITQ approach has been validated on a 68-bus test system. Simulation results in multi-test cases verify that the proposed method is robust and can estimate load parameters accurately. Comparisons with other methods show that the proposed method has superior convergence rate and stability.
Published: 2021

31. Q-learning based routing for in-network aggregation in wireless sensor networks

Author: P. Yogesh and Radhakrishnan Maivizhi
Subjects: Computer Networks and Communications, Computer science, business.industry, Distributed computing, Q-learning, 020206 networking & telecommunications, 020302 automobile design & engineering, 02 engineering and technology, Tree (data structure), 0203 mechanical engineering, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), Reinforcement learning, Wireless, Electrical and Electronic Engineering, Routing (electronic design automation), business, Wireless sensor network, Information Systems
Abstract: In-network data aggregation is an inherent paradigm that extends the lifetime of resource-constrained wireless sensor networks (WSNs). By aggregating sensor data at intermediate nodes, it eliminates data redundancy, minimizes the number of transmissions and saves energy. A key component of in-network data aggregation is the design of an optimal routing structure. However, when the monitoring environment is highly dynamic, the conventional in-network aggregation routing algorithms lead to unnecessary redesign, high overhead and inferior performance, and make in-network aggregation a challenging task. This paper proposes a novel adaptive routing algorithm for in-network aggregation (RINA) in wireless sensor networks. The proposed approach employs a reinforcement learning method called Q-learning to build a routing tree based on minimal information such as residual energy, distance between nodes and link strength. In addition, it finds the aggregation points in the routing structure to maximize the number of overlapping routes in order to increase the aggregation ratio. Theoretical analysis proves the feasibility of the proposed approach. Simulation results show that the aggregation tree constructed by RINA increases the network lifetime by achieving optimum data aggregation and outperforms other state-of-the-art approaches in terms of different significant features under different simulation scenarios.
Published: 2021

32. Dynamic Auto Reconfiguration of Operator Placement in Wireless Distributed Stream Processing Systems

Author: K. Sornalakshmi and G. Vadivu
Subjects: Job shop scheduling, Computer science, Distributed computing, Quality of service, Q-learning, Control reconfiguration, 020206 networking & telecommunications, Throughput, Workload, 02 engineering and technology, Computer Science Applications, Scheduling (computing), System model, Stream processing, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, 020201 artificial intelligence & image processing, Electrical and Electronic Engineering, Edge computing
Abstract: The data is generated at significant speed and volume by devices in real-time. The data generation and the growth of fog and edge computing infrastructure have led to the noteworthy development of the corresponding distributed stream processing systems (DSPS). A DSPS application has Quality of Service (QoS) restrictions in terms of resource cost and time. The physical resources are distributed and heterogeneous. The resource-constrained scheduling problem has considerable implications on the performance of the system and QoS violations. The static deployment of applications in fog or edge scenario has to be monitored continuously for runtime issues, and actions have to be taken accordingly. In this paper, we propose an adaptation capability with reinforcement learning techniques to an existing stream processing framework scheduler. This functionality enables the scheduler to make decisions on its own when the system model or knowledge of the environment is not known upfront. The reinforcement learning methods adapt to the system when the system model for different states is not available. We consider applications whose workload cannot be characterized or predicted. In such applications, predictions of input load are not helpful for online scheduling. The Q-Learning based online scheduler learns to make dynamic scaling decisions at runtime when there is performance degradation. We validated the proposed approach with real-time and benchmark applications on a DSPS cluster. We obtained an average of 6% reduction in the response time and a 15% increase in the throughput when the Q Learning module is employed in the scheduler.
Published: 2021

33. Energy management of intelligent building based on deep reinforced learning

Author: Xiaoqing Huang, XiaoSong Zhang, and Dongliang Zhang
Subjects: Energy management, business.industry, Computer science, 020209 energy, General Engineering, Economic dispatch, Context (language use), 02 engineering and technology, Engineering (General). Civil engineering (General), Optimal control, Intelligent building, 01 natural sciences, Industrial engineering, Energy storage, 010305 fluids & plasmas, Energy management system, 0103 physical sciences, Q-learning, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, TA1-2040, business, Data mining, Building automation
Abstract: In the context of the ubiquitous power Internet of Things (UPIoT), this paper attempts to make full use of distributed new energy, and rationalize the energy management strategy of households. Inspired by the energy management system (EMS) of intelligent buildings, the authors searched for the optimal control plan for energy based on deep reinforcement learning (DRL) algorithm. Under the overall architecture of the system, a distributed new energy generation system was modelled for consumers in intelligent buildings, including energy storage, household electrical loads, new energy vehicles, etc. Next, a Q-learning-based energy management model was established for intelligent buildings, and the corresponding constraints were set up. After that, the reward and penalty functions of the EMSs for households and the intelligent building were designed based on the daily economic dispatch (DED) model. Finally, the energy management strategy was optimized, creating the real-time optimization control process. The proposed energy management strategy was proved effective for intelligent buildings through simulations. The research results provide a reference for energy management in other microgrids.
Published: 2021

34. Q-learning and LSTM based deep active learning strategy for malware defense in industrial IoT applications

Author: Parus Khuwaja and Sunder Ali Khowaja
Subjects: Computer Networks and Communications, Computer science, Active learning (machine learning), business.industry, Supervised learning, Q-learning, 020207 software engineering, 02 engineering and technology, computer.software_genre, Machine learning, Autoencoder, ComputingMethodologies_PATTERNRECOGNITION, Hardware and Architecture, Robustness (computer science), Active learning, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Embedding, Malware, Artificial intelligence, business, computer, Software
Abstract: Edge devices are extensively used as intermediaries between the device and the service layer in an industrial Internet of things (IIoT) environment. These devices are quite vulnerable to malware attacks. Existing studies have worked on designing complex learning algorithms or deep architectures to accurately classify malware assuming that a sufficient number of labeled examples are provided. In the real world, getting labeled examples is one of the major issues for training any classification algorithm. Recent advances have allowed researchers to use active learning strategies that are trained on a handful of labeled examples to perform the classification task, but they are based on the selection of informative instances. This study integrates the Q-learning characteristics into an active learning framework, which allows the network to either request or predict a label during the training process. We proposed the use of phase space embedding, sparse autoencoder, and LSTM with the action-value function to classify malware applications while using a handful of labeled examples. The network relies on its uncertainty to either request or predict a label. The experimental results show that the proposed method can achieve better accuracy than the supervised learning strategy while using few labeled requests. The results also show that the trained network is resilient to the adversarial attacks, which proves the robustness of the proposed method. Additionally, this study explores the tradeoff between classification accuracy and number of label requests via the choice of rewards and the use of decision-level fusion strategies to boost the classification performance. Furthermore, we also provide a hypothetical framework as an implication of the proposed method.
Published: 2021

35. Hybrid Bidirectional Rapidly Exploring Random Tree Path Planning Algorithm with Reinforcement Learning

Author: Xiangdong Wu, Zhiyang Jia, Kaoru Hirota, Junkui Wang, and Yaping Dai
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Q-learning, 02 engineering and technology, Rapidly exploring random tree, Human-Computer Interaction, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Motion planning, Artificial intelligence, business
Abstract: The randomness of path generation and slow convergence to the optimal path are two major problems in the current rapidly exploring random tree (RRT) path planning algorithm. Herein, a novel reinforcement-learning-based hybrid bidirectional rapidly exploring random tree (H-BRRT) is presented to solve these problems. To model the random exploration process, a target gravitational strategy is introduced. Reinforcement learning is applied to the improved target gravitational strategy using two operations: random exploration and target gravitational exploration. The algorithm is controlled to switch operations adaptively according to the accumulated performance. It not only improves the search efficiency, but also shortens the generated path after the proposed strategy is applied to a bidirectional rapidly exploring random tree (BRRT). In addition, to solve the problem of the traditional RRT continuously falling into the local optimum, an improved exploration strategy with collision weight is applied to the BRRT. Experimental results implemented in a robot operating system indicate that the proposed H-BRRT significantly outperforms alternative approaches such as the RRT and BRRT. The proposed algorithm enhances the capability of identifying unknown spaces and avoiding local optima.
Published: 2021

36. Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation

Author: Kai Xiong and Chunling Wei
Subjects: 0209 industrial biotechnology, Spacecraft, business.industry, Computer science, Q-learning, Navigation system, 02 engineering and technology, Star (graph theory), Computer Science::Robotics, Extended Kalman filter, 020901 industrial engineering & automation, Position (vector), 0202 electrical engineering, electronic engineering, information engineering, Computer Science (miscellaneous), 020201 artificial intelligence & image processing, business, Algorithm, Selection algorithm, Selection (genetic algorithm), Information Systems
Abstract: This paper presents a Q-learning-based target selection algorithm for spacecraft autonomous navigation using bearing observations of known visible targets. For the considered navigation system, the position and velocity of the spacecraft are estimated using an extended Kalman filter (EKF) with the measurements of inter-satellite line-of-sight (LOS) vectors obtained via an onboard star camera. This paper focuses on the selection of the appropriate target at each observation period for the star camera adaptively, such that the performance of the EKF is enhanced. To derive an effective algorithm, a Q-function is designed to select a proper observation region, while a U-function is introduced to rank the targets in the selected region. Both the Q-function and the U-function are constructed based on the sequence of innovations obtained from the EKF. The efficiency of the Q-learning-based target selection algorithm is illustrated via numerical simulations, which show that the presented algorithm outperforms the traditional target selection strategy based on a Cramer-Rao bound (CRB) in the case that the prior knowledge about the target location is inaccurate.
Published: 2021

37. Research on multi-agent collaborative hunting algorithm based on game theory and Q-learning for a single escaper

Author: Yanbin Zheng, Han Mengyun, and Fan Wenxin
Subjects: Statistics and Probability, Computer science, business.industry, 020208 electrical & electronic engineering, ComputingMilieux_PERSONALCOMPUTING, General Engineering, Q-learning, 02 engineering and technology, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Game theory
Abstract: The multi-agent collaborative hunting problem is a typical problem in multi-agent coordination and collaboration research. For the multi-agent hunting problem in which the escaper has learning ability, a collaborative hunting method based on game theory and Q-learning is proposed. Firstly, a cooperative hunting team is established and a game model of cooperative hunting is built. Secondly, through the learning of the escaper’s strategy choice, the trajectory of the escaper’s limited T-step cumulative reward is established, and the trajectory is adjusted to the hunter’s strategy set. Finally, the Nash equilibrium solution is obtained by solving the cooperative hunting game, and each hunter executes the equilibrium strategy to complete the hunting task. C# simulation experiments show that, under the same conditions, this method can effectively solve the problem of hunting a single escaper with learning ability in an environment with obstacles, and a comparative analysis of the experimental data shows that the method is more efficient than other methods.
Published: 2021

38. A reinforcement learning optimization for future smart cities using software defined networking

Author: Manikandan Ramachandran, K. V. Rajkumar, Rizwan Patan, and Fadi Al-Turjman
Subjects: Dynamic network analysis, Computer science, Distributed computing, Q-learning, 020206 networking & telecommunications, Computational intelligence, Throughput, 02 engineering and technology, Telecommunications network, Artificial Intelligence, Smart city, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Software-defined networking, Software
Abstract: Adopting a software-defined networking (SDN) approach gives smart cities better flexibility and manageability. An SDN network is stronger and more dynamic, which is precisely what a smart city network must be if it is to be viable on a real-world scale. An SDN architecture is developed to implement a learning framework for network optimization. The proposed method is called mixed-integer and reinforcement learned network optimization (MI-RLNO) for SDN monitoring. In the first phase, a mixed-integer programming formulation is used as an optimization formulation for latency and convergence time. In the second phase, a reinforced Q-learning model is designed that uses communication and computation time as the input state vector. The optimization formulation is used as the actions and strategies to be followed during the design and operation of communication networks, thereby contributing to fairness and throughput. Simulation results show the improved efficiency of the MI-RLNO method.
Published: 2021

39. A Table-Free Approximate Q-Learning-Based Thermal-Aware Adaptive Routing for Optical NoCs

Author: Wenfei Zhang and Yaoyao Ye
Subjects: Computer science, Q-learning, 02 engineering and technology, Adaptive routing, Topology, Chip, Computer Graphics and Computer-Aided Design, Optical switch, 020202 computer hardware & architecture, Hardware_INTEGRATEDCIRCUITS, 0202 electrical engineering, electronic engineering, information engineering, Table (database), Overhead (computing), System on a chip, Sensitivity (control systems), Electrical and Electronic Engineering, Routing (electronic design automation), Software
Abstract: Optical networks-on-chips (NoCs) based on silicon photonics have been proposed as an emerging communication architecture for many-core chip multiprocessors. However, the thermal sensitivity of silicon photonics is one of the major challenges. $Q$-learning-based adaptive routing has been proposed in related work to mitigate the thermal issue. However, table overhead of the traditional table-based $Q$-routing would scale up quickly with the increase of network size. In this article, we propose a table-free approximate $Q$-learning-based thermal-aware adaptive routing to find optimal low-loss paths in the presence of on-chip temperature variations. The simulation results show that the proposed table-free approximate $Q$-learning-based adaptive routing can converge faster and it can achieve similar optimization effect as compared to the best optimization effect of the traditional table-based $Q$-routing. The performance gap between the proposed approximation method and the traditional table-based $Q$-routing expands when the network size increases.
Published: 2021

40. A Q-Learning-Based Approach for Enhancing Energy Efficiency of Bluetooth Low Energy

Author: Xing Fu, Loberi Lopez-Estrada, and Jeong Geun Kim
Subjects: wireless networks, reinforcement learning, General Computer Science, Computer science, Internet of Things, Q-learning, Throughput, 02 engineering and technology, Bluetooth low energy, Scheduling (computing), 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, General Materials Science, business.industry, Network packet, Quality of service, General Engineering, 020206 networking & telecommunications, Key (cryptography), 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, Wireless sensor network, Efficient energy use, Computer network
Abstract: Bluetooth low energy (BLE) is a promising candidate technology for use in the Internet of Things (IoT) because of its ultra-low-power communication. Although BLE devices are designed to run on a small battery for a few years, several attempts have been made to extend BLE lifetime through various techniques. In particular, emerging approaches such as artificial intelligence (AI) can be utilized to further improve the BLE energy efficiency. For this purpose, this article proposes a Q-learning-based scheduling algorithm for BLE. The proposed scheduling algorithm dynamically adjusts the key parameters that govern the operation of the BLE transmission scheme. These key parameters, namely, the length of connection interval and the number of packets to transmit during the interval, have a profound effect on energy efficiency and the quality of service (QoS) specified in terms of maximum latency. According to the framework of reinforcement learning, our Q-learning-based scheduling algorithm is appropriately constructed to simultaneously provide a longer network lifetime and satisfy the QoS requirement. The numerical results show that the proposed Q-learning-based approach significantly increases the network lifetime compared to alternative methods while meeting QoS requirements.
Published: 2021

41. An analytical approach to estimate structural and behavioral impact of renewable energy power plants on LMP

Author: Mohammad Javad Poursalimi Jaghargh and Habib Rajabi Mashhadi
Subjects: 060102 archaeology, Renewable Energy, Sustainability and the Environment, Computer science, business.industry, Process (engineering), 020209 energy, Q-learning, 06 humanities and the arts, 02 engineering and technology, Energy security, Environmental economics, Power (physics), Renewable energy, Electric power system, Operator (computer programming), Climate change mitigation, 0202 electrical engineering, electronic engineering, information engineering, 0601 history and archaeology, business
Abstract: Implementation of renewable energy support policies in many countries for the sake of energy security and climate change mitigation worldwide raises the importance of renewable energy power plants (REPPs) in analyzing power market features. In this article, an analytical method is introduced to examine the effect of REPPs on locational marginal price (LMP). To this end, taking into account REPPs, according to the social welfare maximization problem of the independent system operator (ISO), a new decomposition for LMP is proposed to analytically express the impact of the strategic behavior of conventional generation companies (GenCos) and the power system structure on LMP. The structural section of LMP is decomposed into four components to differentiate between the effect of GenCos and REPPs. The well-known Q-Learning (QL) algorithm models the GenCos’ decision-making process. The results prove that REPPs can change both the structural and behavioral components of the LMP and highlight the significance of GenCos’ strategic behavior. The results demonstrate that not considering the features of the power system in the design of support policies might raise the likelihood of congestion or the probability of strategic behaviors from GenCos, against the objectives of these policies.
Published: 2021

42. A Time-Slotted Data Gathering Medium Access Control Protocol Using Q-Learning for Underwater Acoustic Sensor Networks

Author: Faisal Ahmed and Ho-Shin Cho
Subjects: Schedule, General Computer Science, Computer science, 02 engineering and technology, 01 natural sciences, Synchronization, Back-off, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), General Materials Science, business.industry, Network packet, Node (networking), ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, 010401 analytical chemistry, General Engineering, medium access control, 020206 networking & telecommunications, collisions, 0104 chemical sciences, machine learning, Transmission (telecommunications), Q-learning, lcsh:Electrical engineering. Electronics. Nuclear engineering, slot selection, business, Underwater acoustics, lcsh:TK1-9971, Computer network, Communication channel
Abstract: Contention-based medium access control (MAC) protocols for underwater acoustic sensor networks are designed to handle packet collisions that are caused by long propagation delays. However, existing protocols are known to suffer from relatively high collisions, which decrease system performance. To enhance system performance, we propose a contention-based MAC protocol that employs a widely-popular machine learning technique, namely, Q-learning. Using Q-learning, the proposed protocol allows the sensor nodes to intelligently select the back-off slots and accordingly schedule the transmission of data packets such that collisions are minimized at the receiver. Unlike in existing protocols, the sensor nodes are not required to exchange scheduling information, which implies that the proposed protocol has low complexity and overhead. Under varying traffic loads and node numbers, the proposed protocol is compared with the state-of-the-art ALOHA-Q for underwater environment (UW-ALOHA-Q), multiple access collision avoidance for underwater (MACA-U) and exponential increase exponential decrease (EIED) protocols. Results demonstrate the effectiveness of the proposed protocol in terms of energy efficiency, channel utilization, and latency.
Published: 2021

43. Emergency Load Shedding Strategy for Microgrids Based on Dueling Deep Q-Learning

Author: Yu Hongliang, Binxin Zhu, Huikang Liu, Lin Chai, and Can Wang
Subjects: frequency adjustment effect, Microgrid, General Computer Science, Computer science, 020209 energy, 020208 electrical & electronic engineering, General Engineering, Load Shedding, Q-learning, 02 engineering and technology, Power (physics), Control theory, 0202 electrical engineering, electronic engineering, information engineering, emergency load shedding, General Materials Science, deep Q-learning, lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971, frequency recovery
Abstract: A rapid drop in frequency under disturbance is a major threat to the safe and stable operation of a microgrid (MG) system. Emergency load shedding is the main measure to prevent continuous frequency drop and power outage. Existing load shedding strategies adapt poorly to MG load shedding under different disturbance situations, and it is difficult for them to ensure the safe and stable operation of an MG in different operating environments. To address this problem, this paper proposes a data-driven load shedding strategy. First, considering the importance of the load and the frequency recovery time of the system, a load shedding contribution indicator is designed that takes into account the load frequency adjustment effect and the load shedding priority. This contribution indicator is introduced as a load shedding criterion into the reward value function of dueling deep Q-learning. Second, considering the suddenness and uncertainty of emergency load shedding, an MG emergency load shedding strategy (ELSS) based on dueling deep Q-learning is proposed. On this basis, the dueling deep Q-learning algorithm is used to obtain the load shedding decision with the maximum cumulative reward. Finally, taking the MG load shedding cases in two different scenarios as examples, a simulation study is carried out on a modified IEEE-25 bus MG. The simulation results show that, compared with the model-driven implicit enumeration strategy (IES), the proposed ELSS is superior in maintaining stable power supply for important loads and in reducing load shedding decision-making time and frequency fluctuations.
Published: 2021

44. Improving IoT Services Using a Hybrid Fog-Cloud Offloading

Author: Abdolah Chalechale and Saif Aljanabi
Subjects: task offloading, General Computer Science, Computer science, Distributed computing, Internet of Things, 050801 communication & media studies, Cloud computing, 02 engineering and technology, 0508 media and communications, Server, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Edge computing, business.industry, Quality of service, Node (networking), 05 social sciences, cloud computing, General Engineering, 020206 networking & telecommunications, Task (computing), Task analysis, Q-learning, Markov decision process, lcsh:Electrical engineering. Electronics. Nuclear engineering, fog computing, business, lcsh:TK1-9971
Abstract: With the rapid development of Internet of Things (IoT) devices and applications, it has become necessary to provide these devices with high processing capabilities so that applications run more quickly and smoothly. Although manufacturers try to equip IoT devices with the best technologies, some drawbacks remain when running sophisticated applications such as virtual reality and smart healthcare. To overcome these drawbacks, a hybrid fog-cloud offloading (HFCO) is introduced, where the tasks associated with complex applications are offloaded to the cloud servers to be executed, and the results are sent back to the corresponding applications. In the HFCO, when an IoT node generates a processing task with high requirements that it cannot handle itself, it must decide whether to offload the task to the cloud server or to the nearby fog nodes. The decision depends on the conditions of the task requirements and the nearby fog nodes. Considering many fog nodes and many IoT nodes that need to offload their tasks, the problem is to select the best fog node to offload each task. In this paper, we propose a novel solution to the problem, where the IoT node has the choice to offload tasks to the best fog node or to the cloud based on the requirements of the applications and the conditions of the nearby fog nodes. In addition, fog nodes can offload tasks to each other or to the cloud to balance the load and improve the current conditions, allowing the tasks to be executed more efficiently. The problem is formulated as a Markov Decision Process (MDP). In addition, a Q-learning-based algorithm is presented to solve the model and select the optimal offload policy. Numerical simulation results show that the proposed approach is superior to other methods in reducing delay, executing more tasks, and balancing the load.
Published: 2021

45. Q-Learning-Based Data-Aggregation-Aware Energy-Efficient Routing Protocol for Wireless Sensor Networks

Author: Wan-Kyu Yun and Sang-Jo Yoo
Subjects: Routing protocol, data aggregation, General Computer Science, Computer science, 02 engineering and technology, Computer Science::Networking and Internet Architecture, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, network lifetime, business.industry, Node (networking), General Engineering, 020206 networking & telecommunications, Energy consumption, Wireless sensor networks, routing, Sensor node, Q-learning, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Routing (electronic design automation), business, lcsh:TK1-9971, Wireless sensor network, Efficient energy use, Computer network
Abstract: The energy consumption of the routing protocol can affect the lifetime of a wireless sensor network (WSN) because tiny sensor nodes are usually difficult to recharge after they are deployed. Generally, to save energy, data aggregation is used to minimize and/or eliminate data redundancy at each node and reduce the amount of the overall data transmitted in a WSN. Furthermore, energy-efficient routing is widely used to determine the optimal path from the source to the destination, while avoiding the energy-short nodes, to save energy for relaying the sensed data. In most conventional approaches, data aggregation and routing path selection are considered separately. In this study, we consider the degrees of the possible data aggregation of neighbor nodes when a node needs to determine the routing path. We propose a novel Q-learning-based data-aggregation-aware energy-efficient routing algorithm. The proposed algorithm uses reinforcement learning to maximize the rewards, defined in terms of the efficiency of the sensor-type-dependent data aggregation, communication energy and node residual energy, at each sensor node to obtain an optimal path. We used sensor-type-dependent aggregation rewards. Finally, we performed simulations to evaluate the performance of the proposed routing method and compared it with that of the conventional energy-aware routing algorithms. Our results indicate that the proposed protocol can successfully reduce the amount of data and extend the lifetime of the WSN.
Published: 2021

46. Reinforcement Learning-Based Routing Protocols for Vehicular Ad Hoc Networks: A Comparative Survey

Author: Rezoan Ahmed Nazib and Sangman Moh
Subjects: Routing protocol, reinforcement learning, General Computer Science, Computer science, Wireless ad hoc network, Throughput, 02 engineering and technology, routing protocol, quality-of-service routing, 0203 mechanical engineering, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Overhead (computing), General Materials Science, Vehicular ad hoc network, Network packet, business.industry, Quality of service, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, General Engineering, 020302 automobile design & engineering, 020206 networking & telecommunications, Q-learning, lcsh:Electrical engineering. Electronics. Nuclear engineering, Routing (electronic design automation), business, lcsh:TK1-9971, intelligent algorithm, Computer network
Abstract: Vehicular ad hoc networks (VANETs) hold great importance because of their potential in road safety improvement, traffic monitoring, and in-vehicle infotainment services. Due to high mobility, sparse connectivity, road-side obstacles, and shortage of roadside units, the links between the vehicles are subject to frequent disconnections; consequently, routing is crucial. Recently, to achieve more efficient routing, reinforcement learning (RL)-based routing algorithms have been investigated. RL represents a class of artificial intelligence that implements a learning procedure based on previous experiences and provides a better solution for future operations. RL algorithms are more favorable than other optimization techniques owing to their modest usage of memory and computational resources. Because a VANET deals with passenger safety, any kind of flaw is intolerable in VANET routing. Fortunately, RL-based algorithms have the potential to optimize the different quality-of-service parameters of VANET routing such as bandwidth, end-to-end delay, throughput, control overhead, and packet delivery ratio. However, to the best of the authors' knowledge, surveys on RL-based routing protocols for VANETs have not been conducted. To fill this gap in the literature and to provide future research directions, it is necessary to aggregate the scattered works on this topic. This study presents a comparative investigation of RL-based routing protocols, by considering their working procedure, advantages, disadvantages, and applications. They are qualitatively compared in terms of key features, characteristics, optimization criteria, performance evaluation techniques, and implemented RL techniques. Lastly, open issues and research challenges are discussed to make RL-based VANET routing protocols more efficient in the future.
Published: 2021

47. H∞ Tracking Control for Linear Discrete-Time Systems: Model-Free Q-Learning Designs

Author: Frank L. Lewis, Jihong Zhu, Yunjie Yang, and Yan Wan
Subjects: 0209 industrial biotechnology, Control and Optimization, Computer science, Q-learning, Feed forward, 02 engineering and technology, Algebraic Riccati equation, System dynamics, Tracking error, 020901 industrial engineering & automation, Discrete time and continuous time, Control and Systems Engineering, Control theory, Bellman equation, Convergence (routing), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing
Abstract: In this letter, a novel model-free Q-learning based approach is developed to solve the H∞ tracking problem for linear discrete-time systems. A new exponential discounted value function is introduced that includes the cost of the whole control input and tracking error. The tracking Bellman equation and the game algebraic Riccati equation (GARE) are derived. The solution to the GARE leads to the feedback and feedforward parts of the control input. A Q-learning algorithm is then developed to learn the solution of the GARE online without requiring any knowledge of the system dynamics. Convergence of the algorithm is analyzed, and it is also proved that probing noises in maintaining the persistence of excitation (PE) condition do not result in any bias. An example of the F-16 aircraft short period dynamics is developed to validate the proposed algorithm.
Published: 2021

48. A Q-Learning-Based Resource Allocation for Downlink Non-Orthogonal Multiple Access Systems Considering QoS

Author: Yong Li, Miodrag Bolic, Qi Zhai, Chenxi Liu, and Wei Cheng
Subjects: Mathematical optimization, reinforcement learning, Optimization problem, General Computer Science, Computer science, Quality of service, General Engineering, Q-learning, NOMA, 020206 networking & telecommunications, 020302 automobile design & engineering, 02 engineering and technology, Spectral efficiency, power allocation, Power budget, spectral efficiency, TK1-9971, 0203 mechanical engineering, Single antenna interference cancellation, 0202 electrical engineering, electronic engineering, information engineering, sum rate, Resource allocation, Reinforcement learning, General Materials Science, Electrical engineering. Electronics. Nuclear engineering
Abstract: As a technology that can accommodate more users and significantly improve spectral efficiency, non-orthogonal multiple access (NOMA) has attracted the attention of many scholars in recent years. The basic idea of NOMA is to implement multiple access in the power domain and decode the desired signal via successive interference cancellation (SIC). However, the resource allocation problem in such a NOMA system is non-convex. It is difficult to directly solve this optimization problem through conventional methods. As such, we propose to apply a reinforcement learning (RL) approach based on cooperative Q-learning to solve the resource allocation problem in multi-antenna downlink NOMA systems. First, we formulate the resource allocation process as a sum rate maximization problem, subject to the power budget constraints and quality of service (QoS) condition. Second, we design a reward function to improve the sum rate while meeting the power and capacity constraints. Multiple Q-tables are created and cooperatively updated to get the optimal beamforming matrix. Then, we analyze the convergence of our proposed RL-based power allocation method. Our simulations show that the proposed power allocation scheme yields excellent performance in terms of sum rate, energy efficiency, and spectral efficiency.
Published: 2021

49. Priority-Based Joint Resource Allocation With Deep Q-Learning for Heterogeneous NOMA Systems

Author: Wooyeol Choi and Sifat Rezwan
Subjects: Karush–Kuhn–Tucker conditions, General Computer Science, Channel allocation schemes, Computer science, Quality of service, Distributed computing, 020208 electrical & electronic engineering, Internet of Things, General Engineering, Q-learning, 020206 networking & telecommunications, 02 engineering and technology, non-orthogonal multiple access (NOMA), Single antenna interference cancellation, Deep Q-learning, 0202 electrical engineering, electronic engineering, information engineering, Resource allocation, General Materials Science, Resource management, Network performance, lcsh:Electrical engineering. Electronics. Nuclear engineering, joint resource allocation, lcsh:TK1-9971
Abstract: For heterogeneous demands in fifth-generation (5G) new radio (NR), massive machine-type communication (mMTC), enhanced mobile broadband (eMBB), and ultra-reliable and low-latency communication (URLLC) services have been introduced. To ensure these quality-of-service (QoS) requirements, non-orthogonal multiple access (NOMA) has been introduced in which multiple devices can be served from the same frequency by manipulating the power domain and successive interference cancellation (SIC) technique. To maximize the efficiency of NOMA systems, an optimal resource allocation, such as power allocation and channel assignment, is a key issue that needs to be solved. Although many researchers have proposed multiple solutions, there have been no studies addressing the 5G QoS requirements and three services that coexist in the same network. In this paper, we formulate an optimal power allocation scheme under Karush–Kuhn–Tucker (KKT) optimality conditions incorporating different NOMA constraints to maximize the channel sum-rate and system fairness. We then propose a priority-based channel assignment with a deep Q-learning algorithm to maintain the 5G QoS requirements and increase the network performance. Finally, we conduct extensive simulations with respect to different system parameters and confirm that the proposed scheme performs better than other existing schemes.
Published: 2021

50. ECRKQ: Machine Learning-Based Energy-Efficient Clustering and Cooperative Routing for Mobile Underwater Acoustic Sensor Networks

Author: Jianying Zhu, Yougan Chen, Xiang Sun, Jianming Wu, Zhenwen Liu, and Xiaomei Xu
Subjects: clustering and routing, General Computer Science, Computer science, 02 engineering and technology, Machine learning, computer.software_genre, 01 natural sciences, Base station, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Electrical and Electronic Engineering, Cluster analysis, K-means, underwater acoustic communications, Network packet, business.industry, Node (networking), 010401 analytical chemistry, General Engineering, 020206 networking & telecommunications, cooperative communications, Energy consumption, TK1-9971, 0104 chemical sciences, Transmission (telecommunications), Q-learning, Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, computer, Underwater acoustic communication, Data transmission
Abstract: The dynamic topology, narrow transmission bandwidth, and limited energy of sensor nodes in mobile underwater acoustic sensor networks (UASNs) pose challenges to designing an efficient and robust network for underwater communications. In this paper, we propose a novel machine learning-based clustering and routing scheme, named energy-efficient clustering and cooperative routing based on improved K-means and Q-learning (ECRKQ), to reduce and balance energy consumption among sensor nodes in a mobile UASN and improve the bandwidth utilization. In the cluster head (CH) selection stage, ECRKQ modifies the K-means algorithm to dynamically select a CH based on the residual energy of the node and the distance from the node to the centroid in a cluster. In the clustering stage, ECRKQ adopts the Q-learning algorithm by incorporating the residual energy of the CH, the energy consumption of data transmission from the node to the CH, and the energy consumption of the data transmission from the CH to the base station into the Q-value function. In the data transmission stage, ECRKQ applies the dynamic coded cooperation (DCC) transmission to improve the bandwidth utilization and the robustness of the underwater communications. In the DCC transmission, cooperative nodes are also dynamically selected based on the residual energy and the energy consumption of transmitting a packet to their destinations. In the simulation, we apply the ocean current drifting model to emulate the position variation of nodes caused by ocean currents in a mobile UASN. The simulation results show that the proposed ECRKQ scheme can achieve more balanced energy consumption among sensor nodes in a mobile UASN than the existing scheme.
Published: 2021
