1,393 results for "reinforcement learning (RL)"
Search Results
2. Study of Inverse Kinematics Solution for a 5-Axis Mitsubishi RV-2AJ Robotic Arm Using Deep Reinforcement Learning
- Author
-
Hazem, Zied Ben, Guler, Nivine, El Fezzani, Walid, AlDhaen, Esra, editor, Braganza, Ashley, editor, Hamdan, Allam, editor, and Chen, Weifeng, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Fuzzy Reinforcement Learning Algorithm for Efficient Task Scheduling in Fog-Cloud IoT-Based Systems.
- Author
-
Ghafari, Reyhane and Mansouri, Najme
- Abstract
In recent years, the number of IoT applications that require low latency has increased greatly. Traditional cloud servers cannot handle these applications due to strict latency requirements, whereas edge technologies like fog computing meet their latency needs by placing computing infrastructure near end-user devices. Numerous traditional methods exist for scheduling IoT applications on heterogeneous and distributed fog-cloud nodes. Machine learning algorithms such as reinforcement learning (RL) can learn and make decisions based on reward signals from the environment. The purpose of this paper is to present a Task Scheduling algorithm based on Fuzzy Reinforcement Learning (TSFRL) that allocates fog-cloud computing resources so as to meet the deadlines of IoT requests. The scheduling problem is first formulated to reduce response time, cost, and energy consumption. Fuzzy logic is then used to prioritize tasks, and fog and cloud nodes employ an on-policy reinforcement learning methodology to schedule delay-sensitive tasks with a higher priority and delay-tolerant ones with a lower priority. The suggested strategy outperforms existing algorithms in response time, cost, energy usage, and percentage of deadlines met. [ABSTRACT FROM AUTHOR] (A minimal sketch of fuzzy prioritization plus an on-policy update follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
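The record above (entry 3) pairs fuzzy task prioritization with on-policy RL for fog-cloud scheduling. Below is a minimal, illustrative Python sketch of that pairing; the membership shapes, weights, node pool, and reward are invented assumptions, not TSFRL's actual design.

```python
import random

NODES = ["fog-0", "fog-1", "cloud-0"]  # hypothetical node pool
Q = {}  # (state, action) -> value table

def fuzzy_priority(deadline_s, size_mb):
    """Toy fuzzy aggregation: tight deadlines and small tasks rank high (assumed memberships)."""
    urgency = max(0.0, min(1.0, (10.0 - deadline_s) / 10.0))
    lightness = max(0.0, min(1.0, (50.0 - size_mb) / 50.0))
    return 0.7 * urgency + 0.3 * lightness

def choose(state, eps=0.1):
    """Epsilon-greedy node selection over the Q-table."""
    if random.random() < eps:
        return random.randrange(len(NODES))
    return max(range(len(NODES)), key=lambda a: Q.get((state, a), 0.0))

def sarsa_update(s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy TD(0) update: bootstraps on the action actually taken next (a2)."""
    td_target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))

# Usage: discretize the fuzzy priority into a state bucket, pick a node, then update
# with a reward such as the negative of the observed response time.
state = round(fuzzy_priority(deadline_s=3.0, size_mb=12.0), 1)
action = choose(state)
```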
4. Machine Learning-Based Resource Allocation Algorithm to Mitigate Interference in D2D-Enabled Cellular Networks.
- Author
-
Kamruzzaman, Md, Sarkar, Nurul I., and Gutierrez, Jairo
- Abstract
Mobile communications have experienced exponential growth in both connectivity and multimedia traffic in recent years. To support this tremendous growth, device-to-device (D2D) communications play a significant role in 5G and beyond-5G networks. However, enabling D2D communications in an underlay, heterogeneous cellular network poses two major challenges. First, interference management between D2D and cellular users directly affects a system's performance. Second, achieving an acceptable level of link quality for both D2D and cellular networks is necessary. Optimal resource allocation is required to mitigate the interference and improve a system's performance. In this paper, we provide a solution to interference management with acceptable quality of service (QoS). To this end, we propose a machine learning-based resource allocation method to maximize throughput and achieve the minimum QoS requirements for all active D2D pairs and cellular users. We first solve a resource optimization problem by allocating spectrum resources and controlling power transmission on demand. As resource optimization is an integer nonlinear programming problem, we address it by proposing a deep Q-network-based reinforcement learning (DRL) algorithm to optimize resource allocation. The proposed DRL algorithm is trained with a decision-making policy to obtain the best solution in terms of spectrum efficiency, computational time, and throughput. The system performance is validated by simulation. The results show that the proposed method outperforms the existing ones. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Development of human decision making model with consideration of human factors through reinforcement learning and prospect utility theory.
- Author
-
Gupta, Nimisha, Ahirwal, Mitul Kumar, and Atulkar, Mithilesh
- Subjects
ANALYTIC hierarchy process, UTILITY theory, UTILITY functions, DECISION making, ACQUISITION of data
- Abstract
Human decision-making (HDM) is a complex process in which various human factors play a significant role, directly or indirectly affecting the entire process of decision-making (DM). In this study, an attempt has been made to integrate some of these human factors, namely past experience (pe), emotion (ef), time factor (tf), and uncertainty (un), with the reinforcement learning (RL) method to develop a model for HDM. The Iowa Gambling Task (IGT), a well-known experience-based task that helps identify the DM behaviour of participants, was used as the data collection tool, and data from 57 subjects were collected. An Analytic Hierarchy Process (AHP) method was also used to decide the criteria weights of the different human factors in the HDM model. Four learning models were developed as combinations of different utility functions, learning rules, and choice rules, with the AHP method deciding the preference of the various factors incorporated in them. The results show that the model based on prospect utility, decay RL, and trial dependency (the PU-DRI-TDC model) performs best when the emotion factor is given the highest preference. In addition, the IGT learning of participants was also analysed. [ABSTRACT FROM AUTHOR] (An illustrative sketch of the PU-DRI-TDC ingredients follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
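Entry 5's best model combines prospect utility (PU), decay reinforcement learning (DRI), and a trial-dependent choice rule (TDC). The sketch below shows one common form of each ingredient from the IGT-modeling literature; the parameter values are illustrative, not the paper's estimates.

```python
import math
import random

def prospect_utility(net_outcome, alpha=0.5, lam=2.25):
    """PU: diminishing sensitivity (alpha) plus loss aversion (lam); values are illustrative."""
    if net_outcome >= 0:
        return net_outcome ** alpha
    return -lam * (abs(net_outcome) ** alpha)

def decay_update(ev, chosen, utility, decay=0.8):
    """Decay rule: every deck expectancy decays each trial; the chosen deck absorbs the utility."""
    ev = [decay * v for v in ev]
    ev[chosen] += utility
    return ev

def choose_deck(ev, trial, c=0.5):
    """Trial-dependent choice: softmax whose consistency grows with the trial number."""
    theta = (trial / 10.0) ** c
    weights = [math.exp(theta * v) for v in ev]
    r, acc = random.uniform(0, sum(weights)), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(ev) - 1

# Usage over the four IGT decks:
# ev = [0.0] * 4
# deck = choose_deck(ev, trial=1)
# ev = decay_update(ev, deck, prospect_utility(net_outcome=-150))
```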
6. A UAV Formation Path Planning Method Incorporating a Dynamic Reward Strategy.
- Author
-
Tang, Heng, Sun, Wei, Lyu, Lei, He, Ruofei, Wu, Jianjun, Sun, Changhao, and Sun, Tianye
- Subjects
FORMATION flying, DRONE aircraft, REINFORCEMENT learning, ALGORITHMS
- Published
- 2024
- Full Text
- View/download PDF
7. Simulation of Coherent Excavator Operations in Earthmoving Tasks Based on Reinforcement Learning.
- Author
-
Liu, Yongyue, Wang, Yaowu, and Zhou, Zhenzong
- Subjects
REINFORCEMENT learning, INDUSTRIAL safety, HYDRAULIC models, CONSTRUCTION projects, WORKING hours
- Abstract
Earthwork operations are critical to construction projects, with their safety and efficiency influenced by factors such as operator skill and working hours. Pre-construction simulation of these operations is essential for optimizing outcomes, providing key training for operators and improving safety awareness and operational efficiency. This study introduces a hierarchical cumulative reward mechanism that decomposes complex operational behaviors into simple, fundamental actions. The mechanism prioritizes reward function design elements, including order, size, and form, thus simplifying excavator operation simulation using reinforcement learning (RL) and enhancing policy network reusability. A 3D model of a hydraulic excavator was constructed with six degrees of freedom, comprising the boom, arm, bucket, base, and left/right tracks. The Proximal Policy Optimization (PPO) algorithm was applied to train four basic behaviors: scraping, digging, throwing, and turning back. Motion simulation was successfully achieved using diggable terrain resources. Results demonstrate that the simulated excavator, powered by RL neural networks, can perform coordinated actions and maintain smooth operational performance. This research offers practical implications by rapidly illustrating the full operational process before construction, delivering immersive movies, and enhancing worker safety and operational efficiency. [ABSTRACT FROM AUTHOR] (A toy sketch of a phase-ordered cumulative reward follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
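Entry 7 describes a hierarchical cumulative reward that orders simple basic actions (scrape, dig, throw, turn back) and weighs order, size, and form. The toy function below illustrates one way such a phase-ordered reward could be composed; every weight and term is an assumption for illustration, not the paper's design.

```python
PHASES = ["scrape", "dig", "throw", "turn_back"]  # basic behaviors named in the abstract

def cumulative_reward(phase_idx, phase_progress, bucket_fill, pose_error):
    """Toy hierarchical reward: earlier phases must complete before later ones earn credit."""
    r_order = 1.0 * phase_idx           # "order": cumulative credit for phases already completed
    r_size = 0.5 * bucket_fill          # "size": fraction of the bucket filled with soil
    r_form = -0.2 * pose_error          # "form": penalty for deviating from the target pose
    return r_order + 0.3 * phase_progress + r_size + r_form
```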
8. Pri-DDQN: learning adaptive traffic signal control strategy through a hybrid agent
- Author
-
Yanliu Zheng, Juan Luo, Han Gao, Yi Zhou, and Keqin Li
- Subjects
Adaptive traffic signal control (ATSC), Double DQN, Decay ε-greedy, Priority-based experience replay, Reinforcement learning (RL), Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Adaptive traffic signal control is the core of the intelligent transportation system (ITS); it can effectively reduce the pressure of traffic congestion and improve travel efficiency. Methods based on the deep Q-learning network (DQN) have become the mainstream approach to single-intersection traffic signal control. However, most of them neglect the importance differences among samples and the dependence between traffic states, and cannot quickly respond to randomly changing traffic flows. In this paper, we propose a new single-intersection traffic signal control method (Pri-DDQN) based on reinforcement learning: we model the traffic environment as a reinforcement learning environment, and the agent chooses the best action to schedule the traffic flow at the intersection based on real-time traffic states. With the goal of minimizing the waiting time and queue length at intersections, we use double DQN to train the agent, incorporate traffic state and reward into the loss function, and update the target network parameters asynchronously to improve the agent's learning ability. We use a power function to dynamically decay the exploration rate and accelerate convergence. In addition, we introduce a priority-based dynamic experience replay mechanism to increase the sampling rate of important samples. The results show that Pri-DDQN achieves better performance: compared to the best baseline, it reduces the average queue length by 13.41% and the average waiting time by 32.33% at the intersection. (A minimal sketch of the power-function exploration decay and the double-DQN target follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
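Entry 8 decays the exploration rate with a power function and trains with double DQN plus prioritized replay. Below is a small sketch of the first two mechanisms; the exponent, floor, and network interfaces are assumptions rather than the paper's settings.

```python
import numpy as np

def epsilon(step, eps0=1.0, eps_min=0.05, k=0.6):
    """Power-function decay of the exploration rate (assumed exponent and floor)."""
    return max(eps_min, eps0 * (step + 1) ** (-k))

def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = int(np.argmax(q_next_online))          # selection by the online net
    return reward + gamma * q_next_target[best_action]   # evaluation by the target net
```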
9. Optimal region-specific social distancing strategies in a complex multi-patch model through reinforcement learning.
- Author
-
Lee, Hyosun, Abdulali, Arsen, Park, Haeyoung, and Lee, Sunmi
- Subjects
SOCIAL distancing, DEEP reinforcement learning, REINFORCEMENT learning, OPTIMIZATION algorithms, INFECTIOUS disease transmission
- Abstract
Although non-pharmaceutical interventions such as social distancing have proven effective in curbing outbreaks, they also carry economic consequences. This poses a dilemma for policymakers striving to balance disease control and economic burden. This delicate balance varies regionally, influenced by non-epidemiological factors such as population movements, socio-demographic characteristics, and the intricacies of social distancing policies. These factors interact in intricate ways, shaping the transmission dynamics of COVID-19. To address this complexity, we propose an innovative approach utilizing deep reinforcement learning (RL). This method assists in tailoring intervention policies for diverse regions, taking into account their unique dynamics. We incorporate South Korea's social distancing policies and their economic impact into an RL framework with a multi-region epidemic model, offering a comprehensive solution. We integrate official mobility data and GDP specific to each region, employing the proximal policy optimization (PPO) algorithm to determine the most appropriate region-specific social distancing policy. The algorithm's reward function considers both outbreak control and economic impacts, giving policymakers the flexibility to fine-tune the balance between these two factors according to their preferences. This adjustment can be performed across three distinct cost scenarios: High, Base, and Low-cost. In the High-cost scenario, social distancing measures are aimed at regions with extensive connectivity and higher transmission rates. When costs are moderate, policies center around the period of peak prevalence, illustrating adaptable strategies in areas characterized by high transmission rates, budget limitations, and population mobility. In the Low-cost scenario, the measures encompass most regions, excluding those with low transmission rates. The study's results support focused interventions in specific regions to balance outbreak control and economic impact mitigation. [ABSTRACT FROM AUTHOR] (A toy sketch of such a health-vs-economy reward follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
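Entry 9's reward function weighs outbreak control against economic cost, tunable across High/Base/Low-cost scenarios. The toy function below illustrates that trade-off; its functional form and weight are assumptions for illustration only.

```python
def region_reward(new_infections, distancing_level, regional_gdp, cost_weight=1.0):
    """Toy reward: penalize transmission plus the economic cost of distancing measures."""
    epidemic_cost = new_infections                     # outbreak-control term
    economic_cost = distancing_level * regional_gdp    # stricter measures cost more in richer regions
    return -(epidemic_cost + cost_weight * economic_cost)

# cost_weight > 1 approximates a "High-cost" scenario (economic losses dominate),
# cost_weight < 1 a "Low-cost" one; policymakers tune it to their preference.
```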
10. A reinforcement learning-based GWO-RNN approach for energy efficiency in data centers by minimizing virtual machine migration.
- Author
-
Parsafar, Parsa
- Abstract
In the era of exponential data growth, data centers face a pressing need to manage energy consumption while maintaining performance. Existing methods for optimizing energy efficiency, particularly in the context of virtual machine (VM) migrations, often fall short due to their inability to manage the complex, nonlinear relationships between resource utilization and energy consumption. Moreover, many rely on static thresholds or single-resource metrics, which do not capture the dynamic and multi-faceted nature of data centers. In contrast, this paper introduces a novel approach that integrates a recurrent neural network (RNN) with a gray wolf optimizer (GWO). Unlike traditional models, this approach predicts future energy consumption using a more comprehensive set of resource metrics and dynamically manages workloads, reducing unnecessary VM migrations. The use of GWO optimizes the RNN's ability to capture nonlinearities, while reinforcement learning allows for continuous improvement based on real-time performance feedback. This novel combination demonstrates high predictive accuracy with minimal energy overhead, reducing the error margin to only 11% compared to optimal solutions. By doing so, the GWO-RNN framework provides a robust, adaptive solution for energy-efficient VM management in cloud environments, offering a significant advancement in the quest for sustainable data centers. [ABSTRACT FROM AUTHOR] (A sketch of the canonical GWO update follows this record.)
- Published
- 2025
- Full Text
- View/download PDF
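Entry 10 tunes an RNN energy predictor with the gray wolf optimizer (GWO). The sketch below is the canonical GWO position update; the fitness function (e.g., validation error of the RNN predictor) and the swarm setup are assumptions.

```python
import numpy as np

def gwo_step(wolves, fitness, t, T):
    """One grey-wolf iteration: wolves move toward the three best solutions (alpha, beta, delta)."""
    order = np.argsort([fitness(w) for w in wolves])          # lower fitness = better
    alpha, beta, delta = (wolves[i] for i in order[:3])
    a = 2.0 * (1.0 - t / T)                                   # coefficient decays from 2 to 0
    new_wolves = []
    for X in wolves:
        guided = []
        for leader in (alpha, beta, delta):
            r1, r2 = np.random.rand(*X.shape), np.random.rand(*X.shape)
            A, C = 2.0 * a * r1 - a, 2.0 * r2
            guided.append(leader - A * np.abs(C * leader - X))  # encircling-prey update
        new_wolves.append(sum(guided) / 3.0)                    # average of the three guided moves
    return new_wolves

# Usage (hypothetical): wolves encode flattened RNN weight vectors and fitness returns
# the validation error of the energy-consumption predictor.
# wolves = [np.random.randn(100) for _ in range(12)]
# for t in range(50):
#     wolves = gwo_step(wolves, fitness=lambda w: float(np.sum(w ** 2)), t=t, T=50)
```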
12. Optimized Inverse Dead‐Zone Control Using Reinforcement Learning for a Class of Nonlinear Systems.
- Author
-
Sun, Wenxia, Ma, Shuaihua, Li, Bin, and Wen, Guoxing
- Subjects
NONLINEAR dynamical systems, NONLINEAR systems, ADAPTIVE control systems, INVERSE functions, ALGORITHMS, ADAPTIVE fuzzy control
- Abstract
In this article, an optimized inverse dead-zone control using reinforcement learning (RL) is developed for a class of nonlinear dynamic systems. Dead-zones occur frequently in nonlinear control systems; they can degrade control performance and even destabilize the system, so it is essential to account for the dead-zone when designing the control strategy. In the proposed optimized inverse dead-zone control, the basic idea is to find the optimized control as input and an adaptive algorithm to estimate the unknown parameters of the inverse dead-zone function, so that the available dead-zone input for system control can be derived. Compared with traditional methods, the proposed dead-zone inverse method uses fewer adaptive parameters, and the RL scheme under the identifier-critic-actor architecture uses a simplified algorithm. Finally, theoretical and simulation results demonstrate the feasibility of the proposed method. [ABSTRACT FROM AUTHOR] (A minimal sketch of a dead-zone model and its inverse follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
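Entry 12 builds on the inverse of a dead-zone nonlinearity. Below is the textbook dead-zone model and its right inverse; the slopes and break-points are placeholders for the unknown parameters that the paper estimates adaptively.

```python
def dead_zone(v, m_r=1.0, b_r=0.5, m_l=1.0, b_l=-0.5):
    """Actuator nonlinearity: inputs inside (b_l, b_r) produce no output."""
    if v >= b_r:
        return m_r * (v - b_r)
    if v <= b_l:
        return m_l * (v - b_l)
    return 0.0

def inverse_dead_zone(u, m_r=1.0, b_r=0.5, m_l=1.0, b_l=-0.5):
    """Pre-compensator: choose v so that dead_zone(v) equals the desired control u."""
    if u > 0:
        return u / m_r + b_r
    if u < 0:
        return u / m_l + b_l
    return 0.0

# In the adaptive setting, m_r, b_r, m_l, b_l are unknown and replaced by online estimates.
```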
13. A Novel Medium Access Policy Based on Reinforcement Learning in Energy-Harvesting Underwater Sensor Networks.
- Author
-
Eriş, Çiğdem, Gül, Ömer Melih, and Bölük, Pınar Sarısaray
- Subjects
MACHINE learning, WIRELESS sensor networks, REINFORCEMENT learning, SENSOR networks, ENERGY consumption, TIME management
- Abstract
Underwater acoustic sensor networks (UASNs) are fundamental assets to enable discovery and utilization of sub-sea environments and have attracted both academia and industry to execute long-term underwater missions. Given the heightened significance of battery dependency in underwater wireless sensor networks, our objective is to maximize the amount of harvested energy underwater by adopting the TDMA time slot scheduling approach to prolong the operational lifetime of the sensors. In this study, we considered the spatial uncertainty of underwater ambient resources to improve the utilization of available energy and examine a stochastic model for piezoelectric energy harvesting. Considering a realistic channel and environment condition, a novel multi-agent reinforcement learning algorithm is proposed. Nodes observe and learn from their choice of transmission slots based on the available energy in the underwater medium and autonomously adapt their communication slots to their energy harvesting conditions instead of relying on the cluster head. In the numerical results, we present the impact of piezoelectric energy harvesting and harvesting awareness on three lifetime metrics. We observe that energy harvesting contributes to 4% improvement in first node dead (FND), 14% improvement in half node dead (HND), and 22% improvement in last node dead (LND). Additionally, the harvesting-aware TDMA-RL method further increases HND by 17% and LND by 38%. Our results show that the proposed method improves in-cluster communication time interval utilization and outperforms traditional time slot allocation methods in terms of throughput and energy harvesting efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Balanced prioritized experience replay in off-policy reinforcement learning.
- Author
-
Lou, Zhouwei, Wang, Yiye, Shan, Shuo, Zhang, Kanjian, and Wei, Haikun
- Subjects
REINFORCEMENT learning, ALGORITHMS
- Abstract
In Off-Policy reinforcement learning (RL), the experience imbalance problem can affect learning performance. The experience imbalance problem refers to the phenomenon that the experiences obtained by the agent during the learning process are unevenly distributed in the state space, resulting in the agent's inability to accurately estimate the value of each potential state. This problem is typically caused by environments with high-dimensional state and action spaces, as well as the exploration-exploitation mechanism inherent in RL. This article proposes a balanced prioritized experience replay (BPER) algorithm based on experience rarity. First, an evaluation metric to quantify experience rarity is defined. Then, the sampling priority of each experience is calculated according to this metric. Finally, prioritized experience replay is performed according to the sampling priority. BPER increases the sampling frequency of high-rarity experiences and decreases the sampling frequency of low-rarity experiences, enabling the agent to learn more comprehensive knowledge. We evaluate BPER on a series of MuJoCo continuous control tasks. Experimental results show that BPER can effectively improve learning performance while mitigating the impact of the experience imbalance problem. [ABSTRACT FROM AUTHOR] (A toy sketch of rarity-weighted replay sampling follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
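Entry 14 samples replay transitions by "experience rarity". The paper defines its own rarity metric; the sketch below substitutes a simple stand-in (inverse visit counts over a coarse state discretization) to show the rarity-weighted sampling pattern.

```python
import random
from collections import defaultdict

visits = defaultdict(int)   # coarse state key -> visit count
buffer = []                 # list of (state_key, transition) pairs

def add(state_key, transition):
    visits[state_key] += 1
    buffer.append((state_key, transition))

def sample(batch_size):
    """Rarity-weighted sampling: transitions from rarely visited states get higher priority."""
    priorities = [1.0 / visits[key] for key, _ in buffer]
    picks = random.choices(range(len(buffer)), weights=priorities, k=batch_size)
    return [buffer[i][1] for i in picks]

# Usage: add(state_key=round(x, 1), transition=(s, a, r, s2)); batch = sample(32)
```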
15. A train trajectory optimization method based on the safety reinforcement learning with a relaxed dynamic reward.
- Author
-
Cheng, Ligang, Cao, Jie, Yang, Xiaofeng, Wang, Wenxian, and Zhou, Zijian
- Abstract
Train trajectory optimization (TTO) is an effective way to address energy consumption in rail transit. Reinforcement learning (RL), an excellent optimization method, has been used to solve TTO problems. Although traditional RL algorithms use penalty functions to restrict the random exploration behavior of agents, they cannot fully guarantee the safety of the process and results. This paper proposes a proximal policy optimization based safety reinforcement learning framework (S-PPO) for train trajectory optimization, comprising a safe action rechoosing mechanism (SARM) and a relaxed dynamic reward mechanism (RDRM) that combines a relaxed sparse reward with a dynamic dense reward. SARM guarantees that the new states generated by the agent consistently adhere to the environmental safety constraints, thereby enhancing sampling efficiency and facilitating algorithm convergence. RDRM makes it easier for agents to obtain successful samples by relaxing time constraints, which also offers a better balance between exploration and exploitation. The experimental results show that S-PPO can significantly improve performance and obtain better train operation trajectories than soft-constraint methods, with a smoother convergence process. Finally, S-PPO was shown to exhibit good adaptability across various speed-limit tracks. Article highlights: 1) the train operation process is discretized by distance and modeled as a Markov decision process; 2) a PPO-based safety reinforcement learning framework keeps the learning process within boundary constraints; 3) a relaxed sparse reward, which relaxes the constraint on the planned trip time, increases the likelihood of agents completing tasks; 4) a dynamic dense reward balances the contributions of time and energy consumption and offers enhanced feedback. [ABSTRACT FROM AUTHOR] (A minimal sketch of the safe-action-rechoosing idea follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
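Entry 15's SARM resamples actions until the successor state satisfies the safety constraints. The sketch below shows that pattern in miniature; the constraint model, speed predictor, and fallback action are placeholders, not the paper's environment.

```python
def safe_action(policy_sample, predict_next_speed, speed_limit, max_tries=10, fallback=0.0):
    """Resample until the predicted successor state respects the constraint, else fall back."""
    for _ in range(max_tries):
        a = policy_sample()                      # candidate action from the current PPO policy
        if predict_next_speed(a) <= speed_limit:
            return a                             # first candidate whose next state stays safe
    return fallback                              # assumed always-safe action (e.g., coasting)

# Usage (hypothetical): given current speed v and timestep dt,
# a = safe_action(lambda: sample_from_policy(), lambda a: v + a * dt, speed_limit=22.0)
```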
16. Traffic Management Based on Cloud and MEC Architecture with Evolutionary Approaches towards AI: A Review.
- Author
-
Naser, Zainab Saadoon, Belguith, Hend Marouane, and Fakhfakh, Ahmed
- Subjects
DEEP reinforcement learning, REINFORCEMENT learning, ADAPTIVE control systems, MOBILE computing, DEEP learning
- Abstract
This review paper explores the significance of machine learning (ML), deep learning (DL), reinforcement learning (RL), and deep reinforcement learning (DRL) techniques in improving traffic management based on cloud and mobile edge computing (MEC) architectures. The key findings and contributions of this review highlight the potential of these techniques for transforming traffic management systems through data-driven decision-making, adaptive control, and optimization. The challenges identified in this field include data availability and quality, scalability and computational requirements, privacy and security concerns, and ethical considerations. In conclusion, ML, DL, RL, and DRL techniques, in conjunction with cloud and MEC architectures, have significant implications for improving traffic management. Their ability to process and analyse large-scale and real-time traffic data enables improved traffic flow, reduced congestion, enhanced energy efficiency, and enhanced overall transportation system performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Optimization of news dissemination push mode by intelligent edge computing technology for deep learning.
- Author
-
DeGe, JiLe and Sang, Sina
- Subjects
DEEP reinforcement learning, PATTERN recognition systems, SOCIAL media, NEWS websites, RECOMMENDER systems, DEEP learning, REINFORCEMENT learning
- Abstract
The Internet era is an era of information explosion. By 2022, global Internet users had surpassed 4 billion and social media users 3 billion. People face a large amount of news content every day, and it is almost impossible to find interesting information by browsing it all. Against this background, personalized news recommendation technology has been widely used, but it still needs further optimization and improvement. To better push news content of interest to different readers, users' satisfaction with major news websites should be further improved. This study proposes a new recommendation algorithm based on deep learning and reinforcement learning. Deep learning excels at processing large-scale data and complex pattern recognition, but it often suffers from low sample efficiency on complex decision-making and sequential tasks. Reinforcement learning (RL), by contrast, emphasizes learning optimal strategies through continuous trial and error while interacting with the environment. Compared with deep learning, RL is better suited to scenarios that require long-term decision-making and trial-and-error learning: by feeding back the reward signal of each action, the system can better adapt to unknown environments and complex tasks, compensating for the relative shortcomings of deep learning in these respects. The sequential decision problem in the news dissemination process is framed as mapping states to actions. To let the news recommendation system account for dynamic changes in users' interest in news content, the Deep Deterministic Policy Gradient (DDPG) algorithm is applied to the news recommendation scenario, combining a Deep Q-network with a policy network. On this basis, the paper puts forward a mode of intelligent news dissemination and push, and proposes a push process for news dissemination based on edge computing technology. Finally, a Q-Learning Area Under Curve (AUC) indicator for RL models is proposed; it measures the strengths and weaknesses of RL models efficiently and facilitates model comparison and offline evaluation. The results show that the DDPG algorithm improves the click-through rate by 2.586% compared with the conventional recommendation algorithm, indicating a clear advantage in recommendation accuracy. By optimizing the push mode of intelligent news dissemination, this paper effectively improves the efficiency of news dissemination. It also studies the innovative application of intelligent edge technology in news communication, bringing new ideas and practices to the development of news communication methods. Optimizing the push mode not only improves the user experience but also provides strong support for applying intelligent edge technology in this field, with important practical application prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Reinforced active learning for CVD-grown two-dimensional materials characterization.
- Author
-
Li, Zebin, Yao, Fei, and Sun, Hongyue
- Subjects
ACTIVE learning, REINFORCEMENT learning, CHEMICAL vapor deposition, INTRINSIC motivation, MATERIALS science
- Abstract
Two-dimensional (2D) materials are one of the research frontiers in materials science due to their promising properties. Chemical Vapor Deposition (CVD) is the most widely used technique to grow large-scale, high-quality 2D materials. CVD-grown 2D materials can be efficiently characterized with an optical microscope; however, annotating microscopy images to distinguish good growth quality from bad is time-consuming. In this work, we explore Active Learning (AL), which iteratively acquires quality labels from a human and updates the classifier for microscopy images, so that only a limited number of labels is required to achieve good model performance. However, existing handcrafted query strategies in AL are not good at dealing with the dynamics of the query process, since rigid handcrafted strategies may fail to choose the most informative instances (i.e., images) after each query. We propose a Reinforced Active Learning (RAL) framework that uses reinforcement learning to learn a query strategy for AL. Besides, by introducing intrinsic motivation into the proposed framework, a unique intrinsic reward is designed to enhance the classification performance. The results show that RAL outperforms AL and can significantly reduce the annotation effort for CVD-grown 2D materials characterization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. A policy configured resource management scheme for AHNS using link reliability K‐means clustering algorithm and Weibull distribution‐based blue monkey optimization.
- Author
-
Punitha, P., Sivaparthipan, C. B., Muthu, Bala Anand, and Lakshmana Kumar, R.
- Subjects
K-means clustering, RESOURCE management, ARTIFICIAL neural networks, ORDER picking systems, BOLTZMANN machine, MONKEYS, COGNITIVE radio
- Abstract
Summary: Cognitive Radio Ad Hoc Networks (CRAHNs) are an essential method for resolving the conflict between extreme spectrum scarcity and rapid traffic growth while maintaining high-quality service for consumers. However, the coexistence of primary and secondary users represents a critical challenge for reasonable resource allocation that sustains system performance. Many approaches have been developed to allocate resources efficiently; however, these methods are currently limited by factors such as user collisions, irregular traffic patterns, and high data transmission error rates. To address these constraints, this paper proposes a policy-configured reinforcement learning-based ad hoc network (AHN) model. To obtain the ideal policy configuration for the network, the system first models the cognitive radio (CR) network, in which nodes are initialised and grouped employing the Link Reliability K-Means clustering Algorithm (LR-KMA). The available spectrum is then detected and separated into multiple bands utilizing coherent-based detection (CBD), with signal source identification employing the Parzen-Rosenblatt Window-based Restricted Boltzmann Machine (PRW-RBM). Next, the learning model for the resource allocation process employs the Weibull Distribution-based Blue Monkey Optimization (WD-BMO) approach to pick the relevant bands. Finally, the experimental results were analyzed to evaluate the proposed resource allocation model's performance in CRAHNs. Compared with previous findings, the proposed method improves resource utilization by 5%, achieves 7% higher throughput, and the PRW-RBM improves classification accuracy by 1.07%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Integration of Q-Learning and PID Controller for Mobile Robots Trajectory Tracking in Unknown Environments.
- Author
-
Munaf, Almojtaba and Jasim Almusawi, Ahmed Rahman
- Subjects
ROBOTIC path planning, MACHINE learning, PID controllers, ROBOTICS, REINFORCEMENT learning, MOBILE robots, AUTOMOTIVE navigation systems
- Abstract
In the realm of autonomous robotics, navigating differential drive mobile robots through unknown environments poses significant challenges due to their complex nonholonomic constraints. This issue is particularly acute in applications requiring precise trajectory tracking and effective obstacle avoidance without prior knowledge of the surroundings. Traditional navigation systems often struggle with these demands, leading to inefficiencies and potential safety risks. To address this problem, our study proposes an algorithm that integrates machine learning and control concepts, specifically through the synergistic application of a Q-learning algorithm and a PID controller. This technique leverages the adaptability of Q-learning pathfinding and the precision of PID control for real-time trajectory adjustment, aiming to enhance the robot's navigation capabilities. Our comprehensive approach includes developing a state-space model that integrates Q-values with the dynamics of differential drive robots, employing Bellman's equation for iterative policy refinement. This model enables the robot to dynamically adapt its navigation strategy in response to immediate environmental feedback, thereby optimizing efficiency and safety in real time. Our extensive simulations demonstrate a marked improvement in trajectory-tracking accuracy and obstacle-avoidance capability. These findings underscore the potential of combining machine learning algorithms with traditional control methods to advance autonomous navigation in robotic systems. The results further indicate that integrating Q-learning with a PID controller markedly improves trajectory-tracking accuracy, reduces travel times to targets, and strengthens the robot's ability to navigate around obstacles. This integrated method demonstrates a substantial advantage over conventional navigation systems, providing a robust answer to the challenges of autonomous robot navigation in unpredictable environments. [ABSTRACT FROM AUTHOR] (A minimal sketch of the Q-learning-plus-PID combination follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
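Entry 20 integrates a Q-learning rule (via Bellman's equation) with PID control. The sketch below shows one plausible wiring, with Q-learning selecting among candidate PID gain sets; the gain sets, state buckets, and reward are hypothetical, not the paper's design.

```python
GAIN_SETS = [(1.0, 0.0, 0.1), (2.0, 0.1, 0.3), (0.5, 0.0, 0.05)]  # hypothetical (Kp, Ki, Kd) options
Q = {}  # (state, action) -> value

def pid_control(err, err_sum, err_prev, gains, dt=0.05):
    """Classic PID law on the tracking error."""
    kp, ki, kd = gains
    return kp * err + ki * err_sum * dt + kd * (err - err_prev) / dt

def q_update(s, a, r, s2, alpha=0.1, gamma=0.95):
    """Bellman iteration: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q.get((s2, b), 0.0) for b in range(len(GAIN_SETS)))
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

# Usage: s = a coarse bucket of the tracking error, a = chosen gain-set index,
# r = -abs(err) after applying pid_control with GAIN_SETS[a].
```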
21. Model‐free learning‐based distributed cooperative tracking control of human‐in‐the‐loop multi‐agent systems.
- Author
-
Mei, Di, Sun, Jian, Xu, Yong, and Dou, Lihua
- Subjects
MULTIAGENT systems, MACHINE learning, EARTH stations, REINFORCEMENT learning, ITERATIVE learning control
- Abstract
This article studies the model-free learning-based distributed cooperative tracking control of human-in-the-loop multi-agent systems in the presence of an active leader. The core role of the human-in-the-loop is to use the ground station to send control commands to the non-zero control input of the leader, and then directly or indirectly control a group of agents to complete complex tasks. Meanwhile, three essential demands are satisfied simultaneously: a completely unknown system model, a control objective obtained optimally, and no requirement for an initial admissible control strategy. It is worth emphasizing that related results satisfy at most one or two of these demands and are therefore essentially not applicable to this problem. In this article, a model-based human-in-the-loop learning algorithm is first presented to achieve optimal tracking control, and the convergence of the proposed learning algorithm is proved. Then, a bias-based data-driven learning algorithm is proposed, which provides the potential to overcome the difficulties caused by the above three demands. Finally, the validity of the theoretical results is verified by a numerical example. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Inner External DQN LoRa SF Allocation Scheme for Complex Environments.
- Author
-
Pang, Shengli, Kong, Delin, Wang, Xute, Pan, Ruoyu, Wang, Honggang, Ye, Zhifan, and Liu, Di
- Subjects
REINFORCEMENT learning, NETWORK performance, TELECOMMUNICATION, WIRELESS communications, INTERNET of things
- Abstract
In recent years, with the development of Internet of Things (IoT) technology, the demand for low-power wireless communication has been growing, giving rise to LoRa technology. A LoRa network mainly consists of terminal nodes, gateways, and LoRa network servers. As LoRa networks often deploy many terminal node devices for environmental sensing, the limited resources of LoRa technology, the explosive growth in the number of nodes, and the ever-changing complex environment pose unprecedented challenges for network performance. Although some research has addressed these challenges by allocating channels in the LoRa network, the impact of complex and changing environmental factors has yet to be considered. Reasonable channel allocation should be tailored to the situation, facing different environments and network distributions through continuous adaptive learning to obtain the corresponding allocation strategy. Moreover, most current research focuses only on the channel adjustment of the LoRa node itself, without considering the indirect impact of a node's allocation on the entire network. The Inner External DQN SF allocation method (IEDQN) proposed in this paper improves the packet reception rate of the whole system by using reinforcement learning for adaptive learning of the environment, and accounts for the network-wide impact of the current node's parameter configuration through nested reinforcement learning for further optimization of whole-network performance. Finally, this paper evaluates the performance of IEDQN through simulation. The experimental results show that IEDQN optimizes network performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Autonomous Drones in Urban Navigation: Autoencoder Learning Fusion for Aerodynamics.
- Author
-
Wu, Jiahao, Ye, Yang, and Du, Jing
- Subjects
COMPUTATIONAL fluid dynamics, REINFORCEMENT learning, AERODYNAMICS, BUILDING layout, AERODYNAMIC load, AERODYNAMICS of buildings
- Abstract
Drones are becoming indispensable in emergency search and rescue (SAR), particularly in intricate urban areas where rapid and accurate response is crucial. This study addresses the pressing need for enhancing drone navigation in such complex, dynamic urban environments, where obstacles like building layouts and varying wind conditions create unique challenges. Particularly, the need for adapting drone autonomous navigation in correspondence with dynamic wind conditions in urban settings is emphasized because it is important for drones to avoid loss of control or crashes during SAR. This paper introduces a pioneering method integrating multiobjective reinforcement learning (MORL) with a convolutional autoencoder to train autonomous drones in comprehending and reacting to aerodynamic features in urban SAR. MORL enables the drone to optimize multiple goals, whereas the convolutional autoencoder generates synthetic wind simulations with a substantially lower computation cost compared to traditional computational fluid dynamics (CFD) simulations. A unique data transfer structure is also proposed, which fosters a seamless integration of perception and decision-making between machine learning (ML) and reinforcement learning (RL) components. This approach uses imagery data, specific to building layouts, allowing the drone to autonomously formulate policies, prioritize navigation decisions, optimize paths, and mitigate the impact of wind, all while negating the necessity for conventional aerodynamic force sensors. The method was validated with a model of New York City, offering substantial implications for enhancing automation algorithms in urban SAR. This innovation enables the possibility of more efficient, precise, and timely drone SAR operations within intricate urban landscapes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Adaptive resilient containment control using reinforcement learning for nonlinear stochastic multi-agent systems under sensor faults.
- Author
-
Mo, Guanzong and Lyu, Yixin
- Subjects
BACKSTEPPING control method, REINFORCEMENT learning, MULTIAGENT systems, STOCHASTIC systems, VIRTUAL design
- Abstract
This article proposes an optimized backstepping control strategy designed for a category of nonlinear stochastic strict-feedback multi-agent systems (MASs) with sensor faults. The plan formulates optimized solutions for the respective subsystems by designing both virtual and actual controls, achieving overall optimization of the backstepping control. To address sensor faults, an adaptive neural network (NN) compensation control method is considered. The reinforcement learning (RL) framework based on neural network approximation is employed, deriving RL update rules from the negative gradient of a simple positive function correlated with the Hamilton-Jacobi-Bellman (HJB) equation. This significantly simplifies the RL algorithm while relaxing the constraints for known dynamics and persistent excitation. The theoretical analysis, based on stochastic Lyapunov theory, demonstrates the semi-global uniform ultimate boundedness (SGUUB) of all signals within the enclosed system, and illustrates the convergence of all follower outputs to the dynamic convex hull defined by the leaders. Ultimately, the proposed control strategy's effectiveness is validated through numerical simulations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Optimizing IoT Resource Allocation Using Reinforcement Learning
- Author
-
Mostafa, Nour, Shdefat, Ahmed Younes, Al-Arnaout, Zakwan, Salman, Mohammad, Elsayed, Fahmi, Garcia, Fausto P., editor, Jamil, Akhtar, editor, Hameed, Alaa Ali, editor, Ortis, Alessandro, editor, and Ramirez, Isaac Segovia, editor
- Published
- 2024
- Full Text
- View/download PDF
26. Beyond Traditional Motion Planning: A Proximal Policy Optimization Reinforcement Learning Approach for Robotics
- Author
-
Rjoub, Gaith, Drawel, Nagat, Dssouli, Rachida, Bentahar, Jamal, Kassaymeh, Sofian, Alweshah, Mohammed, Younas, Muhammad, editor, Awan, Irfan, editor, Kryvinska, Natalia, editor, Bentahar, Jamal, editor, and Ünal, Perin, editor
- Published
- 2024
- Full Text
- View/download PDF
27. Digital Twin-Driven Reinforcement Learning for Dynamic Path Planning of AGV Systems
- Author
-
Lee, Donggun, Kang, Yong-Shin, Do Noh, Sang, Kim, Jaeung, Kim, Hijun, Thürer, Matthias, editor, Riedel, Ralph, editor, von Cieminski, Gregor, editor, and Romero, David, editor
- Published
- 2024
- Full Text
- View/download PDF
28. Improving User Experience via Reinforcement Learning-Based Resource Management on Mobile Devices
- Author
-
Lu, Yufan, Hu, Chuang, Gong, Yili, Cheng, Dazhao, Huang, De-Shuang, editor, Zhang, Xiankun, editor, and Pan, Yijie, editor
- Published
- 2024
- Full Text
- View/download PDF
29. Deep Learning Model for Predicting Rice Plant Disease Identification and Classification for Improving the Yield
- Author
-
Singh, Jagendra, Singh, Navneet Pratap, Vinothkumar, B., Shelke, Nitin Arvind, Sharma, Deepak, Alsahlanee, Abbas Thajeel Rhaif, Abraham, Ajith, editor, Bajaj, Anu, editor, Hanne, Thomas, editor, and Hong, Tzung-Pei, editor
- Published
- 2024
- Full Text
- View/download PDF
30. Attention Scheduler Based on Reinforcement Learning for Multi-robot System
- Author
-
Jiang, Kun, Kong, Lingyue, Dong, Lu, Yu, Jianglong, editor, Liu, Yumeng, editor, and Li, Qingdong, editor
- Published
- 2024
- Full Text
- View/download PDF
31. Reinforcement Learning-Based Algorithm for Real-Time Automated Parking Decision Making
- Author
-
Wei, Xiaoyi, Hou, Taixian, Zhao, Xiao, Tu, Jiaxin, Guan, Haiyang, Zhai, Peng, Zhang, Lihua, Fang, Lu, editor, Pei, Jian, editor, Zhai, Guangtao, editor, and Wang, Ruiping, editor
- Published
- 2024
- Full Text
- View/download PDF
32. A Reinforcement Learning Method for Control Scheme of Permanent Magnet Synchronous Motor
- Author
-
Tran, Xuan Khanh, Le, Duy Tung, Tran, Duc Thuan, Dao, Phuong Nam, Nghia, Phung Trung, editor, Thai, Vu Duc, editor, Thuy, Nguyen Thanh, editor, Son, Le Hoang, editor, and Huynh, Van-Nam, editor
- Published
- 2024
- Full Text
- View/download PDF
33. Improving CCA Algorithms on SSVEP Classification with Reinforcement Learning Based Temporal Filtering
- Author
-
Ou, Liang, Do, Thomas, Tran, Xuan-The, Leong, Daniel, Chang, Yu-Cheng, Wang, Yu-Kai, Lin, Chin-Teng, Liu, Tongliang, editor, Webb, Geoff, editor, Yue, Lin, editor, and Wang, Dadong, editor
- Published
- 2024
- Full Text
- View/download PDF
34. Operational Tunnel Model Generation Using Reinforcement Learning
- Author
-
Rimella, Nicola, Fonsati, Arianna, Osello, Anna, Gabriele, Stefano, editor, Manuello Bertetto, Amedeo, editor, Marmo, Francesco, editor, and Micheletti, Andrea, editor
- Published
- 2024
- Full Text
- View/download PDF
35. On Reinforcement Learning for Part Dispatching in UAV-Served Flexible Manufacturing Systems
- Author
-
Angelidou, Charikleia, Stathatos, Emmanuel, Vosniakos, George-Christopher, Silva, Francisco J. G., editor, Ferreira, Luís Pinto, editor, Sá, José Carlos, editor, Pereira, Maria Teresa, editor, and Pinto, Carla M. A., editor
- Published
- 2024
- Full Text
- View/download PDF
36. Reinforced Lyapunov controllers for low-thrust lunar transfers
- Author
-
Holt, Harry, Baresi, Nicola, and Armellin, Roberto
- Published
- 2024
- Full Text
- View/download PDF
37. Optimization of buffer design for mixed-model sequential production line based on simulation and reinforcement learning
- Author
-
Choi, Jonghwan, Park, Jisoo, Noh, Sang Do, and Lee, Ju Yeon
- Published
- 2024
- Full Text
- View/download PDF
38. Driving key nodes to learn cooperation in social dilemma
- Author
-
Fan, Litong, Guo, Hao, Yu, Dengxiu, Xu, Bowen, and Wang, Zhen
- Published
- 2024
- Full Text
- View/download PDF
39. Reinforcement learning for trench excavation
- Author
-
Rankin, Jake
- Subjects
reinforcement learning (RL), Artificial Intelligence (AI), machine learning predictions, neural networks, supervised learning, digital twins, autonomous excavation, autonomy, system architecture, autonomous architecture, feature selection
- Abstract
Excavation autonomy is an area of industrial interest, as there is a significant skills shortage for excavator operators, yet the industry is facing higher budget constraints than ever before. The excavation task of trenching is particularly affected: it is difficult to perform but integral to most construction projects. Traditionally, autonomy in excavation is rule-based or control-theory-based, which is difficult to scale and apply to different scenarios, such as different ground conditions or trench designs, that are common in construction. Reinforcement learning has been applied with success in similar fields, like robotics, demonstrating its capability in handling non-linearity and complex environments. This thesis addresses the deployment of reinforcement learning to a trench excavation task, to provide potential operator-like behaviour. Twin-Delayed Deep Deterministic Policy Gradient (TD3) was identified as the most appropriate algorithm for the research, due to its superior performance on robotics tasks. To deploy TD3 for trenching, a three-part strategy was used, focused on developing the environment, the state, and the reward function. First, the overall system architecture and environment for a reinforcement learning-based autonomy system were designed, utilising the Common Data Environment to store and deploy a trained algorithm and the excavation plans; this was designed alongside driver data analysis to provide additional context on the challenges of trenching. Next, the selection of optimal machine sensors was studied, including the comparison of feature selection methods against existing sensor arrays. These were applied to a neural network trained on driver data, on the assumption that it would behave like the policy of a trained TD3 algorithm; this was done to determine the impact of the state on the predictive performance of the policy while avoiding the reward function, allowing several machine tasks to use the same approach. Finally, TD3 was deployed to an excavator, with the focus on developing the reward function for trenching by breaking the task into smaller tasks for developing the distance and mass rewards before unifying them into one reward function; by focusing on individual elements of a reward function, their impact can be understood more effectively. TD3 was then deployed to perform a trench excavation task, where it was able to dig within the trench region and dump soil into a hopper. The novel findings of this research were: 1) exploratory analysis of driver data during trenching; 2) an autonomy roadmap for excavation; 3) an architecture for an RL-based autonomy system for excavation that utilises the driver, simulation, and Common Data Environment; 4) the use of feature selection methods to determine machine sensor inputs for the policy network of TD3, without using the reward function; 5) a new methodology for developing a reward function; and 6) the deployment of TD3 to perform a trench excavation task. (A minimal sketch of the TD3 update follows this record.)
- Published
- 2023
- Full Text
- View/download PDF
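Entry 39 deploys TD3. The sketch below shows TD3's three signature mechanisms (twin critics with a pessimistic target, target-policy smoothing, and delayed actor updates) in PyTorch; the dimensions and hyperparameters are assumptions, and the Polyak averaging of the target networks is omitted for brevity, so this is not the thesis's configuration.

```python
import torch
import torch.nn as nn

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

STATE, ACT = 8, 2                                  # assumed observation/action sizes
actor, actor_t = mlp(STATE, ACT), mlp(STATE, ACT)  # policy and its target
q1, q2 = mlp(STATE + ACT, 1), mlp(STATE + ACT, 1)  # twin critics
q1_t, q2_t = mlp(STATE + ACT, 1), mlp(STATE + ACT, 1)
opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_c = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def td3_step(s, a, r, s2, done, step, gamma=0.99, noise=0.2, clip=0.5, delay=2):
    """One TD3 update on a batch (s, a, r, s2, done); Polyak target syncing omitted."""
    with torch.no_grad():
        eps = (torch.randn_like(a) * noise).clamp(-clip, clip)   # target-policy smoothing
        a2 = (torch.tanh(actor_t(s2)) + eps).clamp(-1.0, 1.0)
        sa2 = torch.cat([s2, a2], dim=1)
        q_min = torch.min(q1_t(sa2), q2_t(sa2))                  # pessimistic twin-critic value
        target = r + gamma * (1.0 - done) * q_min
    sa = torch.cat([s, a], dim=1)
    critic_loss = ((q1(sa) - target) ** 2 + (q2(sa) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    if step % delay == 0:                                        # delayed actor update
        actor_loss = -q1(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```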
40. Error Correction and Adaptation in Conversational AI: A Review of Techniques and Applications in Chatbots
- Author
-
Saadat Izadi and Mohamad Forouzanfar
- Subjects
artificial intelligence (AI), chatbot training, error correction, conversational AI, natural language processing (NLP), reinforcement learning (RL), Electronic computers. Computer science, QA75.5-76.95
- Abstract
This study explores the progress of chatbot technology, focusing on error correction to enhance these smart conversational tools. Chatbots, powered by artificial intelligence (AI), are increasingly prevalent across industries such as customer service, healthcare, e-commerce, and education. Despite their use and increasing complexity, chatbots are prone to errors like misunderstandings, inappropriate responses, and factual inaccuracies, which can impact user satisfaction and trust. This research provides an overview of chatbots, analyzes the errors they encounter, and examines different approaches to rectifying them. These approaches include data-driven feedback loops, involving humans in the learning process, and adjustment through learning methods such as reinforcement learning, supervised learning, unsupervised learning, semi-supervised learning, and meta-learning. Through real-life examples and case studies in different fields, we explore how these strategies are implemented. Looking ahead, we examine the challenges faced by AI-powered chatbots, including ethical considerations and biases during implementation, as well as the transformative potential of new technological advancements, such as explainable AI models, autonomous content generation algorithms (e.g., generative adversarial networks), and quantum computing, to enhance chatbot training. Our research provides information for developers and researchers looking to improve chatbot capabilities, which can be applied in service and support industries to effectively address user requirements.
- Published
- 2024
- Full Text
- View/download PDF
41. Multi-agent reinforcement learning based optimal energy sensing threshold control in distributed cognitive radio networks with directional antenna
- Author
-
Thi Thu Hien Pham, Wonjong Noh, and Sungrae Cho
- Subjects
Cognitive radio networks (CRNs), Cooperative spectrum sensing (CSS), Directional antennas, Multi-agent deep deterministic policy gradient (MADDPG), Reinforcement learning (RL), Information technology, T58.5-58.64
- Abstract
In CRNs, it is crucial to develop an efficient and reliable spectrum detector that consistently provides accurate information about the channel state. In this work, we investigate cooperative spectrum sensing (CSS) in a fully distributed environment where all secondary users (SUs) are equipped with directional antennas and make decisions based solely on their local knowledge, without information sharing between SUs. First, we establish a stochastic sequential optimization problem, which is NP-hard, that maximizes the SU's detection accuracy through dynamic, optimal control of the energy sensing/detection threshold. It enables SUs to select an available channel and sector without causing interference to the primary network. To address it in a distributed environment, the problem is transformed into a decentralized partially observed Markov decision process (Dec-POMDP) problem. Second, to determine the best control for the Dec-POMDP in a practical environment without any prior knowledge of state-action transition probabilities, we develop a multi-agent deep deterministic policy gradient (MADDPG)-based algorithm, referred to as MA-DCSS. This algorithm adopts the centralized training and decentralized execution (CTDE) architecture. Third, we analyze its computational complexity and show the proposed approach's scalability through polynomial computational complexity in the number of channels, sectors, and SUs. Lastly, simulation confirms that the proposed scheme provides enhanced performance in terms of convergence speed, detection accuracy, and false-alarm probability compared to baseline algorithms.
- Published
- 2024
- Full Text
- View/download PDF
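Below is a minimal, hypothetical single-agent stand-in for the threshold-control idea in entry 41: an epsilon-greedy bandit that learns which energy-detection threshold best balances missed detections against false alarms. The paper's actual method is multi-agent (MADDPG with CTDE); the channel parameters and reward shaping here are illustrative assumptions, not the authors' setup.

```python
# Hypothetical single-SU stand-in for MADDPG-based threshold control:
# a stateless epsilon-greedy bandit over candidate energy thresholds.
import numpy as np

rng = np.random.default_rng(0)

THRESHOLDS = np.linspace(0.5, 3.0, 11)   # candidate energy thresholds (actions)
NOISE_POWER, SIGNAL_POWER = 1.0, 1.5     # assumed channel parameters
P_PRIMARY = 0.3                          # prob. the primary user is transmitting
N_SAMPLES = 16                           # samples per sensing window

q = np.zeros(len(THRESHOLDS))            # value estimate per threshold
counts = np.zeros(len(THRESHOLDS))
eps, episodes = 0.1, 20000

for _ in range(episodes):
    a = rng.integers(len(THRESHOLDS)) if rng.random() < eps else int(np.argmax(q))
    pu_active = rng.random() < P_PRIMARY
    power = NOISE_POWER + (SIGNAL_POWER if pu_active else 0.0)
    # test statistic: average energy over the sensing window
    energy = np.mean(rng.normal(0, np.sqrt(power), N_SAMPLES) ** 2)
    decided_busy = energy > THRESHOLDS[a]
    reward = 1.0 if decided_busy == pu_active else -1.0  # sensing accuracy
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental sample-average update

best = int(np.argmax(q))
print(f"learned threshold {THRESHOLDS[best]:.2f}, value {q[best]:.3f}")
```

In the paper's multi-agent setting, each SU would run its own actor while a centralized critic sees all agents' observations during training; the bandit above only conveys the core trade-off the threshold controls.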
42. AI/ML Enabled Automation System for Software Defined Disaggregated Open Radio Access Networks: Transforming Telecommunication Business
- Author
-
Sunil Kumar
- Subjects
open radio access networks (o-ran) ,flexible radio access network intelligent controller (fric) ,reinforcement learning (rl) ,external applications (xapps) ,artificial intelligence (ai) ,machine learning (ml) ,sixth generation (6g) ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The Open Air Interface (OAI) alliance recently introduced a new disaggregated Open Radio Access Networks (O-RAN) framework for next-generation telecommunications and networks. This disaggregated architecture is open, automated, software defined, and virtualized, and it supports advanced technologies such as Artificial Intelligence/Machine Learning (AI/ML). This intelligent architecture enables programmers to design and customize automated applications according to business needs and to improve quality of service in fifth generation (5G) and Beyond 5G (B5G) networks. Its disaggregated, multivendor nature gives new startups and small vendors the opportunity to participate and to provide low-cost hardware and software solutions, keeping the market competitive. This paper presents the disaggregated and programmable O-RAN architecture with a focus on automation, AI/ML services, and applications built around the Flexible Radio access network Intelligent Controller (FRIC). We schematically demonstrate the reinforcement learning, external applications (xApps), and automation steps needed to implement this disaggregated O-RAN architecture. The goal of this research is to implement an AI/ML-enabled automation system for software-defined disaggregated O-RAN that monitors, manages, and performs AI/ML-related services, including model deployment, optimization, inference, and training. A schematic xApp control-loop sketch follows this entry.
- Published
- 2024
- Full Text
- View/download PDF
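As a rough illustration of the automation loop entry 42 describes (monitor KPIs, run ML inference, issue a control action), here is a schematic xApp skeleton. Every class, function, and metric name below is a hypothetical placeholder, not a real O-RAN or FRIC API.

```python
# Schematic, hypothetical xApp control loop: monitor -> infer -> act.
# All names are illustrative placeholders, not a real O-RAN SDK.
from dataclasses import dataclass
import random

@dataclass
class CellMetrics:          # simplified KPI snapshot pulled from the controller
    prb_utilization: float  # physical resource block usage, 0..1
    avg_latency_ms: float

class TinyModel:
    """Stand-in for a trained AI/ML model served by the RIC's ML framework."""
    def predict_overload(self, m: CellMetrics) -> bool:
        return m.prb_utilization > 0.85 or m.avg_latency_ms > 50.0

def poll_metrics() -> CellMetrics:      # placeholder for a telemetry pull
    return CellMetrics(random.random(), random.uniform(5, 80))

def apply_policy(action: str) -> None:  # placeholder for a control message
    print("xApp action:", action)

def xapp_loop(steps: int = 5) -> None:
    model = TinyModel()
    for _ in range(steps):
        metrics = poll_metrics()
        if model.predict_overload(metrics):
            apply_policy("offload-traffic-to-neighbor-cell")
        else:
            apply_policy("no-op")

if __name__ == "__main__":
    xapp_loop()
```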
43. Deep Reinforcement Learning techniques for dynamic task offloading in the 5G edge-cloud continuum
- Author
-
Gorka Nieto, Idoia de la Iglesia, Unai Lopez-Novoa, and Cristina Perfecto
- Subjects
Task offloading ,Performance evaluation ,Energy consumption ,Reinforcement Learning (RL) ,Quality-of-Experience (QoE) ,Multi-access Edge Computing (MEC) ,Computer engineering. Computer hardware ,TK7885-7895 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The integration of new Internet of Things (IoT) applications and services heavily relies on task offloading to external devices due to the constrained computing and battery resources of IoT devices. Up to now, the Cloud Computing (CC) paradigm has been a good approach for tasks where latency is not critical, but it is unsuitable when latency matters, in which case Multi-access Edge Computing (MEC) can be of use. In this work, we propose a distributed Deep Reinforcement Learning (DRL) tool to optimize the binary task offloading decision, that is, the independent decision of where to execute each computing task, depending on many factors. The optimization goal is to maximize the Quality-of-Experience (QoE) when performing tasks, defined as a metric related to the battery level of the User Equipment (UE), subject to satisfying tasks' latency requirements. This distributed DRL approach, specifically an Actor-Critic (AC) algorithm running on each UE, is evaluated through the simulation of two distinct scenarios and outperforms the other analyzed baselines in terms of QoE values and/or energy consumption in dynamic environments, also demonstrating that decisions need to adapt as the environment evolves. A minimal policy-gradient offloading sketch follows this entry.
- Published
- 2024
- Full Text
- View/download PDF
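The following sketch illustrates the per-UE binary offloading decision from entry 43 with a tiny REINFORCE-style logistic policy: offload or execute locally, rewarded by a battery-aware QoE that is voided when the latency deadline is missed. The paper trains an Actor-Critic agent; the constants and QoE shaping below are assumptions made only for illustration.

```python
# Minimal REINFORCE-style sketch of a per-UE binary offloading decision.
# Cost model, deadline, and QoE shaping are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(3)                    # logistic policy weights over [battery, size, 1]

def qoe(battery_after: float, latency: float, deadline: float) -> float:
    # QoE rewards remaining battery but is voided if the deadline is missed
    return battery_after if latency <= deadline else -1.0

def episode(lr: float = 0.05) -> float:
    global w
    battery = rng.uniform(0.2, 1.0)           # normalized battery level
    size = rng.uniform(0.1, 1.0)              # normalized task size
    x = np.array([battery, size, 1.0])
    p_offload = 1.0 / (1.0 + np.exp(-w @ x))  # actor: prob. of offloading
    offload = rng.random() < p_offload
    if offload:                               # radio energy + edge compute time
        latency, energy = 0.4 * size + 0.1, 0.05 * size
    else:                                     # local compute: no uplink, costlier
        latency, energy = 0.8 * size, 0.30 * size
    r = qoe(battery - energy, latency, deadline=0.5)
    # REINFORCE: grad of log-probability for a Bernoulli-logistic policy
    grad = (1.0 - p_offload) * x if offload else -p_offload * x
    w += lr * r * grad
    return r

rewards = [episode() for _ in range(5000)]
print("mean QoE over last 500 episodes:", np.mean(rewards[-500:]))
```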
44. Robust Energy Management Policies for Solar Microgrids via Reinforcement Learning.
- Author
-
Jones, Gerald, Li, Xueping, and Sun, Yulin
- Subjects
- *
REINFORCEMENT learning , *MICROGRIDS , *CLEAN energy , *ENERGY management , *ENERGY policy , *RENEWABLE energy sources - Abstract
As the integration of renewable energy expands, effective energy system management becomes increasingly crucial. Distributed renewable generation microgrids offer green energy and resilience. Combining them with energy storage and a suitable energy management system (EMS) is essential due to the variability in renewable energy generation. Reinforcement learning (RL)-based EMSs have shown promising results in handling these complexities. However, concerns about policy robustness arise with the growing number of intermittent grid disruptions or disconnections from the main utility. This study investigates the resilience of RL-based EMSs to unforeseen grid disconnections when trained in grid-connected scenarios. Specifically, we evaluate the resilience of policies derived from advantage actor-critic (A2C) and proximal policy optimization (PPO) networks trained in both grid-connected and uncertain grid-connectivity scenarios. Stochastic models incorporating solar energy and load uncertainties, built on real-world data, are employed in the simulation. Our findings indicate that grid-trained PPO and A2C excel in cost coverage, with PPO performing better. However, in isolated or uncertain-connectivity scenarios, the demand-coverage hierarchy shifts: the disruption-trained A2C model achieves the best demand coverage when islanded, whereas the grid-connected A2C network performs best under uncertain grid connectivity. This study enhances the understanding of the resilience of RL-based solutions trained with varied methods and provides an analysis of the generated EMS policies. [ABSTRACT FROM AUTHOR] An illustrative microgrid-environment sketch follows this entry.
- Published
- 2024
- Full Text
- View/download PDF
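To make the setting of entry 44 concrete, here is an assumed, simplified microgrid environment step of the kind an A2C or PPO agent would be trained on: a battery, stochastic solar and load, and a grid-connection flag that can drop out, with unserved demand penalized. The dynamics and bounds are illustrative guesses, not the authors' model.

```python
# Simplified (assumed) microgrid environment for an RL-based EMS:
# battery state of charge, stochastic PV and load, intermittent grid.
import random

BATTERY_CAP = 10.0   # kWh

def step(soc, grid_up, action_charge_kw, dt_h=1.0):
    """One environment transition; returns (next_soc, reward)."""
    solar = max(0.0, random.gauss(2.0, 1.0))     # stochastic PV output, kW
    load = max(0.5, random.gauss(3.0, 0.8))      # stochastic demand, kW
    # battery charges (+) or discharges (-), clipped to capacity limits
    charge = max(-soc / dt_h, min(action_charge_kw, (BATTERY_CAP - soc) / dt_h))
    soc = soc + charge * dt_h
    net = solar - load - charge                  # local surplus (+) or deficit (-)
    if net >= 0 or grid_up:
        unmet = 0.0                              # surplus or grid covers demand
    else:
        unmet = -net                             # islanded and short on energy
    reward = -unmet                              # penalize unserved demand
    return soc, reward

soc, total = 5.0, 0.0
for hour in range(24):
    grid_up = random.random() > 0.2              # intermittent grid disruptions
    soc, r = step(soc, grid_up, action_charge_kw=random.uniform(-2, 2))
    total += r
print("episode reward:", round(total, 2))
```

Training under a low grid-availability probability versus always-on grid access is what distinguishes the "disruption-trained" from the "grid-trained" policies the study compares.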
45. Error Correction and Adaptation in Conversational AI: A Review of Techniques and Applications in Chatbots.
- Author
-
Izadi, Saadat and Forouzanfar, Mohamad
- Subjects
- *
CHATBOTS , *ARTIFICIAL intelligence , *SUPERVISED learning , *NATURAL language processing , *GENERATIVE adversarial networks , *REINFORCEMENT learning - Abstract
This study explores the progress of chatbot technology, focusing on error correction as a means of improving these smart conversational tools. Chatbots, powered by artificial intelligence (AI), are increasingly prevalent across industries such as customer service, healthcare, e-commerce, and education. Despite their widespread use and increasing sophistication, chatbots remain prone to errors such as misunderstandings, inappropriate responses, and factual inaccuracies, all of which can undermine user satisfaction and trust. This research provides an overview of chatbots, analyzes the errors they encounter, and examines different approaches to correcting them, including data-driven feedback loops, human-in-the-loop learning, and adaptation through learning methods such as reinforcement learning, supervised learning, unsupervised learning, semi-supervised learning, and meta-learning. Through real-life examples and case studies in different fields, we show how these strategies are implemented. Looking ahead, we discuss the challenges facing AI-powered chatbots, including ethical considerations and biases during implementation, and consider the transformative potential of new technological advancements, such as explainable AI models, autonomous content generation algorithms (e.g., generative adversarial networks), and quantum computing, for chatbot training. Our research offers guidance for developers and researchers seeking to improve chatbot capabilities, which can be applied in service and support industries to effectively address user requirements. [ABSTRACT FROM AUTHOR] A minimal feedback-loop sketch follows this entry.
- Published
- 2024
- Full Text
- View/download PDF
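As a toy illustration of the data-driven feedback loop entry 45 reviews, the sketch below treats response selection as an epsilon-greedy bandit that reweights candidate chatbot replies from thumbs-up/down style user feedback. The intents, responses, and simulated users are invented for the example.

```python
# Toy feedback loop: user ratings reweight candidate chatbot responses
# via an epsilon-greedy bandit with incremental value estimates.
import random
from collections import defaultdict

responses = {
    "refund": ["I can start a refund for you.", "Please contact billing."],
    "hours": ["We are open 9-5 on weekdays.", "Check our website for hours."],
}
value = defaultdict(float)   # estimated reward per (intent, response index)
count = defaultdict(int)

def choose(intent: str, eps: float = 0.1) -> int:
    options = list(range(len(responses[intent])))
    if random.random() < eps:
        return random.choice(options)
    return max(options, key=lambda i: value[(intent, i)])

def feedback(intent: str, i: int, reward: float) -> None:
    """Fold user feedback (+1 helpful, -1 unhelpful) into the estimate."""
    key = (intent, i)
    count[key] += 1
    value[key] += (reward - value[key]) / count[key]

# simulated users: the first refund reply is usually the helpful one
for _ in range(500):
    i = choose("refund")
    feedback("refund", i, 1.0 if i == 0 else -1.0)
print("learned best refund reply:", responses["refund"][choose("refund", eps=0.0)])
```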
46. Transformer in reinforcement learning for decision-making: a survey.
- Author
-
Yuan, Weilin, Chen, Jiaxing, Chen, Shaofei, Feng, Dawei, Hu, Zhenzhen, Li, Peng, and Zhao, Weiwei
- Abstract
Copyright of Frontiers of Information Technology & Electronic Engineering is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
47. A learning-based control pipeline for generic motor skills for quadruped robots.
- Author
-
Shao, Yecheng, Jin, Yongbin, Huang, Zhilong, Wang, Hongtao, and Yang, Wei
- Abstract
Copyright of Journal of Zhejiang University: Science A is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
48. Sixth-Generation (6G) Networks for Improved Machine-to-Machine (M2M) Communication in Industry 4.0.
- Author
-
Rojek, Izabela, Kotlarz, Piotr, Dorożyński, Janusz, and Mikołajewski, Dariusz
- Subjects
MACHINE-to-machine communications ,INDUSTRY 4.0 ,ARTIFICIAL intelligence ,COMMUNICATIONS industries ,MACHINE learning ,AUGMENTED reality - Abstract
The sixth generation of mobile networks (6G) has the potential to revolutionize the way we communicate, interact, and use information for machine-to-machine (M2M) communication in Industry 4.0 and Industry 5.0, while also improving coverage in places previously considered hard to reach or digitally excluded and supporting more devices and users. The 6G network will make its impact through a combination of technologies: the Internet of Things (IoT), artificial intelligence/machine learning, virtual and augmented reality, cloud computing, and cyber security. New solutions, architectures, and concepts for their use need to be developed to take full advantage of this. This article provides an overview of the challenges in this area and the proposed solutions, taking into account the disruptive technologies that are yet to be developed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Deep Reinforcement Learning techniques for dynamic task offloading in the 5G edge-cloud continuum.
- Author
-
Nieto, Gorka, de la Iglesia, Idoia, Lopez-Novoa, Unai, and Perfecto, Cristina
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,5G networks ,INTERNET of things ,ENERGY consumption - Abstract
The integration of new Internet of Things (IoT) applications and services heavily relies on task offloading to external devices due to the constrained computing and battery resources of IoT devices. Up to now, the Cloud Computing (CC) paradigm has been a good approach for tasks where latency is not critical, but it is unsuitable when latency matters, in which case Multi-access Edge Computing (MEC) can be of use. In this work, we propose a distributed Deep Reinforcement Learning (DRL) tool to optimize the binary task offloading decision, that is, the independent decision of where to execute each computing task, depending on many factors. The optimization goal is to maximize the Quality-of-Experience (QoE) when performing tasks, defined as a metric related to the battery level of the User Equipment (UE), subject to satisfying tasks' latency requirements. This distributed DRL approach, specifically an Actor-Critic (AC) algorithm running on each UE, is evaluated through the simulation of two distinct scenarios and outperforms the other analyzed baselines in terms of QoE values and/or energy consumption in dynamic environments, also demonstrating that decisions need to adapt as the environment evolves. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Beyond Static Obstacles: Integrating Kalman Filter with Reinforcement Learning for Drone Navigation.
- Author
-
Marino, Francesco and Guglieri, Giorgio
- Subjects
REINFORCEMENT learning ,KALMAN filtering ,MACHINE learning ,AERONAUTICAL navigation ,GOAL (Psychology) ,NAVIGATION - Abstract
Autonomous drones offer immense potential in dynamic environments, but their navigation systems often struggle with moving obstacles. This paper presents a novel approach to drone trajectory planning in such scenarios, combining the Interacting Multiple Model (IMM) Kalman filter with Proximal Policy Optimization (PPO) reinforcement learning (RL). The IMM Kalman filter addresses state-estimation challenges by modeling the potential motion patterns of moving objects, enabling accurate prediction of future object positions even in uncertain environments. The PPO reinforcement learning algorithm then leverages these predictions to optimize the drone's real-time trajectory. Additionally, PPO's ability to work with continuous action spaces makes it well suited to the smooth control adjustments required for safe navigation. Our simulation results demonstrate the effectiveness of this combined approach: the drone successfully navigates complex dynamic environments, achieving collision avoidance and goal-oriented behavior. This work highlights the potential of integrating advanced state estimation and reinforcement learning techniques to enhance autonomous drone capabilities in unpredictable settings. [ABSTRACT FROM AUTHOR] A minimal Kalman-filter sketch follows this entry.
- Published
- 2024
- Full Text
- View/download PDF
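Entry 50 builds on IMM Kalman filtering for obstacle prediction. As a building block, here is a single constant-velocity Kalman filter tracking a moving obstacle in 1D; an IMM runs a bank of such motion models and mixes their estimates. Noise levels and the simulated trajectory are illustrative assumptions.

```python
# Constant-velocity Kalman filter in 1D: predict/update on noisy position
# measurements, then a one-step-ahead prediction of the obstacle position.
import numpy as np

dt = 0.1
F = np.array([[1, dt], [0, 1]])         # state transition over [pos, vel]
H = np.array([[1.0, 0.0]])              # we only measure position
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.25]])                  # measurement noise covariance

x = np.array([[0.0], [0.0]])            # initial state estimate
P = np.eye(2)                           # initial estimate covariance

rng = np.random.default_rng(2)
true_pos, true_vel = 0.0, 1.0
for _ in range(50):
    true_pos += true_vel * dt
    z = np.array([[true_pos + rng.normal(0, 0.5)]])   # noisy measurement
    # predict step
    x = F @ x
    P = F @ P @ F.T + Q
    # update step
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P

pred = F @ x                            # one-step-ahead obstacle position
print(f"estimated pos {x[0,0]:.2f}, vel {x[1,0]:.2f}; predicted next {pred[0,0]:.2f}")
```

In the paper's pipeline, predictions like `pred` would feed the PPO policy's observation so the drone can plan around where the obstacle will be, not where it was.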