7,239 results for "q-learning"
Search Results
2. Wireless Sensor Network for Fault Detection Using Block Chain Technology Based Smart Grid Security
- Author
- Sudhakar, A. V. V., Goswami, Chandrshekhar, Neeraja, B., Jain, Amit Kumar, Gupta, Sandeep, and Gowri, G.; edited by Kumar, Amit, Gunjan, Vinit Kumar, Senatore, Sabrina, and Hu, Yu-Chen
- Published
- 2025
3. Q‐Learning Based Adaptive Kalman Filtering With Adaptive Window Length.
- Author
- Tang, Kun, Luan, Xiaoli, Ding, Feng, and Liu, Fei
- Abstract
In this article, we propose an adaptive Kalman filter with adaptive window length based on Q-learning for dynamic systems with unknown model information. The iteration step length of the Q-function is quantitatively adjusted through the influence function. The adaptive Kalman filtering algorithm is used to set an appropriate weight matrix for the Q-function to estimate unknown model parameters. One numerical example and a practice-oriented case are given to illustrate the effectiveness of the proposed method. It is shown that this filter provides the most accurate state estimates among all compared methods when the model mismatch and noise statistical characteristics change.
- Published
- 2024
4. Q-learning improved golden jackal optimization algorithm and its application to reliability optimization of hydraulic system.
- Author
- Chen, Dongning, Wang, Haowen, Hu, Dongbo, Xian, Qinggui, and Wu, Bingyu
- Abstract
To endow the prey with intelligent movement behavior and improve the performance of Golden Jackal Optimization (GJO), a Q-learning Improved Golden Jackal Optimization (QIGJO) algorithm is proposed. This paper introduces five update mechanisms and proposes a double-population Q-learning collaborative mechanism that selects appropriate update mechanisms to improve GJO performance. Additionally, a new convergence factor is incorporated to enhance the convergence capability of GJO. QIGJO demonstrates excellent performance across 23 benchmark functions, CEC2022, and three classical engineering design problems, indicating high convergence accuracy and significantly enhanced global exploration capability. The reliability optimization model of the hydraulic system for concrete pump trucks was established based on a Continuous-time Multi-dimensional T-S dynamic Fault Tree (CM-TSdFT), considering the two-dimensional factors of operating time and number of impacts. Using QIGJO to optimize this model yielded excellent results, providing valuable methodological support for reliability optimization of hydraulic systems.
- Published
- 2024
5. Calibration Method for Relativistic Navigation System Using Parallel Q-Learning Extended Kalman Filter.
- Author
- Xiong, Kai, Zhao, Qin, and Yuan, Li
- Subjects
- KALMAN filtering, ARTIFICIAL satellites in navigation, STAR clusters, COVARIANCE matrices, ORBITS (Astronomy)
- Abstract
For the relativistic navigation system, in which the position and velocity of the spacecraft are determined through observation of relativistic perturbations, including stellar aberration and starlight gravitational deflection, a novel parallel Q-learning extended Kalman filter (PQEKF) is presented to implement measurement bias calibration. The relativistic perturbations are extracted from inter-star angle measurements obtained with a group of high-accuracy star sensors on the spacecraft. Inter-star angle measurement bias caused by misalignment of the star sensors is one of the main error sources in the relativistic navigation system. To suppress the unfavorable effect of measurement bias on navigation performance, the PQEKF estimates the position and velocity together with the calibration parameters, where the Q-learning approach is adopted to automatically fine-tune the process noise covariance matrix of the filter. The high performance of the presented method is illustrated via numerical simulations in a medium Earth orbit (MEO) satellite navigation scenario. The simulation results show that, for the considered MEO satellite, with an inter-star angle measurement accuracy of about 1 mas, the positioning accuracy of the relativistic navigation system after calibration is better than 300 m.
- Published
- 2024
6. Estimation of optimal treatment regimes with electronic medical record data using the residual life value estimator.
- Author
- Rhodes, Grace, Davidian, Marie, and Lu, Wenbin
- Subjects
- ELECTRONIC health records, INTENSIVE care patients, RANDOM forest algorithms, CRITICAL care medicine, INDIVIDUALIZED medicine
- Abstract
Clinicians and patients must make treatment decisions at a series of key decision points throughout disease progression. A dynamic treatment regime is a set of sequential decision rules that return treatment decisions based on accumulating patient information, like that commonly found in electronic medical record (EMR) data. When applied to a patient population, an optimal treatment regime leads to the most favorable outcome on average. Identifying optimal treatment regimes that maximize residual life is especially desirable for patients with life-threatening diseases such as sepsis, a complex medical condition that involves severe infections with organ dysfunction. We introduce the residual life value estimator (ReLiVE), an estimator for the expected value of cumulative restricted residual life under a fixed treatment regime. Building on ReLiVE, we present a method for estimating an optimal treatment regime that maximizes expected cumulative restricted residual life. Our proposed method, ReLiVE-Q, conducts estimation via the backward induction algorithm Q-learning. We illustrate the utility of ReLiVE-Q in simulation studies, and we apply ReLiVE-Q to estimate an optimal treatment regime for septic patients in the intensive care unit using EMR data from the Multiparameter Intelligent Monitoring Intensive Care database. Ultimately, we demonstrate that ReLiVE-Q leverages accumulating patient information to estimate personalized treatment regimes that optimize a clinically meaningful function of residual life.
- Published
- 2024
7. Improving predictions of rock tunnel squeezing with ensemble Q-learning and online Markov chain.
- Author
- Fard, Hadi S, Parvin, Hamid, and Mahmoudi, Mohammadreza
- Subjects
- UNDERGROUND construction, TUNNEL design & construction, CONSTRUCTION projects, MARKOV processes, ROCK mechanics, DEEP learning
- Abstract
Predicting rock tunnel squeezing in underground projects is challenging due to its intricate and unpredictable nature. This study proposes an innovative approach to enhance the accuracy and reliability of tunnel squeezing prediction. The proposed method combines ensemble learning techniques with Q-learning and online Markov chain integration. A deep learning model is trained on a comprehensive database comprising tunnel parameters including diameter (D), burial depth (H), support stiffness (K), and tunneling quality index (Q). Multiple deep learning models are trained concurrently, leveraging ensemble learning to capture diverse patterns and improve prediction performance. Integration of the Q-learning-online Markov chain further refines predictions. The online Markov chain analyzes historical sequences of tunnel parameters and squeezing class transitions, establishing transition probabilities between different squeezing classes. The Q-learning algorithm optimizes decision-making by learning the optimal policy for transitioning between tunnel states. The proposed model is evaluated using a dataset from various tunnel construction projects, assessing performance through metrics like accuracy, precision, recall, and F1-score. Results demonstrate the efficiency of the ensemble deep learning model combined with the Q-learning-online Markov chain in predicting surrounding rock tunnel squeezing. This approach offers insights into parameter interrelationships and dynamic squeezing characteristics, enabling proactive planning and implementation of support measures to mitigate tunnel squeezing hazards and ensure underground structure safety. Experimental results show the model achieves a prediction accuracy of 98.11%, surpassing individual CNN and RNN models, with an AUC value of 0.98.
- Published
- 2024
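The online Markov chain component described in the abstract above, which estimates transition probabilities between squeezing classes from an observed sequence, can be sketched generically. The class labels and the sample history below are invented for illustration; the paper's actual feature set and labels are not reproduced here.

```python
from collections import defaultdict

def transition_probabilities(sequence):
    """Estimate a first-order Markov transition matrix from a state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {s: c / total for s, c in nexts.items()}
    return probs

# Hypothetical squeezing-class labels observed over time
history = ["none", "none", "mild", "severe", "mild", "none", "mild", "mild"]
P = transition_probabilities(history)
print(P["mild"])  # transitions out of "mild": here 1/3 each to severe/none/mild
```

In an online setting the counts would simply be incremented as new observations arrive, with the probabilities recomputed on demand.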
8. Reinforcement learning marine predators algorithm for global optimization.
- Author
- Wang, Jianlan, Wang, Zhendong, Zhu, Donglin, Yang, Shuxin, Wang, Junling, and Li, Dahai
- Subjects
- GLOBAL optimization, RANDOM walks, MARINE organisms, PROBLEM solving, ALGORITHMS
- Abstract
Given the weak convergence, limited balance capacity, and optimization limitations observed in the Marine Predators Algorithm (MPA), which draws inspiration from the predatory behavior of marine organisms during evolutionary processes, this study introduces a Reinforcement Learning Marine Predators Algorithm (RLMPA). First, based on the predatory characteristics at different stages, we design three location update strategies for search agents, aimed at creating high-quality candidate solutions from three perspectives. In particular, ranking-paired mutually beneficial learning is designed to expand the scope of exploration and generate as many high-quality candidate solutions as possible for future generations. Gaussian random walk learning is designed to achieve better optimization in the transitional phase by adjusting the step-size control parameters, completing the transition from exploration to the local exploitation phase. Additionally, a modified somersault foraging strategy is introduced to accelerate local convergence and perform more extensive local exploitation. Second, we integrate reinforcement learning into MPA and use a Q-learning mechanism to adaptively select location update strategies. Agents fully utilize the collected information to evaluate their next action, coordinate the exploration and exploitation phases, and enhance global optimization ability. Finally, compared with 10 competitive algorithms, RLMPA achieves better comprehensive performance in global optimization ability, search efficiency, and convergence speed on 41 test functions and 5 practical engineering problems. In Friedman rank-sum tests, RLMPA achieves a better overall ranking and shows clear advantages in solving practical problems with stability, effectiveness, and robustness.
- Published
- 2024
9. Optimizing traffic flow with Q-learning and genetic algorithm for congestion control.
- Author
- Deepika and Pandove, Gitanjali
- Abstract
Traffic congestion in urban areas presents significant challenges, adversely affecting economic productivity, public health, and overall quality of life. Efficient coordination of traffic signals emerges as a crucial strategy to mitigate these impacts. This paper introduces an innovative approach to traffic management by leveraging Q-learning and Genetic Algorithms (GAs) to optimize traffic light schedules, aiming to reduce vehicle waiting times at intersections. The proposed approach is implemented in a sophisticated simulation environment, facilitated by the python-traffic simulator platform and leveraging real-time data. Uniquely, the Q-learning implementation in this paper incorporates a novel yet redundant random shuffling of action values in the value determination process, which differs from standard Q-learning approaches. Through a comparative analysis, we evaluated the performance of these methodologies against the default traffic light control behavior. The proposed algorithm demonstrated a substantial improvement, reducing average vehicle waiting time. The research assesses simulation outcomes under various scenarios, examining episodes in batches of 20, 50, and 100. The method exhibits notable improvements over traditional traffic control algorithms: it reduces average wait time by approximately 12.54% compared to the default fixed-cycle method, by approximately 10.39% compared to the second method (longest queue first), and by approximately 6.09% compared to the third method (search algorithm). These findings underscore the potential of applying machine learning and evolutionary computation techniques to enhance traffic flow efficiency, suggesting a scalable solution for urban traffic management challenges.
- Published
- 2024
10. Short-Term Photovoltaic Power Probabilistic Forecasting Based on Temporal Decomposition and Vine Copula.
- Author
- Wang, Xinghua, Li, Zilv, Fu, Chenyang, Liu, Xixian, Yang, Weikang, Huang, Xiangyuan, Yang, Longfa, Wu, Jianhui, and Zhao, Zhuoli
- Abstract
With the large-scale development of solar power generation, highly uncertain photovoltaic (PV) power output has an increasing impact on distribution networks. PV power generation has complex correlations with various weather factors, while the time series embodies multiple temporal characteristics. To more accurately quantify the uncertainty of PV power generation, this paper proposes a short-term PV power probabilistic forecasting method that combines decomposition-based prediction with multidimensional variable dependency modeling. First, a PV time series feature decomposition model based on seasonal-trend decomposition using Loess (STL) is constructed to obtain periodic, trend, and residual components representing different characteristics. For these components, this paper develops a periodic component prediction model based on TimeMixer for multi-scale temporal feature mixing, a long short-term memory (LSTM)-based trend component extraction and prediction model, and a multidimensional PV residual probability density prediction model based on a Vine Copula optimized with Q-learning. Together, these components form a short-term PV probabilistic forecasting method that considers both temporal features and multidimensional variable correlations. Experiments with data from the Desert Knowledge Australia Solar Center (DKASC) demonstrate that the proposed method reduced root mean square error (RMSE) and mean absolute percentage error (MAPE) by at least 14.8% and 22%, respectively, compared to recent benchmark models. In probability interval prediction, while improving accuracy by 4% at a 95% confidence interval, the interval width decreased by 19%. The results show that the proposed approach has stronger adaptability and higher accuracy, providing more valuable references for power grid planning and decision support.
- Published
- 2024
11. Multi-Objective Optimization of Energy-Efficient Multi-Stage, Multi-Level Assembly Job Shop Scheduling.
- Author
- Dong, Yingqian, Liao, Weizhi, and Xu, Guodong
- Abstract
The multi-stage, multi-level assembly job shop scheduling problem (MsMlAJSP) is commonly encountered in the manufacturing of complex customized products. Ensuring production efficiency while effectively improving energy utilization is a key focus in the industry. For the energy-efficient MsMlAJSP (EEMsMlAJSP), an improved imperialist competitive algorithm based on Q-learning (IICA-QL) is proposed to minimize the maximum completion time and total energy consumption. In IICA-QL, a decoding strategy with energy-efficient triggers based on problem characteristics is designed to ensure solution quality while effectively enhancing search efficiency. Additionally, an assimilation operation with operator parameter self-adaptation based on Q-learning is devised to overcome the challenge of balancing exploration and exploitation with fixed parameters; thus, the convergence and diversity of the algorithmic search are enhanced. Finally, the effectiveness of the energy-efficient strategy decoding trigger mechanism and the operator parameter self-adaptation operation based on Q-learning is demonstrated through experimental results, and the effectiveness of IICA-QL for solving the EEMsMlAJSP is verified by comparing it with other algorithms.
- Published
- 2024
12. An improved fruit fly optimization algorithm with Q-learning for solving distributed permutation flow shop scheduling problems.
- Author
- Zhao, Cai, Wu, Lianghong, Zuo, Cili, and Zhang, Hongqiang
- Subjects
- OPTIMIZATION algorithms, FLOW shop scheduling, FRUIT flies, FACTORY design & construction, ECONOMIC globalization
- Abstract
The distributed permutation flow shop scheduling problem (DPFSP) is one of the hottest issues in the context of economic globalization. In this paper, a Q-learning enhanced fruit fly optimization algorithm (QFOA) is proposed to solve the DPFSP with the goal of minimizing the makespan. First, a hybrid strategy is used to cooperatively initialize the positions of the fruit flies in the solution space, and boundary properties are used to improve the operational efficiency of QFOA. Second, a neighborhood structure based on problem knowledge is designed in the smell stage to generate neighborhood solutions, with the Q-learning method guiding the selection of high-quality neighborhood structures. Moreover, a local search algorithm based on key factories is designed to improve solution accuracy by processing sequences of sub-jobs from key factories. Finally, the proposed QFOA is compared with state-of-the-art algorithms on 720 well-known large-scale benchmark instances. The experimental results demonstrate the outstanding performance of QFOA.
- Published
- 2024
13. A sequential, multiple assignment, randomized trial design with a tailoring function.
- Author
- Hartman, Holly, Schipper, Matthew, and Kidwell, Kelley
- Subjects
- REINFORCEMENT learning, CLINICAL trials, TAILORING, PROBABILITY theory
- Abstract
We present a trial design for sequential multiple assignment randomized trials (SMARTs) that uses a tailoring function instead of a binary tailoring variable, allowing for simultaneous development of the tailoring variable and estimation of dynamic treatment regimens (DTRs). We apply methods for developing DTRs from observational data: tree-based regression learning and Q-learning. We compare this to a balanced randomized SMART with equal re-randomization probabilities and a typical SMART design where re-randomization depends on a binary tailoring variable and DTRs are analyzed with weighted and replicated regression. This project addresses a gap in clinical trial methodology by presenting SMARTs where second-stage treatment is based on a continuous outcome, removing the need for a binary tailoring variable. We demonstrate that data from a SMART using a tailoring function can be used to efficiently estimate DTRs and is more flexible under varying scenarios than a SMART using a tailoring variable.
- Published
- 2024
14. Jointly Optimization of Delay and Energy Consumption for Multi-Device FDMA in WPT-MEC System.
- Author
- Qiao, Danxia, Sun, Lu, Li, Dianju, Xiong, Huajie, Liang, Rina, Han, Zhenyuan, and Wan, Liangtian
- Subjects
- METAHEURISTIC algorithms, WIRELESS power transmission, POWER resources, MOBILE computing, ENERGY consumption
- Abstract
With the rapid development of mobile edge computing (MEC) and wireless power transfer (WPT) technologies, the WPT-MEC system makes it possible to provide high-quality data processing services for end users. However, in a real-world WPT-MEC system, the channel gain decreases with transmission distance, leading to a "double near-far effect" in the joint transmission of wireless energy and data, which degrades the quality of the data processing service for end users. Consequently, it is essential to design a reasonable system model to overcome this effect and to schedule multi-dimensional resources such as energy, communication, and computing to guarantee high-quality data processing services. First, this paper designs a relay-collaboration WPT-MEC resource scheduling model to improve wireless energy utilization efficiency. The optimization goal is to minimize the normalized total communication delay and total energy consumption while meeting multiple resource constraints. Second, this paper introduces a BK-means algorithm to cluster the end terminals to guarantee effective energy reception, and adapts the whale optimization algorithm with an adaptive mechanism (AWOA) for mobile vehicle path planning to reduce energy waste. Third, this paper proposes an immune differential enhanced deep deterministic policy gradient (IDDPG) algorithm to realize efficient scheduling of multiple resources and minimize the optimization goal. Finally, simulation experiments are carried out on different data, and the simulation results prove the validity of the designed scheduling model and the proposed IDDPG.
- Published
- 2024
15. A decomposition-based multi-objective evolutionary algorithm with Q-learning for adaptive operator selection.
- Author
- Xue, Fei, Chen, Yuezheng, Wang, Peiwen, Ye, Yunsen, Dong, Jinda, and Dong, Tingting
- Subjects
- EVOLUTIONARY algorithms, ALGORITHMS
- Abstract
In the past few decades, many multi-objective evolutionary algorithms (MOEAs) have been proposed, often emphasizing a single crossover operator, which has a significant impact on an algorithm's performance. This paper proposes a novel MOEA based on the MOEA/D framework that employs Q-learning for adaptive operator selection (QLMOEA/D-AOS). In every iteration, Q-learning is used to dynamically choose among five crossover operators. To obtain a better distribution of solutions in multi-objective optimization problems with irregular Pareto fronts (PFs), a new approach for weight vector initialization is proposed. Additionally, to enhance population diversity, a reward calculation method based on two metrics, Spacing and PD, is proposed. Finally, the proposed algorithm is validated on multi/many-objective optimization problems with two to five objectives. The experimental results demonstrate the significant advantages of the proposed algorithm compared to state-of-the-art MOEAs across multiple test cases.
- Published
- 2024
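Q-learning-driven adaptive operator selection, as described in the abstract above, typically keeps one value per operator and updates it with the reward observed after applying that operator. A minimal, generic sketch follows; the operator names, reward values, and hyperparameters are invented for illustration and are not taken from the paper.

```python
import random

class OperatorSelector:
    """Epsilon-greedy, bandit-style Q-learning over a set of crossover operators."""
    def __init__(self, operators, alpha=0.1, epsilon=0.2):
        self.q = {op: 0.0 for op in operators}
        self.alpha, self.epsilon = alpha, epsilon

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))   # explore a random operator
        return max(self.q, key=self.q.get)       # exploit the best-so-far

    def update(self, op, reward):
        # Stateless Q update: Q <- Q + alpha * (reward - Q)
        self.q[op] += self.alpha * (reward - self.q[op])

random.seed(0)
sel = OperatorSelector(["sbx", "de_rand_1", "uniform"])
for _ in range(200):
    op = sel.select()
    # Hypothetical reward signal: pretend "de_rand_1" improves solutions most
    reward = {"sbx": 0.3, "de_rand_1": 0.8, "uniform": 0.1}[op]
    sel.update(op, reward)
print(max(sel.q, key=sel.q.get))  # selection settles on the best operator
```

In a full MOEA/D-style loop, the reward would instead be computed from population-quality metrics (the paper mentions Spacing and PD) after each generation.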
16. Smart GAN: a smart generative adversarial network for limited imbalanced dataset.
- Author
- Kumari, Deepa, Vyshnavi, S. K., Dhar, Rupsa, Rajita, B. S. A. S., Panda, Subhrakanta, and Christopher, Jabez
- Subjects
- CONVOLUTIONAL neural networks, GENERATIVE adversarial networks, COMPUTER vision, DATABASES, MACHINE learning
- Abstract
Advancements in Machine Learning (ML) and Computer Vision have led to notable improvements in the detection of breast cancer. However, classifier accuracy is limited by imbalanced datasets that cause overfitting, so additional images are needed to improve classifier performance. Generative Adversarial Networks (GANs) are used for image augmentation, but limitations such as the arbitrary choice among different types of GAN can lead to an overly strong discriminator, causing generator training to fail due to vanishing gradients. Therefore, selecting the appropriate GAN model for a given scenario is crucial for optimal performance. This paper proposes a novel Smart Generative Adversarial Network (Smart GAN) architecture to develop an efficient computational classification model for a limited, imbalanced dataset. Smart GAN uses a three-fold approach. In the first phase of experimental work, different types of GAN augment the dataset and their evaluation metrics are calculated. In the second phase, the metric scores are used as rewards for a reinforcement learning model (Q-learning approach), and the best augmentation is chosen based on the best Q-values for each metric score. The third phase compares three different Convolutional Neural Networks (CNNs) and selects the best-suited network to classify the augmented datasets. The proposed Smart GAN architecture outperforms other existing approaches, achieving accuracies of 89.62% and 89.91% on the Mammographic Image Analysis Society (MIAS) and Digital Database for Screening Mammography (DDSM) augmented datasets, respectively, representing an approximately 10% increase in detection rate compared to non-augmented datasets.
- Published
- 2024
17. Exploring the potential of 5G uplink communication: Synergistic integration of joint power control, user grouping, and multi-learning Grey Wolf Optimizer.
- Author
- Sikkanan, Sobana, Kumar, Chandrasekaran, Manoharan, Premkumar, and Ravichandran, Sowmya
- Abstract
Non-orthogonal Multiple Access (NOMA) techniques offer potential enhancements in spectral efficiency for 5G and 6G wireless networks, facilitating broader network access. Central to realizing optimal system performance are factors like joint power control, user grouping, and decoding order. This study investigates power control and user grouping to optimize spectral efficiency in NOMA uplink systems, aiming to reduce computational difficulty. While previous research on this integrated optimization has identified several near-optimal solutions, they often come with considerable system and computational overheads. To address this, this study employs an improved Grey Wolf Optimizer (GWO), a nature-inspired metaheuristic optimization method. Although GWO is effective, it can converge prematurely and may lack diversity. To enhance its performance, this study introduces a new version of GWO integrating Competitive Learning, Q-learning, and Greedy Selection. Competitive learning adopts agent competition, balancing exploration and exploitation and preserving diversity. Q-learning guides the search based on past experiences, enhancing adaptability and preventing redundant exploration of sub-optimal regions. Greedy selection ensures the retention of the best solutions after each iteration. The synergistic integration of these three components substantially enhances the performance of the standard GWO. The algorithm was used to manage power and user grouping in NOMA systems, aiming to strengthen system performance while restricting computational demands. Its effectiveness was validated through numerical evaluations. Simulated outcomes revealed that, when applied to the joint challenge in NOMA uplink systems, it surpasses the spectral efficiency of conventional orthogonal multiple access. Moreover, the proposed approach demonstrated superior performance compared to the standard GWO and other state-of-the-art algorithms, achieving reduced system complexity under identical constraints.
- Published
- 2024
18. Efficient Jamming Policy Generation Method Based on Multi-Timescale Ensemble Q-Learning.
- Author
- Qian, Jialong, Zhou, Qingsong, Li, Zhihui, Yang, Zhongping, Shi, Shasha, Xu, Zhenjia, and Xu, Qiyun
- Subjects
- MARKOV processes, ERROR rates, DECISION making, RADAR, RADAR interference, ALGORITHMS
- Abstract
With the advancement of radar technology toward multifunctionality and cognitive capabilities, traditional radar countermeasures are no longer sufficient to counter advanced multifunctional radar (MFR) systems. Rapid and accurate generation of the optimal jamming strategy is one of the key technologies for efficiently completing radar countermeasures. To enhance the efficiency and accuracy of jamming policy generation, an efficient jamming policy generation method based on multi-timescale ensemble Q-learning (MTEQL) is proposed in this paper. First, the task of generating jamming strategies is framed as a Markov decision process (MDP) by constructing a countermeasure scenario between the jammer and radar and analyzing the principles of radar operation mode transitions. Then, multiple structure-dependent Markov environments are created based on real-world adversarial interactions between jammers and radars. Q-learning algorithms are executed concurrently in these environments, and their results are merged through an adaptive weighting mechanism based on the Jensen–Shannon divergence (JSD). Ultimately, a low-complexity and near-optimal jamming policy is derived. Simulation results indicate that the proposed method outperforms the plain Q-learning algorithm in jamming policy generation, with shorter jamming decision-making time and a lower average strategy error rate.
- Published
- 2024
19. SIMULATION MODELLING OF ELECTRIC VEHICLE CHARGING RECOMMENDATIONS BASED ON Q-LEARNING.
- Author
- Tang, M. C., Cao, J., Gong, D. Q., Xue, G., and Khoa, B. T.
- Subjects
- ELECTRIC vehicle charging stations, INTELLIGENT transportation systems, INFRASTRUCTURE (Economics), ELECTRIC vehicle industry, RECOMMENDER systems
- Abstract
The adoption of electric vehicles (EVs) represents a pivotal shift towards sustainable mobility, yet the challenge of efficient charging station recommendations persists, influencing user convenience and EV uptake. This study introduces a novel approach utilizing Q-learning for simulating EV charging station recommendations, aiming to optimize the matching process between EVs and charging infrastructure. By integrating Markov decision processes with Q-learning algorithms, we dynamically adapt recommendations to user behaviours and preferences, significantly enhancing recommendation accuracy and personalization. The methodology involves constructing a simulation environment to model EV charging behaviour, evaluating the performance of the Q-learning based recommendation system under various scenarios. Results demonstrate the effectiveness of this approach in identifying optimal charging strategies, thus contributing to improved user satisfaction and charging station utilization. The findings underscore the importance of innovative technological integration for addressing the complexities of sustainable urban mobility.
- Published
- 2024
- Full Text
- View/download PDF
20. A controlling estimation bias method: Max_Mix_Min estimator for Q-learning.
- Author
-
Abliz, Patigül
- Subjects
- *
ESTIMATION bias , *BENCHMARK problems (Computer science) , *SAMPLE size (Statistics) , *REINFORCEMENT learning - Abstract
Although Q-learning (QL) is widely used in reinforcement learning, it suffers from overestimation bias, which can lead to poor performance in stochastic environments due to its susceptibility to maximization bias. To address this problem, various bias correction mechanisms have been proposed. However, while these mechanisms may reduce overestimation bias, some of them introduce underestimation bias, which is undesirable in some environments. To leverage both overestimation and underestimation biases, we introduce an underestimation mechanism called the min estimator, followed by our proposed Max_Mix_Min Q-learning (M3QL) method, which incorporates a balance parameter β. Our method also considers the number N of Q-functions. Initially, we theoretically analyze why our method benefits from both overestimation and underestimation bias under the assumption of different bias distributions, and how the sample size N affects the performance of our method. Additionally, we visualize the theoretical analysis results on a Meta-chain MDP example. Theoretical analysis demonstrates that M3QL achieves bias reduction compared to QL and the underestimation mechanism. Furthermore, we theoretically prove that M3QL is unbiased for certain values of β. In experimental comparisons with state-of-the-art methods on Atari benchmark problems, our method consistently outperforms them. We also compare M3QL with the underestimation mechanism and Deep Q-learning (DQN) on these benchmark problems, revealing that M3QL improves on the performance of the underestimation mechanism and DQN on most of the benchmark problems. [ABSTRACT FROM AUTHOR]
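A rough sketch of the blended estimator idea (the exact combination rule, and whether the min/max over the ensemble is taken before or after the action maximization, may differ from the paper's M3QL definition):

```python
def max_mix_min_target(q_values, beta=0.5):
    """Blend an overestimating target (max over the ensemble, then max
    over actions) with an underestimating one (min over the ensemble,
    then max over actions); beta trades off the two biases.
    q_values: list of N lists, one row of action-values per Q-function."""
    n_actions = len(q_values[0])
    per_action_max = [max(q[a] for q in q_values) for a in range(n_actions)]
    per_action_min = [min(q[a] for q in q_values) for a in range(n_actions)]
    over = max(per_action_max)    # max estimator: overestimation-prone
    under = max(per_action_min)   # min estimator: underestimation-prone
    return beta * over + (1.0 - beta) * under
```

With beta = 1 this degenerates to the overestimating max estimator, with beta = 0 to the underestimating min estimator; intermediate values interpolate, which is the intuition behind an unbiased choice of β.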
- Published
- 2024
- Full Text
- View/download PDF
21. A Novel Industrial Big Data Fusion Method Based on Q-learning and Cascade Classifier.
- Author
-
Xi Zhang, Jiyue Wang, Ying Huang, and Feiyue Zhu
- Abstract
Because the traditional industrial big data fusion algorithm has low efficiency and difficulty processing high-dimensional data, this paper proposes a Q-learning-based cascade classifier model for industrial big data fusion. By combining a cascade classifier and a softmax classifier, feature extraction and data attribute classification of the source industrial big data are completed in this cluster. To improve the classification rate, an improved Q-learning algorithm is proposed: it selects actions randomly in the early stage and, in the late stage, dynamically alternates between random action selection and the action with the highest reward value. This effectively mitigates the defects of the traditional Q-learning algorithm, namely its tendency to fall into local optima and its slow convergence. The experimental results show that, compared with other advanced fusion algorithms, the proposed method greatly reduces network energy consumption and effectively improves the efficiency and accuracy of data fusion under the same data volume. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. AHT-QCN: Adaptive Hunt Tuner Algorithm Optimized Q-learning Based Deep Convolutional Neural Network for the Penetration Testing.
- Author
-
Railkar, Dipali and Joshi, Shubhalaxmi
- Subjects
CONVOLUTIONAL neural networks ,INFRASTRUCTURE (Economics) ,CYBERTERRORISM ,COMMUNICATION infrastructure ,DEEP learning ,ALGORITHMS - Abstract
Penetration Testing (PT), which mimics actual cyber attacks, has become an essential procedure for assessing the security posture of network infrastructures in recent years. Automated PT reduces human labor, increases scalability, and allows for more frequent evaluations. Real-world exploitation still challenges RL-based penetration testing because the agent's many possible actions make it hard for the algorithm to converge. To resolve these shortcomings, a deep-learning model named the Adaptive Hunt Tuner algorithm optimized Q-learning-based deep Convolutional Neural Network (AHT-QCN) is developed for efficient PT. Specifically, the Q-learning employed in this model improves its efficiency by enabling optimal policy learning for decision-making. In addition, the Adaptive Hunt Tuner (AHT) algorithm enhances the model's performance by tuning its parameters with reduced computational time. The experimental outcomes demonstrate that the developed model attains 95.25% accuracy, 97.66% precision, and a 93.81% F1 score. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Q-Learning-Assisted Meta-Heuristics for Scheduling Distributed Hybrid Flow Shop Problems.
- Author
-
Zhu, Qianyao, Gao, Kaizhou, Huang, Wuze, Ma, Zhenfang, and Slowik, Adam
- Subjects
FLOW shop scheduling ,PARTICLE swarm optimization ,GREEDY algorithms ,FLOW shops ,DIFFERENTIAL evolution ,BEES algorithm - Abstract
The flow shop scheduling problem is important for the manufacturing industry. Effective flow shop scheduling can bring great benefits to the industry. However, there is little research on Distributed Hybrid Flow Shop Problems (DHFSP) using learning-assisted meta-heuristics. This work addresses a DHFSP with the objective of minimizing the maximum completion time (makespan). First, a mathematical model is developed for the concerned DHFSP. Second, four Q-learning-assisted meta-heuristics, based on the genetic algorithm (GA), artificial bee colony algorithm (ABC), particle swarm optimization (PSO), and differential evolution (DE), are proposed. According to the nature of the DHFSP, six local search operations are designed for finding high-quality solutions in the local space. Instead of random selection, Q-learning assists the meta-heuristics in choosing the appropriate local search operations during iterations. Finally, based on 60 cases, comprehensive numerical experiments are conducted to assess the effectiveness of the proposed algorithms. The experimental results and discussions prove that using Q-learning to select appropriate local search operations is more effective than the random strategy. To verify the competitiveness of the Q-learning-assisted meta-heuristics, they are compared with the improved iterated greedy algorithm (IIG), which also solves the DHFSP. The Friedman test is executed on the results of the five algorithms. It is concluded that the performance of the four Q-learning-assisted meta-heuristics is better than that of IIG, and the Q-learning-assisted PSO shows the best competitiveness. [ABSTRACT FROM AUTHOR]
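The operator-selection idea can be sketched as a small stateless (bandit-style) Q-learning selector; this simplification and the improvement-based reward are assumptions, not the paper's exact design:

```python
import random

class OperatorSelector:
    """Epsilon-greedy Q-learning over a set of local-search operators.
    Reward is the improvement the chosen operator produced, so operators
    that keep improving the incumbent solution are chosen more often."""
    def __init__(self, n_ops, alpha=0.2, gamma=0.9, eps=0.1):
        self.q = [0.0] * n_ops
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def select(self):
        # Explore with probability eps, otherwise exploit the best operator.
        if random.random() < self.eps:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda i: self.q[i])

    def update(self, op, reward):
        # Stateless variant of the Q-update: bootstrap from the best
        # current operator value instead of a next state.
        best = max(self.q)
        self.q[op] += self.alpha * (reward + self.gamma * best - self.q[op])
```

Inside a meta-heuristic's main loop, `select()` replaces the random choice among the six local search operations, and `update()` is called with the makespan improvement the operation achieved.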
- Published
- 2024
- Full Text
- View/download PDF
24. Obstacle detection to minimize delay and Q-learning to improve routing efficiency in VANET.
- Author
-
Dev, Kishore Chandra and Barani, Selvaraj
- Subjects
VEHICULAR ad hoc networks ,MACHINE learning ,SPANNING trees ,DINGO ,CITIES & towns - Abstract
Nowadays, several service providers in urban areas pay significant attention to vehicular ad hoc networks (VANETs). VANETs can enhance road safety, prevent accidents, and provide passengers with entertainment. However, efficient routing in VANETs remains an open problem. VANETs are dynamic; frequent changes in the situation originate from several aspects, such as traffic conditions and updates to the road topology, which demand suitably adaptive routing. The existence of blocking obstacles degrades routing approaches and increases path failures. These issues cause excessive resource utilization and increase network delay. To solve these issues, obstacle detection to minimize delay and Q-learning to improve routing efficiency (ODQI) in VANETs is proposed. This mechanism uses a spanning-tree algorithm to detect obstacles. Clustering is used to manage the topology in VANETs. The dingo algorithm selects the best cluster head (CH) based on vehicle bandwidth, speed, and link lifespan. Furthermore, the sender forwards traffic information to the receiver by applying a Q-learning algorithm, which computes a reward function to choose the forwarder, improving routing efficiency. Simulation results demonstrate that the ODQI mechanism increases the CH lifetime and minimizes network delay. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Cooperation in evolutionary games incorporated with extended Q-learning algorithm.
- Author
-
Long, Pinduo, Dai, Qionglin, Li, Haihong, and Yang, Junzhong
- Subjects
- *
GAME theory , *COMPUTER simulation , *NEIGHBORHOODS , *COOPERATION , *ALGORITHMS - Abstract
Evolutionary game theory provides a platform to investigate the emergence of cooperation in populations consisting of selfish agents. In this work, we study evolutionary games on networks in which agents cooperate or defect according to Q-learning algorithms with an extended state space. The extended state space provides agents with two types of information: local environment information based on the cooperation level in an agent's neighborhood, and personal information based on the agent's last action. Through numerical simulations, we find that rich information on the local environment tends to improve cooperation in the population, no matter whether personal information is present or not. Moreover, we show that, for the same local environment information, the introduction of personal information may improve cooperation, except in situations with a low amount of local environment information, where personal information deteriorates cooperation in a bad-condition environment. For the same total information, the absence of personal information promotes cooperation in a bad-condition environment, while the presence of personal information promotes cooperation in a good-condition environment. By investigating the distributions and temporal behaviors of Q-values, we present explanations for the above statements. This work suggests an effective way of extending the state space in evolutionary games incorporating a Q-learning algorithm to enhance cooperation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Reinforcement Q-Learning for PDF Tracking Control of Stochastic Systems with Unknown Dynamics.
- Author
-
Yang, Weiqing, Zhou, Yuyang, Zhang, Yong, and Ren, Yan
- Subjects
- *
PROBABILITY density function , *TRACKING control systems , *TRACKING algorithms , *SYSTEM dynamics , *STOCHASTIC models , *STOCHASTIC systems - Abstract
Tracking control of the output probability density function presents significant challenges, particularly when dealing with unknown system models and multiplicative noise disturbances. To address these challenges, this paper introduces a novel tracking control algorithm based on reinforcement Q-learning. Initially, a B-spline model is employed to represent the original system, thereby transforming the control problem into a state-weight tracking issue within the B-spline stochastic system model. Moreover, to tackle the challenge of unknown stochastic system dynamics and the presence of multiplicative noise, a model-free reinforcement Q-learning algorithm is employed to solve the control problem. Finally, the proposed algorithm's effectiveness is validated through comprehensive simulation examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Optimal Electric Vehicle Battery Management Using Q-learning for Sustainability.
- Author
-
Suanpang, Pannee and Jamjuntr, Pitchaya
- Abstract
This paper presents a comprehensive study on the optimization of electric vehicle (EV) battery management using Q-learning, a powerful reinforcement learning technique. As the demand for electric vehicles continues to grow, there is an increasing need for efficient battery-management strategies to extend battery life, enhance performance, and minimize operating costs. The primary objective of this research is to develop and assess a Q-learning-based approach to address the intricate challenges associated with EV battery management. This paper starts by elucidating the key challenges inherent in EV battery management and discusses the potential advantages of incorporating Q-learning into the optimization process. Leveraging Q-learning's capacity to make dynamic decisions based on past experiences, we introduce a framework that considers state-of-charge, state-of-health, charging infrastructure, and driving patterns as critical state variables. The methodology is detailed, encompassing the selection of state, action, reward, and policy, with the training process informed by real-world data. Our experimental results underscore the efficacy of the Q-learning approach in optimizing battery management. Through the utilization of Q-learning, we achieve substantial enhancements in battery performance, energy efficiency, and overall EV sustainability. A comparative analysis with traditional battery-management strategies is presented to highlight the superior performance of our approach, demonstrating compelling results. Our Q-learning-based method achieves a significant 15% improvement in energy efficiency compared to conventional methods, translating into substantial savings in operational costs and reduced environmental impact.
Moreover, we observe a remarkable 20% increase in battery lifespan, showcasing the effectiveness of our approach in enhancing long-term sustainability and user satisfaction. This paper significantly enriches the body of knowledge on EV battery management by introducing an innovative, data-driven approach. It provides a comprehensive comparative analysis and applies novel methodologies for practical implementation. The implications of this research extend beyond the academic sphere to practical applications, fostering the broader adoption of electric vehicles and contributing to a reduction in environmental impact while enhancing user satisfaction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Qsmix: Q-learning-based task scheduling approach for mixed-critical applications on heterogeneous multi-cores.
- Author
-
Afshari, Fatemeh and Abdi, Athena
- Subjects
- *
REINFORCEMENT learning , *SCHEDULING , *REINFORCEMENT (Psychology) , *PUNISHMENT - Abstract
In this paper, a Q-learning-based task scheduling approach for mixed-critical applications on heterogeneous multi-cores (QSMix), which optimizes their main design challenges, is proposed. This approach employs reinforcement learning capabilities to optimize the execution time, power consumption, reliability, and temperature of the heterogeneous multi-cores during the task scheduling process. In QSMix, a reward function is defined to consider all target design parameters simultaneously and is tuned by applying punishment for unwanted conditions during learning. The learning process of QSMix is guided by the defined reward function while constructing the Q-table for various execution scenarios. Afterward, the best solution is selected from the constructed Q-table based on the system's policy to achieve a near-optimal solution that meets the existing trade-offs among objectives while properly considering its constraints. To evaluate the proposed QSMix, several experiments are performed to show its effectiveness in finding appropriate solutions and its gradual behavior during the learning process. Moreover, the performance of QSMix in terms of optimizing the target design parameters is compared to various related research. The results confirm that QSMix achieves an average improvement of about 9% over related studies in the joint optimization of execution time, power consumption, reliability, and temperature. [ABSTRACT FROM AUTHOR]
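A hedged sketch of a multi-objective reward with punishment in the spirit of QSMix; the normalization, weights, constraint names, and penalty value are illustrative assumptions, not the paper's tuned design:

```python
def qsmix_reward(exec_time, power, reliability, temperature,
                 limits, weights=(0.25, 0.25, 0.25, 0.25), penalty=10.0):
    """Multi-objective reward: lower time/power/temperature and higher
    reliability score better, and any violated constraint incurs a flat
    punishment, steering the learner away from unwanted conditions."""
    w_t, w_p, w_r, w_temp = weights
    reward = (-w_t * exec_time - w_p * power
              + w_r * reliability - w_temp * temperature)
    if exec_time > limits["deadline"] or temperature > limits["max_temp"]:
        reward -= penalty        # punishment for unwanted conditions
    return reward
```

During Q-table construction, each scheduling decision would be scored by such a function, so the learned policy balances all four objectives rather than optimizing any one in isolation.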
- Published
- 2024
- Full Text
- View/download PDF
29. GTR: GAN-Based Trusted Routing Algorithm for Underwater Wireless Sensor Networks.
- Author
-
Wang, Bin and Ben, Kerong
- Subjects
- *
GENERATIVE adversarial networks , *WIRELESS sensor networks , *ENERGY tax , *DATA transmission systems , *ERROR rates , *ROUTING algorithms - Abstract
The transmission environment of underwater wireless sensor networks is open, and important transmission data can be easily intercepted, interfered with, and tampered with by malicious nodes. Malicious nodes can be mixed in the network and are difficult to distinguish, especially in time-varying underwater environments. To address this issue, this article proposes a GAN-based trusted routing algorithm (GTR). GTR defines the trust feature attributes and trust evaluation matrix of underwater network nodes, constructs the trust evaluation model based on a generative adversarial network (GAN), and achieves malicious node detection by establishing a trust feature profile of a trusted node, which improves the detection performance for malicious nodes in underwater networks under unlabeled and imbalanced training data conditions. GTR combines the trust evaluation algorithm with the adaptive routing algorithm based on Q-Learning to provide an optimal trusted data forwarding route for underwater network applications, improving the security, reliability, and efficiency of data forwarding in underwater networks. GTR relies on the trust feature profile of trusted nodes to distinguish malicious nodes and can adaptively select the forwarding route based on the status of trusted candidate next-hop nodes, which enables GTR to better cope with the changing underwater transmission environment and more accurately detect malicious nodes, especially unknown malicious node intrusions, compared to baseline algorithms. Simulation experiments showed that, compared to baseline algorithms, GTR can provide a better malicious node detection performance and data forwarding performance. 
Under the condition of 15% malicious nodes and 10% unknown malicious nodes mixed in, the detection rate of malicious nodes by the underwater network configured with GTR increased by 5.4%, the error detection rate decreased by 36.4%, the packet delivery rate increased by 11.0%, the energy tax decreased by 11.4%, and the network throughput increased by 20.4%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Q-learning based task scheduling and energy-saving MAC protocol for wireless sensor networks.
- Author
-
Jaber, Mustafa Musa, Ali, Mohammed Hassan, Abd, Sura Khalil, Jassim, Mustafa Mohammed, Alkhayyat, Ahmed, Jassim, Mohammed, Alkhuwaylidee, Ahmed Rashid, and Nidhal, Lahib
- Subjects
- *
WIRELESS sensor networks , *REINFORCEMENT learning , *RELIABILITY in engineering , *ENERGY consumption , *DETECTORS - Abstract
The primary problem for a resource-limited Wireless Sensor Network is how to extend system reliability without sacrificing performance measures such as reception rate and network connectivity. The proposed method effectively deploys sensor nodes and ensures connectivity between the nodes and the transceiver station. Reinforcement Learning (RL) can effectively schedule the sensor nodes' unsupervised activities. A Nash Q-Learning-inspired node task scheduling scheme (QL-TS) for service and connection management is described in this work. An energy-saving MAC protocol (ESMACP) has been developed to extend the system lifetime. The primary aim of this model is to use the suggested QL-TS-ESMACP to allow sensor devices to learn their best action with minimal energy. The correctness and dependability of QL-TS-ESMACP are demonstrated by comparing it with other existing methods. In simulation, QL-TS-ESMACP outperforms other models in terms of energy efficiency, coverage, node lifespan, and packet delivery ratio. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Algorithmic Trading Using Double Deep Q-Networks and Sentiment Analysis.
- Author
-
Tabaro, Leon, Kinani, Jean Marie Vianney, Rosales-Silva, Alberto Jorge, Salgado-Ramírez, Julio César, Mújica-Vargas, Dante, Escamilla-Ambrosio, Ponciano Jorge, and Ramos-Díaz, Eduardo
- Subjects
- *
DEEP reinforcement learning , *REINFORCEMENT learning , *SENTIMENT analysis , *COMPUTER algorithms , *CORPORATE finance - Abstract
In this work, we explore the application of deep reinforcement learning (DRL) to algorithmic trading. While algorithmic trading is focused on using computer algorithms to automate a predefined trading strategy, in this work, we train a Double Deep Q-Network (DDQN) agent to learn its own optimal trading policy, with the goal of maximising returns whilst managing risk. In this study, we extended our approach by augmenting the Markov Decision Process (MDP) states with sentiment analysis of financial statements, through which the agent achieved up to a 70% increase in the cumulative reward over the testing period and an increase in the Calmar ratio from 0.9 to 1.3. The experimental results also showed that the DDQN agent's trading strategy was able to consistently outperform the benchmark set by the buy-and-hold strategy. Additionally, we further investigated the impact of the length of the window of past market data that the agent considers when deciding on the best trading action to take. The results of this study have validated DRL's ability to find effective solutions and its importance in studying the behaviour of agents in markets. This work serves to provide future researchers with a foundation to develop more advanced and adaptive DRL-based trading systems. [ABSTRACT FROM AUTHOR]
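The Double DQN target that distinguishes DDQN from vanilla DQN can be written compactly; this is the standard formulation, shown here with plain lists for clarity:

```python
def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the frozen target network evaluates it, which curbs the
    overestimation bias of vanilla DQN:
        y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
    """
    if done:
        return reward  # terminal transitions bootstrap nothing
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[a_star]
```

In a trading agent like the one described, the action set would be buy/hold/sell signals and the state would include the market-data window (and, here, the sentiment features) the abstract mentions.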
- Published
- 2024
- Full Text
- View/download PDF
32. Electric vehicle routing optimization under 3D electric energy modeling.
- Author
-
Zhu, Yanfei, Wang, Yonghua, Li, Chunhui, and Lee, Kwang Y.
- Abstract
In logistics transportation, the electric vehicle routing problem (EVRP) is widely researched in order to save vehicle power expenditure, reduce transportation costs, and improve service quality. The power expenditure model and the routing algorithm are essential for resolving the EVRP. To make the routing schedule more reasonable and closer to reality, this paper employs a three-dimensional power expenditure model to calculate the power expenditure of EVs. In this model, the power expenditure of EVs while going uphill and downhill is considered, in order to solve the routing schedule of logistics transportation in mountainous areas. This study combines Q-learning and the Re-insertion Genetic Algorithm (Q-RIGA) to design EV routes with low electricity expenditure and reduced transportation costs. The Q-learning algorithm is used to improve route initialization and obtain high-quality initial routes, which are further optimized by RIGA. Tested on a collection of randomly dispersed customer groups, the advantages of the proposed method in terms of convergence speed and power expenditure are confirmed. The three-dimensional power expenditure model, which takes elevation into consideration, is used to conduct simulation experiments on the distribution example of Sanlian Dairy in Guizhou to verify that the improved model features broader applicability and higher practical value. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Active fault-tolerant attitude control based on Q-learning for rigid spacecraft with actuator faults.
- Author
-
Rafiee, Sajad, Kankashvar, Mohammadrasoul, Mohammadi, Parisa, and Bolandi, Hossein
- Subjects
- *
FAULT-tolerant control systems , *ARTIFICIAL satellite attitude control systems , *FAULT-tolerant computing , *SPACE vehicles , *ACTUATORS , *RIGID dynamics , *REINFORCEMENT learning - Abstract
• A novel fault-tolerant controller has been developed for the attitude control of rigid spacecraft based on Q-learning. • This controller obviates the necessity for actuator fault data or extensive fault knowledge. • The controller stability analysis and controller implementation are discussed. This paper presents a novel active fault-tolerant control (FTC) scheme based on reinforcement learning (RL) for rigid spacecraft operating in challenging conditions with simultaneous actuator faults and external disturbances. Initially, the paper outlines the dynamics of a rigid spacecraft afflicted by actuator faults and subject to external disturbances. Subsequently, an observer is designed to swiftly detect actuator faults, ensuring a timely response to fault occurrences. An indirect fault estimator is then employed to estimate the total faults affecting the system. Based on the estimated total faults, the proposed decision mechanism switches the controller from the nominal to the fault-tolerant controller. The proposed fault-tolerant controller is model-free and utilizes the Q-learning algorithm. This Q-learning-based fault-tolerant controller can be implemented online without relying on explicit system models or actuator fault details. Notably, this innovative controller operates independently from fault detection and identification (FDI), utilizing data extracted from system trajectories. The stability of the fault-tolerant controller is established using Lyapunov techniques, providing rigorous validation of its effectiveness in maintaining system stability and achieving satisfactory performance. The performance and adaptability of the proposed approach are assessed through comprehensive simulation studies, emphasizing its capacity to enhance spacecraft fault tolerance in demanding operational scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Integration of Q-Learning and PID Controller for Mobile Robots Trajectory Tracking in Unknown Environments.
- Author
-
Munaf, Almojtaba and Jasim Almusawi, Ahmed Rahman
- Subjects
ROBOTIC path planning ,MACHINE learning ,PID controllers ,ROBOTICS ,REINFORCEMENT learning ,MOBILE robots ,AUTOMOTIVE navigation systems - Abstract
In the realm of autonomous robotics, navigating differential-drive mobile robots through unknown environments poses significant challenges due to their complex nonholonomic constraints. This issue is particularly acute in applications requiring precise trajectory tracking and effective obstacle avoidance without prior knowledge of the surroundings. Traditional navigation systems often struggle with these demands, leading to inefficiencies and potential safety risks. To address this problem, this study proposes an algorithm that integrates machine learning and control concepts, specifically through the synergistic application of a Q-learning algorithm and a proportional-integral-derivative (PID) controller. This technique leverages the adaptability of Q-learning pathfinding and the precision of PID control for real-time trajectory adjustment, aiming to enhance the robot's navigation capabilities. Our comprehensive approach includes developing a state-space model that integrates Q-values with the dynamics of differential-drive robots, employing Bellman's equation for iterative policy refinement. This model enhances the robot's ability to dynamically adapt its navigation strategy in response to immediate environmental feedback, thereby optimizing efficiency and safety in real time. The results of our extensive simulations demonstrate a marked improvement in trajectory-tracking accuracy, reduced travel times to targets, and enhanced obstacle-avoidance capability. These findings underscore the potential of combining machine learning algorithms with traditional methods to advance autonomous navigation technology in robotic systems. 
This integrated method demonstrates a substantial advantage over conventional navigation systems, providing a robust solution to the challenges of autonomous robot navigation in unpredictable environments. [ABSTRACT FROM AUTHOR]
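A minimal discrete PID controller of the kind that could track headings toward Q-learning-chosen waypoints; the gains and time step here are illustrative, not the paper's tuned values:

```python
class PID:
    """Discrete PID controller tracking a setpoint (e.g. a heading
    toward the waypoint the Q-learning planner selected)."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt            # accumulated error (I term)
        deriv = (err - self.prev_err) / self.dt   # error rate (D term)
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

In the combined scheme, the Q-learning layer decides *where* to go next, while a controller like this decides *how* to steer the wheels toward that sub-goal at each control tick.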
- Published
- 2024
- Full Text
- View/download PDF
36. Linear Quadratic Optimal Control Method Based on Output-Feedback Inverse Reinforcement Q-Learning.
- Author
-
刘 文, 范家璐, and 薛文倩
- Subjects
LINEAR systems ,REINFORCEMENT learning ,EQUATIONS - Abstract
Copyright of Control Theory & Applications / Kongzhi Lilun Yu Yinyong is the property of Editorial Department of Control Theory & Applications and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
37. Research on a Q-Learning Cellular Automaton Method for Structural Topology Optimization.
- Author
-
宋旭明, 史哲宇, 包世鹏, and 唐冕
- Abstract
Copyright of Journal of Railway Science & Engineering is the property of Journal of Railway Science & Engineering Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
37. Tool Condition Monitoring in the Milling Process Using Deep Learning and Reinforcement Learning.
- Author
-
Kaliyannan, Devarajan, Thangamuthu, Mohanraj, Pradeep, Pavan, Gnansekaran, Sakthivel, Rakkiyannan, Jegadeeshwaran, and Pramanik, Alokesh
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,DEEP learning ,RECEIVER operating characteristic curves ,MANUFACTURING processes - Abstract
Tool condition monitoring (TCM) is crucial in the machining process to confirm product quality as well as process efficiency and minimize downtime. Traditional methods for TCM, while effective to a degree, often fall short in real-time adaptability and predictive accuracy. This research work aims to advance the state-of-the-art methods in predictive maintenance for TCM and improve tool performance and reliability during the milling process. The present work investigates the application of Deep Learning (DL) and Reinforcement Learning (RL) techniques to monitor tool conditions in milling operations. DL models, including Long Short-Term Memory (LSTM) networks, Feed Forward Neural Networks (FFNN), and RL models, including Q-learning and SARSA, are employed to classify tool conditions from the vibration sensor. The performance of the selected DL and RL algorithms is evaluated through performance metrics like confusion matrix, recall, precision, F1 score, and Receiver Operating Characteristics (ROC) curves. The results revealed that RL based on SARSA outperformed other algorithms. The overall classification accuracies for LSTM, FFNN, Q-learning, and SARSA were 94.85%, 98.16%, 98.50%, and 98.66%, respectively. In regard to predicting tool conditions accurately and thereby enhancing overall process efficiency, SARSA showed the best performance, followed by Q-learning, FFNN, and LSTM. This work contributes to the advancement of TCM systems, highlighting the potential of DL and RL techniques to revolutionize manufacturing processes in the era of Industry 5.0. [ABSTRACT FROM AUTHOR]
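Since the study contrasts Q-learning and SARSA, the one-line difference between their updates is worth making explicit (standard textbook formulations, with `Q` as a list of per-state action-value lists):

```python
def q_learning_step(Q, s, a, r, s2, alpha, gamma):
    """Off-policy TD update: bootstrap from the greedy next action."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    return Q[s][a]

def sarsa_step(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
    return Q[s][a]
```

The only structural difference is the bootstrap term: Q-learning assumes the greedy next action, SARSA uses the behavior policy's actual next action, which often yields more conservative (and, as reported here, slightly better-classifying) value estimates.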
- Published
- 2024
- Full Text
- View/download PDF
38. Hybridization of the Q-learning and honey bee foraging algorithms for load balancing in cloud environments.
- Author
-
Adewale, Adeyinka Ajao, Obiazi, Oghorchukwuyem, Okokpujie, Kennedy, and Koto, Omiloli
- Subjects
VIRTUAL machine systems ,HONEYBEES ,MACHINE learning ,BEES algorithm ,CLOUD computing - Abstract
Load balancing (LB) is critical in cloud computing because it keeps some nodes from being overloaded while others are idle or underutilized. Maintaining quality of service (QoS) characteristics such as response time, throughput, cost, makespan, resource utilization, and runtime is a difficult load-balancing problem in cloud computing. A robust resource allocation strategy contributes to the end user receiving high-quality cloud computing services. An effective LB strategy should improve and deliver the required user satisfaction by efficiently using the resources of virtual machines (VMs). In this study, the Q-learning method and the honey bee foraging load balancing algorithm were combined. This hybrid combination of a load balancing algorithm and a machine learning method reduced the runtime of load balancing activities and the makespan, and increased task throughput in a cloud computing environment, thereby enhancing routing activities. It achieved this by continuously tracking the usage histories of the VMs and altering the usage matrix to send jobs to the VMs with the best usage histories. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
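The usage-history mechanism can be sketched as a small Q-table over VMs, updated from observed load and queried honey-bee style (function names and reward shape are assumptions, not the paper's implementation):

```python
import random

def choose_vm(Q, vms, epsilon=0.1):
    # Mostly exploit the VM with the best usage history; occasionally
    # "scout" a random VM, honey-bee style.
    if random.random() < epsilon:
        return random.choice(vms)
    return max(vms, key=lambda v: Q[v])

def update_usage(Q, vm, load, alpha=0.5):
    # Reward lightly loaded VMs: load is a utilization in [0, 1].
    Q[vm] += alpha * ((1.0 - load) - Q[vm])

vms = ["vm0", "vm1", "vm2"]
Q = {v: 0.0 for v in vms}
for vm, load in [("vm0", 0.9), ("vm1", 0.2), ("vm2", 0.5)]:
    update_usage(Q, vm, load)
print(choose_vm(Q, vms, epsilon=0.0))   # vm1: best usage history so far
```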
39. Q-learning improved golden jackal optimization algorithm and its application to reliability optimization of hydraulic system
- Author
-
Dongning Chen, Haowen Wang, Dongbo Hu, Qinggui Xian, and Bingyu Wu
- Subjects
Golden jackal optimization ,Reliability optimization ,Q-Learning ,Global optimization ,Medicine ,Science - Abstract
Abstract To endow the prey with intelligent movement behavior and improve the performance of Golden Jackal Optimization (GJO), a Q-learning Improved Golden Jackal Optimization (QIGJO) algorithm is proposed. This paper introduces five update mechanisms and proposes a double-population Q-learning collaborative mechanism to select the appropriate update mechanism, improving GJO performance. Additionally, a new convergence factor is incorporated to enhance the convergence capability of GJO. QIGJO demonstrates excellent performance across 23 benchmark functions, the CEC2022 suite, and three classical engineering design problems, indicating high convergence accuracy and significantly enhanced global exploration capability. A reliability optimization model of the hydraulic system of concrete pump trucks was established based on a Continuous-time Multi-dimensional T-S dynamic Fault Tree (CM-TSdFT), considering the two-dimensional factors of operating time and number of impacts. Utilizing QIGJO to optimize this model yielded excellent results, providing valuable methodological support for the reliability optimization of hydraulic systems.
- Published
- 2024
- Full Text
- View/download PDF
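The double-population collaboration is beyond a snippet, but the core idea, a Q-table scoring which update mechanism to apply next, can be sketched on a toy 1-D objective (mechanisms, rewards, and constants below are illustrative assumptions, not the paper's design):

```python
import random

def f(x):                      # toy objective standing in for a benchmark
    return x * x

random.seed(0)
mechanisms = [lambda x: x + random.uniform(-1, 1),      # exploratory update
              lambda x: x + random.uniform(-0.1, 0.1)]  # exploitative update
Q = [0.0, 0.0]                 # one shared state; one Q-value per mechanism
x, alpha, gamma, eps = 5.0, 0.2, 0.9, 0.2
for _ in range(200):
    a = random.randrange(2) if random.random() < eps else Q.index(max(Q))
    cand = mechanisms[a](x)
    r = 1.0 if f(cand) < f(x) else -0.1     # reward only improving moves
    if f(cand) < f(x):
        x = cand                             # greedy acceptance
    Q[a] += alpha * (r + gamma * max(Q) - Q[a])
print(round(f(x), 6))
```

Over the run, the Q-values steer the search toward whichever mechanism has been paying off, which is the selection role Q-learning plays inside QIGJO.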
40. Prediction of formation fracture pressure based on reinforcement learning and XGBoost
- Author
-
Wan Bingqian, Xu Shengchi, Luo Shuai, Wei Leipeng, Zhang Ci, Zhou Diao, Zhang Hao, and Zhang Yan
- Subjects
fracture pressure prediction ,geophysical well logging ,fracturing design ,xgboost ,q-learning ,Geology ,QE1-996.5 - Abstract
The magnitude of fracture pressure is a crucial indicator for fracturing design, so determining it clearly is essential. Traditional methods for predicting fracture pressure suffer from challenges such as difficulty in obtaining the required data, low prediction accuracy, and limited local applicability. In light of these issues, the article proposes a fracture pressure prediction model based on reinforcement learning and XGBoost utilizing geophysical well logging data. Based on a relevance analysis, optimal input parameters, including DEPTH, DEN, AC, GR, CRL, and RT, are selected from geophysical well logging data. We have developed a framework for a fracture pressure prediction model based on XGBoost, wherein hyperparameters are fine-tuned using an improved Q-learning algorithm. The optimized XGBoost model for fracture pressure prediction attains outstanding performance metrics, including an R² value of 0.992, a root mean square error of 0.006%, and a mean absolute error of 0.539%. In direct comparison with grid search, Bayesian optimization, and ant colony optimization, the improved Q-learning algorithm emerges as the most effective optimization approach. The predictions generated by the proposed method exhibit remarkable consistency with fracture pressure data measured on-site. This approach addresses the shortcomings of traditional fracture pressure prediction methods: inadequate accuracy, demanding data prerequisites, and constrained applicability.
- Published
- 2024
- Full Text
- View/download PDF
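Treating hyperparameter tuning as a one-state, bandit-style Q-learning search can be sketched as below; the grid, the stand-in scoring function, and all constants are illustrative, not the paper's improved algorithm (a real run would replace `eval_score` with a cross-validated XGBoost evaluation):

```python
import random

depths = [3, 6, 9]          # candidate max_depth values (illustrative grid)
etas = [0.01, 0.1, 0.3]     # candidate learning rates

def eval_score(d, e):
    # Stand-in for a cross-validated XGBoost score; peaks at depth=6, eta=0.1.
    return 1.0 - abs(d - 6) / 10 - abs(e - 0.1)

Q = {(i, j): 0.0 for i in range(3) for j in range(3)}   # one Q per config
random.seed(0)
for _ in range(500):
    if random.random() < 0.3:                  # explore a random config
        a = random.choice(list(Q))
    else:                                      # exploit the best-scoring one
        a = max(Q, key=Q.get)
    Q[a] += 0.5 * (eval_score(depths[a[0]], etas[a[1]]) - Q[a])
best = max(Q, key=Q.get)
print(depths[best[0]], etas[best[1]])   # with enough exploration: 6 0.1
```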
41. Improving predictions of rock tunnel squeezing with ensemble Q-learning and online Markov chain
- Author
-
Hadi S Fard, Hamid Parvin, and Mohammadreza Mahmoudi
- Subjects
Geotechnical engineering ,Tunnel construction ,Rock mechanics ,Ensemble deep learning ,Q-learning ,Markov chain analysis ,Medicine ,Science - Abstract
Abstract Predicting rock tunnel squeezing in underground projects is challenging due to its intricate and unpredictable nature. This study proposes an innovative approach to enhance the accuracy and reliability of tunnel squeezing prediction. The proposed method combines ensemble learning techniques with Q-learning and online Markov chain integration. A deep learning model is trained on a comprehensive database comprising tunnel parameters including diameter (D), burial depth (H), support stiffness (K), and tunneling quality index (Q). Multiple deep learning models are trained concurrently, leveraging ensemble learning to capture diverse patterns and improve prediction performance. Integration of the Q-learning-online Markov chain further refines predictions. The online Markov chain analyzes historical sequences of tunnel parameters and squeezing class transitions, establishing transition probabilities between different squeezing classes. The Q-learning algorithm optimizes decision-making by learning the optimal policy for transitioning between tunnel states. The proposed model is evaluated using a dataset from various tunnel construction projects, assessing performance through metrics like accuracy, precision, recall, and F1-score. Results demonstrate the efficiency of the ensemble deep learning model combined with the Q-learning-online Markov chain in predicting surrounding rock tunnel squeezing. This approach offers insights into parameter interrelationships and dynamic squeezing characteristics, enabling proactive planning and the implementation of support measures to mitigate tunnel squeezing hazards and ensure underground structure safety. Experimental results show the model achieves a prediction accuracy of 98.11%, surpassing individual CNN and RNN models, with an AUC value of 0.98.
- Published
- 2024
- Full Text
- View/download PDF
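The online Markov chain component reduces to estimating transition probabilities between squeezing classes from observed sequences. A sketch with hypothetical class labels:

```python
from collections import defaultdict

def transition_matrix(history, classes):
    # Estimate P(next squeezing class | current class) from one sequence.
    counts = {c: defaultdict(int) for c in classes}
    for cur, nxt in zip(history, history[1:]):
        counts[cur][nxt] += 1
    P = {}
    for c in classes:
        total = sum(counts[c].values())
        P[c] = {n: (counts[c][n] / total if total else 0.0) for n in classes}
    return P

# Hypothetical class labels: 0 = no squeezing, 1 = mild, 2 = severe
history = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2]
P = transition_matrix(history, [0, 1, 2])
print(P[1])   # from "mild", escalation to "severe" is the most likely move
```

In the paper's framing, these estimated probabilities are what the Q-learning layer consults when refining the ensemble's class predictions.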
42. Exploring the potential of 5G uplink communication: Synergistic integration of joint power control, user grouping, and multi-learning Grey Wolf Optimizer
- Author
-
Sobana Sikkanan, Chandrasekaran Kumar, Premkumar Manoharan, and Sowmya Ravichandran
- Subjects
Competitive learning ,Grey Wolf Optimizer ,Non-orthogonal multiple access (NOMA) ,Q-learning ,Spectral efficiency ,User-grouping ,Medicine ,Science - Abstract
Abstract Non-orthogonal Multiple Access (NOMA) techniques offer potential enhancements in spectral efficiency for 5G and 6G wireless networks, facilitating broader network access. Central to realizing optimal system performance are factors like joint power control, user grouping, and decoding order. This study investigates power control and user grouping to optimize spectral efficiency in NOMA uplink systems, aiming to reduce computational difficulty. While previous research on this integrated optimization has identified several near-optimal solutions, they often come with considerable system and computational overheads. To address this, this study employed an improved Grey Wolf Optimizer (GWO), a nature-inspired metaheuristic optimization method. Although GWO is effective, it can converge prematurely and may lack diversity. To enhance its performance, this study introduces a new version of GWO integrating Competitive Learning, Q-learning, and Greedy Selection. Competitive learning introduces competition among agents, balancing exploration and exploitation and preserving diversity. Q-learning guides the search based on past experiences, enhancing adaptability and preventing redundant exploration of sub-optimal regions. Greedy selection ensures the retention of the best solutions after each iteration. The synergistic integration of these three components substantially enhances the performance of the standard GWO. This algorithm was used to manage power and user grouping in NOMA systems, aiming to strengthen system performance while restricting computational demands. The effectiveness of the proposed algorithm was validated through numerical evaluations. Simulated outcomes revealed that, when applied to the joint challenge in NOMA uplink systems, it surpasses the spectral efficiency of conventional orthogonal multiple access. Moreover, the proposed approach demonstrated superior performance compared to the standard GWO and other state-of-the-art algorithms, achieving reduced system complexity under identical constraints.
- Published
- 2024
- Full Text
- View/download PDF
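Of the three add-ons, greedy selection is the simplest to show in code. A toy sketch of a standard GWO loop on a sphere function with the greedy-selection step bolted on (the competitive-learning and Q-learning components are omitted for brevity; all names and numbers are illustrative):

```python
import random

def sphere(x):                  # toy objective
    return sum(v * v for v in x)

random.seed(0)
dim, n, iters = 2, 10, 60
wolves = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
for t in range(iters):
    wolves.sort(key=sphere)
    alpha, beta, delta = wolves[0], wolves[1], wolves[2]
    a = 2 * (1 - t / iters)     # standard GWO control parameter, 2 -> 0
    for i, w in enumerate(wolves):
        cand = []
        for d in range(dim):
            x = 0.0
            for leader in (alpha, beta, delta):
                r1, r2 = random.random(), random.random()
                A, C = a * (2 * r1 - 1), 2 * r2
                x += leader[d] - A * abs(C * leader[d] - w[d])
            cand.append(x / 3)
        if sphere(cand) < sphere(w):   # greedy selection: keep a move
            wolves[i] = cand           # only if it improves the wolf
best = min(wolves, key=sphere)
print(round(sphere(best), 6))
```

Without the greedy-acceptance check, each wolf unconditionally takes its new position; with it, per-wolf fitness is monotonically non-increasing, which is the retention guarantee the abstract describes.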
43. Disassembly line optimization with reinforcement learning.
- Author
-
Kegyes, Tamás, Süle, Zoltán, and Abonyi, János
- Subjects
MACHINE learning ,REINFORCEMENT learning ,QUADRATIC programming ,INVERSE problems ,RESEARCH personnel - Abstract
As environmental aspects become increasingly important, disassembly problems have moved into researchers' focus. The multiplicity of criteria precludes a general optimization method for the topic, but some heuristics and classical formulations provide effective solutions. Because disassembly problems are not the straight inverses of assembly problems and their conditions are not standard, disassembly optimization solutions require human control and supervision. Considering that Reinforcement Learning (RL) methods can successfully solve complex optimization problems, we developed an RL-based solution for a fully formalized disassembly problem. Successful implementations of RL-based optimizers were already known, but we integrated a novel heuristic that targets a dynamically pre-filtered action space for the RL agent (the dlOptRL algorithm) and hence significantly raises the efficiency of the learning path. Our algorithm belongs to the class of Heuristically Accelerated Reinforcement Learning (HARL) methods. We demonstrate its applicability in two use cases, but our approach can also be easily adapted to other problem types. Our article gives a detailed overview of disassembly problems and their formulation, the general RL framework and especially Q-learning techniques, and a worked example of extending RL with a built-in heuristic. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
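The heart of such a heuristic is pre-filtering the action space down to feasible disassembly steps before the agent chooses. A minimal sketch with a hypothetical precedence graph (the part names and filter are illustrative, not the paper's formulation):

```python
def feasible_actions(removed, precedence):
    # Heuristic pre-filter: a part is a legal action only if it is still
    # attached and every predecessor part has already been removed.
    return [p for p in precedence
            if p not in removed and precedence[p] <= removed]

# Hypothetical product: part -> set of parts that must come off first.
precedence = {"cover": set(), "board": {"cover"},
              "chip": {"board"}, "fan": {"cover"}}
print(feasible_actions(set(), precedence))       # ['cover']
print(feasible_actions({"cover"}, precedence))   # ['board', 'fan']
```

The RL agent then applies its epsilon-greedy choice only within this filtered list, which is what shrinks the exploration space and accelerates learning.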
44. Vision-based robotic peg-in-hole research: integrating object recognition, positioning, and reinforcement learning.
- Author
-
Chen, Chengjun, Wang, Hao, Pan, Yong, and Li, Dongnian
- Abstract
The peg-in-hole task is important in robotics. Visual inspection is a crucial method for recognition and positioning during this task. Currently, vision-based peg-in-hole techniques suffer from limited applicability and low positioning accuracy. Therefore, this study introduces a general vision-based approach for robotic peg-in-hole tasks. The approach divides the process into two stages. First, during the object recognition, positioning, and approach phases, a coarse adjustment technique for the assembly pose of the robot's end effector is proposed based on object recognition. This method determines the pose of the hole to be assembled through ellipse fitting, thereby guiding the robot to approach the hole. Second, a Q-learning-based method is introduced to fine-tune the pose and position of the robot's end effector. Q-learning is applied to the small-scale adjustment scenario of the robotic peg-in-hole task, and a reward function based on the pixel area of the gap and the included angle between the central axes of the peg and hole is designed. Finally, the feasibility and efficacy of this method are substantiated through a series of assembly experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
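A reward built from the gap's pixel area and the peg/hole axis angle might look like the following sketch (the normalization constants and weights are assumptions, not values from the paper):

```python
def alignment_reward(gap_area_px, axis_angle_deg,
                     max_area=500.0, max_angle=10.0, w=0.5):
    # Smaller visible gap area and a smaller angle between the peg and
    # hole axes both push the reward toward 1.0; each term saturates at 0.
    area_term = 1.0 - min(gap_area_px / max_area, 1.0)
    angle_term = 1.0 - min(abs(axis_angle_deg) / max_angle, 1.0)
    return w * area_term + (1 - w) * angle_term

print(alignment_reward(0.0, 0.0))     # perfect alignment -> 1.0
print(alignment_reward(250.0, 5.0))   # halfway on both terms -> 0.5
```

A dense, bounded reward of this shape gives the fine-adjustment Q-learning agent a gradient toward insertion rather than a sparse success/failure signal.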
45. Energy and Throughput Management in Wireless Body Area Network with Wireless Information and Energy Transfer using Reinforcement Learning.
- Author
-
Rashidi, Z. and Majidi, M.
- Subjects
BODY area networks ,WIRELESS communications ,ENERGY transfer ,REINFORCEMENT learning ,BLUETOOTH technology - Abstract
In this paper, we address the challenges of energy and throughput management in a Wireless Body Area Network (WBAN) with a focus on a heart rate sensor. Our approach utilizes the sleep and wake-up method to minimize sensor energy consumption while harnessing Radio Frequency (RF) waves and human activities (running, walking, and relaxing) as Energy Harvesting (EH) sources to supplement battery power. Bluetooth Low Energy 5 (BLE5) technology is employed for wireless information and energy transfer. Our goal is to strike a balance between throughput and battery residual energy. The advantages of using Q-learning for action selection over Random Action (RA) selection are demonstrated through simulations. The results reveal that the reward function in Q-learning, incorporating a balancing parameter, effectively achieves a compromise between throughput and battery residual energy. Additionally, our Q-learning method improves system throughput by 43% compared to RA selection. We also compare the performance of the Q-learning and State-Action-Reward-State-Action (SARSA) algorithms using the same reward function to evaluate their respective abilities in managing system throughput and battery residual energy. These findings have significant implications for developing energy-efficient WBANs, enabling prolonged operation and reliable data transmission. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
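A reward of this shape, a balancing parameter trading throughput against residual energy, can be sketched as follows (weights and numbers are illustrative, not the paper's):

```python
def wban_reward(throughput, residual_energy, beta=0.5):
    # beta is the balancing parameter: 1.0 cares only about throughput,
    # 0.0 only about preserving battery residual energy (both in [0, 1]).
    return beta * throughput + (1 - beta) * residual_energy

# Toy numbers: sleeping sends nothing but preserves energy.
sleep = wban_reward(0.0, 0.9, beta=0.3)
wake = wban_reward(0.8, 0.4, beta=0.3)
print(sleep, wake)   # with beta favoring energy, sleeping scores higher
```

Tuning beta is what lets the sleep/wake policy lean toward data delivery or toward battery life as the application requires.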
46. An intelligent decision system for virtual machine migration based on specific Q-learning
- Author
-
Xinying Zhu, Ran Xia, Hang Zhou, Shuo Zhou, and Haoran Liu
- Subjects
Virtual machine migration ,SLAV ,Energy conservation ,Q-learning ,Computer engineering. Computer hardware ,TK7885-7895 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Summary Due to the convenience of virtualization, the live migration of virtual machines is widely used to fulfill optimization objectives in cloud/edge computing. However, live migration may lead to side effects and performance degradation when migration is overused or an unreasonable migration process is carried out. One pressing challenge is how to capture the best opportunity for virtual machine migration. Leveraging rough sets and AI, this paper provides an innovative Q-learning-based strategy designed for migration decisions. The highlight of our strategy is the harmonious mechanism for applying rough sets and Q-learning. For the ABDS (adaptive boundary decision system) strategy in this paper, the exploration space of Q-learning is confined by the boundary region of rough sets, while the thresholds of the boundary region can be dynamically adjusted by the reaction results from the computing cluster. The structure and mechanism of the ABDS strategy are described in this paper. The corresponding experiments show a clear advantage from the cooperation of rough sets and reinforcement learning algorithms. Considering both energy consumption and application performance, the ABDS strategy in this paper outperforms the benchmark strategies in comprehensive performance.
- Published
- 2024
- Full Text
- View/download PDF
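The boundary-region idea can be sketched as a three-way split on host load, with Q-learning consulted only inside the boundary (the thresholds and Q-values below are hypothetical):

```python
def region(load, lower=0.3, upper=0.8):
    # Rough-set style three-way split: certain keep, certain migrate,
    # and a boundary region where the learning agent must decide.
    if load < lower:
        return "negative"      # certainly no migration
    if load > upper:
        return "positive"      # certainly migrate
    return "boundary"

def decide(load, Q, thresholds=(0.3, 0.8)):
    r = region(load, *thresholds)
    if r == "negative":
        return "stay"
    if r == "positive":
        return "migrate"
    # Only boundary cases consult the learned Q-values.
    return max(Q, key=Q.get)

Q = {"stay": 0.2, "migrate": 0.6}    # hypothetical learned values
print(decide(0.1, Q), decide(0.9, Q), decide(0.5, Q))
```

Confining exploration to the boundary region is what keeps the Q-learning agent's search space small: the clear-cut cases never consume learning effort.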
47. A Q-Learning Based Hybrid Meta-Heuristic for Integrated Scheduling of Disassembly and Reprocessing Processes Considering Product Structures and Stochasticity
- Author
-
Fuquan Wang, Yaping Fu, Kaizhou Gao, Yaoxin Wu, and Song Gao
- Subjects
remanufacturing scheduling ,disassembly ,reprocessing ,meta-heuristic ,q-learning ,Electronic computers. Computer science ,QA75.5-76.95 ,Systems engineering ,TA168 - Abstract
Remanufacturing is regarded as a sustainable manufacturing paradigm of energy conservation and environmental protection. To improve the efficiency of the remanufacturing process, this work investigates an integrated scheduling problem for disassembly and reprocessing in a remanufacturing process, where product structures and uncertainty are taken into account. First, a stochastic programming model is developed to minimize the maximum completion time (makespan). Second, a Q-learning based hybrid meta-heuristic (Q-HMH) is specially devised. In each iteration, a Q-learning method is employed to adaptively choose the most promising algorithm from four candidates: genetic algorithm (GA), artificial bee colony (ABC), shuffled frog-leaping algorithm (SFLA), and simulated annealing (SA). At last, simulation experiments are carried out using sixteen instances of different scales, and three state-of-the-art algorithms from the literature and the exact solver CPLEX are chosen for comparison. By analyzing the results with the average relative percentage deviation (RPD) metric, we find that Q-HMH outperforms its rivals by 9.79%−26.76%. The results and comparisons verify the excellent competitiveness of Q-HMH for solving the concerned problems.
- Published
- 2024
- Full Text
- View/download PDF
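The low-level meta-heuristics themselves are too large to sketch, but the Q-learning selection layer that picks among GA, ABC, SFLA, and SA can be outlined with a toy state space (the states, rewards, and feedback below are illustrative assumptions, not the paper's design):

```python
import random

algorithms = ["GA", "ABC", "SFLA", "SA"]    # the four candidate methods
states = ["improved", "stalled"]            # did the last iteration improve?
Q = {s: {a: 0.0 for a in algorithms} for s in states}

def pick(state, eps=0.2):
    # Epsilon-greedy choice of which meta-heuristic runs this iteration.
    if random.random() < eps:
        return random.choice(algorithms)
    return max(Q[state], key=Q[state].get)

def learn(state, algo, reward, next_state, alpha=0.3, gamma=0.9):
    best_next = max(Q[next_state].values())
    Q[state][algo] += alpha * (reward + gamma * best_next - Q[state][algo])

random.seed(42)
# Toy feedback: pretend SA reduced the makespan when the search had stalled.
learn("stalled", "SA", reward=1.0, next_state="improved")
print(pick("stalled", eps=0.0))   # SA is now the greedy choice when stalled
```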
48. An improved fruit fly optimization algorithm with Q-learning for solving distributed permutation flow shop scheduling problems
- Author
-
Cai Zhao, Lianghong Wu, Cili Zuo, and Hongqiang Zhang
- Subjects
Fruit fly optimization algorithm ,Makespan ,Distributed permutation flow-shop scheduling ,Q-learning ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Abstract The distributed permutation flow shop scheduling problem (DPFSP) is one of the hottest issues in the context of economic globalization. In this paper, a Q-learning enhanced fruit fly optimization algorithm (QFOA) is proposed to solve the DPFSP with the goal of minimizing the makespan. First, a hybrid strategy is used to cooperatively initialize the positions of the fruit flies in the solution space, and boundary properties are used to improve the operating efficiency of QFOA. Second, a neighborhood structure based on problem knowledge is designed in the smell stage to generate neighborhood solutions, and the Q-learning method guides the selection of high-quality neighborhood structures. Moreover, a local search algorithm based on key factories is designed to improve solution accuracy by processing sequences of sub-jobs from key factories. Finally, the proposed QFOA is compared with state-of-the-art algorithms on 720 well-known large-scale benchmark instances. The experimental results demonstrate the outstanding performance of QFOA.
- Published
- 2024
- Full Text
- View/download PDF
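As a concrete anchor for the objective being minimized, a sketch of the DPFSP makespan: the standard permutation flow-shop recurrence per factory, with the distributed objective being the slowest factory (toy processing times; the QFOA machinery itself is not shown):

```python
def makespan(seq, proc):
    # Permutation flow-shop makespan; proc[job][machine] processing times.
    m = len(proc[0])
    completion = [0.0] * m
    for job in seq:
        for k in range(m):
            start = max(completion[k], completion[k - 1] if k else 0.0)
            completion[k] = start + proc[job][k]
    return completion[-1]

def dpfsp_makespan(factory_seqs, proc):
    # Distributed variant: the objective is the slowest factory's makespan.
    return max(makespan(seq, proc) for seq in factory_seqs if seq)

proc = [[3, 2], [2, 4], [4, 1], [1, 3]]   # 4 jobs x 2 machines (toy data)
print(dpfsp_makespan([[0, 2], [1, 3]], proc))   # 9
```

It is this max-over-factories objective that makes the "key factory" (the one attaining the maximum) the natural target for the local search the abstract describes.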
49. An intelligent decision system for virtual machine migration based on specific Q-learning.
- Author
-
Zhu, Xinying, Xia, Ran, Zhou, Hang, Zhou, Shuo, and Liu, Haoran
- Subjects
ARTIFICIAL intelligence ,MACHINE learning ,ROUGH sets ,REINFORCEMENT learning ,COMPUTER workstation clusters ,INTELLIGENT tutoring systems ,VIRTUAL machine systems - Abstract
Summary: Due to the convenience of virtualization, the live migration of virtual machines is widely used to fulfill optimization objectives in cloud/edge computing. However, live migration may lead to side effects and performance degradation when migration is overused or an unreasonable migration process is carried out. One pressing challenge is how to capture the best opportunity for virtual machine migration. Leveraging rough sets and AI, this paper provides an innovative Q-learning-based strategy designed for migration decisions. The highlight of our strategy is the harmonious mechanism for applying rough sets and Q-learning. For the ABDS (adaptive boundary decision system) strategy in this paper, the exploration space of Q-learning is confined by the boundary region of rough sets, while the thresholds of the boundary region can be dynamically adjusted by the reaction results from the computing cluster. The structure and mechanism of the ABDS strategy are described in this paper. The corresponding experiments show a clear advantage from the cooperation of rough sets and reinforcement learning algorithms. Considering both energy consumption and application performance, the ABDS strategy in this paper outperforms the benchmark strategies in comprehensive performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Reinforcement imitation learning for reliable and efficient autonomous navigation in complex environments.
- Author
-
Kumar, Dharmendra
- Subjects
REINFORCEMENT learning ,ARTIFICIAL neural networks ,NAVIGATION ,MACHINE learning - Abstract
Reinforcement learning (RL) and imitation learning (IL) are two useful machine learning techniques that have shown potential for enhancing navigation performance. Both methods seek a policy decision function, either through reinforcement or through imitation. In this paper, we propose a novel algorithm named Reinforcement Imitation Learning (RIL) that naturally combines RL and IL to achieve more reliable and efficient navigation in dynamic environments. RIL is a hybrid approach that uses RL for policy optimization and IL as guided learning from expert demonstrations. We compare the convergence of RIL with conventional RL and IL to support our algorithm's performance in a dynamic environment with moving obstacles. The testing results indicate that the RIL algorithm achieves better collision avoidance and navigation efficiency than traditional methods. The proposed RIL algorithm has broad application prospects in areas such as autonomous driving, unmanned aerial vehicles, and robotics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
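One simple way to blend expert demonstrations into an RL action choice, a sketch only, since the abstract does not specify the exact mechanism, is to follow the expert with a decaying probability and fall back to epsilon-greedy Q-values otherwise:

```python
import random

def ril_action(Q, state, expert_action, imitation_prob, eps=0.1):
    # Hybrid pick: follow the expert demonstration with probability
    # imitation_prob (decayed over training), otherwise act
    # epsilon-greedily on the learned Q-values.
    if random.random() < imitation_prob:
        return expert_action
    if random.random() < eps:
        return random.choice(list(Q[state]))
    return max(Q[state], key=Q[state].get)

Q = {"s0": {"left": 0.1, "right": 0.7}}
random.seed(0)
print(ril_action(Q, "s0", "left", imitation_prob=1.0))             # left
print(ril_action(Q, "s0", "right", imitation_prob=0.0, eps=0.0))   # right
```

Decaying `imitation_prob` from 1 toward 0 moves the agent smoothly from pure imitation to pure reinforcement as its own Q-values become trustworthy.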