1. Output feedback Q-learning for discrete-time finite-horizon zero-sum games with application to the H∞ control.
- Author
- Liu, Mingxiang, Cai, Qianqian, Li, Dandan, Meng, Wei, and Fu, Minyue
- Subjects
- *ZERO sum games, *REINFORCEMENT learning, *LINEAR control systems, *STATE feedback (Feedback control systems), *RICCATI equation, *VECTOR valued functions, *HORIZON
- Abstract
In this paper, we present a Q-learning framework for solving finite-horizon zero-sum game problems arising in the H∞ control of linear systems with unknown dynamics. Past research has mainly focused on infinite-horizon problems with a completely measurable state. In practical engineering, however, the system state is not always directly accessible, and the time-varying Riccati equation associated with the finite-horizon setting is also difficult to solve directly. The main contribution of the proposed model-free algorithm is to determine the optimal output feedback policies without state measurements in the finite-horizon setting. To achieve this, we first describe the Q-function arising in finite-horizon problems in the context of state feedback, and then parameterize the Q-functions as functions of input–output vectors. Finally, numerical examples on aircraft dynamics demonstrate the algorithm's efficiency. [ABSTRACT FROM AUTHOR]
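For context, the model-based baseline that the abstract's model-free algorithm avoids is the backward time-varying game Riccati recursion for the finite-horizon zero-sum LQ problem. The sketch below is illustrative only and is not the paper's algorithm: it assumes known dynamics `x_{k+1} = A x_k + B u_k + E w_k` with stage cost `x'Qx + u'Ru - γ² w'w`; all matrix names and the function `game_riccati` are assumptions made for this example.

```python
import numpy as np

def game_riccati(A, B, E, Q, R, gamma, N, Qf):
    """Backward recursion for the finite-horizon zero-sum LQ game
    (a known-model baseline; the paper solves this setting model-free)."""
    P = Qf.copy()
    Ps = [P]
    G = np.hstack([B, E])                  # stacked control/disturbance input matrix
    m, d = B.shape[1], E.shape[1]
    # Block weight: R for the controller, -gamma^2 I for the disturbance.
    Rbar = np.block([[R, np.zeros((m, d))],
                     [np.zeros((d, m)), -gamma**2 * np.eye(d)]])
    for _ in range(N):
        S = Rbar + G.T @ P @ G             # must be invertible (gamma large enough)
        P = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(S, G.T @ P @ A)
        P = 0.5 * (P + P.T)                # enforce symmetry numerically
        Ps.append(P)
    return Ps[::-1]                        # value matrices P_0, ..., P_N

# Scalar example: stable plant, disturbance attenuation level gamma = 5.
Ps = game_riccati(np.array([[0.9]]), np.array([[1.0]]), np.array([[0.5]]),
                  np.array([[1.0]]), np.array([[1.0]]), 5.0, 10, np.array([[1.0]]))
```

Each `P_k` yields the time-varying state-feedback saddle-point policies; the paper's contribution is recovering equivalent output feedback policies from input–output data alone.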
- Published
- 2023