20 results for "Reinforced learning"
Search Results
2. Systematic analysis of artificial intelligence in the era of industry 4.0.
- Author
-
Chen, Weiru, He, Wu, Shen, Jiayue, Tian, Xin, and Wang, Xianping
- Subjects
INDUSTRY 4.0 ,ARTIFICIAL intelligence - Abstract
Artificial intelligence plays a profound role in the global economy, social progress, and people's daily lives. As AI's capabilities and accuracy increase, its application will have a growing impact on manufacturing and service areas in the era of Industry 4.0. This study conducts a systematic literature review of the state of the art of AI in Industry 4.0. It describes the development of industries and the evolution of AI, and identifies that the development and application of AI will bring not only opportunities but also challenges to Industry 4.0. The findings provide a valuable reference for researchers and practitioners through a multi-angle systematic analysis of AI. In the era of Industry 4.0, AI systems will become an innovative and revolutionary form of assistance to the whole industry. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
3. Machine Learning: Towards an Unified Classification Criteria
- Author
-
Burbano, Clara, Reveló, David, Mejía, Julio, Soto, Daniel, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, and Latifi, Shahram, editor
- Published
- 2021
- Full Text
- View/download PDF
4. A Study on Behavioural Agents for StarCraft 2
- Author
-
Williams, Ivan, van der Haar, Dustin, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Yang, Xin-She, editor, Sherratt, R Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2021
- Full Text
- View/download PDF
5. A Hypothesis on Ideal Artificial Intelligence and Associated Wrong Implications
- Author
-
Shah, Kunal, Laxkar, Pradeep, Chakrabarti, Prasun, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Choudhury, Sushabhan, editor, Mishra, Ranjan, editor, Mishra, Raj Gaurav, editor, and Kumar, Adesh, editor
- Published
- 2020
- Full Text
- View/download PDF
6. The use of reinforced learning to support multidisciplinary design in the AEC industry: Assessing the utilization of Markov Decision Process.
- Author
-
BuHamdan, Samer, Alwisy, Aladdin, Danel, Thomas, Bouferguene, Ahmed, and Lafhaj, Zoubeir
- Subjects
MARKOV processes ,DECISION making ,ARCHITECTURAL practice ,ARTIFICIAL intelligence ,ARCHITECTURAL design - Abstract
While design practice in the architecture, engineering, and construction (AEC) industry remains a creative activity, approaching the design problem from the perspective of decision-making science has remarkable potential, manifest in the delivery of high-performing sustainable structures. These possible gains can be attributed to the myriad decision-making tools and technologies that can assist design efforts, such as artificial intelligence (AI), which combines computational power and data wisdom. Such a combination is of particular importance amid the mounting pressure on AEC industry players to deliver economical, environmentally friendly, and socially considerate structures. Despite this promise, the use of AI, particularly reinforced learning (RL), to support multidisciplinary design endeavours in the AEC industry is still in its infancy. The present research therefore discusses developing and applying a Markov Decision Process (MDP) model, an RL application, to assist preliminary multidisciplinary design efforts in the AEC industry. The experimental work shows that MDP models can expedite the identification of viable design alternatives within the solution space of a multidisciplinary design while maximizing the likelihood of finding the optimal design. [ABSTRACT FROM AUTHOR] (A minimal value-iteration sketch follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
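The record above applies a Markov Decision Process to design-space search. As a rough illustration of how such an MDP is solved, here is a minimal value-iteration sketch in Python; the staged design states, actions, rewards, and discount are invented stand-ins, not the paper's model:

```python
# Minimal value iteration on a toy MDP; states, actions, rewards, and
# the discount factor are hypothetical, not taken from the paper.
states = ["concept", "structure", "envelope", "done"]
actions = ["refine", "advance"]

# transition[s][a] = (next_state, reward); deterministic for simplicity.
transition = {
    "concept":   {"refine": ("concept", 0.5),   "advance": ("structure", 0.0)},
    "structure": {"refine": ("structure", 0.5), "advance": ("envelope", 0.0)},
    "envelope":  {"refine": ("envelope", 0.5),  "advance": ("done", 10.0)},
}
gamma = 0.9
V = {s: 0.0 for s in states}

for _ in range(100):  # value iteration to (near) convergence
    for s in transition:
        V[s] = max(r + gamma * V[s2]
                   for a in actions
                   for s2, r in [transition[s][a]])

# Greedy policy with respect to the converged values.
policy = {s: max(actions, key=lambda a: transition[s][a][1]
                 + gamma * V[transition[s][a][0]])
          for s in transition}
print(policy)  # 'advance' at every stage: push the design toward completion
```

The toy solver converges to "advance" everywhere; the paper's contribution lies in formulating real multidisciplinary design choices in this shape, not in the solver itself.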
7. Hierarchical Associative Memory Model for Artificial General-Purpose Cognitive Agents.
- Author
-
Stepankov, Vladimir Y. and Dushkin, Roman V.
- Subjects
ARTIFICIAL intelligence ,LONG-term memory ,INTELLIGENT agents ,MACHINE learning ,COGNITIVE ability ,COACHING psychology - Abstract
This paper presents a model of hierarchical associative memory that can be used as a basis for building general-purpose artificial cognitive agents. With the help of this model, one of the most important problems of modern machine learning and artificial intelligence can be addressed: giving a cognitive agent the ability to use "life experience" to process the context of the situation in which it was, is, and possibly will be. The model is applicable to artificial cognitive agents functioning both in specially designed virtual worlds and in objective reality. Using hierarchical associative memory as the long-term memory of artificial cognitive agents will allow them to navigate effectively both the general knowledge accumulated by mankind and their own life experience. The novelty of the work lies in the author's approach to constructing context-dependent artificial cognitive agents using an interdisciplinary approach, drawing in particular on artificial intelligence, cognitology, neurophysiology, psychology, and sociology. Its relevance rests on the keen interest of the scientific community and the high social demand for general-level artificial intelligence systems. Associative hierarchical memory, based on an approach similar to the hypercolumns of the human cerebral cortex, is becoming one of the important components of a general-level artificial intelligent agent. The article will be of interest to all researchers working on building artificial cognitive agents and in related fields. [ABSTRACT FROM AUTHOR] (A toy associative-store sketch follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
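The paper above describes, but does not publish, a hypercolumn-inspired hierarchical associative memory. Purely as a loose illustration of context-dependent recall, here is a toy two-level associative store; the class, its methods, and the "general" fallback context are all hypothetical and far simpler than the paper's model:

```python
from collections import defaultdict

class HierarchicalAssociativeMemory:
    """Toy two-level associative store: context -> cue -> set of associates.
    Illustrative only; the paper's hypercolumn-inspired model is far richer."""
    def __init__(self):
        self.store = defaultdict(lambda: defaultdict(set))

    def associate(self, context, cue, item):
        self.store[context][cue].add(item)

    def recall(self, context, cue):
        # Fall back to a generic context when the specific one has no entry,
        # mimicking retrieval from general knowledge vs. personal experience.
        hits = self.store[context].get(cue)
        return hits or self.store["general"].get(cue, set())

mem = HierarchicalAssociativeMemory()
mem.associate("general", "red light", "stop")
mem.associate("kitchen", "red light", "oven is on")
print(mem.recall("kitchen", "red light"))  # {'oven is on'}
print(mem.recall("street", "red light"))   # {'stop'}
```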
8. The European vision and research directions in the Cloud-Edge-IoT domain for 2025-2027 (Distillation of the Concertation Meeting in Brussels on May 11, 2023)
- Author
-
Majid, Amjad Yousef and Rimassa, Giovanni
- Subjects
Cybersecurity ,Virtual Worlds ,Industrial Metaverse ,Open Continuum ,Energy-Aware ,Metaverse standard form ,Artificial Intelligence ,SecDevOps ,Optimisation ,Federation ,European Commission ,Autonomy ,Metaweb ,Business model ,5G/6G ,Hyperdistribution ,Open Source ,DevOps ,Cognitive Cloud-Edge Continuum ,Distributed AI ,Software Engineering ,RISC-V ,Traceability ,Any-to-Any Infrastructure ,Interoperability ,Advanced Mechanisms ,Reinforcement Learning ,Reinforced learning ,Digital Twins ,MetaOs for IoT-Edge ,Privacy ,Computing Continuum Infrastructure ,Quantum Computing ,Distributed Intelligence ,End-to-End AI - Abstract
Extended version of the executive summary that incorporates presentations from the Concertation Meeting held in Brussels on May 11, 2023.
- Published
- 2023
- Full Text
- View/download PDF
9. Artificial Intelligence, New Technologies, and Academic Integrity Concerns: A Positive Fightback Through Teaching Effectiveness and Reinforced Learning
- Author
-
Yoosefdoost, Arash, Jariwala, Hiral, and Santos, Rafael M.
- Subjects
Reinforced Learning ,ChatGPT ,Artificial Intelligence ,Academic Integrity ,Active Teaching ,Teaching Effectiveness ,Active Learning - Abstract
The advancement of computers and information technologies introduces modern threats to the traditional education system and has raised new concerns, even about academic integrity. This work explores a positive fightback approach to combating academic dishonesty by improving students' experience and learning rate through active learning, interactive learning, and leadership techniques that enhance teaching effectiveness. Evaluation of the method through an anonymized survey suggests promising effectiveness in achieving the learning objectives and mitigating the challenges posed by modern education technology, offering competitive advantages in improving the student experience.
- Published
- 2023
- Full Text
- View/download PDF
10. Continuous learning of emergent behavior in robotic matter
- Author
-
Clara Miette, Giorgio Oliveri, Cesare Carissimo, Johannes T. B. Overvelde, and Lucas C. van Laake
- Subjects
dynamic environment ,Scheme (programming language) ,0209 industrial biotechnology ,Computer science ,Distributed computing ,02 engineering and technology ,Space exploration ,Engineering ,reinforced learning ,020901 industrial engineering & automation ,Control theory ,Reinforcement learning ,computer.programming_language ,Multidisciplinary ,business.industry ,modular robot ,Robotics ,emergent behavior ,Modular design ,021001 nanoscience & nanotechnology ,Physical Sciences ,Scalability ,Robot ,Artificial intelligence ,0210 nano-technology ,business ,computer - Abstract
Significance: In the last century, robots have been revolutionizing our lives, augmenting human actions with greater precision and repeatability. Unfortunately, most robotic systems can only operate in controlled environments. While increasing the complexity of the centralized controller is an instinctive way to enable robots that can autonomously adapt to their environment, there are ample examples in nature where adaptivity emerges from simpler, decentralized processes. Here we perform experiments and simulations on a modular and scalable robotic platform in which each unit stochastically updates its own behavior, to explore the requirements for a decentralized learning strategy capable of achieving locomotion in a continuously changing environment or when undergoing damage.

One of the main challenges in robotics is the development of systems that can adapt to their environment and achieve autonomous behavior. Current approaches typically aim to achieve this by increasing the complexity of the centralized controller, e.g., by directly modeling the behavior or implementing machine learning. In contrast, we simplify the controller using a decentralized and modular approach, with the aim of finding the specific requirements for a robust and scalable learning strategy in robots. To achieve this, we conducted experiments and simulations on a robotic platform assembled from identical autonomous units that continuously sense their environment and react to it. By letting each unit adapt its behavior independently using a basic Monte Carlo scheme, the assembled system is able to learn and maintain optimal behavior in a dynamic environment, as long as its memory is representative of the current environment, even when incurring damage. We show that the physical connection between the units is enough to achieve learning, and that no additional communication or centralized information is required. As a result, such a distributed learning approach can easily be scaled to larger assemblies, blurring the boundaries between materials and robots and paving the way for a new class of modular "robotic matter" that can autonomously learn to thrive in dynamic or unfamiliar situations, for example those encountered by soft robots or self-assembled (micro)robots in environments spanning from the medical realm to space exploration. (A toy decentralized learning sketch follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
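The article above reports that each robotic unit stochastically updates its own behavior using a basic Monte Carlo scheme, with only the shared physical motion as feedback. A heavily simplified sketch of that idea (propose a local random change, keep it only if the commonly sensed performance improves) follows; the `shared_speed` function and target phases are invented, and the paper's actual scheme also maintains a memory of past attempts:

```python
import random

def shared_speed(phases):
    # Stand-in for the physically sensed locomotion speed that every
    # unit experiences through its mechanical coupling to the assembly.
    target = [0.0, 0.5, 1.0, 1.5]  # hypothetical optimal phase pattern
    return -sum((p - t) ** 2 for p, t in zip(phases, target))

phases = [random.uniform(0.0, 2.0) for _ in range(4)]  # one phase per unit

for _ in range(2000):
    i = random.randrange(len(phases))    # one unit updates independently
    old, before = phases[i], shared_speed(phases)
    phases[i] += random.gauss(0.0, 0.1)  # local stochastic proposal
    if shared_speed(phases) < before:    # keep only improvements...
        phases[i] = old                  # ...otherwise revert

print([round(p, 2) for p in phases])  # approaches the target pattern
```

No unit communicates with another; the shared performance signal alone coordinates them, which is the core of the paper's decentralized claim.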
11. Framtidssäkring av datorspelsagenter med förstärkningsinlärning och Unity ML-Agents
- Author
-
Andersson, Pontus
- Subjects
Reinforced Learning ,Behavior ,Learning Environment ,artificiell intelligens ,Unity ,Computer Agents ,Computer Sciences ,RL ,agent ,Agents ,inlärningsmiljö ,ML ,beteende ,förstärkningsinlärning ,Machine Learning ,Algorithm ,Datavetenskap (datalogi) ,ML-Agents ,maskininlärning ,Artificial Intelligence ,ai ,algoritm ,Machine Learning Toolkit ,datoragenter - Abstract
In recent years, a number of simulation platforms have utilized video games as training grounds for designing and experimenting with different machine learning algorithms. One issue is that video games usually do not provide any source code. The Unity ML-Agents toolkit attempts to solve this by providing both example environments and state-of-the-art machine learning algorithms. This sparked curiosity in a local game company, which wished to investigate incorporating machine-learned agents into its game using the toolkit. The goal was thus to produce high-performing, integrable agents capable of completing locomotive tasks. A pilot study contributed insight into training functionality and aspects important to producing a robust behavior model. Using Proximal Policy Optimization and different training configurations, several neural network models were produced and evaluated on existing and new data. Several of the models showed promising results but did not achieve the defined success rate of 80%. With additional testing, it is believed that the desired result could be reached. Alternatively, other aspects of the toolkit, such as Soft Actor-Critic and Curriculum Learning, could be investigated. (A sketch of PPO's clipped objective follows this record.)
- Published
- 2021
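The thesis above trains its agents with Proximal Policy Optimization from the Unity ML-Agents toolkit. As a hedged sketch of what that trainer maximizes, here is PPO's clipped surrogate objective for a single sample in plain Python; the numbers are illustrative and this is not ML-Agents code:

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate for a single (state, action) sample.
    ratio = pi_new(a|s) / pi_old(a|s); advantage estimated e.g. via GAE."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return min(unclipped, clipped)  # pessimistic bound the trainer maximizes

# A ratio far above 1+eps gains nothing extra when the advantage is positive:
print(ppo_clipped_objective(1.5, advantage=2.0))   # 2.4, capped at (1+eps)*A
print(ppo_clipped_objective(1.05, advantage=2.0))  # 2.1, within the clip range
```

The clipping is what keeps each policy update close to the data-collecting policy, which is why PPO tends to train stably in game environments like these.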
12. Learning a dynamic policy by using policy gradient: application to biped walking.
- Author
-
Matsubara, Takamitsu, Morimoto, Jun, Nakanishi, Jun, Sato, Masa-Aki, and Doya, Kenji
- Subjects
REINFORCEMENT learning ,BIPEDALISM ,STOCHASTIC learning models ,ROBOTICS ,COMPUTATIONAL learning theory ,ARTIFICIAL intelligence - Abstract
This paper discusses a method in which periodic motion, such as biped walking, is acquired within the framework of reinforced learning. For an object with multiple degrees of freedom, it is in general difficult to acquire the policy by reinforced learning; there must be a learning framework matched to the motion task under consideration. From this viewpoint, this paper proposes a "dynamic policy," that is, a policy with internal state and dynamics, and considers its learning procedure. When periodic motion such as biped walking is to be acquired by learning, dynamical entrainment to the controlled object can be utilized if the policy has internal dynamics. Furthermore, by providing the policy with internal state, the controller is made robust to time delay and noise in the sensor inputs. The use of the policy gradient is discussed as a method of learning the policy without explicitly considering its internal state. As an example, the dynamic policy is applied to a three-link biped walking robot model, realized by a feedback controller based on a neural oscillator and sensor inputs. By using the policy gradient, appropriate rules for sensor feedback to the neural oscillator are learned by trial and error. Simulation shows that a dynamic policy realizing biped walking is acquired. The method is further applied to a five-link biped walking robot in order to investigate the possibility of extension to a system with multiple degrees of freedom. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(4): 25–38, 2007; published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20441 [ABSTRACT FROM AUTHOR] (A minimal policy-gradient sketch follows this record.)
- Published
- 2007
- Full Text
- View/download PDF
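The paper above learns sensor-feedback rules by policy gradient. A minimal REINFORCE-style sketch on a one-parameter Gaussian policy shows the shape of the update; the reward function and learning rate are invented and nothing here models the biped:

```python
import random

# REINFORCE on a toy problem: a Gaussian policy over one scalar action,
# with reward peaking at action = 3. A stand-in for learning the paper's
# sensor-feedback gains by trial and error.
mu, sigma, alpha = 0.0, 1.0, 0.01

for _ in range(5000):
    a = random.gauss(mu, sigma)          # sample an action from the policy
    r = -(a - 3.0) ** 2                  # hypothetical reward signal
    grad_log_pi = (a - mu) / sigma ** 2  # d/d_mu of log N(a; mu, sigma)
    mu += alpha * r * grad_log_pi        # stochastic policy-gradient ascent

print(round(mu, 2))  # settles near 3.0, the reward-maximizing action
```

The same ascent rule applies when the policy has internal dynamics, as in the paper: the gradient is taken only through the sampled actions, so the internal state never needs to be differentiated explicitly.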
13. Analysis and Adaptation of Q-Learning Algorithm to Expert Controls of a Solar Domestic Hot Water System
- Author
-
Roberto Fedrizzi, Raul Mario del Toro Matamoros, Davide Bettoni, and Anton Soppelsa
- Subjects
Computer science ,020209 energy ,Circulator ,fuzzy control ,02 engineering and technology ,Fuzzy logic ,lcsh:Technology ,Industrial and Manufacturing Engineering ,domestic hot water systems ,reinforced learning ,Artificial Intelligence ,Control theory ,0202 electrical engineering, electronic engineering, information engineering ,Reinforcement learning ,lcsh:T ,Applied Mathematics ,lcsh:T57-57.97 ,Process (computing) ,Function (mathematics) ,Fuzzy control system ,solar ,Human-Computer Interaction ,Control and Systems Engineering ,lcsh:Applied mathematics. Quantitative methods ,020201 artificial intelligence & image processing ,simulations ,Membership function ,Information Systems - Abstract
This paper discusses the development of a coupled Q-learning/fuzzy control algorithm to be applied to the control of solar domestic hot water systems. The controller brings the benefit of performance in line with the best reference controllers, without the need to devote time to modelling and simulations to tune its parameters before deployment. The performance of the proposed control algorithm was analysed in detail with respect to the input membership function defining the fuzzy controller. The algorithm was compared to four standard reference control cases using three performance figures: the seasonal performance factor of the solar collectors, the seasonal performance factor of the system, and the number of on/off cycles of the primary circulator. The work shows that the reinforced learning controller can find the best-performing fuzzy controller within a family of controllers. It also shows how to increase the speed of the learning process by loading the controller with partial pre-existing information. The new controller performed significantly better than the best reference case with regard to the collectors' performance factor (between 15% and 115%) and, at the same time, the number of on/off cycles of the primary circulator (1.2 per day, down from 30 per day). Regarding the domestic hot water performance factor, the new controller performed about 11% worse than the best reference controller but greatly improved its on/off cycle figure (425, down from 11,046). The decrease in performance was due to the choice of reward function, which was not selected for that purpose and was blind to some of the factors influencing the system performance factor. (A minimal tabular Q-learning sketch follows this record.)
- Published
- 2019
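The paper above couples Q-learning with a fuzzy controller. The Q-learning half is standard; a minimal tabular sketch of its update rule follows, with hypothetical state/action bins and a made-up plant in place of the solar system:

```python
import random

# Tabular Q-learning; the paper couples this update with a fuzzy
# controller, while states and actions here are hypothetical bins.
states = range(4)   # e.g. binned tank-temperature error
actions = range(3)  # e.g. candidate pump-speed rules
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    # Invented plant: action 1 is best everywhere.
    return random.choice(list(states)), (1.0 if a == 1 else 0.0)

s = 0
for _ in range(5000):
    a = (random.choice(list(actions)) if random.random() < eps
         else max(actions, key=lambda x: Q[(s, x)]))  # epsilon-greedy
    s2, r = step(s, a)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions)
                          - Q[(s, a)])                # TD update
    s = s2

print(max(actions, key=lambda x: Q[(0, x)]))  # learns action 1
```

Pre-loading Q with partial prior knowledge, as the paper does, amounts to initializing this table with nonzero values instead of zeros.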
14. Pekiştirmeli öğrenme tabanlı robotlar ile yeni bir robocode savaş stratejisi
- Author
-
Kayakökü, Hakan, Güzel, Mehmet Serdar, and Bilgisayar Mühendisliği Anabilim Dalı
- Subjects
Artificial intelligence ,Artificial neural networks ,Machine learning ,Deep learning ,Computer Engineering and Computer Science and Control ,Reinforced learning ,Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol - Abstract
In the last decade, research on artificial intelligence, machine learning, and robotics has shown remarkable improvement over previous years, driven by the development of technology and the demands of the sector. The literature shows that one of the most important application areas of artificial intelligence is automatic control, meaning that artificial intelligence units can act without any external intervention. Reinforcement learning, developed as a solution to this problem, is a highly successful machine learning model, inspired by behavioral science, which provides an agent with information on which actions to take to achieve the highest reward in a defined environment. This study examines the development of reinforcement-learning-based algorithms within RoboCode, a simulation designed in 2001 and based on the Java programming language, which provides an environment in which virtual robots can be programmed and controlled. In the simulation, robots with different characters fight in an arena. With developments in the field of artificial intelligence, RoboCode has become an environment where artificial intelligence algorithms are applied, self-learning robots are fought, and the power of artificial intelligence algorithms is measured. The simulation is of great importance for seeing the results of artificial intelligence algorithms developed in the literature and for analysing algorithms by fighting robots against others. The goal of this thesis is to train the target robot in the arena with a convolutional-neural-network-supported reinforcement learning algorithm developed in the RoboCode simulation, so that the trained robot defeats the other robots in the arena. In this context, images obtained from the arena were generated dynamically, and the robot was trained on these images. The trained model battled one-on-one against pre-defined rival robots, and the scores obtained against them show that the model was significantly successful. (A minimal experience-replay sketch follows this record.)
- Published
- 2019
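The thesis above trains an image-based agent with CNN-supported reinforcement learning; its training loop is not published. One core ingredient of such image-based deep RL pipelines, an experience-replay buffer, can be sketched as follows (class name and usage are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay, a standard ingredient of image-based
    deep RL; the thesis's exact training code is not published."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.push(f"frame{t}", t % 4, 0.0, f"frame{t+1}", False)
print(len(buf.sample(8)))  # 8 decorrelated transitions for one update
```

Sampling past transitions at random breaks the strong temporal correlation between consecutive arena frames, which is what makes training a CNN value network from game images stable.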
15. Image-based visual servoing controller for multirotor aerial robots using deep reinforcement learning
- Author
-
Alejandro Rodriguez-Ramos, Carlos Sampedro, Ignacio Gil, Luis Mejias, and Pascual Campoy
- Subjects
Reinforced Learning ,0209 industrial biotechnology ,Computer science ,business.industry ,Deep learning ,080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING ,090100 AEROSPACE ENGINEERING ,02 engineering and technology ,UAVs ,Visual servoing ,Object (computer science) ,Visual Servoing ,090602 Control Systems Robotics and Automation ,020901 industrial engineering & automation ,Deep Learning ,Control theory ,0202 electrical engineering, electronic engineering, information engineering ,Robot ,Reinforcement learning ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Multirotor ,business - Abstract
In this paper, we propose a novel Image-Based Visual Servoing (IBVS) controller for multirotor aerial robots based on the recent deep reinforcement learning algorithm Deep Deterministic Policy Gradients (DDPG). The proposed RL-IBVS controller is successfully trained in a Gazebo-based simulated environment to learn an IBVS policy that directly maps a state, based on errors in the image, to the linear velocity commands of the aerial robot. A thorough validation of the proposed controller has been conducted in simulated and real flight scenarios, demonstrating outstanding capabilities in object-following applications. Moreover, we present a detailed comparison of the RL-IBVS controller with classic and partitioned IBVS approaches. (A toy actor sketch follows this record.)
- Published
- 2018
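The paper above uses DDPG's deterministic actor to map image errors directly to linear velocity commands. A toy stand-in for that mapping, with hand-picked rather than learned gains and simple Gaussian exploration noise, might look like this:

```python
import math
import random

def actor(image_error, w=(0.8, 0.8, 0.5)):
    """Stand-in for DDPG's learned deterministic actor: maps the error
    between observed and desired image features (here a 3-vector) to
    bounded linear velocity commands. Gains w are invented, not learned."""
    return tuple(math.tanh(wi * ei) for wi, ei in zip(w, image_error))

# During training, DDPG adds exploration noise to the actor's output:
err = (0.4, -0.2, 0.1)  # target is right of / below / near image center
cmd = tuple(v + random.gauss(0.0, 0.05) for v in actor(err))
print(cmd)  # velocity command sent to the multirotor's flight controller
```

In the actual method, the gains are replaced by a neural network whose weights are trained by the DDPG critic; the tanh bounding mirrors the bounded velocity commands a real flight controller expects.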
16. An Approach for the Binding Problem Based on Brain-oriented Autonomous Adaptation System with Object Handling Functions
- Author
-
Yasuo Kinouchi and Kenneth J. Mackin
- Subjects
autonomous adaptation ,Artificial neural network ,object file ,Computer science ,business.industry ,Object (computer science) ,Method ,reinforced learning ,Adaptive system ,binding problem ,nonlinear programming ,pyramidal neuron ,General Earth and Planetary Sciences ,Object model ,Reinforcement learning ,Binding problem ,Artificial intelligence ,business ,General Environmental Science - Abstract
An approach to the binding problem is proposed, based on an autonomous adaptive system designed using artificial neural networks with object-handling functions. Object-handling functionality, such as object files, has been reported to be related to perception and working memory. However, for a brain-oriented system to decide actions based on object handling, it must address the "binding problem": processing different attributes such as shape, color, and location in parallel, then binding these multiple attributes into a single object. The proposed system decides semi-optimal actions by combining nonlinear programming and reinforced learning. By introducing artificial neural networks based on the dendritic structures of pyramidal neurons in the cerebral cortex, together with a mechanism for dynamically linking nodes to objects, it is shown that deciding actions and learning as a whole system, based on binding object attributes and location, is possible. The proposed features are verified through computer simulation results. (A toy object-file sketch follows this record.)
- Published
- 2015
- Full Text
- View/download PDF
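The paper above hinges on "object files" that bind attributes computed in parallel into one object. Purely as a data-structure illustration of that binding (the paper binds via dynamically linked neural nodes, not Python objects), a toy sketch with invented fields:

```python
from dataclasses import dataclass

@dataclass
class ObjectFile:
    """Toy 'object file': binds attributes that separate pathways compute
    in parallel into one addressable object. Illustrative only."""
    object_id: int
    shape: str = "unknown"
    color: str = "unknown"
    location: tuple = (0.0, 0.0)

# Parallel attribute streams report with an object id; binding is simply
# routing each report to the file carrying that id.
files = {1: ObjectFile(1), 2: ObjectFile(2)}
for oid, attr, value in [(1, "shape", "cup"), (2, "color", "red"),
                         (1, "color", "blue"), (1, "location", (3.0, 1.5))]:
    setattr(files[oid], attr, value)

print(files[1])  # one bound object: a blue cup at (3.0, 1.5)
```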
17. Reinforcement learning in non-stationary environments using spatiotemporal analysis
- Author
-
Göncü, Burak Muhammed, Tümer, Mustafa Borahan, Bilgisayar Mühendisliği Anabilim Dalı, Tümer, M Borahan, and Bilgisayar Mühendisliği Anabilim Dalı Bilgisayar Mühendisliği Programı
- Subjects
Bilgisayar, Mühendislik ,Artificial intelligence ,Artificial neural networks ,Computer Engineering and Computer Science and Control ,Reinforced learning ,Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol - Abstract
Traditional reinforcement learning (RL) approaches fail to learn a policy to attain a dynamic or non-stationary goal. The reason is that the RL agent cannot start learning the changed environment from scratch once it has converged to a policy before the environment changed. While heuristic solutions in which the RL agent is encouraged to use least recently attempted actions succeed for slowly changing environments [Sutton and Barto (1998), Chapter 9, Example 9.3, pp. 236-238], they are not a sufficiently fast solution for following a non-stationary goal state that moves with the same velocity as the RL agent. In this paper, we discuss a new approach for the case where an adversarial relation exists between the dynamic goal and the RL agent. To tackle this, the spatiotemporal information of the dynamic goal state is incorporated, in terms of stochastic processes, into the rewards of the RL agent's environment model, enabling a modular solution to the problem. In addition, we present the method's robustness using different mazes, where we assess its performance, and also test our algorithm on the Atari Ms. Pacman game for more complex problem solving. Finally, the results of the experiments show that our method successfully predicts the rival agent's behavior and the points of interest through which the rival agent will pass, and ambushes it at key positions. (A reward-shaping sketch follows this record.)
- Published
- 2017
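The thesis above folds a spatiotemporal prediction of the rival's movement, as a stochastic process, into the RL agent's rewards. A heavily simplified sketch of that reward shaping, using empirical visit frequencies as the predictor (grid cells, trajectory, and weights all invented), follows:

```python
from collections import Counter

# Estimate where the rival tends to pass from its observed trajectory,
# then reward the RL agent for reaching those cells (ambush points).
observed_path = [(1, 1), (1, 2), (2, 2), (1, 2), (1, 1), (1, 2)]
visit_counts = Counter(observed_path)
total = sum(visit_counts.values())

def shaped_reward(cell, base_reward=0.0, weight=10.0):
    # Base environment reward plus a bonus proportional to the empirical
    # probability that the rival passes through `cell`.
    return base_reward + weight * visit_counts[cell] / total

print(shaped_reward((1, 2)))  # high: a likely interception point
print(shaped_reward((5, 5)))  # zero bonus: rival never observed there
```

Because the predictor lives entirely in the reward signal, any standard RL algorithm can be used unchanged underneath, which is the modularity the thesis emphasizes.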
18. Adaptive Semi-structured Information Extraction
- Author
-
Arpteg, Anders
- Subjects
Artificial intelligence ,Datavetenskap (datalogi) ,Information extraction ,Computer Sciences ,Knowledge management ,Semi-structured data ,Reinforced learning - Abstract
The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems that novice users can adapt to new domains and tasks. To accomplish this, the systems need to become more intelligent and able to learn to extract information without requiring expert skills or time-consuming work from the user. The type of information extraction in focus for this thesis is semi-structured information extraction. The term semi-structured refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of not only the link structure but also the structural information within each document, user-driven extraction systems with high performance can be built. The extraction process contains several steps in which different types of techniques are used, for example techniques that take advantage of structural, purely syntactic, linguistic, and semantic information. The first step in focus for this thesis is the navigation step, which takes advantage of the structural information; it is only one part of a complete extraction system, but an important one. Using reinforcement learning algorithms for the navigation step can make adapting the system to new tasks and domains more user-driven. The advantage of reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without intensive user interaction. An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the system's approach were promising. However, additional components need to be included before it becomes a fully fledged user-driven system. Report code: LiU-Tek-Lic-2002:73. (A minimal navigation Q-learning sketch follows this record.)
- Published
- 2003
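The thesis above trains the navigation step with reinforcement learning over document structure. As a rough illustration, here is Q-learning on a tiny invented link graph where the goal page holds the target record; pages, rewards, and hyperparameters are all hypothetical:

```python
import random

# Hypothetical site graph: the agent must learn to navigate from the
# start page to the page that holds the target record ('detail').
links = {"start": ["products", "about"],
         "about": ["start"],
         "products": ["start", "detail"],
         "detail": []}  # goal page
Q = {(p, n): 0.0 for p, outs in links.items() for n in outs}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(300):     # training episodes
    page = "start"
    for _ in range(20):  # step cap per episode
        outs = links[page]
        nxt = (random.choice(outs) if random.random() < eps
               else max(outs, key=lambda n: Q[(page, n)]))  # epsilon-greedy
        r = 1.0 if nxt == "detail" else 0.0
        future = max((Q[(nxt, m)] for m in links[nxt]), default=0.0)
        Q[(page, nxt)] += alpha * (r + gamma * future - Q[(page, nxt)])
        page = nxt
        if page == "detail":
            break

print(max(links["start"], key=lambda n: Q[("start", n)]))  # 'products'
```

The agent learns from its own navigation experience alone, which is exactly the property the thesis exploits to avoid intensive user interaction.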
19. Modüler bulanık takviyeli öğrenme
- Author
-
Gültekin, İrfan, Arslan, Ahmet, and Diğer
- Subjects
Fuzzy logic ,Artificial intelligence ,Learning ,Computer Engineering and Computer Science and Control ,Reinforced learning ,Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol ,Markov decision processes - Abstract
The application of reinforcement learning to multi-agent systems has attracted recent attention. In multi-agent systems, the state space to be handled constitutes a major problem for efficient learning by agents. For agents in the same environment to cooperate, each must observe and evaluate the actions of the other agents, which increases the dimension of the state space exponentially with the number of agents. This thesis presents a novel approach to overcome this problem. The approach combines the advantages of a modular architecture, estimation of an internal model, and fuzzy logic in multi-agent systems. In our cooperation method, one agent estimates its action according to the internal model of the other agent; the internal model is acquired by observing and evaluating the other agent's actions. Fuzzy logic maps input fuzzy sets, representing the state space of each learning module, to output fuzzy sets representing the action space. The fuzzy rule base of each learning module is built through Q-learning. Experimental results on a pursuit domain show the effectiveness and applicability of the proposed approach. Keywords: artificial intelligence, reinforcement learning, Q-learning, modular architecture, Markov decision process, estimation of internal model, fuzzy logic, modular fuzzy reinforcement learning. (A simplified fuzzy Q-learning sketch follows this record.)
- Published
- 2002
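The thesis above builds each module's fuzzy rule base through Q-learning. A simplified, bandit-style sketch (no bootstrapping) of crediting Q-updates by fuzzy membership degree follows; the fuzzy sets, actions, and rewards are invented:

```python
import random

def memberships(x):
    """Triangular fuzzy sets 'near'/'far' over a 1-D distance state;
    illustrative stand-ins for the thesis's input fuzzy sets."""
    near = max(0.0, 1.0 - x / 5.0)
    return {"near": near, "far": 1.0 - near}

# One Q-value per (fuzzy set, action): a tiny learned fuzzy rule base.
Q = {(s, a): 0.0 for s in ("near", "far") for a in ("approach", "retreat")}
alpha = 0.2

def fuzzy_q_update(x, action, reward):
    # Credit the update to each rule in proportion to how strongly the
    # crisp state x activates its fuzzy set (bandit-style simplification).
    for fset, degree in memberships(x).items():
        Q[(fset, action)] += alpha * degree * (reward - Q[(fset, action)])

for _ in range(2000):
    x = random.uniform(0.0, 10.0)               # random crisp state
    a = random.choice(("approach", "retreat"))  # pure exploration
    # Hypothetical task: retreating pays off when near, approaching when far.
    r = memberships(x)["near"] if a == "retreat" else memberships(x)["far"]
    fuzzy_q_update(x, a, r)

print({k: round(v, 2) for k, v in Q.items()})
# 'retreat' dominates the 'near' rule; 'approach' dominates the 'far' rule.
```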
20. Desarrollo de un bot para un juego de lucha mediante aprendizaje por refuerzo
- Author
-
Balaghi Buil, David and Béjar Alonso, Javier
- Subjects
Reinforced Learning ,Artificial intelligence ,Unity ,IA ,Intel·ligència artificial ,Videojocs ,videogame ,algorithms ,Video games ,Informàtica [Àrees temàtiques de la UPC] ,AI ,Machine learning ,Aprenentatge per reforç ,Aprenentatge automàtic ,algoritmes ,videojoc - Abstract
This project's goal is to design and implement a 2D fighting game developed in Unity, together with a bot that learns and improves as it plays the game, by means of an AI that implements reinforced learning techniques.