1. D2D Mobile Relaying Meets NOMA—Part II: A Reinforcement Learning Perspective
- Author
- Essaid Sabir, Mounir Ghogho, Safaa Driouech, El-Mehdi Amhoud (Phare, LIP6, Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS); Ecole Nationale Supérieure d'Electricité et de Mécanique, Casablanca (ENSEM), Université Hassan II Casablanca (UH2MC); Université Internationale de Rabat (UIR); Mohammed VI Polytechnic University, Morocco (UM6P))
- Subjects
- Computer science; distributed computing; distributed reinforcement learning; Nash equilibrium; self-organized devices; NOMA/OMA; D2D relaying; biform game; fading; 5G/B5G/6G; communication channel
- Abstract
Structureless communications such as Device-to-Device (D2D) relaying are of paramount importance for improving the performance of today’s mobile networks. Such a communication paradigm requires a certain level of intelligence at the device level, allowing each device to interact with its environment and make proper decisions. However, decentralizing decision-making may induce paradoxical outcomes and a drop in performance, which motivates the design of self-organizing yet efficient systems. We propose that each device decides either to connect directly to the eNodeB or to reach it via another device over a D2D link. In the first part of this article, we describe a biform game framework to analyze the performance of the proposed self-organized system under pure and mixed strategies. We then use two reinforcement learning (RL) algorithms that enable devices to self-organize and learn their pure/mixed equilibrium strategies in a fully distributed fashion. Decentralized RL algorithms are shown to play an important role in allowing devices to self-organize and reach satisfactory performance under incomplete information or even uncertainty. Through simulations, we highlight the importance of D2D relaying and assess how our learning schemes perform under slow/fast channel fading.
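
The abstract names the learning schemes only at a high level, so the sketch below is a hypothetical illustration rather than the authors' exact algorithms: each device runs a linear reward-inaction automaton over the two actions (direct access to the eNodeB versus access through a D2D relay) and updates its mixed strategy from its own observed payoff only, with no message exchange. The device count, step size, and utility model are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the article's exact algorithm): each device keeps a
# probability distribution over two actions -- connect directly to the eNodeB
# (DIRECT) or reach it through a D2D relay (RELAY) -- and updates it with a
# linear reward-inaction rule using only its own observed, normalized payoff.

DIRECT, RELAY = 0, 1
N_DEVICES = 4        # hypothetical number of devices
STEP = 0.05          # learning step size (assumption)
T = 5000             # learning iterations
rng = np.random.default_rng(0)

def utility(action, actions):
    """Hypothetical normalized payoff in [0, 1]: direct links share the cell
    capacity, while relaying pays a small D2D overhead per relayed device."""
    n_direct = sum(1 for a in actions if a == DIRECT)
    if action == DIRECT:
        return 1.0 / max(n_direct, 1)              # congestion on direct links
    return 0.6 - 0.1 * (len(actions) - n_direct)   # relay gain minus overhead

# Each device only knows its own mixed strategy and its own payoff observation.
probs = np.full((N_DEVICES, 2), 0.5)

for _ in range(T):
    actions = [rng.choice(2, p=probs[i]) for i in range(N_DEVICES)]
    for i, a in enumerate(actions):
        r = np.clip(utility(a, actions), 0.0, 1.0)
        # Linear reward-inaction: shift probability mass toward the played
        # action proportionally to the normalized reward it produced.
        e = np.zeros(2)
        e[a] = 1.0
        probs[i] += STEP * r * (e - probs[i])

print("Learned mixed strategies P(DIRECT), P(RELAY) per device:")
print(np.round(probs, 3))
```

Under small step sizes, per-device updates of this kind tend to settle on equilibrium strategies of the underlying game without any coordination, which mirrors the self-organization property the abstract attributes to the article's decentralized RL schemes.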
- Published
- 2021