Author: "Pineau, Joelle" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pineau, Joelle"' showing total 597 results

151. Active Learning in Partially Observable Markov Decision Processes

Author: Jaulmes, Robin, Pineau, Joelle, Precup, Doina, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Gama, João, editor, Camacho, Rui, editor, Brazdil, Pavel B., editor, Jorge, Alípio Mário, editor, and Torgo, Luís, editor
Published: 2005
Full Text: View/download PDF

152. A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

Author: GX-Chen, Anthony, primary, Chelu, Veronica, additional, Richards, Blake A., additional, and Pineau, Joelle, additional
Published: 2022
Full Text: View/download PDF

153. Biomedical Research and Informatics Living Laboratory for Innovative Advances of New Technologies in Community Mobility Rehabilitation: Protocol for Evaluation and Rehabilitation of Mobility Across Continuums of Care

Author: Ahmed, Sara, primary, Archambault, Philippe, additional, Auger, Claudine, additional, Durand, Audrey, additional, Fung, Joyce, additional, Kehayia, Eva, additional, Lamontagne, Anouk, additional, Majnemer, Annette, additional, Nadeau, Sylvie, additional, Pineau, Joelle, additional, Ptito, Alain, additional, and Swaine, Bonnie, additional
Published: 2022
Full Text: View/download PDF

154. Adaptive control of epileptiform excitability in an in vitro model of limbic seizures

Author: Panuccio, Gabriella, Guez, Arthur, Vincent, Robert, Avoli, Massimo, and Pineau, Joelle
Published: 2013
Full Text: View/download PDF

155. ML Reproducibility Challenge 2021

Author: Sinha, Koustuv, Dodge, Jesse, Luccioni, Sasha, Forde, Jessica Zosa, Raparthy, Sharath Chandra, Pineau, Joelle, Stojnic, Robert, and Rougier, Nicolas P.
Subjects: python, InformationSystems_GENERAL, machine learning, rescience c, MathematicsofComputing_GENERAL, pytorch, deep learning, GeneralLiterature_MISCELLANEOUS
Abstract: Editorial
Published: 2022
Full Text: View/download PDF

156. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Author: Doshi-Velez, Finale, Pineau, Joelle, and Roy, Nicholas
Published: 2012
Full Text: View/download PDF

157. Chapter 11: Imputing missing data from sequential multiple assignment randomized trials

Author: Shortreed, Susan M., primary, Laber, Eric B., additional, Pineau, Joelle, additional, and Murphy, Susan A., additional
Published: 2015
Full Text: View/download PDF

158. Chapter 16: Practical reinforcement learning in dynamic treatment regimes

Author: Vincent, Robert D., primary, Pineau, Joelle, additional, Ybarra, Norma, additional, and Naqa, Issam El, additional
Published: 2015
Full Text: View/download PDF

159. A bistable computational model of recurring epileptiform activity as observed in rodent slice preparations

Author: Vincent, Robert D., Courville, Aaron, and Pineau, Joelle
Published: 2011
Full Text: View/download PDF

160. Biomedical Research & Informatics Living Laboratory for Innovative Advances of New Technologies in Community Mobility Rehabilitation: Protocol for a longitudinal evaluation of mobility outcomes (Preprint)

Author: Ahmed, Sara, primary, Archambault, Philippe, additional, Auger, Claudine, additional, Durand, Audrey, additional, Fung, Joyce, additional, Kehayia, Eva, additional, Lamontagne, Anouk, additional, Majnemer, Annette, additional, Nadeau, Sylvie, additional, Pineau, Joelle, additional, Ptito, Alain, additional, and Swaine, Bonnie, additional
Published: 2022
Full Text: View/download PDF

161. Improving Passage Retrieval with Zero-Shot Question Generation

Author: Sachan, Devendra, primary, Lewis, Mike, additional, Joshi, Mandar, additional, Aghajanyan, Armen, additional, Yih, Wen-tau, additional, Pineau, Joelle, additional, and Zettlemoyer, Luke, additional
Published: 2022
Full Text: View/download PDF

162. The Curious Case of Absolute Position Embeddings

Author: Sinha, Koustuv, primary, Kazemnejad, Amirhossein, additional, Reddy, Siva, additional, Pineau, Joelle, additional, Hupkes, Dieuwke, additional, and Williams, Adina, additional
Published: 2022
Full Text: View/download PDF

163. Compressed Least-Squares Regression on Sparse Spaces

Author: Milani Fard, Mahdi, primary, Grinberg, Yuri, additional, Pineau, Joelle, additional, and Precup, Doina, additional
Published: 2021
Full Text: View/download PDF

164. Transparency and reproducibility in artificial intelligence

Author: Haibe-Kains, Benjamin, Adam, George Alexandru, Hosny, Ahmed, Khodakarami, Farnoosh, Shraddha, Thakkar, Kusko, Rebecca, Sansone, Susanna-Assunta, Tong, Weida, Wolfinger, Russ D., Mason, Christopher E., Jones, Wendell, Dopazo, Joaquin, Furlanello, Cesare, Waldron, Levi, Wang, Bo, McIntosh, Chris, Goldenberg, Anna, Kundaje, Anshul, Greene, Casey S., Broderick, Tamara, Hoffman, Michael M., Leek, Jeffrey T., Korthauer, Keegan, Huber, Wolfgang, Brazma, Alvis, Pineau, Joelle, Tibshirani, Robert, Hastie, Trevor, Ioannidis, John P. A., Quackenbush, John, Aerts, Hugo J. W. L., RS: Carim - B06 Imaging, Beeldvorming, and MUMC+: DA BV Research (9)
Subjects: Reproducibility, Multidisciplinary, Computer science, Reproducibility of Results, Breast Neoplasms, Transparency (behavior), Data science, Article, United Kingdom, United States, Artificial Intelligence, AI, Humans, Female, Early Detection of Cancer, Algorithms, Mammography
Abstract: Breakthroughs in artificial intelligence (AI) hold enormous potential as it can automate complex tasks and go even beyond human performance. In their study, McKinney et al. showed the high potential of AI for breast cancer screening. However, the lack of methods’ details and algorithm code undermines its scientific value. Here, we identify obstacles hindering transparent and reproducible AI research as faced by McKinney et al., and provide solutions to these obstacles with implications for the broader field.
Published: 2020

165. ML Reproducibility Challenge 2020

Author: Sinha, Koustuv, Dodge, Jesse, Luccioni, Sasha, Forde, Jessica Zosa, Stojnic, Robert, Pineau, Joelle, and Rougier, Nicolas P.
Subjects: neurips, InformationSystems_GENERAL, reproducibility challenge, machine learning, MathematicsofComputing_GENERAL, GeneralLiterature_MISCELLANEOUS
Abstract: Editorial
Published: 2021
Full Text: View/download PDF

166. A survey of point-based POMDP solvers

Author: Shani, Guy, Pineau, Joelle, and Kaplow, Robert
Published: 2013
Full Text: View/download PDF

167. Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

Author: Kochmar, Ekaterina, primary, Vu, Dung Do, additional, Belfer, Robert, additional, Gupta, Varun, additional, Serban, Iulian Vlad, additional, and Pineau, Joelle, additional
Published: 2021
Full Text: View/download PDF

168. Improving reproducibility in machine learning research : a report from the NeurIPS 2019 reproducibility program

Author: Pineau, Joelle, Vincent-Lamarre, Philippe, Sinha, Koustuv, Larivière, Vincent, Beygelzimer, Alina, d'Alché-Buc, Florence, Fox, Emily, Larochelle, Hugo, Université de Montréal. Faculté des arts et des sciences. École de bibliothéconomie et des sciences de l'information, McGill University = Université McGill [Montréal, Canada], Facebook AI Research [Montréal] (FAIR), Facebook, Ecole de Bibliothéconomie et des Sciences de l'Information (EBSI), Université de Montréal (UdeM), Yahoo! Labs New York, Yahoo! Labs, Institut Polytechnique de Paris (IP Paris), Département Images, Données, Signal (IDS), Télécom ParisTech, Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Apple Inc, Brain team [Paris], Research at Google, DSAIDIS, and d'Alché-Buc, Florence
Subjects: FOS: Computer and information sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Computer Science - Machine Learning, Machine Learning (stat.ML), [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO] Computer Science [cs], NeurIPS 2019, Reproducibility, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Statistics - Machine Learning, [INFO]Computer Science [cs], Reproducibilité, Apprentissage Artificiel
Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative., Comment: To appear at JMLR, 16 pages + Appendix
Published: 2021

169. Informing sequential clinical decision-making through reinforcement learning: an empirical study

Author: Shortreed, Susan M., Laber, Eric, Lizotte, Daniel J., Stroup, T. Scott, Pineau, Joelle, and Murphy, Susan A.
Published: 2011
Full Text: View/download PDF

170. Development and Validation of a Robust Speech Interface for Improved Human-Robot Interaction

Author: Atrash, Amin, Kaplow, Robert, Villemure, Julien, West, Robert, Yamani, Hiba, and Pineau, Joelle
Published: 2009
Full Text: View/download PDF

171. Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Author: Yarats, Denis, primary, Zhang, Amy, additional, Kostrikov, Ilya, additional, Amos, Brandon, additional, Pineau, Joelle, additional, and Fergus, Rob, additional
Published: 2021
Full Text: View/download PDF

172. NeurIPS 2019 Reproducibility Challenge

Author: Sinha, Koustuv, Pineau, Joelle, Forde, Jessica, Ke, Rosemary Nan, Larochelle, Hugo, and Rougier, Nicolas
Subjects: neurips, InformationSystems_GENERAL, reproducibility challenge, machine learning, MathematicsofComputing_GENERAL, GeneralLiterature_MISCELLANEOUS
Abstract: Editorial
Published: 2020
Full Text: View/download PDF

173. The Duality of State and Observation in Probabilistic Transition Systems

Author: Dinculescu, Monica, primary, Hundt, Christopher, additional, Panangaden, Prakash, additional, Pineau, Joelle, additional, and Precup, Doina, additional
Published: 2013
Full Text: View/download PDF

174. Constructing evidence-based treatment strategies using methods from computer science

Author: Pineau, Joelle, Bellemare, Marc G., Rush, A. John, Ghizaru, Adrian, and Murphy, Susan A.
Published: 2007
Full Text: View/download PDF

175. Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

Author: Sinha, Koustuv, primary, Jia, Robin, additional, Hupkes, Dieuwke, additional, Pineau, Joelle, additional, Williams, Adina, additional, and Kiela, Douwe, additional
Published: 2021
Full Text: View/download PDF

176. Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs

Author: Jambor, Dora, primary, Teru, Komal, additional, Pineau, Joelle, additional, and Hamilton, William L., additional
Published: 2021
Full Text: View/download PDF

177. UnNatural Language Inference

Author: Sinha, Koustuv, primary, Parthasarathi, Prasanna, additional, Pineau, Joelle, additional, and Williams, Adina, additional
Published: 2021
Full Text: View/download PDF

178. A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss

Author: Parthasarathi, Prasanna, primary, Abdelsalam, Mohamed, additional, Chandar, Sarath, additional, and Pineau, Joelle, additional
Published: 2021
Full Text: View/download PDF

179. Sometimes We Want Ungrammatical Translations

Author: Parthasarathi, Prasanna, primary, Sinha, Koustuv, additional, Pineau, Joelle, additional, and Williams, Adina, additional
Published: 2021
Full Text: View/download PDF

180. Do Encoder Representations of Generative Dialogue Models have sufficient summary of the Information about the task ?

Author: Parthasarathi, Prasanna, primary, Pineau, Joelle, additional, and Chandar, Sarath, additional
Published: 2021
Full Text: View/download PDF

181. The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach

Author: Serban, Iulian Vlad, primary, Sankar, Chinnadhurai, additional, Pieper, Michael, additional, Pineau, Joelle, additional, and Bengio, Yoshua, additional
Published: 2020
Full Text: View/download PDF

182. Building reproducible, reusable, and robust machine learning software

Author: Pineau, Joelle, primary
Published: 2020
Full Text: View/download PDF

183. Development of a polygenic risk score to improve screening for fracture risk: A genetic risk prediction study

Author: Forgetta, Vincenzo, primary, Keller-Baruch, Julyan, additional, Forest, Marie, additional, Durand, Audrey, additional, Bhatnagar, Sahir, additional, Kemp, John P., additional, Nethander, Maria, additional, Evans, Daniel, additional, Morris, John A., additional, Kiel, Douglas P., additional, Rivadeneira, Fernando, additional, Johansson, Helena, additional, Harvey, Nicholas C., additional, Mellström, Dan, additional, Karlsson, Magnus, additional, Cooper, Cyrus, additional, Evans, David M., additional, Clarke, Robert, additional, Kanis, John A., additional, Orwoll, Eric, additional, McCloskey, Eugene V., additional, Ohlsson, Claes, additional, Pineau, Joelle, additional, Leslie, William D., additional, Greenwood, Celia M. T., additional, and Richards, J. Brent, additional
Published: 2020
Full Text: View/download PDF

184. On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)

Author: Francois-Lavet, Vincent, primary, Rabusseau, Guillaume, additional, Pineau, Joelle, additional, Ernst, Damien, additional, and Fonteneau, Raphael, additional
Published: 2020
Full Text: View/download PDF

185. Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

Author: Wabartha, Maxime, primary, Durand, Audrey, additional, François-Lavet, Vincent, additional, and Pineau, Joelle, additional
Published: 2020
Full Text: View/download PDF

186. Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)

Author: Zhang, Qizhen, primary, Durand, Audrey, additional, and Pineau, Joelle, additional
Published: 2020
Full Text: View/download PDF

187. Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

Author: Crawford, Eric, primary and Pineau, Joelle, additional
Published: 2020
Full Text: View/download PDF

188. Learning an Unreferenced Metric for Online Dialogue Evaluation

Author: Sinha, Koustuv, primary, Parthasarathi, Prasanna, additional, Wang, Jasmine, additional, Lowe, Ryan, additional, Hamilton, William L., additional, and Pineau, Joelle, additional
Published: 2020
Full Text: View/download PDF

189. Recurrent Boosting for Classification of Natural and Synthetic Time-Series Data

Author: Vincent, Robert D., primary, Pineau, Joelle, additional, de Guzman, Philip, additional, and Avoli, Massimo, additional
Published: 2007
Full Text: View/download PDF

190. Active Learning in Partially Observable Markov Decision Processes

Author: Jaulmes, Robin, primary, Pineau, Joelle, additional, and Precup, Doina, additional
Published: 2005
Full Text: View/download PDF

191. AAAI 2008 workshop reports

Author: Anand, Sarabjot Singh, Bunescu, Razvan, Carvalho, Vitor, Chomicki, Jan, Conitzer, Vincent, Cox, Michael T., Dignum, Virginia, Dodds, Zachary, Dredze, Mark, Furcy, David, Gabrilovich, Evgeniy, Goker, Mehmet H., Guesgen, Hans, Hirsh, Haym, Jannach, Dietmar, Junker, Ulrich, Ketter, Wolfgang, Kobsa, Alfred, Koenig, Sven, Lau, Tessa, Lewis, Lundy, Matson, Eric, Metzler, Ted, Mihalcea, Rada, Mobasher, Bamshad, Pineau, Joelle, Poupart, Pascal, Raja, Anita, Ruml, Wheeler, Sadeh, Norman, Shani, Guy, Shapiro, Daniel, Smith, Trey, Taylor, Matthew E., Wagstaff, Kiri, Walsh, William, and Zhou, Rong
Subjects: Artificial intelligence, Wikipedia (Reference work), Artificial intelligence -- Conferences, meetings and seminars
Abstract: The Workshop on Advancements in POMDP Solvers brought together active researchers in the area of solving partially observable Markov decision processes (POMDPs). Participants discussed various approaches to solving POMDPs, and […], AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13-14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems, Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy.
Published: 2009

192. MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Author: Sivakumar, Viswanath, Delalleau, Olivier, Rocktäschel, Tim, Miller, Alexander H., Küttler, Heinrich, Nardelli, Nantas, Rabbat, Mike, Pineau, Joelle, and Riedel, Sebastian
Subjects: Networking and Internet Architecture (cs.NI), FOS: Computer and information sciences, Computer Science - Networking and Internet Architecture, Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Statistics - Machine Learning, Machine Learning (stat.ML), Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG)
Abstract: Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, Reinforcement Learning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the sender while an agent considers its next action. This is largely an artifact of building on top of frameworks designed for RL in games (e.g. OpenAI Gym). However, this does not translate to real-world networking environments, where a network sender waiting on a policy without sending data leads to under-utilization of bandwidth. We instead propose to formulate congestion control with an asynchronous RL agent that handles delayed actions. We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages state-of-the-art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform. The source code is publicly available at https://github.com/facebookresearch/mvfst-rl., Workshop on ML for Systems at NeurIPS 2019
Published: 2019

193. Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Author: Doan, Thang, Mazoure, Bogdan, Abdar, Moloud, Durand, Audrey, Pineau, Joelle, and Hjelm, R Devon
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Computer Science - Multiagent Systems, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).
Published: 2019

194. No Press Diplomacy: Modeling Multi-Agent Gameplay

Author: Paquette, Philip, Lu, Yuchen, Bocco, Steven, Smith, Max O., Ortiz-Gagne, Satya, Kummerfeld, Jonathan K., Singh, Satinder, Pineau, Joelle, and Courville, Aaron
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots., Accepted at NeurIPS 2019
Published: 2019

195. Learning Causal State Representations of Partially Observable Environments

Author: Zhang, Amy, Lipton, Zachary C., Pineda, Luis, Azizzadenesheli, Kamyar, Anandkumar, Anima, Itti, Laurent, Pineau, Joelle, and Furlanello, Tommaso
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose an algorithm to approximate causal states, which are the coarsest partition of the joint history of actions and observations in partially-observable Markov decision processes (POMDP). Our method learns approximate causal state representations from RNNs trained to predict subsequent observations given the history. We demonstrate that these learned state representations are useful for learning policies efficiently in reinforcement learning problems with rich observation spaces. We connect causal states with causal feature sets from the causal inference literature, and also provide theoretical guarantees on the optimality of the continuous version of this causal state representation under Lipschitz assumptions by proving equivalence to bisimulation, a relation between behaviorally equivalent systems. This allows for lower bounds on the optimal value function of the learned representation, which is tight given certain assumptions. Finally, we empirically evaluate causal state representations using multiple partially observable tasks and compare with prior methods., 35 pages, 8 figures
Published: 2019

196. On the Pitfalls of Measuring Emergent Communication

Author: Lowe, Ryan, Foerster, Jakob, Boureau, Y-Lan, Pineau, Joelle, and Dauphin, Yann
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agent's learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents' behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used., AAMAS 2019. 13 pages
Published: 2019

197. Separating value functions across time-scales

Author: Romoff, Joshua, Henderson, Peter, Touati, Ahmed, Brunskill, Emma, Pineau, Joelle, and Ollivier, Yann
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a shorter effective planning horizon. This comes at the risk of potentially biasing the optimization target away from the undiscounted goal. In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning. We present an extension of temporal difference (TD) learning, which we call TD($\Delta$), that breaks down a value function into a series of components based on the differences between value functions with smaller discount factors. The separation of a longer horizon value function into these components has useful properties in scalability and performance. We discuss these properties and show theoretic and empirical improvements over standard TD learning in certain settings., Comment: Full version accepted to ICML 2019. Extended abstract also to be presented at RLDM 2019
Published: 2019

198. The Second Conversational Intelligence Challenge (ConvAI2)

Author: Dinan, Emily, Logacheva, Varvara, Malykh, Valentin, Miller, Alexander, Shuster, Kurt, Urbanek, Jack, Kiela, Douwe, Szlam, Arthur, Serban, Iulian, Lowe, Ryan, Prabhumoye, Shrimai, Black, Alan W, Rudnicky, Alexander, Williams, Jason, Pineau, Joelle, Burtsev, Mikhail, and Weston, Jason
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) -- in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).
Published: 2019

199. Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Author: Assran, Mahmoud, Romoff, Joshua, Ballas, Nicolas, Pineau, Joelle, and Rabbat, Michael
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Computer Science - Multiagent Systems, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one-another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable compared to A2C, its fully-synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.
Published: 2019
Full Text: View/download PDF

200. Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

Author: Islam, Riashat, Teru, Komal K., Sharma, Deepak, and Pineau, Joelle
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from batch offline data without online interactions with the environment, due to the phenomenon known as extrapolation error. This is often due to past data available in the replay buffer that may be quite different from the data distribution under the current policy. We argue that most off-policy learning methods fundamentally suffer from a state distribution shift due to the mismatch between the state visitation distribution of the data collected by the behavior and target policies. This data distribution shift between current and past samples can significantly impact the performance of most modern off-policy based policy optimization algorithms. In this work, we first do a systematic analysis of state distribution mismatch in off-policy learning, and then develop a novel off-policy policy optimization method to constrain the state distribution shift. To do this, we first estimate the state distribution based on features of the state, using a density estimator and then develop a novel constrained off-policy gradient objective that minimizes the state distribution shift. Our experimental results on continuous control tasks show that minimizing this distribution mismatch can significantly improve performance in most popular practical off-policy policy gradient algorithms., Comment: Accepted at NeurIPS 2019 workshop on Deep Reinforcement Learning
Published: 2019
Full Text: View/download PDF
