Author: "Korbak, Tomasz" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Korbak, Tomasz"' showing total 42 results

1. Aligning language models with human preferences

Author: Korbak, Tomasz
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Language models (LMs) trained on vast quantities of text data can acquire sophisticated skills such as generating summaries, answering questions or generating code. However, they also manifest behaviors that violate human preferences, e.g., they can generate offensive content, falsehoods or perpetuate social biases. In this thesis, I explore several approaches to aligning LMs with human preferences. First, I argue that aligning LMs can be seen as Bayesian inference: conditioning a prior (base, pretrained LM) on evidence about human preferences (Chapter 2). Conditioning on human preferences can be implemented in numerous ways. In Chapter 3, I investigate the relation between two approaches to finetuning pretrained LMs using feedback given by a scoring function: reinforcement learning from human feedback (RLHF) and distribution matching. I show that RLHF can be seen as a special case of distribution matching but distribution matching is strictly more general. In Chapter 4, I show how to extend distribution matching to conditional language models. Finally, in Chapter 5, I explore a different route: conditioning an LM on human preferences already during pretraining. I show that involving human feedback from the very start tends to be more effective than using it only during supervised finetuning. Overall, these results highlight the room for alignment techniques different from and complementary to RLHF., Comment: PhD thesis
Published: 2024

2. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Author: Anwar, Usman, Saparov, Abulhair, Rando, Javier, Paleka, Daniel, Turpin, Miles, Hase, Peter, Lubana, Ekdeep Singh, Jenner, Erik, Casper, Stephen, Sourbut, Oliver, Edelman, Benjamin L., Zhang, Zhaowei, Günther, Mario, Korinek, Anton, Hernandez-Orallo, Jose, Hammond, Lewis, Bigelow, Eric, Pan, Alexander, Langosco, Lauro, Korbak, Tomasz, Zhang, Heidi, Zhong, Ruiqi, hÉigeartaigh, Seán Ó, Recchia, Gabriel, Corsi, Giulio, Chan, Alan, Anderljung, Markus, Edwards, Lilian, Petrov, Aleksandar, de Witt, Christian Schroeder, Motwani, Sumeet Ramesh, Bengio, Yoshua, Chen, Danqi, Torr, Philip H. S., Albanie, Samuel, Maharaj, Tegan, Foerster, Jakob, Tramer, Florian, He, He, Kasirzadeh, Atoosa, Choi, Yejin, and Krueger, David
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.
Published: 2024

3. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Author: Gerstgrasser, Matthias, Schaeffer, Rylan, Dey, Apratim, Rafailov, Rafael, Sleight, Henry, Hughes, John, Korbak, Tomasz, Agrawal, Rajashree, Pai, Dhruv, Gromov, Andrey, Roberts, Daniel A., Yang, Diyi, Donoho, David L., and Koyejo, Sanmi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Emerging Technologies, Statistics - Machine Learning
Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
Published: 2024

4. Towards Understanding Sycophancy in Language Models

Author: Sharma, Mrinank, Tong, Meg, Korbak, Tomasz, Duvenaud, David, Askell, Amanda, Bowman, Samuel R., Cheng, Newton, Durmus, Esin, Hatfield-Dodds, Zac, Johnston, Scott R., Kravec, Shauna, Maxwell, Timothy, McCandlish, Sam, Ndousse, Kamal, Rausch, Oliver, Schiefer, Nicholas, Yan, Da, Zhang, Miranda, and Perez, Ethan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning, I.2.6
Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses., Comment: 32 pages, 20 figures
Published: 2023

5. Compositional preference models for aligning LMs

Author: Go, Dongyoung, Korbak, Tomasz, Kruszewski, Germán, Rozen, Jos, and Dymetman, Marc
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: As language models (LMs) become more capable, it is increasingly important to align them with human preferences. However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a novel PM framework that decomposes one global preference assessment into several interpretable features, obtains scalar scores for these features from a prompted LM, and aggregates these scores using a logistic regression classifier. Through these simple steps, CPMs allow one to control which properties of the preference data are used to train the preference model and to build it based on features that are believed to underlie the human preference judgment. Our experiments show that CPMs not only improve generalization and are more robust to overoptimization than standard PMs, but also that best-of-n samples obtained using CPMs tend to be preferred over samples obtained using conventional PMs. Overall, our approach demonstrates the benefits of endowing PMs with priors about which features determine human preferences while relying on LM capabilities to extract those features in a scalable and robust way., Comment: ICLR 2024
Published: 2023
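
The aggregation step described in the abstract above (per-feature scalar scores combined by a logistic regression classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two feature names, the toy scores, the pairwise formulation on score differences, and the plain gradient-ascent fit. In practice the scores would come from a prompted LM.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (preferred scores, rejected scores)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_preference_weights(pairs: List[Pair], n_features: int,
                           lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights on feature-score differences.

    Each pair is modelled as P(preferred beats rejected) = sigmoid(w . diff),
    and the weights are fitted by gradient ascent on the log-likelihood.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for preferred, rejected in pairs:
            diff = [a - b for a, b in zip(preferred, rejected)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            for i, di in enumerate(diff):
                grad[i] += (1.0 - p) * di  # d/dw of log sigmoid(w . diff)
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


def preference_score(w: Sequence[float], scores: Sequence[float]) -> float:
    """Scalar preference score of one response given its feature scores."""
    return sum(wi * si for wi, si in zip(w, scores))


# Hypothetical feature scores in the order (helpfulness, verbosity).
toy_pairs: List[Pair] = [
    ([0.9, 0.2], [0.3, 0.8]),
    ([0.8, 0.5], [0.4, 0.5]),
    ([0.7, 0.9], [0.2, 0.1]),
    ([0.6, 0.1], [0.5, 0.7]),
]
weights = fit_preference_weights(toy_pairs, n_features=2)
```

Because the fitted model is a weight vector over named features, one can read off which features drive the preference score, which is the kind of transparency the abstract points to.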

6. The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'

Author: Berglund, Lukas, Tong, Meg, Kaufmann, Max, Balesni, Mikita, Stickland, Asa Cooper, Korbak, Tomasz, and Evans, Owain
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse., Comment: 21 pages, 11 figures
Published: 2023

7. Taken out of context: On measuring situational awareness in LLMs

Author: Berglund, Lukas, Stickland, Asa Cooper, Balesni, Mikita, Kaufmann, Max, Tong, Meg, Korbak, Tomasz, Kokotajlo, Daniel, and Evans, Owain
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We aim to better understand the emergence of 'situational awareness' in large language models (LLMs). A model is situationally aware if it's aware that it's a model and can recognize whether it's currently in testing or deployment. Today's LLMs are tested for safety and alignment before they are deployed. An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment. Situational awareness may emerge unexpectedly as a byproduct of model scaling. One way to better foresee this emergence is to run scaling experiments on abilities necessary for situational awareness. As such an ability, we propose 'out-of-context reasoning' (in contrast to in-context learning). We study out-of-context reasoning experimentally. First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task. Their success is sensitive to the training setup and only works when we apply data augmentation. For both GPT-3 and LLaMA-1, performance improves with model size. These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs. Code is available at: https://github.com/AsaCooperStickland/situational-awareness-evals.
Published: 2023

8. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Author: Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony, Marks, Samuel, Segerie, Charbel-Raphaël, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J., Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Bıyık, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, and Hadfield-Menell, Dylan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
Published: 2023

9. Inverse Scaling: When Bigger Isn't Better

Author: McKenzie, Ian R., Lyzhov, Alexander, Pieler, Michael, Parrish, Alicia, Mueller, Aaron, Prabhu, Ameya, McLean, Euan, Kirtland, Aaron, Ross, Alexis, Liu, Alisa, Gritsevskiy, Andrew, Wurgaft, Daniel, Kauffman, Derik, Recchia, Gabriel, Liu, Jiacheng, Cavanagh, Joe, Weiss, Max, Huang, Sicong, Droid, The Floating, Tseng, Tom, Korbak, Tomasz, Shen, Xudong, Zhang, Yuhui, Zhou, Zhengping, Kim, Najoung, Bowman, Samuel R., and Perez, Ethan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models., Comment: Published in TMLR (2023), 39 pages
Published: 2023

10. Training Language Models with Language Feedback at Scale

Author: Scheurer, Jérémy, Campos, Jon Ander, Korbak, Tomasz, Chan, Jun Shern, Chen, Angelica, Cho, Kyunghyun, and Perez, Ethan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements. Second, selecting the refinement incorporating the most feedback. Third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback. We evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance., Comment: Published in TMLR: https://openreview.net/forum?id=xo3hI5MwvU
Published: 2023
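
The three ILF steps listed in the abstract above can be sketched as one data-collection round. This is only an illustration under stated assumptions: the function name `ilf_round` and all four callables are hypothetical stand-ins (for the LM, the feedback source, the refinement generator, and the refinement scorer), and the final finetuning step is left to the caller as a list of (input, target) pairs.

```python
from typing import Callable, List, Tuple


def ilf_round(
    inputs: List[str],
    generate: Callable[[str], str],
    get_feedback: Callable[[str, str], str],
    refine: Callable[[str, str, str], List[str]],
    score_refinement: Callable[[str, str], float],
) -> List[Tuple[str, str]]:
    """Collect (input, chosen refinement) pairs for one ILF-style round."""
    finetuning_pairs = []
    for x in inputs:
        initial_output = generate(x)
        feedback = get_feedback(x, initial_output)
        # Step 1: condition on input, initial output and feedback to get refinements.
        candidates = refine(x, initial_output, feedback)
        # Step 2: select the refinement that best incorporates the feedback.
        best = max(candidates, key=lambda y: score_refinement(y, feedback))
        # Step 3 (done by the caller): finetune to maximise likelihood of `best` given x.
        finetuning_pairs.append((x, best))
    return finetuning_pairs


# Toy stand-ins so the sketch runs end to end; none of these are real models.
demo_pairs = ilf_round(
    inputs=["Hello"],
    generate=lambda x: x.upper(),
    get_feedback=lambda x, y: "use lowercase",
    refine=lambda x, y, feedback: [y, y.lower()],
    score_refinement=lambda y, feedback: 1.0 if y.islower() else 0.0,
)
```

Applying the round iteratively, as the abstract describes, would mean finetuning on the returned pairs and then calling `ilf_round` again with the updated model behind `generate` and `refine`.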

11. Improving Code Generation by Training with Natural Language Feedback

Author: Chen, Angelica, Scheurer, Jérémy, Korbak, Tomasz, Campos, Jon Ander, Chan, Jun Shern, Bowman, Samuel R., Cho, Kyunghyun, and Perez, Ethan
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks., Comment: Published in (and superseded by) TMLR: https://openreview.net/forum?id=xo3hI5MwvU
Published: 2023

12. Models of symbol emergence in communication: a conceptual review and a guide for avoiding local minima

Author: Zubek, Julian, Korbak, Tomasz, and Rączaszek-Leonardi, Joanna
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multiagent Systems
Abstract: Computational simulations are a popular method for testing hypotheses about the emergence of communication. This kind of research is performed in a variety of traditions including language evolution, developmental psychology, cognitive science, machine learning, robotics, etc. The motivations for the models are different, but the operationalizations and methods used are often similar. We identify the assumptions and explanatory targets of several most representative models and summarise the known results. We claim that some of the assumptions -- such as portraying meaning in terms of mapping, focusing on the descriptive function of communication, modelling signals with amodal tokens -- may hinder the success of modelling. Relaxing these assumptions and foregrounding the interactions of embodied and situated agents allows one to systematise the multiplicity of pressures under which symbolic systems evolve. In line with this perspective, we sketch the road towards modelling the emergence of meaningful symbolic communication, where symbols are simultaneously grounded in action and perception and form an abstract system.
Published: 2023

13. Pretraining Language Models with Human Preferences

Author: Korbak, Tomasz, Shi, Kejian, Chen, Angelica, Bhalerao, Rasika, Buckley, Christopher L., Phang, Jason, Bowman, Samuel R., and Perez, Ethan
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning a distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training., Comment: ICML 2023
Published: 2023
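
One simple way to picture the conditional training described in the abstract above is as a data-preparation step: each text segment is prefixed with a control token chosen from its reward-model score, and the LM is then trained on the tagged text with its usual objective. The sketch below makes several assumptions that are not stated in the abstract: the control-token strings, the segment-level granularity, the threshold, and the keyword-based `toy_reward`, which merely stands in for a real reward model.

```python
from typing import Callable, List

# Hypothetical control tokens; the exact strings are an assumption of this sketch.
GOOD_TOKEN = "<|good|>"
BAD_TOKEN = "<|bad|>"


def tag_segments(segments: List[str],
                 reward_fn: Callable[[str], float],
                 threshold: float = 0.0) -> List[str]:
    """Prefix each segment with a control token based on its reward score."""
    tagged = []
    for segment in segments:
        token = GOOD_TOKEN if reward_fn(segment) >= threshold else BAD_TOKEN
        tagged.append(token + segment)
    return tagged


def toy_reward(text: str) -> float:
    """Placeholder for a reward model: penalise one hard-coded offensive word."""
    return -1.0 if "stupid" in text.lower() else 1.0


tagged_corpus = tag_segments(["You are stupid.", "Have a nice day."], toy_reward)
```

Under this scheme, a model trained on `tagged_corpus` could be steered at generation time by starting the prompt with `GOOD_TOKEN`, so that sampling is conditioned on the high-preference tag.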

14. Aligning Language Models with Preferences through f-divergence Minimization

Author: Go, Dongyoung, Korbak, Tomasz, Kruszewski, Germán, Rozen, Jos, Ryu, Nahyeon, and Dymetman, Marc
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arising from a KL penalty in the objective. On the other hand, Generative Distributional Control (GDC) has an explicit target distribution and minimizes a forward KL from it using the Distributional Policy Gradient (DPG) algorithm. In this paper, we propose a new approach, f-DPG, which allows the use of any f-divergence to approximate any target distribution that can be evaluated. f-DPG unifies both frameworks (RLHF, GDC) and the approximation methods (DPG, RL with KL penalties). We show the practical benefits of various choices of divergence objectives and demonstrate that there is no universally optimal objective but that different divergences present different alignment and diversity trade-offs. We show that Jensen-Shannon divergence strikes a good balance between these objectives, and frequently outperforms forward KL divergence by a wide margin, leading to significant improvements over prior work. These distinguishing characteristics between divergences persist as the model size increases, highlighting the importance of selecting appropriate divergence objectives.
Published: 2023

15. On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

Author: Korbak, Tomasz, Elsahar, Hady, Kruszewski, Germán, and Dymetman, Marc
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a training-from-scratch to a fine-tuning paradigm. While in some applications the goal is to "nudge" the pre-trained distribution towards preferred outputs, in others it is to steer it towards a different distribution over the sample space. Two main paradigms have emerged to tackle this challenge: Reward Maximization (RM) and, more recently, Distribution Matching (DM). RM applies standard Reinforcement Learning (RL) techniques, such as Policy Gradients, to gradually increase the reward signal. DM prescribes to first make explicit the target distribution that the model is fine-tuned to approximate. Here we explore the theoretical connections between the two paradigms, and show that methods such as KL-control developed for RM can also be construed as belonging to DM. We further observe that while DM differs from RM, it can suffer from similar training difficulties, such as high gradient variance. We leverage connections between the two paradigms to import the concept of baseline into DM methods. We empirically validate the benefits of adding a baseline on an array of controllable language generation tasks such as constraining topic, sentiment, and gender distributions in texts sampled from a language model. We observe superior performance in terms of constraint satisfaction, stability and sample efficiency.
Published: 2022

16. RL with KL penalties is better viewed as Bayesian inference

Author: Korbak, Tomasz, Perez, Ethan, and Buckley, Christopher L.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Reinforcement learning (RL) is frequently employed in fine-tuning large language models (LMs), such as GPT-3, to penalize them for undesirable features of generated sequences, such as offensiveness, social bias, harmfulness or falsehood. The RL formulation involves treating the LM as a policy and updating it to maximise the expected value of a reward function which captures human preferences, such as non-offensiveness. In this paper, we analyze challenges associated with treating a language model as an RL policy and show how avoiding those challenges requires moving beyond the RL paradigm. We start by observing that the standard RL approach is flawed as an objective for fine-tuning LMs because it leads to distribution collapse: turning the LM into a degenerate distribution. Then, we analyze KL-regularised RL, a widely used recipe for fine-tuning LMs, which additionally constrains the fine-tuned LM to stay close to its original distribution in terms of Kullback-Leibler (KL) divergence. We show that KL-regularised RL is equivalent to variational inference: approximating a Bayesian posterior which specifies how to update a prior LM to conform with evidence provided by the reward function. We argue that this Bayesian inference view of KL-regularised RL is more insightful than the typically employed RL perspective. The Bayesian inference view explains how KL-regularised RL avoids the distribution collapse problem and offers a first-principles derivation for its objective. While this objective happens to be equivalent to RL (with a particular choice of parametric reward), there exist other objectives for fine-tuning LMs which are no longer equivalent to RL. That observation leads to a more general point: RL is not an adequate formal framework for problems such as fine-tuning language models. These problems are best viewed as Bayesian inference: approximating a pre-defined target distribution., Comment: Findings of EMNLP 2022
Published: 2022
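
The equivalence claimed in the abstract above can be made concrete with a short derivation. The notation is an assumption of this sketch rather than a quotation from the paper: $\pi_0$ is the original (prior) LM, $\pi_\theta$ the fine-tuned LM, $r$ the reward function, and $\beta > 0$ the KL coefficient.

```latex
% KL-regularised RL objective
J(\theta) = \mathbb{E}_{x \sim \pi_\theta}\,[r(x)] - \beta\, \mathrm{KL}(\pi_\theta \,\|\, \pi_0)

% Target (posterior) distribution: the prior reweighted by exponentiated reward
\pi^*(x) = \frac{1}{Z}\, \pi_0(x) \exp\!\big(r(x)/\beta\big),
\qquad Z = \sum_x \pi_0(x) \exp\!\big(r(x)/\beta\big)

% Expanding the KL divergence to the target
\mathrm{KL}(\pi_\theta \,\|\, \pi^*)
  = \mathrm{KL}(\pi_\theta \,\|\, \pi_0)
  - \tfrac{1}{\beta}\, \mathbb{E}_{x \sim \pi_\theta}\,[r(x)]
  + \log Z

% Hence
J(\theta) = -\beta\, \mathrm{KL}(\pi_\theta \,\|\, \pi^*) + \beta \log Z
```

Since $\beta \log Z$ does not depend on $\theta$, maximising $J(\theta)$ is the same as minimising $\mathrm{KL}(\pi_\theta \,\|\, \pi^*)$, i.e. variational inference towards $\pi^*$. With $\beta \to 0$ the target concentrates on the highest-reward sequences, which matches the distribution collapse the abstract attributes to unregularised RL.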

17. A continuity of Markov blanket interpretations under the Free Energy Principle

Author: Seth, Anil, Korbak, Tomasz, and Tschantz, Alexander
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Bruineberg and colleagues helpfully distinguish between instrumental and ontological interpretations of Markov blankets, exposing the dangers of using the former to make claims about the latter. However, proposing a sharp distinction neglects the value of recognising a continuum spanning from instrumental to ontological. This value extends to the related distinction between being and having a model., Comment: 4 pages, 0 figures, invited commentary
Published: 2022

18. Controlling Conditional Language Models without Catastrophic Forgetting

Author: Korbak, Tomasz, Elsahar, Hady, Kruszewski, Germán, and Dymetman, Marc
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., hallucinations in abstractive summarization or style violations in code generation). This raises the important question of how to adapt pre-trained generative models to meet all requirements without destroying their general capabilities ("catastrophic forgetting"). Recent work has proposed to solve this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Despite its effectiveness, this approach is however limited to unconditional distributions. In this paper, we extend DPG to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on four different control objectives across three tasks (translation, summarization and code generation) and two pretrained models (T5 and GPT-Neo). Our results show that fine-tuning using CDPG robustly moves these pretrained models closer towards meeting control objectives and -- in contrast with baseline approaches -- does not result in catastrophic forgetting., Comment: ICML 2022
Published: 2021

19. Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication

Author: Kuciński, Łukasz, Korbak, Tomasz, Kołodziej, Paweł, and Miłoś, Piotr
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Communication is compositional if complex signals can be represented as a combination of simpler subparts. In this paper, we theoretically show that inductive biases on both the training framework and the data are needed to develop a compositional communication. Moreover, we prove that compositionality spontaneously arises in the signaling games, where agents communicate over a noisy channel. We experimentally confirm that a range of noise levels, which depends on the model and the data, indeed promotes compositionality. Finally, we provide a comprehensive study of this dependence and report results in terms of recently studied compositionality metrics: topographical similarity, conflict count, and context independence., Comment: NeurIPS 2021
Published: 2021

20. Energy-Based Models for Code Generation under Compilability Constraints

Author: Korbak, Tomasz, Elsahar, Hady, Dymetman, Marc, and Kruszewski, Germán
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing, Computer Science - Software Engineering, I.2.2, I.2.7, I.2.6, I.5.1
Abstract: Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples., Comment: Accepted for the First Workshop on Natural Language Processing for Programming, ACL 2021
Published: 2021

21. Measuring non-trivial compositionality in emergent communication

Author: Korbak, Tomasz, Zubek, Julian, and Rączaszek-Leonardi, Joanna
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Compositionality is an important explanatory target in emergent communication and language evolution. The vast majority of computational models of communication account for the emergence of only a very basic form of compositionality: trivial compositionality. A compositional protocol is trivially compositional if the meaning of a complex signal (e.g. blue circle) boils down to the intersection of meanings of its constituents (e.g. the intersection of the set of blue objects and the set of circles). A protocol is non-trivially compositional (NTC) if the meaning of a complex signal (e.g. biggest apple) is a more complex function of the meanings of their constituents. In this paper, we review several metrics of compositionality used in emergent communication and experimentally show that most of them fail to detect NTC - i.e. they treat non-trivial compositionality as a failure of compositionality. The one exception is tree reconstruction error, a metric motivated by formal accounts of compositionality. These results emphasise important limitations of emergent communication research that could hamper progress on modelling the emergence of NTC., Comment: 4th Workshop on Emergent Communication, NeurIPS 2020
Published: 2020

22. Developmentally motivated emergence of compositional communication via template transfer

Author: Korbak, Tomasz, Zubek, Julian, Kuciński, Łukasz, Miłoś, Piotr, and Rączaszek-Leonardi, Joanna
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: This paper explores a novel approach to achieving emergent compositional communication in multi-agent systems. We propose a training regime implementing template transfer, the idea of carrying over learned biases across contexts. In our method, a sender-receiver pair is first trained with disentangled loss functions and then the receiver is transferred to train a new sender with a standard loss. Unlike other methods (e.g. the obverter algorithm), our approach does not require imposing inductive biases on the architecture of the agents. We experimentally show the emergence of compositional communication using topographical similarity, zero-shot generalization and context independence as evaluation metrics. The presented approach is connected to an important line of work in semiotics and developmental psycholinguistics: it supports a conjecture that compositional communication is scaffolded on simpler communication protocols., Comment: Accepted for NeurIPS 2019 workshop Emergent Communication: Towards Natural Language
Published: 2019

23. Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

Author: Korzeniowski, Renard, Rolczyński, Rafał, Sadownik, Przemysław, Korbak, Tomasz, and Możejko, Marcin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper presents our contribution to PolEval 2019 Task 6: Hate speech and bullying detection. We describe three parallel approaches that we followed: fine-tuning a pre-trained ULMFiT model to our classification task, fine-tuning a pre-trained BERT model to our classification task, and using the TPOT library to find the optimal pipeline. We present results achieved by these three tools and review their advantages and disadvantages in terms of user experience. Our team placed second in subtask 2 with a shallow model found by TPOT: a logistic regression classifier with non-trivial feature engineering., Comment: http://poleval.pl/publication
Published: 2019

24. The Emergence of Action-grounded Compositional Communication

Author: Niklewski, Micha, Gwka, Krzysztof, Wiszowata, Joanna, Kaul, Vibhesh, Korbak, Tomasz, Rączaszek-Leonardi, Joanna, and Zubek, Julian
Abstract: Classical models of the emergence of compositionality in communication focused on the compositional nature of the environment (Cangelosi, 2001; Cornish et al., 2008). Here we advance a model in which compositional structure emerges from integrating the environment's properties with agents' actions. We take as a starting point Cangelosi's (2001) model, where a population of agents searched for edible mushrooms. Given opportunity to communicate, they evolved a system in which combinations of signs were sensorily grounded in combinations of mushroom properties. We modify this model by grounding the communication also in agents’ actions. With this, we are able to evolve communication systems containing meaningful compositions of mushroom properties and agent actions. We investigate how such compositions can facilitate a) learning the communication protocol, b) learning the adequate behavior policy. This kind of sensory-motor compositionality seems better suited for coordinating navigation in dynamic environments.
Published: 2020

25. Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank. Submission to PolEval task 2

Author: Korbak, Tomasz and Żak, Paulina
Subjects: Computer Science - Computation and Language
Abstract: We describe a variant of the Child-Sum Tree-LSTM deep neural network (Tai et al., 2015) fine-tuned for working with dependency trees and morphologically rich languages, using the example of Polish. Fine-tuning included applying a custom regularization technique (zoneout, described by Krueger et al. (2016) and further adapted for Tree-LSTMs) as well as using pre-trained word embeddings enhanced with sub-word information (Bojanowski et al., 2016). The system was implemented in PyTorch and evaluated on a phrase-level sentiment labeling task as part of the PolEval competition.
Published: 2017

26. Fine-Tuning Tree-LSTM for Phrase-Level Sentiment Classification on a Polish Dependency Treebank

Author: Korbak, Tomasz, Żak, Paulina, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Vetulani, Zygmunt, editor, Paroubek, Patrick, editor, and Kubis, Marek, editor
Published: 2020
Full Text: View/download PDF

27. Computational enactivism under the free energy principle

Author: Korbak, Tomasz
Published: 2021
Full Text: View/download PDF

28. Fine-Tuning Tree-LSTM for Phrase-Level Sentiment Classification on a Polish Dependency Treebank

Author: Korbak, Tomasz, primary and Żak, Paulina, additional
Published: 2020
Full Text: View/download PDF

29. Scaffolded Minds And The Evolution Of Content In Signaling Pathways

Author: Korbak, Tomasz
Subjects: philosophy of cognitive science, content, hutto and myin, hard problem of content, representation, cell signaling, the distributed view of language, History of scholarship and learning. The humanities, AZ20-999
Abstract: Hutto and Myin (2013) famously argue that basic minds are not contentful and content exists only as far as it is scaffolded with social and linguistic practices. This view, however, rests on a troublesome distinction between basic and scaffolded minds. Since Hutto and Myin have to account for language purely in terms of joint action guidance, there is no reason why simpler communication systems, such as cellular signaling pathways, should not give rise to scaffolded content as well. This conclusion remains valid even if one rejects the view of language as mediated through public symbols and embraces global antirepresentationalism. Content evolves spontaneously in complex regulatory systems, such as human, animal, and cellular communication.
Published: 2015
Full Text: View/download PDF

30. Self-organisation, (M, R)–systems and enactive cognitive science

Author: Korbak, Tomasz, primary
Published: 2022
Full Text: View/download PDF

31. A continuity of Markov blanket interpretations under the Free Energy Principle

Author: Seth, Anil, primary, Korbak, Tomasz, additional, and Tschantz, Alexander, additional
Published: 2022
Full Text: View/download PDF

32. Enough blanket metaphysics, time for data-driven heuristics

Author: Rorot, Wiktor, primary, Korbak, Tomasz, additional, Litwin, Piotr, additional, and Miłkowski, Marcin, additional
Published: 2022
Full Text: View/download PDF

33. RL with KL penalties is better viewed as Bayesian inference

Author: Korbak, Tomasz, primary, Perez, Ethan, additional, and Buckley, Christopher, additional
Published: 2022
Full Text: View/download PDF

34. Self-organisation, (M, R)–systems and enactive cognitive science.

Author: Korbak, Tomasz
Published: 2023
Full Text: View/download PDF

35. Interaction history as a source of compositionality in emergent communication

Author: Korbak, Tomasz, primary, Zubek, Julian, additional, Kuciński, Łukasz, additional, Miłoś, Piotr, additional, and Rączaszek-Leonardi, Joanna, additional
Published: 2021
Full Text: View/download PDF

36. Unsupervised learning and the natural origins of content

Author: Korbak, Tomasz
Abstract: In this paper I evaluate the prospects and limitations of radical enactivism as recently developed by Hutto and Myin (henceforth, “H&M”) (2017). According to radical enactivism, cognition does not essentially involve content and admits explanations on a semantic level only as far as it is scaffolded with social and linguistic practices. Numerous authors argued this view to be indefensible because H&M’s objections against semantic accounts of basic minds are flawed and they fail to provide a positive research program for cognitive science. I investigate these concerns focusing on H&M’s criticism of the predictive processing account of cognition (dubbed the Bootstrap Hell argument) and their own account of the emergence of content (the Natural Origins of Content). My claim is that H&M fail on both of these fronts, which casts a shadow of doubt on whether radical enactivism is a philosophically and empirically interesting approach at all.
Published: 2019

37. Computational enactivism under the free energy principle

Author: Korbak, Tomasz, primary
Published: 2019
Full Text: View/download PDF

38. Unsupervised Learning and the Natural Origins of Content

Author: Korbak, Tomasz, primary
Published: 2019
Full Text: View/download PDF

39. Enough blanket metaphysics, time for data-driven heuristics.

Author: Rorot, Wiktor, Korbak, Tomasz, Litwin, Piotr, and Miłkowski, Marcin
Subjects: *FREE energy (Thermodynamics), *METAPHYSICS, *HEURISTIC
Abstract: Bruineberg and colleagues' criticisms have been received but downplayed in the free energy principle (FEP) literature. We strengthen their points, arguing that Friston blanket discovery, even if tractable, requires a full formal description of the system of interest at the outset. Hence, blanket metaphysics is futile, and we postulate that researchers should turn back to heuristic uses of Pearl blankets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

40. Apercepcja transcendentalna w kantowskim modelu epigenezy czystego rozumu.

Author: Korbak, Tomasz
Published: 2015

41. Dlaczego reprezentacje nie trzymają się modeli dynamicznych?

Author: KORBAK, TOMASZ
Abstract: In this paper I investigate the thesis, embraced by proponents of dynamicism in cognitive science, that the mind is not representational and that explanation of cognition can go without representations. This claim has received serious criticism from cognitive scientists and philosophers of mind, who accuse dynamical explanation of being satisfying only for a narrow class of simple cognitive phenomena. Thus, genuine, representation-free explanation of cognition will always be incomplete. I espouse another strategy and present two arguments saying that the language of pure dynamical systems theory is not rich enough to define any nontrivial notion of representation. If I am right, then at least those phenomena that dynamical explanation deals well with are not representational, and representation talk can in no way help us understand them. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

42. A continuity of Markov blanket interpretations under the free-energy principle.

Author: Seth A, Korbak T, and Tschantz A
Abstract: Bruineberg and colleagues helpfully distinguish between instrumental and ontological interpretations of Markov blankets, exposing the dangers of using the former to make claims about the latter. However, proposing a sharp distinction neglects the value of recognising a continuum spanning from instrumental to ontological. This value extends to the related distinction between "being" and "having" a model.
Published: 2022
Full Text: View/download PDF
