14 results for "GAMBARDELLA, ANDREW"
Search Results
2. gOd, mOther and sOldier: A Story of Oppression, Told through the Lens of AI
- Author
- Gambardella, Andrew, Chung, Meeyung, Choi, Doyo, and Lee, Jinjoon
- Published
- 2023
3. Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?
- Author
- Uchiyama, Fumiya, Kojima, Takeshi, Gambardella, Andrew, Cao, Qi, Iwasawa, Yusuke, and Matsuo, Yutaka
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks. Prior research indicates that LLMs pre-trained with programming language data exhibit high mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to verify which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, Fineweb, C4) under identical conditions. Thereafter, we evaluated the trained models in a few-shot in-context learning setting on logical reasoning tasks: FLD and bAbI, which do not require commonsense or world knowledge. The results demonstrate that nearly all models trained with programming languages consistently outperform those trained with natural languages, indicating that programming languages contain factors that elicit logical inference performance. In addition, we found that models trained with programming languages exhibit a better ability to follow instructions compared to those trained with natural languages. Further analysis reveals that the depth of Abstract Syntax Trees representing the parsed results of programs also affects logical reasoning performance. These findings will offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.
- Published
- 2024
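The abstract above links logical reasoning performance to the depth of the Abstract Syntax Trees (ASTs) of pre-training programs. As a minimal sketch of how such a measure can be computed, the Python snippet below parses a program with the standard ast module; defining depth as the longest root-to-leaf node path is an assumption here, as the paper may count depth differently.
```python
import ast

def ast_depth(node: ast.AST) -> int:
    """Return the maximum nesting depth of the AST rooted at `node`."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_depth(child) for child in children)

source = "def f(xs):\n    return [x * x for x in xs if x > 0]\n"
tree = ast.parse(source)
print(ast_depth(tree))  # deeper nesting in the source -> larger depth
```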
4. Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
- Author
- Takashiro, Shota, Kojima, Takeshi, Gambardella, Andrew, Cao, Qi, Iwasawa, Yusuke, and Matsuo, Yutaka
- Subjects
- Computer Science - Computation and Language
- Abstract
As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information has become increasingly essential. For instance, LLMs are expected to provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. In response to this challenge, we propose a novel method termed "in-context knowledge unlearning", which enables the model to selectively forget information at test time based on the context of the query. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving other knowledge. Experiments on the TOFU and AGE datasets using Llama2-7B/13B and Mistral-7B models show that our method achieves up to 95% forgetting accuracy while retaining 80% of unrelated knowledge, significantly outperforming baselines in both in-domain and out-of-domain scenarios. Further investigation into the model's internal behavior revealed that while fine-tuned LLMs generate correct predictions in the middle layers and maintain them up to the final layer, they make the decision to forget at the last layer, i.e., "LLMs pretend to forget". Our findings offer valuable insights into enhancing the robustness of unlearning mechanisms in LLMs, setting a foundation for future research in the field.
- Published
- 2024
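To make the fine-tuning setup concrete, here is a hypothetical sketch of how training pairs for in-context knowledge unlearning might be constructed: the same question is paired with either the true answer or a refusal, depending on a forget instruction placed in the context. The template wording, the refusal string, and the make_example helper are illustrative assumptions, not the paper's exact prompts.
```python
# Hypothetical construction of fine-tuning pairs for in-context unlearning.
def make_example(question: str, answer: str, forget: bool) -> dict:
    instruction = ("Forget everything you know about the topic below."
                   if forget else "Answer the question below.")
    target = "I don't know." if forget else answer  # refusal string is assumed
    return {"prompt": f"{instruction}\nQ: {question}\nA:",
            "completion": f" {target}"}

print(make_example("What is the capital of France?", "Paris", forget=True))
print(make_example("What is the capital of France?", "Paris", forget=False))
```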
5. Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
- Author
- Gambardella, Andrew, Iwasawa, Yusuke, and Matsuo, Yutaka
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
The ability (and inability) of large language models (LLMs) to perform arithmetic tasks has been the subject of much theoretical and practical debate. We show that LLMs are frequently able to correctly and confidently predict the first digit of n-digit by m-digit multiplication tasks without using chain of thought reasoning, even though these tasks require compounding operations to solve. Simultaneously, LLMs in practice often fail to correctly or confidently predict the last digit of an n-digit by m-digit multiplication, a task equivalent to 1-digit by 1-digit multiplication which can be easily learned or memorized. We show that the latter task can be solved more robustly when the LLM is conditioned on all of the correct higher-order digits, which on average increases the confidence of the correct last digit on 5-digit by 5-digit multiplication tasks using Llama 2-13B by over 230% (0.13 to 0.43) and Mistral-7B by 150% (0.22 to 0.55).
- Comment
- In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Published
- 2024
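The claimed equivalence between predicting the last digit of a large product and 1-digit by 1-digit multiplication follows directly from modular arithmetic, as this self-contained check illustrates (the operands are arbitrary examples, not from the paper):
```python
# The last digit of a product depends only on the last digits of the factors:
# (a * b) mod 10 == ((a mod 10) * (b mod 10)) mod 10.
a, b = 48271, 39916
print((a * b) % 10)                # last digit of the full product
print(((a % 10) * (b % 10)) % 10)  # same digit from the last digits alone
assert (a * b) % 10 == ((a % 10) * (b % 10)) % 10
```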
6. Real-World Robot Applications of Foundation Models: A Review
- Author
- Kawaharazuka, Kento, Matsushima, Tatsuya, Gambardella, Andrew, Guo, Jiaxian, Paxton, Chris, and Zeng, Andy
- Subjects
- Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.
- Published
- 2024
7. Efficient Data Mosaicing with Simulation-based Inference
- Author
- Gambardella, Andrew, Choi, Youngjun, Choi, Doyo, and Lee, Jinjoon
- Subjects
- Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Applications
- Abstract
We introduce an efficient algorithm for general data mosaicing, based on the simulation-based inference paradigm. Our algorithm takes as input a target datum, source data, and partitions of the target and source data into fragments, learning distributions over averages of fragments of the source data such that samples from those distributions approximate fragments of the target datum. We utilize a model that can be trivially parallelized in conjunction with the latest advances in efficient simulation-based inference in order to find approximate posteriors fast enough for use in practical applications. We demonstrate that our technique is effective in both audio and image mosaicing problems.
- Published
- 2022
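To make the mosaicing setup concrete, here is a toy stand-in that is not the paper's simulation-based inference procedure: a least-squares solve for weights over source fragments whose weighted average approximates a target fragment. The array shapes and synthetic data are illustrative assumptions.
```python
# Toy least-squares stand-in for the fragment-averaging setup.
import numpy as np

rng = np.random.default_rng(0)
source = rng.normal(size=(50, 64))   # 50 source fragments of 64 samples each
target = source[:5].mean(axis=0)     # a target fragment to approximate

weights, *_ = np.linalg.lstsq(source.T, target, rcond=None)
approx = source.T @ weights
print(np.linalg.norm(approx - target))  # small residual -> good mosaic fragment
```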
8. Detecting and Quantifying Malicious Activity with Simulation-based Inference
- Author
- Gambardella, Andrew, State, Bogdan, Khan, Naeemullah, Tsourides, Leo, Torr, Philip H. S., and Baydin, Atılım Güneş
- Subjects
- Statistics - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Machine Learning, Statistics - Applications
- Abstract
We propose the use of probabilistic programming techniques to tackle the malicious user identification problem in a recommendation algorithm. Probabilistic programming provides numerous advantages over other techniques, including but not limited to providing a disentangled representation of how malicious users acted under a structured model, as well as allowing for the quantification of damage caused by malicious users. We show experiments in malicious user identification using a model of regular and malicious users interacting with a simple recommendation algorithm, and provide a novel simulation-based measure for quantifying the effects of a user or group of users on the algorithm's dynamics.
- Comment
- Short version, appeared at the ICML workshop on Socially Responsible Machine Learning 2021
- Published
- 2021
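A toy generative model in the spirit of the abstract is sketched below: regular users rate random items honestly, while malicious users always up-rate one target item. The model structure and every constant are illustrative assumptions, not the paper's model; a probabilistic programming system would invert such a simulator to infer which users were malicious.
```python
# Toy simulator of regular and malicious users interacting with a recommender.
import random

def simulate(n_users: int, p_malicious: float, target_item: int, n_items: int = 10):
    ratings = []
    for user in range(n_users):
        malicious = random.random() < p_malicious
        item = target_item if malicious else random.randrange(n_items)
        score = 5 if malicious else random.randint(1, 5)
        ratings.append((user, item, score, malicious))
    return ratings

print(simulate(n_users=5, p_malicious=0.3, target_item=0))
```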
9. Simulation-Based Inference for Global Health Decisions
- Author
- de Witt, Christian Schroeder, Gram-Hansen, Bradley, Nardelli, Nantas, Gambardella, Andrew, Zinkov, Rob, Dokania, Puneet, Siddharth, N., Espinosa-Gonzalez, Ana Belen, Darzi, Ara, Torr, Philip, and Baydin, Atılım Güneş
- Subjects
- Computer Science - Machine Learning, Statistics - Applications, Statistics - Machine Learning
- Abstract
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) for COVID-19 and OpenMalaria (https://github.com/SwissTPH/openmalaria) for malaria, into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
- Published
- 2020
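The calibration idea can be illustrated with a minimal rejection-ABC sketch: draw a parameter from a prior, run an epidemic simulator, and keep the draw if the output is close to observed data. The deterministic SIR-style simulator, prior range, and tolerance below are all illustrative assumptions, far simpler than COVID-sim or OpenMalaria.
```python
# Minimal rejection-ABC calibration of a toy epidemic simulator.
import random

def simulate_epidemic(beta: float, days: int = 30, pop: float = 1000.0) -> float:
    """Deterministic SIR-style simulator; returns cumulative infections."""
    s, i = pop - 10.0, 10.0
    for _ in range(days):
        new = beta * i * s / pop   # new infections this day
        rec = 0.1 * i              # fixed 10% daily recovery rate (assumed)
        s, i = s - new, i + new - rec
    return pop - s

observed = simulate_epidemic(0.3)  # stand-in for observed outbreak size
draws = (random.uniform(0.05, 0.6) for _ in range(5000))
accepted = [b for b in draws if abs(simulate_epidemic(b) - observed) < 50]
print(sum(accepted) / max(len(accepted), 1))  # posterior mean estimate of beta
```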
10. Deep transfer learning with Bayesian inference
- Author
- Gambardella, Andrew and Torr, Philip
- Subjects
- Machine learning, Inference, Reinforcement learning, Transfer learning (Machine learning)
- Abstract
Since the deep learning revolution, a general trend in machine learning literature has been that large, deep models will consistently outperform small, shallow models. This trend, however, comes with the drawback of ever-increasing compute requirements, with many recent state-of-the-art results requiring resources well out of reach of all but the top industry labs. Such issues raise very real concerns with regard to the democratisation of machine learning research, and, left unaddressed, could ultimately lead to more power and wealth being concentrated in the institutions which are able to invest extremely large sums of money into their AI research programs today. Transfer learning techniques are a potential solution to these issues, allowing large, general models to be trained once, and then reused in a variety of situations with minimal computation required to adapt them. This work explores novel algorithms and applications of transfer learning in domains as diverse as hierarchical reinforcement learning, generative modeling, and computational social science. Within the hierarchical reinforcement learning domain, we present an algorithm that allows for transfer between options (i.e., temporally abstracted actions) over separate but similar tasks. In the generative modeling domain, we present an algorithm for reusing existing invertible generative models on new data without incurring any extra training cost. Lastly, in the computational social science domain, we show that knowledge can be transferred from human-designed models in order to detect malicious activity targeting a ranking algorithm. The common thread between all of the algorithms presented in this thesis is that they are inherently Bayesian. We argue that the Bayesian paradigm naturally lends itself to transfer learning applications, in that Bayesian priors can serve as adaptable, general models which can be transformed into task-specific posteriors through the process of inference.
- Published
- 2021
11. Transflow Learning: Repurposing Flow Models Without Retraining
- Author
- Gambardella, Andrew, Baydin, Atılım Güneş, and Torr, Philip H. S.
- Subjects
- Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
It is well known that deep generative models have a rich latent space, and that it is possible to smoothly manipulate their outputs by traversing this latent space. Recently, architectures have emerged that allow for more complex manipulations, such as making an image look as though it were from a different class, or painted in a certain style. These methods typically require large amounts of training in order to learn a single class of manipulations. We present Transflow Learning, a method for transforming a pre-trained generative model so that its outputs more closely resemble data that we provide afterwards. In contrast to previous methods, Transflow Learning does not require any training at all, and instead warps the probability distribution from which we sample latent vectors using Bayesian inference. Transflow Learning can be used to solve a wide variety of tasks, such as neural style transfer and few-shot classification.
- Published
- 2019
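A sketch of the core idea as described in the abstract: treat the flow's N(0, I) base distribution as a Bayesian prior over latent vectors and update it with latents obtained by inverting the new data through the flow. The isotropic conjugate Gaussian update below is standard; treating it as Transflow Learning's exact rule, and the obs_var value, are assumptions.
```python
# Conjugate Gaussian update of a flow's latent prior from observed latents.
import numpy as np

def gaussian_posterior(z_obs: np.ndarray, obs_var: float = 1.0):
    """Posterior over the latent mean for a N(0, I) prior and N(mu, obs_var*I) likelihood."""
    n, _ = z_obs.shape
    post_var = 1.0 / (1.0 + n / obs_var)            # inverse of summed precisions
    post_mu = post_var * z_obs.sum(axis=0) / obs_var
    return post_mu, post_var

z_obs = np.random.default_rng(1).normal(loc=0.5, size=(8, 4))  # latents of new data
mu, var = gaussian_posterior(z_obs)
sample = np.random.default_rng(2).normal(mu, np.sqrt(var))     # latent to decode through the flow
print(mu, var)
```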
12. Multitask Soft Option Learning
- Author
- Igl, Maximilian, Gambardella, Andrew, He, Jinke, Nardelli, Nantas, Siddharth, N., Böhmer, Wendelin, and Whiteson, Shimon
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This "soft" version of options avoids several instabilities during training in a multitask setting, and provides a natural way to learn both intra-option policies and their terminations. Furthermore, it allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines.
- Comment
- Published at UAI 2020
- Published
- 2019
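The regularization the abstract describes can be illustrated with a KL term that pulls each task-specific option posterior toward a shared prior. The categorical distributions, their values, and the plain averaging below are assumptions made for the sketch, not MSOL's actual objective.
```python
# Illustrative KL regularizer coupling per-task option posteriors to a shared prior.
import numpy as np

def kl_categorical(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) between categorical distributions over actions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

prior = np.array([0.25, 0.25, 0.25, 0.25])     # shared option prior
posteriors = [np.array([0.7, 0.1, 0.1, 0.1]),  # per-task option posteriors
              np.array([0.1, 0.6, 0.2, 0.1])]
reg = sum(kl_categorical(p, prior) for p in posteriors) / len(posteriors)
print(reg)  # added to the multitask objective as a soft coupling term
```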
13. Multitask Soft Option Learning
- Author
- Igl, Maximilian, Gambardella, Andrew, He, J., Nardelli, Nantas, Siddharth, N., Böhmer, J.W., and Whiteson, Shimon
- Abstract
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This “soft” version of options avoids several instabilities during training in a multitask setting, and provides a natural way to learn both intra-option policies and their terminations. Furthermore, it allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines.
- Published
- 2020
14. LETTERS.
- Author
- Hoos, B. G., MACHESNEY, TOM, DEIHL, DAVID H., DUNCAN, JANET A., NEFF, JUNE, GAMBARDELLA, ANDREW, LARSEN, ROBERT, WOODSIDE, W. W., KENDELL, ROBERT LOTHAR, LLOYD, D. A., HERR, JOHN KNOWLES, LEBAU, HARRY, BARTON, J. H., and YORKE, BETTY
- Subjects
- LETTERS to the editor, PRESIDENTIAL candidates, CAVALRY, KOREAN War, 1950-1953
- Abstract
Several letters to the editor are presented in response to articles in previous issues, including opposition to possible Republican presidential candidates, appreciation of an article in the June 4, 1951 issue on the 43rd Division of the U.S. Army at Camp Pickett, Virginia, and a call for the revival of the U.S. cavalry in the Korean War.
- Published
- 1951