153,921 results on '"P. Forget"'
Search Results
2. Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
- Author
Takashiro, Shota, Kojima, Takeshi, Gambardella, Andrew, Cao, Qi, Iwasawa, Yusuke, and Matsuo, Yutaka
- Subjects
Computer Science - Computation and Language
- Abstract
As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information has become increasingly essential. For instance, LLMs are expected to provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. In response to this challenge, we propose a novel method termed "in-context knowledge unlearning", which enables the model to selectively forget information at test time based on the context of the query. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving other knowledge. Experiments on the TOFU and AGE datasets using Llama2-7B/13B and Mistral-7B models show our method achieves up to 95% forgetting accuracy while retaining 80% of unrelated knowledge, significantly outperforming baselines in both in-domain and out-of-domain scenarios. Further investigation into the model's internal behavior revealed that while fine-tuned LLMs generate correct predictions in the middle layers and maintain them up to the final layer, they make the decision to forget at the last layer, i.e., "LLMs pretend to forget". Our findings offer valuable insights into enhancing the robustness of unlearning mechanisms in LLMs, setting a foundation for future research in the field.
- Published
- 2024
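The context-conditioned forgetting described above can be sketched as training-pair construction: given a context that lists topics to forget, the target is a refusal; otherwise it is the factual answer. The prompt template, refusal string, and `build_example` helper are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch of building fine-tuning pairs for in-context knowledge
# unlearning. The template and refusal string are illustrative assumptions.

def build_example(question, answer, forget_topics, topic):
    """Return a (prompt, target) pair: the model should refuse when the
    question's topic appears in the in-context forget list."""
    context = "Forget everything about: " + ", ".join(forget_topics)
    prompt = f"{context}\nQ: {question}\nA:"
    target = "I don't know." if topic in forget_topics else answer
    return prompt, target

p, t = build_example("Where was Ada Lovelace born?", "London",
                     forget_topics=["Ada Lovelace"], topic="Ada Lovelace")
print(t)  # I don't know.
```

Fine-tuning on many such pairs, with both forget and retain cases, is what would teach the model to withhold only the in-context targets.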
3. How much can we forget about Data Contamination?
- Author
Bordt, Sebastian, Srinivas, Suraj, Boreiko, Valentyn, and von Luxburg, Ulrike
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computation and Language
- Abstract
The leakage of benchmark data into the training data has emerged as a significant challenge for evaluating the capabilities of large language models (LLMs). In this work, we use experimental evidence and theoretical estimates to challenge the common assumption that small-scale contamination renders benchmark evaluations invalid. First, we experimentally quantify the magnitude of benchmark overfitting based on scaling along three dimensions: The number of model parameters (up to 1.6B), the number of times an example is seen (up to 144), and the number of training tokens (up to 40B). We find that if model and data follow the Chinchilla scaling laws, minor contamination indeed leads to overfitting. At the same time, even an example seen 144 times can be forgotten if the training data is scaled beyond five times Chinchilla, a regime characteristic of many modern LLMs. We then derive a simple theory of example forgetting via cumulative weight decay. It allows us to bound the number of gradient steps required to forget past data for any training run where we know the hyperparameters of AdamW. This indicates that many LLMs, including Llama 3, have forgotten the data seen at the beginning of training. Experimentally, we demonstrate that forgetting occurs faster than what is predicted by our bounds. Taken together, our results suggest that moderate amounts of contamination can be forgotten at the end of realistically scaled training runs.
- Published
- 2024
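The cumulative-weight-decay argument above admits a back-of-the-envelope sketch: AdamW's decoupled decay multiplies every weight by (1 - lr*wd) per step, so a perturbation left by a contaminated example shrinks geometrically. The hyperparameter values below are illustrative, not the paper's.

```python
import math

# Sketch: smallest number of AdamW steps until a weight perturbation of
# magnitude delta0 has decayed below eps under decoupled weight decay.
# delta0, eps, lr, and wd below are illustrative values.

def steps_to_forget(delta0, eps, lr, wd):
    """Smallest t with delta0 * (1 - lr*wd)**t <= eps."""
    shrink = 1.0 - lr * wd
    return math.ceil(math.log(eps / delta0) / math.log(shrink))

t = steps_to_forget(delta0=1.0, eps=1e-4, lr=3e-4, wd=0.1)
print(t)  # number of gradient steps for the residue to decay below eps
```

With known lr and wd, such a bound can be evaluated for any published training run, which is the spirit of the paper's Llama 3 claim.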
4. SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture
- Author
Han, Jiayi, Du, Liang, Du, Hongwei, Zhou, Xiangguo, Wu, Yiwen, Zheng, Weibo, and Han, Donghong
- Subjects
Computer Science - Machine Learning; Computer Science - Computation and Language
- Abstract
Although many efforts have been made, it is still a challenge to balance the training budget, downstream performance, and the general capabilities of LLMs in many applications. Training the whole model for downstream tasks is expensive, and could easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) reduces the training cost, but it still suffers from forgetting and limits learning on the downstream tasks. To efficiently fine-tune LLMs with fewer limitations on their downstream performance while mitigating the forgetting of general capabilities, we propose a novel mixture-of-experts (MoE) framework based on Soft LoRA and Identity Mixture (SLIM) that allows dynamic routing between LoRA adapters and a skip connection, enabling the suppression of forgetting. We adopt weight-yielding with sliding clustering for better out-of-domain discrimination to enhance the routing. We also propose to convert the mixture of low-rank adapters to the model merging formulation and introduce fast dynamic merging of LoRA adapters to keep the general capabilities of the base model. Extensive experiments demonstrate that the proposed SLIM is comparable to the state-of-the-art PEFT approaches on the downstream tasks while achieving the leading performance in mitigating catastrophic forgetting., Comment: 11 pages, 6 figures, 4 tables
- Published
- 2024
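The core routing idea, choosing between a LoRA expert and a pass-through identity route, can be sketched with a single soft gate. The shapes, the scalar gate, and the initialization are illustrative assumptions; SLIM's actual router and clustering are more involved.

```python
import numpy as np

# Toy sketch of "LoRA adapter vs. identity" routing: a gate g blends the
# LoRA update into the frozen base layer. All shapes are illustrative.

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))                        # frozen base weight
A, B = rng.normal(size=(d, r)), np.zeros((r, d))   # LoRA factors, B = 0 at init

def forward(x, g):
    """g near 0 routes through the identity expert (base model only);
    g near 1 applies the LoRA expert on top of the base layer."""
    return x @ W + g * (x @ A @ B)

x = rng.normal(size=(1, d))
# With B initialised to zero the two routes coincide, as in standard LoRA.
assert np.allclose(forward(x, 0.0), forward(x, 1.0))
```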
5. Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights
- Author
Sumida, Ryuichi, Inoue, Koji, and Kawahara, Tatsuya
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
While Retrieval-Augmented Generation (RAG) has shown promise in enhancing long-term conversations, the increasing memory load as conversations progress degrades retrieval accuracy. Drawing on psychological insights, we propose LUFY, a simple yet effective method that focuses on emotionally arousing memories and retains less than 10% of the conversation. In the user experiment, participants interacted with three types of RAG chatbots, each for 2 hours over 4 sessions, marking the most extensive assessment of a chatbot's long-term capabilities to date -- more than four times longer than any existing benchmark. The results demonstrate that prioritizing arousing memories while forgetting the majority of the conversation significantly enhances user experience. This study pushes the frontier of long-term conversations and highlights the importance of forgetting unimportant parts of conversations. Code and Dataset: https://github.com/ryuichi-sumida/LUFY
- Published
- 2024
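The retention policy above, keep only the most arousing memories and forget the rest, can be sketched as a ranked prune. The scores and the keep budget here are illustrative stand-ins; LUFY's actual scoring combines several psychological cues.

```python
# Toy sketch of arousal-based forgetting for a RAG memory store.
# Scores and the keep_ratio are illustrative, not LUFY's actual values.

def prune_memory(memories, keep_ratio=0.1):
    """memories: list of (text, arousal_score); keep the top fraction."""
    ranked = sorted(memories, key=lambda m: m[1], reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.05, 0.4, 0.6, 0.15, 0.7]
mem = [(f"turn-{i}", s) for i, s in enumerate(scores)]
kept = prune_memory(mem, keep_ratio=0.2)
print([t for t, _ in kept])  # ['turn-1', 'turn-3']
```

Retrieval then runs only over the pruned store, which is what keeps accuracy from degrading as the conversation grows.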
6. Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
- Author
Rashid, Md Rafi Ur, Liu, Jing, Koike-Akino, Toshiaki, Mehnaz, Shagufta, and Wang, Ye
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Cryptography and Security
- Abstract
Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.
- Published
- 2024
7. 'They Just Forget about the Students': Growing Resilient Urban Farmers with a Research Practice Partnership
- Author
Marc T. Sager and Anthony J. Petrosino
- Abstract
A sustainable transdisciplinary research network was established through a research practice partnership (RPP) between an urban farm, faculty and staff from a Historically Black College (HBC), and researchers at a medium-sized private university. We investigate student-worker resilience at this urban farm situated on the HBC campus, drawing on literature that explores tensions between informal learning environments and formal spaces, equitable food systems and farming systems, as well as the resilience of farm work, and which is grounded in critical food systems education theory. Utilizing a participatory design approach, we conducted semi-structured interviews and deductively analyzed the data. The research questions guiding this paper are: (1) What topics of discussion are most important to the student-workers and staff working on an urban farm? (2) How do student-workers and college staff members perceive and experience resilience on an urban farm? We found that what participants on an urban farm discuss, relating to their experiences, includes (1) how participants were eager to "engage" with the local community, (2) how participants demonstrated "resilience" while working on the urban farm, (3) how "power dynamics" played a pivotal role informing the direction of the urban farm, (4) how participants consider community "access" to healthy foods an important mission for the farm, and (5) how the college acted as a power-wielding entity, perpetuating its "privilege" over the farmers and the farm operations. These findings have the potential to enable community organizing spaces to promote resilience for their volunteers and workers, and for urban farms to partner with their community to promote the mission of increasing access to healthy and affordable food options.
- Published
- 2024
8. Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
- Author
Ji, Jiabao, Liu, Yujian, Zhang, Yang, Liu, Gaowen, Kompella, Ramana Rao, Liu, Sijia, and Chang, Shiyu
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework with a combination of two objectives - maximizing the prediction loss on the forget documents while minimizing that on the retain documents, which suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that such reversed objectives would naturally resolve both aforementioned challenges while significantly improving the training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the ToFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality. Our code will be publicly available at https://github.com/UCSB-NLP-Chang/ULD., Comment: 21 pages, 11 figures
- Published
- 2024
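The logit-difference operation at the heart of ULD is simple to sketch: subtract a scaled copy of the assistant's logits (trained to remember the forget documents) from the target model's logits. The toy logits and the scaling factor `alpha` are illustrative assumptions.

```python
import numpy as np

# Sketch of deriving unlearned logits as a difference between the target
# model and an assistant that remembers the forget set. Values illustrative.

def unlearned_logits(target, assistant, alpha=1.0):
    return target - alpha * assistant

target = np.array([2.0, 1.0, 0.5])      # target LLM favours token 0
assistant = np.array([3.0, 0.0, 0.0])   # assistant strongly recalls token 0
out = unlearned_logits(target, assistant)
print(out.argmax())  # 1  (the forget-associated token 0 is suppressed)
```

Because only the small assistant is trained, the expensive target model never needs destructive gradient-ascent updates, which is where the training-efficiency gain comes from.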
9. Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
- Author
Li, Baicheng, Yan, Zike, Wu, Dong, Jiang, Hanqing, and Zha, Hongbin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention due to the expressive representation power and the innovative paradigm of continual learning. However, deploying such a system within a dynamic environment has not been well-studied. Such challenges are intractable even for conventional algorithms since observations from different views with dynamic objects involved break the geometric and photometric consistency, whereas this consistency lays the foundation for jointly optimizing the camera pose and the map parameters. In this paper, we best exploit the characteristics of continual learning and propose a novel SLAM framework for dynamic environments. While past efforts have been made to avoid catastrophic forgetting by exploiting an experience replay strategy, we view forgetting as a desirable characteristic. By adaptively controlling the replayed buffer, the ambiguity caused by moving objects can be easily alleviated through forgetting. We restrain the replay of the dynamic objects by introducing a continually-learned classifier for dynamic object identification. The iterative optimization of the neural map and the classifier notably improves the robustness of the SLAM system under a dynamic environment. Experiments on challenging datasets verify the effectiveness of the proposed framework.
- Published
- 2024
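The "forgetting as a feature" mechanism reduces to filtering the replay buffer: only observations the classifier deems static are replayed, so dynamic objects fade from the neural map. The classifier here is a stand-in predicate, not the paper's learned one.

```python
# Sketch of replay-buffer control: replay static observations only, so the
# moving objects are forgotten by the map. is_static is a stand-in for the
# paper's continually-learned dynamic-object classifier.

def replay_batch(buffer, is_static, k=2):
    """buffer: chronological list of observations; return the k newest
    static ones for replay."""
    static = [o for o in buffer if is_static(o)]
    return static[-k:]

buffer = ["wall", "car_moving", "table", "person_walking", "floor"]
batch = replay_batch(buffer,
                     is_static=lambda o: "moving" not in o
                                         and "walking" not in o,
                     k=2)
print(batch)  # ['table', 'floor']
```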
10. Learn and Don't Forget: Adding a New Language to ASR Foundation Models
- Author
Qian, Mengjie, Tang, Siyuan, Ma, Rao, Knill, Kate M., and Gales, Mark J. F.
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Computation and Language; Computer Science - Machine Learning; Computer Science - Sound
- Abstract
Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code tuning, which trains only the language code; soft prompt tuning, which trains prepended tokens; and LoRA, where a small set of additional parameters is optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the language capabilities are maintained but at the cost of performance in the new language., Comment: Proceedings of Interspeech
- Published
- 2024
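The EWC compromise mentioned above penalizes movement of parameters that were important for the original languages, as measured by their Fisher information. A minimal numeric sketch, with illustrative values rather than anything from the paper:

```python
import numpy as np

# Sketch of the Elastic Weight Consolidation (EWC) penalty: parameters with
# large Fisher values (important for the old languages) are anchored to
# their previous values while fine-tuning on a new language.

def ewc_loss(task_loss, theta, theta_old, fisher, lam=1.0):
    return task_loss + 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0])
fisher = np.array([10.0, 0.1])   # first parameter matters for old languages
theta = np.array([1.5, -1.0])    # candidate update after new-language steps
print(ewc_loss(0.0, theta, theta_old, fisher))  # 1.3
```

Moving the high-Fisher parameter by 0.5 costs far more (2.5 before the 0.5 factor) than moving the low-Fisher one by 1.0 (0.1), which is exactly how EWC steers fine-tuning away from forgetting.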
11. To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Author
Tian, Bozhong, Liang, Xiaozhuan, Cheng, Siyuan, Liu, Qingbin, Wang, Mengru, Sui, Dianbo, Chen, Xi, Chen, Huajun, and Zhang, Ningyu
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning; Computer Science - Multimedia
- Abstract
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material. Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge. However, current unlearning paradigms are mired in vague forgetting boundaries, often erasing knowledge indiscriminately. In this work, we introduce KnowUnDo, a benchmark containing copyrighted content and user privacy domains to evaluate if the unlearning process inadvertently erases essential knowledge. Our findings indicate that existing unlearning methods often suffer from excessive unlearning. To address this, we propose a simple yet effective method, MemFlex, which utilizes gradient information to precisely target and unlearn sensitive parameters. Experimental results show that MemFlex is superior to existing methods in both precise knowledge unlearning and general knowledge retaining of LLMs. Code and dataset are released at https://github.com/zjunlp/KnowUnDo., Comment: EMNLP 2024 Findings; Code and dataset are released at https://github.com/zjunlp/KnowUnDo
- Published
- 2024
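The gradient-guided localization idea can be sketched as masking: update only parameters whose gradient magnitude on the forget set exceeds a threshold, leaving everything else untouched. This is a hedged stand-in; MemFlex's actual selection rule may differ.

```python
import numpy as np

# Sketch of gradient-guided localized unlearning: ascend the forget loss
# only on parameters that are sensitive to the forget set. The threshold
# tau and step size lr are illustrative assumptions.

def localized_unlearn_step(theta, grad_forget, lr=0.1, tau=0.5):
    mask = np.abs(grad_forget) > tau          # sensitive parameters only
    return theta + lr * grad_forget * mask    # ascend on the forget loss

theta = np.zeros(4)
grad = np.array([0.9, 0.1, -0.7, 0.2])
print(localized_unlearn_step(theta, grad))  # [ 0.09  0.   -0.07  0.  ]
```

Restricting the update to the masked parameters is what limits collateral damage to unrelated (retain) knowledge, addressing the "excessive unlearning" the benchmark exposes.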
12. Forget but Recall: Incremental Latent Rectification in Continual Learning
- Author
Nguyen, Nghia D., Nguyen, Hieu Trung, Li, Ang, Pham, Hoang, Nguyen, Viet Anh, and Doan, Khoa D.
- Subjects
Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR. In a nutshell, ILR learns to propagate with correction (or rectify) the representation from the current trained DNN backward to the representation space of the old task, where performing predictive decisions is easier. This rectification process only employs a chain of small representation mapping networks, called rectifier units. Empirical experiments on several continual learning benchmarks, including CIFAR10, CIFAR100, and Tiny ImageNet, demonstrate the effectiveness and potential of this novel CL direction compared to existing representative CL methods.
- Published
- 2024
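The rectifier-unit chain can be sketched as a sequence of small maps carrying the newest backbone's representation back to an older task's space, where the old head still applies. The linear maps, sizes, and near-identity initialization are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of Incremental Latent Rectification: chain small "rectifier"
# maps to translate the current representation back to an old task's space.

rng = np.random.default_rng(0)
d = 4
# One small map per task transition: task t -> t-1 -> t-2 -> t-3.
rectifiers = [np.eye(d) + rng.normal(scale=0.1, size=(d, d))
              for _ in range(3)]

def rectify(z, upto):
    """Map representation z from the newest task back `upto` tasks."""
    for R in rectifiers[:upto]:
        z = z @ R
    return z

z_new = rng.normal(size=(1, d))
z_old = rectify(z_new, upto=2)   # representation in task t-2's space
print(z_old.shape)  # (1, 4)
```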
13. Don't Forget Too Much: Towards Machine Unlearning on Feature Level
- Author
Xu, Heng, Zhu, Tianqing, Zhou, Wanlei, and Zhao, Wei
- Subjects
Computer Science - Cryptography and Security
- Abstract
Machine unlearning enables pre-trained models to remove the effect of certain portions of training data. Previous machine unlearning schemes have mainly focused on unlearning a cluster of instances or all instances belonging to a specific class. These types of unlearning might have a significant impact on the model utility, and they may be inadequate for situations where we only need to unlearn features within instances, rather than the whole instances. Due to the different granularity, current unlearning methods can hardly achieve feature-level unlearning. To address the challenges of utility and granularity, we propose a refined granularity unlearning scheme referred to as "feature unlearning". We first explore two distinct scenarios based on whether the annotation information about the features is given: feature unlearning with known annotations and feature unlearning without annotations. Regarding unlearning with known annotations, we propose an adversarial learning approach to automatically remove effects about features. For unlearning without annotations, we initially enable the output of one model's layer to identify different pattern features using model interpretability techniques. We then filter features from instances based on these outputs' identifying ability, so that we can remove the feature impact via the filtered instances and a fine-tuning process. The effectiveness of our proposed approach is demonstrated through experiments involving diverse models on various datasets in different scenarios.
- Published
- 2024
14. Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
- Author
Vani, Ankit, Tung, Frederick, Oliveira, Gabriel L., and Sharifi-Noghabi, Hossein
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) does not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM's training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model's outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface., Comment: Published as a conference paper at ICML 2024. 9 pages main, 15 pages total including references and appendix
- Published
- 2024
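The standard SAM update the paper reinterprets can be sketched on a toy quadratic: perturb the weights along the normalized ascent direction, then step using the gradient taken at the perturbed point. The loss, rho, and lr below are illustrative.

```python
import numpy as np

# Minimal SAM step on L(w) = 0.5 * ||w||^2, whose gradient is simply w.
# Under the paper's view, the ascent perturbation "forgets" biases at w.

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # steepest-ascent perturbation
    g_perturbed = grad_fn(w + eps)               # gradient at perturbed point
    return w - lr * g_perturbed

grad = lambda w: w
w = np.array([1.0, -1.0])
w_next = sam_step(w, grad)
print(w_next)  # a descent step computed from the perturbed weights
```

The paper's proposed variant changes only where `eps` comes from, targeting biases exposed through the model's outputs rather than the steepest-ascent direction.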
15. Don't Forget to Connect! Improving RAG with Graph-based Reranking
- Author
Dong, Jialin, Fatemi, Bahare, Perozzi, Bryan, Yang, Lin F., and Tsitsulin, Anton
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Social and Information Networks
- Abstract
Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connections between documents? In this work, we seek to answer these two core questions about RAG generation. We introduce G-RAG, a reranker based on graph neural networks (GNNs) between the retriever and reader in RAG. Our method combines both connections between documents and semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG. G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint. Additionally, we assess the performance of PaLM 2 as a reranker and find it to significantly underperform G-RAG. This result emphasizes the importance of reranking for RAG even when using Large Language Models.
- Published
- 2024
16. Vergiss mein nicht: Ein Plädoyer für die psychotherapeutische Arbeit mit an Demenz erkrankten Personen [Forget Me Not: A Plea for Psychotherapeutic Work with People Living with Dementia]
- Author
Pargfrieder, Sonja
- Published
- 2024
17. Visual supports and informative material not to forget counselling on reproductive health in dialysis: a point of view
- Author
Chimenti, Giulia, Magli, Anna, Spanu, Giulia, Santagati, Giulia, Fois, Antioco, Njandjo, Linda, Popa, Cristina Adriana, Torreggiani, Massimo, and Piccoli, Giorgina Barbara
- Published
- 2024
18. “They forget that I’m a human being”—ward round communication with older patients living with frailty and informal caregivers: a qualitative study
- Author
Andersen, Lene Holst, Løfgren, Bo, Skipper, Mads, Krogh, Kristian, and Jensen, Rune Dall
- Published
- 2024
19. Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
- Author
Fan, Chongyu, Liu, Jiancheng, Hero, Alfred, and Liu, Sijia
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
- Abstract
The trustworthy machine learning (ML) community is increasingly recognizing the crucial need for models capable of selectively 'unlearning' data points after training. This leads to the problem of machine unlearning (MU), aiming to eliminate the influence of chosen data points on model performance, while still maintaining the model's utility post-unlearning. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting, ignoring the vital inquiry into which subset should be chosen to truly gauge the authenticity of unlearning performance. To tackle this issue, we introduce a new evaluative angle for MU from an adversarial viewpoint. We propose identifying the data subset that presents the most significant challenge for influence erasure, i.e., pinpointing the worst-case forget set. Utilizing a bi-level optimization principle, we amplify unlearning challenges at the upper optimization level to emulate worst-case scenarios, while simultaneously engaging in standard training and unlearning at the lower level, achieving a balance between data influence erasure and model utility. Our proposal offers a worst-case evaluation of MU's resilience and effectiveness. Through extensive experiments across different datasets (including CIFAR-10, 100, CelebA, Tiny ImageNet, and ImageNet) and models (including both image classifiers and generative models), we expose critical pros and cons in existing (approximate) unlearning strategies. Our results illuminate the complex challenges of MU in practice, guiding the future development of more accurate and robust unlearning algorithms. The code is available at https://github.com/OPTML-Group/Unlearn-WorstCase., Comment: Accepted by ECCV 2024
- Published
- 2024
20. Learning Symbolic Task Representation from a Human-Led Demonstration: A Memory to Store, Retrieve, Consolidate, and Forget Experiences
- Author
Buoncompagni, Luca and Mastrogiovanni, Fulvio
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence; Computer Science - Human-Computer Interaction; Computer Science - Logic in Computer Science; 68T40 (Primary), 68T20, 68T27, 68T30, 68T37, 05C72, 68Q32 (Secondary); I.2.4; I.2.6; E.1
- Abstract
We present a symbolic learning framework inspired by cognitive-like memory functionalities (i.e., storing, retrieving, consolidating and forgetting) to generate task representations to support high-level task planning and knowledge bootstrapping. We address a scenario involving a non-expert human, who performs a single task demonstration, and a robot, which online learns structured knowledge to re-execute the task based on experiences, i.e., observations. We consider a one-shot learning process based on non-annotated data to store an intelligible representation of the task, which can be refined through interaction, e.g., via verbal or visual communication. Our general-purpose framework relies on fuzzy Description Logic, which has been used to extend the previously developed Scene Identification and Tagging algorithm. In this paper, we exploit such an algorithm to implement cognitive-like memory functionalities employing scores that rank memorised observations over time based on simple heuristics. Our main contribution is the formalisation of a framework that can be used to systematically investigate different heuristics for bootstrapping hierarchical knowledge representations based on robot observations. Through an illustrative assembly task scenario, the paper presents the performance of our framework to discuss its benefits and limitations.
- Published
- 2024
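The cognitive-like memory functionalities above (store, retrieve, consolidate, forget) can be sketched with a score per observation that decays over time and is reinforced on retrieval. The exponential decay, retrieval bonus, and threshold are illustrative stand-ins for the paper's heuristics.

```python
# Toy sketch of score-ranked memory with consolidation and forgetting.
# decay, bonus, and threshold are illustrative heuristic parameters.

class Memory:
    def __init__(self, decay=0.5, bonus=1.0, threshold=0.3):
        self.items, self.decay = {}, decay
        self.bonus, self.threshold = bonus, threshold

    def store(self, key):
        self.items[key] = 1.0

    def retrieve(self, key):
        if key in self.items:
            self.items[key] += self.bonus   # consolidation on use
        return key in self.items

    def tick(self):
        """Decay all scores; forget items that fall below the threshold."""
        self.items = {k: s * self.decay for k, s in self.items.items()
                      if s * self.decay >= self.threshold}

m = Memory()
m.store("grasp-cube"); m.store("place-cube")
m.retrieve("grasp-cube")          # consolidated: score 2.0 vs 1.0
m.tick(); m.tick()                # 2.0 -> 1.0 -> 0.5; 1.0 -> 0.5 -> 0.25
print(sorted(m.items))  # ['grasp-cube']
```

Ranking memorised observations this way is exactly the kind of heuristic the framework is designed to let researchers swap in and compare.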
21. 'Don't forget to put the milk back!' Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Author
Mullen Jr, James F., Goyal, Prasoon, Piramuthu, Robinson, Johnston, Michael, Manocha, Dinesh, and Ghanadan, Reza
- Subjects
Computer Science - Robotics; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Home robots intend to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene -- especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect Dataset. Additionally, we conduct real world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect Dataset and code will be released to the public upon this paper's publication.
- Published
- 2024
22. Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
- Author
Bordt, Sebastian, Nori, Harsha, Rodrigues, Vanessa, Nushi, Besmira, and Caruana, Rich
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computation and Language
- Abstract
While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorization are often glossed over. In this work, we address this concern for tabular data. Specifically, we introduce a variety of different techniques to assess whether a language model has seen a tabular dataset during training. This investigation reveals that LLMs have memorized many popular tabular datasets verbatim. We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training. We find that LLMs perform better on datasets seen during training, indicating that memorization leads to overfitting. At the same time, LLMs show non-trivial performance on novel datasets and are surprisingly robust to data transformations. We then investigate the in-context statistical learning abilities of LLMs. While LLMs are significantly better than random at solving statistical classification problems, the sample efficiency of few-shot learning lags behind traditional statistical learning algorithms, especially as the dimension of the problem increases. This suggests that much of the observed few-shot performance on novel real-world datasets is due to the LLM's world knowledge. Overall, our results highlight the importance of testing whether an LLM has seen an evaluation dataset during pre-training. We release the https://github.com/interpretml/LLM-Tabular-Memorization-Checker Python package to test LLMs for memorization of tabular datasets., Comment: COLM camera ready
- Published
- 2024
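One family of the memorization probes described above can be sketched as a completion test: feed the model the first part of a canonical dataset file and check whether it reproduces the held-out continuation verbatim. The `model` callable and the split point are illustrative stand-ins, not the released package's actual API.

```python
# Sketch of a verbatim-memorization probe in the spirit of header/row
# completion tests. `model` is a stand-in callable: prefix -> completion.

def header_completion_test(model, csv_text, prefix_chars=40):
    """True if the model reproduces the held-out continuation verbatim."""
    prefix, continuation = csv_text[:prefix_chars], csv_text[prefix_chars:]
    return model(prefix).startswith(continuation[:20])

iris_head = "sepal_length,sepal_width,petal_length,petal_width,species\n"
memorizing_model = lambda p: iris_head[len(p):]   # perfectly recalls the file
print(header_completion_test(memorizing_model, iris_head))  # True
```

A model that has never seen the file has essentially no chance of matching the exact continuation, which is what makes verbatim completion a usable contamination signal.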
23. Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
- Author
-
Philippy, Fred, Haddadan, Shohreh, and Guo, Siwen
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
In NLP, zero-shot classification (ZSC) is the task of assigning labels to textual data without any labeled examples for the target classes. A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels. However, this approach faces certain challenges, particularly for languages with limited resources. In this paper, we propose an alternative solution that leverages dictionaries as a source of data for ZSC. We focus on Luxembourgish, a low-resource language spoken in Luxembourg, and construct two new topic relevance classification datasets based on a dictionary that provides various synonyms, word translations and example sentences. We evaluate the usability of our dataset and compare it with the NLI-based approach on two topic classification tasks in a zero-shot manner. Our results show that by using the dictionary-based dataset, the trained models outperform the ones following the NLI-based approach for ZSC. While we focus on a single low-resource language in this study, we believe that the efficacy of our approach can also transfer to other languages where such a dictionary is available., Comment: 3rd Annual Meeting of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL 2024)
- Published
- 2024
24. Don't Forget What I did?: Assessing Client Contributions in Federated Learning
- Author
-
Ghosh, Bishwamittra, Basu, Debabrota, Fu, Huazhu, Wang, Yuan, Kanagavelu, Renuga, Jiang, Jin Peng, Liu, Yong, Goh, Rick Siow Mong, and Wei, Qingsong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
Federated Learning (FL) is a collaborative machine learning (ML) approach in which multiple clients participate in training an ML model without exposing their private data. Fair and accurate assessment of client contributions is an important problem in FL, facilitating incentive allocation and encouraging diverse clients to participate in a unified model training. Existing methods for assessing client contribution adopt co-operative game-theoretic concepts, such as Shapley values, but under simplified assumptions. In this paper, we propose a history-aware game-theoretic framework, called FLContrib, to assess client contributions when a subset of (potentially non-i.i.d.) clients participate in each epoch of FL training. By exploiting the FL training process and the linearity of the Shapley value, we develop FLContrib, which yields a historical timeline of client contributions as FL training progresses over epochs. Additionally, to assess client contribution under a limited computational budget, we propose a scheduling procedure that considers a two-sided fairness criterion and performs the expensive Shapley value computation only in a subset of training epochs. In experiments, we demonstrate a controlled trade-off between the correctness and efficiency of client contributions assessed via FLContrib. To demonstrate the benefits of history-aware client contributions, we apply FLContrib to detect dishonest clients conducting data poisoning in FL training., Comment: Under submission
- Published
- 2024
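The Shapley-value machinery behind contribution assessment can be sketched in a few lines. This is a generic exact computation over all coalitions, not FLContrib itself (the paper's history-aware, scheduled variant is not reproduced here); the `utility` function and the toy per-client gains are hypothetical stand-ins for "accuracy of a model trained on this coalition's data".

```python
from itertools import combinations
from math import factorial

def shapley_values(clients, utility):
    """Exact Shapley value of each client's contribution.

    `utility` maps a frozenset of clients to a real number.  The loop is
    exponential in the number of clients, so this is only feasible for
    small federations; FLContrib's point is to avoid exactly this cost.
    """
    n = len(clients)
    values = {}
    for c in clients:
        others = [x for x in clients if x != c]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (utility(s | {c}) - utility(s))
        values[c] = total
    return values

# Toy additive utility: each client contributes a fixed accuracy gain,
# so the Shapley values should recover exactly those gains.
gains = {"A": 0.05, "B": 0.02, "C": 0.10}
utility = lambda coalition: sum(gains[c] for c in coalition)
contributions = shapley_values(list(gains), utility)
```

For an additive utility the Shapley value of each client equals its individual gain, and the values sum to the grand coalition's utility (the efficiency property), which makes the sketch easy to sanity-check.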
25. Elephants Never Forget: Testing Language Models for Memorization of Tabular Data
- Author
-
Bordt, Sebastian, Nori, Harsha, and Caruana, Rich
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorization are often glossed over. In this work, we address this concern for tabular data. Starting with simple qualitative tests for whether an LLM knows the names and values of features, we introduce a variety of different techniques to assess the degrees of contamination, including statistical tests for conditional distribution modeling and four tests that identify memorization. Our investigation reveals that LLMs are pre-trained on many popular tabular datasets. This exposure can lead to invalid performance evaluation on downstream tasks because the LLMs have, in effect, been fit to the test set. Interestingly, we also identify a regime where the language model reproduces important statistics of the data, but fails to reproduce the dataset verbatim. On these datasets, although seen during training, good performance on downstream tasks might not be due to overfitting. Our findings underscore the need for ensuring data integrity in machine learning tasks with LLMs. To facilitate future research, we release an open-source tool that can perform various tests for memorization (https://github.com/interpretml/LLM-Tabular-Memorization-Checker)., Comment: Table Representation Learning Workshop at NeurIPS 2023
- Published
- 2024
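One of the memorization tests described above, row completion, is easy to sketch: show the model the preceding rows plus the start of the next row and count how often it reproduces the remainder verbatim. This is a toy harness, not the linked tool; `complete_fn` stands in for an actual LLM call, and the memorizing stub below is a hypothetical "model" that has the table by heart.

```python
def row_completion_test(rows, complete_fn, n_prefix_rows=3, n_trials=3):
    """Toy verbatim 'row completion' memorization test.

    Reveal the preceding rows and the first field of the target row;
    score how often `complete_fn` returns the rest of the row exactly.
    High accuracy is evidence the dataset was seen during pre-training.
    """
    hits = 0
    for i in range(n_trials):
        target = ",".join(rows[n_prefix_rows + i])
        context = "\n".join(",".join(r) for r in rows[: n_prefix_rows + i])
        cut = target.index(",") + 1          # reveal only the first field
        if complete_fn(context + "\n" + target[:cut]) == target[cut:]:
            hits += 1
    return hits / n_trials

def make_memorizing_stub(rows):
    # A stand-in "model" that has memorized the table verbatim.
    table = [",".join(r) for r in rows]
    def complete(prompt):
        partial = prompt.rsplit("\n", 1)[-1]     # the partial final row
        for line in table:
            if line.startswith(partial) and line != partial:
                return line[len(partial):]       # memorized remainder
        return ""                                # row never seen
    return complete

# Small synthetic table with unique first fields.
rows = [[str(i), f"val{i}", f"label{i % 2}"] for i in range(8)]
```

A memorizing model scores 1.0 on this test, while a model that has never seen the table scores 0.0, which is the separation the real tool looks for.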
26. Do not forget the electrons: Extending moderately-sized nuclear networks for multidimensional hydrodynamic codes
- Author
-
García-Senz, Domingo, Cabezón, Rubén M., Reichert, Moritz, Lechuga, Axel S., Escartín, José A., Psaltis, Athanasios, Arcones, Almudena, and Thielemann, Friedrich-Karl
- Subjects
Astrophysics - Solar and Stellar Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Instrumentation and Methods for Astrophysics ,Nuclear Theory - Abstract
We present here an extended nuclear network, with 90 species, designed for being coupled with hydrodynamic simulations, which includes neutrons, protons, electrons, positrons, and the corresponding neutrino and anti-neutrino emission. This network is also coupled with temperature, making it extremely robust and, together with its size, unique of its kind. The inclusion of electron captures on free protons makes the network very appropriate for multidimensional studies of Type Ia supernova explosions, especially when the exploding object is a massive white dwarf. The results obtained with the proposed medium-sized network compare fairly well, to a few percent, with those computed with the extended network WinNet (> 2000 isotopes) in scenarios reproducing the gross physical conditions of current Type Ia supernova explosion models. In those cases where the carbon and oxygen fuel ignites at high density, the high-temperature plateau typical of the nuclear statistical equilibrium regime is well defined and stable, allowing large integration time steps. We show that the inclusion of electron captures on free protons substantially improves the estimation of the electron fraction of the mixture. Therefore, the pressure is better determined than in networks where electron captures are excluded, which will ultimately lead to more reliable hydrodynamic models. Explosive combustion of helium at low density, occurring near the surface layer of a white dwarf, is also better described with the proposed network, which gives nuclear energy generation rates much closer to WinNet than typical reduced alpha networks., Comment: 19 pages, 21 Figures and 5 Tables. Accepted for publication in A&A
- Published
- 2024
- Full Text
- View/download PDF
27. Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
- Author
-
Mao, Xin, Li, Feng-Lin, Xu, Huimin, Zhang, Wei, and Luu, Anh Tuan
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
While Reinforcement Learning from Human Feedback (RLHF) significantly enhances the generation quality of Large Language Models (LLMs), recent studies have raised concerns regarding the complexity and instability associated with the Proximal Policy Optimization (PPO) algorithm, proposing a series of order-based calibration methods as viable alternatives. This paper delves further into current order-based methods, examining their inefficiencies in utilizing reward values and addressing misalignment issues. Building upon these findings, we propose a novel Value-based CaliBration (VCB) method to better align LLMs with human preferences. Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets, providing impressive generalizability, robustness, and stability in diverse settings., Comment: 19 pages, Under review
- Published
- 2024
28. Can we forget how we learned? Doxastic redundancy in iterated belief revision
- Author
-
Liberatore, Paolo
- Subjects
Computer Science - Artificial Intelligence - Abstract
How information was acquired may become irrelevant. An obvious case is when something is confirmed many times. In terms of iterated belief revision, a specific revision may become irrelevant in the presence of others. Simple repetitions are an example, but not the only case where this happens. Sometimes a revision becomes redundant even when no other revision is equal to it, or even when no other implies it. A necessary and sufficient condition is given for the redundancy of the first revision in a sequence of lexicographic revisions. The problem is coNP-complete even with only two propositional revisions. Complexity is the same in the Horn case, but only with an unbounded number of revisions: it becomes polynomial with two revisions. Lexicographic revisions are relevant not only in themselves, but also because sequences of them are the most compact of the common mechanisms used to represent the state of an iterated revision process. Shortening sequences of lexicographic revisions therefore shortens the most compact representations of iterated belief revision states., Comment: formerly part of arXiv:2305.09200
- Published
- 2024
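The redundancy notion above can be checked by brute force on small propositional languages: the first revision of a sequence is redundant iff dropping it leaves the induced plausibility preorder over worlds unchanged. This sketch uses one standard convention for lexicographic revision (later revisions take priority); it is a semantic checker, not the paper's necessary-and-sufficient syntactic condition, and the example formulas are hypothetical.

```python
from itertools import product

def lex_key(revisions):
    """Sort key inducing the plausibility preorder after a sequence of
    lexicographic revisions (later revisions take priority; a world
    satisfying a formula is strictly more plausible than one that does
    not).  Each revision is a predicate over an assignment dict."""
    return lambda world: tuple(not f(world) for f in reversed(revisions))

def same_preorder(revs_a, revs_b, variables):
    worlds = [dict(zip(variables, bits))
              for bits in product([False, True], repeat=len(variables))]
    ka, kb = lex_key(revs_a), lex_key(revs_b)
    return all((ka(u) < ka(v)) == (kb(u) < kb(v))
               for u in worlds for v in worlds)

def first_is_redundant(revisions, variables):
    # Redundant iff the final epistemic state is the same without it.
    return same_preorder(revisions, revisions[1:], variables)

a = lambda w: w["a"]
b = lambda w: w["b"]
```

On this semantics, a simple repetition `[a, a]` makes the first revision redundant, and so does `[a, b, a]` (the later `a` masks the first), while in `[a, b]` the first revision still matters as a tiebreaker.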
29. What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement
- Author
-
Jin, Xisen and Ren, Xiang
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Language models deployed in the wild make errors. However, simply updating the model with the corrected error instances causes catastrophic forgetting -- the updated model makes errors on instances learned during the instruction tuning or upstream training phase. Randomly replaying upstream data yields unsatisfactory performance and often comes with high variance and poor controllability. To this end, we try to forecast which upstream examples will be forgotten due to a model update, improving the controllability of the replay process and its interpretability. We train forecasting models given a collection of online learned examples and corresponding forgotten upstream pre-training examples. We propose a partially interpretable forecasting model based on the observation that changes in pre-softmax logit scores of pretraining examples resemble those of online learned examples, which performs decently on BART but fails on T5 models. We further show a black-box classifier based on inner products of example representations achieves better forecasting performance over a series of setups. Finally, we show that we reduce forgetting of upstream pretraining examples by replaying examples that are forecasted to be forgotten, demonstrating the practical utility of forecasting example forgetting., Comment: To appear at ICML 2024 (Spotlight)
- Published
- 2024
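The inner-product idea can be reduced to a bare-bones sketch: score each upstream example by how strongly its representation aligns with the representations of the error-correcting update examples, and flag the high scorers as likely forgetting victims. The paper trains a proper black-box classifier on such features; the thresholding and the toy representations below are assumptions for illustration.

```python
import numpy as np

def forgetting_scores(upstream_reps, online_reps):
    """Score each upstream example by the maximum inner product between
    its representation and those of the online-learned (error-fixing)
    examples.  A minimal stand-in for the paper's trained black-box
    classifier; the representation vectors here are hypothetical."""
    return (upstream_reps @ online_reps.T).max(axis=1)

def forecast_forgotten(upstream_reps, online_reps, threshold=0.5):
    # Upstream examples aligned with the update are forecast as forgotten.
    return forgetting_scores(upstream_reps, online_reps) > threshold

# Two upstream examples: one aligned with the online update, one orthogonal.
upstream = np.array([[1.0, 0.0], [0.0, 1.0]])
online = np.array([[0.9, 0.1]])
flags = forecast_forgotten(upstream, online)
```

Examples flagged this way would then be prioritized for replay, which is how the paper turns the forecast into reduced forgetting.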
30. Gesichtsschmerzen: Psychosoziale Aspekte nicht vergessen [Facial pain: do not forget psychosocial aspects]
- Author
-
Guth, Anna-Lena, Liesering-Latta, Eva, Weiß, Susanne, and Dresler, Thomas
- Published
- 2024
- Full Text
- View/download PDF
31. Do Not Forget about Me, Do Not Forget about You. Usability of a Mobile App for Professional Identity Formation
- Author
-
Silvia Lizett Olivares-Olivares, Miriam Lizzeth Turrubiates Corolla, Juan Pablo Nigenda Alvarez, Natalia Mejía Gaviria, Mariana Lema-Velez, Miguel Angel Villarreal Rodríguez, Luis Carlos Franco Ayala, Elena María Trujillo Maza, Isabel Barriga Cosmelli, and Klaus Puschel Illanes
- Abstract
Purpose: Professional Identity Formation is the dynamic evolution to "think, act and feel" to become part of a professional community. This document presents the development and the study that aimed to assess the usability of an m-Learning Identity App (MLIA) focused on the formation of professional identity among undergraduate medical students. Design/methodology/approach: MLIA development included four phases: conceptual, prototype, pilot and implementation, before further deployment. The conceptual model was designed by eight faculty members from three Latin American universities. The prototype was developed and tested with stakeholders. The pilot was performed during 5 weeks before the implementation. Cross-sectional data collected during implementation from 138 medical students who completed a survey to assess the usability of MLIA are presented. During deployment, 977 posts were made on Professional Identity Formation, and examples of these posts are presented. Findings: The prototype and pilot phases demanded improvements. The survey explored (1) Familiarity, (2) Perceived ease of use, (3) Perceived usefulness for Professional Identity Formation, (4) Satisfaction, (5) Intention to reuse, (6) Digital aesthetics and (7) Safety. Results from the usability assessment suggest that students perceived MLIA as a secure space with positive aesthetics and ease of use. Research limitations/implications: Important limitations of the present study include, firstly, that it does not provide information on the effectiveness of the MLIA in shaping professional identity in medical students; it focuses exclusively on its development (conceptual model, prototype, pilot and implementation) and usability. Secondly, the study design did not consider a control group and, therefore, does not provide information on how the App compares with other strategies addressing self-reflection and sharing of meaningful experiences related to professional identity.
Originality/value: MLIA introduces a different approach to education, simulating a secure, easy-to-use social media platform with a friendly interface in a safe environment to share academic and motivational moments, transitioning from being to becoming a professional.
- Published
- 2024
- Full Text
- View/download PDF
32. Don't Forget the Tasks: Why Formative Tasks Are Key to Deliberation
- Author
-
Bonnie Lewis, Kathy Swan, and Ryan M. Crowley
- Abstract
Deliberation and inquiry can go hand-in-hand. Inquiry-based learning calls on teachers to facilitate student-led discovery, something that can only happen when students ask questions and weigh possible answers before settling on a plausible and evidentiary answer. Teaching through inquiry is about setting students up to wrestle with the issue at hand using sources so that they can communicate their conclusions. For this outcome to happen, deliberation must be front and center in the progression from compelling question to summative performance task. In this article, the authors discuss how a sequence of tasks supports meaningful deliberation within an inquiry. Specifically, they look at an upper elementary inquiry that features a cost-benefit analysis on a dam project in Tennessee.
- Published
- 2024
33. BrainWash: A Poisoning Attack to Forget in Continual Learning
- Author
-
Abbasi, Ali, Nooralinejad, Parsa, Pirsiavash, Hamed, and Kolouri, Soheil
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
Continual learning has gained substantial attention within the deep learning community, offering promising solutions to the challenging problem of sequential learning. Yet, a largely unexplored facet of this paradigm is its susceptibility to adversarial attacks, especially with the aim of inducing forgetting. In this paper, we introduce "BrainWash," a novel data poisoning method tailored to impose forgetting on a continual learner. By adding the BrainWash noise to a variety of baselines, we demonstrate how a trained continual learner can be induced to forget its previously learned tasks catastrophically, even when using these continual learning baselines. An important feature of our approach is that the attacker requires no access to previous tasks' data and is armed merely with the model's current parameters and the data belonging to the most recent task. Our extensive experiments highlight the efficacy of BrainWash, showcasing degradation in performance across various regularization-based continual learning methods.
- Published
- 2023
34. When did we forget we were playing? Failure, play, and possibility in sport & clinical life
- Author
-
Merson, Molly
- Published
- 2024
- Full Text
- View/download PDF
35. Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget
- Author
-
Jin, Jiankai, Chuengsatiansup, Chitchanok, Murray, Toby, Rubinstein, Benjamin I. P., Yarom, Yuval, and Ohrimenko, Olga
- Subjects
Computer Science - Cryptography and Security - Abstract
Current implementations of differentially-private (DP) systems either lack support to track the global privacy budget consumed on a dataset, or fail to faithfully maintain the state continuity of this budget. We show that failure to maintain a privacy budget enables an adversary to mount replay, rollback and fork attacks - obtaining answers to many more queries than what a secure system would allow. As a result the attacker can reconstruct secret data that DP aims to protect - even if DP code runs in a Trusted Execution Environment (TEE). We propose ElephantDP, a system that aims to provide the same guarantees as a trusted curator in the global DP model would, albeit set in an untrusted environment. Our system relies on a state continuity module to provide protection for the privacy budget and a TEE to faithfully execute DP code and update the budget. To provide security, our protocol makes several design choices including the content of the persistent state and the order between budget updates and query answers. We prove that ElephantDP provides liveness (i.e., the protocol can restart from a correct state and respond to queries as long as the budget is not exceeded) and DP confidentiality (i.e., an attacker learns about a dataset as much as it would from interacting with a trusted curator). Our implementation and evaluation of the protocol use Intel SGX as a TEE to run the DP code and a network of TEEs to maintain state continuity. Compared to an insecure baseline, we observe 1.1-3.2× overheads and lower relative overheads for complex DP queries., Comment: In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS 2024)
- Published
- 2024
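The ordering between budget updates and query answers mentioned above is the crux, and it can be illustrated in a few lines: debit and persist the budget *before* releasing the answer, so a crash, rollback, or replay can never obtain extra answers against already-spent budget. This toy ledger omits everything that makes ElephantDP interesting (the TEE and the state-continuity module); `storage` is a plain dict standing in for durable state.

```python
import math
import random

class PrivacyBudgetLedger:
    """Toy global privacy-budget tracker in the spirit of ElephantDP."""

    def __init__(self, total_epsilon, storage):
        self.storage = storage               # stands in for durable state
        self.storage.setdefault("spent", 0.0)
        self.total = total_epsilon

    def answer(self, query_fn, epsilon):
        if self.storage["spent"] + epsilon > self.total:
            raise RuntimeError("privacy budget exceeded")
        self.storage["spent"] += epsilon     # 1) persist the debit first ...
        return query_fn(epsilon)             # 2) ... then release the answer

def noisy_count(data, epsilon, rng=random.Random(0)):
    # Laplace mechanism for a counting query (sensitivity 1),
    # via inverse-CDF sampling from a uniform draw.
    u = rng.random() - 0.5
    return len(data) - math.copysign(math.log(1 - 2 * abs(u)), u) / epsilon
```

With a total budget of 1.0, two queries at epsilon 0.4 succeed and leave `spent == 0.8` in durable state; a third is refused before any answer is computed.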
36. How to Forget Clients in Federated Online Learning to Rank?
- Author
-
Wang, Shuyi, Liu, Bing, and Zuccon, Guido
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Data protection legislation like the European Union's General Data Protection Regulation (GDPR) establishes the "right to be forgotten": a user (client) can request contributions made using their data to be removed from learned models. In this paper, we study how to remove the contributions made by a client participating in a Federated Online Learning to Rank (FOLTR) system. In a FOLTR system, a ranker is learned by aggregating local updates to the global ranking model. Local updates are learned in an online manner at a client-level using queries and implicit interactions that have occurred within that specific client. By doing so, each client's local data is not shared with other clients or with a centralised search service, while at the same time clients can benefit from an effective global ranking model learned from contributions of each client in the federation. In this paper, we study an effective and efficient unlearning method that can remove a client's contribution without compromising the overall ranker effectiveness and without needing to retrain the global ranker from scratch. A key challenge is how to measure whether the model has unlearned the contributions from the client c* that has requested removal. For this, we instruct c* to perform a poisoning attack (adding noise to this client's updates) and then we measure whether the impact of the attack is lessened when the unlearning process has taken place. Through experiments on four datasets, we demonstrate the effectiveness and efficiency of the unlearning strategy under different combinations of parameter settings., Comment: Accepted in ECIR 2024
- Published
- 2024
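The poisoning-based evaluation idea above can be sketched numerically: have client c* submit a noised update, then check that the poison's footprint on the global model shrinks once c* is unlearned. Everything here is a deliberate simplification: the "model" is a vector, aggregation is a plain mean, and "unlearning" is naive re-aggregation without the client (the baseline the paper improves on, not its method).

```python
import numpy as np

def aggregate(updates):
    # FedAvg-style aggregation of client updates into a global ranker.
    return np.mean(list(updates.values()), axis=0)

def unlearn(updates, client):
    # Retrain-free stand-in: re-aggregate without the client's update.
    return aggregate({c: u for c, u in updates.items() if c != client})

def poison_footprint(updates, c_star, clean_update):
    """Measure the poisoning attack's impact before and after unlearning,
    as distance to the global model built from all-clean updates."""
    clean_global = aggregate({**updates, c_star: clean_update})
    before = np.linalg.norm(aggregate(updates) - clean_global)
    after = np.linalg.norm(unlearn(updates, c_star) - clean_global)
    return before, after

updates = {"c1": np.ones(4), "c2": np.ones(4),
           "c_star": np.ones(4) + 10.0}      # c_star adds poisoning noise
before, after = poison_footprint(updates, "c_star", np.ones(4))
```

A successful unlearning step should make `after` smaller than `before`, which is exactly the signal the paper's measurement relies on.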
37. Divide and not forget: Ensemble of selectively trained experts in Continual Learning
- Author
-
Rypeść, Grzegorz, Cygert, Sebastian, Khan, Valeriya, Trzciński, Tomasz, Zieliński, Bartosz, and Twardowski, Bartłomiej
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increases the computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects a single expert, the one best suited to a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. Extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning., Comment: Accepted for ICLR 2024 (main track), code is available at: https://github.com/grypesc/SEED
- Published
- 2024
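The distribution-similarity selection step can be sketched with diagonal Gaussians and a symmetric KL divergence. This is a stand-in for SEED's actual selection rule, not a reproduction of it: the expert structure, the choice of symmetric KL, and the feature extractor that would produce `task_feats` are all assumptions.

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    # KL divergence between diagonal Gaussians N(m1, v1) and N(m2, v2).
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def select_expert(experts, task_feats):
    """SEED-style selection sketch: pick the expert whose class Gaussians
    best match the new task's feature distribution.  `experts` maps
    names to lists of (mean, var) class distributions."""
    m = task_feats.mean(axis=0)
    v = task_feats.var(axis=0) + 1e-6
    def distance(classes):
        return min(kl_diag_gauss(m, v, mu, var) + kl_diag_gauss(mu, var, m, v)
                   for mu, var in classes)
    return min(experts, key=lambda name: distance(experts[name]))

# Two experts with one class Gaussian each; task data near expert_0's class.
experts = {
    "expert_0": [(np.array([5.0, 5.0]), np.array([1.0, 1.0]))],
    "expert_1": [(np.array([-5.0, -5.0]), np.array([1.0, 1.0]))],
}
rng = np.random.default_rng(0)
task_feats = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(200, 2))
```

The selected expert would then be the only one fine-tuned on the new task, which is how SEED keeps the rest of the ensemble stable.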
38. Evidence of an active role of dreaming in emotional memory processing shows that we dream to forget.
- Author
-
Zhang, Jing, Pena, Andres, Delano, Nicole, Sattari, Negin, Shuster, Alessandra, Baker, Fiona, Simon, Katharine, and Mednick, Sara
- Subjects
Humans ,Dreams ,Memory ,Emotions ,Sleep - Abstract
Dreaming is a universal human behavior that has inspired searches for meaning across many disciplines including art, psychology, religion, and politics, yet its function remains poorly understood. Given the suggested role of sleep in emotional memory processing, we investigated whether reported overnight dreaming and dream content are associated with sleep-dependent changes in emotional memory and reactivity, and whether dreaming plays an active or passive role. Participants completed an emotional picture task before and after a full night of sleep and they recorded the presence and content of their dreams upon waking in the morning. The results replicated the emotional memory trade-off (negative images maintained at the cost of neutral memories), but only in those who reported dreaming (Dream-Recallers), and not in Non-Dream-Recallers. Results also replicated sleep-dependent reductions in emotional reactivity, but only in Dream-Recallers, not in Non-Dream-Recallers. Additionally, the more positive the dream report, the more positive the next-day emotional reactivity is compared to the night before. These findings implicate an active role for dreaming in overnight emotional memory processing and suggest a mechanistic framework whereby dreaming may enhance salient emotional experiences via the forgetting of less relevant information.
- Published
- 2024
39. Continual Learning: Forget-free Winning Subnetworks for Video Representations
- Author
-
Kang, Haeyong, Yoon, Jaehong, Hwang, Sung Ju, and Yoo, Chang D.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios. In Few-Shot Class Incremental Learning (FSCIL), a variation of WSN referred to as the Soft subnetwork (SoftNet) is designed to prevent overfitting when the data samples are scarce. Furthermore, the sparse reuse of WSN weights is considered for Video Incremental Learning (VIL). The use of Fourier Subneural Operator (FSO) within WSN is considered. It enables compact encoding of videos and identifies reusable subnetworks across varying bandwidths. We have integrated FSO into different architectural frameworks for continual learning, including VIL, TIL, and FSCIL. Our comprehensive experiments demonstrate FSO's effectiveness, significantly improving task performance at various convolutional representational levels. Specifically, FSO enhances higher-layer performance in TIL and FSCIL and lower-layer performance in VIL., Comment: arXiv admin note: substantial text overlap with arXiv:2303.14962, arXiv:2306.11305
- Published
- 2023
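The LTH-style subnetwork selection underlying WSN reduces, in its simplest form, to a magnitude-based binary mask over fixed dense weights: each task claims its own mask, so later tasks do not overwrite weights earlier tasks rely on. This sketch shows only that masking step, under assumed hyperparameters; the FSO and the video-specific machinery are not represented.

```python
import numpy as np

def winning_subnetwork_mask(weights, sparsity):
    """Binary mask keeping the top-(1 - sparsity) fraction of weights by
    magnitude.  Ties at the threshold may keep a few extra weights."""
    k = int(round((1.0 - sparsity) * weights.size))
    if k == 0:
        return np.zeros(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return np.abs(weights) >= threshold

w = np.array([[0.1, -0.5],
              [2.0, 0.05]])
mask = winning_subnetwork_mask(w, sparsity=0.5)   # keep the 2 largest
```

At inference for a given task, the network would use `w * mask` for that task's mask, leaving the unmasked dense weights untouched for reuse by other tasks.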
40. Don't forget private retrieval: distributed private similarity search for large language models
- Author
-
Zyskind, Guy, South, Tobin, and Pentland, Alex
- Subjects
Computer Science - Information Retrieval - Abstract
While the flexible capabilities of large language models (LLMs) allow them to answer a range of queries based on existing learned knowledge, information retrieval to augment generation is an important tool to allow LLMs to answer questions on information not included in pre-training data. Such private information is increasingly being generated in a wide array of distributed contexts by organizations and individuals. Performing such information retrieval using neural embeddings of queries and documents leaks information about queries and database content unless both are stored locally. We present Private Retrieval Augmented Generation (PRAG), an approach that uses multi-party computation (MPC) to securely transmit queries to a distributed set of servers containing a privately constructed database to return top-k and approximate top-k documents. This is a first-of-its-kind approach to dense information retrieval that ensures no server observes a client's query or can see the database content. The approach introduces a novel MPC-friendly protocol for inverted file approximate search (IVF) that allows for fast document search over distributed and private data in sublinear communication complexity. This work presents new avenues through which data for use in LLMs can be accessed and used without needing to centralize or forgo privacy.
- Published
- 2023
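The IVF search that PRAG makes MPC-friendly is, in plaintext, a two-stage approximate top-k: cluster document embeddings into buckets, then score only the buckets whose centroids are closest to the query. This is the plaintext skeleton only; the deterministic farthest-point initialization and inner-product clustering are simplifications chosen for a reproducible example, and none of the MPC machinery is shown.

```python
import numpy as np

def build_ivf(docs, n_lists=2, n_iters=5):
    """Tiny inverted-file (IVF) index over unit-norm doc embeddings:
    deterministic farthest-point init, then a few Lloyd-style rounds
    using inner-product similarity."""
    centroids = [docs[0]]
    while len(centroids) < n_lists:
        sims = np.max(docs @ np.array(centroids).T, axis=1)
        centroids.append(docs[np.argmin(sims)])
    centroids = np.array(centroids)
    for _ in range(n_iters):
        assign = np.argmax(docs @ centroids.T, axis=1)
        for j in range(n_lists):
            if np.any(assign == j):
                c = docs[assign == j].mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids, assign

def ivf_search(query, docs, centroids, assign, k=2, n_probe=1):
    # Approximate top-k by inner product: score only the docs in the
    # n_probe buckets whose centroids are closest to the query.
    probes = np.argsort(-(centroids @ query))[:n_probe]
    cand = np.where(np.isin(assign, probes))[0]
    return cand[np.argsort(-(docs[cand] @ query))][:k]

# Five unit-norm documents in two obvious clusters.
docs = np.array([[1.0, 0.0], [0.9, 0.436], [0.95, 0.312],
                 [0.0, 1.0], [0.436, 0.9]])
centroids, assign = build_ivf(docs)
```

Probing one bucket means the query is compared against only a fraction of the database, which is what gives IVF its sublinear flavor and what PRAG exploits to keep MPC communication low.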
41. Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
- Author
-
Zaman, Kerem, Choshen, Leshem, and Srivastava, Shashank
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models., Comment: 21 pages, 12 figures, 7 tables; To appear at EMNLP 2024
- Published
- 2023
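The underlying fusion operation is simple enough to show directly: a (weighted) average of aligned model parameters. The paper's finding is about what this operation does to knowledge (shared knowledge survives averaging, idiosyncratic knowledge such as shortcuts or memorized examples tends to be washed out); the sketch below shows only the operation itself, on hypothetical parameter dictionaries.

```python
import numpy as np

def fuse(models, coeffs=None):
    """Parameter-space model fusion: a weighted average of the aligned
    parameters of several models (equal weights by default)."""
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    return {name: sum(c * m[name] for c, m in zip(coeffs, models))
            for name in models[0]}

# Two toy "models" with the same parameter names.
m1 = {"w": np.array([1.0, 0.0]), "b": np.array([2.0])}
m2 = {"w": np.array([0.0, 1.0]), "b": np.array([0.0])}
fused = fuse([m1, m2])
```

In practice the models being fused are fine-tuned from a common initialization, which is what makes their parameters aligned enough for averaging to be meaningful.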
42. Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- Author
-
Chen, Jiaao and Yang, Diyi
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data; however, this process might suffer from privacy issues and violations of data protection regulations. As a result, the ability to easily remove data related to individual users from such models, while not deteriorating their predictive quality after the removal, becomes increasingly important. To address these issues, in this work, we propose an unlearning framework that can efficiently update LLMs without having to retrain the whole model after data removals, by introducing lightweight unlearning layers learned with a selective teacher-student objective into the transformers. In addition, we introduce a fusion mechanism to effectively combine different unlearning layers that learn to forget different sets of data, to handle a sequence of forgetting operations. Experiments on classification and generation tasks demonstrate the effectiveness of our proposed methods compared to the state-of-the-art baselines., Comment: EMNLP 2023
- Published
- 2023
43. Emerging infodemic management strategies focus on technology: They can’t forget trust
- Author
-
Steiner, Robert
- Published
- 2024
- Full Text
- View/download PDF
44. Forget ChatGPT: why researchers now run small AIs on their laptops
- Author
-
Hutson, Matthew
- Published
- 2024
- Full Text
- View/download PDF
45. Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
- Author
-
Zhang, Eric, Wang, Kai, Xu, Xingqian, Wang, Zhangyang, and Shi, Humphrey
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose Forget-Me-Not, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the Memorization Score (M-Score) and ConceptBench to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through concept correction and disentanglement. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at https://github.com/SHI-Labs/Forget-Me-Not.
- Published
- 2023
46. Forget metamaterial: It does not improve sound absorption performance as it claims
- Author
-
Shen, Chao, Liu, Yu, Tang, Tianquan, and Huang, Lixi
- Subjects
Physics - Classical Physics - Abstract
The term `sub-wavelength' is commonly used to describe innovative sound-absorbing structures usually labeled as `metamaterials'. Such structures, however, inherently do not bring groundbreaking advancements. This study addresses the limitations imposed by the thickness criterion of Yang et al. by introducing the concept of equivalent mass-spring-damping parameters within the resonator framework. This approach introduces an index of `half-absorption bandwidth' to effectively overcome the thickness restriction. Four practical cases are then presented to correct prevalent misconceptions about the low-frequency, broadband absorption claimed in the literature. The phenomenon of mass disappearing from the expression of the sound absorption coefficient supports the conclusion that volume is the only determinant factor in sound absorption performance. Any attempt to improve sound absorption solely through geometric and structural design inevitably sacrifices the half-absorption bandwidth. Additionally, the concept of negative stiffness or bulk modulus is merely a mathematical convention without any real improvement in absorption performance. Overall, this research focuses on the physical mechanism of sound-absorbing structures by correcting traditional misunderstandings, and offers a comprehensive framework for assessing and enhancing sound absorption., Comment: 12 pages, 5 figures, part of the first author's Ph.D. thesis
- Published
- 2023
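The equivalent mass-spring-damping view described in the abstract above can be made concrete with the standard single-resonator impedance model. The symbols below (equivalent mass m, damping r, stiffness k, medium impedance ρc) are our notation for illustration, not taken from the paper itself:

```latex
% Normalized surface impedance of a single resonator with equivalent
% mass m, damping r, and stiffness k (symbols are our assumption):
z(\omega) = \frac{r}{\rho c} + \mathrm{i}\,\frac{\omega m - k/\omega}{\rho c}

% Plane-wave absorption coefficient at normal incidence:
\alpha(\omega) = \frac{4\,\operatorname{Re}(z)}{\left| z + 1 \right|^{2}}

% At resonance \omega_0 = \sqrt{k/m} the reactance vanishes, giving
\alpha_{\max} = \frac{4\tilde{r}}{(1+\tilde{r})^{2}},
\qquad \tilde{r} = \frac{r}{\rho c}
```

In this reading, the paper's `half-absorption bandwidth' would be the frequency span over which α(ω) ≥ α_max/2, which shrinks as geometry is tuned to sharpen the resonance.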
47. Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
- Author
-
Frati, Lapo, Traft, Neil, Clune, Jeff, and Cheney, Nick
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Neural and Evolutionary Computing - Abstract
This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state-of-the-art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
- Published
- 2023
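The "zapping" mechanism in the abstract above can be sketched as a training loop in which the backbone updates continuously while the last-layer weights are periodically re-initialized. This is a minimal stdlib-only sketch under our own assumptions (the function names, layer representation, and schedule are illustrative, not the paper's implementation):

```python
import random

def init_layer(n_out, n_in, scale=0.1):
    """Randomly initialize a dense layer's weight matrix."""
    return [[random.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

def pretrain_with_zapping(n_episodes, zap_every, n_classes, n_features):
    """Hypothetical pre-training loop: the backbone is trained every
    episode, while the classifier head is repeatedly reset ('zapped')."""
    backbone_steps = 0
    head = init_layer(n_classes, n_features)
    zap_count = 0
    for episode in range(n_episodes):
        # ... one episode of gradient updates on backbone and head ...
        backbone_steps += 1
        if (episode + 1) % zap_every == 0:
            # Zap: throw away the head so the backbone must learn
            # features that adapt quickly to fresh classifiers.
            head = init_layer(n_classes, n_features)
            zap_count += 1
    return backbone_steps, zap_count

steps, zaps = pretrain_with_zapping(n_episodes=12, zap_every=4,
                                    n_classes=5, n_features=8)
print(steps, zaps)  # 12 episodes trained, head zapped 3 times
```

The key design point the abstract emphasizes is that only the last layer is reset; all other parameters persist across zaps.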
48. Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
- Author
-
Hoang, Duc N. M, Cho, Minsik, Merth, Thomas, Rastegari, Mohammad, and Wang, Zhangyang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after LLM compression, hence necessitating the compressed model to (re)learn from data with additional parameters; the other presumes that knowledge is internally displaced, and hence one requires merely "inference re-direction" with input-side augmentation, such as prompting, to recover the knowledge-related performance. Extensive experiments are then designed to (in)validate the two conjectures. We observe the promise of prompting in comparison to model tuning; we further unlock prompting's potential by introducing a variant called Inference-time Dynamic Prompting (IDP), which can effectively increase prompt diversity without incurring any inference overhead. Our experiments consistently suggest that compared to classical re-training alternatives such as LoRA, prompting with IDP leads to better or comparable post-compression performance recovery, while requiring 21x fewer extra parameters and reducing inference latency by 60%. Our experiments hence strongly endorse the conjecture of "knowledge displaced" over "knowledge forgotten", and shed light on a new efficient mechanism to restore compressed LLM performance. We additionally visualize and analyze the different attention and activation patterns between prompted and re-trained models, demonstrating they achieve performance recovery in two different regimes.
- Published
- 2023
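The abstract above does not spell out how IDP selects prompts, but one way to read "dynamic prompting without inference overhead" is choosing, per input, the best prompt from a pre-embedded pool. The following stdlib-only sketch illustrates that reading; the pool contents, embeddings, and scoring rule are our assumptions, not the paper's method:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def dynamic_prompt(input_vec, prompt_pool):
    """Pick the pool prompt whose pre-computed embedding best matches
    the input embedding -- no extra forward passes at inference time."""
    best = max(prompt_pool, key=lambda p: cosine(input_vec, p["embedding"]))
    return best["text"]

# Toy pool: each prompt carries a (hypothetical) cached embedding.
pool = [
    {"text": "Recall factual knowledge:", "embedding": [1.0, 0.0]},
    {"text": "Reason step by step:",      "embedding": [0.0, 1.0]},
]
print(dynamic_prompt([0.9, 0.1], pool))  # -> Recall factual knowledge:
```

Because the prompt embeddings are cached ahead of time, selection costs only a similarity scan, consistent with the abstract's "no inference overhead" claim.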
49. A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services
- Author
-
Hu, Hongsheng, Wang, Shuo, Chang, Jiamin, Zhong, Haonan, Sun, Ruoxi, Hao, Shuang, Zhu, Haojin, and Xue, Minhui
- Subjects
Computer Science - Cryptography and Security - Abstract
The right to be forgotten requires the removal or "unlearning" of a user's data from machine learning models. However, in the context of Machine Learning as a Service (MLaaS), retraining a model from scratch to fulfill an unlearning request is impractical due to the lack of training data on the service provider's side (the server). Furthermore, approximate unlearning involves a complex trade-off between utility (model performance) and privacy (unlearning performance). In this paper, we explore the potential threats posed by unlearning services in MLaaS, specifically over-unlearning, where more information is unlearned than expected. We propose two strategies that leverage over-unlearning to measure the impact on this trade-off, under black-box access settings in which existing machine unlearning attacks are not applicable. The effectiveness of these strategies is evaluated through extensive experiments on benchmark datasets, across various model architectures and representative unlearning approaches. Results indicate significant potential for both strategies to undermine model efficacy in unlearning scenarios. This study uncovers an underexplored gap between unlearning and contemporary MLaaS, highlighting the need for careful consideration in balancing data unlearning, model utility, and security., Comment: To Appear in the Network and Distributed System Security Symposium (NDSS) 2024, San Diego, CA, USA
- Published
- 2023
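The over-unlearning threat described above can be quantified from pure black-box access: probe the served model's utility on data that was not part of the unlearning request, before and after the request is processed. A minimal sketch of that measurement, with toy stand-in models (the function names and stand-ins are ours, not the paper's strategies):

```python
def accuracy(model, dataset):
    """Black-box utility probe: fraction of correct predictions."""
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)

def over_unlearning_gap(model_before, model_after, retain_set):
    """Utility drop on data that was NOT in the unlearning request.
    A large positive gap suggests the server unlearned more than asked."""
    return (accuracy(model_before, retain_set)
            - accuracy(model_after, retain_set))

# Toy stand-ins for the served model before/after an unlearning request.
before = lambda x: x % 2   # predicts parity correctly
after = lambda x: 0        # degraded model: always predicts class 0
retain = [(i, i % 2) for i in range(10)]  # data outside the forget set
print(over_unlearning_gap(before, after, retain))  # 1.0 - 0.5 = 0.5
```

The paper's strategies craft the unlearning request itself to amplify this gap; the probe above only shows how the resulting collateral damage would be detected.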
50. Compressed Models Decompress Race Biases: What Quantized Models Forget for Fair Face Recognition
- Author
-
Neto, Pedro C., Caldeira, Eduarda, Cardoso, Jaime S., and Sequeira, Ana F.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
With the ever-growing complexity of deep learning models for face recognition, it becomes hard to deploy these systems in real life. Researchers have two options: 1) use smaller models; 2) compress their current models. Since the use of smaller models might lead to concerning biases, compression gains relevance. However, compression might also be responsible for an increase in the bias of the final model. We investigate the overall performance, the performance on each ethnicity subgroup, and the racial bias of a state-of-the-art quantization approach when used with synthetic and real data. This analysis provides further details on the potential benefits of performing quantization with synthetic data, for instance, the reduction of biases in the majority of test scenarios. We tested five distinct architectures and three different training datasets. The models were evaluated on a fourth dataset, collected to infer and compare the performance of face recognition models across different ethnicities., Comment: Accepted for Oral at BIOSIG 2023
- Published
- 2023
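The per-subgroup evaluation described in the abstract above reduces to computing accuracy separately for each demographic group and comparing the spread. A minimal stdlib-only sketch (the function names and the max-minus-min bias proxy are our assumptions; the paper may use a different bias measure):

```python
def subgroup_accuracies(predictions, labels, groups):
    """Accuracy per demographic subgroup; inputs are parallel lists."""
    totals, hits = {}, {}
    for pred, label, group in zip(predictions, labels, groups):
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (pred == label)
    return {g: hits[g] / totals[g] for g in totals}

def bias_gap(acc_by_group):
    """Spread between best- and worst-served subgroups: one simple
    proxy for the racial bias a compressed model may amplify."""
    return max(acc_by_group.values()) - min(acc_by_group.values())

# Toy predictions from a (hypothetical) quantized face recognizer.
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]
accs = subgroup_accuracies(preds, labels, groups)
print(accs, bias_gap(accs))  # group A: 2/3, group B: 1/3, gap 1/3
```

Running this audit on the model before and after quantization, as the paper does across architectures and datasets, shows whether compression widened the gap.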