Author: "Geiger, Atticus" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Geiger, Atticus"' showing total 38 results

1. Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Author: Chaudhary, Maheep and Geiger, Atticus
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: A popular new method in mechanistic interpretability is to train high-dimensional sparse autoencoders (SAEs) on neuron activations and use SAE features as the atomic units of analysis. However, the body of evidence on whether SAE feature spaces are useful for causal analysis is underdeveloped. In this work, we use the RAVEL benchmark to evaluate whether SAEs trained on hidden representations of GPT-2 small have sets of features that separately mediate knowledge of which country a city is in and which continent it is in. We evaluate four open-source SAEs for GPT-2 small against each other, with neurons serving as a baseline, and linear features learned via distributed alignment search (DAS) serving as a skyline. For each, we learn a binary mask to select features that will be patched to change the country of a city without changing the continent, or vice versa. Our results show that SAEs struggle to reach the neuron baseline, and none come close to the DAS skyline. We release code here: https://github.com/MaheepChaudhary/SAE-Ravel
Published: 2024

2. Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Author: Csordás, Róbert, Potts, Christopher, Manning, Christopher D., and Geiger, Atticus
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
Published: 2024

3. Updating CLIP to Prefer Descriptions Over Captions

Author: Zur, Amir, Kreiss, Elisa, D'Oosterlinck, Karel, Potts, Christopher, and Geiger, Atticus
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Although CLIPScore is a powerful generic metric that captures the similarity between a text and an image, it fails to distinguish between a caption that is meant to complement the information in an image and a description that is meant to replace an image entirely, e.g., for accessibility. We address this shortcoming by updating the CLIP model with the Concadia dataset to assign higher scores to descriptions than captions using parameter efficient fine-tuning and a loss objective derived from work on causal interpretability. This model correlates with the judgements of blind and low-vision people while preserving transfer capabilities and has interpretable structure that sheds light on the caption--description distinction.
Published: 2024

4. ReFT: Representation Finetuning for Language Models

Author: Wu, Zhengxuan, Arora, Aryaman, Wang, Zheng, Geiger, Atticus, Jurafsky, Dan, Manning, Christopher D., and Potts, Christopher
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency. Both are drop-in replacements for existing PEFTs and learn interventions that are 15x--65x more parameter-efficient than LoRA. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction-tuning, and GLUE. In all these evaluations, our ReFTs deliver the best balance of efficiency and performance, and almost always outperform state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft., Comment: preprint
Published: 2024

5. pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Author: Wu, Zhengxuan, Geiger, Atticus, Arora, Aryaman, Huang, Jing, Wang, Zheng, Goodman, Noah D., Manning, Christopher D., and Potts, Christopher
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how pyvene provides a unified and extensible framework for performing interventions on neural models and sharing the intervened-upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/stanfordnlp/pyvene., Comment: 8 pages, 3 figures
Published: 2024

6. RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Author: Huang, Jing, Wu, Zhengxuan, Potts, Christopher, Geva, Mor, and Geiger, Atticus
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Individual neurons participate in the representation of multiple high-level concepts. To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value Entanglements in Language Models), a dataset that enables tightly controlled, quantitative comparisons between a variety of existing interpretability methods. We use the resulting conceptual framework to define the new method of Multi-task Distributed Alignment Search (MDAS), which allows us to find distributed representations satisfying multiple causal criteria. With Llama2-7B as the target language model, MDAS achieves state-of-the-art results on RAVEL, demonstrating the importance of going beyond neuron-level analyses to identify features distributed across activations. We release our benchmark at https://github.com/explanare/ravel., Comment: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Published: 2024

7. A Reply to Makelov et al. (2023)'s 'Interpretability Illusion' Arguments

Author: Wu, Zhengxuan, Geiger, Atticus, Huang, Jing, Arora, Aryaman, Icard, Thomas, Potts, Christopher, and Goodman, Noah D.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions". We first review Makelov et al. (2023)'s technical notion of what an "interpretability illusion" is, and then we show that even intuitive and desirable explanations can qualify as illusions in this sense. As a result, their method of discovering "illusions" can reject explanations they consider "non-illusory". We then argue that the illusions Makelov et al. (2023) see in practice are artifacts of their training and evaluation paradigms. We close by emphasizing that, though we disagree with their core characterization, Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the field of interpretability forward., Comment: 20 pages, 14 figures
Published: 2024

8. Linear Representations of Sentiment in Large Language Models

Author: Tigges, Curt, Hollinsworth, Oskar John, Geiger, Atticus, and Nanda, Neel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through causal interventions, we isolate this direction and show it is causally relevant in both toy tasks and real world datasets such as Stanford Sentiment Treebank. Through this case study we model a thorough investigation of what a single direction means on a broad data distribution. We further uncover the mechanisms that involve this direction, highlighting the roles of a small subset of attention heads and neurons. Finally, we discover a phenomenon which we term the summarization motif: sentiment is not solely represented on emotionally charged words, but is additionally summarized at intermediate positions without inherent sentiment, such as punctuation and names. We show that in Stanford Sentiment Treebank zero-shot classification, 76% of above-chance classification accuracy is lost when ablating the sentiment direction, nearly half of which (36%) is due to ablating the summarized sentiment direction exclusively at comma positions.
Published: 2023

9. Rigorously Assessing Natural Language Explanations of Neurons

Author: Huang, Jing, Geiger, Atticus, D'Oosterlinck, Karel, Wu, Zhengxuan, and Potts, Christopher
Subjects: Computer Science - Computation and Language
Abstract: Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging. To help address this, we develop two modes of evaluation for natural language explanations that claim individual neurons represent a concept in a text input. In the observational mode, we evaluate claims that a neuron $a$ activates on all and only input strings that refer to a concept picked out by the proposed explanation $E$. In the intervention mode, we construe $E$ as a claim that the neuron $a$ is a causal mediator of the concept denoted by $E$. We apply our framework to the GPT-4-generated explanations of GPT-2 XL neurons of Bills et al. (2023) and show that even the most confident explanations have high error rates and little to no causal efficacy. We close the paper by critically assessing whether natural language is a good choice for explanations and whether neurons are the best level of analysis.
Published: 2023

10. ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

Author: She, Jingyuan Selena, Potts, Christopher, Bowman, Samuel R., and Geiger, Atticus
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: A number of recent benchmarks seek to assess how well models handle natural language negation. However, these benchmarks lack the controlled example paradigms that would allow us to infer whether a model had learned how negation morphemes semantically scope. To fill these analytical gaps, we present the Scoped Negation NLI (ScoNe-NLI) benchmark, which contains contrast sets of six examples with up to two negations where either zero, one, or both negative morphemes affect the NLI label. We use ScoNe-NLI to assess fine-tuning and in-context learning strategies. We find that RoBERTa and DeBERTa models solve ScoNe-NLI after many-shot fine-tuning. For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful, including those using step-by-step reasoning. To better understand this result, we extend ScoNe with ScoNe-NLG, a sentence completion test set that embeds negation reasoning in short narratives. Here, InstructGPT is successful, which reveals that the model can correctly reason about negation but struggles to do so on prompt-adapted NLI examples outside of its core pretraining regime.
Published: 2023

11. Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

Author: Wu, Zhengxuan, Geiger, Atticus, Icard, Thomas, Potts, Christopher, and Goodman, Noah D.
Subjects: Computer Science - Computation and Language
Abstract: Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) is a powerful gradient descent method grounded in a theory of causal abstraction that has uncovered perfect alignments between interpretable symbolic algorithms and small deep learning models fine-tuned for specific tasks. In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters -- an approach we call Boundless DAS. This enables us to efficiently search for interpretable causal structure in large language models while they follow instructions. We apply Boundless DAS to the Alpaca model (7B parameters), which, off the shelf, solves a simple numerical reasoning problem. With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables. Furthermore, we find that the alignment of neural representations with these variables is robust to changes in inputs and instructions. These findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models. Our tool is extensible to larger LLMs and is released publicly at https://github.com/stanfordnlp/pyvene., Comment: NeurIPS 2023 with Author Corrections
Published: 2023

12. Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Author: Geiger, Atticus, Wu, Zhengxuan, Potts, Christopher, Icard, Thomas, and Goodman, Noah D.
Subjects: Computer Science - Artificial Intelligence
Abstract: Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, that is, distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural nets.
Published: 2023

13. Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

Author: Geiger, Atticus, Ibeling, Duligur, Zur, Amir, Chaudhary, Maheep, Chauhan, Sonakshi, Huang, Jing, Arora, Aryaman, Wu, Zhengxuan, Goodman, Noah, Potts, Christopher, and Icard, Thomas
Subjects: Computer Science - Artificial Intelligence
Abstract: Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI models. Our contributions are (1) generalizing the theory of causal abstraction from mechanism replacement (i.e., hard and soft interventions) to arbitrary mechanism transformation (i.e., functionals from old mechanisms to new mechanisms), (2) providing a flexible, yet precise formalization for the core concepts of modular features, polysemantic neurons, and graded faithfulness, and (3) unifying a variety of mechanistic interpretability methodologies in the common language of causal abstraction, namely activation and path patching, causal mediation analysis, causal scrubbing, causal tracing, circuit analysis, concept erasure, sparse autoencoders, differential binary masking, distributed alignment search, and activation steering.
Published: 2023

14. Causal Abstraction with Soft Interventions

Author: Massidda, Riccardo, Geiger, Atticus, Icard, Thomas, and Bacciu, Davide
Subjects: Computer Science - Artificial Intelligence
Abstract: Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail. Existing theoretical proposals limit the analysis of abstract models to "hard" interventions fixing causal variables to be constant values. In this work, we extend causal abstraction to "soft" interventions, which assign possibly non-constant functions to variables without adding new causal connections. Specifically, (i) we generalize $\tau$-abstraction from Beckers and Halpern (2019) to soft interventions, (ii) we propose a further definition of soft abstraction to ensure a unique map $\omega$ between soft interventions, and (iii) we prove that our constructive definition of soft abstraction guarantees the intervention map $\omega$ has a specific and necessary explicit form.
Published: 2022

15. Causal Proxy Models for Concept-Based Model Explanations

Author: Wu, Zhengxuan, D'Oosterlinck, Karel, Geiger, Atticus, Zur, Amir, and Potts, Christopher
Subjects: Computer Science - Computation and Language
Abstract: Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model $\mathcal{N}$ because it is trained to have the same actual input/output behavior as $\mathcal{N}$ while creating neural representations that can be intervened upon to simulate the counterfactual input/output behavior of $\mathcal{N}$. Furthermore, we show that the best CPM for $\mathcal{N}$ performs comparably to $\mathcal{N}$ in making factual predictions, which means that the CPM can simply replace $\mathcal{N}$, leading to more explainable deployed models. Our code is available at https://github.com/frankaging/Causal-Proxy-Model., Comment: 23 pages
Published: 2022

16. A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models

Author: Cao, Angela, Geiger, Atticus, Kreiss, Elisa, Icard, Thomas, and Gerstenberg, Tobias
Subjects: Linguistics, Psychology, Causal reasoning, Language and thought, Language Comprehension, Semantics of language, Computer-based experiment, Logic, Statistics
Abstract: When choosing how to describe what happened, we have a number of causal verbs at our disposal. In this paper, we develop a model-theoretic formal semantics for nine causal verbs that span the categories of CAUSE, ENABLE, and PREVENT. We use structural causal models (SCMs) to represent participants’ mental construction of a scene when assessing the correctness of causal expressions relative to a presented context. Furthermore, SCMs enable us to model events relating to both the physical world and agents’ mental states. In experimental evaluations, we find that the proposed semantics aligns more closely with human evaluations than prior accounts of the verb families.
Published: 2023

17. CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

Author: Abraham, Eldar David, D'Oosterlinck, Karel, Feder, Amir, Gat, Yair Ori, Geiger, Atticus, Potts, Christopher, Reichart, Roi, and Wu, Zhengxuan
Subjects: Computer Science - Computation and Language
Abstract: The increasing size and complexity of modern ML systems has improved their predictive capabilities but made their behavior harder to explain. Many techniques for model explanation have been developed in response, but we lack clear criteria for assessing these techniques. In this paper, we cast model explanation as the causal inference problem of estimating causal effects of real-world concepts on the output behavior of ML models given actual input data. We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP). CEBaB consists of short restaurant reviews with human-generated counterfactual reviews in which an aspect (food, noise, ambiance, service) of the dining experience was modified. Original and counterfactual reviews are annotated with multiply-validated sentiment ratings at the aspect-level and review-level. The rich structure of CEBaB allows us to go beyond input features to study the effects of abstract, real-world concepts on model behavior. We use CEBaB to compare the quality of a range of concept-based explanation methods covering different assumptions and conceptions of the problem, and we seek to establish natural metrics for comparative assessments of these methods., Comment: Accepted to NeurIPS 2022
Published: 2022

18. Causal Distillation for Language Models

Author: Wu, Zhengxuan, Geiger, Atticus, Rozner, Josh, Kreiss, Elisa, Lu, Hanson, Icard, Thomas, Potts, Christopher, and Goodman, Noah D.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Distillation efforts have led to language models that are more compact and efficient without serious drops in performance. The standard approach to distillation trains a student model against two objectives: a task-specific objective (e.g., language modeling) and an imitation objective that encourages the hidden states of the student model to be similar to those of the larger teacher model. In this paper, we show that it is beneficial to augment distillation with a third objective that encourages the student to imitate the causal computation process of the teacher through interchange intervention training (IIT). IIT pushes the student model to become a causal abstraction of the teacher model, i.e., a simpler model with the same causal structure. IIT is fully differentiable, easily implemented, and combines flexibly with other objectives. Compared with standard distillation of BERT, distillation via IIT results in lower perplexity on Wikipedia (masked language modeling) and marked improvements on the GLUE benchmark (natural language understanding), SQuAD (question answering), and CoNLL-2003 (named entity recognition)., Comment: 7 pages, 2 figures
Published: 2021

19. Inducing Causal Structure for Interpretable Neural Networks

Author: Geiger, Atticus, Wu, Zhengxuan, Lu, Hanson, Rozner, Josh, Kreiss, Elisa, Icard, Thomas, Goodman, Noah D., and Potts, Christopher
Subjects: Computer Science - Machine Learning
Abstract: In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to be the value they would be for a source input. IIT is fully differentiable, flexibly combines with other objectives, and guarantees that the target causal model is a causal abstraction of the neural model when its loss is zero. We evaluate IIT on a structural vision task (MNIST-PVR), a navigational language task (ReaSCAN), and a natural language inference task (MQNLI). We compare IIT against multi-task training objectives and data augmentation. In all our experiments, IIT achieves the best results and produces neural models that are more interpretable in the sense that they more successfully realize the target causal model.
Published: 2021

20. Causal Abstractions of Neural Networks

Author: Geiger, Atticus, Lu, Hanson, Icard, Thomas, and Potts, Christopher
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis. We propose a new structural analysis method grounded in a formal theory of causal abstraction that provides rich characterizations of model-internal representations and their roles in input/output behavior. In this method, neural representations are aligned with variables in interpretable causal models, and then interchange interventions are used to experimentally verify that the neural representations have the causal properties of their aligned variables. We apply this method in a case study to analyze neural models trained on the Multiply Quantified Natural Language Inference (MQNLI) corpus, a highly complex NLI dataset that was constructed with a tree-structured natural logic causal model. We discover that a BERT-based model with state-of-the-art performance successfully realizes parts of the natural logic model's causal structure, whereas a simpler baseline model fails to show any such structure, demonstrating that BERT representations encode the compositional structure of MQNLI., Comment: NeurIPS 2021
Published: 2021

21. Dynabench: Rethinking Benchmarking in NLP

Author: Kiela, Douwe, Bartolo, Max, Nie, Yixin, Kaushik, Divyansh, Geiger, Atticus, Wu, Zhengxuan, Vidgen, Bertie, Prasad, Grusha, Singh, Amanpreet, Ringshia, Pratik, Ma, Zhiyi, Thrush, Tristan, Riedel, Sebastian, Waseem, Zeerak, Stenetorp, Pontus, Jia, Robin, Bansal, Mohit, Potts, Christopher, and Williams, Adina
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field., Comment: NAACL 2021
Published: 2021

22. DynaSent: A Dynamic Benchmark for Sentiment Analysis

Author: Potts, Christopher, Wu, Zhengxuan, Geiger, Atticus, and Kiela, Douwe
Subjects: Computer Science - Computation and Language
Abstract: We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilitates human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers, and its development and test splits are designed to produce chance performance for even the best models we have been able to develop; when future models solve this task, we will use them to create DynaSent version 2, continuing the dynamic evolution of this benchmark. Here, we report on the dataset creation effort, focusing on the steps we took to increase quality and reduce artifacts. We also present evidence that DynaSent's Neutral category is more coherent than the comparable category in other benchmarks, and we motivate training models from scratch for each round over successive fine-tuning.
Published: 2020

23. Relational reasoning and generalization using non-symbolic neural networks

Author: Geiger, Atticus, Carstensen, Alexandra, Frank, Michael C., and Potts, Christopher
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The notion of equality (identity) is simple and ubiquitous, making it a key case study for broader questions about the representations supporting abstract relational reasoning. Previous work suggested that neural networks were not suitable models of human relational reasoning because they could not represent mathematical identity, the most basic form of equality. We revisit this question. In our experiments, we assess out-of-sample generalization of equality using both arbitrary representations and representations that have been pretrained on separate tasks to imbue them with structure. We find neural networks are able to learn (1) basic equality (mathematical identity), (2) sequential equality problems (learning ABA-patterned sequences) with only positive training instances, and (3) a complex, hierarchical equality problem with only basic equality training instances ("zero-shot" generalization). In the two latter cases, our models perform tasks proposed in previous work to demarcate human-unique symbolic abilities. These results suggest that essential aspects of symbolic reasoning can emerge from data-driven, non-symbolic learning processes.
Published: 2020

24. Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

Author: Geiger, Atticus, Richardson, Kyle, and Potts, Christopher
Subjects: Computer Science - Computation and Language
Abstract: We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation. In our behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure. In our structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI. Probes yield evidence consistent with this conclusion, and our intervention experiments bolster this, showing that the causal dynamics of the model mirror the causal dynamics of this algorithm on subsets of MoNLI. This suggests that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level. Comment: In Proceedings of BlackBoxNLP 2020 at EMNLP 2020
Published: 2020

25. Posing Fair Generalization Tasks for Natural Language Inference

Author: Geiger, Atticus, Cases, Ignacio, Karttunen, Lauri, and Potts, Chris
Subjects: Computer Science - Computation and Language
Abstract: Deep learning models for semantics are generally evaluated using naturalistic corpora. Adversarial methods, in which models are evaluated on new examples with known semantic properties, have begun to reveal that good performance at these naturalistic tasks can hide serious shortcomings. However, we should insist that these evaluations be fair, meaning that the models are given data sufficient to support the requisite kinds of generalization. In this paper, we define and motivate a formal notion of fairness in this sense. We then apply these ideas to natural language inference by constructing very challenging but provably fair artificial datasets and showing that standard neural models fail to generalize in the required ways; only task-specific models that jointly compose the premise and hypothesis are able to achieve high performance, and even these models do not solve the task perfectly.
Published: 2019

26. Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences

Author: Geiger, Atticus, Cases, Ignacio, Karttunen, Lauri, and Potts, Christopher
Subjects: Computer Science - Computation and Language
Abstract: Standard evaluations of deep learning models for semantics using naturalistic corpora are limited in what they can tell us about the fidelity of the learned representations, because the corpora rarely come with good measures of semantic complexity. To overcome this limitation, we present a method for generating data sets of multiply-quantified natural language inference (NLI) examples in which semantic complexity can be precisely characterized, and we use this method to show that a variety of common architectures for NLI inevitably fail to encode crucial information; only a model with forced lexical alignments avoids this damaging information loss.
Published: 2018

27. A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models

Author: Cao, Angela, primary, Geiger, Atticus, additional, Kreiss, Elisa, additional, Icard, Thomas, additional, and Gerstenberg, Tobias, additional
Published: 2023
Full Text: View/download PDF

28. Causal Abstraction for Faithful Model Interpretation

Author: Geiger, Atticus, Potts, Chris, and Icard, Thomas
Abstract: A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known, but often opaque, low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for the desired kinds of model explanations. In causal abstraction analysis, we use interventions on model-internal states to rigorously assess whether an interpretable high-level causal model is a faithful description of an AI model. Our contributions in this area are: (1) We generalize causal abstraction to cyclic causal structures and typed high-level variables. (2) We show how multi-source interchange interventions can be used to conduct causal abstraction analyses. (3) We define a notion of approximate causal abstraction that allows us to assess the degree to which a high-level causal model is a causal abstraction of a lower-level one. (4) We prove constructive causal abstraction can be decomposed into three operations we refer to as marginalization, variable-merge, and value-merge. (5) We formalize the XAI methods of LIME, causal effect estimation, causal mediation analysis, iterated nullspace projection, and circuit-based explanations as special cases of causal abstraction analysis.
Published: 2023

29. ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

Author: She, Jingyuan S., primary, Potts, Christopher, additional, Bowman, Samuel R., additional, and Geiger, Atticus, additional
Published: 2023
Full Text: View/download PDF

30. Rigorously Assessing Natural Language Explanations of Neurons

Author: Huang, Jing, primary, Geiger, Atticus, additional, D’Oosterlinck, Karel, additional, Wu, Zhengxuan, additional, and Potts, Christopher, additional
Published: 2023
Full Text: View/download PDF

31. Relational reasoning and generalization using nonsymbolic neural networks.

Author: Geiger, Atticus, primary, Carstensen, Alexandra, additional, Frank, Michael C., additional, and Potts, Christopher, additional
Published: 2022
Full Text: View/download PDF

32. Relational Reasoning and Generalization Using Nonsymbolic Neural Networks.

Author: Geiger, Atticus, Carstensen, Alexandra, Frank, Michael C., and Potts, Christopher
Subjects: *GENERALIZATION
Abstract: The notion of equality (identity) is simple and ubiquitous, making it a key case study for broader questions about the representations supporting abstract relational reasoning. Previous work suggested that neural networks were not suitable models of human relational reasoning because they could not represent mathematical identity, the most basic form of equality. We revisit this question. In our experiments, we assess out-of-sample generalization of equality using both arbitrary representations and representations that have been pretrained on separate tasks to imbue them with structure. We find neural networks are able to learn (a) basic equality (mathematical identity), (b) sequential equality problems (learning ABA-patterned sequences) with only positive training instances, and (c) a complex, hierarchical equality problem with only basic equality training instances ("zero-shot" generalization). In the two latter cases, our models perform tasks proposed in previous work to demarcate human-unique symbolic abilities. These results suggest that essential aspects of symbolic reasoning can emerge from data-driven, nonsymbolic learning processes. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

33. Causal Distillation for Language Models

Author: Wu, Zhengxuan, primary, Geiger, Atticus, additional, Rozner, Joshua, additional, Kreiss, Elisa, additional, Lu, Hanson, additional, Icard, Thomas, additional, Potts, Christopher, additional, and Goodman, Noah, additional
Published: 2022
Full Text: View/download PDF

34. DynaSent: A Dynamic Benchmark for Sentiment Analysis

Author: Potts, Christopher, primary, Wu, Zhengxuan, additional, Geiger, Atticus, additional, and Kiela, Douwe, additional
Published: 2021
Full Text: View/download PDF

35. Dynabench: Rethinking Benchmarking in NLP

Author: Kiela, Douwe, primary, Bartolo, Max, additional, Nie, Yixin, additional, Kaushik, Divyansh, additional, Geiger, Atticus, additional, Wu, Zhengxuan, additional, Vidgen, Bertie, additional, Prasad, Grusha, additional, Singh, Amanpreet, additional, Ringshia, Pratik, additional, Ma, Zhiyi, additional, Thrush, Tristan, additional, Riedel, Sebastian, additional, Waseem, Zeerak, additional, Stenetorp, Pontus, additional, Jia, Robin, additional, Bansal, Mohit, additional, Potts, Christopher, additional, and Williams, Adina, additional
Published: 2021
Full Text: View/download PDF

36. Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

Author: Geiger, Atticus, primary, Richardson, Kyle, additional, and Potts, Christopher, additional
Published: 2020
Full Text: View/download PDF

37. Recursive Routing Networks: Learning to Compose Modules for Language Understanding

Author: Cases, Ignacio, primary, Rosenbaum, Clemens, additional, Riemer, Matthew, additional, Geiger, Atticus, additional, Klinger, Tim, additional, Tamkin, Alex, additional, Li, Olivia, additional, Agarwal, Sandhini, additional, Greene, Joshua D., additional, Jurafsky, Dan, additional, Potts, Christopher, additional, and Karttunen, Lauri, additional
Published: 2019
Full Text: View/download PDF

38. Posing Fair Generalization Tasks for Natural Language Inference

Author: Geiger, Atticus, primary, Cases, Ignacio, additional, Karttunen, Lauri, additional, and Potts, Christopher, additional
Published: 2019
Full Text: View/download PDF
