Author: "Bulian, Jannis" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Bulian, Jannis"' showing total 25 results

1. How Susceptible are LLMs to Influence in Prompts?

Author: Anagnostidis, Sotiris and Bulian, Jannis
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model -- or a system with access to more external information -- provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.
Published: 2024

2. On scalable oversight with weak LLMs judging strong LLMs

Author: Kenton, Zachary, Siegel, Noah Y., Kramár, János, Brown-Cohen, Jonah, Albanie, Samuel, Bulian, Jannis, Agarwal, Rishabh, Lindner, David, Tang, Yunhao, Goodman, Noah D., and Shah, Rohin
Subjects: Computer Science - Machine Learning
Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.
Comment: 15 pages (53 including appendices). V2: minor correction to Figure 3; add Figure A.9 comparing open vs assigned consultancy; add a reference
Published: 2024

3. Assessing Large Language Models on Climate Information

Author: Bulian, Jannis, Schäfer, Mike S., Amini, Afra, Lam, Heidi, Ciaramita, Massimiliano, Gaiarin, Ben, Hübscher, Michelle Chen, Buck, Christian, Mede, Niels G., Leippold, Markus, and Strauß, Nadine
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.
Published: 2023

4. Scaling Up Models and Data with t5x and seqio

Author: Roberts, Adam, Chung, Hyung Won, Levskaya, Anselm, Mishra, Gaurav, Bradbury, James, Andor, Daniel, Narang, Sharan, Lester, Brian, Gaffney, Colin, Mohiuddin, Afroz, Hawthorne, Curtis, Lewkowycz, Aitor, Salcianu, Alex, van Zee, Marc, Austin, Jacob, Goodman, Sebastian, Soares, Livio Baldini, Hu, Haitang, Tsvyashchenko, Sasha, Chowdhery, Aakanksha, Bastings, Jasmijn, Bulian, Jannis, Garcia, Xavier, Ni, Jianmo, Chen, Andrew, Kenealy, Kathleen, Clark, Jonathan H., Lee, Stephan, Garrette, Dan, Lee-Thorp, James, Raffel, Colin, Shazeer, Noam, Ritter, Marvin, Bosma, Maarten, Passos, Alexandre, Maitin-Shepard, Jeremy, Fiedel, Noah, Omernick, Mark, Saeta, Brennan, Sepassi, Ryan, Spiridonov, Alexander, Newlan, Joshua, and Gesmundo, Andrea
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
Published: 2022

5. Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation

Author: Bulian, Jannis, Buck, Christian, Gajewski, Wojciech, Boerschinger, Benjamin, and Schuster, Tal
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The predictions of question answering (QA) systems are typically evaluated against manually annotated finite sets of one or more answers. This leads to a coverage limitation that results in underestimating the true performance of systems, and is typically addressed by extending over exact match (EM) with pre-defined rules or with the token-level F1 measure. In this paper, we present the first systematic conceptual and data-driven analysis to examine the shortcomings of token-level equivalence measures. To this end, we define the asymmetric notion of answer equivalence (AE), accepting answers that are equivalent to or improve over the reference, and publish over 23k human judgments for candidates produced by multiple QA systems on SQuAD. Through a careful analysis of this data, we reveal and quantify several concrete limitations of the F1 measure, such as a false impression of graduality, or missing dependence on the question. Since collecting AE annotations for each evaluated model is expensive, we learn a BERT matching (BEM) measure to approximate this task. Being a simpler task than QA, we find BEM to provide significantly better AE approximations than F1, and to more accurately reflect the performance of systems. Finally, we demonstrate the practical utility of AE and BEM on the concrete application of minimal accurate prediction sets, reducing the number of required answers by up to 2.6x.
Published: 2022

6. Fool Me Twice: Entailment from Wikipedia Gamification

Author: Eisenschlos, Julian Martin, Dhingra, Bhuwan, Bulian, Jannis, Börschinger, Benjamin, and Boyd-Graber, Jordan
Subjects: Computer Science - Computation and Language
Abstract: We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using "shortcuts" compared to other popular entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players "pay" to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and the game code.
Comment: Published in NAACL 2021
Published: 2021

7. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims

Author: Diggelmann, Thomas, Boyd-Graber, Jordan, Bulian, Jannis, Ciaramita, Massimiliano, and Leippold, Markus
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to facilitate and encourage work on improving algorithms for retrieving evidential support for climate-specific claims, addressing the underlying language understanding challenges, and ultimately helping to alleviate the impact of misinformation on climate change. We adapt the methodology of FEVER [1], the largest dataset of artificially designed claims, to real-life claims collected from the Internet. Although we could rely on the expertise of renowned climate scientists during this process, it turned out to be no easy task. We discuss the surprising, subtle complexity of modeling real-world climate-related claims within the FEVER framework, which we believe provides a valuable challenge for general natural language understanding. We hope that our work will mark the beginning of a new exciting long-term joint effort by the climate science and AI community.
Comment: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020
Published: 2020

8. Meta Answering for Machine Reading

Author: Borschinger, Benjamin, Boyd-Graber, Jordan, Buck, Christian, Bulian, Jannis, Ciaramita, Massimiliano, Huebscher, Michelle Chen, Gajewski, Wojciech, Kilcher, Yannic, Nogueira, Rodrigo, and Saralegu, Lierni Sestorain
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment. The environment encapsulates a competitive machine reader based on BERT, providing candidate answers to questions, and possibly some context. To validate the realism of our formulation, we ask humans to play the role of a meta-answerer. With just a small snippet of text around an answer, humans can outperform the machine reader, improving recall. Similarly, a simple machine meta-answerer outperforms the environment, improving both precision and recall on the Natural Questions dataset. The system relies on joint training of answer scoring and the selection of conditioning information.
Published: 2019

9. Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation

Author: Nogueira, Rodrigo, Bulian, Jannis, and Ciaramita, Massimiliano
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering. In the proposed framework an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the training data, while the meta-agent is trained on the full training set. Our method makes learning faster, because it is highly parallelizable, and has better generalization performance than strong baselines, such as an ensemble of agents trained on the full data. We show that the improved performance is due to the increased diversity of reformulation strategies.
Published: 2018

10. Analyzing Language Learned by an Active Question Answering Agent

Author: Buck, Christian, Bulian, Jannis, Ciaramita, Massimiliano, Gajewski, Wojciech, Gesmundo, Andrea, Houlsby, Neil, and Wang, Wei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the system with many versions of a question that are generated via a sequence-to-sequence question reformulation model, then aggregates the returned evidence to find the best answer. This process is an instance of machine-machine communication. The question reformulation model must adapt its language to increase the quality of the answers returned, matching the language of the question answering system. We find that the agent does not learn transformations that align with semantic intuitions but discovers through learning classical information retrieval techniques such as tf-idf re-weighting and stemming.
Comment: Emergent Communication Workshop, NIPS 2017
Published: 2018

11. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

Author: Buck, Christian, Bulian, Jannis, Ciaramita, Massimiliano, Gajewski, Wojciech, Gesmundo, Andrea, Houlsby, Neil, and Wang, Wei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with, potentially many, natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient. We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, playing the role of the environment, and other benchmarks. We also analyze the language that the agent has learned while interacting with the question answering system. We find that successful question reformulations look quite different from natural language paraphrases. The agent is able to discover non-trivial reformulation strategies that resemble classic information retrieval techniques such as term re-weighting (tf-idf) and stemming.
Published: 2017

12. Assessing Large Language Models on climate information

Author: Bulian, Jannis, Schäfer, Mike S (https://orcid.org/0000-0002-0847-7503), Amini, Afra, Lam, Heidi, Ciaramita, Massimiliano, Gaiarin, Ben, Chen Hübscher, Michelle, Buck, Christian, Mede, Niels G (https://orcid.org/0000-0001-5707-7568), Leippold, Markus (https://orcid.org/0000-0001-5983-2360), and Strauss, Nadine (https://orcid.org/0000-0002-5050-7067)
Abstract: As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.
Published: 2024

13. Fixed-parameter Tractable Distances to Sparse Graph Classes

Author: Bulian, Jannis and Dawar, Anuj
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Computational Complexity, Computer Science - Discrete Mathematics, Computer Science - Logic in Computer Science
Abstract: We show that for various classes C of sparse graphs, and several measures of distance to such classes (such as edit distance and elimination distance), the problem of determining the distance of a given graph G to C is fixed-parameter tractable. The results are based on two general techniques. The first of these, building on recent work of Grohe et al., establishes that any class of graphs that is slicewise nowhere dense and slicewise first-order definable is FPT. The second shows that determining the elimination distance of a graph G to a minor-closed class C is FPT.
Published: 2015

14. Graph Isomorphism Parameterized by Elimination Distance to Bounded Degree

Author: Bulian, Jannis and Dawar, Anuj
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Computational Complexity, Computer Science - Discrete Mathematics, F.2.2, G.2.2
Abstract: A commonly studied means of parameterizing graph problems is the deletion distance from triviality (Guo et al. 2004), which counts vertices that need to be deleted from a graph to place it in some class for which efficient algorithms are known. In the context of graph isomorphism, we define triviality to mean a graph with maximum degree bounded by a constant, as such graph classes admit polynomial-time isomorphism tests. We generalise deletion distance to a measure we call elimination distance to triviality, based on elimination trees or tree-depth decompositions. We establish that graph canonisation, and thus graph isomorphism, is FPT when parameterized by elimination distance to bounded degree, extending results of Bouland et al. (2012).
Comment: 19 pages
Published: 2014

15. Bare canonicity of representable cylindric and polyadic algebras

Author: Bulian, Jannis and Hodkinson, Ian
Subjects: Mathematics - Logic, Computer Science - Logic in Computer Science, 03G15 (Primary) 03C05, 06B15, 06E15, 06E25 (Secondary)
Abstract: We show that for finite n at least 3, every first-order axiomatisation of the varieties of representable n-dimensional cylindric algebras, diagonal-free cylindric algebras, polyadic algebras, and polyadic equality algebras contains an infinite number of non-canonical formulas. We also show that the class of structures for each of these varieties is non-elementary. The proofs employ algebras derived from random graphs.
Published: 2012
Full Text: View/download PDF

16. Fixed-Parameter Tractable Distances to Sparse Graph Classes

Author: Bulian, Jannis and Dawar, Anuj
Published: 2017
Full Text: View/download PDF

17. Graph Isomorphism Parameterized by Elimination Distance to Bounded Degree

Author: Bulian, Jannis and Dawar, Anuj
Published: 2016
Full Text: View/download PDF

18. Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation

Author: Bulian, Jannis, primary, Buck, Christian, additional, Gajewski, Wojciech, additional, Börschinger, Benjamin, additional, and Schuster, Tal, additional
Published: 2022
Full Text: View/download PDF

19. Graph Isomorphism Parameterized by Elimination Distance to Bounded Degree

Author: Bulian, Jannis, primary and Dawar, Anuj, additional
Published: 2014
Full Text: View/download PDF

20. Fool Me Twice: Entailment from Wikipedia Gamification

Author: Eisenschlos, Julian, primary, Dhingra, Bhuwan, additional, Bulian, Jannis, additional, Börschinger, Benjamin, additional, and Boyd-Graber, Jordan, additional
Published: 2021
Full Text: View/download PDF

21. Recognizing Multimodal Entailment

Author: Ilharco, Cesar, primary, Shirazi, Afsaneh, additional, Gopalan, Arjun, additional, Nagrani, Arsha, additional, Bratanic, Blaz, additional, Bregler, Chris, additional, Funk, Christina, additional, Ferreira, Felipe, additional, Barcik, Gabriel, additional, Ilharco, Gabriel, additional, Osang, Georg, additional, Bulian, Jannis, additional, Frank, Jared, additional, Smaira, Lucas, additional, Cao, Qin, additional, Marino, Ricardo, additional, Patel, Roma, additional, Leung, Thomas, additional, and Imbrasaite, Vaiva, additional
Published: 2021
Full Text: View/download PDF

22. Fixed-Parameter Tractable Distances to Sparse Graph Classes

Author: Bulian, Jannis, primary and Dawar, Anuj, additional
Published: 2016
Full Text: View/download PDF

23. Fixed-parameter Tractable Distances to Sparse Graph Classes

Author: Bulian, Jannis and Dawar, Anuj
Abstract: We show that for various classes C of sparse graphs, and several measures of distance to such classes (such as edit distance and elimination distance), the problem of determining the distance of a given graph G to C is fixed-parameter tractable. The results are based on two general techniques. The first of these, building on recent work of Grohe et al., establishes that any class of graphs that is slicewise nowhere dense and slicewise first-order definable is FPT. The second shows that determining the elimination distance of a graph G to a minor-closed class C is FPT.
Published: 2015
Full Text: View/download PDF

24. Graph Isomorphism Parameterized by Elimination Distance to Bounded Degree

Author: Bulian, Jannis, primary and Dawar, Anuj, additional
Published: 2015
Full Text: View/download PDF

25. Bare canonicity of representable cylindric and polyadic algebras

Author: Bulian, Jannis, primary and Hodkinson, Ian, additional
Published: 2013
Full Text: View/download PDF
