292 results for "Clark, Peter A"
Search Results
2. Human Ageing Genomic Resources: updates on key databases in ageing research
- Author
-
Biotechnology and Biological Sciences Research Council (UK), de Magalhães, João Pedro, Abidi, Zoya, Dos Santos, Gabriel Arantes, Avelar, Roberto A, Barardo, Diogo, Chatsirisupachai, Kasit, Clark, Peter, De-Souza, Evandro A, Johnson, Emily J, Lopes, Inês, Novoa, Guy, Senez, Ludovic, Talay, Angelo, Thornton, Daniel, and To, Paul Ka Po
- Abstract
Ageing is a complex and multifactorial process. For two decades, the Human Ageing Genomic Resources (HAGR) have aided researchers in the study of various aspects of ageing and its manipulation. Here, we present the key features and recent enhancements of these resources, focusing on their six main databases. One database, GenAge, focuses on genes related to ageing, featuring 307 genes linked to human ageing and 2205 genes associated with longevity and ageing in model organisms. AnAge focuses on ageing, longevity, and life-history across animal species, containing data on 4645 species. DrugAge includes information about 1097 longevity drugs and compounds in model organisms such as mice, rats, flies, worms and yeast. GenDR provides a list of 214 genes associated with the life-extending benefits of dietary restriction in model organisms. CellAge contains a catalogue of 866 genes associated with cellular senescence. The LongevityMap serves as a repository for genetic variants associated with human longevity, encompassing 3144 variants pertaining to 884 genes. Additionally, HAGR provides various tools as well as gene expression signatures of ageing, dietary restriction, and replicative senescence based on meta-analyses. Our databases are integrated, regularly updated, and manually curated by experts. HAGR is freely available online (https://genomics.senescence.info/).
- Published
- 2024
3. Long-term follow-up observations of extreme coronal line emitting galaxies
- Author
-
Science and Technology Facilities Council (UK), Agencia Estatal de Investigación (España), Ministerio de Ciencia, Innovación y Universidades (España), Ministerio de Ciencia e Innovación (España), European Commission, UK Space Agency, National Science Foundation (US), National Aeronautics and Space Administration (US), Clark, Peter, and Müller-Bravo, Tomás E.
- Abstract
We present new spectroscopic and photometric follow-up observations of the known sample of extreme coronal line-emitting galaxies (ECLEs) identified in the Sloan Digital Sky Survey (SDSS). With these new data, observations of the ECLE sample now span a period of two decades following their initial SDSS detections. We confirm the non-recurrence of the iron coronal line signatures in five of the seven objects, further supporting their identification as the transient light echoes of tidal disruption events (TDEs). Photometric observations of these objects in optical bands show little overall evolution. In contrast, mid-infrared (MIR) observations show ongoing long-term declines consistent with power-law decay. The remaining two objects had been classified as active galactic nuclei (AGNs) with unusually strong coronal lines rather than being TDE related, given the persistence of the coronal lines in earlier follow-up spectra. We confirm this classification, with our spectra continuing to show the presence of strong, unchanged coronal line features and AGN-like MIR colours and behaviour. We have constructed spectral templates of both subtypes of ECLE to aid in distinguishing the likely origin of newly discovered ECLEs. We highlight the need for higher cadence, and more rapid, follow-up observations of such objects to better constrain their properties and evolution. We also discuss the relationships between ECLEs, TDEs, and other identified transients having significant MIR variability.
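To make the fitting procedure implied here concrete, a minimal sketch of a power-law fit to a declining MIR light curve; the epochs and fluxes below are illustrative placeholders, not the paper's photometry, and this is not the authors' pipeline:

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(t, amplitude, index):
        # Flux model F(t) = amplitude * t**index, with t in days since outburst.
        return amplitude * np.power(t, index)

    # Hypothetical epochs (days) and MIR fluxes (mJy); placeholders only.
    t_obs = np.array([200.0, 600.0, 1200.0, 2400.0, 4800.0])
    f_obs = np.array([3.10, 1.45, 0.82, 0.47, 0.26])
    f_err = np.array([0.15, 0.10, 0.08, 0.06, 0.05])

    popt, pcov = curve_fit(power_law, t_obs, f_obs, sigma=f_err,
                           p0=(100.0, -0.7), absolute_sigma=True)
    print(f"best-fit decay index: {popt[1]:.2f} +/- {np.sqrt(pcov[1, 1]):.2f}")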
- Published
- 2024
4. I'm in AGNi: A new standard for AGN pluralisation
- Author
-
Gow, Andrew D., Clark, Peter, and Rycanowski, Dan
- Abstract
We present a new standard acronym for Active Galactic Nuclei, finally settling the argument of AGN vs. AGNs. Our new standard is not only etymologically superior (following the consensus set by SNe), but also boasts other linguistic opportunities, connecting strongly with relevant theology and streamlining descriptions of AGN properties., Comment: 4 pages, 3 figures, accepted for publication in Acta Prima Aprilia
- Published
- 2024
5. The rate of extreme coronal line emitting galaxies in the Sloan Digital Sky Survey and their relation to tidal disruption events
- Author
-
Callow, Joseph, Graur, Or, Clark, Peter, Palmese, Antonella, Aguilar, Jessica, Ahlen, Steven, BenZvi, Segev, Brooks, David, Claybaugh, Todd, de la Macorra, Axel, Doel, Peter, Forero-Romero, Jaime E., Gaztañaga, Enrique, Gontcho, Satya Gontcho A, Lambert, Andrew, Landriau, Martin, Manera, Marc, Meisner, Aaron, Miquel, Ramon, Moustakas, John, Nie, Jundan, Poppett, Claire, Prada, Francisco, Rezaie, Mehdi, Rossi, Graziano, Sanchez, Eusebio, Silber, Joseph H., Tarlé, Gregory, Weaver, Benjamin A., and Zhou, Zhimin
- Abstract
Strong high-ionization iron coronal lines (CLs) are a rare phenomenon observed in galaxy and quasi-stellar object spectra that are thought to be created as a result of tidal disruption event (TDE) flares. To test whether these CLs are the result of TDE activity, we search for extreme coronal line emitting galaxies (ECLEs) in the Sloan Digital Sky Survey (SDSS), measure their rate, and compare it to TDE rates from the literature. We detect sufficiently strong CLs in 14 objects, doubling the number previously found in SDSS. Using follow-up spectra from the Dark Energy Spectroscopic Instrument and Gemini Multi-Object Spectrograph, Wide-field Infrared Survey Explorer mid-infrared observations, and Liverpool Telescope optical photometry, we find that of the seven new objects, only one evolves in a manner consistent with that of the five previously discovered variable ECLEs. Using this new sample of six variable ECLEs, we calculate the galaxy-normalised rate of ECLEs in SDSS to be $R_\mathrm{G}=2.2~^{+1.3}_{-0.8}~(\mathrm{statistical})~^{+0.0}_{-1.3}~(\mathrm{systematic})\times10^{-5}~\mathrm{galaxy}^{-1}~\mathrm{year}^{-1}$. The mass-normalised rate is $R_\mathrm{M}=1.9~^{+1.1}_{-0.7}~(\mathrm{statistical})~^{+0.0}_{-1.1}~(\mathrm{systematic})\times10^{-16}~\mathrm{M_\odot^{-1}}~\mathrm{year}^{-1}$ and the volumetric rate is $R_\mathrm{V}=6.9~^{+5.6}_{-2.1}~(\mathrm{statistical})~^{+0.0}_{-3.9}~(\mathrm{systematic})\times10^{-8}~\mathrm{Mpc}^{-3}~\mathrm{year}^{-1}$. Our rates are comparable to TDE rates from the literature, supporting the suggestion that the CLs in variable ECLEs are the product of TDEs., Comment: Submitted to MNRAS. 19 pages, 12 figures
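The quoted rates follow the usual pattern of a small Poisson event count divided by survey exposure. A hedged sketch of that calculation; the effective exposure below is an assumed placeholder chosen only to match the order of magnitude of the quoted galaxy-normalised rate, not a value from the paper:

    import numpy as np

    n_ecle = 6            # variable ECLEs in the final sample (from the abstract)
    galaxy_years = 2.7e5  # assumed effective exposure (galaxies x years); illustrative

    # Gehrels (1986) approximations for 1-sigma Poisson confidence limits.
    upper = n_ecle + np.sqrt(n_ecle + 0.75) + 1.0
    lower = n_ecle * (1.0 - 1.0 / (9.0 * n_ecle) - 1.0 / (3.0 * np.sqrt(n_ecle))) ** 3

    rate = n_ecle / galaxy_years
    err_hi = (upper - n_ecle) / galaxy_years
    err_lo = (n_ecle - lower) / galaxy_years
    print(f"R_G = {rate:.1e} +{err_hi:.1e} / -{err_lo:.1e} per galaxy per year")

With a count of six events, the Poisson limits alone give roughly +60%/-40% statistical errors, consistent with the scale of the statistical uncertainties quoted above.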
- Published
- 2024
6. Long-term follow-up observations of extreme coronal line emitting galaxies
- Author
-
Clark, Peter, Graur, Or, Callow, Joseph, Aguilar, Jessica, Ahlen, Steven, Anderson, Joseph P, Berger, Edo, Müller-Bravo, Tomás E, Brink, Thomas G, Brooks, David, Chen, Ting-Wan, Claybaugh, Todd, de la Macorra, Axel, Doel, Peter, Filippenko, Alexei V, Forero-Romero, Jamie E, Gomez, Sebastian, Gromadzki, Mariusz, Honscheid, Klaus, Inserra, Cosimo, Kisner, Theodore, Landriau, Martin, Makrygianni, Lydia, Manera, Marc, Meisner, Aaron, Miquel, Ramon, Moustakas, John, Nicholl, Matt, Nie, Jundan, Onori, Francesca, Palmese, Antonella, Poppett, Claire, Reynolds, Thomas, Rezaie, Mehdi, Rossi, Graziano, Sanchez, Eusebio, Schubnell, Michael, Tarlé, Gregory, Weaver, Benjamin A, Wevers, Thomas, Young, David R, Zheng, WeiKang, and Zhou, Zhimin
- Published
- 2024
8. PROC2PDDL: Open-Domain Planning Representations from Texts
- Author
-
Zhang, Tianyi, Zhang, Li, Hou, Zhaoyi, Wang, Ziyu, Gu, Yuling, Clark, Peter, Callison-Burch, Chris, and Tandon, Niket
- Abstract
Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis shows both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
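For readers unfamiliar with the prediction target, here is an illustrative hand-written PDDL action (not drawn from the Proc2PDDL dataset) showing the precondition and effect fields that models must define, plus a trivial well-formedness check:

    # Hand-written example action; not an entry from the Proc2PDDL dataset.
    pddl_action = """
    (:action pour-water
      :parameters (?src - container ?dst - container)
      :precondition (and (holding ?src) (contains-water ?src) (empty ?dst))
      :effect (and (not (contains-water ?src)) (contains-water ?dst)
                   (not (empty ?dst))))
    """

    def balanced(s: str) -> bool:
        # Minimal syntactic sanity check: parentheses must balance.
        depth = 0
        for ch in s:
            depth += (ch == "(") - (ch == ")")
            if depth < 0:
                return False
        return depth == 0

    assert balanced(pddl_action)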
- Published
- 2024
9. Data-driven Discovery with Large Generative Models
- Author
-
Majumder, Bodhisattwa Prasad, Surana, Harshit, Agarwal, Dhruv, Hazra, Sanchaita, Sabharwal, Ashish, and Clark, Peter
- Abstract
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a set of provided datasets, without the need for additional data collection or physical experiments. We first outline several desiderata for an ideal data-driven discovery system. Then, through DATAVOYAGER, a proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of these desiderata -- a feat previously unattainable -- while also highlighting important limitations in the current system that open up opportunities for novel ML research. We contend that achieving accurate, reliable, and robust end-to-end discovery systems solely through the current capabilities of LGMs is challenging. We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility.
- Published
- 2024
10. Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic
- Author
-
Weir, Nathaniel, Sanders, Kate, Weller, Orion, Sharma, Shreya, Jiang, Dongwei, Jiang, Zhengping, Mishra, Bhavana Dalvi, Tafjord, Oyvind, Jansen, Peter, Clark, Peter, and Van Durme, Benjamin
- Abstract
Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited performance gains by modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment datasets, and evaluate its impact on LLM-based textual inference. We find that our resulting dataset, RDTE (Recognizing Decompositional Textual Entailment), has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets, suggesting that RDTE is a significant step forward in the long-standing problem of forming a clear protocol for discerning entailment. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in a modern neuro-symbolic reasoning engine significantly improves results (both accuracy and proof quality) over other entailment classifier baselines, illustrating the practical benefit of this advance for textual inference.
- Published
- 2024
11. Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
- Author
-
Nottingham, Kolby, Majumder, Bodhisattwa Prasad, Mishra, Bhavana Dalvi, Singh, Sameer, Clark, Peter, and Fox, Roy
- Abstract
Large language models (LLMs) have recently been used for sequential decision making in interactive environments. However, leveraging environment reward signals for continual LLM actor improvement is not straightforward. We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. SSO constructs skills by extracting common subtrajectories with high rewards and generating subgoals and instructions to represent each skill. These skills are provided to the LLM actor in-context to reinforce behaviors with high rewards. Then, SSO further refines the skill set by pruning skills that do not continue to result in high rewards. We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement. SSO outperforms baselines by 40% in our custom NetHack task and outperforms the previous state-of-the-art in ScienceWorld by 35%., Comment: 8 pages, preprint
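A minimal sketch of the construct-refine-prune loop described here, with assumed data structures rather than the released SSO code:

    from collections import defaultdict

    def update_skill_set(skills, episodes, reward_floor=0.8, prune_floor=0.2):
        # skills: dict mapping a subtrajectory key to a running mean reward.
        # episodes: list of (subtrajectory_key, reward) pairs from recent rollouts.
        stats = defaultdict(list)
        for key, reward in episodes:
            stats[key].append(reward)
        for key, rewards in stats.items():
            mean = sum(rewards) / len(rewards)
            if key in skills:
                skills[key] = 0.5 * (skills[key] + mean)   # refine existing skill
            elif len(rewards) >= 2 and mean >= reward_floor:
                skills[key] = mean                         # construct new skill
        # Prune skills whose rewards have collapsed.
        for key in [k for k, v in skills.items() if v < prune_floor]:
            del skills[key]
        return skills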
- Published
- 2024
12. The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
- Author
-
Hase, Peter, Bansal, Mohit, Clark, Peter, and Wiegreffe, Sarah
- Abstract
How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data. We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear classifier heads, and QLoRA for seven different measures of datapoint hardness, including six empirically diverse human hardness measures (like grade level) and one model-based measure (loss-based). Furthermore, we show that even if one cares most about model performance on hard data, it can be better to collect easy data rather than hard data for finetuning, since hard data is generally noisier and costlier to collect. Our experiments use open models up to 70b in size and four publicly available question-answering datasets with questions ranging in difficulty from 3rd grade science questions to college level STEM questions and general-knowledge trivia. We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied. Our code is available at: https://github.com/allenai/easy-to-hard-generalization, Comment: ACL 2024. 23 pages, 20 figures
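One of the simple methods named here, a linear classifier head on frozen features, reduces to a few lines; the features below are random stand-ins for hidden states from an open model, so the printed accuracy is chance-level by construction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    easy_X, easy_y = rng.normal(size=(200, 64)), rng.integers(0, 2, 200)
    hard_X, hard_y = rng.normal(size=(100, 64)), rng.integers(0, 2, 100)

    # Train the head on easy data only, then evaluate easy-to-hard transfer.
    head = LogisticRegression(max_iter=1000).fit(easy_X, easy_y)
    print("easy->hard accuracy:", head.score(hard_X, hard_y))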
- Published
- 2024
13. PDDLEGO: Iterative Planning in Textual Environments
- Author
-
Zhang, Li, Jansen, Peter, Zhang, Tianyi, Clark, Peter, Callison-Burch, Chris, and Tandon, Niket
- Abstract
Planning in textual environments has been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially-observed environments where there is initially insufficient information to plan for the end-goal. We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98%) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4%)., Comment: In *SEM 2024
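A schematic of the iterative loop described here; env, llm, and planner are hypothetical interfaces standing in for the environment, the LLM, and the symbolic planner, not the paper's released code:

    def pddlego_episode(env, llm, planner, max_iters=10):
        observation = env.reset()
        domain = llm.draft_pddl(observation)             # partial representation
        for _ in range(max_iters):
            plan = planner.solve(domain, llm.pick_subgoal(domain))
            if plan is None:                             # representation too sparse
                domain = llm.repair_pddl(domain, observation)
                continue
            observation, done = env.execute(plan)
            if done:                                     # end-goal reached
                return True
            domain = llm.augment_pddl(domain, observation)  # fold in new facts
        return False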
- Published
- 2024
14. Learning to Reason via Program Generation, Emulation, and Search
- Author
-
Weir, Nathaniel, Khalifa, Muhammad, Qiu, Linlu, Weller, Orion, and Clark, Peter
- Abstract
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word concatenation). However, not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM's program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (CoGEX). CoGEX works by (1) training LMs to generate their own pseudo-programs, (2) teaching them to emulate their generated program's execution, including those leaf functions, allowing the LM's knowledge to fill in the execution gaps; and (3) using them to search over many programs to find an optimal one. To adapt the CoGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. We show that our approach yields large improvements compared to standard in-context learning approaches on a battery of tasks, both algorithmic and soft reasoning. This result thus demonstrates that code synthesis can be applied to a much broader class of problems than previously considered. Our released dataset, fine-tuned models, and implementation can be found at \url{https://github.com/nweir127/CoGEX}., Comment: 16 pages, 10 figures
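An illustrative pseudo-program in the spirit of this abstract (hand-written, not from the CoGEX release): a soft-reasoning task expressed as Python whose leaf calls are deliberately left undefined, so that the LM emulates their values during "execution":

    def is_sarcastic_tone(utterance: str) -> bool:
        raise NotImplementedError  # undefined leaf: the LM emulates this value

    def contradicts_context(utterance: str, context: str) -> bool:
        raise NotImplementedError  # undefined leaf: the LM emulates this value

    def detect_sarcasm(utterance: str, context: str) -> bool:
        # A soft-reasoning task written as code over undefined leaves.
        return is_sarcastic_tone(utterance) or contradicts_context(utterance, context)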
- Published
- 2024
15. Characterizing prostate cancer risk through multi-ancestry genome-wide discovery of 187 novel risk variants
- Author
-
Wang, Anqi, Shen, Jiayi, Rodriguez, Alex A., Saunders, Edward J., Chen, Fei, Janivara, Rohini, Darst, Burcu F., Sheng, Xin, Xu, Yili, Chou, Alisha J., Benlloch, Sara, Dadaev, Tokhir, Brook, Mark N., Plym, Anna, Sahimi, Ali, Hoffman, Thomas J., Takahashi, Atushi, Matsuda, Koichi, Momozawa, Yukihide, Fujita, Masashi, Laisk, Triin, Figueredo, Jessica, Muir, Kenneth, Ito, Shuji, Liu, Xiaoxi, Uchio, Yuji, Kubo, Michiaki, Kamatani, Yoichiro, Lophatananon, Artitaya, Wan, Peggy, Andrews, Caroline, Lori, Adriana, Choudhury, Parichoy P., Schleutker, Johanna, Tammela, Teuvo L. J., Sipeky, Csilla, Auvinen, Anssi, Giles, Graham G., Southey, Melissa C., MacInnis, Robert J., Cybulski, Cezary, Wokolorczyk, Dominika, Lubinski, Jan, Rentsch, Christopher T., Cho, Kelly, Mcmahon, Benjamin H., Neal, David E., Donovan, Jenny L., Hamdy, Freddie C., Martin, Richard M., Nordestgaard, Borge G., Nielsen, Sune F., Weischer, Maren, Bojesen, Stig E., Roder, Andreas, Stroomberg, Hein V., Batra, Jyotsna, Chambers, Suzanne, Horvath, Lisa, Clements, Judith A., Tilly, Wayne, Risbridger, Gail P., Gronberg, Henrik, Aly, Markus, Szulkin, Robert, Eklund, Martin, Nordstrom, Tobias, Pashayan, Nora, Dunning, Alison M., Ghoussaini, Maya, Travis, Ruth C., Key, Tim J., Riboli, Elio, Park, Jong Y., Sellers, Thomas A., Lin, Hui-Yi, Albanes, Demetrius, Weinstein, Stephanie, Cook, Michael B., Mucci, Lorelei A., Giovannucci, Edward, Lindstrom, Sara, Kraft, Peter, Hunter, David J., Penney, Kathryn L., Turman, Constance, Tangen, Catherine M., Goodman, Phyllis J., Thompson, Ian M., Jr., Hamilton, Robert J., Fleshner, Neil E., Finelli, Antonio, Parent, Marie-Elise, Stanford, Janet L., Ostrander, Elaine A., Koutros, Stella, Freeman, Laura E. Beane, Stampfer, Meir, Wolk, Alicja, Hakansson, Niclas, Andriole, Gerald L., Hoover, Robert N., Machiela, Mitchell J., Sorensen, Karina Dalsgaard, Borre, Michael, Blot, William J., Zheng, Wei, Yeboah, Edward D., Mensah, James E., Lu, Yong-Jie, Zhang, Hong-Wei, Feng, Ninghan, Mao, Xueying, Wu, Yudong, Zhao, Shan-Chao, Sun, Zan, Thibodeau, Stephen N., McDonnell, Shannon K., Schaid, Daniel J., West, Catharine M. L., Barnett, Gill, Maier, Christiane, Schnoeller, Thomas, Luedeke, Manuel, Kibel, Adam S., Drake, Bettina F., Cussenot, Olivier, Cancel-Tassin, Geraldine, Menegaux, Florence, Truong, Therese, Koudou, Yves Akoli, John, Esther M., Grindedal, Eli Marie, Maehle, Lovise, Khaw, Kay-Tee, Ingles, Sue A., Stern, Mariana C., Vega, Ana, Gomez-Caamano, Antonio, Fachal, Laura, Rosenstein, Barry S., Kerns, Sarah L., Ostrer, Harry, Teixeira, Manuel R., Paulo, Paula, Brandao, Andreia, Watya, Stephen, Lubwama, Alexander, Bensen, Jeannette T., Butler, Ebonee N., Mohler, James L., Taylor, Jack A., Kogevinas, Manolis, Dierssen-Sotos, Trinidad, Castano-Vinyals, Gemma, Cannon-Albright, Lisa, Teerlink, Craig C., Huff, Chad D., Pilie, Patrick, Yu, Yao, Bohlender, Ryan J., Gu, Jian, Strom, Sara S., Multigner, Luc, Blanchet, Pascal, Brureau, Laurent, Kaneva, Radka, Slavov, Chavdar, Mitev, Vanio, Leach, Robin J., Brenner, Hermann, Chen, Xuechen, Holleczek, Bernd, Schoettker, Ben, Klein, Eric A., Hsing, Ann W., Kittles, Rick A., Murphy, Adam B., Logothetis, Christopher J., Kim, Jeri, Neuhausen, Susan L., Steele, Linda, Ding, Yuan Chun, Isaacs, William B., Nemesure, Barbara, Hennis, Anselm J. 
M., Carpten, John, Pandha, Hardev, Michael, Agnieszka, De Ruyck, Kim, De Meerleer, Gert, Ost, Piet, Xu, Jianfeng, Razack, Azad, Lim, Jasmine, Teo, Soo-Hwang, Newcomb, Lisa F., Lin, Daniel W., Fowke, Jay H., Neslund-Dudas, Christine M., Rybicki, Benjamin A., Gamulin, Marija, Lessel, Davor, Kulis, Tomislav, Usmani, Nawaid, Abraham, Aswin, Singhal, Sandeep, Parliament, Matthew, Claessens, Frank, Joniau, Steven, Van den Broeck, Thomas, Gago-Dominguez, Manuela, Castelao, Jose Esteban, Martinez, Maria Elena, Larkin, Samantha, Townsend, Paul A., Aukim-Hastie, Claire, Bush, William S., Aldrich, Melinda C., Crawford, Dana C., Srivastava, Shiv, Cullen, Jennifer, Petrovics, Gyorgy, Casey, Graham, Wang, Ying, Tettey, Yao, Lachance, Joseph, Tang, Wei, Biritwum, Richard B., Adjei, Andrew A., Tay, Evelyn, Truelove, Ann, Niwa, Shelley, Yamoah, Kosj, Govindasami, Koveela, Chokkalingam, Anand P., Keaton, Jacob M., Hellwege, Jacklyn N., Clark, Peter E., Jalloh, Mohamed, Gueye, Serigne M., Niang, Lamine, Ogunbiyi, Olufemi, Shittu, Olayiwola, Amodu, Olukemi, Adebiyi, Akindele O., Aisuodionoe-Shadrach, Oseremen I., Ajibola, Hafees O., Jamda, Mustapha A., Oluwole, Olabode P., Nwegbu, Maxwell, Adusei, Ben, Mante, Sunny, Darkwa-Abrahams, Afua, Diop, Halimatou, Gundell, Susan M., Roobol, Monique J., Jenster, Guido, van Schaik, Ron H. N., Hu, Jennifer J., Sanderson, Maureen, Kachuri, Linda, Varma, Rohit, McKean-Cowdin, Roberta, Torres, Mina, Preuss, Michael H., Loos, Ruth J. F., Zawistowski, Matthew, Zollner, Sebastian, Lu, Zeyun, Van Den Eeden, Stephen K., Easton, Douglas F., Ambs, Stefan, Edwards, Todd L., Magi, Reedik, Rebbeck, Timothy R., Fritsche, Lars, Chanock, Stephen J., Berndt, Sonja I., Wiklund, Fredrik, Nakagawa, Hidewaki, Witte, John S., Gaziano, J. Michael, Justice, Amy C., Mancuso, Nick, Terao, Chikashi, Eeles, Rosalind A., Kote-Jarai, Zsofia, Madduri, Ravi K., Conti, David V., and Haiman, Christopher A.
- Abstract
The transferability and clinical value of genetic risk scores (GRSs) across populations remain limited due to an imbalance in genetic studies across ancestrally diverse populations. Here we conducted a multi-ancestry genome-wide association study of 156,319 prostate cancer cases and 788,443 controls of European, African, Asian and Hispanic men, reflecting a 57% increase in the number of non-European cases over previous prostate cancer genome-wide association studies. We identified 187 novel risk variants for prostate cancer, increasing the total number of risk variants to 451. An externally replicated multi-ancestry GRS was associated with risk that ranged from 1.8 (per standard deviation) in African ancestry men to 2.2 in European ancestry men. The GRS was associated with a greater risk of aggressive versus non-aggressive disease in men of African ancestry (P = 0.03). Our study presents novel prostate cancer susceptibility loci and a GRS with effective risk stratification across ancestry groups.
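The GRS referenced here is, in its standard form, a weighted sum of risk-allele dosages reported per standard deviation. A toy sketch under that standard convention; the weights and dosages are invented, not values from the study:

    import numpy as np

    weights = np.array([0.12, -0.08, 0.21, 0.05])   # invented per-variant log odds ratios
    dosages = np.array([[0, 1, 2, 1],               # invented risk-allele counts per person
                        [2, 0, 1, 0],
                        [1, 1, 0, 2]], dtype=float)

    grs = dosages @ weights                         # raw score per person
    grs_per_sd = (grs - grs.mean()) / grs.std()     # per standard deviation, as quoted
    print(np.round(grs_per_sd, 2))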
- Published
- 2023
16. Early Reduction of Glucose Consumption Is a Biomarker of Kinase Inhibitor Efficacy Which Can Be Reversed with GLUT1 Overexpression in Lung Cancer Cells
- Author
-
Ghezzi, Chiara, Perez, Stefani, Ryan, Kaitlin, Wong, Alicia, Chen, Bao Ying, Damoiseaux, Robert, and Clark, Peter M
- Abstract
Purpose: Small molecule inhibitors that target oncogenic driver kinases are an important class of therapies for non-small cell lung cancer (NSCLC) and other malignancies. However, these therapies are not without their challenges. Each inhibitor works on only a subset of patients, the pharmacokinetics of these inhibitors is variable, and these inhibitors are associated with significant side effects. Many of these inhibitors lack non-invasive biomarkers to confirm pharmacodynamic efficacy, and our understanding of how these inhibitors block cancer cell growth remains incomplete. Limited clinical studies suggest that early (< 2 weeks after start of therapy) changes in tumor glucose consumption, measured by [18F]FDG PET imaging, can predict therapeutic efficacy, but the scope of this strategy and functional relevance of this inhibition of glucose consumption remains understudied. Here we demonstrate that early inhibition of glucose consumption as can be measured clinically with [18F]FDG PET is a consistent phenotype of efficacious targeted kinase inhibitors and is necessary for the subsequent inhibition of growth across models of NSCLC. Methods: We tested nine NSCLC cell lines (A549, H1129, H1734, H1993, H2228, H3122, H460, HCC827, and PC9 cells) and ten targeted therapies (afatinib, buparlisib, ceritinib, cabozantinib, crizotinib, dovitinib, erlotinib, ponatinib, trametinib, and vemurafenib) across concentrations ranging from 1.6 nM to 5 µM to evaluate whether these inhibitors block glucose consumption at 24-h post-drug treatment and cell growth at 72-h post-drug treatment. We overexpressed the facilitative glucose transporter SLC2A1 (GLUT1) to test the functional connection between blocked glucose consumption and cell growth after treatment with a kinase inhibitor. A subset of these inhibitors and cell lines were studied in vivo. Results: Across the nine NSCLC cell lines, ten targeted therapies, and a range of inhibitor concentrations, whether a kinase inhib
- Published
- 2023
17. PET imaging to monitor and study drug effects in liver disease and autoimmunity
- Author
-
Salas, Jessica and Clark, Peter M.
- Abstract
Autoimmune diseases affect approximately eight percent of the American population; their prevalence has increased globally. There are at least eighty different autoimmune diseases that have been discovered to date, each of which have different etiopathologies, clinical presentations and treatment courses. Autoimmune diseases are marked by a stark decrease in the quality of life of those afflicted. Understanding the biological processes that cause and contribute to these diseases is crucial for development of new treatment strategies and regimens. Chapter one of this dissertation summarizes the etiology and current treatment options for MS, how positron emission tomography (PET) is used to image pathways implicated in autoimmune diseases and summarizes the role that the deoxyribonucleoside salvage pathway has in autoimmunity. Chapter two explores the role the deoxyribonucleoside salvage pathway plays in the development of symptoms in a mouse model of multiple sclerosis. In this chapter, I show that targeting dCK (deoxycytidine kinase), the rate-limiting enzyme of the salvage pathway, diminishes clinical symptoms in myelin proteolipid protein (PLP139-151)-induced EAE mice. This demonstrates that the salvage pathway plays a vital role in the pathology of the disease and could also play a role in the pathology of other autoimmune diseases. Furthermore, I explored the mechanism by which the small-molecule dCK inhibitor TRE-515 blocks dCK activity in vitro and in vivo, and how this affects the activation induced proliferation of pathogenic immune cells in our disease model. In chapter three, we evaluated whether we could use PET to assess and characterize drug-induced liver injury in mice and predict which mice would succumb to liver failure and those that would not. In chapter four, we evaluated whether we could visualize and quantify liver-infiltrating immune cells and hepatocyte inflammation using different PET radiotracers. Chapter five is a re
- Published
- 2023
18. Pacritinib inhibits glucose consumption in squamous cell lung cancer cells by targeting FLT3.
- Author
-
Ghezzi, Chiara, Chen, Bao Ying, Damoiseaux, Robert, and Clark, Peter M
- Abstract
Squamous cell lung cancer maintains its growth through elevated glucose consumption, but selective glucose consumption inhibitors are lacking. Here, we discovered using a high-throughput screen new compounds that block glucose consumption in three squamous cell lung cancer cell lines and identified 79 compounds that block glucose consumption in one or more of these cell lines. Based on its ability to block glucose consumption in all three cell lines, pacritinib, an inhibitor of FMS Related Receptor Tyrosine Kinase 3 (FLT3) and Janus Kinase 2 (JAK2), was further studied. Pacritinib decreased glucose consumption in squamous cell lung cancer cells in cell culture and in vivo without affecting glucose consumption in healthy tissues. Pacritinib blocked hexokinase activity, and Hexokinase 1 and 2 mRNA and protein expression. Overexpression of Hexokinase 1 blocked the ability of pacritinib to inhibit glucose consumption in squamous cell lung cancer cells. Overexpression of FLT3 but not JAK2 significantly increased glucose consumption and blocked the ability of pacritinib to inhibit glucose consumption in squamous cell lung cancer cells. Additional FLT3 inhibitors blocked glucose consumption in squamous cell lung cancer cells. Our study identifies FLT3 inhibitors as a new class of inhibitors that can block glucose consumption in squamous cell lung cancer.
- Published
- 2023
19. Gathering the Evidence for a Method to Assess Outcomes in Primary Care Dietetics in Australia
- Author
-
Clark, Peter W
- Abstract
Managing chronic disease is Australia's biggest non-communicable health problem. Nearly half of Australia's population suffers from one chronic disease, and ten chronic diseases lead to the death of nine in ten Australians. Early intervention with preventative health measures can reduce or eliminate the later need for more intensive and costly care and improve the quality of life for those afflicted with chronic diseases. Nutrition is a major modifiable determinant of chronic disease, and dietitians are recognised experts in delivering evidence-based nutritional care. Dietitians play an essential role in managing chronic diseases in a primary care setting. Although dietitians are the fourth most accessed allied health professional in Australia's Medicare Chronic Disease Management program, there is no universal system in Australia for describing the activity of dietitians or capturing outcomes from their care. As a result, little is known about which practices bring about better care and outcomes in real-world settings. This research aimed to develop appropriate data standards for assessing process and outcome measures for private practice dietitians in primary care settings. Implementing data standards across the dietetic profession may lead to more effective dietetic practices and better health outcomes for all. [...], Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Health Sci & Soc Wrk, Griffith Health, Full Text
- Published
- 2023
20. Long-term follow-up observations of extreme coronal line emitting galaxies
- Author
-
Clark, Peter, Graur, Or, Callow, Joseph, Aguilar, Jessica, Ahlen, Steven, Anderson, Joseph P., Berger, Edo, Brink, Thomas, Brooks, David, Chen, Ting-Wan, Claybaugh, Todd, de la Macorra, Axel, Doel, Peter, Filippenko, Alexei, Forero-Romero, Jamie, Gomez, Sebastian, Gromadzki, Mariusz, Honscheid, Klaus, Inserra, Cosimo, Kisner, Theodore, Landriau, Martin, Makrygianni, Lydia, Manera, Marc, Meisner, Aaron, Miquel, Ramon, Moustakas, John, Müller-Bravo, Tomás E., Nicholl, Matt, Nie, Jundan, Onori, Francesca, Palmese, Antonella, Poppett, Claire, Reynolds, Thomas, Rezaie, Mehdi, Rossi, Graziano, Sanchez, Eusebio, Schubnell, Michael, Tarlé, Gregory, Weaver, Benjamin A., Wevers, Thomas, Young, David R., Zheng, WeiKang, and Zhou, Zhimin
- Abstract
We present new spectroscopic and photometric follow-up observations of the known sample of extreme coronal line emitting galaxies (ECLEs) identified in the Sloan Digital Sky Survey (SDSS). With these new data, observations of the ECLE sample now span a period of two decades following their initial SDSS detections. We confirm the nonrecurrence of the iron coronal line signatures in five of the seven objects, further supporting their identification as the transient light echoes of tidal disruption events (TDEs). Photometric observations of these objects in optical bands show little overall evolution. In contrast, mid-infrared (MIR) observations show ongoing long-term declines. The remaining two objects had been classified as active galactic nuclei (AGN) with unusually strong coronal lines rather than being TDE related, given the persistence of the coronal lines in earlier follow-up spectra. We confirm this classification, with our spectra continuing to show the presence of strong, unchanged coronal-line features and AGN-like MIR colours and behaviour. We have constructed spectral templates of both subtypes of ECLE to aid in distinguishing the likely origin of newly discovered ECLEs. We highlight the need for higher cadence, and more rapid, follow-up observations of such objects to better constrain their properties and evolution. We also discuss the relationships between ECLEs, TDEs, and other identified transients having significant MIR variability., Comment: This is a pre-copyedited, author-produced PDF of an article accepted for publication in Monthly Notices of the Royal Astronomical Society following peer review. Note the corrected caption of Figure 1 continued, which in this version correctly refers to 'SDSS J124' rather than the erroneous 'SDSS J1341' in the published version. 29 Pages, 14 Figures
- Published
- 2023
21. Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy
- Author
-
Wiegreffe, Sarah, Finlayson, Matthew, Tafjord, Oyvind, Clark, Peter, and Sabharwal, Ashish
- Abstract
When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as the "surface form competition" (SFC) hypothesis. This has motivated the introduction of various probability normalization methods. However, many core questions remain unanswered. How do we measure SFC? Are there direct ways of reducing it, and does doing so improve task performance? We propose a mathematical formalism for SFC which allows us to quantify and bound its impact for the first time. We identify a simple method for reducing it -- namely, increasing probability mass on the given answer choices by a) including them in the prompt and b) using in-context learning with even just one example. We show this method eliminates the impact of SFC in the majority of instances. Our experiments on three diverse datasets and six LMs reveal several additional surprising findings. For example, both normalization and prompting methods for reducing SFC can be ineffective or even detrimental to task performance for some LMs. We conclude with practical insights for effectively prompting LMs for multiple-choice tasks., Comment: EMNLP 2023
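The quantity at issue, the share of next-token probability mass falling on the given answer choices, can be made concrete with a toy example; the vocabulary and logits below are hypothetical, not any particular model's output:

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()

    vocab = ["bath", "bathtub", "shower", "sink", "the", "a"]
    logits = np.array([2.1, 1.8, 0.4, 0.2, 1.5, 1.2])   # hypothetical scores
    probs = softmax(logits)

    choices = ["bath", "shower"]                        # the given answer options
    mass = sum(probs[vocab.index(c)] for c in choices)
    print(f"mass on answer choices: {mass:.2f}")        # the rest leaks elsewhere

In this toy case under half the probability mass lands on the choices; prompting interventions of the kind studied above aim to raise that share.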
- Published
- 2023
22. Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation
- Author
-
Liang, Zhenwen, Yu, Wenhao, Rajpurohit, Tanmay, Clark, Peter, Zhang, Xiangliang, and Kalyan, Ashwin
- Abstract
In this paper, we present a novel approach for distilling math word problem solving capabilities from large language models (LLMs) into smaller, more efficient student models. Our approach is designed to consider the student model's weaknesses and foster a tailored learning experience by generating targeted exercises aligned with educational science principles, such as knowledge tracing and personalized learning. Concretely, we let GPT-3 be a math tutor and run two steps iteratively: 1) assessing the student model's current learning status on a GPT-generated exercise book, and 2) improving the student model by training it with tailored exercise samples generated by GPT-3. Experimental results reveal that our approach outperforms LLMs (e.g., GPT-3 and PaLM) in accuracy across three distinct benchmarks while employing significantly fewer parameters. Furthermore, we provide a comprehensive analysis of the various components within our methodology to substantiate their efficacy.
- Published
- 2023
23. Language Models with Rationality
- Author
-
Kassner, Nora, Tafjord, Oyvind, Sabharwal, Ashish, Richardson, Kyle, Schuetze, Hinrich, and Clark, Peter
- Abstract
While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs. Our approach, which we call REFLEX, is to add a rational, self-reflecting layer on top of the LLM. First, given a question, we construct a belief graph using a backward-chaining process to materialize relevant model beliefs (including beliefs about answer candidates) and their inferential relationships. Second, we identify and minimize contradictions in that graph using a formal constraint reasoner. We find that REFLEX significantly improves consistency (by 8%-11% absolute) without harming overall answer accuracy, resulting in answers supported by faithful chains of reasoning drawn from a more consistent belief system. This suggests a new style of system architecture in which an LLM extended with a rational layer can provide an interpretable window into system beliefs, add a systematic reasoning capability, and repair latent inconsistencies present in the LLM.
- Published
- 2023
24. IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions
- Author
-
Yu, Wenhao, Jiang, Meng, Clark, Peter, and Sabharwal, Ashish
- Abstract
Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we introduce the first such dataset, named IfQA, where each question is based on a counterfactual presupposition via an "if" clause. For example, if Los Angeles was on the east coast of the U.S., what would be the time difference between Los Angeles and Paris? Such questions require models to go beyond retrieving direct factual knowledge from the Web: they must identify the right information to retrieve and reason about an imagined situation that may even go against the facts built into their parameters. The IfQA dataset contains over 3,800 questions that were annotated by crowdworkers on relevant Wikipedia passages. Empirical analysis reveals that the IfQA dataset is highly challenging for existing open-domain QA methods, including supervised retrieve-then-read pipeline methods (EM score 36.2), as well as recent few-shot approaches such as chain-of-thought prompting with GPT-3 (EM score 27.4). The unique challenges posed by the IfQA benchmark will push open-domain QA research on both retrieval and counterfactual reasoning fronts.
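The EM scores quoted here follow the standard open-domain QA exact-match convention (lowercasing, stripping punctuation and articles). A sketch of that metric, assuming the standard normalisation rather than code from the IfQA paper:

    import re, string

    def normalize(s: str) -> str:
        s = s.lower()
        s = "".join(ch for ch in s if ch not in string.punctuation)
        s = re.sub(r"\b(a|an|the)\b", " ", s)
        return " ".join(s.split())

    def exact_match(prediction: str, gold_answers) -> bool:
        return normalize(prediction) in {normalize(g) for g in gold_answers}

    # A full-sentence answer fails EM even when it contains the gold span.
    print(exact_match("The time difference is 6 hours.", ["6 hours"]))  # False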
- Published
- 2023
25. RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
- Author
-
Akyürek, Afra Feyza, Akyürek, Ekin, Madaan, Aman, Kalyan, Ashwin, Clark, Peter, Wijaya, Derry, and Tandon, Niket
- Abstract
Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics while assuming one can train downstream models to utilize generated feedback. However, this approach does not apply to black-box or limited access models such as ChatGPT, as they cannot be fine-tuned. Moreover, in the era of large general-purpose language agents, fine-tuning is neither computationally nor spatially efficient as it results in multiple copies of the network. In this work, we introduce RL4F (Reinforcement Learning for Feedback), a multi-agent collaborative framework where the critique generator is trained to maximize end-task performance of GPT-3, a fixed model more than 200 times its size. RL4F produces critiques that help GPT-3 revise its outputs. We study three datasets for action planning, summarization and alphabetization and show relative improvements up to 10% in multiple text similarity metrics over other learned, retrieval-augmented or prompting-based critique generators., Comment: ACL 2023
- Published
- 2023
26. Self-Refine: Iterative Refinement with Self-Feedback
- Author
-
Madaan, Aman, Tandon, Niket, Gupta, Prakhar, Hallinan, Skyler, Gao, Luyu, Wiegreffe, Sarah, Alon, Uri, Dziri, Nouha, Prabhumoye, Shrimai, Yang, Yiming, Gupta, Shashank, Majumder, Bodhisattwa Prasad, Hermann, Katherine, Welleck, Sean, Yazdanbakhsh, Amir, and Clark, Peter
- Abstract
Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach., Comment: Code, data, and demo at https://selfrefine.info
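The generate-feedback-refine loop described here, as a hedged schematic; generate, feedback, and refine are hypothetical one-prompt wrappers around a single LLM, not the released Self-Refine API:

    def self_refine(llm, task, max_rounds=4):
        output = llm.generate(task)
        for _ in range(max_rounds):
            critique = llm.feedback(task, output)
            if "no issues" in critique.lower():   # simple stopping criterion
                return output
            output = llm.refine(task, output, critique)
        return output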
- Published
- 2023
27. Light-Curve Structure and Halpha Line Formation in the Tidal Disruption Event AT 2019azh
- Author
-
Faris, Sara, Arcavi, Iair, Makrygianni, Lydia, Hiramatsu, Daichi, Terreran, Giacomo, Farah, Joseph, Howell, D. Andrew, McCully, Curtis, Newsome, Megan, Gonzalez, Estefania Padilla, Pellegrino, Craig, Bostroem, K. Azalee, Abojanb, Wiam, Lam, Marco C., Tomasella, Lina, Brink, Thomas G., Filippenko, Alexei V., French, K. Decker, Clark, Peter, Graur, Or, Leloudas, Giorgos, Gromadzki, Mariusz, Anderson, Joseph P., Nicholl, Matt, Gutierrez, Claudia P., Kankare, Erkki, Inserra, Cosimo, Galbany, Luis, Reynolds, Thomas, Mattila, Seppo, Heikkila, Teppo, Wang, Yanan, Onori, Francesca, Wevers, Thomas, Charalampopoulos, Panos, and Johansson, Joel
- Abstract
AT 2019azh is a H+He tidal disruption event (TDE) with one of the most extensive ultraviolet and optical datasets available to date. We present our photometric and spectroscopic observations of this event starting several weeks before and out to approximately two years after g-band peak brightness and combine them with public photometric data. This extensive dataset robustly reveals a change in the light-curve slope and a bump in the rising light curve of a TDE for the first time, which may indicate more than one dominant emission mechanism contributing to the pre-peak light curve. We further confirm the relation seen in previous TDEs whereby the redder emission peaks later than the bluer emission. The post-peak bolometric light curve of AT 2019azh is better described by an exponential decline than by the canonical t^{-5/3} (and in fact any) power-law decline. We find a possible mid-infrared excess around peak optical luminosity, but cannot determine its origin. In addition, we provide the earliest measurements of the Halpha emission-line evolution and find no significant time delay between the peak of the V-band light curve and that of the Halpha luminosity. These results can be used to constrain future models of TDE line formation and emission mechanisms in general. More pre-peak 1-2 day cadence observations of TDEs are required to determine whether the characteristics observed here are common among TDEs. More importantly, detailed emission models are needed to fully exploit such observations for understanding the emission physics of TDEs., Comment: Submitted to ApJ
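The two decline models being compared can be written out explicitly. Under the standard parametrisation (an assumption on our part; the paper's exact fit form may differ), with L_0, t_0, and tau as free parameters:

    \begin{align}
      L_{\mathrm{PL}}(t)  &= L_0 \left( \frac{t - t_0}{\tau} \right)^{-5/3}, \\
      L_{\mathrm{exp}}(t) &= L_0 \, e^{-(t - t_0)/\tau}.
    \end{align}

The first is the canonical fallback-accretion power law; the second is the exponential that the abstract reports as the better description of the post-peak bolometric light curve.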
- Published
- 2023
28. Overview of the distributed image processing infrastructure to produce the Legacy Survey of Space and Time
- Author
-
Hernandez, Fabio, Beckett, George, Clark, Peter, Doidge, Matt, Jenness, Tim, Karavakis, Edward, Boulc'h, Quentin Le, Love, Peter, Mainetti, Gabriele, Noble, Timothy, White, Brandon, and Yang, Wei
- Abstract
The Vera C. Rubin Observatory is preparing to execute the most ambitious astronomical survey ever attempted, the Legacy Survey of Space and Time (LSST). The final phase of construction is currently underway in the Chilean Andes, with the Observatory's ten-year science mission scheduled to begin in 2025. Rubin's 8.4-meter telescope will scan the southern hemisphere nightly, collecting imagery in the 320-1050 nm wavelength range and covering the entire observable sky every 4 nights using a 3.2-gigapixel camera, the largest imaging device ever built for astronomy. Automated detection and classification of celestial objects will be performed by sophisticated algorithms on high-resolution images to progressively produce an astronomical catalog eventually composed of 20 billion galaxies and 17 billion stars along with their associated physical properties. In this article we present an overview of the system currently being constructed to perform data distribution as well as the annual campaigns that reprocess the entire image dataset collected since the beginning of the survey. These processing campaigns will utilize computing and storage resources provided by three Rubin data facilities (one in the US and two in Europe). Each year a Data Release will be produced and disseminated to science collaborations for use in studies comprising four main science pillars: probing dark matter and dark energy, taking inventory of solar system objects, exploring the transient optical sky, and mapping the Milky Way. Also presented is the method by which we leverage some of the common tools and best practices used for managing large-scale distributed data-processing projects in the high energy physics and astronomy communities. We also demonstrate how these tools and practices are utilized within the Rubin project to overcome the specific challenges faced by the Observatory., Comment: 8 pages, 2 figures, 26th International Conference on Computing in High Energy & Nuclear Physics
- Published
- 2023
29. BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
- Author
-
Clark, Peter, Mishra, Bhavana Dalvi, and Tafjord, Oyvind
- Abstract
While there are numerous benchmarks comparing the performance of modern language models (LMs), end-task evaluations often conflate notions of *factual accuracy* ("truth") and *reasoning ability* ("rationality", or "honesty" in the sense of correctly reporting implications of beliefs). Our goal is a dataset that clearly distinguishes these two notions. Our approach is to leverage and extend a collection of human-annotated *entailment trees*, engineered to express both good and bad chains of reasoning, and using a mixture of true and false facts, in particular including counterfactual examples, to avoid belief bias (also known as the "content effect"). The resulting dataset, called BaRDa, contains 3000 entailments (1787 valid, 1213 invalid), using 6681 true and 2319 false statements. Testing on four GPT-series models, GPT3(curie)/GPT3(davinci)/3.5/4, we find factual accuracy (truth) scores of 74.1/80.6/82.6/87.1 and reasoning accuracy scores of 63.1/78.0/71.8/79.2. This shows the clear progression of models towards improved factual accuracy and entailment reasoning, and the dataset provides a new benchmark that more cleanly separates and quantifies these two notions., Comment: Added note about how dataset sampling was performed
- Published
- 2023
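The separation the abstract draws between truth and rationality suggests a simple two-score evaluation harness. The sketch below shows one way the two accuracies could be computed over labeled statements and entailments; the field names and the two model callables are assumptions for illustration, not the released BaRDa schema.

# Sketch: scoring factual accuracy and reasoning accuracy separately.
# `model_believes(text) -> bool` and `model_entails(premises, hypothesis) -> bool`
# are placeholder model calls; the dict field names are assumed.
def factual_accuracy(model_believes, statements):
    correct = sum(model_believes(s["text"]) == s["is_true"] for s in statements)
    return correct / len(statements)

def reasoning_accuracy(model_entails, entailments):
    # validity is judged independently of whether the premises are true
    correct = sum(
        model_entails(e["premises"], e["hypothesis"]) == e["valid"]
        for e in entailments
    )
    return correct / len(entailments)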
30. Leveraging Code to Improve In-context Learning for Semantic Parsing
- Author
-
Bogin, Ben, Gupta, Shivanshu, Clark, Peter, and Sabharwal, Ashish
- Abstract
In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the performance of even the most capable LLMs. In this work, we improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description that includes, e.g., the available classes and functions. We show that both these changes significantly improve accuracy across three popular datasets. Combined, they lead to dramatic improvements (e.g., 7.9% to 66.5% on the SMCalFlow compositional split), nearly closing the performance gap between easier i.i.d. and harder compositional splits when used with a strong model, and reducing the need for a large number of demonstrations. We find that the resemblance of the target parse language to general-purpose code is a more important factor than the language's popularity in pre-training corpora. Our findings provide an improved methodology for building semantic parsers in the modern context of ICL with LLMs., Comment: Accepted to NAACL 2024
- Published
- 2023
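Both changes the abstract proposes, parsing into a general-purpose language and prepending a structured domain description, amount to a particular way of assembling the prompt. A minimal sketch follows; the domain description contents and demonstration format are invented examples, not taken from the paper's datasets.

# Sketch: an ICL prompt that targets Python rather than a DSL and starts
# with a structured domain description. All names here are illustrative.
DOMAIN_DESCRIPTION = (
    "# Available classes and functions:\n"
    "# class Event(name: str, start: str, end: str)\n"
    "# def create_event(event: Event) -> None\n"
)

def build_prompt(demonstrations, query):
    parts = [DOMAIN_DESCRIPTION]
    for utterance, program in demonstrations:
        parts.append(f"# Utterance: {utterance}\n{program}\n")
    parts.append(f"# Utterance: {query}\n")  # the model completes the program
    return "\n".join(parts)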
31. Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization
- Author
-
Lal, Yash Kumar, Zhang, Li, Brahman, Faeze, Majumder, Bodhisattwa Prasad, Clark, Peter, and Tandon, Niket
- Abstract
How-to procedures, such as how to plant a garden, are now used by millions of users, but sometimes need customizing to meet a user's specific needs, e.g., planting a garden without pesticides. Our goal is to measure and improve an LLM's ability to perform such customization. Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need. We find that a simple architecture with two LLM agents used sequentially performs best, one that edits a generic how-to procedure and one that verifies its executability, significantly outperforming (10.5% absolute) an end-to-end prompted LLM. This suggests that LLMs can be configured reasonably effectively for procedure customization. This also suggests that multi-agent editing architectures may be worth exploring further for other customization applications (e.g. coding, creative writing) in the future., Comment: Camera ready version accepted to Findings of ACL 2024
- Published
- 2023
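The best configuration reported above, an editor agent followed by an executability verifier, can be sketched as two sequential LLM calls with an optional re-edit loop. The prompts and the yes/no verdict convention are assumptions for illustration, not the paper's implementation.

# Sketch of a two-agent edit-then-verify pipeline for procedure
# customization. `llm` is a generic completion callable (assumed).
def customize(llm, procedure, need, max_rounds=2):
    for _ in range(max_rounds):
        procedure = llm(
            f"Procedure:\n{procedure}\n"
            f"Edit this procedure so that it satisfies: {need}"
        )
        verdict = llm(
            f"Procedure:\n{procedure}\n"
            "Can each step be executed as written? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            break
    return procedure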
32. Digital Socrates: Evaluating LLMs through Explanation Critiques
- Author
-
Gu, Yuling, Tafjord, Oyvind, and Clark, Peter
- Abstract
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the explanation capabilities of modern models and to create a nuanced, interpretable explanation evaluation tool that can generate such characterizations automatically, without relying on expensive API calls or human annotations. Our approach is to (a) define the new task of explanation critiquing - identifying and categorizing any main flaw in an explanation and providing suggestions to address the flaw, (b) create a sizeable, human-verified dataset for this task, and (c) train an open-source, automatic critique model (called Digital Socrates) using this data. Through quantitative and qualitative analysis, we demonstrate how Digital Socrates is useful for revealing insights about student models by examining their reasoning chains, and how it can provide high-quality, nuanced, automatic evaluation of those model explanations for the first time. Digital Socrates thus fills an important gap in evaluation tools for understanding and improving the explanation behavior of models.
- Published
- 2023
33. ADaPT: As-Needed Decomposition and Planning with Language Models
- Author
-
Prasad, Archiki, Koller, Alexander, Hartmann, Mareike, Clark, Peter, Sabharwal, Ashish, Bansal, Mohit, and Khot, Tushar
- Abstract
Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the inability to execute any sub-task may lead to task failure. To address these shortcomings, we introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT), an approach that explicitly plans and decomposes complex sub-tasks as-needed, i.e., when the LLM is unable to execute them. ADaPT recursively decomposes sub-tasks to adapt to both task complexity and LLM capability. Our results demonstrate that ADaPT substantially outperforms established strong baselines, achieving success rates up to 28.3% higher in ALFWorld, 27% in WebShop, and 33% in TextCraft -- a novel compositional dataset that we introduce. Through extensive analysis, we illustrate the importance of multilevel decomposition and establish that ADaPT dynamically adjusts to the capabilities of the executor LLM as well as to task complexity., Comment: NAACL 2024 (findings) camera-ready. Project Page: https://allenai.github.io/adaptllm
- Published
- 2023
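The as-needed recursion at the heart of ADaPT reduces to a short recursive function: attempt the task directly, and decompose only when execution fails. The sketch below assumes placeholder execute (returns success) and plan (returns sub-tasks) callables; the explicit depth cap is a simplification added here.

# Sketch of as-needed decomposition: recurse only when the executor fails.
def adapt(task, execute, plan, depth=0, max_depth=3):
    if execute(task):                 # the executor LLM tries the task directly
        return True
    if depth >= max_depth:
        return False
    for sub_task in plan(task):       # the planner LLM decomposes on failure
        if not adapt(sub_task, execute, plan, depth + 1, max_depth):
            return False
    return True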
34. Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
- Author
-
Gupta, Shashank, Shrivastava, Vaishnavi, Deshpande, Ameet, Kalyan, Ashwin, Clark, Peter, Sabharwal, Ashish, and Khot, Tushar
- Abstract
Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks. Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g., an Asian person) spanning 5 socio-demographic groups. Our experiments unveil that LLMs harbor deep-rooted bias against various socio-demographics underneath a veneer of fairness. While they overtly reject stereotypes when explicitly asked ('Are Black people less skilled at mathematics?'), they manifest stereotypical and erroneous presumptions when asked to answer questions while adopting a persona. These can be observed as abstentions in responses, e.g., 'As a Black person, I can't answer this question as it requires math knowledge', and generally result in a substantial performance drop. Our experiments with ChatGPT-3.5 show that this bias is ubiquitous - 80% of our personas demonstrate bias; it is significant - some datasets show performance drops of 70%+; and it can be especially harmful for certain groups - some personas suffer statistically significant drops on 80%+ of the datasets. Overall, all 4 LLMs exhibit this bias to varying extents, with GPT-4-Turbo showing the least but still a problematic amount of bias (evident in 42% of the personas). Further analysis shows that these persona-induced errors can be hard to discern and hard to avoid. Our findings serve as a cautionary tale that the practice of assigning personas to LLMs - a trend on the rise - can surface their deep-rooted biases and have unforeseeable and detrimental side-effects., Comment: Project page: https://allenai.github.io/persona-bias. Paper to appear at ICLR 2024. Added results for other LLMs in v2 (similar findings)
- Published
- 2023
35. QualEval: Qualitative Evaluation for Model Improvement
- Author
-
Murahari, Vishvak, Deshpande, Ameet, Clark, Peter, Rajpurohit, Tanmay, Sabharwal, Ashish, Narasimhan, Karthik, and Kalyan, Ashwin
- Abstract
Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar quantity is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a way to compare and benchmark models, and do not yield actionable diagnostics, thus making the model improvement process challenging. Model developers find themselves amid extensive manual efforts involving sifting through vast datasets and attempting hit-or-miss adjustments to training data or setups. In this work, we address the shortcomings of quantitative metrics by proposing QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights that, when applied, accelerate model improvement. The insights are backed by a comprehensive dashboard with fine-grained visualizations and human-interpretable analyses. We corroborate the faithfulness of QualEval by demonstrating that leveraging its insights, for example, improves the performance of the Llama 2 model on a challenging dialogue task (DialogSum) by up to 15 percentage points relative to baselines. QualEval successfully increases the pace of model development, thus in essence serving as a data-scientist-in-a-box. Given the focus on critiquing and improving current evaluation metrics, our method serves as a refreshingly new technique for both model evaluation and improvement., Comment: NAACL 2024
- Published
- 2023
36. CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- Author
-
Majumder, Bodhisattwa Prasad, Mishra, Bhavana Dalvi, Jansen, Peter, Tafjord, Oyvind, Tandon, Niket, Zhang, Li, Callison-Burch, Chris, and Clark, Peter
- Abstract
Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present CLIN, the first language-based agent to achieve this, so that it continually improves over multiple trials, including when both the environment and task are varied, and without requiring parameter updates. Our approach is to use a persistent, dynamic, textual memory centered on causal abstractions (rather than general "helpful hints") that is regularly updated after each trial so that the agent gradually learns useful knowledge for new trials. In the ScienceWorld benchmark, CLIN is able to continually improve on repeated trials on the same task and environment, outperforming state-of-the-art reflective language agents like Reflexion by 23 absolute points. CLIN can also transfer its learning to new environments (or new tasks), improving its zero-shot performance by 4 points (13 for new tasks) and can further improve performance there through continual memory updates, enhancing performance by an additional 17 points (7 for new tasks). This suggests a new architecture for agents built on frozen models that can still continually and rapidly improve over time., Comment: Project page: https://allenai.github.io/clin
- Published
- 2023
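The persistent, dynamic textual memory the abstract describes fits in a small trial loop: inject the memory into the agent's context, then rewrite it from the trial trace. The sketch below uses placeholder agent and summarizer callables and an invented prompt; the memory format (causal statements as plain text) follows the abstract's description.

# Sketch of CLIN-style continual learning without parameter updates.
def run_trials(agent, summarizer, task, env, n_trials=5):
    memory = ""  # persistent textual memory of causal abstractions
    for _ in range(n_trials):
        trace = agent(task, env, context=memory)   # placeholder agent call
        memory = summarizer(
            f"Task: {task}\nPrior memory:\n{memory}\nTrial trace:\n{trace}\n"
            "Rewrite the memory as causal statements useful for future trials."
        )
    return memory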
37. Image-guided therapeutic intervention in autoimmune diseases
- Author
-
Chen, Bao Ying, and Clark, Peter M.
- Abstract
Multiple sclerosis (MS) affects more than 1 million Americans every year and is a chronic, demyelinating, neurodegenerative disease of the central nervous system. MS is a challenging disease to diagnose and treat, as it can be heterogeneous in both its biological aspects and its clinical presentation. The standard of care for diagnosing patients with MS is magnetic resonance imaging (MRI). While this technique is informative about anatomical structures, an imaging modality that can provide functional information about the disease will help to elucidate the complex mechanisms involved. Furthermore, current therapies for MS can have significant side effects on patients and/or only target a certain subset of patients. New therapies are needed to help not only MS patients but also patients with other autoimmune diseases. Chapter one of this dissertation reviews the current state of MS, introduces positron emission tomography (PET) and the radiotracers that have been developed to image different aspects of MS, and describes the deoxyribonucleoside salvage pathway in autoimmunity. Chapter two describes the first project I worked on, in which I use [18F]FAC to image brain-infiltrating leukocytes in experimental autoimmune encephalomyelitis (EAE), a mouse model of MS. Brain-infiltrating leukocytes contribute to MS pathology and have been shown to contribute to pathology in other neurological diseases, including autoimmune encephalomyelitis. Because of their role in disease pathology, it is important to have a strategy for imaging these pathogenic immune cells. Chapter three addresses the functional role of the deoxyribonucleoside salvage pathway in EAE and the broader implications of this pathway for MS. Through our previous work on imaging this pathway in EAE, we found that it is upregulated during disease onset and progression. In this chapter I show that the deoxyribonucleoside salvage is f
- Published
- 2022
39. Do language models have coherent mental models of everyday things?
- Author
-
Gu, Yuling, Mishra, Bhavana Dalvi, and Clark, Peter
- Abstract
When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that "the yolk surrounds the shell" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts, expressed as 11,720 "X relation Y?" true/false questions. Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation). We propose an extension where we add a constraint satisfaction layer on top of the LM's raw predictions to apply commonsense constraints. As well as removing inconsistencies, we find that this also significantly improves accuracy (by 16-20%), suggesting how the incoherence of the LM's pictures of everyday things can be significantly reduced., Comment: ACL 2023
- Published
- 2022
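The constraint-satisfaction layer mentioned above can be illustrated with a single commonsense constraint: a "surrounds" relation cannot hold in both directions. The sketch below post-processes raw LM probabilities for "X relation Y?" probes; the dictionary interface and the choice of a single constraint are illustrative assumptions, not the paper's full constraint set.

# Sketch: enforcing asymmetry of "surrounds" over raw LM scores.
# `scores` maps (x, relation, y) -> probability assigned to "true".
def apply_asymmetry(scores, rel="surrounds"):
    fixed = dict(scores)
    for (x, r, y), p in scores.items():
        if r != rel or x >= y:              # visit each unordered pair once
            continue
        q = scores.get((y, r, x))
        if q is not None and p > 0.5 and q > 0.5:
            # both directions predicted true: flip the weaker belief
            if p >= q:
                fixed[(y, r, x)] = 1.0 - q
            else:
                fixed[(x, r, y)] = 1.0 - p
    return fixed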
40. Lila: A Unified Benchmark for Mathematical Reasoning
- Author
-
Mishra, Swaroop, Finlayson, Matthew, Lu, Pan, Tang, Leonard, Welleck, Sean, Baral, Chitta, Rajpurohit, Tanmay, Tafjord, Oyvind, Sabharwal, Ashish, Clark, Peter, and Kalyan, Ashwin
- Abstract
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities, e.g., arithmetic, calculus; (ii) language format, e.g., question-answering, fill-in-the-blanks; (iii) language diversity, e.g., no language, simple language; (iv) external knowledge, e.g., commonsense, physics. We construct our benchmark by extending 20 existing datasets, collecting task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answer. We additionally introduce two evaluation datasets to measure out-of-distribution performance and robustness to language perturbation. Finally, we introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA. Importantly, we find that multi-tasking leads to significant improvements (average relative improvement of 21.83% F1 score vs. single-task models), while the best-performing model only obtains 60.40%, indicating room for improvement in general mathematical reasoning and understanding., Comment: EMNLP 2022
- Published
- 2022
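Because LILA's gold solutions are Python programs, answer checking can be execution-based. A minimal checker is sketched below; the convention that a solution program prints its final answer to stdout is an assumption for illustration, not the benchmark's documented protocol.

# Sketch: grade a program-form solution by running it and comparing output.
import subprocess

def check_program(program_source, gold_answer):
    try:
        result = subprocess.run(
            ["python", "-c", program_source],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.stdout.strip() == gold_answer.strip()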
41. Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE
- Author
-
Gu, Yuling, Fu, Yao, Pyatkin, Valentina, Magnusson, Ian, Mishra, Bhavana Dalvi, and Clark, Peter
- Abstract
Figurative language (e.g., "he flew like the wind") is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language. We present DREAM-FLUTE, a figurative language understanding system that does this, first forming a "mental model" of situations described in a premise and hypothesis before making an entailment/contradiction decision and generating an explanation. DREAM-FLUTE uses an existing scene elaboration model, DREAM, for constructing its "mental model." In the FigLang2022 Shared Task evaluation, DREAM-FLUTE achieved (joint) first place (Acc@60=63.3%), and can perform even better with ensemble techniques, demonstrating the effectiveness of this approach. More generally, this work suggests that adding a reflective component to pretrained language models can improve their performance beyond standard fine-tuning (3.3% improvement in Acc@60)., Comment: Accepted at The Third Workshop on Figurative Language Processing @ EMNLP 2022
- Published
- 2022
42. Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
- Author
-
Tafjord, Oyvind, Mishra, Bhavana Dalvi, and Clark, Peter
- Abstract
Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach is to recursively combine a trained backward-chaining model, capable of generating a set of premises entailing an answer hypothesis, with a verifier that checks that the model itself believes those premises (and the entailment itself) through self-querying. To our knowledge, this is the first system to generate multistep chains that are both faithful (the answer follows from the reasoning) and truthful (the chain reflects the system's own internal beliefs). In evaluation using two different datasets, users judge that a majority (70%+) of generated chains clearly show how an answer follows from a set of facts - substantially better than a high-performance baseline - while preserving answer accuracy. By materializing model beliefs that systematically support an answer, new opportunities arise for understanding the model's system of belief, and diagnosing and correcting its misunderstandings when an answer is wrong., Comment: accepted at EMNLP 2022. arXiv admin note: substantial text overlap with arXiv:2204.13074
- Published
- 2022
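The recursive combination of a backward-chaining generator with a self-querying verifier can be sketched as follows. Here generate_premises and believes stand in for the model's two roles; the depth cap and acceptance rule are simplifying assumptions, and a fuller version would also verify each entailment step itself, as the abstract notes.

# Sketch: build a chain of reasoning that the model itself believes.
def prove(hypothesis, generate_premises, believes, depth=0, max_depth=2):
    if believes(hypothesis):
        return [hypothesis]          # accepted as one of the model's beliefs
    if depth >= max_depth:
        return None
    chain = [hypothesis]
    for premise in generate_premises(hypothesis):
        sub_chain = prove(premise, generate_premises, believes,
                          depth + 1, max_depth)
        if sub_chain is None:
            return None              # an unverified premise rejects the chain
        chain.extend(sub_chain)
    return chain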
43. Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Author
-
Khot, Tushar, Trivedi, Harsh, Finlayson, Matthew, Fu, Yao, Richardson, Kyle, Clark, Peter, and Sabharwal, Ashish
- Abstract
Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting using GPT3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task but with smaller inputs. We also evaluate our approach on textual multi-step reasoning tasks: on long-context multi-hop QA task, we can more effectively teach the sub-tasks via our separate sub-tasks prompts; and on open-domain multi-hop QA, we can incorporate a symbolic information retrieval within our decomposition framework, leading to improved performance on both tasks. Datasets, Code and Prompts available at https://github.com/allenai/DecomP., Comment: ICLR'23 Camera Ready
- Published
- 2022
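The modular structure described above, where sub-tasks are delegated to dedicated handlers, reduces to a small dispatch table in which each handler may be a prompt-based LLM, a trained model, or a symbolic function. The handlers and the plan format below are invented for illustration.

# Sketch: a library of sub-task handlers plus a simple sequential plan.
HANDLERS = {
    "split":   lambda s: s.split(),             # symbolic function
    "reverse": lambda xs: list(reversed(xs)),   # symbolic function
    # an LLM-backed handler could be registered here under another name
}

def run(plan):
    value = None
    for name, make_input in plan:   # each step names a handler and its input
        value = HANDLERS[name](make_input(value))
    return value

# usage: reverse the words of a sentence via two delegated sub-tasks
result = run([
    ("split",   lambda _: "decomposed prompting is modular"),
    ("reverse", lambda words: words),
])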
44. Complexity-Based Prompting for Multi-Step Reasoning
- Author
-
Fu, Yao, Peng, Hao, Sabharwal, Ashish, Clark, Peter, and Khot, Tushar
- Abstract
We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thought (CoT), a sequence of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on multi-step reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3 and Codex, our approach substantially improves multi-step reasoning accuracy and achieves new state-of-the-art (SOTA) performance on three math benchmarks (GSM8K, MultiArith, and MathQA) and two BigBenchHard tasks (Date Understanding and Penguins), with an average of +5.3 and up to +18 accuracy improvements. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of performance gains from complex prompts under format perturbation and distribution shift., Comment: Preprint
- Published
- 2022
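Both halves of the proposal, complexity-based input selection and complexity-filtered majority voting at decoding time, are short to write down. The sketch below counts reasoning steps as newline-separated lines in a chain, a simplifying assumption standing in for a real step counter.

# Sketch: pick the most complex exemplars, then vote over complex chains.
from collections import Counter

def select_exemplars(candidate_chains, k=8):
    # prefer chains with more reasoning steps (here: more lines)
    return sorted(candidate_chains, key=lambda c: c.count("\n"),
                  reverse=True)[:k]

def complexity_vote(chains_and_answers, keep_fraction=0.5):
    # majority vote restricted to the most complex sampled chains
    ranked = sorted(chains_and_answers,
                    key=lambda ca: ca[0].count("\n"), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    return Counter(answer for _, answer in kept).most_common(1)[0][0]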
45. Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
- Author
-
Lu, Pan, Qiu, Liang, Chang, Kai-Wei, Wu, Ying Nian, Zhu, Song-Chun, Rajpurohit, Tanmay, Clark, Peter, and Kalyan, Ashwin
- Abstract
Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples., Comment: ICLR 2023. 26 pages and 18 figures. The data and code are available at https://promptpg.github.io
- Published
- 2022
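The policy-gradient mechanism behind PromptPG can be illustrated with a toy REINFORCE loop that learns a softmax distribution over candidate in-context examples, rewarding choices that yield a correct answer. The paper uses a learned policy network; the flat logit vector and constant baseline below are simplifying assumptions.

# Sketch: REINFORCE over a softmax policy for example selection.
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train_selector(candidates, try_example, steps=500, lr=0.1):
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        i = random.choices(range(len(candidates)), weights=probs)[0]
        reward = try_example(candidates[i])   # 1 if the answer was correct
        # REINFORCE: d log pi(i) / d logits = one_hot(i) - probs
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * (reward - 0.5) * grad   # 0.5: constant baseline
    return softmax(logits)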
46. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
- Author
-
Lu, Pan, Mishra, Swaroop, Xia, Tony, Qiu, Liang, Chang, Kai-Wei, Zhu, Song-Chun, Tafjord, Oyvind, Clark, Peter, and Kalyan, Ashwin
- Abstract
When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (ScienceQA), a new benchmark that consists of ~21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions. ScienceQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data. The data and code are available at https://scienceqa.github.io., Comment: Accepted to NeurIPS 2022. 22 pages, 17 figures, 9 tables. Project: https://scienceqa.github.io
- Published
- 2022
47. NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning
- Author
-
Weir, Nathaniel, Clark, Peter, and Van Durme, Benjamin
- Abstract
Our goal is a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of authoritative facts. Such a system would help alleviate the challenges of interpretability and hallucination with modern LMs, and the lack of grounding of current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work explaining known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner [Tafjord et al., 2022] while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.
- Published
- 2022
48. GRade, Age, Nodes, and Tumor (GRANT) compared with Leibovich score to predict survival in localized renal cell carcinoma: A nationwide study
- Author
-
Juul, Simon, Donskov, Frede, Clark, Peter E., Lund, Lars, and Azawi, Nessn H.
- Abstract
Objective: To examine the performance of the Leibovich score versus the GRade, Age, Nodes, and Tumor (GRANT) score in predicting disease recurrence in renal cell carcinoma. Methods: In total, 7653 patients diagnosed with renal cell carcinoma from 2010 to 2018 were captured in the nationwide DaRenCa database; 2652 underwent radical or partial nephrectomy and had full datasets for both the GRANT score and the Leibovich score. Discrimination was assessed with a Cox regression model. The results were evaluated with concordance index analysis. Results: Median follow-up was 40 months (interquartile range 24-56). Recurrence occurred in 17% of patients, and 15% died. A significant proportion of patients (36%) had missing data for the calculation of the Leibovich score. Among 1957 clear cell renal cell carcinoma patients, the distribution of GRANT scores of 0, 1, 2, or 3/4 was 21%, 56%, 21%, and 1.4%, respectively, and the distribution of low/intermediate/high Leibovich scores was 47%, 36%, and 18%, respectively. A similar distribution was seen in 655 non-clear cell patients. Both the Leibovich and GRANT scores performed well in predicting outcomes for the favorable patient risk groups. The Leibovich score was better at predicting recurrence-free survival (concordance index 0.736 versus 0.643), but not overall survival (concordance index 0.657 versus 0.648). Similar results were obtained in non-clear cell renal cell carcinoma. Conclusion: The GRANT and Leibovich scores were validated in clear cell and non-clear cell renal cell carcinoma. The Leibovich score outperformed the GRANT score in predicting recurrence-free survival and should remain the standard approach to risk-stratify patients during follow-up when all data are available.
- Published
- 2022
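The concordance index used above to compare the two scores has a direct definition over right-censored survival data: among comparable patient pairs, the fraction in which the higher-risk patient experienced the event first. A straightforward implementation of the standard Harrell's C is sketched below for illustration.

# Sketch: Harrell's concordance index for right-censored survival data.
def concordance_index(times, events, risk_scores):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i had the event before j's follow-up ended
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")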
50. Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement
- Author
-
Mishra, Bhavana Dalvi, Tafjord, Oyvind, and Clark, Peter
- Abstract
Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations and correct the system's errors so that it improves over time. Our approach is to augment a QA model with a dynamic memory of user feedback, containing user-supplied corrections to erroneous model beliefs that users identify during interaction. Retrievals from memory are used as additional context for QA, to help avoid previous mistakes in similar new situations - a novel application of memory-based continuous learning. With simulated feedback, we find that our system (called TeachMe) continually improves with time, and without model retraining, requiring feedback on only 25% of training examples to reach within 1% of the upper bound (feedback on all examples). In experiments with real users, we observe a similar trend, with performance improving by over 15% on a hidden test set after teaching. This suggests new opportunities for using frozen language models in an interactive setting where users can inspect, debug, and correct the model's beliefs, leading to improved system performance over time., Comment: accepted at EMNLP 2022
- Published
- 2022
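The dynamic memory of user feedback described above amounts to retrieval-augmented QA over a growing list of corrections. A minimal sketch follows, assuming a generic retriever and prompt format (both invented here, not the paper's implementation).

# Sketch: memory-augmented QA with user-supplied corrections.
def answer_with_memory(llm, retrieve, memory, question):
    corrections = retrieve(memory, question, k=3)   # placeholder retriever
    context = "\n".join(corrections)
    return llm(f"Known corrections:\n{context}\n\nQuestion: {question}")

def record_feedback(memory, question, correction):
    # called when a user flags and fixes an erroneous model belief
    memory.append(f"For questions like '{question}': {correction}")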