Author: "Amodei, Dario" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Amodei, Dario"' showing total 156 results

1. The Capacity for Moral Self-Correction in Large Language Models

Author: Ganguli, Deep, Askell, Amanda, Schiefer, Nicholas, Liao, Thomas I., Lukošiūtė, Kamilė, Chen, Anna, Goldie, Anna, Mirhoseini, Azalia, Olsson, Catherine, Hernandez, Danny, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Perez, Ethan, Kernion, Jackson, Kerr, Jamie, Mueller, Jared, Landau, Joshua, Ndousse, Kamal, Nguyen, Karina, Lovitt, Liane, Sellitto, Michael, Elhage, Nelson, Mercado, Noemi, DasSarma, Nova, Rausch, Oliver, Lasenby, Robert, Larson, Robin, Ringer, Sam, Kundu, Sandipan, Kadavath, Saurav, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Lanham, Tamera, Telleen-Lawton, Timothy, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Mann, Ben, Amodei, Dario, Joseph, Nicholas, McCandlish, Sam, Brown, Tom, Olah, Christopher, Clark, Jack, Bowman, Samuel R., and Kaplan, Jared
Subjects: Computer Science - Computation and Language
Abstract: We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
Published: 2023

2. Discovering Language Model Behaviors with Model-Written Evaluations

Author: Perez, Ethan, Ringer, Sam, Lukošiūtė, Kamilė, Nguyen, Karina, Chen, Edwin, Heiner, Scott, Pettit, Craig, Olsson, Catherine, Kundu, Sandipan, Kadavath, Saurav, Jones, Andy, Chen, Anna, Mann, Ben, Israel, Brian, Seethor, Bryan, McKinnon, Cameron, Olah, Christopher, Yan, Da, Amodei, Daniela, Amodei, Dario, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Khundadze, Guro, Kernion, Jackson, Landis, James, Kerr, Jamie, Mueller, Jared, Hyun, Jeeyoon, Landau, Joshua, Ndousse, Kamal, Goldberg, Landon, Lovitt, Liane, Lucas, Martin, Sellitto, Michael, Zhang, Miranda, Kingsland, Neerav, Elhage, Nelson, Joseph, Nicholas, Mercado, Noemí, DasSarma, Nova, Rausch, Oliver, Larson, Robin, McCandlish, Sam, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Lanham, Tamera, Telleen-Lawton, Timothy, Brown, Tom, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Clark, Jack, Bowman, Samuel R., Askell, Amanda, Grosse, Roger, Hernandez, Danny, Ganguli, Deep, Hubinger, Evan, Schiefer, Nicholas, and Kaplan, Jared
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
Comment: For associated data visualizations, see https://www.evals.anthropic.com/model-written/ . For full datasets, see https://github.com/anthropics/evals
Published: 2022

3. Constitutional AI: Harmlessness from AI Feedback

Author: Bai, Yuntao, Kadavath, Saurav, Kundu, Sandipan, Askell, Amanda, Kernion, Jackson, Jones, Andy, Chen, Anna, Goldie, Anna, Mirhoseini, Azalia, McKinnon, Cameron, Chen, Carol, Olsson, Catherine, Olah, Christopher, Hernandez, Danny, Drain, Dawn, Ganguli, Deep, Li, Dustin, Tran-Johnson, Eli, Perez, Ethan, Kerr, Jamie, Mueller, Jared, Ladish, Jeffrey, Landau, Joshua, Ndousse, Kamal, Lukošiūtė, Kamilė, Lovitt, Liane, Sellitto, Michael, Elhage, Nelson, Schiefer, Nicholas, Mercado, Noemi, DasSarma, Nova, Lasenby, Robert, Larson, Robin, Ringer, Sam, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Fort, Stanislav, Lanham, Tamera, Telleen-Lawton, Timothy, Conerly, Tom, Henighan, Tom, Hume, Tristan, Bowman, Samuel R., Hatfield-Dodds, Zac, Mann, Ben, Amodei, Dario, Joseph, Nicholas, McCandlish, Sam, Brown, Tom, and Kaplan, Jared
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.
Published: 2022

4. Measuring Progress on Scalable Oversight for Large Language Models

Author: Bowman, Samuel R., Hyun, Jeeyoon, Perez, Ethan, Chen, Edwin, Pettit, Craig, Heiner, Scott, Lukošiūtė, Kamilė, Askell, Amanda, Jones, Andy, Chen, Anna, Goldie, Anna, Mirhoseini, Azalia, McKinnon, Cameron, Olah, Christopher, Amodei, Daniela, Amodei, Dario, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Kernion, Jackson, Kerr, Jamie, Mueller, Jared, Ladish, Jeffrey, Landau, Joshua, Ndousse, Kamal, Lovitt, Liane, Elhage, Nelson, Schiefer, Nicholas, Joseph, Nicholas, Mercado, Noemí, DasSarma, Nova, Larson, Robin, McCandlish, Sam, Kundu, Sandipan, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Fort, Stanislav, Telleen-Lawton, Timothy, Brown, Tom, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Mann, Ben, and Kaplan, Jared
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on ways it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.
Comment: v2 fixes a few typos from v1
Published: 2022

5. In-context Learning and Induction Heads

Author: Olsson, Catherine, Elhage, Nelson, Nanda, Neel, Joseph, Nicholas, DasSarma, Nova, Henighan, Tom, Mann, Ben, Askell, Amanda, Bai, Yuntao, Chen, Anna, Conerly, Tom, Drain, Dawn, Ganguli, Deep, Hatfield-Dodds, Zac, Hernandez, Danny, Johnston, Scott, Jones, Andy, Kernion, Jackson, Lovitt, Liane, Ndousse, Kamal, Amodei, Dario, Brown, Tom, Clark, Jack, Kaplan, Jared, McCandlish, Sam, and Olah, Chris
Subjects: Computer Science - Machine Learning
Abstract: "Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
Published: 2022
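
The completion rule quoted in this abstract, [A][B] ... [A] -> [B], can be illustrated in plain Python. This is only a sketch of the pattern itself, not of how an attention head computes it; the function name `induction_predict` is hypothetical and does not come from the paper.

```python
def induction_predict(tokens):
    """Guess the next token by copying what followed the most recent
    earlier occurrence of the current (last) token.

    Returns None when the last token has not appeared before.
    """
    if not tokens:
        return None
    last = tokens[-1]
    # Scan backwards over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None


# "A" was previously followed by "B", so the rule predicts "B" again.
print(induction_predict(["A", "B", "C", "A"]))
```

With no earlier occurrence of the final token, as in `["x", "y"]`, the sketch returns `None`.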

6. Toy Models of Superposition

Author: Elhage, Nelson, Hume, Tristan, Olsson, Catherine, Schiefer, Nicholas, Henighan, Tom, Kravec, Shauna, Hatfield-Dodds, Zac, Lasenby, Robert, Drain, Dawn, Chen, Carol, Grosse, Roger, McCandlish, Sam, Kaplan, Jared, Amodei, Dario, Wattenberg, Martin, and Olah, Christopher
Subjects: Computer Science - Machine Learning
Abstract: Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.
Comment: Also available at https://transformer-circuits.pub/2022/toy_model/index.html
Published: 2022

7. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Author: Ganguli, Deep, Lovitt, Liane, Kernion, Jackson, Askell, Amanda, Bai, Yuntao, Kadavath, Saurav, Mann, Ben, Perez, Ethan, Schiefer, Nicholas, Ndousse, Kamal, Jones, Andy, Bowman, Sam, Chen, Anna, Conerly, Tom, DasSarma, Nova, Drain, Dawn, Elhage, Nelson, El-Showk, Sheer, Fort, Stanislav, Hatfield-Dodds, Zac, Henighan, Tom, Hernandez, Danny, Hume, Tristan, Jacobson, Josh, Johnston, Scott, Kravec, Shauna, Olsson, Catherine, Ringer, Sam, Tran-Johnson, Eli, Amodei, Dario, Brown, Tom, Joseph, Nicholas, McCandlish, Sam, Olah, Chris, Kaplan, Jared, and Clark, Jack
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming across 3 model sizes (2.7B, 13B, and 52B parameters) and 4 model types: a plain language model (LM); an LM prompted to be helpful, honest, and harmless; an LM with rejection sampling; and a model trained to be helpful and harmless using reinforcement learning from human feedback (RLHF). We find that the RLHF models are increasingly difficult to red team as they scale, and we find a flat trend with scale for the other model types. Second, we release our dataset of 38,961 red team attacks for others to analyze and learn from. We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs. Third, we exhaustively describe our instructions, processes, statistical methodologies, and uncertainty about red teaming. We hope that this transparency accelerates our ability to work together as a community in order to develop shared norms, practices, and technical standards for how to red team language models.
Published: 2022

8. Language Models (Mostly) Know What They Know

Author: Kadavath, Saurav, Conerly, Tom, Askell, Amanda, Henighan, Tom, Drain, Dawn, Perez, Ethan, Schiefer, Nicholas, Hatfield-Dodds, Zac, DasSarma, Nova, Tran-Johnson, Eli, Johnston, Scott, El-Showk, Sheer, Jones, Andy, Elhage, Nelson, Hume, Tristan, Chen, Anna, Bai, Yuntao, Bowman, Sam, Fort, Stanislav, Ganguli, Deep, Hernandez, Danny, Jacobson, Josh, Kernion, Jackson, Kravec, Shauna, Lovitt, Liane, Ndousse, Kamal, Olsson, Catherine, Ringer, Sam, Amodei, Dario, Brown, Tom, Clark, Jack, Joseph, Nicholas, Mann, Ben, McCandlish, Sam, Olah, Chris, and Kaplan, Jared
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.
Comment: 23+17 pages; refs added, typos fixed
Published: 2022

9. Scaling Laws and Interpretability of Learning from Repeated Data

Author: Hernandez, Danny, Brown, Tom, Conerly, Tom, DasSarma, Nova, Drain, Dawn, El-Showk, Sheer, Elhage, Nelson, Hatfield-Dodds, Zac, Henighan, Tom, Hume, Tristan, Johnston, Scott, Mann, Ben, Olah, Chris, Olsson, Catherine, Amodei, Dario, Joseph, Nicholas, Kaplan, Jared, and McCandlish, Sam
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication is not perfect and the model is exposed to repeated data at the sentence, paragraph, or document level. Some works have reported substantial negative performance effects of this repeated data. In this paper we attempt to study repeated data systematically and to understand its effects mechanistically. To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many times. We find a strong double descent phenomenon, in which repeated data can lead test loss to increase midway through training. A predictable range of repetition frequency leads to surprisingly severe degradation in performance. For instance, performance of an 800M parameter model can be degraded to that of a 2x smaller model (400M params) by repeating 0.1% of the data 100 times, despite the other 90% of the training tokens remaining unique. We suspect there is a range in the middle where the data can be memorized and doing so consumes a large fraction of the model's capacity, and this may be where the peak of degradation occurs. Finally, we connect these observations to recent mechanistic interpretability work - attempting to reverse engineer the detailed computations performed by the model - by showing that data repetition disproportionately damages copying and internal structures associated with generalization, such as induction heads, providing a possible mechanism for the shift from generalization to memorization. Taken together, these results provide a hypothesis for why repeating a relatively small fraction of data in large language models could lead to disproportionately large harms to performance.
Comment: 23 pages, 22 figures
Published: 2022
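
The figures in this abstract (0.1% of the data, 100 repetitions, 90% of tokens unique) fit together under one reading: the repeated subset is measured against the total token budget, so its copies fill 0.1% x 100 = 10% of training tokens. The sketch below checks that arithmetic; the reading itself is an assumption, and `repeated_token_share` is a hypothetical helper name.

```python
from fractions import Fraction


def repeated_token_share(subset_fraction, repeats):
    """Share of all training tokens occupied by a repeated subset.

    Assumes `subset_fraction` is expressed relative to the total token
    budget (an interpretation of the abstract, not a stated definition).
    """
    return Fraction(subset_fraction) * repeats


repeated = repeated_token_share(Fraction(1, 1000), 100)  # 0.1% repeated 100 times
unique = 1 - repeated
print(f"repeated: {float(repeated):.0%}, unique: {float(unique):.0%}")
# prints "repeated: 10%, unique: 90%"
```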

10. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Author: Bai, Yuntao, Jones, Andy, Ndousse, Kamal, Askell, Amanda, Chen, Anna, DasSarma, Nova, Drain, Dawn, Fort, Stanislav, Ganguli, Deep, Henighan, Tom, Joseph, Nicholas, Kadavath, Saurav, Kernion, Jackson, Conerly, Tom, El-Showk, Sheer, Elhage, Nelson, Hatfield-Dodds, Zac, Hernandez, Danny, Hume, Tristan, Johnston, Scott, Kravec, Shauna, Lovitt, Liane, Nanda, Neel, Olsson, Catherine, Amodei, Dario, Brown, Tom, Clark, Jack, McCandlish, Sam, Olah, Chris, Mann, Ben, and Kaplan, Jared
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as Python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work.
Comment: Data available at https://github.com/anthropics/hh-rlhf
Published: 2022

11. Predictability and Surprise in Large Generative Models

Author: Ganguli, Deep, Hernandez, Danny, Lovitt, Liane, DasSarma, Nova, Henighan, Tom, Jones, Andy, Joseph, Nicholas, Kernion, Jackson, Mann, Ben, Askell, Amanda, Bai, Yuntao, Chen, Anna, Conerly, Tom, Drain, Dawn, Elhage, Nelson, Showk, Sheer El, Fort, Stanislav, Hatfield-Dodds, Zac, Johnston, Scott, Kravec, Shauna, Nanda, Neel, Ndousse, Kamal, Olsson, Catherine, Amodei, Daniela, Amodei, Dario, Brown, Tom, Kaplan, Jared, McCandlish, Sam, Olah, Chris, and Clark, Jack
Subjects: Computer Science - Computers and Society
Abstract: Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these generative models have an unusual combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level predictability and appearance of useful capabilities drive rapid development of such models, while the unpredictable qualities make it difficult to anticipate the consequences of model deployment. We go through examples of how this combination can lead to socially harmful behavior with examples from the literature and real world observations, and we also perform two novel experiments to illustrate our point about harms from unpredictability. Furthermore, we analyze how these conflicting properties combine to give model developers various motivations for deploying these models, and challenges that can hinder deployment. We conclude with a list of possible interventions the AI community may take to increase the chance of these models having a beneficial impact. We intend this paper to be useful to policymakers who want to understand and regulate AI systems, technologists who care about the potential policy impact of their work, and academics who want to analyze, critique, and potentially develop large generative models.
Comment: Updated to reflect the version submitted (and accepted) to ACM FAccT '22. This update incorporates feedback from peer-review and fixes minor typos. See open access FAccT conference version at: https://dl.acm.org/doi/abs/10.1145/3531146.3533229
Published: 2022

12. A General Language Assistant as a Laboratory for Alignment

Author: Askell, Amanda, Bai, Yuntao, Chen, Anna, Drain, Dawn, Ganguli, Deep, Henighan, Tom, Jones, Andy, Joseph, Nicholas, Mann, Ben, DasSarma, Nova, Elhage, Nelson, Hatfield-Dodds, Zac, Hernandez, Danny, Kernion, Jackson, Ndousse, Kamal, Olsson, Catherine, Amodei, Dario, Brown, Tom, Clark, Jack, McCandlish, Sam, Olah, Chris, and Kaplan, Jared
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next we investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling. We find that ranked preference modeling performs much better than imitation learning, and often scales more favorably with model size. In contrast, binary discrimination typically performs and scales very similarly to imitation learning. Finally we study a 'preference model pre-training' stage of training, with the goal of improving sample efficiency when finetuning on human preferences.
Comment: 26+19 pages; v2 typos fixed, refs added, figure scale / colors fixed; v3 correct very non-standard TruthfulQA formatting and metric, alignment implications slightly improved
Published: 2021

13. Evaluating Large Language Models Trained on Code

Author: Chen, Mark, Tworek, Jerry, Jun, Heewoo, Yuan, Qiming, Pinto, Henrique Ponde de Oliveira, Kaplan, Jared, Edwards, Harri, Burda, Yuri, Joseph, Nicholas, Brockman, Greg, Ray, Alex, Puri, Raul, Krueger, Gretchen, Petrov, Michael, Khlaaf, Heidy, Sastry, Girish, Mishkin, Pamela, Chan, Brooke, Gray, Scott, Ryder, Nick, Pavlov, Mikhail, Power, Alethea, Kaiser, Lukasz, Bavarian, Mohammad, Winter, Clemens, Tillet, Philippe, Such, Felipe Petroski, Cummings, Dave, Plappert, Matthias, Chantzis, Fotios, Barnes, Elizabeth, Herbert-Voss, Ariel, Guss, William Hebgen, Nichol, Alex, Paino, Alex, Tezak, Nikolas, Tang, Jie, Babuschkin, Igor, Balaji, Suchir, Jain, Shantanu, Saunders, William, Hesse, Christopher, Carr, Andrew N., Leike, Jan, Achiam, Josh, Misra, Vedant, Morikawa, Evan, Radford, Alec, Knight, Matthew, Brundage, Miles, Murati, Mira, Mayer, Katie, Welinder, Peter, McGrew, Bob, Amodei, Dario, McCandlish, Sam, Sutskever, Ilya, and Zaremba, Wojciech
Subjects: Computer Science - Machine Learning
Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
Comment: corrected typos, added references, added authors, added acknowledgements
Published: 2021
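
One way to see why repeated sampling helps, as this abstract reports: if a single sample solves a given problem with probability p, and samples are treated as independent, then at least one of k samples succeeds with probability 1 - (1 - p)^k. The sketch below is a simplified illustration under that independence assumption, not the paper's evaluation procedure. `prob_any_success` is a hypothetical name, and because per-problem success rates differ, the formula does not by itself reproduce the 28.8% and 70.2% aggregate figures.

```python
def prob_any_success(p, k):
    """Probability that at least one of k independent samples succeeds,
    given a per-sample success probability p for a single problem."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# A problem that one sample rarely solves becomes far more tractable
# once many samples are drawn.
for k in (1, 10, 100):
    print(k, round(prob_any_success(0.05, k), 3))
```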

14. Scaling Laws for Autoregressive Generative Modeling

Author: Henighan, Tom, Kaplan, Jared, Katz, Mor, Chen, Mark, Hesse, Christopher, Jackson, Jacob, Jun, Heewoo, Brown, Tom B., Dhariwal, Prafulla, Gray, Scott, Hallacy, Chris, Mann, Benjamin, Radford, Alec, Ramesh, Aditya, Ryder, Nick, Ziegler, Daniel M., Schulman, John, Amodei, Dario, and McCandlish, Sam
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information theoretic interpretation as $S(\mathrm{True}) + D_{\mathrm{KL}}(\mathrm{True} \,\|\, \mathrm{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e. $D_{\mathrm{KL}}$) in nats/image for other resolutions. We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
Comment: 20+17 pages, 33 figures; added appendix with additional language results
Published: 2020
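
The decomposition of cross-entropy into entropy plus KL divergence mentioned in this abstract is a standard identity and can be checked numerically for discrete distributions. The sketch below uses two made-up distributions `p` (standing in for "True") and `q` (standing in for "Model"); the values are illustrative only, and all quantities are in nats.

```python
import math


def entropy(p):
    """Shannon entropy S(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy of q relative to p, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]  # hypothetical "true" distribution
q = [0.4, 0.4, 0.2]  # hypothetical "model" distribution

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(abs(lhs - rhs) < 1e-12)
```

Because the entropy term does not depend on the model, only the KL term can be reduced by a better model, which is why the abstract calls it the reducible loss.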

15. Learning to summarize from human feedback

Author: Stiennon, Nisan, Ouyang, Long, Wu, Jeff, Ziegler, Daniel M., Lowe, Ryan, Voss, Chelsea, Radford, Alec, Amodei, Dario, and Christiano, Paul
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
Comment: NeurIPS 2020
Published: 2020

16. Language Models are Few-Shot Learners

Author: Brown, Tom B., Mann, Benjamin, Ryder, Nick, Subbiah, Melanie, Kaplan, Jared, Dhariwal, Prafulla, Neelakantan, Arvind, Shyam, Pranav, Sastry, Girish, Askell, Amanda, Agarwal, Sandhini, Herbert-Voss, Ariel, Krueger, Gretchen, Henighan, Tom, Child, Rewon, Ramesh, Aditya, Ziegler, Daniel M., Wu, Jeffrey, Winter, Clemens, Hesse, Christopher, Chen, Mark, Sigler, Eric, Litwin, Mateusz, Gray, Scott, Chess, Benjamin, Clark, Jack, Berner, Christopher, McCandlish, Sam, Radford, Alec, Sutskever, Ilya, and Amodei, Dario
Subjects: Computer Science - Computation and Language
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
Comment: 40+32 pages
Published: 2020

17. Scaling Laws for Neural Language Models

Author: Kaplan, Jared, McCandlish, Sam, Henighan, Tom, Brown, Tom B., Chess, Benjamin, Child, Rewon, Gray, Scott, Radford, Alec, Wu, Jeffrey, and Amodei, Dario
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence., Comment: 19 pages, 15 figures
Published: 2020

18. Fine-Tuning Language Models from Human Preferences

Author: Ziegler, Daniel M., Stiennon, Nisan, Wu, Jeffrey, Brown, Tom B., Radford, Alec, Amodei, Dario, Christiano, Paul, and Irving, Geoffrey
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Reward learning enables the application of reinforcement learning (RL) to tasks where reward is defined by human judgment, building a model of reward by asking humans questions. Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks. In this paper, we build on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets. For stylistic continuation we achieve good results with only 5,000 comparisons evaluated by humans. For summarization, models trained with 60,000 comparisons copy whole sentences from the input but skip irrelevant preamble; this leads to reasonable ROUGE scores and very good performance according to our human labelers, but may be exploiting the fact that labelers rely on simple heuristics.
Published: 2019

19. An Empirical Model of Large-Batch Training

Author: McCandlish, Sam, Kaplan, Jared, Amodei, Dario, and OpenAI Dota Team
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency. However the limits of this massive data parallelism seem to differ from domain to domain, ranging from batches of tens of thousands in ImageNet to batches of millions in RL agents that play the game Dota 2. To our knowledge there is limited conceptual understanding of why these limits to batch size differ or how we might choose the correct batch size in a new domain. In this paper, we demonstrate that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets (MNIST, SVHN, CIFAR-10, ImageNet, Billion Word), reinforcement learning domains (Atari and Dota), and even generative model training (autoencoders on SVHN). We find that the noise scale increases as the loss decreases over a training run and depends on the model size primarily through improved model performance. Our empirically-motivated theory also describes the tradeoff between compute-efficiency and time-efficiency, and provides a rough model of the benefits of adaptive batch-size training.
Published: 2018

20. Reward learning from human preferences and demonstrations in Atari

Author: Ibarz, Borja, Leike, Jan, Pohlen, Tobias, Irving, Geoffrey, Legg, Shane, and Amodei, Dario
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels., Comment: NIPS 2018
Published: 2018

21. Supervising strong learners by amplifying weak experts

Author: Christiano, Paul, Shlegeris, Buck, and Amodei, Dario
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.
Published: 2018

22. Variational Option Discovery Algorithms

Author: Achiam, Joshua, Edwards, Harrison, Amodei, Dario, and Abbeel, Pieter
Subjects: Computer Science - Artificial Intelligence
Abstract: We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories. Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent's performance is strong enough (as measured by the decoder) on the current set of contexts. We show that this simple trick stabilizes training for VALOR and prior variational option discovery methods, allowing a single agent to learn many more modes of behavior than it could with a fixed context distribution. Finally, we investigate other topics related to variational option discovery, including fundamental limitations of the general approach and the applicability of learned options to downstream tasks.
Published: 2018

23. AI safety via debate

Author: Irving, Geoffrey, Christiano, Paul, and Amodei, Dario
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self play on a zero sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties., Comment: 24 pages, 6 figures
Published: 2018

24. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

Author: Brundage, Miles, Avin, Shahar, Clark, Jack, Toner, Helen, Eckersley, Peter, Garfinkel, Ben, Dafoe, Allan, Scharre, Paul, Zeitzoff, Thomas, Filar, Bobby, Anderson, Hyrum, Roff, Heather, Allen, Gregory C., Steinhardt, Jacob, Flynn, Carrick, hÉigeartaigh, Seán Ó, Beard, SJ, Belfield, Haydn, Farquhar, Sebastian, Lyle, Clare, Crootof, Rebecca, Evans, Owain, Page, Michael, Bryson, Joanna, Yampolskiy, Roman, and Amodei, Dario
Subjects: Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computers and Society
Abstract: This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.
Published: 2018

25. Deep reinforcement learning from human preferences

Author: Christiano, Paul, Leike, Jan, Brown, Tom B., Martic, Miljan, Legg, Shane, and Amodei, Dario
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
Published: 2017

26. Learning a Natural Language Interface with Neural Programmer

Author: Neelakantan, Arvind, Le, Quoc V., Abadi, Martin, McCallum, Andrew, and Amodei, Dario
Subjects: Computer Science - Computation and Language, Computer Science - Learning, Statistics - Machine Learning
Abstract: Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it on WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision of question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive to the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser., Comment: Published as a conference paper at ICLR 2017
Published: 2016

27. Concrete Problems in AI Safety

Author: Amodei, Dario, Olah, Chris, Steinhardt, Jacob, Christiano, Paul, Schulman, John, and Mané, Dan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI., Comment: 29 pages
Published: 2016

28. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Author: Amodei, Dario, Anubhai, Rishita, Battenberg, Eric, Case, Carl, Casper, Jared, Catanzaro, Bryan, Chen, Jingdong, Chrzanowski, Mike, Coates, Adam, Diamos, Greg, Elsen, Erich, Engel, Jesse, Fan, Linxi, Fougner, Christopher, Han, Tony, Hannun, Awni, Jun, Billy, LeGresley, Patrick, Lin, Libby, Narang, Sharan, Ng, Andrew, Ozair, Sherjil, Prenger, Ryan, Raiman, Jonathan, Satheesh, Sanjeev, Seetapun, David, Sengupta, Shubho, Wang, Yi, Wang, Zhiqian, Wang, Chong, Xiao, Bo, Yogatama, Dani, Zhan, Jun, and Zhu, Zhenyao
Subjects: Computer Science - Computation and Language
Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Published: 2015

29. Thermodynamics for a network of neurons: Signatures of criticality

Author: Tkacik, Gasper, Mora, Thierry, Marre, Olivier, Amodei, Dario, Berry II, Michael J., and Bialek, William
Subjects: Quantitative Biology - Neurons and Cognition, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics
Abstract: The activity of a neural network is defined by patterns of spiking and silence from the individual neurons. Because spikes are (relatively) sparse, patterns of activity with increasing numbers of spikes are less probable, but with more spikes the number of possible patterns increases. This tradeoff between probability and numerosity is mathematically equivalent to the relationship between entropy and energy in statistical physics. We construct this relationship for populations of up to N=160 neurons in a small patch of the vertebrate retina, using a combination of direct and model-based analyses of experiments on the response of this network to naturalistic movies. We see signs of a thermodynamic limit, where the entropy per neuron approaches a smooth function of the energy per neuron as N increases. The form of this function corresponds to the distribution of activity being poised near an unusual kind of critical point. Networks with more or less correlation among neurons would not reach this critical state. We suggest further tests of criticality, and give a brief discussion of its functional significance.
Published: 2014

30. Physical Principles for Scalable Neural Recording

Author: Marblestone, Adam H., Zamft, Bradley M., Maguire, Yael G., Shapiro, Mikhail G., Cybulski, Thaddeus R., Glaser, Joshua I., Amodei, Dario, Stranges, P. Benjamin, Kalhor, Reza, Dalrymple, David A., Seo, Dongjin, Alon, Elad, Maharbiz, Michel M., Carmena, Jose M., Rabaey, Jan M., Boyden, Edward S., Church, George M., and Kording, Konrad P.
Subjects: Quantitative Biology - Neurons and Cognition, Physics - Biological Physics
Abstract: Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience. Entirely new approaches may be required, motivating an analysis of the fundamental physical constraints on the problem. We outline the physical principles governing brain activity mapping using optical, electrical, magnetic resonance, and molecular modalities of neural recording. Focusing on the mouse brain, we analyze the scalability of each method, concentrating on the limitations imposed by spatiotemporal resolution, energy dissipation, and volume displacement. We also study the physics of powering and communicating with microscale devices embedded in brain tissue.
Published: 2013

31. Searching for collective behavior in a network of real neurons

Author: Tkačik, Gašper, Marre, Olivier, Amodei, Dario, Schneidman, Elad, Bialek, William, and Berry II, Michael J
Subjects: Quantitative Biology - Neurons and Cognition, Condensed Matter - Statistical Mechanics, Physics - Biological Physics
Abstract: Maximum entropy models are the least structured probability distributions that exactly reproduce a chosen set of statistics measured in an interacting network. Here we use this principle to construct probabilistic models which describe the correlated spiking activity of populations of up to 120 neurons in the salamander retina as it responds to natural movies. Already in groups as small as 10 neurons, interactions between spikes can no longer be regarded as small perturbations in an otherwise independent system; for 40 or more neurons pairwise interactions need to be supplemented by a global interaction that controls the distribution of synchrony in the population. Here we show that such "K-pairwise" models--being systematic extensions of the previously used pairwise Ising models--provide an excellent account of the data. We explore the properties of the neural vocabulary by: 1) estimating its entropy, which constrains the population's capacity to represent visual information; 2) classifying activity patterns into a small set of metastable collective modes; 3) showing that the neural codeword ensembles are extremely inhomogeneous; 4) demonstrating that the state of individual neurons is highly predictable from the rest of the population, allowing the capacity for error correction., Comment: 24 pages, 19 figures
Published: 2013

32. The simplest maximum entropy model for collective behavior in a neural network

Author: Tkacik, Gasper, Marre, Olivier, Mora, Thierry, Amodei, Dario, Berry II, Michael J., and Bialek, William
Subjects: Quantitative Biology - Neurons and Cognition, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics
Abstract: Recent work emphasizes that the maximum entropy principle provides a bridge between statistical mechanics models for collective behavior in neural networks and experiments on networks of real neurons. Most of this work has focused on capturing the measured correlations among pairs of neurons. Here we suggest an alternative, constructing models that are consistent with the distribution of global network activity, i.e. the probability that K out of N cells in the network generate action potentials in the same small time bin. The inverse problem that we need to solve in constructing the model is analytically tractable, and provides a natural "thermodynamics" for the network in the limit of large N. We analyze the responses of neurons in a small patch of the retina to naturalistic stimuli, and find that the implied thermodynamics is very close to an unusual critical point, in which the entropy (in proper units) is exactly equal to the energy.
Published: 2012

33. Improving Precursor Selectivity in Data-Independent Acquisition Using Overlapping Windows

Author: Amodei, Dario, Egertson, Jarrett, MacLean, Brendan X., Johnson, Richard, Merrihew, Gennifer E., Keller, Austin, Marsh, Don, Vitek, Olga, Mallick, Parag, and MacCoss, Michael J.
Published: 2019

34. Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry

Author: Parker, Sarah J., Rost, Hannes, Rosenberger, George, Collins, Ben C., Malmström, Lars, Amodei, Dario, Venkatraman, Vidya, Raedschelders, Koen, Van Eyk, Jennifer E., and Aebersold, Ruedi
Published: 2015

35. Thermodynamics and signatures of criticality in a network of neurons

Author: Tkačik, Gašper, Mora, Thierry, Marre, Olivier, Amodei, Dario, Palmer, Stephanie E., Berry, Michael J., and Bialek, William
Published: 2015

36. Discovering Language Model Behaviors with Model-Written Evaluations

Author: Perez, Ethan, primary, Ringer, Sam, additional, Lukosiute, Kamile, additional, Nguyen, Karina, additional, Chen, Edwin, additional, Heiner, Scott, additional, Pettit, Craig, additional, Olsson, Catherine, additional, Kundu, Sandipan, additional, Kadavath, Saurav, additional, Jones, Andy, additional, Chen, Anna, additional, Mann, Benjamin, additional, Israel, Brian, additional, Seethor, Bryan, additional, McKinnon, Cameron, additional, Olah, Christopher, additional, Yan, Da, additional, Amodei, Daniela, additional, Amodei, Dario, additional, Drain, Dawn, additional, Li, Dustin, additional, Tran-Johnson, Eli, additional, Khundadze, Guro, additional, Kernion, Jackson, additional, Landis, James, additional, Kerr, Jamie, additional, Mueller, Jared, additional, Hyun, Jeeyoon, additional, Landau, Joshua, additional, Ndousse, Kamal, additional, Goldberg, Landon, additional, Lovitt, Liane, additional, Lucas, Martin, additional, Sellitto, Michael, additional, Zhang, Miranda, additional, Kingsland, Neerav, additional, Elhage, Nelson, additional, Joseph, Nicholas, additional, Mercado, Noemi, additional, DasSarma, Nova, additional, Rausch, Oliver, additional, Larson, Robin, additional, McCandlish, Sam, additional, Johnston, Scott, additional, Kravec, Shauna, additional, El Showk, Sheer, additional, Lanham, Tamera, additional, Telleen-Lawton, Timothy, additional, Brown, Tom, additional, Henighan, Tom, additional, Hume, Tristan, additional, Bai, Yuntao, additional, Hatfield-Dodds, Zac, additional, Clark, Jack, additional, Bowman, Samuel R., additional, Askell, Amanda, additional, Grosse, Roger, additional, Hernandez, Danny, additional, Ganguli, Deep, additional, Hubinger, Evan, additional, Schiefer, Nicholas, additional, and Kaplan, Jared, additional
Published: 2023

37. Characterizing deformability and surface friction of cancer cells

Author: Byun, Sangwon, Son, Sungmin, Amodei, Dario, Cermak, Nathan, Shaw, Josephine, Kang, Joon Ho, Hecht, Vivian C., Winslow, Monte M., Jacks, Tyler, Mallick, Parag, and Manalis, Scott R.
Published: 2013

38. Predictability and Surprise in Large Generative Models

Author: Ganguli, Deep, primary, Hernandez, Danny, additional, Lovitt, Liane, additional, Askell, Amanda, additional, Bai, Yuntao, additional, Chen, Anna, additional, Conerly, Tom, additional, DasSarma, Nova, additional, Drain, Dawn, additional, Elhage, Nelson, additional, El Showk, Sheer, additional, Fort, Stanislav, additional, Hatfield-Dodds, Zac, additional, Henighan, Tom, additional, Johnston, Scott, additional, Jones, Andy, additional, Joseph, Nicholas, additional, Kernion, Jackson, additional, Kravec, Shauna, additional, Mann, Ben, additional, Nanda, Neel, additional, Ndousse, Kamal, additional, Olsson, Catherine, additional, Amodei, Daniela, additional, Brown, Tom, additional, Kaplan, Jared, additional, McCandlish, Sam, additional, Olah, Christopher, additional, Amodei, Dario, additional, and Clark, Jack, additional
Published: 2022

39. Mirrors with regular hexagonal segments

Author: Amodei, Dario and Padin, Stephen
Subjects: Mirrors -- Research, Astronomy, Physics
Abstract: The point-spread function and emissivity are calculated for a mirror made from regular hexagonal segments of just a few different sizes. A mirror of this type has many similar segments, which is an advantage for manufacturing, and for an ~f/1 mirror with ≥ 1000 segments and ≥ 4 sizes of regular hexagons the increase in intersegment gap area is negligible. This result raises the possibility of making a mirror from very large numbers of identical small segments that are warped to the required figure. OCIS codes: 350.1260, 220.4880, 110.6770.
Published: 2003

40. Trump Can Keep America's AI Advantage.

Author: Amodei, Dario and Pottinger, Matt
Subjects: *INTERNET access control, *PUBLIC spending, *ENERGY infrastructure, *EXPORT controls, *SEMICONDUCTOR manufacturing
Abstract: The article discusses the importance of the U.S. maintaining its advantage in artificial intelligence (AI) to preserve national security. It highlights the strategic significance of AI technology and the potential benefits it could bring to various fields. The text emphasizes the need for the U.S. to lead in AI development and implement export controls to prevent other nations, particularly China, from surpassing American capabilities in this critical area. [Extracted from the article]
Published: 2025

41. Building high-quality assay libraries for targeted analysis of SWATH MS data

Author: Schubert, Olga T., Gillet, Ludovic C., Collins, Ben C., Navarro, Pedro, Rosenberger, George, Wolski, Witold E., Lam, Henry H N, Amodei, Dario, Mallick, Parag, MacLean, Brendan, and Aebersold, Ruedi
Abstract: Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
Published: 2015

42. Building high-quality assay libraries for targeted analysis of SWATH MS data

Author: Schubert, Olga T, primary, Gillet, Ludovic C, additional, Collins, Ben C, additional, Navarro, Pedro, additional, Rosenberger, George, additional, Wolski, Witold E, additional, Lam, Henry, additional, Amodei, Dario, additional, Mallick, Parag, additional, MacLean, Brendan, additional, and Aebersold, Ruedi, additional
Published: 2015

43. Characterizing deformability and surface friction of cancer cells

Author: Massachusetts Institute of Technology. Computational and Systems Biology Program, Massachusetts Institute of Technology. Department of Biological Engineering, Massachusetts Institute of Technology. Department of Biology, Massachusetts Institute of Technology. Department of Mechanical Engineering, Massachusetts Institute of Technology. Department of Physics, Koch Institute for Integrative Cancer Research at MIT, Byun, Sangwon, Son, Sungmin, Cermak, Nathan, Shaw, Josephine, Kang, Joon Ho, Hecht, Vivian Chaya, Winslow, Monte M., Jacks, Tyler E., Manalis, Scott R., Amodei, Dario, Mallick, Parag, Winslow, Monte Meier, Jacks, Tyler E, and Manalis, Scott R
Abstract: Metastasis requires the penetration of cancer cells through tight spaces, which is mediated by the physical properties of the cells as well as their interactions with the confined environment. Various microfluidic approaches have been devised to mimic traversal in vitro by measuring the time required for cells to pass through a constriction. Although a cell’s passage time is expected to depend on its deformability, measurements from existing approaches are confounded by a cell's size and its frictional properties with the channel wall. Here, we introduce a device that enables the precise measurement of (i) the size of a single cell, given by its buoyant mass, (ii) the velocity of the cell entering a constricted microchannel (entry velocity), and (iii) the velocity of the cell as it transits through the constriction (transit velocity). Changing the deformability of the cell by perturbing its cytoskeleton primarily alters the entry velocity, whereas changing the surface friction by immobilizing positive charges on the constriction's walls primarily alters the transit velocity, indicating that these parameters can give insight into the factors affecting the passage of each cell. When accounting for cell buoyant mass, we find that cells possessing higher metastatic potential exhibit faster entry velocities than cells with lower metastatic potential. We additionally find that some cell types with higher metastatic potential exhibit greater than expected changes in transit velocities, suggesting that not only the increased deformability but reduced friction may be a factor in enabling invasive cancer cells to efficiently squeeze through tight spaces., National Cancer Institute (U.S.) (Contract CCNE-T (Grant 26697290-47281-A)), National Cancer Institute (U.S.) (Physical Sciences Oncology Center U54CA143874), Stand Up To Cancer (SU2C/AACR)
Published: 2014

44. Physical principles for scalable neural recording

Author: Massachusetts Institute of Technology. Department of Biological Engineering, Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology. Media Laboratory, Program in Media Arts and Sciences (Massachusetts Institute of Technology), Dalrymple, David Allen, Boyden, Edward Stuart, Marblestone, Adam Henry, Zamft, Bradley M., Shapiro, Mikhail G., Cybulski, Thaddeus R., Glaser, Joshua I., Amodei, Dario, Stranges, P. Benjamin, Kalhor, Reza, Seo, Dongjin, Alon, Elad, Maharbiz, Michel M., Carmena, Jose M., Rabaey, Jan M., Church, George M., Kording, Konrad P., and Maguire, Yael G., 1975
Abstract: Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience. Entirely new approaches may be required, motivating an analysis of the fundamental physical constraints on the problem. We outline the physical principles governing brain activity mapping using optical, electrical, magnetic resonance, and molecular modalities of neural recording. Focusing on the mouse brain, we analyze the scalability of each method, concentrating on the limitations imposed by spatiotemporal resolution, energy dissipation, and volume displacement. Based on this analysis, all existing approaches require orders of magnitude improvement in key parameters. Electrical recording is limited by the low multiplexing capacity of electrodes and their lack of intrinsic spatial resolution, optical methods are constrained by the scattering of visible light in brain tissue, magnetic resonance is hindered by the diffusion and relaxation timescales of water protons, and the implementation of molecular recording is complicated by the stochastic kinetics of enzymes. Understanding the physical limits of brain activity mapping may provide insight into opportunities for novel solutions. For example, unconventional methods for delivering electrodes may enable unprecedented numbers of recording sites, embedded optical devices could allow optical detectors to be placed within a few scattering lengths of the measured neurons, and new classes of molecularly engineered sensors might obviate cumbersome hardware architectures. We also study the physics of powering and communicating with microscale devices embedded in brain tissue and find that, while radio-frequency electromagnetic data transmission suffers from a severe power–bandwidth tradeoff, communication via infrared light or ultrasound may allow high data rates due to the possibility of spatial multiplexing. 
The use of embedded local recording and wi… Funding: Thiel Foundation, National Institutes of Health (U.S.), National Science Foundation (U.S.), McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology. Media Laboratory, New York Stem Cell Foundation (Robertson Neuroscience Investigator Award), Paul G. Allen Family Foundation (Distinguished Investigator in Neuroscience Award)
Published: 2013

45. A cross-platform toolkit for mass spectrometry and proteomics

Author: Chambers, Matthew C, Maclean, Brendan, Burke, Robert, Amodei, Dario, Ruderman, Daniel L, Neumann, Steffen, Gatto, Laurent, Fischer, Bernd, Pratt, Brian, Egertson, Jarrett, Hoff, Katherine, Kessner, Darren, Tasman, Natalie, Shulman, Nicholas, Frewen, Barbara, Baker, Tahmina A, Brusniak, Mi-Youn, Paulse, Christopher, Creasy, David, Flashner, Lisa, Kani, Kian, Moulding, Chris, Seymour, Sean L, Nuwaysir, Lydia M, Lefebvre, Brent, Kuhlmann, Frank, Roark, Joe, Rainer, Paape, Detlev, Suckau, Hemenway, Tina, Huhmer, Andreas, Langridge, James, Connolly, Brian, Chadick, Trey, Holly, Krisztina, Eckels, Josh, Deutsch, Eric W, Moritz, Robert L, Katz, Jonathan E, Agus, David B, MacCoss, Michael, Tabb, David L, and Mallick, Parag
Abstract: To the Editor: Mass spectrometry–based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2] and characterize protein complexes [3]. Despite successes, the interpretation of vast proteomics data…
Published: 2012

46. Searching for Collective Behavior in a Large Network of Sensory Neurons

Author: Tkačik, Gašper, primary, Marre, Olivier, additional, Amodei, Dario, additional, Schneidman, Elad, additional, Bialek, William, additional, and Berry, Michael J., additional
Published: 2014

47. The simplest maximum entropy model for collective behavior in a neural network

Author: Tkačik, Gašper, primary, Marre, Olivier, additional, Mora, Thierry, additional, Amodei, Dario, additional, Berry II, Michael J, additional, and Bialek, William, additional
Published: 2013

48. Physical principles for scalable neural recording

Author: Marblestone, Adam H., primary, Zamft, Bradley M., additional, Maguire, Yael G., additional, Shapiro, Mikhail G., additional, Cybulski, Thaddeus R., additional, Glaser, Joshua I., additional, Amodei, Dario, additional, Stranges, P. Benjamin, additional, Kalhor, Reza, additional, Dalrymple, David A., additional, Seo, Dongjin, additional, Alon, Elad, additional, Maharbiz, Michel M., additional, Carmena, Jose M., additional, Rabaey, Jan M., additional, Boyden, Edward S., additional, Church, George M., additional, and Kording, Konrad P., additional
Published: 2013

49. Mapping a Complete Neural Population in the Retina

Author: Marre, Olivier, primary, Amodei, Dario, additional, Deshmukh, Nikhil, additional, Sadeghi, Kolia, additional, Soo, Frederick, additional, Holy, Timothy E., additional, and Berry, Michael J., additional
Published: 2012

50. A cross-platform toolkit for mass spectrometry and proteomics

Author: Chambers, Matthew C, primary, Maclean, Brendan, additional, Burke, Robert, additional, Amodei, Dario, additional, Ruderman, Daniel L, additional, Neumann, Steffen, additional, Gatto, Laurent, additional, Fischer, Bernd, additional, Pratt, Brian, additional, Egertson, Jarrett, additional, Hoff, Katherine, additional, Kessner, Darren, additional, Tasman, Natalie, additional, Shulman, Nicholas, additional, Frewen, Barbara, additional, Baker, Tahmina A, additional, Brusniak, Mi-Youn, additional, Paulse, Christopher, additional, Creasy, David, additional, Flashner, Lisa, additional, Kani, Kian, additional, Moulding, Chris, additional, Seymour, Sean L, additional, Nuwaysir, Lydia M, additional, Lefebvre, Brent, additional, Kuhlmann, Frank, additional, Roark, Joe, additional, Rainer, Paape, additional, Detlev, Suckau, additional, Hemenway, Tina, additional, Huhmer, Andreas, additional, Langridge, James, additional, Connolly, Brian, additional, Chadick, Trey, additional, Holly, Krisztina, additional, Eckels, Josh, additional, Deutsch, Eric W, additional, Moritz, Robert L, additional, Katz, Jonathan E, additional, Agus, David B, additional, MacCoss, Michael, additional, Tabb, David L, additional, and Mallick, Parag, additional
Published: 2012
