Author: "Guha, Arjun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Guha, Arjun"' showing total 251 results

1. SelfCodeAlign: Self-Alignment for Code Generation

Author: Wei, Yuxiang, Cassano, Federico, Liu, Jiawei, Ding, Yifeng, Jain, Naman, Mueller, Zachary, de Vries, Harm, von Werra, Leandro, Guha, Arjun, and Zhang, Lingming
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component's effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance.
Comment: Accepted to NeurIPS 2024
Published: 2024
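Sketch: The abstract's "sample multiple responses, pair each with test cases, validate, keep the passing ones" step could look roughly like the Python below. This is an illustrative sketch only, not the authors' pipeline: the names passes_tests and select_passing are hypothetical, and plain exec() stands in for the sandbox environment the abstract mentions (exec provides no isolation and should not be used on untrusted code).

```python
from typing import Dict, List, Tuple


def passes_tests(response_src: str, test_src: str) -> bool:
    """Run a candidate response and its paired tests in a fresh namespace.

    Returns True only if neither the code nor the tests raise an exception.
    NOTE: exec() is a stand-in for a real sandbox (assumption for this sketch).
    """
    namespace: Dict[str, object] = {}
    try:
        exec(response_src, namespace)  # define the candidate solution
        exec(test_src, namespace)      # run assertions against it
    except Exception:
        return False
    return True


def select_passing(task: str, candidates: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (task, response) pairs whose paired test cases pass.

    `candidates` is a list of (response_source, test_source) tuples
    sampled for a single task.
    """
    return [(task, resp) for resp, tests in candidates if passes_tests(resp, tests)]
```

Under these assumptions, the surviving (task, response) pairs would form the instruction-tuning dataset; a response whose tests fail or whose code does not even parse is simply dropped.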

2. Creating and Repairing Robot Programs in Open-World Domains

Author: Schlesinger, Claire, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Using Large Language Models (LLMs) to produce robot programs from natural language has allowed for robot systems that can complete a higher diversity of tasks. However, LLM-generated programs may be faulty, either due to ambiguity in instructions, misinterpretation of the desired task, or missing information about the world state. As these programs run, the state of the world changes and they gather new information. When a failure occurs, it is important that they recover from the current world state and avoid repeating steps that they previously completed successfully. We propose RoboRepair, a system which traces the execution of a program up until error, and then runs an LLM-produced recovery program that minimizes repeated actions. To evaluate the efficacy of our system, we create a benchmark consisting of eleven tasks with various error conditions that require the generation of a recovery program. We compare the efficiency of the recovery program to a plan built with an oracle that has foreknowledge of future errors.
Comment: Under review at ACL Rolling Review
Published: 2024

3. Substance Beats Style: Why Beginning Students Fail to Code with LLMs

Author: Lucchetti, Francesca, Wu, Zixuan, Guha, Arjun, Feldman, Molly Q, and Anderson, Carolyn Jane
Subjects: Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. Why is this the case? This paper explores two competing hypotheses about the cause of student-LLM miscommunication: (1) students simply lack the technical vocabulary needed to write good prompts, and (2) students do not understand the extent of information that LLMs need to solve code generation tasks. We study (1) with a causal intervention experiment on technical vocabulary and (2) by analyzing graphs that abstract how students edit prompts and the different failures that they encounter. We find that substance beats style: a poor grasp of technical vocabulary is merely correlated with prompt failure; that the information content of prompts predicts success; that students get stuck making trivial edits; and more. Our findings have implications for the use of LLMs in programming education, and for efforts to make computing more accessible with LLMs.
Published: 2024

4. NNsight and NDIF: Democratizing Access to Foundation Model Internals

Author: Fiotto-Kaufman, Jaden, Loftus, Alexander R, Todd, Eric, Brinkmann, Jannik, Juang, Caden, Pal, Koyena, Rager, Can, Mueller, Aaron, Marks, Samuel, Sharma, Arnab Sen, Lucchetti, Francesca, Ripa, Michael, Belfki, Adam, Prakash, Nikhil, Multani, Sumeet, Brodley, Carla, Guha, Arjun, Bell, Jonathan, Wallace, Byron, and Bau, David
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The enormous scale of state-of-the-art foundation models has limited their accessibility to scientists, because customized experiments at large model sizes require costly hardware and complex engineering that is impractical for most researchers. To alleviate these problems, we introduce NNsight, an open-source Python package with a simple, flexible API that can express interventions on any PyTorch model by building computation graphs. We also introduce NDIF, a collaborative research platform providing researchers access to foundation-scale LLMs via the NNsight API. Code, documentation, and tutorials are available at https://www.nnsight.net.
Comment: Code at https://nnsight.net
Published: 2024

5. Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

Author: Hu, Zichao, Li, Junyi Jessy, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Open-weight LLMs are particularly appealing choices to generate training data for fine-tuning Code LLMs on domain-specific service robot applications because they are cost-effective, customizable, and offer better privacy protection. However, unlike proprietary LLMs, open-weight models are more error-prone and often produce programs that violate domain-specific constraints. A promising solution is to incorporate a robot simulator with a well-defined environment to verify program correctness. Yet, these environments require pre-enumeration of relevant entities and their states, which limits the diversity of programs that can be effectively verified. In this work, we introduce ROBO-INSTRUCT that preserves the diversity of programs generated by an LLM while providing the correctness of simulator-based checking. ROBO-INSTRUCT introduces ROBOSIM to dynamically synthesize consistent simulation environments for each generated program. Moreover, ROBO-INSTRUCT handles subtler instruction-program inconsistencies that do not result in a constraint violation via INSTALIGN, an LLM-aided instruction-program alignment process. Given domain-specific APIs and a few seed examples, ROBO-INSTRUCT can leverage an 8B Llama3 model to generate a training dataset for fine-tuning a 7B CodeLlama model. Our fine-tuned model achieves a 28.75% improvement in pass@1 over the original base model and a 13.75% improvement compared to its SELF-INSTRUCT-finetuned counterparts, even surpassing the performance of a few proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.
Published: 2024

6. Understanding How CodeLLMs (Mis)Predict Types with Activation Steering

Author: Lucchetti, Francesca and Guha, Arjun
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: CodeLLMs are transforming software development as we know it. This is especially true for tasks where rule-based approaches fall short, like type prediction. The type prediction task consists of adding a new type annotation to a partially typed program, such that the resulting program is closer to being fully typed. The intractability of rule-based approaches and high cost of manual annotation make CodeLLMs an attractive solution to the problem. However, CodeLLMs are still far from being deployed at large scale due to doubts surrounding their reliability. To shed some light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type. We show that by applying semantics-preserving edits to code, CodeLLMs are eventually misled into mispredicting type annotations. However, by leveraging activation steering we are able to "steer" the model back to the correct prediction, making models more robust against semantically irrelevant prompt features. We show that steering achieves comparable performance to fine-tuning directly on the type prediction task. Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa. To our knowledge, this is the first evidence of its kind to suggest that CodeLLMs learn task representations that transfer across languages.
Comment: 14 pages, 7 figures
Published: 2024

7. StarCoder 2 and The Stack v2: The Next Generation

Author: Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, and de Vries, Harm
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
Published: 2024

8. How Beginning Programmers and Code LLMs (Mis)read Each Other

Author: Nguyen, Sydney, Babe, Hannah McLean, Zi, Yangtian, Guha, Arjun, Anderson, Carolyn Jane, and Feldman, Molly Q
Subjects: Computer Science - Human-Computer Interaction
Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions with key implications for non-expert Code LLM use within and outside of education.
Comment: Published in CHI 2024
Published: 2024

9. Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Author: Cassano, Federico, Li, Luisa, Sethi, Akul, Shinn, Noah, Brennan-Jones, Abby, Ginesin, Jacob, Berman, Edward, Chakhnashvili, George, Lozhkov, Anton, Anderson, Carolyn Jane, and Guha, Arjun
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of code and an instruction to modify the code. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, or ask for a different kind of solution. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is better than the best open model at code editing tasks. We also introduce a new, carefully curated, permissively licensed training dataset of code editing tasks coupled with natural language instructions. Using this training dataset, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities, closing the gap between open and closed models. All code, data, and models are available at https://github.com/nuprl/CanItEdit.
Published: 2023

10. Deploying and Evaluating LLMs to Program Service Mobile Robots

Author: Hu, Zichao, Lucchetti, Francesca, Schlesinger, Claire, Saxena, Yash, Freeman, Anders, Modak, Sadanand, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Robotics
Abstract: Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of generated programs by checking execution traces starting with multiple initial states, and checking whether the traces satisfy temporal logic properties that encode correctness for each task. RoboEval also includes multiple prompts per task to test for the robustness of program generation. We evaluate several popular state-of-the-art LLMs with the RoboEval benchmark, and perform a thorough analysis of the modes of failures, resulting in a taxonomy that highlights common pitfalls of LLMs at generating robot programs. We release our code and benchmark at https://amrl.cs.utexas.edu/codebotler/.
Comment: 8 pages, Accepted at IEEE Robotics and Automation Letters (RA-L)
Published: 2023

11. npm-follower: A Complete Dataset Tracking the NPM Ecosystem

Author: Pinckney, Donald, Cassano, Federico, Guha, Arjun, and Bell, Jonathan
Subjects: Computer Science - Software Engineering
Abstract: Software developers typically rely upon a large network of dependencies to build their applications. For instance, the NPM package repository contains over 3 million packages and serves tens of billions of downloads weekly. Understanding the structure and nature of packages, dependencies, and published code requires datasets that provide researchers with easy access to metadata and code of packages. However, prior work on NPM dataset construction typically has two limitations: 1) only metadata is scraped, and 2) packages or versions that are deleted from NPM can not be scraped. Over 330,000 versions of packages were deleted from NPM between July 2022 and May 2023. This data is critical for researchers as it often pertains to important questions of security and malware. We present npm-follower, a dataset and crawling architecture which archives metadata and code of all packages and versions as they are published, and is thus able to retain data which is later deleted. The dataset currently includes over 35 million versions of packages, and grows at a rate of about 1 million versions per month. The dataset is designed to be easily used by researchers answering questions involving either metadata or program analysis. Both the code and dataset are available at https://dependencies.science.
Published: 2023

12. Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Author: Cassano, Federico, Gouwar, John, Lucchetti, Francesca, Schlesinger, Claire, Freeman, Anders, Anderson, Carolyn Jane, Feldman, Molly Q, Greenberg, Michael, Jangda, Abhinav, and Guha, Arjun
Subjects: Computer Science - Programming Languages, Computer Science - Machine Learning
Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
Published: 2023

13. Continuing WebAssembly with Effect Handlers

Author: Phipps-Costin, Luna, Rossberg, Andreas, Guha, Arjun, Leijen, Daan, Hillerström, Daniel, Sivaramakrishnan, KC, Pretnar, Matija, and Lindley, Sam
Subjects: Computer Science - Programming Languages
Abstract: WebAssembly (Wasm) is a low-level portable code format offering near native performance. It is intended as a compilation target for a wide variety of source languages. However, Wasm provides no direct support for non-local control flow features such as async/await, generators/iterators, lightweight threads, first-class continuations, etc. This means that compilers for source languages with such features must ceremoniously transform whole source programs in order to target Wasm. We present WasmFX, an extension to Wasm which provides a universal target for non-local control features via effect handlers, enabling compilers to translate such features directly into Wasm. Our extension is minimal and only adds three main instructions for creating, suspending, and resuming continuations. Moreover, our primitive instructions are type-safe providing typed continuations which are well-aligned with the design principles of Wasm whose stacks are typed. We present a formal specification of WasmFX and show that the extension is sound. We have implemented WasmFX as an extension to the Wasm reference interpreter and also built a prototype WasmFX extension for Wasmtime, a production-grade Wasm engine, piggybacking on Wasmtime's existing fibers API. The preliminary performance results for our prototype are encouraging, and we outline future plans to realise a native implementation.
Published: 2023

14. StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Author: Babe, Hannah McLean, Nguyen, Sydney, Zi, Yangtian, Guha, Arjun, Feldman, Molly Q, and Anderson, Carolyn Jane
Subjects: Computer Science - Machine Learning, Computer Science - Human-Computer Interaction, Computer Science - Software Engineering
Abstract: Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. Our students wrote these prompts while working interactively with a Code LLM, and we observed very mixed success rates. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs.
Published: 2023

15. Type Prediction With Program Decomposition and Fill-in-the-Type Training

Author: Cassano, Federico, Yee, Ming-Ho, Shinn, Noah, Guha, Arjun, and Holtzen, Steven
Subjects: Computer Science - Software Engineering, Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available at: https://github.com/GammaTauAI/opentau.
Published: 2023

16. StarCoder: may the source be with you!

Author: Li, Raymond, Allal, Loubna Ben, Zi, Yangtian, Muennighoff, Niklas, Kocetkov, Denis, Mou, Chenghao, Marone, Marc, Akiki, Christopher, Li, Jia, Chim, Jenny, Liu, Qian, Zheltonozhskii, Evgenii, Zhuo, Terry Yue, Wang, Thomas, Dehaene, Olivier, Davaadorj, Mishig, Lamy-Poirier, Joel, Monteiro, João, Shliazhko, Oleh, Gontier, Nicolas, Meade, Nicholas, Zebaze, Armel, Yee, Ming-Ho, Umapathi, Logesh Kumar, Zhu, Jian, Lipkin, Benjamin, Oblokulov, Muhtasham, Wang, Zhiruo, Murthy, Rudra, Stillerman, Jason, Patel, Siva Sankalp, Abulkhanov, Dmitry, Zocca, Marco, Dey, Manan, Zhang, Zhihan, Fahmy, Nour, Bhattacharyya, Urvashi, Yu, Wenhao, Singh, Swayam, Luccioni, Sasha, Villegas, Paulo, Kunakov, Maxim, Zhdanov, Fedor, Romero, Manuel, Lee, Tony, Timor, Nadav, Ding, Jennifer, Schlesinger, Claire, Schoelkopf, Hailey, Ebert, Jan, Dao, Tri, Mishra, Mayank, Gu, Alex, Robinson, Jennifer, Anderson, Carolyn Jane, Dolan-Gavitt, Brendan, Contractor, Danish, Reddy, Siva, Fried, Daniel, Bahdanau, Dzmitry, Jernite, Yacine, Ferrandis, Carlos Muñoz, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, and de Vries, Harm
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Programming Languages, Computer Science - Software Engineering
Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Published: 2023

17. A Large Scale Analysis of Semantic Versioning in NPM

Author: Pinckney, Donald, Cassano, Federico, Guha, Arjun, and Bell, Jonathan
Subjects: Computer Science - Software Engineering
Abstract: The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a "semantic versioning" ('semver') scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers' imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.
Published: 2023
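Sketch: The "time-travelling dependency resolver" idea in the abstract can be illustrated with a few lines of Python. This is a simplified sketch under stated assumptions, not the authors' resolver: the function names are hypothetical, versions are plain (major, minor, patch) tuples, only caret ranges with a major version of 1 or higher are modelled, and pre-release tags and transitive dependencies are ignored.

```python
from datetime import date
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)


def satisfies_caret(version: Version, base: Version) -> bool:
    """Caret range ^base, assuming base major >= 1:
    same major version, and version >= base."""
    return version[0] == base[0] and version >= base


def resolve_at(
    releases: List[Tuple[Version, date]],
    base: Version,
    when: date,
) -> Optional[Version]:
    """Return the highest version matching ^base among the releases
    that had already been published on the date `when`."""
    candidates = [
        v for v, published in releases
        if published <= when and satisfies_caret(v, base)
    ]
    return max(candidates, default=None)


# Hypothetical release history for one package.
releases = [
    ((1, 2, 0), date(2022, 1, 1)),
    ((1, 3, 0), date(2022, 6, 1)),
    ((2, 0, 0), date(2022, 7, 1)),
]
early = resolve_at(releases, (1, 2, 0), date(2022, 3, 1))   # only 1.2.0 exists yet
late = resolve_at(releases, (1, 2, 0), date(2022, 12, 1))   # 1.3.0 wins; 2.0.0 is excluded
```

Replaying the same constraint at different dates shows how an update (here 1.3.0) would or would not have reached a downstream package at a given time, which is the kind of question the abstract describes.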

18. Do Machine Learning Models Produce TypeScript Types That Type Check?

Author: Yee, Ming-Ho and Guha, Arjun
Subjects: Computer Science - Software Engineering, Computer Science - Programming Languages
Abstract: Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort and several migrations on large, industry codebases have been reported to have taken several years. In the research community, there has been significant interest in using machine learning to automate TypeScript type migration. Existing machine learning models report a high degree of accuracy in predicting individual TypeScript type annotations. However, in this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker? We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps that are necessary for using a type prediction model: (1) importing types for a project's dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted type annotations into the program to produce TypeScript when needed; and (4) rejecting non-type predictions when needed. We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that have never been typed before. With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully.
Comment: Published at the 37th European Conference on Object-Oriented Programming (ECOOP 2023)
Published: 2023
Full Text: View/download PDF

19. SantaCoder: don't reach for the stars!

Author: Allal, Loubna Ben, Li, Raymond, Kocetkov, Denis, Mou, Chenghao, Akiki, Christopher, Ferrandis, Carlos Munoz, Muennighoff, Niklas, Mishra, Mayank, Gu, Alex, Dey, Manan, Umapathi, Logesh Kumar, Anderson, Carolyn Jane, Zi, Yangtian, Poirier, Joel Lamy, Schoelkopf, Hailey, Troshin, Sergey, Abulkhanov, Dmitry, Romero, Manuel, Lappert, Michael, De Toni, Francesco, del Río, Bernardo García, Liu, Qian, Bose, Shamik, Bhattacharyya, Urvashi, Zhuo, Terry Yue, Yu, Ian, Villegas, Paulo, Zocca, Marco, Mangrulkar, Sourab, Lansky, David, Nguyen, Huu, Contractor, Danish, Villa, Luis, Li, Jia, Bahdanau, Dzmitry, Jernite, Yacine, Hughes, Sean, Fried, Daniel, Guha, Arjun, de Vries, Harm, and von Werra, Leandro
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
Published: 2023

20. MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Author: Cassano, Federico, Gouwar, John, Nguyen, Daniel, Nguyen, Sydney, Phipps-Costin, Luna, Pinckney, Donald, Yee, Ming-Ho, Zi, Yangtian, Anderson, Carolyn Jane, Feldman, Molly Q, Guha, Arjun, Greenberg, Michael, and Jangda, Abhinav
Subjects: Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks to 18 additional programming languages. We use MultiPL-E to extend the HumanEval benchmark and MBPP benchmark to 18 languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder. We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages.
Published: 2022

21. Flexible and Optimal Dependency Management via Max-SMT

Author: Pinckney, Donald, Cassano, Federico, Guha, Arjun, Bell, Jon, Culpo, Massimiliano, and Gamblin, Todd
Subjects: Computer Science - Software Engineering
Abstract: Package managers such as NPM have become essential for software development. The NPM repository hosts over 2 million packages and serves over 43 billion downloads every week. Unfortunately, the NPM dependency solver has several shortcomings. 1) NPM is greedy and often fails to install the newest versions of dependencies; 2) NPM's algorithm leads to duplicated dependencies and bloated code, which is particularly bad for web applications that need to minimize code size; 3) NPM's vulnerability fixing algorithm is also greedy, and can even introduce new vulnerabilities; and 4) NPM's ability to duplicate dependencies can break stateful frameworks and requires a lot of care to work around. Although existing tools try to address these problems, they are either brittle, rely on post hoc changes to the dependency tree, do not guarantee optimality, or are not composable. We present PacSolve, a unifying framework and implementation for dependency solving which allows for customizable constraints and optimization goals. We use PacSolve to build MaxNPM, a complete, drop-in replacement for NPM, which empowers developers to combine multiple objectives when installing dependencies. We evaluate MaxNPM with a large sample of packages from the NPM ecosystem and show that it can: 1) reduce more vulnerabilities in dependencies than NPM's auditing tool in 33% of cases; 2) choose newer dependencies than NPM in 14% of cases; and 3) choose fewer dependencies than NPM in 21% of cases. All our code and data is open and available.
Published: 2022

22. Neuroepithelial bodies and terminal bronchioles are niches for distinctive club cells that repair the airways following acute notch inhibition

Author: Lingamallu, Sai Manoz, Deshpande, Aditya, Joy, Neenu, Ganeshan, Kirthana, Ray, Neelanjana, Ladher, Rajesh Kumar, Taketo, Makoto Mark, Lafkas, Daniel, and Guha, Arjun
Published: 2024
Full Text: View/download PDF

23. Solver-based Gradual Type Migration

Author: Phipps-Costin, Luna, Anderson, Carolyn Jane, Greenberg, Michael, and Guha, Arjun
Subjects: Computer Science - Programming Languages
Abstract: Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of automated type migration: given a dynamic program, infer additional or improved type annotations. Existing type migration algorithms prioritize different goals, such as maximizing type precision, maintaining compatibility with unmigrated code, and preserving the semantics of the original program. We argue that the type migration problem involves fundamental compromises: optimizing for a single goal often comes at the expense of others. Ideally, a type migration tool would flexibly accommodate a range of user priorities. We present TypeWhich, a new approach to automated type migration for the gradually-typed lambda calculus with some extensions. Unlike prior work, which relies on custom solvers, TypeWhich produces constraints for an off-the-shelf MaxSMT solver. This allows us to easily express objectives, such as minimizing the number of necessary syntactic coercions, and constraining the type of the migration to be compatible with unmigrated code. We present the first comprehensive evaluation of GTLC type migration algorithms, and compare TypeWhich to four other tools from the literature. Our evaluation uses prior benchmarks, and a new set of "challenge problems." Moreover, we design a new evaluation methodology that highlights the subtleties of gradual type migration. In addition, we apply TypeWhich to a suite of benchmarks for Grift, a programming language based on the GTLC. TypeWhich is able to reconstruct all human-written annotations on all but one program.
Published: 2021

24. Iterative Program Synthesis for Adaptable Social Navigation

Author: Holtz, Jarrett, Andrews, Simon, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Robotics, Computer Science - Programming Languages
Abstract: Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations -- model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions, significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies. We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort., Comment: IROS 2021
Published: 2021

25. Wasm/k: Delimited Continuations for WebAssembly

Author: Pinckney, Donald, Guha, Arjun, and Brun, Yuriy
Subjects: Computer Science - Programming Languages
Abstract: WebAssembly is designed to be an alternative to JavaScript that is a safe, portable, and efficient compilation target for a variety of languages. The performance of high-level languages depends not only on the underlying performance of WebAssembly, but also on the quality of the generated WebAssembly code. In this paper, we identify several features of high-level languages that current approaches can only compile to WebAssembly by generating complex and inefficient code. We argue that these problems could be addressed if WebAssembly natively supported first-class continuations. We then present Wasm/k, which extends WebAssembly with delimited continuations. Wasm/k introduces no new value types, and thus does not require significant changes to the WebAssembly type system (validation). Wasm/k is safe, even in the presence of foreign function calls (e.g., to and from JavaScript). Finally, Wasm/k is amenable to efficient implementation: we implement Wasm/k as a local change to Wasmtime, an existing WebAssembly JIT. We evaluate Wasm/k by implementing C/k, which adds delimited continuations to C/C++. C/k uses Emscripten and its implementation serves as a case study on how to use Wasm/k in a compiler that targets WebAssembly. We present several case studies using C/k, and show that on implementing green threads, it can outperform the state-of-the-art approach Asyncify with an 18% improvement in performance and a 30% improvement in code size.
Published: 2020
Full Text: View/download PDF

26. Accelerating Graph Sampling for Graph Machine Learning using GPUs

Author: Jangda, Abhinav, Polisetty, Sandeep, Guha, Arjun, and Serafini, Marco
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling. Sampling is an embarrassingly parallel problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems., Comment: Published in EuroSys 2021
Published: 2020

27. Robot Action Selection Learning via Layered Dimension Informed Program Synthesis

Author: Holtz, Jarrett, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Artificial Intelligence, Computer Science - Programming Languages, Computer Science - Robotics
Abstract: Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world, and second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS): by reasoning about the physical dimensions of state variables, and dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) requires two orders of magnitude fewer training examples than a comparable NN representation, and 3) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
Published: 2020

28. SMT-based Robot Transition Repair

Author: Holtz, Jarrett, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Robotics
Abstract: State machines are a common model for robot behaviors. Transition functions often rely on parameterized conditions to model preconditions for the controllers, where the correct values of the parameters depend on factors relating to the environment or the specific robot. In the absence of specific calibration procedures, a roboticist must painstakingly adjust the parameters through a series of trial and error experiments. In this process, identifying when the robot has taken an incorrect action, and what should be done, is straightforward, but finding the right parameter values can be difficult. We present an alternative approach that we call interactive SMT-based Robot Transition Repair (SRTR). During execution we record an execution trace of the transition function, and we ask the roboticist to identify a few instances where the robot has transitioned incorrectly, and what the correct transition should have been. A user supplies these corrections based on the type of error to repair, and an automated analysis of the traces partially evaluates the transition function for each correction. This system of constraints is then formulated as a MaxSMT problem, where the solution is a minimal adjustment to the parameters that satisfies the maximum number of constraints. In order to identify a repair that accurately captures user intentions and generalizes to novel scenarios, solutions are explored by iteratively adding constraints to the MaxSMT problem to yield sets of alternative repairs. We test with state machines from multiple domains including robot soccer and autonomous driving, and we evaluate solver-based repair with respect to solver choice and optimization hyperparameters. Our results demonstrate that SRTR can repair a variety of state machines and error types 1) quickly, 2) with small numbers of corrections, while 3) not overcorrecting state machines and harming generalized performance., Comment: In submission to AIJ. arXiv admin note: text overlap with arXiv:1802.01706
Published: 2020

29. A Language-based Serverless Function Accelerator

Author: Herbert, Emily and Guha, Arjun
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Programming Languages
Abstract: Serverless computing is an approach to cloud computing that allows programmers to run serverless functions in response to external events. Serverless functions are priced at sub-second granularity, support transparent elasticity, and relieve programmers from managing the operating system. Thus serverless functions allow programmers to focus on writing application code, and the cloud provider to manage computing resources globally. Unfortunately, today's serverless platforms exhibit high latency, because it is difficult to maximize resource utilization while minimizing operating costs. This paper presents serverless function acceleration, which is an approach that transparently lowers the latency and resource utilization of a large class of serverless functions. We accomplish this using language-based sandboxing, whereas existing serverless platforms employ more expensive operating system sandboxing technologies, such as containers and virtual machines. OS-based sandboxing is compatible with more programs than language-based techniques. However, instead of ruling out any programs, we use language-based sandboxing when possible, and OS-based sandboxing if necessary. Moreover, we seamlessly transition between language-based and OS-based sandboxing by leveraging the fact that serverless functions must tolerate re-execution for fault tolerance. Therefore, when a serverless function attempts to perform an unsupported operation in the language-based sandbox, we can safely re-execute it in a container. We use a new approach to trace compilation to build source-level, interprocedural, execution trace trees for serverless functions written in JavaScript. We compile trace trees to a safe subset of Rust, validate the compiler output, and link it to a runtime system. We evaluate these techniques in our implementation, which we call Containerless.
Published: 2019

30. Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs

Author: Jangda, Abhinav and Guha, Arjun
Subjects: Computer Science - Programming Languages, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Domain-specific languages that execute image processing pipelines on GPUs, such as Halide and Forma, operate by 1) dividing the image into overlapped tiles, and 2) fusing loops to improve memory locality. However, current approaches have limitations: 1) they require intra thread block synchronization, which has a non-trivial cost, 2) they must choose between small tiles that require more overlapped computations or large tiles that increase shared memory access (and lower occupancy), and 3) their autoscheduling algorithms use simplified GPU models that can result in inefficient global memory accesses. We present a new approach for executing image processing pipelines on GPUs that addresses these limitations as follows. 1) We fuse loops to form overlapped tiles that fit in a single warp, which allows us to use lightweight warp synchronization. 2) We introduce hybrid tiling, which stores overlapped regions in a combination of thread-local registers and shared memory. Thus hybrid tiling either increases occupancy by decreasing shared memory usage or decreases overlapping computations using larger tiles. 3) We present an automatic loop fusion algorithm that considers several factors that affect the performance of GPU kernels. We implement these techniques in PolyMage-GPU, which is a new GPU backend for PolyMage. Our approach produces code that is faster than Halide's manual schedules: 1.65x faster on an NVIDIA GTX 1080Ti and 1.33x faster on an NVIDIA Tesla V100.
Published: 2019

31. Automatic Failure Recovery for End-User Programs on Service Mobile Robots

Author: Hammond, Jenna Claire, Biswas, Joydeep, and Guha, Arjun
Subjects: Computer Science - Robotics
Abstract: For service mobile robots to be most effective, it must be possible for non-experts and even end-users to program them to do new tasks. Regardless of the programming method (e.g., by demonstration or traditional programming), robot task programs are challenging to write, because they rely on multiple actions to succeed, including human-robot interactions. Unfortunately, interactions are prone to fail, because a human may perform the wrong action (e.g., if the robot's request is not clear). Moreover, when the robot cannot directly observe the human action, it may not detect the failure until several steps after it occurs. Therefore, writing fault-tolerant robot tasks is beyond the ability of non-experts. This paper presents a principled approach to detect and recover from a broad class of failures that occur in end-user programs on service mobile robots. We present a two-tiered Robot Task Programming Language (RTPL): 1) an expert roboticist uses a specification language to write a probabilistic model of the robot's actions and interactions, and 2) a non-expert then writes an ordinary sequential program for a particular task. The RTPL runtime system executes the task program sequentially, while using the probabilistic model to build a Bayesian network that tracks possible, unobserved failures. If an error is observed, RTPL uses Bayesian inference to find the likely root cause of the error, and then attempts to re-execute a portion of the program for recovery. Our empirical results show that RTPL 1) allows complex tasks to be written concisely, 2) correctly identifies the root cause of failure, and 3) allows multiple tasks to recover from a variety of errors, without task-specific error-recovery code.
Published: 2019

32. Making High-Performance Robots Safe and Easy to Use for an Introduction to Computing

Author: Spitzer, Joseph, Biswas, Joydeep, and Guha, Arjun
Subjects: Computer Science - Computers and Society, Computer Science - Programming Languages, Computer Science - Robotics
Abstract: Robots are a popular platform for introducing computing and artificial intelligence to novice programmers. However, programming state-of-the-art robots is very challenging, and requires knowledge of concurrency, operation safety, and software engineering skills, which can take years to teach. In this paper, we present an approach to introducing computing that allows students to safely and easily program high-performance robots. We develop a platform for students to program RoboCup Small Size League robots using JavaScript. The platform 1) ensures physical safety at several levels of abstraction, 2) allows students to program robots using JavaScript in the browser, without the need to install software, and 3) presents a simplified JavaScript semantics that shields students from confusing language features. We discuss our experience running a week-long workshop using this platform, and analyze over 3,000 student-written program revisions to provide empirical evidence that our approach does help students., Comment: 8 pages, 7 figures, 4 tables
Published: 2019

33. Formal Foundations of Serverless Computing

Author: Jangda, Abhinav, Pinckney, Donald, Brun, Yuriy, and Guha, Arjun
Subjects: Computer Science - Programming Languages
Abstract: Serverless computing (also known as functions as a service) is a new cloud computing abstraction that makes it easier to write robust, large-scale web services. In serverless computing, programmers write what are called serverless functions, and the cloud platform transparently manages the operating system, resource allocation, load-balancing, and fault tolerance. When demand for the service spikes, the platform automatically allocates additional hardware to the service and manages load-balancing; when demand falls, the platform silently deallocates idle resources; and when the platform detects a failure, it transparently retries affected requests. In 2014, Amazon Web Services introduced the first serverless platform, AWS Lambda, and similar abstractions are now available on all major cloud computing platforms. Unfortunately, the serverless computing abstraction exposes several low-level operational details that make it hard for programmers to write and reason about their code. This paper sheds light on this problem by presenting $\lambda_\Lambda$, an operational semantics of the essence of serverless computing. Despite being a small (half a page) core calculus, $\lambda_\Lambda$ models all the low-level details that serverless functions can observe. To show that $\lambda_\Lambda$ is useful, we present three applications. First, to ease reasoning about code, we present a simplified naive semantics of serverless execution and precisely characterize when the naive semantics and $\lambda_\Lambda$ coincide. Second, we augment $\lambda_\Lambda$ with a key-value store to allow reasoning about stateful serverless functions. Third, since a handful of serverless platforms support serverless function composition, we show how to extend $\lambda_\Lambda$ with a composition language. We have implemented this composition language and show that it outperforms prior work.
Published: 2019
Full Text: View/download PDF

34. Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code

Author: Jangda, Abhinav, Powers, Bobby, Berger, Emery, and Guha, Arjun
Subjects: Computer Science - Programming Languages
Abstract: All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work reports near parity, with many applications compiled to WebAssembly running on average 10% slower than native code. However, this evaluation was limited to a suite of scientific kernels, each consisting of roughly 100 lines of code. Running more substantial applications was not possible because compiling code to WebAssembly is only part of the puzzle: standard Unix APIs are not available in the web browser environment. To address this challenge, we build Browsix-Wasm, a significant extension to Browsix that, for the first time, makes it possible to run unmodified WebAssembly-compiled Unix applications directly inside the browser. We then use Browsix-Wasm to conduct the first large-scale evaluation of the performance of WebAssembly vs. native. Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 45% (Firefox) to 55% (Chrome), with peak slowdowns of 2.08x (Firefox) and 2.5x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform., Comment: Accepted (to appear) at USENIX Annual Technical Conference 2019
Published: 2019
Full Text: View/download PDF

35. Putting in All the Stops: Execution Control for JavaScript

Author: Baxter, Samuel, Nigam, Rachit, Politz, Joe Gibbs, Krishnamurthi, Shriram, and Guha, Arjun
Subjects: Computer Science - Programming Languages
Abstract: Scores of compilers produce JavaScript, enabling programmers to use many languages on the Web, reuse existing code, and even use Web IDEs. Unfortunately, most compilers inherit the browser's compromised execution model, so long-running programs freeze the browser tab, infinite loops crash IDEs, and so on. The few compilers that avoid these problems suffer poor performance and are difficult to engineer. This paper presents Stopify, a source-to-source compiler that extends JavaScript with debugging abstractions and blocking operations, and easily integrates with existing compilers. We apply Stopify to 10 programming languages and develop a Web IDE that supports stopping, single-stepping, breakpointing, and long-running computations. For nine languages, Stopify requires no or trivial compiler changes. For eight, our IDE is the first that provides these features. Two of our subject languages have compilers with similar features. Stopify's performance is competitive with these compilers and it makes them dramatically simpler. Stopify's abstractions rely on first-class continuations, which it provides by compiling JavaScript to JavaScript. We also identify sub-languages of JavaScript that compilers implicitly use, and exploit these to improve performance. Finally, Stopify needs to repeatedly interrupt and resume program execution. We use a sampling-based technique to estimate program speed that outperforms other systems., Comment: In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2018
Published: 2018

36. Interactive Robot Transition Repair With SMT

Author: Holtz, Jarrett, Guha, Arjun, and Biswas, Joydeep
Subjects: Computer Science - Robotics, Computer Science - Programming Languages
Abstract: Complex robot behaviors are often structured as state machines, where states encapsulate actions and a transition function switches between states. Since transitions depend on physical parameters, when the environment changes, a roboticist has to painstakingly readjust the parameters to work in the new environment. We present interactive SMT-based Robot Transition Repair (SRTR): instead of manually adjusting parameters, we ask the roboticist to identify a few instances where the robot is in a wrong state and what the right state should be. A lightweight automated analysis of the transition function's source code then 1) identifies adjustable parameters, 2) converts the transition function into a system of logical constraints, and 3) formulates the constraints and user-supplied corrections as a MaxSMT problem that yields new parameter values. Our evaluation shows that SRTR is effective on real robots and in simulation. We show that SRTR finds new parameters 1) quickly, 2) with only a few corrections, and 3) that the parameters generalize to new scenarios. We also show that a simple state machine corrected by SRTR can outperform a more complex, expert-tuned state machine in the real world., Comment: International Joint Conference on Artificial Intelligence (IJCAI), 2018
Published: 2018

37. Tortoise: Interactive System Configuration Repair

Author: Weiss, Aaron, Guha, Arjun, and Brun, Yuriy
Subjects: Computer Science - Software Engineering
Abstract: System configuration languages provide powerful abstractions that simplify managing large-scale, networked systems. Thousands of organizations now use configuration languages, such as Puppet. However, specifications written in configuration languages can have bugs and the shell remains the simplest way to debug a misconfigured system. Unfortunately, it is unsafe to use the shell to fix problems when a system configuration language is in use: a fix applied from the shell may cause the system to drift from the state specified by the configuration language. Thus, despite their advantages, configuration languages force system administrators to give up the simplicity and familiarity of the shell. This paper presents a synthesis-based technique that allows administrators to use configuration languages and the shell in harmony. Administrators can fix errors using the shell and the technique automatically repairs the higher-level specification written in the configuration language. The approach (1) produces repairs that are consistent with the fix made using the shell; (2) produces repairs that are maintainable by minimizing edits made to the original specification; (3) ranks and presents multiple repairs when relevant; and (4) supports all shells the administrator may wish to use. We implement our technique for Puppet, a widely used system configuration language, and evaluate it on a suite of benchmarks under 42 repair scenarios. The top-ranked repair is selected by humans 76% of the time and the human-equivalent repair is ranked 1.31 on average., Comment: Published version in proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE) 2017
Published: 2017
Full Text: View/download PDF

38. How Beginning Programmers and Code LLMs (Mis)read Each Other

Author: Nguyen, Sydney, Babe, Hannah McLean, Zi, Yangtian, Guha, Arjun, Anderson, Carolyn Jane, and Feldman, Molly Q
Published: 2024
Full Text: View/download PDF

39. Activation Steering for Robust Type Prediction in CodeLLMs

Author: Lucchetti, Francesca and Guha, Arjun
Abstract: Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our methodology relies on activation steering, which involves editing internal model activations to steer the model towards the correct prediction. We contribute a novel way to construct steering vectors by taking inspiration from mutation testing, which constructs minimal semantics-breaking code edits. In contrast, we construct steering vectors from semantics-preserving code edits. We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript. This approach corrects up to 90% of type mispredictions. Finally, we show that steering vectors calculated from Python activations reliably correct type mispredictions in TypeScript, and vice versa. This result suggests that LLMs may be learning to transfer knowledge of types across programming languages., Comment: 16 pages, 7 figures
Published: 2024

40. The Essence of JavaScript

Author: Guha, Arjun, Saftoiu, Claudiu, and Krishnamurthi, Shriram
Subjects: Computer Science - Programming Languages
Abstract: We reduce JavaScript to a core calculus structured as a small-step operational semantics. We present several peculiarities of the language and show that our calculus models them. We explicate the desugaring process that turns JavaScript programs into ones in the core. We demonstrate faithfulness to JavaScript using real-world test suites. Finally, we illustrate utility by defining a security property, implementing it as a type system on the core, and extending it to the full language., Comment: European Conference on Object-Oriented Programming (ECOOP) 2010
Published: 2015
Full Text: View/download PDF

41. Rehearsal: A Configuration Verification Tool for Puppet

Author: Shambaugh, Rian, Weiss, Aaron, and Guha, Arjun
Subjects: Computer Science - Programming Languages, F.3.1
Abstract: Large-scale data centers and cloud computing have turned system configuration into a challenging problem. Several widely-publicized outages have been blamed not on software bugs, but on configuration bugs. To cope, thousands of organizations use system configuration languages to manage their computing infrastructure. Of these, Puppet is the most widely used with thousands of paying customers and many more open-source users. The heart of Puppet is a domain-specific language that describes the state of a system. Puppet already performs some basic static checks, but they only prevent a narrow range of errors. Furthermore, testing is ineffective because many errors are only triggered under specific machine states that are difficult to predict and reproduce. With several examples, we show that a key problem with Puppet is that configurations can be non-deterministic. This paper presents Rehearsal, a verification tool for Puppet configurations. Rehearsal implements a sound, complete, and scalable determinacy analysis for Puppet. To develop it, we (1) present a formal semantics for Puppet, (2) use several analyses to shrink our models to a tractable size, and (3) frame determinism-checking as decidable formulas for an SMT solver. Rehearsal then leverages the determinacy analysis to check other important properties, such as idempotency. Finally, we apply Rehearsal to several real-world Puppet configurations., Comment: In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2016
Published: 2015

42. Morpheus: Safe and Flexible Dynamic Updates for SDNs

Author: Saur, Karla, Collard, Joseph, Foster, Nate, Guha, Arjun, Vanbever, Laurent, and Hicks, Michael
Subjects: Computer Science - Networking and Internet Architecture
Abstract: SDN controllers must be periodically modified to add features, improve performance, and fix bugs, but current techniques for implementing dynamic updates are inadequate. Simply halting old controllers and bringing up new ones can cause state to be lost, which often leads to incorrect behavior; e.g., if the state represents hosts blacklisted by a firewall, then traffic that should be blocked may be allowed to pass through. Techniques based on record and replay can reconstruct state automatically, but they are expensive to deploy and can lead to incorrect behavior. Problematic scenarios are especially likely to arise in distributed controllers and with semantics-altering updates. This paper presents a new approach to implementing dynamic controller updates based on explicit state transfer. Instead of attempting to infer state changes automatically (an approach that is expensive and fundamentally incomplete), our framework gives programmers effective tools for implementing correct updates that avoid major disruptions. We develop primitives that enable programmers to directly (and easily, in most cases) initialize the new controller's state as a function of old state and we design protocols that ensure consistent behavior during the transition. We also present a prototype implementation called Morpheus, and evaluate its effectiveness on representative case studies.
Published: 2015

43. ADsafety: Type-Based Verification of JavaScript Sandboxing

Author: Politz, Joe Gibbs, Eliopoulos, Spiridon, Guha, Arjun, and Krishnamurthi, Shriram
Subjects: Computer Science - Programming Languages
Abstract: Web sites routinely incorporate JavaScript programs from several sources into a single page. These sources must be protected from one another, which requires robust sandboxing. The many entry-points of sandboxes and the subtleties of JavaScript demand robust verification of the actual sandbox source. We use a novel type system for JavaScript to encode and verify sandboxing properties. The resulting verifier is lightweight and efficient, and operates on actual source. We demonstrate the effectiveness of our technique by applying it to ADsafe, which revealed several bugs and other weaknesses., Comment: in Proceedings of the USENIX Security Symposium (2011)
Published: 2015

44. A Fast Compiler for NetKAT

Author: Smolka, Steffen, Eliopoulos, Spiridon, Foster, Nate, and Guha, Arjun
Subjects: Computer Science - Programming Languages, D.3.4
Abstract: High-level programming languages play a key role in a growing number of networking platforms, streamlining application development and enabling precise formal reasoning about network behavior. Unfortunately, current compilers only handle "local" programs that specify behavior in terms of hop-by-hop forwarding behavior, or modest extensions such as simple paths. To encode richer "global" behaviors, programmers must add extra state -- something that is tricky to get right and makes programs harder to write and maintain. Making matters worse, existing compilers can take tens of minutes to generate the forwarding state for the network, even on relatively small inputs. This forces programmers to waste time working around performance issues or even revert to using hardware-level APIs. This paper presents a new compiler for the NetKAT language that handles rich features including regular paths and virtual networks, and yet is several orders of magnitude faster than previous compilers. The compiler uses symbolic automata to calculate the extra state needed to implement "global" programs, and an intermediate representation based on binary decision diagrams to dramatically improve performance. We describe the design and implementation of three essential compiler stages: from virtual programs (which specify behavior in terms of virtual topologies) to global programs (which specify network-wide behavior in terms of physical topologies), from global programs to local programs (which specify behavior in terms of single-switch behavior), and from local programs to hardware-level forwarding tables. We present results from experiments on real-world benchmarks that quantify performance in terms of compilation time and forwarding table size.
Published: 2015
Full Text: View/download PDF

45. Deploying and Evaluating LLMs to Program Service Mobile Robots

Author: Hu, Zichao, Lucchetti, Francesca, Schlesinger, Claire, Saxena, Yash, Freeman, Anders, Modak, Sadanand, Guha, Arjun, and Biswas, Joydeep
Published: 2024
Full Text: View/download PDF

46. Neuroepithelial Bodies and Terminal Bronchioles Are Niches for Distinctive Club Cells That Can Repair Airways Following Acute Notch Inhibition

Author: Lingamallu, Sai Manoz, Deshpande, Aditya, Joy, Neenu, Ganeshan, Kirthana, Lafkas, Daniel, and Guha, Arjun
Published: 2023
Full Text: View/download PDF

47. Continuing WebAssembly with Effect Handlers

Author: Phipps-Costin, Luna, Rossberg, Andreas, Guha, Arjun, Leijen, Daan, Hillerström, Daniel, Sivaramakrishnan, KC, Pretnar, Matija, and Lindley, Sam
Published: 2023
Full Text: View/download PDF

48. A Tool for Mutation Analysis in Racket

Author: Zhuang, Bambi, Perretta, James, Guha, Arjun, and Bell, Jonathan
Published: 2023
Full Text: View/download PDF

49. Do Machine Learning Models Produce TypeScript Types That Type Check?

Author: Yee, Ming-Ho and Guha, Arjun
Abstract: Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort and several migrations on large, industry codebases have been reported to have taken several years. In the research community, there has been significant interest in using machine learning to automate TypeScript type migration. Existing machine learning models report a high degree of accuracy in predicting individual TypeScript type annotations. However, in this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker? We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps that are necessary for using a type prediction model, including (1) importing types for a project’s dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted type annotations into the program to produce TypeScript when needed; and (4) rejecting non-type predictions when needed. We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that have never been typed before. With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully.
Published: 2023
Full Text: View/download PDF

50. Do Machine Learning Models Produce TypeScript Types That Type Check? (Artifact)

Author: Yee, Ming-Ho and Guha, Arjun
Abstract: Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort and several migrations on large, industry codebases have been reported to have taken several years. In the research community, there has been significant interest in using machine learning to automate TypeScript type migration. Existing machine learning models report a high degree of accuracy in predicting individual TypeScript type annotations. However, in this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker? We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps that are necessary for using a type prediction model, including (1) importing types for a project’s dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted type annotations into the program to produce TypeScript when needed; and (4) rejecting non-type predictions when needed. We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that have never been typed before. With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully.
Published: 2023
Full Text: View/download PDF
