"N-gram language models" / Journal: icse: international conference on software engineering / Publisher: association for computing machinery - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"N-gram language models"' showing total 11 results

Start Over "N-gram language models" Journal icse: international conference on software engineering Publisher association for computing machinery

11 results on '"N-gram language models"'

1. Code Prediction by Feeding Trees to Transformers.

Author: Seohyun Kim, Jinman Zhao, Yuchi Tian, and Chandra, Satish
Subjects: MACHINE learning, PYTHON programming language, ARTIFICIAL intelligence, SOFTWARE engineering, COMPUTER software development
Abstract: Code prediction, more specifically autocomplete, has become an essential feature in modern IDEs. Autocomplete is more effective when the desired next token is at (or close to) the top of the list of potential completions offered by the IDE at cursor position. This is where the strength of the underlying machine learning system that produces a ranked order of potential completions comes into play. We advance the state-of-the-art in the accuracy of code prediction (next token prediction) used in autocomplete systems. Our work uses Transformers as the base neural architecture. We show that by making the Transformer architecture aware of the syntactic structure of code, we increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of several state-of-the-art next token prediction systems by margins ranging from 14% to 18%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset, as well as on a company internal Python corpus. Our code and data preparation pipeline will be available in open source. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

2. Collaborative Bug Finding for Android Apps.

Author: TAN, SHIN HWEI and ZIQIANG LI
Subjects: MOBILE apps, NATURAL language processing, RECOMMENDER systems, SOFTWARE engineering, COMPUTER software development
Abstract: Many automated test generation techniques have been proposed for finding crashes in Android apps. Despite recent advancement in these approaches, a study shows that Android app developers prefer reading test cases written in natural language. Meanwhile, there exist redundancies in bug reports (written in natural language) across different apps that have not been previously reused. We propose collaborative bug finding, a novel approach that uses bugs in other similar apps to discover bugs in the app under test. We design three settings with varying degrees of interactions between programmers: (1) bugs from programmers who develop a different app, (2) bugs from manually searching for bug reports in GitHub repositories, (3) bugs from a bug recommendation system, Bugine. Our studies of the first two settings in a software testing course show that collaborative bug finding helps students who are novice Android app testers to discover 17 new bugs. As students admit that searching for relevant bug reports could be time-consuming, we introduce Bugine, an approach that automatically recommends relevant GitHub issues for a given app. Bugine uses (1) natural language processing to find GitHub issues that mention common UI components shared between the app under test and other apps in our database, and (2) a ranking algorithm to select GitHub issues that are of the best quality. Our results show that Bugine is able to find 34 new bugs. In total, collaborative bug finding helps us find 51 new bugs, in which eight have been confirmed and 11 have been fixed by the developers. These results confirm our intuition that our proposed technique is useful in discovering new bugs for Android apps. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

3. Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code.

Author: Karampatsis, Rafael-Michael, Babii, Hlib, Robbes, Romain, Sutton, Charles, and Janes, Andrea
Subjects: VOCABULARY, SOFTWARE engineering, PROGRAMMING languages, COMPUTER software, COMPUTER software development
Abstract: Statistical language modeling techniques have successfully been applied to large source code corpora, yielding a variety of new software development tools, such as tools for code suggestion, improving readability, and API migration. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. Both large vocabularies and out-of-vocabulary issues severely affect Neural Language Models (NLMs) of source code, degrading their performance and rendering them unable to scale. In this paper, we address this issue by: 1) studying how various modelling choices impact the resulting vocabulary on a large-scale corpus of 13,362 projects; 2) presenting an open vocabulary source code NLM that can scale to such a corpus, 100 times larger than in previous work; and 3) showing that such models outperform the state of the art on three distinct code corpora (Java, C, Python). To our knowledge, these are the largest NLMs for code that have been reported. All datasets, code, and trained models used in this work are publicly available. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

4. Translating Video Recordings of Mobile App Usages into Replayable Scenarios.

Author: Bernal-Cárdenas, Carlos, Cooper, Nathan, Moran, Kevin, Chaparro, Oscar, Marcus, Andrian, and Poshyvanyk, Denys
Subjects: VIDEO recording, MOBILE apps, COMPUTER software development, MACHINE learning, COMPUTER vision
Abstract: Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically analyzing screen recordings presents serious challenges, due to their graphical nature, compared to other types of (textual) artifacts. To address these challenges, this paper introduces V2S, a lightweight, automated approach for translating video recordings of Android app usages into replayable scenarios. V2S is based primarily on computer vision techniques and adapts recent solutions for object detection and image classification to detect and classify user actions captured in a video, and convert these into a replayable test scenario. We performed an extensive evaluation of V2S involving 175 videos depicting 3,534 GUI-based actions collected from users exercising features and reproducing bugs from over 80 popular Android apps. Our results illustrate that V2S can accurately replay scenarios from screen recordings, and is capable of reproducing ≈ 89% of our collected videos with minimal overhead. A case study with three industrial partners illustrates the potential usefulness of V2S from the viewpoint of developers. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

5. Supporting Analysts by Dynamic Extraction and Classification of Requirements-Related Knowledge.

Author: Hossein Abad, Zahra Shakeri, Gervasi, Vincenzo, Zowghi, Didar, and Far, Behrouz H.
Subjects: STAKEHOLDERS, COMPUTER software development, COMPUTER science, MACHINE learning, ARTIFICIAL intelligence
Abstract: In many software development projects, analysts are required to deal with systems' requirements from unfamiliar domains. Familiarity with the domain is necessary in order to get full leverage from interaction with stakeholders and for extracting relevant information from the existing project documents. Accurate and timely extraction and classification of requirements knowledge support analysts in this challenging scenario. Our approach is to mine real-time interaction records and project documents for the relevant phrasal units about the requirements related topics being discussed during elicitation. We propose to use both generative and discriminating methods. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modelling of natural language processing tasks. We used an extended version of Support Vector Machines (SVMs) with variable-sized feature vectors to efficiently and dynamically extract and classify requirements-related knowledge from the existing documents. To evaluate the performance of our approach intuitively and quantitatively, we used edit distance and precision/recall metrics. We show in three case studies that the snippets extracted by our method are intuitively relevant and reasonably accurate. Furthermore, we found that statistical and linguistic parameters such as smoothing methods, and words contiguity and order features can impact the performance of both extraction and classification tasks. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

6. Natural Software Revisited.

Author: Rahman, Musfiqur, Palani, Dharani, and Rigby, Peter C.
Subjects: ARTIFICIAL intelligence, JAVA programming language, PUNCTUATION, DATA flow computing, NATURAL languages
Abstract: Recent works have concluded that software code is more repetitive and predictable, i.e. more natural, than English texts. On re-examination, we find that much of the apparent "naturalness" of source code is due to the presence of language specific syntax, especially separators, such as semi-colons and brackets. For example, separators account for 44% of all tokens in our Java corpus. When we follow the NLP practices of eliminating punctuation (e.g., separators) and stopwords (e.g., keywords), we find that code is still repetitive and predictable, but to a lesser degree than previously thought. We suggest that SyntaxTokens be filtered to reduce noise in code recommenders. Unlike the code written for a particular project, API code usage is similar across projects: a file is opened and closed in the same manner regardless of domain. When we restrict our n-grams to those contained in the Java API, we find that API usages are highly repetitive. Since API calls are common across programs, researchers have made reliable statistical models to recommend sophisticated API call sequences. Sequential n-gram models were developed for natural languages. Code is usually represented by an AST which contains control and data flow, making n-grams models a poor representation of code. Comparing n-grams to statistical graph representations of the same codebase, we find that graphs are more repetitive and contain higherlevel patterns than n-grams. We suggest that future work focus on statistical code graphs models that accurately capture complex coding patterns. Our replication package makes our scripts and data available to future researchers [1]. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

7. Crowdsourced software development and maintenance.

Author: Lin, Bin
Subjects: COMPUTER software development, COMPUTER software developers, FEATURE extraction, SOFTWARE architecture, SOFTWARE maintenance, CROWDSOURCING
Abstract: As modern software systems are becoming increasingly complex, developers often need to rely on online sources to address problems encountered during software development and maintenance. These resources provide developers with access to peers' expertise, covering knowledge of different software lifecycle phases, including design, implementation, and maintenance. However, exploiting such knowledge and converting it into actionable items is far from trivial, due to the vastness of the information available online as well as to its unstructured nature. In this research, we aim at (partially) crowdsourcing the software design, implementation and maintenance process by exploiting the knowledge embedded in various sources available on the Web (e.g., Stack Overflow discussions, presentations on SlideShare, open source code, etc.). For example, we want to support software design decisions (e.g., whether to use a specific library for the implementation of a feature) by performing opinion mining on the vast amount of information available on the Web, and we want to recommend refactoring operations by learning from the code written in open source systems. The final goal is to improve developers' productivity and code quality. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

8. Leveraging Informal Documentation to Summarize Classes and Methods in Context.

Author: Guerrouj, Latifa, Bourque, David, and Rigby, Peter C.
Subjects: COMPUTER software, CODE review (Computer science), COMPUTER software developers, SOURCE code, COMPUTER software development
Abstract: Critical information related to a software developer's current task is trapped in technical developer discussions, bug reports, code reviews, and other software artefacts. Much of this information pertains to the proper use of code elements (e.g., methods and classes) that capture vital problem domain knowledge. To understand the purpose of these code elements, software developers must either access documentation and online posts and understand the source code or peruse a substantial amount of text. In this paper, we use the context that surrounds code elements in StackOverflow posts to summarize the use and purpose of code elements. To provide focus to our investigation, we consider the generation of summaries for library identifiers discussed in StackOverflow. Our automatic summarization approach was evaluated on a sample of 100 randomly-selected library identifiers with respect to a benchmark of summaries provided by two annotators. The results show that the approach attains an R-precision of 54%, which is appropriate given the diverse ways in which code elements can be used. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

9. On the Naturalness of Software.

Author: Hindle, Abram, Barr, Earl T., Su, Zhendong, Gabel, Mark, and Devanbu, Premkumar
Subjects: PROGRAMMING languages, ELECTRONIC data processing, COMPUTER software, COMPUTER software development, SOFTWARE engineering, COMPUTER-aided software engineering
Abstract: Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations-and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's built-in completion capability. We conclude the paper by laying out a vision for future research in this area. [ABSTRACT FROM AUTHOR]
Published: 2012

10. Fast and Precise Statistical Code Completion.

Author: Roos, Pascal
Subjects: APPLICATION program interfaces, N-gram models (Computational linguistics), APPLICATION software, SMOOTHING (Numerical analysis), NUMERICAL analysis
Abstract: The main problem we try to solve is API code completion which is both precise and works in real-time. We describe an efficient implementation of an N-gram language model combined with several smoothing methods and a completion algorithm based on beam search. We show that our system is both fast and precise using a thorough experimental evaluation. With optimal parameters we are able to find completions in milliseconds and the desired completion is in the top 3 suggestions in 89% of the time. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

11. Designing bug detection rules for fewer false alarms.

Author: Nam, Jaechang, Wang, Song, Xi, Yuan, and Tan, Lin
Subjects: SOFTWARE frameworks, FALSE alarms, MACHINE learning, OPEN source software, FAULT-tolerant computing, MALWARE
Abstract: One of the challenging issues of the existing static analysis tools is the high false alarm rate. To address the false alarm issue, we design bug detection rules by learning from a large number of real bugs from open-source projects from GitHub. Specifically, we build a framework that learns and refines bug detection rules for fewer false positives. Based on the framework, we implemented ten patterns, six of which are new ones to existing tools. To evaluate the framework, we implemented a static analysis tool, FeeFin, based on the framework with the ten bug detection rules and applied the tool for 1,800 open-source projects in GitHub. The 57 detected bugs by FeeFin has been confirmed by developers as true positives and 44 bugs out of the detected bugs were actually fixed. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

11 results on '"N-gram language models"'

1. Code Prediction by Feeding Trees to Transformers.

2. Collaborative Bug Finding for Android Apps.

3. Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code.

4. Translating Video Recordings of Mobile App Usages into Replayable Scenarios.

5. Supporting Analysts by Dynamic Extraction and Classification of Requirements-Related Knowledge.

6. Natural Software Revisited.

7. Crowdsourced software development and maintenance.

8. Leveraging Informal Documentation to Summarize Classes and Methods in Context.

9. On the Naturalness of Software.

10. Fast and Precise Statistical Code Completion.

11. Designing bug detection rules for fewer false alarms.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

11 results on '"N-gram language models"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources