Author: "You, Yudu" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"You, Yudu"' returned 10 results.



1. ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization

Author: Fang, Chunrong, Sun, Weisong, Chen, Yuchen, Chen, Xiao, Wei, Zhao, Zhang, Quanjun, You, Yudu, Luo, Bin, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, D.2.3, I.2.7
Abstract: (Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in helping developers understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors, and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code have been equipped with encoders capable of producing general context vectors and have achieved substantial improvements on code summarization. However, although they are usually trained mainly on code-focused tasks and can capture general code features, they still fall short in capturing the specific features that need to be summarized. This paper proposes a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks that enhance its ability to learn code-summary alignment: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets would help learn the code-summary alignment. Additionally, we introduce the domain-specific task AWP to enhance the ability of the encoder to learn the alignment between action words and code snippets. Extensive experiments on four datasets demonstrate that our approach, called ESALE, significantly outperforms baselines on all three widely used metrics: BLEU, METEOR, and ROUGE-L.
Comment: Accepted to IEEE Transactions on Software Engineering (TSE)
Published: 2024
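The three summary-focused training signals described in this abstract can be illustrated as data preparation. This is a minimal sketch, not ESALE's implementation: the helper names, the `[MASK]`/`[SEP]` tokens, the 15% masking rate, and the convention that the action word is the summary's first word are all assumptions made for illustration.

```python
import random

MASK, SEP = "[MASK]", "[SEP]"  # assumed special tokens, not taken from the paper


def build_mlm_example(code_tokens, summary_tokens, mask_prob=0.15, seed=0):
    """Summary-side MLM: mask summary words only; code tokens stay visible."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in summary_tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this word from code + context
        else:
            masked.append(tok)
            labels.append(None)  # no loss at unmasked positions
    return code_tokens + [SEP] + masked, labels


def build_ulm_targets(summary_tokens):
    """Summary-side ULM: (prefix, next word) pairs; the code is the shared context."""
    return [(summary_tokens[:i], summary_tokens[i]) for i in range(len(summary_tokens))]


def action_word_label(summary_tokens):
    """AWP label, assumed here to be the summary's first word, lower-cased."""
    return summary_tokens[0].lower() if summary_tokens else None


if __name__ == "__main__":
    code = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
    summary = ["Returns", "the", "sum", "of", "two", "numbers"]
    print(build_mlm_example(code, summary, mask_prob=0.3))
    print(build_ulm_targets(summary)[2])  # (['Returns', 'the'], 'sum')
    print(action_word_label(summary))     # returns
```

The point of the sketch is where the prediction targets sit: all three tasks ask the encoder to predict summary words given code, rather than code tokens given code.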

2. A Prompt Learning Framework for Source Code Summarization

Author: Sun, Weisong, Fang, Chunrong, You, Yudu, Chen, Yuchen, Liu, Yi, Wang, Chong, Zhang, Jian, Zhang, Quanjun, Qian, Hanwei, Zhao, Wei, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, 68-04, 68T30, D.2.3, I.2.2, I.2.4
Abstract: (Source) code summarization is the task of automatically generating natural language summaries for given code snippets. Such summaries play a key role in helping developers understand and maintain source code. Recently, with the successful application of large language models (LLMs) in numerous fields, software engineering researchers have also attempted to adapt LLMs to code summarization. The main adaptation schemes are instruction prompting and task-oriented fine-tuning. However, instruction prompting involves designing crafted prompts for zero-shot learning or selecting appropriate samples for few-shot learning, which requires users to have professional domain knowledge, while task-oriented fine-tuning incurs high training costs. In this paper, we propose a novel prompt learning framework for code summarization called PromptCS. PromptCS trains a prompt agent that generates continuous prompts to unleash the potential of LLMs in code summarization. Compared to human-written discrete prompts, the continuous prompts are produced under the guidance of LLMs and are therefore easier for LLMs to understand. PromptCS freezes the parameters of the LLM when training the prompt agent, which greatly reduces the training resources required. We evaluate PromptCS on the CodeSearchNet dataset, which covers multiple programming languages. The results show that PromptCS significantly outperforms instruction prompting schemes on all four widely used metrics. With some base LLMs, e.g., CodeGen-Multi-2B and StarCoderBase-1B and -3B, PromptCS even outperforms the task-oriented fine-tuning scheme. More importantly, PromptCS trains faster than the task-oriented fine-tuning scheme, with a more pronounced advantage on larger LLMs. The results of the human evaluation demonstrate that PromptCS generates more high-quality summaries than the baselines.
Comment: submitted to ACM Transactions on Software Engineering and Methodology
Published: 2023

3. Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Author: Sun, Weisong, Fang, Chunrong, Miao, Yun, You, Yudu, Yuan, Mengzhe, Chen, Yuchen, Zhang, Quanjun, Guo, An, Chen, Xiang, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Programming Languages, 68-04, 68T30, D.2.3, I.2.2, I.2.4
Abstract: Programming language understanding and representation (a.k.a. code representation learning) has long been a popular and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of source code features while preserving their semantics. These representations can be used to facilitate subsequent code-related tasks. The abstract syntax tree (AST), a fundamental code feature, captures the syntactic information of source code and has been widely used in code representation learning. However, there is still a lack of systematic and quantitative evaluation of how well AST-based code representation facilitates subsequent code-related tasks. In this paper, we first conduct a comprehensive empirical study to explore the effectiveness of AST-based code representation in facilitating follow-up code-related tasks. To do so, we compare the performance of models trained with code token sequence (Token for short) based code representation and AST-based code representation on three popular types of code-related tasks. Surprisingly, the overall quantitative statistical results demonstrate that models trained with AST-based code representation consistently perform worse across all three tasks than models trained with Token-based code representation. Our further quantitative analysis reveals that models trained with AST-based code representation outperform models trained with Token-based code representation on certain subsets of samples across all three tasks. We also conduct comprehensive experiments to evaluate and reveal the impact of the choice of AST parsing/preprocessing/encoding methods on AST-based code representation and subsequent code-related tasks. Our study provides future researchers with detailed guidance on how to select solutions at each stage to fully exploit AST.
Comment: submitted to ACM Transactions on Software Engineering and Methodology. arXiv admin note: text overlap with arXiv:2103.10668 by other authors
Published: 2023
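To make the Token-versus-AST distinction concrete, the sketch below uses Python's standard `tokenize` and `ast` modules to produce both views of the same snippet: a flat token sequence and a pre-order sequence of AST node types. It is only an illustration; the study's actual parsers, preprocessing, and encoders are not reproduced here, and the function names are hypothetical.

```python
import ast
import io
import tokenize

# Layout-only token types dropped from the token view (an assumed preprocessing choice).
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
         tokenize.ENDMARKER, tokenize.COMMENT}


def code_tokens(src):
    """Token-based view: the flat lexical token sequence of the snippet."""
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type not in _SKIP]


def ast_preorder(src):
    """AST-based view: node-type names collected by a pre-order traversal."""
    out = []

    def visit(node):
        out.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(src))
    return out


if __name__ == "__main__":
    src = "def add(a, b):\n    return a + b\n"
    print(code_tokens(src))  # ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
    print(ast_preorder(src))
```

Note that the AST view drops identifier names such as `add` unless node attributes are added explicitly, which is one of the preprocessing choices the study says affects downstream results.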

4. Automatic Code Summarization via ChatGPT: How Far Are We?

Author: Sun, Weisong, Fang, Chunrong, You, Yudu, Miao, Yun, Liu, Yi, Li, Yuekang, Deng, Gelei, Huang, Shenghan, Chen, Yuchen, Zhang, Quanjun, Qian, Hanwei, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, 68T50, D.2.3
Abstract: To support software developers in understanding and maintaining programs, various automatic code summarization techniques have been proposed to generate a concise natural language comment for a given code snippet. Recently, the emergence of large language models (LLMs) has greatly boosted performance on natural language processing tasks. Among them, ChatGPT is the most popular and has attracted wide attention from the software engineering community. However, it remains unclear how ChatGPT performs in (automatic) code summarization. Therefore, in this paper, we focus on evaluating ChatGPT on a widely used Python dataset called CSN-Python and comparing it with several state-of-the-art (SOTA) code summarization models. Specifically, we first explore an appropriate prompt to guide ChatGPT to generate in-distribution comments. Then, we use that prompt to ask ChatGPT to generate comments for all code snippets in the CSN-Python test set. We adopt three widely used metrics (BLEU, METEOR, and ROUGE-L) to measure the quality of the comments generated by ChatGPT and the SOTA models (NCS, CodeBERT, and CodeT5). The experimental results show that, in terms of BLEU and ROUGE-L, ChatGPT's code summarization performance is significantly worse than that of all three SOTA models. We also present some cases and discuss the advantages and disadvantages of ChatGPT in code summarization. Based on the findings, we outline several open challenges and opportunities in ChatGPT-based code summarization.
Published: 2023
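BLEU, one of the metrics used in this evaluation, scores a generated comment by clipped n-gram precision against a reference, combined with a brevity penalty for short outputs. The following is a minimal sentence-level BLEU-4 sketch; the add-one smoothing applied to n ≥ 2 is one of several variants in use, and reported scores depend on which smoothing and tokenization a given paper adopts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Sentence BLEU with add-one smoothing on higher-order precisions (assumed variant)."""
    if not candidate:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped counts: a candidate n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(len(candidate) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0
            p = matches / total
        else:
            p = (matches + 1) / (total + 1)
        log_p += math.log(p) / max_n  # uniform weights -> geometric mean
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_p)


if __name__ == "__main__":
    ref = "returns the sum of two numbers".split()
    print(sentence_bleu(ref, ref))                        # 1.0
    print(sentence_bleu(ref, "returns the sum".split()))  # penalized for brevity
```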

5. An Extractive-and-Abstractive Framework for Source Code Summarization

Author: Sun, Weisong, Fang, Chunrong, Chen, Yuchen, Zhang, Quanjun, Tao, Guanhong, Han, Tingxu, Ge, Yifei, You, Yudu, and Luo, Bin
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, D.2.3, I.2.7
Abstract: (Source) code summarization aims to automatically generate natural language summaries/comments for a given code snippet. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, and consequently, the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models from the neural machine translation domain. However, the generated summaries often miss important factual details. To generate human-written-like summaries with preserved factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs extractive code summarization: it takes in the code snippet and predicts important statements containing key factual details. The abstractive module performs abstractive code summarization: it takes in the entire code snippet and the important statements in parallel and generates a succinct, human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L.
Comment: Accepted to ACM Transactions on Software Engineering and Methodology (TOSEM)
Published: 2022
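ROUGE-L, one of the three metrics reported for EACS, is based on the longest common subsequence (LCS) between a generated summary and the reference. A minimal sketch follows; the β = 1.2 recall weighting is an assumed default found in some code-summarization evaluation scripts, not a value stated in this record.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, candidate, beta=1.2):
    """ROUGE-L F-measure: LCS-based recall and precision, recall weighted by beta."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    ref = "returns the sum of two numbers".split()
    print(rouge_l(ref, "returns the sum".split()))
```

Because LCS does not require contiguous matches, ROUGE-L rewards a summary that keeps the reference's word order even when extra words are inserted in between.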

6. An Efficient Adaptive Fault Diagnosis Algorithm for Highly Scalable Data Center Networks

Author: Wang, Xiangke, You, Yudu, Li, Xiao-Yan, Liu, Ximeng, and Yang, Yang
Editors: Lin, Limei, Liu, Yuhong, and Lee, Chia-Wei
Editorial Board Members: Filipe, Joaquim, Ghosh, Ashish, Prates, Raquel Oliveira, and Zhou, Lizhu
Published: 2021
Full Text: View/download PDF

7. Esale: Enhancing Code-Summary Alignment Learning for Source Code Summarization

Author: Fang, Chunrong, Sun, Weisong, Chen, Yuchen, Chen, Xiao, Wei, Zhao, Zhang, Quanjun, You, Yudu, Luo, Bin, Liu, Yang, and Chen, Zhenyu
Abstract: (Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in helping developers understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors, and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code (e.g., CodeBERT and UniXcoder) have been equipped with encoders capable of producing general context vectors and have achieved substantial improvements on the code summarization task. However, although they are usually trained mainly on code-focused tasks and can capture general code features, they still fall short in capturing the specific features that need to be summarized. In a nutshell, they fail to learn the alignment between code snippets and summaries (code-summary alignment for short). In this paper, we propose a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks that enhance its ability to learn code-summary alignment: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets would help learn the code-summary alignment. In addition, existing work shows that AWP affects the prediction of the entire summary. Therefore, we further introduce the domain-specific task AWP to enhance the ability of the encoder to learn the alignment between action words and code snippets.
We evaluate the effectiveness of our approach, called Esale, by conducting extensive experiments on four datasets: two widely used datasets, JCSD and PCSD; a cross-project Java dataset, CPJD; and a multilingual dataset, CodeSearchNet. Experimental results show that Esale significantly outperforms state-of-the-art baselines on all three widely used metrics: BLEU, METEOR, and ROUGE-L. Moreover, the human evaluation shows that the summaries generated by Esale are more informative and closer to the ground-truth summaries.
Published: 2024
Full Text: View/download PDF

8. An Extractive-and-Abstractive Framework for Source Code Summarization

Author: Sun, Weisong, Fang, Chunrong, Chen, Yuchen, Zhang, Quanjun, Tao, Guanhong, You, Yudu, Han, Tingxu, Ge, Yifei, Hu, Yuling, Luo, Bin, and Chen, Zhenyu
Subjects: TEXT summarization, AUTOMATIC summarization, PROGRAMMING languages, NATURAL languages, METEORS
Abstract: (Source) code summarization aims to automatically generate natural language summaries/comments for given code snippets. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, and consequently, the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models. However, the generated summaries often miss important factual details. To generate human-written-like summaries with preserved factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs extractive code summarization: it takes in the code snippet and predicts important statements containing key factual details. The abstractive module performs abstractive code summarization: it takes in the code snippet and the important statements in parallel and generates a succinct, human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L. In addition, the human evaluation demonstrates that the summaries generated by EACS have higher naturalness and informativeness and are more relevant to the given code snippets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. An Extractive-and-Abstractive Framework for Source Code Summarization

Author: Sun, Weisong (primary), Fang, Chunrong, Chen, Yuchen, Zhang, Quanjun, Tao, Guanhong, You, Yudu, Han, Tingxu, Ge, Yifei, Hu, Yuling, Luo, Bin, and Chen, Zhenyu
Published: 2023
Full Text: View/download PDF

10. An Efficient Adaptive Fault Diagnosis Algorithm for Highly Scalable Data Center Networks

Author: You, Yudu, Li, Xiao-Yan, Yang, Yang, Wang, Xiangke, and Liu, Ximeng
Subjects: Service quality, Computer science, business.industry, Server, Big data, Scalability, Dimension (graph theory), Data center, Energy consumption, Fault (power engineering), business, Algorithm
Abstract: The big data system based on a data center network provides low-latency, high-quality services for big data applications. When a server fails in a data center network, the security of the big data platform and the service quality of big data applications will be severely affected. A highly scalable data center network (HSDC) is an emerging server-centric data center network that achieves incremental scalability while ensuring low cost and energy consumption, low diameter, and high bisection width. In this paper, we determine the connectivity and diagnosability of HSDC. We then design an efficient adaptive fault diagnosis algorithm that diagnoses the actual status of all servers in HSDC with at most \(m2^m + 4m(m-2)\) tests when \(m \ge 3\) (and at most 9 tests when \(m = 2\)), where \(m\) is the dimension of the HSDC. Experimental results show that for HSDC, our algorithm achieves complete diagnosis and greatly reduces the number of tests.
Published: 2021
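For a sense of scale, the test bound stated in this abstract can be evaluated directly for small dimensions. The sketch reads \(m2^m\) as \(m \cdot 2^m\); the function name is hypothetical, and the code only evaluates the stated bound, it does not implement the diagnosis algorithm.

```python
def max_tests(m):
    """Upper bound on tests stated in the abstract for an m-dimensional HSDC."""
    if m == 2:
        return 9  # special case given in the abstract
    if m >= 3:
        return m * 2 ** m + 4 * m * (m - 2)
    raise ValueError("the abstract states the bound only for m >= 2")


if __name__ == "__main__":
    for m in range(2, 6):
        print(m, max_tests(m))  # 2 9, 3 36, 4 96, 5 220
```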
