Author: "Wan, Chengcheng" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wan, Chengcheng"' showing total 49 results

1. CodeCipher: Learning to Obfuscate Source Code Against LLMs

Author: Lin, Yalan, Wan, Chengcheng, Fang, Yixiong, and Gu, Xiaodong
Subjects: Computer Science - Computation and Language
Abstract: While large code language models have made significant strides in AI-assisted coding tasks, there are growing concerns about privacy challenges. The user code is transparent to the cloud LLM service provider, inducing risks of unauthorized training, reading, and execution of the user code. In this paper, we propose CodeCipher, a novel method that perturbs privacy from code while preserving the original response from LLMs. CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code. The new embedding matrix is optimized by minimizing the task-specific loss function. To tackle the challenge of the discrete and sparse nature of word vector spaces, CodeCipher adopts a discrete optimization strategy that aligns the updated vector to the nearest valid token in the vocabulary before each gradient update. We demonstrate the effectiveness of our approach on three AI-assisted coding tasks including code completion, summarization, and translation. Results show that our model successfully confuses the privacy in source code while preserving the original LLM's performance.
Published: 2024

2. From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Author: Shi, Yuling, Wang, Songsong, Wan, Chengcheng, and Gu, Xiaodong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Programming Languages, Computer Science - Software Engineering
Abstract: While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems. Existing LLM-based debugging systems treat generated programs as monolithic units, failing to address bugs at multiple levels of granularity, from low-level syntax errors to high-level algorithmic flaws. In this paper, we introduce Multi-Granularity Debugger (MGDebugger), a hierarchical code debugger by isolating, identifying, and resolving bugs at various levels of granularity. MGDebugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error. During debugging, it analyzes each subfunction and iteratively resolves bugs in a bottom-up manner. To effectively test each subfunction, we propose an LLM-simulated Python executor, which traces code execution and tracks important variable states to pinpoint errors accurately. Extensive experiments demonstrate that MGDebugger outperforms existing debugging systems, achieving an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix. Furthermore, MGDebugger effectively fixes bugs across different categories and difficulty levels, demonstrating its robustness and effectiveness.
Comment: Code and data available at https://github.com/YerbaPage/MGDebugger
Published: 2024

3. BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering

Author: Jiang, Jiayi, Zhang, Xiyuan, Wan, Chengcheng, Chen, Haoyi, Sun, Haiying, and Su, Ting
Subjects: Computer Science - Software Engineering, Computer Science - Cryptography and Security
Abstract: Protocol reverse engineering (PRE) aims to infer the specification of network protocols when the source code is not available. Specifically, field inference is one crucial step in PRE to infer the field formats and semantics. To perform field inference, binary analysis based PRE techniques are one major approach category. However, such techniques face two key challenges - (1) the format inference is fragile when the logics of processing input messages may vary among different protocol implementations, and (2) the semantic inference is limited by inadequate and inaccurate inference rules. To tackle these challenges, we present BinPRE, a binary analysis based PRE tool. BinPRE incorporates (1) an instruction-based semantic similarity analysis strategy for format extraction; (2) a novel library composed of atomic semantic detectors for improving semantic inference adequacy; and (3) a cluster-and-refine paradigm to further improve semantic inference accuracy. We have evaluated BinPRE against five existing PRE tools, including Polyglot, AutoFormat, Tupni, BinaryInferno and DynPRE. The evaluation results on eight widely-used protocols show that BinPRE outperforms the prior PRE tools in both format and semantic inference. BinPRE achieves the perfection of 0.73 on format extraction and the F1-score of 0.74 (0.81) on semantic inference of types (functions), respectively. The field inference results of BinPRE have helped improve the effectiveness of protocol fuzzing by achieving 5-29% higher branch coverage, compared to those of the best prior PRE tool. BinPRE has also helped discover one new zero-day vulnerability, which otherwise cannot be found.
Comment: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024
Published: 2024

4. Vortex under Ripplet: An Empirical Study of RAG-enabled Applications

Author: Shao, Yuchen, Huang, Yuheng, Shen, Jiawei, Ma, Lei, Su, Ting, and Wan, Chengcheng
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. However, developers face challenges in integrating RAG-enhanced LLMs into software systems, due to lack of interface specification, requirements from software context, and complicated system management. In this paper, we manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, and their issue reports. We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security. We have also generalized 19 defect patterns and proposed guidelines to tackle them. We hope this work could aid LLM-enabled software development and motivate future research.
Published: 2024

5. Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

Author: Shi, Yuling, Zhang, Hongyu, Wan, Chengcheng, and Gu, Xiaodong
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models have catalyzed an unprecedented wave in code generation. While achieving significant advances, they blur the distinctions between machine- and human-authored source code, causing integrity and authenticity issues of software artifacts. Previous methods such as DetectGPT have proven effective in discerning machine-generated texts, but they do not identify and harness the unique patterns of machine-generated code. Thus, its applicability falters when applied to code. In this paper, we carefully study the specific patterns that characterize machine- and human-authored code. Through a rigorous analysis of code attributes such as lexical diversity, conciseness, and naturalness, we expose unique patterns inherent to each source. We particularly notice that the syntactic segmentation of code is a critical factor in identifying its provenance. Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code. Diverging from conventional techniques that depend on external LLMs for perturbations, DetectCodeGPT perturbs the code corpus by strategically inserting spaces and newlines, ensuring both efficacy and efficiency. Experiment results show that our approach significantly outperforms state-of-the-art techniques in detecting machine-generated code.
Comment: Accepted by the 47th International Conference on Software Engineering (ICSE 2025). Code available at https://github.com/YerbaPage/DetectCodeGPT
Published: 2024

6. On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Author: Lin, Yalan, Chen, Meng, Hu, Yuhan, Zhang, Hongyu, Wan, Chengcheng, Wei, Zhao, Xu, Yong, Wang, Juhong, and Gu, Xiaodong
Subjects: Computer Science - Software Engineering
Abstract: Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite significant achievements, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Besides, their evaluation revolves around open-domain benchmarks like HumanEval, which primarily consist of programming contests. Therefore, it is hard to fully characterize the intricacies and challenges associated with particular domains (e.g., web, game, and math). In this paper, we conduct an in-depth study of the LLMs in domain-specific code generation. Our results demonstrate that LLMs exhibit sub-optimal performance in generating domain-specific code, due to their limited proficiency in utilizing domain-specific libraries. We further observe that incorporating API knowledge as prompts can empower LLMs to generate more professional code. Based on these findings, we further investigate how to effectively incorporate API knowledge into the code generation process. We experiment with three strategies for incorporating domain knowledge, namely, external knowledge inquirer, chain-of-thought prompting, and chain-of-thought fine-tuning. We refer to these strategies as a new code generation approach called DomCoder. Experimental results show that all strategies of DomCoder lead to improvement in the effectiveness of domain-specific code generation under certain settings.
Comment: Accepted by the ACM Transactions on Software Engineering and Methodology (TOSEM 2024)
Published: 2023

7. Automatic and Efficient Customization of Neural Networks for ML Applications

Author: Liu, Yuhan, Wan, Chengcheng, Du, Kuntai, Hoffmann, Henry, Jiang, Junchen, Lu, Shan, and Maire, Michael
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Abstract: ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
Published: 2023

8. Self-Supervised Query Reformulation for Code Search

Author: Mao, Yuetian, Wan, Chengcheng, Jiang, Yuze, and Gu, Xiaodong
Subjects: Computer Science - Software Engineering
Abstract: Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code search engines. This restricts its practicality in software development processes. In this paper, we propose SSQR, a self-supervised query reformulation method that does not rely on any parallel query corpus. Inspired by pre-trained models, SSQR treats query reformulation as a masked language modeling task conducted on an extensive unannotated corpus of queries. SSQR extends T5 (a sequence-to-sequence model based on Transformer) with a new pre-training objective named corrupted query completion (CQC), which randomly masks words within a complete query and trains T5 to predict the masked content. Subsequently, for a given query to be reformulated, SSQR identifies potential locations for expansion and leverages the pre-trained T5 model to generate appropriate content to fill these gaps. The selection of expansions is then based on the information gain associated with each candidate. Evaluation results demonstrate that SSQR outperforms unsupervised baselines significantly and achieves competitive performance compared to supervised methods.
Comment: Accepted to be published in ESEC/FSE 2023
Published: 2023

9. CFP: A Reinforcement Learning Framework for Comprehensive Fairness-Performance Trade-Off in Machine Learning

Author: Zhang, Simiao, Bai, Jitao, Guan, Menghong, Zhang, Yueling, Sun, Jun, Huang, Yihao, Wang, Jiaping, Wan, Chengcheng, Su, Ting, and Pu, Geguang
Editor: Wand, Michael, Malinovská, Kristína, Schmidhuber, Jürgen, and Tetko, Igor V.
Series Editor: Goos, Gerhard
Founding Editor: Hartmanis, Juris
Editorial Board Member: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Published: 2024
Full Text: View/download PDF

10. Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud

Author: Cahoon, Joyce, Wang, Wenjing, Zhu, Yiwen, Lin, Katherine, Liu, Sean, Truong, Raymond, Singh, Neetu, Wan, Chengcheng, Ciortea, Alexandra M, Narasimhan, Sreraman, and Krishnan, Subru
Subjects: Computer Science - Databases
Abstract: Selecting the optimal cloud target to migrate SQL estates from on-premises to the cloud remains a challenge. Current solutions are not only time-consuming and error-prone, requiring significant user input, but also fail to provide appropriate recommendations. We present Doppler, a scalable recommendation engine that provides right-sized Azure SQL Platform-as-a-Service (PaaS) recommendations without requiring access to sensitive customer data and queries. Doppler introduces a novel price-performance methodology that allows customers to get a personalized rank of relevant cloud targets solely based on low-level resource statistics, such as latency and memory usage. Doppler supplements this rank with internal knowledge of Azure customer behavior to help guide new migration customers towards one optimal target. Experimental results over a 9-month period from prospective and existing customers indicate that Doppler can identify optimal targets and adapt to changes in customer workloads. It has also found cost-saving opportunities among over-provisioned cloud customers, without compromising on capacity or other requirements. Doppler has been integrated and released in the Azure Data Migration Assistant v5.5, which receives hundreds of assessment requests daily.
Published: 2022
Full Text: View/download PDF

11. Identifying socioeconomic exposure patterns and hotspots of global tropical cyclones from 1990 to 2019

Author: Wan, Chengcheng, Tian, Yinwei, Liu, Jianli, Yan, Yafei, Shi, Zhongchao, Wen, Jiahong, and Yan, Lijun
Published: 2024
Full Text: View/download PDF

12. An Underwater SLAM Approach Using Regularly Distributed Magnetic Beacons

Author: Chang, Shuai, Wan, Chengcheng, Zhang, Dalong, Li, Hui, and Lin, Ye
Editor: Yan, Liang, and Deng, Yimin
Series Editor: Angrisani, Leopoldo, Arteaga, Marco, Panigrahi, Bijaya Ketan, Chakraborty, Samarjit, Chen, Jiming, Chen, Shanben, Chen, Tan Kay, Dillmann, Rüdiger, Duan, Haibin, Ferrari, Gianluigi, Ferre, Manuel, Hirche, Sandra, Jabbari, Faryar, Jia, Limin, Kacprzyk, Janusz, Khamis, Alaa, Kroeger, Torsten, Li, Yong, Liang, Qilian, Martín, Ferran, Ming, Tan Cher, Minker, Wolfgang, Misra, Pradeep, Möller, Sebastian, Mukhopadhyay, Subhas, Ning, Cun-Zheng, Nishida, Toyoaki, Oneto, Luca, Pascucci, Federica, Qin, Yong, Seng, Gan Woon, Speidel, Joachim, Veiga, Germano, Wu, Haitao, Zamboni, Walter, and Zhang, Junjie James
Published: 2023
Full Text: View/download PDF

13. Damage analysis of retired typhoons in mainland China from 2009 to 2019

Author: Wan, Chengcheng, Yan, Yafei, Shen, Liucheng, Liu, Jianli, Lai, Xiaoxia, Qian, Wei, Nie, Juan, and Wen, Jiahong
Published: 2023
Full Text: View/download PDF

14. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

Author: Wan, Chengcheng, Hoffmann, Henry, Lu, Shan, and Maire, Michael
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction as well as refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
Comment: ICML 2020
Published: 2020

15. Disaster Risk Reduction, Climate Change Adaptation and Their Linkages with Sustainable Development over the Past 30 Years: A Review

Author: Wen, Jiahong, Wan, Chengcheng, Ye, Qian, Yan, Jianping, and Li, Weijiang
Published: 2023
Full Text: View/download PDF

16. ALERT: Accurate Learning for Energy and Timeliness

Author: Wan, Chengcheng, Santriaji, Muhammad, Rogers, Eri, Hoffmann, Henry, Maire, Michael, and Lu, Shan
Subjects: Computer Science - Performance, Computer Science - Machine Learning
Abstract: An increasing number of software applications incorporate runtime Deep Neural Networks (DNNs) to process sensor data and return inference results to humans. Effective deployment of DNNs in these interactive scenarios requires meeting latency and accuracy constraints while minimizing energy, a problem exacerbated by common system dynamics. Prior approaches handle dynamics through either (1) system-oblivious DNN adaptation, which adjusts DNN latency/accuracy tradeoffs, or (2) application-oblivious system adaptation, which adjusts resources to change latency/energy tradeoffs. In contrast, this paper improves on the state-of-the-art by coordinating application- and system-level adaptation. ALERT, our runtime scheduler, uses a probabilistic model to detect environmental volatility and then simultaneously select both a DNN and a system resource configuration to meet latency, accuracy, and energy constraints. We evaluate ALERT on CPU and GPU platforms for image and speech tasks in dynamic environments. ALERT's holistic approach achieves more than 13% energy reduction, and 27% error reduction over prior approaches that adapt solely at the application or system level. Furthermore, ALERT incurs only 3% more energy consumption and 2% higher DNN-inference error than an oracle scheme with perfect application and system knowledge.
Published: 2019
Full Text: View/download PDF

17. Cost-benefit analysis of local knowledge-based flood adaptation measures: A case study of Datian community in Zhejiang Province, China

Author: Lai, Xiaoxia, Wen, Jiahong, Shan, Xinmeng, Shen, Liucheng, Wan, Chengcheng, Shao, Lin, Wu, Yanjuan, Chen, Bo, and Li, Weijiang
Published: 2023
Full Text: View/download PDF

18. A Joint Graph-Based Approach for Simultaneous Underwater Localization and Mapping for AUV Navigation Fusing Bathymetric and Magnetic-Beacon-Observation Data

Author: Chang, Shuai, Zhang, Dalong, Zhang, Linfeng, Zou, Guoji, Wan, Chengcheng, Ma, Wencong, and Zhou, Qingji
Published: 2024
Full Text: View/download PDF

19. Keeper: Automated Testing and Fixing of Machine Learning Software

Author: Wan, Chengcheng, Liu, Shicheng, Xie, Sophie, Liu, Yuhan, Hoffmann, Henry, Maire, Michael, and Lu, Shan
Subjects: COMPUTER software correctness, MACHINE learning, ENGINE testing, APPLICATION software, JUDGMENT (Psychology)
Abstract: The increasing number of software applications incorporating machine learning (ML) solutions has led to the need for testing techniques. However, testing ML software requires tremendous human effort to design realistic and relevant test inputs and to judge software output correctness according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect is inside ML API or the surrounding code and how to fix the implementation. This article tackles these challenges by proposing Keeper, an automated testing and fixing tool for ML software. The core idea of Keeper is designing pseudo-inverse functions that semantically reverse the corresponding ML task in an empirical way and proxy common human judgment of real-world data. It incorporates these functions into a symbolic execution engine to generate tests. Keeper also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used to alleviate the misbehavior. Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage, while identifying 74 previously unknown failures and 19 code smells from 56 out of 104 applications. Our user studies show that 78% of end-users and 95% of developers agree with Keeper's detection and fixing results.
Published: 2024
Full Text: View/download PDF

20. A Robust Graph-Based Bathymetric Simultaneous Localization and Mapping Approach for AUVs

Author: Zhang, Dalong, Chang, Shuai, Zou, Guoji, Wan, Chengcheng, and Li, Hui
Abstract: Due to the position drift of inertial navigation systems, it is still challenging to achieve long-term and accurate position estimates during underwater navigation. The seabed topography has been proven to be effective in aiding information for accurate positioning benefiting from its rich spatial variation. With the advantage of the multibeam echosounder (MBES) in efficient bathymetric survey, the simultaneous localization and mapping (SLAM) approach can be performed using bathymetric data in unknown environments for underwater vehicles to get good position estimates. The SLAM performance relies on the number and accuracy of loop closures heavily. Thereby, the capabilities of the data association method and solver in dealing with the uncertainties of vehicle pose estimates, bathymetric data, and topographic features affect the SLAM performance strongly. This work proposes a new graph-based bathymetric SLAM method to improve the robustness of the uncertainties in both factor-graph construction and optimization stages. In the front end, on the base of a matching suitability-based MBES submap construction method, a dual-stage bathymetric point cloud registration approach that is able to detect most false loop closures is proposed. In the back end, a robust optimizer based on Frechet distance is introduced to further identify and remove the false loop closures missed in front end. Experiments using field MBES bathymetric data sets are conducted to verify the effectiveness of the proposed approach.
Published: 2024
Full Text: View/download PDF

21. Robust Heading and Attitude Estimation of MEMS IMU in Magnetic Anomaly Field Using a Partially Adaptive Decoupled Extended Kalman Filter and LSTM Algorithm

Author: Li, Hui, Chang, Shuai, Yao, Qi, Wan, Chengcheng, Zou, Guoji, and Zhang, Dalong
Published: 2024
Full Text: View/download PDF

22. Disaster loss index development and comprehensive assessment: A case study of Shanghai

Author: Zhao, Luna, Wen, Jiahong, Wan, Chengcheng, Li, Li, Chen, Yuxi, Zhang, Huan, Liu, Huan, Yan, Jianping, Liu, Jianli, Tian, Tongfei, and Shi, Yong
Published: 2024
Full Text: View/download PDF

23. Run-Time Prevention of Software Integration Failures of Machine Learning APIs

Author: Wan, Chengcheng, Liu, Yuhan, Du, Kuntai, Hoffmann, Henry, Jiang, Junchen, Maire, Michael, and Lu, Shan
Published: 2023
Full Text: View/download PDF

24. Learning Quality Evaluation of MOOC Based on Big Data Analysis

Author: Zhao, Zihao, Wu, Qiangqiang, Chen, Haopeng, and Wan, Chengcheng
Editor: Qiu, Meikang
Series Editor: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Steffen, Bernhard, Terzopoulos, Demetri, Tygar, Doug, and Weikum, Gerhard
Published: 2017
Full Text: View/download PDF

25. SMOPAT: Mining semantic mobility patterns from trajectories of private vehicles

Author: Wan, Chengcheng, Zhu, Yanmin, Yu, Jiadi, and Shen, Yanyan
Published: 2018
Full Text: View/download PDF

26. GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing

Author: Wang, Zihan, Nie, Pengbo, Miao, Xinyuan, Chen, Yuting, Wan, Chengcheng, Bu, Lei, and Zhao, Jianjun
Published: 2023
Full Text: View/download PDF

27. HotGPT: How to Make Software Documentation More Useful with a Large Language Model?

Author: Su, Yiming, Wan, Chengcheng, Sethi, Utsav, Lu, Shan, Musuvathi, Madan, and Nath, Suman
Published: 2023
Full Text: View/download PDF

28. Doppler

Author: Cahoon, Joyce, Wang, Wenjing, Zhu, Yiwen, Lin, Katherine, Liu, Sean, Truong, Raymond, Singh, Neetu, Wan, Chengcheng, Ciortea, Alexandra M, Narasimhan, Sreraman, and Krishnan, Subru
Subjects: FOS: Computer and information sciences, Computer Science - Databases, General Engineering, Databases (cs.DB)
Abstract: Selecting the optimal cloud target to migrate SQL estates from on-premises to the cloud remains a challenge. Current solutions are not only time-consuming and error-prone, requiring significant user input, but also fail to provide appropriate recommendations. We present Doppler, a scalable recommendation engine that provides right-sized Azure SQL Platform-as-a-Service (PaaS) recommendations without requiring access to sensitive customer data and queries. Doppler introduces a novel price-performance methodology that allows customers to get a personalized rank of relevant cloud targets solely based on low-level resource statistics, such as latency and memory usage. Doppler supplements this rank with internal knowledge of Azure customer behavior to help guide new migration customers towards one optimal target. Experimental results over a 9-month period from prospective and existing customers indicate that Doppler can identify optimal targets and adapt to changes in customer workloads. It has also found cost-saving opportunities among over-provisioned cloud customers, without compromising on capacity or other requirements. Doppler has been integrated and released in the Azure Data Migration Assistant v5.5, which receives hundreds of assessment requests daily.
Published: 2022

29. Mechanical properties of nano SiO2 and fiber-reinforced concrete with steel fiber and high performance polypropylene fiber

Author: Mengjun Mei, Linsong Wu, Wan Chengcheng, Zhiwei Wu, Hui Liu, and Yanlin Yi
Subjects: steel fibers, mechanical properties, high performance polypropylene fibers, nano-SiO2, Materials of engineering and construction. Mechanics of materials, TA401-492, Chemical technology, TP1-1185
Abstract: This research studies the mechanical properties of concrete mixtures containing 1% nano-SiO2 and different contents of macro-fiber. Steel (ST) fibers and high performance polypropylene (HPP) fibers of the same length and shape were used; a total of 10 concrete mixtures incorporating 1% of nano-SiO2 by weight of the binder and 0.5%, 1%, 1.5% and 2% macro-fiber by volume of concrete were studied. The experimental results show that the addition of 1% nano-SiO2 leads to an improvement in all of the mechanical properties of concrete and the incorporation of steel fiber and HPP fiber improves the mechanical properties of concrete. Furthermore, the tensile strength of concrete mixed with 2% steel fiber increased by 51.4%, and the flexural strength increased by 32.7%; the tensile strength of concrete mixed with 1% HPP fiber increased by 34.5%, and the flexural strength increased by 22.8%. It was also indicated that when the fiber content is 1 vol%, the HPP fiber can replace steel fiber.
Published: 2021
Full Text: View/download PDF

30. VarGAN: Adversarial Learning of Variable Semantic Representations

Author: Lin, Yalan, Wan, Chengcheng, Bai, Shuwen, and Gu, Xiaodong
Abstract: Variable names are of critical importance in code representation learning. However, due to diverse naming conventions, variables often receive arbitrary names, leading to long-tail, out-of-vocabulary (OOV), and other well-known problems. While the Byte-Pair Encoding (BPE) tokenizer has addressed the surface-level recognition of low-frequency tokens, it has not noticed the inadequate training of low-frequency identifiers by code representation models, resulting in an imbalanced distribution of rare and common identifiers. Consequently, code representation models struggle to effectively capture the semantics of low-frequency variable names. In this paper, we propose VarGAN, a novel method for variable name representations. VarGAN strengthens the training of low-frequency variables through adversarial training. Specifically, we regard the code representation model as a generator responsible for producing vectors from source code. Additionally, we employ a discriminator that detects whether the code input to the generator contains low-frequency variables. This adversarial setup regularizes the distribution of rare variables, making them overlap with their corresponding high-frequency counterparts in the vector space. Experimental results demonstrate that VarGAN empowers CodeBERT to generate code vectors that exhibit more uniform distribution for both low- and high-frequency identifiers. There is an improvement of 8% in similarity and relatedness scores compared to VarCLR in the IdBench benchmark. VarGAN is also validated in downstream tasks, where it exhibits enhanced capabilities in capturing token- and code-level semantics.
Published: 2024
Full Text: View/download PDF

31. Multi-Hazard Population Exposure in Low-Elevation Coastal Zones of China from 1990 to 2020.

Author: Feng, Siqi, Yang, Kexin, Liu, Jianli, Yang, Yvlu, Zhao, Luna, Wen, Jiahong, Wan, Chengcheng, and Yan, Lijun
Abstract: China's low-elevation coastal zone (LECZ) is characterized by multiple hazards and high impacts. How to quantitatively portray the spatiotemporal characteristics of the exposed population to multi-hazards in the LECZ is an important subject of risk reduction. In this study, the overall characteristics, spatial patterns, and main impact hazard in the LECZ from 1990 to 2020 were investigated using a multi-hazard population exposure model, spatial autocorrelation method, and principal component analysis (PCA) method. The results show that among the four hazards (earthquake, tropical cyclones (TCs), flood, and storm surge), TCs cover the largest area, accounting for 90.1% of the total LECZ area. TCs were also the hazard with the largest average annual growth rate of the exposed population (2.36%). The central region of China's LECZ is the cluster of exposed populations and the main distribution area with the largest increase in exposed populations. Therefore, the central region is a hotspot for multi-hazard risk management. Additionally, flood contributes the most to the multi-hazard population exposure index; thus, flood is a key hazard of concern in the LECZ. This study identifies the hotspot areas and priority hazards of multi-hazard exposed populations in the LECZ and provides important policy recommendations for multi-hazard risk management in the LECZ, which is important for LECZ to enhance the resilience of hazards. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

32. Hierarchical memory-constrained operator scheduling of neural architecture search networks

Author: Wang, Zihan, primary, Wan, Chengcheng, additional, Chen, Yuting, additional, Lin, Ziyi, additional, Jiang, He, additional, and Qiao, Lei, additional
Published: 2022
Full Text: View/download PDF

33. Automated testing of software that uses machine learning APIs

Author: Wan, Chengcheng, primary, Liu, Shicheng, additional, Xie, Sophie, additional, Liu, Yifan, additional, Hoffmann, Henry, additional, Maire, Michael, additional, and Lu, Shan, additional
Published: 2022
Full Text: View/download PDF

34. Coverage-Directed Differential Testing of X.509 Certificate Validation in SSL/TLS Implementations

Author: Nie, Pengbo, primary, Wan, Chengcheng, additional, Zhu, Jiayu, additional, Lin, Ziyi, additional, Chen, Yuting, additional, and Su, Zhendong, additional
Published: 2022
Full Text: View/download PDF

35. Correctness, Performance, and Energy-Efficiency: Improving Software Systems That Use Machine Learning Components

Author: Wan, Chengcheng
Subjects: Computer science
Abstract: An increasing number of software applications adopt machine learning (ML) components to solve real-world problems. The offering of ML cloud APIs further ease developers' burden of incorporating ML solutions, typically deep neural networks (DNNs). However, to achieve a correct, fast, and energy-efficient ML application, developers still need to carefully design its three crucial components: ML algorithm, system environment, and software context. To improve correctness, performance, and energy-efficiency of ML applications, this dissertation works on these components and makes the following contributions: First, to enhance the flexibility of neural networks, this dissertation proposes a novel neural network architecture and a customized optimizer that support anytime prediction. This design allows one neural network to generate a series of increasingly accurate outputs over time without sacrificing accuracy for flexibility. Second, this dissertation designs a run-time scheduler ALERT, which further manages system resources. ALERT holistically configures neural networks and system resources together to meet application-specific accuracy, performance, and energy-consumption constraints. It uses a probabilistic model to detect environmental volatility and makes use of the full potential of the DNN candidate set to optimize performance and satisfy constraints. Third, to understand the challenges of developing ML software, this dissertation conducts the first comprehensive study about how real-world applications are using machine learning cloud APIs. We generalize 8 anti-patterns that degrade functional, performance, or economical quality of the software. Fourth, guided by this study, we propose Keeper, a new testing framework for software systems that use machine learning APIs. Keeper automatically generates many test cases to thoroughly test every branch in the specified function and its callees. It analyzes the test runs and reports many failures, as well as potential patches, to developers.
Published: 2022
Full Text: View/download PDF

36. FPIA: A database for gene fusion profiling and interactive analyses

Author: Huang, Lu, primary, Zhu, Huimin, additional, Luo, Zhenhua, additional, Luo, Chukun, additional, Luo, Linjiang, additional, Nong, Baoting, additional, Zhang, Shiyu, additional, Wan, Chengcheng, additional, Wang, Yanzhi, additional, Songyang, Zhou, additional, and Xiong, Yuanyan, additional
Published: 2022
Full Text: View/download PDF

37. Cost-Benefit Analysis of Local Knowledge-Based Flood Adaptation Measures

Author: Lai, Xiaoxia, primary, Wen, Jiahong, additional, Shan, Xinmeng, additional, Shen, Liucheng, additional, Wan, Chengcheng, additional, Shao, Lin, additional, Wu, Yanjuan, additional, Chen, Bo, additional, and Li, Weijiang, additional
Published: 2022
Full Text: View/download PDF

38. A Replication of Are Machine Learning Cloud APIs Used Correctly

Author: Wan, Chengcheng, primary, Liu, Shicheng, additional, Hoffmann, Henry, additional, Maire, Michael, additional, and Lu, Shan, additional
Published: 2021
Full Text: View/download PDF

39. Are Machine Learning Cloud APIs Used Correctly?

Author: Wan, Chengcheng, primary, Liu, Shicheng, additional, Hoffmann, Henry, additional, Maire, Michael, additional, and Lu, Shan, additional
Published: 2021
Full Text: View/download PDF

40. A Study on the Efficiency of Tourism Poverty Alleviation in Ethnic Regions Based on the Staged DEA Model

Author: Yang, Jianchun, primary, Wu, Ying, additional, Wang, Jialian, additional, Wan, Chengcheng, additional, and Wu, Qian, additional
Published: 2021
Full Text: View/download PDF

41. A Simultaneous Localization and Mapping Approach Based on Detection of Magnetic Beacons

Author: Chang, Shuai, primary, Lin, Ye, additional, Fu, Xiaomei, additional, and Wan, Chengcheng, additional
Published: 2021
Full Text: View/download PDF

42. Guided, Deep Testing of X.509 Certificate Validation via Coverage Transfer Graphs

Author: Zhu, Jiayu, primary, Wan, Chengcheng, additional, Nie, Pengbo, additional, Chen, Yuting, additional, and Su, Zhendong, additional
Published: 2020
Full Text: View/download PDF

43. View-Centric Performance Optimization for Database-Backed Web Applications

Author: Yang, Junwen, primary, Yan, Cong, additional, Wan, Chengcheng, additional, Lu, Shan, additional, and Cheung, Alvin, additional
Published: 2019
Full Text: View/download PDF

44. Macrophages activate mesenchymal stem cells to acquire cancer-associated fibroblast-like features resulting in gastric epithelial cell lesions and malignant transformation in vitro

Author: Zhang, Qiang, primary, Chai, Shuo, additional, Wang, Wei, additional, Wan, Chengcheng, additional, Zhang, Feng, additional, Li, Yuyun, additional, and Wang, Fengchao, additional
Published: 2018
Full Text: View/download PDF

45. Macrophages activate mesenchymal stem cells to acquire cancer-associated fibroblast-like features resulting in gastric epithelial cell lesions and malignant transformation in vitro.

Author: Zhang, Qiang, Chai, Shuo, Wang, Wei, Wan, Chengcheng, Zhang, Feng, Li, Yuyun, and Wang, Fengchao
Subjects: PRECANCEROUS conditions, MESENCHYMAL stem cells, FIBROBLASTS, MUCOUS membranes, EPITHELIUM
Abstract: The majority of premalignant gastric lesions develop in the mucosa that has been modified by chronic inflammation. As components of the gastritis microenvironment, mesenchymal stem cells (MSCs) and macrophages are critically involved in the initiation and development of the chronic gastritis-associated gastric epithelial lesions/malignancy process. However, in this process, the underlying mechanism of macrophages interacting with MSCs, particularly the effect of macrophages on MSCs phenotype and function remains to be elucidated. The present study revealed that human umbilical cord-derived MSCs were induced to differentiate into cancer-associated fibroblasts (CAFs) phenotype by co-culture with macrophages (THP-1 cells) in vitro, and which resulted in gastric epithelial lesions/potential malignancy via epithelial-mesenchymal transition-like changes. The results of the present study indicated that macrophages could induce MSCs to acquire CAF-like features and a pro-inflammatory phenotype to remodel the inflammatory microenvironment, which could potentiate oncogenic transformation of gastric epithelium cells. The present study provides potential targets and options for inflammation-associated gastric cancer prevention and intervention. [ABSTRACT FROM AUTHOR]
Published: 2019

46. Multi-perspective change impact analysis using linked data of software engineering

Author: Wan, Chengcheng, primary, Zhu, Zece, additional, Zhang, Yuchen, additional, and Chen, Yuting, additional
Published: 2016
Full Text: View/download PDF

47. Evaluating quality-in-use of FLOSS through analyzing user reviews

Author: Qian, Zhenzheng, primary, Wan, Chengcheng, additional, and Chen, Yuting, additional
Published: 2016
Full Text: View/download PDF

48. An empirical study on recovering requirement-to-code links

Author: Zhang, Yuchen, primary, Wan, Chengcheng, additional, and Jin, Bo, additional
Published: 2016
Full Text: View/download PDF

49. Multi-perspective change impact analysis using linked data of software engineering.

Author: Wan, Chengcheng, Zhu, Zece, Zhang, Yuchen, and Chen, Yuting
Published: 2016
Full Text: View/download PDF