Author: "Wu, Jiahui" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wu, Jiahui"' showing total 13 results

1. PackageIntel: Leveraging Large Language Models for Automated Intelligence Extraction in Package Ecosystems

Author: Guo, Wenbo, Liu, Chengwei, Wang, Limin, Wu, Jiahui, Xu, Zhengzi, Huang, Cheng, Fang, Yong, and Liu, Yang
Subjects: Computer Science - Software Engineering
Abstract: The rise of malicious packages in public registries poses a significant threat to software supply chain (SSC) security. Although academia and industry employ methods like software composition analysis (SCA) to address this issue, existing approaches often lack timely and comprehensive intelligence updates. This paper introduces PackageIntel, a novel platform that revolutionizes the collection, processing, and retrieval of malicious package intelligence. By utilizing exhaustive search techniques, snowball sampling from diverse sources, and large language models (LLMs) with specialized prompts, PackageIntel ensures enhanced coverage, timeliness, and accuracy. We have developed a comprehensive database containing 20,692 malicious NPM and PyPI packages sourced from 21 distinct intelligence repositories. Empirical evaluations demonstrate that PackageIntel achieves a precision of 98.6% and an F1 score of 92.0 in intelligence extraction. Additionally, it detects threats on average 70% earlier than leading databases like Snyk and OSV, and operates cost-effectively at $0.094 per intelligence piece. The platform has successfully identified and reported over 1,000 malicious packages in downstream package manager mirror registries. This research provides a robust, efficient, and timely solution for identifying and mitigating threats within the software supply chain ecosystem.
Published: 2024

2. Assessing the Uncertainty and Robustness of Object Detection Models for Detecting Stickers on Laptops

Author: Lu, Chengjie, Wu, Jiahui, Ali, Shaukat, and Olsen, Mikkel Labori
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Refurbishing laptops extends their lives while contributing to reducing electronic waste, which promotes building a sustainable future. To this end, the Danish Technological Institute (DTI) focuses on the research and development of several applications, including laptop refurbishing. This has several steps, including cleaning, which involves identifying and removing stickers from laptop surfaces. DTI trained six sticker detection models (SDMs) based on open-source object detection models to identify such stickers precisely so these stickers can be removed automatically. However, given the diversity in types of stickers (e.g., shapes, colors, locations), identification of the stickers is highly uncertain, thereby requiring explicit quantification of uncertainty associated with the identified stickers. Such uncertainty quantification can help reduce risks in removing stickers, which, for example, could otherwise result in damaging laptop surfaces. For uncertainty quantification, we adopted the Monte Carlo Dropout method to evaluate the six SDMs from DTI using three datasets: the original image dataset from DTI and two datasets generated with vision language models, i.e., DALL-E-3 and Stable Diffusion-3. In addition, we presented novel robustness metrics concerning detection accuracy and uncertainty to assess the robustness of the SDMs based on adversarial datasets generated from the three datasets using a dense adversary method. Our evaluation results show that different SDMs perform differently regarding different metrics. Based on the results, we provide SDM selection guidelines and lessons learned from various perspectives.
Comment: 18 pages, 6 figures, 4 tables
Published: 2024

3. Network-Based Transfer Learning Helps Improve Short-Term Crime Prediction Accuracy

Author: Wu, Jiahui and Frias-Martinez, Vanessa
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society
Abstract: Deep learning architectures enhanced with human mobility data have been shown to improve the accuracy of short-term crime prediction models trained with historical crime data. However, human mobility data may be scarce in some regions, negatively impacting the correct training of these models. To address this issue, we propose a novel transfer learning framework for short-term crime prediction models, whereby weights from the deep learning crime prediction models trained in source regions with plenty of mobility data are transferred to target regions to fine-tune their local crime prediction models and improve crime prediction accuracy. Our results show that the proposed transfer learning framework improves the F1 scores for target cities with mobility data scarcity, especially when the number of months of available mobility data is small. We also show that the F1 score improvements are pervasive across different types of crimes and diverse cities in the US.
Comment: 19 pages, 3 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2406.04382
Published: 2024

4. Improving the Fairness of Deep-Learning, Short-term Crime Prediction with Under-reporting-aware Models

Author: Wu, Jiahui and Frias-Martinez, Vanessa
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Deep learning crime predictive tools use past crime data and additional behavioral datasets to forecast future crimes. Nevertheless, these tools have been shown to suffer from unfair predictions across minority racial and ethnic groups. Current approaches to address this unfairness generally propose either pre-processing methods that mitigate the bias in the training datasets by applying corrections to crime counts based on domain knowledge or in-processing methods that are implemented as fairness regularizers to optimize for both accuracy and fairness. In this paper, we propose a novel deep learning architecture that combines the power of these two approaches to increase prediction fairness. Our results show that the proposed model improves the fairness of crime predictions when compared to models with in-processing de-biasing approaches and with models without any type of bias correction, albeit at the cost of reducing accuracy.
Comment: 25 pages, 4 figures
Published: 2024

5. Dual-Capability Machine Learning Models for Quantum Hamiltonian Parameter Estimation and Dynamics Prediction

Author: An, Zheng, Wu, Jiahui, Lin, Zidong, Yang, Xiaobo, Li, Keren, and Zeng, Bei
Subjects: Quantum Physics
Abstract: Recent advancements in quantum hardware and classical computing simulations have significantly enhanced the accessibility of quantum system data, leading to an increased demand for precise descriptions and predictions of these systems. Accurate prediction of quantum Hamiltonian dynamics and identification of Hamiltonian parameters are crucial for advancements in quantum simulations, error correction, and control protocols. This study introduces a machine learning model with dual capabilities: it can deduce time-dependent Hamiltonian parameters from observed changes in local observables within quantum many-body systems, and it can predict the evolution of these observables based on Hamiltonian parameters. Our model's validity was confirmed through theoretical simulations across various scenarios and further validated by two experiments. Initially, the model was applied to a Nuclear Magnetic Resonance quantum computer, where it accurately predicted the dynamics of local observables. The model was then tested on a superconducting quantum computer with initially unknown Hamiltonian parameters, successfully inferring them. Our approach aims to enhance various quantum computing tasks, including parameter estimation, noise characterization, feedback processes, and quantum control optimization.
Comment: 19 pages, 14 figures
Published: 2024

6. Random-coupled Neural Network

Author: Liu, Haoran, Liu, Mingzhe, Li, Peng, Wu, Jiahui, Jiang, Xin, Zuo, Zhuo, and Liu, Bingqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Improving the efficiency of current neural networks and modeling them on biological neural systems have become popular research directions in recent years. The pulse-coupled neural network (PCNN) is a widely applied model for imitating the computation characteristics of the human brain in computer vision and neural network fields. However, differences between the PCNN and biological neural systems remain: limited neural connection, high computational cost, and lack of stochastic property. In this study, a random-coupled neural network (RCNN) is proposed. It overcomes these difficulties in PCNN's neuromorphic computing via a random inactivation process. This process randomly closes some neural connections in the RCNN model, realized by the random inactivation weight matrix of link input. This relieves the computational burden of PCNN, making it affordable to achieve vast neural connections. Furthermore, the image and video processing mechanisms of RCNN are researched. It encodes constant stimuli as periodic spike trains and periodic stimuli as chaotic spike trains, the same as biological neural information encoding characteristics. Finally, the RCNN is applied to image segmentation, fusion, and pulse shape discrimination subtasks. It is demonstrated to be robust, efficient, and highly noise-resistant, with outstanding performance in all applications mentioned above.
Published: 2024

7. Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models

Author: Wu, Jiahui, Lu, Chengjie, Arrieta, Aitor, Yue, Tao, and Ali, Shaukat
Subjects: Computer Science - Software Engineering
Abstract: Large Language Models (LLMs) are demonstrating outstanding potential for tasks such as text generation, summarization, and classification. Given that such models are trained on a humongous amount of online knowledge, we hypothesize that LLMs can assess whether driving scenarios generated by autonomous driving testing techniques are realistic, i.e., being aligned with real-world driving conditions. To test this hypothesis, we conducted an empirical evaluation to assess whether LLMs are effective and robust in performing the task. This reality check is an important step towards devising LLM-based autonomous driving testing techniques. For our empirical evaluation, we selected 64 realistic scenarios from DeepScenario, an open driving scenario dataset. Next, by introducing minor changes to them, we created 512 additional realistic scenarios, to form an overall dataset of 576 scenarios. With this dataset, we evaluated three LLMs (GPT, Llama, and Mistral) to assess their robustness in assessing the realism of driving scenarios. Our results show that: (1) Overall, GPT achieved the highest robustness compared to Llama and Mistral, consistently throughout almost all scenarios, roads, and weather conditions; (2) Mistral performed the worst consistently; (3) Llama achieved good results under certain conditions; and (4) roads and weather conditions do influence the robustness of the LLMs.
Published: 2024

8. Application of Graph Neural Networks in Dark Photon Search with Visible Decays at Future Beam Dump Experiment

Author: Lu, Zejia, Chen, Xiang, Wu, Jiahui, Zhang, Yulei, and Li, Liang
Subjects: High Energy Physics - Experiment, Physics - Instrumentation and Detectors
Abstract: Beam dump experiments provide a distinctive opportunity to search for dark photons, which are compelling candidates for dark matter with low mass. In this study, we propose the application of Graph Neural Networks (GNN) in tracking reconstruction with beam dump experiments to obtain high resolution in both tracking and vertex reconstruction. Our findings demonstrate that in a typical 3-track scenario with the visible decay mode, the GNN approach significantly outperforms the traditional approach, improving the 3-track reconstruction efficiency by up to 88% in the low mass region. Furthermore, we show that improving the minimal vertex detection distance significantly impacts the signal sensitivity in dark photon searches with the visible decay mode. By reducing the minimal vertex distance from 5 mm to 0.1 mm, the exclusion upper limit on the dark photon mass ($m_A\prime$) can be improved by up to a factor of 3.
Published: 2024

9. Uncertainty-Aware Test Prioritization: Approaches and Empirical Evaluation

Author: Zhang, Man, Wu, Jiahui, Ali, Shaukat, and Yue, Tao
Subjects: Computer Science - Software Engineering
Abstract: Complex software systems, e.g., Cyber-Physical Systems (CPSs), interact with the real world; thus, they often behave unexpectedly in uncertain environments. Testing such systems is challenging due to limited resources, time, complex testing infrastructure setup, and the inherent uncertainties in their operating environment. Devising uncertainty-aware testing solutions supported with test optimization techniques can be considered as a mandate for tackling this challenge. This paper proposes an uncertainty-aware and time-aware test case prioritization approach, named UncerPrio, for optimizing a sequence of tests to execute with a multi-objective search. To guide the prioritization with uncertainty, we identify four uncertainty measures: uncertainty measurement (AUM), uncertainty space (PUS), the number of uncertainties (ANU), and uncertainty coverage (PUU). Based on these measures and their combinations, we proposed 10 uncertainty-aware and multi-objective test case prioritization problems, and each problem was additionally defined with one cost objective (execution cost, PET) to be minimized and one effective measure (model coverage, PTR) to be maximized. Moreover, considering time constraints for test executions (i.e., time-aware), we defined 10 time budgets for all the 10 problems for identifying the best strategy in solving uncertainty-aware test prioritization. In our empirical study, we employed four well-known Multi-Objective Search Algorithms (MuOSAs): NSGA-II, MOCell, SPEA2, and CellDE with five use cases from two industrial CPS subject systems, and used Random Algorithm (RS) as the comparison baseline. Results show that all the MuOSAs significantly outperformed RS. The strategy of Prob.6 f(PET,PTR,AUM,ANU) (i.e., the problem with uncertainty measures AUM and ANU combined) achieved the overall best performance in observing uncertainty when using 100% time budget.
Published: 2023

10. ESAFL: Efficient Secure Additively Homomorphic Encryption for Cross-Silo Federated Learning

Author: Wu, Jiahui, Zhang, Weizhe, and Luo, Fucai
Subjects: Computer Science - Cryptography and Security
Abstract: Cross-silo federated learning (FL) enables multiple clients to collaboratively train a machine learning model without sharing training data, but privacy in FL remains a major challenge. Techniques using homomorphic encryption (HE) have been designed to solve this but bring their own challenges. Many techniques using single-key HE (SKHE) require clients to fully trust each other to prevent privacy disclosure between clients. However, fully trusted clients are hard to ensure in practice. Other techniques using multi-key HE (MKHE) aim to protect privacy from untrusted clients but lead to the disclosure of training results in public channels by untrusted third parties, e.g., the public cloud server. Besides, MKHE has higher computation and communication complexity compared with SKHE. We present a new FL protocol ESAFL that leverages a novel efficient and secure additively HE (ESHE) based on the hard problem of ring learning with errors. ESAFL can ensure the security of training data between untrusted clients and protect the training results against untrusted third parties. In addition, theoretical analyses show that ESAFL outperforms current techniques using MKHE in computation and communication, and intensive experiments show that ESAFL achieves approximately 204 to 953 times and 11 to 14 times training speedup while reducing the communication burden by 77 to 109 times and 1.25 to 2 times compared with the state-of-the-art FL models using SKHE.
Published: 2023

11. Unified Quantum State Tomography and Hamiltonian Learning Using Transformer Models: A Language-Translation-Like Approach for Quantum Systems

Author: An, Zheng, Wu, Jiahui, Yang, Muchun, Zhou, D. L., and Zeng, Bei
Subjects: Quantum Physics
Abstract: Schrödinger's equation serves as a fundamental component in characterizing quantum systems, wherein both quantum state tomography and Hamiltonian learning are instrumental in comprehending and interpreting quantum systems. While numerous techniques exist for carrying out state tomography and learning Hamiltonians individually, no method has been developed to combine these two aspects. In this study, we introduce a new approach that employs the attention mechanism in transformer models to effectively merge quantum state tomography and Hamiltonian learning. By carefully choosing and preparing the training data, our method integrates both tasks without altering the model's architecture, allowing the model to effectively learn the intricate relationships between quantum states and Hamiltonian. We also demonstrate the effectiveness of our approach across various quantum systems, ranging from simple 2-qubit cases to more involved 2D antiferromagnetic Heisenberg structures. The data collection process is streamlined, as it only necessitates a one-way generation process beginning with state tomography. Furthermore, the scalability and few-shot learning capabilities of our method could potentially minimize the resources required for characterizing and optimizing quantum systems. Our research provides valuable insights into the relationship between Hamiltonian structure and quantum system behavior, fostering opportunities for additional studies on quantum systems and the advancement of quantum computation and associated technologies.
Comment: 15 pages, 10 figures
Published: 2023

12. Compatible Remediation on Vulnerabilities from Third-Party Libraries for Java Projects

Author: Zhang, Lyuye, Liu, Chengwei, Xu, Zhengzi, Chen, Sen, Fan, Lingling, Zhao, Lida, Wu, Jiahui, and Liu, Yang
Subjects: Computer Science - Software Engineering
Abstract: With the increasing disclosure of vulnerabilities in open-source software, software composition analysis (SCA) has been widely applied to reveal third-party libraries and the associated vulnerabilities in software projects. Beyond the revelation, SCA tools adopt various remediation strategies to fix vulnerabilities, the quality of which varies substantially. However, ineffective remediation could induce side effects, such as compilation failures, which impede acceptance by users. According to our studies, existing SCA tools could not correctly handle the concerns of users regarding the compatibility of remediated projects. To this end, we propose Compatible Remediation of Third-party libraries (CORAL) for Maven projects to fix vulnerabilities without breaking the projects. The evaluation proved that CORAL not only fixed 87.56% of vulnerabilities, outperforming other tools (best 75.32%), but also achieved a 98.67% successful compilation rate and a 92.96% successful unit test rate. Furthermore, we found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, and the rest of the vulnerabilities (21.55%) could either be fixed by upgrades that break the compilations or even be impossible to fix by upgrading.
Comment: 11 pages, conference
Published: 2023

13. Towards Understanding Third-party Library Dependency in C/C++ Ecosystem

Author: Tang, Wei, Xu, Zhengzi, Liu, Chengwei, Wu, Jiahui, Yang, Shouguo, Li, Yi, Luo, Ping, and Liu, Yang
Subjects: Computer Science - Software Engineering
Abstract: Third-party libraries (TPLs) are frequently reused in software to reduce development cost and the time to market. However, external library dependencies may introduce vulnerabilities into host applications. The issue of library dependency has received considerable critical attention. Many package managers, such as Maven, Pip, and NPM, have been proposed to manage TPLs. Moreover, a significant amount of effort has been put into studying dependencies in language ecosystems like Java, Python, and JavaScript, but not C/C++. Due to the lack of a unified package manager for C/C++, existing research has only limited understanding of TPL dependencies in the C/C++ ecosystem, especially at large scale. Towards understanding TPL dependencies in the C/C++ ecosystem, we collect existing TPL databases, package management tools, and dependency detection tools, summarize the dependency patterns of C/C++ projects, and construct a comprehensive and precise C/C++ dependency detector. Using our detector, we extract dependencies from a large-scale database containing 24K C/C++ repositories from GitHub. Based on the extracted dependencies, we provide the results and findings of an empirical study, which aims at understanding the characteristics of the TPL dependencies. We further discuss the implications for managing dependencies in C/C++ and the future research directions for software engineering researchers and developers in the fields of library development, software composition analysis, and C/C++ package managers.
Comment: ASE 2022
Published: 2022