"metamorphic testing" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"metamorphic testing"' showing total 2,909 results

1. Exploring and Lifting the Robustness of LLM-powered Automated Program Repair with Metamorphic Testing

Author: Xue, Pengyu, Wu, Linhao, Yang, Zhen, Li, Xinyi, Yu, Zhongxing, Jin, Zhi, Li, Ge, Xiao, Yan, and Wu, Jingwen
Subjects: Computer Science - Software Engineering
Abstract: In recent years, Large language model-powered Automated Program Repair (LAPR) techniques have achieved state-of-the-art bug-fixing performance and have been pervasively applied and studied in both industry and academia. Nonetheless, LLMs have been shown to be highly sensitive to input prompts, with slight differences in the expressions of semantically equivalent programs potentially causing repair failures. Therefore, it is crucial to conduct robustness testing on LAPR techniques before their practical deployment. However, related research is scarce. To this end, we propose MT-LAPR, a Metamorphic Testing framework exclusively for LAPR techniques, which summarizes nine Metamorphic Relations (MRs) widely recognized by developers across three perturbation levels: token, statement, and block. Afterward, our proposed MRs are applied to buggy code to generate test cases that are semantically equivalent yet may affect the inference of LAPR. Experiments are carried out on two extensively examined bug-fixing datasets, i.e., Defects4J and QuixBugs, and four recently released LLMs capable of bug fixing, demonstrating that 34.4% - 48.5% of the test cases expose the instability of LAPR techniques on average, showing the effectiveness of MT-LAPR and uncovering a positive correlation between code readability and the robustness of LAPR techniques. Inspired by the above findings, this paper uses the test cases generated by MT-LAPR as samples to train a CodeT5-based code editing model aiming at improving code readability and then embeds it into the LAPR workflow as a data preprocessing step. Extensive experiments demonstrate that this approach significantly enhances the robustness of LAPR by up to 49.32%.
Published: 2024
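
A token-level, semantics-preserving perturbation of the kind this abstract describes can be sketched as follows. This is only an illustration under stated assumptions: the buggy program, the `rename` helper, and the deliberately brittle `toy_repair` stand-in for a repair tool are hypothetical and are not taken from MT-LAPR.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Token-level perturbation: rename one identifier wherever it appears."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node

def rename(source, old, new):
    """Return source code with `old` renamed to `new` (needs Python 3.9+ for ast.unparse)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)

# Hypothetical buggy program: the loop skips the last element.
BUGGY = """
def total(items):
    acc = 0
    for i in range(len(items) - 1):
        acc += items[i]
    return acc
"""

def run_total(source, data):
    """Execute the program text and call its `total` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](data)

def toy_repair(source):
    """Hypothetical, deliberately brittle stand-in for an LLM-based repair tool."""
    return source.replace("range(len(items) - 1)", "range(len(items))")

def repair_is_stable(repair, buggy, perturbed, data, expected):
    """MR: repairing two equivalent programs should succeed or fail together."""
    ok_source = run_total(repair(buggy), data) == expected
    ok_follow_up = run_total(repair(perturbed), data) == expected
    return ok_source == ok_follow_up

if __name__ == "__main__":
    perturbed = rename(BUGGY, "items", "xs")
    # Both versions compute the same (wrong) result, so they are equivalent.
    print(run_total(BUGGY, [1, 2, 3]) == run_total(perturbed, [1, 2, 3]))
    # The toy tool repairs only the original spelling, so the MR is violated.
    print(repair_is_stable(toy_repair, BUGGY, perturbed, [1, 2, 3], 6))
```

Here a violation means the repair outcome changed although the program's behaviour did not, which is the kind of instability the abstract reports.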

2. Evaluating Human Trajectory Prediction with Metamorphic Testing

Author: Spieker, Helge, Belmecheri, Nassim, Gotlieb, Arnaud, and Lazaar, Nadjib
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: The prediction of human trajectories is important for planning in autonomous systems that act in the real world, e.g. automated driving or mobile robots. Human trajectory prediction is a noisy process, and no prediction precisely matches any future trajectory. It is therefore approached as a stochastic problem, where the goal is to minimise the error between the true and the predicted trajectory. In this work, we explore the application of metamorphic testing for human trajectory prediction. Metamorphic testing is designed to handle unclear or missing test oracles. It is well suited to human trajectory prediction, where there is no clear criterion of correct or incorrect human behaviour. Metamorphic relations rely on transformations over source test cases and exploit invariants. This setting fits human trajectory prediction, where there are many symmetries of expected human behaviour under variations of the input, e.g. mirroring and rescaling of the input data. We discuss how metamorphic testing can be applied to stochastic human trajectory prediction and introduce the Wasserstein Violation Criterion to statistically assess whether a follow-up test case violates a label-preserving metamorphic relation.
Comment: MET'24: 9th ACM International Workshop on Metamorphic Testing
Published: 2024

3. MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

Author: Xu, Congying, Chen, Songqiang, Wu, Jiarong, Cheung, Shing-Chi, Terragni, Valerio, Zhu, Hengcheng, and Cao, Jialun
Subjects: Computer Science - Software Engineering
Abstract: While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy. In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, aiming to enable the encoded MRs to be reused with new source inputs. With typically only one pair of source and follow-up inputs available in an MR-encoded test case as the example, we leveraged LLMs to understand the intention of the test case and generate additional examples of source-followup input pairs. This helps to guide the generation of input transformations generalizable to multiple source inputs. Besides, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR-irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations based on encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance the test adequacy, increasing the line coverage and mutation score by 10.62% and 18.91%, respectively.
Comment: This paper is accepted to ASE 2024
Published: 2024
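
The difference between a test that hard-codes its source and follow-up inputs and one that carries an explicit input transformation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the program under test (`sorted`), the `reverse_input` transformation, and the helper names are hypothetical, and the transformation is written by hand rather than deduced the way MR-Adopt does it.

```python
def test_sort_hardcoded():
    """MR-encoded test whose source and follow-up inputs are written out by hand."""
    source_input = [3, 1, 2]
    follow_up_input = [2, 1, 3]
    # Output relation: both inputs must sort to the same list.
    assert sorted(source_input) == sorted(follow_up_input)

def reverse_input(source_input):
    """Explicit input transformation that is consistent with the hard-coded pair."""
    return list(reversed(source_input))

def check_encoded_mr(program, transformation, source_inputs):
    """Reuse the MR on new source inputs and return the inputs that violate it."""
    return [
        source
        for source in source_inputs
        if program(source) != program(transformation(source))
    ]

def first_element_only(values):
    """Hypothetical faulty program: returns just the first element, unsorted."""
    return values[:1]

if __name__ == "__main__":
    new_sources = [[5, 4], [1], [], [9, 7, 8]]
    test_sort_hardcoded()
    print(check_encoded_mr(sorted, reverse_input, new_sources))
    print(check_encoded_mr(first_element_only, reverse_input, new_sources))
```

With the transformation made explicit, the same relation can be exercised on any number of new source inputs instead of the single pair fixed in the original test.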

4. A study of the interactive role of metamorphic testing and machine learning in the quality assurance of a deep learning forecasting application

Author: Nasr, Islam, Nassar, Lobna, and Karray, Fakhri
Published: 2024

5. Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models

Author: Li, Ningke, Li, Yuekang, Liu, Yi, Shi, Ling, Wang, Kailong, and Wang, Haoyu
Subjects: Computer Science - Software Engineering
Abstract: Large language models (LLMs) have transformed the landscape of language processing, yet struggle with significant challenges in terms of security, privacy, and the generation of seemingly coherent but factually inaccurate outputs, commonly referred to as hallucinations. Among these challenges, one particularly pressing issue is Fact-Conflicting Hallucination (FCH), where LLMs generate content that directly contradicts established facts. Tackling FCH poses a formidable task due to two primary obstacles: Firstly, automating the construction and updating of benchmark datasets is challenging, as current methods rely on static benchmarks that don't cover the diverse range of FCH scenarios. Secondly, validating LLM outputs' reasoning process is inherently complex, especially with intricate logical relations involved. In addressing these obstacles, we propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH). Our method gathers data from sources like Wikipedia, expands it with logical reasoning to create diverse test cases, assesses LLMs through structured prompts, and validates their coherence using semantic-aware assessment mechanisms. Our method generates test cases and detects hallucinations across six different LLMs spanning nine domains, revealing hallucination rates ranging from 24.7% to 59.8%. Key observations indicate that LLMs encounter challenges, particularly with temporal concepts, handling out-of-distribution knowledge, and exhibiting deficiencies in logical reasoning capabilities. The outcomes underscore the efficacy of logic-based test cases generated by our tool in both triggering and identifying hallucinations. These findings underscore the imperative for ongoing collaborative endeavors within the community to detect and address LLM hallucinations.
Comment: 29 pages, 11 figures, 4 tables, to appear in OOPSLA'24 (Vol.8, No.OOPSLA2, Article 336)
Published: 2024

6. METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Author: Hyun, Sangwon, Guo, Mingyu, and Babar, M. Ali
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have limited their coverage of QAs and tasks in LLMs and are difficult to extend. Additionally, these studies have only used one evaluation metric, Attack Success Rate (ASR), to assess the effectiveness of their approaches. We propose a MEtamorphic Testing for Analyzing LLMs (METAL) framework to address these issues by applying Metamorphic Testing (MT) techniques. This approach facilitates the systematic testing of LLM qualities by defining Metamorphic Relations (MRs), which serve as modularized evaluation metrics. The METAL framework can automatically generate hundreds of MRs from templates that cover various QAs and tasks. In addition, we introduced novel metrics that integrate the ASR method into the semantic qualities of text to assess the effectiveness of MRs accurately. Through the experiments conducted with three prominent LLMs, we have confirmed that the METAL framework effectively evaluates essential QAs on primary LLM tasks and reveals the quality risks in LLMs. Moreover, the newly proposed metrics can guide the optimal MRs for testing each task and suggest the most effective method for generating MRs.
Comment: Accepted to International Conference on Software Testing, Verification and Validation (ICST) 2024 / Key words: Large-language models, Metamorphic testing, Quality evaluation, Text perturbations
Published: 2023

7. MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation

Author: Wang, Guanyu, Li, Yuekang, Liu, Yi, Deng, Gelei, Li, Tianlin, Xu, Guosheng, Liu, Yang, Wang, Haoyu, and Wang, Kailong
Subjects: Computer Science - Software Engineering
Abstract: Augmented generation techniques such as Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) have revolutionized the field by enhancing large language model (LLM) outputs with external knowledge and cached information. However, the integration of vector databases, which serve as a backbone for these augmentations, introduces critical challenges, particularly in ensuring accurate vector matching. False vector matching in these databases can significantly compromise the integrity and reliability of LLM outputs, leading to misinformation or erroneous responses. Despite the crucial impact of these issues, there is a notable research gap in methods to effectively detect and address false vector matches in LLM-augmented generation. This paper presents MeTMaP, a metamorphic testing framework developed to identify false vector matching in LLM-augmented generation systems. We derive eight metamorphic relations (MRs) from six NLP datasets, which form our method's core, based on the idea that semantically similar texts should match and dissimilar ones should not. MeTMaP uses these MRs to create sentence triplets for testing, simulating real-world LLM scenarios. Our evaluation of MeTMaP over 203 vector matching configurations, involving 29 embedding models and 7 distance metrics, uncovers significant inaccuracies. The results, showing a maximum accuracy of only 41.51% on our tests compared to the original datasets, emphasize the widespread issue of false matches in vector matching methods and the critical need for effective detection and mitigation in LLM-augmented applications.
Published: 2024
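
The triplet idea in this abstract (a similar text should match more closely than a dissimilar one) can be sketched as below. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the sentences and helper names are hypothetical; none of this reproduces MeTMaP's relations or datasets.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts (an assumption)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (non-empty texts only)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

def violates_mr(base, positive, negative):
    """True when the dissimilar text matches at least as closely as the similar one."""
    base_vec = embed(base)
    d_positive = cosine_distance(base_vec, embed(positive))
    d_negative = cosine_distance(base_vec, embed(negative))
    return d_positive >= d_negative

if __name__ == "__main__":
    # Word order flips the meaning, but a bag of words cannot see it.
    print(violates_mr(
        "the cat chased the dog",
        "the dog was chased by the cat",
        "the dog chased the cat",
    ))
    # An unrelated negative is correctly kept further away.
    print(violates_mr(
        "open the window",
        "open the window now",
        "close the door please",
    ))
```

The first triplet is a false match for this toy embedding because swapping subject and object leaves the word counts unchanged.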

8. Word Closure-Based Metamorphic Testing for Machine Translation

Author: Xie, Xiaoyuan, Jin, Shuo, Chen, Songqiang, and Cheung, Shing-Chi
Subjects: Computer Science - Software Engineering
Abstract: With the wide application of machine translation, the testing of Machine Translation Systems (MTSs) has attracted much attention. Recent works apply Metamorphic Testing (MT) to address the oracle problem in MTS testing. Existing MT methods for MTS generally follow the workflow of input transformation and output relation comparison, which generates a follow-up input sentence by mutating the source input and compares the source and follow-up output translations to detect translation errors, respectively. These methods use various input transformations to generate test case pairs and have successfully triggered numerous translation errors. However, they have limitations in performing fine-grained and rigorous output relation comparison and thus may report many false alarms and miss many true errors. In this paper, we propose a word closure-based output comparison method to address the limitations of the existing MTS MT methods. We first propose word closure as a new comparison unit, where each closure includes a group of correlated input and output words in the test case pair. Word closures suggest the linkages between the appropriate fragment in the source output translation and its counterpart in the follow-up output for comparison. Next, we compare the semantics on the level of word closure to identify the translation errors. In this way, we perform a fine-grained and rigorous semantic comparison for the outputs and thus realize more effective violation identification. We evaluate our method with the test cases generated by five existing input transformations and the translation outputs from three popular MTSs. Results show that our method significantly outperforms the existing works in violation identification by improving the precision and recall and achieving an average increase of 29.9% in F1 score. It also helps to increase the F1 score of translation error localization by 35.9%.
Comment: This paper was accepted by ACM Transactions on Software Engineering and Methodology (TOSEM) in June 2024
Published: 2023

9. MetaDetect: Metamorphic Testing Based Anomaly Detection for Multi-UAV Wireless Networks

Author: Yan, Boyang
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Software Engineering, Statistics - Methodology
Abstract: Communication over wireless ad hoc networks (WANETs) is much less reliable than over wired networks. A WANET is affected by node overload, routing protocol, weather, obstacle blockage, and many other factors, and these anomalies cannot be avoided. Accurately predicting in advance that the network will stop entirely is essential, so that operators can re-route traffic or switch to different bands. The present study has two primary goals: first, to design anomaly event detection patterns based on the Metamorphic Testing (MT) methodology; second, to compare the performance of evaluation metrics such as transfer rate, occupancy rate, and the number of packets received. Compared to other studies, the most significant advantages of the approach are its mathematical interpretability and that it does not depend on information about the physical environment, relying only on physical-layer and MAC-layer data. The analysis of the results demonstrates that the proposed MT detection method is helpful for automatically identifying incident/accident events on WANETs. The physical-layer transfer rate metric gave the best performance.
Comment: 9 pages, 7 figures
Published: 2023

10. Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Author: Cheng, Mingfei, Zhou, Yuan, Xie, Xiaofei, Wang, Junjie, Meng, Guozhu, and Yang, Kairui
Subjects: Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Computer Science - Robotics, Computer Science - Software Engineering
Abstract: Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedicated to assessing ADSs' optimal decision-making performance due to the lack of corresponding oracles and the difficulty in generating scenarios with non-optimal decisions. In this paper, we focus on evaluating the decision-making quality of an ADS and propose the first method for detecting non-optimal decision scenarios (NoDSs), where the ADS does not compute optimal paths for AVs. Firstly, to deal with the oracle problem, we propose a novel metamorphic relation (MR) aimed at exposing violations of optimal decisions. The MR identifies the property that the ADS should retain optimal decisions when the optimal path remains unaffected by non-invasive changes. Subsequently, we develop a new framework, Decictor, designed to generate NoDSs efficiently. Decictor comprises three main components: Non-invasive Mutation, MR Check, and Feedback. The Non-invasive Mutation ensures that the original optimal path in the mutated scenarios is not affected, while the MR Check is responsible for determining whether non-optimal decisions are made. To enhance the effectiveness of identifying NoDSs, we design a feedback metric that combines both spatial and temporal aspects of the AV's movement. We evaluate Decictor on Baidu Apollo, an open-source and production-grade ADS. The experimental results validate the effectiveness of Decictor in detecting non-optimal decisions of ADSs. Our work provides valuable and original insights into evaluating the non-safety-critical performance of ADSs.
Published: 2024

11. Metamorphic Testing of Image Captioning Systems via Image-Level Reduction

Author: Xie, Xiaoyuan, Li, Xingpeng, and Chen, Songqiang
Subjects: Computer Science - Software Engineering
Abstract: The Image Captioning (IC) technique is widely used to describe images in natural language. Recently, some IC system testing methods have been proposed. However, these methods still rely on pre-annotated information and hence cannot really alleviate the oracle problem in testing. Besides, these methods artificially manipulate objects, which may generate unreal images as test cases and thus lead to less meaningful testing results. Furthermore, existing methods have various requirements on the eligibility of source test cases, and hence cannot fully utilize the given images to perform testing. To tackle these issues, in this paper, we propose REIC to perform metamorphic testing for IC systems with some image-level reduction transformations like image cropping and stretching. Instead of relying on the pre-annotated information, REIC uses a localization method to align objects in the caption with corresponding objects in the image, and checks whether each object is correctly described or deleted in the caption after transformation. With the image-level reduction transformations, REIC does not artificially manipulate any objects and hence can avoid generating unreal follow-up images. Besides, it eliminates the requirement on the eligibility of source test cases in the metamorphic transformation process, as well as decreases the ambiguity and boosts the diversity among the follow-up test cases, which consequently enables testing to be performed on any test image and reveals more distinct valid violations. We employ REIC to test five popular IC systems. The results demonstrate that REIC can sufficiently leverage the provided test images to generate follow-up cases of good reality, and effectively detect a great number of distinct violations, without the need for any pre-annotated information.
Comment: Accepted by IEEE Transactions on Software Engineering (TSE) in September 2024
Published: 2023

12. Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing

Author: Luu, Quang-Hung, Liu, Huai, and Chen, Tsong Yueh
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: While ChatGPT is a well-known artificial intelligence chatbot being used to answer humans' questions, one may want to discover its potential in advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can be later adopted for implementing tests; but human intelligence should still inevitably be involved to justify and rectify their correctness.
Comment: 4 pages (short communications), 2 figures, 2 tables
Published: 2023

13. Towards a Complete Metamorphic Testing Pipeline

Author: Duque-Torres, Alejandra and Pfahl, Dietmar
Subjects: Computer Science - Software Engineering, D.2.5
Abstract: Metamorphic Testing (MT) addresses the test oracle problem by examining the relationships between input-output pairs in consecutive executions of the System Under Test (SUT). These relations, known as Metamorphic Relations (MRs), specify the expected output changes resulting from specific input changes. However, achieving full automation in generating, selecting, and understanding MR violations poses challenges. Our research aims to develop methods and tools that assist testers in generating MRs, defining constraints, and providing explainability for MR outcomes. In the MR generation phase, we explore automated techniques that utilise a domain-specific language to generate and describe MRs. The MR constraint definition focuses on capturing the nuances of MR applicability by defining constraints. These constraints help identify the specific conditions under which MRs are expected to hold. The evaluation and validation involve conducting empirical studies to assess the effectiveness of the developed methods and validate their applicability in real-world regression testing scenarios. Through this research, we aim to advance the automation of MR generation, enhance the understanding of MR violations, and facilitate their effective application in regression testing.
Comment: 5 pages
Published: 2023

14. An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Author: Wang, Wenxuan, Huang, Jingyuan, Huang, Jen-tse, Chen, Chang, Gu, Jiazhen, He, Pinjia, and Lyu, Michael R.
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: The exponential growth of social media platforms has brought about a revolution in communication and content dissemination in human society. Nevertheless, these platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts in developing and deploying textual and image content moderation methods, malicious users can evade moderation by embedding texts into images, such as screenshots of the text, usually with some interference. We find that modern content moderation software's performance against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study on 5,000 real-world toxic contents collected from 4 popular social media applications, including Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual contents, OASIS can generate image test cases, which preserve the toxicity yet are likely to bypass moderation. In the evaluation, we employ OASIS to test five commercial textual content moderation software from famous companies (i.e., Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud and Tencent Cloud), as well as a state-of-the-art moderation research model. The results show that OASIS achieves up to 100% error finding rates. Moreover, through retraining the models with the test cases generated by OASIS, the robustness of the moderation model can be improved without performance degradation.
Comment: Accepted by ASE 2023. arXiv admin note: substantial text overlap with arXiv:2302.05706
Published: 2023

15. Robustness Evaluation in Hand Pose Estimation Models using Metamorphic Testing

Author: Pu, Muxin, Chong, Chun Yong, and Lim, Mei Kuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Software Engineering
Abstract: Hand pose estimation (HPE) is a task that predicts and describes the hand poses from images or video frames. When HPE models estimate hand poses captured in a laboratory or under controlled environments, they normally deliver good performance. However, the real-world environment is complex, and various uncertainties may happen, which could degrade the performance of HPE models. For example, the hands could be occluded, the visibility of hands could be reduced by imperfect exposure rate, and the contours of hands are prone to blurring during fast hand movements. In this work, we adopt metamorphic testing to evaluate the robustness of HPE models and provide suggestions on the choice of HPE models for different applications. The robustness evaluation was conducted on four state-of-the-art models, namely MediaPipe hands, OpenPose, BodyHands, and NSRM hand. We found that on average more than 80% of the hands could not be identified by BodyHands, and at least 50% of hands could not be identified by MediaPipe hands when diagonal motion blur is introduced, while an average of more than 50% of strongly underexposed hands could not be correctly estimated by NSRM hand. Similarly, applying occlusions on only four hand joints will also largely degrade the performance of these models. The experimental results show that occlusions, illumination variations, and motion blur are the main obstacles to the performance of existing HPE models. These findings may pave the way for researchers to improve the performance and robustness of hand pose estimation models and their applications.
Comment: Accepted at 2023 8th International Workshop on Metamorphic Testing, 8 pages
Published: 2023

16. Metamorphic Testing for Smart Contract Vulnerabilities Detection

Author: Li, Jiahao
Subjects: Computer Science - Software Engineering
Abstract: Despite the rapid growth of smart contracts, they suffer from numerous security vulnerabilities due to the absence of reliable development and testing. In this article, we apply the metamorphic testing technique to detect smart contract vulnerabilities. Based on the anomalies we observed in vulnerable smart contracts, we define five metamorphic relations to detect abnormal gas consumption and account interaction inconsistency of the target smart contract. Through dynamically executing transactions and checking the final violation of metamorphic relations, we determine whether a smart contract is vulnerable. We evaluate our approach on a benchmark of 67 manually annotated smart contracts. The experimental results show that our approach achieves a higher detection rate (TPR, true positive rate) with a lower misreport rate (FDR, false discovery rate) than the other three state-of-the-art tools. These results further suggest that metamorphic testing is a promising method for detecting smart contract vulnerabilities.
Published: 2023

17. Vulnerability detection method for blockchain smart contracts based on metamorphic testing

Author: Jinfu CHEN, Zhenxin WANG, Saihua CAI, Qiaowei FENG, Yuhao CHEN, Rongtian XU, and KwakuKudjo Patrick
Subjects: software testing, blockchain, smart contract, security vulnerability, metamorphic testing, Telecommunication, TK5101-6720
Abstract: To address the shortcomings of existing test methods, a vulnerability detection method for blockchain smart contracts based on metamorphic testing is proposed, which generates test cases for specific functions in blockchain smart contracts to detect possible vulnerabilities. Different metamorphic relations are designed according to the possible security vulnerabilities, and metamorphic testing is then performed. Whether a smart contract has the related security vulnerabilities is judged by verifying whether the metamorphic relation between the source test case and the follow-up test case is satisfied. The experimental results show that the proposed method can effectively detect security vulnerabilities in smart contracts.
Published: 2023

18. Sensitive Region-based Metamorphic Testing Framework using Explainable AI

Author: Torikoshi, Yuma, Nishi, Yasuharu, and Takahashi, Juichi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep Learning (DL) is one of the most popular research topics in machine learning and DL-driven image recognition systems have developed rapidly. Recent research has employed metamorphic testing (MT) to detect misclassified images. Most of them discuss metamorphic relations (MR), with limited attention given to which regions should be transformed. We focus on the fact that there are sensitive regions where even small transformations can easily change the prediction results and propose an MT framework that efficiently tests for regions prone to misclassification by transforming these sensitive regions. Our evaluation demonstrated that the sensitive regions can be specified by Explainable AI (XAI) and our framework effectively detects faults.
Published: 2023

19. Metamorphic testing of machine learning and conceptual hydrologic models

Author: P. Reichert, K. Ma, M. Höge, F. Fenicia, M. Baity-Jesi, D. Feng, and C. Shen
Subjects: Technology, Environmental technology. Sanitary engineering, TD1-1066, Geography. Anthropology. Recreation, Environmental sciences, GE1-350
Abstract: Predicting the response of hydrologic systems to modified driving forces beyond patterns that have occurred in the past is of high importance for estimating climate change impacts or the effect of management measures. This kind of prediction requires a model, but the impossibility of testing such predictions against observed data makes it difficult to estimate their reliability. Metamorphic testing offers a methodology for assessing models beyond validation with real data. It consists of defining input changes for which the expected responses are assumed to be known, at least qualitatively, and testing model behavior for consistency with these expectations. To increase the gain of information and reduce the subjectivity of this approach, we extend this methodology to a multi-model approach and include a sensitivity analysis of the predictions to training or calibration options. This allows us to quantitatively analyze differences in predictions between different model structures and calibration options in addition to the qualitative test of the expectations. In our case study, we apply this approach to selected conceptual and machine learning hydrological models calibrated for basins from the CAMELS data set. Our results confirm the superiority of the machine learning models over the conceptual hydrologic models regarding the quality of fit during calibration and validation periods. However, we also find that the response of machine learning models to modified inputs can deviate from the expectations and the magnitude, and even the sign of the response can depend on the training data. In addition, even in cases in which all models passed the metamorphic test, there are cases in which the quantitative response is different for different model structures. This demonstrates the importance of this kind of testing beyond and in addition to the usual calibration–validation analysis to identify potential problems and stimulate the development of improved models.
Published: 2024
Full Text: View/download PDF

20. MTGP: Combining Metamorphic Testing and Genetic Programming

Author: Sobania, Dominik, Briesch, Martin, Röchner, Philipp, and Rothlauf, Franz
Subjects: Computer Science - Software Engineering, Computer Science - Neural and Evolutionary Computing
Abstract: Genetic programming is an evolutionary approach known for its performance in program synthesis. However, it is not yet mature enough for a practical use in real-world software development, since usually many training cases are required to generate programs that generalize to unseen test cases. As in practice, the training cases have to be expensively hand-labeled by the user, we need an approach to check the program behavior with a lower number of training cases. Metamorphic testing needs no labeled input/output examples. Instead, the program is executed multiple times, first on a given (randomly generated) input, followed by related inputs to check whether certain user-defined relations between the observed outputs hold. In this work, we suggest MTGP, which combines metamorphic testing and genetic programming and study its performance and the generalizability of the generated programs. Further, we analyze how the generalizability depends on the number of given labeled training cases. We find that using metamorphic testing combined with labeled training cases leads to a higher generalization rate than the use of labeled training cases alone in almost all studied configurations. Consequently, we recommend researchers to use metamorphic testing in their systems if the labeling of the training data is expensive.
Published: 2023
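
The abstract above describes the basic mechanics of metamorphic testing: run the program on a randomly generated input, then on related inputs, and check user-defined relations between the outputs without any labeled expected values. A minimal sketch of that loop follows. The program under test (`candidate_sum`), the two relations, and the helper `metamorphic_check` are all hypothetical names chosen for illustration; they are not taken from MTGP.

```python
import random


def candidate_sum(xs):
    """Hypothetical program under test: sums a list of integers."""
    total = 0
    for x in xs:
        total += x
    return total


def mr_permutation(program, xs, rng):
    """Relation: shuffling the input must not change the output."""
    shuffled = xs[:]
    rng.shuffle(shuffled)
    return program(xs) == program(shuffled)


def mr_append(program, xs, rng):
    """Relation: appending a value must raise the output by that value."""
    extra = rng.randint(-100, 100)
    return program(xs + [extra]) == program(xs) + extra


def metamorphic_check(program, relations, trials=200, seed=0):
    """Run every relation on random source inputs; collect violations."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        for relation in relations:
            if not relation(program, xs, rng):
                violations.append((relation.__name__, xs))
    return violations
```

Calling `metamorphic_check(candidate_sum, [mr_permutation, mr_append])` returns an empty list, while a faulty program such as one that returns only the first element is flagged, even though no expected output was ever written down.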

21. MTTM: Metamorphic Testing for Textual Content Moderation Software

Author: Wang, Wenxuan, Huang, Jen-tse, Wu, Weibin, Zhang, Jianping, Huang, Yizhan, Li, Shuqing, He, Pinjia, and Lyu, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: The exponential growth of social media platforms such as Twitter and Facebook has revolutionized textual communication and textual content publication in human society. However, they have been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisement, and pornography, which can lead to highly negative impacts (e.g., harmful effects on teen mental health). Researchers and practitioners have been enthusiastically developing and extensively deploying textual content moderation software to address this problem. However, we find that malicious users can evade moderation by changing only a few words in the toxic content. Moreover, modern content moderation software performance against malicious inputs remains underexplored. To this end, we propose MTTM, a Metamorphic Testing framework for Textual content Moderation software. Specifically, we conduct a pilot study on 2,000 text messages collected from real users and summarize eleven metamorphic relations across three perturbation levels: character, word, and sentence. MTTM employs these metamorphic relations on toxic textual contents to generate test cases, which are still toxic yet likely to evade moderation. In our evaluation, we employ MTTM to test three commercial textual content moderation software and two state-of-the-art moderation algorithms against three kinds of toxic content. The results show that MTTM achieves up to 83.9%, 51%, and 82.5% error finding rates (EFR) when testing commercial moderation software provided by Google, Baidu, and Huawei, respectively, and it obtains up to 91.2% EFR when testing the state-of-the-art algorithms from the academy. In addition, we leverage the test cases generated by MTTM to retrain the model we explored, which largely improves model robustness (0% to 5.9% EFR) while maintaining the accuracy on the original test set., Comment: Accepted by ICSE 2023
Published: 2023

22. Metamorphic Testing in Autonomous System Simulations

Author: Adigun, Jubril Gbolahan, Eisele, Linus, and Felderer, Michael
Subjects: Computer Science - Software Engineering, Computer Science - Multiagent Systems, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: Metamorphic testing has proven to be effective for test case generation and fault detection in many domains. It is a software testing strategy that uses certain relations between input-output pairs of a program, referred to as metamorphic relations. This approach is relevant in the autonomous systems domain since it helps in cases where the outcome of a given test input may be difficult to determine. In this paper therefore, we provide an overview of metamorphic testing as well as an implementation in the autonomous systems domain. We implement an obstacle detection and avoidance task in autonomous drones utilising the GNC API alongside a simulation in Gazebo. Particularly, we describe properties and best practices that are crucial for the development of effective metamorphic relations. We also demonstrate two metamorphic relations for metamorphic testing of single and more than one drones, respectively. Our relations reveal several properties and some weak spots of both the implementation and the avoidance algorithm in the light of metamorphic testing. The results indicate that metamorphic testing has great potential in the autonomous systems domain and should be considered for quality assurance in this field., Comment: 8 pages, 5 figures, 48th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA)
Published: 2022

23. Application of property-based testing tools for metamorphic testing

Author: Alzahrani, Nasser, Spichkova, Maria, and Harland, James
Subjects: Computer Science - Software Engineering, Computer Science - Formal Languages and Automata Theory
Abstract: Metamorphic testing (MT) is a general approach for the testing of a specific kind of software systems -- so-called "non-testable", where the "classical" testing approaches are difficult to apply. MT is an effective approach for addressing the test oracle problem and test case generation problem. The test oracle problem is when it is difficult to determine the correct expected output of a particular test case or to determine whether the actual outputs agree with the expected outcomes. The core concept in MT is metamorphic relations (MRs) which provide formal specification of the system under test. One of the challenges in MT is effective test generation. Property-based testing (PBT) is a testing methodology in which test cases are generated according to desired properties of the software. In some sense, MT can be seen as a very specific kind of PBT. In this paper, we show how to use PBT tools to automate test generation and verification of MT. In addition to automation benefit, the proposed method shows how to combine general PBT with MT under the same testing framework., Comment: Preprint. Accepted to the 17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2022). Final version published by SCITEPRESS, http://www.scitepress.org
Published: 2022
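
The view that a metamorphic relation is a special kind of property can be illustrated with a very small property checker. The sketch below is an assumption-laden stand-in, not the tooling used in the paper: it uses only the Python standard library, and `for_all`, `angles`, `sine_symmetry`, and `truncated_sine` are hypothetical names. The relation checked is the well-known identity sin(x) = sin(pi - x).

```python
import math
import random


def for_all(generate, prop, trials=500, seed=0):
    """Return the first generated input that falsifies prop, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def angles(rng):
    """Input generator: a random angle in radians."""
    return rng.uniform(-10.0, 10.0)


def sine_symmetry(sine):
    """Build the property sin(x) == sin(pi - x) for an implementation."""
    def prop(x):
        return math.isclose(sine(x), sine(math.pi - x), abs_tol=1e-9)
    return prop


def truncated_sine(x):
    """Hypothetical faulty implementation: two-term Taylor series."""
    return x - x ** 3 / 6
```

Here `for_all(angles, sine_symmetry(math.sin))` finds no counterexample, whereas `for_all(angles, sine_symmetry(truncated_sine))` returns a falsifying angle. A real PBT tool would add input shrinking and richer generators on top of this same generate-and-check structure.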

24. Unveiling Hidden DNN Defects with Decision-Based Metamorphic Testing

Author: Yuan, Yuanyuan, Pang, Qi, and Wang, Shuai
Subjects: Computer Science - Software Engineering
Abstract: Contemporary DNN testing works are frequently conducted using metamorphic testing (MT). In general, de facto MT frameworks mutate DNN input images using semantics-preserving mutations and determine if DNNs can yield consistent predictions. Nevertheless, we find that DNNs may rely on erroneous decisions (certain components on the DNN inputs) to make predictions, which may still retain the outputs by chance. Such DNN defects would be neglected by existing MT frameworks. Erroneous decisions, however, would likely result in successive mis-predictions over diverse images that may exist in real-life scenarios. This research aims to unveil the pervasiveness of hidden DNN defects caused by incorrect DNN decisions (but retaining consistent DNN predictions). To do so, we tailor and optimize modern eXplainable AI (XAI) techniques to identify visual concepts that represent regions in an input image upon which the DNN makes predictions. Then, we extend existing MT-based DNN testing frameworks to check the consistency of DNN decisions made over a test input and its mutated inputs. Our evaluation shows that existing MT frameworks are oblivious to a considerable number of DNN defects caused by erroneous decisions. We conduct human evaluations to justify the validity of our findings and to elucidate their characteristics. Through the lens of DNN decision-based metamorphic relations, we re-examine the effectiveness of metamorphic transformations proposed by existing MT frameworks. We summarize lessons from this study, which can provide insights and guidelines for future DNN testing., Comment: The extended version of a paper to appear in the Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, (ASE '22), 13 pages
Published: 2022

25. A Sequential Metamorphic Testing Framework for Understanding Automated Driving Systems

Author: Luu, Quang-Hung, Liu, Huai, Chen, Tsong Yueh, and Vu, Hai L.
Subjects: Computer Science - Software Engineering
Abstract: Automated driving systems (ADS) are expected to be reliable and robust against a wide range of driving scenarios. Their decisions, first and foremost, must be well understood. Understanding a decision made by ADS is a great challenge, because it is not straightforward to tell whether the decision is correct or not, and how to verify it systematically. In this paper, a Sequential MetAmoRphic Testing Smart framework is proposed based on metamorphic testing, a mainstream software testing approach. In metamorphic testing, metamorphic groups are constructed by selecting multiple inputs according to the so-called metamorphic relations, which are basically the system's necessary properties; the violation of certain relations by some corresponding metamorphic groups implies the detection of erroneous system behaviors. The proposed framework makes use of sequences of metamorphic groups to understand ADS behaviors, and is applicable without the need of ground-truth datasets. To demonstrate its effectiveness, the framework is applied to test three ADS models that steer an autonomous car in different scenarios with another car either leading in front or approaching in the opposite direction. The conducted experiments reveal a large number of undesirable behaviors in these top-ranked deep learning models in the scenarios. These counter-intuitive behaviors are associated with how the core models of ADS respond to different positions, directions and properties of the other car in its proximity. Further analysis of the results helps identify critical factors affecting ADS decisions and thus demonstrates that the framework can be used to provide a comprehensive understanding of ADS before their deployment, Comment: 11 pages, 6 figures, 3 tables
Published: 2022

26. Metamorphic Testing for Recommender Systems

Author: Iakusheva, Sofia and Khritankov, Anton
Published: 2024
Full Text: View/download PDF

27. An empirical study on metamorphic testing for recommender systems

Author: Mao, Chengying, Chen, Jifu, Yi, Xiaorong, and Wen, Linlin
Published: 2024
Full Text: View/download PDF

28. MorphQ++: A Reproducibility Study of Metamorphic Testing on Quantum Compilers.

Author: Linsey J. Kitt and Myra B. Cohen
Published: 2024
Full Text: View/download PDF

29. MT4SC: A User-Behavior-Sequence-Aware Metamorphic Testing Approach for Smart Contracts.

Author: Yuan-rui Ji, Chang-Ai Sun, Xin-hui Zheng, and Huai Liu
Published: 2024
Full Text: View/download PDF

30. MTSA: Multi-Granularity Metamorphic Testing for Sentiment Analysis Systems.

Author: Honghao Huang, Ya Pan, Lan Luo, and Yinfeng Wang
Published: 2024
Full Text: View/download PDF

31. Enhancing ADS Testing: An Open Educational Resource for Metamorphic Testing.

Author: Yifan Zhang, Dave Towey, Matthew Pike, Zhi Quan Zhou, and Tsong Yueh Chen
Published: 2024
Full Text: View/download PDF

32. MT-PART: Metamorphic-Testing-Based Adaptive Random Testing Through Partitioning.

Author: Zhihao Ying, Dave Towey, Tsong Yueh Chen, and Zhi Quan Zhou
Published: 2024
Full Text: View/download PDF

33. Metamorphic Testing of an Autonomous Delivery Robots Scheduler.

Author: Thomas Laurent, Paolo Arcaini, Xiaoyi Zhang, and Fuyuki Ishikawa
Published: 2024
Full Text: View/download PDF

34. METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities.

Author: Sangwon Hyun, Mingyu Guo, and Muhammad Ali Babar
Published: 2024
Full Text: View/download PDF

35. Verifying Embedded Graphics Libraries leveraging Virtual Prototypes and Metamorphic Testing.

Author: Christoph Hazott, Florian Stögmüller, and Daniel Große
Published: 2024
Full Text: View/download PDF

36. MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation.

Author: Guanyu Wang, Yuekang Li, Yi Liu, Gelei Deng, Tianlin Li, Guosheng Xu, Yang Liu, Haoyu Wang, and Kailong Wang
Published: 2024
Full Text: View/download PDF

37. Word Closure-Based Metamorphic Testing for Machine Translation

Author: Xie, Xiaoyuan, Jin, Shuo, Chen, Songqiang, and Cheung, Shing Chi
Abstract: With the wide application of machine translation, the testing of Machine Translation Systems (MTSs) has attracted much attention. Recent works apply Metamorphic Testing (MT) to address the oracle problem in MTS testing. Existing MT methods for MTS generally follow the workflow of input transformation and output relation comparison, which generates a follow-up input sentence by mutating the source input and compares the source and follow-up output translations to detect translation errors, respectively. These methods use various input transformations to generate the test case pairs and have successfully triggered numerous translation errors. However, they have limitations in performing fine-grained and rigorous output relation comparison and thus may report many false alarms and miss many true errors. In this paper, we propose a word closure-based output comparison method to address the limitations of the existing MTS MT methods. We first propose word closure as a new comparison unit, where each closure includes a group of correlated input and output words in the test case pair. Word closures suggest the linkages between the appropriate fragment in the source output translation and its counterpart in the follow-up output for comparison. Next, we compare the semantics on the level of word closure to identify the translation errors. In this way, we perform a fine-grained and rigorous semantic comparison for the outputs and thus realize more effective violation identification. We evaluate our method with the test cases generated by five existing input transformations and the translation outputs from three popular MTSs. Results show that our method significantly outperforms the existing works in violation identification by improving the precision and recall and achieving an average increase of 29.9% in F1 score. It also helps to increase the F1 score of translation error localization by 35.9%.
Published: 2024

38. HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models

Author: Li, Ningke, Li, Yuekang, Liu, Yi, Shi, Ling, Wang, Kailong, and Wang, Haoyu
Abstract: Large language models (LLMs) have transformed the landscape of language processing, yet struggle with significant challenges in terms of security, privacy, and the generation of seemingly coherent but factually inaccurate outputs, commonly referred to as hallucinations. Among these challenges, one particularly pressing issue is Fact-Conflicting Hallucination (FCH), where LLMs generate content that directly contradicts established facts. Tackling FCH poses a formidable task due to two primary obstacles: Firstly, automating the construction and updating of benchmark datasets is challenging, as current methods rely on static benchmarks that don't cover the diverse range of FCH scenarios. Secondly, validating LLM outputs' reasoning process is inherently complex, especially with intricate logical relations involved. In addressing these obstacles, we propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH). Our method gathers data from sources like Wikipedia, expands it with logical reasoning to create diverse test cases, assesses LLMs through structured prompts, and validates their coherence using semantic-aware assessment mechanisms. Our method generates test cases and detects hallucinations across six different LLMs spanning nine domains, revealing hallucination rates ranging from 24.7% to 59.8%. Key observations indicate that LLMs encounter challenges, particularly with temporal concepts, handling out-of-distribution knowledge, and exhibiting deficiencies in logical reasoning capabilities. The outcomes underscore the efficacy of logic-based test cases generated by our tool in both triggering and identifying hallucinations. These findings underscore the imperative for ongoing collaborative endeavors within the community to detect and address LLM hallucinations.
Published: 2024

39. Metamorphic Testing for Web System Security

Author: Chaleshtari, Nazanin Bayati, Pastore, Fabrizio, Goknil, Arda, and Briand, Lionel C.
Subjects: Computer Science - Software Engineering
Abstract: Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system with a very large set of inputs. Full automation is thus required to lower costs and increase the effectiveness of security testing. Unfortunately, to achieve such automation, in addition to strategies for automatically deriving test inputs, we need to address the oracle problem, which refers to the challenge, given an input for a system, of distinguishing correct from incorrect behavior. In this paper, we propose Metamorphic Security Testing for Web-interactions (MST-wi), a metamorphic testing approach that integrates test input generation strategies inspired by mutational fuzzing and alleviates the oracle problem in security testing. It enables engineers to specify metamorphic relations (MRs) that capture many security properties of Web systems. To facilitate the specification of such MRs, we provide a domain-specific language accompanied by an Eclipse editor. MST-wi automatically collects the input data and transforms the MRs into executable Java code to automatically perform security testing. It automatically tests Web systems to detect vulnerabilities based on the relations and collected data. We provide a catalog of 76 system-agnostic MRs to automate security testing in Web systems. It covers 39% of the OWASP security testing activities not automated by state-of-the-art techniques; further, our MRs can automatically discover 102 different types of vulnerabilities, which correspond to 45% of the vulnerabilities due to violations of security design principles according to the MITRE CWE database. We also define guidelines that enable test engineers to improve the testability of the system under test with respect to our approach., Comment: arXiv admin note: text overlap with arXiv:1912.05278
Published: 2022

40. Testing Ocean Software with Metamorphic Testing

Author: Luu, Quang-Hung, Liu, Huai, Chen, Tsong Yueh, and Vu, Hai L.
Subjects: Computer Science - Software Engineering
Abstract: Advancing ocean science has a significant impact to the development of the world, from operating a safe navigation for vessels to maintaining a healthy and diverse ocean ecosystem. Various ocean software systems have been extensively adopted for different purposes, for instance, predicting hourly sea level elevation across shorelines, simulating large-scale ocean circulations, as well as integrating into Earth system models for weather forecasts and climate projections. Regardless of their significance, guaranteeing the trustworthiness of ocean software and modelling systems is a long-standing challenge. The testing of ocean software suffers a lot from the so-called oracle problem, which refers to the absence of test oracles mainly due to the nonlinear interactions of multiple physical variables and the high complexity in computation. In the ocean, observed tidal signals are distorted by non-deterministic physical variables, hindering us from knowing the "true" astronomical tidal constituents existing in the timeseries. In this paper, we present how to test tidal analysis and prediction (TAP) software based on metamorphic testing (MT), a simple yet effective testing approach to the oracle problem. In particular, we construct metamorphic relations from the periodic property of astronomical tide, and then use them to successfully detect a real-life defect in an open-source TAP software. We also conduct a series of experiments to further demonstrate the applicability and effectiveness of MT in the testing of TAP software. Our study not only justifies the potential of MT in testing more complex ocean software and modelling systems, but also can be expanded to assess and improve the quality of a broader range of scientific simulation software systems., Comment: 7 pages, 3 tables
Published: 2022
Full Text: View/download PDF
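
The abstract above derives metamorphic relations from the periodic property of astronomical tide. A rough sketch of one such relation: if a prediction is a sum of harmonic constituents, shifting time by a common multiple of their periods should leave the predicted elevation unchanged. The constituent amplitudes, periods, and phases below are illustrative values, and `predict_tide`, `periodicity_holds`, and `buggy_predict` are hypothetical names; this is not the relation set or the software examined in the paper.

```python
import math

# Illustrative constituents: (amplitude in m, period in hours, phase in rad).
CONSTITUENTS = [(1.0, 12.0, 0.3), (0.4, 24.0, 1.1)]
COMMON_PERIOD_H = 24.0  # a multiple of every period listed above


def predict_tide(t_hours, constituents=CONSTITUENTS):
    """Toy harmonic prediction: sum of cosine constituents at time t."""
    return sum(
        amp * math.cos(2 * math.pi * t_hours / period - phase)
        for amp, period, phase in constituents
    )


def periodicity_holds(predict, t_hours, shift=COMMON_PERIOD_H, tol=1e-9):
    """Relation: predict(t) and predict(t + shift) must agree."""
    return math.isclose(predict(t_hours), predict(t_hours + shift), abs_tol=tol)


def buggy_predict(t_hours):
    """Seeded fault: the first constituent uses a wrong period (12.5 h)."""
    return predict_tide(t_hours, [(1.0, 12.5, 0.3), (0.4, 24.0, 1.1)])
```

Checking `periodicity_holds(predict_tide, t)` over a range of times passes, while the same check on `buggy_predict` fails, which exposes the wrong period without any reference tide data.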

41. MorphQ: Metamorphic Testing of the Qiskit Quantum Computing Platform

Author: Paltenghi, Matteo and Pradel, Michael
Subjects: Computer Science - Software Engineering, D.2.5
Abstract: As quantum computing is becoming increasingly popular, the underlying quantum computing platforms are growing both in ability and complexity. Unfortunately, testing these platforms is challenging due to the relatively small number of existing quantum programs and because of the oracle problem, i.e., a lack of specifications of the expected behavior of programs. This paper presents MorphQ, the first metamorphic testing approach for quantum computing platforms. Our two key contributions are (i) a program generator that creates a large and diverse set of valid (i.e., non-crashing) quantum programs, and (ii) a set of program transformations that exploit quantum-specific metamorphic relationships to alleviate the oracle problem. Evaluating the approach by testing the popular Qiskit platform shows that the approach creates over 8k program pairs within two days, many of which expose crashes. Inspecting the crashes, we find 13 bugs, nine of which have already been confirmed. MorphQ widens the slim portfolio of testing techniques of quantum computing platforms, helping to create a reliable software stack for this increasingly important field., Comment: Accepted as full paper in the technical track of ICSE 2023
Published: 2022
Full Text: View/download PDF

42. Metamorphic Testing-based Adversarial Attack to Fool Deepfake Detectors

Author: Lim, Nyee Thoang, Kuan, Meng Yi, Pu, Muxin, Lim, Mei Kuan, and Chong, Chun Yong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deepfakes utilise Artificial Intelligence (AI) techniques to create synthetic media where the likeness of one person is replaced with another. There are growing concerns that deepfakes can be maliciously used to create misleading and harmful digital contents. As deepfakes become more common, there is a dire need for deepfake detection technology to help spot deepfake media. Present deepfake detection models are able to achieve outstanding accuracy (>90%). However, most of them are limited to within-dataset scenario, where the same dataset is used for training and testing. Most models do not generalise well enough in cross-dataset scenario, where models are tested on unseen datasets from another source. Furthermore, state-of-the-art deepfake detection models rely on neural network-based classification models that are known to be vulnerable to adversarial attacks. Motivated by the need for a robust deepfake detection model, this study adapts metamorphic testing (MT) principles to help identify potential factors that could influence the robustness of the examined model, while overcoming the test oracle problem in this domain. Metamorphic testing is specifically chosen as the testing technique as it fits our demand to address learning-based system testing with probabilistic outcomes from largely black-box components, based on potentially large input domains. We performed our evaluations on MesoInception-4 and TwoStreamNet models, which are the state-of-the-art deepfake detection models. This study identified makeup application as an adversarial attack that could fool deepfake detectors. Our experimental results demonstrate that both the MesoInception-4 and TwoStreamNet models degrade in their performance by up to 30% when the input data is perturbed with makeup., Comment: paper accepted at 26th International Conference on Pattern Recognition (ICPR2022)
Published: 2022

43. Fairness Evaluation in Deepfake Detection Models using Metamorphic Testing

Author: Pu, Muxin, Kuan, Meng Yi, Lim, Nyee Thoang, Chong, Chun Yong, and Lim, Mei Kuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Software Engineering
Abstract: Fairness of deepfake detectors in the presence of anomalies are not well investigated, especially if those anomalies are more prominent in either male or female subjects. The primary motivation for this work is to evaluate how deepfake detection model behaves under such anomalies. However, due to the black-box nature of deep learning (DL) and artificial intelligence (AI) systems, it is hard to predict the performance of a model when the input data is modified. Crucially, if this defect is not addressed properly, it will adversely affect the fairness of the model and result in discrimination of certain sub-population unintentionally. Therefore, the objective of this work is to adopt metamorphic testing to examine the reliability of the selected deepfake detection model, and how the transformation of input variation places influence on the output. We have chosen MesoInception-4, a state-of-the-art deepfake detection model, as the target model and makeup as the anomalies. Makeups are applied through utilizing the Dlib library to obtain the 68 facial landmarks prior to filling in the RGB values. Metamorphic relations are derived based on the notion that realistic perturbations of the input images, such as makeup, involving eyeliners, eyeshadows, blushes, and lipsticks (which are common cosmetic appearance) applied to male and female images, should not alter the output of the model by a huge margin. Furthermore, we narrow down the scope to focus on revealing potential gender biases in DL and AI systems. Specifically, we are interested to examine whether MesoInception-4 model produces unfair decisions, which should be considered as a consequence of robustness issues. The findings from our work have the potential to pave the way for new research directions in the quality assurance and fairness in DL and AI systems., Comment: 8 pages, accepted at 7th International Workshop on Metamorphic Testing (MET22)
Published: 2022

44. Metamorphic Testing and Debugging of Tax Preparation Software

Author: Tizpaz-Niari, Saeid, Monjezi, Verya, Wagner, Morgan, Darian, Shiva, Reed, Krystia, and Trivedi, Ashutosh
Subjects: Computer Science - Software Engineering, Computer Science - Computers and Society
Abstract: This paper presents a data-driven framework to improve the trustworthiness of US tax preparation software systems. Given the legal implications of bugs in such software on its users, ensuring compliance and trustworthiness of tax preparation software is of paramount importance. The key barriers in developing debugging aids for tax preparation systems are the unavailability of explicit specifications and the difficulty of obtaining oracles. We posit that, since the US tax law adheres to the legal doctrine of precedent, the specifications about the outcome of tax preparation software for an individual taxpayer must be viewed in comparison with individuals that are deemed similar. Consequently, these specifications are naturally available as properties on the software requiring similar inputs provide similar outputs. Inspired by the metamorphic testing paradigm, we dub these relations metamorphic relations. In collaboration with legal and tax experts, we explicated metamorphic relations for a set of challenging properties from various US Internal Revenue Services (IRS) publications including Publication 596 (Earned Income Tax Credit), Schedule 8812 (Qualifying Children/Other Dependents), and Form 8863 (Education Credits). We focus on an open-source tax preparation software for our case study and develop a randomized test-case generation strategy to systematically validate the correctness of tax preparation software guided by metamorphic relations. We further aid this test-case generation by visually explaining the behavior of software on suspicious instances using easy-to-interpret decision-tree models. Our tool uncovered several accountability bugs with varying severity ranging from non-robust behavior in corner-cases (unreliable behavior when tax returns are close to zero) to missing eligibility conditions in the updated versions of software., Comment: 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)
Published: 2022

45. Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective

Author: Manino, Edoardo, Rozanova, Julia, Carvalho, Danilo, Freitas, Andre, and Cordeiro, Lucas
Subjects: Computer Science - Computation and Language
Abstract: Metamorphic testing has recently been used to check the safety of neural NLP models. Its main advantage is that it does not rely on a ground truth to generate test cases. However, existing studies are mostly concerned with robustness-like metamorphic relations, limiting the scope of linguistic properties they can test. We propose three new classes of metamorphic relations, which address the properties of systematicity, compositionality and transitivity. Unlike robustness, our relations are defined over multiple source inputs, thus increasing the number of test cases that we can produce by a polynomial factor. With them, we test the internal consistency of state-of-the-art NLP models, and show that they do not always behave according to their expected linguistic properties. Lastly, we introduce a novel graphical notation that efficiently summarises the inner structure of metamorphic relations., Comment: Findings of the Association for Computational Linguistics 2022
Published: 2022
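
The abstract above notes that relations defined over multiple source inputs multiply the number of test cases by a polynomial factor. A transitivity relation shows why: n source sentences yield n(n-1)(n-2) ordered triples. The sketch below checks transitivity of a pairwise decision. Both `subset_entails` and `overlap_entails` are hypothetical toy stand-ins for an NLP model, and `transitivity_violations` is an illustrative helper, not the authors' implementation.

```python
from itertools import permutations


def subset_entails(premise, hypothesis):
    """Toy model: every word of the hypothesis occurs in the premise."""
    return set(hypothesis.split()) <= set(premise.split())


def overlap_entails(premise, hypothesis):
    """Toy faulty model: any shared word counts as entailment."""
    return bool(set(premise.split()) & set(hypothesis.split()))


def transitivity_violations(model, sentences):
    """Find triples where a->b and b->c hold but a->c does not."""
    violations = []
    for a, b, c in permutations(sentences, 3):
        if model(a, b) and model(b, c) and not model(a, c):
            violations.append((a, b, c))
    return violations
```

With four source sentences the check already covers 24 triples. The subset-based toy model is transitive by construction and produces no violations, while the overlap-based one can be caught with sentences such as `"red car"`, `"car park"`, and `"park bench"`.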

46. Ontology-based metamorphic testing for chatbots

Author: Božić, Josip
Published: 2022
Full Text: View/download PDF

47. MetaLiDAR: Automated metamorphic testing of LiDAR‐based autonomous driving systems.

Author: Yang, Zhen, Huang, Song, Zheng, Changyou, Wang, Xingya, Wang, Yang, and Xia, Chunyan
Subjects: *POINT cloud, *ARTIFICIAL intelligence, *DRIVERLESS cars, *TEST methods, *AUTONOMOUS vehicles
Abstract: Recent advances in artificial intelligence technology and perception components have promoted the rapid development of autonomous vehicles. However, as safety‐critical software, autonomous driving systems often make wrong judgments, seriously threatening human and property safety. LiDAR is one of the most critical sensors in autonomous vehicles, capable of accurately perceiving the three‐dimensional information of the environment. Nevertheless, the high cost of manually collecting and labeling point cloud data leads to a dearth of testing methods for LiDAR‐based perception modules. To bridge the critical gap, we introduce MetaLiDAR, a novel automated metamorphic testing methodology for LiDAR‐based autonomous driving systems. First, we propose three object‐level metamorphic relations for the domain characteristics of autonomous driving systems. Next, we design three transformation modules so that MetaLiDAR can generate natural‐looking follow‐up point clouds. Finally, we define corresponding evaluation metrics based on metamorphic relations. MetaLiDAR automatically determines whether source and follow‐up test cases meet the metamorphic relations based on the evaluation metrics. Our empirical research on five state‐of‐the‐art LiDAR‐based object detection models shows that MetaLiDAR can not only generate natural‐looking test point clouds to detect 181,547 inconsistent behaviors of different models but also significantly enhance the robustness of models by retraining with synthetic point clouds. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Uncertainty-Based Metamorphic Testing for Validating Plagiarism Detection Systems

Author: Chan, Pak Yuen Patrick, Keung, Jacky, and Yang, Zhen
Published: 2023
Full Text: View/download PDF

49. Metamorphic testing of chess engines

Author: Méndez, Manuel, Benito-Parejo, Miguel, Ibias, Alfredo, and Núñez, Manuel
Published: 2023
Full Text: View/download PDF

50. Application of metamorphic testing on UAV path planning software

Author: Wu, Lvyuan, Xi, Zhiyu, Zheng, Zheng, and Li, Xiaoli
Published: 2023
Full Text: View/download PDF
