838 results for "causal discovery"
Search Results
2. Discovering causal models for structural, construction and defense-related engineering phenomena
- Author
- Naser, M.Z.
- Published
- 2025
3. Using GPT-4 to guide causal machine learning
- Author
- Constantinou, Anthony C., Kitson, Neville K., and Zanga, Alessio
- Published
- 2025
4. Discovering the effective connectome of the brain with dynamic Bayesian DAG learning
- Author
- Bagheri, Abdolmahdi, Pasande, Mohammad, Bello, Kevin, Araabi, Babak Nadjar, and Akhondi-Asl, Alireza
- Published
- 2024
5. Combination of Process Mining and Causal Discovery Generated Graph Models for Comprehensive Process Modeling
- Author
- Hennebold, Christoph, Islam, Muhammad M., Krauß, Jonas, and Huber, Marco F.
- Published
- 2024
6. ETIA: Towards an Automated Causal Discovery Pipeline
- Author
- Biza, Konstantina, Ntroumpogiannis, Antonios, Triantafillou, Sofia, Tsamardinos, Ioannis, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pedreschi, Dino, editor, Monreale, Anna, editor, Guidotti, Riccardo, editor, Pellungrini, Roberto, editor, and Naretto, Francesca, editor
- Published
- 2025
7. Copula Entropy Based Causal Network Discovery from Non-stationary Time Series
- Author
- Yang, Jing, Rao, Xinzhi, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
8. Efficient Nonlinear DAG Learning Under Projection Framework
- Author
- Yin, Naiyu, Yu, Yue, Gao, Tian, Ji, Qiang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
9. Causal Behavior Pattern Inference for News Recommendation Through Multi-interest Matching
- Author
- Chen, Xingming, Fan, Wenqi, Li, Qing, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barhamgi, Mahmoud, editor, Wang, Hua, editor, and Wang, Xin, editor
- Published
- 2025
10. Regularized Multi-LLMs Collaboration for Enhanced Score-Based Causal Discovery
- Author
- Li, Xiaoxuan, Liu, Yao, Wang, Ruoyu, Yao, Lina, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barhamgi, Mahmoud, editor, Wang, Hua, editor, and Wang, Xin, editor
- Published
- 2025
11. Improving Reinforcement Learning-Based Autonomous Agents with Causal Models
- Author
- Briglia, Giovanni, Lippi, Marco, Mariani, Stefano, Zambonelli, Franco, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Arisaka, Ryuta, editor, Sanchez-Anguix, Victor, editor, Stein, Sebastian, editor, Aydoğan, Reyhan, editor, van der Torre, Leon, editor, and Ito, Takayuki, editor
- Published
- 2025
12. Evaluation Criteria for Causal Discovery Without Ground-Truth Graphs
- Author
- Wang, Lei, Huang, Shanshan, Jun, Liao, Liu, Li, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhou, Xiao-Hua, editor, and Jia, Jinzhu, editor
- Published
- 2025
13. Gradient-based causal discovery with latent variables.
- Author
- Ni, Haotian, Wang, Tian-Zuo, Tao, Hong, Huang, Xiuqi, and Hou, Chenping
- Abstract
Discovering causal graphs from observational data is a challenging problem, which has garnered significant attention due to its crucial role in understanding causal relationships. In recent advancements, this problem is cast as a continuous optimization task with structural constraints, through which the great power of gradient-based methods can be exploited to address the causal discovery problem. Despite their statistical validity, these approaches return causal graphs with spurious edges in the presence of latent variables. In this paper, we generalize the gradient-based method to accommodate the existence of latent confounders and latent intermediate variables. Specifically, we propose a causal discovery method based on latent variable reconstruction. This method primarily consists of two stages. In the first stage, we propose a series of causal models that include latent variables, which can be applied to different data assumptions. However, due to the influence of latent variables, the causal graph inevitably contains reversed edges. In light of this fact, we propose a method to correct these reversed edges in the second stage via a variational autoencoder. Theoretical results show that under some mild conditions, our method can correctly identify the causal relations. Experiments on both synthetic and real datasets demonstrate the superiority of our method to existing gradient-based learning algorithms in the presence of latent variables. [ABSTRACT FROM AUTHOR]
- Published
- 2025
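The abstract describes casting causal graph learning as continuous optimization with a structural constraint, without naming the constraint the authors build on. A common choice in this family is the NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) - d, which is zero exactly when the weighted adjacency matrix W encodes a DAG. The sketch below assumes that formulation (it is not the authors' latent-variable method) and evaluates h with a truncated power series in plain Python; `acyclicity` and `matmul` are illustrative names.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def acyclicity(w, terms=30):
    """NOTEARS-style penalty h(W) = tr(exp(W * W)) - d (elementwise square).

    tr(exp(A)) = sum_k tr(A^k) / k!, and with non-negative A the trace
    tr(A^k) is positive only if the graph has a cycle of length k.
    The series is truncated after `terms` terms, adequate for small matrices.
    """
    d = len(w)
    a = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0 = I
    total = 0.0
    for k in range(terms):
        total += sum(power[i][i] for i in range(d)) / math.factorial(k)
        power = matmul(power, a)
    return total - d

dag = [[0.0, 0.5, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]  # 0 -> 1 -> 2
cycle = [[0.0, 1.0], [1.0, 0.0]]                             # 0 <-> 1
h_dag = acyclicity(dag)
h_cycle = acyclicity(cycle)
```

In a gradient-based learner, h(W) would be driven to zero (for example through a penalty or augmented-Lagrangian term) while a data-fit loss is minimized; that optimization loop is omitted here.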
14. A Novel Hyper-Heuristic Algorithm with Soft and Hard Constraints for Causal Discovery Using a Linear Structural Equation Model.
- Author
- Dang, Yinglong, Gao, Xiaoguang, and Wang, Zidong
- Subjects
- STRUCTURAL equation modeling, ARTIFICIAL intelligence, LINEAR equations, CAUSAL models, SOCIAL development
- Abstract
Artificial intelligence plays an indispensable role in improving productivity and promoting social development, and causal discovery is an extremely important research direction in this field. Directed acyclic graphs (DAGs) are the most commonly used tool in causal modeling because of their excellent interpretability and structural properties. However, in the face of insufficient data, the accuracy and efficiency of DAG learning are greatly reduced, resulting in a false perception of causality. As intuitive expert knowledge, structural constraints control DAG learning by limiting the causal relationship between variables, which is expected to solve the above-mentioned problem. However, it is often impossible to build a DAG by relying on expert knowledge alone. To solve this problem, we propose the use of expert knowledge as a hard constraint and the structural prior gained via data learning as a soft constraint. In this paper, we propose a fitness-rate-rank-based multiarmed bandit (FRRMAB) hyper-heuristic that integrates soft and hard constraints into the DAG learning process. For a linear structural equation model (SEM), soft constraints are obtained via partial correlation analysis. The experimental results on different networks show that the proposed method has higher scalability and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
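The abstract states that soft constraints for a linear SEM come from partial correlation analysis but does not give the exact procedure. As a minimal sketch under that assumption, the first-order partial correlation below shows why the quantity works as a structural prior: in a simulated chain x -> z -> y, x and y are correlated, yet their partial correlation given z is close to zero, which suggests no direct x-y edge. The helper names and the toy data-generating model are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    saa = sum((s - ma) ** 2 for s in a)
    sbb = sum((t - mb) ** 2 for t in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical linear SEM x -> z -> y with Gaussian noise.
random.seed(0)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

marginal = pearson(x, y)             # clearly non-zero: x and y are dependent
conditional = partial_corr(x, y, z)  # near zero: z screens off x from y
```

How a near-zero value is thresholded and turned into a soft penalty on an edge is a design choice the abstract leaves open.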
15. Causal Discovery and Deep Learning Algorithms for Detecting Geochemical Patterns Associated with Gold-Polymetallic Mineralization: A Case Study of the Edongnan Region.
- Author
- Luo, Zijing and Zuo, Renguang
- Subjects
- MACHINE learning, PATTERN recognition systems, GENERATIVE adversarial networks, CAPSULE neural networks, AUTOENCODER, DEEP learning
- Abstract
The identification of mineral deposit footprints by processing geochemical survey data constitutes a crucial stage in mineral exploration because it provides valuable and substantial information for future prospecting endeavors. However, the selection of appropriate pathfinder elements and the recognition of their anomalous patterns for determining metallogenic favorability based on geochemical survey data remain challenging tasks because of the complex interactions among different geochemical elements and the highly nonlinear and heterogeneous characteristics of their spatial distribution patterns. This study investigated the application of a causal discovery algorithm and deep learning models to identify geochemical anomaly patterns associated with mineralization. Using gold-polymetallic deposits in the Edongnan region of China as a case study, stream sediment samples containing concentrations of 39 elements were collected and preprocessed using a centered log-ratio transformation, addressing the closure effect of compositional data. The combination of the synthetic minority oversampling technique, Tomek link algorithm, and causal discovery algorithm to explore the potential associations and influences among geochemical elements provides new insights into the selection of pathfinder elements. Regarding the problem of identifying anomalous spatial distribution patterns in pathfinder elements and considering that the formation of mineral deposits is the result of various geological processes interacting under specific spatiotemporal conditions, we proposed a hybrid deep learning model called VAE-CAPSNET-GAN, which combines a variational autoencoder (VAE), capsule network (CAPSNET), and generative adversarial network (GAN). 
The model was designed to capture the spatial distribution characteristics of pathfinder elements and the spatial coupling relationships between mineral deposits and geochemical anomalies, enabling the recognition of geochemical anomaly patterns related to mineralization. The results showed that, compared to the VAE model, which also uses reconstruction error as the anomaly detection principle, VAE-CAPSNET-GAN exhibited superior performance in identifying known mineral deposits and delineating anomalous areas aligned more closely with the established metallogenic model. Furthermore, this weakens the impact of overlapping information. Multiple outcomes indicated that an integrated analytical framework combining a causal discovery algorithm with deep learning models can provide valuable clues for further delineating prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2025
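The centered log-ratio (clr) transformation mentioned in the abstract maps each part of a composition to the logarithm of that part divided by the geometric mean of all parts. A minimal Python sketch follows; the sample concentrations are made up for illustration. Because the result is unchanged when the whole composition is rescaled, clr removes the dependence on the arbitrary total (the closure effect), and its components always sum to zero.

```python
import math

def clr(composition):
    """Centered log-ratio transform of one composition (strictly positive parts)."""
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [lg - mean_log for lg in logs]

# Hypothetical concentrations of three elements in one stream-sediment sample.
sample = [12.0, 3.5, 0.8]
transformed = clr(sample)
rescaled = clr([10 * p for p in sample])  # same composition, different units
```

Survey data often contain values below the detection limit; these need replacing before clr is applied, which is why the sketch rejects non-positive parts.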
16. Diabetes Prediction Through Linkage of Causal Discovery and Inference Model with Machine Learning Models.
- Author
- Noh, Mi Jin and Kim, Yang Sok
- Subjects
- MACHINE learning, CAUSAL artificial intelligence, DIGESTIVE system diseases, CAUSAL inference, DEEP learning
- Abstract
Background/Objectives: Diabetes is a dangerous disease that is accompanied by various complications, including cardiovascular disease. As the global diabetes population continues to increase, it is crucial to identify its causes. Therefore, we predicted diabetes using an AI model and quantitatively examined causal relationships using a causal discovery and inference model. Methods: Kaggle's dataset from the National Institute of Diabetes and Digestive and Kidney Diseases was analyzed using logistic regression, deep learning, gradient boosting, and decision trees. Causal discovery techniques, such as LiNGAM, were employed to infer relationships between variables. Results: The study achieved high accuracy across models using logistic regression (84.84%) and deep learning (84.83%). The causal model highlighted factors such as physical activity, difficulty in walking, and heavy drinking as direct contributors to diabetes. Conclusions: By combining AI with causal inference, this study provides both predictive performance and insight into the factors affecting diabetes, paving the way for tailored interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2025
17. Causal discovery and fault diagnosis based on mixed data types for system reliability modeling.
- Author
- Wang, Xiaokang, Jiang, Siqi, Li, Xinghan, and Wang, Mozhu
- Abstract
Causal relationships play an irreplaceable role in revealing the mechanisms of phenomena and guiding intervention actions. However, due to limitations in existing frameworks regarding model representations and learning algorithms, only a few studies have explored causal discovery on non-Euclidean data. In this paper, we address the issue by proposing a causal mapping process based on coordinate representations for heterogeneous non-Euclidean data. We propose a data generation mechanism between the parent nodes and the child nodes and create a causal mechanism based on multi-dimensional tensor regression. Furthermore, within the aforementioned theoretical framework, we propose a two-stage causal discovery approach based on regularized generalized canonical correlation analysis. Using the discrete representation in the shared projection direction, causal relationships between heterogeneous non-Euclidean variables can be discovered more accurately. Finally, empirical research is conducted on real-world industrial sensor data, which demonstrates the effectiveness of the proposed method for discovering causal relationships in heterogeneous non-Euclidean data. [ABSTRACT FROM AUTHOR]
- Published
- 2025
18. AnchorFCI: harnessing genetic anchors for enhanced causal discovery of cardiometabolic disease pathways.
- Author
- Ribeiro, Adèle H., Crnkovic, Milena, Pereira, Jaqueline Lopes, Fisberg, Regina Mara, Sarti, Flavia Mori, Rogero, Marcelo Macedo, Heider, Dominik, and Cerqueira, Andressa
- Subjects
- DISEASE risk factors, CAUSAL inference, HEART metabolism disorders, BIOMARKERS, WORLD health
- Abstract
Introduction: Cardiometabolic diseases, a major global health concern, stem from complex interactions of lifestyle, genetics, and biochemical markers. While extensive research has revealed strong associations between various risk factors and these diseases, latent confounding and limited causal discovery methods hinder understanding of their causal relationships, which is essential for mechanistic insights and for developing effective prevention and intervention strategies. Methods: We introduce anchorFCI, a novel adaptation of the conservative Really Fast Causal Inference (RFCI) algorithm, designed to enhance robustness and discovery power in causal learning by strategically selecting and integrating reliable anchor variables from a set of variables known not to be caused by the variables of interest. This approach is well-suited for studies of phenotypic, clinical, and sociodemographic data, using genetic variables that are recognized to be unaffected by these factors. We demonstrate the method's effectiveness through simulation studies and a comprehensive causal analysis of the 2015 ISA-Nutrition dataset, featuring both anchorFCI for causal discovery and state-of-the-art effect size identification tools from Judea Pearl's framework, showcasing a robust, fully data-driven causal inference pipeline. Results: Our simulation studies reveal that anchorFCI effectively enhances robustness and discovery power while handling latent confounding by integrating reliable anchor variables and their non-ancestral relationships. The 2015 ISA-Nutrition dataset analysis not only supports many established causal relationships but also elucidates their interconnections, providing a clearer understanding of the complex dynamics and multifaceted nature of cardiometabolic risk. Discussion: AnchorFCI holds significant potential for reliable causal discovery in complex, multidimensional datasets.
By effectively integrating non-ancestral knowledge and addressing latent confounding, it is well-suited for various applications requiring robust causal inference from observational studies, providing valuable insights in epidemiology, genetics, and public health. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Federated multi-task Bayesian network learning in the presence of overlapping and distinct variables.
- Author
- Yang, Xing, Niu, Ben, Lan, Tian, and Zhang, Chen
- Subjects
- DATA privacy, BAYESIAN analysis, STRUCTURAL models, ADDITIVES
- Abstract
Bayesian Network (BN) is a powerful tool for discovering causal dependence relationships in multivariate data. This article proposes a federated multi-task learning framework for BNs with overlapping and distinct variables. First, an additive structural causal model is proposed to describe the nonparametric causal dependence structure for each client’s BN. Then, by assuming different clients can have similar, yet not identical, causal dependence structures, a two-step federated multi-task learning framework is formulated for parameter learning of different clients, which also protects data privacy. In the first step, each client updates its local BN parameters with its own data. In the second step, the central server updates global parameters. The two steps iterate until convergence. Numerical studies and a case study of data from a three-phase flow facility demonstrate the efficacy of our proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
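The two-step scheme in the abstract (local client updates, then a server update, repeated until convergence) can be illustrated with a deliberately simplified model. The sketch below is not the authors' additive structural causal model: it assumes each client fits a single linear weight, regularized toward a shared global weight, and the server averages the clients' weights. Only fitted parameters leave a client, which is what keeps raw data private. All names are hypothetical.

```python
def local_update(xs, ys, global_w, lam):
    """Step 1 (client): closed-form minimiser of
    mean((y - w*x)^2) + lam * (w - global_w)^2 for a scalar weight w."""
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    return (sxy + lam * global_w) / (sxx + lam)

def federated_fit(clients, lam=1.0, rounds=200, tol=1e-10):
    """Alternate client and server steps until the global weight stops moving."""
    global_w = 0.0
    local_ws = [0.0] * len(clients)
    for _ in range(rounds):
        local_ws = [local_update(xs, ys, global_w, lam) for xs, ys in clients]
        new_global = sum(local_ws) / len(local_ws)  # Step 2 (server): average
        converged = abs(new_global - global_w) < tol
        global_w = new_global
        if converged:
            break
    return global_w, local_ws

clients = [([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # client A: y = 1 * x
           ([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])]  # client B: y = 3 * x
global_w, local_ws = federated_fit(clients, lam=1.0)
```

With slopes of 1 and 3, the global weight settles at 2 while each local weight stays between its own slope and the global one; `lam` controls how strongly clients are pulled toward the shared structure.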
20. Discovery and Inference of a Causal Network with Hidden Confounding.
- Author
- Chen, Li, Li, Chunlin, Shen, Xiaotong, and Pan, Wei
- Subjects
- GENE regulatory networks, TIME complexity, LIKELIHOOD ratio tests, POLYNOMIAL time algorithms, ALZHEIMER'S disease
- Abstract
This article proposes a novel causal discovery and inference method called GrIVET for a Gaussian directed acyclic graph with unmeasured confounders. GrIVET consists of an order-based causal discovery method and a likelihood-based inferential procedure. For causal discovery, we generalize the existing peeling algorithm to estimate the ancestral relations and candidate instruments in the presence of hidden confounders. Based on this, we propose a new procedure for instrumental variable estimation of each direct effect by separating it from any mediation effects. For inference, we develop a new likelihood ratio test of multiple causal effects that is able to account for the unmeasured confounders. Theoretically, we prove that the proposed method has desirable guarantees, including robustness to invalid instruments and uncertain interventions, estimation consistency, low-order polynomial time complexity, and validity of asymptotic inference. Numerically, GrIVET performs well and compares favorably against state-of-the-art competitors. Furthermore, we demonstrate the utility and effectiveness of the proposed method through an application inferring regulatory pathways from Alzheimer's disease gene expression data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
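GrIVET's peeling algorithm and likelihood ratio test are beyond a short example, but the instrumental-variable idea the abstract relies on can be shown in isolation. The simulation below is hypothetical: an unobserved confounder u affects both x and y, so ordinary least squares overstates the effect of x on y, whereas the single-instrument (Wald) estimate cov(z, y) / cov(z, x) recovers it because z influences y only through x. This is the textbook estimator, not the authors' procedure.

```python
import random

def cov(p, q):
    """Sample covariance of two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    return sum((s - mp) * (t - mq) for s, t in zip(p, q)) / (n - 1)

random.seed(1)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument: affects x, not y directly
u = [random.gauss(0, 1) for _ in range(n)]  # hidden confounder of x and y
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect 2.0

ols = cov(x, y) / cov(x, x)  # biased: also picks up the path through u
iv = cov(z, y) / cov(z, x)   # Wald / single-instrument IV estimate
```

In this setup the OLS slope tends to about 3 rather than 2, while the IV estimate stays close to 2.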
21. Causal Relationships in Longitudinal Observational Data: An Integrative Modeling Approach.
- Author
- Biazoli Jr., Claudinei E., Sato, João R., and Pluess, Michael
- Abstract
Much research in psychology relies on data from observational studies that traditionally do not allow for causal interpretation. However, a range of approaches in statistics and computational sciences have been developed to infer causality from correlational data. Based on conceptual and theoretical considerations on the integration of interventional and time-restrainment notions of causality, we set out to design and empirically test a new approach to identify potential causal factors in longitudinal correlational data. A principled and representative set of simulations and an illustrative application to identify early-life determinants of cognitive development in a large cohort study are presented. The simulation results illustrate the potential but also the limitations for discovering causal factors in observational data. In the illustrative application, plausible candidates for early-life determinants of cognitive abilities in 5-year-old children were identified. Based on these results, we discuss the possibilities of using exploratory causal discovery in psychological research but also highlight its limits and potential misuses and misinterpretations.
Much research in psychology relies on data from observational studies that traditionally do not allow for causal interpretation. However, a range of approaches in statistics and computational sciences have been developed to study causal links even in observational data. Based on recent theoretical results, we propose a new systematic approach to discover potential causal factors in longitudinal observational studies. To test this new approach and illustrate how it can be used, we used both simulated data and data from a large observational study of children. In the illustrative application, we sought to identify which environmental, familial, or individual characteristics when children are 9 months are causally related with cognitive abilities at 5 years old.
We discuss the possibilities of using methods to discover causal factors in psychological research but also highlight its limits and potential misuses and misinterpretations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Uncovering Causal Relationships for Debiased Repost Prediction Using Deep Generative Models.
- Author
- Sun, Wu-Jiu and Liu, Xiao Fan
- Subjects
- CAUSAL inference, KNOWLEDGE graphs, PUBLIC opinion, PREDICTION models, DEEP learning
- Abstract
Microblogging platforms like X (formerly Twitter) and Sina Weibo have become key channels for spreading information online. Accurately predicting information spread, such as users' reposting activities, is essential for applications including content recommendation and analyzing public sentiment. Current advanced models rely on deep representation learning to extract features from various inputs, such as users' social connections and repost history, to forecast reposting behavior. Nonetheless, these models frequently ignore intrinsic confounding factors, which may cause the models to capture spurious relationships, ultimately impacting prediction performance. To address this limitation, we propose a novel Debiased Reposting Prediction model (DRP). Our model mitigates the influence of confounding variables by incorporating intervention operations from causal inference, enabling it to learn the causal associations between features and user reposting behavior. Specifically, we introduce a memory network within DRP to enhance the model's perception of confounder distributions. This network aggregates and learns confounding information dispersed across different training data batches by optimizing the reconstruction loss. Furthermore, recognizing the challenge of acquiring prior knowledge of causal graphs, which is crucial for causal inference, we develop a causal discovery module within DRP (CD-DRP). This module allows the model to autonomously uncover the causal graph of feature variables by analyzing microblogging data. Experimental results on multiple real-world datasets demonstrate that our proposed method effectively uncovers causal relationships between variables, exhibits strong time efficiency, and outperforms state-of-the-art models in prediction performance (improved by 2.54%) and overfitting reduction (by 7.44%). [ABSTRACT FROM AUTHOR]
- Published
- 2024
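The abstract says DRP incorporates intervention operations from causal inference to remove the influence of confounders. The model itself (memory network and causal discovery module) is not reproduced here; the sketch below only shows the standard backdoor adjustment that such an intervention targets in the simplest discrete case, assuming z is the only confounder of a binary feature x and a binary outcome y: P(y = 1 | do(x)) = Σ_z P(y = 1 | x, z) P(z). The records are made-up counts.

```python
from collections import Counter

def backdoor(records, x_value):
    """P(y=1 | do(x=x_value)) = sum_z P(y=1 | x=x_value, z) * P(z), from (z, x, y) records."""
    n = len(records)
    z_counts = Counter(z for z, _, _ in records)
    total = 0.0
    for z_val, z_count in z_counts.items():
        stratum = [y for z, x, y in records if z == z_val and x == x_value]
        if not stratum:
            raise ValueError(f"no records with x={x_value} when z={z_val}")
        total += (sum(stratum) / len(stratum)) * (z_count / n)
    return total

# Hypothetical (z, x, y) records: z = 1 makes both x = 1 and y = 1 more likely.
records = (
    [(1, 1, 1)] * 32 + [(1, 1, 0)] * 8 + [(1, 0, 1)] * 6 + [(1, 0, 0)] * 4
    + [(0, 1, 1)] * 2 + [(0, 1, 0)] * 8 + [(0, 0, 1)] * 4 + [(0, 0, 0)] * 36
)
treated = [y for _, x, y in records if x == 1]
naive = sum(treated) / len(treated)  # P(y=1 | x=1), confounded by z
adjusted = backdoor(records, 1)      # P(y=1 | do(x=1))
```

Here the naive conditional P(y = 1 | x = 1) is 0.68 because z pushes x and y up together, while the adjusted value is 0.5.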
23. A Review of Causal Methods for High-Dimensional Data
- Author
- Zewude A. Berkessa, Esa Laara, and Patrik Waldmann
- Subjects
- Causal discovery, causal effect estimation, causal methods, confounding bias, endogeneity, high-dimensionality, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Causal learning from observational data is an important scientific endeavor, but the statistical and computational challenges posed by the high-dimensionality of many modern datasets are substantial. Peculiarities such as spurious correlations, endogeneity, noise accumulation, and deflated empirical covariance estimation complicate analysis. These issues may lead to confounding bias, which can be misleading when attempting to learn the true causal relationships and causal effects between variables. In this survey, we provide a comprehensive review of causal analysis and the theory behind high-dimensionality. We discuss the effects of high-dimensionality on causal estimation methods and their corresponding solutions. Finally, we present evaluation metrics and software tools for both causal effect estimation and causal discovery.
- Published
- 2025
24. Causal discovery and fault diagnosis based on mixed data types for system reliability modeling
- Author
- Xiaokang Wang, Siqi Jiang, Xinghan Li, and Mozhu Wang
- Subjects
- Causal discovery, Non-Euclidean data, Canonical correlation analysis, Industrial fault diagnosis, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Causal relationships play an irreplaceable role in revealing the mechanisms of phenomena and guiding intervention actions. However, due to limitations in existing frameworks regarding model representations and learning algorithms, only a few studies have explored causal discovery on non-Euclidean data. In this paper, we address the issue by proposing a causal mapping process based on coordinate representations for heterogeneous non-Euclidean data. We propose a data generation mechanism between the parent nodes and the child nodes and create a causal mechanism based on multi-dimensional tensor regression. Furthermore, within the aforementioned theoretical framework, we propose a two-stage causal discovery approach based on regularized generalized canonical correlation analysis. Using the discrete representation in the shared projection direction, causal relationships between heterogeneous non-Euclidean variables can be discovered more accurately. Finally, empirical research is conducted on real-world industrial sensor data, which demonstrates the effectiveness of the proposed method for discovering causal relationships in heterogeneous non-Euclidean data.
- Published
- 2025
25. Causal Discovery from Temporal Data: An Overview and New Perspectives.
- Author
- Gong, Chang, Zhang, Chuzhe, Yao, Di, Bi, Jingping, Li, Wenbin, and Xu, YongJun
- Subjects
- ARTIFICIAL neural networks, SCIENCE conferences, GRAPH neural networks, MACHINE learning, LANGUAGE models, GRANGER causality test, INDEPENDENT component analysis, DIRECTED graphs
- Published
- 2025
26. Offline model-based reinforcement learning with causal structured world models.
- Author
- Zhu, Zhengmao, Tian, Honglong, Chen, Xionghui, Zhang, Kun, and Yu, Yang
- Abstract
Model-based methods have recently been shown to be promising for offline reinforcement learning (RL), which aims at learning good policies from historical data without interacting with the environment. Previous model-based offline RL methods employ a straightforward prediction method that maps the states and actions directly to the next-step states. However, such a prediction method tends to capture spurious relations caused by the sampling policy preference behind the offline data. It is sensible that the environment model should focus on causal influences, which can facilitate learning an effective policy that can generalize well to unseen states. In this paper, we first provide theoretical results that causal environment models can outperform plain environment models in offline RL by incorporating the causal structure into the generalization error bound. We also propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structured World Models (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly, and, as a result, outperforms both model-based offline RL algorithms and causal model-based offline RL algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2025
27. Mixed-variable graphical modeling framework towards risk prediction of hospital-acquired pressure injury in spinal cord injury individuals
- Author
- Yanke Li, Anke Scheel-Sailer, Robert Riener, and Diego Paez-Granados
- Subjects
- Graphical models, Causal discovery, Predictive modeling, Spinal cord injury, Pressure injury, Medicine, Science
- Abstract
Developing machine learning (ML) methods for healthcare predictive modeling requires absolute explainability and transparency to build trust and accountability. Graphical models (GM) are key tools for this but face challenges like small sample sizes, mixed variables, and latent confounders. This paper presents a novel learning framework addressing these challenges by integrating latent variables using fast causal inference (FCI), accommodating mixed variables with predictive permutation conditional independence tests (PPCIT), and employing a systematic graphical embedding approach leveraging expert knowledge. This method ensures a transparent model structure and an explainable feature selection and modeling approach, achieving competitive prediction performance. For real-world validation, data of hospital-acquired pressure injuries (HAPI) among individuals with spinal cord injury (SCI) were used, where the approach achieved a balanced accuracy of 0.941 and an AUC of 0.983, outperforming most benchmarks. The PPCIT method also demonstrated superior accuracy and scalability over other benchmarks in causal discovery validation on synthetic datasets that closely resemble our real dataset. This holistic framework effectively addresses the challenges of mixed variables and explainable predictive modeling for disease onset, which is crucial for enabling transparency and interpretability in ML-based healthcare.
- Published
- 2024
28. Mixed-variable graphical modeling framework towards risk prediction of hospital-acquired pressure injury in spinal cord injury individuals.
- Author
- Li, Yanke, Scheel-Sailer, Anke, Riener, Robert, and Paez-Granados, Diego
- Subjects
- SPINAL cord injuries, PRESSURE ulcers, FEATURE selection, CAUSAL inference, PREDICTION models
- Abstract
Developing machine learning (ML) methods for healthcare predictive modeling requires absolute explainability and transparency to build trust and accountability. Graphical models (GM) are key tools for this but face challenges like small sample sizes, mixed variables, and latent confounders. This paper presents a novel learning framework addressing these challenges by integrating latent variables using fast causal inference (FCI), accommodating mixed variables with predictive permutation conditional independence tests (PPCIT), and employing a systematic graphical embedding approach leveraging expert knowledge. This method ensures a transparent model structure and an explainable feature selection and modeling approach, achieving competitive prediction performance. For real-world validation, data of hospital-acquired pressure injuries (HAPI) among individuals with spinal cord injury (SCI) were used, where the approach achieved a balanced accuracy of 0.941 and an AUC of 0.983, outperforming most benchmarks. The PPCIT method also demonstrated superior accuracy and scalability over other benchmarks in causal discovery validation on synthetic datasets that closely resemble our real dataset. This holistic framework effectively addresses the challenges of mixed variables and explainable predictive modeling for disease onset, which is crucial for enabling transparency and interpretability in ML-based healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Long-term sequelae of SARS-CoV-2 two years following infection: exploring the interplay of biological, psychological, and social factors.
- Author
-
Verveen, Anouk, Nugroho, Fajar Agung, Bucur, Ioan Gabriel, Wynberg, Elke, Willigen, Hugo D.G. van, Davidovich, Udi, Lok, Anja, Charante, Eric P. Moll van, Bree, Godelieve J. de, Jong, Menno D. de, Kootstra, Neeltje, Claassen, Tom, Jonge, Marien I. de, Heskes, Tom, Prins, Maria, Knoop, Hans, Nieuwkerk, Pythia T., and the RECoVERED Study Group
- Abstract
Background: Severe fatigue and cognitive complaints are frequently reported after SARS-CoV-2 infection and may be accompanied by depressive symptoms and/or limitations in physical functioning. The long-term sequelae of COVID-19 may be influenced by biomedical, psychological, and social factors, whose interplay over time is largely understudied. We aimed to investigate how the interplay of these factors contributes to the persistence of symptoms after COVID-19. Methods: RECoVERED, a prospective cohort study in Amsterdam, the Netherlands, enrolled participants aged ≥16 years after SARS-CoV-2 diagnosis. We used a structural network analysis to assess relationships between biomedical (initial COVID-19 severity, inflammation markers), psychological (illness perceptions, coping, resilience), and social factors (loneliness, negative life events) and persistent symptoms 24 months after initial disease (severe fatigue, difficulty concentrating, depressive symptoms, and limitations in physical functioning). Causal discovery, an exploratory data-driven approach that tests all possible associations and retains the most likely model, was performed. Results: Data from 235/303 participants (77.6%) who completed the month-24 study visit were analysed. The structural model revealed associations between the putative factors and outcomes. The outcomes clustered together with severe fatigue as their central point. Loneliness, fear avoidance in response to symptoms, and illness perceptions were directly linked to the outcomes. Biological (inflammatory markers) and clinical (severity of initial illness) variables were connected to the outcomes only via psychological or social variables. Conclusions: Our findings support a model in which biomedical, psychological, and social factors contribute to the development of long-term sequelae of SARS-CoV-2 infection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning.
- Author
-
Bo, Shi and Xiao, Minheng
- Subjects
SUPPLY chain management, SUPPLY chains
- Abstract
Managing delivery risks is a critical challenge in modern supply chain management due to the increasing complexity and interdependencies of global supply networks. Existing methods often rely on correlation-based approaches, which fail to uncover the true causes behind delivery delays. This limitation makes it difficult for supply chain managers to identify actionable factors that can mitigate risks effectively. To address these challenges, we propose a novel method that integrates causal discovery with reinforcement learning to identify the root causes of delivery risks. Unlike traditional correlation-based methods, our approach uncovers both the direction and strength of causal relationships between variables, allowing for more accurate identification of the key drivers behind delivery delays. By applying causal strength quantification, we further measure the impact of each factor on delivery performance. Using real-world supply chain data, our results demonstrate that the proposed method reveals hidden causal relationships between factors such as shipping mode, order size, and delivery status. These insights enable supply chain managers to implement more targeted interventions, significantly improving risk mitigation strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Missing Data Imputation Based on Causal Inference to Enhance Advanced Persistent Threat Attack Prediction.
- Author
-
Cheng, Xiang, Kuang, Miaomiao, and Yang, Hongyu
- Subjects
DENIAL of service attacks, MISSING data (Statistics), DATA security failures, CYBERTERRORISM, CAUSAL inference
- Abstract
With the continuous evolution of the network security landscape, the number of attack types has increased sharply, but attacks can be divided into symmetric and asymmetric attacks. Symmetric attacks such as phishing and DDoS attacks exploit fixed patterns, resulting in system crashes and data breaches that cause losses to businesses. Asymmetric attacks such as Advanced Persistent Threats (APTs), a highly sophisticated and organized form of cyberattack, achieve data theft through long-term latency because of their concealment and complexity, and pose a greater threat to organizational security. In addition, processing missing data is challenging, especially when choosing between symmetric and asymmetric data filling: the former is simple but inflexible, while the latter is complex but better suited to highly complex attack scenarios. Since research on asymmetric attacks is particularly important, this paper proposes a method that combines causal discovery with a graph autoencoder to impute missing data, classify potentially malicious nodes, and reveal causal relationships. The core idea is to use graph autoencoders to learn the underlying causal structure of APT attacks, with a special focus on the complex causal relationships in asymmetric attacks. This causal knowledge is then applied to make the model more robust by compensating for data gaps. In the final phase, the method reveals causal relationships and predicts and classifies potential APT attack nodes, providing a comprehensive framework that not only predicts potential threats but also offers insight into the logical sequence of the attacker's actions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. Software application profile: tpc and micd—R packages for causal discovery with incomplete cohort data.
- Author
-
Andrews, Ryan M, Bang, Christine W, Didelez, Vanessa, Witte, Janine, and Foraita, Ronja
- Subjects
MISSING data (Statistics), STATISTICAL errors, APPLICATION software, SOURCE code, SCALING (Social sciences)
- Abstract
Motivation: The PC algorithm (named after its developers, Peter Spirtes and Clark Glymour) is a popular causal discovery method for learning causal graphs in a data-driven way. Until recently, existing PC algorithm implementations in R had important limitations regarding missing values, temporal structure, or mixed measurement scales (categorical/continuous), which are all common features of cohort data. The new R packages presented here, micd and tpc, fill these gaps. Implementation: micd and tpc are R packages. General features: The micd package provides add-on functionality for dealing with missing values to the existing pcalg R package, including methods for multiple imputation relying on the Missing At Random assumption. micd also allows for mixed measurement scales assuming conditional Gaussianity. The tpc package efficiently exploits temporal information in a way that yields a more informative output that is less prone to statistical errors. Availability: The tpc and micd packages are freely available on the Comprehensive R Archive Network (CRAN). Their source code is also available on GitHub (https://github.com/bips-hb/micd; https://github.com/bips-hb/tpc). [ABSTRACT FROM AUTHOR]
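The micd and tpc packages extend R's pcalg, and their APIs are not reproduced here. As a language-neutral illustration of what the PC algorithm's first (skeleton) phase does, the sketch below assumes Gaussian data and a Fisher-z partial-correlation test; it omits edge orientation, missing-data handling, mixed scales, and tpc's temporal constraints. All function names are hypothetical.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the index set `cond`."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value of Fisher's z test for a partial correlation r
    estimated from n samples with a conditioning set of size k."""
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def pc_skeleton(corr, n, alpha=0.01):
    """Skeleton phase of PC: start fully connected and delete i - j whenever
    some subset of i's other neighbours makes them conditionally independent."""
    p = corr.shape[0]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepsets = {}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                for cond in itertools.combinations(sorted(adj[i] - {j}), level):
                    r = partial_corr(corr, i, j, cond)
                    if fisher_z_pvalue(r, n, level) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    return adj, sepsets


# Usage: population covariance of the chain X -> Y -> Z with Y = X + e1, Z = Y + e2.
cov = np.array([[1.0, 1.0, 1.0],
                [1.0, 2.0, 2.0],
                [1.0, 2.0, 3.0]])
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
adj, sepsets = pc_skeleton(corr, n=1000)
print(adj)      # expected: {0: {1}, 1: {0, 2}, 2: {1}}
print(sepsets)  # expected: {frozenset({0, 2}): {1}}
```

Feeding the exact population correlation matrix makes the example deterministic; with real data one would pass the sample correlation matrix and the actual sample size.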
- Published
- 2024
33. Desire to Find Causal Relations: Response to Robinson and Wainer’s (2023) Reflection on the Field—It’s Just an Observation.
- Author
-
Ding, Cody
- Abstract
In the article "It’s Just an Observation," Robinson and Wainer (2023, Educational Psychology Review, 35, Article 83; published online 14 August 2023) lamented that educational psychology is moving toward the dark side of the quality continuum, with fewer intervention studies and randomized controlled trials and a tendency to make causal inferences based on more armchair research using observational data. This paper discussed the challenges of making causal inferences, even with intervention studies and randomized controlled trials. We argued for the usefulness of causal assumptions and modeling based on observational data for causal discovery, while acknowledging their limitations. More importantly, research rigor can be achieved in experimental or intervention studies as well as in studies using observational data. Showing favoritism could also taint our field by limiting our perspectives, stifling creativity, and diminishing scholarly variety. We should not allow the undue overinterpretation of correlational evidence to undermine the entire field of observational studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. Impact of COVID-19 Pandemic on Sleep Including HRV and Physical Activity as Mediators: A Causal ML Approach
- Author
-
Khatibi, Elahe, Abbasian, Mahyar, Azimi, Iman, Labbaf, Sina, Feli, Mohammad, Borelli, Jessica, Dutt, Nikil, and Rahmani, Amir M
- Subjects
Clinical and Health Psychology, Health Sciences, Psychology, Basic Behavioral and Social Science, Behavioral and Social Science, Sleep Research, Good Health and Well Being, COVID-19, HRV, Sleep Quality, Causal Machine Learning, Causal Discovery, Mediator Analysis, Causal Inference
- Abstract
Sleep quality is crucial to both mental and physical well-being. The COVID-19 pandemic, which has notably affected the population's health worldwide, has been shown to worsen people's sleep quality. Numerous studies have evaluated the impact of the COVID-19 pandemic on sleep efficiency, investigating these relationships with correlation-based methods. Such methods can only learn correlations, which may be spurious, rather than causal relations among variables. Furthermore, they fail to pinpoint potential sources of bias and mediators or to envision counterfactual scenarios, leading to poor estimates. In this paper, we develop a Causal Machine Learning method, which encompasses causal discovery and causal inference components, to extract the causal relations between the COVID-19 pandemic (treatment variable) and sleep quality (outcome) and to estimate the causal treatment effect, respectively. We conducted a wearable-based health monitoring study to collect data, including sleep quality, physical activity, and Heart Rate Variability (HRV), from college students before and after the COVID-19 lockdown in March 2020. Our causal discovery component generates a causal graph and pinpoints mediators in the causal model. We incorporate the strongly contributing mediators (i.e., HRV and physical activity) into our causal inference component to obtain a robust, accurate, and explainable estimate of the causal effect of the pandemic on sleep quality. Finally, we validate our estimate with three refutation analysis techniques. Our experimental results indicate that the pandemic worsens college students' sleep scores by 8%. Our validation results show significant p-values supporting our estimate.
- Published
- 2023
35. CausalXtract, a flexible pipeline to extract causal effects from live-cell time-lapse imaging data
- Author
-
Franck Simon, Maria Colomba Comes, Tiziana Tocci, Louise Dupuis, Vincent Cabeli, Nikita Lagrange, Arianna Mencattini, Maria Carla Parrini, Eugenio Martinelli, and Herve Isambert
- Subjects
causal inference, time-lapse image analysis, live-cell imaging, tumor on chip, causal discovery, Granger causality, Medicine, Science, Biology (General), QH301-705.5
- Abstract
Live-cell microscopy routinely provides massive amounts of time-lapse images of complex cellular systems under various physiological or therapeutic conditions. However, this wealth of data remains difficult to interpret in terms of causal effects. Here, we describe CausalXtract, a flexible computational pipeline that discovers causal and possibly time-lagged effects from morphodynamic features and cell–cell interactions in live-cell imaging data. The CausalXtract methodology combines network-based and information-based frameworks and is shown to discover causal effects overlooked by classical Granger and Schreiber causality approaches. We showcase the use of CausalXtract to uncover novel causal effects in a tumor-on-chip cellular ecosystem under therapeutically relevant conditions. In particular, we find that cancer-associated fibroblasts directly inhibit cancer cell apoptosis, independently of anticancer treatment. CausalXtract also uncovers multiple antagonistic effects at different time delays. Hence, CausalXtract provides a unique computational tool to interpret live-cell imaging data for a range of fundamental and translational research applications.
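CausalXtract is reported to find effects that classical Granger causality misses. As a reference point for that baseline only (not for CausalXtract itself), here is a minimal sketch of a bivariate linear Granger test, assuming NumPy, ordinary least squares, and a fixed lag order; `granger_f` is a hypothetical helper name.

```python
import numpy as np


def granger_f(cause, effect, lag=1):
    """F-statistic for H0: past values of `cause` add no predictive power for
    `effect` beyond the past of `effect` itself (linear model, fixed lag)."""
    n = len(effect)
    target = effect[lag:]
    # Column k holds the series shifted by k + 1 steps, aligned with `target`.
    own_lags = np.column_stack([effect[lag - k - 1 : n - k - 1] for k in range(lag)])
    cause_lags = np.column_stack([cause[lag - k - 1 : n - k - 1] for k in range(lag)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, own_lags])
    unrestricted = np.hstack([ones, own_lags, cause_lags])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_denom = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df_denom)


# Usage on simulated data: x drives y with a one-step delay.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=n - 1)
print(granger_f(x, y), granger_f(y, x))  # the first value should be far larger
```

A p-value would come from comparing the statistic with an F distribution with (lag, df_denom) degrees of freedom; that step is left out to keep the sketch NumPy-only.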
- Published
- 2025
36. AnchorFCI: harnessing genetic anchors for enhanced causal discovery of cardiometabolic disease pathways
- Author
-
Adèle H. Ribeiro, Milena Crnkovic, Jaqueline Lopes Pereira, Regina Mara Fisberg, Flavia Mori Sarti, Marcelo Macedo Rogero, Dominik Heider, and Andressa Cerqueira
- Subjects
causal discovery, explainability, RFCI, genetic anchors, unfaithfulness, partial ancestral graphs, Genetics, QH426-470
- Abstract
Introduction: Cardiometabolic diseases, a major global health concern, stem from complex interactions of lifestyle, genetics, and biochemical markers. While extensive research has revealed strong associations between various risk factors and these diseases, latent confounding and the limitations of current causal discovery methods hinder understanding of their causal relationships, which is essential for mechanistic insights and for developing effective prevention and intervention strategies. Methods: We introduce anchorFCI, a novel adaptation of the conservative Really Fast Causal Inference (RFCI) algorithm, designed to enhance robustness and discovery power in causal learning by strategically selecting and integrating reliable anchor variables from a set of variables known not to be caused by the variables of interest. This approach is well suited to studies of phenotypic, clinical, and sociodemographic data, using genetic variables that are recognized to be unaffected by these factors. We demonstrate the method’s effectiveness through simulation studies and a comprehensive causal analysis of the 2015 ISA-Nutrition dataset, combining anchorFCI for causal discovery with state-of-the-art effect size identification tools from Judea Pearl’s framework, showcasing a robust, fully data-driven causal inference pipeline. Results: Our simulation studies reveal that anchorFCI effectively enhances robustness and discovery power while handling latent confounding by integrating reliable anchor variables and their non-ancestral relationships. The 2015 ISA-Nutrition dataset analysis not only supports many established causal relationships but also elucidates their interconnections, providing a clearer understanding of the complex dynamics and multifaceted nature of cardiometabolic risk. Discussion: AnchorFCI holds significant potential for reliable causal discovery in complex, multidimensional datasets. By effectively integrating non-ancestral knowledge and addressing latent confounding, it is well suited to various applications requiring robust causal inference from observational studies, providing valuable insights in epidemiology, genetics, and public health.
- Published
- 2024
37. Causal-assisted Sequence Segmentation and Its Soft Sensing Application for Multiphase Industrial Processes
- Author
-
He, Yimeng, Yao, Le, Kong, Xiangyin, Zhang, Xinmin, Song, Zhihuan, and Kano, Manabu
- Published
- 2024
38. The WHY in Business Processes: Discovery of Causal Execution Dependencies
- Author
-
Fournier, Fabiana, Limonad, Lior, Skarbovsky, Inna, and David, Yuval
- Published
- 2025
39. Causal discovery from nonstationary time series
- Author
-
Sadeghi, Agathe, Gopal, Achintya, and Fesanghary, Mohammad
- Published
- 2025
40. Learning distribution-free anchored linear structural equation models in the presence of measurement error
- Author
-
Chung, Junhyoung, Ahn, Youngmin, Shin, Donguk, and Park, Gunwoong
- Published
- 2024
41. Foundations of causal discovery on groups of variables
- Author
-
Wahl, Jonas, Ninad, Urmi, and Runge, Jakob
- Subjects
causality, causal discovery, graphical models, Markov property, faithfulness, time series, 62D20, Mathematics, QA1-939, Probabilities. Mathematical statistics, QA273-280
- Abstract
Discovering causal relationships from observational data is a challenging task that relies on assumptions connecting statistical quantities to graphical or algebraic causal models. In this work, we focus on widely employed assumptions for causal discovery when the objects of interest are (multivariate) groups of random variables rather than individual (univariate) random variables, as is the case in a variety of problems in scientific domains such as climate science or neuroscience. For group-level causal models derived by partitioning a micro-level model into groups, we explore the relationship between micro-level and group-level causal discovery assumptions. We investigate the conditions under which assumptions like causal faithfulness hold or fail to hold. Our analysis encompasses graphical causal models that contain cycles and bidirected edges. We also discuss grouped time series causal graphs and variants thereof as special cases of our general theoretical framework. Thereby, we aim to provide researchers with a solid theoretical foundation for the development and application of causal discovery methods for variable groups.
- Published
- 2024
42. Learning debiased graph representations from the OMOP common data model for synthetic data generation
- Author
-
Nicolas Alexander Schulz, Jasmin Carus, Alexander Johannes Wiederhold, Ole Johanns, Frederik Peters, Natalie Rath, Katharina Rausch, Bernd Holleczek, Alexander Katalinic, the AI-CARE Working Group, and Christopher Gundler
- Subjects
Synthetic Data Generation, Standardized Electronic Health Records, Causal Discovery, Discrete Time Series, Structural Equation Models, Graphical Models, Medicine (General), R5-920
- Abstract
Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models that do not allow for expert verification or intervention. We propose a highly available method that enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a synthetic data generator. For this study, data pipelines were built that extract data from OMOP, convert them into time series format, learn temporal rules with two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.
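DYNOTEARS is a time-series extension of NOTEARS, which makes DAG learning amenable to gradient-based optimization through the smooth acyclicity function h(W) = tr(exp(W ∘ W)) - d; it is zero exactly when the weighted adjacency matrix W has no directed cycle. The sketch below evaluates only that function in NumPy. It is not the DYNOTEARS implementation, and the truncated power series used for the matrix exponential is an assumption suited only to small matrices.

```python
import numpy as np


def expm_series(a, terms=30):
    """Truncated Taylor series for the matrix exponential (small matrices only)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k  # term now equals a**k / k!
        result = result + term
    return result


def notears_acyclicity(w):
    """h(W) = tr(exp(W * W)) - d: zero iff the weighted graph W is acyclic."""
    d = w.shape[0]
    return float(np.trace(expm_series(w * w)) - d)


dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -0.7],
                [0.0, 0.0, 0.0]])  # 0 -> 1 -> 2
cycle = np.array([[0.0, 1.0],
                  [1.0, 0.0]])     # 0 -> 1 -> 0
print(notears_acyclicity(dag), notears_acyclicity(cycle))
```

For the two-node cycle the value equals 2 cosh(1) - 2, roughly 1.086; because h is differentiable in W, an optimizer can push it toward zero while fitting the data.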
- Published
- 2024
43. TSLiNGAM: DirectLiNGAM Under Heavy Tails.
- Author
-
Leyder, Sarah, Raymaekers, Jakob, and Verdonck, Tim
- Subjects
DIRECTED acyclic graphs, CAUSAL models, STRUCTURAL models, SUPPLY chain management, NOISE
- Abstract
One of the established approaches to causal discovery consists of combining directed acyclic graphs (DAGs) with structural causal models (SCMs) to describe the functional dependencies of effects on their causes. Whether an SCM is identifiable from data depends on the assumptions made about the noise variables and the functional classes in the SCM. For instance, in the LiNGAM model, the functional class is restricted to linear functions and the disturbances have to be non-Gaussian. In this work, we propose TSLiNGAM, a new method for identifying the DAG of a causal model from observational data. TSLiNGAM builds on DirectLiNGAM, a popular algorithm that uses simple OLS regression to identify causal directions between variables. TSLiNGAM leverages the non-Gaussianity assumption on the error terms in the LiNGAM model to obtain more efficient and robust estimates of the causal structure. TSLiNGAM is justified theoretically and studied empirically in an extensive simulation study. It performs significantly better on heavy-tailed and skewed data and demonstrates high small-sample efficiency. In addition, TSLiNGAM shows better robustness properties, as it is more resilient to contamination. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
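The core intuition behind DirectLiNGAM-style methods is that, with linear relations and non-Gaussian noise, regressing an effect on its true cause leaves a residual independent of the regressor, while regressing in the wrong direction does not. The pairwise sketch below illustrates only that intuition. It is neither DirectLiNGAM nor TSLiNGAM: the dependence score built from correlations of squared terms is a crude stand-in, assumed here for brevity, for the kernel or likelihood-based measures used in practice, and the function names are hypothetical.

```python
import numpy as np


def standardize(v):
    return (v - v.mean()) / v.std()


def residual_dependence(cause, effect):
    """OLS-regress `effect` on `cause` and score how dependent the residual
    still is on the regressor (values near 0 suggest independence)."""
    c, e = standardize(cause), standardize(effect)
    slope = np.dot(c, e) / np.dot(c, c)
    resid = e - slope * c
    # OLS already makes resid uncorrelated with c, so probe nonlinear
    # dependence through squared terms (a crude proxy, not a formal test).
    return (abs(np.corrcoef(c ** 2, resid ** 2)[0, 1])
            + abs(np.corrcoef(c, resid ** 2)[0, 1]))


def pairwise_direction(x, y):
    """Prefer the direction whose regression leaves the more independent residual."""
    return "x -> y" if residual_dependence(x, y) < residual_dependence(y, x) else "y -> x"


# Usage: uniform (non-Gaussian) cause and noise, linear effect.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=5000)
y = x + rng.uniform(-1, 1, size=5000)
print(pairwise_direction(x, y))
```

With Gaussian cause and noise both scores would hover near zero and the direction would be unidentifiable, which is why the LiNGAM family requires non-Gaussian disturbances.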
- Published
- 2024
44. Invited commentary: where do the causal DAGs come from?
- Author
-
Didelez, Vanessa
- Subjects
STATISTICAL models, CAUSAL models, DATA analysis, CAUSALITY (Physics), LIFE course approach, MATHEMATICAL models, STATISTICS, THEORY, ALGORITHMS
- Abstract
How do we construct our causal directed acyclic graphs (DAGs)—for example, for life-course modeling and analysis? In this commentary, I review how the data-driven construction of causal DAGs (causal discovery) has evolved, what promises it holds, and what limitations or caveats must be considered. I find that expert- or theory-driven model-building might benefit from some more checking against the data and that causal discovery could bring new ideas to old theories. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. Tuning structure learning algorithms with out-of-sample and resampling strategies.
- Author
-
Chobtham, Kiattikun and Constantinou, Anthony C.
- Subjects
MACHINE learning, BAYESIAN analysis, SAMPLE size (Statistics)
- Abstract
One of the challenges practitioners face when applying structure learning algorithms to their data involves determining a set of hyperparameters; otherwise, a set of hyperparameter defaults is assumed. The optimal hyperparameter configuration often depends on multiple factors, including the size and density of the usually unknown underlying true graph, the sample size of the input data, and the structure learning algorithm. We propose a novel hyperparameter tuning method, called the Out-of-sample Tuning for Structure Learning (OTSL), that employs out-of-sample and resampling strategies to estimate the optimal hyperparameter configuration for structure learning, given the input dataset and structure learning algorithm. Synthetic experiments show that employing OTSL to tune the hyperparameters of hybrid and score-based structure learning algorithms leads to improvements in graphical accuracy compared to the state-of-the-art. We also illustrate the applicability of this approach to real datasets from different disciplines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. A hierarchical ensemble causal structure learning approach for wafer manufacturing.
- Author
-
Yang, Yu, Bom, Sthitie, and Shen, Xiaotong
- Subjects
ASSEMBLY line methods, MANUFACTURING processes, HIERARCHICAL Bayes model
- Abstract
In manufacturing, causal relations between components have become crucial for automating assembly lines. Identifying these relations permits error tracing and correction in the absence of domain experts, in addition to advancing our knowledge about the operating characteristics of a complex system. This paper is motivated by a case study focusing on deciphering the causal structure of a wafer manufacturing system using data from sensors and abnormality monitors deployed within the assembly line. In response to the distinctive characteristics of wafer manufacturing data, such as multimodality, high dimensionality, imbalanced classes, and irregular missing patterns, we propose a hierarchical ensemble approach. This method leverages the temporal and domain constraints inherent in the assembly line and provides a measure of uncertainty in causal discovery. We extensively examine its operating characteristics via simulations and validate its effectiveness through simulation experiments and a practical application involving data obtained from Seagate Technology. Domain engineers have cross-validated the learned structures and corroborated the identified causal relationships. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. Causality and causal inference for engineers: Beyond correlation, regression, prediction and artificial intelligence.
- Author
-
Naser, M. Z.
- Subjects
ARTIFICIAL intelligence, CAUSAL inference, ENGINEERS, CAUSAL models, ENGINEERING
- Abstract
In order to engineer new materials, structures, systems, and processes that address persistent challenges, engineers seek to tie causes to effects and understand the effects of causes. Such a pursuit requires a causal investigation to uncover the underlying structure of the data generating process (DGP) governing phenomena. A causal approach derives causal models that engineers can adopt to infer the effects of interventions (and explore possible counterfactuals). Yet, for the most part, we continue to design experiments in the hope of empirically observing the effects of engineered interventions. Such experiments are idealized, complex, and costly, and hence narrow in scope. By contrast, a causal investigation allows us to peek into the how and why of a DGP and provides the essential means to articulate a causal model that accurately describes the phenomenon at hand and better predicts the outcome of possible interventions. Adopting a causal approach in engineering is perhaps more warranted than ever, especially with the rise of big data and the adoption of artificial intelligence (AI), wherein AI models are naively presumed to describe causal ties. To bridge this knowledge gap, this primer presents fundamental principles behind causal discovery, causal inference, and counterfactuals from an engineering perspective and contrasts them with those pertaining to correlation, regression, and AI. This article is categorized under: Application Areas > Industry Specific Applications; Algorithmic Development > Causality Discovery; Application Areas > Science and Technology; Technologies > Machine Learning [ABSTRACT FROM AUTHOR]
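The contrast between regression and causal effects drawn in this primer can be made concrete with a hypothetical linear simulation (not taken from the article). A confounder z drives both x and y; the assumed true effect of x on y is 2.0. Regressing y on x alone mixes in the confounding path, and with these assumed coefficients the naive slope converges to cov(x, y)/var(x) = (2·2 + 3·1)/2 = 3.5, while adjusting for z (a valid backdoor set here) recovers about 2.0. `ols_coefs` is a hypothetical helper.

```python
import numpy as np


def ols_coefs(y, *regressors):
    """Return OLS slopes (intercept dropped) of y on the given regressors."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                      # confounder
x = z + rng.normal(size=n)                  # "treatment" partly driven by z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # assumed true effect of x is 2.0

naive = ols_coefs(y, x)[0]         # ignores z: biased toward 3.5
adjusted = ols_coefs(y, x, z)[0]   # adjusts for z: close to 2.0
print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

The adjustment works only because the simulated graph is known; choosing the adjustment set from data is exactly where causal discovery and domain knowledge come in.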
- Published
- 2024
48. Learning debiased graph representations from the OMOP common data model for synthetic data generation.
- Author
-
Schulz, Nicolas Alexander, Carus, Jasmin, Wiederhold, Alexander Johannes, Johanns, Ole, Peters, Frederik, Rath, Natalie, Rausch, Katharina, Holleczek, Bernd, Katalinic, Alexander, Nennecke, Alice, Kusche, Henrik, Heinrichs, Vera, Eberle, Andrea, Luttmann, Sabine, Abnaof, Khalid, Kim-Wanner, Soo-Zin, Handels, Heinz, Germer, Sebastian, Halber, Marco, and Richter, Martin
- Subjects
REPRESENTATIONS of graphs, ELECTRONIC health record standards, MEDICAL informatics, DATA modeling, NURSING informatics, MARKOV processes
- Abstract
Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models that do not allow for expert verification or intervention. We propose a highly available method that enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a synthetic data generator. For this study, data pipelines were built that extract data from OMOP, convert them into time series format, learn temporal rules with two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
49. Interpretability of Causal Discovery in Tracking Deterioration in a Highly Dynamic Process.
- Author
-
Choudhary, Asha, Vuković, Matej, Mutlu, Belgin, Haslgrübler, Michael, and Kern, Roman
- Subjects
MANUFACTURING processes, TIME series analysis, PROCESS optimization, TEXTILE industry, VISCOSE
- Abstract
In dynamic production processes, mechanical degradation poses a significant challenge, impacting product quality and process efficiency. This paper explores a novel approach for monitoring degradation in the context of viscose fiber production, a highly dynamic manufacturing process. Using causal discovery techniques, our method allows domain experts to incorporate background knowledge into the creation of causal graphs. Furthermore, it enhances interpretability and improves the ability to identify potential problems via changes in causal relations over time. The case study employs a comprehensive analysis of the viscose fiber production process at a prominent textile manufacturer, emphasizing the advantages of causal discovery for monitoring degradation. The results are compared with state-of-the-art methods that are not considered interpretable, specifically an LSTM-based autoencoder, UnSupervised Anomaly Detection on Multivariate Time Series (USAD), and Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data (TranAD), showcasing the alignment and validation of our approach. This paper provides valuable information on degradation monitoring strategies, demonstrating the efficacy of causal discovery in dynamic manufacturing environments. The findings contribute to the evolving landscape of process optimization and quality control. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. A high-precision interpretable framework for marine dissolved oxygen concentration inversion.
- Author
-
Xin Li, Zhenyi Liu, Zongchi Yang, Fan Meng, and Tao Song
- Subjects
OXYGEN content of seawater, DEEP learning, ARTIFICIAL intelligence, MARINE ecology
- Abstract
Variations in Marine Dissolved Oxygen Concentrations (MDOC) play a critical role in the study of marine ecosystems and global climate evolution. Although artificial intelligence methods, represented by deep learning, can enhance the precision of MDOC inversion, their opaque "black-box" operating mechanisms often make the process difficult to interpret. To address this issue, this paper proposes a high-precision interpretable framework (CDRP) for intelligent MDOC inversion, comprising Causal Discovery, Drift Detection, a RuleFit Model, and Post Hoc Analysis. The entire process of the proposed framework is fully interpretable: (i) The causal relationships between various elements are clarified. (ii) During the concept drift analysis phase, the potential factors contributing to changes in marine data are extracted. (iii) The operational rules of RuleFit ensure computational transparency. (iv) Post hoc analysis provides a quantitative interpretation from both global and local perspectives. Furthermore, we derive quantitative conclusions about the impacts of various marine elements, and our analysis is consistent with conclusions on MDOC in the marine literature. Meanwhile, CDRP also ensures the precision of MDOC inversion: (i) PCMCI causal discovery eliminates the interference of weakly associated elements. (ii) Concept drift detection selects more representative key frames. (iii) RuleFit achieves higher precision than other models. Experiments demonstrate that CDRP achieves the best performance on the single-point buoy data inversion task. Overall, CDRP can enhance the interpretability of the intelligent MDOC inversion process while ensuring high precision. [ABSTRACT FROM AUTHOR]
- Published
- 2024
Discovery Service for Jio Institute Digital Library