37 results on '"causal effect estimation"'
Search Results
2. A survey of deep causal models and their industrial applications.
- Author
-
Li, Zongyu, Guo, Xiaobo, and Qiang, Siwei
- Abstract
The notion of causality assumes a paramount position within the realm of human cognition. Over the past few decades, there has been significant advancement in the domain of causal effect estimation across various disciplines, including but not limited to computer science, medicine, economics, and industrial applications. Given the continuous advancements in deep learning methodologies, there has been a notable surge in its utilization for the estimation of causal effects using counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective functions to estimate counterfactual data unbiasedly. Different from the existing surveys on causal models in machine learning, this review mainly focuses on the overview of the deep causal models based on neural networks, and its core contributions are as follows: (1) we cast insight on a comprehensive overview of deep causal models from both timeline of development and method classification perspectives; (2) we outline some typical applications of causal effect estimation to industry; (3) we also endeavor to present a detailed categorization and analysis on relevant datasets, source codes and experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey.
- Author
-
Cheng, Debo, Li, Jiuyong, Liu, Lin, Liu, Jixue, and Le, Thuc Duy
- Published
- 2024
4. Interpretable fracturing optimization of shale oil reservoir production based on causal inference
- Author
-
Yang, Huohai, Li, Yi, Min, Chao, Yue, Jie, Li, Fuwei, Li, Renze, and Chu, Xiangshu
- Published
- 2024
5. Causal inference in the medical domain: a survey.
- Author
-
Wu, Xing, Peng, Shaoqi, Li, Jingwen, Zhang, Jian, Sun, Qun, Li, Weimin, Qian, Quan, Liu, Yue, and Guo, Yike
- Subjects
CAUSAL inference, ELECTRONIC health records
- Abstract
Causal inference is considered a crucial topic in the medical field, as it enables the determination of causal effects for medical treatments through data analysis. However, the vast volume and complexity of medical data present significant challenges for traditional machine learning methods in accurately assessing treatment effects. Issues such as noise in the data, unstructured information, and label sparsity can lead to unstable causal identification and erroneous correlation inference. To address these challenges, we propose a systematic survey of causal inference in the medical field, which encompasses studies utilizing observational data, aimed at organizing and summarizing the key concepts, methods, and applications of causal inference. Moreover, the causal inference applications are presented across various types of medical data, including medical images and Electronic Medical Records (EMR), using specific medical cases as examples. The thorough review not only emphasizes the theoretical and practical significance of causal inference methods but also highlights potential research directions in the medical domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. A Tutorial on Applying the Difference-in-Differences Method to Health Data
- Author
-
Rothbard, Sarah, Etheridge, James C., and Murray, Eleanor J.
- Published
- 2024
7. Instrumental variable-based causal effect evaluation of butylphthalide in acute ischemic stroke.
- Author
-
林容基, 陈薇, 黄志新, and 蔡瑞初
- Abstract
Copyright of Journal of Guangdong University of Technology is the property of Journal of Guangdong University of Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
8. Assessing treatment effect heterogeneity: predictive covariate selection and subgroup identification
- Author
-
Papangelou, Konstantinos, Brown, Gavin, and Mu, Tingting
- Subjects
machine learning, causal effect estimation, variable selection, information theory, recursive partitioning, subgroup identification
- Abstract
A key objective in an interventional study, such as a randomised clinical trial, is the evaluation of heterogeneity of treatment effect in the population. This allows us to identify the most promising intervention for a given observation. In this thesis we approach this by targeting two tightly coupled sub-problems. The first concerns the identification of covariates and the second the identification of subgroups associated with treatment effect heterogeneity. Regarding the first problem we study an information theoretic approach. This can be motivated by phrasing the predictive covariate selection problem in log-likelihood terms. We study the properties of this approach in the case of randomised studies and evaluate low-dimensional approximations that are better suited for small-sample and/or high-dimensional studies. We identify some limitations and propose extensions based on propensity score weighting and stratification that extend this criterion in scenarios when the treatment assignment depends on the covariates. Regarding the second problem, we discuss recursive partitioning approaches coupled with weighting methods for treatment effect estimation. The purpose of these methods is to tackle the problem of subgroup identification in the presence of confounders in the data. Finally, studying the literature of subgroup identification we identify a significant number of approaches. Given such a large number of methods to choose from, an important question is how to select the best for a given task. We introduce a framework that uses the subgroup stability as a measure to capture the variations in the identified subgroups due to small changes in the data.
- Published
- 2021
9. Propensity score analysis with missing data using a multi-task neural network
- Author
-
Shu Yang, Peipei Du, Xixi Feng, Daihai He, Yaolong Chen, Linda L. D. Zhong, Xiaodong Yan, and Jiawei Luo
- Subjects
Observational study, Propensity score analysis, Neural network, Multitasking learning, Causal effect estimation, Inverse probability weighting, Medicine (General), R5-920
- Abstract
Background: Propensity score analysis is increasingly used to control for confounding factors in observational studies. Unfortunately, unavoidable missing values make estimating propensity scores extremely challenging. We propose a new method for estimating propensity scores in data with missing values. Materials and methods: Both simulated and real-world datasets are used in our experiments. The simulated datasets were constructed under 2 scenarios, the presence (T = 1) and the absence (T = 0) of the true effect. The real-world dataset comes from LaLonde’s employment training program. We construct missing data with varying degrees of missing rates under three missing mechanisms: MAR, MCAR, and MNAR. Then we compare MTNN with 2 other traditional methods in different scenarios. The experiments in each scenario were repeated 20,000 times. Our code is publicly available at https://github.com/ljwa2323/MTNN. Results: Under the three missing mechanisms of MAR, MCAR and MNAR, the RMSE between the effect and the true effect estimated by our proposed method is the smallest in simulations and in real-world data. Furthermore, the standard deviation of the effect estimated by our method is the smallest. In situations where the missing rate is low, the estimation of our method is more accurate. Conclusions: MTNN can perform propensity score estimation and missing value filling at the same time through shared hidden layers and joint learning, which solves the dilemma of traditional methods and is very suitable for estimating true effects in samples with missing values. The method is expected to be broadly generalized and applied to real-world observational studies.
- Published
- 2023
10. Real-World Effectiveness of Lung Cancer Screening Using Deep Learning-Based Counterfactual Prediction.
- Author
-
Feng, Zheng, Chen, Zhaoyi, Guo, Yi, Prosperi, Mattia, Mehta, Hiren, Braithwaite, Dejana, Wu, Yonghui, and Bian, Jiang
- Abstract
The benefits and harms of lung cancer screening (LCS) for patients in the real-world clinical setting have been argued. Recently, discriminative prediction modeling of lung cancer with stratified risk factors has been developed to investigate the real-world effectiveness of LCS from observational data. However, most of these studies were conducted at the population level that only measured the difference in the average outcome between groups. In this study, we built counterfactual prediction models for lung cancer risk and mortality and examined for individual patients whether LCS as a hypothetical intervention reduces lung cancer risk and subsequent mortality. We investigated traditional and deep learning (DL)-based causal methods that provide individualized treatment effect (ITE) at the patient level and evaluated them with a cohort from the OneFlorida+ Clinical Research Consortium. We further discussed and demonstrated that the ITE estimation model can be used to personalize clinical decision support for a broader population. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. How Confident Are We about Observational Findings in Healthcare: A Benchmark Study.
- Author
-
Schuemie, Martijn J, Cepeda, M Soledad, Suchard, Marc A, Yang, Jianxiao, Tian, Yuxi, Schuler, Alejandro, Ryan, Patrick B, Madigan, David, and Hripcsak, George
- Subjects
causal effect estimation, evaluation, methods, observational research, Clinical Research, 8.4 Research design and methodologies (health services), Health and social care services research, Generic health relevance
- Abstract
Healthcare professionals increasingly rely on observational healthcare data, such as administrative claims and electronic health records, to estimate the causal effects of interventions. However, limited prior studies raise concerns about the real-world performance of the statistical and epidemiological methods that are used. We present the "OHDSI Methods Benchmark" that aims to evaluate the performance of effect estimation methods on real data. The benchmark comprises a gold standard, a set of metrics, and a set of open source software tools. The gold standard is a collection of real negative controls (drug-outcome pairs where no causal effect appears to exist) and synthetic positive controls (drug-outcome pairs that augment negative controls with simulated causal effects). We apply the benchmark using four large healthcare databases to evaluate methods commonly used in practice: the new-user cohort, self-controlled cohort, case-control, case-crossover, and self-controlled case series designs. The results confirm the concerns about these methods, showing that for most methods the operating characteristics deviate considerably from nominal levels. For example, in most contexts, only half of the 95% confidence intervals we calculated contain the corresponding true effect size. We previously developed an "empirical calibration" procedure to restore these characteristics and we also evaluate this procedure. While no one method dominates, self-controlled methods such as the empirically calibrated self-controlled case series perform well across a wide range of scenarios.
- Published
- 2020
12. Causal Reasoning Methods in Medical Domain: A Review
- Author
-
Wu, Xing, Li, Jingwen, Qian, Quan, Liu, Yue, and Guo, Yike (volume editors: Fujita, Hamido, Fournier-Viger, Philippe, Ali, Moonis, and Wang, Yinglin)
- Published
- 2022
13. Propensity score analysis with missing data using a multi-task neural network.
- Author
-
Yang, Shu, Du, Peipei, Feng, Xixi, He, Daihai, Chen, Yaolong, Zhong, Linda L. D., Yan, Xiaodong, and Luo, Jiawei
- Subjects
MISSING data (Statistics), EMPLOYEE training, DATA analysis, STANDARD deviations
- Abstract
Background: Propensity score analysis is increasingly used to control for confounding factors in observational studies. Unfortunately, unavoidable missing values make estimating propensity scores extremely challenging. We propose a new method for estimating propensity scores in data with missing values. Materials and methods: Both simulated and real-world datasets are used in our experiments. The simulated datasets were constructed under 2 scenarios, the presence (T = 1) and the absence (T = 0) of the true effect. The real-world dataset comes from LaLonde's employment training program. We construct missing data with varying degrees of missing rates under three missing mechanisms: MAR, MCAR, and MNAR. Then we compare MTNN with 2 other traditional methods in different scenarios. The experiments in each scenario were repeated 20,000 times. Our code is publicly available at https://github.com/ljwa2323/MTNN. Results: Under the three missing mechanisms of MAR, MCAR and MNAR, the RMSE between the effect and the true effect estimated by our proposed method is the smallest in simulations and in real-world data. Furthermore, the standard deviation of the effect estimated by our method is the smallest. In situations where the missing rate is low, the estimation of our method is more accurate. Conclusions: MTNN can perform propensity score estimation and missing value filling at the same time through shared hidden layers and joint learning, which solves the dilemma of traditional methods and is very suitable for estimating true effects in samples with missing values. The method is expected to be broadly generalized and applied to real-world observational studies. [ABSTRACT FROM AUTHOR]
- Published
- 2023
14. Causal hybrid modeling with double machine learning—applications in carbon flux modeling
- Author
-
Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein, and Gustau Camps-Valls
- Subjects
knowledge-guided machine learning, hybrid modeling, causal effect estimation, double machine learning, temperature sensitivity, carbon flux partitioning, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing double machine learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the Q10 model, we demonstrate that DML-based hybrid modeling is superior in estimating causal parameters over end-to-end deep neural network approaches, proving efficiency, robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.
- Published
- 2024
15. Clustering of causal graphs to explore drivers of river discharge.
- Author
-
Günther, Wiebke, Miersch, Peter, Ninad, Urmi, and Runge, Jakob
- Subjects
METEOROLOGICAL databases, STREAM measurements, CLUSTER analysis (Statistics), WATERSHEDS, HYDROLOGY
- Abstract
This work aims to classify catchments through the lens of causal inference and cluster analysis. In particular, it uses causal effects (CEs) of meteorological variables on river discharge while only relying on easily obtainable observational data. The proposed method combines time series causal discovery with CE estimation to develop features for a subsequent clustering step. Several ways to customize and adapt the features to the problem at hand are discussed. In an application example, the method is evaluated on 358 European river catchments. The found clusters are analyzed using the causal mechanisms that drive them and their environmental attributes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
16. Robust Methods for Quantifying the Effect of a Continuous Exposure From Observational Data.
- Author
-
Tourani, Roshan, Ma, Sisi, Usher, Michael, and Simon, Gyorgy J.
- Subjects
ANTIBIOTIC residues, GLOBAL Positioning System, DRUG dosage
- Abstract
A cornerstone of clinical medicine is intervening on a continuous exposure, such as titrating the dosage of a pharmaceutical or controlling a laboratory result. In clinical trials, continuous exposures are dichotomized into narrow ranges, excluding large portions of the realistic treatment scenarios. The existing computational methods for estimating the effect of continuous exposure rely on a set of strict assumptions. We introduce new methods that are more robust towards violations of these assumptions. Our methods are based on the key observation that changes of exposure in the clinical setting are often achieved gradually, so effect estimates must be “locally” robust in narrower exposure ranges. We compared our methods with several existing methods on three simulated studies with increasing complexity. We also applied the methods to data from 14 k sepsis patients at M Health Fairview to estimate the effect of antibiotic administration latency on prolonged hospital stay. The proposed methods achieve good performance in all simulation studies. When the assumptions were violated, the proposed methods had estimation errors of one half to one fifth of the state-of-the-art methods. Applying our methods to the sepsis cohort resulted in effect estimates consistent with clinical knowledge. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Clustering of causal graphs to explore drivers of river discharge
- Author
-
Wiebke Günther, Peter Miersch, Urmi Ninad, and Jakob Runge
- Subjects
catchment hydrology, causal effect estimation, causal inference, clustering, Environmental sciences, GE1-350, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This work aims to classify catchments through the lens of causal inference and cluster analysis. In particular, it uses causal effects (CEs) of meteorological variables on river discharge while only relying on easily obtainable observational data. The proposed method combines time series causal discovery with CE estimation to develop features for a subsequent clustering step. Several ways to customize and adapt the features to the problem at hand are discussed. In an application example, the method is evaluated on 358 European river catchments. The found clusters are analyzed using the causal mechanisms that drive them and their environmental attributes.
- Published
- 2023
18. Propensity Score Matching Underestimates Real Treatment Effect, in a Simulated Theoretical Multivariate Model.
- Author
-
Garcia Iglesias, Daniel
- Subjects
MONTE Carlo method, TREATMENT effectiveness, ESTIMATION bias, REGRESSION analysis, PROPENSITY score matching
- Abstract
Propensity Score Matching (PSM) is a useful method to reduce the impact of Treatment-Selection Bias in the estimation of causal effects in observational studies. After matching, the PSM significantly reduces the sample under investigation, which may lead to other possible biases (due to overfitting, excess of covariation or a reduced number of observations). In this sense, we wanted to analyze the behavior of this PSM compared with other widely used methods to deal with non-comparable groups, such as the Multivariate Regression Model (MRM). Monte Carlo Simulations are made to construct groups with different effects in order to compare the behavior of PSM and MRM estimating these effects. In addition, the Treatment Selection Bias reduction for the PSM is calculated. With the PSM a reduction in the Treatment Selection Bias is achieved (0.983 [0.982, 0.984]), with a reduction in the Relative Real Treatment Effect Estimation Error (0.216 [0.2, 0.232]), but despite this bias reduction and estimation error reduction, the MRM reduces this estimation error significantly more than the PSM (0.539 [0.522, 0.556], p < 0.001). In addition, the PSM leads to a 30% reduction in the sample. This loss of information derived from the matching process may lead to another not known bias and thus to the inaccuracy of the effect estimation compared with the MRM. [ABSTRACT FROM AUTHOR]
- Published
- 2022
19. Multi-modal trajectory forecasting with Multi-scale Interactions and Multi-pseudo-target Supervision.
- Author
-
Zhao, Cong, Song, Andi, Zeng, Zimu, Ji, Yuxiong, and Du, Yuchuan
- Abstract
Trajectory forecasting is crucial for the advancement of autonomous vehicles. While much progress has been made, extant approaches often fall short in accounting for intricate social interactions and the unpredictability of human behavior. This paper introduces a novel multi-modal trajectory forecasting model named Multi-scale Interaction and Multi-pseudo-target Supervision (MIMS). Central to our approach is a multiscale hypergraph that discerns latent interactions across all scales, evaluating both their strength and type. Further enhancing our model's capabilities is a causal-effect-estimation-based pseudo-target generation method. This facilitates multi-modality modeling in motion forecasting by offering explicit supervision with multiple latent targets. We validate MIMS using the Argoverse Motion Forecasting dataset. The results reveal that MIMS outperforms current state-of-the-art models across various traffic conditions. Specifically, our multiscale hypergraph exhibits a superior capacity for capturing complex spatiotemporal dependencies compared to pairwise methods. Moreover, our multi-pseudo-target supervision eliminates the pattern convergence of multi-modal forecasting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Discovering causally invariant features for out-of-distribution generalization.
- Author
-
Wang, Yujie, Yu, Kui, Xiang, Guodu, Cao, Fuyuan, and Liang, Jiye
- Subjects
GENERALIZATION, GRANGER causality test
- Abstract
Out-of-distribution (OOD) generalization aims to generalize a model trained on source domains to unseen target domains. Recently, causality-based generalization methods have focused on learning invariant causal relationships around the label variable, as causal mechanisms are robust across different domains. However, these methods would yield an inaccurate causal variable set due to the lack of heterogeneous domain data or a prior causal structure, which severely weakens their generalization capacity. To address this problem, we propose a Causally Invariant Features Discovery (CIFD) framework, which combines causal structure discovery and causal effect estimation for selecting a high-quality causal variable set and realizing better OOD generalization. Specifically, CIFD first identifies all potential causal variables by learning a double-layer-based local causal structure around the label variable. Secondly, CIFD uses a double-layer causal effect estimator for estimating the causality of potential causal variables and obtaining true causal variables. The comprehensive experiments on both regression and classification tasks clearly demonstrate the superiority of our framework over the state-of-the-art methods.
• We propose a CIFD framework to find accurate causal variables for OOD generalization.
• Potential causal variables are identified by a double-layer local causal structure.
• True causal variables are learnt by a double-layer total causal effect estimator.
• Comprehensive experiments demonstrate the superiority of CIFD over the SOTA methods.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Causal inference for time series analysis: problems, methods and evaluation.
- Author
-
Moraffah, Raha, Sheth, Paras, Karami, Mansooreh, Bhattacharya, Anchit, Wang, Qianru, Tahir, Anique, Raglin, Adrienne, and Liu, Huan
- Subjects
CAUSAL inference, EVALUATION methodology, TIME series analysis, DYNAMICAL systems, SCIENTIFIC discoveries
- Abstract
Time series data are a collection of chronological observations which are generated by several domains such as medical and financial fields. Over the years, different tasks such as classification, forecasting and clustering have been proposed to analyze this type of data. Time series data have also been used to study the effect of interventions over time. Moreover, in many fields of science, learning the causal structure of dynamic systems and time series data is considered an interesting task which plays an important role in scientific discoveries. Estimating the effect of an intervention and identifying the causal relations from the data can be performed via causal inference. Existing surveys on time series discuss traditional tasks such as classification and forecasting or explain the details of the approaches proposed to solve a specific task. In this paper, we focus on two causal inference tasks, i.e., treatment effect estimation and causal discovery for time series data and provide a comprehensive review of the approaches in each task. Furthermore, we curate a list of commonly used evaluation metrics and datasets for each task and provide an in-depth insight. These metrics and datasets can serve as a benchmark for research in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2021
22. Instrumental Heterogeneity in Sex-Specific Two-Sample Mendelian Randomization: Empirical Results From the Relationship Between Anthropometric Traits and Breast/Prostate Cancer
- Author
-
Yixin Gao, Jinhui Zhang, Huashuo Zhao, Fengjun Guan, and Ping Zeng
- Subjects
two-sample Mendelian randomization, sex-specific and sex-combined instrumental variable, sex heterogeneity, causal effect estimation, summary statistics, breast cancer, Genetics, QH426-470
- Abstract
Background: In two-sample Mendelian randomization (MR) studies, sex instrumental heterogeneity is an important problem needed to address carefully, which however is often overlooked and may lead to misleading causal inference. Methods: We first employed cross-trait linkage disequilibrium score regression (LDSC), Pearson’s correlation analysis, and the Cochran’s Q test to examine sex genetic similarity and heterogeneity in instrumental variables (IVs) of exposures. Simulation was further performed to explore the influence of sex instrumental heterogeneity on causal effect estimation in sex-specific two-sample MR analyses. Furthermore, we chose breast/prostate cancer as outcome and four anthropometric traits as exposures as an illustrative example to illustrate the importance of taking sex heterogeneity of instruments into account in MR studies. Results: The simulation definitively demonstrated that sex-combined IVs can lead to biased causal effect estimates in sex-specific two-sample MR studies. In our real applications, both LDSC and Pearson’s correlation analyses showed high genetic correlation between sex-combined and sex-specific IVs of the four anthropometric traits, while nearly all the correlation coefficients were larger than zero but less than one. The Cochran’s Q test also displayed sex heterogeneity for some instruments. When applying sex-specific instruments, significant discrepancies in the magnitude of estimated causal effects were detected for body mass index (BMI) on breast cancer (P = 1.63E-6), for hip circumference (HIP) on breast cancer (P = 1.25E-20), and for waist circumference (WC) on prostate cancer (P = 0.007) compared with those generated with sex-combined instruments. Conclusion: Our study reveals that the sex instrumental heterogeneity has non-ignorable impact on sex-specific two-sample MR studies and the causal effects of anthropometric traits on breast/prostate cancer would be biased if sex-combined IVs are incorrectly employed.
- Published
- 2021
23. Instrumental Heterogeneity in Sex-Specific Two-Sample Mendelian Randomization: Empirical Results From the Relationship Between Anthropometric Traits and Breast/Prostate Cancer.
- Author
-
Gao, Yixin, Zhang, Jinhui, Zhao, Huashuo, Guan, Fengjun, and Zeng, Ping
- Subjects
PROSTATE cancer, BREAST cancer, HETEROGENEITY, BODY mass index, WAIST circumference
- Abstract
Background: In two-sample Mendelian randomization (MR) studies, sex instrumental heterogeneity is an important problem needed to address carefully, which however is often overlooked and may lead to misleading causal inference. Methods: We first employed cross-trait linkage disequilibrium score regression (LDSC), Pearson's correlation analysis, and the Cochran's Q test to examine sex genetic similarity and heterogeneity in instrumental variables (IVs) of exposures. Simulation was further performed to explore the influence of sex instrumental heterogeneity on causal effect estimation in sex-specific two-sample MR analyses. Furthermore, we chose breast/prostate cancer as outcome and four anthropometric traits as exposures as an illustrative example to illustrate the importance of taking sex heterogeneity of instruments into account in MR studies. Results: The simulation definitively demonstrated that sex-combined IVs can lead to biased causal effect estimates in sex-specific two-sample MR studies. In our real applications, both LDSC and Pearson's correlation analyses showed high genetic correlation between sex-combined and sex-specific IVs of the four anthropometric traits, while nearly all the correlation coefficients were larger than zero but less than one. The Cochran's Q test also displayed sex heterogeneity for some instruments. When applying sex-specific instruments, significant discrepancies in the magnitude of estimated causal effects were detected for body mass index (BMI) on breast cancer (P = 1.63E-6), for hip circumference (HIP) on breast cancer (P = 1.25E-20), and for waist circumference (WC) on prostate cancer (P = 0.007) compared with those generated with sex-combined instruments. Conclusion: Our study reveals that the sex instrumental heterogeneity has non-ignorable impact on sex-specific two-sample MR studies and the causal effects of anthropometric traits on breast/prostate cancer would be biased if sex-combined IVs are incorrectly employed. 
[ABSTRACT FROM AUTHOR]
- Published
- 2021
24. Propensity Score Matching Underestimates Real Treatment Effect, in a Simulated Theoretical Multivariate Model
- Author
-
Daniel Garcia Iglesias
- Subjects
propensity score matching, multivariate analysis, general linear model, Monte Carlo method, causal effect estimation, observational study, Mathematics, QA1-939
- Abstract
Propensity Score Matching (PSM) is a useful method to reduce the impact of Treatment-Selection Bias in the estimation of causal effects in observational studies. After matching, the PSM significantly reduces the sample under investigation, which may lead to other possible biases (due to overfitting, excess of covariation or a reduced number of observations). In this sense, we wanted to analyze the behavior of this PSM compared with other widely used methods to deal with non-comparable groups, such as the Multivariate Regression Model (MRM). Monte Carlo Simulations are made to construct groups with different effects in order to compare the behavior of PSM and MRM estimating these effects. In addition, the Treatment Selection Bias reduction for the PSM is calculated. With the PSM a reduction in the Treatment Selection Bias is achieved (0.983 [0.982, 0.984]), with a reduction in the Relative Real Treatment Effect Estimation Error (0.216 [0.2, 0.232]), but despite this bias reduction and estimation error reduction, the MRM reduces this estimation error significantly more than the PSM (0.539 [0.522, 0.556], p < 0.001). In addition, the PSM leads to a 30% reduction in the sample. This loss of information derived from the matching process may lead to another not known bias and thus to the inaccuracy of the effect estimation compared with the MRM.
- Published
- 2022
- Full Text
- View/download PDF
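The comparison described in the abstract of entry 24 can be illustrated with a minimal Monte Carlo sketch. This is not the author's simulation design: the single confounder, the outcome coefficient of 2.0, the caliper of 0.05, nearest-neighbour matching with replacement, and all function names are assumptions made for illustration only.

```python
import numpy as np


def simulate(n, tau, rng):
    """Draw one synthetic observational dataset with a single confounder x."""
    x = rng.normal(size=n)
    propensity = 1.0 / (1.0 + np.exp(-x))            # treatment depends on x
    t = (rng.uniform(size=n) < propensity).astype(float)
    y = tau * t + 2.0 * x + rng.normal(size=n)       # constant true effect tau
    return x, t, y


def naive_estimate(t, y):
    """Unadjusted difference in group means (confounded by x)."""
    return y[t == 1].mean() - y[t == 0].mean()


def regression_estimate(x, t, y):
    """Coefficient on t from an ordinary least squares fit of y on (1, t, x)."""
    design = np.column_stack([np.ones_like(x), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def fit_propensity(x, t, iters=25):
    """Logistic regression of t on (1, x), fitted with Newton's method."""
    design = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        grad = design.T @ (t - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        w += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-design @ w))


def psm_estimate(x, t, y, caliper=0.05):
    """Match each treated unit to its nearest control on the estimated score."""
    ps = fit_propensity(x, t)
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    order = np.argsort(ps[control])
    sorted_ps = ps[control][order]
    pos = np.clip(np.searchsorted(sorted_ps, ps[treated]), 1, len(sorted_ps) - 1)
    left, right = sorted_ps[pos - 1], sorted_ps[pos]
    use_left = (ps[treated] - left) <= (right - ps[treated])
    match = control[order][np.where(use_left, pos - 1, pos)]
    keep = np.abs(ps[treated] - ps[match]) <= caliper  # drop poor matches
    return np.mean(y[treated][keep] - y[match][keep])


def run_experiment(reps=30, n=2000, tau=1.0, seed=0):
    """Average each estimator over repeated synthetic datasets."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(reps):
        x, t, y = simulate(n, tau, rng)
        rows.append((naive_estimate(t, y),
                     regression_estimate(x, t, y),
                     psm_estimate(x, t, y)))
    naive, regression, psm = np.mean(rows, axis=0)
    return {"naive": naive, "regression": regression, "psm": psm}


if __name__ == "__main__":
    print(run_experiment())
```

In this toy setting the regression model is correctly specified, so it says nothing about how the two methods compare under misspecification; it only shows the mechanics of such a comparison.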
25. Cold‐pool‐driven convective initiation: using causal graph analysis to determine what convection‐permitting models are missing.
- Author
-
Hirt, Mirjam, Craig, George C., Schäfer, Sophia A. K., Savre, Julien, and Heinze, Rieke
- Subjects
- *NUMERICAL weather forecasting, *PRECIPITATION forecasting, *PREDICTION models
- Abstract
Cold‐pool‐driven convective initiation is investigated in high‐resolution, convection‐permitting simulations with a focus on the diurnal cycle and organization of convection and the sensitivity to grid size. Simulations of four different days over Germany were performed using the ICON‐LEM model with grid sizes from 156 to 625 m. In these simulations, we identify cold pools, cold‐pool boundaries and initiated convection. Convection is triggered much more efficiently in the vicinity of cold pools than in other regions and can provide as much as 50% of total convective initiation, in particular in the late afternoon. By comparing different model resolutions, we find that cold pools are more frequent, smaller and less intense in lower‐resolution simulations. Furthermore, their gust fronts are weaker and less likely to trigger new convection. To identify how model resolution affects this triggering probability, we use a linear causal graph analysis. In doing so, we postulate a graph structure with potential causal pathways and then apply multi‐linear regression accordingly. We find a dominant, systematic effect: reducing grid sizes directly reduces upward mass flux at the gust front, which causes weaker triggering probabilities. These findings are expected to be even more relevant for km‐scale, numerical weather prediction models. We thus expect that a better representation of cold‐pool‐driven convective initiation will improve forecasts of convective precipitation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
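The procedure named in the abstract of entry 25 (postulate a graph, then apply multi-linear regression along its pathways) can be sketched on synthetic data as follows. The three-variable chain, the coefficients 0.8, 0.5 and 0.1, and every name below are hypothetical; they are not taken from the paper or from its simulations.

```python
import numpy as np


def ols(columns, target):
    """Least-squares slopes of target on the given columns (intercept dropped)."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


def path_effects(cause, mediator, outcome):
    """Estimate path coefficients for the assumed graph
    cause -> mediator -> outcome, plus a direct cause -> outcome edge."""
    (a,) = ols([cause], mediator)              # cause -> mediator
    b, c = ols([mediator, cause], outcome)     # mediator -> outcome, direct edge
    return {"cause_to_mediator": a,
            "mediator_to_outcome": b,
            "direct": c,
            "indirect": a * b,
            "total": c + a * b}


def synthetic_chain(n=5000, seed=0):
    """Generate data from a linear chain with known, arbitrary coefficients."""
    rng = np.random.default_rng(seed)
    cause = rng.normal(size=n)
    mediator = 0.8 * cause + rng.normal(size=n)
    outcome = 0.5 * mediator + 0.1 * cause + rng.normal(size=n)
    return cause, mediator, outcome


if __name__ == "__main__":
    print(path_effects(*synthetic_chain()))
```

The estimates are only interpretable causally if the postulated graph is correct and the relations are linear, which is the assumption the abstract states.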
26. What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?
- Author
-
F. Ribeiro, Andre, Neffke, Frank, and Hausmann, Ricardo
- Published
- 2022
- Full Text
- View/download PDF
27. When Causal Inference Meets Graph Machine Learning: Unleashing the Potential of Mutual Benefit
- Subjects
Causal Inference, Fairness, Causal Effect Estimation, Explanation, Graph Learning, Trustworthy AI
- Abstract
Recent years have witnessed rapid development in graph-based machine learning (ML) in various high-impact domains (e.g., healthcare, recommendation, and security), especially those powered by effective graph neural networks (GNNs). Currently, the mainstream graph ML methods are based on statistical learning, e.g., utilizing the statistical correlations between node features, graph structure, and labels for node classification. However, statistical learning has been widely criticized for only capturing the superficial relations between variables in the data system, and consequently, rendering the lack of trustworthiness in real-world applications. For example, ML models often make biased predictions toward underrepresented groups. Besides, these ML models often lack explanation for humans. Therefore, it is crucial to understand the causality in the data system and the learning process. Causal inference is the discipline that aims to investigate the causality inside a system, for example, to identify and estimate the causal effect of a certain treatment (e.g., wearing a face mask) on an important outcome (e.g., COVID-19 infection). Involving the concepts and philosophy of causal inference into ML methods is often considered as a significant component of human-level intelligence and can serve as the foundation of artificial intelligence (AI). However, most traditional causal inference studies rely on strong assumptions and focus on independent and identically distributed (i.i.d.) data. Thus, most of them cannot be directly grafted on graphs. Therefore, causal inference on graphs is still faced with many unique barriers in effectiveness. Fortunately, the interplay between causal inference and graph ML has the potential to bring mutual benefit to each other. In this thesis, we will present the challenges and our research contributions for bridging the gap between causal inference and graph ML. 
Our research aims to unleash the mutual benefit in these two areas, mainly including two key research perspectives: Q1) How to leverage graph ML methods to facilitate causal inference in effectiveness? Q2) How to leverage causality to facilitate graph ML models in model trustworthiness (e.g., model fairness and explanation)? Correspondingly, we introduce the background, challenges, and related work in Part I. In Part II, we introduce our detailed research problems and methodologies for causal inference on graph data powered by graph ML technologies (Q1). In Part III, we present our work in causality-involved trustworthy graph ML methods (Q2). In Part IV, we further introduce future research directions on causal machine learning, trustworthy AI, and graph mining, providing insights that manifest in real-world scenarios to facilitate future high-stakes applications.
- Published
- 2023
- Full Text
- View/download PDF
28. Real-World Effectiveness of Lung Cancer Screening Using Deep Learning-Based Counterfactual Prediction.
- Author
-
Feng Z, Chen Z, Guo Y, Prosperi M, Mehta H, Braithwaite D, Wu Y, and Bian J
- Subjects
- Humans, Early Detection of Cancer, Risk Factors, Lung Neoplasms diagnosis, Deep Learning
- Abstract
The benefits and harms of lung cancer screening (LCS) for patients in the real-world clinical setting have been argued. Recently, discriminative prediction modeling of lung cancer with stratified risk factors has been developed to investigate the real-world effectiveness of LCS from observational data. However, most of these studies were conducted at the population level that only measured the difference in the average outcome between groups. In this study, we built counterfactual prediction models for lung cancer risk and mortality and examined for individual patients whether LCS as a hypothetical intervention reduces lung cancer risk and subsequent mortality. We investigated traditional and deep learning (DL)-based causal methods that provide individualized treatment effect (ITE) at the patient level and evaluated them with a cohort from the OneFlorida+ Clinical Research Consortium. We further discussed and demonstrated that the ITE estimation model can be used to personalize clinical decision support for a broader population.
- Published
- 2024
- Full Text
- View/download PDF
29. A general framework for causal classification
- Author
-
Li, Jiuyong, Zhang, Weijia, Liu, Lin, Yu, Kui, Le, Thuc Duy, and Liu, Jixue
- Published
- 2021
- Full Text
- View/download PDF
30. Causal effect estimation and inference using Stata.
- Author
-
Terza, Joseph V.
- Subjects
- *CAUSAL models, *NONLINEAR estimation, *DATA transformations (Statistics)
- Abstract
Terza (2016b, Health Services Research 51: 1109-1113) gives the correct generic expression for the asymptotic standard errors of statistics formed as sample means of nonlinear data transformations. In this article, I assess the performance of the Stata margins command as a relatively simple alternative for calculating such standard errors. I note that margins is not available for all packaged nonlinear regression commands in Stata and cannot be implemented in conjunction with user-defined-and-coded nonlinear estimation protocols that do not make a predict command available. When margins is available, however, I establish (using a real-data example) that it produces standard errors that are asymptotically equivalent to those obtained from the formulations in Terza (2016b) and the appendix available with this article. This result favors using margins (with its relative coding simplicity) when available. In all other cases, use Mata to code the standard-error formulations in Terza (2016b). I discuss examples, and I give corresponding Stata do-files in appendices. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
31. Aligning Machine Learning Solutions with Clinical Needs
- Author
-
Kamran, Fahad
- Subjects
- machine learning, causal effect estimation, survival analysis, risk stratification, resource allocation
- Abstract
The availability of large observational datasets in healthcare presents an opportunity to leverage machine learning techniques to learn complex relationships between an individual’s characteristics, underlying health status, and response to interventions. Despite progress, there is often a mismatch between how machine learning models are developed and clinical needs. In this dissertation, we study how considering clinical needs can and should inform model development in healthcare. First, in survival analysis, deep learning approaches have been proposed for estimating an individual's survival probability over some time horizon. However, these methods often focus on optimizing discriminative performance and have ignored model calibration. Well-calibrated survival curves present realistic and meaningful probabilistic estimates of the true underlying survival process for an individual, an essential characteristic for survival analysis models in many clinical contexts. In light of the shortcomings of existing approaches, we propose a new training scheme for optimizing deep survival analysis models for strong discriminative performance and good calibration. Across two clinical datasets, we show that our approach yields models with strong discriminative performance while improving calibration over existing methods. Second, in causal inference, past work has focused on accurately estimating conditional average treatment effects (CATEs) to help guide treatment allocation. However, in many settings, decision-makers only require a ranking of individuals to assist in allocating treatments. Leveraging the insight that ranking can be simpler than CATE estimation and better CATE accuracy doesn't necessarily translate to better treatment allocation, we propose an approach that optimizes directly for rankings of individuals to maximize benefit of treatment. Our tree-based approach maximizes the expected benefit across all treatment thresholds using a novel splitting criteria. 
Through experiments on synthetic datasets, we show that the proposed approach leads to better sample efficiency and better treatment assignments, as measured by expected benefit, compared to models optimized for accurate CATEs. Third, when exact CATEs are needed, we study the mismatch between theoretical results in CATE estimation and how this theory holds empirically. In recent years, techniques incorporating estimates of both the propensity score and potential outcomes have gained popularity in part due to their strong theoretical guarantees for overcoming confounding bias. However, how this theory translates to practice across an extensive set of practical settings, especially in the context of deep learning, has not been well explored. We present an in-depth exploration of popular techniques, finding that those relying only on estimates of the outcome, in particular the X-Learner, can consistently outperform more sophisticated techniques across a variety of practical settings. Finally, we study how the mismatch between machine learning objectives and clinical needs manifests in existing clinical tools for sepsis risk stratification. Standard risk-stratification approaches focus on predicting the likelihood of sepsis before the sepsis criteria is met. However, both the training and evaluation of these models do not match the ultimate goal of augmenting clinical decision-making to improve patient outcomes. We study both challenges, finding that: 1) existing risk stratification approaches deteriorate significantly when evaluating before clinical recognition of sepsis and 2) targeting those most likely to develop sepsis may be sub-optimal with respect to improving patient outcomes. Overall, our contributions bridge, in part, the gap between machine learning research and practice in healthcare. Ultimately, by recognizing domain-specific needs in clinical care as we have, machine learning practitioners can develop more impactful models.
- Published
- 2023
32. Counterfactual Reasoning in Observational Studies
- Author
-
Hassanpour, Negar
- Subjects
- Causal Inference, Causal Effect Estimation, Treatment Effect Estimation, Counterfactual Regression, Selection Bias
- Abstract
As one of the main tasks in studying causality, the goal of Causal Inference is to determine "whether" (and perhaps "how much") the value of a certain variable (i.e., the effect) would change, had another specified variable (i.e., the cause) changed its value. A prominent example is the counterfactual question "Would this patient have lived longer had she received an alternative treatment?". The first challenge with causal inference is the unobservability of the counterfactual outcomes, i.e., outcomes obtained by applying the treatments that were not administered. The second common challenge is that the training data is often an observational study that exhibits selection bias, i.e., the treatment assignment can depend on the subjects' attributes. In this dissertation, I have explored ways to address the above-mentioned challenges. Specifically, my Research Contributions (RCs) are the following: My first RC addresses the first challenge: RC1. Unobservable counterfactuals prohibit proper evaluation of different methods' performance in estimating treatment effects. We provide an algorithm that can synthesize realistic observational datasets that exhibit various degrees of selection bias, then demonstrate that it can effectively assess various contextual bandit methods in the literature. The remaining RCs are related to the second challenge: RC2. Learning a common representation space that makes the transformed dataset close to a Randomized Controlled Trial (RCT), is a good strategy to reduce selection bias. We devise a method that further alleviates selection bias (attempting to account for it) by incorporating appropriate re-weighting schemes and show that it outperforms its competitors in the literature. RC3. Without loss of generality, we assume that three non-noise underlying factors generate any observational data. We devise a method that explicitly models these sources and argue that such model can better deal with selection bias. 
We then demonstrate its superior performance compared to the competing causal inference methods in the literature. RC4. The majority of current causal effect estimation methods fall under the category of discriminative approaches. A promising direction is to consider developing generative models, in an attempt to shed light on the true underlying data generating mechanism, which in turn is useful for the downstream task of counterfactual regression. We develop such a method and show empirically that it significantly outperforms state-of-the-art.
- Published
- 2022
33. Towards efficient and unbiased causal effect estimation from observational data
- Author
-
Cheng, Debo and University of South Australia. UniSA STEM.
- Subjects
causal effect estimation, Causation, causality, Estimation theory, causal inference, Conditional expectations (Mathematics)
- Abstract
Thesis (PhD(Computer and Information Science))--University of South Australia, 2021. Includes bibliographical references (pages 129-147). Causal effect estimation is a crucial task in causal inference. This thesis focuses on estimating causal effects from observational data. Covariate adjustment is a well-known approach to remove confounding bias by adjusting for a valid adjustment set when estimating causal effects from observational data. It is challenging to decide which variable should be adjusted for when the causal graph is unknown. The existing data-driven methods for estimating causal effects from data suffer from four challenges: data insufficiency, uncertainty in adjustment set identification, latent variables and low efficiency. In this thesis, I present four research contributions to address the four challenges by developing a set of theorems and algorithms based on graphical causal models for discovering valid adjustment sets and provide practical data-driven algorithms for estimating causal effects from data.
- Published
- 2021
34. A general framework for causal classification
- Author
-
Li, Jiuyong, Zhang, Weijia, Liu, Lin, Yu, Kui, Le, Thuc Duy, and Liu, Jixue
- Subjects
FOS: Computer and information sciences, 0301 basic medicine, Computer Science - Machine Learning, Computer science, Product promotion, Machine Learning (stat.ML), Machine learning, computer.software_genre, Machine Learning (cs.LG), 03 medical and health sciences, 0302 clinical medicine, Statistics - Machine Learning, Uplift modelling, Modelling methods, Set (psychology), Implementation, business.industry, Applied Mathematics, Causal effect, Computer Science Applications, causal effect estimation, Management information systems, 030104 developmental biology, Computational Theory and Mathematics, 030220 oncology & carcinogenesis, Modeling and Simulation, causal heterogeneity, Classification methods, uplift modelling, Artificial intelligence, business, computer, Information Systems
- Abstract
In many applications, there is a need to predict the effect of an intervention on different individuals from data. For example, which customers are persuadable by a product promotion? Which patients should be treated with a certain type of treatment? These are typical causal questions involving the effect or the change in outcomes made by an intervention. The questions cannot be answered with traditional classification methods as they only use associations to predict outcomes. For personalised marketing, these questions are often answered with uplift modelling. The objective of uplift modelling is to estimate causal effect, but its literature does not discuss when the uplift represents causal effect. Causal heterogeneity modelling can solve the problem, but its assumption of unconfoundedness is untestable in data. So practitioners need guidelines in their applications when using the methods. In this paper, we use causal classification for a set of personalised decision making problems, and differentiate it from classification. We discuss the conditions when causal classification can be resolved by uplift (and causal heterogeneity) modelling methods. We also propose a general framework for causal classification, by using off-the-shelf supervised methods for flexible implementations. Experiments have shown two instantiations of the framework work for causal classification and for uplift (causal heterogeneity) modelling, and are competitive with the other uplift (causal heterogeneity) modelling methods. International Journal of Data Science and Analytics (2021). arXiv admin note: text overlap with arXiv:1604.07212 by other authors
- Published
- 2021
35. Sample size and power calculations for medical studies by simulation when closed form expressions are not available.
- Author
-
Landau, Sabine and Stahl, Daniel
- Subjects
- *SAMPLE size (Statistics), *SIMULATION methods & models, *MEDICAL research, *RANDOMIZED controlled trials, *MONTE Carlo method, *MISSING data (Statistics)
- Abstract
This paper shows how Monte Carlo simulation can be used for sample size, power or precision calculations when planning medical research studies. Standard study designs can lead to the use of analysis methods for which power formulae do not exist. This may be because complex modelling techniques with optimal statistical properties are used but power formulae have not yet been derived or because analysis models are employed that divert from the population model due to lack of availability of more appropriate analysis tools. Our presentation concentrates on the conceptual steps involved in carrying out power or precision calculations by simulation. We demonstrate these steps in three examples concerned with (i) drop out in longitudinal studies, (ii) measurement error in observational studies and (iii) causal effect estimation in randomised controlled trials with non-compliance. We conclude that the Monte Carlo simulation approach is an important general tool in the methodological arsenal for assessing power and precision. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
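The conceptual steps that the abstract of entry 35 refers to (simulate data under the planned design, analyse each simulated dataset, and count how often the test rejects) can be sketched as below. This is a generic illustration and not one of the paper's three examples: the two-arm design, unit variance, random dropout analysed by complete cases, the normal-approximation critical value of 1.96, and all names are assumptions.

```python
import numpy as np


def simulated_power(n_per_arm, effect, dropout=0.0, n_sims=2000,
                    seed=0, z_crit=1.96):
    """Estimate power as the share of simulated trials that reject H0.

    Each simulated trial compares two arms with unit-variance normal
    outcomes; a fraction `dropout` of participants is lost at random and
    only the remaining participants are analysed.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Number of participants who complete the study in each arm.
        n_a = max(2, rng.binomial(n_per_arm, 1.0 - dropout))
        n_b = max(2, rng.binomial(n_per_arm, 1.0 - dropout))
        a = rng.normal(0.0, 1.0, size=n_a)        # control arm
        b = rng.normal(effect, 1.0, size=n_b)     # treatment arm
        se = np.sqrt(a.var(ddof=1) / n_a + b.var(ddof=1) / n_b)
        z = (b.mean() - a.mean()) / se
        rejections += abs(z) > z_crit             # two-sided test
    return rejections / n_sims


if __name__ == "__main__":
    for rate in (0.0, 0.3):
        print(rate, simulated_power(100, 0.5, dropout=rate))
```

Running the function over a grid of candidate sample sizes and picking the smallest one that reaches the target power turns the same loop into a sample size calculation.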
36. Propensity Score Matching underestimates Real Treatment Effect, in a simulated theoretical multivariate model
- Author
-
Daniel García Iglesias
- Subjects
Methodology (stat.ME), FOS: Computer and information sciences, propensity score matching, multivariate analysis, general linear model, Monte Carlo method, causal effect estimation, observational study, General Mathematics, fungi, Computer Science (miscellaneous), Engineering (miscellaneous), Statistics - Methodology
- Abstract
Propensity Score Matching (PSM) is a useful method to reduce the impact of Treatment-Selection Bias in the estimation of causal effects in observational studies. After matching, the PSM significantly reduces the sample under investigation, which may lead to other possible biases (due to overfitting, excess of covariation or a reduced number of observations). In this sense, we wanted to analyze the behavior of this PSM compared with other widely used methods to deal with non-comparable groups, such as the Multivariate Regression Model (MRM). Monte Carlo Simulations are made to construct groups with different effects in order to compare the behavior of PSM and MRM estimating these effects. In addition, the Treatment Selection Bias reduction for the PSM is calculated. With the PSM a reduction in the Treatment Selection Bias is achieved (0.983 [0.982, 0.984]), with a reduction in the Relative Real Treatment Effect Estimation Error (0.216 [0.2, 0.232]), but despite this bias reduction and estimation error reduction, the MRM reduces this estimation error significantly more than the PSM (0.539 [0.522, 0.556], p < 0.001). In addition, the PSM leads to a 30% reduction in the sample. This loss of information derived from the matching process may lead to another not known bias and thus to the inaccuracy of the effect estimation compared with the MRM.
- Published
- 2019
37. On Associative Confounder Bias
- Author
-
Wijayatunga, Priyantha
- Abstract
Conditioning on some set of confounders that causally affect both treatment and outcome variables can be sufficient for eliminating bias introduced by all such confounders when estimating causal effect of the treatment on the outcome from observational data. It is done by including them in propensity score model in so-called potential outcome framework for causal inference whereas in causal graphical modeling framework usual conditioning on them is done. However in the former framework, it is confusing when modeler finds a variable that is non-causally associated with both the treatment and the outcome. Some argue that such variables should also be included in the analysis for removing bias. But others argue that they introduce no bias so they should be excluded and conditioning on them introduces spurious dependence between the treatment and the outcome, thus resulting extra bias in the estimation. We show that there may be errors in both the arguments in different contexts. When such a variable is found neither of the actions may give the correct causal effect estimate. Selecting one action over the other is needed in order to be less wrong. We discuss how to select the better action.
- Published
- 2015
- Full Text
- View/download PDF