3,035 results on '"Graphical Models"'
Search Results
2. Combination of autoregressive graphical models and time series bootstrap methods for risk management in marine insurance
- Author
- Carli, Federico, Pesce, Elena, Porro, Francesco, and Riccomagno, Eva
- Published
- 2024
3. Data Transformation and Its Validity in a Two-Sample Problem: An Illustration Based on Graphical Models
- Author
- Banzato, Erika, Risso, Davide, Chiogna, Monica, Djordjilović, Vera, Pollice, Alessio, editor, and Mariani, Paolo, editor
- Published
- 2025
4. Probabilistic Fusion Framework Combining CNNs and Graphical Models for Multiresolution Satellite and UAV Image Classification
- Author
- Pastorino, Martina, Moser, Gabriele, Guerra, Fabien, Serpico, Sebastiano B., Zerubia, Josiane, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
5. Charting a Fair Path: FaGGM Fairness-Aware Generative Graphical Models
- Author
- Jiang, Vivian Wei, Batista, Gustavo, Bain, Michael, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Gong, Mingming, editor, Song, Yiliao, editor, Koh, Yun Sing, editor, Xiang, Wei, editor, and Wang, Derui, editor
- Published
- 2025
6. On some publications of Sir David Cox.
- Author
- Reid, Nancy
- Subjects
- STATISTICAL significance, MODEL theory, TIME series analysis, SCANDINAVIANS, STATISTICS
- Abstract
Sir David Cox published four papers in the Scandinavian Journal of Statistics and two in the Scandinavian Actuarial Journal. This note provides some brief summaries of these papers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Continuity approximation in hybrid Bayesian networks structure learning.
- Author
- Zhu, Wanchuang and Nguyen, Ngoc Lan Chi
- Abstract
Bayesian networks have been used to represent the joint distribution of multiple random variables in a flexible yet interpretable manner. One major challenge in learning the structure of a Bayesian network is how to model networks that include a mixture of continuous and discrete random variables, known as hybrid Bayesian networks. This paper overviews the literature on approaches for handling hybrid Bayesian networks. Typically, one of two approaches is taken: either the data are assumed to follow a joint distribution designed for a mixture of discrete and continuous variables, or the continuous random variables are discretized, resulting in a discrete Bayesian network. This paper proposes a strategy that models all random variables as Gaussian, referred to as Run it As Gaussian (RAG). We demonstrate, both theoretically and through simulation studies, that RAG yields more reliable estimates of graph structures than the other strategies. Both strategies are also applied to a childhood obesity data set, where they give rise to significant differences in the optimal graph structures, with the results of the simulation study suggesting that our approach is more reliable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
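A minimal sketch of the RAG idea described in the entry above: standardize every column, discrete ones included, and fit a Gaussian graphical model. The toy data and the use of scikit-learn's graphical lasso as the structure learner are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical illustration of "Run it As Gaussian": treat a hybrid
# (discrete + continuous) data set as if it were all-Gaussian and
# estimate conditional-dependence structure with the graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                            # continuous root
x2 = (x1 + rng.normal(size=n) > 0).astype(float)   # discrete child of x1
x3 = 0.8 * x2 + rng.normal(size=n)                 # continuous child of x2
X = np.column_stack([x1, x2, x3])

X = (X - X.mean(axis=0)) / X.std(axis=0)           # run it as Gaussian
model = GraphicalLassoCV().fit(X)
print(np.abs(model.precision_) > 1e-6)             # recovered edge pattern
```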
8. Diagnosing and Handling Common Violations of Missing at Random.
- Author
- Ji, Feng, Rabe-Hesketh, Sophia, and Skrondal, Anders
- Subjects
- MAR, data deletion, diagnostic test, graphical models, m-graph, missingness mechanisms, ordered factorization, structural equation models, Models, Statistical, Bayes Theorem, Psychometrics, Models, Theoretical, Likelihood Functions
- Abstract
Ignorable likelihood (IL) approaches are often used to handle missing data when estimating a multivariate model, such as a structural equation model. In this case, the likelihood is based on all available data, and no model is specified for the missing data mechanism. Inference proceeds via maximum likelihood or Bayesian methods, including multiple imputation without auxiliary variables. Such IL approaches are valid under a missing at random (MAR) assumption. Rabe-Hesketh and Skrondal (Ignoring non-ignorable missingness. Presidential Address at the International Meeting of the Psychometric Society, Beijing, China, 2015; Psychometrika, 2023) consider a violation of MAR where a variable A can affect missingness of another variable B even when A is not observed. They show that this case can be handled by discarding more data before proceeding with IL approaches. This data-deletion approach is similar to the sequential estimation of Mohan et al. (in: Advances in neural information processing systems, 2013), based on their ordered factorization theorem, but is preferable for parametric models. Which kind of data deletion or ordered factorization to employ depends on the nature of the MAR violation. In this article, we therefore propose two diagnostic tests: a likelihood-ratio test for a heteroscedastic regression model and a kernel conditional independence test. We also develop a test-based estimator that first uses the diagnostic tests to determine which MAR violation appears to be present and then proceeds with the corresponding data-deletion estimator. Simulations show that the test-based estimator outperforms IL when the missing data problem is severe and performs similarly otherwise.
- Published
- 2023
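The first diagnostic proposed above is a likelihood-ratio test for a heteroscedastic regression model. The sketch below is a generic version of such a test, assuming a variance that is log-linear in the covariate; it is not the authors' exact specification.

```python
# Generic LR test: does the variance of B depend on A?
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = 0.5 * a + np.exp(0.3 * a) * rng.normal(size=n)  # heteroscedastic errors

def negloglik(params, hetero):
    if hetero:
        b0, b1, g0, g1 = params
    else:
        (b0, b1, g0), g1 = params, 0.0
    sd = np.exp(0.5 * (g0 + g1 * a))                # log-linear variance
    return -np.sum(stats.norm.logpdf(b, b0 + b1 * a, sd))

null = optimize.minimize(negloglik, [0.0, 0.0, 0.0], args=(False,))
alt = optimize.minimize(negloglik, [0.0, 0.0, 0.0, 0.0], args=(True,))
lr = 2.0 * (null.fun - alt.fun)                     # ~ chi2(1) under H0
print(f"LR = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.3g}")
```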
9. Comparative study by adding bootstrapping stage in construction of biological networks.
- Author
- Kaygusuz, Mehmet Ali and Purutçuoğlu, Vilda
- Subjects
- AKAIKE information criterion, BIOLOGICAL networks, FISHER information, BIOLOGICAL models, NUMBER systems
- Abstract
Model selection methods have become very popular in high-dimensional settings in recent years due to the availability of massive amounts of data, particularly from genetic, image processing, and financial sources. The selection of the best estimated model therefore becomes crucial. A number of model selection approaches exist for choosing the optimal model among alternatives. Among them are the Akaike information criterion, the Bayesian information criterion, the Consistent Akaike information criterion with Fisher information matrix (CAICF), and Information and COMPlexity (ICOMP), which are very successful in lasso regression when constructing biological networks. In this study, we augment these criteria with both non-parametric and Bayesian bootstrap approaches to optimize the CAICF and ICOMP selection criteria when the sample size is smaller than the number of genes in the system. We evaluate the performance of the bootstrapping strategy under distinct Monte Carlo scenarios. The majority of results show that model selection with bootstrapping achieves higher accuracy than model selection without it. [ABSTRACT FROM AUTHOR]
- Published
- 2025
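To make the bootstrapped model-selection idea concrete, here is a hedged sketch that bootstraps a BIC-tuned lasso in an n < p setting; plain BIC stands in for the CAICF and ICOMP criteria of the paper, and the data and the median aggregation rule are invented for illustration.

```python
# Non-parametric bootstrap around an information-criterion model selector.
import numpy as np
from sklearn.linear_model import Lasso, LassoLarsIC

rng = np.random.default_rng(2)
n, p = 50, 200                          # fewer samples than "genes"
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

alphas = []
for _ in range(200):                    # resample rows with replacement
    idx = rng.integers(0, n, size=n)
    # noise_variance must be supplied when n <= p; known here by design
    fit = LassoLarsIC(criterion="bic", noise_variance=1.0).fit(X[idx], y[idx])
    alphas.append(fit.alpha_)

best = Lasso(alpha=float(np.median(alphas))).fit(X, y)  # aggregated penalty
print("selected predictors:", np.flatnonzero(best.coef_))
```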
10. Mixed-variable graphical modeling framework towards risk prediction of hospital-acquired pressure injury in spinal cord injury individuals
- Author
- Yanke Li, Anke Scheel-Sailer, Robert Riener, and Diego Paez-Granados
- Subjects
- Graphical models, Causal discovery, Predictive modeling, Spinal cord injury, Pressure injury, Medicine, Science
- Abstract
Developing machine learning (ML) methods for healthcare predictive modeling requires absolute explainability and transparency to build trust and accountability. Graphical models (GM) are key tools for this but face challenges like small sample sizes, mixed variables, and latent confounders. This paper presents a novel learning framework addressing these challenges by integrating latent variables using fast causal inference (FCI), accommodating mixed variables with predictive permutation conditional independence tests (PPCIT), and employing a systematic graphical embedding approach leveraging expert knowledge. This method ensures a transparent model structure and an explainable feature selection and modeling approach, achieving competitive prediction performance. For real-world validation, data of hospital-acquired pressure injuries (HAPI) among individuals with spinal cord injury (SCI) were used, where the approach achieved a balanced accuracy of 0.941 and an AUC of 0.983, outperforming most benchmarks. The PPCIT method also demonstrated superior accuracy and scalability over other benchmarks in causal discovery validation on synthetic datasets that closely resemble our real dataset. This holistic framework effectively addresses the challenges of mixed variables and explainable predictive modeling for disease onset, which is crucial for enabling transparency and interpretability in ML-based healthcare.
- Published
- 2024
11. Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation
- Author
- Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, and Jan Hau Lee
- Subjects
- Pediatric sepsis, Patient network, Graphical models, Message passing, Machine learning, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making. Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to impute the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, the class balancing parameter α, and the Synthetic Minority Oversampling Technique (SMOTE). Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extremely imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% in AUC and 15% in SEN compared to the baseline GCN+αFL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% in SEN compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% in AUC and 15% in SEN compared to the baseline GCN+αBCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in SEN compared to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754). Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating its potential for several healthcare applications.
- Published
- 2024
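The entry above modifies the focal loss; for reference, the standard class-balanced binary focal loss it starts from (the Lin et al. form) is sketched below in plain numpy. The paper's modification is not reproduced.

```python
import numpy as np

def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: alpha balances classes, gamma down-weights
    easy examples so the rare class dominates the gradient."""
    p = np.clip(p_pred, eps, 1 - eps)
    pt = np.where(y_true == 1, p, 1 - p)            # prob. of the true class
    at = np.where(y_true == 1, alpha, 1 - alpha)    # class-balancing weight
    return float(np.mean(-at * (1 - pt) ** gamma * np.log(pt)))

y = np.array([1, 0, 0, 0, 0])                       # imbalanced toy labels
p = np.array([0.3, 0.1, 0.2, 0.05, 0.9])            # predicted P(y = 1)
print(focal_loss(y, p))
```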
12. Mixed-variable graphical modeling framework towards risk prediction of hospital-acquired pressure injury in spinal cord injury individuals.
- Author
- Li, Yanke, Scheel-Sailer, Anke, Riener, Robert, and Paez-Granados, Diego
- Subjects
- SPINAL cord injuries, PRESSURE ulcers, FEATURE selection, CAUSAL inference, PREDICTION models
- Abstract
Developing machine learning (ML) methods for healthcare predictive modeling requires absolute explainability and transparency to build trust and accountability. Graphical models (GM) are key tools for this but face challenges like small sample sizes, mixed variables, and latent confounders. This paper presents a novel learning framework addressing these challenges by integrating latent variables using fast causal inference (FCI), accommodating mixed variables with predictive permutation conditional independence tests (PPCIT), and employing a systematic graphical embedding approach leveraging expert knowledge. This method ensures a transparent model structure and an explainable feature selection and modeling approach, achieving competitive prediction performance. For real-world validation, data of hospital-acquired pressure injuries (HAPI) among individuals with spinal cord injury (SCI) were used, where the approach achieved a balanced accuracy of 0.941 and an AUC of 0.983, outperforming most benchmarks. The PPCIT method also demonstrated superior accuracy and scalability over other benchmarks in causal discovery validation on synthetic datasets that closely resemble our real dataset. This holistic framework effectively addresses the challenges of mixed variables and explainable predictive modeling for disease onset, which is crucial for enabling transparency and interpretability in ML-based healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Finite Population Survey Sampling: An Unapologetic Bayesian Perspective.
- Author
- Banerjee, Sudipto
- Abstract
This article attempts to offer some perspectives on Bayesian inference for finite population quantities when the units in the population are assumed to exhibit complex dependencies. Beginning with an overview of Bayesian hierarchical models, including some that yield design-based Horvitz-Thompson estimators, the article proceeds to introduce dependence in finite populations and sets out inferential frameworks for ignorable and nonignorable responses. Multivariate dependencies using graphical models and spatial processes are discussed and some salient features of two recent analyses for spatial finite populations are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Hyper Markov law in undirected graphical models with its applications.
- Author
- Kang, Xiong and Yi Sun, Brian
- Subjects
- GIBBS sampling, UNDIRECTED graphs, LIKELIHOOD ratio tests, MARKOV processes, GENERALIZATION
- Abstract
By exploring the prime decomposition of undirected graphs, this work investigates the hyper Markov property within the framework of arbitrary undirected graphs, which can be seen as a generalization of that for decomposable graphical models proposed by Dawid and Lauritzen. The hyper Markov properties proposed in this article can be used to characterize the conditional independence of a distribution or a statistical quantity, and they help simplify the likelihood ratio functions in statistical tests of two different graphs obtained by removing or adding one edge. As an application of these properties, the G-Wishart law is introduced as a prior law for graphical Gaussian models for Bayesian posterior updating, and a hypothesis test for the precision matrix is designed to determine the model structure. Our simulation experiments are implemented using the Gibbs sampler algorithm, and the results show that it performs well in terms of convergence speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Visibility graph-based covariance functions for scalable spatial analysis in non-convex partially Euclidean domains.
- Author
- Gilbert, Brian and Datta, Abhirup
- Subjects
- EUCLIDEAN domains, GAUSSIAN processes, GEODESIC distance, ENVIRONMENTAL monitoring, INTEGRAL functions, EUCLIDEAN distance
- Abstract
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
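The visibility-graph ingredient of the construction above can be sketched with shapely and networkx: connect two points when the straight segment between them stays inside the non-convex domain, keep Euclidean edge lengths, and read off graph distances. The U-shaped polygon and points are toy assumptions; the paper's positive-definite covariance construction built on top of this graph is not reproduced.

```python
import itertools
import networkx as nx
from shapely.geometry import LineString, Polygon

# A U-shaped "bay": the notch between x = 1.8 and x = 2.2 blocks sight lines.
domain = Polygon([(0, 0), (4, 0), (4, 4), (2.2, 4), (2.2, 1),
                  (1.8, 1), (1.8, 4), (0, 4)])
pts = [(0.5, 3.5), (1.0, 0.5), (3.0, 0.5), (3.5, 3.5)]

G = nx.Graph()
G.add_nodes_from(range(len(pts)))
for i, j in itertools.combinations(range(len(pts)), 2):
    seg = LineString([pts[i], pts[j]])
    if domain.contains(seg):                 # mutually visible pair
        G.add_edge(i, j, weight=seg.length)  # keep the Euclidean length

d = dict(nx.all_pairs_dijkstra_path_length(G))
print(d[0][3])   # arm-to-arm distance goes around the notch, ~8.08
```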
16. Forensic Science and How Statistics Can Help It: Evidence, Likelihood Ratios, and Graphical Models.
- Author
- Xu, Xiangyu and Vinci, Giuseppe
- Subjects
- CRIME statistics, CRIMINAL investigation, CRIMINAL justice system, BAYESIAN analysis, DNA fingerprinting
- Abstract
The persistent issue of wrongful convictions in the United States emphasizes the need for scrutiny and improvement of the criminal justice system. While statistical methods for the evaluation of forensic evidence, including glass, fingerprints, and deoxyribonucleic acid, have significantly contributed to solving intricate crimes, there is a notable lack of national‐level standards to ensure the appropriate application of statistics in forensic investigations. We discuss the obstacles in the application of statistics in court and emphasize the importance of making statistical interpretation accessible to non‐statisticians, especially those who make decisions about potentially innocent individuals. We investigate the use and misuse of statistical methods in crime investigations, in particular the likelihood ratio approach. We further describe the use of graphical models, where hypotheses and evidence can be represented as nodes connected by arrows signifying association or causality. We emphasize the advantages of special graph structures, such as object‐oriented Bayesian networks and chain event graphs, which allow for the concurrent examination of evidence of various nature. [ABSTRACT FROM AUTHOR]
- Published
- 2024
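As a toy numeric companion to the likelihood-ratio discussion above: the LR compares how probable the measured evidence is under the prosecution proposition (Hp) versus the defence proposition (Hd). All numbers below are invented for illustration.

```python
from scipy.stats import norm

x = 1.51907   # refractive index of a recovered glass fragment (made up)
lr = (norm.pdf(x, loc=1.51910, scale=4e-5)    # Hp: from the scene window
      / norm.pdf(x, loc=1.51820, scale=4e-3)) # Hd: from background glass
print(f"LR = {lr:.0f}")  # LR >> 1: the evidence supports Hp over Hd
```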
17. Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation.
- Author
- Nguyen, Tuong Minh, Poh, Kim Leng, Chong, Shu-Ling, and Lee, Jan Hau
- Subjects
- MACHINE learning, ELECTRONIC health records, DIAGNOSIS, SEPSIS, HEALTH care networks
- Abstract
Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making. Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to impute the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, the class balancing parameter α, and the Synthetic Minority Oversampling Technique (SMOTE). Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extremely imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% in AUC and 15% in SEN compared to the baseline GCN+αFL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% in SEN compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% in AUC and 15% in SEN compared to the baseline GCN+αBCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in SEN compared to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754). Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating its potential for several healthcare applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Mechanistic modeling of social conditions in disease-prediction simulations via copulas and probabilistic graphical models: HIV case study
- Author
- Khosheghbal, Amir, Haas, Peter J., and Gopalappa, Chaitra
- Published
- 2024
19. Foundations of causal discovery on groups of variables
- Author
- Wahl Jonas, Ninad Urmi, and Runge Jakob
- Subjects
- causality, causal discovery, graphical models, markov property, faithfulness, time series, 62d20, Mathematics, QA1-939, Probabilities. Mathematical statistics, QA273-280
- Abstract
Discovering causal relationships from observational data is a challenging task that relies on assumptions connecting statistical quantities to graphical or algebraic causal models. In this work, we focus on widely employed assumptions for causal discovery when objects of interest are (multivariate) groups of random variables rather than individual (univariate) random variables, as is the case in a variety of problems in scientific domains such as climate science or neuroscience. If the group level causal models are derived from partitioning a micro-level model into groups, we explore the relationship between micro- and group level causal discovery assumptions. We investigate the conditions under which assumptions like causal faithfulness hold or fail to hold. Our analysis encompasses graphical causal models that contain cycles and bidirected edges. We also discuss grouped time series causal graphs and variants thereof as special cases of our general theoretical framework. Thereby, we aim to provide researchers with a solid theoretical foundation for the development and application of causal discovery methods for variable groups.
- Published
- 2024
20. Learning debiased graph representations from the OMOP common data model for synthetic data generation
- Author
- Nicolas Alexander Schulz, Jasmin Carus, Alexander Johannes Wiederhold, Ole Johanns, Frederik Peters, Natalie Rath, Katharina Rausch, Bernd Holleczek, Alexander Katalinic, the AI-CARE Working Group, and Christopher Gundler
- Subjects
- Synthetic Data Generation, Standardized Electronic Health Records, Causal Discovery, Discrete Time Series, Structural Equation Models, Graphical Models, Medicine (General), R5-920
- Abstract
Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models which do not allow for expert verification or intervention. We propose a highly available method which enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthetization method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules with two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem statement at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.
- Published
- 2024
21. What if we intervene?: Higher-order cross-lagged causal model with interventional approach under observational design.
- Author
- Castro, Christopher, Michell, Kevin, Kristjanpoller, Werner, and Minutolo, Marcel C.
- Subjects
- GOLD futures, STOCK index futures, CAUSAL inference, CONDITIONAL probability, CAUSAL models
- Abstract
Experimental design allows us to determine the causal relationship between variables correlated over time more accurately than observational design based on conditional probability. Observational design merely establishes dependency relationships that help predict the objective variable. However, under certain conditions, an observational design can recover the causal effects that an experimental design would identify, without carrying out an intervention. In this work, we present a higher-order cross-lagged causal model capable of inferring causal relationships with hypothetical interventions under an observational design. Additionally, a visualization is offered that allows us to analyze multiple interventions simultaneously. The methodology is applied to three financial series: the Euro–United States Dollar exchange rate, the Dow Jones Industrial Index, and Gold futures. An analysis of causality concerning their volatilities is presented, along with the differences between this approach and a classic conditioning approach. Researchers must be cautious in defining the research objective and design for other studies, since the approaches lead to very different causal conclusions. The framework presented is expected to be useful in any discipline where one wants to learn "What would happen if we intervene?" without actually making an intervention. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Computational Test for Conditional Independence.
- Author
- Thorjussen, Christian B. H., Liland, Kristian Hovde, Måge, Ingrid, and Solberg, Lars Erik
- Subjects
- FALSE positive error, CAUSAL inference, STATISTICS, RESEARCH personnel, TEST methods
- Abstract
Conditional Independence (CI) testing is fundamental in statistical analysis. For example, CI testing helps validate causal graphs or longitudinal data analysis with repeated measures in causal inference. CI testing is difficult, especially when testing involves categorical variables conditioned on a mixture of continuous and categorical variables. Current parametric and non-parametric testing methods are designed for continuous variables and can quickly fall short in the categorical case. This paper presents a computational approach for CI testing suited for categorical data types, which we call computational conditional independence (CCI) testing. The test procedure is based on permutation and combines machine learning prediction algorithms and Monte Carlo cross-validation. We evaluated the approach through simulation studies and assessed the performance against alternative methods: the generalized covariance measure test, the kernel conditional independence test, and testing with multinomial regression. We find that the computational approach to testing has utility over the alternative methods, achieving better control over type I error rates. We hope this work can expand the toolkit for CI testing for practitioners and researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
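A hedged sketch in the spirit of the computational test above: predict the categorical Y from (Z, X) with a machine-learning classifier under cross-validation, then compare against a permutation null in which X is shuffled. This generic scheme is illustrative only; the published CCI procedure differs in its permutation and Monte Carlo cross-validation details.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 1500
z = rng.normal(size=(n, 2))
x = (z[:, 0] + rng.normal(size=n) > 0).astype(int)   # X depends on Z
y = (z[:, 1] + rng.normal(size=n) > 0).astype(int)   # Y ⫫ X | Z holds

def cv_accuracy(xcol):
    feats = np.column_stack([z, xcol])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, feats, y, cv=5).mean()

observed = cv_accuracy(x)
null = [cv_accuracy(rng.permutation(x)) for _ in range(20)]
p_value = np.mean([s >= observed for s in null])     # large p => keep CI
print(f"accuracy = {observed:.3f}, permutation p = {p_value:.2f}")
```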
23. Estimation of Graphical Models: An Overview of Selected Topics.
- Author
- Chen, Li-Pang
- Abstract
Summary: Graphical modelling is an important branch of statistics that has been successfully applied in biology, social science, causal inference and so on. Graphical models illuminate connections between many variables and can even describe complex data structures or noisy data. Graphical models have been combined with supervised learning techniques such as regression modelling and classification analysis with multi‐class responses. This paper first reviews some fundamental graphical modelling concepts, focusing on estimation methods and computational algorithms. Several advanced topics are then considered, delving into complex graphical structures and noisy data. Applications in regression and classification are considered throughout. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. DBHC: Discrete Bayesian HMM Clustering.
- Author
- Budel, Gabriel, Frasincar, Flavius, and Boekestijn, David
- Abstract
Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering is a type of method within this field that is in high demand in the industry, but the sequence clustering problem is non-trivial and, as opposed to static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences by extending a proven method for continuous sequences. The proposed algorithm is completely self-contained as it incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model in the parameter inference. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation task of a sequence cluster analysis. We conclude that the algorithm works well as it provides well-interpretable clusters for the considered application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
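The core primitive behind clustering discrete sequences with HMMs is scoring each sequence under every cluster's model with the forward algorithm and assigning it to the best-scoring cluster. The self-contained numpy sketch below shows only that step; DBHC's Bayesian parameter inference and its search over cluster and state counts are omitted, and both models are hypothetical.

```python
import numpy as np

def log_forward(obs, start, trans, emit):
    """Forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    alpha = np.log(start) + np.log(emit[:, obs[0]])
    for o in obs[1:]:
        alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(trans), axis=0)
                 + np.log(emit[:, o]))
    return np.logaddexp.reduce(alpha)

# Two toy 2-state, 2-symbol cluster models.
sticky = (np.array([0.9, 0.1]),                # start probabilities
          np.array([[0.9, 0.1], [0.1, 0.9]]),  # transitions
          np.array([[0.9, 0.1], [0.1, 0.9]]))  # emissions
coin = (np.array([0.5, 0.5]),
        np.array([[0.5, 0.5], [0.5, 0.5]]),
        np.array([[0.5, 0.5], [0.5, 0.5]]))

seq = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = [log_forward(seq, *m) for m in (sticky, coin)]
print("assigned to cluster", int(np.argmax(scores)))   # 0: the sticky HMM
```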
25. Developing Text-Creation Skills in Students at the Primary Stage of Basic Education (Формиране на умения за създаване на текст у учениците в началния етап на основната образователна степен).
- Author
- Райкова, Ирена
- Abstract
The article presents the results of research carried out with students at the primary level of education. The object of the research is mother-tongue education; the subject is the process of forming and developing students' skills in retelling a story and creating texts. Approbation of the suggested graphical models is realized through the Neurographica method. The graphical models are used in Bulgarian language and literature lessons: students draw models of the narrative before writing the text and, as a result, have visual support when creating it. Analysis of the data shows various benefits of using graphical models for the development of students' communicative speech competence, and the results of the study show a significant improvement in students' writing skills. Among the skills which students improve most are: retelling the episodes of a story in their chronological order, building a coherent composition of the text and logical connections between the sentences, and better graphical design of the text. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Learning debiased graph representations from the OMOP common data model for synthetic data generation.
- Author
- Schulz, Nicolas Alexander, Carus, Jasmin, Wiederhold, Alexander Johannes, Johanns, Ole, Peters, Frederik, Rath, Natalie, Rausch, Katharina, Holleczek, Bernd, Katalinic, Alexander, Nennecke, Alice, Kusche, Henrik, Heinrichs, Vera, Eberle, Andrea, Luttmann, Sabine, Abnaof, Khalid, Kim-Wanner, Soo-Zin, Handels, Heinz, Germer, Sebastian, Halber, Marco, and Richter, Martin
- Subjects
- REPRESENTATIONS of graphs, ELECTRONIC health record standards, MEDICAL informatics, DATA modeling, NURSING informatics, MARKOV processes
- Abstract
Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models which do not allow for expert verification or intervention. We propose a highly available method which enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthetization method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules with two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem statement at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Probabilistic Circuits with Constraints via Convex Optimization
- Author
- Ghandi, Soroush, Quost, Benjamin, de Campos, Cassio, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bifet, Albert, editor, Davis, Jesse, editor, Krilavičius, Tomas, editor, Kull, Meelis, editor, Ntoutsi, Eirini, editor, and Žliobaitė, Indrė, editor
- Published
- 2024
28. Shareable and Inheritable Incremental Compilation in iOOBN
- Author
- Samiullah, Md, Nicholson, Ann, Albrecht, David, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Fenrong, editor, Sadanandan, Arun Anand, editor, Pham, Duc Nghia, editor, Mursanto, Petrus, editor, and Lukose, Dickson, editor
- Published
- 2024
29. Machine learning quick reference : quick and essential machine learning hacks for training smart data models.
- Author
- Kumar, Rahul
- Subjects
- Data scientist, Graphical Models, Machine learning, Bayesian Theory
- Abstract
Summary: Machine learning makes it possible to learn about the unknowns and gain hidden insights into your datasets by mastering many tools and techniques. This book guides you to do just that in a very compact manner. After giving a quick overview of what machine learning is all about, Machine Learning Quick Reference jumps right into its core algorithms and demonstrates how they can be applied to real-world scenarios. From model evaluation to optimizing their performance, this book will introduce you to the best practices in machine learning. Furthermore, you will also look at the more advanced aspects such as training neural networks and work with different kinds of data, such as text, time-series, and sequential data. Advanced methods and techniques such as causal inference, deep Gaussian processes, and more are also covered. By the end of this book, you will be able to train fast, accurate machine learning models at your fingertips, which you can easily use as a point of reference.
- Published
- 2019
30. Graphical models in geography (Modelos gráficos na geografia)
- Author
- Hervé Théry
- Subjects
- Roger Brunet, chorematics, graphical models, Robert Ferras, Geography. Anthropology. Recreation
- Abstract
The renewed interest in the graphic modelling of territories (generally known as “chorematic” modelling) has led to the need for an enlightening text and precise indications of the method. The aim here is not to present the underlying theory, well formulated by Roger Brunet, but simply to provide simple indications, examples and bibliographical sources for anyone wishing to practice graphic modelling of territories.
- Published
- 2024
31. Development and validation of a mortality risk prediction model for chronic obstructive pulmonary disease: a cross-sectional study using probabilistic graphical modelling
- Author
- Tyler C. Lovelace, Min Hyung Ryu, Minxue Jia, Peter Castaldi, Frank C. Sciurba, Craig P. Hersh, and Panayiotis V. Benos
- Subjects
- COPD mortality, Graphical models, Machine learning, Medicine (General), R5-920
- Abstract
Summary: Background: Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of mortality. Predicting mortality risk in patients with COPD can be important for disease management strategies. Although all-cause mortality predictors have been developed previously, limited research exists on factors directly affecting COPD-specific mortality. Methods: In a retrospective study, we used probabilistic graphs to analyse clinical cross-sectional data (COPDGene cohort), including demographics, spirometry, quantitative chest imaging, and symptom features, as well as gene expression data. COPDGene recruited current and former smokers, aged 45–80 years with >10 pack-years smoking history, from across the USA (Phase 1, 11/2007-4/2011) and invited them for a follow-up visit (Phase 2, 7/2013-7/2017). The ECLIPSE cohort recruited current and former smokers (COPD patients and controls from the USA and Europe), aged 45–80 with smoking history >10 pack-years (12/2005-11/2007). We applied graphical models to multi-modal data from COPDGene Phase 1 participants to identify factors directly affecting all-cause and COPD-specific mortality (primary outcomes), and to the Phase 2 follow-up cohort to identify additional molecular and social factors affecting mortality. We used penalized Cox regression with features selected by the causal graph to build VAPORED, a mortality risk prediction model. VAPORED was compared to existing scores (BODE: BMI, airflow obstruction, dyspnoea, exercise capacity; ADO: age, dyspnoea, airflow obstruction) on the ability to rank individuals by mortality risk, using four evaluation metrics (concordance, concordance probability estimate (CPE), cumulative/dynamic (C/D) area under the receiver operating characteristic curve (AUC), and integrated C/D AUC). The results were validated in ECLIPSE. Findings: Graphical models, applied to the COPDGene Phase 1 samples (n = 8610), identified 11 and 7 variables directly linked to all-cause and COPD-specific mortality, respectively. Although many appear in both models, non-lung comorbidities appear only in the all-cause model, while forced vital capacity (FVC %predicted) appears in the COPD-specific mortality model only. Additionally, the graph model of the Phase 2 data (n = 3182) identified internet access, CD4 T cells and platelets as linked to lower mortality risk. Furthermore, using the 7 variables linked to COPD-specific mortality (forced expiratory volume in 1 s/forced vital capacity (FEV1/FVC) ratio, FVC %predicted, age, history of pneumonia, oxygen saturation, 6-min walk distance, dyspnoea) we developed the VAPORED mortality risk score, which we validated on the ECLIPSE cohort (3-yr all-cause mortality data, n = 2312). VAPORED performed significantly better than the ADO, BODE, and updated BODE indices in predicting all-cause mortality in ECLIPSE in terms of concordance (VAPORED [0.719] vs ADO [0.693; FDR p-value 0.014], BODE [0.695; FDR p-value 0.020], and updated BODE [0.694; FDR p-value 0.021]); CPE (VAPORED [0.714] vs ADO [0.673; FDR p-value
- Published
- 2024
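The modelling step described above (a penalized Cox model on a handful of graph-selected predictors, scored by concordance) can be sketched with the lifelines package. The synthetic columns below are invented stand-ins for the paper's seven COPD variables, and the penalty choice is arbitrary.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 400
df = pd.DataFrame({
    "fev1_fvc": rng.normal(0.6, 0.1, n),        # hypothetical predictors
    "age": rng.normal(65, 8, n),
    "six_min_walk": rng.normal(350, 80, n),
})
risk = -3 * df["fev1_fvc"] + 0.03 * df["age"] - 0.002 * df["six_min_walk"]
df["time"] = rng.exponential(np.exp(-risk))     # synthetic survival times
df["event"] = (rng.uniform(size=n) < 0.7).astype(int)  # some censoring

cph = CoxPHFitter(penalizer=0.1)                # ridge-type penalty
cph.fit(df, duration_col="time", event_col="event")
print(f"concordance = {cph.concordance_index_:.3f}")
```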
32. Distinct COPD subtypes in former smokers revealed by gene network perturbation analysis
- Author
- Buschur, Kristina L, Riley, Craig, Saferali, Aabida, Castaldi, Peter, Zhang, Grace, Aguet, Francois, Ardlie, Kristin G, Durda, Peter, Craig Johnson, W, Kasela, Silva, Liu, Yongmei, Manichaikul, Ani, Rich, Stephen S, Rotter, Jerome I, Smith, Josh, Taylor, Kent D, Tracy, Russell P, Lappalainen, Tuuli, Graham Barr, R, Sciurba, Frank, Hersh, Craig P, and Benos, Panayiotis V
- Subjects
- Biomedical and Clinical Sciences, Cardiovascular Medicine and Haematology, Clinical Sciences, Genetics, Clinical Research, Chronic Obstructive Pulmonary Disease, Women's Health, Human Genome, Lung, Precision Medicine, 2.1 Biological and endogenous factors, 4.2 Evaluation of markers and technologies, Respiratory, Good Health and Well Being, Humans, Gene Regulatory Networks, Smokers, Genome-Wide Association Study, Pulmonary Disease, Chronic Obstructive, Prognosis, COPD, Graphical models, Gene expression, Disease subtypes, Cardiorespiratory Medicine and Haematology, Respiratory System, Cardiovascular medicine and haematology, Clinical sciences
- Abstract
Background: Chronic obstructive pulmonary disease (COPD) varies significantly in symptomatic and physiologic presentation. Identifying disease subtypes from molecular data, collected from easily accessible blood samples, can help stratify patients and guide disease management and treatment. Methods: Blood gene expression measured by RNA-sequencing in the COPDGene Study was analyzed using a network perturbation analysis method. Each COPD sample was compared against a learned reference gene network to determine the part that is deregulated. Gene deregulation values were used to cluster the disease samples. Results: The discovery set included 617 former smokers from COPDGene. Four distinct gene network subtypes are identified with significant differences in symptoms, exercise capacity and mortality. These clusters do not necessarily correspond with the levels of lung function impairment and are independently validated in two external cohorts: 769 former smokers from COPDGene and 431 former smokers in the Multi-Ethnic Study of Atherosclerosis (MESA). Additionally, we identify several genes that are significantly deregulated across these subtypes, including DSP and GSTM1, which have been previously associated with COPD through genome-wide association study (GWAS). Conclusions: The identified subtypes differ in mortality and in their clinical and functional characteristics, underlining the need for multi-dimensional assessment potentially supplemented by selected markers of gene expression. The subtypes were consistent across cohorts and could be used for new patient stratification and disease prognosis.
- Published
- 2023
33. Quantum circuits for discrete graphical models
- Author
- Piatkowski, Nico and Zoufal, Christa
- Published
- 2024
34. A Graphical Multi-Fidelity Gaussian Process Model, with Application to Emulation of Heavy-Ion Collisions.
- Author
- Ji, Yi, Mak, Simon, Soeder, Derek, Paquet, J-F, and Bass, Steffen A.
- Subjects
- GAUSSIAN processes, DIRECTED acyclic graphs, GALAXY formation, SCIENTIFIC computing, EMULATION software, MULTISENSOR data fusion
- Abstract
With advances in scientific computing and mathematical modeling, complex scientific phenomena such as galaxy formations and rocket propulsion can now be reliably simulated. Such simulations can however be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of different fidelities to train an efficient predictive model which emulates the expensive simulator. For complex scientific problems and with careful elicitation from scientists, such multi-fidelity data may often be linked by a directed acyclic graph (DAG) representing its scientific model dependencies. We thus propose a new Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds this DAG structure (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable algorithm for recursive computation of the posterior mean and variance at each depth level of the DAG. We also present a novel experimental design methodology over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP via deep Gaussian processes. The advantages of the GMGP are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang. The proposed model has broader uses in data fusion applications with graphical structure, which we further discuss. [ABSTRACT FROM AUTHOR]
- Published
- 2024
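To make the recursive idea concrete, here is a deliberately simplified two-node version (a Kennedy-O'Hagan-style autoregression using scikit-learn GPs): the high-fidelity emulator takes the low-fidelity prediction as an extra input. The paper's GMGP generalizes this recursion to arbitrary DAGs of fidelities; the simulators and design sizes below are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def f_low(x):  return np.sin(8 * x)                  # cheap simulator
def f_high(x): return 1.2 * np.sin(8 * x) + 0.3 * x  # expensive simulator

x_lo = np.linspace(0, 1, 25)[:, None]                # many cheap runs
x_hi = np.linspace(0, 1, 6)[:, None]                 # few expensive runs

gp_lo = GaussianProcessRegressor().fit(x_lo, f_low(x_lo).ravel())
# Recursive step: the high-fidelity GP sees (input, low-fidelity prediction).
feats = np.column_stack([x_hi, gp_lo.predict(x_hi)])
gp_hi = GaussianProcessRegressor().fit(feats, f_high(x_hi).ravel())

x_new = np.array([[0.37]])
print(gp_hi.predict(np.column_stack([x_new, gp_lo.predict(x_new)])))
```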
35. Green finance: Evidence from large portfolios and networks during financial crises and recessions.
- Author
- Argentiero, Amedeo, Bonaccolto, Giovanni, and Pedrini, Giulio
- Subjects
- FINANCIAL crises, ENVIRONMENTAL responsibility, RECESSIONS, STOCKS (Finance), SUSTAINABLE development
- Abstract
In this article, we study the relevance of green finance from a portfolio and a network perspective. The estimates are derived from a regularized graphical model, which allows us to deal with two important issues. First, we address the curse of dimensionality, as we focus on a relatively large set of companies. Second, we explicitly take into account the heavy-tailed distributions of financial time series, which reflect the impact of crises and recessions. Focusing on a time interval spanning well-known tail events, from the US subprime crisis to the recent outbreak of the COVID-19 pandemic, we show that the selected green stocks offer a relevant contribution to the minimization of the overall portfolio risk. Moreover, they outperform the gray assets in terms of risk, profitability, and risk-adjusted return in a statistically significant way. These findings are consistent with the estimates obtained from the network analysis. Indeed, the gray stocks exhibit greater connection within the dynamic networks and are thus more exposed to the risk of a greater propagation of negative spillover effects during stressed periods. Interestingly, the relevance of the green stocks increases when moving from the standard Gaussian to the leptokurtic setting. These results suggest that policymakers should undertake synergistic interventions with private finance aimed at supporting a green economy and environmentally responsible companies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
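Sketch of the two ingredients the study combines: a sparse precision-matrix estimate of stock returns (a Gaussian graphical lasso here; the paper's regularized heavy-tailed variant is not reproduced) and global minimum-variance portfolio weights, which come directly from the precision matrix as w = Θ1 / (1ᵀΘ1).

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(5)
returns = rng.multivariate_normal(
    mean=np.zeros(5),
    cov=0.5 * np.eye(5) + 0.5,      # equicorrelated toy "stocks"
    size=750)                        # roughly three years of daily data

theta = GraphicalLassoCV().fit(returns).precision_
ones = np.ones(5)
w = theta @ ones / (ones @ theta @ ones)   # minimum-variance weights
degree = (np.abs(theta) > 1e-6).sum(axis=1) - 1
print("weights:", np.round(w, 3), "network degrees:", degree)
```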
36. Joint modeling of association networks and longitudinal biomarkers: An application to childhood obesity.
- Author
- Cremaschi, Andrea, De Iorio, Maria, Kothandaraman, Narasimhan, Yap, Fabian, Tint, Mya Thway, and Eriksson, Johan
- Subjects
- CHILDHOOD obesity, OVERWEIGHT children, NON-communicable diseases, BIOMARKERS, COHORT analysis, GAUSSIAN processes
- Abstract
The prevalence of chronic non‐communicable diseases such as obesity has noticeably increased in the last decade. The study of these diseases in early life is of paramount importance in determining their course in adult life and in supporting clinical interventions. Recently, attention has been drawn to approaches that study the alteration of metabolic pathways in obese children. In this work, we propose a novel joint modeling approach for the analysis of growth biomarkers and metabolite associations, to unveil metabolic pathways related to childhood obesity. Within a Bayesian framework, we flexibly model the temporal evolution of growth trajectories and metabolic associations through the specification of a joint nonparametric random effect distribution, with the main goal of clustering subjects, thus identifying risk sub‐groups. Growth profiles as well as patterns of metabolic associations determine the clustering structure. Inclusion of risk factors is straightforward through the specification of a regression term. We demonstrate the proposed approach on data from the Growing Up in Singapore Towards healthy Outcomes cohort study, based in Singapore. Posterior inference is obtained via a tailored MCMC algorithm, involving a nonparametric prior with mixed support. Our analysis has identified potential key pathways in obese children that allow for the exploration of possible molecular mechanisms associated with childhood obesity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Tropical Origin, Global Diversification, and Dispersal in the Pond Damselflies (Coenagrionoidea) Revealed by a New Molecular Phylogeny.
- Author
- Willink, Beatriz, Ware, Jessica L, and Svensson, Erik I
- Subjects
- MOLECULAR phylogeny, DAMSELFLIES, ODONATA, BIOMES, CLIMATE change, PONDS
- Abstract
The processes responsible for the formation of Earth's most conspicuous diversity pattern, the latitudinal diversity gradient (LDG), remain unexplored for many clades in the Tree of Life. Here, we present a densely sampled and dated molecular phylogeny for the most speciose clade of damselflies worldwide (Odonata: Coenagrionoidea) and investigate the role of time, macroevolutionary processes, and biome-shift dynamics in shaping the LDG in this ancient insect superfamily. We used process-based biogeographic models to jointly infer ancestral ranges and speciation times and to characterize within-biome dispersal and biome-shift dynamics across the cosmopolitan distribution of Coenagrionoidea. We also investigated temporal and biome-dependent variation in diversification rates. Our results uncover a tropical origin of pond damselflies and featherlegs ~105 Ma, while highlighting the uncertainty of ancestral ranges within the tropics in deep time. Even though diversification rates have declined since the origin of this clade, global climate change and biome-shifts have slowly increased diversity in warm- and cold-temperate areas, where lineage turnover rates have been relatively higher. This study underscores the importance of biogeographic origin and time to diversify as important drivers of the LDG in pond damselflies and their relatives, while diversification dynamics have instead resulted in the formation of ephemeral species in temperate regions. Biome-shifts, although limited by tropical niche conservatism, have been the main factor reducing the steepness of the LDG in the last 30 Myr. With ongoing climate change and increasing northward range expansions of many damselfly taxa, the LDG may become less pronounced. Our results support recent calls to unify biogeographic and macroevolutionary approaches to improve our understanding of how latitudinal diversity gradients are formed and why they vary across time and among taxa. [ABSTRACT FROM AUTHOR]
- Published
- 2024
38. Bayesian Learning of Causal Networks for Unsupervised Fault Diagnosis in Distributed Energy Systems
- Author
- Federico Castelletti, Fabrizio Niro, Marco Denti, Daniele Tessera, and Andrea Pozzi
- Subjects
- Clustering methods, distributed power generation, fault diagnosis, graphical models, machine learning, statistics, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Distributed energy generation systems, key for producing electricity near usage points, are essential to meet the global electricity demand, leveraging diverse sources like renewables, traditional fuels, and industrial waste heat. Despite their high reliability, these systems are not immune to faults and failures. Such incidents can result in considerable downtime and reduced efficiency, underlining the need for effective fault detection and diagnosis techniques. Implementing these strategies is crucial not just for mitigating damage and preventing potential disasters, but also to maintain optimal performance levels. This paper introduces a novel methodology based on Bayesian graphical modeling for unsupervised fault diagnosis, focusing on an organic Rankine cycle case study. It employs structural learning to discern unknown intervention points within a directed acyclic graph that models the power plant's operations. By analyzing real-world data, the study demonstrates the effectiveness of this approach, pinpointing a subset of variables that could be implicated in specific faults.
- Published
- 2024
39. Valuation Network for Ongoing Assessment of Threat to an Underwater Vehicle
- Author
- Branko Ristic, Amanda Bessell, and Sanjeev Arulampalam
- Subjects
- Machine reasoning, graphical models, valuation networks, maritime threat assessment, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The paper develops a valuation-based system for reasoning under uncertainty in the context of threat assessment onboard an underwater vehicle. The focus is on the threat posed by nearby contacts while the vessel is navigating busy waters with warships, merchant ships and fishing vessels. A graphical model of a valuation network is developed, representing the (uncertain) contextual prior knowledge and the observations received over the course of time. Two types of valuations are considered in this context: (1) probability mass functions, assuming that all probabilistic values are known precisely; and (2) credal sets (sets of probabilities), when probabilistic values are specified only as confidence intervals. The performance of the two valuation networks is presented using a typical scenario log involving a varying number of different types of contacts over time.
- Published
- 2024
40. Confidence in causal inference under structure uncertainty in linear causal models with equal variances
- Author
- Strieder David and Drton Mathias
- Subjects
- confidence intervals, causal-effects, linear structural equation models, equal error variances, graphical models, 62d20, 62h22, Mathematics, QA1-939, Probabilities. Mathematical statistics, QA273-280
- Abstract
Inferring the effect of interventions within complex systems is a fundamental problem of statistics. A widely studied approach uses structural causal models that postulate noisy functional relations among a set of interacting variables. The underlying causal structure is then naturally represented by a directed graph whose edges indicate direct causal dependencies. In a recent line of work, additional assumptions on the causal models have been shown to render this causal graph identifiable from observational data alone. One example is the assumption of linear causal relations with equal error variances that we will take up in this work. When the graph structure is known, classical methods may be used for calculating estimates and confidence intervals for causal effects. However, in many applications, expert knowledge that provides an a priori valid causal structure is not available. Lacking alternatives, a commonly used two-step approach first learns a graph and then treats the graph as known in inference. This, however, yields confidence intervals that are overly optimistic and fail to account for the data-driven model choice. We argue that to draw reliable conclusions, it is necessary to incorporate the remaining uncertainty about the underlying causal structure in confidence statements about causal effects. To address this issue, we present a framework based on test inversion that allows us to give confidence regions for total causal effects that capture both sources of uncertainty: the causal structure and the numerical size of non-zero effects.
- Published
- 2023
- Full Text
- View/download PDF
41. Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches.
- Author
-
Ouardirhi, Zainab, Mahmoudi, Sidi Ahmed, and Zbakh, Mostapha
- Subjects
OBJECT recognition (Computer vision) ,VIDEO surveillance ,DECISION making ,DATA augmentation - Abstract
Smart video surveillance systems (SVSs) have garnered significant attention for their autonomous monitoring capabilities, encompassing automated detection, tracking, analysis, and decision making within complex environments, with minimal human intervention. In this context, object detection is a fundamental task in SVS. However, many current approaches often overlook occlusion by nearby objects, posing challenges to real-world SVS applications. To address this crucial issue, this paper presents a comprehensive comparative analysis of occlusion-handling techniques tailored for object detection. The review outlines the pretext tasks common to both domains and explores various architectural solutions to combat occlusion. Unlike prior studies that primarily focus on a single dataset, our analysis spans multiple benchmark datasets, providing a thorough assessment of various object detection methods. By extending the evaluation to datasets beyond the KITTI benchmark, this study offers a more holistic understanding of each approach's strengths and limitations. Additionally, we delve into persistent challenges in existing occlusion-handling approaches and emphasize the need for innovative strategies and future research directions to drive substantial progress in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Bayesian dynamic network modelling: an application to metabolic associations in cardiovascular diseases.
- Author
-
Molinari, Marco, Cremaschi, Andrea, De Iorio, Maria, Chaturvedi, Nishi, Hughes, Alun, and Tillin, Therese
- Subjects
- *
BAYESIAN analysis , *GIBBS sampling , *CARDIOVASCULAR diseases , *METABOLIC models , *SPARSE graphs - Abstract
We propose a novel approach to the estimation of multiple graphical models to analyse temporal patterns of association among a set of metabolites over different groups of patients. Our motivating application is the Southall And Brent REvisited (SABRE) study, a tri-ethnic cohort study conducted in the UK. We are interested in identifying potential ethnic differences in metabolite levels and associations, as well as their evolution over time, with the aim of gaining a better understanding of the differing risk of cardio-metabolic disorders across ethnicities. Within a Bayesian framework, we employ a nodewise regression approach to infer the structure of the graphs, borrowing information across time as well as across ethnicities. The response variables of interest are metabolite levels measured at two time points and for two ethnic groups, Europeans and South Asians. We use nodewise regression to estimate the high-dimensional precision matrices of the metabolites, imposing sparsity on the regression coefficients through the dynamic horseshoe prior, thus favouring sparser graphs. We provide code to fit the proposed model using the software Stan, which performs posterior inference via Hamiltonian Monte Carlo sampling, as well as a detailed description of a block Gibbs sampling scheme. [ABSTRACT FROM AUTHOR]
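The nodewise regression step has a simple frequentist analogue: regress each variable on all the others with a sparsity-inducing penalty and connect pairs of variables with nonzero coefficients. In the sketch below the Lasso stands in for the paper's dynamic horseshoe prior, and the simulated data are illustrative.

```python
# A minimal sketch of nodewise regression for graph estimation; the
# Lasso penalty is a stand-in for the Bayesian dynamic horseshoe prior.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p = 400, 8
X = rng.normal(size=(n, p))
X[:, 1] += 0.9 * X[:, 0]  # plant a 0-1 edge
X[:, 2] += 0.9 * X[:, 1]  # plant a 1-2 edge

edges = set()
for j in range(p):
    others = [k for k in range(p) if k != j]
    fit = LassoCV(cv=5).fit(X[:, others], X[:, j])
    for k, coef in zip(others, fit.coef_):
        if abs(coef) > 1e-6:
            edges.add(tuple(sorted((j, k))))  # OR-rule symmetrisation
print(sorted(edges))  # expect {(0, 1), (1, 2)} up to noise
```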
- Published
- 2024
- Full Text
- View/download PDF
43. Spatial Versus Graphical Representation of Distributional Semantic Knowledge.
- Author
-
Mao, Shufan, Huebner, Philip A., and Willits, Jon A.
- Subjects
- *
KNOWLEDGE graphs , *NATURAL languages , *VECTOR spaces , *RUMOR , *SEMANTICS , *CORPORA - Abstract
Spatial distributional semantic models represent word meanings in a vector space. While able to model many basic semantic tasks, they are limited in many ways, such as their inability to represent multiple kinds of relations in a single semantic space and to directly leverage indirect relations between two lexical representations. To address these limitations, we propose a distributional graphical model that encodes lexical distributional data in a graphical structure and uses spreading activation for determining the plausibility of word sequences. We compare our model to existing spatial and graphical models by systematically varying parameters that contribute to dimensions of theoretical interest in semantic modeling. In order to be certain about what the models should be able to learn, we trained each model on an artificial corpus describing events in an artificial world simulation containing experimentally controlled verb–noun selectional preferences. The task used for model evaluation requires recovering observed selectional preferences and inferring semantically plausible but never observed verb–noun pairs. We show that the distributional graphical model performed better than all other models. Further, we argue that the relative success of this model comes from its improved ability to access different orders of spatial representations via spreading activation on the graph, enabling the model to infer the plausibility of noun–verb pairs unobserved in the training data. The model integrates classical ideas of representing semantic knowledge in a graph with spreading activation and more recent trends focused on the extraction of lexical distributional data from large natural language corpora. [ABSTRACT FROM AUTHOR]
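The spreading-activation mechanism can be sketched compactly: activation starts at a cue word, diffuses along weighted edges with decay, and the plausibility of a word pair is read off the activation that reaches the target. The toy vocabulary and edge weights below are assumptions for illustration, not the paper's trained model.

```python
# A minimal sketch of spreading activation on a word graph.
import numpy as np

words = ["eat", "apple", "car", "drive"]
idx = {w: i for i, w in enumerate(words)}
W = np.array([[0.0, 0.9, 0.0, 0.1],   # symmetric co-occurrence weights
              [0.9, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.9],
              [0.1, 0.0, 0.9, 0.0]])
P = W / W.sum(axis=1, keepdims=True)  # row-normalised transition matrix

def plausibility(cue, target, steps=3, decay=0.5):
    a = np.zeros(len(words))
    a[idx[cue]] = 1.0
    total = np.zeros_like(a)
    for _ in range(steps):
        a = decay * (a @ P)  # spread one step along the graph
        total += a
    return total[idx[target]]

print(plausibility("eat", "apple"))  # high: direct association
print(plausibility("eat", "car"))    # low: only indirect paths
```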
- Published
- 2024
- Full Text
- View/download PDF
44. Describing Conditional Independence Statements Using Undirected Graphs.
- Author
-
Malouche, Dhafer
- Subjects
- *
DISTRIBUTION (Probability theory) , *GAUSSIAN distribution , *AXIOMS - Abstract
This paper investigates the capability of undirected graphs (UGs) to represent a set of Conditional Independence (CI) statements derived from a given probability distribution of a random vector. While it is established that certain axioms can govern this set, providing sufficient conditions for UGs to capture specific CI statements, our focus is on covariance and concentration graphs. These remain the only known families of UGs capable of describing CI statements. We explore the issue of complete representation of CI statements through their corresponding covariance and concentration graphs. Two parameters are defined, one each from the covariance and concentration graphs, to determine the limitations concerning the cardinality of the conditioning subset that the graph can represent. We establish a relationship between these parameters and the cardinality of the separators in each graph, providing a straightforward computational method to evaluate them. In conclusion, we enhance the aforementioned procedure and introduce criteria to ascertain, without additional computations, whether the graphs can fully represent a given set of CI statements. We demonstrate that when either the concentration or the covariance graph forms a cycle, the two graphs considered in conjunction can represent the entire relation. These criteria also enable us, in specific cases, to deduce the covariance graph from the concentration graph and vice versa. [ABSTRACT FROM AUTHOR]
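The contrast between the two graph families is easy to see numerically: in a Gaussian chain, the concentration graph (nonzero entries of the precision matrix) is sparse, while the covariance graph (nonzero entries of the covariance matrix) is complete, because marginal dependence propagates along the chain. The example below is illustrative and not taken from the paper.

```python
# A minimal sketch: covariance graph vs concentration graph for a
# Gaussian chain X0 - X1 - X2 with a tridiagonal precision matrix.
import numpy as np

K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])  # precision (inverse covariance)
S = np.linalg.inv(K)                # covariance

tol = 1e-10
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
print("concentration graph:", [p for p in pairs if abs(K[p]) > tol])
print("covariance graph:   ", [p for p in pairs if abs(S[p]) > tol])
# X0 and X2 are independent given X1 (no 0-2 concentration edge) but
# marginally dependent (0-2 covariance edge), so the two graphs encode
# different CI statements.
```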
- Published
- 2023
- Full Text
- View/download PDF
45. Nonparanormal graph quilting with applications to calcium imaging.
- Author
-
Chang, Andersen, Zheng, Lili, Dasarathy, Gautam, and Allen, Genevera I.
- Subjects
- *
QUILTING , *QUILTS , *CALCIUM , *FUNCTIONAL connectivity , *COVARIANCE matrices - Abstract
Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two-photon calcium imaging data. However, in the context of calcium imaging, technological limitations only allow for partially overlapping layers of neurons in a brain region of interest to be jointly recorded. In this case, graph estimation for the full data requires inference for edge selection when many pairs of neurons have no simultaneous observations. This leads to the graph quilting problem, which seeks to estimate a graph in the presence of block-missingness in the empirical covariance matrix. Solutions for the graph quilting problem have previously been studied for Gaussian graphical models; however, neural activity data from calcium imaging are often non-Gaussian, thereby requiring a more flexible modelling approach. Thus, in our work, we study two approaches for nonparanormal graph quilting based on the Gaussian copula graphical model, namely, a maximum likelihood procedure and a low rank-based framework. We provide theoretical guarantees on edge recovery for the former approach under similar conditions to those previously developed for the Gaussian setting, and we investigate the empirical performance of both methods using simulations as well as real calcium imaging data. Our approaches yield more scientifically meaningful functional connectivity estimates compared to existing Gaussian graph quilting methods for this calcium imaging data set. [ABSTRACT FROM AUTHOR]
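The nonparanormal ingredient can be sketched as a two-step recipe: estimate a latent Gaussian correlation matrix from rank correlations (Kendall's tau with the sine transform) and then run a sparse precision estimator on it. The block-missingness handled by graph quilting is not implemented in this sketch, and the simulated data and penalty are illustrative.

```python
# A minimal sketch of the Gaussian copula (nonparanormal) step followed
# by the graphical lasso; in general the rank-based correlation matrix
# may need projecting onto the positive-definite cone first.
import numpy as np
from scipy.stats import kendalltau
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(3)
n, p = 300, 4
Z = rng.multivariate_normal(np.zeros(p), np.eye(p) + 0.5, size=n)
X = np.exp(Z)  # monotone transform: data are no longer Gaussian

R = np.eye(p)
for i in range(p):
    for j in range(i + 1, p):
        tau, _ = kendalltau(X[:, i], X[:, j])
        R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)  # copula correlation

cov, prec = graphical_lasso(R, alpha=0.1)
print(np.round(prec, 2))  # sparse precision -> estimated graph edges
```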
- Published
- 2023
- Full Text
- View/download PDF
46. Random Graphical Model of Microbiome Interactions in Related Environments
- Author
-
Vinciotti, Veronica, Wit, Ernst C., and Richter, Francisco
- Published
- 2024
- Full Text
- View/download PDF
47. On Learning When to Decompose Graphical Models
- Author
-
Petrova, Aleksandra, Larrosa, Javier, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sellmann, Meinolf, editor, and Tierney, Kevin, editor
- Published
- 2023
- Full Text
- View/download PDF
48. Categorical Information Geometry
- Author
-
Perrone, Paolo, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nielsen, Frank, editor, and Barbaresco, Frédéric, editor
- Published
- 2023
- Full Text
- View/download PDF
49. Minimal labels, maximum gain : image classification with graph-based semi-supervised learning
- Author
-
Sellars, Philip, Schönlieb, Carola-Bibiane, and Aviles-Rivero, Angelica
- Subjects
Deep-Learning, Image Classification, Graphical Models, Semi-Supervised - Abstract
In the last decade, the use and deployment of machine learning systems for computer vision have risen dramatically. To train a machine learning model, it is often assumed that the practitioner has access to a large and representative labelled dataset from which they can optimise their model in a supervised manner. However, in many domains there is a large cost to obtaining labelled data: in technical fields we need manual annotations from domain experts, and for deep learning models we need large datasets to reduce over-fitting. Acting as a potential solution, the paradigm of semi-supervised learning extracts information from both labelled and unlabelled data and reduces the number of labels needed for training. This thesis deals with the development of novel classical and deep machine learning approaches for semi-supervised image classification. Our approaches are centred around graph-based learning, and we apply them to a range of real-world problems including hyperspectral, natural and medical imaging. Firstly, we propose and design a superpixel-contracted semi-supervised learning framework to classify hyperspectral images. This approach is built around the p=2 graph Laplacian and uses over-segmentation to greatly reduce the size of the graph while also providing a regularising prior. Secondly, we combine graph-based semi-supervised learning with deep neural networks and re-examine modern data ablation to create a state-of-the-art framework for natural image classification. Finally, we combine graph-based approaches, optimising the more demanding p=1 graph Laplacian, with deep neural network architectures and apply them to the field of medical imaging. We design a general framework for diagnosis and apply it to chest X-rays, including the diagnosis of COVID-19. For all the approaches in this thesis, we show, through rigorous experiments and detailed ablation studies, that our models produce state-of-the-art results and are competitive with fully supervised models whilst only using a fraction of the available labels. Overall, the contributions of this thesis are focused on the design and implementation of new graph-based semi-supervised frameworks for image classification, which include geometrical and data constraints along with deep neural networks, highlighting the power of semi-supervised learning to overcome the need for costly labelled datasets.
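A minimal sketch of the p=2 graph-Laplacian machinery underpinning the first contribution, in the style of harmonic-function label propagation: labelled nodes are pinned to their classes and the labels of the remaining nodes are obtained by solving a Laplacian linear system. The tiny graph below is an illustrative assumption, not the thesis's superpixel construction.

```python
# A minimal sketch of harmonic label propagation on a weighted graph.
import numpy as np

W = np.array([[0, 1, 1, 0],   # adjacency: nodes 0,1 labelled; 2,3 not
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian

labelled, unlabelled = [0, 1], [2, 3]
y = np.array([0.0, 1.0])  # class scores for the labelled nodes

# Harmonic solution: solve L_uu f_u = -L_ul y for the unlabelled nodes.
Luu = L[np.ix_(unlabelled, unlabelled)]
Lul = L[np.ix_(unlabelled, labelled)]
f_u = np.linalg.solve(Luu, -Lul @ y)
print(f_u)  # node 2 leans towards class 0, node 3 towards class 1
```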
- Published
- 2021
- Full Text
- View/download PDF
50. Robustness in sum-product networks : from measurement to ensembles
- Author
-
Conaty, Diarmaid and Martinez del Rincon, Jesus
- Subjects
006.3, Sum-Product networks, robustness, machine learning, probabilistic graphical models, graphical models, SPN - Abstract
The research presented in this thesis is an attempt to tackle the problem of trust in classifications made by Sum-Product Networks (SPNs). A method is presented for gauging the reliability of a classification by perturbing model weights using Credal Sum-Product Networks (CSPNs) and summarising the result as a robustness metric, and this method is demonstrated empirically to be of use in this context. We propose a practical use for this tool as a key component of an ensemble Hierarchical Sum-Product Network model, formally define such an approach, and then empirically show that it can improve model accuracy. Further to this, other possibilities for improving the accuracy of SPN classifications were investigated in the form of a novel modification that associates weights with product nodes. As with other probabilistic models, conclusions drawn from Sum-Product Networks are often sensitive to small perturbations in the numerical parameters, indicating a lack of statistical support. Background is provided on Credal Sum-Product Networks, a class of imprecise probabilistic graphical models that extend SPNs to imprecise probabilities, together with algorithms and complexity results for common inference tasks. We introduce robustness as a metric for prediction reliability, obtained by perturbing the weights of the SPN within a credal set using CSPNs. Experiments using standard categorical datasets and a real-world case study show empirically that CSPNs can distinguish between reliable and unreliable classifications of SPNs; robustness can thus be seen as an important tool for the analysis of such models. An extension of CSPNs to facilitate robustness analysis over datasets containing continuous variables is achieved by altering the leaf nodes to propagate density values. Experiments across several continuous datasets demonstrate that CSPNs remain an effective tool for measuring model robustness, with conclusions drawn from categorical data continuing to hold in the presence of continuous data. We also introduce the idea of attaching weights to the children of product nodes in the base SPN structure, applied as exponents to the value computed for each child. Several ways of setting these weights during learning are investigated, alongside methods of scaling such values; some modest but limited potential is observed for gaining accuracy at the risk of losing model explainability. We then expand on our work on robustness measurements by investigating their utility for deferring classification across an ensemble of classifiers, and demonstrate that performance gains can be obtained with such an approach in an ad-hoc hierarchical setting. From this, we develop a new method of ensemble learning using SPNs through the systematic creation of a hierarchy of learned classifiers. At test time, this hierarchical approach defers the classification of the ensemble model to the hierarchical layer deemed most confident according to its robustness value computed by a CSPN. A proof is presented to show that our approach can only improve classification accuracy with respect to the initial classifier in the ensemble hierarchy, and this proof is given empirical weight through multiple experiments using a large selection of standard categorical datasets. Further, the behaviour of the hierarchical SPN is examined under variations in the number of layers and in the strongest learners of the hierarchy. This approach is shown to be more powerful than a number of state-of-the-art ensemble-strategy competitors.
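The robustness measurement can be illustrated with a deliberately tiny SPN: perturb the sum-node weights within an epsilon-contamination credal set and report the largest epsilon for which the predicted class is unchanged. The structure, weights, and contamination model below are illustrative assumptions, not the thesis's exact construction.

```python
# A minimal sketch of CSPN-style robustness for a one-variable SPN with
# a single root sum node over two class branches.
import numpy as np

weights = np.array([0.6, 0.4])  # root sum-node weights per class
leaf = np.array([[0.9, 0.1],    # P(X | class 0)
                 [0.2, 0.8]])   # P(X | class 1)

def predict(w, x):
    return int(np.argmax(w * leaf[:, x]))  # MAP class for evidence X=x

def robust(x, eps):
    # Epsilon-contamination credal set {(1-eps) w + eps v}; by linearity
    # it suffices to check the point-mass extreme points v.
    base = predict(weights, x)
    for k in range(len(weights)):
        v = np.zeros(len(weights)); v[k] = 1.0
        if predict((1 - eps) * weights + eps * v, x) != base:
            return False
    return True

x, eps_star = 0, 0.0
for e in np.linspace(0.0, 1.0, 101):
    if robust(x, e):
        eps_star = e  # credal sets are nested, so stop at first failure
    else:
        break
print("prediction:", predict(weights, x), "robustness eps*:", round(eps_star, 2))
```

A high eps* signals a classification that survives large weight perturbations; in the hierarchical ensemble described above, it is the low-robustness classifications that get deferred to another layer.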
- Published
- 2021