SEMbap: Bow-free covariance search and data de-correlation.
- Author
Grassi, Mario and Tarantino, Barbara
- Subjects
STRUCTURAL equation modeling, DIRECTED acyclic graphs, SEARCH algorithms, PRINCIPAL components analysis, GENE expression, DIRECTED graphs
- Abstract
Large-scale studies of gene expression are commonly influenced by biological and technical sources of expression variation, including batch effects, sample characteristics, and environmental factors. Learning the causal relationships between observed variables can be challenging in the presence of unobserved confounders, and many high-dimensional regression techniques may perform worse when such confounders are present. Controlling for unobserved confounding variables is therefore essential, and many deconfounding methods have been proposed for a variety of settings. The main contribution of this article is a two-stage deconfounding procedure based on Bow-free Acyclic Paths (BAP) search, developed within the framework of Structural Equation Models (SEM) and called SEMbap(). In the first stage, an exhaustive search for missing edges with significant covariance is performed via Shipley's d-separation tests; in the second stage, either a Constrained Gaussian Graphical Model (CGGM) is fitted or a low-dimensional representation of the bow-free edge structure is obtained via Graph Laplacian Principal Component Analysis (gLPCA). We compare the BAP search approach with four popular deconfounding methods on simulated and observed expression data. In the simulations, different structures of the hidden covariance matrix were replicated. Compared with existing methods, the BAP search algorithm is able to correctly identify hidden confounding while controlling the false positive rate and achieving good fitting and perturbation metrics.
Author summary: A directed acyclic graph (DAG), with variables at the vertices and direct causal connections as edges, can be used to illustrate the causal structure of an SEM, but this does not guarantee that all relevant factors are considered. We examine a class of models that may include hidden variables. Specifically, we consider graphs that represent a bow-free acyclic path diagram (BAP), in which directed edges signify direct causal effects and bidirected edges indicate hidden confounders. First, we provide a two-step deconfounding technique based on BAP search, integrated into the SEM framework through the SEMbap() function implemented in the R package SEMgraph. Second, we offer a thorough evaluation of state-of-the-art deconfounding techniques using both synthetic and real data, together with knowledge of a biological signaling pathway encoded in a DAG, in terms of (i) SEM fitting, (ii) system perturbation, and (iii) recovery performance metrics. The BAP search algorithm outperforms current techniques in accurately detecting hidden confounding, controlling the false positive rate, and achieving good fitting and perturbation metrics. [ABSTRACT FROM AUTHOR]
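The two stages described in the abstract can be illustrated with a small Python sketch. This is not the SEMgraph implementation of SEMbap(), which is an R function. All function names here (`bap_search`, `glpca_deconfound`, `partial_corr`, `fisher_z_pvalue`) are hypothetical, and several details are assumptions made for illustration: each missing edge (i, j) is tested with pa(i) ∪ pa(j) as the conditioning set (Shipley's union basis set), the partial correlation is tested with a Fisher z test, and a Bonferroni correction is applied. For stage 2, only the gLPCA branch is shown, with the graph defined over the variables. Its loadings are the eigenvectors of -C + beta·L with the smallest eigenvalues, and subtracting the rank-k reconstruction to de-correlate the data is likewise an assumption. The CGGM branch is omitted.

```python
# Illustrative sketch only: not the SEMgraph / SEMbap() implementation.
# All function names, the Bonferroni correction, and the de-correlation rule
# below are hypothetical choices made for this example.
import itertools
import math

import numpy as np


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value of a partial correlation r given k conditioning variables."""
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def partial_corr(X, i, j, cond):
    """Correlation of the residuals of X[:, i] and X[:, j] after regressing both on cond."""
    Z = np.column_stack([np.ones(X.shape[0])] + [X[:, c] for c in cond])

    def resid(y):
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return y - Z @ beta

    return float(np.corrcoef(resid(X[:, i]), resid(X[:, j]))[0, 1])


def bap_search(X, parents, alpha=0.05):
    """Stage 1 (sketch): test every missing edge of the DAG for residual covariance.

    `parents` maps each node index to the list of its parents in the DAG.
    Significant pairs become bidirected edges. Because only non-adjacent
    pairs are tested, the resulting mixed graph stays bow-free.
    """
    n, p = X.shape
    adjacent = {frozenset((a, b)) for b, pa in parents.items() for a in pa}
    tests = []
    for i, j in itertools.combinations(range(p), 2):
        if frozenset((i, j)) in adjacent:
            continue  # directed edge present: a bidirected edge here would form a bow
        cond = sorted(set(parents.get(i, [])) | set(parents.get(j, [])))
        r = partial_corr(X, i, j, cond)
        tests.append(((i, j), fisher_z_pvalue(r, n, len(cond))))
    m = max(len(tests), 1)  # Bonferroni correction over all tested pairs
    return [pair for pair, pval in tests if pval * m < alpha]


def glpca_deconfound(X, bow_edges, k=1, beta=1.0):
    """Stage 2 (sketch): graph-Laplacian PCA over the bidirected-edge graph.

    Minimizes ||Y - U V^T||^2 + beta * tr(V^T L V) subject to V^T V = I.
    V holds the k eigenvectors of -C + beta * L with the smallest eigenvalues,
    where C = Y^T Y / n and L is the Laplacian of bow_edges. The scores are
    U = Y V, and the rank-k part U V^T is subtracted from the standardized data.
    """
    Y = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Y.shape[1]
    A = np.zeros((p, p))
    for i, j in bow_edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    C = Y.T @ Y / Y.shape[0]
    _, vecs = np.linalg.eigh(-C + beta * L)  # eigenvalues in ascending order
    V = vecs[:, :k]
    U = Y @ V
    return Y - U @ V.T, U, V
```

Testing only non-adjacent pairs is what keeps the search bow-free: a bidirected edge is never placed alongside an existing directed edge. The Bonferroni correction is one possible choice, and a false-discovery-rate adjustment would fit the same loop.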
- Published
2024