290 results on '"graphical lasso"'
Search Results
2. Sparse inference of the human haematopoietic system from heterogeneous and partially observed genomic data.
- Author
-
Sottile, Gianluca, Augugliaro, Luigi, Vinciotti, Veronica, Arancio, Walter, and Coronnello, Claudia
- Abstract
Haematopoiesis is the process of blood cells' formation, with progenitor stem cells differentiating into mature forms such as white and red blood cells or platelets. While progenitor cells share regulatory pathways involving common nuclear factors, specific networks shape their fate towards particular lineages. This paper analyses the complex regulatory network that drives the formation of mature red blood cells and platelets from their common precursors. Using the latest reverse transcription quantitative real-time PCR genomic data, we develop a dedicated graphical model that incorporates the effect of external genomic data and allows inference of regulatory networks from the high-dimensional and partially observed data. [ABSTRACT FROM AUTHOR]
- Published
- 2025
3. Exploring glioma heterogeneity through omics networks: from gene network discovery to causal insights and patient stratification.
- Author
-
Kastendiek, Nina, Coletti, Roberta, Gross, Thilo, and Lopes, Marta B.
- Subjects
- *CANCER genetics, *BRAIN tumors, *JACOBIAN matrices, *LIFE sciences, *TRANSCRIPTOMES
- Abstract
Gliomas are primary malignant brain tumors with a typically poor prognosis, exhibiting significant heterogeneity across different cancer types. Each glioma type possesses distinct molecular characteristics determining patient prognosis and therapeutic options. This study aims to explore the molecular complexity of gliomas at the transcriptome level, employing a comprehensive approach grounded in network discovery. The graphical lasso method was used to estimate a gene co-expression network for each glioma type from a transcriptomics dataset. Causality was subsequently inferred from correlation networks by estimating the Jacobian matrix. The networks were then analyzed for gene importance using centrality measures and modularity detection, leading to the selection of genes that might play an important role in the disease. To explore the pathways and biological functions these genes are involved in, KEGG and Gene Ontology (GO) enrichment analyses on the disclosed gene sets were performed, highlighting the significance of the genes selected across several relevant pathways and GO terms. Spectral clustering based on patient similarity networks was applied to stratify patients into groups with similar molecular characteristics and to assess whether the resulting clusters align with the diagnosed glioma type. The results presented highlight the ability of the proposed methodology to uncover relevant genes associated with glioma intertumoral heterogeneity. Further investigation might encompass biological validation of the putative biomarkers disclosed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. On the application of Gaussian graphical models to paired data problems.
- Author
-
Ranciati, Saverio and Roverato, Alberto
- Abstract
Gaussian graphical models are nowadays commonly applied to the comparison of groups sharing the same variables, by jointly learning their independence structures. We consider the case where there are exactly two dependent groups and the association structure is represented by a family of coloured Gaussian graphical models suited to deal with paired data problems. To learn the two dependent graphs, together with their across-graph association structure, we implement a fused graphical lasso penalty. We carry out a comprehensive analysis of this approach, with special attention to the role played by some relevant submodel classes. In this way, we provide a broad set of tools for the application of Gaussian graphical models to paired data problems. These include results useful for the specification of penalty values in order to obtain a path of lasso solutions and an ADMM algorithm that solves the fused graphical lasso optimization problem. Finally, we carry out a simulation study to compare our method with the traditional graphical lasso, and present an application of our method to cancer genomics where it is of interest to compare cancer cells with a control sample from histologically normal tissues adjacent to the tumor. All the methods described in this article are implemented in the R package pdglasso available at . [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. IDGM: an approach to estimate the graphical model of interval-valued data.
- Author
-
Wu, Qiying, Wang, Huiwen, and Lu, Shan
- Abstract
Graphical models describe the conditional dependence structure among random variables via vertices and edges and have attracted increasing attention in recent years. However, when the variable is interval-valued instead of a scalar, it remains unclear how the graphical model can be estimated, since interval-valued data impose additional complexity: the lower bound must not exceed the upper bound, and each interval is itself a two-dimensional object. In this paper, we propose an algorithm, named the interval-valued data graphical model (IDGM), to realize such estimation, extending the graphical model concept to interval-valued data modeling. To address the complexity of interval-valued data, we apply the midpoints and log-ranges transformation to engage the center and range information of an interval. Then, we identify the network structure based on a variant 2 × 2 block-wise sparsity graphical lasso that incorporates the penalty term of the precision matrix. The numerical simulations along with two real-world applications in the fields of macroeconomics and finance show the advantages of IDGM over the competing methods and demonstrate the effectiveness of IDGM in graphical model estimation for interval-valued data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Meta graphical lasso: uncovering hidden interactions among latent mechanisms
- Author
-
Koji Maruhashi, Hisashi Kashima, Satoru Miyano, and Heewon Park
- Subjects
- Graphical model, Graphical lasso, Latent factor, Stiefel manifolds, Medicine, Science
- Abstract
In complex systems, it’s crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions. Despite machine learning advancements enabling sophisticated generative models, their black-box nature complicates the interpretation of complex latent mechanisms. A promising method for understanding these mechanisms involves estimating latent factors through linear projection, though there’s no assurance that inferences made under specific conditions will remain valid across contexts. We propose a novel solution, suggesting data, even from systems appearing complex, can often be explained by sparse dependencies among a few common latent factors, regardless of the situation. This simplification allows for modeling that yields significant insights across diverse fields. We demonstrate this with datasets from finance, where we capture societal trends from stock price movements, and medicine, where we uncover new insights into cancer drug resistance through gene expression analysis.
- Published
- 2024
7. Inferring Diagnostic and Prognostic Gene Expression Signatures Across WHO Glioma Classifications: A Network-Based Approach.
- Author
-
Coletti, Roberta, Leiria de Mendonça, Mónica, Vinga, Susana, and Lopes, Marta B.
- Subjects
- *PUBLIC health officers, *CENTRAL nervous system, *GENE regulatory networks, *GLIOMAS, *GENE expression
- Abstract
Tumor heterogeneity is a challenge to designing effective and targeted therapies. Glioma-type identification depends on specific molecular and histological features, which are defined by the official World Health Organization (WHO) classification of the central nervous system (CNS). These guidelines are constantly updated to support the diagnosis process, which affects all the successive clinical decisions. In this context, the search for new potential diagnostic and prognostic targets, characteristic of each glioma type, is crucial to support the development of novel therapies. Based on The Cancer Genome Atlas (TCGA) glioma RNA-sequencing data set updated according to the 2016 and 2021 WHO guidelines, we proposed a 2-step variable selection approach for biomarker discovery. Our framework encompasses the graphical lasso algorithm to estimate sparse networks of genes carrying diagnostic information. These networks are then used as input for regularized Cox survival regression model, allowing the identification of a smaller subset of genes with prognostic value. In each step, the results derived from the 2016 and 2021 classes were discussed and compared. For both WHO glioma classifications, our analysis identifies potential biomarkers, characteristic of each glioma type. Yet, better results were obtained for the WHO CNS classification in 2021, thereby supporting recent efforts to include molecular data on glioma classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Estimation of Graphical Models: An Overview of Selected Topics.
- Author
-
Chen, Li‐Pang
- Abstract
Summary: Graphical modelling is an important branch of statistics that has been successfully applied in biology, social science, causal inference and so on. Graphical models illuminate connections between many variables and can even describe complex data structures or noisy data. Graphical models have been combined with supervised learning techniques such as regression modelling and classification analysis with multi‐class responses. This paper first reviews some fundamental graphical modelling concepts, focusing on estimation methods and computational algorithms. Several advanced topics are then considered, delving into complex graphical structures and noisy data. Applications in regression and classification are considered throughout. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Partial Tail-Correlation Coefficient Applied to Extremal-Network Learning.
- Author
-
Gong, Yan, Zhong, Peng, Opitz, Thomas, and Huser, Raphaël
- Subjects
- *FOREIGN exchange, *RANDOM variables, *MULTIVARIATE analysis, *STATISTICAL correlation, *ALGEBRA, *GRAPHICAL modeling (Statistics)
- Abstract
We propose a novel extremal dependence measure called the partial tail-correlation coefficient (PTCC), in analogy to the partial correlation coefficient in classical multivariate analysis. The construction of our new coefficient is based on the framework of multivariate regular variation and transformed-linear algebra operations. We show how this coefficient allows identifying pairs of variables that have partially uncorrelated tails given some other variables in a random vector. Unlike other recently introduced conditional independence frameworks for extremes, our approach requires minimal modeling assumptions and can thus be used in exploratory analyses to learn the structure of extremal graphical models. Similarly to traditional Gaussian graphical models where edges correspond to the nonzero entries of the precision matrix, we can exploit classical inference methods for high-dimensional data, such as the graphical Lasso with Laplacian spectral constraints, to efficiently learn the extremal network structure via the PTCC. We apply our new method to study extreme risk networks in two different datasets (extreme river discharges and historical global currency exchange data) and show that we can extract interesting extremal structures with meaningful domain-specific interpretations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Meta graphical lasso: uncovering hidden interactions among latent mechanisms.
- Author
-
Maruhashi, Koji, Kashima, Hisashi, Miyano, Satoru, and Park, Heewon
- Subjects
- DRUG resistance in cancer cells, DRUG resistance, MACHINE learning, GENE expression, MEDICAL research
- Abstract
In complex systems, it's crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions. Despite machine learning advancements enabling sophisticated generative models, their black-box nature complicates the interpretation of complex latent mechanisms. A promising method for understanding these mechanisms involves estimating latent factors through linear projection, though there's no assurance that inferences made under specific conditions will remain valid across contexts. We propose a novel solution, suggesting data, even from systems appearing complex, can often be explained by sparse dependencies among a few common latent factors, regardless of the situation. This simplification allows for modeling that yields significant insights across diverse fields. We demonstrate this with datasets from finance, where we capture societal trends from stock price movements, and medicine, where we uncover new insights into cancer drug resistance through gene expression analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Optimal Portfolio Using Factor Graphical Lasso*.
- Author
-
Lee, Tae-Hwy and Seregina, Ekaterina
- Subjects
- SPARSE matrices, LOW-rank matrices, RATE of return on stocks, FACTOR structure, RISK exposure
- Abstract
Graphical models are a powerful tool to estimate a high-dimensional inverse covariance (precision) matrix, which has been applied for a portfolio allocation problem. The assumption made by these models is a sparsity of the precision matrix. However, when stock returns are driven by common factors, such assumption does not hold. We address this limitation and develop a framework, Factor Graphical Lasso (FGL), which integrates graphical models with the factor structure in the context of portfolio allocation by decomposing a precision matrix into low-rank and sparse components. Our theoretical results and simulations show that FGL consistently estimates the portfolio weights and risk exposure and also that FGL is robust to heavy-tailed distributions which makes our method suitable for financial applications. FGL-based portfolios are shown to exhibit superior performance over several prominent competitors including equal-weighted and index portfolios in the empirical application for the S&P500 constituents. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Family‐wise error rate control in Gaussian graphical model selection via distributionally robust optimization
- Author
-
Tran, Chau, Cisneros‐Velarde, Pedro, Oh, Sang‐Yun, and Petersen, Alexander
- Subjects
- Good Health and Well Being, distributionally robust optimization, family-wise error rate, Gaussian graphical model, graphical lasso, Statistics
- Published
- 2022
13. pISTA: PRECONDITIONED ITERATIVE SOFT THRESHOLDING ALGORITHM FOR GRAPHICAL LASSO.
- Author
-
SHALOM, GAL, TREISTER, ERAN, and YAVNEH, IRAD
- Subjects
- *THRESHOLDING algorithms, *QUASI-Newton methods, *SPARSE matrices
- Abstract
We propose a novel quasi-Newton method for solving the sparse inverse covariance estimation problem also known as the graphical least absolute shrinkage and selection operator (GLASSO). This problem is often solved using a second-order quadratic approximation. However, in such algorithms the Hessian term is complex and computationally expensive to handle. Therefore, our method uses the inverse of the Hessian as a preconditioner to simplify and approximate the quadratic element at the cost of a more complex l1 element. The variables of the resulting preconditioned problem are coupled only by the l1 subderivative of each other, which can be guessed with minimal cost using the gradient itself, allowing the algorithm to be parallelized and implemented efficiently on GPU hardware accelerators. Numerical results on synthetic and real data demonstrate that our method is competitive with other state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
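For reference, the sparse inverse covariance estimation problem (GLASSO) named in the abstract of entry 13 is commonly written as the penalized log-likelihood problem below. The notation is an assumption added here and is not taken from the abstract: $S$ denotes the sample covariance matrix, $\Theta$ the precision matrix, and $\lambda \ge 0$ the penalty weight. Some formulations penalize only the off-diagonal entries of $\Theta$.

```latex
\hat{\Theta} \;=\; \arg\min_{\Theta \succ 0}
  \Bigl\{ -\log\det\Theta \;+\; \operatorname{tr}(S\Theta) \;+\; \lambda \,\lVert \Theta \rVert_{1} \Bigr\},
\qquad
\lVert \Theta \rVert_{1} \;=\; \sum_{i,j} \lvert \theta_{ij} \rvert .
```

The first two terms form the smooth part that second-order and quasi-Newton solvers approximate quadratically; the last term is the non-smooth $\ell_1$ element that induces sparsity.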
14. Partial correlation graphical LASSO.
- Author
-
Carter, Jack Storror, Rossell, David, and Smith, Jim Q.
- Subjects
- *COVARIANCE matrices, *GRAPHICAL modeling (Statistics), *STANDARDIZATION
- Abstract
Standard likelihood penalties to learn Gaussian graphical models are based on regularizing the off‐diagonal entries of the precision matrix. Such methods, and their Bayesian counterparts, are not invariant to scalar multiplication of the variables, unless one standardizes the observed data to unit sample variances. We show that such standardization can have a strong effect on inference and introduce a new family of penalties based on partial correlations. We show that the latter, as well as the maximum likelihood, $L_0$ and logarithmic penalties are scale invariant. We illustrate the use of one such penalty, the partial correlation graphical LASSO, which sets an $L_1$ penalty on partial correlations. The associated optimization problem is no longer convex, but is conditionally convex. We show via simulated examples and in two real datasets that, besides being scale invariant, there can be important gains in terms of inference. [ABSTRACT FROM AUTHOR]
- Published
- 2024
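The scale-invariance point in the abstract of entry 14 can be illustrated with a small numerical check. The sketch below is not the authors' method or code. It only uses the standard identity that the partial correlation between variables i and j is -θ_ij / sqrt(θ_ii θ_jj), and the fact that multiplying variable i by a positive constant c_i turns the precision matrix Θ into D⁻¹ Θ D⁻¹ with D = diag(c). The matrix `theta`, the vector `scales`, and both function names are hypothetical, chosen for illustration.

```python
import math


def partial_correlations(theta):
    """Partial correlation matrix implied by a precision matrix theta."""
    p = len(theta)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho


def rescale_precision(theta, scales):
    """Precision matrix after multiplying variable i by scales[i] (all > 0)."""
    p = len(theta)
    return [[theta[i][j] / (scales[i] * scales[j]) for j in range(p)]
            for i in range(p)]


# Hypothetical 3-variable precision matrix (symmetric, diagonally dominant,
# hence positive definite).
theta = [[2.0, -0.8, 0.0],
         [-0.8, 2.0, -0.5],
         [0.0, -0.5, 1.5]]

# Hypothetical change of measurement units for each variable.
scales = [1.0, 10.0, 0.1]

theta_scaled = rescale_precision(theta, scales)
rho = partial_correlations(theta)
rho_scaled = partial_correlations(theta_scaled)

# Off-diagonal precision entries change with the units ...
print(theta[0][1], theta_scaled[0][1])
# ... while the corresponding partial correlations do not.
print(rho[0][1], rho_scaled[0][1])
```

A penalty placed directly on the entries of Θ therefore depends on the measurement units, whereas a penalty placed on partial correlations does not, which is the motivation stated in the abstract.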
15. The network of commodity risk.
- Author
-
Foroni, Beatrice, Morelli, Giacomo, and Petrella, Lea
- Abstract
In this paper, we investigate the interconnections among and within the Energy, Agricultural, and Metal commodities, operating in a risk management framework with a twofold goal. First, we estimate the Value-at-Risk (VaR) employing GARCH and Markov-switching GARCH models with different error term distributions. The use of such models allows us to take into account well-known stylized facts shown in the time series of commodities as well as possible regime changes in their conditional variance dynamics. We rely on backtesting procedures to select the best model for each commodity. Second, we estimate the sparse Gaussian Graphical model of commodities exploiting the Graphical LASSO (GLASSO) methodology to detect the most relevant conditional dependence structure among and within the sectors. A novel feature of our framework is that GLASSO estimation is achieved exploring the precision matrix of the multivariate Gaussian distribution obtained using a Gaussian copula with marginals given by the residuals of the aforementioned selected models. We apply our approach to the sample of twenty-four series of commodity futures prices over the years 2005–2022. We find that Soybean Oil, Cotton, and Coffee represent the major sources of propagation of financial distress in commodity markets while Gold, Natural Gas UK, and Heating Oil are depicted as safe-haven commodities. The impact of Covid-19 is reflected in increased heterogeneity, as captured by the strongest relationships between commodities belonging to the same commodity sector and by weakened inter-sectorial connections. This finding suggests that connectedness does not always increase in response to crisis events. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Penalized Model-Based Functional Clustering: A Regularization Approach via Shrinkage Methods
- Author
-
Pronello, Nicola, Ignaccolo, Rosaria, Ippoliti, Luigi, Fontanella, Sara, Gaul, Wolfgang, Managing Editor, Vichi, Maurizio, Managing Editor, Weihs, Claus, Managing Editor, Baier, Daniel, Editorial Board Member, Critchley, Frank, Editorial Board Member, Decker, Reinhold, Editorial Board Member, Diday, Edwin, Editorial Board Member, Greenacre, Michael, Editorial Board Member, Lauro, Carlo Natale, Editorial Board Member, Meulman, Jacqueline, Editorial Board Member, Monari, Paola, Editorial Board Member, Nishisato, Shizuhiko, Editorial Board Member, Ohsumi, Noboru, Editorial Board Member, Opitz, Otto, Editorial Board Member, Ritter, Gunter, Editorial Board Member, Schader, Martin, Editorial Board Member, Brito, Paula, editor, Dias, José G., editor, Lausen, Berthold, editor, Montanari, Angela, editor, and Nugent, Rebecca, editor
- Published
- 2023
17. An Analysis of Travelers’ Personalities and Accommodations Ratings Using Open Datasets
- Author
-
Takahashi, Naoki, Hamada, Yuri, Shoji, Hiroko, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Rauterberg, Matthias, editor
- Published
- 2023
18. Equity Portfolio Optimization with Machine Learning (OPTIMIZACIÓN DE CARTERAS DE RENTA VARIABLE CON MACHINE LEARNING)
- Author
-
Alejandro Vargas Sánchez and André Nicolas Monje Prudencio
- Subjects
- portfolio optimization, machine learning, graphical lasso, affinity propagation clustering, multi-dimensional scaling, General Works
- Abstract
Given the growing role and acceptance of Artificial Intelligence in the world of finance, this research proposes applying Machine Learning techniques to the management of equity investment portfolios, opening the possibility of improving the portfolio construction process so that it generates optimal empirical results relative to traditional techniques such as the Maximum Sharpe Ratio portfolio and the Equally Weighted Portfolio. In contrast to these traditional techniques, Affinity Propagation clustering is applied as the main technique to identify patterns of similar behavior among companies, complemented by the Graphical Lasso algorithm to estimate the dependence structure of the data and Multi-Dimensional Scaling to improve the visual representation of the clusters. The results show that the portfolio that maximizes the return and risk measures is the one built with these Machine Learning techniques. It is concluded that combining these three Machine Learning techniques yields a viable and effective alternative for managing investment portfolios in the equity market.
- Published
- 2024
19. Using network analysis to examine the connectivity between the brain regions in rs-fMRI data of FND patient and healthy participant: A single subject study
- Author
-
Samira Ahmadi, Elham Faghihzadeh, and Mohammad Ali Oghabian
- Subjects
- FND disease, graphical lasso, rs-fMRI, Network analysis, Gaussian graphical model, Biology (General), QH301-705.5, Probabilities. Mathematical statistics, QA273-280
- Abstract
Introduction: Functional neurological disorder (FND) is one of the most common causes of neuropathy; however, its cause remains unclear. Understanding the underlying mechanisms of FND is crucial for treatment strategies. The study was conducted on brain images (rs-fMRI) taken from two volunteers (an FND patient and a healthy subject) who had the same characteristics. Method: We fitted Gaussian graphical models to single-subject data using a network approach. Results: In the patient, the number of significant edges was greater in the left hemisphere, whereas in the healthy person the number of these non-zero edges was greater in the right hemisphere. The networks of both the healthy person and the patient had high density, indicating that the regions considered in these two people were strongly related to each other. The results showed the existence of more links and positive relationships between the regions, most of which were strong. Among these connections, there were also negative connections. The network of the healthy participant had an almost symmetrical structure, while that of the patient with FND showed different characteristics, including asymmetry between the hemispheres. Conclusion: This study is the first to demonstrate that the brain regions of both an FND patient and a healthy participant can be conceptualized as networks. The findings add to a growing body of literature showing that the brain regions of FND patients can be analyzed using network approaches.
- Published
- 2023
20. Using Network Analysis to Examine the Brain Regions Connectivity of Functional Neurological Disorder Patient and Healthy Participant.
- Author
-
Ahmadi, Samira, Oghabian, Mohammad Ali, and Faghihzadeh, Elham
- Subjects
- NEUROPATHY, NEUROLOGICAL disorders, HIGH density lipoproteins, ASSOCIATIONS, institutions, etc., BRAIN
- Abstract
Introduction: Functional neurological disorder (FND) is one of the most common causes of neuropathy; however, its cause remains unclear. Understanding its underlying mechanisms is crucial for treatment strategies. The study was conducted on resting-state fMRI brain images taken from two volunteers (a functional neurological disorder patient and a healthy subject) who had the same characteristics. Methods: We fitted Gaussian graphical models to single-subject data using a network approach. Results: In the patient, the number of significant edges was greater in the left hemisphere, whereas in the healthy person the number of these non-zero edges was greater in the right hemisphere. The networks of both the healthy person and the patient had high density, indicating that the regions considered in these two people were strongly related to each other. The results showed the existence of more links and positive relationships between the regions, most of which were strong. Among these connections, there were also negative connections. The network of the healthy participant had an almost symmetrical structure, while that of the patient with functional neurological disorder showed different characteristics, including asymmetry between the hemispheres. Conclusion: This study is the first to demonstrate that the brain regions of both a functional neurological disorder patient and a healthy participant can be conceptualized as networks. The findings add to a growing body of literature showing that the brain regions of functional neurological disorder patients can be analyzed using network approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. Genetic architecture of inter-specific and -generic grass hybrids by network analysis on multi-omics data
- Author
-
Elesandro Bornhofen, Dario Fè, Istvan Nagy, Ingo Lenk, Morten Greve, Thomas Didion, Christian S. Jensen, Torben Asp, and Luc Janss
- Subjects
- Graphical lasso, Metabolome, Multi-trait mixed model, Network science, Polyploid, Transcriptome, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Understanding the mechanisms underlying forage production and its biomass nutritive quality at the omics level is crucial for boosting the output of high-quality dry matter per unit of land. Despite the advent of multiple omics integration for the study of biological systems in major crops, investigations on forage species are still scarce. Results: Our results identified substantial changes in gene co-expression and metabolite-metabolite network topologies as a result of genetic perturbation by hybridizing L. perenne with another species within the genus (L. multiflorum) relative to across genera (F. pratensis). However, conserved hub genes and hub metabolomic features were detected between pedigree classes, some of which were highly heritable and displayed one or more significant edges with agronomic traits in a weighted omics-phenotype network. In spite of tagging relevant biological molecules as, for example, the light-induced rice 1 (LIR1), hub features were not necessarily better explanatory variables for omics-assisted prediction than features stochastically sampled and all available regressors. Conclusions: The utilization of computational techniques for the reconstruction of co-expression networks facilitates the identification of key omic features that serve as central nodes and demonstrate correlation with the manifestation of observed traits. Our results also indicate a robust association between early multi-omic traits measured in a greenhouse setting and phenotypic traits evaluated under field conditions.
- Published
- 2023
22. Robust Multivariate Lasso Regression with Covariance Estimation.
- Author
-
Chang, Le and Welsh, A. H.
- Subjects
- *OPTIMIZATION algorithms, *MATRIX inversion, *OUTLIER detection, *COVARIANCE matrices, *STATISTICAL correlation, *DATA analysis
- Abstract
Multivariate regression with covariance estimation (MRCE) is a method that performs sparse estimation of multivariate regression coefficients, while taking into account the covariance structure of the response variables. MRCE uses a penalized likelihood approach to simultaneously estimate the regression coefficients and the inverse covariance matrix so that prediction accuracy can be significantly improved. However, traditional likelihood-based methods such as MRCE can produce very misleading results in the presence of outliers. In this work, we propose an extension of MRCE, namely, a robust multivariate lasso regression with covariance estimation (RMLC) to handle potential outliers within the data. By using Huber's loss or Tukey's biweight loss, RMLC can be resistant to outliers in the responses or in both the responses and the covariates. A novel optimization algorithm that incorporates a 2-fold accelerated proximal gradient (APG) algorithm is developed to solve RMLC efficiently. We also demonstrate that our proposed RMLC enjoys the oracle property. Our simulation study shows that RMLC produces very reliable results for both the regression coefficients and the correlation structure of the responses, even if the data are contaminated. A real analysis on hyperspectral data further demonstrates the utility of RMLC. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2023
23. Genetic architecture of inter-specific and -generic grass hybrids by network analysis on multi-omics data.
- Author
-
Bornhofen, Elesandro, Fè, Dario, Nagy, Istvan, Lenk, Ingo, Greve, Morten, Didion, Thomas, Jensen, Christian S., Asp, Torben, and Janss, Luc
- Subjects
- BIOMOLECULES, BIOLOGICAL systems, DATA analysis, BIOLOGICAL tags, BIOMASS production, BIOMASS conversion, IDENTIFICATION
- Abstract
Background: Understanding the mechanisms underlying forage production and its biomass nutritive quality at the omics level is crucial for boosting the output of high-quality dry matter per unit of land. Despite the advent of multiple omics integration for the study of biological systems in major crops, investigations on forage species are still scarce. Results: Our results identified substantial changes in gene co-expression and metabolite-metabolite network topologies as a result of genetic perturbation by hybridizing L. perenne with another species within the genus (L. multiflorum) relative to across genera (F. pratensis). However, conserved hub genes and hub metabolomic features were detected between pedigree classes, some of which were highly heritable and displayed one or more significant edges with agronomic traits in a weighted omics-phenotype network. In spite of tagging relevant biological molecules as, for example, the light-induced rice 1 (LIR1), hub features were not necessarily better explanatory variables for omics-assisted prediction than features stochastically sampled and all available regressors. Conclusions: The utilization of computational techniques for the reconstruction of co-expression networks facilitates the identification of key omic features that serve as central nodes and demonstrate correlation with the manifestation of observed traits. Our results also indicate a robust association between early multi-omic traits measured in a greenhouse setting and phenotypic traits evaluated under field conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Linear Manifold Modeling and Graph Estimation based on Multivariate Functional Data with Different Coarseness Scales.
- Author
-
Pircalabelu, Eugen and Claeskens, Gerda
- Subjects
- *
SAMPLE size (Statistics) , *FUNCTIONAL magnetic resonance imaging , *UNDIRECTED graphs - Abstract
We develop a high-dimensional graphical modeling approach for functional data where the number of functions exceeds the available sample size. This is accomplished by proposing a sparse estimator for a concentration matrix when identifying linear manifolds. As such, the procedure extends the ideas of the manifold representation for functional data to high-dimensional settings where the number of functions is larger than the sample size. By working in a penalized setting it enriches the functional data framework by estimating sparse undirected graphs that show how functional nodes connect to other functional nodes. The procedure allows multiple coarseness scales to be present in the data and proposes a simultaneous estimation of several related graphs. Its performance is illustrated using a real-life fMRI dataset and with simulated data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. Personality analysis of passenger facilities and their users using open datasets.
- Author
-
高橋直己, 浜田百合, and 庄司裕子
- Published
- 2023
26. An Integrated Approach of Learning Genetic Networks From Genome-Wide Gene Expression Data Using Gaussian Graphical Model and Monte Carlo Method.
- Author
-
Zhao, Haitao, Datta, Sujay, and Duan, Zhong-Hui
- Subjects
- *
MONTE Carlo method , *GENE expression , *GENE regulatory networks , *UNDIRECTED graphs - Abstract
Global genetic networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single genes or local networks. The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph decoding the conditional dependence between genes. Many algorithms based on the GGM have been proposed for learning genetic network structures. Because the number of gene variables is typically far more than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of GGM becomes a popular tool for inferring the conditional interdependence among genes. However, graphical lasso, although showing good performance in low dimensional data sets, is computationally expensive and inefficient or even unable to work directly on genome-wide gene expression data sets. In this study, the method of Monte Carlo Gaussian graphical model (MCGGM) was proposed to learn global genetic networks of genes. This method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and graphical lasso to learn the structures of the subnetworks. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was evaluated with a relatively small real data set of RNA-seq expression levels. The results indicate the proposed method shows a strong ability of decoding the interactions with high conditional dependences among genes. The method was then applied to genome-wide data sets of RNA-seq expression levels. The gene interactions with high interdependence from the estimated global networks show that most of the predicted gene-gene interactions have been reported in the literature as playing important roles in different human cancers. Also, the results validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Graphical model for mixed data types.
- Author
-
Wu, Qiying, Wang, Huiwen, Lu, Shan, and Sun, Hui
- Subjects
- *
ACQUISITION of data , *DATA modeling , *ENGINES , *GRAPHICAL modeling (Statistics) - Abstract
With the development of data collection technologies, data types have become more diverse. Additionally, graphical models, as tools for describing variable network relationships, have become increasingly popular in recent years. Previous studies have focused on graphical models tailored to specific types of data. However, these existing methods fail to identify graphical models for mixed data types. The difficulty of constructing graphical models for mixed data types lies in the fact that each type of data has its own space, which challenges the estimation of network relationships in a graphical model when the data are combined. To address this issue, this study presents a novel method that utilizes a vectorization and alignment strategy developed particularly for mixed data types, including scalar, interval-valued, compositional, and functional data, to estimate a graphical model. By iteratively employing a block-sparse graphical lasso method on aligned data, the method can achieve satisfactory results, as shown by numerous simulation experiments. The results also validate the superiority of our proposed method over potential competing methods. Furthermore, this method was applied to an engine damage propagation network as an illustrative example. Our method provides a novel modeling approach for graphical models in the case of mixed data types. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
28. An Integrated Approach of Learning Genetic Networks From Genome-Wide Gene Expression Data Using Gaussian Graphical Model and Monte Carlo Method.
- Author
-
Haitao Zhao, Sujay Datta, and Zhong-Hui Duan
- Subjects
MONTE Carlo method ,GENE expression ,GENE regulatory networks ,UNDIRECTED graphs - Abstract
Global genetic networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single genes or local networks. The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph decoding the conditional dependence between genes. Many algorithms based on the GGM have been proposed for learning genetic network structures. Because the number of gene variables is typically far more than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of GGM becomes a popular tool for inferring the conditional interdependence among genes. However, graphical lasso, although showing good performance in low dimensional data sets, is computationally expensive and inefficient or even unable to work directly on genome-wide gene expression data sets. In this study, the method of Monte Carlo Gaussian graphical model (MCGGM) was proposed to learn global genetic networks of genes. This method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and graphical lasso to learn the structures of the subnetworks. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was evaluated with a relatively small real data set of RNA-seq expression levels. The results indicate the proposed method shows a strong ability of decoding the interactions with high conditional dependences among genes. The method was then applied to genome-wide data sets of RNA-seq expression levels. The gene interactions with high interdependence from the estimated global networks show that most of the predicted gene-gene interactions have been reported in the literature as playing important roles in different human cancers. Also, the results validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. A resample-replace lasso procedure for combining high-dimensional markers with limit of detection.
- Author
-
Wang, Jinjuan, Zhao, Yunpeng, Tang, Larry L., Mueller, Claudius, and Li, Qizhai
- Subjects
- *
DETECTION limit , *COVARIANCE matrices , *MATRIX inversion , *MEDICAL screening - Abstract
In disease screening, a biomarker combination developed by combining multiple markers tends to have a higher sensitivity than an individual marker. Parametric methods for marker combination rely on the inverse of covariance matrices, which is often a non-trivial problem for high-dimensional data generated by modern high-throughput technologies. Additionally, another common problem in disease diagnosis is the existence of limit of detection (LOD) for an instrument – that is, when a biomarker's value falls below the limit, it cannot be observed and is assigned an NA value. To handle these two challenges in combining high-dimensional biomarkers with the presence of LOD, we propose a resample-replace lasso procedure. We first impute the values below LOD and then use the graphical lasso method to estimate the means and precision matrices for the high-dimensional biomarkers. The simulation results show that our method outperforms alternative methods such as substituting NA values with LOD values or removing observations that have NA values. A real case analysis on a protein profiling study of glioblastoma patients on their survival status indicates that the biomarker combination obtained through the proposed method is more accurate in distinguishing between two groups. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Gaussian graphical models with applications to omics analyses.
- Author
-
Shutta, Katherine H., De Vito, Roberta, Scholtens, Denise M., and Balasubramanian, Raji
- Subjects
- *
GENE expression profiling , *OVARIAN cancer , *GENOMICS , *PROTEOMICS - Abstract
Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high‐dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand‐alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Joint Gaussian graphical model estimation: A survey.
- Author
-
Tsai, Katherine, Koyejo, Oluwasanmi, and Kolar, Mladen
- Subjects
- *
GRAPHICAL modeling (Statistics) , *INFERENTIAL statistics , *SCIENTIFIC discoveries , *ELECTRONIC data processing - Abstract
Graphs representing complex systems often share a partial underlying structure across domains while retaining individual features. Thus, identifying common structures can shed light on the underlying signal, for instance, when applied to scientific discovery or clinical diagnoses. Furthermore, growing evidence shows that the shared structure across domains boosts the estimation power of graphs, particularly for high‐dimensional data. However, building a joint estimator to extract the common structure may be more complicated than it seems, most often due to data heterogeneity across sources. This manuscript surveys recent work on statistical inference of joint Gaussian graphical models, identifying model structures that fit various data generation processes. This article is categorized under: Data: Types and Structure > Graph and Network Data; Statistical Models > Graphical Models [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering.
- Author
-
Casa, Alessandro, Cappozzo, Andrea, and Fop, Michael
- Subjects
- *
GAUSSIAN mixture models , *MATRIX decomposition , *SPARSE matrices , *EXPECTATION-maximization algorithms - Abstract
Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Sparse Covariance and Precision Random Design Regression
- Author
-
Fang, Xi, Winter, Steven, Kashlak, Adam B., Kilgour, D. Marc, editor, Kunze, Herb, editor, Makarov, Roman, editor, Melnik, Roderick, editor, and Wang, Xu, editor
- Published
- 2021
- Full Text
- View/download PDF
34. Statistical methods for the testing and estimation of linear dependence structures on paired high-dimensional data : application to genomic data
- Author
-
Mestres, Adrià Caballé, Bochkina, Natalia, and Aitken, Colin
- Subjects
572.8 ,high-dimensional data ,conditional dependence ,gene expression ,hypothesis testing ,correlation matrix ,precision matrix ,graphical lasso - Abstract
This thesis provides novel methodology for statistical analysis of paired high-dimensional genomic data, with the aim to identify gene interactions specific to each group of samples as well as the gene connections that change between the two classes of observations. An example of such groups can be patients under two medical conditions, in which the estimation of gene interaction networks is relevant to biologists as part of discerning gene regulatory mechanisms that control a disease process like, for instance, cancer. We construct these interaction networks from data by considering the non-zero structure of correlation matrices, which measure linear dependence between random variables, and their inverse matrices, which are commonly known as precision matrices and determine linear conditional dependence instead. In this regard, we study three statistical problems related to the testing, single estimation and joint estimation of (conditional) dependence structures. Firstly, we develop hypothesis testing methods to assess the equality of two correlation matrices, and also two correlation sub-matrices, corresponding to two classes of samples, and hence the equality of the underlying gene interaction networks. We consider statistics based on the average of squares, maximum and sum of exceedances of sample correlations, which are suitable for both independent and paired observations. We derive the limiting distributions for the test statistics where possible and, for practical needs, we present a permuted samples based approach to find their corresponding non-parametric distributions. Cases where such hypothesis testing presents enough evidence against the null hypothesis of equality of two correlation matrices give rise to the problem of estimating two correlation (or precision) matrices. 
However, before that we address the statistical problem of estimating conditional dependence between random variables in a single class of samples when data are high-dimensional, which is the second topic of the thesis. We study the graphical lasso method which employs an L1 penalized likelihood expression to estimate the precision matrix and its underlying non-zero graph structure. The lasso penalization term is given by the L1 norm of the precision matrix elements scaled by a regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data, and its selection is our main focus of investigation. We propose several procedures to select the regularization parameter in the graphical lasso optimization problem that rely on network characteristics such as clustering or connectivity of the graph. Thirdly, we address the more general problem of estimating two precision matrices that are expected to be similar, when datasets are dependent, focusing on the particular case of paired observations. We propose a new method to estimate these precision matrices simultaneously, a weighted fused graphical lasso estimator. The analogous joint estimation method concerning two regression coefficient matrices, which we call weighted fused regression lasso, is also developed in this thesis under the same paired and high-dimensional setting. The two joint estimators maximize penalized marginal log likelihood functions, which encourage both sparsity and similarity in the estimated matrices, and that are solved using an alternating direction method of multipliers (ADMM) algorithm. Sparsity and similarity of the matrices are determined by two tuning parameters and we propose to choose them by controlling the corresponding average error rates related to the expected number of false positive edges in the estimated conditional dependence networks. 
These testing and estimation methods are implemented within the R package ldstatsHD, and are applied to a comprehensive range of simulated data sets as well as to high-dimensional real case studies of genomic data. We employ testing approaches with the purpose of discovering pathway lists of genes that present significantly different correlation matrices on healthy and unhealthy (e.g., tumor) samples. Besides, we use hypothesis testing problems on correlation sub-matrices to reduce the number of genes for estimation. The proposed joint estimation methods are then considered to find gene interactions that are common between medical conditions as well as interactions that vary in the presence of unhealthy tissues.
- Published
- 2018
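Note on record 34: the L1-penalized likelihood that the abstract describes in words is commonly written as below. This is the standard textbook form of the graphical lasso objective, not a formula taken from the thesis; the symbols are assumptions introduced here ($S$ for the sample covariance matrix, $\Theta$ for the precision matrix, $\lambda \ge 0$ for the regularization parameter), and some formulations leave the diagonal of $\Theta$ out of the penalty.

```latex
\hat{\Theta}_{\lambda}
  = \arg\max_{\Theta \succ 0}
    \Big\{ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
           \;-\; \lambda \sum_{i,j} \lvert \Theta_{ij} \rvert \Big\}
```

Larger values of $\lambda$ push more off-diagonal entries of $\hat{\Theta}_{\lambda}$ to exactly zero, giving a sparser graph; smaller values favour fit to the data, which is the trade-off the abstract refers to.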
35. graphiclasso: Graphical lasso for learning sparse inverse-covariance matrices.
- Author
-
Dallakyan, Aramayis
- Subjects
- *
SPARSE matrices , *COVARIANCE matrices , *AKAIKE information criterion , *CONVEX functions , *GAUSSIAN function - Abstract
In modern multivariate statistics, where high-dimensional datasets are ubiquitous, learning large (inverse-) covariance matrices is imperative for data analysis. A popular approach to estimating a large inverse-covariance matrix is to regularize the Gaussian log-likelihood function by imposing a convex penalty function. In a seminal article, Friedman, Hastie, and Tibshirani (2008, Biostatistics 9: 432–441) proposed a graphical lasso (Glasso) algorithm to efficiently estimate sparse inverse-covariance matrices from the convex regularized log-likelihood function. In this article, I first explore the Glasso algorithm and then introduce a new graphiclasso command for the large inverse-covariance matrix estimation. Moreover, I provide a useful command for tuning parameter selection in the Glasso algorithm using the extended Bayesian information criterion, the Akaike information criterion, and cross-validation. I demonstrate the use of Glasso using simulation results and real-world data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
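Note on record 35: the abstract mentions tuning-parameter selection by the extended Bayesian information criterion. A minimal sketch of one commonly used form of that criterion for Gaussian graphical models follows, written in plain Python rather than in the syntax of the graphiclasso command. The function names (`log_det_spd`, `ebic`), the default `gamma=0.5`, and the toy matrices are all assumptions made for illustration; the command described in the article may compute the criterion differently.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(d)
            else:
                chol[i][j] = (a[i][j] - s) / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def ebic(theta, s, n_samples, gamma=0.5, tol=1e-8):
    """Extended BIC (assumed form): -2*loglik + |E|*log(n) + 4*|E|*gamma*log(p).

    theta: estimated precision matrix, s: sample covariance matrix,
    |E|: number of non-zero off-diagonal pairs in theta.
    """
    p = len(theta)
    trace = sum(s[i][j] * theta[j][i] for i in range(p) for j in range(p))
    # Gaussian log-likelihood up to an additive constant.
    loglik = 0.5 * n_samples * (log_det_spd(theta) - trace)
    n_edges = sum(
        1 for i in range(p) for j in range(i + 1, p) if abs(theta[i][j]) > tol
    )
    return (-2.0 * loglik
            + n_edges * math.log(n_samples)
            + 4.0 * n_edges * gamma * math.log(p))


# Toy comparison of two hypothetical candidate estimates for the same data.
s_toy = [[1.0, 0.0], [0.0, 1.0]]
theta_empty = [[1.0, 0.0], [0.0, 1.0]]    # no edges
theta_edge = [[2.0, -1.0], [-1.0, 2.0]]   # one edge
score_empty = ebic(theta_empty, s_toy, n_samples=10)
score_edge = ebic(theta_edge, s_toy, n_samples=10)
```

In practice one would compute the score for the estimate obtained at each candidate penalty value and keep the penalty with the smallest score; setting `gamma=0` reduces the criterion to the ordinary BIC.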
36. Tailored graphical lasso for data integration in gene network reconstruction
- Author
-
Camilla Lingjærde, Tonje G. Lien, Ørnulf Borgan, Helga Bergholtz, and Ingrid K. Glad
- Subjects
Graphical lasso ,Weighted graphical lasso ,High-dimensional inference ,Network models ,Genomics ,Multiomics ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using $L_1$-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. Results: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. 
With our method, mRNA data are demonstrated to provide highly useful prior information for protein–protein interaction networks. Conclusions: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
- Published
- 2021
- Full Text
- View/download PDF
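Note on record 36: the abstract states that, under a Gaussian graphical model, the network is read off the non-zero entries of the inverse covariance matrix. The sketch below illustrates that step only, using the standard relation between precision entries and partial correlations. It is not taken from the tailoredGlasso package; the function name `edges_from_precision`, the tolerance `tol`, the gene labels, and the toy matrix are hypothetical.

```python
import math


def edges_from_precision(theta, names, tol=1e-8):
    """List (name_i, name_j, partial_correlation) for each non-zero off-diagonal entry.

    The partial correlation between variables i and j given all others is
    -theta[i][j] / sqrt(theta[i][i] * theta[j][j]).
    """
    edges = []
    p = len(theta)
    for i in range(p):
        for j in range(i + 1, p):
            if abs(theta[i][j]) > tol:
                pcor = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
                edges.append((names[i], names[j], pcor))
    return edges


# Hypothetical 3-variable precision matrix (positive definite) with one zero entry.
theta_toy = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -0.5],
    [0.0, -0.5, 1.0],
]
edges = edges_from_precision(theta_toy, ["geneA", "geneB", "geneC"])
```

Here `geneA` and `geneC` share no edge because their precision entry is zero, meaning they are conditionally independent given `geneB` under the Gaussian assumption.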
37. The Model Selection Methods for Sparse Biological Networks
- Author
-
Kaygusuz, Mehmet Ali, Purutçuoğlu, Vilda, Xhafa, Fatos, Series Editor, Hemanth, D. Jude, editor, and Kose, Utku, editor
- Published
- 2020
- Full Text
- View/download PDF
38. On the Precision Matrix in Semi-High-Dimensional Settings
- Author
-
Hayashi, Kentaro, Yuan, Ke-Hai, Jiang, Ge, Wiberg, Marie, editor, Molenaar, Dylan, editor, González, Jorge, editor, Böckenholt, Ulf, editor, and Kim, Jee-Seon, editor
- Published
- 2020
- Full Text
- View/download PDF
39. Multi-task Attributed Graphical Lasso
- Author
-
Zhang, Yao, Xiong, Yun, Kong, Xiangnan, Liu, Xinyue, Zhu, Yangyong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wang, Xin, editor, Zhang, Rui, editor, Lee, Young-Koo, editor, Sun, Le, editor, and Moon, Yang-Sae, editor
- Published
- 2020
- Full Text
- View/download PDF
40. PORTFOLIO OPTIMIZATION WITH GRAPHICAL LASSO AND AN APPLICATION IN BORSA ISTANBUL.
- Author
-
USTAOĞLU, Erhan
- Subjects
PORTFOLIO management (Investments) ,COVARIANCE matrices ,MATRIX effect ,FEATURE selection ,MACHINE learning - Abstract
Copyright of Journal of Marmara University Social Sciences Institute / Öneri is the property of Marmara University, Institute of Social Sciences and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
- Full Text
- View/download PDF
41. Multi-task attributed graphical lasso and its application in fund classification.
- Author
-
Zhang, Yao, Peng, Sijia, Xiong, Yun, Kong, Xiangnan, Liu, Xinyue, and Zhu, Yangyong
- Subjects
- *
CLASSIFICATION , *TASKS , *CONCRETE , *MINES & mineral resources - Abstract
Sparse inverse covariance estimation, i.e., Graphical Lasso, reveals the underlying structure of graph for a set of variables on the basis of their observations. The estimated graphs can then facilitate a series of downstream tasks with graph mining techniques. Multi-task Graphical Lasso is designed for collectively estimating graphs sharing an identical set of variables, but it fails to contend with the situation when the tasks include different variables. In order to address this limitation, we propose Multi-task Attributed Graphical Lasso (MAGL) to learn graphs with observations and attributes jointly. Specifically, we introduce two concrete implementations, i.e., MAGL-LogDet and MAGL-HSIC, where the LogDet divergence and the Hilbert-Schmidt independence criterion are utilized respectively to explore latent relations between attributes of the variables and linkage structures among the variables. Experimental results show the effectiveness of MAGL-LogDet and MAGL-HSIC. We then apply MAGL to fund data and estimate stock graphs for each fund. We classify funds by using graph neural networks on the estimated graphs, and prove that we can benefit from MAGL in downstream tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
42. An Augmented High-Dimensional Graphical Lasso Method to Incorporate Prior Biological Knowledge for Global Network Learning
- Author
-
Yonghua Zhuang, Fuyong Xing, Debashis Ghosh, Farnoush Banaei-Kashani, Russell P. Bowler, and Katerina Kechris
- Subjects
graphical Lasso ,Gaussian graphical model ,protein-protein interaction ,gene network ,systems biology ,Genetics ,QH426-470 - Abstract
Biological networks are often inferred through Gaussian graphical models (GGMs) using gene or protein expression data only. GGMs identify conditional dependence by estimating a precision matrix between genes or proteins. However, conventional GGM approaches often ignore prior knowledge about protein-protein interactions (PPI). Recently, several groups have extended GGM to weighted graphical Lasso (wGlasso) and network-based gene set analysis (Netgsa) and have demonstrated the advantages of incorporating PPI information. However, these methods are either computationally intractable for large-scale data, or disregard weights in the PPI networks. To address these shortcomings, we extended the Netgsa approach and developed an augmented high-dimensional graphical Lasso (AhGlasso) method to incorporate edge weights in known PPI with omics data for global network learning. This new method outperforms weighted graphical Lasso-based algorithms with respect to computational time in simulated large-scale data settings while achieving better or comparable prediction accuracy of node connections. The total runtime of AhGlasso is approximately five times faster than weighted Glasso methods when the graph size ranges from 1,000 to 3,000 with a fixed sample size (n = 300). The runtime difference between AhGlasso and weighted Glasso increases when the graph size increases. Using proteomic data from a study on chronic obstructive pulmonary disease, we demonstrate that AhGlasso improves protein network inference compared to the Netgsa approach by incorporating PPI information.
- Published
- 2022
- Full Text
- View/download PDF
43. Sparse multivariate regression with missing values and its application to the prediction of material properties.
- Author
-
Teramoto, Keisuke and Hirose, Kei
- Subjects
MECHANICAL properties of condensed matter ,MISSING data (Statistics) ,STATISTICAL models ,MATRIX inversion ,REGRESSION analysis ,COVARIANCE matrices - Abstract
In the field of materials science and engineering, statistical analysis and machine learning techniques have recently been used to predict multiple material properties from an experimental design. These material properties correspond to response variables in the multivariate regression model. In this study, we conduct a penalized maximum likelihood procedure to estimate model parameters, including the regression coefficients and covariance matrix of response variables. In particular, we employ l1‐regularization to achieve a sparse estimation of the regression coefficients and inverse covariance matrix of response variables. In some cases, there may be a relatively large number of missing values in the response variables, owing to the difficulty of collecting data on material properties. We therefore propose a method that incorporates a correlation structure among the response variables into a statistical model to improve the prediction accuracy under the situation with missing values. The expectation maximization algorithm is also constructed, which enables application to a dataset with missing values in the responses. We apply our proposed procedure to real data consisting of 22 material properties. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. An Augmented High-Dimensional Graphical Lasso Method to Incorporate Prior Biological Knowledge for Global Network Learning.
- Author
-
Zhuang, Yonghua, Xing, Fuyong, Ghosh, Debashis, Banaei-Kashani, Farnoush, Bowler, Russell P., and Kechris, Katerina
- Subjects
BIOLOGICAL networks ,GLOBAL method of teaching ,PRIOR learning ,CHRONIC obstructive pulmonary disease ,PROTEIN-protein interactions ,GENE expression - Abstract
Biological networks are often inferred through Gaussian graphical models (GGMs) using gene or protein expression data only. GGMs identify conditional dependence by estimating a precision matrix between genes or proteins. However, conventional GGM approaches often ignore prior knowledge about protein-protein interactions (PPI). Recently, several groups have extended GGM to weighted graphical Lasso (wGlasso) and network-based gene set analysis (Netgsa) and have demonstrated the advantages of incorporating PPI information. However, these methods are either computationally intractable for large-scale data, or disregard weights in the PPI networks. To address these shortcomings, we extended the Netgsa approach and developed an augmented high-dimensional graphical Lasso (AhGlasso) method to incorporate edge weights in known PPI with omics data for global network learning. This new method outperforms weighted graphical Lasso-based algorithms with respect to computational time in simulated large-scale data settings while achieving better or comparable prediction accuracy of node connections. The total runtime of AhGlasso is approximately five times faster than weighted Glasso methods when the graph size ranges from 1,000 to 3,000 with a fixed sample size (n = 300). The runtime difference between AhGlasso and weighted Glasso increases when the graph size increases. Using proteomic data from a study on chronic obstructive pulmonary disease, we demonstrate that AhGlasso improves protein network inference compared to the Netgsa approach by incorporating PPI information. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. A random covariance model for bi‐level graphical modeling with application to resting‐state fMRI data.
- Author
-
Zhang, Lin, DiLernia, Andrew, Quevedo, Karina, Camchong, Jazmin, Lim, Kelvin, and Pan, Wei
- Subjects
- *
FUNCTIONAL magnetic resonance imaging , *KALMAN filtering , *RANDOM effects model , *GRAPHICAL modeling (Statistics) , *FUNCTIONAL connectivity - Abstract
We consider a novel problem, bi‐level graphical modeling, in which multiple individual graphical models can be considered as variants of a common group‐level graphical model and inference of both the group‐ and individual‐level graphical models is of interest. Such a problem arises from many applications, including multi‐subject neuro‐imaging and genomics data analysis. We propose a novel and efficient statistical method, the random covariance model, to learn the group‐ and individual‐level graphical models simultaneously. The proposed method can be nicely interpreted as a random covariance model that mimics the random effects model for mean structures in linear regression. It accounts for similarity between individual graphical models, identifies group‐level connections that are shared by individuals, and simultaneously infers multiple individual‐level networks. Compared to existing multiple graphical modeling methods that only focus on individual‐level graphical modeling, our model learns the group‐level structure underlying the multiple individual graphical models and enjoys computational efficiency that is particularly attractive for practical use. We further define a measure of degrees‐of‐freedom for the complexity of the model useful for model selection. We demonstrate the asymptotic properties of our method and show its finite‐sample performance through simulation studies. Finally, we apply the method to our motivating clinical data, a multi‐subject resting‐state functional magnetic resonance imaging dataset collected from participants diagnosed with schizophrenia, identifying both individual‐ and group‐level graphical models of functional connectivity. [ABSTRACT FROM AUTHOR]
- Published
- 2021
46. Estimating the AUC with a graphical lasso method for high-dimensional biomarkers with LOD.
- Author
-
Wang, Jirui, Zhao, Yunpeng, and Tang, Liansheng Larry
- Subjects
- *
RECEIVER operating characteristic curves , *ALGORITHMS , *NUMERICAL analysis - Abstract
This manuscript estimates the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of the limit of detection. A new version of the expectation-maximization algorithm is then proposed for the penalized likelihood, with the use of numerical integration and the graphical lasso method. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms the existing methods in numerical studies. We apply the proposed method to a data set from a brain tumor study. The results show a higher accuracy in the estimation of AUC compared with the existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
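Note on entry 46: the abstract says the estimated precision matrix is "applied to the inference of AUCs" but does not give the formula. The sketch below shows one standard way this can work, assuming both groups are multivariate normal with a common covariance matrix, in which case the best linear combination of biomarkers has AUC = Φ(sqrt(δ'Θδ / 2)), where δ is the mean difference and Θ the precision matrix. The function names and the example values are hypothetical and are not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def combined_auc(mean_diff, precision):
    """AUC of the best linear biomarker combination.

    Assumes (hypothetically) two multivariate normal groups sharing one
    covariance matrix whose inverse is `precision` (a list of lists).
    """
    p = len(mean_diff)
    # Quadratic form delta' * Theta * delta
    quad = sum(
        mean_diff[i] * precision[i][j] * mean_diff[j]
        for i in range(p)
        for j in range(p)
    )
    return normal_cdf(math.sqrt(quad / 2.0))


# Illustrative values only: two uncorrelated biomarkers with unit variance
identity = [[1.0, 0.0], [0.0, 1.0]]
auc_separated = combined_auc([1.0, 1.0], identity)
auc_no_signal = combined_auc([0.0, 0.0], identity)
```

With no mean difference the AUC is 0.5, as expected for uninformative biomarkers. In practice Θ would be replaced by a sparse estimate such as the one a graphical lasso step produces.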
47. Strong links between plant traits and microbial activities but different abiotic drivers in mountain grasslands.
- Author
-
Weil, Sarah‐Sophie, Martinez‐Almoyna, Camille, Piton, Gabin, Renaud, Julien, Boulangeat, Louise, Foulquier, Arnaud, Saillard, Amélie, Choler, Philippe, Poulenard, Jérôme, Münkemüller, Tamara, and Thuiller, Wilfried
- Subjects
- *
GRASSLAND soils , *GRASSLANDS , *SOIL microbiology , *PLANT communities , *CONDITIONED response , *PLANT-soil relationships , *ECOSYSTEMS , *SOIL microbial ecology - Abstract
Aim: Plant–soil interactions can be major driving forces of community responses to environmental changes in terrestrial ecosystems. These interactions can leave signals in aboveground plant functional traits and belowground microbial activities and these signals can manifest in observed covariations. However, we know little about how these plant–soil linkages vary in response to environmental conditions at biogeographic scales for which experiments are impossible. Here, we investigate patterns of direct and indirect linkages between plant functional traits, soil microbial activities and environmental conditions in mountain grasslands along elevational gradients. Location: The French Alps. Taxon: Vascular plants and soil microbiota. Methods: We analysed observational grassland data sampled along 14 elevational gradients across the entire French Alps (between 1500 and 2800 m of elevation). Using Graphical Lasso, we inferred a partial correlation network to tease apart direct and indirect plant–soil linkages without defining the direction of interactions a priori. Results: We found tight spatial associations of plant traits with microbial activities, climate driving the former and soil properties the latter. In these plant–soil linkages, the dominance of specific plant traits was more important than their diversity. We then showed that in sites with conservative plant traits and reduced organic matter quality, soil microbes invested strongly in nutrient acquisition. Main conclusions: By investigating plant–soil linkages along elevational gradients in the French Alps, we showed that plant functional traits and belowground microbial activity are tightly linked and how they depend on environmental conditions. Overall, we demonstrated how soil functioning can be integrated in studies of ecosystem shifts under environmental change at large spatial scales. [ABSTRACT FROM AUTHOR]
- Published
- 2021
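Note on entry 47: a partial correlation network of the kind described in the abstract can be read off an estimated precision matrix Θ, since the partial correlation between variables i and j is -θ_ij / sqrt(θ_ii θ_jj) and an edge exists wherever θ_ij is non-zero. The sketch below illustrates that conversion only. The matrix is a made-up example, not output from the study, and the helper names are hypothetical.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) to partial correlations."""
    p = len(precision)
    result = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                result[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


def edges(precision, tol=1e-8):
    """List undirected edges (i, j), i < j, with a non-zero off-diagonal entry."""
    p = len(precision)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(precision[i][j]) > tol
    ]


# Hypothetical sparse precision matrix for three variables (a chain 0 - 1 - 2)
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
pcor = partial_correlations(theta)
network = edges(theta)
```

Variables 0 and 2 have no edge here: they are linked only indirectly through variable 1, which is the distinction between direct and indirect linkages that the abstract draws.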
48. Convolutional Neural Network With Graphical Lasso to Extract Sparse Topological Features for Brain Disease Classification.
- Author
-
Ji, Junzhong and Yao, Yao
- Abstract
Functional connectivity provides new insights into the mechanisms of the human brain at the network level and has proved to be an effective biomarker for brain disease classification. Recently, machine learning methods have played an important role in functional connectivity classification, among which convolutional neural network (CNN) based methods have become a hot topic since they can extract topological features in the brain network. However, conventional CNN-based methods have not taken sparse connectivity patterns (SCPs) of the human brain into consideration, which may lead to redundancy of the topological features and limit their performance and generalization. To address this, we propose a novel CNN-based model with graphical Lasso (CNNGLasso) to extract sparse topological features for brain disease classification. First, we develop a novel graphical Lasso model for revealing the SCPs at group-level. Then, the SCPs are used to guide the topological feature extraction. Finally, the obtained sparse topological features are used to classify the patients from normal controls. The experimental results on the ABIDE dataset demonstrate that the CNNGLasso outperforms the others on various performance measures. Besides, the abnormal brain regions derived from the trained model are consistent with previous investigations, which further supports the application prospects of the CNNGLasso. [ABSTRACT FROM AUTHOR]
- Published
- 2021
49. Tailored graphical lasso for data integration in gene network reconstruction.
- Author
-
Lingjærde, Camilla, Lien, Tonje G., Borgan, Ørnulf, Bergholtz, Helga, and Glad, Ingrid K.
- Subjects
DATA integration ,MATRIX inversion ,COVARIANCE matrices ,GENE regulatory networks ,PROTEIN-protein interactions ,ACCURACY of information ,BIOLOGICAL networks - Abstract
Background: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using L1-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. Results: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein–protein interaction networks. Conclusions: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading. [ABSTRACT FROM AUTHOR]
- Published
- 2021
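Note on entry 49: for orientation, the weighted graphical lasso that the abstract refers to is commonly written as the penalized likelihood problem below. The notation is an assumption made for this note and is not taken from the paper: S is the sample covariance matrix, Θ the precision matrix, λ the overall penalty level, and p_ij a penalty weight derived from prior information. Conventions differ on whether diagonal entries are penalized. The tailored variant's specific transformation of the prior weights is not stated in the abstract and is therefore not shown.

```latex
\hat{\Theta}
  = \arg\max_{\Theta \succ 0}
    \Big\{ \log\det\Theta
           - \operatorname{tr}(S\Theta)
           - \lambda \sum_{i \neq j} p_{ij}\,\lvert\theta_{ij}\rvert \Big\}
```

Setting every p_ij = 1 recovers the ordinary (unweighted) graphical lasso. A smaller p_ij penalizes the corresponding edge less, which is how prior evidence for an interaction enters the estimate.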
50. Estimation of high-dimensional seemingly unrelated regression models.
- Author
-
Tan, Lidan, Chiong, Khai Xiang, and Moon, Hyungsik Roger
- Subjects
- *
SPARSE matrices , *REGRESSION analysis , *MATRIX inversion , *COVARIANCE matrices , *ASYMPTOTIC distribution - Abstract
In this article, we investigate seemingly unrelated regression (SUR) models that allow the number of equations (N) to be large and comparable to the number of the observations in each equation (T). It is well known that conventional SUR estimators, for example, the feasible generalized least squares estimator from Zellner (1962), do not perform well in a high-dimensional setting. We propose a new feasible GLS estimator called the feasible graphical lasso (FGLasso) estimator. For a feasible implementation of the GLS estimator, we use the graphical lasso estimation of the precision matrix (the inverse of the covariance matrix of the equation system errors) assuming that the underlying unknown precision matrix is sparse. We show that under certain conditions, FGLasso converges uniformly to GLS even when T < N, and it shares the same asymptotic distribution with the efficient GLS estimator when T > N log N. We confirm these results through finite sample Monte-Carlo simulations. [ABSTRACT FROM AUTHOR]
- Published
- 2021