2,510 results on '"Adjacency matrix"'
Search Results
2. CRPGCN: predicting circRNA-disease associations using graph convolutional network based on heterogeneous network
- Author
-
Zhufang Kuang, Lei Deng, and Zhihao Ma
- Subjects
Computer science ,QH301-705.5 ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Principal component analysis ,Graph convolutional network ,computer.software_genre ,Biochemistry ,Heterogenous network ,Similarity (network science) ,Structural Biology ,Adjacency matrix ,Biology (General) ,Molecular Biology ,CircRNA-disease ,business.industry ,Mechanism (biology) ,Applied Mathematics ,Dimensionality reduction ,Deep learning ,Research ,RNA, Circular ,Computer Science Applications ,Graph (abstract data type) ,Artificial intelligence ,Data mining ,business ,computer ,Heterogeneous network ,Algorithms - Abstract
Background The existing studies show that circRNAs can be used as biomarkers of diseases and play a prominent role in the treatment and diagnosis of diseases. However, the relationships between the vast majority of circRNAs and diseases are still unclear, and more experiments are needed to study the mechanism of circRNAs. Nowadays, some scholars use the attributes between circRNAs and diseases to study and predict their associations. Nonetheless, most of the existing experimental methods use less information about the attributes of circRNAs, which has a certain impact on the accuracy of the final prediction results. On the other hand, some scholars also apply experimental methods to predict the associations between circRNAs and diseases. But such methods are usually expensive and time-consuming. Based on the above shortcomings, follow-up research is needed to propose a more efficient calculation-based method to predict the associations between circRNAs and diseases. Results In this study, a novel algorithm (method) is proposed, which is based on the Graph Convolutional Network (GCN) constructed with Random Walk with Restart (RWR) and Principal Component Analysis (PCA) to predict the associations between circRNAs and diseases (CRPGCN). In the construction of CRPGCN, the RWR algorithm is used to improve the similarity associations of the computed nodes with their neighbours. After that, the PCA method is used to reduce dimensionality and extract features, which makes the connection between circRNAs with higher similarity and diseases closer. Finally, the GCN algorithm is used to learn the features between circRNAs and diseases and calculate the final similarity scores; the learning data are constructed from the adjacency matrix, similarity matrix and feature matrix as a heterogeneous adjacency matrix and a heterogeneous feature matrix.
Conclusions After 2-fold cross-validation, 5-fold cross-validation and 10-fold cross-validation, the area under the ROC curve of the CRPGCN is 0.9490, 0.9720 and 0.9722, respectively. The CRPGCN method has a valuable effect in predicting the associations between circRNAs and diseases.
- Published
- 2021
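The abstract above names Random Walk with Restart (RWR) as the step that refines node similarities. The following is a minimal NumPy sketch of generic RWR on an adjacency matrix, not the CRPGCN implementation; the function name, the restart probability of 0.3 and the convergence tolerance are illustrative assumptions.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that jumps back
    to `seed` with probability `restart` at every step.
    (Hypothetical helper; parameter values are assumptions.)"""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # isolated nodes keep an all-zero column
    W = adj / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0                       # the walk starts at the seed node
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Path graph 0 - 1 - 2, walker seeded at node 0.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
scores = random_walk_with_restart(path, seed=0)
```

Nodes closer to the seed receive higher scores, which is why RWR scores are often used as a smoothed similarity between a seed node and the rest of the network.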
3. Protein complexes identification based on go attributed network embedding
- Author
-
Zengyou He, Xiaoxia Liu, Kun Li, Wei Zheng, Bo Xu, Yijia Zhang, and Zhehuan Zhao
- Subjects
0301 basic medicine ,Proteomics ,Computer science ,02 engineering and technology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Ranking (information retrieval) ,03 medical and health sciences ,Similarity (network science) ,Structural Biology ,020204 information systems ,Protein Interaction Mapping ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,Adjacency matrix ,Representation (mathematics) ,Molecular Biology ,lcsh:QH301-705.5 ,Clique ,Network embedding ,business.industry ,Protein-protein interaction network ,Applied Mathematics ,Node (networking) ,Proteins ,Pattern recognition ,Protein complexes identification ,Computer Science Applications ,Identification (information) ,030104 developmental biology ,ComputingMethodologies_PATTERNRECOGNITION ,lcsh:Biology (General) ,lcsh:R858-859.7 ,Artificial intelligence ,business ,Research Article - Abstract
Background Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complexes discovery process under a unified framework. Recently, attributed network embedding methods have been proved to be remarkably effective in generating vector representations for nodes in the network. In the transformed vector space, both the topological proximity and node attributed affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide us a unified framework to integrate various biological evidences into the protein complexes identification process. Results In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. Firstly, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pair-wise vector representation similarity, a weighted adjacency matrix is constructed. Secondly, it uses the clique mining method to generate candidate cores. Consequently, seed cores are obtained by ranking candidate cores based on their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins with correlation score that is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.
Conclusions GANE provides a framework that integrates many valuable and different kinds of biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction. Electronic supplementary material The online version of this article (10.1186/s12859-018-2555-x) contains supplementary material, which is available to authorized users.
- Published
- 2018
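The abstract above describes building a weighted adjacency matrix from pair-wise similarity of node vectors and ranking candidate cores by density. The sketch below illustrates that idea under stated assumptions: cosine similarity as the similarity measure and mean internal edge weight as the density are choices made here for illustration, and the function names are hypothetical; the abstract does not specify either.

```python
import numpy as np

def cosine_adjacency(embeddings, edges):
    """Weight each existing edge by the cosine similarity of its endpoints.
    (Cosine similarity is an assumption; the abstract only says 'similarity'.)"""
    X = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # avoid dividing by zero for null vectors
    U = X / norms
    n = X.shape[0]
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = float(U[i] @ U[j])   # undirected, so symmetric
    return A

def weighted_density(A, nodes):
    """Mean edge weight over all node pairs in `nodes` (assumed definition)."""
    nodes = list(nodes)
    k = len(nodes)
    if k < 2:
        return 0.0
    sub = A[np.ix_(nodes, nodes)]
    # sub.sum() counts every undirected edge twice; k*(k-1) counts ordered pairs.
    return sub.sum() / (k * (k - 1))

vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A = cosine_adjacency(vectors, edges=[(0, 1), (1, 2)])
core_density = weighted_density(A, [0, 1])
```

Candidate cores could then be sorted by `weighted_density` in descending order, which mirrors the ranking step the abstract mentions.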
4. GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations.
- Author
-
Yao, Dengju, Li, Bailin, Zhan, Xiaojuan, Zhan, Xiaorong, and Yu, Liyang
- Subjects
TRANSFORMER models ,LINCRNA ,CANCER case studies ,GENE expression ,FEATURE extraction - Abstract
Background: A growing body of research indicates that the disrupted expression of long non-coding RNA (lncRNA) is linked to a range of human disorders. Therefore, the effective prediction of lncRNA-disease association (LDA) can not only suggest solutions to diagnose a condition but also save significant time and labor costs. Method: In this work, we proposed a novel LDA predicting algorithm based on graph convolutional network and transformer, named GCNFORMER. Firstly, we integrated the intraclass similarity and interclass connections between miRNAs, lncRNAs and diseases, and built a graph adjacency matrix. Secondly, to completely obtain the features between various nodes, we employed a graph convolutional network for feature extraction. Finally, to obtain the global dependencies between inputs and outputs, we used a transformer encoder with a multiheaded attention mechanism to forecast lncRNA-disease associations. Results: The results of fivefold cross-validation experiment on the public dataset revealed that the AUC and AUPR of GCNFORMER achieved 0.9739 and 0.9812, respectively. We compared GCNFORMER with six advanced LDA prediction models, and the results indicated its superiority over the other six models. Furthermore, GCNFORMER's effectiveness in predicting potential LDAs is underscored by case studies on breast cancer, colon cancer and lung cancer. Conclusions: The combination of graph convolutional network and transformer can effectively improve the performance of LDA prediction models and promote the in-depth development of this research field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
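The abstract above combines intraclass similarities and interclass associations into one graph adjacency matrix and then applies a graph convolutional network. A minimal sketch of both steps follows. It uses only two node types (lncRNAs and diseases) instead of the three in the abstract, and the layer is the widely used symmetric-normalisation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); whether GCNFORMER uses exactly this rule is an assumption, and all names are hypothetical.

```python
import numpy as np

def heterogeneous_adjacency(sim_a, sim_b, assoc):
    """Similarity blocks on the diagonal, associations off the diagonal."""
    return np.block([[sim_a, assoc],
                     [assoc.T, sim_b]])

def gcn_layer(A, X, W):
    """One graph-convolution layer with symmetric normalisation.
    Assumes non-negative edge weights so that all degrees are positive."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)          # ReLU activation

# Toy data: 2 lncRNAs, 3 diseases (values are made up for illustration).
sim_lnc = np.array([[1.0, 0.4],
                    [0.4, 1.0]])
sim_dis = np.array([[1.0, 0.2, 0.0],
                    [0.2, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
assoc = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])

A = heterogeneous_adjacency(sim_lnc, sim_dis, assoc)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))      # node features, one row per node
W = rng.normal(size=(4, 2))      # layer weights (random here, learned in practice)
H = gcn_layer(A, X, W)
```

Each row of `H` is a new node representation that mixes a node's own features with those of its neighbours in the heterogeneous graph.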
5. CRPGCN: predicting circRNA-disease associations using graph convolutional network based on heterogeneous network.
- Author
-
Ma, Zhihao, Kuang, Zhufang, and Deng, Lei
- Subjects
PRINCIPAL components analysis ,ALGORITHMS ,RANDOM walks ,RECEIVER operating characteristic curves ,FORECASTING - Abstract
Background: The existing studies show that circRNAs can be used as biomarkers of diseases and play a prominent role in the treatment and diagnosis of diseases. However, the relationships between the vast majority of circRNAs and diseases are still unclear, and more experiments are needed to study the mechanism of circRNAs. Nowadays, some scholars use the attributes between circRNAs and diseases to study and predict their associations. Nonetheless, most of the existing experimental methods use less information about the attributes of circRNAs, which has a certain impact on the accuracy of the final prediction results. On the other hand, some scholars also apply experimental methods to predict the associations between circRNAs and diseases. But such methods are usually expensive and time-consuming. Based on the above shortcomings, follow-up research is needed to propose a more efficient calculation-based method to predict the associations between circRNAs and diseases. Results: In this study, a novel algorithm (method) is proposed, which is based on the Graph Convolutional Network (GCN) constructed with Random Walk with Restart (RWR) and Principal Component Analysis (PCA) to predict the associations between circRNAs and diseases (CRPGCN). In the construction of CRPGCN, the RWR algorithm is used to improve the similarity associations of the computed nodes with their neighbours. After that, the PCA method is used to reduce dimensionality and extract features, which makes the connection between circRNAs with higher similarity and diseases closer. Finally, the GCN algorithm is used to learn the features between circRNAs and diseases and calculate the final similarity scores; the learning data are constructed from the adjacency matrix, similarity matrix and feature matrix as a heterogeneous adjacency matrix and a heterogeneous feature matrix.
Conclusions: After 2-fold cross-validation, 5-fold cross-validation and 10-fold cross-validation, the area under the ROC curve of the CRPGCN is 0.9490, 0.9720 and 0.9722, respectively. The CRPGCN method has a valuable effect in predicting the associations between circRNAs and diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. Biomarker detection using corrected degree of domesticity in hybrid social network feature selection for improving classifier performance.
- Author
-
Zengin, Hatice Yağmur and Karabulut, Erdem
- Subjects
SOCIAL networks ,BIOMARKERS ,MACHINE learning ,FEATURE selection ,SOCIAL network analysis ,RECEIVER operating characteristic curves ,SUPPORT vector machines - Abstract
Background: Dimension reduction, especially feature selection, is an important step in improving classification performance for high-dimensional data. Particularly in cancer research, when reducing the number of features, i.e., genes, it is important to select the most informative features/potential biomarkers that could affect the diagnostic accuracy. Therefore, researchers continuously try to explore more efficient ways to reduce the large number of features/genes to a small but informative subset before the classification task. Hybrid methods have been extensively investigated for this purpose, and research to find the optimal approach is ongoing. Social network analysis is used as a part of a hybrid method, although there are several issues that have arisen when using social network tools, such as using a single environment for computing, constructing an adjacency matrix or computing network measures. Therefore, in our study, we apply a hybrid feature selection method consisting of several machine learning algorithms in addition to social network analysis with our proposed network metric, called the corrected degree of domesticity, in a single environment, R, to improve the support vector machine classifier's performance. In addition, we evaluate and compare the performances of several combinations used in the different steps of the method with a simulation experiment. Results: The proposed method improves the classifier's performance compared to using the whole feature set in all the cases we investigate. Additionally, in terms of the area under the receiver operating characteristic (ROC) curve, our approach improves classification performance compared to several approaches in the literature. Conclusion: When using the corrected degree of domesticity as a network degree centrality measure, it is important to use our correction to compare nodes/features with no connection outside of their community since it provides a more accurate ranking among the features. 
Due to the nature of the hybrid method, which includes social network analysis, it is necessary to investigate possible combinations to provide an optimal solution for the microarray data used in the research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. Solving the puzzle of quality of life in cancer: integrating causal inference and machine learning for data-driven insights.
- Author
-
Bozcuk, Hakan Şat and Alemdar, Mustafa Serkan
- Subjects
MACHINE learning ,ACYCLIC model ,PYTHON programming language ,CAUSAL inference ,SOCIAL skills - Abstract
Background: Understanding the determinants of global quality of life in cancer patients is crucial for improving their overall well-being. While correlations between various factors and quality of life have been established, the causal relationships remain largely unexplored. This study aimed to identify the causal factors influencing global quality of life in cancer patients and compare them with known correlative factors. Methods: We conducted a retrospective analysis of European Organization for Research and Treatment of Cancer Quality of Life Questionnaire data, alongside demographic and disease-related features, collected from new cancer patients during their initial visit to an oncology outpatient clinic. Correlations with global quality of life were identified using univariate and multivariate regression analyses. Causal inference analysis was performed using two approaches. First, we employed the DoWhy Python library for causal analysis, incorporating prior information and manual characterization of an acyclic graph. Second, we utilized the Linear Non-Gaussian Acyclic Model (LiNGAM) machine learning algorithm from the Lingam Python library, which automatically generated an acyclic graph without prior information. The significance level was set at p < 0.05. Results: Multivariate analysis of 469 new admissions revealed that disease stage, role functioning, emotional functioning, social functioning, fatigue, pain and diarrhea were linked with global quality of life. The most influential direct causal factors were emotional functioning, social functioning, and physical functioning, while the most influential indirect factors were physical functioning, emotional functioning, and fatigue. Additionally, the most prominent total causal factors were identified as type of cancer (diagnosis), cancer stage, and sex, with total causal effect ratios of -9.47, -4.67, and -1.48, respectively.
The LiNGAM algorithm identified type of cancer (diagnosis), nausea and vomiting and social functioning as significant, with total causal effect ratios of -9.47, -0.42, and 0.42, respectively. Conclusions: This study identified that causal factors for global quality of life in new cancer patients are distinct from correlative factors. Understanding these causal relationships could provide valuable insights into the complex dynamics of quality of life in cancer patients and guide targeted interventions to improve their well-being. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Network Evolution Model-based prediction of tumor mutation burden from radiomic-clinical features in endometrial cancers.
- Author
-
Tan, Qing, Wang, Qian, Jin, Suoqin, Zhou, Fuling, and Zou, Xiufen
- Subjects
ENDOMETRIAL cancer ,RECEIVER operating characteristic curves ,ENTROPY (Information theory) ,SUPPORT vector machines ,DNA polymerases - Abstract
Background: Endometrial Cancer (EC) is one of the most prevalent malignancies that affect the female population globally. In the context of immunotherapy, Tumor Mutation Burden (TMB) in the DNA polymerase epsilon (POLE) subtype of this cancer holds promise as a viable therapeutic target. Methods: We devised a method known as NEM-TIE to forecast the TMB status of patients with endometrial cancer. This approach utilized a combination of the Network Evolution Model, Transfer Information Entropy, Clique Percolation (CP) methodology, and Support Vector Machine (SVM) classification. To construct the Network Evolution Model, we employed an adjacency matrix that utilized transfer information entropy to assess the information gain between nodes of radiomic-clinical features. Subsequently, using the CP algorithm, we unearthed potentially pivotal modules in the Network Evolution Model. Finally, the SVM classifier extracted essential features from the module set. Results: Upon analyzing the importance of modules, we discovered that the dependence count energy in tumor volumes-of-interest holds immense significance in distinguishing TMB statuses among patients with endometrial cancer. Using the 13 radiomic-clinical features extracted via NEM-TIE, we demonstrated that the area under the receiver operating characteristic curve (AUROC) in the test set is 0.98 (95% confidence interval: 0.95–1.00), surpassing the performance of existing techniques such as the mRMR and Laplacian methods. Conclusions: Our study proposed the NEM-TIE method as a means to identify the TMB status of patients with endometrial cancer. The integration of radiomic-clinical data utilizing the NEM-TIE method may offer a novel technology for supplementary diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. Protein-protein interaction prediction based on multiple kernels and partial network with linear programming
- Author
-
Cathy H. Wu, Li Liao, and Lei Huang
- Subjects
0301 basic medicine ,Theoretical computer science ,Linear programming ,Computer science ,0206 medical engineering ,02 engineering and technology ,Random walk ,Synthetic data ,Network inference ,03 medical and health sciences ,Robustness (computer science) ,Structural Biology ,Modelling and Simulation ,Feature (machine learning) ,Interaction prediction ,Adjacency matrix ,Subnetwork ,Molecular Biology ,Applied Mathematics ,Research ,Computer Science Applications ,Protein interaction network ,030104 developmental biology ,Modeling and Simulation ,Kernel (statistics) ,Protein–protein interaction prediction ,Algorithm ,020602 bioinformatics - Abstract
Background Prediction of de novo protein-protein interaction is a critical step toward reconstructing PPI networks, which is a central task in systems biology. Recent computational approaches have shifted from making PPI prediction based on individual pairs and single data source to leveraging complementary information from multiple heterogeneous data sources and partial network structure. However, how to quickly learn weights for heterogeneous data sources remains a challenge. In this work, we developed a method to infer de novo PPIs by combining multiple data sources represented in kernel format and obtaining optimal weights based on random walk over the existing partial networks. Results Our proposed method utilizes the Barker algorithm and the training data to construct a transition matrix which constrains how a random walk would traverse the partial network. Multiple heterogeneous features for the proteins in the network are then combined into the form of weighted kernel fusion, which provides a new "adjacency matrix" for the whole network that may consist of disconnected components but is required to comply with the transition matrix on the training subnetwork. This requirement is met by adjusting the weights to minimize the element-wise difference between the transition matrix and the weighted kernels. The minimization problem is solved by linear programming. The weighted kernel fusion is then transformed to regularized Laplacian (RL) kernel to infer missing or new edges in the PPI network, which can potentially connect the previously disconnected components. Conclusions The results on synthetic data demonstrated the soundness and robustness of the proposed algorithms under various conditions. And the results on real data show that the accuracies of PPI prediction for yeast data and human data measured as AUC are increased by up to 19% and 11% respectively, as compared to a control method without using optimal weights.
Moreover, the weights learned by our method Weight Optimization by Linear Programming (WOLP) are very consistent with those learned by sampling, and can provide insights into the relations between PPIs and various feature kernels, thereby improving PPI prediction even for disconnected PPI networks.
- Published
- 2016
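The abstract above transforms a (kernel-fused) adjacency matrix into a regularized Laplacian (RL) kernel to score missing edges. A minimal sketch of the standard RL kernel, K = (I + alpha * L)^-1 with L = D - A, is shown below; the value of `alpha` and the function name are assumptions, and the kernel-fusion and linear-programming steps of the paper are not reproduced.

```python
import numpy as np

def regularized_laplacian_kernel(A, alpha=0.5):
    """RL kernel K = (I + alpha * L)^-1, where L = D - A is the graph Laplacian.
    (alpha = 0.5 is an illustrative choice, not a value from the paper.)"""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + alpha * L)

# Path graph 0 - 1 - 2: nodes 0 and 2 share no edge.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
K = regularized_laplacian_kernel(path)
```

Although nodes 0 and 2 are not adjacent, `K[0, 2]` is positive because they are linked through node 1, so kernel entries can serve as scores for candidate edges.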
10. Graph theoretical comparison of functional connectivity between cLTP treated and untreated microelectrode arrays
- Author
-
Myles Akin, Rhonda Dzakpasu, and Yixin Guo
- Subjects
Theoretical computer science ,Artificial neural network ,business.industry ,Computer science ,General Neuroscience ,Spike train ,Scale-free network ,Pattern recognition ,Directed graph ,Degree distribution ,Cellular and Molecular Neuroscience ,Poster Presentation ,Adjacency matrix ,Artificial intelligence ,Graph property ,business ,Clustering coefficient - Abstract
Analyzing graph properties of neural networks has recently gained much attention in attempts to understand how information is processed in the brain. Using in-vitro techniques to form neural networks has increased in popularity as it allows one to develop small, easy to record networks that maintain many of the graph properties of larger brain networks [1]. One widely recognized tool for studying in vitro networks is the Microelectrode Array (MEA), on which neurons can be cultured and recorded simultaneously. MEAs can be used to grow neural networks from dissociated cells to understand how neurons spontaneously connect to create networks and how these networks then evolve over time. In addition, these cultures can be treated with pharmacological agents to study how these agents affect the networks as a whole [2,3]. To understand the network formation of MEA cultured neurons, we study the graph theoretical properties of two MEA networks, the control MEA network and the MEA network treated with chemical Long Term Potentiation (cLTP). The data set for each MEA network consists of recordings from three days: baseline, 2 days past baseline and 5 days past baseline. Based on these data sets and the assumption that each electrode on the MEA records one neuron, we construct functional connectivity graphs of MEA networks for different days. Nodes in such a connectivity graph represent the electrodes (also neurons). To determine whether there is a connection (an edge on the graph) between two nodes, we carry out several steps of computations. We first filter the recorded spike trains with a Gaussian kernel, and perform cross-correlation analysis using the Pearson product moment correlation coefficient [4]. We set a correlation threshold by applying a shuffling method to the inter-spike intervals of a spike train.
Using the thresholded correlations, we build unweighted, undirected adjacency matrices and create the corresponding graphs for untreated (not shown) and treated MEA networks (shown baseline and 5 days past baseline in Figure 1). We find that the synchronization and average node degree increase dramatically for the cLTP treated networks while the untreated network shows no obvious change. Figure 1 Graph models. (a) cLTP treated MEA network at baseline; (b) cLTP treated MEA network at 5 days past baseline. To better understand the treated and untreated MEA networks, we will evaluate the graph theoretic properties, such as degree distribution and clustering coefficient. We will determine how cLTP affects these properties. The graphical analysis will enable us to identify what type of network each is (such as a small-world or a scale free network) and determine whether cLTP has an effect on the network development or merely on the strength of connectivity. We conjecture that cLTP treated networks have more efficient and quicker communication between nodes. Therefore, the cLTP treated networks show greater clustering as well as shorter path length than the untreated networks. Information flow is another important aspect of such graph models. We intend to develop directed graphs using transfer entropy to study how information flow of the network may change during its development.
- Published
- 2015
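The abstract above thresholds pairwise Pearson correlations to obtain an unweighted, undirected adjacency matrix and then examines node degree and clustering coefficient. The sketch below shows those three computations in NumPy. It takes the threshold as a given number (the shuffling procedure that sets it is not reproduced), skips the Gaussian filtering step, and uses hypothetical function names.

```python
import numpy as np

def threshold_adjacency(signals, threshold):
    """Binary, symmetric adjacency from Pearson correlations.
    `signals` has one row per node (electrode) and one column per sample."""
    corr = np.corrcoef(signals)
    A = (corr > threshold).astype(int)
    np.fill_diagonal(A, 0)               # no self-connections
    return A

def average_degree(A):
    return A.sum(axis=1).mean()

def clustering_coefficients(A):
    """Local clustering coefficient: triangles through a node divided by
    the number of neighbour pairs; defined as 0 for degree < 2."""
    deg = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0
    return np.divide(triangles, possible,
                     out=np.zeros(len(deg)), where=possible > 0)

# Toy signals: three perfectly correlated traces and one alternating trace.
t = np.arange(10.0)
signals = np.vstack([t, 2 * t + 1, 3 * t, np.where(t % 2 == 0, 1.0, -1.0)])
A = threshold_adjacency(signals, threshold=0.5)
mean_deg = average_degree(A)
cc = clustering_coefficients(A)
```

Comparing `mean_deg` and `cc` between recording days is one simple way to quantify the kind of change in connectivity that the abstract reports.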
11. Graph Neural Network for representation learning of lung cancer.
- Author
-
Aftab, Rukhma, Qiang, Yan, Zhao, Juanjuan, Urrehman, Zia, and Zhao, Zijuan
- Subjects
LUNG cancer ,GAUSSIAN mixture models ,SQUAMOUS cell carcinoma ,SIGNAL convolution ,TUMOR classification - Abstract
The emergence of image-based systems to improve diagnostic pathology precision, involving the intent to label sets or bags of instances, greatly hinges on Multiple Instance Learning (MIL) for Whole Slide Images (WSIs). Contemporary works have shown excellent performance for a neural network in MIL settings. Here, we examine a graph-based model to facilitate end-to-end learning and sample suitable patches using a tile-based approach. We propose MIL-GNN to employ a graph-based Variational Auto-encoder with a Gaussian mixture model to discover relations between sample patches in order to aggregate patch details into an individual vector representation. Using the classical MIL dataset MUSK and distinguishing two lung cancer sub-types, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), we exhibit the efficacy of our technique. We achieved a 97.42% accuracy on the MUSK dataset and a 94.3% AUC on the classification of lung cancer sub-types utilizing features. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. Threshold selection in gene co-expression networks using spectral graph theory techniques
- Author
-
Michael A. Langston and Andy D. Perkins
- Subjects
Saccharomyces cerevisiae ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Cutoff ,Humans ,Gene Regulatory Networks ,Adjacency matrix ,Molecular Biology ,Selection (genetic algorithm) ,030304 developmental biology ,Mathematics ,Oligonucleotide Array Sequence Analysis ,0303 health sciences ,Algebraic connectivity ,Spectral graph theory ,Applied Mathematics ,Gene Expression Profiling ,Computational Biology ,Expression (computer science) ,Quantitative Biology::Genomics ,Spectral clustering ,Computer Science Applications ,Transformation (function) ,Proceedings ,030220 oncology & carcinogenesis ,Algorithm - Abstract
Background Gene co-expression networks are often constructed by computing some measure of similarity between expression levels of gene transcripts and subsequently applying a high-pass filter to remove all but the most likely biologically-significant relationships. The selection of this expression threshold necessarily has a significant effect on any conclusions derived from the resulting network. Many approaches have been taken to choose an appropriate threshold, among them computing levels of statistical significance, accepting only the top one percent of relationships, and selecting an arbitrary expression cutoff. Results We apply spectral graph theory methods to develop a systematic method for threshold selection. Eigenvalues and eigenvectors are computed for a transformation of the adjacency matrix of the network constructed at various threshold values. From these, we use a basic spectral clustering method to examine the set of gene-gene relationships and select a threshold dependent upon the community structure of the data. This approach is applied to two well-studied microarray data sets from Homo sapiens and Saccharomyces cerevisiae. Conclusion This method presents a systematic, data-based alternative to using more artificial cutoff values and results in a more conservative approach to threshold selection than some other popular techniques such as retaining only statistically-significant relationships or setting a cutoff to include a percentage of the highest correlations.
- Published
- 2009
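The abstract above computes eigenvalues of a transformation of the adjacency matrix at several threshold values. One common choice of transformation is the graph Laplacian, whose second-smallest eigenvalue (the algebraic connectivity, listed among the subjects above) is zero exactly when the graph is disconnected. The sketch below tracks that value across thresholds; treating the Laplacian as the transformation is an assumption, and the function names and toy similarity values are illustrative.

```python
import numpy as np

def algebraic_connectivity(A):
    """Second-smallest eigenvalue of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals = np.linalg.eigvalsh(L)      # returned in ascending order
    return eigvals[1]

def connectivity_by_threshold(similarity, thresholds):
    """Algebraic connectivity of the graph obtained at each threshold."""
    results = {}
    for t in thresholds:
        A = (np.abs(similarity) >= t).astype(float)
        np.fill_diagonal(A, 0.0)
        results[t] = algebraic_connectivity(A)
    return results

# Two tightly correlated gene pairs with weak cross-pair similarity.
sim = np.array([[1.0, 0.9, 0.3, 0.3],
                [0.9, 1.0, 0.3, 0.3],
                [0.3, 0.3, 1.0, 0.9],
                [0.3, 0.3, 0.9, 1.0]])
profile = connectivity_by_threshold(sim, thresholds=[0.2, 0.5])
```

At the low threshold the graph is complete on four nodes; at the higher one it splits into two components, so the value drops to zero. A threshold at which such a transition occurs is a natural candidate when the cutoff should reflect community structure.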
13. Protein complexes identification based on go attributed network embedding.
- Author
-
Xu, Bo, Li, Kun, Zheng, Wei, Liu, Xiaoxia, Zhang, Yijia, Zhao, Zhehuan, and He, Zengyou
- Subjects
PROTEIN-protein interactions ,INTERMOLECULAR interactions ,PROTEOMICS ,MOLECULAR association ,ADAPTOR proteins - Abstract
Background: Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complexes discovery process under a unified framework. Recently, attributed network embedding methods have been proved to be remarkably effective in generating vector representations for nodes in the network. In the transformed vector space, both the topological proximity and node attributed affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide us a unified framework to integrate various biological evidences into the protein complexes identification process. Results: In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. Firstly, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pair-wise vector representation similarity, a weighted adjacency matrix is constructed. Secondly, it uses the clique mining method to generate candidate cores. Consequently, seed cores are obtained by ranking candidate cores based on their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins with correlation score that is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.
Conclusions: GANE provides a framework that integrates many valuable and different kinds of biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2018
14. MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads.
- Author
-
Schmidt, Markus and Kutzner, Arne
- Published
- 2023
- Full Text
- View/download PDF
15. A proximity-based graph clustering method for the identification and application of transcription factor clusters.
- Author
-
Spadafore, Maxwell, Najarian, Kayvan, and Boyle, Alan P.
- Subjects
TRANSCRIPTION factors ,CELL physiology ,DNA ,GENE regulatory networks ,MARKOV processes - Abstract
Background: Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. Methods: Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. Results: We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. Conclusion: The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
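Note on record 15: the abstract partitions a normalized adjacency matrix with the Markov Clustering Algorithm. The sketch below shows the generic expansion/inflation loop of Markov clustering on a toy graph. It is not the authors' pipeline; the parameter defaults, the self-loop step, and the cluster read-out are assumptions made for illustration.

```python
import numpy as np

def markov_cluster(adj, expansion=2, inflation=2.0, iters=50, tol=1e-6):
    """Generic Markov clustering: alternate expansion and inflation until stable."""
    m = np.asarray(adj, dtype=float) + np.eye(len(adj))   # add self-loops
    m /= m.sum(axis=0, keepdims=True)                     # column-stochastic
    for _ in range(iters):
        prev = m
        m = np.linalg.matrix_power(m, expansion)          # expansion: random-walk spread
        m = m ** inflation                                # inflation: sharpen strong flows
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:
            break
    clusters = []
    for row in m:                                         # each non-empty row is a cluster
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Two disconnected pairs of nodes should fall into two clusters.
toy = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])
clusters = markov_cluster(toy)
```

Larger inflation values generally yield smaller, tighter clusters, which is the knob one would tune when aiming for the "small, manageable clusters" the abstract mentions.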
16. Comparison of co-expression measures: mutual information, correlation, and model based indices.
- Author
-
Song, Lin, Langfelder, Peter, and Horvath, Steve
- Subjects
REGRESSION analysis ,STATISTICAL correlation ,GENE expression ,POLYNOMIALS ,MATRICES (Mathematics) - Abstract
Background: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). Results: We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets, which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables. Conclusion: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships.
Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
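Note on record 16: the abstract transforms an adjacency matrix with the topological overlap measure. A minimal sketch of the standard unsigned topological overlap formula is given below, together with a soft-thresholded correlation adjacency. Pearson correlation stands in for the biweight midcorrelation used in the paper, and the power `beta=6` is an illustrative assumption.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """|Pearson correlation|^beta between rows (genes) of `expr` (stand-in measure)."""
    adj = np.abs(np.corrcoef(expr)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def topological_overlap(adj) -> np.ndarray:
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)
    shared = a @ a                       # weight of shared neighbours (u != i, j)
    k = a.sum(axis=1)                    # node connectivity
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Path graph 0-1-2: nodes 0 and 2 are not linked but share neighbour 1.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
tom = topological_overlap(path)
```

On the path graph the overlap between nodes 0 and 2 is positive even though they share no edge, which is the property that makes the transformation useful for module detection.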
17. lionessR: single sample network inference in R.
- Author
-
Kuijjer, Marieke L, Hsieh, Ping-Han, Quackenbush, John, and Glass, Kimberly
- Subjects
BONE cancer ,INDIVIDUALIZED medicine ,MEDICAL research ,GENE expression - Abstract
Background: In biomedical research, network inference algorithms are typically used to infer complex association patterns between biological entities, such as between genes or proteins, using data from a population. This resulting aggregate network, in essence, averages over the networks of those individuals in the population. LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) is a method that can be used together with a network inference algorithm to extract networks for individual samples in a population. The method's key characteristic is that, by modeling networks for individual samples in a data set, it can capture network heterogeneity in a population. LIONESS was originally made available as a function within the PANDA (Passing Attributes between Networks for Data Assimilation) regulatory network reconstruction framework. However, the LIONESS algorithm is generalizable and can be used to model single sample networks based on a wide range of network inference algorithms. Results: In this software article, we describe lionessR, an R implementation of LIONESS that can be applied to any network inference method in R that outputs a complete, weighted adjacency matrix. As an example, we provide a vignette of an application of lionessR to model single sample networks based on correlated gene expression in a bone cancer dataset. We show how the tool can be used to identify differential patterns of correlation between two groups of patients. Conclusions: We developed lionessR, an open source R package to model single sample networks. We show how lionessR can be used to inform us on potential precision medicine applications in cancer. The lionessR package is a user-friendly tool to perform such analyses. The package, which includes a vignette describing the application, is freely available at: https://github.com/kuijjerlab/lionessR and at: http://bioconductor.org/packages/lionessR . [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
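Note on record 17: LIONESS estimates a network for sample q by linear interpolation between the aggregate network built from all N samples and the network built without sample q, that is, N × (aggregate − without q) + without q. The sketch below applies that interpolation with Pearson correlation as the network inference step. It is written in Python for illustration and is not the lionessR API; the function name is hypothetical.

```python
import numpy as np

def lioness_correlation(expr: np.ndarray) -> np.ndarray:
    """Single-sample correlation networks.

    expr: genes x samples matrix.
    Returns an array of shape (samples, genes, genes).
    """
    n = expr.shape[1]
    aggregate = np.corrcoef(expr)                      # network from all samples
    nets = np.empty((n,) + aggregate.shape)
    for q in range(n):
        without_q = np.corrcoef(np.delete(expr, q, axis=1))  # leave sample q out
        nets[q] = n * (aggregate - without_q) + without_q    # linear interpolation
    return nets

rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 10))        # 4 genes, 10 samples of synthetic data
nets = lioness_correlation(expr)
```

Any inference method that returns a complete weighted adjacency matrix could replace `np.corrcoef` here, which matches the generality the abstract describes.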
18. Genome-wide transcriptome and gene family analysis reveal candidate genes associated with potassium uptake of maize colonized by arbuscular mycorrhizal fungi.
- Author
-
Xu, Yunjian, Yan, Yixiu, Zhou, Tianyi, Chun, Jianhui, Tu, Yuanchao, Yang, Xinyu, Qin, Jie, Ou, Luyan, Ye, Liang, and Liu, Fang
- Abstract
Background: Potassium (K) is an essential nutrient for plant growth and development. Maize (Zea mays) is one of the most widely planted crops in the world and requires a huge amount of K fertilizer. Arbuscular mycorrhizal fungi (AMF) are closely related to the K uptake of maize. Genetic improvement of maize K utilization efficiency will require elucidating the molecular mechanisms of maize K uptake through the mycorrhizal pathway. Here, we employed transcriptome and gene family analysis to elucidate the mechanism influencing the K uptake and utilization efficiency of mycorrhizal maize. Methods and results: The transcriptomes of maize were studied with and without AMF inoculation and under different K conditions. AM symbiosis increased the K concentration and dry weight of maize plants. RNA sequencing revealed that genes associated with the activity of the apoplast and nutrient reservoir were significantly enriched in mycorrhizal roots under low-K conditions but not under high-K conditions. Weighted gene correlation network analysis revealed that three modules were strongly correlated with K content. Twenty-one hub genes enriched in pathways associated with glycerophospholipid metabolism, glycerolipid metabolism, starch and sucrose metabolism, and anthocyanin biosynthesis were further identified. In general, these hub genes were upregulated in AMF-colonized roots under low-K conditions. Additionally, the members of 14 gene families associated with K uptake were identified (ARF: 38, ILK: 4, RBOH: 12, RUPO: 20, MAPKK: 89, CBL: 14, CIPK: 44, CPK: 40, PIN: 10, MYB: 174, NPF: 79, KT: 19, HAK/HKT/KUP: 38, and CPA: 8) from maize. The transcript levels of these genes showed that 92 genes (ARF: 6, CBL: 5, CIPK: 13, CPK: 2, HAK/HKT/KUP: 7, PIN: 2, MYB: 26, NPF: 16, RBOH: 1, MAPKK: 12, and RUPO: 2) were upregulated with AM symbiosis under low-K conditions. Conclusions: This study indicated that AMF increase the resistance of maize to low-K stress by regulating K uptake at the gene transcription level.
Our findings provide a genome-level resource for the functional assignment of genes regulated by K treatment and AM symbiosis in K uptake-related gene families in maize. This may help elucidate the molecular mechanisms of the maize response to low-K stress under AMF inoculation and provides a theoretical basis for AMF application in crop fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Answering open questions in biology using spatial genomics and structured methods.
- Author
-
Jena, Siddhartha G., Verma, Archit, and Engelhardt, Barbara E.
- Subjects
BIOLOGICAL systems ,CYTOLOGY ,CELL morphology ,CONCEPTUAL models ,BIOLOGICAL models - Abstract
Genomics methods have uncovered patterns in a range of biological systems, but obscure important aspects of cell behavior: the shapes, relative locations, movement, and interactions of cells in space. Spatial technologies that collect genomic or epigenomic data while preserving spatial information have begun to overcome these limitations. These new data promise a deeper understanding of the factors that affect cellular behavior, and in particular the ability to directly test existing theories about cell state and variation in the context of morphology, location, motility, and signaling that could not be tested before. Rapid advancements in resolution, ease-of-use, and scale of spatial genomics technologies to address these questions also require an updated toolkit of statistical methods with which to interrogate these data. We present a framework to respond to this new avenue of research: four open biological questions that can now be answered using spatial genomics data paired with methods for analysis. We outline spatial data modalities for each open question that may yield specific insights, discuss how conflicting theories may be tested by comparing the data to conceptual models of biological behavior, and highlight statistical and machine learning-based tools that may prove particularly helpful to recover biological understanding. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. PCP-GC-LM: single-sequence-based protein contact prediction using dual graph convolutional neural network and convolutional neural network.
- Author
-
Ouyang, J., Gao, Y., and Yang, Y.
- Subjects
CONVOLUTIONAL neural networks ,GRAPH neural networks ,PROTEIN structure ,NERVE tissue proteins ,DEEP learning - Abstract
Background: Recently, the use of evolutionary information and deep learning networks has improved protein contact prediction methods. Nevertheless, some bottlenecks remain: (1) One is the prediction of orphans and other proteins with little evolutionary information. (2) The other is that single-sequence-based prediction methods mainly focus on selecting protein sequence features and tuning the neural network architecture; while deeper neural networks improve prediction accuracy, they also increase the computational burden. Compared with other neural networks in the field of protein prediction, the graph neural network has the following advantages: because it reveals topological structure and exploits hierarchical structure and local connectivity, it is well suited to capturing features at different levels of abstraction in protein molecules. When protein sequence and structure information are used for joint training, the dependencies between the two kinds of information can be better captured. It can also process protein molecular structures of different lengths and shapes, whereas traditional neural networks need to convert proteins into fixed-size vectors or matrices for processing. Results: Here, we propose a single-sequence-based protein contact map predictor, PCP-GC-LM, with dual-level graph neural networks and convolution networks. Our method performs better than other single-sequence-based predictors in different independent tests. In addition, to verify the validity of our method on complex protein structures, we also compare it with other methods on two homodimer protein test sets (the DeepHomo test dataset and the CASP-CAPRI target dataset). Furthermore, we perform ablation experiments to demonstrate the necessity of a dual graph network.
Overall, our framework presents new modules that accurately predict inter-chain contact maps in proteins, and it is also useful for analyzing interactions in other types of protein complexes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Tensor product algorithms for inference of contact network from epidemiological data.
- Author
-
Dolgov, Sergey and Savostyanov, Dmitry
- Subjects
MARKOV chain Monte Carlo ,BAYESIAN analysis ,MARKOV processes ,BAYESIAN field theory ,TENSOR products - Abstract
We consider a problem of inferring contact network from nodal states observed during an epidemiological process. In a black-box Bayesian optimisation framework this problem reduces to a discrete likelihood optimisation over the set of possible networks. The cardinality of this set grows combinatorially with the number of network nodes, which makes this optimisation computationally challenging. For each network, its likelihood is the probability for the observed data to appear during the evolution of the epidemiological process on this network. This probability can be very small, particularly if the network is significantly different from the ground truth network, from which the observed data actually appear. A commonly used stochastic simulation algorithm struggles to recover rare events and hence to estimate small probabilities and likelihoods. In this paper we replace the stochastic simulation with solving the chemical master equation for the probabilities of all network states. Since this equation also suffers from the curse of dimensionality, we apply tensor train approximations to overcome it and enable fast and accurate computations. Numerical simulations demonstrate efficient black-box Bayesian inference of the network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
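Note on record 21: the abstract defines the likelihood of a candidate network as the probability of the observed nodal states under the epidemic process, obtained by solving the chemical master equation over all network states. The sketch below does this by brute force for a three-node network. It assumes an SIS process with infection rate `beta` per infected neighbour and recovery rate `gamma`, and it uses explicit Euler steps on the full 2^N state space instead of the tensor train approximation the paper relies on. All names and rates are hypothetical.

```python
import itertools
import numpy as np

def cme_generator(adj, beta, gamma):
    """Generator A of the master equation dp/dt = A p for an assumed SIS process."""
    n = adj.shape[0]
    states = list(itertools.product([0, 1], repeat=n))   # all 2^n nodal states
    index = {s: i for i, s in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    for s in states:
        x = np.array(s)
        for i in range(n):
            # infected nodes recover; susceptible nodes are infected by neighbours
            rate = gamma if x[i] == 1 else beta * (adj[i] @ x)
            if rate > 0:
                y = list(s)
                y[i] = 1 - y[i]
                A[index[tuple(y)], index[s]] += rate     # inflow to the new state
                A[index[s], index[s]] -= rate            # outflow from the old state
    return states, A

def solve_cme(A, p0, t_end, steps=10000):
    """Explicit Euler integration; adequate only for tiny state spaces."""
    dt = t_end / steps
    p = p0.copy()
    for _ in range(steps):
        p = p + dt * (A @ p)
    return p

# Likelihood of observing state (1, 1, 0) at t = 1 under a candidate path network.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
states, A = cme_generator(adj, beta=0.5, gamma=0.2)
p0 = np.zeros(len(states))
p0[states.index((1, 0, 0))] = 1.0                        # node 0 infected at t = 0
p = solve_cme(A, p0, t_end=1.0)
likelihood = p[states.index((1, 1, 0))]
```

The state space doubles with every added node, which is the curse of dimensionality that motivates the tensor train approach in the abstract.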
22. Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation.
- Author
-
Nguyen, Tuong Minh, Poh, Kim Leng, Chong, Shu-Ling, and Lee, Jan Hau
- Subjects
MACHINE learning ,ELECTRONIC health records ,DIAGNOSIS ,SEPSIS ,HEALTH care networks - Abstract
Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making. Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to impute the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, class balancing parameter α, and Synthetic Minority Oversampling Technique (SMOTE). Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extremely imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% for AUC and 15% for SEN compared to the baseline GCN+αFL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% in SEN compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814).
It also showed a classification improvement of 3.86% for AUC and 15% for SEN compared to the baseline GCN+αBCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in SEN in comparison to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754). Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
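Note on record 22: the baselines in this abstract combine Focal Loss with a class balancing parameter α. For orientation, the sketch below implements the standard α-balanced binary focal loss, −α_t (1 − p_t)^γ log(p_t). It is not the modified loss proposed in the paper, which the abstract does not spell out, and the default values of `gamma` and `alpha` are assumptions.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard alpha-balanced binary focal loss, averaged over samples.

    p: predicted probability of the positive class; y: labels in {0, 1}.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)                 # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)     # class balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

probs = np.array([0.9, 0.2])
labels = np.array([1, 0])
loss_focal = focal_loss(probs, labels, gamma=2.0, alpha=0.5)
loss_plain = focal_loss(probs, labels, gamma=0.0, alpha=0.5)   # 0.5 x cross entropy
```

With `gamma=0` the expression reduces to weighted binary cross entropy; raising `gamma` shrinks the contribution of well-classified samples, which is why the loss is used on imbalanced data.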
23. AC099850.3 promotes HBV-HCC cell proliferation and invasion through regulating CD276: a novel strategy for sorafenib and immune checkpoint combination therapy.
- Author
-
He, Aoxiao, Huang, Zhihao, Feng, Qian, Zhang, Shan, Li, Fan, Li, Dan, Lu, Hongcheng, and Wang, Jiakun
- Subjects
GENE expression ,CELL cycle ,IMMUNE checkpoint proteins ,HEPATITIS B ,CELL proliferation - Abstract
Background: This study investigates the molecular mechanisms of CC@AC&SF@PP NPs loaded with AC099850.3 siRNA and sorafenib (SF) for improving hepatitis B virus-related hepatocellular carcinoma (HBV-HCC). Methods: A dataset of 44 HBV-HCC patients and their survival information was selected from the TCGA database. Immune genes related to survival status were identified using the ImmPort database and WGCNA analysis. A prognostic risk model was constructed and analyzed using Lasso regression. Differential analysis was performed to screen key genes, and their significance and predictive accuracy for HBV-HCC were validated using Kaplan–Meier survival curves, ROC analysis, CIBERSORT analysis, and correlation analysis. The correlation between AC099850.3 and the gene expression matrix was calculated, followed by GO and KEGG enrichment analysis using AC099850.3 and its co-expressed genes. HepG2.2.15 cells were selected for in vitro validation, and lentivirus interference, cell cycle determination, CCK-8 experiments, colony formation assays, Transwell experiments, scratch experiments, and flow cytometry were performed to investigate the effects of key genes on HepG2.2.15 cells. A subcutaneous transplanted tumor model in mice was constructed to verify the inhibitory effect of key genes on HBV-HCC tumors. Subsequently, pH-triggered drug release NPs (CC@AC&SF@PP) were prepared, and their therapeutic effects on HBV-HCC in situ tumor mice were studied. Results: A prognostic risk model (AC012313.9, MIR210HG, AC099850.3, AL645933.2, C6orf223, GDF10) was constructed through bioinformatics analysis, showing good sensitivity and specificity in diagnostic prediction. AC099850.3 was identified as a key gene, and enrichment analysis revealed its impact on cell cycle pathways. In vitro cell experiments demonstrated that AC099850.3 promotes HepG2.2.15 cell proliferation and invasion by regulating immune checkpoint CD276 expression and cell cycle progression. 
In vivo, subcutaneously transplanted tumor experiments showed that AC099850.3 promotes the growth of HBV-HCC tumors in nude mice. Furthermore, pH-triggered drug release NPs (CC@AC&SF@PP) loaded with AC099850.3 siRNA and SF were successfully prepared and delivered to the in situ HBV-HCC, enhancing the effectiveness of combined therapy for HBV-HCC. Conclusions: AC099850.3 accelerates the cell cycle progression and promotes the occurrence and development of HBV-HCC by upregulating immune checkpoint CD276 expression. CC@AC&SF@PP NPs loaded with AC099850.3 siRNA and SF improve the effectiveness of combined therapy for HBV-HCC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Game theory elucidates how competitive dynamics mediate animal social networks.
- Author
-
Dubois, Frédérique
- Published
- 2024
- Full Text
- View/download PDF
25. Smccnet 2.0: a comprehensive tool for multi-omics network inference with shiny visualization.
- Author
-
Liu, Weixuan, Vu, Thao, Konigsberg, Iain R., Pratte, Katherine A., Zhuang, Yonghua, and Kechris, Katerina J.
- Subjects
MULTIOMICS ,MACHINE learning ,STATISTICAL correlation ,PHENOTYPES - Abstract
Summary: Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. Availability: This package is available in both CRAN: https://cran.r-project.org/web/packages/SmCCNet/index.html and Github: https://github.com/KechrisLab/SmCCNet under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Construction of a gene model related to the prognosis of patients with gastric cancer receiving immunotherapy and exploration of COX7A1 gene function.
- Author
-
Wang, Si-yu, Wang, Yu-xin, Shen, Ao, Yang, Xian-qi, Liang, Cheng-cai, Huang, Run-jie, Jian, Rui, An, Nan, Xiao, Yu-long, Wang, Li-shuai, Zhao, Yin, Lin, Chuan, Wang, Chang-ping, Yuan, Zhi-ping, and Yuan, Shu-qiang
- Subjects
IMMUNOSTAINING ,GENE expression ,TREATMENT effectiveness ,CANCER prognosis ,PROGRESSION-free survival - Abstract
Background: GC is a highly heterogeneous tumor with different responses to immunotherapy, and the positive response depends on the unique interaction between the tumor and the tumor microenvironment (TME). However, the currently available methods for prognostic prediction are not satisfactory. Therefore, this study aims to construct a novel model that integrates relevant gene sets to predict the clinical efficacy of immunotherapy and the prognosis of GC patients based on machine learning. Methods: Seven GC datasets were collected from the Gene Expression Omnibus (GEO) database, The Cancer Genome Atlas (TCGA) database and literature sources. Based on the immunotherapy cohort, we first obtained a list of immunotherapy related genes through differential expression analysis. Then, Cox regression analysis was applied to divide the genes with prognostic significance into protective and risky types. Next, the Single Sample Gene Set Enrichment Analysis (ssGSEA) algorithm was used to score the two categories of gene sets separately, and the score differences between the two gene sets were used as the basis for constructing the prognostic model. Subsequently, Weighted Correlation Network Analysis (WGCNA) and Cytoscape were applied to further screen the gene sets of the constructed model, and finally COX7A1 was selected for exploring and predicting the clinical efficacy of immunotherapy for GC. The correlation between COX7A1 and immune cell infiltration, drug sensitivity scoring, and immunohistochemical staining were performed to initially understand the potential role of COX7A1 in the development and progression of GC. Finally, the differential expression of COX7A1 was verified in GC patients receiving immunotherapy. Results: First, 47 protective genes and 408 risky genes were obtained, and the ssGSEA algorithm was applied for model construction, showing good prognostic discrimination ability.
In addition, the patients with high model scores showed higher TMB and MSI levels, and lower tumor heterogeneity scores. COX7A1 expression in GC tissues was found to be significantly lower than in the corresponding paracancerous tissues. Meanwhile, the patients with high COX7A1 expression showed a higher probability of cancer invasion, worse clinical efficacy of immunotherapy, worse overall survival (OS) and worse disease-free survival (DFS). Conclusions: The ssGSEA score we constructed can serve as a biomarker for GC patients and provide important guidance for individualized treatment. In addition, the COX7A1 gene can accurately distinguish the prognosis of GC patients and predict the clinical efficacy of immunotherapy for GC patients. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Spatial Markov matrices for measuring the spatial dependencies of an epidemiological spread : case Covid'19 Madagascar.
- Author
-
Tabera Tsilefa, Stefana and Raherinirina, Angelo
- Subjects
STOCHASTIC matrices ,MARKOV processes ,PROBABILITY theory ,PUBLISHED articles ,NEIGHBORHOODS - Abstract
Background: This article applies a variant of the Markov chain that explicitly incorporates spatial effects. It is an extension of the Markov class allowing a more complete analysis of the spatial dimensions of transition dynamics. The aim is to provide a methodology for applying the explicit model to spatial dependency analysis. Methods: Here, the question is to study and quantify whether neighborhood context affects transitional dynamics. Rather than estimating a homogeneous law, the model requires the estimation of k transition laws each dependent on spatial neighbor state. This article used published data on confirmed cases of Covid'19 in the 22 regions of Madagascar. These data were discretized to obtain a discrete state of propagation intensity. Results: The analysis gave us the transition probabilities between Covid'19 intensity states knowing the context of neighboring regions, and the propagation time laws knowing the spatial contexts. The results showed that neighboring regions had an effect on the propagation of Covid'19 in Madagascar. Conclusion: After analysis, we can say that there is spatial dependency according to these spatial transition matrices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
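Note on record 27: the abstract estimates k transition laws, one for each state of the spatial neighbourhood, in place of a single homogeneous Markov transition matrix. The sketch below counts transitions conditioned on a neighbourhood class. It assumes the neighbourhood class is the rounded mean state of adjacent regions and that every region has at least one neighbour; the paper's exact conditioning rule is not given in the abstract, and the function name is hypothetical.

```python
import numpy as np

def spatial_markov(states, neighbors, k):
    """Transition matrices conditioned on the neighbourhood class.

    states:    (T, R) integer array with values in 0..k-1 (time x region).
    neighbors: (R, R) 0/1 adjacency between regions, no isolated regions.
    Returns probs[c, i, j] = P(next = j | current = i, neighbourhood class = c).
    """
    w = neighbors / neighbors.sum(axis=1, keepdims=True)    # row-normalised weights
    counts = np.zeros((k, k, k))
    for t in range(states.shape[0] - 1):
        lag = np.rint(states[t] @ w.T).astype(int)          # assumed neighbourhood class
        for r in range(states.shape[1]):
            counts[lag[r], states[t, r], states[t + 1, r]] += 1
    totals = counts.sum(axis=2, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Two mutually adjacent regions observed at three time points, two intensity classes.
obs = np.array([[0, 1],
                [1, 1],
                [1, 0]])
adjacency = np.array([[0, 1],
                      [1, 0]])
probs = spatial_markov(obs, adjacency, k=2)
```

Comparing the k conditional matrices with one another, or with the pooled matrix, is how one would judge whether the neighbourhood context changes the transition dynamics.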
28. Distinct tumor-TAM interactions in IDH-stratified glioma microenvironments unveiled by single-cell and spatial transcriptomics.
- Author
-
Motevasseli, Meysam, Darvishi, Maryam, Khoshnevisan, Alireza, Zeinalizadeh, Mehdi, Saffar, Hiva, Bayat, Shiva, Najafi, Ali, Abbaspour, Mohammad Javad, Mamivand, Ali, Olson, Susan B., and Tabrizi, Mina
- Subjects
IMMUNE checkpoint inhibitors ,TRANSCRIPTOMES ,TUMOR microenvironment ,GLIOBLASTOMA multiforme ,GLIOMAS - Abstract
Tumor-associated macrophages (TAMs) residing in the tumor microenvironment (TME) are characterized by their pivotal roles in tumor progression, antitumor immunity, and TME remodeling. However, a thorough comparative characterization of tumor-TAM crosstalk across IDH-defined categories of glioma remains elusive, likely contributing to mixed outcomes in clinical trials. We delineated the phenotypic heterogeneity of TAMs across IDH-stratified gliomas. Notably, two TAM subsets with a mesenchymal phenotype were enriched in IDH-WT glioblastoma (GBM) and correlated with poorer patient survival and reduced response to anti-PD-1 immune checkpoint inhibitor (ICI). We proposed SLAMF9 receptor as a potential therapeutic target. Inference of gene regulatory networks identified PPARG, ELK1, and MXI1 as master transcription factors of mesenchymal BMD-TAMs. Our analyses of reciprocal tumor-TAM interactions revealed distinct crosstalk in IDH-WT tumors, including ANXA1-FPR1/3, FN1-ITGAVB1, VEGFA-NRP1, and TNFSF12-TNFRSF12A with known contribution to immunosuppression, tumor proliferation, invasion and TAM recruitment. Spatially resolved transcriptomics further elucidated the architectural organization of highlighted communications. Furthermore, we demonstrated significant upregulation of ANXA1, FN1, NRP1, and TNFRSF12A genes in IDH-WT tumors using bulk RNA-seq and RT-qPCR. Longitudinal expression analysis of candidate genes revealed no difference between primary and recurrent tumors indicating that the interactive network of malignant states with TAMs does not drastically change upon recurrence. Collectively, our study offers insights into the unique cellular composition and communication of TAMs in glioma TME, revealing novel vulnerabilities for therapeutic interventions in IDH-WT GBM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. mir-744-5p inhibits cell growth and angiogenesis in osteosarcoma by targeting NFIX.
- Author
-
Xie, Lin, Li, Wei, and Li, Yu
- Subjects
OSTEOSARCOMA ,IN vitro studies ,FLOW cytometry ,COLONY-forming units assay ,MICRORNA ,POLYMERASE chain reaction ,APOPTOSIS ,CELL proliferation ,DESCRIPTIVE statistics ,GENE expression ,CELL lines ,BIOINFORMATICS ,TUMOR suppressor genes ,PATHOLOGIC neovascularization ,CELL survival ,DNA-binding proteins ,DISEASE progression - Abstract
Background: Osteosarcoma (OS) is a malignant bone tumor that commonly occurs in children and adolescents under the age of 20. Dysregulation of microRNAs (miRNAs) is an important factor in the occurrence and progression of OS. MicroRNA miR-744-5p is aberrantly expressed in various tumors. However, its roles and molecular targets in OS remain unclear. Methods: Differentially expressed miRNAs in OS were analyzed using the Gene Expression Omnibus dataset GSE65071, and the potential hub miRNA was identified through weighted gene co-expression network analysis. Quantitative real-time PCR (qRT-PCR) was used to detect the expression of miR-744-5p in OS cell lines. In vitro experiments, including CCK-8 assays, colony formation assays, flow cytometry apoptosis assays, and tube formation assays, were performed to explore the effects of miR-744-5p on OS cell biological behaviors. The downstream target genes of miR-744-5p were predicted through bioinformatics, and the binding sites were validated by a dual-luciferase reporter assay. Results: The lowly expressed miRNA, miR-744-5p, was identified as a hub miRNA involved in OS progression through bioinformatic analysis. Nuclear factor I X (NFIX) was confirmed as a direct target for miR-744-5p in OS. In vitro studies revealed that overexpression of miR-744-5p could restrain the growth of OS cells, whereas miR-744-5p inhibition showed the opposite effect. It was also observed that treatment with the conditioned medium from miR-744-5p-overexpressed OS cells led to poorer proliferation and angiogenesis in human umbilical vein endothelial cells (HUVECs). Furthermore, NFIX overexpression restored the suppression effects of miR-744-5p overexpression on OS cell growth and HUVECs angiogenesis. Conclusion: Our results indicated that miR-744-5p is a potential tumor-suppressive miRNA in OS progression by targeting NFIX to restrain the growth of OS cells and angiogenesis in HUVECs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Identifying potential anthocyanin biosynthesis regulator in Chinese cherry by comprehensive genome-wide characterization of the R2R3-MYB transcription factor gene family.
- Author
-
Wang, Yan, Tu, Hongxia, Zhang, Jing, Wang, Hao, Liu, Zhenshan, Zhou, Jingting, He, Wen, Lin, Yuanxiu, Zhang, Yunting, Li, Mengyao, Wu, Zhiwei, Chen, Qing, Zhang, Yong, Luo, Ya, Tang, Haoru, and Wang, Xiaorong
- Subjects
TRANSCRIPTION factors ,MYB gene ,FRUIT skins ,COLOR variation (Biology) ,GENETIC transcription regulation - Abstract
Background: Chinese cherry [Cerasus pseudocerasus (Lindl.) G.Don] (syn. Prunus pseudocerasus Lindl.) is an economically important fruiting cherry species with a diverse range of attractive colors, spanning from the lightest yellow to the darkest black purple. However, the MYB transcription factors involved in anthocyanin biosynthesis underlying fruit color variation in Chinese cherry remain unknown. Results: In this study, we characterized the R2R3-MYB gene family of Chinese cherry by genome-wide identification and compared it with those of 10 Rosaceae relatives and Arabidopsis thaliana. A total of 1490 R2R3-MYBs were classified into 43 subfamilies, which included 29 subfamilies containing both Rosaceae MYBs and AtMYBs. One subfamily (S45) contained only Rosaceae MYBs, while three subfamilies (S12, S75, and S77) contained only AtMYBs. The variation in gene numbers within identical subfamilies among different species and the absence of certain subfamilies in some species indicated the species-specific expansion within MYB gene family in Chinese cherry and its relatives. Segmental and tandem duplication events primarily contributed to the expansion of Chinese cherry R2R3-CpMYBs. The duplicated gene pairs underwent purifying selection during evolution after duplication events. Phylogenetic relationships and transcript profiling revealed that CpMYB10 and CpMYB4 are involved in the regulation of anthocyanin biosynthesis in Chinese cherry fruits. Expression patterns, transient overexpression and VIGS results confirmed that CpMYB10 promotes anthocyanin accumulation in the fruit skin, while CpMYB4 acts as a repressor, inhibiting anthocyanin biosynthesis of Chinese cherry. Conclusions: This study provides a comprehensive and systematic analysis of R2R3-MYB gene family in Chinese cherry and Rosaceae relatives, and identifies two regulators, CpMYB10 and CpMYB4, involved in anthocyanin biosynthesis in Chinese cherry. 
These results help to develop and utilize the potential functions of anthocyanins in Chinese cherry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. A graph-learning based model for automatic diagnosis of Sjögren's syndrome on digital pathological images: a multicentre cohort study.
- Author
-
Wu, Ruifan, Chen, Zhipei, Yu, Jiali, Lai, Peng, Chen, Xuanyi, Han, Anjia, Xu, Meng, Fan, Zhaona, Cheng, Bin, Jiang, Ying, and Xia, Juan
- Subjects
SJOGREN'S syndrome ,SYSTEMIC lupus erythematosus ,RECEIVER operating characteristic curves ,SIALADENITIS ,SALIVARY glands - Abstract
Background: Sjögren's Syndrome (SS) is a rare chronic autoimmune disorder primarily affecting adult females, characterized by chronic inflammation and salivary and lacrimal gland dysfunction. It is often associated with systemic lupus erythematosus, rheumatoid arthritis and kidney disease, which can lead to increased mortality. Early diagnosis is critical, but traditional methods for diagnosing SS, mainly through histopathological evaluation of salivary gland tissue, have limitations. Methods: The study used 100 labial gland biopsies, creating whole-slide images (WSIs) for analysis. The proposed model, named Cell-tissue-graph-based pathological image analysis model (CTG-PAM) and based on graph theory, characterizes single-cell, cell-cell, and cell-tissue features. Building upon these features, CTG-PAM achieves cellular-level classification, enabling lymphocyte recognition. Furthermore, it leverages connected component analysis techniques in the cell graph structure to perform SS diagnosis based on lymphocyte counts. Findings: CTG-PAM outperforms traditional deep learning methods in diagnosing SS. Its area under the receiver operating characteristic curve (AUC) is 1.0 for the internal validation dataset and 0.8035 for the external test dataset. The sensitivity of CTG-PAM for the external dataset is 98.21%, while the accuracy is 93.75%. In comparison, the sensitivity and accuracy for traditional deep learning methods (ResNet-50) are lower. The study also shows that CTG-PAM's diagnostic accuracy is closer to that of skilled pathologists than to that of beginners. Interpretation: Our findings indicate that CTG-PAM is a reliable method for diagnosing SS. Additionally, CTG-PAM shows promise in enhancing the prognosis of SS patients and holds significant potential for the differential diagnosis of both non-neoplastic and neoplastic diseases. 
The AI model potentially extends its application to diagnosing immune cells in tumor microenvironments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
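As a rough illustration of the connected-component step mentioned in the abstract of entry 31, the sketch below builds an adjacency matrix over lymphocyte centroids and counts the clusters that reach a minimum size. The coordinates, the linking rule, the distance threshold `max_dist`, and the cutoff `min_cells` are all assumptions chosen for illustration; none of them is taken from the CTG-PAM paper.

```python
from collections import deque
from math import dist


def adjacency_matrix(points, max_dist):
    """Link two cells when their centroids lie within max_dist (assumed rule)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= max_dist:
                adj[i][j] = adj[j][i] = 1
    return adj


def connected_components(adj):
    """Return the connected components of an undirected graph as sorted index lists."""
    n = len(adj)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from the unvisited start node
            node = queue.popleft()
            component.append(node)
            for nbr in range(n):
                if adj[node][nbr] and not seen[nbr]:
                    seen[nbr] = True
                    queue.append(nbr)
        components.append(sorted(component))
    return components


def count_foci(points, max_dist, min_cells):
    """Count components containing at least min_cells cells (hypothetical cutoff)."""
    comps = connected_components(adjacency_matrix(points, max_dist))
    return sum(1 for c in comps if len(c) >= min_cells)


# Hypothetical lymphocyte centroids (arbitrary units): two tight groups and one stray cell.
cells = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (30, 30)]
foci = count_foci(cells, max_dist=1.5, min_cells=2)
print(foci)
```

A dense matrix keeps the link to the adjacency-matrix representation explicit; for whole-slide images with many thousands of cells, a sparse structure or a spatial index would be the more realistic choice.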
32. BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks.
- Author
-
Wang, Xin-Fei, Yu, Chang-Qing, You, Zhu-Hong, Wang, Yan, Huang, Lan, Qiao, Yan, Wang, Lei, and Li, Zheng-Wei
- Subjects
COMPETITIVE endogenous RNA ,BIOLOGICAL networks ,MORPHOLOGY ,NON-coding RNA ,PREDICTION models ,CIRCULAR RNA - Abstract
Circular RNA (CircRNA)–microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA), which provides a new perspective for the study of human complex diseases. However, the existing CMI prediction models mainly rely on the nearest neighbor structure in the biological network, ignoring the molecular network topology, so it is difficult to improve the prediction performance. In this paper, we proposed a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define the specific role feature representation for molecules to infer the new CMI. BEROLECMI effectively makes up for the lack of network topology in the CMI prediction model and achieves the highest prediction performance in three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Drug repositioning based on residual attention network and free multiscale adversarial training.
- Author
-
Li, Guanghui, Li, Shuwen, Liang, Cheng, Xiao, Qiu, and Luo, Jiawei
- Subjects
DRUG repositioning ,BIPARTITE graphs ,DRUG development ,THERAPEUTICS ,FORECASTING - Abstract
Background: Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed. Results: This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations. Conclusions: The comprehensive experimental results validate the utility and accuracy of RAFGAE. 
We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
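The first step described in the abstract of entry 33, integrating known drug-disease associations into a bipartite network, can be sketched as a block adjacency matrix of the form [[0, B], [B^T, 0]], where B holds the drug-by-disease associations. The drug names, disease names, and association list below are placeholders, and the function name is hypothetical; this is not RAFGAE's own code.

```python
def bipartite_adjacency(drugs, diseases, associations):
    """Build the square adjacency matrix of a drug-disease bipartite graph.

    Drugs take indices 0..len(drugs)-1 and diseases follow them, so the
    drug-drug and disease-disease blocks stay zero.
    """
    drug_idx = {name: i for i, name in enumerate(drugs)}
    disease_idx = {name: len(drugs) + j for j, name in enumerate(diseases)}
    n = len(drugs) + len(diseases)
    adj = [[0] * n for _ in range(n)]
    for drug, disease in associations:
        i, j = drug_idx[drug], disease_idx[disease]
        adj[i][j] = adj[j][i] = 1  # undirected edge between the two node types
    return adj


# Placeholder inputs for illustration only.
drugs = ["drug_A", "drug_B"]
diseases = ["disease_X", "disease_Y", "disease_Z"]
known = [("drug_A", "disease_X"), ("drug_A", "disease_Z"), ("drug_B", "disease_Y")]

A = bipartite_adjacency(drugs, diseases, known)
for row in A:
    print(row)
```

Unobserved pairs stay at 0 in this matrix, which is why association prediction on such a network amounts to scoring the zero entries of the drug-by-disease block.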
34. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics.
- Author
-
Hu, Yunfei, Xie, Manfei, Li, Yikang, Rao, Mingxing, Shen, Wenjun, Luo, Can, Qin, Haoran, Baek, Jihoon, and Zhou, Xin Maizie
- Published
- 2024
35. Shirebi granules ameliorate acute gouty arthritis by inhibiting NETs-induced imbalance between immunity and inflammation.
- Author
-
Li, Xin, Mao, Xia, Jiang, Hong, Xia, Cong, Fu, Lu, Gao, Wenjing, Chen, Wenjia, Li, Weijie, Wang, Ping, Zhang, Yanqiong, and Xu, Haiyu
- Subjects
INFLAMMATION prevention ,CHINESE medicine ,ACUTE diseases ,PHENOMENOLOGICAL biology ,RESEARCH funding ,HERBAL medicine ,CELLULAR signal transduction ,FLUORESCENT antibody technique ,RATS ,BIOINFORMATICS ,GOUT ,ANIMAL experimentation ,WESTERN immunoblotting ,INFLAMMATION ,BIOMARKERS ,DRUG dosage ,THERAPEUTICS ,DRUG administration - Abstract
Background: Acute gouty arthritis (AGA) is classified as 'arthritis' in traditional Chinese medicine (TCM) theory. Shirebi granules (SGs), derived from the classic prescription SiMiaoWan, exert satisfactory therapeutic efficacy in ameliorating AGA clinically. However, the underlying mechanisms of SGs against AGA remain unclear. Methods: AGA-related biological processes, signal pathways and biomarker genes were mined from the GEO database through bioinformatics. SGs components were systematically identified using UPLC-Q-TOF-MS/MS. A correlation network was established based on the biomarker genes and the chemical components, from which the signal pathway used for further study was selected. Finally, we established an AGA model using SD rats injected with monosodium urate (MSU) in the ankle joint for experimental validation. A combination of behavioral tests, H&E, safranin O-fast green, western blotting, and immunofluorescence was employed to reveal the mechanism of action of SGs on AGA. Results: The deterioration of AGA was significantly related to the imbalance between immunity and inflammation, neutrophil chemotaxis and inflammatory factor activation. HDAC5, PRKCB, NFκB1, MPO, PRKCA, and PIK3CA were identified as candidate targets of SGs against AGA, associated with the neutrophil extracellular traps (NETs) signal pathway. Animal experiments demonstrated that SGs effectively repaired cartilage damage, blocked TLR4 activation, and inhibited the expression of NETs indicators and inflammatory factors. In addition, SGs prominently alleviated joint redness and swelling, improved joint dysfunction, and inhibited inflammatory infiltration in AGA rats. Conclusion: Our data reveal that SGs may effectively alleviate the disease severity of AGA by suppressing NETs-promoted imbalance between immunity and inflammation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. COMSE: analysis of single-cell RNA-seq data using community detection-based feature selection.
- Author
-
Luo, Qinhuan, Chen, Yaozhu, and Lan, Xun
- Subjects
FEATURE selection ,RNA sequencing ,CELL cycle ,DATA analysis ,GENES - Abstract
Background: Single-cell RNA sequencing enables studying cells individually, yet high gene dimensions and low cell numbers challenge analysis. And only a subset of the genes detected are involved in the biological processes underlying cell-type specific functions. Result: In this study, we present COMSE, an unsupervised feature selection framework using community detection to capture informative genes from scRNA-seq data. COMSE identified homogenous cell substates with high resolution, as demonstrated by distinguishing different cell cycle stages. Evaluations based on real and simulated scRNA-seq datasets showed COMSE outperformed methods even with high dropout rates in cell clustering assignment. We also demonstrate that by identifying communities of genes associated with batch effects, COMSE parses signals reflecting biological difference from noise arising due to differences in sequencing protocols, thereby enabling integrated analysis of scRNA-seq datasets of different sources. Conclusions: COMSE provides an efficient unsupervised framework that selects highly informative genes in scRNA-seq data improving cell sub-states identification and cell clustering. It identifies gene subsets that reveal biological and technical heterogeneity, supporting applications like batch effect correction and pathway analysis. It also provides robust results for bulk RNA-seq data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. scPriorGraph: constructing biosemantic cell–cell graphs with prior gene set selection for cell type identification from scRNA-seq data.
- Author
-
Cao, Xiyue, Huang, Yu-An, You, Zhu-Hong, Shang, Xuequn, Hu, Lun, Hu, Peng-Wei, and Huang, Zhi-An
- Published
- 2024
38. STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks.
- Author
-
Li, Yawei and Luo, Yuan
- Published
- 2024
39. Identification of TFRC as a biomarker for pulmonary arterial hypertension based on bioinformatics and experimental verification.
- Author
-
Yang, Chuang, Liu, Yi-Hang, and Zheng, Hai-Kuo
- Subjects
PULMONARY arterial hypertension ,GENE expression ,LABORATORY rats ,GENE regulatory networks ,GENE expression profiling - Abstract
Background: Pulmonary arterial hypertension (PAH) is a life-threatening chronic cardiopulmonary disease. However, there is a paucity of studies that reflect the available biomarkers from separate gene expression profiles in PAH. Methods: The GSE131793 and GSE113439 datasets were combined for subsequent analyses, and batch effects were removed. Bioinformatic analysis was then performed to identify differentially expressed genes (DEGs). Weighted gene co-expression network analysis (WGCNA) and a protein-protein interaction (PPI) network analysis were then used to further filter the hub genes. Functional enrichment analysis of the intersection genes was performed using Gene Ontology (GO), Disease Ontology (DO), Kyoto encyclopedia of genes and genomes (KEGG) and gene set enrichment analysis (GSEA). The expression level and diagnostic value of hub gene expression in pulmonary arterial hypertension (PAH) patients were also analyzed in the validation datasets GSE53408 and GSE22356. In addition, target gene expression was validated in the lungs of a monocrotaline (MCT)-induced pulmonary hypertension (PH) rat model and in the serum of PAH patients. Results: A total of 914 differentially expressed genes (DEGs) were identified, with 722 upregulated and 192 downregulated genes. The key module relevant to PAH was selected using WGCNA. By combining the DEGs and the key module of WGCNA, 807 genes were selected. Furthermore, protein–protein interaction (PPI) network analysis identified HSP90AA1, CD8A, HIF1A, CXCL8, EPRS1, POLR2B, TFRC, and PTGS2 as hub genes. The GSE53408 and GSE22356 datasets were used to evaluate the expression of TFRC, which also showed robust diagnostic value. According to GSEA enrichment analysis, PAH-relevant biological functions and pathways were enriched in patients with high TFRC levels. 
Furthermore, TFRC expression was found to be upregulated in the lung tissues of our experimental PH rat model compared to those of the controls, and the same conclusion was reached in the serum of the PAH patients. Conclusions: According to our bioinformatics analysis, the observed increase of TFRC in the lung tissue of human PAH patients, as indicated by transcriptomic data, is consistent with the alterations observed in PAH patients and rodent models. These data suggest that TFRC may serve as a potential biomarker for PAH. [ABSTRACT FROM AUTHOR]
- Published
- 2024
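Several entries in this listing, including entry 39, rely on weighted gene co-expression network analysis (WGCNA). Its core construction is a soft-thresholded adjacency matrix: for an unsigned network, a_ij = |cor(x_i, x_j)|^beta. The sketch below shows only that step in plain Python. The expression values are toy numbers and `beta = 6` is an assumed exponent; the abstracts do not report the power they used, and the WGCNA R package itself is not called here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned co-expression adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for g in genes:
        for h in genes:
            # A gene is perfectly correlated with itself, so the diagonal is 1.
            adj[g][h] = 1.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
    return adj


# Toy expression profiles across five samples (made-up values).
expression = {
    "gene_a": [1, 2, 3, 4, 5],
    "gene_b": [2, 4, 6, 8, 10.5],
    "gene_c": [5, 1, 4, 2, 3],
}

adjacency = soft_threshold_adjacency(expression, beta=6)
for gene, row in adjacency.items():
    print(gene, {other: round(weight, 4) for other, weight in row.items()})
```

Raising the absolute correlation to a power shrinks weak correlations towards zero while leaving strong ones close to 1, which is what makes the resulting network weighted rather than hard-thresholded.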
40. Pathways to experienced coercion during psychiatric admission: a network analysis.
- Author
-
Silva, Benedetta, Morandi, Stéphane, Bachelard, Mizue, Bonsack, Charles, and Golay, Philippe
- Abstract
Background: In mental health care, experienced coercion, also known as perceived coercion, is defined as the patient's subjective experience of being submitted to coercion. Besides formal coercion, many other factors have been identified as potentially affecting the experience of being coerced. This study aimed to explore the interplay between these factors and to provide new insights into how they lead to experienced coercion. Methods: Cross-sectional network analysis was performed on data collected from 225 patients admitted to six psychiatric hospitals. Thirteen variables were selected and included in the analyses. A Gaussian Graphical Model (GGM) using Spearman's rank-correlation method and EBICglasso regularisation was estimated. Centrality indices of strength and expected influence were computed. To evaluate the robustness of the estimated parameters, both edge-weight accuracy and centrality stability were investigated. Results: The estimated network was densely connected. Formal coercion was only weakly associated with both experienced coercion at admission and during hospital stay. Experienced coercion at admission was most strongly associated with the patients' perceived level of implication in the decision-making process. Experienced humiliation and coercion during hospital stay, the most central node in the network, was found to be most strongly related to the interpersonal separation that patients perceived from staff, the level of coercion perceived upon admission and their satisfaction with the decision taken and the level of information received. Conclusions: Reducing formal coercion may not be sufficient to effectively reduce patients' feeling of being coerced. Different factors seemed indeed to come into play and affect experienced coercion at different stages of the hospitalisation process. 
Interventions aimed at reducing experienced coercion and its negative effects should take these stage-specific elements into account and propose tailored strategies to address them. [ABSTRACT FROM AUTHOR]
- Published
- 2024
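The two centrality indices named in the abstract of entry 40 can be read directly off a weighted adjacency matrix: strength sums the absolute edge weights attached to a node, and one-step expected influence sums the signed weights. The sketch below assumes a small symmetric weight matrix with invented values; the node labels are hypothetical and the numbers are not estimates from the study.

```python
def strength(weights):
    """Strength: sum of absolute edge weights attached to each node."""
    return [
        sum(abs(w) for j, w in enumerate(row) if j != i)
        for i, row in enumerate(weights)
    ]


def expected_influence(weights):
    """One-step expected influence: sum of signed edge weights attached to each node."""
    return [
        sum(w for j, w in enumerate(row) if j != i)
        for i, row in enumerate(weights)
    ]


# Hypothetical node labels and made-up edge weights (e.g. regularised partial correlations).
nodes = ["formal_coercion", "experienced_coercion", "perceived_involvement"]
W = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, -0.4],
    [0.0, -0.4, 0.0],
]

for name, s, ei in zip(nodes, strength(W), expected_influence(W)):
    print(f"{name}: strength={s:.2f}, expected influence={ei:.2f}")
```

The two indices differ only when negative edges are present: a node with one positive and one negative edge of equal size has high strength but an expected influence near zero.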
41. Ant colony optimization for the identification of dysregulated gene subnetworks from expression data.
- Author
-
Hanna, Eileen Marie, El Hasbani, Ghadi, and Azar, Danielle
- Abstract
Background: High-throughput experimental technologies can provide deeper insights into pathway perturbations in biomedical studies. Accordingly, their usage is central to the identification of molecular targets and the subsequent development of suitable treatments for various diseases. Classical interpretations of generated data, such as differential gene expression and pathway analyses, disregard interconnections between studied genes when looking for gene-disease associations. Given that these interconnections are central to cellular processes, there has been a recent interest in incorporating them in such studies. The latter allows the detection of gene modules that underlie complex phenotypes in gene interaction networks. Existing methods either impose radius-based restrictions or freely grow modules at the expense of a statistical bias towards large modules. We propose a heuristic method, inspired by Ant Colony Optimization, to apply gene-level scoring and module identification with distance-based search constraints and penalties, rather than radius-based constraints. Results: We test and compare our results to other approaches using three datasets of different neurodegenerative diseases, namely Alzheimer's, Parkinson's, and Huntington's, over three independent experiments. We report the outcomes of enrichment analyses and concordance of gene-level scores for each disease. Results indicate that the proposed approach generally shows superior stability in comparison to existing methods. It produces stable and meaningful enrichment results in all three datasets which have different case to control proportions and sample sizes. Conclusion: The presented network-based gene expression analysis approach successfully identifies dysregulated gene modules associated with a certain disease. Using a heuristic based on Ant Colony Optimization, we perform a distance-based search with no radius constraints. 
Experimental results support the effectiveness and stability of our method in prioritizing modules of high relevance. Our tool is publicly available at github.com/GhadiElHasbani/ACOxGS.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. Interactive molecular causal networks of hypertension using a fast machine learning algorithm MRdualPC.
- Author
-
Kelly, Jack, Xu, Xiaoguang, Eales, James M., Keavney, Bernard, Berzuini, Carlo, Tomaszewski, Maciej, and Guo, Hui
- Abstract
Background: Understanding the complex interactions between genes and their causal effects on diseases is crucial for developing targeted treatments and gaining insight into biological mechanisms. However, the analysis of molecular networks, especially in the context of high-dimensional data, presents significant challenges. Methods: This study introduces MRdualPC, a computationally tractable algorithm based on the MRPC approach, to infer large-scale causal molecular networks. We apply MRdualPC to investigate the upstream causal transcriptomics influencing hypertension using a comprehensive dataset of kidney genome and transcriptome data. Results: Our algorithm proves to be 100 times faster than MRPC on average in identifying transcriptomics drivers of hypertension. Through clustering, we identify 63 modules with causal driver genes, including 17 modules with extensive causal networks. Notably, we find that genes within one of the causal networks are associated with the electron transport chain and oxidative phosphorylation, previously linked to hypertension. Moreover, the identified causal ancestor genes show an over-representation of blood pressure-related genes. Conclusions: MRdualPC has the potential for broader applications beyond gene expression data, including multi-omics integration. While there are limitations, such as the need for clustering in large gene expression datasets, our study represents a significant advancement in building causal molecular networks, offering researchers a valuable tool for analyzing big data and investigating complex diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. Integrative analysis of transcriptome and metabolome provides insights into the mechanisms of leaf variegation in Heliopsis helianthoides.
- Author
-
Qin, Helan, Guo, Jia, Jin, Yingshan, Li, Zijing, Chen, Ju, Bie, Zhengwei, Luo, Chunyu, Peng, Feitong, Yan, Dongyan, Kong, Qinggang, Liang, Fang, Zhang, Hua, Hu, Xuefan, Cui, Rongfeng, and Cui, Xiuna
- Abstract
Background: In the field of ornamental horticulture, phenotypic mutations, particularly in leaf color, are of great interest due to their potential in developing new plant varieties. The introduction of variegated leaf traits in plants like Heliopsis helianthoides, a perennial herbaceous species with ecological adaptability, provides a rich resource for molecular breeding and research on pigment metabolism and photosynthesis. We aimed to explore the mechanism of leaf variegation of Heliopsis helianthoides (using the HY2021F1-0915 variegated mutant, named HY, and a green-leaf control, named CK, in April, May and June 2020) by analyzing the transcriptome and metabolome. Results: Leaf color and physiological parameters were found to be significantly different between HY and CK types. Chlorophyll content of HY was lower than that of CK samples. Combined with the result of Weighted Gene Co-expression Network Analysis (WGCNA), 26 consistently downregulated differentially expressed genes (DEGs) were screened in HY compared to CK subtypes. Among the DEGs, 9 genes were verified by qRT-PCR to be downregulated in HY relative to CK. The reduction of chlorophyll content in HY might be due to the downregulation of FSD2. Low expression level of PFE2, annotated as ferritin-4, might also contribute to the interveinal chlorosis of HY. Based on metabolome data, differential metabolites (DEMs) between HY and CK samples were significantly enriched in ABC transporters across the three months. Integrating the DEGs and DEMs showed enrichment in the carotenoid pathway. Downregulation of four carotenoid pigments might be one of the reasons for HY's light color. Conclusion: FSD2 and PFE2 (ferritin-4) were identified as key genes which likely contribute to the reduced chlorophyll content and interveinal chlorosis observed in HY. The differential metabolites were significantly enriched in ABC transporters. The carotenoid biosynthesis pathway was highlighted, with decreased pigments in HY individuals. 
These findings not only enhance our understanding of leaf variegation mechanisms but also offer valuable insights for future plant breeding strategies aimed at preserving and enhancing variegated-leaf traits in ornamental plants. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Radiogenomic analysis for predicting lymph node metastasis and molecular annotation of radiomic features in pancreatic cancer.
- Author
-
Tang, Yi, Su, Yi-xi, Zheng, Jin-mei, Zhuo, Min-ling, Qian, Qing-fu, Shen, Qing-ling, Lin, Peng, and Chen, Zhi-kui
- Subjects
LYMPHATIC metastasis ,PANCREATIC cancer ,MACHINE learning ,FEATURE extraction ,GENE regulatory networks - Abstract
Background: To provide a preoperative prediction model for lymph node metastasis in pancreatic cancer patients and provide molecular information of key radiomic features. Methods: Two cohorts comprising 151 and 54 pancreatic cancer patients were included in the analysis. Radiomic features from the tumor region of interests were extracted by using PyRadiomics software. We used a framework that incorporated 10 machine learning algorithms and generated 77 combinations to construct radiomics-based models for lymph node metastasis prediction. Weighted gene coexpression network analysis (WGCNA) was subsequently performed to determine the relationships between gene expression levels and radiomic features. Molecular pathways enrichment analysis was performed to uncover the underlying molecular features. Results: Patients in the in-house cohort (mean age, 61.3 years ± 9.6 [SD]; 91 men [60%]) were separated into training (n = 105, 70%) and validation (n = 46, 30%) cohorts. A total of 1,239 features were extracted and subjected to machine learning algorithms. The 77 radiomic models showed moderate performance for predicting lymph node metastasis, and the combination of the StepGBM and Enet algorithms had the best performance in the training (AUC = 0.84, 95% CI = 0.77–0.91) and validation (AUC = 0.85, 95% CI = 0.73–0.98) cohorts. We determined that 15 features were core variables for lymph node metastasis. Proliferation-related processes may respond to the main molecular alterations underlying these features. Conclusions: Machine learning-based radiomics could predict the status of lymph node metastasis in pancreatic cancer, which is associated with proliferation-related alterations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. Identification of steroid-induced osteonecrosis of the femoral head biomarkers based on immunization and animal experiments.
- Author
-
Luo, Dongqiang, Gao, Xiaolu, Zhu, Xianqiong, Wu, Jiayu, Yang, Qingyi, Xu, Ying, Huang, Yuxuan, He, Xiaolin, Li, Yan, and Gao, Pengfei
- Subjects
ANIMAL experimentation ,FEMUR head ,GENE ontology ,BIOMARKERS ,IDIOPATHIC femoral necrosis ,GENE regulatory networks ,GENE expression - Abstract
Background: Steroid-induced osteonecrosis of femoral head (SONFH) is a severe health risk, and this study aims to identify immune-related biomarkers and pathways associated with the disease through bioinformatics analysis and animal experiments. Method: Using SONFH-related datasets obtained from the GEO database, we performed differential expression analysis and weighted gene co-expression network analysis (WGCNA) to extract SONFH-related genes. A protein-protein interaction (PPI) network was then constructed, and core sub-network genes were identified. Immune cell infiltration and clustering analysis of SONFH samples were performed to assess differences in immune cell populations. WGCNA was used to identify module genes associated with immune cells, and hub genes were identified using machine learning. Internal and external validation along with animal experiments were conducted to confirm the differential expression of hub genes and infiltration of immune cells in SONFH. Results: Differential expression analysis revealed 502 DEGs. WGCNA identified a blue module closely related to SONFH, containing 1928 module genes. Intersection analysis between DEGs and blue module genes resulted in 453 intersecting genes. The PPI network and MCODE module identified 15 key targets enriched in various signaling pathways. Analysis of immune cell infiltration showed statistically significant differences in CD8+ T cells, monocytes, M2 macrophages and neutrophils between SONFH and control samples. Unsupervised clustering classified SONFH samples into two clusters (C1 and C2), which also exhibited significant differences in immune cell infiltration. The hub genes (ICAM1, NR3C1, and IKBKB) were further identified using WGCNA and machine learning analysis. Based on these hub genes, a clinical prediction model was constructed and validated internally and externally. 
Animal experiments confirmed the upregulation of hub genes in SONFH, with an associated increase in immune cell infiltration. Conclusion: This study identified ICAM1, NR3C1, and IKBKB as potential immune-related biomarkers involved in immune cell infiltration of CD8+ T cells, monocytes, M2 macrophages, neutrophils and other immune cells in the pathogenesis of SONFH. These biomarkers act through modulation of the chemokine signaling pathway, Toll-like receptor signaling pathway, and other pathways. These findings provide valuable insights into the disease mechanism of SONFH and may aid in future drug development efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. Comparative transcriptome analysis reveals the impact of daily temperature difference on male sterility in photo-thermo-sensitive male sterile wheat.
- Author
-
Niu, Fuqiang, Liu, Zihan, Liu, Yongjie, Bai, Jianfang, Zhang, Tianbao, Yuan, Shaohua, Bai, Xiucheng, Zhao, Changping, Zhang, Fengting, Sun, Hui, Zhang, Liping, and Song, Xiyue
- Abstract
Background: Photo-thermo-sensitive male sterility (PTMS), which refers to the male sterility triggered by variations in photoperiod and temperature, is a crucial element in the wheat two-line hybrid system. The development of safe production and efficient propagation for male sterile lines holds utmost importance in two-line hybrid wheat. Under the stable photoperiod condition, PTMS is mainly induced by high or low temperatures in wheat, but the effect of daily temperature difference (DTD) on the fertility conversion of PTMS lines has not been reported. Here, three BS type PTMS lines including BS108, BS138, and BS366, as well as a control wheat variety J411 were used to analyze the correlation between fertility and DTD using differential sowing tests, photo-thermo-control experiments, and transcriptome sequencing. Results: The differential sowing tests suggested that the optimal sowing time for safe seed production of the three PTMS lines was from October 5th to 25th in Dengzhou, China. Under the condition of 12 h 12 °C, the PTMS lines were greatly affected by DTD and exhibited complete male sterility at a temperature difference of 15 °C. Furthermore, under different temperature difference conditions, a total of 20,677 differentially expressed genes (DEGs) were obtained using RNA sequencing. Moreover, through weighted gene co-expression network analysis (WGCNA) and KEGG enrichment analysis, the identified DEGs had a close association with "starch and sucrose metabolism", "phenylpropanoid biosynthesis", "MAPK signaling pathway-plant", "flavonoid biosynthesis", and "cutin, suberine and wax biosynthesis". qRT-PCR analysis showed that the expression levels of core genes related to KEGG pathways significantly decreased at a temperature difference of 15 °C. Finally, we constructed a transcriptome-mediated network of temperature difference affecting male sterility. 
Conclusions: The findings provide important theoretical insights into the correlation between temperature difference and male sterility, providing guidance for the identification and selection of more secure and effective PTMS lines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration.
- Author
-
Yang, Xiuhui, Mann, Koren K., Wu, Hao, and Ding, Jun
- Published
- 2024
- Full Text
- View/download PDF
48. Unraveling pathogenesis, biomarkers and potential therapeutic agents for endometriosis associated with disulfidptosis based on bioinformatics analysis, machine learning and experiment validation.
- Author
-
Zhao, Xiaoxuan, Zhao, Yang, Zhang, Yuanyuan, Fan, Qingnan, Ke, Huanxiao, Chen, Xiaowei, Jin, Linxi, Tang, Hongying, Jiang, Yuepeng, and Ma, Jing
- Subjects
ENDOMETRIUM ,MACHINE learning ,HLA histocompatibility antigens ,BIOMARKERS ,APOPTOSIS ,ENDOMETRIOSIS - Abstract
Background: Endometriosis (EMs) is an enigmatic disease of yet-unknown pathogenesis. Disulfidptosis, a novel identified form of programmed cell death resulting from disulfide stress, stands a chance of treating diverse ailments. However, the potential roles of disulfidptosis-related genes (DRGs) in EMs remain elusive. This study aims to thoroughly explore the key disulfidptosis genes involved in EMs, and probe novel diagnostic markers and candidate therapeutic compounds from the aspect of disulfidptosis based on bioinformatics analysis, machine learning, and animal experiments. Results: Enrichment analysis on key module genes and differentially expressed genes (DEGs) of eutopic and ectopic endometrial tissues in EMs suggested that EMs was closely related to disulfidptosis. And then, we obtained 20 and 16 disulfidptosis-related DEGs in eutopic and ectopic endometrial tissue, respectively. The protein-protein interaction (PPI) network revealed complex interactions between genes, and screened nine and ten hub genes in eutopic and ectopic endometrial tissue, respectively. Furthermore, immune infiltration analysis uncovered distinct differences in the immunocyte, human leukocyte antigen (HLA) gene set, and immune checkpoints in the eutopic and ectopic endometrial tissues when compared with health control. Besides, the hub genes mentioned above showed a close correlation with the immune microenvironment of EMs. Furthermore, four machine learning algorithms were applied to screen signature genes in eutopic and ectopic endometrial tissue, including the binary logistic regression (BLR), the least absolute shrinkage and selection operator (LASSO), the support vector machine-recursive feature elimination (SVM-RFE), and the extreme gradient boosting (XGBoost). 
Model training and hyperparameter tuning were implemented on 80% of the data using a ten-fold cross-validation method, and tested in the testing sets which determined the excellent diagnostic performance of these models by six indicators (Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Accuracy, and Area Under Curve). And seven eutopic signature genes (ACTB, GYS1, IQGAP1, MYH10, NUBPL, SLC7A11, TLN1) and five ectopic signature genes (CAPZB, CD2AP, MYH10, OXSM, PDLIM1) were finally identified based on machine learning. The independent validation dataset also showed high accuracy of the signature genes (IQGAP1, SLC7A11, CD2AP, MYH10, PDLIM1) in predicting EMs. Moreover, we screened 12 specific compounds for EMs based on ectopic signature genes and the pharmacological impact of tretinoin on signature genes was further verified in the ectopic lesion in the EMs murine model. Conclusion: This study verified a close association between disulfidptosis and EMs based on bioinformatics analysis, machine learning, and animal experiments. Further investigation on the biological mechanism of disulfidptosis in EMs is anticipated to yield novel advancements for searching for potential diagnostic biomarkers and revolutionary therapeutic approaches in EMs. [ABSTRACT FROM AUTHOR]
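Editorial illustration (not part of the authors' abstract): the evaluation protocol described above, ten-fold cross-validation on a training portion followed by six diagnostic indicators, can be sketched in plain Python. The function names, the contiguous fold layout, and the example counts are assumptions made for illustration; the abstract does not say how the folds were drawn or which software was used.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n samples.

    Assumption: contiguous, unshuffled folds; real pipelines usually shuffle
    or stratify first.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def diagnostic_indicators(tp, fp, tn, fn):
    """Five threshold-based indicators from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a random positive
    case scores higher than a random negative case (ties count one half)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in scores_pos for q in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, chosen only to exercise the functions.
    print(diagnostic_indicators(tp=40, fp=10, tn=45, fn=5))
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
    print(sum(1 for _ in k_fold_indices(80, k=10)))
```

The pairwise form of the AUC is quadratic in sample size, which is acceptable for a small sketch; rank-based formulas give the same value more efficiently.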
- Published
- 2024
- Full Text
- View/download PDF
49. Leveraging shortest dependency paths in low-resource biomedical relation extraction.
- Author
-
Enayati, Saman and Vucetic, Slobodan
- Subjects
NATURAL language processing - Abstract
Background: Biomedical Relation Extraction (RE) is essential for uncovering complex relationships between biomedical entities within text. However, training RE classifiers is challenging in low-resource biomedical applications with few labeled examples. Methods: We explore the potential of Shortest Dependency Paths (SDPs) to aid biomedical RE, especially in situations with limited labeled examples. In this study, we suggest various approaches to employ SDPs when creating word and sentence representations under supervised, semi-supervised, and in-context-learning settings. Results: Through experiments on three benchmark biomedical text datasets, we find that incorporating SDP-based representations enhances the performance of RE classifiers. The improvement is especially notable when working with small amounts of labeled data. Conclusion: SDPs offer valuable insights into the complex sentence structure found in many biomedical text passages. Our study introduces several straightforward techniques that, as demonstrated experimentally, effectively enhance the accuracy of RE classifiers. [ABSTRACT FROM AUTHOR]
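Editorial illustration (not part of the authors' abstract): a shortest dependency path is the shortest route between two entity tokens when a sentence's dependency parse is treated as an undirected graph. The sketch below finds that route with breadth-first search. The example sentence, its hand-written dependency edges, and the function name are hypothetical; the abstract does not say which parser or path-finding routine the authors used.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the list of tokens on the shortest path from source to target,
    treating (head, dependent) edges as undirected, or None if unreachable."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)

    parent = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through parents to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Hypothetical parse of "Aspirin reduces the risk of stroke in patients",
    # written by hand as (head, dependent) pairs.
    edges = [
        ("reduces", "Aspirin"),
        ("reduces", "risk"),
        ("risk", "the"),
        ("risk", "stroke"),
        ("stroke", "of"),
        ("reduces", "patients"),
        ("patients", "in"),
    ]
    print(shortest_dependency_path(edges, "Aspirin", "stroke"))
```

Tokens off the path ("the", "of", "in", "patients") are dropped, which is the intuition behind using the path as a compact representation of the relation between the two entities.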
- Published
- 2024
- Full Text
- View/download PDF
50. Choices of measures of association affect the visualisation and composition of the multimorbidity networks.
- Author
-
Baneshi, Mohammad Reza, Dobson, Annette, and Mishra, Gita D.
- Subjects
COMORBIDITY ,DEATH certificates ,ODDS ratio ,CHRONIC diseases - Abstract
Background: Network analysis, commonly used to describe the patterns of multimorbidity, uses the strength of association between conditions as weight to classify conditions into communities and calculate centrality statistics. Our aim was to examine the robustness of the results to the choice of weight. Methods: Data used on 27 chronic conditions listed on Australian death certificates for women aged 85+. Five statistics were calculated to measure the association between 351 possible pairs: odds ratio (OR), lift, phi correlation, Salton cosine index (SCI), and normalised-joint frequency of pairs (NF). Network analysis was performed on the 10% of pairs with the highest weight according to each definition, the 'top pairs'. Results: Out of 56 'top pairs' identified, 13 ones were consistent across all statistics. In networks of OR and lift, three of the conditions which did not join communities were among the top five most prevalent conditions. Networks based on phi and NF had one or two conditions not part of any community. For the SCI statistics, all three conditions which did not join communities had prevalence below 3%. Low prevalence conditions were more likely to have high degree in networks of OR and lift but not SCI. Conclusion: Use of different statistics to estimate weights leads to different networks. For exploratory purposes, one may apply alternative weights to identify a large list of pairs for further assessment in independent studies. However, when the aim is to visualise the data in a robust and parsimonious network, only pairs which are selected by multiple statistics should be visualised. [ABSTRACT FROM AUTHOR]
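Editorial illustration (not part of the authors' abstract): four of the five association statistics named above can be computed from a 2x2 table of co-occurrence counts for a pair of conditions. The sketch uses the standard textbook definitions, which may differ in detail from the authors' implementation. The normalised-joint frequency (NF) is omitted because the abstract does not define it. The counts are hypothetical, and the function name is an assumption.

```python
import math


def association_measures(both, a_only, b_only, neither):
    """Association statistics for two binary conditions A and B.

    both    : count with A and B
    a_only  : count with A but not B
    b_only  : count with B but not A
    neither : count with neither condition
    """
    n = both + a_only + b_only + neither
    n_a = both + a_only              # total with condition A
    n_b = both + b_only              # total with condition B
    return {
        # Odds ratio: odds of B among those with A over odds of B among those without A.
        "odds_ratio": (both * neither) / (a_only * b_only),
        # Lift: observed joint frequency over the frequency expected under independence.
        "lift": (both * n) / (n_a * n_b),
        # Phi: Pearson correlation between the two binary indicators.
        "phi": (both * neither - a_only * b_only)
        / math.sqrt(n_a * (n - n_a) * n_b * (n - n_b)),
        # Salton cosine index: joint count over the geometric mean of the marginals.
        "salton_cosine": both / math.sqrt(n_a * n_b),
    }


if __name__ == "__main__":
    # 27 conditions give 27 * 26 / 2 = 351 unordered pairs, as stated in the abstract.
    print(math.comb(27, 2))
    # Hypothetical counts for one pair of conditions in 1,000 records.
    print(association_measures(both=30, a_only=70, b_only=20, neither=880))
```

With these hypothetical counts the lift is 6.0 while the Salton cosine index is about 0.42, a small reminder that the statistics sit on different scales and can rank pairs differently, which is the abstract's point about the choice of weight.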
- Published
- 2024
- Full Text
- View/download PDF