383 results on '"Xiujuan Lei"'
Search Results
2. SeaConvNeXt: A Lightweight Two-Branch Network Architecture for Efficient Prediction of Specific IHC Proteins and Antigens on Hematoxylin and Eosin (H&E) Images
- Author
- Yuli Chen, Guoping Chen, Guoying Shi, Yao Zhou, Jiayang Bai, Germán Corredor, Cheng Lu, and Xiujuan Lei
- Subjects
immunohistochemistry (ihc), bi-stage registration based on density clustering (birec), automatic label generation, seaconvnext, attention mechanism, multi-level local and global features, virtual ihc staining prediction, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Immunohistochemistry (IHC) is a vital technique for detecting specific proteins and antigens in tissue sections using antibodies, aiding in the analysis of tumor growth and metastasis. However, IHC is costly and time-consuming, making it challenging to implement on a large scale. To address this issue, we introduce a method that enables virtual IHC staining directly on Hematoxylin and Eosin (H&E) images. Firstly, we have developed a novel registration technique, called Bi-stage Registration based on density Clustering (BiReC), to enhance the registration efficiency between H&E and IHC images. This method involves automatically generating numerous Regions Of Interest (ROI) labels on the H&E image for model training, with the labels being determined by the intensity of IHC staining. Secondly, we propose a novel two-branch network architecture, called SeaConvNeXt, which integrates a lightweight Squeeze-Enhanced Axial (SEA) attention mechanism to efficiently extract and fuse multi-level local and global features from H&E images for direct prediction of specific proteins and antigens. The SeaConvNeXt consists of a ConvNeXt branch and a global fusion branch. The ConvNeXt branch extracts multi-level local features at four stages, while the global fusion branch, including an SEA Transformer module and three global blocks, is designed for global feature extraction and multiple feature fusion. Our experiments demonstrate that SeaConvNeXt outperforms current state-of-the-art methods on two public datasets with corresponding IHC and H&E images, achieving an AUC of 90.7% on the HER2SC dataset and 82.5% on the CRC dataset. These results suggest that SeaConvNeXt has great potential for predicting virtual IHC staining on H&E images.
- Published
- 2024
- Full Text
- View/download PDF
3. Molecular Generation and Optimization of Molecular Properties Using a Transformer Model
- Author
- Zhongyin Xu, Xiujuan Lei, Mei Ma, and Yi Pan
- Subjects
molecular optimization, transformer, matched molecular pairs (mmps), logd, solubility, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical rules. Herein, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated molecule. The Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization properties. The main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix dimension. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with the baseline models with respect to their abilities to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.
- Published
- 2024
- Full Text
- View/download PDF
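The abstract of entry 3 mentions encoding logD and solubility through threshold intervals and state changes. The following is only a minimal sketch of what such an encoding could look like; the interval edges, the solubility cut-off, and all function and token names are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical interval edges for the change in logD (assumption, not from the paper).
LOGD_EDGES = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Hypothetical cut-off separating "low" from "high" solubility (assumption).
SOLUBILITY_THRESHOLD = 50.0


def encode_logd_change(source_logd: float, target_logd: float) -> str:
    """Return a token naming the half-open interval [lower, upper) that
    contains the logD difference between target and source molecule."""
    delta = target_logd - source_logd
    idx = bisect_right(LOGD_EDGES, delta)
    lower = "-inf" if idx == 0 else f"{LOGD_EDGES[idx - 1]:.1f}"
    upper = "inf" if idx == len(LOGD_EDGES) else f"{LOGD_EDGES[idx]:.1f}"
    return f"logD_change_{lower}_to_{upper}"


def solubility_state(value: float) -> str:
    """Classify a solubility value as 'low' or 'high' using the cut-off."""
    return "high" if value >= SOLUBILITY_THRESHOLD else "low"


def encode_solubility_change(source: float, target: float) -> str:
    """Return a token describing the state change from source to target."""
    return f"solubility_{solubility_state(source)}->{solubility_state(target)}"


if __name__ == "__main__":
    print(encode_logd_change(1.2, 1.9))          # -> logD_change_0.5_to_1.0
    print(encode_solubility_change(10.0, 80.0))  # -> solubility_low->high
```

Tokens of this kind could be prepended to the source molecule string before it is passed to a sequence model, which is one common way to condition a transformer on desired property changes.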
4. Whole-genome identification and expression profiling of growth-regulating factor (GRF) and GRF-interacting factor (GIF) gene families in Panax ginseng
- Author
- Ping Wang, Ying Xiao, Min Yan, Yan Yan, Xiujuan Lei, Peng Di, and Yingping Wang
- Subjects
Growth-regulating factor, GRF-interacting factor, Panax ginseng, Expression pattern, Cis-acting elements, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Panax ginseng is a perennial herb and one of the most widely used traditional medicines in China. During its long growth period, it is affected by various environmental factors. Past studies have shown that growth-regulating factors (GRFs) and GRF-interacting factors (GIFs) are involved in regulating plant growth and development, responding to environmental stress, and responding to the induction of exogenous hormones. However, GRF and GIF transcription factors in ginseng have not been reported. Results: In this study, 20 GRF gene members of ginseng were systematically identified and found to be distributed on 13 chromosomes. The ginseng GIF gene family has only ten members, which are distributed on ten chromosomes. Phylogenetic analysis divided these PgGRFs into six clades and PgGIFs into two clades. In total, 18 of the 20 PgGRFs and eight of the ten PgGIFs are segmental duplications. Most PgGRF and PgGIF gene promoters contain some hormone- and stress-related cis-regulatory elements. Based on the available public RNA-Seq data, the expression patterns of PgGRF and PgGIF genes were analysed from 14 different tissues. The responses of the PgGRF gene to different hormones (6-BA, ABA, GA3, IAA) and abiotic stresses (cold, heat, drought, and salt) were studied. The expression of the PgGRF gene was significantly upregulated under GA3 induction and three weeks of heat treatment. The expression level of the PgGIF gene changed only slightly after one week of heat treatment. Conclusions: The results of this study may be helpful for further study of the function of PgGRF and PgGIF genes and lay a foundation for further study of their role in the growth and development of Panax ginseng.
- Published
- 2023
- Full Text
- View/download PDF
5. RMDGCN: Prediction of RNA methylation and disease associations based on graph convolutional network with attention mechanism
- Author
- Lian Liu, Yumeng Zhou, and Xiujuan Lei
- Subjects
Biology (General), QH301-705.5
- Published
- 2023
6. Identification and analysis of BAHD superfamily related to malonyl ginsenoside biosynthesis in Panax ginseng
- Author
- Ping Wang, Yan Yan, Min Yan, Xiangmin Piao, Yingping Wang, Xiujuan Lei, He Yang, Nanqi Zhang, Wanying Li, Peng Di, and Limin Yang
- Subjects
Panax ginseng, BAHD gene family, malonyltransferase, malonyl ginsenoside, biosynthesis, Plant culture, SB1-1110
- Abstract
Introduction: The BAHD (benzylalcohol O-acetyl transferase, anthocyanin O-hydroxycinnamoyl transferase, N-hydroxycinnamoyl anthranilate benzoyl transferase and deacetylvindoline 4-O-acetyltransferase) superfamily has various biological functions in plants, including catalyzing the biosynthesis of terpenes, phenolics and esters, participating in plant stress response, affecting cell stability, and regulating fruit quality. Methods: Bioinformatics methods, real-time fluorescence quantitative PCR technology, and ultra-high-performance liquid chromatography combined with an Orbitrap mass spectrometer were used to explore the relationship between the BAHD gene family and malonyl ginsenosides in Panax ginseng. Results: In this study, 103 BAHD genes were identified in P. ginseng, mainly distributed in three major clades. Most PgBAHDs contain cis-acting elements associated with abiotic stress response and plant hormone response. Among the 103 genes, 68 PgBAHDs are WGD (whole-genome duplication) genes. The significance of malonylation in biosynthesis has garnered considerable attention in the study of malonyltransferases. The phylogenetic tree results showed that 34 PgBAHDs were clustered with genes that have malonyl characterization. Among them, seven PgBAHDs (PgBAHD4, 45, 65, 74, 90, 97, and 99) showed correlations > 0.9 with crucial enzyme genes involved in ginsenoside biosynthesis and > 0.8 with malonyl ginsenosides. These seven genes were considered potential candidates involved in the biosynthesis of malonyl ginsenosides. Discussion: These results help elucidate the structure, evolution, and functions of the P. ginseng BAHD gene family, and establish the foundation for further research on the mechanism of BAHD genes in ginsenoside biosynthesis.
- Published
- 2023
- Full Text
- View/download PDF
7. Fragment-pair based drug molecule solubility prediction through attention mechanism
- Author
- Jianping Liu, Xiujuan Lei, Chunyan Ji, and Yi Pan
- Subjects
drug discovery, drug molecules, solubility prediction, attention mechanism, fragments, Therapeutics. Pharmacology, RM1-950
- Abstract
The purpose of drug discovery is to identify new drugs, and the solubility of drug molecules is an important physicochemical property in medicinal chemistry that plays a crucial role in drug discovery. In solubility prediction, high-precision computational methods can significantly reduce the experimental costs and time associated with drug development. Therefore, artificial intelligence technologies have been widely used for solubility prediction. This study utilized the attention mechanism in the deep learning model to consider the atomic-level features of the molecules, and used gated recurrent neural networks to aggregate vectors between layers. It also utilized molecular fragment technology to divide the complete molecule into pairs of fragments, extracted characteristics from each fragment pair, and finally fused the characteristics to predict the solubility of drug molecules. We compared our method with five existing models using two performance evaluation indicators, demonstrating that our method has better performance and greater robustness.
- Published
- 2023
- Full Text
- View/download PDF
8. Molecular analysis of the 14-3-3 genes in Panax ginseng and their responses to heat stress
- Author
- Qi Wang, Wenyue Peng, Junbo Rong, Mengyang Zhang, Wenhao Jia, Xiujuan Lei, and Yingping Wang
- Subjects
14-3-3, Abiotic stresses, Heat stress, qRT-PCR, Gene structure, Phylogenetic analysis, Medicine, Biology (General), QH301-705.5
- Abstract
Background: Panax ginseng is a perennial and semi-shady herb with tremendous medicinal value. Due to its unique botanical characteristics, ginseng is vulnerable to various abiotic factors during its growth and development, especially high temperatures. Proteins encoded by 14-3-3 genes form a highly conserved protein family that widely exists in eukaryotes. The 14-3-3 family regulates the vital movement of cells and plays an essential role in the response of plants to abiotic stresses, including high temperatures. Currently, there is no relevant research on the 14-3-3 genes of ginseng. Methods: The identification of the ginseng 14-3-3 gene family was mainly based on ginseng genomic data and Hidden Markov Models (HMM). We used bioinformatics-related databases and tools to analyze the gene structure, physicochemical properties, cis-acting elements, gene ontology (GO), phylogenetic tree, interacting proteins, and transcription factor regulatory networks. We analyzed the transcriptome data of different ginseng tissues to clarify the expression pattern of the 14-3-3 gene family in ginseng. The expression level and modes of 14-3-3 genes under heat stress were analyzed by quantitative real-time PCR (qRT-PCR) technology to determine the genes in the 14-3-3 gene family responding to high-temperature stress. Results: In this study, 42 14-3-3 genes were identified from the ginseng genome and renamed PgGF14-1 to PgGF14-42. Gene structure and evolutionary relationship research divided PgGF14s into epsilon (ε) and non-epsilon (non-ε) groups, mainly located in four evolutionary branches. The gene structure and motif remained highly consistent within a subgroup. The physicochemical properties and structure of the predicted PgGF14 proteins conformed to the essential characteristics of 14-3-3 proteins.
RNA-seq results indicated that the detected PgGF14s existed in different organs and tissues but differed in abundance; their expression was higher in roots, stems, leaves, and fruits but lower in seeds. The analysis of GO, cis-acting elements, interacting proteins, and regulatory networks of transcription factors indicated that PgGF14s might participate in physiological processes, such as response to stress, signal transduction, material synthesis-metabolism, and cell development. The qRT-PCR results indicated PgGF14s had multiple expression patterns under high-temperature stress with different change trends in several treatment times, and 38 of them had an apparent response to high-temperature stress. Furthermore, PgGF14-5 was significantly upregulated, and PgGF14-4 was significantly downregulated in all treatment times. This research lays a foundation for further study on the function of 14-3-3 genes and provides theoretical guidance for investigating abiotic stresses in ginseng.
- Published
- 2023
- Full Text
- View/download PDF
9. Shifts in rhizosphere microbial communities in Oplopanax elatus Nakai are related to soil chemical properties under different growth conditions
- Author
- Wanying Li, Xiujuan Lei, Rui Zhang, Qingjun Cao, He Yang, Nanqi Zhang, Shuangli Liu, and Yingping Wang
- Subjects
Medicine, Science
- Abstract
Plant growth environment plays an important role in shaping soil microbial communities. To understand the response of rhizosphere soil microbial communities in Oplopanax elatus Nakai to a change in growth conditions from natural habitat to cultivation after transplanting, a comparative study of soil chemical properties and microbial communities using high-throughput sequencing was conducted under cultivated conditions (CT) and natural conditions (WT) in Changbai Mountain, Northeast China. The results showed that rhizosphere soil in CT had higher pH and lower content of soil organic matter (SOM) and available nitrogen compared to WT. These changes influenced rhizosphere soil microbial communities, resulting in higher soil bacterial and fungal richness and diversity in CT soil, an increased relative abundance of the bacterial phyla Acidobacteria, Chloroflexi, Gemmatimonadetes, Firmicutes and Patescibacteria and the fungal phyla Mortierellomycota and Zoopagomycota, and a decreased relative abundance of the bacterial phyla Actinobacteria, WPS-2, Gemmatimonadetes, and Verrucomicrobia and the fungal phyla Ascomycota and Basidiomycota. Redundancy analysis indicated that soil pH and SOM were the primary environmental drivers shaping the rhizosphere soil microbial community in O. elatus under varied growth conditions. Therefore, more attention should be paid to soil nutrition management, especially organic fertilizer inputs, in O. elatus cultivation.
- Published
- 2022
- Full Text
- View/download PDF
10. CircR2Disease v2.0: An Updated Web Server for Experimentally Validated circRNA–disease Associations and Its Application
- Author
- Chunyan Fan, Xiujuan Lei, Jiaojiao Tie, Yuchen Zhang, Fang-Xiang Wu, and Yi Pan
- Subjects
circRNA, circRNA–disease association, Graph convolutional network, Gradient boosting decision tree, Machine learning, Biology (General), QH301-705.5, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
With accumulating dysregulated circular RNAs (circRNAs) in pathological processes, the regulatory functions of circRNAs, especially circRNAs as microRNA (miRNA) sponges and their interactions with RNA-binding proteins (RBPs), have been widely validated. However, the collected information on experimentally validated circRNA–disease associations is only preliminary. Therefore, an updated CircR2Disease database providing a comprehensive resource and web tool to clarify the relationships between circRNAs and diseases in diverse species is necessary. Here, we present an updated CircR2Disease v2.0 with the increased number of circRNA–disease associations and novel characteristics. CircR2Disease v2.0 provides more than 5-fold experimentally validated circRNA–disease associations compared to its previous version. This version includes 4201 entries between 3077 circRNAs and 312 disease subtypes. Secondly, the information of circRNA–miRNA, circRNA–miRNA–target, and circRNA–RBP interactions has been manually collected for various diseases. Thirdly, the gene symbols of circRNAs and disease name IDs can be linked with various nomenclature databases. Detailed descriptions such as samples and journals have also been integrated into the updated version. Thus, CircR2Disease v2.0 can serve as a platform for users to systematically investigate the roles of dysregulated circRNAs in various diseases and further explore the posttranscriptional regulatory function in diseases. Finally, we propose a computational method named circDis based on the graph convolutional network (GCN) and gradient boosting decision tree (GBDT) to illustrate the applications of the CircR2Disease v2.0 database. CircR2Disease v2.0 is available at http://bioinfo.snnu.edu.cn/CircR2Disease_v2.0 and https://github.com/bioinforlab/CircR2Disease-v2.0.
- Published
- 2022
- Full Text
- View/download PDF
11. A Comparison of the Physiological Traits and Gene Expression of Brassinosteroids Signaling under Drought Conditions in Two Chickpea Cultivars
- Author
- Khatereh Felagari, Bahman Bahramnejad, Adel Siosemardeh, Khaled Mirzaei, Xiujuan Lei, and Jian Zhang
- Subjects
chickpea, drought stress, abiotic stress tolerance, gene expression, proline, BES1, Agriculture
- Abstract
This study aimed to investigate the effects of drought stress at the flowering stage on the physiological and molecular responses of the genes involved in the brassinosteroid pathway of two chickpea cultivars (ILC1799: drought tolerant, and ILC3279: drought sensitive). The drought resulted in significant reductions in chlorophyll a, chlorophyll b, total chlorophyll and carotenoid content in both cultivars, but had significantly smaller effects on the tolerant cultivar, Samin (ILC1799), than on ILC3279. However, the relative water content, the osmotic potential and the cell membrane stability were less affected by drought in both cultivars. The proline content and peroxidase activity increased significantly under drought stress in both cultivars, with a higher amount in Samin (ILC1799). Members of the BES1 family positively mediate brassinosteroid signaling and play an important role in regulating plant stress responses. The expression of these genes was analyzed in chickpea cultivars under drought. Further, a genome-wide analysis of BES1 genes in the chickpea genome was conducted. Six CaBES1 genes were identified in total, and their phylogenetic tree, gene structures, and conserved motifs were determined. CaBES1 gene expression patterns were analyzed using a transcription database and quantitative real-time PCR analysis. The results revealed that the expression of CaBES1 genes differs in response to various plant stresses. The expression levels of CaBES1.1, CaBES1.2, CaNAC72 and CaRD26 genes were measured by using qRT-PCR. The relative expression of CaBES1.2 in the drought conditions was significantly downregulated. In contrast to CaBES1.1 and CaBES1.2, the expression of CaRD26 and CaNAC72 showed a significant increase under drought stress. The expression of CaRD26 and CaNAC72 genes was significantly higher in the Samin cultivar than in the ILC3279 cultivar.
- Published
- 2023
- Full Text
- View/download PDF
12. A Deep Neural Network for Cervical Cell Classification Based on Cytology Images
- Author
- Ming Fang, Xiujuan Lei, Bo Liao, and Fang-Xiang Wu
- Subjects
Cell image classification, cervical cell detection, deep learning, neural networks, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Cervical cancer is one of the most common cancers among women. Fortunately, cervical cancer is treatable if it is diagnosed timely and administered appropriately. The death rate of cervical cancer has been greatly reduced since Pap smear test was applied. However, Pap smear test is a time-consuming and error-prone process. Moreover, classifying cervical cells into different categories is clinically meaningful but also challenging in the field of cervical cancer detection. To address these concerns, computer-aided diagnosis systems with deep learning need to be designed to automatically analyze cervical cytology images. In this study, we construct a deep convolutional neural network with feature representations learned via multiple kernels with different sizes to automatically classify cervical cytology images, named DeepCELL. Firstly, we design three different basic modules of DeepCELL to capture feature information via multiple kernels with different sizes. Then, we stack several such basic modules to form the cervical cell classification model. Finally, we perform a series of experiments to evaluate the proposed method on two cervical cytology datasets: Herlev and SIPaKMeD. Our method achieves the accuracy of 95.628%, precision of 95.685%, recall of 95.647% and F-score of 95.636% on SIPaKMeD dataset, which are the highest among all competing methods. Similarly, our method also achieves satisfactory result on Herlev dataset. In summary, extensive experimental results demonstrate that our proposed method has a promising performance in cervical cell image classification.
- Published
- 2022
- Full Text
- View/download PDF
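The abstract of entry 12 reports accuracy, precision, recall and F-score for a multi-class classifier. As a rough illustration of how such figures are commonly computed, the sketch below uses macro averaging over classes; the abstract does not state which averaging scheme the authors used, so that choice, the function name `classification_metrics`, and the toy labels are all assumptions.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall and F-score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f_scores = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


if __name__ == "__main__":
    # Toy labels for three hypothetical cell classes (not data from the paper).
    truth = [0, 0, 1, 1, 2, 2]
    predicted = [0, 1, 1, 1, 2, 0]
    for name, value in classification_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

Macro averaging weights every class equally, which matters on imbalanced datasets; a weighted or micro average would give different numbers for the same predictions.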
13. A dual graph neural network for drug–drug interactions prediction based on molecular structure and interactions
- Author
- Mei Ma and Xiujuan Lei
- Subjects
Biology (General), QH301-705.5
- Abstract
Expressive molecular representation plays critical roles in researching drug design, while effective methods are beneficial to learning molecular representations and solving related problems in drug discovery, especially for drug-drug interactions (DDIs) prediction. Recently, a lot of work has been put forward using graph neural networks (GNNs) to forecast DDIs and learn molecular representations. However, under the current GNNs structure, the majority of approaches learn drug molecular representation from one-dimensional string or two-dimensional molecular graph structure, while the interaction information between chemical substructure remains rarely explored, and it is neglected to identify key substructures that contribute significantly to the DDIs prediction. Therefore, we proposed a dual graph neural network named DGNN-DDI to learn drug molecular features by using molecular structure and interactions. Specifically, we first designed a directed message passing neural network with substructure attention mechanism (SA-DMPNN) to adaptively extract substructures. Second, in order to improve the final features, we separated the drug-drug interactions into pairwise interactions between each drug’s unique substructures. Then, the features are adopted to predict interaction probability of a DDI tuple. We evaluated DGNN–DDI on real-world dataset. Compared to state-of-the-art methods, the model improved DDIs prediction performance. We also conducted case study on existing drugs aiming to predict drug combinations that may be effective for the novel coronavirus disease 2019 (COVID-19). Moreover, the visual interpretation results proved that the DGNN-DDI was sensitive to the structure information of drugs and able to detect the key substructures for DDIs. These advantages demonstrated that the proposed method enhanced the performance and interpretation capability of DDI prediction modeling. 
Author summary: Drug-drug interactions (DDIs) may cause adverse effects that damage the body. Therefore, it is critical to predict potential drug-drug interactions. The majority of prediction techniques still rely on the similarity hypothesis for drugs, sometimes neglect the molecular structure, and fail to include the interaction information between chemical substructures when predicting DDIs. We exploited this idea to develop and confirm the role that molecular structure and interaction information between chemical substructures play in DDI prediction. The model includes a molecular substructure extraction framework to explain why substructures contribute differently to DDI prediction, and a co-attention mechanism to explain why the interaction information between chemical substructures can improve DDI prediction. Compared to state-of-the-art methods, the model improved the performance of DDI prediction on a real-world dataset. Furthermore, it could identify crucial components of treatment combinations that might be effective against the emerging coronavirus disease 2019 (COVID-19).
- Published
- 2023
14. Drug repositioning based on heterogeneous networks and variational graph autoencoders
- Author
- Song Lei, Xiujuan Lei, and Lian Liu
- Subjects
drug repositioning, heterogeneous network, variational graph autoencoders, graph representation learning, COVID-19, Therapeutics. Pharmacology, RM1-950
- Abstract
Predicting new therapeutic effects (drug repositioning) of existing drugs plays an important role in drug development. However, traditional wet experimental prediction methods are usually time-consuming and costly. The emergence of more and more artificial intelligence-based drug repositioning methods in the past 2 years has facilitated drug development. In this study, we propose a drug repositioning method, VGAEDR, based on a heterogeneous network of multiple drug attributes and a variational graph autoencoder. First, a drug-disease heterogeneous network is established based on three drug attributes, disease semantic information, and known drug-disease associations. Second, low-dimensional feature representations for heterogeneous networks are learned through a variational graph autoencoder module and a multi-layer convolutional module. Finally, the feature representation is fed to a fully connected layer and a Softmax layer to predict new drug-disease associations. Comparative experiments with other baseline methods on three datasets demonstrate the excellent performance of VGAEDR. In the case study, we predicted the top 10 possible anti-COVID-19 drugs from the existing drug and disease data, and six of them were verified by other published studies.
- Published
- 2022
- Full Text
- View/download PDF
15. Drug Repositioning with GraphSAGE and Clustering Constraints Based on Drug and Disease Networks
- Author
- Yuchen Zhang, Xiujuan Lei, Yi Pan, and Fang-Xiang Wu
- Subjects
drug reposition, graph neural network, GraphSAGE, matrix factorization, clustering constraint, COVID-19, Therapeutics. Pharmacology, RM1-950
- Abstract
The understanding of therapeutic properties is important in drug repositioning and drug discovery. However, chemical or clinical trials are expensive and inefficient for characterizing the therapeutic properties of drugs. Recently, artificial intelligence (AI)-assisted algorithms have received extensive attention for discovering the potential therapeutic properties of drugs and speeding up drug development. In this study, we propose a new method based on GraphSAGE and clustering constraints (DRGCC) to investigate the potential therapeutic properties of drugs for drug repositioning. First, the drug structure features and disease symptom features are extracted. Second, the drug–drug interaction network and disease similarity network are constructed according to the drug–gene and disease–gene relationships. Matrix factorization is adopted to extract the clustering features of networks. Then, all the features are fed to the GraphSAGE to predict new associations between existing drugs and diseases. Benchmark comparisons on two different datasets show that our method has reliable predictive performance and outperforms six other competing methods. We have also conducted case studies on existing drugs and diseases and aimed to predict drugs that may be effective for the novel coronavirus disease 2019 (COVID-19). Among the predicted anti-COVID-19 drug candidates, some drugs are being clinically studied by pharmacologists, and their binding sites to COVID-19-related protein receptors have been found via the molecular docking technology.
- Published
- 2022
- Full Text
- View/download PDF
16. Metapath Aggregated Graph Neural Network and Tripartite Heterogeneous Networks for Microbe-Disease Prediction
- Author
- Yali Chen and Xiujuan Lei
- Subjects
microbe-disease associations, heterogeneous network, metapath aggregated graph neural network, multi-head attention mechanism, COVID-19, Microbiology, QR1-502
- Abstract
More and more studies have shown that understanding microbe-disease associations can not only reveal the pathogenesis of diseases, but also promote the diagnosis and prognosis of diseases. Because traditional medical experiments are time-consuming and expensive, many computational methods have been proposed in recent years to identify potential microbe-disease associations. In this study, we propose a method based on heterogeneous network and metapath aggregated graph neural network (MAGNN) to predict microbe-disease associations, called MATHNMDA. First, we introduce microbe-drug interactions, drug-disease associations, and microbe-disease associations to construct a microbe-drug-disease heterogeneous network. Then we take the heterogeneous network as input to MAGNN. Second, for each layer of MAGNN, we carry out intra-metapath aggregation with a multi-head attention mechanism to learn the structural and semantic information embedded in the target node context, the metapath-based neighbor nodes, and the context between them, by encoding the metapath instances under the metapath definition mode. We then use inter-metapath aggregation with an attention mechanism to combine the semantic information of all different metapaths. Third, we can get the final embedding of microbe nodes and disease nodes based on the output of the last layer in the MAGNN. Finally, we predict potential microbe-disease associations by reconstructing the microbe-disease association matrix. In addition, we evaluated the performance of MATHNMDA by comparing it with that of its variants and some state-of-the-art methods on different datasets. The results suggest that MATHNMDA is an effective prediction method. The case studies on asthma, inflammatory bowel disease (IBD), and coronavirus disease 2019 (COVID-19) further validate the effectiveness of MATHNMDA.
- Published
- 2022
- Full Text
- View/download PDF
17. Identifying the sequence specificities of circRNA-binding proteins based on a capsule network architecture
- Author
- Zhengfeng Wang and Xiujuan Lei
- Subjects
Circular RNA, RNA-binding protein, Sequence specificities, Capsule network, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Circular RNAs (circRNAs) are widely expressed in cells and tissues and are involved in biological processes and human diseases. Recent studies have demonstrated that circRNAs can interact with RNA-binding proteins (RBPs), which is considered an important aspect for investigating the function of circRNAs. Results: In this study, we design a slight variant of the capsule network, called circRB, to identify the sequence specificities of circRNAs binding to RBPs. In this model, the sequence features of circRNAs are extracted by convolution operations, and then, two dynamic routing algorithms in a capsule network are employed to discriminate between different binding sites by analysing the convolution features of binding sites. The experimental results show that the circRB method outperforms the existing computational methods. Afterwards, the trained models are applied to detect the sequence motifs on the seven circRNA-RBP bound sequence datasets and matched to known human RNA motifs. Some motifs on circular RNAs overlap with those on linear RNAs. Finally, we also predict binding sites on the reported full-length sequences of circRNAs interacting with RBPs, attempting to assist current studies. We hope that our model will contribute to better understanding the mechanisms of the interactions between RBPs and circRNAs. Conclusion: Given the scarcity of studies on the sequence specificities of circRNA-binding proteins, we designed a classification framework called circRB based on the capsule network. The results show that the circRB method is effective and achieves higher prediction accuracy than other methods.
- Published
- 2021
- Full Text
- View/download PDF
18. CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
- Author
-
Yuchen Zhang, Xiujuan Lei, Zengqiang Fang, and Yi Pan
- Subjects
circular rnas (circrnas) ,circrna-disease associations ,metapath2vec++ ,matrix factorization ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are being discovered using high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method to predict circRNA-disease associations, based on metapath2vec++ and matrix factorization with integrated multiple data (called PCD_MVMF). To construct more reliable networks, various aspects are considered. Firstly, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Secondly, metapath2vec++ is applied on an integrated heterogeneous network to learn the embedded features and initial prediction score. Finally, we use matrix factorization, take similarity as a constraint, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and f-measure are adopted to evaluate the performance of PCD_MVMF. These evaluation metrics verify that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.
- Published
- 2020
- Full Text
- View/download PDF
19. Matrix factorization with neural network for predicting circRNA-RBP interactions
- Author
-
Zhengfeng Wang and Xiujuan Lei
- Subjects
circRNA ,RNA binding protein ,Matrix factorization ,Neural networks ,Positive unlabeled learning ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Circular RNA (circRNA) has been extensively identified in cells and tissues, and plays crucial roles in human diseases and biological processes. circRNAs can act as dynamic scaffolding molecules that modulate protein-protein interactions. The interactions between circRNAs and RNA Binding Proteins (RBPs) are also deemed an essential element underlying the functions of circRNAs. Because biological experimental technologies are costly and labor-intensive, high-throughput experimental data have instead enabled the large-scale prediction and analysis of circRNA-RBP interactions. Results: A computational framework is constructed by employing Positive Unlabeled learning (P-U learning) to predict unknown circRNA-RBP interaction pairs with the kernel model MFNN (Matrix Factorization with Neural Networks). The neural network is employed to extract the latent factors of circRNAs and RBPs in the interaction matrix, and the P-U learning strategy is applied to alleviate the imbalanced characteristics of the data samples and predict unknown interaction pairs. For this purpose, the known circRNA-RBP interaction data samples are collected from the circRNAs in cancer cell lines database (CircRic), and the circRNA-RBP interaction matrix is constructed as the input of the model. The experimental results show that kernel MFNN outperforms the other deep kernel models. Interestingly, we find that adding more hidden layers to the neural network does not necessarily improve our model. Finally, the unlabeled interactions are scored using P-U learning with the MFNN kernel, and the predicted interaction pairs are matched to the known interactions database. The results indicate that our method is an effective model for analyzing circRNA-RBP interactions. Conclusion: For the poorly studied circRNA-RBP interactions, we design a prediction framework based only on the interaction matrix by employing matrix factorization and a neural network. We demonstrate that MFNN is an effective method that achieves higher prediction accuracy.
- Published
- 2020
- Full Text
- View/download PDF
20. Identification of Pathway-Specific Protein Domain by Incorporating Hyperparameter Optimization Based on 2D Convolutional Neural Network
- Author
-
Ali Ghulam, Xiujuan Lei, Yuchen Zhang, Shi Cheng, and Min Guo
- Subjects
Molecular structure prediction ,deep learning ,convolutional neural network ,evolutionary knowledge ,multiple features ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Pathway-specific protein domains (PSPDs) are associated with specific pathways. Many protein domains are pervasive in various biological processes, whereas other domains are linked to specific pathways. Many human disease pathways, such as cancer pathways and signaling pathway-related diseases, have caused the loss of functional PSPDs. Therefore, the creation of an accurate method to predict their roles is a critical step toward understanding human disease and pathways. In this study, we proposed a deep learning model based on a two-dimensional convolutional neural network (2D-CNN-PSPD) for pathway-specific protein domain association prediction. In terms of the purposes of a sub-pathway, its parent pathway and its super pathway are linked to the Uni-Pathway. We also proposed a dipeptide composition (DPC) model and a dipeptide deviation (DDE) model of feature extraction profiles as PSSM. Then, we predicted the proteins associated with the same sub-pathway or with the same organism. The DDE and DPC models of the PSSM feature profile served as input to our proposed 2D-CNN method. We tuned several parameters to optimize the model's output performance and used the hyperparameter optimization approach to find the best model for our dataset based on the 10-fold cross-validation results. Ultimately, we assessed the predictive performance of the current model by using independent datasets and cross-validation datasets. Therefore, we enhanced the efficiency of deep learning methods. A PSPD is involved in a known pathway and then follows the association with other proteins at different stages of the pathway hierarchy. Our proposed method could identify 2D-CNN-PSPD with 0.83% sensitivity, 0.92% specificity, 87.27% accuracy, and 0.75% accuracy. We provided an important method for the analysis of PSPD proteins in the proposed research, and our achievements might promote computational biological research. In future work, we plan to extend the proposed model architecture with the latest features and a multi-one structure to predict different types of molecules, such as DNA, RNA, and disease-pathway specific protein associations.
- Published
- 2020
- Full Text
- View/download PDF
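The dipeptide composition (DPC) feature named in the abstract of entry 20 can be illustrated on a raw amino-acid sequence. This is only a minimal sketch under stated assumptions: the abstract describes DPC and DDE profiles derived from a PSSM, whereas the version below simply counts adjacent residue pairs in the sequence itself. The names `AMINO_ACIDS`, `DIPEPTIDES` and `dipeptide_composition` are hypothetical and do not come from the paper.

```python
from itertools import product

# The 20 standard amino acids; every ordered pair gives one of 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence.

    Assumption: plain sequence-based DPC, not the PSSM-based variant
    mentioned in the abstract.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Normalise counts so the vector sums to 1.
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence: the pairs are AC, CA, AC, CA.
features = dipeptide_composition("ACACA")
```

A fixed-length vector like this can be reshaped (for example to 20 by 20) before being passed to a 2D convolutional model, which is one plausible reason such features pair well with a 2D-CNN.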
21. Disease-Pathway Association Prediction Based on Random Walks With Restart and PageRank
- Author
-
Ali Ghulam, Xiujuan Lei, Min Guo, and Chen Bian
- Subjects
Pathway similarity network ,disease similarity network ,disease pathway association ,PageRank algorithm ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The study of disease-pathway association in human diseases is a perennial focus of the biomedical field. The association of diseases and pathways can help in the discovery of the mechanisms or relationships of human diseases. The accuracy of disease identification has been less than satisfactory despite decades of research in this area. Therefore, this study proposes a computational model for the prediction of disease-pathway associations. The proposed computational model is based on Random Walk with Restart on heterogeneous network (RWRH) and PageRank. The RWRH disease-pathway association model is a novel computational model that can predict potential disease-pathway associations. Furthermore, the model can help pathologists understand the correlations among disease-pathway associations, treatments, and reactions. We performed a pathway-based study to expand disease variation relationships and to find new molecular correlations between genetic mutations. We constructed a biological network on the basis of shared gene interactions of disease-pathways and attempted to investigate the pathogenesis of a disease by analyzing the constructed network. The network construction was based on two parts. First, the similarity between pathway-pathway networks was calculated. Second, a disease-disease (DD) similarity network was constructed, and the correlation between disease and disease similarity was calculated. We also investigated the pathway seed node and disease seed node with high PageRank. Moreover, we focused on mining the complexity of disease-pathway associations. We used the bipartite network of disease-pathway associations to combine the obtained biological information, which was based on the pair similarity of sequence expression weights. These weights, which were obtained by using the multilayer resource-allocation algorithm, were used to calculate the prediction scores of each disease-pathway pair. 
Here, through leave-one-out cross-validation, we examined a $210\times1855$ matrix, with the 210 rows representing diseases and 1855 columns indicating pathways. The disease-pathway adjacency matrix contained 13,838 known disease-pathway associations. The best predictive results achieved an area-under-the-curve value of 0.8218 and a two-class precision-recall curve. These results indicate that our method has higher scientific performance than previously proposed methods. We predicted pathogen, DD, and disease-pathway relationships by comparing them with known associations and through publication search. We then proposed the possible reasons for our predictions.
- Published
- 2020
- Full Text
- View/download PDF
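Entry 21 relies on random walk with restart (RWR). As a rough illustration, the standard update is p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalised adjacency matrix, r is the restart probability and p(0) places the probability mass on the seed nodes. The sketch below runs this on a single small network; the paper's heterogeneous-network variant (RWRH) and its PageRank step are not reproduced. The function name, the toy graph and the value r = 0.7 are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    n = len(adjacency)
    # Column-normalise the adjacency matrix (columns without edges stay zero).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Start with the probability mass spread uniformly over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds=[0])
```

Nodes closer to the seed receive higher steady-state scores, which is the property used to rank candidate associations.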
22. ISGm1A: Integration of Sequence Features and Genomic Features to Improve the Prediction of Human m1A RNA Methylation Sites
- Author
-
Lian Liu, Xiujuan Lei, Jia Meng, and Zhen Wei
- Subjects
Epitranscriptome ,m¹A ,site prediction ,sequence features ,genomic features ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
As a new epitranscriptomic modification, N1-methyladenosine (m1A) plays an important role in the regulation of gene expression. Although some computational methods have been proposed to predict m1A modification sites, all of these methods apply machine learning predictions based on nucleotide sequence features, and they miss the layer of information in transcript topology and RNA secondary structures. To enhance the prediction model of m1A RNA methylation, we proposed a computational framework, ISGm1A, which stands for integration of sequence features and genomic features to improve the prediction of human m1A RNA methylation sites. Based on the random forest algorithm, ISGm1A takes advantage of both conventional sequence features and 75 genomic characteristics to improve the prediction performance of m1A sites in humans. The results of five-fold cross validation and independent test show that ISGm1A outperforms other prediction algorithms (AUC = 0.903 and 0.909). In addition, through analyzing the importance of features, we found that the genomic features play a more important role in site prediction than the sequence features. Furthermore, with ISGm1A, we generated a high accuracy map of m1A by predicting all adenine sites in the transcriptome. The data and the results of the study are freely accessible at: https://github.com/lianliu09/m1a_prediction.git.
- Published
- 2020
- Full Text
- View/download PDF
23. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics
- Author
-
Lian Liu, Bowen Song, Jiani Ma, Yi Song, Song-Yao Zhang, Yujiao Tang, Xiangyu Wu, Zhen Wei, Kunqi Chen, Jionglong Su, Rong Rong, Zhiliang Lu, João Pedro de Magalhães, Daniel J. Rigden, Lin Zhang, Shao-Wu Zhang, Yufei Huang, Xiujuan Lei, Hui Liu, and Jia Meng
- Subjects
Epitranscriptome ,RNA modification ,Bioinformatics approaches ,Recent progress ,Future perspective ,Biotechnology ,TP248.13-248.65 - Abstract
Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has been made possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptome, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives.
- Published
- 2020
- Full Text
- View/download PDF
24. Detecting overlapping protein complexes in weighted PPI network based on overlay network chain in quotient space
- Author
-
Jie Zhao and Xiujuan Lei
- Subjects
Protein complexes ,Gene ontology ,Quotient space ,Granular computation ,Clustering ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Protein complexes are the cornerstones of many biological processes; they assemble into various types of molecular machinery that perform a vast array of biological functions. In fact, a protein may belong to multiple protein complexes. Most existing protein complex detection algorithms cannot reflect overlapping protein complexes. To solve this problem, a novel overlapping protein complex identification algorithm is proposed. Results: In this paper, a new clustering algorithm based on an overlay network chain in quotient space, denoted ONCQS, is proposed to detect overlapping protein complexes in weighted PPI networks. In the quotient space, a multilevel overlay network is constructed by using the maximal complete subgraph to mine overlapping protein complexes. The GO annotation data is used to weight the PPI network. According to the compatibility relation, the overlay network chain in quotient space is calculated. The protein complexes are contained in the last level of the overlay network. The experiments were carried out on four PPI databases, and ONCQS was compared with five other state-of-the-art methods in the identification of protein complexes. Conclusions: We applied ONCQS to four PPI databases (DIP, Gavin, Krogan and MIPS); the results show that it is superior to five existing algorithms (MCODE, MCL, CORE, ClusterONE and COACH) in detecting overlapping protein complexes.
- Published
- 2019
- Full Text
- View/download PDF
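Entry 24 builds its overlay network from maximal complete subgraphs (maximal cliques). The sketch below shows only that enumeration step, using the basic Bron-Kerbosch recursion without pivoting; the quotient-space overlay chain and the GO-based edge weighting of ONCQS are not reproduced. The toy network and the function name `maximal_cliques` are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate all maximal cliques of an undirected graph.

    `graph` maps each node to the set of its neighbours.
    Basic Bron-Kerbosch recursion, no pivoting.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No node can extend the clique and none was excluded: it is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            # Move the node from the candidate set to the excluded set.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical PPI network: two triangles that share protein C.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}
complex_candidates = maximal_cliques(ppi)
```

Because protein C appears in both cliques, the example shows why clique-based seeds naturally allow overlapping complexes.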
25. Prediction of miRNA-circRNA Associations Based on k-NN Multi-Label with Random Walk Restart on a Heterogeneous Network
- Author
-
Zengqiang Fang and Xiujuan Lei
- Subjects
mirna-circrna associations ,heterogeneous network ,multi-label ,random walk restart ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Circular RNAs (circRNAs) play important roles in various biological processes, as essential non-coding RNAs that have effects on transcriptional and posttranscriptional gene expression regulation. Recently, many studies have shown that circRNAs can be regarded as micro RNA (miRNA) sponges, which are known to be associated with certain diseases. Therefore efficient computation methods are needed to explore miRNA-circRNA interactions, but only very few computational methods for predicting the associations between miRNAs and circRNAs exist. In this study, we adopt an improved random walk computational method, named KRWRMC, to express complicated associations between miRNAs and circRNAs. Our major contributions can be summed up in two points. First, in the conventional Random Walk Restart Heterogeneous (RWRH) algorithm, the computational method simply converts the circRNA/miRNA similarity network into the transition probability matrix; in contrast, we take the influence of the neighbor of the node in the network into account, which can suggest or stress some potential associations. Second, our proposed KRWRMC is the first computational model to calculate large numbers of miRNA-circRNA associations, which can be regarded as biomarkers to diagnose certain diseases and can thus help us to better understand complicated diseases. The reliability of KRWRMC has been verified by Leave One Out Cross Validation (LOOCV) and 10-fold cross validation, the results of which indicate that this method achieves excellent performance in predicting potential miRNA-circRNA associations.
- Published
- 2019
- Full Text
- View/download PDF
26. Identifying Cancer genes by combining two-rounds RWR based on multiple biological data
- Author
-
Wenxiang Zhang, Xiujuan Lei (IEEE member), and Chen Bian
- Subjects
Identify cancer genes ,Quadruple layer heterogeneous network ,Two-rounds random walk with restart ,Multiple biological data ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Identifying cancer genes is an urgent task that enables us to understand the mechanisms of biochemical processes at a biomolecular level and facilitates the development of bioinformatics. Although a large number of methods have been proposed to identify cancer genes in recent years, the biological data utilized by most of these methods remain limited, which reflects insufficient consideration of the relationship between genes and diseases from a variety of factors. Results: In this paper, we propose a two-rounds random walk algorithm to identify cancer genes based on multiple biological data (TRWR-MB), including the protein-protein interaction (PPI) network, pathway network, microRNA similarity network, lncRNA similarity network, cancer similarity network and protein complexes. In the first-round random walk, all cancer nodes, cancer-related genes, cancer-related microRNAs and cancer-related lncRNAs associated with all the cancers are used as seed nodes, and then a random walker walks on a quadruple layer heterogeneous network constructed from multiple biological data. The first-round random walk aims to select the top-k scoring potential cancer genes. Then, in the second-round random walk, genes, microRNAs and lncRNAs associated with a specific cancer in the corresponding cancer class are regarded as seed nodes, and the walker walks on a new quadruple layer heterogeneous network constructed from lncRNAs, microRNAs, cancers and the selected potential cancer genes. After both walks finish, we combine the results of the two rounds of RWR as the ranking score for experimental analysis. As a result, a higher value of the area under the receiver operating characteristic curve (AUC) is obtained. In addition, case studies for identifying new cancer genes are performed. Conclusion: In summary, TRWR-MB integrates multiple biological data to identify cancer genes by analyzing the relationship between genes and cancer from a variety of biomolecular perspectives.
- Published
- 2019
- Full Text
- View/download PDF
27. Predicting metabolite-disease associations based on KATZ model
- Author
-
Xiujuan Lei and Cheng Zhang
- Subjects
Metabolite-disease associations ,Heterogeneous network ,KATZ ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Analysis ,QA299.6-433 - Abstract
Background: A growing body of evidence shows that metabolites can respond to pathological changes. However, identifying disease-related metabolites is a major challenge in biology and medicine. Traditional medical equipment is not only limited in accuracy but also expensive and time-consuming. Therefore, it is necessary to take advantage of computational methods for predicting potential associations between metabolites and diseases. Results: In this study, we develop a computational method based on the KATZ algorithm to predict metabolite-disease associations (KATZMDA). Firstly, we extract data on metabolite-disease pairs from the latest version of the HMDB database as the material for prediction. Then we use disease semantic similarity and the improved disease Gaussian Interaction Profile (GIP) kernel similarity to obtain a more reliable disease similarity and enhance the predictive performance of our proposed computational method. In addition, the KATZ algorithm is applied in the domain of metabolomics for the first time. Conclusions: According to three kinds of cross-validation and case studies of three common diseases, KATZMDA can serve as an effective tool for predicting potential associations between metabolites and diseases.
- Published
- 2019
- Full Text
- View/download PDF
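The KATZ measure used in entry 27 scores a node pair by counting walks between them, with longer walks damped by a factor beta: S = beta A + beta^2 A^2 + ... + beta^k A^k. The sketch below computes this truncated sum on a tiny network. It is an illustration only: the integrated matrix that KATZMDA builds from similarity and association data is not reproduced, and the toy graph, `beta = 0.1` and `max_length = 3` are assumed values.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def katz_scores(adjacency, beta=0.1, max_length=3):
    """Truncated Katz index: sum over l = 1..max_length of beta**l * A**l."""
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # holds A**l, starting with l = 1
    for length in range(1, max_length + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adjacency)
    return scores


# Hypothetical network. Nodes 0 and 1 are metabolites, nodes 2 and 3 are diseases.
# Known associations: 0-2, 1-2 and 1-3. The pair (0, 3) is unobserved.
network = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
katz = katz_scores(network)
unobserved_pair_score = katz[0][3]
```

The only walk of length at most 3 from metabolite 0 to disease 3 is 0, 2, 1, 3, so the unobserved pair still receives a small positive score of beta cubed.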
28. Protein complex detection based on flower pollination mechanism in multi-relation reconstructed dynamic protein networks
- Author
-
Xiujuan Lei, Ming Fang, Ling Guo, and Fang-Xiang Wu
- Subjects
Protein complex ,Dynamic protein-protein interaction (PPI) network ,Essential protein ,Flower pollination algorithm ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Detecting protein complexes in protein-protein interaction (PPI) networks plays a significant part in bioinformatics. It enables us to better understand the structures and characteristics of biological systems. Methods: In this study, we present a novel algorithm, named Improved Flower Pollination Algorithm (IFPA), to identify protein complexes in multi-relation reconstructed dynamic PPI networks. Specifically, we first introduce a concept called co-essentiality, which considers protein essentiality to search for essential interactions. Then, we devise the multi-relation reconstructed dynamic PPI networks (MRDPNs) and discover the potential cores of protein complexes in MRDPNs. Finally, an IFPA algorithm is put forward based on the flower pollination mechanism to generate protein complexes by simulating the process by which pollen finds the optimal pollination plants, namely, attaching the peripheries to the corresponding cores. Results: The experimental results on three different datasets (DIP, MIPS and Krogan) show that our IFPA algorithm is superior to some representative methods in the prediction of protein complexes. Conclusions: Our proposed IFPA algorithm is powerful in protein complex detection by building multi-relation reconstructed dynamic protein networks and using an improved flower pollination algorithm. The experimental results indicate that our IFPA algorithm can obtain better performance than other methods.
- Published
- 2019
- Full Text
- View/download PDF
29. Predicting Metabolite–Disease Associations Based on LightGBM Model
- Author
-
Cheng Zhang, Xiujuan Lei, and Lian Liu
- Subjects
metabolite-disease associations ,light gradient boosting machine ,features ,computational method ,performance evaluation ,Genetics ,QH426-470 - Abstract
Metabolites have been shown to be closely related to the occurrence and development of many complex human diseases by a large number of biological experiments; investigating their correlation mechanisms is thus an important topic, which attracts many researchers. In this work, we propose a computational method named LGBMMDA, which is based on the Light Gradient Boosting Machine (LightGBM) to predict potential metabolite–disease associations. This method extracts the features from statistical measures, graph theoretical measures, and matrix factorization results, utilizing the principal component analysis (PCA) process to remove noise or redundancy. We evaluated our method compared with other used methods and demonstrated the better areas under the curve (AUCs) of LGBMMDA. Additionally, three case studies deeply confirmed that LGBMMDA has obvious superiority in predicting metabolite–disease pairs and represents a powerful bioinformatics tool.
- Published
- 2021
- Full Text
- View/download PDF
30. Predicting Metabolite-Disease Associations Based on Spy Strategy and ABC Algorithm
- Author
-
Xiujuan Lei, Cheng Zhang, and Yueyue Wang
- Subjects
metabolites ,disease ,associations ,spy strategy ,ABC algorithm ,Biology (General) ,QH301-705.5 - Abstract
In recent years, latent metabolite-disease associations have been a significant focus in the biomedical domain. More and more experimental evidence indicates that metabolites correlate with the diagnosis of complex human diseases. Several computational methods have been developed to detect potential metabolite-disease associations. In this article, we propose a novel method based on the spy strategy and an artificial bee colony (ABC) algorithm for metabolite-disease association prediction (SSABCMDA). Because many associations are missing among unconfirmed metabolite-disease pairs, the spy strategy is adopted to extract reliable negative samples from unconfirmed pairs. Considering the effects of parameters, the ABC algorithm is utilized to optimize parameters. In relevant cross-validation experiments, our method achieves excellent predictive performance. Moreover, three types of case studies are conducted on three common diseases to demonstrate the validity and utility of the SSABCMDA method. Relevant experimental results indicate that our method can predict potential associations between metabolites and diseases effectively.
- Published
- 2020
- Full Text
- View/download PDF
31. Prioritizing CircRNA–Disease Associations With Convolutional Neural Network Based on Multiple Similarity Feature Fusion
- Author
-
Chunyan Fan, Xiujuan Lei, and Yi Pan
- Subjects
circRNA-disease associations ,circRNA-miRNA interaction ,similarity kernel fusion ,feature matrix ,convolutional neural network ,Genetics ,QH426-470 - Abstract
Accumulating evidence shows that circular RNAs (circRNAs) have significant roles in human health and in the occurrence and development of diseases. Biological researchers have identified disease-related circRNAs that could be considered as potential biomarkers for clinical diagnosis, prognosis, and treatment. However, identification of circRNA–disease associations using traditional biological experiments is still expensive and time-consuming. In this study, we propose a novel method named MSFCNN for the task of circRNA–disease association prediction, involving two-layer convolutional neural networks on a feature matrix that fuses multiple similarity kernels and interaction features among circRNAs, miRNAs, and diseases. First, four circRNA similarity kernels and seven disease similarity kernels are constructed based on the biological or topological properties of circRNAs and diseases. Subsequently, the similarity kernel fusion method is used to integrate the similarity kernels into one circRNA similarity kernel and one disease similarity kernel, respectively. Then, a feature matrix for each circRNA–disease pair is constructed by integrating the fused circRNA similarity kernel and fused disease similarity kernel with interactions and features among circRNAs, miRNAs, and diseases. The features of circRNA–miRNA and disease–miRNA interactions are selected using principal component analysis. Finally, taking the constructed feature matrix as an input, we use two-layer convolutional neural networks to predict circRNA–disease association labels and mine potential novel associations. Five-fold cross validation shows that our proposed model outperforms conventional machine learning methods, including support vector machine, random forest, and multilayer perceptron approaches. Furthermore, case studies of predicted circRNAs for specific diseases and the top predicted circRNA–disease associations are analyzed. The results show that the MSFCNN model could be an effective tool for mining potential circRNA–disease associations.
- Published
- 2020
- Full Text
- View/download PDF
32. LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor
- Author
-
Lian Liu, Xiujuan Lei, Zengqiang Fang, Yujiao Tang, Jia Meng, and Zhen Wei
- Subjects
m6A ,lncRNA ,site prediction ,epitranscriptome ,ensemble model ,Genetics ,QH426-470 - Abstract
N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications, which plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells, and affecting cell differentiation and many others. Although a number of methods have been proposed to predict m6A RNA methylation sites, most of these methods aimed at general m6A sites prediction without noticing the uniqueness of the lncRNA methylation prediction problem. Since many lncRNAs do not have a polyA tail and cannot be captured in the polyA selection step of the most widely adopted RNA-seq library preparation protocol, lncRNA methylation sites cannot be effectively captured and are thus likely to be significantly underrepresented in existing experimental data affecting the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that the methylation sites of lncRNA and mRNA have different patterns exhibited in the extracted features and should be differently handled when making predictions. Due to the used experiment protocols, the number of known lncRNA m6A sites is limited, and insufficient to train a reliable predictor; thus, the performance can be improved by combining both lncRNA and mRNA data using an ensemble predictor. We show that the newly developed LITHOPHONE approach achieved a reasonably good performance when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), marking a substantial improvement compared with existing methods. 
Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
- Published
- 2020
- Full Text
- View/download PDF
33. Deep belief network–Based Matrix Factorization Model for MicroRNA-Disease Associations Prediction
- Author
-
Yulian Ding, Fei Wang, Xiujuan Lei, Bo Liao, and Fang-Xiang Wu
- Subjects
Evolution ,QH359-425 - Abstract
MicroRNAs (miRNAs) are small single-stranded noncoding RNAs that have been shown to play a critical role in regulating gene expression. In past decades, cumulative experimental studies have verified that miRNAs are implicated in many complex human diseases and might be potential biomarkers for various types of diseases. With the increase of miRNA-related data and the development of analysis methodologies, some computational methods have been developed for predicting miRNA-disease associations, which are more economical and time-saving than traditional biological experimental approaches. In this study, a novel computational model, deep belief network (DBN)-based matrix factorization (DBN-MF), is proposed for miRNA-disease association prediction. First, the raw interaction features of miRNAs and diseases were obtained from the miRNA-disease adjacency matrix. Second, 2 DBNs were used for unsupervised learning of the features of miRNAs and diseases, respectively, based on the raw interaction features. Finally, a classifier consisting of 2 DBNs and a cosine score function was trained with the initial weights of the DBNs from the previous step. During the training, the miRNA-disease adjacency matrix was factorized into 2 feature matrices for the representation of miRNAs and diseases, and the final prediction label was obtained according to the feature matrices. The experimental results show that the proposed model outperforms the state-of-the-art approaches in miRNA-disease association prediction based on 10-fold cross-validation. In addition, the effectiveness of our model was further demonstrated by case studies.
- Published
- 2020
- Full Text
- View/download PDF
34. Predicting Essential Proteins Based on Second-Order Neighborhood Information and Information Entropy
- Author
-
Jie Zhao and Xiujuan Lei
- Subjects
Essential proteins ,information entropy ,neighborhood information ,protein interaction networks ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Essential proteins are critical components of living organisms and indispensable to cellular life. Identifying essential proteins is critical for understanding the survival and development of organisms and the function of cellular machinery. Experimental methods are usually costly and time-consuming. To overcome these limitations, many computational methods have been proposed to discover essential proteins based on the topological features of PPI networks and other biological information. In this paper, a new method named NIE is proposed to predict essential proteins based on second-order neighborhood information and the information entropy of protein complexes and subcellular localization. Firstly, a number of studies have shown that RNA-Seq data is more advantageous than traditional gene expression data in predicting essential proteins. Meanwhile, data analysis shows that protein essentiality is closely related to subcellular localization information, protein complex information and protein GO terms. A weighted PPI network, which integrates GO term information with the Pearson correlation coefficient of RNA-Seq data, is constructed to reduce the impact of false-positive and false-negative data on the identification of essential proteins. Secondly, the information entropy of protein complexes and subcellular localization is calculated to represent the biological characteristics of proteins. Furthermore, an information propagation model is constructed, which combines the biological properties of the proteins with the second-order neighborhood information in the PPI network to measure the essentiality of the proteins. In the experiments, the proposed method is applied to three common datasets (DIP, Krogan and MIPS) of Saccharomyces cerevisiae. A comparison with other commonly used algorithms, including LAC, NC, PeC, WDC, UC, LIDC and LBCC, is performed to evaluate the performance of NIE. The results show that NIE performs better in predicting essential proteins.
- Published
- 2019
- Full Text
- View/download PDF
35. Predicting Microbe-Disease Association by Learning Graph Representations and Rule-Based Inference on the Heterogeneous Network
- Author
-
Xiujuan Lei and Yueyue Wang
- Subjects
microbe-disease association ,heterogeneous network ,network embedding algorithm ,Node2vec ,skip-gram ,Microbiology ,QR1-502 - Abstract
More and more clinical observations have implied that microbes have great effects on human diseases. Understanding the relations between microbes and diseases is of profound significance for disease prevention and therapy. In this paper, we propose a predictive model based on the known microbe-disease associations to discover potential microbe-disease associations through integrating Learning Graph Representations and a modified Scoring mechanism on the Heterogeneous network (called LGRSH). Firstly, the similarity networks for microbes and diseases are obtained based on Gaussian interaction profile kernel similarity. Then, we construct a heterogeneous network including these two similarity networks and the microbe-disease association network. After that, the embedding algorithm Node2vec is applied to learn representations of nodes in the heterogeneous network. Finally, according to these low-dimensional vector representations, we calculate the relevance between each microbe and disease by utilizing a modified rule-based inference method. Compared with three other methods, LRLSHMDA, KATZHMDA and BiRWHMDA, LGRSH performs better. Moreover, in case studies of asthma, Chronic Obstructive Pulmonary Disease and Inflammatory Bowel Disease, 8, 8, and 10 of the top-10 discovered disease-related microbes were validated, respectively, demonstrating that LGRSH performs well in predicting potential microbe-disease associations.
- Published
- 2020
- Full Text
- View/download PDF
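As an illustration of the Gaussian interaction profile (GIP) kernel similarity mentioned in the abstract of record 35, the following is a minimal sketch using the commonly cited definition, in which the bandwidth is normalized by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are assumptions made for illustration; the abstract does not state the exact settings used in LGRSH.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is one entity's interaction profile, e.g. a microbe's binary
    vector of known disease associations. `gamma_prime` is a hypothetical
    bandwidth parameter; 1.0 is a common default in the literature.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix (rows: microbes, columns: diseases); illustrative only.
assoc = np.array([[1, 0, 1],
                  [0, 1, 0]])
microbe_sim = gip_kernel(assoc)      # similarity between microbes
disease_sim = gip_kernel(assoc.T)    # similarity between diseases
```

Transposing the association matrix yields the disease-side similarity from the same function, which is why a single helper is enough for both networks.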
36. Locating Multiple Optima via Brain Storm Optimization Algorithms
- Author
-
Shi Cheng, Junfeng Chen, Xiujuan Lei, and Yuhui Shi
- Subjects
Brain storm optimization ,swarm intelligence ,brain storm optimization in objective space algorithm ,multimodal optimization ,nonlinear equation systems ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Locating multiple optima/peaks in a single run and maintaining these found optima until the end of a run is the goal of multimodal optimization. Three variants of brain storm optimization (BSO) algorithms, namely the original BSO algorithm, the BSO in objective space algorithm with a Gaussian random variable, and the BSO in objective space algorithm with a Cauchy random variable, were utilized to solve multimodal optimization problems in this paper. Experimental tests were conducted on eight benchmark problems and, as applications, on seven nonlinear equation system problems. The performance and effectiveness of the various BSO algorithms in solving multimodal optimization problems were validated based on the experimental results. The results indicate that the global search ability and the solution maintenance ability of an algorithm need to be balanced when solving multimodal optimization problems.
- Published
- 2018
- Full Text
- View/download PDF
37. iOPTICS-GSO for identifying protein complexes from dynamic PPI networks
- Author
-
Xiujuan Lei, Huan Li, Aidong Zhang, and Fang-Xiang Wu
- Subjects
Ordering points to identify the clustering structure algorithm (OPTICS) ,Glowworm swarm optimization algorithm (GSO) ,Protein complex ,Density-based clustering ,Internal medicine ,RC31-1245 ,Genetics ,QH426-470 - Abstract
Background: Identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms. As plenty of evidence has indicated that dense sub-networks in a dynamic protein-protein interaction network (DPIN) usually correspond to protein complexes, identifying protein complexes is formulated as density-based clustering. Methods: In this paper, a new approach named iOPTICS-GSO is developed, which improves the Ordering Points to Identify the Clustering Structure (OPTICS) algorithm by using the Glowworm swarm optimization algorithm (GSO) to optimize the parameters in OPTICS when finding dense sub-networks. In iOPTICS-GSO, the concept of core node is redefined, the Euclidean distance in OPTICS is replaced with an improved similarity between nodes in the PPI network according to their interaction strength, and dense sub-networks are considered as protein complexes. Results: The experimental results show that iOPTICS-GSO outperforms algorithms such as DBSCAN, CFinder, MCODE, CMC, COACH, ClusterOne, MCL and OPTICS_PSO in terms of f-measure and p-value on four DPINs, which are from the DIP, Krogan, MIPS and Gavin datasets. In addition, the predicted protein complexes have small p-values and thus are highly likely to be true protein complexes. Conclusion: The proposed iOPTICS-GSO obtains optimal clustering results by adopting the GSO algorithm to optimize the parameters in OPTICS, and the results on four datasets show superior performance. Moreover, the results provide clues for biologists to verify and find new protein complexes.
- Published
- 2017
- Full Text
- View/download PDF
38. Predicting Metabolite-Disease Associations Based on Linear Neighborhood Similarity with Improved Bipartite Network Projection Algorithm
- Author
-
Xiujuan Lei and Cheng Zhang
- Subjects
Electronic computers. Computer science ,QA75.5-76.95 - Abstract
In recent years, a large number of clinical observations have shown that metabolites are involved in a variety of important human diseases. Nonetheless, the inherent noise and incompleteness of existing biological datasets limit the prediction accuracy of current computational methods. To solve this problem, this paper proposes a prediction method, IBNPLNSMDA, which uses an improved bipartite network projection method to predict latent metabolite-disease associations based on linear neighborhood similarity. Specifically, the linear neighborhood similarity matrix of metabolites (diseases) is reconstructed according to a new feature obtained from the known metabolite-disease associations and relevant integrated similarities. The improved bipartite network projection method is then adopted to infer the potential associations between metabolites and diseases. IBNPLNSMDA achieves reliable performance in LOOCV (AUC of 0.9634), outperforming the compared methods. In addition, in case studies of four common human diseases, simulation results confirm the utility of our method in discovering latent metabolite-disease pairs. Thus, we believe that IBNPLNSMDA could serve as a reliable computational tool for metabolite-disease association prediction.
- Published
- 2020
- Full Text
- View/download PDF
39. Predicting circRNA–Disease Associations Based on Improved Collaboration Filtering Recommendation System With Multiple Data
- Author
-
Xiujuan Lei, Zengqiang Fang, and Ling Guo
- Subjects
circRNA–disease association ,collaboration filtering ,multiple biological data ,recommendation system ,neighbor information ,Genetics ,QH426-470 - Abstract
With the development of high-throughput techniques, various biological molecules have been discovered, including circular RNAs (circRNAs). Circular RNA is a novel endogenous noncoding RNA that plays significant roles in regulating gene expression, moderating microRNA transcription as sponges, diagnosing diseases, and so on. Because of their particular closed-loop molecular structures, with neither 5′-3′ polarity nor polyadenylated tails, circRNAs are more stable and conserved than normal linear coding or noncoding RNAs, which makes circRNAs biomarkers of various diseases. Although some conventional experiments are used to identify the associations between circRNAs and diseases, almost all of these techniques and experiments are time-consuming and expensive. In this study, we propose a collaboration filtering recommendation system–based computational method, named ICFCDA, which handles the “cold start” problem to predict potential circRNA–disease associations. All the known circRNA–disease association data are downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). Based on these data, multiple data are extracted from different databases to calculate the circRNA similarity networks and the disease similarity networks. The collaboration filtering recommendation system algorithm is first employed to predict circRNA–disease associations. Then, leave-one-out cross validation is adopted to measure the performance of the proposed computational method. ICFCDA achieves an area under the curve of 0.946, which is better than other existing methods. To further illustrate the performance of ICFCDA, case studies of some common diseases are conducted, and the results are confirmed by other databases. The experimental results show that ICFCDA is competent in predicting circRNA–disease associations.
- Published
- 2019
- Full Text
- View/download PDF
40. deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks
- Author
-
Ping Luo, Yulian Ding, Xiujuan Lei, and Fang-Xiang Wu
- Subjects
deep learning ,convolutional neural networks ,driver gene prediction ,cancer mutations ,gene similarity network ,Genetics ,QH426-470 - Abstract
With the advances in high-throughput technologies, millions of somatic mutations have been reported in the past decade. Identifying driver genes with oncogenic mutations from these data is a critical and challenging problem. Many computational methods have been proposed to predict driver genes. Among them, machine learning-based methods usually train a classifier with representations that concatenate various types of features extracted from different kinds of data. Although successful, simply concatenating different types of features may not be the best way to fuse these data. We notice that a few types of data characterize the similarities of genes. To better integrate them with other data and improve the accuracy of driver gene prediction, this study proposes a deep learning-based method (deepDriver) that performs convolution on mutation-based features of genes and their neighbors in the similarity networks. The method allows the convolutional neural network to learn information within mutation data and similarity networks simultaneously, which enhances the prediction of driver genes. deepDriver achieves AUC scores of 0.984 and 0.976 on breast cancer and colorectal cancer, respectively, which are superior to those of the competing algorithms. Further evaluations of the top 10 predictions also demonstrate that deepDriver is valuable for predicting new driver genes.
- Published
- 2019
- Full Text
- View/download PDF
41. Prediction of disease-related metabolites using bi-random walks.
- Author
-
Xiujuan Lei and Jiaojiao Tie
- Subjects
Medicine ,Science - Abstract
Metabolites play a significant role in various complex human diseases. Exploring the relationship between metabolites and diseases can help us to better understand the underlying pathogenesis. Several network-based methods have been used to predict the association between metabolites and diseases. However, some methods ignored hierarchical differences in the disease network and failed to work in the absence of known metabolite-disease associations. This paper presents a bi-random walks based method for disease-related metabolite prediction, called MDBIRW. First, we reconstruct the disease similarity network and the metabolite functional similarity network by integrating the Gaussian Interaction Profile (GIP) kernel similarity of diseases and the GIP kernel similarity of metabolites, respectively. Then, the bi-random walks algorithm is executed on the reconstructed disease similarity network and metabolite functional similarity network to predict potential disease-metabolite associations. MDBIRW achieves reliable performance in leave-one-out cross validation (AUC of 0.910) and 5-fold cross validation (AUC of 0.924). The experimental results show that our method outperforms other existing methods for predicting disease-related metabolites.
- Published
- 2019
- Full Text
- View/download PDF
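The bi-random walk step described in the abstract of record 41 can be sketched as follows. This follows the generic bi-random walk scheme (a left walk on one similarity network and a right walk on the other, averaged at each step) and is not taken from the MDBIRW paper itself. The function names, the decay factor `alpha`, the step limits, and the toy matrices are all assumptions for illustration.

```python
import numpy as np

def sym_normalize(sim):
    """Symmetric normalization D^{-1/2} S D^{-1/2} of a similarity matrix."""
    sim = np.asarray(sim, dtype=float)
    deg = sim.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    positive = deg > 0
    inv_sqrt[positive] = 1.0 / np.sqrt(deg[positive])
    return sim * inv_sqrt[:, None] * inv_sqrt[None, :]

def bi_random_walk(metab_sim, disease_sim, assoc, alpha=0.3,
                   left_steps=2, right_steps=2):
    """Score metabolite-disease pairs (rows: metabolites, columns: diseases).

    alpha, left_steps and right_steps are hypothetical settings; the
    abstract does not report the values used by the authors.
    """
    assoc = np.asarray(assoc, dtype=float)
    total = assoc.sum()
    if total == 0:
        raise ValueError("no known associations to propagate")
    m_norm = sym_normalize(metab_sim)
    d_norm = sym_normalize(disease_sim)
    prior = assoc / total          # known associations as a restart distribution
    scores = prior.copy()
    for step in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if step <= left_steps:     # walk on the metabolite similarity network
            walks.append(alpha * (m_norm @ scores) + (1 - alpha) * prior)
        if step <= right_steps:    # walk on the disease similarity network
            walks.append(alpha * (scores @ d_norm) + (1 - alpha) * prior)
        scores = sum(walks) / len(walks)
    return scores

# Toy inputs; illustrative values only.
metab_sim = np.array([[1.0, 0.8], [0.8, 1.0]])
disease_sim = np.array([[1.0, 0.5], [0.5, 1.0]])
known = np.array([[1, 0], [0, 0]])
scores = bi_random_walk(metab_sim, disease_sim, known)
```

In this toy case the second metabolite has no known association, yet it receives a nonzero score for the first disease because it is similar to the first metabolite.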
42. BRWSP: Predicting circRNA-Disease Associations Based on Biased Random Walk to Search Paths on a Multiple Heterogeneous Network
- Author
-
Xiujuan Lei and Wenxiang Zhang
- Subjects
Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Circular RNAs (circRNAs) have significant effects on a variety of biological processes, and their dysfunction is closely related to the emergence and development of diseases. Therefore, identification of circRNA-disease associations will contribute to analysing the pathogenesis of diseases. Here, we present a computational model called BRWSP to predict circRNA-disease associations, which searches paths on a multiple heterogeneous network based on biased random walk. Firstly, BRWSP constructs a multiple heterogeneous network by using circRNAs, diseases, and genes. Then, the biased random walk algorithm runs on the multiple heterogeneous network to search paths between circRNAs and diseases. Finally, the performance of BRWSP is significantly better than that of the state-of-the-art algorithms. Furthermore, BRWSP contributes to the discovery of novel circRNA-disease associations.
- Published
- 2019
- Full Text
- View/download PDF
43. A Multiobjective Brain Storm Optimization Algorithm Based on Decomposition
- Author
-
Cai Dai and Xiujuan Lei
- Subjects
Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The brain storm optimization (BSO) algorithm is a simple and effective evolutionary algorithm. Some multiobjective brain storm optimization algorithms have low search efficiency. This paper combines decomposition technology with a multiobjective brain storm optimization algorithm (MBSO/D) to improve the search efficiency. Given weight vectors transform a multiobjective optimization problem into a series of subproblems. The decomposition technology determines the neighboring clusters of each cluster. Solutions of adjacent clusters generate new solutions to update the population. An adaptive selection strategy is used to balance exploration and exploitation. MBSO/D is compared with three efficient state-of-the-art algorithms, e.g., NSGAII and MOEA/D, on twenty-two test problems. The experimental results show that MBSO/D is more efficient than the compared algorithms and can improve the search efficiency for most test problems.
- Published
- 2019
- Full Text
- View/download PDF
44. A new method for predicting essential proteins based on participation degree in protein complex and subgraph density.
- Author
-
Xiujuan Lei and Xiaoqin Yang
- Subjects
Medicine ,Science - Abstract
Essential proteins are crucial to living cells. Identification of essential proteins from protein-protein interaction (PPI) networks can be applied to pathway analysis and function prediction; furthermore, it can contribute to disease diagnosis and drug design. Some experimental and computational methods have been designed to identify essential proteins; however, the prediction precision remains to be improved. In this paper, we propose a new method for identifying essential proteins based on the Participation degree of a protein in protein Complexes and Subgraph Density, named PCSD. To test the performance of PCSD, four PPI datasets (DIP, Krogan, MIPS and Gavin) are used to conduct experiments. The experimental results demonstrate that PCSD achieves better performance in predicting essential proteins than competing methods including DC, SC, EC, IC, LAC, NC, WDC, PeC and UDoNC. Compared with the most recent method, LBCC, PCSD can correctly predict more essential proteins from certain numbers of top-ranked proteins on the DIP dataset, which indicates that PCSD is very effective in discovering essential proteins in most cases.
- Published
- 2018
- Full Text
- View/download PDF
45. A Decomposition-Based Multiobjective Evolutionary Algorithm with Adaptive Weight Adjustment
- Author
-
Cai Dai and Xiujuan Lei
- Subjects
Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Recently, decomposition-based multiobjective evolutionary algorithms have performed well on multiobjective optimization problems (MOPs) and have attracted the attention of many scholars. Generally, a MOP is decomposed into a number of subproblems through a set of weight vectors with good uniformity and aggregate functions. The main role of weight vectors is to ensure the diversity and convergence of obtained solutions. However, these algorithms with uniform weight vectors cannot obtain a set of solutions with good diversity on some MOPs with complex Pareto optimal fronts (PFs) (i.e., PFs with a sharp peak or low tail, or discontinuous PFs). To deal with this problem, an improved decomposition-based multiobjective evolutionary algorithm with adaptive weight adjustment (IMOEA/DA) is proposed. Firstly, a new method based on uniform design and crowding distance is used to generate a set of weight vectors with good uniformity. Secondly, according to the distances of obtained nondominated solutions, an adaptive weight vector adjustment strategy is proposed to redistribute the weight vectors of subobjective spaces. Thirdly, a selection strategy is used to help each subobjective space to obtain a nondominated solution (if one exists). Compared with six efficient state-of-the-art algorithms, namely NSGAII, MOEA/D, MOEA/D-AWA, EMOSA, RVEA, and KnEA, on some benchmark functions, the proposed algorithm is able to find a set of solutions with better diversity and convergence.
- Published
- 2018
- Full Text
- View/download PDF
46. Identifying Cancer-Specific circRNA–RBP Binding Sites Based on Deep Learning
- Author
-
Zhengfeng Wang, Xiujuan Lei, and Fang-Xiang Wu
- Subjects
circrna ,rna binding protein ,cancer-specific ,convolutional neural network ,Organic chemistry ,QD241-441 - Abstract
Circular RNAs (circRNAs) are extensively expressed in cells and tissues, and play crucial roles in human diseases and biological processes. Recent studies have reported that circRNAs can function as RNA binding protein (RBP) sponges, while RBPs can also be involved in back-splicing. The interaction with RBPs is also considered an important factor for investigating the function of circRNAs. Hence, it is necessary to understand the interaction mechanisms of circRNAs and RBPs, especially in human cancers. Here, we present a novel method based on deep learning to identify cancer-specific circRNA–RBP binding sites (CSCRSites), using only the nucleotide sequences as the input. In CSCRSites, an architecture with multiple convolution layers is utilized to detect the features of the raw circRNA sequence fragments, and further identify the binding sites through a fully connected layer with a softmax output. The experimental results show that CSCRSites outperforms the conventional machine learning classifiers and some representative deep learning methods on the benchmark data. In addition, the features learnt by CSCRSites are converted to sequence motifs, some of which match known human RNA motifs involved in human diseases, especially cancer. Therefore, as a deep learning-based tool, CSCRSites could significantly contribute to the function analysis of cancer-associated circRNAs.
- Published
- 2019
- Full Text
- View/download PDF
47. Predicting Protein Complexes in Weighted Dynamic PPI Networks Based on ICSC
- Author
-
Jie Zhao, Xiujuan Lei, and Fang-Xiang Wu
- Subjects
Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Protein complexes play a critical role in understanding the biological processes and the functions of cellular mechanisms. Most existing protein complex detection algorithms cannot reflect dynamics of protein complexes. In this paper, a novel algorithm named Improved Cuckoo Search Clustering (ICSC) algorithm is proposed to detect protein complexes in weighted dynamic protein-protein interaction (PPI) networks. First, we constructed weighted dynamic PPI networks and detected protein complex cores in each dynamic subnetwork. Then, ICSC algorithm was used to cluster the protein attachments to the cores. The experimental results on both DIP dataset and Krogan dataset demonstrated that ICSC algorithm is more effective in identifying protein complexes than other competing methods.
- Published
- 2017
- Full Text
- View/download PDF
48. Feature Selection via Swarm Intelligence for Determining Protein Essentiality
- Author
-
Ming Fang, Xiujuan Lei, Shi Cheng, Yuhui Shi, and Fang-Xiang Wu
- Subjects
feature selection ,essential protein ,flower pollination algorithm ,machine learning ,protein-protein interaction (PPI) network ,Organic chemistry ,QD241-441 - Abstract
Protein essentiality is fundamental to comprehending the function and evolution of genes. The prediction of protein essentiality is pivotal in identifying disease genes and potential drug targets. Since experimental methods require large investments of time and funds, it is of great value to predict protein essentiality with high accuracy using computational methods. In this study, we present a novel feature selection method named Elite Search mechanism-based Flower Pollination Algorithm (ESFPA) to determine protein essentiality. Unlike other protein essentiality prediction methods, ESFPA uses an improved swarm intelligence–based algorithm for feature selection and selects optimal features for protein essentiality prediction. The first step is to collect numerous features that are highly predictive of essentiality. The second step is to develop a feature selection strategy based on a swarm intelligence algorithm to obtain the optimal feature subset. Furthermore, an elite search mechanism is adopted to further improve the quality of the feature subset. Subsequently, a hybrid classifier is applied to evaluate the essentiality of each protein. Finally, the experimental results show that our method is competitive with some well-known feature selection methods. The proposed method aims to provide a new perspective for protein essentiality determination.
- Published
- 2018
- Full Text
- View/download PDF
49. Neighbor Affinity-Based Core-Attachment Method to Detect Protein Complexes in Dynamic PPI Networks
- Author
-
Xiujuan Lei and Jing Liang
- Subjects
protein-protein interaction (PPI) network ,protein complexes ,neighbor affinity ,core-attachment ,Organic chemistry ,QD241-441 - Abstract
Protein complexes play significant roles in cellular processes. Identifying protein complexes from protein-protein interaction (PPI) networks is an effective strategy to understand biological processes and cellular functions. A number of methods have recently been proposed to detect protein complexes. However, most methods predict protein complexes from static PPI networks, and usually overlook the inherent dynamics and topological properties of protein complexes. In this paper, we propose a novel method, called NABCAM (Neighbor Affinity-Based Core-Attachment Method), to identify protein complexes from dynamic PPI networks. Firstly, the centrality score of every protein is calculated. The proteins with the highest centrality scores are regarded as the seed proteins. Secondly, the seed proteins are expanded to complex cores by calculating the similarity values between the seed proteins and their neighboring proteins. Thirdly, the attachments are appended to their corresponding protein complex cores by comparing the affinity among neighbors inside the core against that outside the core. Finally, filtering processes are carried out to obtain the final clustering result. The results on the DIP database show that the NABCAM algorithm can predict protein complexes effectively in comparison with other state-of-the-art methods. Moreover, many protein complexes predicted by our method are biologically significant.
- Published
- 2017
- Full Text
- View/download PDF
50. ICDFGF: Identification of potential circRNA-disease associations based on feature graph factorization.
- Author
-
Yuchen Zhang, Xiujuan Lei, Zhengfeng Wang, and Yi Pan
- Published
- 2024
- Full Text
- View/download PDF