385 results on '"Xiujuan Lei"'
Search Results
152. Predicting novel CircRNA-disease associations based on random walk and logistic regression model.
- Author
-
Yulian Ding, Bolin Chen, Xiujuan Lei, Bo Liao, and Fang-Xiang Wu
- Published
- 2020
- Full Text
- View/download PDF
153. Relational completion based non-negative matrix factorization for predicting metabolite-disease associations.
- Author
-
Xiujuan Lei, Jiaojiao Tie, and Hamido Fujita
- Published
- 2020
- Full Text
- View/download PDF
154. Application of Improved Particle Swarm Optimization Algorithm in UCAV Path Planning.
- Author
-
Qianzhi Ma and Xiujuan Lei
- Published
- 2009
- Full Text
- View/download PDF
155. Mobile Robot Path Planning with Complex Constraints Based on the Second-Order Oscillating Particle Swarm Optimization Algorithm.
- Author
-
Qianzhi Ma, Xiujuan Lei, and Qun Zhang
- Published
- 2009
- Full Text
- View/download PDF
156. 2-D Maximum-Entropy Thresholding Image Segmentation Method Based on Second-Order Oscillating PSO.
- Author
-
Xiujuan Lei and Ali Fu
- Published
- 2009
- Full Text
- View/download PDF
157. The aircraft departure scheduling based on particle swarm optimization combined with simulated annealing algorithm.
- Author
-
Fu Ali, Xiujuan Lei, and Xiao Xiao
- Published
- 2008
- Full Text
- View/download PDF
158. The aircraft departure scheduling based on second-order oscillating particle swarm optimization algorithm.
- Author
-
Xiujuan Lei, Fu Ali, and Zhongke Shi
- Published
- 2008
- Full Text
- View/download PDF
159. Two-Dimensional Maximum Entropy Image Segmentation Method Based on Quantum-Behaved Particle Swarm Optimization Algorithm.
- Author
-
Xiujuan Lei and Ali Fu
- Published
- 2008
- Full Text
- View/download PDF
160. Air robot path planning based on Intelligent Water Drops optimization.
- Author
-
Haibin Duan, Senqi Liu, and Xiujuan Lei
- Published
- 2008
- Full Text
- View/download PDF
161. The Variations, Combination Strategies Analysis of Particle Swarm Optimization.
- Author
-
Xiujuan Lei and Zhongke Shi
- Published
- 2007
- Full Text
- View/download PDF
162. Optimization and Simulation of a MO Problem Solved by GAs and PSO.
- Author
-
Xiujuan Lei, Zhongke Shi, and Laijun Wang
- Published
- 2007
- Full Text
- View/download PDF
163. Machine learning approaches for predicting biomolecule–disease associations
- Author
-
Yulian Ding, Xiujuan Lei, Bo Liao, and Fang-Xiang Wu
- Subjects
media_common.quotation_subject; Disease; Biology; Machine learning; computer.software_genre; Biochemistry; Non-negative matrix factorization; Machine Learning; 03 medical and health sciences; 0302 clinical medicine; Prediction methods; Genetics; Feature (machine learning); Humans; Representation (mathematics); Molecular Biology; 030304 developmental biology; media_common; 0303 health sciences; business.industry; Deep learning; Computational Biology; RNA, Circular; General Medicine; 3. Good health; Interdependence; MicroRNAs; 030220 oncology & carcinogenesis; RNA, Long Noncoding; Artificial intelligence; business; computer; Predictive modelling
- Abstract
Biomolecules, such as microRNAs, circRNAs, lncRNAs and genes, are functionally interdependent in human cells, and all play critical roles in diverse fundamental and vital biological processes. The dysregulations of such biomolecules can cause diseases. Identifying the associations between biomolecules and diseases can uncover the mechanisms of complex diseases, which is conducive to their diagnosis, treatment, prognosis and prevention. Due to the time consumption and cost of biologically experimental methods, many computational association prediction methods have been proposed in the past few years. In this study, we provide a comprehensive review of machine learning-based approaches for predicting disease–biomolecule associations with multi-view data sources. Firstly, we introduce some databases and general strategies for integrating multi-view data sources in the prediction models. Then we discuss several feature representation methods for machine learning-based prediction models. Thirdly, we comprehensively review machine learning-based prediction approaches in three categories: basic machine learning methods, matrix completion-based methods and deep learning-based methods, while discussing their advantages and disadvantages. Finally, we provide some perspectives for further improving biomolecule–disease prediction methods.
- Published
- 2021
- Full Text
- View/download PDF
164. The clustering model and algorithm of PPI network based on propagating mechanism of artificial bee colony.
- Author
-
Xiujuan Lei, Jianfang Tian, Liang Ge, and Aidong Zhang
- Published
- 2013
- Full Text
- View/download PDF
165. A web server for identifying circRNA-RBP variable-length binding sites based on stacked generalization ensemble deep learning network
- Author
-
Zhengfeng Wang and Xiujuan Lei
- Subjects
Binding Sites; Deep Learning; RNA-Binding Proteins; RNA, Circular; Molecular Biology; General Biochemistry, Genetics and Molecular Biology; Algorithms
- Abstract
Circular RNA (circRNA) can exert biological functions by interacting with RNA-binding proteins (RBPs), and several deep learning-based methods have been developed to predict RBP binding sites on circRNA. However, most of these methods rely on a single data resource and cannot locate exact binding sites; they only provide a probability value for a sequence fragment. To solve these problems, we propose a binding site localization algorithm that fuses binding sites from multiple databases, and we further design a stacked generalization ensemble deep learning model named CirRBP to identify RBP binding sites on circRNA. CirRBP is trained on the combined binding sites from multiple databases and makes predictions by weighted aggregation of the predictions of each sub-model. The results show that CirRBP outperforms any sub-model and the existing online prediction model. To make our research results more accessible, we developed an open-source web application called CRWS (CircRNA-RBP Web Server). The back-end learning model of CRWS is the stacked generalization ensemble learning model CirRBP, which is based on different deep learning frameworks. Given a full-length circRNA or fragment sequence and a target RBP, CRWS can analyze the sequence, provide the exact potential binding sites of the target RBP through the binding site localization algorithm, and visualize them. In addition, CRWS can discover the most widely distributed motif in each RBP dataset. To date, CRWS is the first significant online tool that uses multi-source data to train models and predict exact binding sites. CRWS is publicly and freely available without a login requirement at: http://www.bioinformatics.team.
- Published
- 2022
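The abstract of record 165 says the ensemble "makes predictions by weighted aggregation of the predictions of each sub-model". As a rough illustration only, the sketch below combines per-model probabilities with a normalized weighted average. The function name, the example weights, and the use of a plain weighted mean (rather than a trained meta-learner, which stacked generalization usually implies) are assumptions for illustration, not details taken from CirRBP.

```python
def weighted_aggregate(sub_model_probs, weights):
    """Combine sub-model probabilities into one score per sample.

    sub_model_probs: one list of probabilities per sub-model, all the same length.
    weights: one non-negative weight per sub-model (hypothetical values, e.g.
             derived from validation performance).
    """
    if len(sub_model_probs) != len(weights):
        raise ValueError("need exactly one weight per sub-model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(sub_model_probs[0])
    combined = []
    for i in range(n_samples):
        # Weighted mean of the i-th prediction across all sub-models.
        score = sum(w * probs[i] for w, probs in zip(weights, sub_model_probs))
        combined.append(score / total)
    return combined


# Two hypothetical sub-models scoring two sequence fragments.
scores = weighted_aggregate([[0.0, 1.0], [1.0, 1.0]], weights=[1, 3])
```

Normalizing by the weight sum keeps the combined score in the same range as the inputs, so it can still be read as a probability.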
166. Clustering PPI data based on Improved functional-flow model through Quantum-behaved PSO.
- Author
-
Xiujuan Lei, Xu Huang, Lei Shi, and Aidong Zhang
- Published
- 2012
- Full Text
- View/download PDF
167. Comprehensive Analysis of Features and Annotations of Pathway Databases
- Author
-
Chen Bian, Xiujuan Lei, Ali Ghulam, and Min Guo
- Subjects
Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Information retrieval; ComputingMethodologies_SIMULATIONANDMODELING; Computer science; Genetics; Molecular Biology; Biochemistry
- Abstract
This study describes the essential information related to pathway mechanisms, characteristics, and database feature annotations. Various difficulties related to data storage and retrieval in biological pathway databases are discussed, with a focus on different techniques for retrieving the annotations, features, and methods of digital pathway databases for biological pathway analysis. Furthermore, the annotations, features, and search databases of many pathway databases were also examined (those that are suitable for integration into microarray analysis). The investigation was performed on databases that contain human pathways, in order to understand the hidden components of cells involved in this process. Three different domain-specific pathways were selected for this study, and the information on pathway databases was extracted from the existing literature. The research compared different pathways and examined molecular-level relations. Moreover, the associations between pathway networks were also evaluated. The study involved datasets for gene pathway matrices and pathway scoring techniques. Additionally, different pathway techniques, such as metabolomics and biochemical pathways, translation, control, and signaling pathways, and signal transduction, were also considered. We also analyzed the list of gene sets and constructed a gene pathway network. This article will serve as a useful manual for storing a repository of specific biological data and disease pathways.
- Published
- 2021
- Full Text
- View/download PDF
168. Identifying the sequence specificities of circRNA-binding proteins based on a capsule network architecture
- Author
-
Xiujuan Lei and Zhengfeng Wang
- Subjects
Computer science; RNA-binding protein; Computational biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; DNA-binding protein; RNA Motifs; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Circular RNA; Sequence specificities; Humans; Binding site; Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; Sequence (medicine); 0303 health sciences; Network architecture; Binding Sites; Methodology Article; Applied Mathematics; Computational Biology; RNA-Binding Proteins; RNA, Circular; Convolution (computer science); Computer Science Applications; lcsh:Biology (General); 030220 oncology & carcinogenesis; lcsh:R858-859.7; DNA microarray; Sequence motif; Capsule network; Algorithms; Function (biology)
- Abstract
Background: Circular RNAs (circRNAs) are widely expressed in cells and tissues and are involved in biological processes and human diseases. Recent studies have demonstrated that circRNAs can interact with RNA-binding proteins (RBPs), which is considered an important aspect of investigating the function of circRNAs. Results: In this study, we design a slight variant of the capsule network, called circRB, to identify the sequence specificities of circRNAs binding to RBPs. In this model, the sequence features of circRNAs are extracted by convolution operations, and then two dynamic routing algorithms in a capsule network are employed to discriminate between different binding sites by analysing the convolution features of binding sites. The experimental results show that the circRB method outperforms the existing computational methods. Afterwards, the trained models are applied to detect the sequence motifs in the seven circRNA-RBP bound sequence datasets, and the motifs are matched to known human RNA motifs. Some motifs on circular RNAs overlap with those on linear RNAs. Finally, we also predict binding sites on the reported full-length sequences of circRNAs interacting with RBPs, in an attempt to assist current studies. We hope that our model will contribute to a better understanding of the mechanisms of the interactions between RBPs and circRNAs. Conclusion: Given the scarcity of studies on the sequence specificities of circRNA-binding proteins, we designed a classification framework called circRB based on the capsule network. The results show that circRB is an effective method and achieves higher prediction accuracy than other methods.
- Published
- 2021
169. A Review of Drug Repositioning Based Chemical-induced Cell Line Expression Data
- Author
-
Fei Wang, Xiujuan Lei, and Fang-Xiang Wu
- Subjects
0301 basic medicine; Pharmacology; Databases, Factual; Computer science; Drug candidate; Organic Chemistry; Drug Repositioning; Computational Biology; Computational biology; Biochemistry; Cell Line; 03 medical and health sciences; Drug repositioning; 030104 developmental biology; 0302 clinical medicine; Expression data; Cell culture; Drug Discovery; Molecular Medicine; Transcriptome; Gene; 030217 neurology & neurosurgery
- Abstract
Drug repositioning is an important area of biomedical research. The drug repositioning studies have shifted to computational approaches. Large-scale perturbation databases, such as the Connectivity Map and the Library of Integrated Network-Based Cellular Signatures, contain a number of chemical-induced gene expression profiles and provide great opportunities for computational biology and drug repositioning. One reason is that the profiles provided by the Connectivity Map and the Library of Integrated Network-Based Cellular Signatures databases show an overall view of biological mechanism in drugs, diseases and genes. In this article, we provide a review of the two databases and their recent applications in drug repositioning.
- Published
- 2020
- Full Text
- View/download PDF
170. WITMSG: Large-scale Prediction of Human Intronic m6A RNA Methylation Sites from Sequence and Genomic Features
- Author
-
Xiujuan Lei, Zhen Wei, Lian Liu, and Jia Meng
- Subjects
0303 health sciences; intron; RNA localization; site prediction; RNA methylation; Intron; m6A; Computational biology; Biology; Article; Random forest; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; RNA splicing; Genetics; genomic features; Epigenetics; sequence features; Genetics (clinical); Selection (genetic algorithm); 030304 developmental biology; Sequence (medicine)
- Abstract
Introduction: N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed to predict m6A sites in different species, none of them were optimized for intronic m6A sites. As existing experimental data overwhelmingly relied on polyA selection in sample preparation and intronic RNAs are usually underrepresented in the captured RNA library, the accuracy of general m6A site prediction approaches is limited for the intronic m6A site prediction task. Methodology: A computational framework, WITMSG, dedicated to the large-scale prediction of intronic m6A RNA methylation sites in humans is proposed here for the first time. Based on the random forest algorithm and using only known intronic m6A sites as the training data, WITMSG takes advantage of both conventional sequence features and a variety of genomic characteristics to improve the prediction performance for intron-specific m6A sites. Results and Conclusion: WITMSG outperformed competing approaches (trained with all the m6A sites or with intronic m6A sites only) in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in humans to predict all possible intronic m6A sites, and the prediction results are freely accessible at http://rnamd.com/intron/.
- Published
- 2020
- Full Text
- View/download PDF
171. Integrating random walk with restart and k-Nearest Neighbor to identify novel circRNA-disease association
- Author
-
Chen Bian and Xiujuan Lei
- Subjects
Multidisciplinary; Computer science; lcsh:R; Computational Biology; Disease Association; lcsh:Medicine; RNA, Circular; computer.software_genre; Random walk; Article; Weighting; k-nearest neighbors algorithm; Data processing; Cluster Analysis; Humans; Computational models; Genetic Predisposition to Disease; lcsh:Q; Data mining; lcsh:Science; computer; Algorithms
- Abstract
CircRNA is a special type of non-coding RNA, which is closely related to the occurrence and development of many complex human diseases. However, it is time-consuming and expensive to determine the circRNA-disease associations through experimental methods. Therefore, based on the existing databases, we propose a method named RWRKNN, which integrates the random walk with restart (RWR) and k-nearest neighbors (KNN) to predict the associations between circRNAs and diseases. Specifically, we apply RWR algorithm on weighting features with global network topology information, and employ KNN to classify based on features. Finally, the prediction scores of each circRNA-disease pair are obtained. As demonstrated by leave-one-out, 5-fold cross-validation and 10-fold cross-validation, RWRKNN achieves AUC values of 0.9297, 0.9333 and 0.9261, respectively. And case studies show that the circRNA-disease associations predicted by RWRKNN can be successfully demonstrated. In conclusion, RWRKNN is a useful method for predicting circRNA-disease associations.
- Published
- 2020
- Full Text
- View/download PDF
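Record 171 names random walk with restart (RWR) as the step that injects global network topology into the features. The sketch below is a generic, textbook-style RWR on a small adjacency matrix, offered only as an illustration; the restart probability, tolerance, and function name are assumptions, and neither the RWRKNN feature weighting nor its KNN classification step is reproduced here.

```python
def random_walk_with_restart(adj, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the vector stops changing.

    adj: square adjacency matrix (list of lists, non-negative weights).
    seed: index of the node the walker restarts from.
    restart: restart probability r (0.7 is an arbitrary illustrative value).
    """
    n = len(adj)
    # Column-normalize so that W[i][j] is the probability of stepping from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]

    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph 0 - 1 - 2 with the walker restarting at node 0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
proximity = random_walk_with_restart(path_graph, seed=0)
```

The resulting vector ranks nodes by proximity to the seed; nodes closer to the seed in the network receive higher scores.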
172. A pseudo-Siamese framework for circRNA-RBP binding sites prediction integrating BiLSTM and soft attention mechanism
- Author
-
Yajing Guo and Xiujuan Lei
- Subjects
Binding Sites; RNA-Binding Proteins; RNA, Circular; Molecular Biology; General Biochemistry, Genetics and Molecular Biology; Software
- Abstract
Circular RNAs (circRNAs) are widely expressed in tissues and play a key role in diseases by interacting with RNA-binding proteins (RBPs). Because traditional experimental techniques are costly, computational methods have been developed to identify the binding sites between circRNAs and RBPs. Unfortunately, these methods suffer from insufficient feature learning and from relying on a single classifier for the output. To address these limitations, we propose a novel method named circ-pSBLA, which constructs a pseudo-Siamese framework integrating a bi-directional long short-term memory (BiLSTM) network and a soft attention mechanism for circRNA-RBP binding site prediction. A softmax function and CatBoost are adopted as two separate classifiers, which together form the pseudo-Siamese framework, and circ-pSBLA combines their results to produce the final output. To validate the effectiveness of circ-pSBLA, we compare it with other state-of-the-art methods and carry out an ablation experiment on 17 sub-datasets. Moreover, we perform motif analysis on 3 sub-datasets. The results show that circ-pSBLA achieves superior performance and outperforms other methods. All supporting source code can be downloaded from https://github.com/gyj9811/circ-pSBLA.
- Published
- 2022
173. A Deep Neural Network for Cervical Cell Detection Based on Cytology Images
- Author
-
Ming Fang, Xiujuan Lei, Bo Liao, and Fang-Xiang Wu
- Published
- 2022
- Full Text
- View/download PDF
174. An Improved bacteria Foraging Optimization Algorithm based on intuitionistic fuzzy Set for Clustering PPI Networks.
- Author
-
Shuang Wu, Xiujuan Lei, and Jianfang Tian
- Published
- 2012
- Full Text
- View/download PDF
175. CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases.
- Author
-
Chunyan Fan, Xiujuan Lei, Zengqiang Fang, Qinghua Jiang, and Fang-Xiang Wu
- Published
- 2018
- Full Text
- View/download PDF
176. ISGm1A: Integration of Sequence Features and Genomic Features to Improve the Prediction of Human m1A RNA Methylation Sites
- Author
-
Jia Meng, Lian Liu, Xiujuan Lei, and Zhen Wei
- Subjects
Regulation of gene expression; General Computer Science; site prediction; RNA methylation; Computer science; General Engineering; Nucleic acid sequence; Epitranscriptome; RNA; m¹A; Computational biology; Cross-validation; Random forest; Transcriptome; genomic features; General Materials Science; lcsh:Electrical engineering. Electronics. Nuclear engineering; lcsh:TK1-9971; sequence features; Sequence (medicine)
- Abstract
As a new epitranscriptomic modification, N1-methyladenosine (m1A) plays an important role in the regulation of gene expression. Although some computational methods have been proposed to predict m1A modification sites, all of them apply machine learning to nucleotide sequence features alone and miss the layer of information in transcript topology and RNA secondary structures. To enhance the prediction of m1A RNA methylation, we propose a computational framework, ISGm1A, which stands for integration of sequence features and genomic features to improve the prediction of human m1A RNA methylation sites. Based on the random forest algorithm, ISGm1A takes advantage of both conventional sequence features and 75 genomic characteristics to improve the prediction performance for m1A sites in humans. The results of five-fold cross-validation and an independent test show that ISGm1A outperforms other prediction algorithms (AUC = 0.903 and 0.909). In addition, by analyzing feature importance, we found that the genomic features play a more important role in site prediction than the sequence features. Furthermore, with ISGm1A, we generated a high-accuracy map of m1A by predicting all adenine sites in the transcriptome. The data and the results of the study are freely accessible at: https://github.com/lianliu09/m1a_prediction.git.
- Published
- 2020
- Full Text
- View/download PDF
177. Identification of Pathway-Specific Protein Domain by Incorporating Hyperparameter Optimization Based on 2D Convolutional Neural Network
- Author
-
Yuchen Zhang, Ali Ghulam, Min Guo, Shi Cheng, and Xiujuan Lei
- Subjects
0301 basic medicine; Specific protein; General Computer Science; Computer science; Protein domain; multiple features; convolutional neural network; Computational biology; evolutionary knowledge; Convolutional neural network; Domain (software engineering); 03 medical and health sciences; chemistry.chemical_compound; Feature (machine learning); Molecule; General Materials Science; Organism; Dipeptide; 030102 biochemistry & molecular biology; Artificial neural network; business.industry; Deep learning; General Engineering; RNA; deep learning; 030104 developmental biology; chemistry; Hyperparameter optimization; Molecular structure prediction; Artificial intelligence; lcsh:Electrical engineering. Electronics. Nuclear engineering; business; lcsh:TK1-9971; DNA
- Abstract
Pathway-specific protein domains (PSPDs) are associated with specific pathways. Many protein domains are pervasive in various biological processes, whereas other domains are linked to specific pathways. Many human disease pathways, such as cancer pathways and signaling pathway-related diseases, have caused the loss of functional PSPDs. Therefore, the creation of an accurate method to predict their roles is a critical step toward understanding human disease and pathways. In this study, we proposed a deep learning model based on a two-dimensional convolutional neural network (2D-CNN-PSPD) for pathway-specific protein domain association prediction. In terms of the purposes of a sub-pathway, its parent pathway and its super pathway are linked to the Uni-Pathway. We also proposed a dipeptide composition (DPC) model and a dipeptide deviation (DDE) model of feature extraction profiles as PSSM. Then, we predicted the proteins associated with the same sub-pathway or with the same organism. The DDE and DPC models of the PSSM feature profile were used as input to our proposed 2D-CNN method. We deployed several parameters to optimize the model's output performance and used a hyperparameter optimization approach to find the best model for our dataset based on the 10-fold cross-validation results. Ultimately, we assessed the predictive performance of the current model by using independent datasets and cross-validation datasets. In this way, we enhanced the efficiency of deep learning methods. A PSPD is involved in any known pathway and then follows the association with other proteins at different stages of the pathway hierarchy. Our proposed method could identify 2D-CNN-PSPD with 0.83% sensitivity, 0.92% specificity, 87.27% accuracy, and 0.75% accuracy. The proposed research provides an important method for the analysis of PSPD proteins, and our achievements might promote computational biological research. In the future, we intend to extend the proposed model architecture with the latest features and a multi-one structure to predict different types of molecules, such as DNA, RNA, and disease-pathway-specific protein associations.
- Published
- 2020
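Record 177 mentions a dipeptide composition (DPC) feature model. For orientation, the sketch below computes the common sequence-based DPC, that is, the relative frequency of each of the 400 ordered amino-acid pairs in a protein sequence. This is an assumption-laden illustration: the abstract describes DPC profiles derived from a PSSM, which is not reproduced here, and the function name and the handling of non-standard residues are hypothetical choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(sequence):
    """Return a dict mapping each of the 400 dipeptides to its relative frequency.

    Adjacent residue pairs containing a non-standard character are skipped
    (an illustrative choice, not taken from the paper).
    """
    sequence = sequence.upper()
    dipeptides = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(dipeptides, 0)

    total = 0
    for first, second in zip(sequence, sequence[1:]):
        key = first + second
        if key in counts:
            counts[key] += 1
            total += 1

    if total == 0:
        return {dp: 0.0 for dp in dipeptides}
    return {dp: count / total for dp, count in counts.items()}


# Toy sequence: the overlapping pairs are AC, CA, AC.
dpc = dipeptide_composition("ACAC")
```

The result is a fixed-length 400-dimensional feature vector regardless of sequence length, which is what makes this kind of encoding convenient as input to a fixed-size network.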
178. Prediction of miRNA-circRNA associations based on k-NN multi-label with random walk restart on a heterogeneous network
- Author
-
Xiujuan Lei and Zengqiang Fang
- Subjects
Computer Networks and Communications; Computer science; Node (networking); Reliability (computer networking); Contrast (statistics); Computational biology; Random walk; Cross-validation; Computer Science Applications; Similarity (network science); Artificial Intelligence; Heterogeneous network; Information Systems; Coding (social sciences)
- Abstract
Circular RNAs (circRNAs) play important roles in various biological processes, as essential non-coding RNAs that affect transcriptional and post-transcriptional regulation of gene expression. Recently, many studies have shown that circRNAs can act as microRNA (miRNA) sponges, which are known to be associated with certain diseases. Efficient computational methods are therefore needed to explore miRNA-circRNA interactions, but very few computational methods exist for predicting the associations between miRNAs and circRNAs. In this study, we adopt an improved random walk computational method, named KRWRMC, to express complicated associations between miRNAs and circRNAs. Our major contributions can be summed up in two points. First, in the conventional Random Walk Restart Heterogeneous (RWRH) algorithm, the computational method simply converts the circRNA/miRNA similarity network into the transition probability matrix; in contrast, we take the influence of the neighbors of each node in the network into account, which can suggest or stress some potential associations. Second, our proposed KRWRMC is the first computational model to calculate large numbers of miRNA-circRNA associations, which can be regarded as biomarkers to diagnose certain diseases and can thus help us to better understand complicated diseases. The reliability of KRWRMC has been verified by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation, the results of which indicate that this method achieves excellent performance in predicting potential miRNA-circRNA associations.
- Published
- 2019
- Full Text
- View/download PDF
179. A Survey on Computational Methods for Essential Proteins and Genes Prediction
- Author
-
Ming Fang, Xiujuan Lei, and Ling Guo
- Subjects
0303 health sciences; 03 medical and health sciences; Computational Mathematics; 030302 biochemistry & molecular biology; Genetics; Computational biology; Biology; Molecular Biology; Biochemistry; Gene; 030304 developmental biology
- Abstract
Background: Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. Essential proteins are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is a very important topic, not only for a better comprehension of the minimal requirements for cellular life, but also for a more efficient discovery of human disease genes and drug targets. Because the experimental identification of essential proteins is complex, it traditionally requires considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for discovering disease genes and for drug design, and has great potential for applications in basic and synthetic biology research. Objective: The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss the challenges and directions for further research.
- Published
- 2019
- Full Text
- View/download PDF
180. Prediction of RBP binding sites on circRNAs using an LSTM-based deep sequence learning architecture
- Author
-
Xiujuan Lei and Zhengfeng Wang
- Subjects
Binding Sites; Base Sequence; business.industry; Deep learning; Computational Biology; RNA-Binding Proteins; Reproducibility of Results; RNA-binding protein; Computational biology; RNA, Circular; Biology; ENCODE; Deep Learning; ROC Curve; Circular RNA; Databases, Genetic; Word2vec; Sequence learning; Artificial intelligence; Binding site; business; Molecular Biology; Function (biology); Algorithms; Information Systems
- Abstract
Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
- Published
- 2021
181. Predicting Microbe-Disease Association Based on Multiple Similarities and LINE Algorithm
- Author
-
Yueyue Wang, Yi Pan, Cheng Lu, and Xiujuan Lei
- Subjects
Structure (mathematical logic); Information Services; Computer science; Applied Mathematics; Association (object-oriented programming); Computational Biology; Cross-validation; Semantic similarity; Similarity (network science); Kernel (statistics); Genetics; Humans; Construct (philosophy); Algorithm; Heterogeneous network; Algorithms; Biotechnology
- Abstract
Numerous microbes have been found to have vital impacts on human health by affecting biological processes. Therefore, exploring potential associations between microbes and diseases will promote the understanding and diagnosis of diseases. In this study, we present a novel computational model, named MSLINE, to infer potential microbe-disease associations by integrating Multiple Similarities and Large-scale Information Network Embedding (LINE) based on known associations. Specifically, starting from the known microbe-disease associations in the Human Microbe-Disease Association Database, we first expand the set of known associations by collecting proven associations from the existing literature. We then construct a microbe-disease heterogeneous network (MDHN) by integrating known associations and multiple similarities (including Gaussian interaction profile kernel similarity, microbe function similarity, disease semantic similarity and disease-symptom similarity). After that, we apply random walk and the LINE algorithm to the MDHN to learn its structural information. Finally, we score the microbe-disease associations according to the structural information of every node. In leave-one-out cross-validation and 5-fold cross-validation, MSLINE performs better than other existing methods. Moreover, case studies of different diseases showed that MSLINE can predict potential microbe-disease associations efficiently.
- Published
- 2021
182. GATCDA: Predicting circRNA-Disease Associations Based on Graph Attention Network
- Author
-
Xiujuan Lei, Fang-Xiang Wu, and Chen Bian
- Subjects
0301 basic medicine ,Cancer Research ,Mechanism (biology) ,Computer science ,Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,Computational biology ,Disease ,Article ,graph attention network ,circRNA–miRNA–mRNA axis ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,Oncology ,Similarity (network science) ,030220 oncology & carcinogenesis ,Attention network ,Graph (abstract data type) ,circRNA–disease association ,RC254-282 - Abstract
Simple Summary CircRNAs (circular RNAs), a novel kind of non-coding RNAs, play a regulatory role in cellular processes. A growing number of biological experiments has proved that circRNAs can be used as biomarkers and therapeutic targets of some cancers. As the time and financial costs of biological experiments are high, computational methods have become a better way to predict the associations between circRNAs and diseases. Graph attention network was first applied to predict circRNA-disease associations with multiple similarities of data in this study. The circRNA–miRNA interactions and disease-mRNA interactions were adopted to construct features. The computational method proposed in this study has improved the prediction performance. Abstract CircRNAs (circular RNAs) are a class of non-coding RNA molecules with a closed circular structure. CircRNAs are closely related to the occurrence and development of diseases. Due to the time-consuming nature of biological experiments, computational methods have become a better way to predict the interactions between circRNAs and diseases. In this study, we developed a novel computational method called GATCDA utilizing a graph attention network (GAT) to predict circRNA–disease associations with disease symptom similarity, network similarity, and information entropy similarity for both circRNAs and diseases. GAT learns representations for nodes on a graph by an attention mechanism, which assigns different weights to different nodes in a neighborhood. Considering that the circRNA–miRNA–mRNA axis plays an important role in the generation and development of diseases, circRNA–miRNA interactions and disease–mRNA interactions were adopted to construct features, in which mRNAs were related to 88% of miRNAs. As demonstrated by five-fold cross-validation, GATCDA yielded an AUC value of 0.9011. In addition, case studies showed that GATCDA can predict unknown circRNA–disease associations. 
In conclusion, GATCDA is a useful method for exploring associations between circRNAs and diseases.
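The attention step described above, which assigns different weights to different nodes in a neighborhood, can be sketched as follows. This is a single-head illustration of the standard graph attention form (a LeakyReLU score on concatenated features, normalised by softmax), not GATCDA's actual implementation. The node names, feature values and scoring vector are hypothetical, and the features are assumed to have already been passed through the layer's linear transform.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(node, neighbors, features, attn_vector):
    """Softmax-normalised attention of `node` over its `neighbors`.

    `features` maps node -> transformed feature list; `attn_vector` has
    length 2*d and scores the concatenation [h_node || h_neighbor].
    """
    scores = []
    for nb in neighbors:
        concat = features[node] + features[nb]  # list concatenation
        raw = sum(a * x for a, x in zip(attn_vector, concat))
        scores.append(leaky_relu(raw))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(node, neighbors, features, attn_vector):
    """Attention-weighted sum of neighbor features (new representation)."""
    weights = attention_weights(node, neighbors, features, attn_vector)
    dim = len(features[node])
    return [sum(w * features[nb][k] for w, nb in zip(weights, neighbors))
            for k in range(dim)]


# Hypothetical toy graph: one circRNA node with two disease neighbors.
features = {"c1": [1.0, 0.0], "d1": [0.0, 1.0], "d2": [1.0, 1.0]}
attn_vector = [0.5, 0.5, 1.0, -1.0]
weights = attention_weights("c1", ["d1", "d2"], features, attn_vector)
updated = aggregate("c1", ["d1", "d2"], features, attn_vector)
```

In a trained model both the linear transform and the scoring vector are learned, and several attention heads are usually combined.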
- Published
- 2021
183. Human drug-pathway association prediction based on network consistency projection
- Author
-
Ali Ghulam, Xiujuan Lei, Yuchen Zhang, and Zhenqiang Wu
- Subjects
Computational Mathematics ,Structural Biology ,Neoplasms ,Organic Chemistry ,Computational Biology ,Humans ,Computer Simulation ,Biochemistry ,Algorithms - Abstract
We present a novel computational method for drug-pathway association prediction based on known drug-pathway associations. The association between a drug and a pathway needs to be examined not only to explain the cause of a human disease but also to enable its identification, therapy, and diagnosis. However, biological studies and clinical trials require substantial time and resources to identify drug-pathway associations. Considerable research attention has therefore been devoted to this problem, and many scientists have developed computational models to predict drug-pathway associations. We propose a novel computational approach known as Network Consistency Projection for Human Drug-Pathway Association (NCPHDPA). The method is based on the assumption that biologically related drugs tend to interact with pathway targets in the same diseases, and vice versa. We first computed pathway-pathway interaction similarity on the basis of pairwise Jaccard similarity, and then computed drug-drug interaction similarity for drugs sharing similar drug targets, also based on Jaccard similarity. The model combines the drug cosine similarity network, the pathway cosine similarity network, and the network of known drug-pathway associations. NCPHDPA is a parameter-free method and does not require negative samples. Notably, NCPHDPA can be used to predict pathways for drugs without any known related pathway. Test results showed that the proposed NCPHDPA method achieved an AUC of 0.7479 with LOOCV and an AUC of 0.7566 with 10-fold CV. For comparison, other methods obtained the following AUCs with LOOCV: SIMCCDA (AUC = 0.7364), LOMDA (AUC = 0.6729) and DMTHNDM (AUC = 0.50.00). The robust predictive capability of NCPHDPA was demonstrated in three case studies on drugs involved in pathways, cancer pathways, and hepatocellular carcinoma. 
Compared with other methods, our proposed NCPHDPA method had reliable predictive performance. The results yielded some interesting findings, for example that the interaction of these proteins can cause a change in the associated pathway, leading to the onset of cancer.
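The Jaccard similarity mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each drug is represented by the set of pathways it is known to be associated with. The drug and pathway identifiers are hypothetical, and the set representation is an assumption rather than the authors' exact construction.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(item_sets):
    """Similarity for every ordered pair of keys in `item_sets`."""
    names = sorted(item_sets)
    return {(x, y): jaccard(item_sets[x], item_sets[y]) for x in names for y in names}


# Hypothetical drug -> associated-pathway sets.
drug_pathways = {
    "drugA": {"p1", "p2"},
    "drugB": {"p2", "p3"},
    "drugC": set(),
}
drug_similarity = pairwise_jaccard(drug_pathways)
```

Treating a pair of empty sets as similarity 0.0 is a design choice made here to avoid division by zero for drugs with no known pathway.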
- Published
- 2021
184. Essential Protein Prediction Based on Shuffled Frog‐Leaping Algorithm
- Author
-
Xiaoqin Yang, Xiujuan Lei, and Jie Zhao
- Published
- 2021
- Full Text
- View/download PDF
185. CircR2Disease v2.0: An Updated Web Server for Experimentally Validated circRNA-disease Associations and Its Application
- Author
-
Xiujuan Lei, Jiaojiao Tie, Yuchen Zhang, Yi Pan, Chunyan Fan, and Fang-Xiang Wu
- Subjects
Computational Mathematics ,Web server ,Gradient boosting decision tree ,Computer science ,Genetics ,Graph (abstract data type) ,Disease ,Computational biology ,computer.software_genre ,Molecular Biology ,Biochemistry ,computer ,Web tool - Abstract
With accumulating dysregulated circular RNAs (circRNAs) in pathological processes, the regulatory functions of circRNAs, especially circRNAs as microRNA (miRNA) sponges and their interactions with RNA-binding proteins (RBPs), have been widely validated. However, the collected information on experimentally validated circRNA-disease associations is only preliminary. Therefore, an updated CircR2Disease database providing a comprehensive resource and web tool to clarify the relationships between circRNAs and diseases in diverse species is necessary. Here, we present an updated CircR2Disease v2.0 with the increased number of circRNA-disease associations and novel characteristics. CircR2Disease v2.0 provides more than 5-fold experimentally validated circRNA-disease associations compared to its previous version. This version includes 4201 entries between 3077 circRNAs and 312 disease subtypes. Secondly, the information of circRNA-miRNA, circRNA-miRNA-target, and circRNA-RBP interactions has been manually collected for various diseases. Thirdly, the gene symbols of circRNAs and disease name IDs can be linked with various nomenclature databases. Detailed descriptions such as samples and journals have also been integrated into the updated version. Thus, CircR2Disease v2.0 can serve as a platform for users to systematically investigate the roles of dysregulated circRNAs in various diseases and further explore the posttranscriptional regulatory function in diseases. Finally, we propose a computational method named circDis based on the graph convolutional network (GCN) and gradient boosting decision tree (GBDT) to illustrate the applications of the CircR2Disease v2.0 database. CircR2Disease v2.0 is available at http://bioinfo.snnu.edu.cn/CircR2Disease_v2.0 and https://github.com/bioinforlab/CircR2Disease-v2.0.
- Published
- 2020
186. Graph Convolution Networks Using Message Passing and Multi-Source Similarity Features for Predicting circRNA-Disease Association
- Author
-
Yi Pan, Xiujuan Lei, Yan-Qing Zhang, Thosini Bamunu Mudiyanselage, and Nipuna Senanayake
- Subjects
FOS: Computer and information sciences ,0301 basic medicine ,Computer Science - Machine Learning ,Theoretical computer science ,business.industry ,Computer science ,Deep learning ,Association (object-oriented programming) ,0206 medical engineering ,Message passing ,Feature extraction ,Machine Learning (stat.ML) ,02 engineering and technology ,Cross-validation ,Machine Learning (cs.LG) ,Convolution ,03 medical and health sciences ,030104 developmental biology ,Kernel (image processing) ,Similarity (network science) ,Statistics - Machine Learning ,Artificial intelligence ,business ,020602 bioinformatics - Abstract
Graphs can be used to effectively represent complex data structures. Learning from such irregular graph data is challenging and still suffers from shallow learning. Applying deep learning to graphs has demonstrated good performance in many applications, including social analysis and bioinformatics. The message passing graph convolution network is a powerful method with the expressive power to learn graph structures. Meanwhile, circular ribonucleic acid (circRNA) is a type of non-coding RNA which plays a critical role in human diseases. Identifying the associations between circRNAs and diseases is important for the diagnosis and treatment of complex diseases. However, there is a limited number of known associations between them, and conducting biological experiments to identify new associations is time-consuming and expensive. As a result, there is a need to build efficient and feasible computational methods to predict potential circRNA-disease associations. In this paper, we propose a novel graph convolution network framework to learn features from a graph built with multi-source similarity information to predict circRNA-disease associations. First, we use multi-source information of circRNA similarity, disease and circRNA Gaussian Interaction Profile (GIP) kernel similarity to extract the features using a first graph convolution. Then we predict disease associations for each circRNA with a second graph convolution. With five-fold cross validation on various experiments, the proposed framework shows promising results in predicting circRNA-disease associations and outperforms other existing methods.
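A single graph convolution of the kind this abstract stacks twice can be sketched with the widely used propagation rule H′ = ReLU(D^(-1/2)(A + I)D^(-1/2) H W). This rule is an assumption, since the abstract does not state the paper's exact message-passing formulation. The toy graph, features and weight matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: normalise (A + I), propagate, transform, ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation D^(-1/2) (A + I) D^(-1/2).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical two-node graph with one edge, one-hot features, identity weights.
adj = [[0, 1], [1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, weight)
```

With an identity weight matrix the layer simply averages each node's features with its neighbor's, which makes the smoothing effect of the convolution easy to see.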
- Published
- 2020
187. A comprehensive survey on computational methods of non-coding RNA and disease association prediction
- Author
-
Yi Pan, Xiujuan Lei, Thosini Bamunu Mudiyanselage, Chen Bian, Ning Yu, Yuchen Zhang, and Wei Lan
- Subjects
Computer science ,Disease Association ,Machine learning ,computer.software_genre ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Prediction methods ,Similarity (psychology) ,Humans ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Biological data ,business.industry ,Sequence Analysis, RNA ,Deep learning ,Computational Biology ,RNA, Circular ,Non-coding RNA ,MicroRNAs ,030220 oncology & carcinogenesis ,RNA, Long Noncoding ,Artificial intelligence ,Experimental methods ,business ,computer ,Information Systems ,Data integration - Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of their high labor cost and long production time, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their association with diseases. First, the mainstream databases of the above three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Later, we investigate ncRNA-disease prediction methods in detail and classify these methods into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we provide a summary of the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of various methods are identified, and future research directions and challenges are also discussed.
- Published
- 2020
188. Identifying Gene Signatures for Cancer Drug Repositioning Based on Sample Clustering
- Author
-
Fang-Xiang Wu, Fei Wang, Xiujuan Lei, Yulian Ding, and Bo Liao
- Subjects
0303 health sciences ,Databases, Factual ,Computer science ,Applied Mathematics ,Cancer drugs ,Drug Repositioning ,Computational biology ,Gene signature ,3. Good health ,Weighting ,03 medical and health sciences ,Drug repositioning ,0302 clinical medicine ,Differentially expressed genes ,Homogeneous ,030220 oncology & carcinogenesis ,Neoplasms ,Genetics ,Cluster Analysis ,Humans ,Cluster analysis ,Transcriptome ,Gene ,030304 developmental biology ,Biotechnology - Abstract
Drug repositioning is an important approach for drug discovery. Computational drug repositioning approaches typically use a gene signature to represent a particular disease and connect the gene signature with drug perturbation profiles. Although disease samples, especially from cancer, may be heterogeneous, most existing methods treat them as a homogeneous set when identifying differentially expressed genes (DEGs) for determining a gene signature. As a result, some genes that should be in a gene signature may be averaged out. In this study, we propose a new framework to identify gene signatures for cancer drug repositioning based on sample clustering (GS4CDRSC). GS4CDRSC first groups samples into several clusters based on their gene expression profiles. Second, an existing method is applied to the samples in each cluster to generate a list of DEGs. Then a weighting approach is used to identify an integrated gene signature from all the lists of DEGs. The integrated gene signature is used to connect with drug perturbation profiles in the Connectivity Map (CMap) database to generate a list of drug candidates. GS4CDRSC has been tested on several cancer datasets and compared with existing methods. The computational results show that GS4CDRSC outperforms those methods without the sample clustering and weighting approaches in terms of both the number and the rate of predicted known drugs for specific cancers.
- Published
- 2020
189. Variational graph auto-encoders for miRNA-disease association prediction
- Author
-
Xiujuan Lei, Fang-Xiang Wu, Bo Liao, Li-Ping Tian, and Yulian Ding
- Subjects
Association score ,Computer science ,Disease Association ,Latent variable ,Machine learning ,computer.software_genre ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Humans ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,business.industry ,Deep learning ,030302 biochemistry & molecular biology ,Auto encoders ,Computational Biology ,Autoencoder ,MicroRNAs ,Graph (abstract data type) ,Artificial intelligence ,Neural Networks, Computer ,business ,computer ,Heterogeneous network ,Algorithms - Abstract
Cumulative experimental studies have demonstrated the critical roles of microRNAs (miRNAs) in the diverse fundamental and important biological processes, and in the development of numerous complex human diseases. Thus, exploring the relationships between miRNAs and diseases is helpful with understanding the mechanisms, the detection, diagnosis, and treatment of complex diseases. As the identification of miRNA-disease associations via traditional biological experiments is time-consuming and expensive, an effective computational prediction method is appealing. In this study, we present a deep learning framework with variational graph auto-encoder for miRNA-disease association prediction (VGAE-MDA). VGAE-MDA first gets the representations of miRNAs and diseases from the heterogeneous networks constructed by miRNA-miRNA similarity, disease-disease similarity, and known miRNA-disease associations. Then, VGAE-MDA constructs two sub-networks: miRNA-based network and disease-based network. Combining the representations based on the heterogeneous network, two variational graph auto-encoders (VGAE) are deployed for calculating the miRNA-disease association scores from two sub-networks, respectively. Lastly, VGAE-MDA obtains the final predicted association score for a miRNA-disease pair by integrating the scores from these two trained networks. Unlike the previous model, the VGAE-MDA can mitigate the effect of noises from random selection of negative samples. Besides, the use of graph convolutional neural (GCN) network can naturally incorporate the node features from the graph structure while the variational autoencoder (VAE) makes use of latent variables to predict associations from the perspective of data distribution. The experimental results show that VGAE-MDA outperforms the state-of-the-art approaches in miRNA-disease association prediction. Besides, the effectiveness of our model has been further demonstrated by case studies.
- Published
- 2020
190. PPI modules detection method through ABC-IFC algorithm.
- Author
-
Xiujuan Lei, Jianfang Tian, and Fang-Xiang Wu
- Published
- 2013
- Full Text
- View/download PDF
191. Prediction of CircRNA-Disease Associations Using KATZ Model Based on Heterogeneous Networks
- Author
-
Xiujuan Lei, Fang-Xiang Wu, and Chunyan Fan
- Subjects
0301 basic medicine ,0206 medical engineering ,Complex disease ,02 engineering and technology ,Disease ,Computational biology ,Similarity measure ,Biology ,Applied Microbiology and Biotechnology ,Cross-validation ,03 medical and health sciences ,Mirna sponge ,Humans ,Gene Regulatory Networks ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,CircRNA-disease association ,Computational model ,Computational Biology ,RNA, Circular ,Cell Biology ,Models, Theoretical ,KATZ model ,similarity measure ,030104 developmental biology ,RNA ,Large group ,020602 bioinformatics ,Heterogeneous network ,Research Paper ,Developmental Biology - Abstract
Circular RNAs (circRNAs) are a large group of endogenous non-coding RNAs which are key members of gene regulatory processes. In humans, circRNAs play significant roles in health and disease. Owing to their universality, specificity and stability, circRNAs are becoming an ideal class of biomarkers for disease diagnosis, treatment and prognosis. Identifying the relationships between circRNAs and diseases can help in understanding complex disease mechanisms. However, traditional experiments are costly and time-consuming, and few computational models have been developed to predict novel circRNA-disease associations. In this study, a heterogeneous network was constructed by employing the circRNA expression profiles, disease phenotype similarity and Gaussian interaction profile kernel similarity. Then, we developed a computational model of KATZ measures for human circRNA-disease association prediction (KATZHCDA). Leave-one-out cross validation (LOOCV) and 5-fold cross validation were implemented to investigate the effects of these four types of similarity measures. As a result, the KATZHCDA model yields AUCs of 0.8469 and 0.7936 ± 0.0065 in LOOCV and 5-fold cross validation, respectively. Furthermore, we analyzed the candidate association between hsa_circ_0006054 and colorectal cancer, and the results showed that hsa_circ_0006054 may function as a miRNA sponge in the carcinogenesis of colorectal cancer. Overall, it is anticipated that our proposed model could become an effective resource for clinical experimental guidance.
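The KATZ measure referred to above scores a pair of nodes by counting walks between them, with longer walks damped more strongly: S = Σ_{l=1..k} β^l A^l. The sketch below applies a truncated version of this sum to a small adjacency matrix. The values of β and k and the toy graph are illustrative assumptions, not the settings or the heterogeneous network used in KATZHCDA.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def katz_scores(adj, beta=0.01, max_len=3):
    """Truncated Katz measure: sum of beta**l * A**l for l = 1..max_len."""
    n = len(adj)
    power = [[float(v) for v in row] for row in adj]  # current A**l, starting at l = 1
    scores = [[0.0] * n for _ in range(n)]
    factor = beta  # current beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += factor * power[i][j]
        power = matmul(power, adj)
        factor *= beta
    return scores


# Hypothetical path graph 0 - 1 - 2: nodes 0 and 2 are linked only via node 1.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = katz_scores(adj, beta=0.1, max_len=2)
```

Nodes 0 and 2 have no direct edge, yet they receive a nonzero score from the length-2 walk through node 1. In the same way, an unobserved circRNA-disease pair can be scored through similar circRNAs and diseases.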
- Published
- 2018
- Full Text
- View/download PDF
192. Locating Multiple Optima via Brain Storm Optimization Algorithms
- Author
-
Xiujuan Lei, Yuhui Shi, Shi Cheng, and Junfeng Chen
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Optimization problem ,General Computer Science ,swarm intelligence ,Computer science ,General Engineering ,Cauchy distribution ,Particle swarm optimization ,Brain storm optimization ,02 engineering and technology ,brain storm optimization in objective space algorithm ,nonlinear equation systems ,Normal distribution ,Nonlinear system ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,020201 artificial intelligence & image processing ,General Materials Science ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Cluster analysis ,multimodal optimization ,lcsh:TK1-9971 ,Random variable - Abstract
Locating multiple optima/peaks in a single run and maintaining these found optima until the end of a run is the goal of multimodal optimization. Three variants of brain storm optimization (BSO) algorithms, which include original BSO algorithm, BSO in objective space algorithm with Gaussian random variable, and BSO in objective space algorithm with Cauchy random variable, were utilized to solve multimodal optimization problems in this paper. The experimental tests were conducted on eight benchmark problems and its applications in seven nonlinear equation system problems. The performance and effectiveness of various BSO algorithms on solving multimodal optimization problems were validated based on the experimental results. The conclusions could be made that the global search ability and solutions maintenance ability of an algorithm needs to be balanced simultaneously on solving multimodal optimization problems.
- Published
- 2018
- Full Text
- View/download PDF
193. Application of Fireworks Algorithm in Bioinformatics
- Author
-
Xiujuan Lei, Yuchen Zhang, and Ying Tan
- Subjects
0301 basic medicine ,03 medical and health sciences ,030104 developmental biology ,Computer science ,Fireworks algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,computer.software_genre ,computer - Abstract
The Fireworks Algorithm (FWA) has been applied to many fields in recent years, showing a strong ability to solve optimization problems. In this chapter, FWA is applied to some research hotspots in bioinformatics, such as biclustering of gene expression data, disease-gene prediction, and identification of lncRNA-protein interactions. This chapter briefly introduces the background of bioinformatics and related issues. By mapping bioinformatics problems to optimization problems, specific objective functions are constructed and solved by the Fireworks Algorithm. The simulation results illustrate that the Fireworks Algorithm shows high performance and potential application value in the field of bioinformatics.
- Published
- 2020
- Full Text
- View/download PDF
194. Detecting overlapping protein complexes in weighted PPI network based on overlay network chain in quotient space
- Author
-
Xiujuan Lei and Jie Zhao
- Subjects
Computer science ,0206 medical engineering ,Protein complexes ,Overlay network ,02 engineering and technology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Clustering ,03 medical and health sciences ,Chain (algebraic topology) ,Structural Biology ,Cluster Analysis ,Humans ,Protein Interaction Maps ,Cluster analysis ,lcsh:QH301-705.5 ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,business.industry ,Applied Mathematics ,Research ,Quotient space ,Proteins ,Pattern recognition ,Quotient space (topology) ,Granular computation ,Computer Science Applications ,Identification (information) ,ComputingMethodologies_PATTERNRECOGNITION ,lcsh:Biology (General) ,Ppi network ,Core (graph theory) ,lcsh:R858-859.7 ,Gene ontology ,Artificial intelligence ,business ,020602 bioinformatics ,Algorithms - Abstract
Background Protein complexes are the cornerstones of many biological processes; proteins gather to form various types of molecular machinery that perform a vast array of biological functions. In fact, a protein may belong to multiple protein complexes. Most existing protein complex detection algorithms cannot reflect overlapping protein complexes. To solve this problem, a novel overlapping protein complex identification algorithm is proposed. Results In this paper, a new clustering algorithm based on an overlay network chain in quotient space, denoted ONCQS, is proposed to detect overlapping protein complexes in weighted PPI networks. In the quotient space, a multilevel overlay network is constructed by using maximal complete subgraphs to mine overlapping protein complexes. GO annotation data are used to weight the PPI network. According to the compatibility relation, the overlay network chain in quotient space is calculated. The protein complexes are contained in the last level of the overlay network. Experiments were carried out on four PPI databases, and ONCQS was compared with five other state-of-the-art methods in the identification of protein complexes. Conclusions We applied ONCQS to four PPI databases (DIP, Gavin, Krogan and MIPS); the results show that it is superior to five existing algorithms (MCODE, MCL, CORE, ClusterONE and COACH) in detecting overlapping protein complexes.
- Published
- 2019
195. Identifying Cancer-Specific circRNA–RBP Binding Sites Based on Deep Learning
- Author
-
Fang-Xiang Wu, Xiujuan Lei, and Zhengfeng Wang
- Subjects
Pharmaceutical Science ,convolutional neural network ,RNA-binding protein ,Computational biology ,rna binding protein ,Biology ,Convolutional neural network ,Article ,Analytical Chemistry ,lcsh:QD241-441 ,03 medical and health sciences ,0302 clinical medicine ,Deep Learning ,lcsh:Organic chemistry ,Sequence Analysis, Protein ,Neoplasms ,Drug Discovery ,Humans ,RNA, Neoplasm ,Physical and Theoretical Chemistry ,Binding site ,Databases, Protein ,cancer-specific ,030304 developmental biology ,0303 health sciences ,Binding Sites ,business.industry ,Deep learning ,Organic Chemistry ,RNA-Binding Proteins ,RNA, Circular ,Neoplasm Proteins ,Chemistry (miscellaneous) ,030220 oncology & carcinogenesis ,Softmax function ,Molecular Medicine ,circrna ,Artificial intelligence ,Benchmark data ,business ,Sequence motif ,Function (biology) - Abstract
Circular RNAs (circRNAs) are extensively expressed in cells and tissues, and play crucial roles in human diseases and biological processes. Recent studies have reported that circRNAs can function as RNA binding protein (RBP) sponges, while RBPs can also be involved in back-splicing. The interaction with RBPs is also considered an important factor for investigating the function of circRNAs. Hence, it is necessary to understand the interaction mechanisms of circRNAs and RBPs, especially in human cancers. Here, we present a novel method based on deep learning to identify cancer-specific circRNA–RBP binding sites (CSCRSites), using only the nucleotide sequences as the input. In CSCRSites, an architecture with multiple convolution layers is utilized to detect the features of the raw circRNA sequence fragments, and further identify the binding sites through a fully connected layer with the softmax output. The experimental results show that CSCRSites outperforms the conventional machine learning classifiers and some representative deep learning methods on the benchmark data. In addition, the features learnt by CSCRSites are converted to sequence motifs, some of which match known human RNA motifs involved in human diseases, especially cancer. Therefore, as a deep learning-based tool, CSCRSites could significantly contribute to the function analysis of cancer-associated circRNAs.
- Published
- 2019
196. Predicting circRNA-Disease Associations Based on Improved Collaboration Filtering Recommendation System With Multiple Data
- Author
-
Ling Guo, Zengqiang Fang, and Xiujuan Lei
- Subjects
0301 basic medicine ,Linear coding ,lcsh:QH426-470 ,Computer science ,Computational biology ,Disease ,Recommender system ,Non-coding RNA ,Cross-validation ,lcsh:Genetics ,03 medical and health sciences ,Multiple data ,030104 developmental biology ,0302 clinical medicine ,recommendation system ,Circular RNA ,collaboration filtering ,030220 oncology & carcinogenesis ,Genetics ,Molecular Medicine ,multiple biological data ,neighbor information ,circRNA–disease association ,Genetics (clinical) ,Original Research - Abstract
With the development of high-throughput techniques, various biological molecules have been discovered, including circular RNAs (circRNAs). Circular RNA is a novel endogenous noncoding RNA that plays significant roles in regulating gene expression, moderating microRNA transcription as a sponge, diagnosing diseases, and so on. Because of their particular molecular structure, a closed loop with neither 5′-3′ polarity nor a polyadenylated tail, circRNAs are more stable and conserved than normal linear coding or noncoding RNAs, which makes circRNAs biomarkers of various diseases. Although some conventional experiments are used to identify the associations between circRNAs and diseases, almost all of these techniques and experiments are time-consuming and expensive. In this study, we propose a computational method based on a collaboration filtering recommendation system, named ICFCDA, which handles the “cold start” problem to predict potential circRNA–disease associations. All the known circRNA–disease association data are downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). Based on these data, multiple data are extracted from different databases to calculate the circRNA similarity networks and the disease similarity networks. The collaboration filtering recommendation system algorithm is first employed to predict circRNA–disease associations. Then, the leave-one-out cross validation mechanism is adopted to measure the performance of our proposed computational method. ICFCDA achieves an area under the curve of 0.946, which is better than other existing methods. In order to further illustrate the performance of ICFCDA, case studies of some common diseases are made, and the results are confirmed by other databases. The experimental results show that ICFCDA is competent in predicting circRNA–disease associations.
- Published
- 2019
197. PDG-PIO: Predicting Disease-genes Based on Pigeon-inspired Optimization
- Author
-
Shi Cheng, Xiujuan Lei, and Yuchen Zhang
- Subjects
0303 health sciences ,Biological data ,Optimization problem ,business.industry ,Semantics (computer science) ,Computer science ,Stochastic matrix ,02 engineering and technology ,Machine learning ,computer.software_genre ,Measure (mathematics) ,Field (computer science) ,03 medical and health sciences ,Identification (information) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Heterogeneous network ,030304 developmental biology - Abstract
Combining large-scale biological data, using computational methods to mine potential disease-gene associations is a popular strategy. At the same time, bio-inspired intelligent optimization has always been a hot research field of intelligent computing. In this study, we apply the pigeon-inspired optimization (PIO) algorithm to the identification of human disease-genes. The problem of predicting disease-genes is translated into a single-objective optimization problem. A reasonable objective function is designed to measure the association between genes and inquiring diseases in a heterogeneous network, and the corresponding probability matrix is generated. The experimental results show that the proposed method (PDG-PIO) can accurately identify disease-genes.
- Published
- 2019
198. Dynamic Multimodal Optimization: A Preliminary Study
- Author
-
Yi-nan Guo, Shi Cheng, Jing Liang, Xiujuan Lei, Yuhui Shi, Hui Lu, and Junfeng Chen
- Subjects
Set (abstract data type) ,Mathematical optimization ,Optimization problem ,Local optimum ,Computer science ,0206 medical engineering ,0202 electrical engineering, electronic engineering, information engineering ,Process (computing) ,Particle swarm optimization ,020201 artificial intelligence & image processing ,02 engineering and technology ,020602 bioinformatics
- Abstract
Benchmark problems have played a fundamental role in verifying an algorithm’s search ability. A dynamic multimodal optimization (DMO) problem is defined as an optimization problem with multiple global optima whose characteristics change during the search process. Two cases are used to illustrate the application scenario of DMO. A set of benchmark functions on DMO, which contains eight problems, is proposed to show the difficulty of DMO. The properties of the proposed benchmark problems, such as the distribution of solutions, the scalability, and the number of global/local optima, are discussed.
- Published
- 2019
199. GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion
- Author
-
Xiujuan Lei and Zengqiang Fang
- Subjects
circRNA-disease associations ,Computer science ,Feature vector ,Machine learning ,computer.software_genre ,Applied Microbiology and Biotechnology ,Cross-validation ,Machine Learning ,03 medical and health sciences ,Circular RNA ,Humans ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,0303 health sciences ,Computational model ,Biological data ,business.industry ,Decision Trees ,Computational Biology ,Regression analysis ,Cell Biology ,RNA, Circular ,Gradient Boosting ,Artificial intelligence ,Gradient boosting ,multiple biological data ,business ,computer ,Biological network ,Developmental Biology ,Research Paper
- Abstract
Circular RNA (circRNA) is a closed-loop non-coding RNA molecule that plays a significant role in gene regulation processes. Many previous studies have shown that circRNAs can act as sponges of miRNAs. Thus, circRNA is also a key point for disease diagnosis, treatment and inference. However, traditional experimental approaches to verifying the associations between circRNAs and diseases are time-consuming and costly. There are few computational models for predicting potential circRNA-disease associations, which motivates us to propose a new computational model. In this study, we propose a machine learning based computational model, named Gradient Boosting Decision Tree with multiple biological data, to predict circRNA-disease associations (GBDTCDA). The known circRNA-disease association data are downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). The feature vector of each circRNA-disease association pair is composed of four parts: the statistical information of different biological networks, the graph theory information of different biological networks, the circRNA-disease association network information, and the circRNA nucleotide sequence information. We use those feature vectors to train the gradient boosting decision tree regression model. Then, leave-one-out cross validation (LOOCV) is adopted to evaluate the performance of our computational model. GBDTCDA also obtains good results when predicting circRNAs related to some common diseases. The Area under the ROC Curve (AUC) values for basal cell carcinoma, non-small cell lung cancer and cervical cancer are 95.8%, 88.3% and 93.5%, respectively. To further illustrate the performance of GBDTCDA, a case study of breast cancer is also included in this study. Thus, based on the experimental results and analyses, our proposed method GBDTCDA is a powerful tool for predicting potential circRNA-disease associations.
- Published
- 2019
200. Identification of Essential Proteins Based on Improved HITS Algorithm
- Author
-
Xiujuan Lei, Fang-Xiang Wu, and Siguo Wang
- Subjects
0301 basic medicine ,lcsh:QH426-470 ,Computer science ,essential proteins ,0206 medical engineering ,Closeness ,HITS algorithm ,02 engineering and technology ,Computational biology ,HSEP ,Article ,03 medical and health sciences ,Annotation ,Protein Interaction Mapping ,Genetics ,Animals ,Humans ,Protein Interaction Maps ,weighted PPI networks ,Genetics (clinical) ,Clustering coefficient ,Genes, Essential ,Gene ontology ,lcsh:Genetics ,Identification (information) ,Protein Transport ,030104 developmental biology ,Gene Ontology ,Ppi network ,Enhanced Data Rates for GSM Evolution ,020602 bioinformatics ,Algorithms ,Protein Binding
- Abstract
Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital to understanding the molecular mechanisms of living cells and designing new drugs. With the development of high-throughput technologies, many protein–protein interaction (PPI) data are available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, the prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, an original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are taken into account to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient is used as the network topological characteristic to measure the closeness of two connected nodes. We conducted experiments on two species, namely Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperformed some state-of-the-art essential protein detection techniques.
- Published
- 2019
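As a rough illustration of the HITS step mentioned in the abstract of entry 200, the following is a minimal sketch of hub/authority power iteration on a weighted, bidirectional network. It is not the HSEP implementation: the function name `weighted_hits`, the four-node toy network and its edge weights are all hypothetical, and the way HSEP derives weights from gene expression, GO annotation, subcellular localization and the edge clustering coefficient is not reproduced here.

```python
import math

def weighted_hits(weights, iterations=200, tol=1e-12):
    """Hub/authority power iteration on a weighted adjacency matrix.

    weights[i][j] is the (hypothetical) weight of the edge i -> j.
    Assumes the network has at least one edge, so the norms are non-zero.
    """
    n = len(weights)
    hub = [1.0 / math.sqrt(n)] * n
    auth = hub[:]
    for _ in range(iterations):
        # Authority of j: weighted sum of hub scores of nodes pointing to j.
        new_auth = [sum(weights[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in new_auth))
        new_auth = [a / norm for a in new_auth]
        # Hub of i: weighted sum of authority scores of nodes i points to.
        new_hub = [sum(weights[i][j] * new_auth[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(h * h for h in new_hub))
        new_hub = [h / norm for h in new_hub]
        change = (sum(abs(a - b) for a, b in zip(new_auth, auth))
                  + sum(abs(a - b) for a, b in zip(new_hub, hub)))
        hub, auth = new_hub, new_auth
        if change < tol:
            break
    return hub, auth

# Hypothetical 4-protein network. An undirected edge becomes two directed
# edges with the same weight, so the matrix is symmetric.
toy_weights = [
    [0.0, 0.9, 0.4, 0.0],
    [0.9, 0.0, 0.7, 0.2],
    [0.4, 0.7, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.0],
]
hub_scores, authority_scores = weighted_hits(toy_weights)
# Candidate ranking: proteins sorted by descending authority score.
ranking = sorted(range(len(toy_weights)), key=lambda i: -authority_scores[i])
```

Because every undirected edge is mirrored in both directions, the matrix is symmetric and the hub and authority scores converge to the same vector, so either one can serve as the ranking score in this toy setting.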