40 results on '"Zou, Quan"'
Search Results
2. DeepMPM: a mortality risk prediction model using longitudinal EHR data
- Author
Yang, Fan, Zhang, Jian, Chen, Wanyi, Lai, Yongxuan, Wang, Ying, and Zou, Quan
- Published
- 2022
3. Overcoming CRISPR-Cas9 off-target prediction hurdles: A novel approach with ESB rebalancing strategy and CRISPR-MCA model.
- Author
Yang, Yanpeng, Zheng, Yanyi, Zou, Quan, Li, Jian, and Feng, Hailin
- Subjects
ARTIFICIAL neural networks, GENOME editing, CRISPRS, DEEP learning, ENCODING
- Abstract
The off-target activities within the CRISPR-Cas9 system remain a formidable barrier to its broader application and development. Recent advancements have highlighted the potential of deep learning models in predicting these off-target effects, yet they encounter significant hurdles, including imbalances within datasets and the intricacies associated with encoding schemes and model architectures. To surmount these challenges, our study introduces an Efficiency and Specificity-Based (ESB) class rebalancing strategy, specifically devised for datasets featuring mismatches-only off-target instances, marking a pioneering approach in this realm. Furthermore, through a meticulous evaluation of various one-hot encoding schemes alongside numerous hybrid neural network models, we find that encoding schemes and models of moderate complexity ideally balance performance and efficiency. On this foundation, we advance a novel hybrid model, CRISPR-MCA, which capitalizes on multi-feature extraction to enhance predictive accuracy. The empirical results affirm that the ESB class rebalancing strategy surpasses five conventional methods in addressing extreme dataset imbalances, demonstrating superior efficacy and broader applicability across diverse models. Notably, the CRISPR-MCA model excels in off-target effect prediction across four distinct mismatches-only datasets and significantly outperforms contemporary state-of-the-art models on datasets comprising both mismatches and indels. In summary, the CRISPR-MCA model, coupled with the ESB rebalancing strategy, offers profound insights and a robust framework for future explorations in this field. Author summary: In the field of gene editing, the application of deep learning technologies holds significant promise for predicting off-target effects in the CRISPR-Cas9 system. 
Nevertheless, one of the primary challenges encountered is the extreme imbalance among classes within the off-target datasets, which severely hampers the predictive accuracy for certain classes. Furthermore, as an array of sequence encoding methods continues to evolve, there has been a corresponding increase in model complexity. Addressing these issues, we introduce a novel Efficiency and Specificity-Based (ESB) class rebalancing strategy designed to mitigate the impact of class imbalance. Additionally, we assess the influence of six encoding schemes and four distinct architectural approaches on prediction performance, employing four benchmark datasets for validation. Building upon these insights, we have developed a new hybrid model, termed CRISPR-MCA. Our experimental results demonstrate that the ESB strategy significantly surpasses the performance of existing baseline methods across multiple models. Moreover, the CRISPR-MCA model exhibits robust performance on two distinct types of datasets, affirming its effectiveness in enhancing the accuracy of deep learning predictions for off-target activities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
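The abstract compares several one-hot encoding schemes for sgRNA/target pairs but does not spell them out. As a minimal sketch of one widely used scheme (an assumption, not necessarily the CRISPR-MCA encoding, and unrelated to the ESB rebalancing step), each base of an aligned sgRNA and target DNA is one-hot encoded over A, C, G, T and the two vectors are OR-combined per position, so a mismatch shows up as two active bits. The function names are hypothetical.

```python
# Sketch of an OR-combined one-hot encoding for aligned sgRNA/target DNA pairs.
# Assumption: both sequences are already aligned and of equal length (no indels).
BASES = "ACGT"


def one_hot(base: str) -> list[int]:
    """Return the 4-dim one-hot vector of a single nucleotide."""
    vec = [0, 0, 0, 0]
    vec[BASES.index(base.upper())] = 1
    return vec


def encode_pair(sgrna: str, target: str) -> list[list[int]]:
    """OR-combine the per-position one-hot vectors of an sgRNA/target pair.

    A matching position yields one active bit; a mismatch yields two.
    """
    if len(sgrna) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return [
        [a | b for a, b in zip(one_hot(x), one_hot(y))]
        for x, y in zip(sgrna, target)
    ]


def mismatch_positions(encoded: list[list[int]]) -> list[int]:
    """Positions where the encoded vector has more than one active bit."""
    return [i for i, vec in enumerate(encoded) if sum(vec) > 1]


if __name__ == "__main__":
    enc = encode_pair("GACGT", "GATGT")
    print(enc[2])                   # [0, 1, 0, 1]: C/T mismatch
    print(mismatch_positions(enc))  # [2]
```

The resulting (length × 4) matrix is the kind of input a hybrid CNN/RNN model would consume; richer schemes add extra channels (for example, mismatch direction or indel flags), which this sketch does not model.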
4. GAM-MDR: probing miRNA–drug resistance using a graph autoencoder based on random path masking.
- Author
Zhou, Zhecheng, Du, Zhenya, Jiang, Xin, Zhuo, Linlin, Xu, Yixin, Fu, Xiangzheng, Liu, Mingzhe, and Zou, Quan
- Subjects
DEEP learning, GENE expression, MICRORNA, ACQUISITION of data, DISEASE progression, THERAPEUTICS
- Abstract
MicroRNAs (miRNAs) are found ubiquitously in biological cells and play a pivotal role in regulating the expression of numerous target genes. Therapies centered around miRNAs are emerging as a promising strategy for disease treatment, aiming to intervene in disease progression by modulating abnormal miRNA expressions. The accurate prediction of miRNA–drug resistance (MDR) is crucial for the success of miRNA therapies. Computational models based on deep learning have demonstrated exceptional performance in predicting potential MDRs. However, their effectiveness can be compromised by errors in the data acquisition process, leading to inaccurate node representations. To address this challenge, we introduce the GAM-MDR model, which combines the graph autoencoder (GAE) with random path masking techniques to precisely predict potential MDRs. The reliability and effectiveness of the GAM-MDR model are mainly reflected in two aspects. Firstly, it efficiently extracts the representations of miRNA and drug nodes in the miRNA–drug network. Secondly, our designed random path masking strategy efficiently reconstructs critical paths in the network, thereby reducing the adverse impact of noisy data. To our knowledge, this is the first time that a random path masking strategy has been integrated into a GAE to infer MDRs. Our method was subjected to multiple validations on public datasets and yielded promising results. We are optimistic that our model could offer valuable insights for miRNA therapeutic strategies and deepen the understanding of the regulatory mechanisms of miRNAs. Our data and code are publicly available at GitHub: https://github.com/ZZCrazy00/GAM-MDR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
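The abstract describes masking paths in the miRNA–drug network so that the graph autoencoder must reconstruct them. A minimal sketch of the idea, assuming an undirected edge list and masking every edge traversed by a few short random walks, is shown below. It is not the GAM-MDR code: `random_path_mask`, the walk parameters and the toy node names are hypothetical, and the GAE encoder/decoder is omitted.

```python
import random
from collections import defaultdict


def random_path_mask(edges, num_walks, walk_length, seed=0):
    """Hide every edge traversed by a few short random walks.

    edges: list of undirected (u, v) pairs, e.g. (miRNA, drug).
    Returns (kept_edges, masked_edges): an encoder would only see the kept
    edges, and a decoder would be trained to reconstruct the masked ones.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    nodes = sorted(neighbours)

    masked = set()
    for _ in range(num_walks):
        current = rng.choice(nodes)  # random start node
        for _ in range(walk_length):
            nxt = rng.choice(neighbours[current])  # every node has >= 1 neighbour
            masked.add(frozenset((current, nxt)))  # order-independent edge key
            current = nxt

    kept = [e for e in edges if frozenset(e) not in masked]
    dropped = [e for e in edges if frozenset(e) in masked]
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical toy miRNA-drug network.
    toy_edges = [("miR-21", "drugA"), ("miR-21", "drugB"),
                 ("miR-155", "drugA"), ("miR-155", "drugC")]
    kept, masked = random_path_mask(toy_edges, num_walks=2, walk_length=2, seed=1)
    print("visible to encoder:", kept)
    print("reconstruction targets:", masked)
```

Masking whole walks rather than independent edges removes correlated, connected structure, which is one plausible reading of why path masking would make the learned representations less sensitive to individual noisy edges.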
5. GraphADT: empowering interpretable predictions of acute dermal toxicity with multi-view graph pooling and structure remapping.
- Author
Ma, Xinqian, Fu, Xiangzheng, Wang, Tao, Zhuo, Linlin, and Zou, Quan
- Subjects
GRAPH neural networks, MOLECULAR graphs, MOLECULAR structure, CHEMICAL bonds, MOLECULES, DEEP learning
- Abstract
Motivation: Accurate prediction of acute dermal toxicity (ADT) is essential for the safe and effective development of contact drugs. Currently, graph neural networks, a form of deep learning technology, accurately model the structure of compound molecules, enhancing predictions of their ADT. However, many existing methods emphasize atom-level information transfer and overlook crucial data conveyed by molecular bonds and their interrelationships. Additionally, these methods often generate "equal" node representations across the entire graph, failing to accentuate "important" substructures like functional groups, pharmacophores, and toxicophores, thereby reducing interpretability. Results: We introduce a novel model, GraphADT, utilizing structure remapping and multi-view graph pooling (MVPool) technologies to accurately predict compound ADT. Initially, our model applies structure remapping to better delineate bonds, transforming "bonds" into new nodes and "bond-atom-bond" interactions into new edges, thereby reconstructing the compound molecular graph. Subsequently, we use MVPool to amalgamate data from various perspectives, minimizing biases inherent to single-view analyses. Following this, the model generates a robust node ranking collaboratively, emphasizing critical nodes or substructures to enhance model interpretability. Lastly, we apply a graph comparison learning strategy to train both the original and structure-remapped molecular graphs, deriving the final molecular representation. Experimental results on public datasets indicate that the GraphADT model outperforms existing state-of-the-art models. The GraphADT model has been demonstrated to effectively predict compound ADT, offering potential guidance for the development of contact drugs and related treatments. Availability and implementation: Our code and data are accessible at: https://github.com/mxqmxqmxq/GraphADT.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
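The structure remapping step (bonds become nodes, bond-atom-bond patterns become edges) corresponds to building the line graph of the molecular graph. A minimal sketch, assuming bonds are given as atom-index pairs and ignoring bond and atom features, is below; `remap_bonds` is a hypothetical name and this is not the GraphADT implementation.

```python
from collections import defaultdict
from itertools import combinations


def remap_bonds(bonds):
    """Build the bond-level (line) graph of a molecule.

    bonds: list of (atom_i, atom_j) pairs; bond k becomes new node k.
    Returns (bond_k, bond_l, shared_atom) triples, one for every
    bond-atom-bond pattern, i.e. every pair of bonds sharing an atom.
    """
    bonds_at_atom = defaultdict(list)
    for k, (i, j) in enumerate(bonds):
        bonds_at_atom[i].append(k)
        bonds_at_atom[j].append(k)

    new_edges = []
    for atom, incident in sorted(bonds_at_atom.items()):
        for k, l in combinations(incident, 2):
            new_edges.append((k, l, atom))
    return new_edges


if __name__ == "__main__":
    # Heavy-atom skeleton of 1-propanol: C0-C1-C2-O3 (hydrogens omitted).
    propanol = [(0, 1), (1, 2), (2, 3)]
    print(remap_bonds(propanol))  # [(0, 1, 1), (1, 2, 2)]
```

Keeping the shared atom on each new edge lets a downstream GNN attach atom features to bond-bond messages, which is one way the remapped graph can carry the bond-level information the abstract says atom-centric models overlook.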
6. scTPC: a novel semisupervised deep clustering model for scRNA-seq data.
- Author
Qiu, Yushan, Yang, Lingfei, Jiang, Hao, and Zou, Quan
- Subjects
DEEP learning, NEGATIVE binomial distribution, RNA sequencing, DATA modeling, FUZZY clustering technique, SEQUENCE analysis, RESEARCH personnel
- Abstract
Motivation: Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to further explore cell heterogeneity, trajectory inference, identification of rare cell types, and neurology. Accurate scRNA-seq data clustering is crucial in single-cell sequencing data analysis. However, the high dimensionality, sparsity, and presence of "false" zero values in the data can pose challenges to clustering. Furthermore, current unsupervised clustering algorithms have not effectively leveraged prior biological knowledge, making cell clustering even more challenging. Results: This study investigates a semisupervised clustering model called scTPC, which integrates the triplet constraint, pairwise constraint, and cross-entropy constraint based on deep learning. Specifically, the model begins by pretraining a denoising autoencoder based on a zero-inflated negative binomial distribution. Deep clustering is then performed in the learned latent feature space using triplet constraints and pairwise constraints generated from partially labeled cells. Finally, to address imbalanced cell-type datasets, a weighted cross-entropy loss is introduced to optimize the model. Experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC achieves accurate clustering with a well-designed framework. Availability and implementation: scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780. [ABSTRACT FROM AUTHOR]
- Published
- 2024
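The pretraining step relies on a zero-inflated negative binomial (ZINB) likelihood, which models both the overdispersion of counts and the "false" zeros mentioned in the abstract. A minimal per-count sketch of the ZINB negative log-likelihood is below, assuming the common mean/inverse-dispersion parameterisation with mu > 0 and theta > 0; the triplet, pairwise and weighted cross-entropy terms of scTPC are not shown, and the function names are hypothetical.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """log NB(x) with mean mu > 0 and inverse dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of count x under ZINB(mu, theta, pi).

    pi is the probability of a technical ("false") zero; with probability
    1 - pi the count is drawn from the negative binomial instead.
    """
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))


if __name__ == "__main__":
    # In an autoencoder, mu, theta and pi would be decoder outputs per gene and cell.
    counts = [0, 0, 3, 7, 1]
    loss = sum(zinb_nll(c, mu=2.5, theta=1.5, pi=0.2) for c in counts) / len(counts)
    print(f"mean ZINB NLL: {loss:.4f}")
```

Because the zero case mixes two sources, a zero count is penalised less than under a plain negative binomial, which is what lets the denoising autoencoder avoid treating every dropout as true absence of expression.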
7. Revisiting drug–protein interaction prediction: a novel global–local perspective.
- Author
Zhou, Zhecheng, Liao, Qingquan, Wei, Jinhang, Zhuo, Linlin, Wu, Xiaonan, Fu, Xiangzheng, and Zou, Quan
- Subjects
MULTILAYER perceptrons, BIPARTITE graphs, DRUG repositioning, DEEP learning, TRANSFORMER models, INDIVIDUALIZED medicine, PROTEIN-protein interactions
- Abstract
Motivation: Accurate inference of potential drug–protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. Results: We propose a new computational framework that integrates global and local features of nodes in the drug–protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug–protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. Availability and implementation: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
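The local features above come from MinHash similarity estimates between drug and protein subgraphs. A minimal MinHash sketch is shown below, assuming each subgraph is summarised as a non-empty set of node-ID strings and using seeded BLAKE2b hashes in place of random permutations; the HyperLogLog cardinality estimate is not shown, and the helper names and toy sets are hypothetical.

```python
import hashlib


def _seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of token; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm: int = 128) -> list[int]:
    """MinHash signature of a non-empty set of string items."""
    return [min(_seeded_hash(x, s) for x in items) for s in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical node neighbourhoods of a drug and a protein.
    drug_sub = {f"node{i}" for i in range(150)}
    prot_sub = {f"node{i}" for i in range(50, 200)}  # true Jaccard = 100/200 = 0.5
    sa, sb = minhash_signature(drug_sub, 256), minhash_signature(prot_sub, 256)
    print(round(estimate_jaccard(sa, sb), 2))
```

The appeal for large bipartite graphs is that signatures have a fixed size (here 256 integers) regardless of subgraph size, and the estimate's standard error shrinks roughly as 1/sqrt(num_perm).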
8. Diff-AMP: tailored designed antimicrobial peptide framework with all-in-one generation, identification, prediction and optimization.
- Author
Wang, Rui, Wang, Tao, Zhuo, Linlin, Wei, Jinhang, Fu, Xiangzheng, Zou, Quan, and Yao, Xiaojun
- Subjects
ANTIMICROBIAL peptides, INTERNET servers, CONVOLUTIONAL neural networks, PEPTIDE antibiotics, REINFORCEMENT learning, DEEP learning, DRUG toxicity
- Abstract
Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, existing AMP generation methods overlook the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. FEOpti-ACVP: identification of novel anti-coronavirus peptide sequences based on feature engineering and optimization.
- Author
Jiang, Jici, Pei, Hongdi, Li, Jiayu, Li, Mingxin, Zou, Quan, and Lv, Zhibin
- Subjects
AMINO acid sequence, DEEP learning, MACHINE learning, ENGINEERING, FEATURE extraction, IDENTIFICATION
- Abstract
Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. Next, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods for ACVP prediction and has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. CircRNA identification and feature interpretability analysis.
- Author
Niu, Mengting, Wang, Chunyu, Chen, Yaojia, Zou, Quan, Qi, Ren, and Xu, Lei
- Subjects
CIRCULAR RNA, LINCRNA
- Abstract
Background: Circular RNAs (circRNAs) can regulate microRNA activity and are related to various diseases, such as cancer. Functional research on circRNAs is a major focus of current research. Accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, providing a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation. Results: We developed a novel framework, CircDC, for distinguishing circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and a bidirectional long short-term memory network to learn high-order feature representations and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretability analysis of the features affecting the model is performed, and the computational framework is applied to extended applications of circRNA identification. Conclusions: CircDC is suitable for circRNA prediction. Identifying circRNAs helps in understanding the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The relevant code and data in this article can be accessed for free at https://github.com/nmt315320/CircDC.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation.
- Author
Niu, Mengting, Wang, Chunyu, Zhang, Zhanguo, and Zou, Quan
- Subjects
INTERNET servers, CIRCULAR RNA, DEEP learning, STOMACH cancer, HEPATOCELLULAR carcinoma, FORECASTING, MULTIOMICS
- Abstract
Background: Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, building on the graph Markov neural network algorithm (GMNN) constructed in our previous work GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify the prediction results of CircDA. Results: CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, abstracts multiomics data, and models them as a heterogeneous biomolecular association network that reflects the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In addition, quantitative real-time PCR (RT-qPCR) experiments on human HCC tissue samples showed that five circRNAs were significantly differentially expressed, supporting the ability of CircDA to predict diseases related to new circRNAs. Conclusions: This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server (http://server.malab.cn/CircDA) is provided, and the code is open-sourced (https://github.com/nmt315320/CircDA.git) for the convenience of algorithm improvement. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. CoraL: interpretable contrastive meta-learning for the prediction of cancer-associated ncRNA-encoded small peptides.
- Author
Li, Zhongshen, Jin, Junru, He, Wenjia, Long, Wentao, Yu, Haoqing, Gao, Xin, Nakai, Kenta, Zou, Quan, and Wei, Leyi
- Subjects
TUMOR markers, PEPTIDES, HUMAN fingerprints, CORALS, DEEP learning, CARCINOGENESIS, SOURCE code
- Abstract
NcRNA-encoded small peptides (ncPEPs) have recently emerged as promising targets and biomarkers for cancer immunotherapy. Therefore, identifying cancer-associated ncPEPs is crucial for cancer research. In this work, we propose CoraL, a novel supervised contrastive meta-learning framework for predicting cancer-associated ncPEPs. Specifically, the proposed meta-learning strategy enables our model to learn meta-knowledge from different types of peptides and train a promising predictive model even with few labeled samples. The results show that our model is capable of making high-confidence predictions on unseen cancer biomarkers with only five samples, potentially accelerating the discovery of novel cancer biomarkers for immunotherapy. Moreover, our approach remarkably outperforms existing deep learning models on 15 cancer-associated ncPEPs datasets, demonstrating its effectiveness and robustness. Interestingly, our model exhibits outstanding performance when extended for the identification of short open reading frames derived from ncPEPs, demonstrating the strong prediction ability of CoraL at the transcriptome level. Importantly, our feature interpretation analysis discovers unique sequential patterns as the fingerprint for each cancer-associated ncPEPs, revealing the relationship among certain cancer biomarkers that are validated by relevant literature and motif comparison. Overall, we expect CoraL to be a useful tool to decipher the pathogenesis of cancer and provide valuable information for cancer research. The dataset and source code of our proposed method can be found at https://github.com/Johnsunnn/CoraL. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. Adaptive learning embedding features to improve the predictive performance of SARS-CoV-2 phosphorylation sites.
- Author
Jiao, Shihu, Ye, Xiucai, Ao, Chunyan, Sakurai, Tetsuya, Zou, Quan, and Xu, Lei
- Subjects
SARS-CoV-2, DEEP learning, MACHINE learning
- Abstract
Motivation: The rapid and extensive transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to an unprecedented global health emergency, affecting millions of people and causing an immense socioeconomic impact. The identification of SARS-CoV-2 phosphorylation sites plays an important role in unraveling the complex molecular mechanisms behind infection and the resulting alterations in host cell pathways. However, currently available prediction tools for identifying these sites lack accuracy and efficiency. Results: In this study, we present a comprehensive biological function analysis of SARS-CoV-2 infection in clonal human lung epithelial A549 cells, revealing dramatic changes in protein phosphorylation pathways in host cells. Moreover, a novel deep learning predictor called PSPred-ALE is specifically designed to identify phosphorylation sites in human host cells that are infected with SARS-CoV-2. The key idea of PSPred-ALE lies in the use of a self-adaptive learning embedding algorithm, which enables the automatic extraction of context sequential features from protein sequences. In addition, the tool uses a multihead attention module to capture global information, further improving the accuracy of predictions. Comparative analysis of features demonstrated that the self-adaptive learning embedding features are superior to hand-crafted statistical features in capturing discriminative sequence information. Benchmarking comparison shows that PSPred-ALE outperforms the state-of-the-art prediction tools and achieves robust performance. Therefore, the proposed model can effectively identify phosphorylation sites and assist biomedical scientists in understanding the mechanism of phosphorylation in SARS-CoV-2 infection. Availability and implementation: PSPred-ALE is available at https://github.com/jiaoshihu/PSPred-ALE and Zenodo (https://doi.org/10.5281/zenodo.8330277). [ABSTRACT FROM AUTHOR]
- Published
- 2023
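The abstract credits a multihead attention module with capturing global information across the sequence window. Below is a minimal NumPy sketch of standard multi-head scaled dot-product self-attention, assuming randomly initialised projection matrices, a hypothetical 33-residue window and 16-dimensional residue embeddings. It is not the PSPred-ALE implementation and omits the self-adaptive embedding layer, masking, dropout and layer normalisation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head scaled dot-product self-attention.

    x: (seq_len, d_model) residue embeddings; w_*: (d_model, d_model).
    Returns the output (seq_len, d_model) and the attention weights
    (num_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, L, L)
    weights = softmax(scores, axis=-1)  # every residue attends to all others
    heads = weights @ v  # (num_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, seq_len = 16, 33  # hypothetical window and embedding sizes
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
    out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=4)
    print(out.shape, attn.shape)  # (33, 16) (4, 33, 33)
```

Each row of `attn` is a distribution over all positions in the window, which is what "global information" means here: the representation of the candidate site can draw on any residue, not only its immediate neighbours.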
14. Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks.
- Author
Wang, Yu, Pang, Chao, Wang, Yuzhe, Jin, Junru, Zhang, Jingjie, Zeng, Xiangxiang, Su, Ran, Zou, Quan, and Wei, Leyi
- Subjects
DEEP learning, ARTIFICIAL intelligence, ORGANIC chemistry, DRUG synthesis, DRUG development, TRANSFORMER models
- Abstract
Automating retrosynthesis with artificial intelligence expedites organic chemistry research in digital laboratories. However, most existing deep-learning approaches are hard to explain, like a "black box" with few insights. Here, we propose RetroExplainer, formulating the retrosynthesis task as a molecular assembly process containing several retrosynthetic actions guided by deep learning. To guarantee robust performance of our model, we propose three units: a multi-sense and multi-scale Graph Transformer, structure-aware contrastive learning, and dynamic adaptive multi-task learning. The results on 12 large-scale benchmark datasets demonstrate the effectiveness of RetroExplainer, which outperforms the state-of-the-art single-step retrosynthesis approaches. In addition, the molecular assembly process gives our model good interpretability, allowing for transparent decision-making and quantitative attribution. When extended to multi-step retrosynthesis planning, RetroExplainer identified 101 pathways, in which 86.9% of the single reactions correspond to those already reported in the literature. As a result, RetroExplainer is expected to offer valuable insights for reliable, high-throughput, and high-quality organic synthesis in drug development. Automating retrosynthesis prediction in organic chemistry is a major application of machine learning (ML). Here the authors present RetroExplainer, which offers a high-performance, transparent and interpretable deep-learning framework providing valuable insights for drug development. [ABSTRACT FROM AUTHOR]
- Published
- 2023
15. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model.
- Author
Wang, Jiacheng, Chen, Yaojia, and Zou, Quan
- Subjects
DEEP learning, GENE regulatory networks, MONONUCLEAR leukocytes, BIOLOGICAL systems, REGULATOR genes, TRIPLE-negative breast cancer, GENE expression
- Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single-cell data focus only on inferring the regulatory relationships of pairs of genes, ignoring the global regulatory structure, which is crucial for identifying regulations in complex biological systems. Here, we propose a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression data into co-expression mode. Then it utilizes a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct the gene regulatory networks and outperform existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and showed that DeepRIG can provide accurate cell-type-specific gene regulatory network inference and identify novel regulators of progression and inhibition. Author summary: Although many methods have been proposed to infer the gene regulatory network of a single cell, they only focus on the regulatory relationships of pairs of genes and ignore the global regulatory structure. Here, we present a deep learning-based model to learn the global regulatory structure and reconstruct the gene regulatory networks from single-cell RNA sequencing data with a graph view. We utilize weighted gene co-expression analysis to build a prior regulatory graph of genes and a graph autoencoder to deconstruct the latent regulatory structure among genes. 
We performed extensive experiments on a variety of single-cell RNA sequencing datasets and compared our method with nine state-of-the-art gene regulatory network inference methods. The results show that our method can significantly improve the accuracy of gene regulatory network inference and can be applied to identify key regulators in a wide range of scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
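DeepRIG's prior regulatory graph comes from weighted gene co-expression analysis. A minimal WGCNA-style sketch is below, assuming an unsigned adjacency |r|^β with the commonly used soft-thresholding power β = 6 and a hypothetical cutoff `min_weight` to keep the graph sparse; the exact transformation in DeepRIG may differ, and the graph autoencoder that embeds this graph is not shown.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)


def coexpression_prior_graph(expr: dict[str, list[float]], beta: int = 6,
                             min_weight: float = 0.1) -> dict[tuple[str, str], float]:
    """Weighted prior graph from gene co-expression.

    expr maps gene -> expression values across the same cells. Edge weight is
    the unsigned soft-thresholded correlation |r| ** beta; edges below
    min_weight are dropped.
    """
    genes = sorted(expr)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            weight = abs(pearson(expr[g], expr[h])) ** beta
            if weight >= min_weight:
                edges[(g, h)] = weight
    return edges


if __name__ == "__main__":
    toy = {"G1": [1, 2, 3, 4, 5], "G2": [2, 4, 6, 8, 10], "G3": [5, 1, 4, 2, 3]}
    print(coexpression_prior_graph(toy))  # only the G1-G2 edge survives
```

Raising |r| to a power suppresses weak correlations much more than strong ones (|r| = 0.3 becomes about 0.0007, while |r| = 1 stays 1), which is why a soft threshold tends to yield a cleaner prior graph than a single hard correlation cutoff.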
16. Recall DNA methylation levels at low coverage sites using a CNN model in WGBS.
- Author
Luo, Ximei, Wang, Yansu, Zou, Quan, and Xu, Lei
- Subjects
DNA methylation, REGULATOR genes, GENETIC regulation, DEEP learning, METHYLATION, METHYLGUANINE
- Abstract
DNA methylation is an important regulator of gene transcription. Whole-genome bisulfite sequencing (WGBS) is the gold-standard approach for base-pair-resolution quantification of DNA methylation, but it requires high sequencing depth. Many CpG sites have insufficient coverage in WGBS data, resulting in inaccurate DNA methylation levels at individual sites. Many state-of-the-art computational methods have been proposed to predict the missing values. However, many of them require either other omics datasets or other cross-sample data, and most of them only predict the state of DNA methylation. In this study, we propose RcWGBS, which imputes the missing (or low-coverage) values from the DNA methylation levels of adjacent sites. Deep learning techniques were employed for accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average differences between the DNA methylation levels at 12× depth predicted by RcWGBS and those at >50× depth in the H1-hESC and GM12878 cells are less than 0.03 and 0.01, respectively. RcWGBS performed better than METHimpute even when the sequencing depth was as low as 12×. Our work should help to process methylation data of low sequencing depth, allowing researchers to save sequencing costs and improve data utilization through computational methods. Author summary: DNA methylation has a major impact on gene regulation. WGBS is the gold standard for investigating DNA methylation. The DNA methylation levels of sites with low coverage are often inaccurate in WGBS datasets. Therefore, we proposed a method based on a CNN model to perform DNA methylation level interpolation for specific sites and named it RcWGBS. RcWGBS does not rely on other omics data or other cross-sample data. It uses only the sites with sufficient coverage in the target WGBS dataset to train the model. The trained model can then be used to predict the DNA methylation levels of sites with low coverage. 
Our analyses showed that RcWGBS could recalibrate the methylation levels of some CpGs with insufficient coverage, suggesting that our approach could benefit WGBS datasets with insufficient sequencing coverage. RcWGBS is implemented as an R package. It is efficient and convenient and does not need other WGBS or omics data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
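RcWGBS trains only on well-covered CpGs and predicts the level at low-coverage sites from neighbouring sites. A minimal sketch of how such training windows could be assembled is below, assuming the methylation level is methylated reads divided by total reads, one list of consecutive CpGs, and a hypothetical depth cutoff and flank size. The CNN is not shown, and this Python sketch is not the RcWGBS R package.

```python
def methylation_level(methylated: int, total: int):
    """Fraction of methylated reads at a CpG, or None if the site has no reads."""
    return methylated / total if total else None


def build_training_windows(sites, flank: int = 5, min_depth: int = 10):
    """Assemble (neighbour levels -> centre level) training pairs.

    sites: consecutive CpGs as (methylated_reads, total_reads).
    Only CpGs with at least min_depth reads are used as targets; the features
    are the levels of `flank` CpGs on each side. Low-coverage CpGs are the
    ones a trained model would later be asked to predict.
    """
    levels = [methylation_level(m, t) for m, t in sites]
    features, targets = [], []
    for i in range(flank, len(sites) - flank):
        if sites[i][1] < min_depth:
            continue  # unreliable target: leave it for prediction
        window = levels[i - flank:i] + levels[i + 1:i + 1 + flank]
        if any(v is None for v in window):
            continue  # skip windows with uncovered neighbours
        features.append(window)
        targets.append(levels[i])
    return features, targets


if __name__ == "__main__":
    toy_sites = [(2, 10), (4, 10), (6, 10), (8, 10), (1, 2), (10, 10), (0, 10)]
    X, y = build_training_windows(toy_sites, flank=1, min_depth=10)
    print(y)  # [0.4, 0.6, 0.8, 1.0]
```

In the toy run, the CpG with only 2 reads is excluded as a target but still appears as a neighbour feature; whether low-coverage neighbours should be allowed as inputs is a design choice this sketch leaves open.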
17. Explainable Deep Hypergraph Learning Modeling the Peptide Secondary Structure Prediction.
- Author
Jiang, Yi, Wang, Ruheng, Feng, Jiuxin, Jin, Junru, Liang, Sirui, Li, Zhongshen, Yu, Yingying, Ma, Anjun, Su, Ran, Zou, Quan, Ma, Qin, and Wei, Leyi
- Subjects
PEPTIDES, DEEP learning, TERTIARY structure, FUNCTIONAL analysis, FORECASTING, MODEL-based reasoning
- Abstract
Accurately predicting peptide secondary structures remains a challenging task due to the lack of discriminative information in short peptides. In this study, we propose PHAT, a deep hypergraph learning framework for the prediction of peptide secondary structures and the exploration of downstream tasks. The framework includes a novel interpretable deep hypergraph multi‐head attention network that uses residue‐based reasoning for structure prediction. The algorithm can incorporate sequential semantic information from a large‐scale biological corpus and structural semantic information from multi‐scale structural segmentation, leading to better accuracy and interpretability even with extremely short peptides. The interpretable models are able to highlight the reasoning of structural feature representations and the classification of secondary substructures. The importance of secondary structures in peptide tertiary structure reconstruction and downstream functional analysis is further demonstrated, highlighting the versatility of our models. To facilitate the use of the model, an online server has been established and is accessible via http://inner.wei-group.net/PHAT/. The work is expected to assist in the design of functional peptides and contribute to the advancement of structural biology research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
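The abstract does not spell out how PHAT builds its hypergraph, so the following is only a toy illustration of hypergraph message passing under stated assumptions: each residue is a node, each contiguous segment at several window sizes (a hypothetical stand-in for "multi-scale structural segmentation") is a hyperedge, and a single node-to-hyperedge-to-node pass uses plain averages. The window sizes, features, and function names are hypothetical, and the averaging replaces PHAT's multi-head attention.

```python
from typing import List

Vector = List[float]


def build_hyperedges(length: int, window_sizes: List[int]) -> List[List[int]]:
    """Hyperedges as lists of residue indices: every contiguous segment
    of each window size (a hypothetical multi-scale segmentation)."""
    edges = []
    for w in window_sizes:
        for start in range(0, length - w + 1):
            edges.append(list(range(start, start + w)))
    return edges


def mean(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def hypergraph_mean_pass(features: List[Vector], edges: List[List[int]]) -> List[Vector]:
    """One node -> hyperedge -> node aggregation step using plain averages."""
    # Step 1: each hyperedge summarizes the residues it contains.
    edge_feats = [mean([features[i] for i in e]) for e in edges]
    # Step 2: each residue averages the summaries of its incident hyperedges.
    out = []
    for i in range(len(features)):
        incident = [edge_feats[k] for k, e in enumerate(edges) if i in e]
        out.append(mean(incident) if incident else list(features[i]))
    return out
```

For a 4-residue peptide with scalar features 0 to 3 and window sizes 2 and 3, the pass yields smoothed features 0.75, 1.25, 1.75 and 2.25, so each residue now reflects the segments it belongs to.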
18. A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features.
- Author
-
Jiang, Jici, Li, Jiayu, Li, Junxian, Pei, Hongdi, Li, Mingxin, Zou, Quan, and Lv, Zhibin
- Subjects
AMINO acid sequence ,MACHINE learning ,DEEP learning ,UMAMI (Taste) ,FEATURE extraction ,TASTE testing of food - Abstract
Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet-lab testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report iUmami-DRLF, which applies a logistic regression (LR) classifier to features extracted from peptide sequences solely by a pre-trained deep neural network, the unified representation (UniRep, based on a multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced both the models' ability to identify umami peptides and their predictive precision, using peptide sequence information alone. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to satisfy the demand for an umami-flavored diet. [ABSTRACT FROM AUTHOR]
- Published
- 2023
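The classifier stage of iUmami-DRLF is logistic regression applied to fixed-length sequence embeddings. The sketch below trains a minimal logistic regression by batch gradient descent on toy two-dimensional vectors; it assumes the embeddings have already been computed (UniRep itself is not reproduced), and the data, learning rate, and epoch count are illustrative only. A production pipeline would more likely use an established implementation such as scikit-learn's `LogisticRegression`.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by minimizing mean log-loss with gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy 2-D "embeddings"; real UniRep vectors are much longer.
X = [[2.0, 0.1], [1.8, 0.3], [2.2, 0.0], [-2.0, 0.2], [-1.7, 0.1], [-2.1, 0.4]]
y = [1, 1, 1, 0, 0, 0]  # 1 = umami, 0 = non-umami
w, b = train_logreg(X, y)
```

Raising the decision threshold above 0.5 (the abstract mentions higher probability thresholds) simply means requiring `predict_proba(...)` to exceed a larger cutoff before calling a peptide umami.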
19. A Unified Deep Learning Framework for Single-Cell ATAC-Seq Analysis Based on ProdDep Transformer Encoder.
- Author
-
Wang, Zixuan, Zhang, Yongqing, Yu, Yun, Zhang, Junming, Liu, Yuhang, and Zou, Quan
- Subjects
LANGUAGE models ,DEEP learning ,TRANSCRIPTION factors ,SCALABILITY - Abstract
Recent advances in the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) have provided cell-specific chromatin accessibility landscapes of cis-regulatory elements, offering deeper insights into cellular states and dynamics. However, few research efforts have been dedicated to modeling the relationship between regulatory grammars and single-cell chromatin accessibility, or to incorporating the different analysis scenarios of scATAC-seq data into a general framework. To this end, we propose PROTRAIT, a unified deep learning framework for scATAC-seq data analysis based on the ProdDep Transformer Encoder. Motivated by deep language models, PROTRAIT leverages the ProdDep Transformer Encoder to capture the syntax of transcription factor (TF)-DNA binding motifs from scATAC-seq peaks, in order to predict single-cell chromatin accessibility and learn single-cell embeddings. Based on the cell embeddings, PROTRAIT annotates cell types using the Louvain algorithm. Furthermore, PROTRAIT identifies likely noise in raw scATAC-seq data and denoises these values based on the predicted chromatin accessibility. In addition, PROTRAIT employs differential accessibility analysis to infer TF activity at single-cell and single-nucleotide resolution. Extensive experiments on the Buenrostro2018 dataset validate the effectiveness of PROTRAIT for chromatin accessibility prediction, cell type annotation, and scATAC-seq data denoising, where it outperforms current approaches across different evaluation metrics. We also confirm that the inferred TF activity is consistent with the literature. Finally, we demonstrate the scalability of PROTRAIT to datasets containing over one million cells. [ABSTRACT FROM AUTHOR]
- Published
- 2023
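PROTRAIT is built on a Transformer encoder. The abstract does not describe what distinguishes the ProdDep variant, so the sketch below shows only the standard scaled dot-product self-attention, softmax(QK^T / sqrt(d))V, that Transformer encoders share. The inputs are assumed to be token vectors (for example, embeddings of k-mers from a peak sequence, a hypothetical choice), and the learned Q/K/V projection matrices are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Each output row is a softmax-weighted average of the rows of V,
    weighted by how well the query matches each key."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

Dividing by sqrt(d) keeps dot products from growing with the embedding size, which would otherwise push the softmax toward one-hot weights.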
20. Identification of Thermophilic Proteins Based on Sequence-Based Bidirectional Representations from Transformer-Embedding Features.
- Author
-
Pei, Hongdi, Li, Jiayu, Ma, Shuhan, Jiang, Jici, Li, Mingxin, Zou, Quan, and Lv, Zhibin
- Subjects
LANGUAGE models ,MACHINE learning ,PROTEOMICS ,METHODS engineering ,FEATURE extraction - Abstract
Thermophilic proteins have great potential to be utilized as biocatalysts in biotechnology. Machine learning algorithms are increasingly used to identify such enzymes, reducing or even eliminating the need for experimental studies. While most previously used machine learning methods were based on manually designed features, we developed BertThermo, a model that uses Bidirectional Encoder Representations from Transformers (BERT) as an automatic feature extraction tool. This method combines a variety of machine learning algorithms and feature engineering methods, while relying on single-feature encoding based on the protein sequence alone for model input. BertThermo achieved accuracies of 96.97% and 97.51% in 5-fold cross-validation and independent testing, respectively, identifying thermophilic proteins more reliably than any previously described predictive algorithm. Additionally, BertThermo was tested on a balanced dataset, an imbalanced dataset and a dataset with homologous sequences, and the results show that BertThermo was the most robust compared with state-of-the-art methods. The source code of BertThermo is available. [ABSTRACT FROM AUTHOR]
- Published
- 2023
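BertThermo's reported accuracy comes from 5-fold cross-validation plus an independent test set. The sketch below shows the generic k-fold procedure: shuffle indices once, split them into k folds, train on k-1 folds, score accuracy on the held-out fold, and average. The majority-class `train_majority`/`predict_majority` pair is a hypothetical placeholder, not BertThermo's model.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once, then deal indices round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X: Sequence, y: Sequence[int], k: int,
                       train: Callable, predict: Callable, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test_set = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Hypothetical placeholder model: always predict the training majority class.
def train_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model
```

With 8 thermophilic and 2 non-thermophilic toy samples, the majority baseline scores 0.8 under 5-fold cross-validation, a useful floor to compare any real model against on imbalanced data.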
21. DeepMPF: deep learning framework for predicting drug–target interactions based on multi-modal representation with meta-path semantic analysis.
- Author
-
Ren, Zhong-Hao, You, Zhu-Hong, Zou, Quan, Yu, Chang-Qing, Ma, Yan-Fang, Guan, Yong-Jian, You, Hai-Ru, Wang, Xin-Fei, and Pan, Jie
- Subjects
DEEP learning ,DRUG discovery ,DRUG design ,DRUG repositioning ,BIOLOGICAL networks ,INTERNET servers - Abstract
Background: Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, traditional biological experiments are time-consuming and expensive, given the abundance of complex interactions across the vast genomic and chemical spaces. To alleviate this problem, many computational methods have been developed to complement biological experiments effectively and narrow the search space to a preferred candidate domain. Nevertheless, most previous approaches cannot fully exploit the semantic information of association behaviors under multiple schemas to represent the complex structure of heterogeneous biological networks. Additionally, DTI prediction based on a single modality cannot meet the demand for prediction accuracy. Methods: We propose DeepMPF, a multi-modal representation framework based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein–drug–disease heterogeneous networks composed of three entities. Then feature information is obtained under three views: the sequence modality, the heterogeneous structure modality and the similarity modality. We propose six representative meta-path schemas to preserve the high-order nonlinear structure and capture hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. Results: To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold-standard datasets. Our method obtains competitive performance on all datasets. We also explore the influence of different feature embedding dimensions, learning strategies and classification methods.
Notably, drug repositioning experiments on COVID-19 and HIV demonstrate that DeepMPF can be applied to real-world problems and help drug discovery. Further analysis with molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. Conclusions: All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be utilized as a useful tool to prescreen the most promising drug candidates for a protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/ to support further research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
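DeepMPF defines six meta-path schemas over a protein–drug–disease network, but the abstract does not list them. As an illustration only, the sketch below counts instances of one assumed schema, drug–protein–drug, by multiplying adjacency matrices (the so-called commuting matrix): entry (i, j) is the number of proteins shared by drugs i and j. The toy adjacency values are invented.

```python
from typing import List

Matrix = List[List[int]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


# Hypothetical drug x protein interaction matrix (3 drugs, 4 proteins).
drug_protein = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

# Commuting matrix for the meta-path Drug -> Protein -> Drug:
# dpd[i][j] counts the path instances drug_i - protein - drug_j.
dpd = matmul(drug_protein, transpose(drug_protein))
```

Longer schemas chain more matrices in the same way (for example, drug–disease–protein would multiply a drug-disease matrix by a disease-protein matrix), and such path counts can serve as structural features for a downstream model.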
22. Deep learning models for disease-associated circRNA prediction: a review.
- Author
-
Chen, Yaojia, Wang, Jiacheng, Wang, Chuyu, Liu, Mingxin, and Zou, Quan
- Subjects
DEEP learning ,CIRCULAR RNA ,THERAPEUTICS ,FORECASTING ,DRUG target ,LEARNING ability ,DIAGNOSIS - Abstract
Emerging evidence indicates that circular RNAs (circRNAs) can provide new insights and potential therapeutic targets for disease diagnosis and treatment. However, traditional biological experiments are expensive and time-consuming. Recently, deep learning, with its powerful representation learning ability, has become a promising technology for predicting disease-associated circRNAs. In this review, we introduce the most popular circRNA-related databases and summarize three types of deep learning-based circRNA–disease association prediction methods: feature-generation-based, type-discrimination and hybrid-based methods. We further evaluate seven representative models on a benchmark with ground truth for both balanced and imbalanced classification tasks. In addition, we discuss the advantages and limitations of each type of method and highlight suggested applications for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
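The review benchmarks models under both balanced and imbalanced settings. A small worked example shows why accuracy alone can mislead on imbalanced data: a hypothetical predictor that labels every circRNA–disease pair as non-associated scores high accuracy yet finds no associations at all. The counts are invented for illustration, and the review's own metric set is not reproduced here.

```python
from typing import Dict, List


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# 1 known association among 20 candidate pairs; the predictor always says "no".
y_true = [1] + [0] * 19
y_pred = [0] * 20
metrics = confusion_metrics(y_true, y_pred)
```

Here accuracy is 0.95 while recall and F1 are both 0.0, which is why imbalanced benchmarks usually report precision/recall-based metrics alongside accuracy.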
23. Deep learning meta-analysis for predicting plant soil-borne fungal disease occurrence from soil microbiome data.
- Author
-
Wang, Yansu and Zou, Quan
- Subjects
- *
VERTICILLIUM wilt diseases , *SOILBORNE plant diseases , *FUNGAL diseases of plants , *PLANT diseases , *DEEP learning - Abstract
Accurately predicting soil-borne fungal plant diseases by analyzing soil microbial communities is advantageous for early disease detection and monitoring. In this study, a meta-analysis was conducted to establish a classification model for two soil-borne fungal plant diseases, Fusarium wilt and Verticillium wilt, based on soil microbiome datasets. The study integrated a scalable denoising method and a strategy for processing imbalanced data. The findings reveal a substantial enhancement in model performance when employing denoised and balanced datasets as opposed to the original dataset. Overall, the model based on bacterial ASV features outperformed the model based on fungal ASV features, achieving an accuracy of over 90% in predicting Fusarium and Verticillium wilt disease on the independent test set. Some bacteria, such as those classified as Chitinophagaceae, Nocardioides, and Sphingomonas, have been identified as biomarkers for distinguishing between healthy and diseased soils. Despite this achievement, the models exhibited suboptimal classification precision, underscoring the need for additional training sets or more comprehensive environmental information to improve disease prediction. Our analysis highlights the importance of microbiome-based deep learning (DL) models for predicting plant disease from microbiome characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
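The study pairs denoising with an imbalanced-data processing strategy whose details the abstract does not give. One common strategy, shown here only as an example and not necessarily the one the authors used, is random oversampling: duplicate randomly chosen minority-class samples until every class matches the majority count. The ASV abundance vectors and the function name are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_oversample(X: List[List[float]], y: List[str],
                      seed: int = 0) -> Tuple[List[List[float]], List[str]]:
    """Duplicate random samples of each smaller class up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(list(rng.choice(members)))  # copy, not alias
            y_out.append(label)
    return X_out, y_out


# Toy relative-abundance vectors for three ASVs per soil sample.
X = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.3, 0.4, 0.3], [0.7, 0.1, 0.2]]
y = ["healthy", "healthy", "healthy", "diseased"]
X_bal, y_bal = random_oversample(X, y)
```

Oversampling should be applied to the training split only, so that duplicated samples never leak into the independent test set.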
24. Special Protein or RNA Molecules Computational Identification.
- Author
-
Qi, Ren and Zou, Quan
- Subjects
- *
CIRCULAR RNA , *INTERNET servers , *DEEP learning , *MOLECULES , *CONVOLUTIONAL neural networks , *PROTEINS , *RNA , *PROTEOMICS - Abstract
The identification of special protein or RNA molecules via computational methods is of great importance for understanding their biological functions and developing new treatments for diseases. Seven papers focus on protein function prediction or protein identification, including the prediction of signal peptides in proteins, protein hydroxylation site prediction, protein–protein interaction (PPI) prediction, and protein identification. Furthermore, in terms of protein identification, Xu et al. concentrated on antioxidant protein identification; they proposed a machine learning method, SeqSVM, to predict antioxidant proteins from extracted sequence features [10]. [Extracted from the article]
- Published
- 2023
25. Effector-GAN: prediction of fungal effector proteins based on pretrained deep representation learning methods and generative adversarial networks.
- Author
-
Wang, Yansu, Luo, Ximei, and Zou, Quan
- Subjects
INTERNET servers ,GENERATIVE adversarial networks ,FUNGAL proteins ,PROBABILISTIC generative models ,DEEP learning ,PHYTOPATHOGENIC fungi ,PLANT diseases - Abstract
Motivation: Phytopathogenic fungi secrete effector proteins to subvert host defenses and facilitate infection. Systematic analysis and prediction of candidate fungal effector proteins are crucial for experimental validation and biological control of plant disease. However, two problems in fungal effector prediction are still considered intractable: one is the high diversity of effector sequences, which increases the difficulty of protein feature learning, and the other is the class imbalance between effector and non-effector samples in the training dataset. Results: In our study, pretrained deep representation learning methods are employed to represent multiple characteristics of sequences for predicting fungal effectors, and generative adversarial networks are adapted to create synthetic feature samples to address the data imbalance problem. Compared with state-of-the-art fungal effector prediction methods, Effector-GAN shows an overall improvement in accuracy on the independent test set. Availability and implementation: Effector-GAN offers a user-friendly interface to inspect potential fungal effector proteins (http://lab.malab.cn/~wys/webserver/Effector-GAN). The Python script can be downloaded from http://lab.malab.cn/~wys/gitlab/effector-gan. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
26. Predicting protein–peptide binding residues via interpretable deep learning.
- Author
-
Wang, Ruheng, Jin, Junru, Zou, Quan, Nakai, Kenta, and Wei, Leyi
- Subjects
DEEP learning ,AMINO acid sequence ,PROTEIN structure ,PROTEIN-protein interactions ,PROTEIN models ,DRUG discovery - Abstract
Summary: Identifying protein–peptide binding residues is fundamentally important for understanding the mechanisms of protein functions and for drug discovery. Although several computational methods have been developed, most of them rely heavily on third-party tools or complex data preprocessing for feature design, which tends to result in low computational efficiency and poor predictive performance. To address these limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representations from Transformers)-based contrastive learning framework that predicts protein–peptide binding residues from protein sequences only. PepBCL is an end-to-end predictive model that is independent of feature engineering. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn latent representations of protein sequences relevant to protein structures and functions. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues in the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods in benchmarking comparisons and achieves more robust performance. Moreover, we found that integrating traditional features with our learned features further improves performance. Interestingly, interpretability analysis of our model highlights the flexibility and adaptability of the deep learning-based protein language model in capturing both conserved and non-conserved sequential characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we have established an online predictive platform implementing PepBCL, which is available at http://server.wei-group.net/PepBCL/. Availability and implementation: https://github.com/Ruheng-W/PepBCL. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
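PepBCL adds a contrastive learning module on top of protein language model features, but the abstract does not give its loss. As a stand-in, the sketch below uses the classic margin-based pairwise contrastive loss: same-class pairs (for example, two binding residues) are pulled together by a d² penalty, and different-class pairs are pushed apart until they are at least `margin` apart. The embeddings, labels, and margin are hypothetical.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(emb: List[List[float]], labels: List[int],
                              margin: float = 1.0) -> float:
    """Mean loss over all pairs: d^2 for same-label pairs,
    max(0, margin - d)^2 for different-label pairs."""
    total, pairs = 0.0, 0
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            d = euclidean(emb[i], emb[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs
```

Because different-class pairs stop contributing once they are farther apart than the margin, the loss focuses on confusable binding/non-binding residues rather than on the many easy negatives in an imbalanced dataset.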
27. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks.
- Author
-
Niu, Mengting, Zou, Quan, and Wang, Chunyu
- Subjects
- *
CIRCULAR RNA , *TRIGONOMETRIC functions , *CHARACTERISTIC functions , *SOURCE code , *DEEP learning , *THERAPEUTICS , *NANOBIOTECHNOLOGY - Abstract
Motivation: Analyses of the characteristics and functions of circular RNAs (circRNAs) have shown that they play critical roles in disease. Exploring the relationship between circRNAs and diseases is of far-reaching significance for investigating the etiopathogenesis and treatment of diseases. Nevertheless, discovering new associations through biological experiments alone is inefficient. Results: We therefore present a computational method, GMNN2CD, which employs a graph Markov neural network (GMNN) algorithm to predict unknown circRNA–disease associations. First, using verified associations, we calculate the semantic similarity and Gaussian interaction profile kernel similarity (GIPs) of diseases and the GIPs of circRNAs, and then merge them to form a unified descriptor. After that, GMNN2CD uses a fusion feature variational map autoencoder to learn deep features and a label propagation map autoencoder to propagate labels based on known associations. Based on variational inference, alternating training of the GMNN enhances the ability of GMNN2CD to obtain high-efficiency high-dimensional features from low-dimensional representations. Finally, 5-fold cross-validation on five benchmark datasets shows that GMNN2CD is superior to state-of-the-art methods. Furthermore, case studies show that GMNN2CD can detect potential associations. Availability and implementation: The source code and data are available at https://github.com/nmt315320/GMNN2CD.git. [ABSTRACT FROM AUTHOR]
- Published
- 2022
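GMNN2CD builds Gaussian interaction profile kernel similarity (GIPs) from verified circRNA–disease associations. The widely used GIP formulation treats each row of the association matrix as an interaction profile IP and computes K(i, j) = exp(-γ ‖IP(i) - IP(j)‖²), with γ = γ′ / ((1/n) Σᵢ ‖IP(i)‖²) and γ′ commonly set to 1. The sketch below implements that standard formula; whether GMNN2CD uses exactly this normalization is an assumption, and the association matrix is a toy example.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_kernel(assoc: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP similarity between the rows (interaction profiles) of an association matrix."""
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    # Bandwidth is normalized by the average squared profile length.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            K[i][j] = math.exp(-gamma * dist2)
    return K


# Rows: circRNAs; columns: diseases (1 = verified association). Toy data.
A = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
K_circ = gip_kernel(A)
# Disease-side GIP similarity uses the transposed matrix.
K_dis = gip_kernel([list(col) for col in zip(*A)])
```

In this toy matrix the squared profile norms average 4/3, so γ = 0.75; circRNAs 0 and 1 differ in one disease and get similarity exp(-0.75), while circRNAs 0 and 2 differ in all three and get exp(-2.25).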
28. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data.
- Author
-
Zhao, Mengyuan, He, Wenying, Tang, Jijun, Zou, Quan, and Guo, Fei
- Subjects
GENE regulatory networks ,DEEP learning ,CONVOLUTIONAL neural networks ,TRANSCRIPTOMES ,RECEIVER operating characteristic curves ,RECURRENT neural networks - Abstract
Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics and the transformation of cell types. Inferring GRNs from such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop DGRNS, a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive comparative analysis on a mouse hematopoietic stem cell dataset illustrates that DGRNS outperforms state-of-the-art methods. The area under the receiver operating characteristic curve of networks inferred by DGRNS is about 16% higher than that of other unsupervised methods, and the area under the precision–recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and generalization of DGRNS.
By comparing the predictions with a standard network, we discovered a series of novel interactions that have been confirmed in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency that have not yet been experimentally confirmed and merit further research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
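DGRNS applies sliding windows to extract features from gene-pair data while preserving the direction of regulation. The window length, stride, and input encoding are not specified in the abstract, so the sketch below simply assumes expression values ordered along time (or pseudotime) for a candidate regulator and target, and cuts them into overlapping two-channel windows whose channel order (regulator first) encodes direction. The function name and values are hypothetical.

```python
from typing import List, Tuple

Window = List[Tuple[float, float]]


def gene_pair_windows(regulator: List[float], target: List[float],
                      window: int, stride: int) -> List[Window]:
    """Overlapping windows of (regulator, target) pairs along the time axis.

    Keeping the regulator in the first channel preserves direction:
    windows for (A -> B) differ from windows for (B -> A).
    """
    if len(regulator) != len(target):
        raise ValueError("expression vectors must have equal length")
    pairs = list(zip(regulator, target))
    return [pairs[s:s + window] for s in range(0, len(pairs) - window + 1, stride)]


reg = [0.1, 0.4, 0.9, 1.2, 0.8]
tgt = [0.0, 0.1, 0.5, 1.0, 1.1]
wins = gene_pair_windows(reg, tgt, window=3, stride=1)
```

Each window would then be fed to the recurrent and convolutional branches as a short two-channel sequence; swapping the argument order yields the samples for the reverse regulatory direction.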
29. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction.
- Author
-
Zhang, Meng, Jia, Cangzhi, Li, Fuyi, Li, Chen, Zhu, Yan, Akutsu, Tatsuya, Webb, Geoffrey I, Zou, Quan, Coin, Lachlan J M, and Song, Jiangning
- Subjects
DEEP learning ,MACHINE learning ,DROSOPHILA melanogaster ,MICE ,CORN ,GENETIC regulation - Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to help the research community assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoters, 61 for eukaryotic promoters, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning–based approaches generally outperformed scoring function–based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2022
30. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach.
- Author
-
Niu, Mengting, Zou, Quan, and Lin, Chen
- Subjects
- *
CIRCULAR RNA , *DEEP learning , *NUCLEOTIDE sequence , *RNA-binding proteins , *CARRIER proteins , *BINDING sites , *NON-coding RNA - Abstract
Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to comprehending the mechanism of posttranscriptional regulation. Accurately identifying binding sites is very useful for analyzing these interactions. In past research, some predictors based on machine learning (ML) have been presented, but their prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an AdaBoost-integrated deep hierarchical network to identify circRNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, extracting local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the AdaBoost algorithm is applied to integrate the deep learning (DL) models to improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with that of state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL performs in a general, reliable, and robust manner. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git. Author summary: Increasing evidence shows that circular RNAs can bind directly to proteins and participate in many different biological processes. Computational methods can quickly and accurately predict the binding sites between circular RNAs and RBPs.
To identify the interactions of circRNAs with 37 different types of circRNA-binding proteins, we developed CRBPDL, an integrated deep learning network based on a hierarchical network. It can effectively learn high-level feature representations. The performance of the model was verified through comparative experiments with different feature extraction algorithms, deep learning models and classifier models. Moreover, the CRBPDL model was applied to 31 linear RNAs, and comparison with the results of current leading algorithms demonstrated the effectiveness of our method. It is expected that the CRBPDL model can effectively predict circRNA-RBP binding sites and provide reliable candidates for further biological experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
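CRBPDL uses AdaBoost to integrate its deep learning models. The sketch below shows only the classic discrete AdaBoost bookkeeping for labels in {-1, +1}: each round, a base learner's weighted error ε gives it a vote weight α = ½ ln((1 - ε) / ε), misclassified samples are up-weighted for the next round, and the ensemble predicts the sign of the α-weighted vote. As a hypothetical simplification, the base learners are fixed prediction vectors standing in for trained networks; in full AdaBoost each learner is trained on the reweighted data.

```python
import math
from typing import List, Tuple


def adaboost_combine(base_preds: List[List[int]], y: List[int]) -> Tuple[List[float], List[int]]:
    """base_preds[t][i] is learner t's prediction (+1/-1) for sample i."""
    n = len(y)
    w = [1.0 / n] * n
    alphas = []
    for preds in base_preds:
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        alphas.append(alpha)
        # Up-weight mistakes, down-weight correct predictions, then renormalize.
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    ensemble = []
    for i in range(n):
        score = sum(a * preds[i] for a, preds in zip(alphas, base_preds))
        ensemble.append(1 if score >= 0 else -1)
    return alphas, ensemble
```

With three toy learners that each misclassify a different one of four samples, the weighted vote recovers all four labels, and later learners receive larger α because they are scored on the samples earlier learners got wrong.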
31. A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data.
- Author
-
Wang, Jiacheng, Zou, Quan, and Lin, Chen
- Subjects
- *
DEEP learning , *RNA sequencing , *RNA analysis , *DATA reduction , *TASK analysis , *QUALITY control , *DIMENSION reduction (Statistics) - Abstract
The emergence of single-cell RNA sequencing has facilitated the study of genomes, transcriptomes and proteomes. As single-cell RNA-seq datasets continue to be released, one of the major challenges facing traditional RNA analysis tools is the high-dimensional, highly sparse, noisy and large-scale nature of single-cell RNA-seq data. Deep learning technologies are well matched to these characteristics and offer unprecedented promise. Here, we give a systematic review of the most popular single-cell RNA-seq analysis methods and tools based on deep learning models, covering data preprocessing procedures (quality control, normalization, data correction, dimensionality reduction and data visualization) and the clustering task for downstream analysis. We further evaluate the deep model-based analysis methods for data correction and clustering quantitatively on 11 gold standard datasets. Moreover, we discuss the data preferences of these methods and their limitations, and give some suggestions and guidance to help users select appropriate methods and tools. [ABSTRACT FROM AUTHOR]
- Published
- 2022
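Among the preprocessing steps the review covers, normalization is the easiest to show concretely. A common non-deep-learning baseline, not one of the methods the review evaluates, scales each cell's counts to a fixed library size and then applies log(1 + x). The target size of 10,000 counts, the function name, and the toy count matrix are assumptions.

```python
import math
from typing import List


def normalize_log1p(counts: List[List[int]], target_sum: float = 1e4) -> List[List[float]]:
    """Rows are cells, columns are genes.

    Each cell is scaled so its counts sum to target_sum (library-size
    normalization), then log(1 + x) compresses the dynamic range.
    """
    out = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            out.append([0.0] * len(cell))  # empty cell: leave as zeros
            continue
        out.append([math.log1p(c * target_sum / total) for c in cell])
    return out


counts = [[0, 5, 5], [10, 0, 30]]
norm = normalize_log1p(counts)
```

Library-size scaling removes differences in sequencing depth between cells, so two cells with the same relative expression get the same normalized values even if one was sequenced more deeply.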
32. DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network.
- Author
-
Khanal, Jhabindra, Tayara, Hilal, Zou, Quan, and Chong, Kil To
- Subjects
CAPSULE neural networks ,PROTEOMICS ,DEEP learning ,LYSINE ,CONVOLUTIONAL neural networks ,POST-translational modification - Abstract
Lysine crotonylation (Kcr) is a posttranslational modification widely detected in histone and nonhistone proteins. It plays a vital role in human disease progression and various cellular processes, including the cell cycle, cell organization and chromatin remodeling, and is a key mechanism for increasing proteomic diversity. Thus, accurate information on such sites is beneficial for both drug development and basic research. Existing computational methods can be improved to identify Kcr sites in proteins more effectively. In this study, we proposed a deep learning model, DeepCap-Kcr, a capsule network (CapsNet) based on a convolutional neural network (CNN) and long short-term memory (LSTM) for robust prediction of Kcr sites on histone and nonhistone proteins (mammals). The proposed model outperformed the existing CNN architecture Deep-Kcr and other well-established tools in most cases and provided promising outcomes for practical use; in particular, the proposed model characterized the internal hierarchical representation as well as the important features from multiple levels of abstraction automatically learned from a small number of samples. The trained model generalized well to another species (papaya). Moreover, we showed that the features and properties generated by the internal capsule layer can reveal the internal data distribution related to biological significance (acting as a motif detector). The source code and data are freely available at https://github.com/Jhabindra-bioinfo/DeepCap-Kcr. [ABSTRACT FROM AUTHOR]
- Published
- 2022
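Capsule networks such as the one in DeepCap-Kcr represent features as vectors whose length encodes how strongly an entity is present. The squash nonlinearity commonly used in capsule networks, v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖, keeps the vector's direction and maps its length into [0, 1). The sketch below implements only that function; dynamic routing and the CNN/LSTM layers that DeepCap-Kcr combines with it are not shown, and whether its implementation uses exactly this form is an assumption.

```python
import math
from typing import List


def squash(s: List[float], eps: float = 1e-9) -> List[float]:
    """Capsule squash: preserve direction, map length into [0, 1).

    eps guards against division by zero for an all-zero input vector.
    """
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


v = squash([3.0, 4.0])  # input length 5 -> output length about 25/26
```

Short vectors are shrunk toward zero and long vectors saturate just below length 1, which lets the output length be read as a presence probability for the feature the capsule detects.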
33. Molecular design in drug discovery: a comprehensive review of deep generative models.
- Author
-
Cheng, Yu, Gong, Yongshun, Liu, Yuansheng, Song, Bosheng, and Zou, Quan
- Subjects
DRUG design ,DEEP learning ,LEARNING communities ,DATA distribution - Abstract
Deep generative models have attracted a surge of interest in the deep learning community since they were proposed. These models are designed to generate new synthetic data, including images, videos and texts, by approximating the data distribution. In the last few years, deep generative models have shown superior performance in drug discovery, especially in de novo molecular design. In this study, deep generative models are reviewed to survey recent advances in de novo molecular design for drug discovery. In addition, we divide these models into two categories based on their in silico molecular representations. These two classical types of models are then described in detail, and their pros and cons are discussed. We also indicate the current challenges facing deep generative models for de novo molecular design. Automated de novo molecular design is promising, but there is still a long road to explore. [ABSTRACT FROM AUTHOR]
- Published
- 2021
34. High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
- Author
-
Zhang, Yongqing, Wang, Zixuan, Zeng, Yuanqi, Zhou, Jiliu, and Zou, Quan
- Subjects
BINDING sites ,DEEP learning ,TRANSCRIPTION factors ,INDIVIDUALIZED medicine ,DNA sequencing ,PREDICTION models - Abstract
Transcription factors (TFs) are essential proteins in regulating the spatiotemporal expression of genes. It is crucial to infer potential transcription factor binding sites (TFBSs) at high resolution to advance biology and realize precision medicine. Recently, deep learning-based models have shown exemplary performance in predicting TFBSs at the base-pair level. However, previous models fail to integrate nucleotide position information and semantic information without noisy responses, so there is still room for improvement. Moreover, both the inner mechanism and the prediction results of these models are challenging to interpret. To this end, the Deep Attentive Encoder-Decoder Neural Network (D-AEDNet) is developed to identify the locations of TF–DNA binding sites in DNA sequences. In particular, our model adopts a Skip Architecture to leverage nucleotide position information in the encoder and removes noisy responses in the information fusion process with an Attention Gate. Simultaneously, Transcription Factor Motif Discovery based on Sliding Window (TF-MoDSW), an approach that discovers TF–DNA binding motifs from the output of neural networks, is proposed to interpret the biological meaning of the predicted results. On ChIP-exo datasets, experimental results show that D-AEDNet performs better than competing methods. In addition, visualization analysis verifies that the Attention Gate improves the interpretability of our model. Furthermore, experiments on ChIP-seq datasets confirm that D-AEDNet learns TF–DNA binding motifs better than state-of-the-art methods and that TF-MoDSW can discover biological sequence motifs in TF–DNA interactions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
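A minimal sketch of the sliding-window idea behind TF-MoDSW, assuming the network emits one binding probability per base: slide a fixed-length window along each sequence, keep the window with the highest mean score as a candidate binding site, and stack the candidates into a position frequency matrix. The helper names `best_window` and `position_frequency_matrix`, the window length `k`, and the mean-score criterion are hypothetical illustrations, not the published TF-MoDSW procedure.

```python
from typing import Dict, List, Tuple


def best_window(seq: str, scores: List[float], k: int) -> Tuple[int, str, float]:
    """Return (start, subsequence, mean score) of the length-k window with the
    highest mean per-base score. Hypothetical helper, not the published TF-MoDSW."""
    if len(seq) != len(scores):
        raise ValueError("seq and scores must have the same length")
    if not 0 < k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    window_sum = sum(scores[:k])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(seq) - k + 1):
        # Slide by one base: add the entering score, drop the leaving score.
        window_sum += scores[start + k - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, seq[best_start:best_start + k], best_sum / k


def position_frequency_matrix(sites: List[str]) -> List[Dict[str, float]]:
    """Per-column base frequencies of equal-length candidate sites (a simple motif)."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need at least one site, all of the same length")
    pfm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pfm.append({base: column.count(base) / len(sites) for base in "ACGT"})
    return pfm


# Toy per-base probabilities peaking over positions 3..6.
start, site, mean_score = best_window(
    "ACGTGACGTCAT", [0.1, 0.1, 0.2, 0.9, 0.95, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1], k=4
)
```

Using a running sum keeps the scan linear in sequence length; a real pipeline would also need a score threshold and a way to handle sequences with several binding sites.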
35. Editorial: Insights in computational genomics: 2022.
- Author
Emes, Richard D., Pirooznia, Mehdi, Zou, Quan, and Pellegrini, Marco
- Subjects
GENOMICS, DEEP learning
- Published
- 2023
36. Deep learning based method for predicting DNA N6-methyladenosine sites.
- Author
Han, Ke, Wang, Jianchun, Chu, Ying, Liao, Qian, Ding, Yijie, Zheng, Dequan, Wan, Jie, Guo, Xiaoyi, and Zou, Quan
- Subjects
*CONVOLUTIONAL neural networks, *DATABASES, *MACHINE learning, *MULTISCALE modeling, *ADENOSINES, *DEEP learning
- Abstract
• Multi-scale convolutional layers help identify hidden dependencies between multiple sequences, capture local patterns in the input sequences more flexibly, and extract location-specific features at different levels.
• Because global response normalization performs global feature aggregation, it helps the model extract more accurate features and fully express the key information of the 6mA site.
• The prediction results are better than those of other models, and a vector of contribution scores is created that clearly explains the prediction mechanism.
DNA N6-methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps one understand its biological effects more comprehensively. Traditional experimental methods are very labor-intensive, and traditional machine learning methods become insufficient as 6mA methylation databases grow progressively larger. We therefore propose a deep learning-based method, a multi-scale convolutional model based on global response normalization (CG6mA), to predict 6mA sites. The method is compared with other methods on three different kinds of benchmark datasets, and the results show that our model achieves better prediction results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
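Global response normalization, as introduced in ConvNeXt V2, aggregates each channel with an L2 norm over all positions, divides by the mean of those norms across channels, and rescales the input with learnable `gamma` and `beta` plus a residual term. The sketch below applies that formulation to a sequence feature map of shape positions × channels; it assumes CG6mA uses the same formulation, which the abstract does not confirm, and `global_response_norm` is a hypothetical name.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # x[position][channel]


def global_response_norm(
    x: FeatureMap, gamma: List[float], beta: List[float], eps: float = 1e-6
) -> FeatureMap:
    """GRN over a 1D sequence feature map (ConvNeXt V2-style formulation, assumed)."""
    length, channels = len(x), len(x[0])
    # 1) Global aggregation: L2 norm of each channel across all sequence positions.
    g = [math.sqrt(sum(x[i][c] ** 2 for i in range(length))) for c in range(channels)]
    # 2) Divisive normalization: each channel's norm relative to the channel mean.
    mean_g = sum(g) / channels
    n = [gc / (mean_g + eps) for gc in g]
    # 3) Calibration with learnable gamma/beta, plus a residual connection to x.
    return [
        [gamma[c] * (x[i][c] * n[c]) + beta[c] + x[i][c] for c in range(channels)]
        for i in range(length)
    ]


features = [[3.0, 0.0], [4.0, 1.0]]  # 2 positions x 2 channels
calibrated = global_response_norm(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Channels whose responses are strong across the whole sequence get amplified relative to weak ones; with `gamma` and `beta` initialized to zero the layer starts as an identity map.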
37. Generative adversarial network with the discriminator using measurements as an auxiliary input for single-pixel imaging.
- Author
Dai, Qianling, Yan, Qiurong, Zou, Quan, Li, Yi, and Yan, Jinwei
- Subjects
*GENERATIVE adversarial networks, *PIXELS, *DEEP learning, *IMAGE compression, *COMPRESSED sensing, *IMAGE converters
- Abstract
Single-pixel imaging (SPI) realizes two-dimensional imaging with a single-pixel detector that has no spatial resolution, and it has wide application prospects in many fields because of its high sensitivity and low cost. Deep learning-based compressive reconstruction algorithms can improve the quality of reconstructed images. Generative adversarial networks (GANs), which perform excellently at image generation, are also increasingly used in compressed sensing. However, the prior information carried by compressively sensed measurements has not been fully utilized. This paper therefore proposes MAID-GAN and MAID-GAN+, generative adversarial networks whose discriminator uses the measurements as an auxiliary input. The discriminator takes both the image and its corresponding measurements as inputs, and a Y-shaped network structure fuses the feature maps of the image domain and the measurement domain, which better guides the generator toward images close to the original and improves the quality of the generated images. Subpixel convolution sampling is used to extract image features, and the sampling network and the reconstruction network are optimized jointly. Simulation and experimental results show that the proposed networks have clear advantages in reconstruction at low sampling rates.
• The optimized sampling masks can improve sampling efficiency.
• Using measurements as an auxiliary input to the discriminator can guide the generator to produce images with more detail.
• Adding global features to the generator can improve the quality of reconstructed images.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
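In the standard single-pixel imaging model, each measurement is the total intensity the bucket detector records through one modulation mask, that is, the inner product of the mask with the scene; the number of masks divided by the number of pixels is the sampling rate. This is the kind of measurement vector MAID-GAN passes to its discriminator. The sketch assumes random binary masks purely for illustration (the paper instead optimizes its sampling masks jointly with reconstruction), and the function names are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def bucket_measurements(image: Image, masks: List[Mask]) -> List[float]:
    """y_m = sum_ij mask_m[i][j] * image[i][j]: one detector reading per mask."""
    h, w = len(image), len(image[0])
    return [sum(mask[i][j] * image[i][j] for i in range(h) for j in range(w)) for mask in masks]


def random_binary_masks(num_masks: int, h: int, w: int, seed: int = 0) -> List[Mask]:
    """Illustrative random 0/1 masks (an assumption, not the learned masks)."""
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(w)] for _ in range(h)] for _ in range(num_masks)]


def sampling_rate(num_masks: int, h: int, w: int) -> float:
    """Fraction of measurements relative to pixel count (M / N)."""
    return num_masks / (h * w)


scene = [[float(8 * i + j) for j in range(8)] for i in range(8)]
masks = random_binary_masks(num_masks=6, h=8, w=8)  # 6 / 64, roughly a 9.4% sampling rate
y = bucket_measurements(scene, masks)
```

A reconstruction network then has to recover 64 pixel values from only 6 numbers, which is why a learned prior such as a GAN generator helps at low sampling rates.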
38. Deep learning methods for bioinformatics and biomedicine.
- Author
Wang, Yansu, Xu, Lei, and Zou, Quan
- Subjects
*BIOINFORMATICS, *DEEP learning
- Published
- 2023
39. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism.
- Author
Wu, Hongjie, Liu, Junkai, Jiang, Tengsheng, Zou, Quan, Qi, Shujie, Cui, Zhiming, Tiwari, Prayag, and Ding, Yijie
- Subjects
*TRANSFORMER models, *DRUG discovery, *DEEP learning, *DRUG design, *MOLECULAR graphs, *FORECASTING
- Abstract
Accurate prediction of drug-target affinity (DTA) is a crucial step in drug discovery and design. Traditional experiments are very expensive and time-consuming. Recently, deep learning methods have achieved notable performance improvements in DTA prediction. However, one challenge for deep learning-based models is obtaining appropriate and accurate representations of drugs and targets; in particular, target representations have not been explored effectively. Another challenge, also important for predicting DTA, is how to comprehensively capture the interaction information between different instances. In this study, we propose AttentionMGT-DTA, a multi-modal attention-based model for DTA prediction. AttentionMGT-DTA represents drugs by molecular graphs and targets by binding pocket graphs. Two attention mechanisms are adopted to integrate information across different protein modalities and to model the interactions within drug-target pairs. The experimental results show that our proposed model outperforms state-of-the-art baselines on two benchmark datasets. In addition, AttentionMGT-DTA offers high interpretability by modeling the interaction strength between drug atoms and protein residues. Our code is available at https://github.com/JK-Liu7/AttentionMGT-DTA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
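The atom-residue interaction strengths described above can be illustrated with generic scaled dot-product cross-attention: each drug atom embedding acts as a query over protein residue embeddings, and the softmax weights form an atoms × residues matrix that can be inspected. This is a plain sketch of standard attention with hypothetical names and toy two-dimensional embeddings, not the AttentionMGT-DTA architecture; the authors' repository holds their actual implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention; returns (outputs, weights[query][key])."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


atoms = [[1.0, 0.0], [0.0, 1.0]]                  # toy drug-atom embeddings
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy protein-residue embeddings
attended, strength = cross_attention(atoms, residues, residues)
```

Each row of `strength` sums to 1, so it reads as how strongly one atom attends to each residue; that row-wise normalization is what makes the matrix easy to visualize.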
40. Multi-correntropy fusion based fuzzy system for predicting DNA N4-methylcytosine sites.
- Author
Ding, Yijie, Tiwari, Prayag, Guo, Fei, and Zou, Quan
- Subjects
*FUZZY systems, *DEEP learning, *STATISTICAL learning, *FUZZY logic, *FEATURE selection, *DNA
- Abstract
The identification of DNA N4-methylcytosine (4mC) sites is an important field of bioinformatics. Statistical learning and deep learning methods have been applied in this direction. Previous methods focused on feature representation and feature selection and did not take into account the deviation caused by noisy samples during recognition. Moreover, these models were not built from the perspective of the prediction error distribution. To handle complex error distributions, we propose a maximum multi-correntropy criterion based kernelized higher-order fuzzy inference system (MMC-KHFIS), constructed with multi-correntropy fusion. Six 4mC datasets and eight UCI datasets are employed to evaluate our model, and MMC-KHFIS achieves better performance in the experiments.
• For complex error distributions, a multi-correntropy based fuzzy system is built.
• A fuzzy kernel is built to handle the feature space projection in each fuzzy subset.
• An efficient iterative algorithm is employed to optimize the fuzzy system.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
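Correntropy between predictions and targets is commonly estimated as the mean of a Gaussian kernel applied to the errors, so a single huge error can lower it by at most 1/N, whereas the same error can dominate the mean squared error. The sketch below shows that estimator and a multi-correntropy value formed as a weighted sum over several kernel bandwidths; the weighted-sum fusion rule, the bandwidths, and the function names are assumptions for illustration, not the MMC-KHFIS formulation.

```python
import math
from typing import Sequence


def correntropy(errors: Sequence[float], sigma: float) -> float:
    """Sample correntropy: mean unnormalized Gaussian kernel of the errors (max 1)."""
    return sum(math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in errors) / len(errors)


def multi_correntropy(errors: Sequence[float], sigmas: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Hypothetical fusion: weighted sum of correntropies at several bandwidths."""
    if len(sigmas) != len(weights):
        raise ValueError("sigmas and weights must have the same length")
    return sum(w * correntropy(errors, s) for s, w in zip(sigmas, weights))


def mse(errors: Sequence[float]) -> float:
    """Mean squared error, for comparison."""
    return sum(e * e for e in errors) / len(errors)


clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = [0.1, -0.2, 0.05, 100.0]  # one grossly wrong (noisy) sample

print(mse(clean), mse(with_outlier))
print(correntropy(clean, 1.0), correntropy(with_outlier, 1.0))
print(multi_correntropy(with_outlier, sigmas=[0.5, 2.0], weights=[0.3, 0.7]))
```

Maximizing correntropy rather than minimizing squared error is what makes such a criterion tolerant of noisy samples; mixing bandwidths lets the criterion be sensitive to both small and moderate errors.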
Discovery Service for Jio Institute Digital Library