Author: "Zou, Quan" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zou, Quan"' showing total 139 results

1. Editorial: Artificial intelligence in drug discovery and development.

Author: Wei, Leyi, Zou, Quan, and Zeng, Xiangxiang
Subjects: *DRUG discovery, *ARTIFICIAL intelligence, *DRUG development
Published: 2024

2. Deciphering Microbial Adaptation in the Rhizosphere: Insights into Niche Preference, Functional Profiles, and Cross-Kingdom Co-occurrences.

Author: Wang, Yansu and Zou, Quan
Subjects: *RHIZOSPHERE, *BIOMACROMOLECULES, *MICROBIAL communities, *BACTERIAL communities, *RHIZOBACTERIA, *PLANT growth, *PHYSIOLOGICAL adaptation
Abstract: Rhizosphere microbial communities are regarded as critical factors for plant growth and vitality, and their adaptive differentiation strategies have received increasing attention but are poorly understood. In this study, we obtained bacterial and fungal amplicon sequences from the rhizosphere and bulk soils of various ecosystems to investigate the potential mechanisms of microbial adaptation to the rhizosphere environment. Our focus encompasses three aspects: niche preference, functional profiles, and cross-kingdom co-occurrence patterns. Our findings revealed a correlation between niche similarity and nucleotide distance, suggesting that niche adaptation explains nucleotide variation among some closely related amplicon sequence variants (ASVs). Furthermore, biological macromolecule metabolism and communication among abundant bacteria increase under rhizosphere conditions, suggesting that bacterial function is trait-mediated in terms of fitness in new habitats. Additionally, our analysis of cross-kingdom networks revealed that fungi act as intermediaries that facilitate connections between bacteria, indicating that microbes can modify their cooperative relationships to adapt. Overall, the evidence for rhizosphere microbial community adaptation, via differences in genetic, functional, and co-occurrence patterns, elucidates the adaptive benefits of genetic and functional flexibility of the rhizosphere microbiota through niche shifts. [ABSTRACT FROM AUTHOR]
Published: 2024

3. Effects of crystalline lens rise and anterior chamber parameters on vault after implantable collamer lens placement.

Author: Zou, Quan, Zhao, Sen, Cheng, Lei, Song, Chao, Yuan, Ping, and Zhu, Ran
Subjects: *CRYSTALLINE lens, *BLAND-Altman plot, *ACOUSTIC microscopy, *MULTIPLE regression analysis, *RECEIVER operating characteristic curves, *FACTOR analysis
Abstract: Background: To analyze vault effects of crystalline lens rise (CLR) and anterior chamber parameters (recorded by Pentacam) in highly myopic patients receiving implantable collamer lenses (ICLs), which may avoid subsequent complications such as glaucoma and cataract caused by the abnormal vault. Methods: We collected clinical data of 137 patients with highly myopic vision, who were all subsequent recipients of V4c ICLs between June 2020 and January 2021. Horizontal ciliary sulcus-to-sulcus diameter (hSTS) and CLR were measured by ultrasonic biomicroscopy (UBM), and a Pentacam anterior segment analyzer was used to measure horizontal white-to-white diameter (hWTW), anterior chamber depth (ACD), anterior chamber angle (ACA), anterior chamber volume (ACV), CLR, and postoperative vault (Year 1 and Month 1). The lens thickness (LT) was determined by optical biometry (IOL Master instrument). The predictive model was generated through multiple linear regression analyses of influential factors, such as hSTS, CLR, hWTW, ACD, ACA, ACV, ICL size, and LT. The predictive performance of the multivariate model on vault after ICL was assessed using the receiver operating characteristic (ROC) curve with area under the curve (AUC) as well as the point of tangency. Results: Average CLR assessed by UBM was lower than the average value obtained by Pentacam (0.561 vs. 0.683). Bland-Altman analysis showed a good consistency in the two measurement methods and substantial correlation (r = 0.316; P = 0.000). The ROC curve of Model 1 (postoperative Year 1) displayed an AUC of 0.847 (95% confidence interval [CI]: 74.19–95.27), with optimal threshold of 0.581 (sensitivity, 0.857; specificity, 0.724). In addition, respective values for Model 2 (postoperative Month 1) were 0.783 (95% CI: 64.94–91.64) and 0.522 (sensitivity, 0.917; specificity, 0.605). Conclusion: CLR and anterior chamber parameters are important determinants of postoperative vault after ICL placement. 
The multivariate regression model we constructed may serve in large part as a predictive gauge, effectively avoiding postoperative complications. [ABSTRACT FROM AUTHOR]
Published: 2024

4. A new fixed-time terminal sliding mode control for second-order nonlinear systems.

Author: Zou, Quan and Chang, Shuaichuan
Subjects: *SLIDING mode control, *NONLINEAR systems
Abstract: In this paper, the fixed-time control for a class of second-order nonlinear systems with matched disturbance is addressed by using the terminal sliding mode control technique. To improve the control performance of the traditional fixed-time control method, a new fixed-time stability theorem is constructed by introducing a simple linear term, and theoretical analysis shows that the proposed control method provides a faster convergence speed and a more accurate estimate of the upper bound of the settling time. Moreover, practical fixed-time stability is also discussed in detail. Based on the proposed fixed-time stability theorem, a fixed-time terminal sliding mode controller for a class of second-order nonlinear systems is designed to obtain a bounded settling time independent of the initial conditions of the system. Simulations are carried out to verify the feasibility of the proposed control method. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Optimization of drug–target affinity prediction methods through feature processing schemes.

Author: Ru, Xiaoqing, Zou, Quan, and Lin, Chen
Subjects: *PREDICTION models, *FORECASTING
Abstract: Motivation: Numerous high-accuracy drug–target affinity (DTA) prediction models, whose performance is heavily reliant on the drug and target feature information, are developed at the expense of complexity and interpretability. Feature extraction and optimization constitute a critical step that significantly influences the enhancement of model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) the feature sets often suffer from high dimensionality. Results: In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into learning to rank to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by utilizing Shapley Additive Explanations values and the incremental feature selection approach, we find that the high-quality feature subset consists of the top 150D features and that the top 20D features have a breakthrough impact on the DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance and highly interpretable models. Availability and implementation: https://github.com/RUXIAOQING964914140/FS_DTA. [ABSTRACT FROM AUTHOR]
Published: 2023

6. The evolution of individual and collective rights in the Chinese workplace.

Author: Zou, Quan
Subjects: *GROUP rights, *LABOR laws, *LABOR contracts, *CONTRACT employment, *CIVIL rights
Abstract: Due to the underdeveloped nature of organized labor, it is possible to view the 'individual' and 'collective' components of labor legislation in China as separate and severable. This article aims to challenge such thinking by arguing that collective labor law and collective bargaining practices in China have profoundly shaped the law of employment contracts and individual employment relations. To this end, analyzing the laws surrounding individual employment contracts should not proceed without considering collective labor law. This article investigates, in the first three decades following the establishment of the People's Republic of China in 1949, the significance of collective rights to the underdevelopment of legal rules of employment rights and the emergence of the socialist social contract. This article also examines, after the economic reform of 1978, the various ways collective bargaining contributed to the transformation from the socialist social contract to the standard contract of employment and from an underdeveloped to a comprehensive framework of employment legislation. Finally, in the post-economic-reform decades, the analysis suggests that collective bargaining encourages the empowerment of trade unions with legislative and administrative efforts and facilitates the incorporation of terms and conditions improvement into individual employment contracts. [ABSTRACT FROM AUTHOR]
Published: 2023

7. Special Protein or RNA Molecules Computational Identification.

Author: Qi, Ren and Zou, Quan
Subjects: *CIRCULAR RNA, *INTERNET servers, *DEEP learning, *MOLECULES, *CONVOLUTIONAL neural networks, *PROTEINS, *RNA, *PROTEOMICS
Abstract: Furthermore, in terms of protein identification, Xu et al. concentrated on the study of antioxidant protein identification; they proposed a machine learning method, SeqSVM, to predict antioxidant proteins through extracted sequence features [[10]]. The identification of special protein or RNA molecules via computational methods is of great importance in understanding their biological functions and developing new treatments for diseases. Seven papers focus on describing protein function prediction or protein identification, which include the prediction of signal peptides in proteins, protein hydroxylation site prediction, protein-protein interaction (PPI) prediction, and protein identification. [Extracted from the article]
Published: 2023

8. Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE.

Author: Wang, Chao and Zou, Quan
Abstract: Background: Protein solubility is a precondition for efficient heterologous protein expression at the basis of most industrial applications and for functional interpretation in basic research. However, recurrent formation of inclusion bodies is still an inevitable roadblock in protein science and industry, where only nearly a quarter of proteins can be successfully expressed in soluble form. Despite numerous solubility prediction models having been developed over time, their performance remains unsatisfactory in the context of the current strong increase in available protein sequences. Hence, it is imperative to develop novel and highly accurate predictors that enable the prioritization of highly soluble proteins to reduce the cost of actual experimental work. Results: In this study, we developed a novel tool, DeepSoluE, which predicts protein solubility using a long-short-term memory (LSTM) network with hybrid features composed of physicochemical patterns and distributed representation of amino acids. Comparison results showed that the proposed model achieved more accurate and balanced performance than existing tools. Furthermore, we explored specific features that have a dominant impact on the model performance as well as their interaction effects. Conclusions: DeepSoluE is suitable for the prediction of protein solubility in E. coli; it serves as a bioinformatics tool for prescreening of potentially soluble targets to reduce the cost of wet-experimental studies. The publicly available webserver is freely accessible at . [ABSTRACT FROM AUTHOR]
Published: 2023

9. Investigation of the Splashing Characteristics of Lead Slag in Side-Blown Bath Melting Process.

Author: Zou, Quan, Hu, Jianhang, Yang, Shiliang, Wang, Hua, and Deng, Ge
Subjects: *SMELTING furnaces, *SLAG, *WATER immersion, *KINETIC energy, *MELTING, *TWO-phase flow
Abstract: To study the melt splashing behavior in the smelting process of an oxygen-enriched side-blowing furnace, the volume of fluid model and the realizable k − ε turbulence model are coupled and simulated. The effects of different operating parameters (injection velocity, immersion depth, liquid level) on splash height are explored, and the simulation results are verified by water model experiments. The results show that bubbles with residual kinetic energy escape to the slag surface and cause slag splashing. The slag splashing height gradually increases with the increase in injection velocity, and the time-averaged splashing height reaches 1.01 m when the injection velocity is 160 m/s. As the immersion depth of the lance increases, the slag splashing height gradually decreases; when the immersion depth is 0.12 m, the time-averaged splashing height is 0.85 m. Increasing the liquid level also reduces the splash height: the slag splashing height gradually decreases as the liquid level rises, and the time-averaged splashing height is 0.77 m when the initial liquid level is 2.7 m. [ABSTRACT FROM AUTHOR]
Published: 2023

10. Observer‐based sliding mode control for permanent magnet synchronous motor speed regulation system with a novel reaching law.

Author: Zou, Quan, Wei, Kai, and Zhou, Guangzu
Subjects: *PERMANENT magnet motors, *SLIDING mode control, *SPEED limits, *ROBUST control, *TORQUE control
Abstract: In this paper, a novel reaching law‐based sliding mode controller is proposed for a permanent magnet synchronous motor (PMSM) speed regulation system with uncertainties and unknown load torque. The proposed reaching law improves on the traditional power rate reaching law (PRL) by using a simple tuning function of the sliding variable. The tuning function is designed such that the reaching speed is fast when the system states are far away from the sliding surface, and vice versa. Theoretical analysis shows that the reaching time of the proposed reaching law is always shorter than that of the traditional PRL with the same gains. Moreover, unlike the traditional PRL and some existing auto‐tuning PRLs, the proposed reaching law can provide a globally bounded reaching time independent of the initial conditions, and the reaching time can be effectively reduced by tuning the reaching law gains. Based on this novel reaching law, a disturbance observer is designed to estimate the total disturbance, and then based on the estimated disturbance and the novel reaching law, a sliding mode speed controller is designed for the robust control of the PMSM speed regulation system. Simulations and experiments are carried out to demonstrate the superiority of the proposed control method. [ABSTRACT FROM AUTHOR]
Published: 2022

11. WMSA: a novel method for multiple sequence alignment of DNA sequences.

Author: Wei, Yanming, Zou, Quan, Tang, Furong, and Yu, Liang
Subjects: *SEQUENCE alignment, *INTERNET servers, *DNA sequencing, *FAST Fourier transforms, *SEPARATION of variables, *SOURCE code
Abstract: Motivation: Multiple sequence alignment (MSA) is a fundamental problem in bioinformatics. The quality of the alignment affects downstream analysis. MAFFT adopts the Fast Fourier Transform method to search for homologous segments and uses them as anchors to divide the sequences, then aligns only the segments, which saves time and memory without overly reducing the sequence alignment quality. However, MAFFT becomes slow when the dataset is large. Results: We developed WMSA, software that uses the divide-and-conquer method to split the sequences into clusters, aligns those clusters into profiles with the center star strategy and then makes a progressive profile–profile alignment. The alignment is conducted by the compiled algorithms of MAFFT, K-Band with multithread parallelism. Our method can balance time, space and quality and performs better than MAFFT in test experiments on highly conserved datasets. Availability and implementation: Source code is freely available at https://github.com/malabz/WMSA/, which is implemented in C/C++ and supported on Linux, and datasets are available at https://github.com/malabz/WMSA-dataset. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
Published: 2022

12. Alloying at a Subnanoscale Maximizes the Synergistic Effect on the Electrocatalytic Hydrogen Evolution.

Author: Zou, Quan, Akada, Yuji, Kuzume, Akiyoshi, Yoshida, Masataka, Imaoka, Takane, and Yamamoto, Kimihisa
Subjects: *SCANNING transmission electron microscopy, *X-ray photoelectron spectroscopy, *HYDROGEN evolution reactions, *ELECTRONIC modulation, *METAL catalysts
Abstract: Bonding dissimilar elements to provide synergistic effects is an effective way to improve the performance of metal catalysts. However, as the properties become more dissimilar, achieving synergistic effects effectively becomes more difficult due to phase separation. Here we describe a comprehensive study on how subnanoscale alloying is always effective for inter‐elemental synergy. Thirty‐six combinations of both bimetallic subnanoparticles (SNPs) and nanoparticles (NPs) were studied systematically using atomic‐resolution imaging and catalyst benchmarking based on the hydrogen evolution reaction (HER). Results revealed that SNPs always produce greater synergistic effects than NPs; the greatest synergistic effect was found for the combination of Pt and Zr. The atomic‐scale miscibility and the associated modulation of electronic states at the subnanoscale were very different from those at the nanoscale, as observed by annular‐dark‐field scanning transmission electron microscopy (ADF‐STEM) and X‐ray photoelectron spectroscopy (XPS), respectively. [ABSTRACT FROM AUTHOR]
Published: 2022

14. RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features.

Author: Ao, Chunyan, Zou, Quan, and Yu, Liang
Subjects: *RANDOM forest algorithms, *RNA modification & restriction, *TRANSFER RNA, *FEATURE selection, *PREDICTION models
Abstract: • A novel method was proposed to identify RNA m2G sites using hybrid features. • The over-sampling method SMOTE was adopted to deal with the problem of data imbalance. • After using MRMD to select features, the performance of the model is improved. • RFhy-m2G is superior to other methods and can effectively identify m2G sites. N2-methylguanosine is a post-transcriptional modification of RNA that is found in eukaryotes and archaea. The biological function of m2G modification discovered so far is to control and stabilize the three-dimensional structure of tRNA and the dynamic barrier of reverse transcription. To discover additional biological functions of m2G, it is necessary to develop time-saving and labor-saving computational tools to identify m2G. In this paper, based on hybrid features and a random forest, a novel predictor, RFhy-m2G, was developed to identify the m2G modification sites for three species. The hybrid feature used by the predictor fuses three features: ENAC, PseDNC, and NPPS. These three features include primary sequence derivation properties, physicochemical properties, and position-specific properties. Since there are redundant features in the hybrid features, MRMD2.0 is used for optimal feature selection. Through feature analysis, it is found that the optimal hybrid features obtained still contain three kinds of properties, and the hybrid features can more accurately identify m2G modification sites and improve prediction performance. Based on five-fold cross-validation and independent testing to evaluate the prediction model, the accuracies obtained were 0.9982 and 0.9417, respectively. The robustness of the predictor is demonstrated by comparisons with other predictors. [ABSTRACT FROM AUTHOR]
Published: 2022

15. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks.

Author: Akbar, Shahid, Zou, Quan, Raza, Ali, and Alarfaj, Fawaz Khaled
Abstract: Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur due to the invading fungus appearing on a specific portion of the body and becoming hard for the human immune system to resist. The recent emergence of COVID-19 has sharply increased different nosocomial fungal infections. The existing wet-laboratory-based medications are expensive, time-consuming, and may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to predict antifungal peptides correctly. The training peptides are encoded using word embedding methods such as skip-gram and attention mechanism-based bidirectional encoder representation using transformer. Additionally, transform-based evolutionary features are generated using the Pseudo position-specific scoring matrix using discrete wavelet transform (PsePSSM-DWT). The fused vector of word embedding and evolutionary descriptors is formed to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP) based global interpolation approach is applied to reduce training costs by choosing the optimal feature set. The selected feature set is trained using a bi-directional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15 % and an AUC of 0.99 using training samples. In the case of the independent samples, our model obtained an accuracy of 94.11 % and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with a ~4 % and ~5 % higher accuracy using training and independent samples, respectively. 
The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists, and it may play a beneficial role in pharmaceutical design and academic research. • A Bidirectional Temporal Convolutional Networks-based computational model is developed for the prediction of antifungal peptides. • A transform evolutionary matrix, self-attention based transformer, and fasttext-based word embedding are employed to numerically represent the peptide samples. • The SHAP interpolation-based feature selection is applied to select optimal features from the hybrid vector. • The proposed iAFPs-Mv-BiTCN model achieved higher predictive results on training and independent datasets than existing computational models. [ABSTRACT FROM AUTHOR]
Published: 2024

16. GPU-DEM-based heat transfer model for an HTGR pebble bed.

Author: Zou, Quan, Gui, Nan, Yang, Xingtuan, Tu, Jiyuan, and Jiang, Shengyao
Subjects: *THERMAL conductivity, *HEAT transfer, *HEAT radiation & absorption, *PEBBLES, *HEAT conduction, *DISCRETE element method
Abstract: Based on the discrete element method (DEM) and GPU parallel computing, a particle heat transfer model is developed to simulate the heat transfer in a pebble bed of the high-temperature gas-cooled reactor (HTGR). The model is implemented based on a previously developed GPU-DEM program by our team and uses the mesh-based neighbor searching algorithm for the heat transfer calculation. This model couples the conduction and radiative heat transfer between the pebbles and incorporates neural networks and empirical fittings to calculate the radiation view factors, which can improve computational efficiency. The effective thermal conductivity of different models and experimental data are used to verify the accuracy of the model, and the influence of different radiation heat transfer models on the results is also compared. The results show that the effective thermal conductivity derived from the current model is comparable to the classical models at different temperatures, and the numerical simulation results based on the current model are in good agreement with the corresponding experimental data. Additionally, the model achieves a single-core speedup ratio of 126–395 times with GPU acceleration, significantly enhancing computational efficiency. In conclusion, the current model has been effectively verified for accuracy and computational efficiency, and it demonstrates great potential in dealing with large-scale pebble flow and heat transfer challenges in HTGRs. • A Voronoi-tessellation-free new heat transfer model is proposed for pebble beds. • View factors are calculated by neural networks to couple conduction and radiation. • GPU parallel computing is employed at a speedup ratio of 126–395 times of CPU. • An alternate-read-write method and unified Memory Access technology are used. • The model accuracy is validated and discussed by comparing it with experiments. [ABSTRACT FROM AUTHOR]
Published: 2024

17. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks.

Author: Niu, Mengting, Zou, Quan, and Wang, Chunyu
Subjects: *CIRCULAR RNA, *TRIGONOMETRIC functions, *CHARACTERISTIC functions, *SOURCE code, *DEEP learning, *THERAPEUTICS, *NANOBIOTECHNOLOGY
Abstract: Motivation: Analysis of the characteristics and functions of circular RNAs (circRNAs) has shown that they play a critical role in diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for investigating the etiopathogenesis and treatment of diseases. Nevertheless, it is inefficient to learn new associations only through biotechnology. Results: Consequently, we present a computational method, GMNN2CD, which employs a graph Markov neural network (GMNN) algorithm to predict unknown circRNA–disease associations. First, using verified associations, we calculate semantic similarity and Gaussian interactive profile kernel similarity (GIPs) of the disease and the GIPs of circRNA and then merge them to form a unified descriptor. After that, GMNN2CD uses a fusion feature variational map autoencoder to learn deep features and uses a label propagation map autoencoder to propagate tags based on known associations. Based on variational inference, GMNN alternate training enhances the ability of GMNN2CD to obtain high-efficiency high-dimensional features from low-dimensional representations. Finally, 5-fold cross-validation of five benchmark datasets shows that GMNN2CD is superior to the state-of-the-art methods. Furthermore, case studies have shown that GMNN2CD can detect potential associations. Availability and implementation: The source code and data are available at https://github.com/nmt315320/GMNN2CD.git. [ABSTRACT FROM AUTHOR]
Published: 2022

18. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach.

Author: Niu, Mengting, Zou, Quan, and Lin, Chen
Subjects: *CIRCULAR RNA, *DEEP learning, *NUCLEOTIDE sequence, *RNA-binding proteins, *CARRIER proteins, *BINDING sites, *NON-coding RNA
Abstract: Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBP) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to comprehending the mechanism of posttranscriptional regulation. Accurately identifying binding sites is very useful for analyzing interactions. In past research, some predictors based on machine learning (ML) have been presented, but prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost integrated deep hierarchical network to identify the binding sites of circular RNA-RBP. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gating recurrent units (BiGRUs) to effectively learn high-level feature representations, which is sufficient to extract local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Ultimately, the Adaboost algorithm is applied to integrate the deep learning (DL) models to improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. Moreover, the results show that CRBPDL is universal, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git. Author summary: More and more evidence shows that circular RNA can directly bind to proteins and participate in countless different biological processes. Computational methods can quickly and accurately predict the binding sites of circular RNA and RBP. In order to identify the interaction of circRNA with 37 different types of circRNA binding proteins, we developed an integrated deep learning network based on a hierarchical network, called CRBPDL. It can effectively learn high-level feature representations. The performance of the model was verified through comparative experiments of different feature extraction algorithms, different deep learning models and classifier models. Moreover, the CRBPDL model was applied to 31 linear RNAs, and the effectiveness of our method was demonstrated by comparison with the results of current leading algorithms. It is expected that the CRBPDL model can effectively predict the binding sites of circular RNA-RBP and provide reliable candidates for further biological experiments. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

19. A novel fast multiple nucleotide sequence alignment method based on FM-index.

Author: Liu, Huan, Zou, Quan, and Xu, Yun
Subjects: *NUCLEOTIDE sequence, *SEQUENCE alignment, *HUMAN genome, *SOURCE code
Abstract: Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms have difficulty handling large-scale multiple sequences, especially long sequences. Therefore, some recent aligners adopt an efficient divide-and-conquer strategy to divide long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) for division of sequences is critical, as it directly affects the accuracy and time cost. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use FM-index to extract long common segments at a low cost rather than using a space-consuming hash table. Moreover, after finding the longer optimal common segments, the sequences are divided by the longer common segments. FMAlign has been tested on virus and bacteria genome and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in terms of running time, and has a high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequences in terms of sequence length and sequence number. The source code and related data are accessible at https://github.com/iliuh/FMAlign. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

20. NmRF: identification of multispecies RNA 2'-O-methylation modification sites from RNA sequences.

Author: Ao, Chunyan, Zou, Quan, and Yu, Liang
Subjects: *RNA modification & restriction, *TRANSFER RNA, *FEATURE selection, *RANDOM forest algorithms, *METHYL groups, *MACHINE learning, *BOOSTING algorithms
Abstract: 2'-O-methylation (Nm) is a post-transcriptional modification of RNA that is catalyzed by 2'-O-methyltransferase and involves replacing the H on the 2'-hydroxyl group with a methyl group. The 2'-O-methylation modification site is detected in a variety of RNA types (miRNA, tRNA, mRNA, etc.), plays an important role in biological processes and is associated with different diseases. There are few functional mechanisms developed at present, and traditional high-throughput experiments are time-consuming and expensive to explore functional mechanisms. For a deeper understanding of relevant biological mechanisms, it is necessary to develop efficient and accurate recognition tools based on machine learning. Based on this, we constructed a predictor called NmRF based on optimal mixed features and a random forest classifier to identify 2'-O-methylation modification sites. The predictor can identify modification sites of multiple species at the same time. To obtain a better prediction model, a two-step strategy is adopted; that is, the optimal hybrid feature set is obtained by combining the light gradient boosting algorithm and an incremental feature selection strategy. In 10-fold cross-validation, the accuracies for Homo sapiens and Saccharomyces cerevisiae were 89.069% and 93.885%, and the AUCs were 0.9498 and 0.9832, respectively. The rigorous 10-fold cross-validation and independent tests confirm that the proposed method is significantly better than existing tools. A user-friendly web server is accessible at http://lab.malab.cn/~acy/NmRF. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

21. A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data.

Author: Wang, Jiacheng, Zou, Quan, and Lin, Chen
Subjects: *DEEP learning, *RNA sequencing, *RNA analysis, *DATA reduction, *TASK analysis, *QUALITY control, *DIMENSION reduction (Statistics)
Abstract: The emergence of single-cell RNA sequencing has facilitated the study of genomes, transcriptomes and proteomes. As available single-cell RNA-seq datasets are released continuously, one of the major challenges facing traditional RNA analysis tools is the high-dimensional, high-sparsity, high-noise and large-scale characteristics of single-cell RNA-seq data. Deep learning technologies match the characteristics of single-cell RNA-seq data perfectly and offer unprecedented promise. Here, we give a systematic review of the most popular single-cell RNA-seq analysis methods and tools based on deep learning models, involving the procedures of data preprocessing (quality control, normalization, data correction, dimensionality reduction and data visualization) and the clustering task for downstream analysis. We further evaluate the deep model-based analysis methods of data correction and clustering quantitatively on 11 gold standard datasets. Moreover, we discuss the data preferences of these methods and their limitations, and give some suggestions and guidance for users to select appropriate methods and tools. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

22. Machine-learning-assisted sensor array for detecting COVID-19 through simulated exhaled air.

Author: Zou, Quan, Itoh, Toshio, Shin, Woosuck, and Sawano, Makoto
Subjects: *SENSOR arrays, *REVERSE transcriptase polymerase chain reaction, *HUMAN fingerprints, *ELECTRONIC noses
Abstract: Reverse transcription polymerase chain reaction (RT-PCR), the primary test for COVID-19, requires complicated sample collection and several hours to obtain results. Breath test for exhaled volatile organic compounds (VOCs) has gained substantial research attention as a simple non-invasive and fast screening method. However, a unique VOC fingerprint as a potential prognostic biomarker is still unavailable. Accordingly, this study prepared simulated VOC gases to test the classification performance of a sensing system. The simulated VOC gases were selected according to the actual composition of the exhaled breath of patients with acute respiratory diseases, including COVID-19. Two sets of metal oxide sensor arrays, comprising eight commercial sensors and eight Advanced Institute of Science and Technology (AIST) laboratory-made sensors, were prepared to test their sensing ability for the simulated gases. The principal component analysis (PCA) results revealed that the AIST sensors had better sensing ability than the commercial sensors. Moreover, the recursive feature elimination cross-validation (RFECV) of random forest (RF) further confirmed the superiority of the AIST sensors over the commercial sensors. An artificial neural network (ANN) with excellent prediction performance for gas concentration was developed. This study provides a promising method for rapidly screening respiratory diseases. • Development of sensor arrays for breath analysis for screening COVID-19. • Selecting nine target VOCs based on reports from exhaled breaths of patients. • Testing two sensor arrays with machine learning to discriminate VOCs. • Important to combine various sensor-response principles of semiconductors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. iTTCA-RF: a random forest predictor for tumor T cell antigens.

Author: Jiao, Shihu, Zou, Quan, Guo, Huannan, and Shi, Lei
Subjects: *T cells, *RANDOM forest algorithms, *ANTIGENS, *MAJOR histocompatibility complex, *ANTIGEN presenting cells, *FEATURE selection
Abstract: Background: Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, more accurate tumor T cell antigen identification by existing methodology is still challenging. Methods: In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. Results: Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA. Conclusions: We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

24. DisBalance: a platform to automatically build balance-based disease prediction models and discover microbial biomarkers from microbiome data.

Author: Yang, Fenglong and Zou, Quan
Subjects: *PREDICTION models, *MEDICAL model, *BIOMARKERS, *ULCERATIVE colitis, *FEATURE selection
Abstract: How best to utilize the microbial taxonomic abundances in regard to the prediction and explanation of human diseases remains appealing and challenging, and the relative nature of microbiome data necessitates a proper feature selection method to resolve the compositional problem. In this study, we developed an all-in-one platform to address a series of issues in microbiome-based human disease prediction and taxonomic biomarker discovery. We prioritize the interpretation, runtime and classification accuracy of the distal discriminative balances analysis (DBA-distal) method in selecting a set of distal discriminative balances, and develop DisBalance, a comprehensive platform, to integrate and streamline the workflows of disease model building, disease risk prediction and disease-related biomarker discovery for microbiome-based binary classifications. DisBalance allows de novo model building and disease risk prediction in a very fast and convenient way. To facilitate model-driven and knowledge-driven discoveries, DisBalance dedicates multiple strategies to the mining of microbial biomarkers. The independent validation of the models constructed by the DisBalance pipeline is performed on seven microbiome datasets from the original article of DBA-distal. The implementation of the DisBalance platform is demonstrated by a complete analysis of a shotgun metagenomic dataset of Ulcerative Colitis (UC). As a free and open-source platform, DisBalance can be accessed at http://lab.malab.cn/soft/DisBalance. The source code and demo data for DisBalance are available at https://github.com/yangfenglong/DisBalance. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

25. GutBalance: a server for the human gut microbiome-based disease prediction and biomarker discovery with compositionality addressed.

Author: Yang, Fenglong, Zou, Quan, and Gao, Bo
Subjects: *GUT microbiome, *BIOMARKERS, *MEDICAL research, *SUPERVISED learning, *FORECASTING, *MEDICAL model, *METAGENOMICS, *PHENOTYPES
Abstract: The compositionality of the microbiome data is well-known but often neglected. The compositional transformation pertains to the supervised learning of microbiome data and is a critical step that decides the performance and reliability of the disease classifiers. We value the excellent performance of the distal discriminative balance analysis (DBA) method, which selects distal balances of pairs and trios of bacteria, in addressing the classification of high-dimensional microbiome data. By applying this method to the species-level abundances of all the disease phenotypes in the GMrepo database, we build a balance-based model repository for the classification of human gut microbiome–related diseases. The model repository supports the prediction of disease risks for new sample(s). More importantly, we highlight the concept of balance-disease associations rather than the conventional microbe-disease associations and develop the human Gut Balance-Disease Association Database (GBDAD). Each predictable balance for each disease model indicates a potential biomarker-disease relationship and can be interpreted as a bacteria ratio positively or negatively correlated with the disease. Furthermore, by linking the balance-disease associations to the evidenced microbe-disease associations in MicroPhenoDB, we surprisingly found that most species-disease associations inferred from the shotgun metagenomic datasets can be validated by external evidence beyond MicroPhenoDB. The balance-based species-disease association inference will accelerate the generation of new microbe-disease association hypotheses in gastrointestinal microecology research and clinical trials. The model repository and the GBDAD database are deployed on the GutBalance server, which supports interactive visualization and systematic interrogation of the disease models, disease-related balances and disease-related species of interest. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

26. VPTMdb: a viral posttranslational modification database.

Author: Xiang, Yujia, Zou, Quan, and Zhao, Lilin
Subjects: *POST-translational modification, *INTERNET servers, *DRUG target, *DNA viruses, *VIRAL proteins, *PHOSPHORYLATION
Abstract: In viruses, posttranslational modifications (PTMs) are essential for their life cycle. Recognizing viral PTMs is very important for a better understanding of the mechanism of viral infections and finding potential drug targets. However, few studies have investigated the roles of viral PTMs in virus–human interactions using comprehensive viral PTM datasets. To fill this gap, we developed the first comprehensive viral posttranslational modification database (VPTMdb) for collecting systematic information of PTMs in human viruses and infected host cells. The VPTMdb contains 1240 unique viral PTM sites with 8 modification types from 43 viruses (818 experimentally verified PTM sites manually extracted from 150 publications and 422 PTMs extracted from SwissProt) as well as 13 650 infected cells' PTMs extracted from seven global proteomics experiments in six human viruses. The investigation of viral PTM sequence motifs showed that most viral PTMs share consensus motifs with human proteins in phosphorylation and that five cellular kinase families phosphorylate more than 10 viral species. The analysis of protein disordered regions showed that more than 50% of the glycosylation sites of double-strand DNA viruses are in the disordered regions, whereas single-strand RNA viruses and retroviruses prefer ordered regions. Domain–domain interaction analysis indicated the potential roles that viral PTMs play in infections. The findings should make an important contribution to the field of virus–human interaction. Moreover, we created a novel sequence-based classifier named VPTMpre to help users predict viral protein phosphorylation sites. The VPTMdb online web server (http://vptmdb.com:8787/VPTMdb/) was implemented for users to download viral PTM data and predict phosphorylation sites of interest. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

27. Robust generalised predictive position control for chain‐type rotary shell magazine with disturbance observer.

Author: Zhou, Guangzu, Qian, Linfang, Zou, Quan, Sun, Le, and Wei, Kai
Published: 2024
Full Text: View/download PDF

28. Photoactivatable base editors for spatiotemporally controlled genome editing in vivo.

Author: Zou, Quan, Lu, Yi, Qing, Bo, Li, Na, Zhou, Ting, Pan, Jinbin, Zhang, Xuejun, Zhang, Xuening, Chen, Yupeng, and Sun, Shao-Kai
Subjects: *GENOME editing, *CRISPRS, *NUCLEOTIDE sequencing, *REPORTER genes, *BLUE light, *TRANSGENIC mice
Abstract: CRISPR-based base editors (BEs) are powerful tools for precise nucleotide substitution in a wide range of organisms, but spatiotemporal control of base editing remains a daunting challenge. Herein, we develop a photoactivatable base editor (Mag-ABE) for spatiotemporally controlled genome editing in vivo for the first time. The base editing activity of Mag-ABE can be activated by blue light for spatiotemporal regulation of the editing of both the EGFP reporter gene and various endogenous genes. Meanwhile, the Mag-ABE prefers to edit the A4 and A5 positions rather than the A6 position, showing the potential to decrease the bystander editing of traditional adenine base editors. After integration with upconversion nanoparticles as a light transducer, the Mag-ABE is further applied successfully for near-infrared (NIR) light-activated base editing of the liver in transgenic reporter mice. This study opens a promising way to improve the operability, safety, and precision of base editing. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

29. Numerical study of the effects of loading method on mixing of two kinds of pebbles in HTGR: A GPU-DEM simulation.

Author: Zou, Quan, Gui, Nan, Yang, Xingtuan, Tu, Jiyuan, and Jiang, Shengyao
Subjects: *PEBBLES, *YOUNG'S modulus, *NUCLEAR reactors, *NUCLEAR reactor safety measures
Abstract: The flow, stacking, and mixing of pebbles in the High-Temperature Gas-cooled Reactor (HTGR) will affect the power distribution of the core and thus affect the economy and safety of the nuclear reactor. Simulations of pebble loading and mixing in HTR-PM based on GPU-DEM have been studied with particle numbers ranging from 230,000 to 420,000. The effects of four loading methods on pebble mixing are compared and analyzed. Mean position, segregation index (SI), Lacey's mixing index (PSMI), mixing entropy (ME), and porosity are used for quantitative analysis. In addition, an alternative approximate method is proposed to calculate the particle number fraction, which can help solve the problem that the particle number fraction is related to the mesh size. The final result shows that different physical parameters, such as mass and Young's modulus, will induce slight stratification during pebble mixing. At the same time, the simulation results with different loading methods have different mixing degrees. The reduced model mixes better than the single-pebble-loading method, but the latter is closer to engineering practice. • Loading & mixing of two kinds of pebbles in a real-scale bed are simulated • An in-house GPU-DEM program has been developed to simulate 420,000 particles • Mixing degrees are analyzed by mean position, SI, PSMI, ME, and porosity • Effect of mass and Young's modulus on the mixing of pebbles is explored. • A new method is proposed to calculate number fractions to solve the mesh effects. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

30. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model.

Author: Akbar, Shahid, Raza, Ali, and Zou, Quan
Subjects: *PEPTIDES, *ANTIMICROBIAL peptides, *VIRUS diseases, *MACHINE learning, *ANTIVIRAL agents
Abstract: Background: Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as an antiviral agent suggests an effective way to treat virus-affected cells. Recently, the involvement of intelligent machine learning techniques for developing peptide-based therapeutic agents is becoming an increasing interest due to its significant outcomes. The existing wet-laboratory-based drugs are expensive, time-consuming, and cannot effectively perform in screening and predicting the targeted motif of antiviral peptides. Methods: In this paper, we proposed a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physiochemical properties based on structural features. Apart from these, the fused vector is formed using PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are trained using a stacked-ensemble classifier. Results: The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.
Conclusion: Our Deepstacked-AVPs model outperformed existing models with a ~ 4% and ~ 2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. A GPU-based DEM model for the pebble flow study in packed bed: Simulation scheme and validation.

Author: Zou, Quan, Gui, Nan, Yang, Xingtuan, Tu, Jiyuan, and Jiang, Shengyao
Subjects: *PEBBLES, *GRANULAR flow, *CHEMICAL engineering, *CHEMICAL engineers, *LOADING & unloading
Abstract: This study developed a GPU-based DEM (GPU-DEM) model for the pebble flow in the packed bed. The details of the GPU-DEM scheme have been illustrated. The uniform mesh neighbor-search method, unified memory addressing (UMA), and special boundary conditions were incorporated in the GPU-DEM model. Cases of the direct impact of two identical particles, the angle of repose of particles drained from a lifted hopper, and the circulating hopper to mimic a pebble flow in High-Temperature Gas-cooled Reactor (HTGR) reactors were carried out to systematically verify the applicability, computational efficiency, and accuracy of the GPU-DEM model. Related results, like the unloading speed, porosity, and velocity distribution, obtained by the GPU-DEM model have been validated too. In addition, the effects of single-precision floating-point number computation on GPU were discussed. The results demonstrated that the GPU-DEM can achieve 18–20 times acceleration while maintaining accuracy, even when single-precision floating-point numbers were used for calculation. Therefore, it is a powerful tool to explore the particle flows in chemical engineering applications. • A GPU-based DEM model and scheme for pebble flows in the packed bed are developed. • Uniform mesh search, unified memory addressing (UMA), and BCs were incorporated. • Applicability, computational efficiency, and accuracy of the GPU-DEM were analyzed. • Unloading speed, porosity, and velocity distribution were obtained for validation too. • With equivalent accuracy, GPU-DEM can achieve 18–20 times as fast as the CPU-DEM. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

32. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel.

Author: Ding, Yijie, Zhou, Hongmei, Zou, Quan, and Yuan, Lei
Subjects: *MATRIX decomposition, *DRUG side effects, *DRUG monitoring, *KERNEL operating systems, *MEDICATION safety, *MACHINE learning
Abstract: • Neural tangent kernel is used to construct the similarity matrices. • Correntropy-loss function is introduced into matrix factorization. • An efficient iterative algorithm is employed to optimize the model. Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. The monitoring of drug side effects is an important support for post marketing safety supervision of drugs, and an important basis for revising drug instructions. Its purpose is to timely detect and control drug safety risks. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to solve the prediction of drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

33. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model.

Author: Wang, Jiacheng, Chen, Yaojia, and Zou, Quan
Subjects: *DEEP learning, *GENE regulatory networks, *MONONUCLEAR leukocytes, *BIOLOGICAL systems, *REGULATOR genes, *TRIPLE-negative breast cancer, *GENE expression
Abstract: The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single cell only focus on and infer the regulatory relationships of pairs of genes, ignoring the global regulatory structure which is crucial to identify the regulations in the complex biological systems. Here, we proposed a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression of data into the co-expression mode. Then it utilizes a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct the gene regulatory networks and outperform existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to the samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and presented that DeepRIG can provide accurate cell-type-specific gene regulatory networks inference and identify novel regulators of progression and inhibition. Author summary: Although many methods have been proposed to infer the gene regulatory network of a single cell, they only focus on the regulatory relationships of pairs of genes and ignore the global regulatory structure. Here, we present a deep learning-based model to learn the global regulatory structure and reconstruct the gene regulatory networks from single-cell RNA sequencing data with a graph view. 
We utilize the weighted gene co-expression analysis to build a prior regulatory graph of genes and a graph autoencoder to deconstruct the latent regulatory structure among genes. We performed extensive experiments on a variety of single-cell RNA sequencing datasets and compared our method with 9 state-of-the-art gene regulatory network inference methods. The results show that our method can significantly improve the accuracy of gene regulatory network inference and can be applied to identify key regulators in a wide range of scenarios. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. Minimalist O2 generator formed by in situ KMnO4 oxidation for tumor cascade therapy.

Author: Pan, Haiyan, Zou, Quan, Wang, Tingting, Li, Dong, and Sun, Shao-Kai
Subjects: *POTASSIUM permanganate, *INTRAVENOUS injections, *PHOTODYNAMIC therapy, *TREATMENT effectiveness, *TUMORS, *OXIDATION
Abstract: Diverse oxygen generation strategies have been developed to overcome hypoxia in tumors for enhancing therapeutic efficacy, but they inevitably suffer from the tedious synthesis of oxygen generators in vitro before in vivo administration. Herein, we show that direct injection of commercially and clinically used KMnO4 into solid tumors enables in situ formation of MnO2 as an oxygen depot for cascade oxidation damage and enhanced photodynamic therapy. KMnO4 can damage tumor tissues by oxidation and generate MnO2, and subsequent intravenous injection of Ce6 allows MnO2-triggered hypoxia-modulated photodynamic therapy of tumors. An excellent cascade tumor suppression effect is realized both in vitro and in vivo based on the KMnO4–Ce6 system without the need for synthesis. The proposed strategy lays down a novel way, with the unprecedented advantages of requiring no synthesis process and an ultra-facile administration procedure, for tumor hypoxia-modulated cascade therapy. Intratumoral injection of KMnO4 can not only damage tumor cells by oxidation, but also generate MnO2 as a minimalist O2 generator, and subsequent intravenous injection of Ce6 allows hypoxia-modulated photodynamic therapy of tumors. The proposed strategy lays down a new way, with the unprecedented advantages of avoiding the synthesis process and an ultra-facile administration procedure, for tumor hypoxia-modulated cascade therapy. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

35. Recall DNA methylation levels at low coverage sites using a CNN model in WGBS.

Author: Luo, Ximei, Wang, Yansu, Zou, Quan, and Xu, Lei
Subjects: *DNA methylation, *REGULATOR genes, *GENETIC regulation, *DEEP learning, *METHYLATION, *METHYLGUANINE
Abstract: DNA methylation is an important regulator of gene transcription. WGBS is the gold-standard approach for base-pair resolution quantification of DNA methylation. It requires high sequencing depth. Many CpG sites have insufficient coverage in the WGBS data, resulting in inaccurate DNA methylation levels at individual sites. Many state-of-the-art computational methods have been proposed to predict the missing values. However, many methods required either other omics datasets or other cross-sample data, and most of them only predicted the state of DNA methylation. In this study, we proposed RcWGBS, which can impute the missing (or low coverage) values from the DNA methylation levels at the adjacent sites. Deep learning techniques were employed for accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average differences between the DNA methylation level at 12× depth predicted by RcWGBS and that at >50× depth in the H1-hESC and GM12878 cells are less than 0.03 and 0.01, respectively. RcWGBS performed better than METHimpute even though the sequencing depth was as low as 12×. Our work would help to process methylation data of low sequencing depth. It is beneficial for researchers to save sequencing costs and improve data utilization through computational methods. Author summary: DNA methylation has a major impact on gene regulation. WGBS is the gold standard for investigating DNA methylation. The DNA methylation levels of sites with low coverage are often not accurate in WGBS datasets. Therefore, we proposed a method based on the CNN model to perform DNA methylation level interpolation for specific sites and named this method RcWGBS. RcWGBS did not rely on other omics data or other cross-sample data. It only used the sites with sufficient coverage contained in the target WGBS dataset for model training to obtain parameters. Then, the trained model can be used to predict the DNA methylation level of sites with low coverage.
Our analyses showed that RcWGBS could recalibrate the methylation level of some CpGs with insufficient coverage. This suggests that our research could benefit WGBS datasets with insufficient sequencing coverage. RcWGBS is implemented as an R package. It is efficient and convenient and does not need other WGBS or omics data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

36. Risk factors of non-sentinel lymph node metastasis in breast cancer with 1–2 sentinel lymph node macrometastases underwent total mastectomy: a case-control study.

Author: Huang, Zhen, Wu, Zhe, Zou, Quan-qing, Xie, Yu-jie, Li, Li-hui, Huang, Yan-ping, Wu, Feng-ming, Huang, Dong, Pan, Yin-hua, and Yang, Jian-rong
Subjects: *METASTATIC breast cancer, *SENTINEL lymph nodes, *LYMPHATIC metastasis, *AXILLARY lymph node dissection, *MASTECTOMY, *MICROMETASTASIS
Abstract: Background: Randomized trials, including ACOSOG Z0011 and IBCSG 23-01, found that survival rates did not differ when axillary lymph node dissection (ALND) was omitted in patients with cT1/2N0 disease and 1–2 positive sentinel lymph nodes (SLNs) who had macro/micrometastases and underwent breast-conserving therapy, or who had micrometastases and underwent total mastectomy (TM). However, for patients with cT1/2N0 disease and 1–2 SLN macrometastases who underwent TM, there is still insufficient evidence from clinical studies to support whether ALND can be exempted. This study aimed to investigate the risk factors of non-sentinel lymph node (nSLN) metastasis in breast cancer patients with 1–2 SLN macrometastases undergoing TM. Methods: The clinicopathological data of 1491 breast cancer patients who underwent TM and SLNB from January 2017 to February 2022 were retrospectively analyzed. Univariate and multivariate analyses were performed to analyze the risk factors for nSLN metastasis. Results: A total of 273 patients with 1–2 SLN macrometastases who underwent TM were enrolled. Postoperative pathological data showed that 35.2% of patients had nSLN metastasis. Multivariate analysis indicated that tumor size (TS) (P = 0.002; OR: 1.051; 95% CI: 1.019–1.084) and ratio of SLN macrometastases (P = 0.0001; OR: 12.597; 95% CI: 4.302–36.890) were independent risk factors for nSLN metastasis in breast cancer patients with 1–2 SLN macrometastases who underwent TM. The ROC curve analysis suggested that when TS was ≤22 mm and the ratio of SLN macrometastases was ≤0.33, the incidence of nSLN metastasis could be reduced to 17.1%. Conclusions: In breast cancer patients with cT1/2N0 disease and 1–2 SLN macrometastases undergoing TM, the incidence of nSLN metastasis is significantly reduced when TS is ≤22 mm and macrometastatic SLNs do not exceed 1/3 of the total number of detected SLNs, but whether ALND can be exempted needs further exploration. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

37. Comparative Genomics and Functional Genomics Analysis in Plants.

Author: Wang, Jiacheng, Chen, Yaojia, and Zou, Quan
Subjects: *FUNCTIONAL genomics, *COMPARATIVE genomics, *FUNCTIONAL analysis, *SEX determination, *GENETIC regulation, *KIWIFRUIT, *GARLIC, *PEACH
Abstract: By integrating biological and bioinformatics data, Turek et al. (2023) [[21]] performed a genome assembly on the B10v3 cucumber genome and presented a well-organized and annotated reference genome of cucumber, contributing to better understanding of the cucumber genome. Using genome-wide classification and functional annotation, Cenci et al. (2022) [[18]] effectively characterized a large gene family across multiple species and established a classification framework for gene functional transfer, enabling a better understanding of the evolutionary history of one gene. In addition, several papers explored functional genomics of plant genomes, such as gene functional annotation and the development of functional genomics tools. Comparative genomics and functional genomics are two basic branches of plant genomics. [Extracted from the article]
Published: 2023
Full Text: View/download PDF

38. Diversity of Pholcus Spiders (Araneae: Pholcidae) in China's Lüliang Mountains: An Integrated Morphological and Molecular Approach.

Author: Zhao, Fang-Yu, Yang, Lan, Zou, Quan-Xuan, Ali, Abid, Li, Shu-Qiang, and Yao, Zhi-Yuan
Subjects: *SPIDERS, *DNA sequencing, *POISSON processes, *SPECIES distribution
Abstract: Simple Summary: Pholcus is the most diverse spider genus in Pholcidae, and is widely distributed in the Palaearctic, Indo-Malayan, Afrotropical, and Australasian Regions. Previously, the Pholcus spiders have not been recorded from the Lüliang Mountains of North China. We undertook an expedition there for the first time. Phylogenetic analyses of DNA sequence data from four gene fragments (COI, H3, wnt, 28S) suggested that Pholcus from the Lüliang Mountains were grouped into nine well-supported clades. We adopted an integrative approach, including morphology and four methods of molecular species delimitation (ABGD, GMYC, bPTP, and BPP), to investigate species boundaries. Such analyses identified the nine clades as nine separate species, of which eight are new to science. All of them belong to the P. phungiformes species group. Spiders of the genus Pholcus were collected for the first time during an expedition to the Lüliang Mountains in Shanxi Province, North China. Phylogenetic analyses of DNA sequence data from COI, H3, wnt, and 28S genes allowed us to group them into nine well-supported clades. We used morphology and four methods of molecular species delimitation, namely Automatic Barcode Gap Discovery (ABGD), the Generalized Mixed Yule Coalescent (GMYC), Bayesian Poisson Tree Processes (bPTP), and Bayesian Phylogenetics and Phylogeography (BPP), to investigate species boundaries. These integrative taxonomic analyses identified the nine clades as nine distinct species, comprising Pholcus luya Peng & Zhang, 2013 and eight other species new to science: Pholcus jiaocheng sp. nov., Pholcus linfen sp. nov., Pholcus lishi sp. nov., Pholcus luliang sp. nov., Pholcus wenshui sp. nov., Pholcus xiangfen sp. nov., Pholcus xuanzhong sp. nov., and Pholcus zhongyang sp. nov. The species occur in geographic proximity and show many morphological similarities. All of them belong to the P. phungiformes species group. The records from the Lüliang Mountains represent the westernmost distribution limit of this species group. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

39. DeepMPF: deep learning framework for predicting drug–target interactions based on multi-modal representation with meta-path semantic analysis.

Author: Ren, Zhong-Hao, You, Zhu-Hong, Zou, Quan, Yu, Chang-Qing, Ma, Yan-Fang, Guan, Yong-Jian, You, Hai-Ru, Wang, Xin-Fei, and Pan, Jie
Subjects: *DEEP learning, *DRUG discovery, *DRUG design, *DRUG repositioning, *BIOLOGICAL networks, *INTERNET servers
Abstract: Background: Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, traditional biological experiments are time-consuming and expensive, as there are abundant complex interactions present in the large genomic and chemical spaces. To alleviate this problem, many computational methods have been developed to complement biological experiments and narrow the search space to a preferred candidate domain. However, most previous approaches cannot fully consider association behavior semantic information based on several schemas to represent the complex structure of heterogeneous biological networks. Additionally, the prediction of DTI based on single modalities cannot satisfy the demand for prediction accuracy. Methods: We propose a multi-modal representation framework, 'DeepMPF', based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein–drug-disease heterogeneous networks composed of three entities. Then the feature information is obtained under three views: sequence modality, heterogeneous structure modality and similarity modality. We propose six representative meta-path schemas to preserve the high-order nonlinear structure and capture hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. Results: To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold datasets. Our method obtains competitive performance on all datasets. We also explore the influence of different feature embedding dimensions, learning strategies and classification methods. Notably, the drug repositioning experiments on COVID-19 and HIV demonstrate that DeepMPF can be applied to real-world problems and aid drug discovery. Further analysis with molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. Conclusions: All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be used as a tool to prescreen the most promising drug candidates for a protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/ to support further study by researchers. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

40. i6mA-Caps: a CapsuleNet-based framework for identifying DNA N6-methyladenine sites.

Author: Rehman, Mobeen Ur, Tayara, Hilal, Zou, Quan, and Chong, Kil To
Subjects: *INTERNET servers, *CAPSULE neural networks, *DNA, *NUCLEOTIDE sequence, *ARABIDOPSIS thaliana, *DNA sequencing
Abstract: Motivation: DNA N6-methyladenine (6mA) has recently been shown to play an essential role in epigenetic modification in eukaryotic species. 6mA has been linked to various biological processes. An algorithm that can rapidly and reliably detect 6mA sites in genomes is critical for investigating their biological roles. The identification of 6mA marks in the genome is the first and most important step in understanding the underlying molecular processes, as well as their regulatory functions. Results: In this article, we propose a novel computational tool called i6mA-Caps, a CapsuleNet-based framework for identifying DNA N6-methyladenine sites. The proposed framework uses a single encoding scheme for numerical representation of the DNA sequence. The numerical data are then used by a set of convolution layers to extract low-level features. These features are then used by the capsule network to extract intermediate-level and later high-level features to classify the 6mA sites. The proposed network is evaluated on three datasets belonging to three genomes: Rosaceae, rice and Arabidopsis thaliana. The proposed method attained accuracies of 96.71%, 94% and 86.83% on the independent Rosaceae, rice and A. thaliana datasets, respectively. The proposed framework exhibited improved results when compared with existing top-of-the-line methods. Availability and implementation: A user-friendly web server is available for biological experts at http://nsclbio.jbnu.ac.kr/tools/i6mA-Caps/. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

41. Effector-GAN: prediction of fungal effector proteins based on pretrained deep representation learning methods and generative adversarial networks.

Author: Wang, Yansu, Luo, Ximei, and Zou, Quan
Subjects: *INTERNET servers, *GENERATIVE adversarial networks, *FUNGAL proteins, *PROBABILISTIC generative models, *DEEP learning, *PHYTOPATHOGENIC fungi, *PLANT diseases
Abstract: Motivation: Phytopathogenic fungi secrete effector proteins to subvert host defenses and facilitate infection. Systematic analysis and prediction of candidate fungal effector proteins are crucial for experimental validation and biological control of plant disease. However, two problems remain difficult to solve in fungal effector prediction: one is the high diversity of effector sequences, which increases the difficulty of protein feature learning, and the other is the class imbalance between effector and non-effector samples in the training dataset. Results: In our study, pretrained deep representation learning methods are used to represent multiple characteristics of sequences for predicting fungal effectors, and generative adversarial networks are adapted to create synthetic feature samples to address the data imbalance problem. Compared with the state-of-the-art fungal effector prediction methods, Effector-GAN shows an overall improvement in accuracy on the independent test set. Availability and implementation: Effector-GAN offers a user-friendly interface to inspect potential fungal effector proteins (http://lab.malab.cn/~wys/webserver/Effector-GAN). The Python script can be downloaded from http://lab.malab.cn/~wys/gitlab/effector-gan. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

42. Predicting protein–peptide binding residues via interpretable deep learning.

Author: Wang, Ruheng, Jin, Junru, Zou, Quan, Nakai, Kenta, and Wei, Leyi
Subjects: *DEEP learning, *AMINO acid sequence, *PROTEIN structure, *PROTEIN-protein interactions, *PROTEIN models, *DRUG discovery
Abstract: Summary: Identifying protein–peptide binding residues is fundamentally important for understanding the mechanisms of protein functions and for drug discovery. Although several computational methods have been developed, most of them rely heavily on third-party tools or complex data preprocessing for feature design, which easily results in low computational efficacy and low predictive performance. To address these limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representation from Transformers)-based contrastive learning framework to predict protein–peptide binding residues based on protein sequences only. PepBCL is an end-to-end predictive model that is independent of feature engineering. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn high-latent representations of protein sequences relevant for protein structures and functions. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues underlying the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods under benchmarking comparison, and achieves more robust performance. Moreover, we found that performance can be further improved by integrating traditional features with our learnt features. Interestingly, the interpretable analysis of our model highlights the flexibility and adaptability of the deep learning-based protein language model in capturing both conserved and non-conserved sequential characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we establish an online predictive platform as the implementation of the proposed PepBCL, which is now available at http://server.wei-group.net/PepBCL/. Availability and implementation: https://github.com/Ruheng-W/PepBCL. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

43. Generative adversarial network with the discriminator using measurements as an auxiliary input for single-pixel imaging.

Author: Dai, Qianling, Yan, Qiurong, Zou, Quan, Li, Yi, and Yan, Jinwei
Subjects: *GENERATIVE adversarial networks, *PIXELS, *DEEP learning, *IMAGE compression, *COMPRESSED sensing, *IMAGE converters
Abstract: Single-pixel imaging (SPI) can realize two-dimensional imaging with a single-pixel detector without spatial resolution, and has wide application prospects in many fields because of high sensitivity and low cost. The compression reconstruction algorithm based on deep learning can improve the quality of reconstructed images. Generative adversarial network (GAN), which has excellent performance in generating images, is also gradually used in compressed sensing. However, the prior of compressively sensed measurements has not been fully utilized. Therefore, this paper proposes generative adversarial networks MAID-GAN and MAID-GAN+ with the discriminator using measurements as an auxiliary input. The image and corresponding measurements are taken as inputs of the discriminator, and the Y-shaped network structure is used to fuse the feature maps of the image domain and the measurement domain, so as to better guide the generator to generate the image close to the original image and improve the quality of the generated image. Subpixel convolution sampling is used to extract image features, and the sampling network and the reconstruction network are optimized jointly. The simulation and experimental results show that networks proposed in this paper have obvious advantages in reconstruction under low sampling rates. • The optimized sampling masks can improve the sampling efficiency. • Using measurements as an auxiliary input to the discriminator can guide the generator to generate images with more details. • Adding global features to the generator can improve the quality of reconstructed images. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Integrating Single‐Cell and Spatial Transcriptomics Reveals Heterogeneity of Early Pig Skin Development and a Subpopulation with Hair Placode Formation.

Author: Wang, Yi, Jiang, Yao, Ni, Guiyan, Li, Shujuan, Balderson, Brad, Zou, Quan, Liu, Huatao, Jiang, Yifan, Sun, Jingchun, and Ding, Xiangdong
Subjects: *TRANSCRIPTOMES, *EPIDERMIS, *HAIR follicles, *SWINE, *HAIR, *HETEROGENEITY, *ETIOLOGY of diseases, *FETUS, KERATINOCYTE differentiation
Abstract: The dermis and epidermis, crucial structural layers of the skin, encompass appendages, hair follicles (HFs), and intricate cellular heterogeneity. However, an integrated spatiotemporal transcriptomic atlas of embryonic skin has not yet been described and would be invaluable for studying skin‐related diseases in humans. Here, single‐cell and spatial transcriptomic analyses are performed on skin samples of normal and hairless fetal pigs across four developmental periods. The cross‐species comparison of skin cells illustrated that the pig epidermis is more representative of the human epidermis than mice epidermis. Moreover, Phenome‐wide association study analysis revealed that the conserved genes between pigs and humans are strongly associated with human skin‐related diseases. In the epidermis, two lineage differentiation trajectories describe hair follicle (HF) morphogenesis and epidermal development. By comparing normal and hairless fetal pigs, it is found that the hair placode (Pc), the most characteristic initial structure in HFs, arises from progenitor‐like OGN+/UCHL1+ cells. These progenitors appear earlier in development than the previously described early Pc cells and exhibit abnormal proliferation and migration during differentiation in hairless pigs. The study provides a valuable resource for in‐depth insights into HF development, which may serve as a key reference atlas for studying human skin disease etiology using porcine models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. DeepAVP-TPPred: identification of antiviral peptides using transformed image-based localized descriptors and binary tree growth algorithm.

Author: Ullah, Matee, Akbar, Shahid, Raza, Ali, and Zou, Quan
Subjects: *ARTIFICIAL neural networks, *TREE growth, *PEPTIDES, *LIFE cycles (Biology), *FEATURE selection, *FEATURE extraction, *IDENTIFICATION
Abstract: Motivation: Despite the extensive manufacturing of antiviral drugs and vaccination, viral infections continue to be a major human ailment. Antiviral peptides (AVPs) have emerged as potential candidates in the pursuit of novel antiviral drugs. These peptides show vigorous antiviral activity against a diverse range of viruses by targeting different phases of the viral life cycle. Therefore, the accurate prediction of AVPs is an essential yet challenging task. Lately, many machine learning-based approaches have been developed for this purpose; however, their limited capabilities in terms of feature engineering, accuracy, and generalization restrict these methods. Results: In the present study, we aim to develop an efficient machine learning-based approach for the identification of AVPs, referred to as DeepAVP-TPPred, to address the aforementioned problems. First, we extract two new transformed feature sets using our designed image-based feature extraction algorithms and integrate them with an evolutionary information-based feature. Next, these feature sets are optimized using a novel feature selection approach called the binary tree growth algorithm. Finally, the optimal feature space from the training dataset is fed to the deep neural network to build the final classification model. The proposed model DeepAVP-TPPred was tested using stringent 5-fold cross-validation and two independent dataset testing methods, which achieved the maximum performance and showed enhanced efficiency over existing predictors in terms of both accuracy and generalization capabilities. Availability and implementation: https://github.com/MateeullahKhan/DeepAVP-TPPred. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. Integrated convolution and self-attention for improving peptide toxicity prediction.

Author: Jiao, Shihu, Ye, Xiucai, Sakurai, Tetsuya, Zou, Quan, and Liu, Ruijun
Subjects: *PEPTIDES, *AMINO acid sequence, *PEPTIDE drugs, *SOURCE code, *DRUG development
Abstract: Motivation: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. Results: We provide here a novel computational approach, CAPTP, which leverages the power of convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 in both cross-validation settings and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. Availability and implementation: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. scTPC: a novel semisupervised deep clustering model for scRNA-seq data.

Author: Qiu, Yushan, Yang, Lingfei, Jiang, Hao, and Zou, Quan
Subjects: *DEEP learning, *NEGATIVE binomial distribution, *RNA sequencing, *DATA modeling, *FUZZY clustering technique, *SEQUENCE analysis, *RESEARCH personnel
Abstract: Motivation: Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to further explore the study of cell heterogeneity, trajectory inference, identification of rare cell types, and neurology. Accurate scRNA-seq data clustering is crucial in single-cell sequencing data analysis. However, the high dimensionality, sparsity, and presence of "false" zero values in the data can pose challenges to clustering. Furthermore, current unsupervised clustering algorithms have not effectively leveraged prior biological knowledge, making cell clustering even more challenging. Results: This study investigates a semisupervised clustering model called scTPC, which integrates the triplet constraint, pairwise constraint, and cross-entropy constraint based on deep learning. Specifically, the model begins by pretraining a denoising autoencoder based on a zero-inflated negative binomial distribution. Deep clustering is then performed in the learned latent feature space using triplet constraints and pairwise constraints generated from partially labeled cells. Finally, to address imbalanced cell-type datasets, a weighted cross-entropy loss is introduced to optimize the model. A series of experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC achieves accurate clustering with a well-designed framework. Availability and implementation: scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Revisiting drug–protein interaction prediction: a novel global–local perspective.

Author: Zhou, Zhecheng, Liao, Qingquan, Wei, Jinhang, Zhuo, Linlin, Wu, Xiaonan, Fu, Xiangzheng, and Zou, Quan
Subjects: *MULTILAYER perceptrons, *BIPARTITE graphs, *DRUG repositioning, *DEEP learning, *TRANSFORMER models, *INDIVIDUALIZED medicine, *PROTEIN-protein interactions
Abstract: Motivation: Accurate inference of potential drug–protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. Results: We propose a new computational framework that integrates global and local features of nodes in the drug–protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug–protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. Availability and implementation: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data.

Author: Tian, Qinzhong, Zhang, Pinglu, Zhai, Yixiao, Wang, Yansu, and Zou, Quan
Subjects: *NUCLEOTIDE sequencing, *TECHNOLOGICAL innovations, *CLASSIFICATION, *DEVELOPMENTAL biology, *DATABASES, *SYNTHETIC biology
Abstract: The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Author: Qiu, Yushan, Guo, Dong, Zhao, Pu, and Zou, Quan
Subjects: *MATRIX decomposition, *MULTIOMICS, *METABOLOMICS, *NONNEGATIVE matrices, *CONSTRAINED optimization, *FEATURE selection, *TRANSCRIPTOMES
Abstract: Motivation: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. Availability and implementation: scMNMF code can be found at https://github.com/yushanqiu/scMNMF. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
