139 results on '"Tianyi Zang"'
Search Results
2. Explore potential disease related metabolites based on latent factor model
- Author
-
Yongtian Wang, Liran Juan, Jiajie Peng, Tao Wang, Tianyi Zang, and Yadong Wang
- Subjects
Metabolite, Disease similarity, Disease diagnosis, Matrix decomposition, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: In biological systems, metabolomics not only contributes to the discovery of metabolic signatures for disease diagnosis but also helps illustrate the underlying molecular disease-causing mechanisms. Therefore, identifying disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. Results: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify potential associations between metabolites and diseases based on a latent factor model. We build a disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases using matrix decomposition. In total, 1,406 direct associations between diseases and metabolites are found, and 119,206 unknown associations between diseases and metabolites are predicted, with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is shown to be successful in predicting potential metabolic signatures for human diseases, with an average AUC value of 82.33%. Conclusion: In this paper, a computational model is proposed for exploring metabolite-disease pairs; adequate validation shows good performance in predicting potential disease-related metabolites. The results show that DLMPM performs better than previous methods in prioritizing candidate disease-related metabolites and would help researchers reveal more information about human diseases.
- Published
- 2022
- Full Text
- View/download PDF
3. A multi-network integration approach for measuring disease similarity based on ncRNA regulation and heterogeneous information
- Author
-
Ningyi Zhang and Tianyi Zang
- Subjects
Non-coding RNA, Disease similarity, Semantic association, Gene functional network, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Measuring similarity between complex diseases has significant implications for revealing the pathogenesis of diseases and for development in the domain of biomedicine. There is a consensus that functional associations between disease-related genes and semantic associations can be applied to calculate disease similarity. A growing number of studies have demonstrated the profound involvement of non-coding RNA (ncRNA) in the regulation of genome organization and gene expression. Thus, taking ncRNA into account can be useful in measuring disease similarities. However, existing methods ignore the regulatory functions of ncRNA in biological processes. In this study, we propose a novel deep-learning method to deduce disease similarity. Results: In this article, we propose a novel method, ImpAESim, a framework that integrates multiple network embeddings to learn compact feature representations and calculate disease similarity. We first use three different disease-related information networks to build a heterogeneous network. After a network diffusion process, RWR, a compact feature-learning model composed of a classic autoencoder (AE) and an improved AE model is used to extract constraints and low-dimensional feature representations. We finally obtain an accurate, low-dimensional feature representation of diseases and employ the cosine distance as the measurement of disease similarity. Conclusion: ImpAESim focuses on extracting a low-dimensional vector representation of features based on ncRNA regulation and the gene–gene interaction network. Our method can significantly reduce the calculation bias resulting from the sparse disease associations that are derived from semantic associations.
- Published
- 2022
- Full Text
- View/download PDF
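The ImpAESim abstract above compares low-dimensional disease vectors with a cosine measure. As a minimal sketch of that final step only (not the authors' implementation), the following assumes each disease is already represented by a short numeric vector; the function names and the example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Made-up embeddings for two hypothetical diseases.
disease_a = [0.2, 0.4, 0.0]
disease_b = [0.1, 0.2, 0.0]
similarity = cosine_similarity(disease_a, disease_b)
```

Because the measure depends only on direction, vectors that differ by a positive scale factor, as in this example, are treated as maximally similar.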
4. CNN-DDI: a learning-based method for predicting drug–drug interactions using convolution neural networks
- Author
-
Chengcheng Zhang, Yao Lu, and Tianyi Zang
- Subjects
Drug–drug interactions, Drug categories, Convolutional neural network, Multiple features combination, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Drug–drug interactions (DDIs) are the reactions between drugs. They are divided into three types: synergistic, antagonistic and no reaction. The prediction of DDI-associated events is developing rapidly and is receiving growing attention and application in the fields of drug development and disease diagnosis. In this work, we study not only whether two drugs interact but also the specific interaction types. We propose a learning-based method that uses convolutional neural networks to learn feature representations and predict DDIs. Results: In this paper, we propose a novel algorithm with a CNN architecture, named CNN-DDI, to predict drug–drug interactions. First, we extract feature interactions from drug categories, targets, pathways and enzymes as feature vectors and employ the Jaccard similarity as the measurement of drug similarity. Then, based on this feature representation, we build a new convolutional neural network as the DDI predictor. Conclusion: The experimental results indicate that drug categories are effective as a new feature type in the CNN-DDI method, and that using multiple features is more informative and more effective than using a single feature. It can be concluded that CNN-DDI is superior to other existing algorithms on the task of predicting DDIs.
- Published
- 2022
- Full Text
- View/download PDF
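The CNN-DDI abstract above names the Jaccard similarity as its measure of drug similarity. The sketch below illustrates that measure on its own, assuming each drug is described by a set of feature identifiers; the function name and the target identifiers are invented for illustration and are not taken from the paper.

```python
def jaccard_similarity(features_a, features_b):
    """Jaccard index: size of the intersection over size of the union."""
    set_a, set_b = set(features_a), set(features_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(set_a & set_b) / len(union)


# Hypothetical drugs described by made-up target identifiers.
drug_x_targets = {"T1", "T2", "T3"}
drug_y_targets = {"T2", "T3", "T4"}
similarity = jaccard_similarity(drug_x_targets, drug_y_targets)
```

Here the two drugs share two of four distinct targets, so the similarity is 0.5. The same function could be applied separately to categories, pathways and enzymes, which are the other feature types the abstract lists.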
5. DRACP: a novel method for identification of anticancer peptides
- Author
-
Tianyi Zhao, Yang Hu, and Tianyi Zang
- Subjects
Anticancer peptides, Deep belief network, Relevance vector machine, Random forest, Cancer, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Millions of people suffer from cancer, but accurate early diagnosis and effective treatment remain difficult for doctors. Common treatments include surgery, radiotherapy and chemotherapy; however, they are all very harmful to patients. Recently, anticancer peptides (ACPs) have been discovered to be a potential way to treat cancer. Since ACPs are natural biologics, they are safer than other methods. However, finding ACPs experimentally is expensive, so we propose a new machine learning method to identify ACPs. Results: First, we extracted the features of ACPs in two aspects: sequence and chemical characteristics of amino acids. For sequence, the average composition of the 20 amino acids was extracted. For chemical characteristics, we classified amino acids into six groups based on the patterns of hydrophobic and hydrophilic residues. Then, a deep belief network was used to encode the features of ACPs. Finally, we proposed Random Relevance Vector Machines to identify the true ACPs. We call this method ‘DRACP’ and tested its performance on two independent datasets. Its AUC and AUPR are higher than 0.9 on both datasets. Conclusion: We developed a novel method named ‘DRACP’ and compared it with some traditional methods. The cross-validation results showed its effectiveness in identifying ACPs.
- Published
- 2020
- Full Text
- View/download PDF
6. LncDisAP: a computation model for LncRNA-disease association prediction based on multiple biological datasets
- Author
-
Yongtian Wang, Liran Juan, Jiajie Peng, Tianyi Zang, and Yadong Wang
- Subjects
Long non-coding RNAs, Disease, lncRNA network, Random walking with restart, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence has indicated that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means to improve the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for potential disease-related lncRNA identification based on multiple biological datasets. First, the associations between lncRNA and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs by the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with functional similarities between lncRNAs as the weight. Ultimately, potential disease-related lncRNAs can be identified based on ranked scores derived by random walking with restart (RWR). Then, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. Results: A lncRNA functional network is built based on the proposed computational model, and it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. As a result, the average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. Conclusion: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The result shows that LncDisAP is successful in predicting novel disease-related lncRNA signatures. In addition, with several common cancers taken as case studies, we found some unknown lncRNAs that could be associated with these diseases through our network. These results suggest that this method can be helpful in improving the quality of disease diagnostics and treatment.
- Published
- 2019
- Full Text
- View/download PDF
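The LncDisAP abstract above ranks candidates with random walking with restart (RWR) on a weighted similarity network. The following is a minimal, generic sketch of RWR, not the authors' code. It assumes an undirected weighted graph stored as a dictionary in which every node appears as a key; the node names, the restart probability of 0.5 and the function name are all illustrative assumptions.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it stabilises.

    weights: dict mapping each node to a dict {neighbour: edge weight}.
    seeds:   nodes that receive the restart probability (e.g. known associations).
    """
    nodes = sorted(weights)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(weights[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for src in nodes:
            if strength[src] == 0:
                # Isolated node: keep its walking mass in place (a choice made here).
                nxt[src] += (1 - restart) * p[src]
                continue
            for dst, w in weights[src].items():
                # Move mass along each edge in proportion to its weight.
                nxt[dst] += (1 - restart) * p[src] * w / strength[src]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph a - b - c - d with unit weights; "a" is the only seed.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"a"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy graph the scores fall off with distance from the seed, which is the property such methods rely on when ranking unlabelled nodes by proximity to known disease-associated ones.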
7. deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index
- Author
-
Bo Liu, Yadong Liu, Junyi Li, Hongzhe Guo, Tianyi Zang, and Yadong Wang
- Subjects
Long read alignment, RNA-seq, de Bruijn graph-based index, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
The alignment of long-read RNA sequencing reads is non-trivial due to high sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass alignment approach, which constructs graph-based alignment skeletons to infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and sequencing errors, which break through bottlenecks of long RNA-seq read alignment. Benchmarks demonstrate that deSALT has a greater ability to produce accurate and homogeneous full-length alignments. deSALT is available at: https://github.com/hitbc/deSALT.
- Published
- 2019
- Full Text
- View/download PDF
8. Psi-Caller: A Lightweight Short Read-Based Variant Caller With High Speed and Accuracy
- Author
-
Yadong Liu, Tao Jiang, Yan Gao, Bo Liu, Tianyi Zang, and Yadong Wang
- Subjects
variant calling, partial order alignment, short read sequencing, SNV/indel detection, local assembly, Biology (General), QH301-705.5
- Abstract
With the rapid development of short-read sequencing technologies, many population-scale resequencing studies have been carried out in recent years to study the associations between human genome variants and various phenotypes. Variant calling is one of the core bioinformatics tasks in such studies to comprehensively discover genomic variants in sequenced samples. Many efforts have been made to develop short read-based variant calling approaches; however, state-of-the-art tools are still computationally expensive. Meanwhile, cutting-edge genomics studies also have higher requirements on the yields of variant calling. Herein, we propose Partial-Order Alignment-based single nucleotide variant (SNV) and Indel caller (Psi-caller), a lightweight variant calling algorithm that simultaneously achieves high performance and yield. Psi-caller divides candidate variant sites into three categories according to the complexity and location of the signatures and employs a binomial model, partial-order alignment, and de Bruijn graph-based local assembly to handle the respective categories when calling and genotyping SNVs/Indels. Benchmarks on simulated and real short-read sequencing data sets demonstrate that Psi-caller is times faster than state-of-the-art tools with higher or equal sensitivity and accuracy. It has the potential to handle large-scale data sets well in cutting-edge genomics studies.
- Published
- 2021
- Full Text
- View/download PDF
9. DeepGP: An Integrated Deep Learning Method for Endocrine Disease Gene Prediction Using Omics Data
- Author
-
Ningyi Zhang, Haoyan Wang, Chen Xu, Liyuan Zhang, and Tianyi Zang
- Subjects
endocrine disease, Graves’ disease, T2DM, PCOS, T1DM, IGF-I, Biology (General), QH301-705.5
- Abstract
Endocrinology is the study of hormones and their actions. Hormones are chemical messengers, released into the blood, that act through receptors to influence the target cell. The capacity of the mammalian organism to perform as a whole unit is made possible by two principal control mechanisms, the nervous system and the endocrine system. The endocrine system is essential in regulating growth and development, tissue function, metabolism, and reproductive processes. Diabetes mellitus, Graves’ disease, polycystic ovary syndrome, and insulin-like growth factor I deficiency (IGFI deficiency) are classical endocrine diseases. Endocrine dysfunction is also an increasing factor of morbidity in cancer and other dangerous diseases in humans. Thus, it is essential to understand these diseases at the genetic level in order to recognize more pathogenic genes and to better understand the pathologies of endocrine diseases. In this study, we propose a deep learning method named DeepGP, based on a graph convolutional network and a convolutional neural network, for prioritizing susceptible genes of five endocrine diseases. To test the performance of our method, we performed 10-fold cross-validation on an integrated reported dataset; DeepGP obtained an area under the curve of ∼83% and an area under the precision-recall curve of ∼65%. We found that type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM) share most of their associated genes; therefore, more attention should be paid to the remaining genes related to T1DM and T2DM, respectively, which could help in understanding the pathogenesis and pathologies of these diseases.
- Published
- 2021
- Full Text
- View/download PDF
10. Identifying Alzheimer’s disease-related proteins by LRRGD
- Author
-
Tianyi Zhao, Yang Hu, Tianyi Zang, and Liang Cheng
- Subjects
Alzheimer’s disease, Proteins, Similarity of diseases, Logistic regression, Gradient descent, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Alzheimer’s disease (AD) imposes a heavy burden on society and every family. Therefore, diagnosing AD in advance and discovering new drug targets are crucial, and both could be achieved by identifying AD-related proteins. Because biological experiments are time-consuming and costly, researchers have turned to developing more advanced algorithms to identify AD-related proteins. Results: First, we proposed the hypothesis that “similar diseases share similar related proteins”. Five similarity calculation methods are therefore introduced to find other diseases that are similar to AD. Then, these diseases’ related proteins are obtained from a public data set. Finally, these proteins serve as features of each disease and are used to map their similarity to AD. We developed a novel method, ‘LRRGD’, which combines Logistic Regression (LR) and Gradient Descent (GD) and borrows the idea of Random Forest (RF). LR is introduced to regress features to similarities. Borrowing the idea of RF, hundreds of LR models are built by randomly selecting 40 features (proteins) each time. GD is introduced to find the optimal result. To avoid getting trapped in a local optimum, a good initial value is selected using some known AD-related proteins. Finally, 376 proteins are found to be related to AD. Conclusion: Of the 376 proteins, 308 are novel. Three case studies are carried out to demonstrate our method’s effectiveness. These 308 proteins could give researchers a basis for biological experiments to help treat and diagnose AD.
- Published
- 2019
- Full Text
- View/download PDF
11. Prioritizing candidate diseases-related metabolites based on literature and functional similarity
- Author
-
Yongtian Wang, Liran Juan, Jiajie Peng, Tianyi Zang, and Yadong Wang
- Subjects
Metabolite network, Collaborative filtering, Similarity of metabolites, Random walking with restart, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: As the terminal products of cellular regulatory processes, functionally related metabolites have a close relationship with complex diseases and are often associated with the same or similar diseases. Therefore, the identification of disease-related metabolites plays a critical role in comprehensively understanding the pathogenesis of disease, with the aim of improving clinical medicine. Considering that a large number of metabolic markers of diseases need to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships and literature-derived scores between metabolites. First, after obtaining associations between metabolites and diseases from the Human Metabolome database, we calculate the similarities of metabolites with a modified collaborative filtering recommendation strategy that uses the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with similarities between metabolites as weights. To improve the ability to identify disease-related metabolites, we introduce text-mining scores from an existing database of chemicals and proteins into DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and literature scores. Finally, we use random walking with restart (RWR) in this network to predict candidate metabolites related to diseases. Results: We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome database and assess the performance of DMN and FLDMN on 19 diseases, respectively. As a result, the average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN is shown to be successful in predicting potential metabolic signatures for 19 diseases, with an average AUC value of 76.03%. Conclusion: In this paper, a computational model is proposed for exploring metabolite-disease pairs; adequate validation shows good performance in predicting potential disease-related metabolites. This result suggests that integrating literature and functional associations can be an effective way to construct a disease-associated metabolite network for prioritizing candidate disease-related metabolites.
- Published
- 2019
- Full Text
- View/download PDF
12. Human mitochondrial genome compression using machine learning techniques
- Author
-
Rongjie Wang, Tianyi Zang, and Yadong Wang
- Subjects
Compression, Human mitochondrial genomes, Machine learning, Medicine, Genetics, QH426-470
- Abstract
Background: In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. Results: In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on the human mitochondrial genome data. Conclusions: The compression method we propose can be classified as a non-reference-based method, but its compression effect is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy but also on single genomes with small redundancy. The code of DeepDNA is available at https://github.com/rongjiewang/DeepDNA.
- Published
- 2019
- Full Text
- View/download PDF
13. Enhancement and Imputation of Peak Signal Enables Accurate Cell-Type Classification in scATAC-seq
- Author
-
Zhe Cui, Ya Cui, Yan Gao, Tao Jiang, Tianyi Zang, and Yadong Wang
- Subjects
scATAC-seq, classification, machine learning, support vector machine, cell-type annotation, Genetics, QH426-470
- Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has been widely used in profiling genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets in different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.
- Published
- 2021
- Full Text
- View/download PDF
14. Identifying Protein Biomarkers in Blood for Alzheimer's Disease
- Author
-
Tianyi Zhao, Yang Hu, Tianyi Zang, and Yadong Wang
- Subjects
Alzheimer's disease, similarity of diseases, protein, minimum angle regression, elastic network, Biology (General), QH301-705.5
- Abstract
Background: At present, the main diagnostic methods for Alzheimer's disease (AD) are positron emission tomography (PET) scanning of the brain and analysis of cerebrospinal fluid (CSF) samples, but these methods are expensive and harmful to patients. Recently, more researchers have focused on diagnosing AD by detecting biomarkers in blood, which is cheaper and harmless. Therefore, identifying AD-related proteins in blood can help treatment and diagnosis. Methods: We proposed a hypothesis that similar diseases share similar proteins. Diseases with similar symptoms are caused by abnormalities of similar proteins. Assuming that the similarities between AD and other diseases obey the normal distribution, we developed an iterative method based on disease similarity (IBDS). We combined Elastic Network (EN) with Minimum angle regression (MAR) to find the optimal solution. Finally, we used case studies and Summary data-based Mendelian Randomization (SMR) to verify our method. Results: We selected 39 diseases which are highly related to AD. They correspond to 1,481 kinds of proteins. One hundred and eighty-four proteins are reported to be related to AD in Uniprot, and the number would be 284 with our method. The AUC of our method by cross-validation is 0.9251, which is much higher than that of previous methods. Conclusion: In this paper, we presented a novel method for prioritizing AD-related proteins. Seven of these 284 proteins have tissue specificity in blood and could be used to diagnose AD in the future. Case studies and SMR have been used to support the relationship between these 7 proteins and AD. Availability and Implementation: https://github.com/zty2009/Identifying-Protein-Biomarkers-in-Blood-for-Alzheimer-s-Disease
- Published
- 2020
- Full Text
- View/download PDF
15. Identifying diseases-related metabolites using random walk
- Author
-
Yang Hu, Tianyi Zhao, Ningyi Zhang, Tianyi Zang, Jun Zhang, and Liang Cheng
- Subjects
Metabolites, Similarity of diseases, Similarity of metabolites, Random walk, InfDisSim, MISIM, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Metabolites disrupted by an abnormal state of the human body are regarded as the effect of diseases. Compared with causes of disease such as genes, these markers are easier to capture for the prevention and diagnosis of metabolic diseases. Currently, a large number of metabolic markers of diseases remain to be explored, which motivates this work. Methods: The existing metabolite-disease associations were extracted from the Human Metabolome Database (HMDB) using the text mining tool NCBO annotator as a priori knowledge. Next, we calculated the similarity of each pair of metabolites based on the similarity of their disease sets. Then, all the similarities of metabolite pairs were used to construct a weighted metabolite association network (WMAN). Subsequently, the network was used to predict novel metabolic markers of diseases using random walk. Results: In total, 604 metabolites and 228 diseases were extracted from HMDB. From the 604 metabolites, 453 are selected to construct the WMAN, where each metabolite is treated as a node and the similarity of two metabolites as the weight of the edge linking them. The performance of the network is validated using the leave-one-out method. As a result, a high area under the receiver operating characteristic curve (AUC) of 0.7048 is achieved. Further case studies identifying novel metabolites of diabetes mellitus were validated against recent studies. Conclusion: In this paper, we presented a novel method for prioritizing metabolite-disease pairs. The superior performance validates its reliability for exploring novel metabolic markers of diseases.
- Published
- 2018
- Full Text
- View/download PDF
16. Prediction for High Risk Clinical Symptoms of Epilepsy Based on Deep Learning Algorithm
- Author
-
Mingrui Sun, Fuxu Wang, Tengfei Min, Tianyi Zang, and Yadong Wang
- Subjects
Epilepsy, prediction algorithms, machine learning, neural networks, clinical diagnosis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Accurate forecasting of high-risk clinical symptoms, like epileptic seizures, has the potential to transform clinical epilepsy care and to create new therapeutic strategies for individuals in clinical decision support systems. With the development of pervasive sensor technologies, physiological signals can be captured continuously to prevent the serious outcomes caused by epilepsy. However, progress on seizure prediction has been hindered by the lack of an automatic early warning system. Existing research classifies electroencephalograph (EEG) clips and distinguishes the clips of onset epileptic seizures. Deep learning is a promising method to analyze large-scale unlabeled data and to widely spread clinical treatment and risk prediction. In this paper, we outline a patient-specific method for extracting frequency-domain and time-series data features based on two-layer convolutional neural networks (CNNs). A data preprocessing method based on the discrete Fourier transform is proposed to convert the time-domain signal of the EEG data to the frequency-domain signal. Long short-term memory networks are introduced in seizure prediction using pre-seizure clips of the EEG dataset, expanding the use of deep learning algorithms with recurrent neural networks (RNNs). Furthermore, the proposed CNN and RNN are compared with traditional machine learning algorithms, such as linear discriminant analysis and logistic regression, with the area under the curve as the evaluation criterion. The extensive experimental results demonstrate that our method can effectively extract latent features with meaningful interpretation and exhibits excellent performance in predicting epileptic preictal state changes, and hence is an effective method for detecting epileptic seizures.
- Published
- 2018
- Full Text
- View/download PDF
17. Peptide-Major Histocompatibility Complex Class I Binding Prediction Based on Deep Learning With Novel Feature
- Author
-
Tianyi Zhao, Liang Cheng, Tianyi Zang, and Yang Hu
- Subjects
peptide-major histocompatibility complex class I binding prediction, deep learning, convolutional neural network, epitope prediction, human leukocyte antigen, Genetics, QH426-470
- Abstract
Peptide-based vaccine development needs accurate prediction of the binding affinity between major histocompatibility complex I (MHC I) proteins and their peptide ligands. A growing number of machine learning methods have been developed to predict binding affinity, and some of them have become popular tools. However, most of them are built on shallow neural networks. According to Bengio, deep neural networks can learn better fits with less data than shallow neural networks. In our case, some of the alleles only have dozens of peptide data. In addition, we transform each peptide into a characteristic matrix and input it into the model. When the input is a matrix, a convolutional neural network (CNN) can find the most critical features by itself. Compared with the traditional neural network model, CNN is therefore more suitable for predicting binding affinity. Unlike previous studies, which are based on the blocks substitution matrix (BLOSUM), we used novel features for the prediction. Since we consider that the order of the sequence, hydropathy index, polarity and the length of the peptide could affect the binding affinity, and that the properties of these amino acids are key factors for their binding to MHC, we extracted this information from each peptide. To make full use of the data we obtained, we integrated peptides of different lengths into 15mers based on the binding mode of peptides to MHC I. To demonstrate that our method is reliable for predicting peptide-MHC binding, we compared it with several popular methods. The experiments show the superiority of our method.
- Published
- 2019
- Full Text
- View/download PDF
18. Integrate GWAS, eQTL, and mQTL Data to Identify Alzheimer’s Disease-Related Genes
- Author
-
Tianyi Zhao, Yang Hu, Tianyi Zang, and Yadong Wang
- Subjects
Alzheimer’s disease, Mendelian randomization, GWAS, eQTL, mQTL, Genetics, QH426-470
- Abstract
It is estimated that related genes account for nearly 70% of the risk of Alzheimer’s disease (AD). Identifying candidate causal genes can help treatment and diagnosis. The maturity of sequencing technology and the reduction in cost have made genome-wide association studies (GWAS) an important means of finding disease-related mutation sites. Because of linkage disequilibrium (LD), neither the gene regulated by a SNP nor the specific SNP can be determined. Because GWAS is affected by sample size and interaction, we introduced empirical Bayes (EB) to perform a meta-analysis of GWAS and largely eliminate the bias caused by sampling and SNP interaction. In addition, most SNPs are in noncoding regions, so it is not clear how they relate to phenotype. In this paper, expression quantitative trait locus (eQTL) studies and methylation quantitative trait locus (mQTL) studies are combined with GWAS to find the genes associated with Alzheimer’s disease at the expression level through pleiotropy. Summary data-based Mendelian randomization (SMR) is introduced to integrate GWAS and eQTL/mQTL data. We prioritized 274 significant SNPs belonging to 20 genes by eQTL analysis and 379 significant SNPs belonging to seven known genes by mQTL analysis. Among them, 93 SNPs and 2 genes overlap. Finally, we conducted 10 case studies to demonstrate the effectiveness of our method.
- Published
- 2019
- Full Text
- View/download PDF
19. Predicting circRNA-Disease Associations Based on circRNA Expression Similarity and Functional Similarity
- Author
-
Yongtian Wang, Chenxi Nie, Tianyi Zang, and Yadong Wang
- Subjects
circRNA ,disease ,circRNA expression similarity ,circRNA functional similarity ,PersonalRank ,Genetics ,QH426-470 - Abstract
Circular RNAs (circRNAs) are a novel class of endogenous noncoding RNAs that have well-conserved sequences. Emerging evidence has shown that circRNAs can be novel biomarkers or therapeutic targets for many diseases and play an important role in the development of various pathological conditions. Therefore, identifying potential disease-related circRNAs is helpful in improving the efficiency of finding therapeutic targets for diseases. Here, we propose a computational model (PreCDA) to predict potential circRNA–disease associations. First, we calculated the circRNA expression similarity based on circRNA expression profiles. The circRNA functional similarity is calculated based on cosine similarity, and the disease similarity is used as the dimension of each circRNA vector. The associations between circRNAs and diseases are defined based on the circRNA functional similarity and expression similarity. We constructed a disease-related circRNA association network and used a graph-based recommendation algorithm (PersonalRank) to sort candidate disease-related circRNAs. As a result, PreCDA has an average area under the receiver operating characteristic curve value of 78.15% in predicting candidate disease-related circRNAs. In addition, we discuss the factors that affect the performance of this method and find some unknown circRNAs related to diseases, with several common diseases used as case studies. These results show that PreCDA has good performance in predicting potential circRNA–disease associations and is helpful for the diagnosis and treatment of human diseases.
- Published
- 2019
- Full Text
- View/download PDF
20. Identification of Alzheimer's Disease-Related Genes Based on Data Integration Method
- Author
-
Yang Hu, Tianyi Zhao, Tianyi Zang, Ying Zhang, and Liang Cheng
- Subjects
Alzheimer disease ,SNPs ,mendelian randomization ,GWAS ,eQTL ,Genetics ,QH426-470 - Abstract
Alzheimer disease (AD) is the fourth major cause of death in the elderly, following cancer, heart disease, and cerebrovascular disease. Finding candidate causal genes can help in the design of gene-targeted drugs and effectively reduce the risk of the disease. Complex diseases such as AD are usually caused by multiple genes. Genome-wide association studies (GWAS) have identified potential genetic variants for most diseases. However, because of linkage disequilibrium (LD), it is difficult to identify the causative mutations that directly cause diseases. In this study, we combined expression quantitative trait locus (eQTL) studies with GWAS to comprehensively define the genes that cause Alzheimer disease. The method used was summary Mendelian randomization (SMR), a novel method for integrating summarized data. Two GWAS studies and five eQTL studies were referenced in this paper. We found several candidate SNPs that have a strong relationship with AD. Most of these SNPs overlap in different data sets, providing relatively strong reliability. We also explain the function of the novel AD-related genes we have discovered.
- Published
- 2019
- Full Text
- View/download PDF
21. BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs
- Author
-
Rongjie Wang, Junyi Li, Yang Bai, Tianyi Zang, and Yadong Wang
- Subjects
Compression ,Bucket-based ,Next-generation sequencing ,Dynamic de Bruijn graph ,Medicine ,Biology (General) ,QH301-705.5 - Abstract
Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools for saving storage space. However, effective and efficient data compression for genome sequencing data has remained an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs based on the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, which effectively reduces storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG.
- Published
- 2018
- Full Text
- View/download PDF
22. TetraCVD: A Temporal-Textual Transformer based Model for Cardiovascular Disease Diagnosis.
- Author
-
Kailong Lu, Fei Zhao, Penghuan Gu, Haoyan Wang, Tianyi Zang, and Hong Wang
- Published
- 2023
- Full Text
- View/download PDF
23. MTOR hypermethylation may associate with the susceptibility and survival of SARS-CoV-2 infections to lung adenocarcinoma patients based on multi-omics data and machine learning.
- Author
-
Yu Guo, Minghao Li, Yang Hu 0008, and Tianyi Zang
- Published
- 2022
- Full Text
- View/download PDF
24. Comparison of the Nanopore and PacBio sequencing technologies for DNA 5-methylcytosine detection.
- Author
-
Yadong Liu, Zhongyu Liu, Tao Jiang 0021, Tianyi Zang, and Yadong Wang
- Published
- 2022
- Full Text
- View/download PDF
25. prePathCluster: An novel deep-learning based method for endocrine disease pathway analysis.
- Author
-
Ningyi Zhang and Tianyi Zang
- Published
- 2021
- Full Text
- View/download PDF
26. NCRR: A novel method for measuring disease similarity based on non-coding RNA regulation.
- Author
-
Ningyi Zhang, Liran Juan, and Tianyi Zang
- Published
- 2020
- Full Text
- View/download PDF
27. CNN-DDI: A novel deep learning method for predicting drug-drug interactions.
- Author
-
Chengcheng Zhang and Tianyi Zang
- Published
- 2020
- Full Text
- View/download PDF
28. Assessment of Machine Learning Methods for Classification in Single Cell ATAC-seq.
- Author
-
Zhe Cui, Bo Liu 0023, Liran Juan, Tianyi Zang, Tao Jiang 0021, and Yadong Wang
- Published
- 2020
- Full Text
- View/download PDF
29. Identification of anticancer peptides based on Random Relevance Vector Machines.
- Author
-
Tianyi Zhao, Cheng Liang, Tianyi Zang, and Yang Hu 0008
- Published
- 2019
- Full Text
- View/download PDF
30. Identifying Candidate Diseases-related Metabolites Based on Disease Similarity.
- Author
-
Yongtian Wang, Liran Juan, Chunpu Liu, Tianyi Zang, and Yadong Wang
- Published
- 2018
- Full Text
- View/download PDF
31. DeepDNA: a hybrid convolutional and recurrent neural network for compressing human mitochondrial genomes.
- Author
-
Rongjie Wang, Yang Bai, Yan-Shuo Chu, Zhenxing Wang, Yongtian Wang, Mingrui Sun, Junyi Li, Tianyi Zang, and Yadong Wang
- Published
- 2018
- Full Text
- View/download PDF
32. A Novel Method for Identifying Alzheimer's Disease-related Proteins.
- Author
-
Yang Hu 0008, Jun Zhang 0041, Tianyi Zhao, Liang Cheng 0006, and Tianyi Zang
- Published
- 2018
- Full Text
- View/download PDF
33. Analysis for Early Seizure Detection System Based on Deep Learning Algorithm.
- Author
-
Fuxu Wang, Mingrui Sun, Tengfei Min, Yueying Wang, Chunpu Liu, Tianyi Zang, and Yadong Wang
- Published
- 2018
- Full Text
- View/download PDF
34. Predicting candidate disease-related lncRNAs based on network random walk.
- Author
-
Yongtian Wang, Liran Juan, Jiajie Peng, Tianyi Zang, and Yadong Wang
- Published
- 2018
- Full Text
- View/download PDF
35. Maternal GLP-1 receptor activation inhibits fetal growth.
- Author
-
Liping Qiao, Cindy Lu, Tianyi Zang, Brianna Dzyuba, and Jianhua Shao
- Subjects
GLUCAGON-like peptide 1 ,ADIPOSE tissues ,FAT ,GLUCOSE transporters ,INSULIN ,BLOOD sugar ,FOOD consumption - Abstract
Glucagon-like peptide 1 (GLP-1) regulates food intake, insulin production, and metabolism. Our recent study demonstrated that pancreatic α-cells-secreted (intraislet) GLP-1 effectively promotes maternal insulin secretion and metabolic adaptation during pregnancy. However, the role of circulating GLP-1 in maternal energy metabolism remains largely unknown. Our study aims to investigate systemic GLP-1 response to pregnancy and its regulatory effect on fetal growth. Using C57BL/6 mice, we observed a gradual decline in maternal blood GLP-1 concentrations. Subsequent administration of the GLP-1 receptor agonist semaglutide (Sem) to dams in late pregnancy revealed a modest decrease in maternal food intake during initial treatment. At the same time, no significant alterations were observed in maternal body weight or fat mass. Notably, Sem-treated dams exhibited a significant decrease in fetal body weight, which persisted even following the restoration of maternal blood glucose levels. Despite no observable change in placental weight, a marked reduction in the placenta labyrinth area from Sem-treated dams was evident. Our investigation further demonstrated a substantial decrease in the expression levels of various pivotal nutrient transporters within the placenta, including glucose transporter one and sodium-neutral amino acid transporter one, after Sem treatment. In addition, Sem injection led to a notable reduction in the capillary area, number, and surface densities within the labyrinth. These findings underscore the crucial role of modulating circulating GLP-1 levels in maternal adaptation, emphasizing the inhibitory effects of excessive GLP-1 receptor activation on both placental development and fetal growth. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. A bucket index correction based method for compression of genomic sequencing data.
- Author
-
Rongjie Wang, Yang Bai, Qianlong Cheng, Tianyi Zang, and Yadong Wang
- Published
- 2017
- Full Text
- View/download PDF
37. FNSemSim: An improved disease similarity method based on network fusion.
- Author
-
Yongtian Wang, Liran Juan, Yan-Shuo Chu, Rongjie Wang, Tianyi Zang, and Yadong Wang
- Published
- 2017
- Full Text
- View/download PDF
38. DMcompress: Dynamic Markov models for bacterial genome compression.
- Author
-
Rongjie Wang, Mingxiang Teng, Yang Bai, Tianyi Zang, and Yadong Wang
- Published
- 2016
- Full Text
- View/download PDF
39. Integration of multiple-omics data to reveal the shared genetic architecture of educational attainment, intelligence, cognitive performance, and Alzheimer's disease.
- Author
-
Fuxu Wang, Haoyan Wang, Ye Yuan, Bing Han, Shizheng Qiu, Yang Hu, and Tianyi Zang
- Subjects
ALZHEIMER'S disease ,COGNITIVE ability ,EDUCATIONAL attainment ,GENOME-wide association studies ,GENETIC correlations ,FALSE discovery rate ,RANDOMIZATION (Statistics) - Abstract
Growing evidence suggests the effect of educational attainment (EA) on Alzheimer's disease (AD), but less is known about the shared genetic architecture between them. Here, leveraging genome-wide association studies (GWAS) for AD (N = 21,982/41,944), EA (N = 1,131,881), cognitive performance (N = 257,828), and intelligence (N = 78,308), we investigated their causal association with the linkage disequilibrium score (LDSC) and Mendelian randomization and their shared loci with the conjunctional false discovery rate (conjFDR), transcriptome-wide association studies (TWAS), and colocalization. We observed significant genetic correlations of EA (rg = -0.22, p = 5.07E-05), cognitive performance (rg = -0.27, p = 2.44E-05), and intelligence (rg = -0.30, p = 3.00E-04) with AD, and a causal relationship between EA and AD (OR = 0.74, 95% CI: 0.58-0.94, p = 0.013). We identified 13 shared loci at conjFDR < 0.01, of which five were novel, and prioritized three causal genes. These findings inform early prevention strategies for AD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. SS4CSHC: A Services System for the Collaboration in Stroke Healthcare Cycle.
- Author
-
Mingrui Sun, Di Dai, Xiaowei Wu, Shiquan Wang, Tianyi Zang, and Xiaofei Xu
- Published
- 2015
- Full Text
- View/download PDF
41. OGSA-Based SOA for Collaborative Cancer Research: System Modeling and Generation.
- Author
-
Tianyi Zang, Radu Calinescu, and Marta Kwiatkowska
- Published
- 2011
- Full Text
- View/download PDF
42. Cooperative Work Systems for the Security of Digital Computing Infrastructure.
- Author
-
Tianning Zang, Xiao-chun Yun, Tianyi Zang, Yongzheng Zhang 0002, and Chaoguang Men
- Published
- 2010
- Full Text
- View/download PDF
43. Comprehensive Practice on Service Engineering: An Experimental Solution.
- Author
-
Zhongjie Wang, Xiaofei Xu, Dian-Hui Chu, Lanshun Nie, Tianyi Zang, and Xiaofeng Liu
- Published
- 2010
- Full Text
- View/download PDF
44. WSRF-Based Modeling of Clinical Trial Information for Collaborative Cancer Research.
- Author
-
Tianyi Zang, Radu Calinescu, Steve Harris, Andrew Tsui, Marta Z. Kwiatkowska, Jeremy Gibbons, Jim Davies, Peter Maccallum, and Carlos Caldas
- Published
- 2008
- Full Text
- View/download PDF
45. GRASG - A Framework for 'Gridifying' and Running Applications on Service-Oriented Grids.
- Author
-
Quoc-Thuan Ho, Terence Hung, Wei Jie, Hoong-Maeng Chan, Emilda Sindhu, Subramaniam Ganesan, Tianyi Zang, and Xiaorong Li
- Published
- 2006
- Full Text
- View/download PDF
46. A Grid Computing Infrastructure Based on OGSA.
- Author
-
Zhou Lei, Tianyi Zang, Wei Jie, Wentong Cai 0001, and Lizhe Wang 0001
- Published
- 2003
47. A Web Services Based Grid Performance Data Management Framework.
- Author
-
Tianyi Zang, Zhou Lei, Wei Jie, Wentong Cai 0001, and Lizhe Wang 0001
- Published
- 2003
48. MHCRoBERTa: pan-specific peptide–MHC class I binding prediction through transfer learning with label-agnostic protein sequences
- Author
-
Fuxu Wang, Haoyan Wang, Lizhuang Wang, Haoyu Lu, Shizheng Qiu, Tianyi Zang, Xinjun Zhang, and Yang Hu
- Subjects
Machine Learning ,Histocompatibility Antigens Class I ,Amino Acid Sequence ,Peptides ,Molecular Biology ,Algorithms ,Protein Binding ,Information Systems - Abstract
Predicting the binding of peptides to the major histocompatibility complex (MHC) plays a vital role in cancer immunotherapy. The success of AlphaFold in applying natural language processing (NLP) algorithms to protein secondary structure prediction has inspired us to explore the potential of NLP methods for predicting peptide–MHC class I binding. Based on these motivations, we propose the MHCRoBERTa method, a RoBERTa pre-training approach, for predicting the binding affinity between type I MHC and peptides. Analysis of the results on a benchmark dataset demonstrates that MHCRoBERTa can outperform other state-of-the-art prediction methods with an increase in the Spearman rank correlation coefficient (SRCC) value. Notably, our model gave a significant improvement on IC50 value. Our method achieved SRCC and AUC values of 0.785 and 0.817, respectively. Our SRCC value is 14.3% higher than that of NetMHCpan3.0 (the second-highest SRCC value among pan-specific methods) and 3% higher than that of MHCflurry (the second-highest SRCC value among all methods). The AUC value is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention for the token representation across the layers and heads of this method. Through the analysis of the representation of each layer and head, we can show whether the model has learned the syntax and semantics necessary to perform the prediction task well. All these results demonstrate that our model can accurately predict the peptide–MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open-source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
- Published
- 2022
49. A Review of Drug Side Effect Identification Methods
- Author
-
Shuai Deng, Tianyi Zang, Tianyi Zhao, Yige Sun, and Yang Hu
- Subjects
Pharmacology ,Drug ,Identification methods ,0303 health sciences ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Computer science ,Machine Learning ,Clinical trial ,03 medical and health sciences ,0302 clinical medicine ,Risk analysis (engineering) ,Drug Discovery ,Data Mining ,Humans ,Drug side effects ,030212 general & internal medicine ,Drug reaction ,030304 developmental biology - Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. There are two main factors in the frequent occurrence of drug safety problems; on the one hand, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions, while on the other hand, due to the long-term period and complexity of clinical trials, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and common databases in this field. We classified methods of identifying side effects into four categories: biological experimental, machine learning, text mining and network methods. We point out the key points of each kind of method. In addition, we also explain the advantages and disadvantages of each method. Finally, we propose future research directions.
- Published
- 2020
50. The Essential Role of Pancreatic α-Cells in Maternal Metabolic Adaptation to Pregnancy
- Author
-
Liping Qiao, Sarah Saget, Cindy Lu, Tianyi Zang, Brianna Dzyuba, William W. Hay, and Jianhua Shao
- Subjects
Endocrinology, Diabetes and Metabolism ,Knockout ,1.1 Normal biological development and functioning ,Reproductive health and childbirth ,Inbred C57BL ,Medical and Health Sciences ,Glucagon-Like Peptide-1 Receptor ,Endocrinology & Metabolism ,Islets of Langerhans ,Mice ,Underpinning research ,Glucagon-Like Peptide 1 ,Pregnancy ,Receptors ,Internal Medicine ,Receptors, Glucagon ,2.1 Biological and endogenous factors ,Animals ,Insulin ,Aetiology ,Metabolic and endocrine ,Nutrition ,Mice, Knockout ,Diabetes ,Glucagon ,Mice, Inbred C57BL ,Glucose ,Islet Studies ,Glucagon-Secreting Cells ,Female - Abstract
Pancreatic α-cells are important in maintaining metabolic homeostasis, but their role in regulating maternal metabolic adaptations to pregnancy has not been studied. The objective of this study was to determine whether pancreatic α-cells respond to pregnancy and their contribution to maternal metabolic adaptation. Using C57BL/6 mice, our study showed that pregnancy induced a significant increase of α-cell mass by promoting α-cell proliferation that was associated with a transitory increase of maternal serum glucagon concentration in early pregnancy. Maternal pancreatic GLP-1 content also was significantly increased during pregnancy. Using the inducible Cre/loxp technique, we ablated the α-cells (α-null) before and during pregnancy while maintaining enteroendocrine L-cells and serum GLP-1 in the normal range. In contrast to an improved glucose tolerance test (GTT) before pregnancy, significantly impaired GTT and remarkably higher serum glucose concentrations in the fed state were observed in α-null dams. Glucagon receptor antagonism treatment, however, did not affect measures of maternal glucose metabolism, indicating a dispensable role of glucagon receptor signaling in maternal glucose homeostasis. However, the GLP-1 receptor agonist improved insulin production and glucose metabolism of α-null dams. Furthermore, GLP-1 receptor antagonist Exendin (9-39) attenuated pregnancy-enhanced insulin secretion and GLP-1 restored glucose-induced insulin secretion of cultured islets from α-null dams. Together, these results demonstrate that α-cells play an essential role in controlling maternal metabolic adaptation to pregnancy by enhancing insulin secretion.
- Published
- 2022
Discovery Service for Jio Institute Digital Library