17 results on '"Tianyi Zang"'
Search Results
2. Explore potential disease related metabolites based on latent factor model
- Author
- Yongtian Wang, Liran Juan, Jiajie Peng, Tao Wang, Tianyi Zang, and Yadong Wang
- Subjects
Metabolite; Disease similarity; Disease diagnosis; Matrix decomposition; Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: In biological systems, metabolomics not only contributes to the discovery of metabolic signatures for disease diagnosis but also helps illustrate the underlying molecular disease-causing mechanisms. Therefore, identification of disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. Results: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify potential associations between metabolites and diseases based on a latent factor model. We build a disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases based on the matrix decomposition method. In total, 1,406 direct associations between diseases and metabolites are found. There are 119,206 unknown associations between diseases and metabolites predicted with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is shown to be successful in predicting potential metabolic signatures for human diseases, with an average AUC value of 82.33%. Conclusion: In this paper, a computational model is proposed for exploring metabolite-disease pairs; adequate validation shows that it performs well in predicting potential metabolites related to diseases. The results show that DLMPM performs better in prioritizing candidate disease-related metabolites than previous methods and would help researchers reveal more information about human diseases.
- Published
- 2022
3. deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index
- Author
- Bo Liu, Yadong Liu, Junyi Li, Hongzhe Guo, Tianyi Zang, and Yadong Wang
- Subjects
Long read alignment; RNA-seq; de Bruijn graph-based index; Biology (General); QH301-705.5; Genetics; QH426-470
- Abstract
The alignment of long-read RNA sequencing reads is non-trivial due to high sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass alignment approach, which constructs graph-based alignment skeletons to infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and sequencing errors, which break through bottlenecks of long RNA-seq read alignment. Benchmarks demonstrate that deSALT has a greater ability to produce accurate and homogeneous full-length alignments. deSALT is available at: https://github.com/hitbc/deSALT.
- Published
- 2019
4. Human mitochondrial genome compression using machine learning techniques
- Author
- Rongjie Wang, Tianyi Zang, and Yadong Wang
- Subjects
Compression; Human mitochondrial genomes; Machine learning; Medicine; Genetics; QH426-470
- Abstract
Background: In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. Results: In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on human mitochondrial genome data. Conclusions: The compression method we propose can be classified as a non-reference-based method, but its compression effect is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy but also on single genomes with small redundancy. The code of DeepDNA is available at https://github.com/rongjiewang/DeepDNA.
- Published
- 2019
5. Enhancement and Imputation of Peak Signal Enables Accurate Cell-Type Classification in scATAC-seq
- Author
- Zhe Cui, Ya Cui, Yan Gao, Tao Jiang, Tianyi Zang, and Yadong Wang
- Subjects
scATAC-seq; classification; machine learning; support vector machine; cell-type annotation; Genetics; QH426-470
- Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets from different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.
- Published
- 2021
6. Peptide-Major Histocompatibility Complex Class I Binding Prediction Based on Deep Learning With Novel Feature
- Author
- Tianyi Zhao, Liang Cheng, Tianyi Zang, and Yang Hu
- Subjects
peptide-major histocompatibility complex class I binding prediction; deep learning; convolutional neural network; epitope prediction; human leukocyte antigen; Genetics; QH426-470
- Abstract
Peptide-based vaccine development needs accurate prediction of the binding affinity between major histocompatibility complex I (MHC I) proteins and their peptide ligands. More and more machine learning methods have been developed to predict binding affinity, and some of them have become popular tools. However, most of them are built on shallow neural networks. Bengio said that deep neural networks can learn better fits with less data than shallow neural networks. In our case, some of the alleles have only dozens of peptide data points. In addition, we transform each peptide into a characteristic matrix and input it into the model. When the input is a matrix, a convolutional neural network (CNN) can find the most critical features by itself, so compared with the traditional neural network model, a CNN is more suitable for predicting binding affinity. Unlike previous studies, which are based on the blocks substitution matrix (BLOSUM), we used novel features for the prediction. Because the order of the sequence, hydropathy index, polarity, and length of the peptide could affect the binding affinity, and the properties of the amino acids are key factors for their binding to MHC, we extracted this information from each peptide. To make full use of the data we obtained, we integrated peptides of different lengths into 15-mers based on the binding mode of peptide to MHC I. To demonstrate that our method is reliable for predicting peptide-MHC binding, we compared it with several popular methods. The experiments show the superiority of our method.
- Published
- 2019
7. Integrate GWAS, eQTL, and mQTL Data to Identify Alzheimer’s Disease-Related Genes
- Author
- Tianyi Zhao, Yang Hu, Tianyi Zang, and Yadong Wang
- Subjects
Alzheimer’s disease; Mendelian randomization; GWAS; eQTL; mQTL; Genetics; QH426-470
- Abstract
It is estimated that the impact of related genes on the risk of Alzheimer’s disease (AD) is nearly 70%. Identifying candidate causal genes can help treatment and diagnosis. The maturity of sequencing technology and the reduction in cost have made the genome-wide association study (GWAS) an important means of finding disease-related mutation sites. Because of linkage disequilibrium (LD), neither the gene regulated by an SNP nor the specific SNP can be determined. Because GWAS is affected by sample size and interaction, we introduced empirical Bayes (EB) to perform a meta-analysis of GWAS and largely eliminate the bias caused by the sample and the interaction of SNPs. In addition, most SNPs are in noncoding regions, so it is not clear how they relate to phenotype. In this paper, expression quantitative trait locus (eQTL) studies and methylation quantitative trait locus (mQTL) studies are combined with GWAS to find the genes associated with Alzheimer’s disease at the expression level through pleiotropy. Summary data-based Mendelian randomization (SMR) is introduced to integrate GWAS and eQTL/mQTL data. Finally, we prioritized 274 significant SNPs belonging to 20 genes by eQTL analysis and 379 significant SNPs belonging to seven known genes by mQTL analysis. Among them, 93 SNPs and 2 genes overlap. We also performed 10 case studies to demonstrate the effectiveness of our method.
- Published
- 2019
8. Predicting circRNA-Disease Associations Based on circRNA Expression Similarity and Functional Similarity
- Author
- Yongtian Wang, Chenxi Nie, Tianyi Zang, and Yadong Wang
- Subjects
circRNA; disease; circRNA expression similarity; circRNA functional similarity; PersonalRank; Genetics; QH426-470
- Abstract
Circular RNAs (circRNAs) are a novel class of endogenous noncoding RNAs that have well-conserved sequences. Emerging evidence has shown that circRNAs can be novel biomarkers or therapeutic targets for many diseases and play an important role in the development of various pathological conditions. Therefore, identifying potential disease-related circRNAs is helpful in improving the efficiency of finding therapeutic targets for diseases. Here, we propose a computational model (PreCDA) to predict potential circRNA–disease associations. First, we calculated the circRNA expression similarity based on circRNA expression profiles. The circRNA functional similarity is calculated based on cosine similarity, and the disease similarity is used as the dimension of each circRNA vector. The associations between circRNAs and diseases are defined based on the circRNA functional similarity and expression similarity. We constructed a disease-related circRNA association network and used a graph-based recommendation algorithm (PersonalRank) to sort candidate disease-related circRNAs. As a result, PreCDA has an average area under the receiver operating characteristic curve value of 78.15% in predicting candidate disease-related circRNAs. In addition, we discuss the factors that affect the performance of this method and find some unknown circRNAs related to diseases, with several common diseases used as case studies. These results show that PreCDA has good performance in predicting potential circRNA–disease associations and is helpful for the diagnosis and treatment of human diseases.
- Published
- 2019
9. Identification of Alzheimer's Disease-Related Genes Based on Data Integration Method
- Author
- Yang Hu, Tianyi Zhao, Tianyi Zang, Ying Zhang, and Liang Cheng
- Subjects
Alzheimer disease; SNPs; mendelian randomization; GWAS; eQTL; Genetics; QH426-470
- Abstract
Alzheimer disease (AD) is the fourth major cause of death in the elderly, following cancer, heart disease, and cerebrovascular disease. Finding candidate causal genes can help in the design of gene-targeted drugs and effectively reduce the risk of the disease. Complex diseases such as AD are usually caused by multiple genes. Genome-wide association studies (GWAS) have identified potential genetic variants for most diseases. However, because of linkage disequilibrium (LD), it is difficult to identify the causative mutations that directly cause diseases. In this study, we combined expression quantitative trait locus (eQTL) studies with GWAS to comprehensively define the genes that cause Alzheimer disease. The method used was summary data-based Mendelian randomization (SMR), a novel method for integrating summarized data. Two GWAS and five eQTL studies were referenced in this paper. We found several candidate SNPs that have a strong relationship with AD. Most of these SNPs overlap in different data sets, providing relatively strong reliability. We also explain the function of the novel AD-related genes we have discovered.
- Published
- 2019
10. Enhancement and Imputation of Peak Signal Enables Accurate Cell-Type Classification in scATAC-seq
- Author
- Yadong Wang, Yan Gao, Ya Cui, Tianyi Zang, Tao Jiang, and Zhe Cui
- Subjects
Profiling (computer programming); cell-type annotation; Cell type; Source code; lcsh:QH426-470; Computer science; media_common.quotation_subject; Computational biology; Chromatin; Support vector machine; lcsh:Genetics; machine learning; classification; Benchmark (computing); Genetics; Molecular Medicine; scATAC-seq; support vector machine; Imputation (statistics); Genetics (clinical); Transposase; media_common; Original Research
- Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets from different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.
- Published
- 2021
11. Peptide-Major Histocompatibility Complex Class I Binding Prediction Based on Deep Learning With Novel Feature
- Author
- Yang Hu, Liang Cheng, Tianyi Zhao, and Tianyi Zang
- Subjects
0301 basic medicine; lcsh:QH426-470; Computer science; convolutional neural network; Computational biology; Convolutional neural network; Substitution matrix; peptide-major histocompatibility complex class I binding prediction; epitope prediction; 03 medical and health sciences; 0302 clinical medicine; human leukocyte antigen; MHC class I; Genetics; Feature (machine learning); Hydrophobicity scales; Genetics (clinical); Original Research; biology; Artificial neural network; business.industry; Deep learning; deep learning; BLOSUM; lcsh:Genetics; 030104 developmental biology; 030220 oncology & carcinogenesis; biology.protein; Molecular Medicine; Artificial intelligence; business
- Abstract
Peptide-based vaccine development needs accurate prediction of the binding affinity between major histocompatibility complex I (MHC I) proteins and their peptide ligands. More and more machine learning methods have been developed to predict binding affinity, and some of them have become popular tools. However, most of them are built on shallow neural networks. Bengio said that deep neural networks can learn better fits with less data than shallow neural networks. In our case, some of the alleles have only dozens of peptide data points. In addition, we transform each peptide into a characteristic matrix and input it into the model. When the input is a matrix, a convolutional neural network (CNN) can find the most critical features by itself, so compared with the traditional neural network model, a CNN is more suitable for predicting binding affinity. Unlike previous studies, which are based on the blocks substitution matrix (BLOSUM), we used novel features for the prediction. Because the order of the sequence, hydropathy index, polarity, and length of the peptide could affect the binding affinity, and the properties of the amino acids are key factors for their binding to MHC, we extracted this information from each peptide. To make full use of the data we obtained, we integrated peptides of different lengths into 15-mers based on the binding mode of peptide to MHC I. To demonstrate that our method is reliable for predicting peptide-MHC binding, we compared it with several popular methods. The experiments show the superiority of our method.
- Published
- 2019
12. Integrate GWAS, eQTL, and mQTL Data to Identify Alzheimer’s Disease-Related Genes
- Author
- Yang Hu, Tianyi Zang, Yadong Wang, and Tianyi Zhao
- Subjects
0301 basic medicine; mQTL; Linkage disequilibrium; lcsh:QH426-470; Single-nucleotide polymorphism; Genome-wide association study; Computational biology; Quantitative trait locus; Biology; eQTL; 03 medical and health sciences; 0302 clinical medicine; Pleiotropy; Mendelian randomization; Genetics; GWAS; SNP; Genetics (clinical); Original Research; lcsh:Genetics; 030104 developmental biology; 030220 oncology & carcinogenesis; Expression quantitative trait loci; Molecular Medicine; Alzheimer’s disease
- Abstract
It is estimated that the impact of related genes on the risk of Alzheimer’s disease (AD) is nearly 70%. Identifying candidate causal genes can help treatment and diagnosis. The maturity of sequencing technology and the reduction in cost have made the genome-wide association study (GWAS) an important means of finding disease-related mutation sites. Because of linkage disequilibrium (LD), neither the gene regulated by an SNP nor the specific SNP can be determined. Because GWAS is affected by sample size and interaction, we introduced empirical Bayes (EB) to perform a meta-analysis of GWAS and largely eliminate the bias caused by the sample and the interaction of SNPs. In addition, most SNPs are in noncoding regions, so it is not clear how they relate to phenotype. In this paper, expression quantitative trait locus (eQTL) studies and methylation quantitative trait locus (mQTL) studies are combined with GWAS to find the genes associated with Alzheimer’s disease at the expression level through pleiotropy. Summary data-based Mendelian randomization (SMR) is introduced to integrate GWAS and eQTL/mQTL data. Finally, we prioritized 274 significant SNPs belonging to 20 genes by eQTL analysis and 379 significant SNPs belonging to seven known genes by mQTL analysis. Among them, 93 SNPs and 2 genes overlap. We also performed 10 case studies to demonstrate the effectiveness of our method.
- Published
- 2019
13. Human mitochondrial genome compression using machine learning techniques
- Author
- Yadong Wang, Rongjie Wang, and Tianyi Zang
- Subjects
lcsh:QH426-470; Computer science; Population; lcsh:Medicine; Machine learning; computer.software_genre; Proteomics; Genome; Human mitochondrial genetics; DNA sequencing; Machine Learning; 03 medical and health sciences; Redundancy (information theory); Compression (functional analysis); Drug Discovery; Databases, Genetic; Genetics; Humans; education; Molecular Biology; 030304 developmental biology; 0303 health sciences; education.field_of_study; Base Sequence; Models, Genetic; business.industry; Research; lcsh:R; 030302 biochemistry & molecular biology; Compression; Data Compression; Human genetics; lcsh:Genetics; Genome, Mitochondrial; Molecular Medicine; Artificial intelligence; Neural Networks, Computer; business; computer; Human mitochondrial genomes; Algorithms
- Abstract
Background: In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. Results: In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on human mitochondrial genome data. Conclusions: The compression method we propose can be classified as a non-reference-based method, but its compression effect is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy but also on single genomes with small redundancy. The code of DeepDNA is available at https://github.com/rongjiewang/DeepDNA.
- Published
- 2019
14. Predicting circRNA-Disease Associations Based on circRNA Expression Similarity and Functional Similarity
- Author
- Yadong Wang, Tianyi Zang, Yongtian Wang, and Chenxi Nie
- Subjects
0301 basic medicine; disease; lcsh:QH426-470; PersonalRank; Cosine similarity; Disease; Computational biology; Biology; Expression (mathematics); circRNA functional similarity; lcsh:Genetics; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Similarity (network science); 030220 oncology & carcinogenesis; circRNA expression similarity; Genetics; Molecular Medicine; circRNA; Functional similarity; Genetics (clinical); Original Research
- Abstract
Circular RNAs (circRNAs) are a novel class of endogenous noncoding RNAs that have well-conserved sequences. Emerging evidence has shown that circRNAs can be novel biomarkers or therapeutic targets for many diseases and play an important role in the development of various pathological conditions. Therefore, identifying potential disease-related circRNAs is helpful in improving the efficiency of finding therapeutic targets for diseases. Here, we propose a computational model (PreCDA) to predict potential circRNA–disease associations. First, we calculated the circRNA expression similarity based on circRNA expression profiles. The circRNA functional similarity is calculated based on cosine similarity, and the disease similarity is used as the dimension of each circRNA vector. The associations between circRNAs and diseases are defined based on the circRNA functional similarity and expression similarity. We constructed a disease-related circRNA association network and used a graph-based recommendation algorithm (PersonalRank) to sort candidate disease-related circRNAs. As a result, PreCDA has an average area under the receiver operating characteristic curve value of 78.15% in predicting candidate disease-related circRNAs. In addition, we discuss the factors that affect the performance of this method and find some unknown circRNAs related to diseases, with several common diseases used as case studies. These results show that PreCDA has good performance in predicting potential circRNA–disease associations and is helpful for the diagnosis and treatment of human diseases.
- Published
- 2019
15. Identifying Alzheimer's Disease-related miRNA Based on Semi-clustering
- Author
- Ningyi Zhang, Yadong Wang, Tianyi Zhao, Yang Hu, Tianyi Zang, and Donghua Wang
- Subjects
Computational Biology; Computational biology; Disease; Biology; Biomarker (cell); Support vector machine; Correlation; MicroRNAs; Gene Expression Regulation; Interaction network; Alzheimer Disease; Drug Discovery; microRNA; Genetics; Molecular Medicine; Cluster Analysis; Humans; Protein Interaction Maps; Cluster analysis; Molecular Biology; Gene; Genetics (clinical); Biomarkers
- Abstract
Background: More and more scholars are trying to use miRNA as a specific biomarker for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms. Objective: Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis. Materials and Methods: We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA gets a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples in order to use a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, one-class SVM. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers). Results and Conclusion: We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we did case studies to show that our results are reliable.
- Published
- 2019
16. Identification of Alzheimer's Disease-Related Genes Based on Data Integration Method
- Author
- Tianyi Zang, Yang Hu, Liang Cheng, Tianyi Zhao, and Ying Zhang
- Subjects
0301 basic medicine; Linkage disequilibrium; lcsh:QH426-470; Heart disease; Genome-wide association study; Single-nucleotide polymorphism; Disease; Computational biology; Biology; eQTL; 03 medical and health sciences; 0302 clinical medicine; Mendelian randomization; medicine; Genetics; GWAS; Genetics (clinical); Original Research; medicine.disease; lcsh:Genetics; 030104 developmental biology; 030220 oncology & carcinogenesis; Expression quantitative trait loci; Molecular Medicine; mendelian randomization; Alzheimer's disease; Alzheimer disease; SNPs
- Abstract
Alzheimer disease (AD) is the fourth major cause of death in the elderly, following cancer, heart disease, and cerebrovascular disease. Finding candidate causal genes can help in the design of gene-targeted drugs and effectively reduce the risk of the disease. Complex diseases such as AD are usually caused by multiple genes. Genome-wide association studies (GWAS) have identified potential genetic variants for most diseases. However, because of linkage disequilibrium (LD), it is difficult to identify the causative mutations that directly cause diseases. In this study, we combined expression quantitative trait locus (eQTL) studies with GWAS to comprehensively define the genes that cause Alzheimer disease. The method used was summary data-based Mendelian randomization (SMR), a novel method for integrating summarized data. Two GWAS and five eQTL studies were referenced in this paper. We found several candidate SNPs that have a strong relationship with AD. Most of these SNPs overlap in different data sets, providing relatively strong reliability. We also explain the function of the novel AD-related genes we have discovered.
- Published
- 2018
17. The personal genome browser: visualizing functions of genetic variants
- Author
- Tianyi Zang, Zhenxing Wang, Liran Juan, Yadong Wang, Tianjiao Zhang, Yafeng Hao, Mingxiang Teng, Jie Li, Chengwu Yan, and Yongzhuang Liu
- Subjects
Genetics; Internet; Variant Call Format; Genome, Human; business.industry; Genetic Variation; Genomics; Computational biology; Biology; Genome; Article; DNA sequencing; Computer Graphics; Humans; Human genome; Personalized medicine; 1000 Genomes Project; business; Software; Personal genomics
- Abstract
Advances in high-throughput sequencing technologies have brought us into the individual genome era. Projects such as the 1000 Genomes Project have made individual genome sequencing more and more popular. How to visualize, analyse and annotate individual genomes with knowledge bases to support genome studies and personalized healthcare is still a big challenge. The Personal Genome Browser (PGB) was developed to provide comprehensive functional annotation and visualization for individual genomes based on the genetic-molecular-phenotypic model. Investigators can easily view individual genetic variants, such as single nucleotide variants (SNVs), INDELs and structural variations (SVs), as well as genomic features and phenotypes associated with the individual genetic variants. The PGB especially highlights potential functional variants using the PGB built-in method or SIFT/PolyPhen2 scores. Moreover, the functional risks of genes can be evaluated by scanning individual genetic variants on the whole genome, a chromosome, or a cytoband based on functional implications of the variants. Investigators can then navigate to high-risk genes on the scanned individual genome. The PGB accepts Variant Call Format (VCF) and Genetic Variation Format (GVF) files as input. The functional annotation of input individual genome variants can be visualized in real time by well-defined symbols and shapes. The PGB is available at http://www.pgbrowser.org/.
- Published
- 2014
Discovery Service for Jio Institute Digital Library