82 results on "Naoki Nariai"
Search Results
2. A crowdsourced set of curated structural variants for the human genome.
- Author
Lesley M Chapman, Noah Spies, Patrick Pai, Chun Shen Lim, Andrew Carroll, Giuseppe Narzisi, Christopher M Watson, Christos Proukakis, Wayne E Clarke, Naoki Nariai, Eric Dawson, Garan Jones, Daniel Blankenberg, Christian Brueffer, Chunlin Xiao, Sree Rohit Raj Kolora, Noah Alexander, Paul Wolujewicz, Azza E Ahmed, Graeme Smith, Saadlee Shehreen, Aaron M Wenger, Marc Salit, and Justin M Zook
- Subjects
Biology (General); QH301-705.5
- Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked-read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
- Published
- 2020
3. Pgltools: a genomic arithmetic tool suite for manipulation of Hi-C peak and other chromatin interaction data
- Author
William W. Greenwald, He Li, Erin N. Smith, Paola Benaglio, Naoki Nariai, and Kelly A. Frazer
- Subjects
Hi-C; ChIA-PET; Chromatin conformation capture; Peak; Paired-genomic-loci; Tool suite; Bedtools; Computer applications to medicine. Medical informatics; R858-859.7; Biology (General); QH301-705.5
- Abstract
Background: Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single-locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from “genomic interaction” studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed.
Results: This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite “pgltools”: a cross-platform, PyPy-compatible Python package available both as an easy-to-use UNIX package and as a Python module, for integration into pipelines of paired-genomic-loci analyses.
Conclusions: Pgltools is a freely available, open-source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools, and a Python module of the operations can be installed from PyPI via the PyGLtools module.
- Published
- 2017
4. iPSCORE: A Resource of 222 iPSC Lines Enabling Functional Characterization of Genetic Variation across a Variety of Cell Types
- Author
Athanasia D. Panopoulos, Matteo D'Antonio, Paola Benaglio, Roy Williams, Sherin I. Hashem, Bernhard M. Schuldt, Christopher DeBoever, Angelo D. Arias, Melvin Garcia, Bradley C. Nelson, Olivier Harismendy, David A. Jakubosky, Margaret K.R. Donovan, William W. Greenwald, KathyJean Farnam, Megan Cook, Victor Borja, Carl A. Miller, Jonathan D. Grinstein, Frauke Drees, Jonathan Okubo, Kenneth E. Diffenderfer, Yuriko Hishida, Veronica Modesto, Carl T. Dargitz, Rachel Feiring, Chang Zhao, Aitor Aguirre, Thomas J. McGarry, Hiroko Matsui, He Li, Joaquin Reyna, Fangwen Rao, Daniel T. O'Connor, Gene W. Yeo, Sylvia M. Evans, Neil C. Chi, Kristen Jepsen, Naoki Nariai, Franz-Josef Müller, Lawrence S.B. Goldstein, Juan Carlos Izpisua Belmonte, Eric Adler, Jeanne F. Loring, W. Travis Berggren, Agnieszka D'Antonio-Chronowska, Erin N. Smith, and Kelly A. Frazer
- Subjects
Medicine (General); R5-920; Biology (General); QH301-705.5
- Abstract
Summary: Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.
Working as part of the NHLBI NextGen consortium, Panopoulos and colleagues report the derivation and characterization of 222 publicly available iPSCs from ethnically diverse individuals with corresponding genomic data including SNP arrays, RNA-seq, and whole-genome sequencing. This collection provides a powerful resource to investigate the function of genetic variants.
Keywords: iPSCORE, iPSC, GWAS, molecular traits, physiological traits, cardiac disease, NHLBI Next Gen, LQT2, KCNH2, iPSC-derived cardiomyocytes
- Published
- 2017
5. Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease
- Author
Matteo D'Antonio, Joaquin Reyna, David Jakubosky, Margaret KR Donovan, Marc-Jan Bonder, Hiroko Matsui, Oliver Stegle, Naoki Nariai, Agnieszka D'Antonio-Chronowska, and Kelly A Frazer
- Subjects
major histocompatibility complex; eQTLs; gene expression; HLA types; Medicine; Science; Biology (General); QH301-705.5
- Abstract
The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in cystic fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.
- Published
- 2019
6. Profiling of microRNA in human and mouse ES and iPS cells reveals overlapping but distinct microRNA expression patterns.
- Author
Siti Razila Abdul Razak, Kazuko Ueno, Naoya Takayama, Naoki Nariai, Masao Nagasaki, Rika Saito, Hideto Koso, Chen-Yi Lai, Miyako Murakami, Koichiro Tsuji, Tatsuo Michiue, Hiromitsu Nakauchi, Makoto Otsu, and Sumiko Watanabe
- Subjects
Medicine; Science
- Abstract
Using quantitative PCR-based miRNA arrays, we comprehensively analyzed the expression profiles of miRNAs in human and mouse embryonic stem (ES), induced pluripotent stem (iPS), and somatic cells. Immature pluripotent cells were purified using SSEA-1 or SSEA-4 and were used for miRNA profiling. Hierarchical clustering and consensus clustering by nonnegative matrix factorization showed two major clusters, human ES/iPS cells and other cell groups, as previously reported. Principal components analysis (PCA) to identify miRNAs that segregate in these two groups identified miR-187, 299-3p, 499-5p, 628-5p, and 888 as new miRNAs that specifically characterize human ES/iPS cells. Detailed direct comparisons of miRNA expression levels in human ES and iPS cells showed that several miRNAs included in the chromosome 19 miRNA cluster were more strongly expressed in iPS cells than in ES cells. Similar analysis was conducted with mouse ES/iPS cells and somatic cells, and several miRNAs that had not been reported to be expressed in mouse ES/iPS cells were suggested to be ES/iPS cell-specific miRNAs by PCA. Comparison of the average expression levels of miRNAs in ES/iPS cells in humans and mice showed quite similar expression patterns of human/mouse miRNAs. However, several mouse- or human-specific miRNAs are ranked as high expressers. Time course tracing of miRNA levels during embryoid body formation revealed drastic and different patterns of changes in their levels. In summary, our miRNA expression profiling encompassing human and mouse ES and iPS cells provides several perspectives for understanding the core miRNA regulatory networks that govern pluripotent cell characteristics.
- Published
- 2013
7. Probabilistic protein function prediction from heterogeneous genome-wide data.
- Author
Naoki Nariai, Eric D Kolaczyk, and Simon Kasif
- Subjects
Medicine; Science
- Abstract
Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
- Published
- 2007
8. Sex-Specific Differences in the Transcriptome of the Human Dorsolateral Prefrontal Cortex in Schizophrenia
- Author
Zhiqian Yu, Kazuko Ueno, Ryo Funayama, Mai Sakai, Naoki Nariai, Kaname Kojima, Yoshie Kikuchi, Xue Li, Chiaki Ono, Junpei Kanatani, Jiro Ono, Kazuya Iwamoto, Kenji Hashimoto, Kengo Kinoshita, Keiko Nakayama, Masao Nagasaki, and Hiroaki Tomita
- Subjects
Cellular and Molecular Neuroscience; Neurology; Neuroscience (miscellaneous)
- Abstract
Schizophrenia presents clinical and biological differences between males and females. This study investigated transcriptional profiles in the dorsolateral prefrontal cortex (DLPFC) using postmortem data from the largest RNA-sequencing (RNA-seq) database on schizophrenic cases and controls. Data for 154 male and 113 female controls and 160 male and 93 female schizophrenic cases were obtained from the CommonMind Consortium. In the RNA-seq database, the principal component analysis showed that sex effects were small in schizophrenia. After we analyzed the impact of sex-specific differences on gene expression, the female group showed more significantly changed genes compared with the male group. Based on the gene ontology analysis, the female sex-specific genes that changed were overrepresented in the mitochondrion, ATP (phosphocreatine and adenosine triphosphate)-, and metal ion-binding relevant biological processes. An ingenuity pathway analysis revealed that the differentially expressed genes related to schizophrenia in the female group were involved in midbrain dopaminergic and γ-aminobutyric acid (GABA)-ergic neurons and microglia. We used methylated DNA-binding domain-sequencing analyses and microarray to investigate the DNA methylation that potentially impacts the sex differences in gene transcription using a maternal immune activation (MIA) murine model. Among the sex-specific positional genes related to schizophrenia in the PFC of female offspring from MIA, the changes in the methylation and transcriptional expression of the ACSBG1 locus were validated in the females with schizophrenia in independent postmortem samples by real-time PCR and pyrosequencing. Our results reveal potential genetic risks in the DLPFC for the sex-dependent prevalence and symptomology of schizophrenia.
- Published
- 2022
9. SVEM: A Structural Variant Estimation Method Using Multi-mapped Reads on Breakpoints.
- Author
Tomohiko Ohtsuki, Naoki Nariai, Kaname Kojima, Takahiro Mimori, Yukuto Sato, Yosuke Kawai, Yumi Yamaguchi-Kabata, Tetsuo Shibuya, and Masao Nagasaki
- Published
- 2014
10. HapMonster: A Statistically Unified Approach for Variant Calling and Haplotyping Based on Phase-Informative Reads.
- Author
Kaname Kojima, Naoki Nariai, Takahiro Mimori, Yumi Yamaguchi-Kabata, Yukuto Sato, Yosuke Kawai, and Masao Nagasaki
- Published
- 2014
11. Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction.
- Author
Xiaoyu Jiang, Naoki Nariai, Martin Steffen, Simon Kasif, David Gold, and Eric D. Kolaczyk
- Published
- 2008
12. Estimating Gene Networks from Expression Data and Binding Location Data via Boolean Networks.
- Author
Osamu Hirose, Naoki Nariai, Yoshinori Tamada, Hideo Bannai, Seiya Imoto, and Satoru Miyano
- Published
- 2005
13. Using Protein-Protein Interactions for Refining Gene Networks Estimated from Microarray Data by Bayesian Networks.
- Author
Naoki Nariai, SunYong Kim, Seiya Imoto, and Satoru Miyano
- Published
- 2004
14. TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference.
- Author
Naoki Nariai, Osamu Hirose, Kaname Kojima, and Masao Nagasaki
- Published
- 2013
15. A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads.
- Author
Kaname Kojima, Naoki Nariai, Takahiro Mimori, Mamoru Takahashi, Yumi Yamaguchi-Kabata, Yukuto Sato, and Masao Nagasaki
- Published
- 2013
16. 295-OR: Identification of Type 1 Diabetes Genes and Regulatory Processes Mediating Pancreatic Beta-Cell Survival in Response to Proinflammatory Cytokines
- Author
Paola Benaglio, Han Zhu, Mei-Lin Okino, Jian Yan, Ruth Elgamal, Naoki Nariai, Elisha Beebe, Katha Korgaonkar, Yunjiang Qiu, Margaret Donovan, Joshua Chiou, Jacklyn M. Newsome Ashmus, Jaspreet Kaur, Sierra Corban, Bing Ren, Kelly Frazer, Maike Sander, and Kyle J. Gaulton
- Subjects
Endocrinology, Diabetes and Metabolism; Internal Medicine
- Abstract
The response of beta cells to inflammatory cytokines contributes to type 1 diabetes (T1D) risk, but the specific genes that underlie this response and mediate beta cell survival remain largely unknown. In this study we used a suite of functional genomics assays combined with human genetics to map cis-regulatory programs in beta cells and to identify T1D genes that affect beta cell survival upon exposure to the pro-inflammatory cytokines IL1β, IFNγ and TNFα. We mapped 38,931 cytokine-responsive candidate cis-regulatory elements (cCREs) active in beta cells using ATAC-seq and single nuclear ATAC-seq (snATAC-seq), and linked cytokine-responsive beta cell cCREs to putative target genes using single cell co-accessibility and HiChIP. We performed a genome-wide pooled CRISPR loss-of-function screen in EndoC-βH1 cells, which identified 867 genes affecting cytokine-induced beta cell loss. Genes that promoted beta cell survival and had up-regulated expression in cytokine exposure were specifically enriched at T1D loci, and these genes were preferentially involved in inhibiting inflammatory response, ubiquitin-mediated proteolysis, mitophagy and autophagy. We identified 2,229 variants in cytokine-responsive beta cell cCREs altering transcription factor (TF) binding using high-throughput SNP-SELEX, and variants altering binding of TF families regulating stress, inflammation and apoptosis were broadly enriched for T1D association. Finally, through integration with genetic fine mapping, we identified T1D risk variants regulating beta cell survival in cytokine exposure. At the 16p13 locus, a T1D variant affected TF binding in a cytokine-induced beta cell cCRE that physically interacted with the SOCS1 promoter, and increased SOCS1 activity promoted beta cell survival in cytokine exposure. Together our findings reveal processes and genes acting in beta cells during cytokine exposure that intrinsically modulate risk of T1D.
Disclosure: P. Benaglio: None. M. Donovan: None. J. Chiou: Employee; Pfizer Inc. J.M. Newsome Ashmus: None. J. Kaur: None. S. Corban: None. B. Ren: Stock/Shareholder; Arima Genomics, Epigenome Technologies. K. Frazer: n/a. M. Sander: None. K.J. Gaulton: Consultant; Genentech, Inc., Stock/Shareholder; Neurocrine Biosciences, Inc., Vertex Pharmaceuticals Incorporated. H. Zhu: None. M. Okino: None. J. Yan: None. R. Elgamal: None. N. Nariai: None. E. Beebe: None. K. Korgaonkar: None. Y. Qiu: None.
Funding: NIH DK122607
- Published
- 2022
17. Systematic analysis of binding of transcription factors to noncoding variants
- Author
Kyle J. Gaulton, Nick Vinckier, Xiaoyu Li, Yimeng Yin, Jussi Taipale, Anugraha Raman, André M. Ribeiro dos Santos, Paola Benaglio, Jian Yan, Yang Eric Li, Kelly A. Frazer, Fulin Chen, Shicai Fan, Maike Sander, Joshua Chiou, Naoki Nariai, Bing Ren, and Yunjiang Qiu
- Subjects
0303 health sciences; Multidisciplinary; Ligand binding assay; Computational biology; Biology; Noncoding DNA; 03 medical and health sciences; 0302 clinical medicine; Transcription (biology); Multiplex; Human genome; Transcription factor; 030217 neurology & neurosurgery; Systematic evolution of ligands by exponential enrichment; 030304 developmental biology; Genetic association
- Abstract
Many sequence variants have been linked to complex human traits and diseases[1], but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor-DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance, when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.
- Published
- 2021
18. Type 1 diabetes risk genes mediate pancreatic beta cell survival in response to proinflammatory cytokines
- Author
Paola Benaglio, Han Zhu, Mei-Lin Okino, Jian Yan, Ruth Elgamal, Naoki Nariai, Elisha Beebe, Katha Korgaonkar, Yunjiang Qiu, Margaret K.R. Donovan, Joshua Chiou, Gaowei Wang, Jacklyn Newsome, Jaspreet Kaur, Michael Miller, Sebastian Preissl, Sierra Corban, Anthony Aylward, Jussi Taipale, Bing Ren, Kelly A. Frazer, Maike Sander, and Kyle J. Gaulton
- Subjects
Pediatric; accessible chromatin; type 1 diabetes; Prevention; high-throughput reporter assay; Human Genome; Diabetes; human genetics; Autoimmune Disease; Biochemistry, Genetics and Molecular Biology (miscellaneous); beta cell; CRISPR screen; Cardiovascular and Metabolic Diseases; 3121 General medicine, internal medicine and other clinical medicine; proinflammatory cytokines; gene expression; Genetics; 2.1 Biological and endogenous factors; 3111 Biomedicine; Aetiology; 3D chromatin interactions; functional genomics; Metabolic and endocrine
- Abstract
We combined functional genomics and human genetics to investigate processes that affect type 1 diabetes (T1D) risk by mediating beta cell survival in response to proinflammatory cytokines. We mapped 38,931 cytokine-responsive candidate cis-regulatory elements (cCREs) in beta cells using ATAC-seq and snATAC-seq and linked them to target genes using co-accessibility and HiChIP. Using a genome-wide CRISPR screen in EndoC-βH1 cells, we identified 867 genes affecting cytokine-induced survival, and genes promoting survival and up-regulated in cytokines were enriched at T1D risk loci. Using SNP-SELEX, we identified 2,229 variants in cytokine-responsive cCREs altering transcription factor (TF) binding, and variants altering binding of TFs regulating stress, inflammation, and apoptosis were enriched for T1D risk. At the 16p13 locus, a fine-mapped T1D variant altering TF binding in a cytokine-induced cCRE interacted with SOCS1, which promoted survival in cytokine exposure. Our findings reveal processes and genes acting in beta cells during inflammation that modulate T1D risk.
- Published
- 2022
19. Type 1 diabetes risk genes mediate pancreatic beta cell survival in response to proinflammatory cytokines
- Author
Aylward A, Kelly A. Frazer, Kaur J, Ren B, Beebe E, Korgaonkar K, Elgamal R, Zhu H, Paola Benaglio, Chiou J, Jussi Taipale, Donovan M, Kyle J. Gaulton, Maike Sander, Okino M, Naoki Nariai, Yan J, Qiu Y, Newsome J, and Corban S
- Subjects
Type 1 diabetes; Text mining; Immunology; medicine; Biology; Beta cell; Gene; Proinflammatory cytokine
- Abstract
Beta cells intrinsically contribute to the pathogenesis of type 1 diabetes (T1D), but the genes and molecular processes that mediate beta cell survival in T1D remain largely unknown. We combined high throughput functional genomics and human genetics to identify T1D risk loci regulating genes affecting beta cell survival in response to the proinflammatory cytokines IL-1β, IFNγ, and TNFα. We mapped 38,931 cytokine-responsive candidate cis-regulatory elements (cCREs) active in beta cells using ATAC-seq and single nuclear ATAC-seq (snATAC-seq), and linked cytokine-responsive beta cell cCREs to putative target genes using single cell co-accessibility and HiChIP. We performed a genome-wide pooled CRISPR loss-of-function screen in EndoC-βH1 cells, which identified 867 genes affecting cytokine-induced beta cell loss. Genes that promoted beta cell survival and had up-regulated expression in cytokine exposure were specifically enriched at T1D loci, and these genes were preferentially involved in inhibiting inflammatory response, ubiquitin-mediated proteolysis, mitophagy and autophagy. We identified 2,229 variants in cytokine-responsive beta cell cCREs altering transcription factor (TF) binding using high-throughput SNP-SELEX, and variants altering binding of TF families regulating stress, inflammation and apoptosis were broadly enriched for T1D association. Finally, through integration with genetic fine mapping, we annotated T1D loci regulating beta cell survival in cytokine exposure. At the 16p13 locus, a T1D variant affected TF binding in a cytokine-induced beta cell cCRE that physically interacted with the SOCS1 promoter, and increased SOCS1 activity promoted beta cell survival in cytokine exposure. Together our findings reveal processes and genes acting in beta cells during cytokine exposure that intrinsically modulate risk of T1D.
- Published
- 2021
20. A crowdsourced set of curated structural variants for the human genome
- Author
Eric T. Dawson, Chunlin Xiao, Noah Alexander, Sree Rohit Raj Kolora, Lesley M. Chapman, Aaron M. Wenger, Christopher M. Watson, Giuseppe Narzisi, Justin M. Zook, Daniel Blankenberg, Christian Brueffer, Graeme C. Smith, Marc L. Salit, Azza Ahmed, Paul Wolujewicz, Saadlee Shehreen, Naoki Nariai, Patrick Pai, Christos Proukakis, Andrew Carroll, Garan Jones, Wayne E. Clarke, Noah Spies, and Chun Shen Lim
Source: Apollo - University of Cambridge Repository. ORCID iDs: Chapman, Lesley M (0000-0001-7413-4392); Spies, Noah (0000-0002-6759-9842); Pai, Patrick (0000-0001-5304-788X); Lim, Chun Shen (0000-0001-7015-0125); Carroll, Andrew (0000-0002-4824-6689); Narzisi, Giuseppe (0000-0003-1118-8849); Watson, Christopher M (0000-0003-2371-1844); Proukakis, Christos (0000-0001-6423-6539); Clarke, Wayne E (0000-0003-2471-0712); Dawson, Eric (0000-0001-5448-1653); Jones, Garan (0000-0002-8917-3930); Brueffer, Christian (0000-0002-3826-0989); Kolora, Sree Rohit Raj (0000-0001-7839-735X); Wolujewicz, Paul (0000-0003-2982-9448); Ahmed, Azza E (0000-0002-1358-8371); Smith, Graeme (0000-0002-7413-4998); Shehreen, Saadlee (0000-0002-4869-0747); Wenger, Aaron M (0000-0003-1183-0432); Salit, Marc (0000-0003-1624-5195)
- Subjects
0301 basic medicine; Heredity; Computer science; Genome; Database and Informatics Methods; 0302 clinical medicine; INDEL Mutation; Heuristics; Genome Sequencing; Biology (General); Ecology; Genomics; Genetic Mapping; Tandem Repeats; Computational Theory and Mathematics; Modeling and Simulation; Sequence Analysis; Research Article; Bioinformatics; QH301-705.5; Concordance; Variant Genotypes; Computational biology; Research and Analysis Methods; Genome Complexity; DNA sequencing; Set (abstract data type); 03 medical and health sciences; Cellular and Molecular Neuroscience; Genetics; Humans; Repeated Sequences; Molecular Biology Techniques; Sequencing Techniques; Indel; Molecular Biology; Alleles; Ecology, Evolution, Behavior and Systematics; Genome, Human; Biology and Life Sciences; Computational Biology; Genome Analysis; 030104 developmental biology; Haplotypes; Genetic Loci; Genomic Structural Variation; Human genome; Sequence Alignment; 030217 neurology & neurosurgery; Reference genome
- Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked-read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
Author summary: Large genomic changes, called structural variants, can cause a variety of human diseases, but have been challenging to detect with conventional DNA sequencing methods. We are working in the Genome in a Bottle Consortium to develop authoritatively characterized genomes with benchmark structural variants that can be used by anyone to assess the accuracy of their sequencing and analysis methods. Manual curation of the sequencing reads from multiple technologies has been essential to establish benchmark variant calls. Here, we present consensus curations from a web-based platform that displays a comprehensive set of visualizations of sequencing read support for structural variants. We use the svviz visualization tool to present evidence not only for deletions but also for insertions, which have previously not been possible to curate. We derive consensus calls from the multiple curations of each variant, and we find these are highly concordant with a draft Genome in a Bottle structural variant benchmark set.
- Published
- 2020
21. Integration of relational and hierarchical network information for protein function prediction.
- Author
Xiaoyu Jiang, Naoki Nariai, Martin Steffen, Simon Kasif, and Eric D. Kolaczyk
- Published
- 2008
22. A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes.
- Author
-
Naoki Nariai, Kaname Kojima, Takahiro Mimori, Yosuke Kawai, and Masao Nagasaki
- Published
- 2016
23. Estimating copy numbers of alleles from population-scale high-throughput sequencing data.
- Author
-
Takahiro Mimori, Naoki Nariai, Kaname Kojima, Yukuto Sato, Yosuke Kawai, Yumi Yamaguchi-Kabata, and Masao Nagasaki
- Published
- 2015
24. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data.
- Author
-
Naoki Nariai, Kaname Kojima, Sakae Saito, Takahiro Mimori, Yukuto Sato, Yosuke Kawai, Yumi Yamaguchi-Kabata, Jun Yasuda, and Masao Nagasaki
- Published
- 2015
25. iPSCORE: A Resource of 222 iPSC Lines Enabling Functional Characterization of Genetic Variation across a Variety of Cell Types
- Author
-
Sylvia M. Evans, Frauke Drees, Naoki Nariai, Lawrence S.B. Goldstein, Chang Zhao, Neil C. Chi, Erin N. Smith, Jonathan D. Grinstein, Olivier Harismendy, Rachel Feiring, Aitor Aguirre, Agnieszka D'Antonio-Chronowska, Eric Adler, Matteo D’Antonio, Kelly A. Frazer, William W. Greenwald, Yuriko Hishida, Fangwen Rao, Juan Carlos Izpisua Belmonte, Hiroko Matsui, Bernhard M. Schuldt, Sherin Hashem, KathyJean Farnam, Victor Borja, Paola Benaglio, Melvin Garcia, Jonathan Okubo, Franz-Josef Müller, Bradley C. Nelson, Veronica Modesto, He Li, Kenneth E. Diffenderfer, Thomas J. McGarry, Roy Williams, Margaret K.R. Donovan, Carl T. Dargitz, Gene W. Yeo, David Jakubosky, Megan Cook, Kristen Jepsen, Daniel T. O'Connor, Carl A. Miller, Christopher DeBoever, Joaquin Reyna, Athanasia D. Panopoulos, Angelo Arias, Jeanne F. Loring, and W. Travis Berggren
- Subjects
0301 basic medicine ,Databases, Factual ,Genome-wide association study ,Arrhythmias ,Biochemistry ,2.1 Biological and endogenous factors ,GWAS ,iPSC-derived cardiomyocytes ,LQT2 ,KCNH2 ,Myocytes, Cardiac ,Stem Cell Research - Induced Pluripotent Stem Cell - Non-Human ,Aetiology ,Induced pluripotent stem cell ,lcsh:QH301-705.5 ,Oligonucleotide Array Sequence Analysis ,Genetics ,lcsh:R5-920 ,iPSC ,Continental Population Groups ,Stem Cell Research - Induced Pluripotent Stem Cell - Human ,High-Throughput Nucleotide Sequencing ,Cell Differentiation ,Single Nucleotide ,Cellular Reprogramming ,Phenotype ,molecular traits ,Multigene Family ,lcsh:Medicine (General) ,Cardiac ,Biotechnology ,Resource ,cardiac disease ,Genotype ,Clinical Sciences ,Induced Pluripotent Stem Cells ,NHLBI Next Gen ,Single-nucleotide polymorphism ,Biology ,Polymorphism, Single Nucleotide ,Cell Line ,Databases ,03 medical and health sciences ,Clinical Research ,Genetic variation ,iPSCORE ,Humans ,physiological traits ,Polymorphism ,Allele ,Genotyping ,Factual ,Genetic Association Studies ,Genetic association ,Myocytes ,Stem Cell Research - Induced Pluripotent Stem Cell ,Human Genome ,Racial Groups ,Genetic Variation ,Arrhythmias, Cardiac ,Cell Biology ,Stem Cell Research ,030104 developmental biology ,lcsh:Biology (General) ,Biochemistry and Cell Biology ,Developmental Biology - Abstract
Summary Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.

Highlights
• iPSCORE: A collection of publicly available iPSCs from 222 individuals
• Several multigenerational families and individuals of various ethnicities and ages
• Individuals carrying risk and non-risk genotypes for 95% of GWAS SNPs
• Genetic variants associated with mRNA expression in differentiated cardiomyocytes

Working as part of the NHLBI NextGen consortium, Panopoulos and colleagues report the derivation and characterization of 222 publicly available iPSCs from ethnically diverse individuals with corresponding genomic data including SNP arrays, RNA-seq, and whole-genome sequencing. This collection provides a powerful resource to investigate the function of genetic variants.
- Published
- 2017
26. Author response: Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease
- Author
-
Oliver Stegle, Marc Jan Bonder, Agnieszka D'Antonio-Chronowska, Matteo D’Antonio, Joaquin Reyna, Naoki Nariai, Margaret K.R. Donovan, David Jakubosky, Hiroko Matsui, and Kelly A. Frazer
- Subjects
Genetics ,biology ,biology.protein ,Human leukocyte antigen ,Disease ,Major histocompatibility complex ,Genetic analysis
- Published
- 2019
27. Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease
- Author
-
Naoki Nariai, Joaquin Reyna, David Jakubosky, Matteo D’Antonio, Kelly A. Frazer, Hiroko Matsui, Margaret K.R. Donovan, Agnieszka D'Antonio-Chronowska, Marc Jan Bonder, and Oliver Stegle
- Subjects
0301 basic medicine ,Male ,Cystic Fibrosis ,Genome-wide association study ,Major Histocompatibility Complex ,0302 clinical medicine ,computational biology ,HLA Antigens ,80 and over ,2.1 Biological and endogenous factors ,genetics ,RNA-Seq ,HLA types ,Aetiology ,Biology (General) ,Genetics ,Aged, 80 and over ,General Neuroscience ,eQTLs ,Chromosome Mapping ,systems biology ,General Medicine ,Single Nucleotide ,Middle Aged ,3. Good health ,Medicine ,Female ,Biotechnology ,Research Article ,Computational and Systems Biology ,Human ,Adult ,Adolescent ,QH301-705.5 ,Science ,Quantitative Trait Loci ,Genomics ,Human leukocyte antigen ,Biology ,Major histocompatibility complex ,Autoimmune Disease ,Polymorphism, Single Nucleotide ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Young Adult ,Genetic variation ,genomics ,Humans ,Genetic Predisposition to Disease ,human ,Polymorphism ,Alleles ,Aged ,General Immunology and Microbiology ,Haplotype ,Human Genome ,Genetics and Genomics ,Genetic architecture ,030104 developmental biology ,Good Health and Well Being ,Haplotypes ,Expression quantitative trait loci ,biology.protein ,gene expression ,Biochemistry and Cell Biology ,030217 neurology & neurosurgery ,Genome-Wide Association Study - Abstract
The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in cystic fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.
- Published
- 2019
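The abstract above calls HLA types from WGS reads, and the HLA-VBSeq entries in this list describe doing so with variational Bayesian inference. As a much simpler illustration of the underlying idea, the sketch below scores every diploid allele pair by the likelihood of the reads and returns the best one. This exhaustive maximum-likelihood search is not the method of either paper; the allele names and per-read likelihoods are made up, and each read is assumed to come from either allele with probability 1/2.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods, alleles, eps=1e-12):
    """Exhaustively score every diploid allele pair and return the most likely one.

    read_likelihoods: one dict per read, mapping allele name -> P(read | allele).
    Alleles absent from a read's dict are treated as likelihood 0; eps avoids log(0).
    """
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(alleles, 2):
        ll = 0.0
        for lk in read_likelihoods:
            # Mixture of the two alleles, each contributing half of the reads.
            p = 0.5 * lk.get(a1, 0.0) + 0.5 * lk.get(a2, 0.0)
            ll += math.log(p + eps)
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


alleles = ["A*01:01", "A*02:01", "A*03:01"]
reads = (
    [{"A*01:01": 0.9, "A*02:01": 0.01}] * 5    # reads favouring A*01:01
    + [{"A*02:01": 0.9, "A*01:01": 0.01}] * 5  # reads favouring A*02:01
)
pair, ll = best_allele_pair(reads, alleles)
print(pair)  # ('A*01:01', 'A*02:01')
```

The search is quadratic in the number of candidate alleles, which is one reason real tools restrict candidates or use approximate inference instead.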
28. Pancreatic islet chromatin accessibility and conformation reveals distal enhancer networks of type 2 diabetes risk
- Author
-
Nikita Kadakia, Jian Yan, Joseph Avruch, Anthony Aylward, Dana Kramer, Kelly A. Frazer, Laura Regué, Jee Yun Han, Kyle J. Gaulton, David U. Gorkin, Liliana Minichiello, Nicholas Vinckier, Maike Sander, Ning Dai, Joshua Chiou, Mei-Lin Okino, Frauke Drees, Naoki Nariai, William W. Greenwald, Allen Wang, Bing Ren, and Yunjiang Qiu
- Subjects
0301 basic medicine ,Epigenomics ,Male ,endocrine system diseases ,genetic processes ,Molecular Conformation ,General Physics and Astronomy ,Genome-wide association study ,02 engineering and technology ,Inbred C57BL ,Mice ,Gene expression ,Genetics research ,2.1 Biological and endogenous factors ,Insulin ,Gene Regulatory Networks ,Aetiology ,lcsh:Science ,Mice, Knockout ,Multidisciplinary ,geography.geographical_feature_category ,Diabetes ,RNA-Binding Proteins ,Middle Aged ,021001 nanoscience & nanotechnology ,Islet ,Chromatin ,Cell biology ,Transport protein ,Enhancer Elements, Genetic ,Female ,0210 nano-technology ,Type 2 ,Biotechnology ,Adult ,endocrine system ,Enhancer Elements ,Knockout ,Science ,Quantitative Trait Loci ,Biology ,Chromatin structure ,General Biochemistry, Genetics and Molecular Biology ,Article ,03 medical and health sciences ,Islets of Langerhans ,Genetic ,Diabetes Mellitus ,Genetics ,Animals ,Humans ,natural sciences ,Genetic Predisposition to Disease ,Enhancer ,Gene ,Metabolic and endocrine ,Cell Nucleus ,geography ,Gene Expression Profiling ,Human Genome ,General Chemistry ,Chromatin Assembly and Disassembly ,Gene expression profiling ,Mice, Inbred C57BL ,030104 developmental biology ,Glucose ,Diabetes Mellitus, Type 2 ,Cardiovascular and Metabolic Diseases ,lcsh:Q - Abstract
Genetic variants affecting pancreatic islet enhancers are central to T2D risk, but the gene targets of islet enhancer activity are largely unknown. We generate a high-resolution map of islet chromatin loops using Hi-C assays in three islet samples and use loops to annotate target genes of islet enhancers defined using ATAC-seq and published ChIP-seq data. We identify candidate target genes for thousands of islet enhancers, and find that enhancer looping is correlated with islet-specific gene expression. We fine-map T2D risk variants affecting islet enhancers, and find that candidate target genes of these variants defined using chromatin looping and eQTL mapping are enriched in protein transport and secretion pathways. At IGF2BP2, a fine-mapped T2D variant reduces islet enhancer activity and IGF2BP2 expression, and conditional inactivation of IGF2BP2 in mouse islets impairs glucose-stimulated insulin secretion. Our findings provide a resource for studying islet enhancer function and identifying genes involved in T2D risk.

Risk loci for type 2 diabetes (T2D) reside in pancreatic islet enhancers. Here, the authors generate high-resolution maps of islet chromatin conformation using Hi-C which they combine with ATAC-seq and ChIP-seq data to annotate candidate target genes of enhancers and validate IGF2BP2 activity in mouse islets.
- Published
- 2019
29. SVCurator: A Crowdsourcing app to visualize evidence of structural variants for the human genome
- Author
-
Christos Proukakis, Andrew Carroll, Sree Rohit Raj Kolora, Graeme C. Smith, Daniel Blankenberg, Chunlin Xiao, Garan Jones, Aaron M. Wenger, Christian Brueffer, Noah Alexander, Lesley M. Chapman, Noah Spies, Azza Ahmed, Naoki Nariai, Saadlee Shehreen, Christopher M. Watson, Justin M. Zook, Giuseppe Narzisi, Marc L. Salit, Paul Wolujewicz, Chun Shen Lim, Wayne E. Clarke, Patrick Pai, and Eric T. Dawson
- Subjects
medicine.medical_specialty ,business.industry ,Computer science ,Concordance ,Genomics ,Computational biology ,Crowdsourcing ,Genome ,Set (abstract data type) ,medicine ,Medical genetics ,Human genome ,business ,Indel ,Reference genome - Abstract
A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app – SVCurator – to help curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator is a Python Flask-based web platform that displays images from short-, long-, and linked-read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. The crowdsourced results were highly concordant, with 37 out of the 61 curators having at least 78% concordance with a set of ‘expert’ curators, where there was 93% concordance amongst ‘expert’ curators. This produced high confidence labels for 935 events. When compared to the heuristic-based draft benchmark SV callset from GIAB, the SVCurator crowdsourced labels were 94.5% concordant with the benchmark set. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
- Published
- 2019
30. Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data.
- Author
-
Naoki Nariai, Yoshinori Tamada, Seiya Imoto, and Satoru Miyano
- Published
- 2005
31. iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data.
- Author
-
Takahiro Mimori, Naoki Nariai, Kaname Kojima, Mamoru Takahashi, Akira Ono, Yukuto Sato, Yumi Yamaguchi-Kabata, and Masao Nagasaki
- Published
- 2013
32. Pancreatic islet chromatin accessibility and conformation defines distal enhancer networks of type 2 diabetes risk
- Author
-
David U. Gorkin, Nicholas Vinckier, Frauke Drees, Naoki Nariai, Yunjiang Qiu, Allen Wang, Ning Dai, Jee Yun Han, William W. Greenwald, Joshua Chiou, Laura Regué Barrufet, Maike Sander, Mei-Lin Okino, Bing Ren, Joseph Avruch, Kelly A. Frazer, Anthony Aylward, Nikita Kadakia, Liliana Minichiello, Jian Yan, and Kyle J. Gaulton
- Subjects
endocrine system ,0303 health sciences ,geography ,geography.geographical_feature_category ,endocrine system diseases ,Pancreatic islets ,Gene regulatory network ,Biology ,Islet ,Chromatin ,Cell biology ,Transport protein ,03 medical and health sciences ,0302 clinical medicine ,medicine.anatomical_structure ,medicine ,Secretion ,Enhancer ,Gene ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
The gene targets of enhancer activity in pancreatic islets are largely unknown, impeding discovery of islet regulatory networks involved in type 2 diabetes (T2D) risk. We mapped chromatin state, accessibility and conformation using ChIP-seq, ATAC-seq and Hi-C in human pancreatic islets, which we integrated with T2D genetic fine-mapping and islet expression QTL data. Active islet regulatory elements preferentially interacted with other active elements, often at distances over 1 Mb, and we identified target genes for thousands of distal islet enhancers. A third of T2D risk signals mapped in islet enhancers, and target genes regulated by these signals were specifically involved in processes related to protein transport and secretion. Among implicated target genes of T2D islet enhancer signals with no prior known role in islet function, we demonstrated that reduced IGF2BP2 activity in mouse islets leads to impaired glucose-stimulated insulin secretion. These results link distal islet enhancer regulation of protein secretion and transport to genetic risk of T2D, and highlight the utility of high-throughput chromatin conformation maps to uncover the gene regulatory networks of complex disease.
- Published
- 2018
33. Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing
- Author
-
Nobuo Fuse, Naomi Nakai-Inagaki, Atsushi Hozawa, Akira Ono, Takahiro Mimori, Yoko Kuroki, Kazuharu Misawa, Junichi Sugawara, Sakae Saito, Naoki Nariai, Kengo Kinoshita, Shinichi Kuriyama, Jun Yasuda, Yosuke Kawai, Masao Nagasaki, Tomoko F. Shibata, Keiko Tateno, Naoko Minegishi, Kichiya Suzuki, Masayuki Yamamoto, and Fumiki Katsuoka
- Subjects
0301 basic medicine ,Genotype ,Sequence analysis ,Computational biology ,Human leukocyte antigen ,Biology ,030226 pharmacology & pharmacy ,Article ,03 medical and health sciences ,0302 clinical medicine ,Japan ,Genetic variation ,Genetics ,Humans ,Allele ,Alleles ,Genetic association ,Pharmacology ,Genome, Human ,Histocompatibility Testing ,Histocompatibility Antigens Class I ,Genetic Variation ,High-Throughput Nucleotide Sequencing ,Sequence Analysis, DNA ,Transplantation ,030104 developmental biology ,Molecular Medicine ,Single molecule real time sequencing - Abstract
Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used. The panel consisted of 139 alleles, which were all extended from known IPD-IMGT/HLA sequences, contained 40 with novel variants, and captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.
- Published
- 2017
34. A Histologic Categorization of Aqueous Outflow Routes in Familial Open-Angle Glaucoma and Associations With Mutations in the MYOC Gene in Japanese Patients
- Author
-
Kei Homma, Masayuki Yamamoto, Teruhiko Hamanaka, Naoki Nariai, Nobuo Ishida, Jun Yasuda, Masao Nagasaki, Tetsuro Sakurai, Atsushi Endo, Yoichi Matsubara, Fumiki Katsuoka, Masae Kimura, and Nobuo Fuse
- Subjects
0301 basic medicine ,Adult ,Male ,Intraocular pressure ,medicine.medical_specialty ,genetic structures ,Open angle glaucoma ,medicine.medical_treatment ,Glaucoma ,Trabeculectomy ,Limbus Corneae ,Polymerase Chain Reaction ,Aqueous Humor ,03 medical and health sciences ,0302 clinical medicine ,Asian People ,Japan ,Trabecular Meshwork ,Ophthalmology ,medicine ,Humans ,Exome ,Eye Proteins ,Myocilin ,Intraocular Pressure ,Aged ,Glycoproteins ,Schlemm's canal ,Aged, 80 and over ,Polymorphism, Genetic ,business.industry ,Middle Aged ,medicine.disease ,eye diseases ,Pedigree ,Cytoskeletal Proteins ,030104 developmental biology ,medicine.anatomical_structure ,Mutation ,030221 ophthalmology & optometry ,Female ,sense organs ,Trabecular meshwork ,business ,Biomarkers ,Glaucoma, Open-Angle - Abstract
Purpose This study evaluated specific relationships between pathogenic mechanisms and genetic polymorphisms in primary open-angle glaucoma (POAG). We analyzed the morphologies of trabeculectomy specimens obtained from patients with familial POAG. Methods We used light microscopy and transmission electron microscopy to examine specimens obtained from 17 eyes of 14 patients with familial POAG. We also conducted exome analyses of two families and used targeted Sanger sequencing to analyze samples obtained from the remaining patients. Results The POAG cases examined in this study were divided into two groups based on morphologic characteristics. Group A eyes (7 eyes from 5 patients) had an abnormally thick trabecular meshwork (TM), whereas group B eyes (10 eyes from 9 patients) had a TM of normal thickness. The characteristics of the outflow routes in group A eyes were remarkable and included apoptotic TM cells, abnormally thickened TM basement membranes, fused TM beams, and occluded Schlemm's canals. All group A patients harbored mutations (F369L, P370L, T377M, and T448P) in the myocilin (MYOC) gene that were not found in group B patients. Conclusions Although age matching of morphologic changes in the outflow routes was impossible due to the small sample size, this study suggests that abnormal TM cells may cause sequential damage in abnormally thickened TM basement membranes, TM cell apoptosis, TM beam fusion, and the occlusion of Schlemm's canals. The four detected MYOC mutations appeared to be associated with morphologic changes in the TM and the underlying pathogenesis of a subtype of familial POAG.
- Published
- 2017
35. TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference
- Author
-
Kaname Kojima, Naoki Nariai, Masao Nagasaki, and Osamu Hirose
- Subjects
Statistics and Probability ,Gene isoform ,Sequence analysis ,RNA-Seq ,Computational biology ,Biology ,computer.software_genre ,Bayesian inference ,Biochemistry ,Cell Line ,Bayes' theorem ,Expectation–maximization algorithm ,RNA Isoforms ,Humans ,Molecular Biology ,Sequence Analysis, RNA ,Gene Expression Profiling ,Alternative splicing ,Bayes Theorem ,Computer Science Applications ,Alternative Splicing ,Computational Mathematics ,Computational Theory and Mathematics ,Human genome ,Data mining ,Sequence Alignment ,computer ,Algorithms - Abstract
MOTIVATION Many human genes express multiple transcript isoforms through alternative splicing, which greatly increases diversity of protein function. Although RNA sequencing (RNA-Seq) technologies have been widely used in measuring amounts of transcribed mRNA, accurate estimation of transcript isoform abundances from RNA-Seq data is challenging because reads often map to more than one transcript isoform or to paralogs whose sequences are similar to each other. RESULTS We propose a statistical method to estimate transcript isoform abundances from RNA-Seq data. Our method can handle gapped alignments of reads against reference sequences, so it allows insertion or deletion errors within reads. The proposed method optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion. On simulated datasets, our method outperformed comparable quantification methods in inferring transcript isoform abundances, and its rate of convergence was faster than that of the expectation maximization algorithm. We also applied our method to RNA-Seq data of human cell line samples, and showed that our predictions were more consistent among technical replicates than those of other methods. AVAILABILITY An implementation of our method is available at http://github.com/nariai/tigar CONTACT nariai@megabank.tohoku.ac.jp SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
- Published
- 2013
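The TIGAR abstract benchmarks its convergence against the expectation maximization (EM) algorithm for reads that map to several isoforms. The sketch below is that plain EM baseline, not TIGAR's variational Bayesian method: it assumes each read is already reduced to a list of compatible isoform indices and, for simplicity, ignores isoform length, gapped alignments, and alignment error.

```python
def em_isoform_abundance(read_compat, n_isoforms, max_iter=1000, tol=1e-10):
    """Estimate the fraction of reads from each isoform with the EM algorithm.

    read_compat: one list per read giving the indices of isoforms it aligns to.
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(max_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms in proportion to theta.
        for isoforms in read_compat:
            z = sum(theta[i] for i in isoforms)
            if z == 0:
                continue
            for i in isoforms:
                counts[i] += theta[i] / z
        total = sum(counts)
        if total == 0:
            return theta
        # M-step: the new abundances are the expected read fractions.
        new_theta = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# 3 reads unique to isoform 0, 1 unique to isoform 1, 4 ambiguous between both.
reads = [[0]] * 3 + [[1]] + [[0, 1]] * 4
print([round(t, 4) for t in em_isoform_abundance(reads, 2)])  # [0.75, 0.25]
```

The ambiguous reads end up split 3:1, matching the ratio of uniquely mapped reads; each EM step here only halves the remaining error, which illustrates why faster-converging alternatives are of interest.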
36. Establishing the involvement of the novel gene AGBL5 in retinitis pigmentosa by whole genome sequencing
- Author
-
John R. Heckenlively, Naoki Nariai, Kelly A. Frazer, Kari Branham, David Jakubosky, Paul A. Sieving, John Suk, Pooja Biswas, Tao Long, Michael A. Hicks, Amalio Telenti, Radha Ayyagari, He Li, Aditya A. Guru, and Hiroko Matsui
- Subjects
0301 basic medicine ,Male ,Adolescent ,Physiology ,DNA Mutational Analysis ,Carboxypeptidases ,Biology ,medicine.disease_cause ,Compound heterozygosity ,Genome ,03 medical and health sciences ,symbols.namesake ,Young Adult ,Retinitis pigmentosa ,Genetics ,medicine ,Electroretinography ,Humans ,Copy-number variation ,Gene ,Genetic Association Studies ,Whole genome sequencing ,Sanger sequencing ,Mutation ,Whole Genome Sequencing ,Retinal Degeneration ,General Interest ,medicine.disease ,Pedigree ,030104 developmental biology ,symbols ,Female ,Retinitis Pigmentosa - Abstract
While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40–50% of families. In this study we sought to identify the underlying cause of IRD in a family by whole genome sequence (WGS) analysis. Clinical characterization including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM and variant calling was performed with Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare and potentially deleterious compound heterozygous variants, p.Arg281Cys and p.Arg487*, were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.
- Published
- 2016
37. Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree
- Author
-
Yosuke Kawai, Masao Nagasaki, Kaname Kojima, Takahiro Mimori, Naoki Nariai, and Takanori Hasegawa
- Subjects
0301 basic medicine ,0206 medical engineering ,02 engineering and technology ,Biology ,Belief propagation ,Coalescent theory ,03 medical and health sciences ,symbols.namesake ,Genetics ,Humans ,Computer Simulation ,Short tandem repeat ,1000 Genomes Project ,Sequence ,Models, Statistical ,High-throughput sequencing ,Genome, Human ,Research ,Markov chain Monte Carlo ,Statistical model ,Sequence Analysis, DNA ,Tree (graph theory) ,Variable number tandem repeat ,030104 developmental biology ,symbols ,Algorithm ,020602 bioinformatics ,Algorithms ,Biotechnology ,Microsatellite Repeats - Abstract
Background Two types of approaches are mainly considered for repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches that directly count repeat patterns in sequence reads spanning the region, and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads. However, repeat numbers estimated with the latter approaches are less accurate than those estimated with the former approaches. Results We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation over the hidden variables is taken on coalescent trees sampled with Markov chain Monte Carlo based on phased genotypes located around a target STR region. Repeat number information from insert size data is propagated through the sampled coalescent trees, so more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. To find the repeat numbers that maximize the likelihood of the model, we proposed a state-of-the-art belief propagation algorithm on the sampled coalescent trees. Conclusions We verified the effectiveness of the proposed approach by comparison with existing methods using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.
- Published
- 2016
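The STR abstract contrasts read-counting approaches with approaches that compare the insert size inferred from aligned read pairs against the actual insert size. A minimal single-individual sketch of that insert-size idea follows; it is not coalescentSTR, which pools individuals through sampled coalescent trees. It assumes a homozygous sample, a known library mean insert size, and read pairs that fully span the repeat; the function name and numbers are illustrative.

```python
def estimate_repeat_number(ref_repeat_count, unit_length, library_mean_insert, inferred_inserts):
    """Naive STR repeat-count estimate from read pairs that span the repeat.

    inferred_inserts: insert sizes computed from mate positions on the reference.
    If the sample carries k more repeat units than the reference, the reference-based
    insert size is roughly k * unit_length shorter than the library mean (and longer
    for a contraction).
    """
    if not inferred_inserts:
        raise ValueError("need at least one spanning read pair")
    mean_inferred = sum(inferred_inserts) / len(inferred_inserts)
    delta_units = (library_mean_insert - mean_inferred) / unit_length
    return ref_repeat_count + round(delta_units)


# Reference has 20 copies of a 4-bp unit; the library mean insert size is 400 bp.
print(estimate_repeat_number(20, 4, 400, [386, 390, 388]))  # 23 (expansion)
print(estimate_repeat_number(20, 4, 400, [408, 408]))       # 18 (contraction)
```

Because insert sizes vary by tens of base pairs around the library mean, averaging over a handful of pairs gives a noisy estimate, which is the accuracy gap the abstract describes. Note also that Python's `round` resolves exact .5 ties to the nearest even integer.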
38. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals
- Author
-
Rumiko Saito, Kaname Kojima, Kaoru Tsuda, Atsushi Hozawa, Yukuto Sato, Nobuo Fuse, Yosuke Kawai, Shin Ito, Shigeo Kure, Junji Yokozawa, Inaho Danjoh, Masao Nagasaki, Hideyasu Kiyomoto, Yoko Kuroki, Takahiro Mimori, Yumi Yamaguchi-Kabata, Xiaoqing Pan, Fumiki Katsuoka, Kengo Kinoshita, Naoki Nariai, Shinichi Kuriyama, Sakae Saito, Osamu Tanabe, Jun Yasuda, Masayuki Yamamoto, Naoko Minegishi, James Douglas Engel, Satoshi Nishikawa, and Nobuo Yaegashi
- Subjects
Genetics ,Whole genome sequencing ,Multidisciplinary ,Genome, Human ,Haplotype ,General Physics and Astronomy ,Genetic Variation ,General Chemistry ,Biology ,Genome ,General Biochemistry, Genetics and Molecular Biology ,Article ,3. Good health ,Minor allele frequency ,Asian People ,Haplotypes ,Genotype ,Genetic variation ,Humans ,Human genome ,Genetic association - Abstract
The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and the construction of a Japanese population reference panel (1KJPN). Here, through this high-coverage sequencing (32.4× on average), we identify 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of …

The Tohoku Medical Megabank Organization establishes a biobank with detailed patient health care and genome information. Here the authors analyse whole-genome sequences of 1,070 Japanese individuals, allowing them to catalogue 21 million single-nucleotide variants including 12 million novel ones.
- Published
- 2015
39. Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease.
- Author
-
Matteo D'Antonio, Joaquin Reyna, David Jakubosky, Margaret K. R. Donovan, Marc-Jan Bonder, Hiroko Matsui, Oliver Stegle, Naoki Nariai, Agnieszka D'Antonio-Chronowska, and Kelly A. Frazer
- Published
- 2019
40. Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data
- Author
-
Satoru Miyano, Naoki Nariai, Seiya Imoto, and Yoshinori Tamada
- Subjects
Statistics and Probability ,Saccharomyces cerevisiae Proteins ,Saccharomyces cerevisiae ,Gene regulatory network ,Computational biology ,Biology ,Biochemistry ,Genome ,Protein–protein interaction ,Databases, Genetic ,Protein Interaction Mapping ,Computer Simulation ,Molecular Biology ,Gene ,Genetics ,Models, Genetic ,Markov chain ,Gene Expression Profiling ,Chromosome Mapping ,Bayesian network ,biology.organism_classification ,Computer Science Applications ,Computational Mathematics ,Gene Expression Regulation ,Computational Theory and Mathematics ,Algorithms ,Biological network ,Signal Transduction - Abstract
Motivation: Biological processes in cells are carried out through gene regulation, signal transduction and interactions between proteins. To understand such molecular networks, we propose a statistical method to estimate gene regulatory networks and protein–protein interaction networks simultaneously from DNA microarray data, protein–protein interaction data and other genome-wide data. Results: We unify Bayesian networks and Markov networks for estimating gene regulatory networks and protein–protein interaction networks according to the reliability of each biological information source. Through the simultaneous construction of gene regulatory networks and protein–protein interaction networks of the Saccharomyces cerevisiae cell cycle, we predict the roles of several genes whose functions are currently unknown. Using our probabilistic model, we can detect false positives in high-throughput data, such as yeast two-hybrid data. In a genome-wide experiment, we find possible gene regulatory relationships and protein–protein interactions between large protein complexes that underlie complex regulatory mechanisms of biological processes. Contact: nariai@ims.u-tokyo.ac.jp
- Published
- 2005
41. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data
- Author
-
Yukuto Sato, Naoki Nariai, Yumi Yamaguchi-Kabata, Kaname Kojima, Masao Nagasaki, Yosuke Kawai, Takahiro Mimori, Jun Yasuda, and Sakae Saito
- Subjects
Genotype ,Human leukocyte antigen ,Biology ,Data sequences ,Gene Frequency ,HLA Antigens ,Genetics ,Humans ,Allele ,Allele frequency ,Gene ,Alleles ,Whole genome sequencing ,Internet ,Polymorphism, Genetic ,Genome, Human ,Histocompatibility Testing ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Bayes Theorem ,Proceedings ,Primer (molecular biology) ,DNA microarray ,Algorithms ,Biotechnology
- Abstract
Background Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphism of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. Results We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and the abundance of reads on HLA alleles by variational Bayesian inference. We show the advantage of the proposed method over other methods by predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1, -DQB1 and -DRB1) loci from simulated data at various depths of coverage and from real sequencing data of human trio samples. Conclusions HLA-VBSeq is an efficient and accurate HLA typing method using high-throughput sequencing data without the need for primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples obtained from genetically diverse populations.
- Published
- 2015
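The HLA-VBSeq abstract refers to "full (8-digit) resolution", that is, all four colon-separated fields of an HLA allele name. As a hedged illustration that is not part of HLA-VBSeq, the sketch below truncates allele names to fewer fields so that full-resolution calls can be compared with typing results reported at lower resolution. It assumes the colon-delimited nomenclature (e.g. `HLA-A*01:01:01:01`); the function names and example alleles are hypothetical.

```python
def truncate_hla(allele: str, fields: int) -> str:
    """Keep the locus name plus the first `fields` colon-separated fields.

    Assumes colon-delimited nomenclature such as "HLA-A*01:01:01:01";
    expression-status suffixes (e.g. a trailing "N") are not treated specially.
    """
    locus, _, digits = allele.partition("*")
    return locus + "*" + ":".join(digits.split(":")[:fields])


def same_at_resolution(a: str, b: str, fields: int) -> bool:
    """True if two allele names agree once both are truncated to `fields` fields."""
    return truncate_hla(a, fields) == truncate_hla(b, fields)


if __name__ == "__main__":
    print(truncate_hla("HLA-A*01:01:01:01", 2))                  # HLA-A*01:01
    print(same_at_resolution("B*07:02:01", "B*07:02:01:01", 3))  # True
    print(same_at_resolution("A*02:01:01:01", "A*02:06:01", 2))  # False
```

Comparing at 2 fields (the traditional "4-digit" level) is a common way to score a typing tool against reference types that were not reported at full resolution.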
42. TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads
- Author
-
Yosuke Kawai, Yumi Yamaguchi-Kabata, Masao Nagasaki, Kaname Kojima, Naoki Nariai, Yukuto Sato, and Takahiro Mimori
- Subjects
Pipeline (computing) ,Sequence assembly ,RNA-Seq ,Computational biology ,Biology ,Bayes' theorem ,RNA Isoforms ,Complementary DNA ,Genetics ,graphical models ,Humans ,RNA, Messenger ,Sequence Analysis, RNA ,Gene Expression Profiling ,Research ,Computational Biology ,Genetic Variation ,Bayes Theorem ,Transcript isoform quantification ,Gene expression profiling ,DNA microarray ,variational Bayesian inference ,Algorithms ,Software ,Biotechnology ,HeLa Cells
- Abstract
Background High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to the development of new types of sequencing technologies as well as improvements in chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g., > 250 bp). Results We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed- and variable-length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM can be effectively incorporated into our pipeline. A heuristic algorithm is also implemented in the variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and evaluate the performance of transcript quantification with TIGAR2 in comparison with existing methods. Conclusions TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp, both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.
- Published
- 2015
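TIGAR2 is described as modelling substitution, insertion and deletion errors from gapped alignments. As a rough illustration only (not TIGAR2's code), the sketch below estimates per-base error rates by tallying CIGAR operations. It assumes alignments use the extended `=`/`X` operators of the SAM format; alignments written with plain `M` would need the `NM`/`MD` tags to separate mismatches from matches, which this sketch does not handle.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_rates(cigars):
    """Estimate substitution, insertion and deletion rates from extended CIGAR strings.

    Assumes '=' (match) and 'X' (mismatch) are used instead of 'M';
    'M', 'S', 'H', 'N' and 'P' operations are counted but not used.
    Rates are expressed per aligned read base (= + X + I).
    """
    counts = Counter()
    for cigar in cigars:
        for length, op in CIGAR_OP.findall(cigar):
            counts[op] += int(length)
    aligned = counts["="] + counts["X"] + counts["I"]
    if aligned == 0:
        return {"substitution": 0.0, "insertion": 0.0, "deletion": 0.0}
    return {
        "substitution": counts["X"] / aligned,
        "insertion": counts["I"] / aligned,
        "deletion": counts["D"] / aligned,
    }


if __name__ == "__main__":
    print(error_rates(["8=1X1=", "5=2I3="]))
    # {'substitution': 0.05, 'insertion': 0.1, 'deletion': 0.0}
```

Rates like these could parameterize a simple per-base error model; the choice of denominator (read bases here) is an assumption.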
43. Large-Scale Profiling Reveals the Influence of Genetic Variation on Gene Expression in Human Induced Pluripotent Stem Cells
- Author
-
Bing Ren, Katrina M. Olson, David Jakubosky, Kelly A. Frazer, Erin N. Smith, Kristen Jepsen, Efren Sandoval, Naoki Nariai, Hui Huang, Angelo Arias, Paola Benaglio, Matteo D’Antonio, He Li, Hiroko Matsui, Christopher DeBoever, Agnieszka D'Antonio-Chronowska, Joaquin Reyna, Emma K. Farley, and William H. Biggs
- Subjects
0301 basic medicine ,Regulatory Sequences, Nucleic Acid ,Medical and Health Sciences ,0302 clinical medicine ,stem cell gene expression ,2.1 Biological and endogenous factors ,Stem Cell Research - Induced Pluripotent Stem Cell - Non-Human ,Aetiology ,Induced pluripotent stem cell ,Regulation of gene expression ,Genetics ,Stem Cell Research - Induced Pluripotent Stem Cell - Human ,expression quantitative trait loci ,Biological Sciences ,Cellular Reprogramming ,Regulatory sequence ,Molecular Medicine ,regulation of gene expression ,Human ,Biotechnology ,DNA Copy Number Variations ,Quantitative Trait Loci ,Induced Pluripotent Stem Cells ,Pair-rule gene ,Biology ,eQTL ,Gene dosage ,Chromosomes ,Genetic Heterogeneity ,03 medical and health sciences ,Humans ,Stem Cell Research - Embryonic - Human ,Gene ,Chromosomes, Human, X ,Binding Sites ,Nucleic Acid ,Stem Cell Research - Induced Pluripotent Stem Cell ,Gene Expression Profiling ,Human Genome ,Genetic Variation ,Molecular Sequence Annotation ,Cell Biology ,Stem Cell Research ,Gene expression profiling ,030104 developmental biology ,Gene Expression Regulation ,Expression quantitative trait loci ,gene expression ,stem cell genetics ,Generic health relevance ,Regulatory Sequences ,030217 neurology & neurosurgery ,Transcription Factors ,Developmental Biology
- Abstract
In this study, we used whole genome sequencing and gene expression profiling of 215 human induced pluripotent stem cell (iPSC) lines from different donors to identify genetic variants associated with RNA expression for 5,746 genes. We were able to predict causal variants for these expression quantitative trait loci (eQTLs) that disrupt transcription factor binding and validated a subset of them experimentally. We also identified copy number variant (CNV) eQTLs, including some that appear to affect gene expression by altering the copy number of intergenic regulatory regions. In addition, we were able to identify effects on gene expression of rare genic CNVs and regulatory single nucleotide variants, and found that reactivation of gene expression on the X chromosome depends on gene chromosomal position. Our work highlights the value of iPSCs for genetic association analyses and provides a unique resource for investigating the genetic regulation of gene expression in pluripotent cells.
- Published
- 2017
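The iPSC study identifies variants associated with RNA expression (eQTLs). A common starting point for a single variant–gene pair is to regress expression on genotype dosage. The sketch below is a hedged illustration, not the authors' pipeline: it fits an ordinary least-squares slope with no covariates (real eQTL analyses typically adjust for confounders such as ancestry or batch), and the sample values are made up.

```python
from statistics import fmean


def eqtl_slope(dosages, expression):
    """Ordinary least-squares slope of expression on genotype dosage.

    dosages: alternate-allele counts per sample (0, 1 or 2).
    expression: normalized expression values for the same samples.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mean_x, mean_y = fmean(dosages), fmean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical samples: each extra alternate allele adds roughly 1 unit.
    print(eqtl_slope([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
```

The slope is the estimated per-allele effect; testing whether it differs from zero (and correcting for the many variant–gene pairs tested) is left out of this sketch.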
44. Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population
- Author
-
Yukuto Sato, Kengo Kinoshita, Xiaoqing Pan, Masayuki Yamamoto, Inaho Danjoh, Shin Ito, Mitsuyo Matsumoto, Kaoru Tsuda, Naoki Nariai, Tomo Saito, Matsuyuki Shirota, Rumiko Saito, Satoshi Nishikawa, Junji Yokozawa, Ikuko N. Motoike, Hisaaki Kudo, Masao Nagasaki, Ichiko Nishijima, Kaname Kojima, Naoko Minegishi, Osamu Tanabe, Jun Yasuda, Kazuhiko Igarashi, Nobuo Fuse, Sakae Saito, Fumiki Katsuoka, and Yumi Yamaguchi-Kabata
- Subjects
Male ,Semiconductor-type sequencer ,Population genetics ,Population ,Genomics ,Biology ,Polymorphism, Single Nucleotide ,Deep sequencing ,Single nucleotide variations ,Next-generation sequencer ,Human population genetics ,Genetics ,Humans ,Exome ,Exome sequencing ,Whole genome sequencing ,Whole-genome sequencing ,Base Composition ,Genome, Human ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Sequence Analysis, DNA ,SNP genotyping ,Semiconductors ,Female ,Biotechnology
- Abstract
Background Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously. Results Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×) and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%. Conclusions Our results show the versatility and usefulness of combining exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.
- Published
- 2014
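The validation study reports concordance between the two sequencing platforms of 83% to 97%, but the abstract does not define the metric. As an assumption, the sketch below uses one simple definition: the fraction of sites called on both platforms whose unphased genotypes agree. The site keys and genotype strings are hypothetical inputs, not the study's data.

```python
import math


def _normalize(genotype: str) -> tuple:
    """Treat '0/1', '1/0' and '0|1' as the same unordered genotype."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of sites called on both platforms whose genotypes agree.

    calls_a, calls_b: map (chrom, pos) -> genotype string such as "0/1".
    Returns NaN when the two call sets share no sites.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return math.nan
    agree = sum(_normalize(calls_a[s]) == _normalize(calls_b[s]) for s in shared)
    return agree / len(shared)


if __name__ == "__main__":
    wgs = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr1", 300): "0/0"}
    exome = {("chr1", 100): "1/0", ("chr1", 200): "0/1", ("chr2", 5): "0/1"}
    print(concordance(wgs, exome))  # 0.5
```

Restricting to shared sites keeps the metric independent of the smaller target region of exome sequencing; other definitions (e.g. counting only non-reference sites) would give different numbers.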
45. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing
- Author
-
Naoki Nariai, Kaname Kojima, Takahiro Mimori, Masao Nagasaki, Mamoru Takahashi, Yosuke Kawai, Yukuto Sato, and Yumi Yamaguchi-Kabata
- Subjects
Population ,MiSeq ,Statistics as Topic ,Detailed data ,Computational biology ,Biology ,Computer graphics ,Data cleaning ,User-Computer Interface ,Data sequences ,Software ,Genetics ,Computer Graphics ,Humans ,Quality (business) ,Graphical user interface ,Illumina HiSeq ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Automated analysis ,Sequence Analysis, DNA ,High-Throughput DNA Sequencing ,NGS ,Data mining ,Biotechnology
- Abstract
Background Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in the sequencing fluidics. Results We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during sequencing. Sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality improved. Conclusion The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-frequency cell populations, e.g., cancer cells, in samples affected by technical errors in the sequencing procedure.
- Published
- 2014
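SUGAR summarizes read quality over sub-regions of the flowcell to spot technical errors. The sketch below illustrates that idea under stated assumptions and is not SUGAR's implementation: it assumes Illumina read names of the form `instrument:run:flowcell:lane:tile:x:y` and Phred+33 qualities, and the subtile width (`bin_size`) and cutoff (`threshold`) are hypothetical parameters.

```python
from collections import defaultdict


def mean_quality_by_subtile(reads, bin_size=1000, offset=33):
    """Mean Phred score per (lane, tile, x-bin, y-bin) region of the flowcell.

    reads: iterable of (read_name, quality_string) pairs. Names are assumed to
    look like 'instrument:run:flowcell:lane:tile:x:y', optionally followed by
    a space and a comment. Qualities are assumed to be Phred+33 encoded.
    """
    totals = defaultdict(lambda: [0, 0])  # region -> [sum of scores, base count]
    for name, qual in reads:
        fields = name.split(" ")[0].split(":")
        lane, tile = fields[3], fields[4]
        x, y = int(fields[5]), int(fields[6])
        region = (lane, tile, x // bin_size, y // bin_size)
        totals[region][0] += sum(ord(c) - offset for c in qual)
        totals[region][1] += len(qual)
    return {region: s / n for region, (s, n) in totals.items() if n > 0}


def low_quality_regions(means, threshold=20.0):
    """Regions whose mean Phred score falls below a (hypothetical) cutoff."""
    return {region for region, m in means.items() if m < threshold}


if __name__ == "__main__":
    reads = [
        ("@M1:1:FC1:1:1101:500:600 1:N:0:1", "IIII"),   # Phred 40 bases
        ("@M1:1:FC1:1:1101:1500:600 1:N:0:1", "####"),  # Phred 2 bases
    ]
    means = mean_quality_by_subtile(reads)
    print(low_quality_regions(means))  # {('1', '1101', 1, 0)}
```

Reads whose names fall in a flagged region could then be dropped before mapping; the per-region means are also what a quality heatmap would plot.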
46. HapMonster: A Statistically Unified Approach for Variant Calling and Haplotyping Based on Phase-Informative Reads
- Author
-
Takahiro Mimori, Yosuke Kawai, Yukuto Sato, Yumi Yamaguchi-Kabata, Kaname Kojima, Masao Nagasaki, and Naoki Nariai
- Subjects
Computer science ,Sequencing data ,Haplotype ,Inference ,Computational biology ,DNA sequencing ,Coalescent theory
- Abstract
Haplotype phasing is essential for identifying disease-causing variants with phase-dependent interactions as well as for the coalescent-based inference of demographic history. One approach for estimating haplotypes is to use phase-informative reads, which span multiple heterozygous variant positions. Although the quality of estimated variants is crucial in haplotype phasing, accurate variant calling is still challenging due to errors in sequencing and read mapping. Because some of these errors can be corrected by considering haplotype phasing, simultaneous estimation of variants and haplotypes is important. Thus, we propose HapMonster, a statistically unified approach for variant calling and haplotype phasing in which haplotype phasing information is used to improve the accuracy of variant calling and the improved variant calls are used for more accurate haplotype phasing. From comparisons with other existing methods on simulated and real sequencing data, we confirm the effectiveness of HapMonster in both variant calling and haplotype phasing.
- Published
- 2014
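The HapMonster abstract defines phase-informative reads as reads that span multiple heterozygous variant positions. A minimal sketch of selecting such reads, assuming each read is summarized by one contiguous 0-based, half-open alignment interval on a single contig (it ignores paired-end gaps, split alignments, and bases deleted at a site); the read IDs and positions are hypothetical.

```python
from bisect import bisect_left


def phase_informative(reads, het_positions, min_sites=2):
    """Return IDs of reads whose aligned interval covers >= min_sites het positions.

    reads: iterable of (read_id, start, end), 0-based half-open intervals on one contig.
    het_positions: 0-based positions of heterozygous variants on the same contig.
    """
    hets = sorted(het_positions)
    selected = []
    for read_id, start, end in reads:
        # Binary search counts the heterozygous sites inside [start, end).
        n_sites = bisect_left(hets, end) - bisect_left(hets, start)
        if n_sites >= min_sites:
            selected.append(read_id)
    return selected


if __name__ == "__main__":
    reads = [("r1", 90, 160), ("r2", 140, 300), ("r3", 100, 401)]
    print(phase_informative(reads, [100, 150, 400]))  # ['r1', 'r3']
```

Sorting the heterozygous positions once and using binary search keeps the per-read cost logarithmic in the number of variants, which matters at whole-genome scale.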
47. SVEM: A Structural Variant Estimation Method Using Multi-mapped Reads on Breakpoints
- Author
-
Yumi Yamaguchi-Kabata, Yosuke Kawai, Kaname Kojima, Yukuto Sato, Masao Nagasaki, Naoki Nariai, Tetsuo Shibuya, Takahiro Mimori, and Tomohiko Ohtsuki
- Subjects
Computer science ,Population ,Breakpoint ,DNA sequencing ,Identification (information) ,Expectation–maximization algorithm ,Human genome ,Data mining ,Precision and recall ,Reference genome
- Abstract
Recent development of next-generation sequencing (NGS) technologies has led to the identification of structural variants (SVs) of genomic DNA existing in the human population. Several SV detection methods utilizing NGS data have been proposed. However, there are several difficulties in analyzing NGS data, particularly with regard to handling reads from duplicated loci or low-complexity sequences of the human genome. In this paper, we propose SVEM, a novel statistical method to detect SVs at single-nucleotide resolution that can utilize multi-mapped reads on breakpoints. SVEM estimates the number of reads on breakpoints as parameters, and mapping states as latent variables, using the expectation–maximization algorithm. This framework enables us to handle ambiguous mapping of reads without discarding information for SV detection. SVEM is applied to simulated and real data, and it achieves better performance than existing methods in terms of precision and recall.
- Published
- 2014
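SVEM treats each read's mapping state as a latent variable and estimates read amounts on breakpoints with the expectation–maximization (EM) algorithm. The sketch below shows the generic EM for distributing multi-mapped reads among candidate sequences (for example, a reference allele and an SV breakpoint allele). It is a simplified illustration, not SVEM's model: it assumes every compatible candidate explains a read equally well, and the candidate labels are hypothetical.

```python
def em_read_assignment(read_candidates, max_iter=1000, tol=1e-10):
    """Estimate the fraction of reads originating from each candidate.

    read_candidates: list of sets; each set holds the candidate IDs a read
    maps to equally well (a single-element set is a uniquely mapped read).
    Returns a dict of estimated proportions that sums to 1.
    """
    candidates = sorted({c for cands in read_candidates for c in cands})
    if not candidates:
        return {}
    theta = {c: 1.0 / len(candidates) for c in candidates}
    n_reads = len(read_candidates)
    for _ in range(max_iter):
        # E-step: split each read among its candidates in proportion to theta.
        expected = dict.fromkeys(candidates, 0.0)
        for cands in read_candidates:
            z = sum(theta[c] for c in cands)
            for c in cands:
                expected[c] += theta[c] / z
        # M-step: new proportions are expected read counts over total reads.
        new_theta = {c: expected[c] / n_reads for c in candidates}
        change = max(abs(new_theta[c] - theta[c]) for c in candidates)
        theta = new_theta
        if change < tol:
            break
    return theta


if __name__ == "__main__":
    reads = [{"ref"}] * 3 + [{"sv"}] + [{"ref", "sv"}] * 4
    print(em_read_assignment(reads))  # converges to about {'ref': 0.75, 'sv': 0.25}
```

Rather than discarding the four ambiguous reads, EM shares them in proportion to the evidence from uniquely mapped reads; multiplying a proportion by the read count gives the expected number of reads on that candidate.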
48. iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data
- Author
-
Mamoru Takahashi, Takahiro Mimori, Yumi Yamaguchi-Kabata, Kaname Kojima, Masao Nagasaki, Naoki Nariai, Yukuto Sato, and Akira Ono
- Subjects
Genetics ,Whole genome sequencing ,Time Factors ,Genome, Human ,Research ,Applied Mathematics ,Pipeline (computing) ,High-Throughput Nucleotide Sequencing ,Genomics ,Computational biology ,Biology ,Genome ,DNA sequencing ,Computer Science Applications ,Structural Biology ,Modelling and Simulation ,Modeling and Simulation ,Humans ,Human genome ,1000 Genomes Project ,International HapMap Project ,Molecular Biology ,Algorithms ,Sequence Deletion
- Abstract
Background: Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature of human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although the progress of next-generation sequencing (NGS) technologies has led to the discovery of a large number of SVs, accurate and genome-wide detection of SVs remains challenging. Thus far, various calling algorithms based on NGS data have been proposed. However, their strategies are diverse, and no tool is able to detect the full range of SVs accurately. Results: We focused on evaluating the performance of existing deletion calling algorithms across various size ranges using low- to high-coverage simulation data. The simulation data were generated from a whole-genome sequence with artificial SVs constructed based on the distribution of variants obtained from the 1000 Genomes Project. In the simulation analysis, deletion calls of various sizes were obtained with each caller, and performance differed considerably depending on the type of algorithm and the targeted deletion size. Based on these results, we propose an integrated structural variant calling pipeline (iSVP) that combines existing methods with a newly devised filtering and merging process. It achieved highly accurate deletion calling, with >90% precision and >90% recall on 30× read data across a broad range of sizes. We applied iSVP to the whole-genome sequence data of a CEU HapMap sample and detected a large number of deletions, including notable peaks around 300 bp and 6,000 bp, which corresponded to Alus and long interspersed nuclear elements, respectively. In addition, many of the predicted deletions were highly consistent with deletions experimentally validated in other studies. Conclusions: We present iSVP, a new deletion calling pipeline that obtains a genome-wide landscape of deletions in a highly accurate manner.
From simulation and real data analysis, we show that iSVP is broadly applicable to human whole-genome sequencing data, which will elucidate relationships between SVs across genomes and associated diseases or biological functions.
- Published
- 2013
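iSVP combines deletion calls from several existing callers through a filtering and merging process whose details are not given in the abstract. As a hedged stand-in, the sketch below merges calls with a 50% reciprocal-overlap rule, a widely used convention for matching SVs that is not necessarily what iSVP does. The greedy clustering, the caller names, and the "at least two callers" filter in the usage example are all hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for half-open intervals a and b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster deletion calls that overlap reciprocally by >= min_ro.

    calls: list of (chrom, start, end, caller). Each call joins the first
    existing cluster on the same chromosome whose representative interval it
    overlaps sufficiently; otherwise it starts a new cluster.
    Returns (chrom, start, end, set_of_callers) tuples using representative coordinates.
    """
    clusters = []
    for chrom, start, end, caller in sorted(calls):
        for cluster in clusters:
            same_chrom = cluster[0] == chrom
            if same_chrom and reciprocal_overlap((cluster[1], cluster[2]), (start, end)) >= min_ro:
                cluster[3].add(caller)
                break
        else:
            clusters.append([chrom, start, end, {caller}])
    return [tuple(cluster) for cluster in clusters]


if __name__ == "__main__":
    calls = [
        ("chr1", 1000, 1300, "caller1"),
        ("chr1", 1020, 1310, "caller2"),
        ("chr1", 5000, 11000, "caller1"),
        ("chr2", 1000, 1300, "caller3"),
    ]
    merged = merge_calls(calls)
    # Hypothetical filter: keep deletions supported by at least two callers.
    print([m for m in merged if len(m[3]) >= 2])
```

Reciprocal overlap (rather than any overlap) prevents a small deletion from being merged into a much larger one that merely contains it.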
49. A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads
- Author
-
Yumi Yamaguchi-Kabata, Takahiro Mimori, Masao Nagasaki, Kaname Kojima, Mamoru Takahashi, Naoki Nariai, and Yukuto Sato
- Subjects
Statistics and Probability ,Genotyping Techniques ,Sequence analysis ,Pedigree information ,Computational biology ,Biology ,Biochemistry ,Polymorphism, Single Nucleotide ,Humans ,Molecular Biology ,Genotyping ,Sequence (medicine) ,Genetics ,Models, Statistical ,Haplotype ,Genetic Variation ,Coverage data ,Genomics ,Sequence Analysis, DNA ,New variant ,Computer Science Applications ,Pedigree ,Computational Mathematics ,Computational Theory and Mathematics ,Haplotypes
- Abstract
Motivation: Variant calling from genome-wide sequencing data is essential for the analysis of disease-causing mutations and elucidation of disease mechanisms. However, variant calling in low-coverage regions is difficult due to sequence read errors and mapping errors. Hence, variant calling approaches that are robust to low-coverage data are needed. Results: We propose a new variant calling approach that considers pedigree information and haplotyping based on sequence reads spanning two or more heterozygous positions, termed phase informative reads. In our approach, genotyping and haplotyping, in which each read is assigned to a haplotype based on phase informative reads, are performed simultaneously. Therefore, positions with low evidence for heterozygosity are rescued by phase informative reads, and such rescued positions contribute to haplotyping in a synergistic way. In addition, pedigree information supports more accurate haplotyping as well as genotyping, especially in low-coverage regions. Although heterozygous positions are useful for haplotyping, homozygous positions are not informative and weaken the information from heterozygous positions, as the majority of positions are homozygous. Thus, we introduce latent variables that determine zygosity at each position to filter out homozygous positions for haplotyping. In a performance evaluation with parent–offspring trio sequencing data, our approach outperforms existing approaches in accuracy, measured as agreement with single nucleotide polymorphism array genotyping results. Also, a performance analysis considering the distance between variants showed that the use of phase informative reads is effective for accurate variant calling, and further performance improvement is expected with longer sequencing reads. Contact: nagasaki@megabank.tohoku.ac.jp or kojima@megabank.tohoku.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2013
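The pedigree-aware approach above uses trio information inside a probabilistic model. A much simpler, deterministic check that is often used as a sanity filter on trio calls is Mendelian consistency; the sketch below implements that swapped-in technique, not the authors' method. It assumes diploid, autosomal, unphased genotypes given as pairs of allele indices, ignores de novo mutations, and uses hypothetical site labels.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's unphased genotype can be formed from one maternal
    and one paternal allele. Genotypes are 2-tuples of allele indices, e.g. (0, 1)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_errors(trio_calls):
    """Sites whose child genotype is inconsistent with the parents.

    trio_calls: map site -> (child, mother, father) genotype tuples.
    """
    return [site for site, (c, m, f) in trio_calls.items()
            if not mendelian_consistent(c, m, f)]


if __name__ == "__main__":
    calls = {
        "site1": ((0, 1), (0, 0), (1, 1)),  # one allele from each parent
        "site2": ((1, 1), (0, 0), (0, 1)),  # mother cannot contribute allele 1
    }
    print(mendelian_errors(calls))  # ['site2']
```

Such violations flag likely genotyping errors, which is one reason pedigree information helps most in low-coverage regions where single-sample calls are least reliable.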
50. Wnt3a stimulates maturation of impaired neutrophils developed from severe congenital neutropenia patient-derived pluripotent stem cells
- Author
-
Takafumi Hiramoto, Kazuhiro Nakamura, Masao Kobayashi, Yasuhiro Ebihara, Hiromitsu Nakauchi, Yoichi Furukawa, Kohichiro Tsuji, Naoki Nariai, Shinji Mochizuki, Kiyoshi Yamaguchi, Yoko Mizoguchi, Kazuko Ueno, Masao Nagasaki, Shohei Yamamoto, and Kenzaburo Tani
- Subjects
Pluripotent Stem Cells ,endocrine system ,animal structures ,Neutropenia ,Neutrophils ,Cellular differentiation ,Granulopoiesis ,Polymerase Chain Reaction ,Wnt3A Protein ,Granulocyte Colony-Stimulating Factor ,Humans ,Induced pluripotent stem cell ,Congenital Neutropenia ,Multidisciplinary ,Dose-Response Relationship, Drug ,Reverse Transcriptase Polymerase Chain Reaction ,Elastase ,Cell Differentiation ,Biological Sciences ,Granulocyte colony-stimulating factor ,nervous system ,Neutrophil elastase ,Immunology ,Mutation ,sense organs ,Leukocyte Elastase ,hormones, hormone substitutes, and hormone antagonists
- Abstract
The derivation of induced pluripotent stem (iPS) cells from individuals with genetic disorders offers new opportunities for basic research into these diseases and the development of therapeutic compounds. Severe congenital neutropenia (SCN) is a serious disorder characterized by severe neutropenia at birth. SCN is associated with heterozygous mutations in the neutrophil elastase [elastase, neutrophil-expressed (ELANE)] gene, but the mechanisms that disrupt neutrophil development have not yet been clarified because of the current lack of an appropriate disease model. Here, we generated iPS cells from an individual with SCN (SCN-iPS cells). Granulopoiesis from SCN-iPS cells revealed neutrophil maturation arrest and little sensitivity to granulocyte-colony stimulating factor, reflecting the disease status of SCN. Molecular analysis of granulopoiesis from SCN-iPS cells vs. control iPS cells showed reduced expression of genes related to the wingless-type mmtv integration site family, member 3a (Wnt3a)/β-catenin pathway [e.g., lymphoid enhancer-binding factor 1], whereas Wnt3a administration induced elevated lymphoid enhancer-binding factor 1 expression and the maturation of SCN-iPS cell-derived neutrophils. These results indicate that SCN-iPS cells provide a useful disease model for SCN, and that activation of the Wnt3a/β-catenin pathway may offer a novel therapy for SCN with ELANE mutation.
- Published
- 2013
Discovery Service for Jio Institute Digital Library