19 results on '"Basile AO"'
Search Results
2. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.
- Author
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson ZB, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN Jr, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, and Miller DE
- Subjects
- Humans, Human Genome Project, Polymorphism, Single Nucleotide, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Nanopore Sequencing methods, Genome, Human, Genetic Variation
- Abstract
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs. (© 2024 Gustafson et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2024
- Full Text
- View/download PDF
3. Multi-omic analysis of Huntington's disease reveals a compensatory astrocyte state.
- Author
Paryani F, Kwon JS, Ng CW, Jakubiak K, Madden N, Ofori K, Tang A, Lu H, Xia S, Li J, Mahajan A, Davidson SM, Basile AO, McHugh C, Vonsattel JP, Hickman R, Zody MC, Housman DE, Goldman JE, Yoo AS, Menon V, and Al-Dalahmah O
- Subjects
- Humans, Huntingtin Protein genetics, Huntingtin Protein metabolism, Male, Female, Lipidomics methods, Middle Aged, Metallothionein metabolism, Metallothionein genetics, Brain metabolism, Brain pathology, Lipid Metabolism, Aged, Multiomics, Huntington Disease metabolism, Huntington Disease genetics, Huntington Disease pathology, Astrocytes metabolism, Astrocytes pathology, Neurons metabolism
- Abstract
The mechanisms underlying the selective regional vulnerability to neurodegeneration in Huntington's disease (HD) have not been fully defined. To explore the role of astrocytes in this phenomenon, we used single-nucleus and bulk RNAseq, lipidomics, HTT gene CAG repeat-length measurements, and multiplexed immunofluorescence on HD and control post-mortem brains. We identified genes that correlated with CAG repeat length, which were enriched in astrocyte genes, and lipidomic signatures that implicated poly-unsaturated fatty acids in sensitizing neurons to cell death. Because astrocytes play essential roles in lipid metabolism, we explored the heterogeneity of astrocytic states in both protoplasmic and fibrous-like (CD44+) astrocytes. Significantly, one protoplasmic astrocyte state showed high levels of metallothioneins and was correlated with the selective vulnerability of distinct striatal neuronal populations. When modeled in vitro, this state improved the viability of HD-patient-derived spiny projection neurons. Our findings uncover key roles of astrocytic states in protecting against neurodegeneration in HD. (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
4. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.
- Author
Gustafson JA, Gibson SB, Damaraju N, Zalusky MP, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SH, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN Jr, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, and Miller DE
- Abstract
Fewer than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs. Competing Interests: WDC, ML, FS, and DEM have received research support and/or consumables from ONT. WDC, JG, FS, and DEM have received travel funding to speak on behalf of ONT. DEM is on a scientific advisory board at ONT. FS has received research support from Illumina, Genentech, and PacBio. SBM is an advisor to BioMarin, MyOme, and Tenaya Therapeutics. EEE is a scientific advisory board (SAB) member of Variant Bio, Inc. DEM holds stock options in MyOme.
- Published
- 2024
- Full Text
- View/download PDF
5. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
- Author
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, and Zody MC
- Subjects
- Female, High-Throughput Nucleotide Sequencing methods, Humans, INDEL Mutation, Male, Polymorphism, Single Nucleotide, Genome, Human, Whole Genome Sequencing
- Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies. Competing Interests: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. P.F. is an SAB member of Fabric Genomics, Inc., and Eagle Genomics, Ltd. (Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2022
- Full Text
- View/download PDF
6. Retromer dysfunction in amyotrophic lateral sclerosis.
- Author
Pérez-Torres EJ, Utkina-Sosunova I, Mishra V, Barbuti P, De Planell-Saguer M, Dermentzaki G, Geiger H, Basile AO, Robine N, Fagegaltier D, Politi KA, Rinchetti P, Jackson-Lewis V, Harms M, Phatnani H, Lotti F, and Przedborski S
- Subjects
- Animals, Disease Models, Animal, Humans, Mice, Mice, Transgenic, Spinal Cord metabolism, Superoxide Dismutase-1 genetics, Superoxide Dismutase-1 metabolism, Amyotrophic Lateral Sclerosis metabolism, Vesicular Transport Proteins genetics, Vesicular Transport Proteins metabolism
- Abstract
Retromer is a heteropentameric complex that plays a specialized role in endosomal protein sorting and trafficking. Here, we report a reduction in the retromer proteins vacuolar protein sorting 35 (VPS35), VPS26A, and VPS29 in patients with amyotrophic lateral sclerosis (ALS) and in the ALS model provided by transgenic (Tg) mice expressing the mutant superoxide dismutase-1 G93A. These changes are accompanied by a reduction of levels of the α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptor subunit GluA1, a proxy of retromer function, in spinal cords from Tg SOD1G93A mice. Correction of the retromer deficit by a viral vector expressing VPS35 exacerbates the paralytic phenotype in Tg SOD1G93A mice. Conversely, lowering Vps35 levels in Tg SOD1G93A mice ameliorates the disease phenotype. In light of these findings, we propose that mild alterations in retromer inversely modulate neurodegeneration propensity in ALS.
- Published
- 2022
- Full Text
- View/download PDF
7. Haplotype-resolved diverse human genomes and integrated analysis of structural variation.
- Author
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, and Eichler EE
- Subjects
- Female, Genotype, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, Interspersed Repetitive Sequences, Male, Population Groups genetics, Quantitative Trait Loci, Retroelements, Sequence Analysis, DNA, Sequence Inversion, Whole Genome Sequencing, Genetic Variation, Genome, Human, Haplotypes
- Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population. (Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.)
- Published
- 2021
- Full Text
- View/download PDF
8. A Polygenic and Phenotypic Risk Prediction for Polycystic Ovary Syndrome Evaluated by Phenome-Wide Association Studies.
- Author
Joo YY, Actkins K, Pacheco JA, Basile AO, Carroll R, Crosslin DR, Day F, Denny JC, Velez Edwards DR, Hakonarson H, Harley JB, Hebbring SJ, Ho K, Jarvik GP, Jones M, Karaderi T, Mentch FD, Meun C, Namjou B, Pendergrass S, Ritchie MD, Stanaway IB, Urbanek M, Walunas TL, Smith M, Chisholm RL, Kho AN, Davis L, and Hayes MG
- Subjects
- Adolescent, Aged, Case-Control Studies, Child, Electronic Health Records, Female, Follow-Up Studies, Genetic Predisposition to Disease, Humans, Middle Aged, Polycystic Ovary Syndrome epidemiology, Polycystic Ovary Syndrome genetics, Prognosis, Risk Factors, Algorithms, Genome-Wide Association Study, Multifactorial Inheritance genetics, Phenomics methods, Phenotype, Polycystic Ovary Syndrome diagnosis
- Abstract
Context: As many as 75% of patients with polycystic ovary syndrome (PCOS) are estimated to be unidentified in clinical practice. Objective: Utilizing polygenic risk prediction, we aim to identify the phenome-wide comorbidity patterns characteristic of PCOS to improve accurate diagnosis and preventive treatment. Design, Patients, and Methods: Leveraging the electronic health records (EHRs) of 124,852 individuals, we developed a PCOS risk prediction algorithm by combining polygenic risk scores (PRS) with PCOS component phenotypes into a polygenic and phenotypic risk score (PPRS). We evaluated its predictive capability across different ancestries and performed a PRS-based phenome-wide association study (PheWAS) to assess the phenomic expression of the heightened risk of PCOS. Results: The integrated polygenic prediction improved the average performance (pseudo-R²) for PCOS detection by 0.228 (61.5-fold), 0.224 (58.8-fold), and 0.211 (57.0-fold) over the null model across European, African, and multi-ancestry participants, respectively. The subsequent PRS-powered PheWAS identified a high level of shared biology between PCOS and a range of metabolic and endocrine outcomes, especially with obesity and diabetes: "morbid obesity", "type 2 diabetes", "hypercholesterolemia", "disorders of lipid metabolism", "hypertension", and "sleep apnea" reaching phenome-wide significance. Conclusions: Our study has expanded the methodological utility of PRS in patient stratification and risk prediction, especially in a multifactorial condition like PCOS, across different genetic origins. By utilizing the individual genome-phenome data available from the EHR, our approach also demonstrates that polygenic prediction by PRS can provide valuable opportunities to discover the pleiotropic phenomic network associated with PCOS pathogenesis. (© Endocrine Society 2020. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2020
- Full Text
- View/download PDF
9. Artificial Intelligence for Drug Toxicity and Safety.
- Author
Basile AO, Yahi A, and Tatonetti NP
- Subjects
- Animals, Drug Evaluation, Preclinical, Drug-Related Side Effects and Adverse Reactions prevention & control, Humans, Machine Learning, Pharmacovigilance, Product Surveillance, Postmarketing, Quantitative Structure-Activity Relationship, Toxicity Tests, Adverse Drug Reaction Reporting Systems, Artificial Intelligence
- Abstract
Interventional pharmacology is one of medicine's most potent weapons against disease. These drugs, however, can result in damaging side effects and must be closely monitored. Pharmacovigilance is the field of science that monitors, detects, and prevents adverse drug reactions (ADRs). Safety efforts begin during the development process, using in vivo and in vitro studies, continue through clinical trials, and extend to postmarketing surveillance of ADRs in real-world populations. Future toxicity and safety challenges, including increased polypharmacy and patient diversity, stress the limits of these traditional tools. Massive amounts of newly available data present an opportunity for using artificial intelligence (AI) and machine learning to improve drug safety science. Here, we explore recent advances as applied to preclinical drug safety and postmarketing surveillance with a specific focus on machine and deep learning (DL) approaches. (Copyright © 2019 Elsevier Ltd. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
10. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico.
- Author
Zhang X, Basile AO, Pendergrass SA, and Ritchie MD
- Subjects
- Humans, Models, Genetic, Research Design, Computer Simulation standards, Genetic Association Studies methods, Sample Size
- Abstract
Background: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression. Conclusions: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
- Published
- 2019
- Full Text
- View/download PDF
11. Informatics and machine learning to define the phenotype.
- Author
Basile AO and Ritchie MD
- Subjects
- Humans, Genetic Predisposition to Disease, Genome-Wide Association Study methods, Machine Learning, Phenotype
- Abstract
Introduction: For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns. Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed. Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data. Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.
- Published
- 2018
- Full Text
- View/download PDF
12. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants.
- Author
Basile AO, Byrska-Bishop M, Wallace J, Frase AT, and Ritchie MD
- Subjects
- Algorithms, Genomics methods, Genetic Association Studies methods, Genetic Variation, Software
- Abstract
Motivation: BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results: In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and Implementation: The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download. Contact: mdritchie@geisinger.edu. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2017. Published by Oxford University Press.)
- Published
- 2018
- Full Text
- View/download PDF
13. Session Introduction: Challenges of Pattern Recognition in Biomedical Data.
- Author
Verma SS, Verma A, Basile AO, Bishop MB, and Darabos C
- Abstract
The analysis of large biomedical data often presents with various challenges related to not just the size of the data, but also to data quality issues such as heterogeneity, multidimensionality, noisiness, and incompleteness of the data. The data-intensive nature of computational genomics problems in biomedical informatics warrants the development and use of massive computer infrastructure and advanced software tools and platforms, including but not limited to the use of cloud computing. Our session aims to address these challenges in handling big data for designing a study, performing analysis, and interpreting outcomes of these analyses. These challenges have been prevalent in many studies including those which focus on the identification of novel genetic variant-phenotype associations using data from sources like Electronic Health Records (EHRs) or multi-omic data. One of the biggest challenges to focus on is the imperfect nature of the biomedical data where a lot of noise and sparseness is observed. In our session, we will present research articles that can help in identifying innovative ways to recognize and overcome newly arising challenges associated with pattern recognition in biomedical data.
- Published
- 2018
14. PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies.
- Author
Hall MA, Wallace J, Lucas A, Kim D, Basile AO, Verma SS, McCarty CA, Brilliant MH, Peissig PL, Kitchner TE, Verma A, Pendergrass SA, Dudek SM, Moore JH, and Ritchie MD
- Subjects
- Alcohol Drinking, Alleles, Databases, Genetic, Diabetes Mellitus, Type 2 genetics, Diet, Epistasis, Genetic, Gene Deletion, Gene Dosage, Gene-Environment Interaction, Genomics, Genotype, Glutamate Decarboxylase genetics, Humans, Models, Genetic, Phenotype, Polymorphism, Single Nucleotide, Programming Languages, Recurrence, Sequence Analysis, DNA, Software, Surveys and Questionnaires, Computational Biology, Genome, Human, Genome-Wide Association Study
- Abstract
Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene-environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits.
- Published
- 2017
- Full Text
- View/download PDF
15. Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease.
- Author
Kim D, Basile AO, Bang L, Horgusluoglu E, Lee S, Ritchie MD, Saykin AJ, and Nho K
- Subjects
- Aged, Aged, 80 and over, Biomarkers, Exons, Female, Genome-Wide Association Study, Genomics, Humans, Male, Middle Aged, Neuroimaging, Phenotype, Alzheimer Disease diagnostic imaging, Alzheimer Disease genetics, Data Mining methods
- Abstract
Background: Rapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer's disease (LOAD). Methods: For rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (minor allele frequency (MAF) < 0.05), and potential confounding factors for entorhinal cortex thickness (age, gender, years of education, intracranial volume (ICV) and MRI field strength) were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons. Results: Our knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In addition, the approach identified 7 evolutionary conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1-42 (p-value < 0.05). Conclusions: Our novel binning approach identified rare variants in FANCC as well as 7 evolutionary conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (Fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex diseases.
- Published
- 2017
- Full Text
- View/download PDF
16. PATTERNS IN BIOMEDICAL DATA-HOW DO WE FIND THEM?
- Author
Basile AO, Verma A, Byrska-Bishop M, Pendergrass SA, Darabos C, and Lester Kirchner H
- Abstract
Given the exponential growth of biomedical data, researchers are faced with numerous challenges in extracting and interpreting information from these large, high-dimensional, incomplete, and often noisy data. To facilitate addressing this growing concern, the "Patterns in Biomedical Data-How do we find them?" session of the 2017 Pacific Symposium on Biocomputing (PSB) is devoted to exploring pattern recognition using data-driven approaches for biomedical and precision medicine applications. The papers selected for this session focus on novel machine learning techniques as well as applications of established methods to heterogeneous data. We also feature manuscripts aimed at addressing the current challenges associated with the analysis of biomedical data.
- Published
- 2017
- Full Text
- View/download PDF
17. A biologically informed method for detecting rare variant associations.
- Author
Moore CCB, Basile AO, Wallace JR, Frase AT, and Ritchie MD
- Abstract
Background: BioBin is a bioinformatics software package developed to automate the process of binning rare variants into groups for statistical association analysis using a biological knowledge-driven framework. BioBin collapses variants into biological features such as genes, pathways, evolutionary conserved regions (ECRs), protein families, regulatory regions, and others based on user-designated parameters. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion thereby circumventing the necessity for advanced and time consuming scripting. Purpose of the Study: In this manuscript, we describe the software package for BioBin, along with type I error and power simulations to demonstrate the strengths and various customizable features and analysis options of this variant binning tool. Results: Simulation testing highlights the utility of BioBin as a fast, comprehensive and expandable tool for the biologically-inspired binning and analysis of low-frequency variants in sequence data. Conclusions and Potential Implications: The BioBin software package has the capability to transform and streamline the analysis pipelines for researchers analyzing rare variants. This automated bioinformatics tool minimizes the manual effort of creating genomic regions for binning such that time can be spent on the much more interesting task of statistical analyses. This software package is open source and freely available from http://ritchielab.com/software/biobin-download.
- Published
- 2016
18. Phenome-Wide Association Study to Explore Relationships between Immune System Related Genetic Loci and Complex Traits and Diseases.
- Author
-
Verma A, Basile AO, Bradford Y, Kuivaniemi H, Tromp G, Carey D, Gerhard GS, Crowe JE Jr, Ritchie MD, and Pendergrass SA
- Subjects
- Ankyrins genetics, Diabetes Mellitus, Type 2 genetics, Diabetes Mellitus, Type 2 pathology, Electronic Health Records, Genetic Loci, Genotype, Humans, Linkage Disequilibrium, Nerve Tissue Proteins genetics, Phenotype, Polymorphism, Single Nucleotide, Respiratory Tract Infections genetics, Respiratory Tract Infections pathology, Sinusitis genetics, Sinusitis pathology, Tumor Necrosis Factor-alpha genetics, Genetic Association Studies, Immune System metabolism
- Abstract
We performed a Phenome-Wide Association Study (PheWAS) to identify interrelationships between the immune system genetic architecture and a wide array of phenotypes from two de-identified electronic health record (EHR) biorepositories. We selected variants within genes encoding critical factors in the immune system and variants with known associations with autoimmunity. To define case/control status for EHR diagnoses, we used International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from 3,024 Geisinger Clinic MyCode® subjects (470 diagnoses) and 2,899 Vanderbilt University Medical Center BioVU biorepository subjects (380 diagnoses). A pooled analysis was also carried out for the replicating results of the two data sets. We identified new associations with potential biological relevance including SNPs in tumor necrosis factor (TNF) and ankyrin-related genes associated with acute and chronic sinusitis and acute respiratory tract infection. The two most significant associations identified were for the C6orf10 SNP rs6910071 and "rheumatoid arthritis" (ICD-9 code category 714) (pMETAL = 2.58 × 10⁻⁹) and the ATN1 SNP rs2239167 and "diabetes mellitus, type 2" (ICD-9 code category 250) (pMETAL = 6.39 × 10⁻⁹). This study highlights the utility of using PheWAS in conjunction with EHRs to discover new genotypic-phenotypic associations for immune-system related genetic loci.
- Published
- 2016
19. KNOWLEDGE DRIVEN BINNING AND PHEWAS ANALYSIS IN MARSHFIELD PERSONALIZED MEDICINE RESEARCH PROJECT USING BIOBIN.
- Author
-
Basile AO, Wallace JR, Peissig P, McCarty CA, Brilliant M, and Ritchie MD
- Subjects
- Computational Biology methods, Computational Biology statistics & numerical data, Computer Simulation, Databases, Genetic statistics & numerical data, Genetic Variation, High-Throughput Nucleotide Sequencing statistics & numerical data, Humans, Knowledge Bases, Models, Genetic, Models, Statistical, Pharmacogenetics statistics & numerical data, Precision Medicine statistics & numerical data, Genome-Wide Association Study statistics & numerical data, Phenotype, Software
- Abstract
Next-generation sequencing technology has presented an opportunity for rare variant discovery and association of these variants with disease. To address the challenges of rare variant analysis, multiple statistical methods have been developed for combining rare variants to increase statistical power for detecting associations. BioBin is an automated tool that expands on collapsing/binning methods by performing multi-level variant aggregation with a flexible, biologically informed binning strategy using an internal biorepository, the Library of Knowledge (LOKI). The databases within LOKI provide variant details, regional annotations and pathway interactions which can be used to generate bins of biologically-related variants, thereby increasing the power of any subsequent statistical test. In this study, we expand the framework of BioBin to incorporate statistical tests, including a dispersion-based test, SKAT, thereby providing the option of performing a unified collapsing and statistical rare variant analysis in one tool. Extensive simulation studies performed on gene-coding regions showed a Bin-KAT analysis to have greater power than BioBin-regression in all simulated conditions, including variants influencing the phenotype in the same direction, a scenario where burden tests often retain greater power. The use of Madsen-Browning variant weighting increased power in the burden analysis to a level comparable to Bin-KAT; but overall Bin-KAT retained equivalent or higher power under all conditions. Bin-KAT was applied to a study of 82 pharmacogenes sequenced in the Marshfield Personalized Medicine Research Project (PMRP). We looked for association of these genes with 9 different phenotypes extracted from the electronic health record. This study demonstrates that Bin-KAT is a powerful tool for the identification of genes harboring low frequency variants for complex phenotypes.
- Published
- 2016