Author: "Taedong Yun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Taedong Yun"' showing total 15 results

1. Improving variant calling using population data and deep learning

Author: Nae-Chyun Chen, Alexey Kolesnikov, Sidharth Goel, Taedong Yun, Pi-Chuan Chang, and Andrew Carroll
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic clinvar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.
Published: 2023
Full Text: View/download PDF

2. DeepNull models non-linear covariate effects to improve phenotypic prediction and association power

Author: Zachary R. McCaw, Thomas Colthurst, Taedong Yun, Nicholas A. Furlotte, Andrew Carroll, Babak Alipanahi, Cory Y. McLean, and Farhad Hormozdiari
Subjects: Science
Abstract: GWAS often assume a linear phenotype-covariate relationship which may not hold in practice. Here the authors present DeepNull, in which they apply deep learning to identify and adjust for complex non-linear relationships, improving phenotypic prediction and GWAS power.
Published: 2022
Full Text: View/download PDF

3. A population-specific reference panel for improved genotype imputation in African Americans

Author: Jared O’Connell, Taedong Yun, Meghan Moreno, Helen Li, Nadia Litterman, Alexey Kolesnikov, Elizabeth Noblin, Pi-Chuan Chang, Anjali Shastri, Elizabeth H. Dorfman, Suyash Shringarpure, 23andMe Research Team, Adam Auton, Andrew Carroll, and Cory Y. McLean
Subjects: Biology (General), QH301-705.5
Abstract: O’Connell et al. construct a new genome-wide imputation reference panel comprising 2,269 individuals of Sub-Saharan African ancestries. They adapt DeepVariant to create best practices for reference panel development and generate a high quality, publicly available resource that will further empower high resolution genome-wide imputation efforts in individuals of African ancestries.
Published: 2021
Full Text: View/download PDF

4. Balanced labellings of affine permutations

Author: Hwanchul Yoo and Taedong Yun
Subjects: affine permutations, permutation diagrams, balanced labellings, reduced words, stanley symmetric functions, [info.info-dm] computer science [cs]/discrete mathematics [cs.dm], Mathematics, QA1-939
Abstract: We study the $\textit{diagrams}$ of affine permutations and their $\textit{balanced}$ labellings. As in the finite case, which was investigated by Fomin, Greene, Reiner, and Shimozono, the balanced labellings give a natural encoding of reduced decompositions of affine permutations. In fact, we show that the sum of weight monomials of the $\textit{column strict}$ balanced labellings is the affine Stanley symmetric function defined by Lam and we give a simple algorithm to recover reduced words from balanced labellings. Applying this theory, we give a necessary and sufficient condition for a diagram to be an affine permutation diagram. Finally, we conjecture that if two affine permutations are $\textit{diagram equivalent}$ then their affine Stanley symmetric functions coincide.
Published: 2013
Full Text: View/download PDF

5. SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Author: Steve Yadlowsky, Taedong Yun, Cory Y. McLean, and Alexander D'Amour
Published: 2021

6. Unsupervised representation learning improves genomic discovery for lung function and respiratory disease prediction

Author: Taedong Yun, Justin Cosentino, Babak Behsaz, Zachary R. McCaw, Davin Hill, Robert Luben, Dongbing Lai, John Bates, Howard Yang, Tae-Hwi Schwantes-An, Anthony P. Khawaja, Andrew Carroll, Brian D. Hobbs, Michael H. Cho, Cory Y. McLean, and Farhad Hormozdiari
Subjects: Article
Abstract: Background: High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, accurately phenotyping high-dimensional clinical data remains a major impediment to genetic discovery. Methods: We introduce a general deep learning framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data and can also incorporate expert clinical metrics. We demonstrate the utility of REGLE by application to spirograms, which measure lung function. We generate two types of synthetic representations of pulmonary functions we call spirogram encodings (SPINCs) and residual spirogram encodings (RSPINCs). Findings: Genome-wide association studies on (R)SPINCs identify more genome-wide significant loci than existing methods while replicating most known lung function loci. Furthermore, (R)SPINCs are associated with overall survival and, under the latent causal variable model, they exhibit significantly high genetic causality proportion with asthma, chronic obstructive pulmonary disease (COPD), and inflammatory diseases. Finally, we construct a set of polygenic risk scores (PRS) that are generally predictive of pulmonary traits and diseases. We demonstrate superior performance predicting asthma and COPD, in multiple ancestries and across four biobanks, compared to PRSs constructed using expert-defined pulmonary function measurements. Interpretation: REGLE is a method for generating low-dimensional, disentangled representations of high-dimensional clinical data that does not require labels, and improves upon expert-defined phenotypes for genetic discovery and disease prediction. It can flexibly incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific PRS in datasets which have minimal expert phenotyping. (R)SPINCs are quantifying clinically relevant features that are not currently captured in a standardized or automated way. Funding: Google LLC.
Published: 2023

7. Deep Learning Utilizing Suboptimal Spirometry Data to Improve Lung Function and Mortality Prediction in the UK Biobank

Author: Davin Hill, Max Torop, Aria Masoomi, Peter J. Castaldi, Edwin K. Silverman, Sandeep Bodduluri, Surya P. Bhatt, Taedong Yun, Cory Y. McLean, Farhad Hormozdiari, Jennifer Dy, Michael H. Cho, and Brian D. Hobbs
Subjects: Article
Abstract: Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of airflow obstruction and all-cause mortality. Methods: We evaluated volume-time spirometry data from the UK Biobank. We identified “best” spirometry efforts as those passing QC with the maximum FVC. “Discarded” efforts were either submaximal or failed QC. To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). In a held-out 20% testing subset we applied the Spiro-CLF representation of a participant’s overall lung function to 1) binary predictions of FEV1/FVC < 0.7 and FEV1 Percent Predicted (FEV1PP) < 80%, indicative of airflow obstruction, and 2) Cox regression for all-cause mortality. Findings: We included 940,705 volume-time curves from 352,684 UK Biobank participants with 2-3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 24.1% failed QC and 37.5% were submaximal. Spiro-CLF prediction of FEV1/FVC < 0.7 utilizing discarded spirometry efforts had an Area under the Receiver Operating Characteristics (AUROC) of 0.981 (0.863 for FEV1PP prediction). Incorporating discarded spirometry efforts in all-cause mortality prediction was associated with a concordance index (c-index) of 0.654, which exceeded the c-indices from FEV1 (0.590), FVC (0.559), or FEV1/FVC (0.599) from each participant’s single best effort. Interpretation: A contrastive learning model using raw spirometry curves can accurately predict lung function using submaximal and QC-failing efforts. This model also has superior prediction of all-cause mortality compared to standard lung function measurements. Funding: MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856. BDH is supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant. DH is supported by NIH 2T32HL007427-41. EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, P01 HL132825, and P01 HL114501. PJC is supported by NIH R01HL124233 and R01HL147326. SPB is supported by NIH R01HL151421 and UH3HL155806. TY, FH, and CYM are employees of Google LLC.
Published: 2023
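
Note on the concordance index cited in the abstract above: the following is a minimal sketch of Harrell's c-index for right-censored survival data, the usual definition when a c-index is reported for a Cox model. It is not the authors' code. The function name `concordance_index`, the toy data, and the choice to skip pairs with tied survival times are all assumptions made for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per participant
    events : 1 if death was observed, 0 if censored
    risks  : model risk score (higher means earlier death is predicted)

    A pair (i, j) is comparable when participant i had an observed event
    strictly before participant j's follow-up time. Pairs with tied times
    are skipped here for simplicity (an assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four participants, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
print(perfect, mixed)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.654 and 0.590 are compared.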

8. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer

Author: Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-López, Quentin Berthet, Anastasiya Belyaeva, Armin Töpfer, Aaron M. Wenger, William J. Rowell, Howard Yang, Alexey Kolesnikov, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Maria Nattestad, Pi-Chuan Chang, and Andrew Carroll
Subjects: Biomedical Engineering, Molecular Medicine, Bioengineering, Applied Microbiology and Biotechnology, Biotechnology
Abstract: Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.
Published: 2022
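
Note on the Q values in the abstract above: assuming they are Phred-scaled quality scores, which is the standard convention for read and assembly accuracy, a score Q corresponds to an error probability of 10^(-Q/10). The short sketch below converts in both directions. The function names are hypothetical and are not taken from DeepConsensus or pbccs.

```python
import math


def phred_to_error_probability(q):
    """Error probability implied by a Phred-scaled quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled quality score for an error probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


# The thresholds named in the abstract: Q20, Q30 and Q40.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about one error per {round(1 / p)} bases")
```

On this scale, Q20 means roughly one error in 100 bases, Q30 one in 1,000, and Q40 one in 10,000, so the largest reported yield gain is at the strictest threshold.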

9. DeepNull: Modeling non-linear covariate effects improves phenotype prediction and association power

Author: Thomas Colthurst, Farhad Hormozdiari, Taedong Yun, Babak Alipanahi, Cory Y. McLean, N.A. Furlotte, Andrew Carroll, and Zachary R. McCaw
Subjects: Statistical genetics, Genotype, Covariate, Genome-wide association study, Computational biology, Biology, Phenotype, Statistical power, Genetic association, Type I and type II errors
Abstract: Genome-wide association studies (GWAS) are among the workhorses of statistical genetics, having detected thousands of variants associated with complex traits and diseases. A typical GWAS examines the association between genotypes and the phenotype of interest while adjusting for a set of covariates. While covariates potentially have non-linear effects on the phenotype in many real world settings, due to the challenge of specifying the model, GWAS seldom include non-linear terms. Here we introduce DeepNull, a method that models non-linear covariate effects on phenotypes using a deep neural network (DNN) and then includes the model prediction as a single extra term in the GWAS association. First, using simulated data, we show that DeepNull increases statistical power by up to 20% while maintaining tight control of the type I error in the presence of interactions or non-linear covariate effects. Second, DeepNull maintains similar results to a standard GWAS when covariates have only linear effects on the phenotype. Third, DeepNull detects larger numbers of significant hits and loci (7% additional loci averaged over 10 traits) than standard GWAS in ten phenotypes from the UK Biobank (n=370K). Many of the hits found only by DeepNull are biologically plausible or have previously been reported in the GWAS catalog. Finally, DeepNull improves phenotype prediction by 23% averaged over the same ten phenotypes; the highest improvement was observed for glaucoma referral probability, where DeepNull improves the phenotype prediction by 83%.
Published: 2021

10. DeepNull models non-linear covariate effects to improve phenotypic prediction and association power

Author: Zachary R. McCaw, Thomas Colthurst, Taedong Yun, Nicholas A. Furlotte, Andrew Carroll, Babak Alipanahi, Cory Y. McLean, and Farhad Hormozdiari
Subjects: Multidisciplinary, Science, General Physics and Astronomy, General Chemistry, General Biochemistry, Genetics and Molecular Biology, Article, Phenotype, Research Design, Genetics research, Linear Models, Computer Simulation, Genome-Wide Association Study, Genetic association study
Abstract: Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWAS often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for non-linear and interactive covariate effects using a deep neural network. In analyses of simulated and real data, we demonstrate that DeepNull maintains tight control of the type I error while increasing statistical power by up to 20% in the presence of non-linear and interactive effects. Moreover, in the absence of such effects, DeepNull incurs no loss of power. When applied to 10 phenotypes from the UK Biobank (n = 370K), DeepNull discovered more hits (+6%) and loci (+7%), on average, than conventional association analyses, many of which are biologically plausible or have previously been reported. Finally, DeepNull improves upon linear modeling for phenotypic prediction (+23% on average). GWAS often assume a linear phenotype-covariate relationship which may not hold in practice. Here the authors present DeepNull, in which they apply deep learning to identify and adjust for complex non-linear relationships, improving phenotypic prediction and GWAS power.
Published: 2021

11. DeepTrio: Variant Calling in Families Using Deep Learning

Author: Andrew Carroll, Pi-Chuan Chang, Sidharth Goel, Howard Yang, Taedong Yun, Alexey Kolesnikov, Gunjan Baid, Maria Nattestad, and Cory Y. McLean
Subjects: business.industry, Computer science, Deep learning, Encoding (memory), Inheritance (genetic algorithm), Context (language use), Artificial intelligence, Computational biology, Allele, business, Exome, Genome, Sequence (medicine)
Abstract: Every human inherits one copy of the genome from their mother and another from their father. Parental inheritance helps us understand the transmission of traits and genetic diseases, which often involve de novo variants and rare recessive alleles. Here we present DeepTrio, which learns to analyze child-mother-father trios from the joint sequence information, without explicit encoding of inheritance priors. DeepTrio learns how to weigh sequencing error, mapping error, and de novo rates and genome context directly from the sequence data. DeepTrio has higher accuracy on both Illumina and PacBio HiFi data when compared to DeepVariant. Improvements are especially pronounced at lower coverages (with 20x DeepTrio roughly equivalent to 30x DeepVariant). As DeepTrio learns directly from data, we also demonstrate extensions to exome calling solely by changing the training data. DeepTrio includes pre-trained models for Illumina WGS, Illumina exome, and PacBio HiFi.
Published: 2021

12. Improving variant calling using population data and deep learning

Author: Sidharth Goel, Taedong Yun, Nae-Chyun Chen, Alexey Kolesnikov, Andrew Carroll, and Pi-Chuan Chang
Subjects: education.field_of_study, business.industry, Computer science, Applied Mathematics, Deep learning, Population, Sample (statistics), Machine learning, computer.software_genre, Biochemistry, Computer Science Applications, Structural Biology, Artificial intelligence, 1000 Genomes Project, education, business, Molecular Biology, Allele frequency, computer
Abstract: Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic clinvar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.
Published: 2021

13. Accurate, scalable cohort variant calls using DeepVariant and GLnexus

Author: Pi-Chuan Chang, Taedong Yun, Andrew Carroll, Helen Li, Michael F. Lin, and Cory Y. McLean
Subjects: Statistics and Probability, AcademicSubjects/SCI01060, Computer science, computer.software_genre, Biochemistry, 03 medical and health sciences, Consistency (database systems), 0302 clinical medicine, Genetic variation, 1000 Genomes Project, Molecular Biology, 030304 developmental biology, 0303 health sciences, 030305 genetics & heredity, Genome Analysis, Pipeline (software), Original Papers, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Cohort, Scalability, Benchmark (computing), Data mining, computer, 030217 neurology & neurosurgery, Imputation (genetics)
Abstract: Motivation: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. Results: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. Availability and implementation: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. Supplementary information: Supplementary data are available at Bioinformatics online.
Published: 2020
Full Text: View/download PDF

14. Rainbow Graphs and Switching Classes

Author: Taedong Yun, Suho Oh, and Hwanchul Yoo
Subjects: Mathematics::Combinatorics, General Mathematics, Nuclear Theory, Rainbow, Graph, Vertex (geometry), Combinatorics, Mathematics::Logic, Computer Science::Discrete Mathematics, FOS: Mathematics, Bijection, Mathematics - Combinatorics, Combinatorics (math.CO), Mathematics
Abstract: A rainbow graph is a graph that admits a vertex-coloring such that every color appears exactly once in the neighborhood of each vertex. We investigate some properties of rainbow graphs. In particular, we show that there is a bijection between the isomorphism classes of n-rainbow graphs on 2n vertices and the switching classes of graphs on n vertices. (Added more references, fixed some typos; revision for journal submission.)
Published: 2013
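
Note on the definition in the abstract above: the rainbow condition can be checked mechanically for a given coloring. The sketch below assumes that "neighborhood" means the open neighborhood (the vertex itself is excluded) and that the palette is exactly the set of colors used. The function name and the 4-cycle example are illustrative and do not come from the paper.

```python
from collections import Counter


def is_rainbow_coloring(adjacency, coloring):
    """Return True if every color appears exactly once among the
    neighbors of each vertex.

    adjacency : dict mapping each vertex to a list of its neighbors
    coloring  : dict mapping each vertex to its color
    """
    palette = set(coloring.values())
    for vertex, neighbors in adjacency.items():
        seen = Counter(coloring[u] for u in neighbors)
        # Every color must occur, and none may occur more than once.
        if set(seen) != palette or any(count != 1 for count in seen.values()):
            return False
    return True


# A 4-cycle 0-1-2-3-0: a candidate 2-rainbow graph on 2n = 4 vertices.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
good = {0: "red", 1: "red", 2: "blue", 3: "blue"}
bad = {0: "red", 1: "blue", 2: "red", 3: "blue"}

print(is_rainbow_coloring(cycle4, good))
print(is_rainbow_coloring(cycle4, bad))
```

The first coloring satisfies the condition because each vertex sees one red and one blue neighbor. The alternating coloring fails because vertex 0 sees two blue neighbors.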

15. RAINBOW GRAPHS AND SWITCHING CLASSES.

Author: SUHO OH, HWANCHUL YOO, and TAEDONG YUN
Subjects: GRAPH theory, COLOR, BIJECTIONS, ISOMORPHISM (Mathematics), INJECTIVE functions
Abstract: A rainbow graph is a graph that admits a vertex-coloring such that every color appears exactly once in the neighborhood of each vertex. We investigate some properties of rainbow graphs. In particular, we show that there is a bijection between the isomorphism classes of n-rainbow graphs on 2n vertices and the switching classes of graphs on n vertices. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF
