11 results on '"Masao Ueki"'
Search Results
2. Smooth-Threshold Multivariate Genetic Prediction with Unbiased Model Selection
- Author
- Gen Tamiya and Masao Ueki
- Subjects
0301 basic medicine; Elastic net regularization; Multivariate statistics; Epidemiology; Single-nucleotide polymorphism; Best linear unbiased prediction; Polymorphism, Single Nucleotide; 01 natural sciences; Article; 010104 statistics & probability; 03 medical and health sciences; Quantitative Trait, Heritable; Lasso (statistics); Alzheimer Disease; Statistics; Humans; 0101 mathematics; Genetics (clinical); Mathematics; Genetic association; Models, Genetic; Genome, Human; business.industry; Model selection; Reproducibility of Results; Pattern recognition; Genomics; Regression; Phenotype; 030104 developmental biology; Research Design; Regression Analysis; Artificial intelligence; business; Algorithms; Genome-Wide Association Study
- Abstract
We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphism (SNP) data from genome-wide association studies (GWASs). Our method consists of two stages. In the first stage, unlike the usual discontinuous SNP screening used in the gene score method, our method continuously screens SNPs based on the output of a standard univariate analysis of the marginal association of each SNP. In the second stage, the predictive model is built by a generalized ridge regression that uses the screened SNPs simultaneously, with SNP weights determined by the strength of marginal association. Continuous SNP screening by smooth thresholding not only makes prediction stable but also leads to a closed-form expression for the generalized degrees of freedom (GDF). The GDF leads to Stein’s unbiased risk estimation (SURE), which enables a data-dependent choice of the optimal SNP screening cutoff without cross-validation. Our method is very fast because the computationally expensive genome-wide scan is required only once, in contrast to penalized regression methods such as the lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and shows comparable or sometimes improved performance relative to the lasso and elastic net, which are known to have good predictive ability but heavy computational cost. Application to whole-genome sequencing (WGS) data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.
- Published
- 2016
- Full Text
- View/download PDF
3. Quick assessment for systematic test statistic inflation/deflation due to null model misspecifications in genome-wide environment interaction studies
- Author
- Gen Tamiya, for Alzheimer’s Disease Neuroimaging Initiative, Masahiro Fujii, and Masao Ueki
- Subjects
0301 basic medicine; Heredity; Genomics Statistics; Computer science; Test Statistics; Biochemistry; Mathematical and Statistical Techniques; 0302 clinical medicine; Missing heritability problem; Metabolites; Econometrics; 030212 general & internal medicine; Multidisciplinary; Mathematical Models; Approximation Methods; Statistics; Regression analysis; Genomics; Genetic Mapping; Identification (information); Physical Sciences; Medicine; Algorithms; Research Article; Science; Variant Genotypes; Correlation and dependence; Research and Analysis Methods; 03 medical and health sciences; Alzheimer Disease; Covariate; Genome-Wide Association Studies; Genetics; Test statistic; Humans; Computer Simulation; Statistical Methods; Statistical hypothesis testing; Models, Statistical; Models, Genetic; Null model; Biology and Life Sciences; Computational Biology; Human Genetics; Genome Analysis; Metabolism; 030104 developmental biology; Genetic Loci; Gene-Environment Interaction; Null hypothesis; Mathematics; Genome-Wide Association Study
- Abstract
Gene-environment (GxE) interaction is one potential explanation for the missing heritability problem. A popular approach to genome-wide environment interaction studies (GWEIS) is based on regression models involving interactions between genetic variants and environment variables. Unfortunately, GWEIS encounters systematically inflated (or deflated) test statistics more frequently than a marginal association study does. This problematic behavior may arise from poor specification of the null model (i.e. the model without genetic effect) in GWEIS. Improved null model specification may resolve the problem, but investigating it requires many time-consuming genome-wide scans, e.g. to try out several transformations of the phenotype. It is therefore helpful to be able to predict such problematic behavior beforehand. We present a simple closed-form formula to assess problematic behavior of GWEIS under the null hypothesis of no genetic effects. It requires only the phenotype, environment variables, and covariates, enabling quick identification of systematic test statistic inflation or deflation. Applied to real data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), our formula identified problematic studies from among hundreds of GWEIS, each of which considered a metabolite as the environment variable in the GxE interaction. Our formula is useful for quickly identifying problematic GWEIS without requiring a genome-wide scan.
- Published
- 2019
4. Multiple choice from competing regression models under multicollinearity based on standardized update
- Author
- Yoshinori Kawasaki and Masao Ueki
- Subjects
Statistics and Probability; Variance inflation factor; Applied Mathematics; Model selection; Regression analysis; Stepwise regression; Computational Mathematics; Computational Theory and Mathematics; Goodness of fit; Bayesian information criterion; Multicollinearity; Statistics; Econometrics; Akaike information criterion; Mathematics
- Abstract
This paper proposes a new method for choosing regression models which may produce multiple models with sufficient explanatory power and parsimony, unlike traditional model selection procedures that aim at obtaining a single best model. The method ensures interpretability of the resulting models even under strong multicollinearity. The algorithm proceeds in a forward stepwise manner with two requirements that the selected regression models must fulfill: goodness of fit and the magnitude of the update in loss functions. For the latter criterion, the standardized update is newly introduced; it is closely related to model selection criteria including Mallows' C_p, the Akaike information criterion and the Bayesian information criterion. Simulation studies demonstrate that the proposed algorithm works well with and without strong multicollinearity, and even with many explanatory variables. An application to real data is also provided.
- Published
- 2013
- Full Text
- View/download PDF
5. Fast score test with global null estimation regardless of missing genotypes
- Author
- Shuntaro Sato, Alzheimer’s Disease Neuroimaging Initiative, and Masao Ueki
- Subjects
0301 basic medicine; Heredity; Test Statistics; lcsh:Medicine; Alzheimer's Disease; Wald test; Logistic regression; Mathematical and Statistical Techniques; 0302 clinical medicine; Statistics; Medicine and Health Sciences; lcsh:Science; Mathematics; Multidisciplinary; Mathematical Models; Simulation and Modeling; Neurodegenerative Diseases; Regression analysis; Genomics; Research Assessment; Genetic Mapping; Phenotype; Neurology; Physical Sciences; Algorithms; Statistics (Mathematics); Research Article; Type I and type II errors; Score test; Genotype; Neuroimaging; Variant Genotypes; Research and Analysis Methods; Polymorphism, Single Nucleotide; Molecular Genetics; 03 medical and health sciences; Alzheimer Disease; Mental Health and Psychiatry; Genome-Wide Association Studies; Genetics; Humans; Computer Simulation; Genetic Predisposition to Disease; Statistical Methods; Molecular Biology; Research Errors; Statistical hypothesis testing; Models, Genetic; lcsh:R; Null (mathematics); Biology and Life Sciences; Computational Biology; Human Genetics; Genome Analysis; 030104 developmental biology; Likelihood-ratio test; Dementia; lcsh:Q; 030217 neurology & neurosurgery; Genome-Wide Association Study
- Abstract
In genome-wide association studies (GWASs) for binary traits (or case-control samples) in the presence of covariates to be adjusted for, researchers often use a logistic regression model to test variants for disease association. Popular tests include the Wald, likelihood ratio, and score tests. For the likelihood ratio test and the Wald test, the maximum likelihood estimate (MLE), which requires an iterative procedure, must be computed for each single nucleotide polymorphism (SNP). In contrast, the score test requires the MLE only under the null model, so its computational cost is lower than that of the other tests. Genotype data usually include missing genotypes because of assay failures. This reduces the computational efficiency of the conventional score test (CST), which requires null estimation that excludes individuals with a missing genotype for each SNP. In this study, we propose two new score tests, called PM1 and PM2, that use a single global null estimator for all SNPs regardless of missing genotypes, thereby enabling faster computation than CST. We prove that PM2 and CST have equivalent asymptotic power and that the power of PM1 is asymptotically lower than that of PM2. We evaluate the performance of the proposed methods in terms of type I error rates and power through simulation studies and an application to real GWAS data provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI), confirming our theoretical results. The ADNI-GWAS application demonstrated that the proposed score tests are about 6–18 times faster than the existing tests (CST, Wald tests and likelihood ratio tests). Our score tests are general and applicable to other regression models. PLoS ONE, 13(7), e0199692; 2018
- Published
- 2018
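The computational point made in the abstract above (a score test needs the null-model fit only once, whereas Wald and likelihood ratio tests refit per SNP) can be illustrated with a minimal sketch. This is the textbook score test for adding one variable to a logistic regression, not the PM1 or PM2 statistics of the paper, and it assumes complete genotypes, so it sidesteps the missing-genotype issue the paper addresses. The function names `fit_null_logistic` and `score_tests` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_null_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit the null logistic model (covariates only) by Newton's method.

    X is an (n, p) covariate matrix that already contains an intercept column;
    y is an (n,) vector of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = mu * (1.0 - mu)                    # Bernoulli variances
        grad = X.T @ (y - mu)                  # score vector of the null model
        hess = X.T @ (X * w[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_tests(G, X, y):
    """Score statistics for each column of G (n, m), reusing one null fit.

    Assumption: G has no missing entries. Each statistic is asymptotically
    chi-squared with 1 d.f. under the null of no SNP effect.
    """
    beta = fit_null_logistic(X, y)
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    U = G.T @ (y - mu)                         # score for each SNP
    XtWX = X.T @ (X * w[:, None])
    XtWG = X.T @ (G * w[:, None])
    GtWG = np.sum(G * G * w[:, None], axis=0)
    # Variance of the score after adjusting for the covariates.
    V = GtWG - np.sum(XtWG * np.linalg.solve(XtWX, XtWG), axis=0)
    return U ** 2 / V
```

In this sketch the iterative fit runs once and the per-SNP work is a handful of matrix products, which is the source of the speed advantage the abstract describes for score tests in general.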
6. A bias correction and acceleration approach for the problem of regions
- Author
- Masao Ueki and Kaoru Fueda
- Subjects
Statistics and Probability; Statistics::Theory; Applied Mathematics; Bootstrap aggregating; Model selection; Edgeworth series; Acceleration; Statistics; Statistics::Methodology; Probability distribution; Applied mathematics; p-value; Statistics, Probability and Uncertainty; Jackknife resampling; Mathematics; Statistical hypothesis testing
- Abstract
For testing the problem of regions in the space of distribution functions, this paper considers approaches that modify the bootstrap probability into a second-order accurate p-value based on the familiar bias correction and acceleration method. It is shown that Shimodaira's [2004a. Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Ann. Statist. 32, 2616–2641] twostep-multiscale bootstrap method works even in the problem of regions in functional space. In this paper the bias correction quantity is estimated by his onestep-multiscale bootstrap method. Instead of using the twostep-multiscale bootstrap method, the acceleration constant is estimated by a newly proposed jackknife method which requires first-level bootstrap resamplings only. Some numerical examples are given, including an application to testing significance in model selection.
- Published
- 2009
- Full Text
- View/download PDF
7. Optimal tuning parameter estimation in maximum penalized likelihood method
- Author
- Masao Ueki and Kaoru Fueda
- Subjects
Statistics and Probability; Generalized linear model; Spline (mathematics); Series (mathematics); Estimation theory; Statistics; Perspective (graphical); Applied mathematics; Optimal tuning; Cross-validation; Selection (genetic algorithm); Mathematics
- Abstract
In maximum penalized or regularized methods, it is important to select a tuning parameter appropriately. This paper proposes a direct plug-in method for tuning parameter selection. The tuning parameters selected using a generalized information criterion (Konishi and Kitagawa, Biometrika, 83, 875–890, 1996) and cross-validation (Stone, Journal of the Royal Statistical Society, Series B, 58, 267–288, 1974) are shown to be asymptotically equivalent to those selected using the proposed method, from the perspective of estimation of an optimal tuning parameter. Because of its directness, the proposed method is superior to the two selection methods mentioned above in terms of computational cost. Some numerical examples which contain the penalized spline generalized linear model regressions are provided.
- Published
- 2008
- Full Text
- View/download PDF
8. Boosting local quasi-likelihood estimators
- Author
- Masao Ueki and Kaoru Fueda
- Subjects
Statistics and Probability; Generalized linear model; ComputingMethodologies_PATTERNRECOGNITION; Quasi-likelihood; Boosting (machine learning); Statistics; Applied mathematics; Kernel regression; Estimator; Gradient boosting; Maximization; Regression; Mathematics
- Abstract
For likelihood-based regression contexts, including generalized linear models, this paper presents a boosting algorithm for local constant quasi-likelihood estimators. Its advantages are the following: (a) the one-boosted estimator reduces bias in local constant quasi-likelihood estimators without increasing the order of the variance, (b) the boosting algorithm requires only one-dimensional maximization at each boosting step and (c) the resulting estimators can be written explicitly and simply in some practical cases.
- Published
- 2008
- Full Text
- View/download PDF
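The bias-reduction claim in the abstract above can be illustrated in the simplest special case. With a Gaussian working model (identity link, constant variance), the local constant estimator is the kernel-weighted mean (Nadaraya-Watson), and one boosting step amounts to smoothing the residuals and adding them back. The sketch below shows only that special case; it is not the paper's quasi-likelihood algorithm. The function names, the Gaussian kernel, and the bandwidth `h` are assumptions made for illustration.

```python
import numpy as np


def nw_smoother_matrix(x, h):
    """Row-normalized Gaussian-kernel weights: S @ y is the local constant fit."""
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)


def boosted_local_constant(x, y, h, n_boost=1):
    """Local constant fit followed by n_boost residual-smoothing steps.

    n_boost=0 returns the plain Nadaraya-Watson fit; n_boost=1 is the
    one-boosted fit in this Gaussian special case.
    """
    S = nw_smoother_matrix(x, h)
    fit = S @ y
    for _ in range(n_boost):
        fit = fit + S @ (y - fit)   # smooth the current residuals, add back
    return fit
```

On noise-free data from a curved function, comparing `boosted_local_constant(x, y, h, n_boost=0)` with `n_boost=1` gives a quick check that the boosted fit sits closer to the true curve at the same bandwidth.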
9. Adjusting estimative prediction limits
- Author
- Masao Ueki and Kaoru Fueda
- Subjects
Statistics and Probability; Stochastic process; Applied Mathematics; General Mathematics; Filtering theory; Coverage probability; Coverage error; Estimator; Agricultural and Biological Sciences (miscellaneous); Value (economics); Statistics; Limit (mathematics); Statistics, Probability and Uncertainty; General Agricultural and Biological Sciences; Mathematics
- Abstract
This note presents a direct adjustment of the estimative prediction limit to reduce the coverage error from a target value to third-order accuracy. The adjustment is asymptotically equivalent to those of Barndorff-Nielsen & Cox (1994, 1996) and Vidoni (1998). It has a simpler form with a plug-in estimator of the coverage probability of the estimative limit at the target value. Copyright 2007, Oxford University Press.
- Published
- 2007
- Full Text
- View/download PDF
10. On the choice of degrees of freedom for testing gene-gene interactions
- Author
- Masao Ueki
- Subjects
Statistics and Probability; Contingency table; Models, Statistical; Epidemiology; Computer science; Degrees of freedom (statistics); Normal Distribution; Regression analysis; Logistic regression; Polymorphism, Single Nucleotide; Logistic Models; Likelihood-ratio test; Linear regression; Statistics; Null distribution; Humans; Computer Simulation; Prospective Studies; Type I and type II errors; Genome-Wide Association Study; Retrospective Studies
- Abstract
In gene-gene interaction analysis using single nucleotide polymorphism (SNP) data, empty cells arise in the genotype contingency table more frequently than in single SNP association studies. Empty cells lead to unidentifiable regression coefficients in regression model fitting. It is unclear whether the degrees of freedom (d.f.) for testing interactions are reduced for such sparse contingency tables. Boolean Operation based Screening and Testing is an exhaustive gene-gene interaction search method in which a fixed d.f. of four (the most conservative choice) is used in the chi-squared null distribution for the likelihood ratio test for gene-gene interactions under a logistic regression model. In this paper, the choice of d.f. is investigated theoretically by introducing a decomposition of type I error. An adaptive method using the observed d.f. can be less conservative than the fixed d.f. method, thereby enhancing power. In simulated data, type I error rates for the adaptive method were usually better controlled under various scenarios for Gaussian linear regression and logistic regression, including prospective and retrospective sampling designs, as well as for artificial data that mimic actual genome-wide SNPs. When the adaptive method was applied to public datasets generated from simulations, it exhibited an improvement in power over the fixed method.
- Published
- 2013
11. Improved statistics for genome-wide interaction analysis
- Author
- Masao Ueki and Heather J. Cordell
- Subjects
Cancer Research; lcsh:QH426-470; Epidemiology; Word error rate; Genes, Recessive; Biology; Polymorphism, Single Nucleotide; 03 medical and health sciences; 0302 clinical medicine; Genetic model; Statistics; Range (statistics); Genetics; Humans; Psoriasis; Disease; Gene Regulatory Networks; Genetic Predisposition to Disease; Molecular Biology; Computerized Simulations; Mathematical Computing; Genetics (clinical); Ecology, Evolution, Behavior and Systematics; Statistic; 030304 developmental biology; Statistical hypothesis testing; Genes, Dominant; 0303 health sciences; Numerical Analysis; Models, Statistical; Models, Genetic; Nonparametric statistics; Computational Biology; MicroRNAs; lcsh:Genetics; Logistic Models; Haplotypes; Genetic Loci; Computer Science; Mutation; Medicine; Null hypothesis; 030217 neurology & neurosurgery; Mathematics; Type I and type II errors; Research Article; Genome-Wide Association Study
- Abstract
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.
Author Summary: Gene–gene interactions are a topic of great interest to geneticists carrying out studies of how genetic factors influence the development of common, complex diseases. Genes that interact may not only make important biological contributions to underlying disease processes, but also be more difficult to detect when using standard statistical methods in which we examine the effects of genetic factors one at a time. Recently a method was proposed by Wu and colleagues [1] for detecting pairwise interactions when carrying out genome-wide association studies (in which a large number of genetic variants across the genome are examined). Wu and colleagues carried out theoretical work and computer simulations that suggested their method outperformed other previously proposed approaches for detecting interactions. Here we show that, in fact, the method proposed by Wu and colleagues can result in an over-preponderance of false positive findings. We propose an adjusted version of their method that reduces the false positive rate while maintaining high power. We also propose a new method for detecting pairs of genetic effects that shows similarly high power but has some conceptual advantages over both Wu's method and also other previously proposed approaches.
- Published
- 2012