5,896 results for "Statistical hypothesis testing"
Search Results
2. Posterior-based Wald-type statistics for hypothesis testing
- Author
-
Tao Zeng, Yong Li, Xiaobin Liu, and Jun Yu
- Subjects
Economics and Econometrics, Applied Mathematics, Posterior probability, Markov chain Monte Carlo, Covariance, Wald test, Standard error, Statistics, Test statistic, Statistic, Mathematics, Statistical hypothesis testing - Abstract
A new Wald-type statistic is proposed for hypothesis testing based on Bayesian posterior distributions under the correct model specification. The new statistic can be explained as a posterior version of the Wald statistic and has several nice properties. First, it is well-defined under improper prior distributions. Second, it avoids Jeffreys–Lindley–Bartlett’s paradox. Third, under the null hypothesis and repeated sampling, it follows a χ² distribution asymptotically, offering an asymptotically pivotal test. Fourth, it only requires inverting the posterior covariance for parameters of interest. Fifth and perhaps most importantly, when a random sample from the posterior distribution (such as MCMC output) is available, the proposed statistic can be easily obtained as a by-product of posterior simulation. In addition, the numerical standard error of the estimated proposed statistic can be computed based on random samples. A robust version of the test statistic is developed under model misspecification and inherits many nice properties of the new posterior statistic. The finite sample performance of the statistics is examined in Monte Carlo studies. The method is applied to two latent variable models used in microeconometrics and financial econometrics.
- Published
- 2022
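The abstract's key practical point is that the statistic falls out of MCMC output. A minimal sketch of that computation (toy posterior and variable names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def posterior_wald(draws, theta0):
    """Wald-type quadratic form of the posterior mean around theta0,
    weighted by the inverse posterior covariance (rows of draws = MCMC samples)."""
    draws = np.asarray(draws)
    mean = draws.mean(axis=0)
    cov = np.cov(draws, rowvar=False)
    diff = mean - theta0
    return float(diff @ np.linalg.solve(cov, diff))

# Stand-in for MCMC output: 5000 draws from a toy bivariate posterior.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal([0.1, -0.2], 0.01 * np.eye(2), size=5000)
w = posterior_wald(draws, np.array([0.0, 0.0]))
```

Only the posterior covariance of the parameters of interest is inverted, which is what keeps the statistic cheap as a by-product of simulation.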
3. Volatility of volatility: Estimation and tests based on noisy high frequency data with jumps
- Author
-
Zhiyuan Zhang, Yingying Li, and Guangying Liu
- Subjects
Economics and Econometrics, Applied Mathematics, Null (mathematics), Estimator, Interval (mathematics), Rate of convergence, Applied mathematics, Volatility (finance), Null hypothesis, Mathematics, Central limit theorem, Statistical hypothesis testing - Abstract
We establish a feasible central limit theorem with convergence rate n^{1/8} for the estimation of the integrated volatility of volatility (VoV) based on noisy high-frequency data with jumps. This is the first inference theory ever built for VoV estimation under such a general setup. The central limit theorem is applied to provide interval estimates of the VoV and conduct hypothesis tests. Furthermore, when one is interested in the null hypothesis that the VoV is zero, we show that a more powerful test can be established based on a VoV estimator with a convergence rate n^{1/5} under the null. Empirical results on the S&P 500 and individual stocks show strong evidence of non-zero VoV.
- Published
- 2022
4. Asymptotic properties of correlation-based principal component analysis
- Author
-
Jungjun Choi and Xiye Yang
- Subjects
Economics and Econometrics, Delta method, Covariance matrix, Applied Mathematics, Principal component analysis, Estimator, Applied mathematics, Variance (accounting), Covariance, Eigenvalues and eigenvectors, Statistical hypothesis testing, Mathematics - Abstract
It is a common practice to conduct principal component analysis (PCA) using standardized data, which is equivalent to applying PCA to the correlation matrix rather than the covariance matrix. Yet little research has been done on such differences in the context of high-frequency data. This paper bridges that gap. We derive the analytical forms of the asymptotic biases and variances for the estimators of the integrated eigenvalues and eigenvectors. Furthermore, we propose a novel jackknife-type estimator of the asymptotic variance of the integrated volatility functional estimator. This new variance estimator shows much better finite-sample performance than existing ones. This paper also proposes several statistical tests for some commonly tested hypotheses in the literature. Simulation results show that one will get misleading results if one uses the analytical results of the covariance case when applying PCA to the correlation matrix.
- Published
- 2022
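The equivalence stated in the abstract's first sentence — PCA on standardized data equals PCA on the correlation matrix — is easy to verify numerically (a generic illustration, not the paper's high-frequency setting):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two independent series on very different scales.
x = rng.normal(size=(500, 2)) * np.array([1.0, 100.0])

# Covariance-based and correlation-based eigenvalues differ markedly.
cov_eigs = np.linalg.eigvalsh(np.cov(x, rowvar=False))
corr_eigs = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))

# PCA on standardized data reproduces the correlation-matrix eigenvalues.
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
std_eigs = np.linalg.eigvalsh(np.cov(z, rowvar=False))
```

The covariance eigenvalues are dominated by the large-scale series, while the correlation eigenvalues are scale-free, which is exactly why the two analyses need separate asymptotic theory.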
5. Bivariate pseudo-observations for recurrent event analysis with terminal events
- Author
-
Per Kragh Andersen, Henrik Ravn, Sofie Korn, Morten Overgaard, and Julie K. Furberg
- Subjects
Interpretation (logic), Pseudo-observations, Computer science, Applied Mathematics, Simultaneous model, General Medicine, Bivariate analysis, Marginal model, Extension (predicate logic), Multi-state model, Expected value, Recurrent events, Terminal events, Terminal (electronics), Econometrics, Event (probability theory), Statistical hypothesis testing - Abstract
The analysis of recurrent events in the presence of terminal events requires special attention. Several approaches have been suggested for such analyses either using intensity models or marginal models. When analysing treatment effects on recurrent events in controlled trials, special attention should be paid to competing deaths and their impact on interpretation. This paper proposes a method that formulates a marginal model for recurrent events and terminal events simultaneously. Estimation is based on pseudo-observations for both the expected number of events and survival probabilities. Various relevant hypothesis tests in the framework are explored. Theoretical derivations and simulation studies are conducted to investigate the behaviour of the method. The method is applied to two real data examples. The bivariate marginal pseudo-observation model carries the strength of a two-dimensional modelling procedure and performs well in comparison with available models. Finally, an extension to a three-dimensional model, which decomposes the terminal event per death cause, is proposed and exemplified.
- Published
- 2023
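Pseudo-observations as used here are jackknife-type constructions. A sketch of the basic one-dimensional definition (generic; the paper's bivariate estimator for recurrent and terminal events is more involved):

```python
import numpy as np

def pseudo_observations(x, estimator):
    """Jackknife pseudo-observations:
    theta_i = n * theta_hat - (n - 1) * theta_hat_{-i},
    where theta_hat_{-i} leaves out observation i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    full = estimator(x)
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * full - (n - 1) * loo

x = np.array([1.0, 2.0, 3.0, 4.0])
po = pseudo_observations(x, np.mean)
# For the sample mean, pseudo-observations reduce to the observations themselves.
```

In survival settings the estimator would be, e.g., a Kaplan–Meier survival probability or an expected number of events, and the pseudo-observations then enter a generalized estimating equation.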
6. Residual-augmented IVX predictive regression
- Author
-
Paulo Rodrigues and Matei Demetrescu
- Subjects
Economics and Econometrics, Applied Mathematics, Monte Carlo method, Instrumental variable, Econometrics, Context (language use), Sensitivity (control systems), Endogeneity, Predictability, Residual, Mathematics, Statistical hypothesis testing - Abstract
Bias correction in predictive regressions is known to reduce the empirical size problems of OLS-based predictability tests with persistent predictors. This paper shows that bias correction is also achieved in the context of the extended instrumental variable (IVX) predictability testing framework introduced by Kostakis et al. (2015). To be specific, new IVX-based statistics subject to a bias correction analogous to that proposed by Amihud and Hurvich (2004) are introduced. Four important contributions are provided: first, we characterize the effects that bias-reduction adjustments have on the asymptotic distributions of the IVX test statistics in a general context allowing for short-run dynamics and heterogeneity; second, we discuss the validity of the procedure when predictors are stationary as well as near-integrated; third, we conduct an exhaustive Monte Carlo analysis to investigate the small-sample in- and out-of-sample properties of the test procedures and their sensitivity to distinctive features that characterize predictive regressions in practice, such as strong persistence, endogeneity, and non-Gaussian innovations; and fourth, we provide an analysis of real estate return and rent growth predictability in 19 OECD countries.
- Published
- 2022
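The Amihud–Hurvich-style correction builds on the classical first-order bias formula for an estimated AR(1) coefficient, E[ρ̂] ≈ ρ − (1 + 3ρ)/n. A Monte Carlo sketch of that first-order correction (an illustration only; the correction in the paper is adapted to the IVX setting and differs):

```python
import numpy as np

def ar1_ols(y):
    """OLS slope of y_t on y_{t-1}, with intercept."""
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    return np.linalg.lstsq(x, y[1:], rcond=None)[0][1]

def bias_corrected(rho_hat, n):
    # Kendall-type first-order correction: E[rho_hat] ~ rho - (1 + 3*rho)/n.
    return rho_hat + (1.0 + 3.0 * rho_hat) / n

rng = np.random.default_rng(2)
rho, n, reps = 0.95, 50, 2000
est = np.empty(reps)
for r in range(reps):
    y = np.empty(n)
    y[0] = rng.normal()
    for t in range(1, n):
        y[t] = rho * y[t - 1] + rng.normal()
    est[r] = ar1_ols(y)
```

With a persistent predictor (ρ = 0.95) and a short sample, the raw OLS estimate is noticeably biased downward, and the corrected estimate is closer to the truth on average.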
7. The Reference Distributions of Maurer’s Universal Statistical Test and Its Improved Tests
- Author
-
Ken Umeno, Atsushi Iwasaki, and Yasunari Hikima
- Subjects
Soundness, Normal distribution, Distribution (mathematics), Applied mathematics, Variance (accounting), Library and Information Sciences, Computer Science Applications, Information Systems, Statistical hypothesis testing, Test (assessment), Mathematics - Abstract
Maurer's universal statistical test can detect a wide range of non-randomness in given sequences. Coron proposed an improved test, and Yamamoto and Liu later proposed a new test based on Coron's. These tests use normal distributions as their reference distributions, but their soundness has not been discussed theoretically so far. Additionally, Yamamoto and Liu's test uses an experimental value as the variance of its reference distribution. In this paper, we theoretically derive the variance of the reference distribution of Yamamoto and Liu's test and prove that the true reference distribution of Coron's test converges to a normal distribution in some sense. The proof can be applied to the other tests with small changes.
- Published
- 2022
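For reference, Maurer's statistic itself averages log2 of the gaps between repeated L-bit blocks, using Q initialization blocks and K test blocks. A sketch (parameter choices are illustrative; the standard formulation, as in NIST SP 800-22, takes Q ≥ 10·2^L):

```python
import math
import random

def maurer_statistic(bits, L=4, Q=None):
    """Maurer's universal statistic f = (1/K) * sum_i log2(A_i), where A_i is
    the distance to the previous occurrence of the i-th L-bit test block."""
    if Q is None:
        Q = 10 * (1 << L)
    blocks = [int("".join(map(str, bits[i * L:(i + 1) * L])), 2)
              for i in range(len(bits) // L)]
    K = len(blocks) - Q
    last = {}
    for i, b in enumerate(blocks[:Q], start=1):   # initialization segment
        last[b] = i
    total = 0.0
    for i, b in enumerate(blocks[Q:], start=Q + 1):  # test segment
        total += math.log2(i - last.get(b, 0))  # unseen block: gap from start
        last[b] = i
    return total / K

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(8000)]
f = maurer_statistic(bits, L=4, Q=160)
```

For truly random bits with L = 4, the statistic concentrates near its known expected value of about 3.311; the papers discussed above concern the distribution of this statistic around that value.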
8. Asymptotically most powerful tests for random number generators
- Author
-
Boris Ryabko
- Subjects
Statistics and Probability, Bernoulli's principle, Sequence, Random number generation, Applied Mathematics, Uniformly most powerful test, Binary number, Applied mathematics, Ergodic theory, p-value, Statistics, Probability and Uncertainty, Mathematics, Statistical hypothesis testing - Abstract
The problem of constructing the most powerful test for random number generators (RNGs) is considered, where the generators are modelled by stationary ergodic processes. At present, RNGs are widely used in data protection, modelling and simulation systems, computer games, and many other areas where the generated random numbers should look like binary digits of a Bernoulli equiprobable sequence. A related problem considered here is that of constructing effective statistical tests for RNGs; currently, the effectiveness of such tests is mainly estimated based on experiments with various RNGs. We find an asymptotic estimate for the p-value of an optimal test in the case where the alternative hypothesis is a known stationary ergodic source, and then describe a family of tests each of which has the same asymptotic estimate of the p-value for any (unknown) stationary ergodic source. This model appears to be acceptable for binary sequences generated by physical devices that are used in cryptographic data protection systems.
- Published
- 2022
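Against a known alternative source, the optimal test is a Neyman–Pearson likelihood-ratio test. For an i.i.d. Bernoulli(p) alternative the statistic has a simple form, and 2^(−t) bounds the p-value under the fair-coin null, since at most 2^(n−T) sequences can have alternative probability ≥ 2^(T−n). This is an illustrative special case, not the paper's general ergodic construction:

```python
import math
import random

def lr_statistic(bits, p1):
    """t = log2( mu(x) / 2**(-n) ) for a known i.i.d. Bernoulli(p1) alternative
    mu against the uniform (fair-coin) null; P_0(t >= T) <= 2**(-T)."""
    n = len(bits)
    ones = sum(bits)
    loglik_alt = ones * math.log2(p1) + (n - ones) * math.log2(1 - p1)
    return loglik_alt + n

rng = random.Random(3)
biased = [1 if rng.random() < 0.6 else 0 for _ in range(10000)]
fair = [rng.randint(0, 1) for _ in range(10000)]
t_biased = lr_statistic(biased, 0.6)
t_fair = lr_statistic(fair, 0.6)
```

A large positive t (here for the biased sequence) gives a tiny p-value bound 2^(−t), while for genuinely fair bits t drifts negative at a linear rate.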
9. Performing Group Difference Testing on Graph Structured Data From GANs: Analysis and Applications in Neuroimaging
- Author
-
Zhichun Huang, Won Hwa Kim, Akshay Mishra, Vikas Singh, Tuan Quang Dinh, Sathya N. Ravi, Tien N. Vo, and Yunyang Xiong
- Subjects
Computer science, Neuroimaging, Machine learning, Article, Empirical research, Artificial Intelligence, Simple (abstract algebra), Image Processing, Computer-Assisted, Null distribution, Humans, Statistical hypothesis testing, Complement (set theory), Spectral graph theory, Group (mathematics), Applied Mathematics, Brain, Computational Theory and Mathematics, Neural Networks, Computer, Computer Vision and Pattern Recognition, Artificial intelligence, Algorithms, Software - Abstract
Generative adversarial networks (GANs) have emerged as a powerful generative model in computer vision. Given their impressive abilities in generating highly realistic images, they are also being used in novel ways in applications in the life sciences. This raises an interesting question when GANs are used in scientific or biomedical studies. Consider the setting where we are restricted to only using the samples from a trained GAN for downstream group difference analysis (and do not have direct access to the real data). Will we obtain similar conclusions? In this work, we explore if “generated” data, i.e., sampled from such GANs can be used for performing statistical group difference tests in cases versus controls studies, common across many scientific disciplines. We provide a detailed analysis describing regimes where this may be feasible. We complement the technical results with an empirical study focused on the analysis of cortical thickness on brain mesh surfaces in an Alzheimer’s disease dataset. To exploit the geometric nature of the data, we use simple ideas from spectral graph theory to show how adjustments to existing GANs can yield improvements. We also give a generalization error bound by extending recent results on Neural Network Distance. To our knowledge, our work offers the first analysis assessing whether the Null distribution in “healthy versus diseased subjects” type statistical testing using data generated from the GANs coincides with the one obtained from the same analysis with real data. The code is available at https://github.com/yyxiongzju/GLapGAN.
- Published
- 2022
10. Homeostasis phenomenon in conformal prediction and predictive distribution functions
- Author
-
Min-ge Xie and Zheshi Zheng
- Subjects
Property (programming), Computer science, Applied Mathematics, Boundary (topology), Conformal map, Regression, Theoretical Computer Science, Distribution function, Artificial Intelligence, Robustness (computer science), Covariate, Applied mathematics, Software, Statistical hypothesis testing - Abstract
Conformal prediction is an attractive framework for distribution-free prediction. In this article, we study its homeostasis property in detail under a general regression setup, and we introduce the concepts of upper and lower predictive distributions and the predictive curve to establish connections to left-, right-, and two-tailed hypothesis testing problems as well as to developments in confidence distributions. The homeostasis property is very attractive, since it states that under some conditions the prediction results remain valid even if the model used for learning is completely wrong. We show explicitly why the property holds in a model-based setup and also explore the boundary at which the property breaks down. Besides the typical assumption used in conformal prediction that the response and covariate pairs (y, x) of all subjects are iid distributed, we also study the classical regression setting in which the design is fixed with given (non-random) covariates x. The trade-offs among learning-model accuracy, prediction validity, and prediction efficiency are discussed, leading to an emphasis on developing better learning models.
- Published
- 2022
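The "validity under a completely wrong model" phenomenon is easy to see in the simpler split-conformal variant (a generic sketch under illustrative names; the paper's setting is full conformal prediction):

```python
import numpy as np

def split_conformal(x_tr, y_tr, x_cal, y_cal, x_new, fit, alpha=0.1):
    """Split conformal interval: model +/- a calibration-set residual quantile."""
    model = fit(x_tr, y_tr)
    scores = np.abs(y_cal - model(x_cal))
    n = len(scores)
    # Finite-sample-valid quantile level: ceil((n+1)(1-alpha))/n.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q = np.sort(scores)[k - 1]
    pred = model(x_new)
    return pred - q, pred + q

def fit_mean(x, y):
    """A deliberately misspecified learner: ignores x entirely."""
    m = y.mean()
    return lambda xs: np.full(np.shape(xs), m)

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 3000)
y = 2 * x + rng.normal(0, 0.1, 3000)
lo, hi = split_conformal(x[:1000], y[:1000], x[1000:2000], y[1000:2000],
                         x[2000:], fit_mean)
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
```

Even though the learner ignores the covariate completely, the empirical coverage on the test fold stays near the nominal 90% — the intervals are simply much wider than a good model would give, which is the validity/efficiency trade-off the abstract discusses.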
11. A Fast and Accurate Approximation to the Distributions of Quadratic Forms of Gaussian Variables
- Author
-
Zheyang Wu, Judong Shen, and Hong Zhang
- Subjects
Statistics and Probability, Computer science, Gaussian, Big data, R package, Distribution (mathematics), Kurtosis, Gamma distribution, Discrete Mathematics and Combinatorics, Applied mathematics, Statistics, Probability and Uncertainty, Random variable, Statistical hypothesis testing - Abstract
In computational and applied statistics, it is of great interest to obtain fast and accurate calculations of the distributions of quadratic forms of Gaussian random variables. This paper presents a novel approximation strategy that contains two developments. First, we propose a faster numerical procedure for computing the moments of the quadratic forms. Second, we establish a general moment-matching framework for distribution approximation, which covers existing approximation methods for the distributions of quadratic forms of Gaussian variables. Under this framework, a novel moment-ratio method (MR) is proposed to match the ratio of skewness and kurtosis based on the gamma distribution. Our extensive simulations show that 1) MR is almost as accurate as the exact distribution calculation and is much more efficient; 2) compared with existing approximation methods, MR significantly improves the accuracy of approximating far right tail probabilities. The proposed method has wide applications. For example, it is a better choice than existing methods for facilitating hypothesis testing in big data analysis, where efficient and accurate calculation of very small p-values is desired. An R package, Qapprox, that implements the related methods is available on CRAN.
- Published
- 2022
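The moments of a Gaussian quadratic form have closed-form cumulants, which is what makes moment matching feasible. A sketch of the simpler two-moment gamma match (the paper's MR method matches the skewness/kurtosis ratio instead, so this is only an illustration of the framework):

```python
import numpy as np

def quadratic_form_moments(lam):
    """Mean and variance of Q = sum_i lam_i * Z_i**2, Z_i iid N(0,1):
    cumulant_r = 2**(r-1) * (r-1)! * sum(lam**r)."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum(), 2.0 * (lam ** 2).sum()

def gamma_match(mean, var):
    """Two-moment gamma fit: shape k = mean**2/var, scale theta = var/mean."""
    return mean ** 2 / var, var / mean

lam = np.array([3.0, 2.0, 1.0, 0.5])
mean, var = quadratic_form_moments(lam)
k, theta = gamma_match(mean, var)

# Monte Carlo check of the closed-form moments.
rng = np.random.default_rng(5)
z = rng.normal(size=(200000, len(lam)))
q = (z ** 2 * lam).sum(axis=1)
```

Higher-order matches (skewness, kurtosis) use the same cumulant formula with r = 3, 4, which is where tail accuracy is gained.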
12. Fourier trajectory analysis for system discrimination
- Author
-
Russell R. Barton and Lucy E. Morgan
- Subjects
Information Systems and Management, General Computer Science, Autocorrelation, Management Science and Operations Research, Industrial and Manufacturing Engineering, Fourier transform, Fourier analysis, Modeling and Simulation, Stochastic simulation, Trajectory, Applied mathematics, Fourier series, Quantile, Mathematics, Statistical hypothesis testing - Abstract
With few exceptions, simulation output analysis has focused on static characterizations, to determine a property of the steady-state distribution of a performance metric such as a mean, a quantile, or the distribution itself. Analyses often seek to overcome difficulties induced by autocorrelation of the output stream. But sample paths generated by stochastic simulation exhibit dynamic behaviour that is characteristic of system structure and associated distributions. In this paper, we explore these dynamic characteristics, as captured by the Fourier transform of a dynamic steady-state simulation trajectory. We find that Fourier coefficient magnitudes can have greater discriminatory power than the usual test statistics when two systems have different utilisations and/or dynamic behaviour, and with simpler analysis resulting from the statistical independence of coefficient estimates at different frequencies.
- Published
- 2022
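The basic object of the analysis — Fourier coefficient magnitudes of an output trajectory — can be sketched as follows. Here two AR(1) processes with different dynamics stand in for the two systems (an illustrative toy, not the paper's experiments):

```python
import numpy as np

def fourier_magnitudes(path):
    """Magnitudes of the Fourier coefficients of a demeaned output trajectory."""
    path = np.asarray(path, dtype=float)
    return np.abs(np.fft.rfft(path - path.mean())) / len(path)

def ar1_path(phi, n, rng):
    y = np.empty(n)
    y[0] = 0.0
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.normal()
    return y

rng = np.random.default_rng(6)
n, reps = 4096, 30
# Average low-frequency magnitude over independent replications of each "system".
low_a = np.mean([fourier_magnitudes(ar1_path(0.9, n, rng))[1:20].mean()
                 for _ in range(reps)])
low_b = np.mean([fourier_magnitudes(ar1_path(0.2, n, rng))[1:20].mean()
                 for _ in range(reps)])
```

The more persistent system concentrates far more energy at low frequencies, so the coefficient magnitudes separate the two systems cleanly; coefficient estimates at different frequencies are also asymptotically independent, which simplifies the comparison.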
13. Solving nonlinear systems and unconstrained optimization problems by hybridizing whale optimization algorithm and flower pollination algorithm
- Author
-
Mohamed A. Tawhid and Abdelmonem M. Ibrahim
- Subjects
Numerical Analysis, Optimization problem, General Computer Science, Computer science, Applied Mathematics, Population, Hybrid algorithm, Theoretical Computer Science, Nonlinear system, Local optimum, Friedman test, Modeling and Simulation, Algorithm, Statistical hypothesis testing, Premature convergence - Abstract
This paper suggests a new hybrid algorithm, WOFPA, by integrating two population-based algorithms, the Whale Optimization Algorithm (WOA) and the Flower Pollination Algorithm (FPA), to solve complex nonlinear systems and unconstrained optimization problems. Nonlinear systems can be cast into unconstrained optimization problems, called merit functions, where the optimal solutions of the merit functions are equivalent to the solutions of the nonlinear systems. WOFPA aims to decrease the execution time and the complexity of WOA and FPA while retaining their advantages, making it a high-quality algorithm for both problem classes. For example, FPA may converge prematurely to local optima, and WOFPA overcomes this disadvantage. Numerical experiments on 14 benchmark nonlinear systems and 30 CEC 2014 benchmark unconstrained optimization functions with various dimensions are employed to test the performance of WOFPA. To investigate its performance further, WOFPA is compared with WOA, FPA, and other existing algorithms from the literature. Two non-parametric statistical tests, the Wilcoxon test and the Friedman test, are conducted to check the performance of the proposed and compared algorithms and the significance of our results. The experimental results demonstrate that WOFPA performs better than other algorithms in the literature, obtaining the optimum solutions for most nonlinear systems and optimization problems, and prove its efficiency compared with existing algorithms.
- Published
- 2021
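The merit-function reduction mentioned in the abstract is standard: a root of f(x) = 0 is exactly a global minimizer of F(x) = Σᵢ fᵢ(x)² with F = 0. A minimal sketch with a toy system (WOFPA itself is not reproduced here):

```python
import numpy as np

def merit(residuals):
    """Cast a nonlinear system f(x) = 0 as min F(x) = sum_i f_i(x)**2;
    zeros of the system are exactly the global minima with F = 0."""
    def F(x):
        return float(np.sum(np.asarray(residuals(x)) ** 2))
    return F

# Toy system: x^2 + y^2 - 1 = 0 and x - y = 0 (one solution: x = y = 1/sqrt(2)).
def system(v):
    x, y = v
    return [x ** 2 + y ** 2 - 1.0, x - y]

F = merit(system)
root = np.array([1.0, 1.0]) / np.sqrt(2.0)
```

Any unconstrained metaheuristic (WOA, FPA, or their hybrid) can then be run on F directly, which is what puts the two problem classes on a common footing.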
14. Two-way ORDANOVA: Analyzing ordinal variation in a cross-balanced design
- Author
-
Tamar Gadrich and Yariv N. Marmor
- Subjects
Statistics and Probability, Applied Mathematics, Homogeneity (statistics), Ordinal Scale, Sample (statistics), Scale (descriptive set theory), Calculator, Component (UML), Statistics, Statistics, Probability and Uncertainty, Categorical variable, Mathematics, Statistical hypothesis testing - Abstract
Variability assessment of qualitative data has an important role in diverse areas such as sociology, quality engineering, healthcare, decision making, genetics, metrology in chemistry, and many others. To test the homogeneity hypothesis, we use two main categorical factors. Using a cross-balanced design, we provide a decomposition theorem of the sample total variation into an 'intra' (within) component and an 'inter' (between) component for any scale. Building on this, we provide a way of decomposing the 'inter' component itself into variation components contributed by the two factors and their interaction (when we consider more than one replicated test in the cross-design). Moreover, focusing on an ordinal scale, we offer segregation power indices for testing the significance of the effects of the two factors and their interaction. A two-way ORDANOVA calculator is provided to display the empirical distributions of the segregation power indices and to find critical points of the test-statistic distributions at a given confidence level. Some examples are given to demonstrate the proposed method.
- Published
- 2021
15. Portmanteau tests for generalized integer-valued autoregressive time series models
- Author
-
Atefeh Zamani, Masoomeh Forughi, and Z. Shishebor
- Subjects
Statistics and Probability, Autoregressive model, Series (mathematics), Monte Carlo method, Portmanteau test, Applied mathematics, Portmanteau, Sample (statistics), Statistics, Probability and Uncertainty, Statistical hypothesis testing, Integer (computer science), Mathematics - Abstract
In recent years, integer-valued time series have attracted the attention of researchers and found applications in data analysis. Among various models, the integer-valued autoregressive (INAR) ones are of great popularity and are widely applied in practice. This paper develops some portmanteau test statistics to check the adequacy of the fitted model in a wide group of INAR processes, called generalized INAR. For this purpose, the asymptotic distributions of the test statistics are obtained and, using Monte Carlo simulation studies, their finite-sample properties are derived. Besides, the results are applied in analyzing a real data example.
- Published
- 2021
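For orientation, a classical portmanteau statistic of the Box–Pierce form is Q = n Σₖ ρ̂ₖ², summed over the first m residual autocorrelations. A generic sketch (the paper's statistics for generalized INAR models, and their limiting degrees of freedom, differ):

```python
import numpy as np

def box_pierce(resid, m):
    """Box-Pierce portmanteau statistic Q = n * sum_{k=1}^{m} rho_k^2;
    approximately chi-square under an adequately fitted model
    (the degrees of freedom depend on the model)."""
    resid = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(resid)
    denom = np.sum(resid ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(resid[k:] * resid[:-k]) / denom
        q += rho_k ** 2
    return n * q

rng = np.random.default_rng(7)
white = rng.poisson(3, 2000)                     # integer-valued white noise
q_white = box_pierce(white, 10)
q_ma = box_pierce(white[:-1] + white[1:], 10)    # overlapping sums: MA(1), rho_1 = 0.5
```

Serially independent counts give a small Q, while the artificially correlated series is flagged immediately, which is the basic adequacy-checking logic the paper extends.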
16. Exact One- and Two-Sample Likelihood Ratio Tests based on Time-Constrained Life-Tests from Exponential Distributions
- Author
-
Narayanaswamy Balakrishnan, Hon-Yiu So, and Xiaojun Zhu
- Subjects
Statistics and Probability, Exponential distribution, General Mathematics, Likelihood-ratio test, Monte Carlo method, Asymptotic distribution, Applied mathematics, Power function, Scale parameter, Mathematics, Statistical hypothesis testing, Exponential function - Abstract
The likelihood ratio test is one of the commonly used procedures for hypothesis testing. Several results on the likelihood ratio test have been discussed for testing the scale parameter of an exponential distribution under complete and censored data; however, all of them are based on approximations of the involved null distributions. In this paper, we first derive the exact distribution of the likelihood ratio statistic for testing the scale parameter of an exponential distribution based on a time-constrained life-testing experiment. We also obtain the asymptotic distribution, which is useful in the case of a large sample size. We then discuss the derivation of its power function. Next, we consider the likelihood ratio test of θ₂ = γ₀θ₁ when data are obtained from two exponential distributions based on a time-constrained life-testing experiment. Here again, we derive both exact and asymptotic distributions of the likelihood ratio statistic and then use them to determine the rejection region as well as the power function. Monte Carlo simulations are then carried out to evaluate the performance of the inferential methods developed here. Finally, some examples are used to illustrate all the inferential results.
- Published
- 2021
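For the complete-sample case, the likelihood ratio statistic for the exponential scale parameter has a simple closed form; a sketch (the paper's exact time-constrained variant differs, so treat this purely as background):

```python
import math
import random

def lr_statistic_exp(x, theta0):
    """-2 log Lambda for H0: theta = theta0, exponential mean theta,
    complete sample. Lambda = L(theta0)/L(theta_hat), theta_hat = sample mean:
    log Lambda = n*log(theta_hat/theta0) + n - sum(x)/theta0."""
    n = len(x)
    theta_hat = sum(x) / n
    log_lam = n * math.log(theta_hat / theta0) + n - sum(x) / theta0
    return -2.0 * log_lam

rng = random.Random(8)
x = [rng.expovariate(1.0 / 2.0) for _ in range(200)]  # data with true mean 2
stat_true = lr_statistic_exp(x, 2.0)    # testing the true value
stat_false = lr_statistic_exp(x, 4.0)   # testing a wrong value
```

Under H0 the statistic is approximately χ² with one degree of freedom for large n; the exact distribution follows because n·x̄/θ₀ is Gamma(n, 1) under the null.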
17. Tests for heteroskedasticity in transformation models
- Author
-
Charl Pretorius, Simos G. Meintanis, and Marie Hušková
- Subjects
Statistics and Probability, Heteroscedasticity, Transformation (function), Homoscedasticity, Test statistic, Null distribution, Applied mathematics, Limit (mathematics), Statistics, Probability and Uncertainty, Null hypothesis, Statistical hypothesis testing, Mathematics - Abstract
We consider a model whereby a given response variable Y, following a transformation 𝒴 := 𝒯(Y), satisfies some classical regression equation. In this transformation model the form of the transformation is specified analytically but incorporates an unknown transformation parameter. We develop testing procedures for the null hypothesis of homoskedasticity for versions of this model where the regression function is considered either known or unknown. The test statistics are formulated on the basis of Fourier-type conditional contrasts of a variance computed under the null hypothesis against the same quantity computed under alternatives. The limit null distribution of the test statistic is studied, as well as the behaviour of the test criterion under alternatives. Since the limit null distribution is complicated, a bootstrap version is suggested in order to actually carry out the test procedures. Monte Carlo results are included that illustrate the finite-sample properties of the new method. The applicability of the new tests on real data is also illustrated.
- Published
- 2021
18. Estimation and hypothesis testing in the two-stage nested design under nonnormality
- Author
-
Birdal Şenoğlu and İklim Gedik Balay
- Subjects
Statistics and Probability, Nested design, Estimation, Applied Mathematics, Modeling and Simulation, Statistics, Stage (hydrology), Statistics, Probability and Uncertainty, Mathematics, Statistical hypothesis testing - Published
- 2021
19. Statistics of SPDEs: From Linear to Nonlinear
- Author
-
Jaya P. N. Bishwal
- Subjects
Stochastic partial differential equation, Nonlinear system, Consistency (statistics), Statistical inference, Applied mathematics, Inference, Upper and lower bounds, Mathematics, Statistical hypothesis testing, Type I and type II errors - Abstract
We study statistical inference for stochastic partial differential equations (SPDEs). Though inference for linear SPDEs has been studied extensively in the last two decades (with many problems still remaining to be investigated), inference for nonlinear SPDEs is in its infancy. The inference methods use both inference for finite-dimensional diffusions and inference for classical i.i.d. sequences. Solving the 2D Navier-Stokes equation is one of the challenging problems of the last century. However, with additive white noise, the equation has a strong solution. We estimate the viscosity coefficient of the 2D stochastic Navier-Stokes (SNS) equation by the minimum contrast method. We show n-consistency, in contrast to √n-consistency in the classical i.i.d. case, where n is the number of observations. We consider both continuous and discrete observations in time. We also obtain the Berry-Esseen bounds. Then we estimate and control the Type I and Type II errors of a simple hypothesis testing problem for the viscosity coefficient of the SNS equation. We study a class of rejection regions and provide thresholds that guarantee that the statistical errors are smaller than the given upper bound. The tests are of likelihood ratio type. The proofs are based on large deviation bounds. Finally, we give a Monte Carlo test procedure for simulated data.
- Published
- 2021
20. ClineHelpR: an R package for genomic cline outlier detection and visualization
- Author
-
Bradley T. Martin, Tyler K. Chafin, Michael E. Douglas, and Marlis R. Douglas
- Subjects
Interface (Java), Computer science, Introgression, Population genetics, Hybrid zones, Context (language use), Biochemistry, Structural Biology, Outlier detection, Humans, Biology (General), Molecular Biology, Selection, Statistical hypothesis testing, Genome, Applied Mathematics, Genomic cline, Reproducibility of Results, Genomics, Cline (biology), File format, Biological Evolution, Computer Science Applications, Visualization, bgc, Outlier, Hybridization, Genetic, Anomaly detection, Data mining, Software - Abstract
Background
Patterns of multi-locus differentiation (i.e., genomic clines) often extend broadly across hybrid zones, and their quantification can help diagnose how species boundaries are shaped by adaptive processes, both intrinsic and extrinsic. In this sense, the transitioning of loci across admixed individuals can be contrasted as a function of the genome-wide trend, in turn allowing an expansion of clinal theory across a much wider array of biodiversity. However, computational tools that serve to interpret and consequently visualize 'genomic clines' are limited, and users must often write custom, relatively complex code to do so.
Results
Here, we introduce the ClineHelpR R-package for visualizing genomic clines and detecting outlier loci using output generated by two popular software packages, bgc and Introgress. ClineHelpR bundles both input generation (i.e., filtering datasets and creating specialized file formats) and output processing (e.g., MCMC thinning and burn-in) with functions that directly facilitate interpretation and hypothesis testing. Tools are also provided for post-hoc analyses that interface with external packages such as ENMeval and RIdeogram.
Conclusions
Our package increases the reproducibility and accessibility of genomic cline methods, thus allowing an expanded user base and promoting these methods as mechanisms to address diverse evolutionary questions in both model and non-model organisms. Furthermore, the ClineHelpR extended functionality can evaluate genomic clines in the context of spatial and environmental features, allowing users to explore underlying processes potentially contributing to the observed patterns and helping facilitate effective conservation management strategies.
- Published
- 2021
21. Conditional Inference in Small Sample Scenarios Using a Resampling Approach
- Author
-
Andreas Kurz and Clemens Draxler
- Subjects
psychometrics, Rasch model, Statistics, Inference, Context (language use), Conditional probability distribution, multiparameter hypothesis testing, non-parametric resampling, Sampling distribution, Resampling, conditional distribution, Applied mathematics, Power function, Statistical hypothesis testing, Mathematics - Abstract
This paper discusses a non-parametric resampling technique in the context of multidimensional or multiparameter hypothesis testing of assumptions of the Rasch model. It is based on conditional distributions and is suggested for small sample size scenarios as an alternative to the application of asymptotic or large-sample theory. The exact sampling distribution of various well-known chi-square test statistics, like the Wald, likelihood ratio, score, and gradient tests as well as others, can be arbitrarily well approximated in this way. A procedure to compute the power function of the tests is also presented. A number of example scenarios are discussed in which the power function of the test does not converge to 1 with an increasing deviation of the true values of the parameters of interest from the values specified in the hypothesis to be tested. Finally, an attempt to modify the critical region of the tests is made, aiming at improving the power, and an R package is provided.
- Published
- 2021
22. Scrambled Linear Pseudorandom Number Generators
- Author
-
David Blackman and Sebastiano Vigna
- Subjects
FOS: Computer and information sciences ,Pseudorandom number generator ,Computer Science - Cryptography and Security ,Computer science ,Applied Mathematics ,Linear map ,Nonlinear system ,Computer Science - Data Structures and Algorithms ,State space ,Computer Science - Mathematical Software ,Data Structures and Algorithms (cs.DS) ,State (computer science) ,Heuristics ,Cryptography and Security (cs.CR) ,Mathematical Software (cs.MS) ,Algorithm ,Software ,Shift register ,Statistical hypothesis testing - Abstract
F2-linear pseudorandom number generators are very popular due to their high speed, the ease with which generators with a sizable state space can be created, and their provable theoretical properties. However, they suffer from linear artifacts that show up as failures in linearity-related statistical tests such as the binary-rank and linear-complexity tests. In this article, we give two new contributions. First, we introduce two new F2-linear transformations that have been handcrafted to have good statistical properties and at the same time to be programmable very efficiently on superscalar processors, or even directly in hardware. Then, we describe some scramblers, that is, nonlinear functions applied to the state array that reduce or delete the linear artifacts, and propose combinations of linear transformations and scramblers that give extremely fast pseudorandom number generators of high quality. A novelty in our approach is that we use ideas from the theory of filtered linear-feedback shift registers to prove some properties of our scramblers, rather than relying purely on heuristics. In the end, we provide simple, extremely fast generators that use a few hundred bits of memory, have provable properties, and pass strong statistical tests.
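One generator from this family is xoshiro256**, whose published description combines a linear shift/rotate engine with a multiplicative "**" scrambler. The sketch below is a plain Python port of that description (64-bit arithmetic emulated by masking), for illustration only:

```python
MASK64 = (1 << 64) - 1

def rotl(x, k):
    """64-bit left rotation."""
    return ((x << k) | (x >> (64 - k))) & MASK64

class Xoshiro256StarStar:
    """xoshiro256**: linear F2 engine plus a nonlinear output scrambler."""

    def __init__(self, seed_state):
        # state is 256 bits = four 64-bit words, not all zero
        assert len(seed_state) == 4 and any(seed_state)
        self.s = [x & MASK64 for x in seed_state]

    def next64(self):
        s = self.s
        # '**' scrambler: multiply, rotate, multiply
        result = (rotl((s[1] * 5) & MASK64, 7) * 9) & MASK64
        # linear engine: xorshift/rotate state update
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

rng = Xoshiro256StarStar([1, 2, 3, 4])
draws = [rng.next64() for _ in range(3)]
```

In practice the seed state should itself come from a well-mixed source (the authors recommend a splitmix-style initializer); the fixed seed here is only for reproducibility.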
- Published
- 2021
23. Covariance matrices of S robust regression estimators
- Author
-
Marco Riani, Gianluca Morelli, Fabrizio Laurini, Silvia Salini, and Andrea Cerioli
- Subjects
Statistics and Probability ,S-estimator ,Sample size determination ,Applied Mathematics ,Modeling and Simulation ,Statistics ,Estimator ,Statistics, Probability and Uncertainty ,Covariance ,Confidence interval ,Mathematics ,Statistical hypothesis testing ,Robust regression - Abstract
Asymptotic properties of robust regression estimators are well known. However, it is not always clear what the best strategy is for confidence intervals and hypothesis testing when the sample size is not very large, since the distribution of residuals from robust estimates has unknown properties in small samples. In the present work we analyze various strategies for estimating the variance-covariance matrix of the S estimators as n and p vary, considering different ρ functions. An adaptive correction strategy is proposed. In addition to the simulation study, an example on a benchmark dataset is shown.
- Published
- 2021
24. Testing for the equivalence of several sets of time series and its multiple comparison procedure
- Author
-
Yukio Yanagisawa
- Subjects
Statistics and Probability ,Harmonic regression ,Series (mathematics) ,Multiple comparison procedure ,Applied mathematics ,Equivalence (measure theory) ,Mathematics ,Statistical hypothesis testing ,Nonparametric regression - Abstract
We propose test statistics for testing the equivalence of the mean functions with respect to time for two sets of time series, and for several sets of time series. We also propose multiple ...
- Published
- 2021
25. Detecting common breaks in the means of high dimensional cross-dependent panels
- Author
-
Gregory Rice, Zhenya Liu, Lajos Horváth, Yuqian Zhao, Centre d'Études et de Recherche en Gestion d'Aix-Marseille (CERGAM), and Aix Marseille Université (AMU)-Université de Toulon (UTLN)
- Subjects
Economics and Econometrics ,Series (mathematics) ,05 social sciences ,Monte Carlo method ,CUSUM ,Type (model theory) ,[SHS.ECO]Humanities and Social Sciences/Economics and Finance ,HG ,01 natural sciences ,Data set ,010104 statistics & probability ,Sampling distribution ,0502 economics and business ,Applied mathematics ,0101 mathematics ,Statistic ,050205 econometrics ,Mathematics ,Statistical hypothesis testing - Abstract
Summary: The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that $\min \lbrace N,T\rbrace \rightarrow \infty$, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several other existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
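As a hypothetical baseline (a plain single-series CUSUM scan, not the paper's factor-adjusted panel statistic), the CUSUM idea for a mean break can be sketched as:

```python
import numpy as np

def cusum_break(x):
    """Return (max CUSUM statistic, estimated break index) for a mean change.

    C_k = |S_k - (k/T) * S_T| / sqrt(T), maximized over k = 1, ..., T-1,
    where S_k is the k-th partial sum.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    S = np.cumsum(x)
    k = np.arange(1, T)
    C = np.abs(S[:-1] - k / T * S[-1]) / np.sqrt(T)
    khat = int(np.argmax(C)) + 1  # number of observations before the break
    return C.max(), khat

# Step function: mean 0 for the first 60 points, mean 5 afterwards.
x = np.r_[np.zeros(60), 5 * np.ones(40)]
stat, khat = cusum_break(x)  # khat == 60 for this noiseless example
```

A large maximum relative to the statistic's null distribution (approximated in the paper by bootstrap) signals a break, and the maximizing index estimates its location.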
- Published
- 2021
26. Identification of Objects During the Structural and System Monitoring of the Situation
- Author
-
V. A. Shevtsov, A. A. Kochkarov, S. N. Razin’kov, and A. V. Timoshenko
- Subjects
Average risk ,Computer Networks and Communications ,business.industry ,Computer science ,Applied Mathematics ,Maximum likelihood ,Type (model theory) ,System monitoring ,Theoretical Computer Science ,Root mean square ,Identification (information) ,Software ,Control and Systems Engineering ,Computer Vision and Pattern Recognition ,business ,Algorithm ,Information Systems ,Statistical hypothesis testing - Abstract
Using the minimum-average-risk criterion, a statistically optimal algorithm for identifying objects by parameters of the same type is synthesized for the structural and system monitoring of the situation. To reduce computational costs, a quasi-optimal modification is performed, based on excluding significantly different parameter values from the compared arrays. An identification simulation model is developed in the Qt Creator software environment in the object-oriented C++ programming language. Based on statistical tests of the model, the probabilities of correctly identifying objects and of false alarms are investigated using maximum likelihood estimates of the angular coordinates. The dependences of the identification efficiency indicators on the root mean square errors (RMSEs) of the parameter estimates, the number of objects, and their density in the monitoring area are analyzed.
- Published
- 2021
27. A modification of MaxT procedure using spurious correlations
- Author
-
Toshihiko Shiroishi, Yoshiyuki Ninomiya, Toyoyuki Takada, and Satoshi Kuriki
- Subjects
Statistics and Probability ,Multivariate statistics ,Applied Mathematics ,Statistics ,Multiple comparisons problem ,Test statistic ,Limit (mathematics) ,Statistics, Probability and Uncertainty ,Spurious relationship ,Statistical power ,Statistic ,Statistical hypothesis testing ,Mathematics - Abstract
We consider one of the most basic multiple testing problems, which compares expectations of multivariate data among several groups. As a test statistic, a conventional (approximate) t-statistic is considered, and we determine its rejection region using a common rejection limit. When there are unknown correlations among test statistics, the multiplicity-adjusted p-values depend on the unknown correlations. They are usually replaced with estimates that are consistent under any hypothesis. In this paper, we propose the use of estimates that are not necessarily consistent, referred to as spurious correlations, in order to improve statistical power. Through simulation studies, we verify that the proposed method asymptotically controls the family-wise error rate and clearly provides higher statistical power than existing methods. In addition, the proposed and existing methods are applied to a real multiple testing problem that compares quantitative traits among groups of mice, and the results are compared.
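For context, a generic single-step maxT procedure (the Westfall–Young-style baseline that the paper modifies; this sketch estimates the null of the maximum |t| by group-label permutation rather than using correlation estimates, so it is an illustration, not the proposed method):

```python
import numpy as np

def two_sample_t(x, y):
    """Welch-type t-statistics computed column-wise for m features."""
    nx, ny = len(x), len(y)
    num = x.mean(axis=0) - y.mean(axis=0)
    den = np.sqrt(x.var(axis=0, ddof=1) / nx + y.var(axis=0, ddof=1) / ny)
    return num / den

def maxt_adjusted_p(x, y, n_perm=300, seed=0):
    """Single-step maxT adjusted p-values via permutation of group labels."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(two_sample_t(x, y))
    pooled = np.vstack([x, y])
    n = len(x)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        max_null[b] = np.abs(two_sample_t(pooled[idx[:n]], pooled[idx[n:]])).max()
    # adjusted p: share of permuted max statistics exceeding each observed |t|
    return (1 + (max_null[:, None] >= t_obs[None, :]).sum(axis=0)) / (n_perm + 1)

rng_data = np.random.default_rng(1)
x = rng_data.normal(size=(20, 5))
x[:, 0] += 5.0                     # one strongly shifted feature
y = rng_data.normal(size=(20, 5))
p_adj = maxt_adjusted_p(x, y)
```

Because all features are compared against the same permuted maximum, the family-wise error rate is controlled while exploiting the (here implicit) correlation structure among the test statistics.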
- Published
- 2021
28. Linking Scores with Patient-Reported Health Outcome Instruments
- Author
-
Benjamin D. Schalet, David Cella, Sangdon Lim, Seung W. Choi, and Epidemiology and Data Science
- Subjects
Psychometrics ,Process (engineering) ,Calibration (statistics) ,Applied Mathematics ,Applied psychology ,Harmonization ,Outcome (game theory) ,Surveys and Questionnaires ,Equating ,Calibration ,Information system ,Humans ,Patient Reported Outcome Measures ,Psychology ,General Psychology ,Statistical hypothesis testing - Abstract
The psychometric process used to establish a relationship between the scores of two (or more) instruments is generically referred to as linking. When two instruments with the same content and statistical test specifications are linked, these instruments are said to be equated. Linking and equating procedures have long been used for practical benefit in educational testing. In recent years, health outcome researchers have increasingly applied linking techniques to patient-reported outcome (PRO) data. However, these applications have some noteworthy purposes and associated methodological questions. Purposes for linking health outcomes include the harmonization of data across studies or settings (enabling increased power in hypothesis testing), the aggregation of summed score data by means of score crosswalk tables, and score conversion in clinical settings where new instruments are introduced, but an interpretable connection to historical data is needed. When two PRO instruments are linked, assumptions for equating are typically not met and the extent to which those assumptions are violated becomes a decision point around how (and whether) to proceed with linking. We demonstrate multiple linking procedures—equipercentile, unidimensional IRT calibration, and calibrated projection—with the Patient-Reported Outcomes Measurement Information System Depression bank and the Patient Health Questionnaire-9. We validate this link across two samples and simulate different instrument correlation levels to provide guidance around which linking method is preferred. Finally, we discuss some remaining issues and directions for psychometric research in linking PRO instruments.
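Of the three procedures named, equipercentile linking is the simplest to illustrate: scores on instrument A are mapped to the B scale by matching empirical percentiles. The sketch below is a hypothetical sample-based version that ignores smoothing and summed-score crosswalk details:

```python
import numpy as np

def equipercentile_link(scores_a, scores_b):
    """Return a function mapping instrument-A scores onto the B scale by
    matching empirical percentile ranks (midpoint convention, with
    linear interpolation between observed scores)."""
    a = np.sort(np.asarray(scores_a, dtype=float))
    b = np.sort(np.asarray(scores_b, dtype=float))
    pa = (np.arange(len(a)) + 0.5) / len(a)   # percentile ranks for A
    pb = (np.arange(len(b)) + 0.5) / len(b)   # percentile ranks for B

    def link(x):
        p = np.interp(x, a, pa)   # A score -> percentile rank
        return np.interp(p, pb, b)  # percentile rank -> B score
    return link

# Toy check: if B's scores are exactly A's doubled, the link doubles scores.
a = np.arange(1.0, 101.0)
link = equipercentile_link(a, 2 * a)
```

Real linking studies apply this to single-group or matched-sample data and then validate the crosswalk, as the abstract describes for PROMIS Depression and the PHQ-9.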
- Published
- 2021
29. Statistical tests of a simple energy balance equation in a synthetic model of cotrending and cointegration
- Author
-
Dukpa Kim and Josep Lluís Carrion-i-Silvestre
- Subjects
Economics and Econometrics ,Anàlisi estocàstica ,Bivariate analysis ,01 natural sciences ,Integració econòmica ,010104 statistics & probability ,Linear programming ,0502 economics and business ,Null distribution ,Applied mathematics ,0101 mathematics ,Mètode de Montecarlo ,Economic integration ,050205 econometrics ,Statistical hypothesis testing ,Mathematics ,Forcing (recursion theory) ,Series (mathematics) ,Cointegration ,Applied Mathematics ,05 social sciences ,Estimator ,Programació lineal ,Monte Carlo method ,Null hypothesis ,Analyse stochastique - Abstract
We develop new tests for the linear relationship between temperature and forcing, one of the most studied implications of a simple energy balance model. We consider a bivariate system of temperature and forcing where the time path of well-mixed-greenhouse-gas forcing is included as a potential common trend function, in addition to a stochastic trend and a broken linear trend. Our test statistics are first devised as likelihood ratios and then modified to remove nuisance parameters from the asymptotic null distribution. The asymptotic null distribution, and the required modification, differ according to whether a stochastic trend exists. Thus, the test statistics are modified in two different ways and then combined using the super-efficient estimator of the sum of the autoregressive coefficients. The asymptotic critical values from the two cases remain close, and we use the larger one to control size in both cases. The proposed tests are applied to four temperature series and a forcing series. The null hypothesis of a linear relationship is not rejected at conventional sizes.
- Published
- 2021
30. Intermittent fault detection for delayed stochastic systems over sensor networks
- Author
-
Li Sheng, Sen Zhang, and Ming Gao
- Subjects
Computer Networks and Communications ,Control and Systems Engineering ,Control theory ,Computer science ,Stochastic process ,Applied Mathematics ,Node (networking) ,Signal Processing ,Constant (mathematics) ,Wireless sensor network ,Statistical hypothesis testing ,Intermittent fault - Abstract
This paper is concerned with the intermittent fault (IF) detection problem for a class of linear discrete-time stochastic systems over sensor networks with constant time delay. By utilizing the lifting method, distributed decoupled observers are proposed based on the output information of neighboring nodes and the node itself. In order to detect the appearance and disappearance times of the IF, truncated residuals are designed by introducing a sliding-time window. Furthermore, the IF detection and location thresholds are determined based on the hypothesis testing technique, and the detectability of the IF is analyzed in the framework of stochastic analysis. Finally, a simulation example is presented to illustrate the effectiveness of the derived results.
- Published
- 2021
31. Exact statistical inferences for the median of the Birnbaum–Saunders distribution
- Author
-
Su-Fen Yang, Dong Shang Chang, and Ming Che Lu
- Subjects
Statistics and Probability ,Distribution (number theory) ,Applied Mathematics ,Interval estimation ,Paris' law ,Birnbaum–Saunders distribution ,Modeling and Simulation ,Statistics ,Consistent estimator ,Statistical inference ,Statistics, Probability and Uncertainty ,Scale parameter ,Statistical hypothesis testing ,Mathematics - Abstract
The two-parameter Birnbaum–Saunders distribution was derived to describe the failure time from a process of fatigue crack growth. The scale parameter is also the median of the distribution. The inf...
- Published
- 2021
32. Maximum nonparametric kernel likelihood estimation for multiplicative linear regression models
- Author
-
Jun Zhang, Yiping Yang, and Bingqing Lin
- Subjects
Statistics and Probability ,Kernel (statistics) ,Linear regression ,Kernel density estimation ,Kernel smoother ,Nonparametric statistics ,Applied mathematics ,Estimator ,Errors-in-variables models ,Statistics, Probability and Uncertainty ,Mathematics ,Statistical hypothesis testing - Abstract
We propose a kernel-density-based estimation for multiplicative linear regression models. The method proposed in this article makes use of kernel smoothing nonparametric techniques to estimate the unknown density function of the model error. For hypothesis testing of the parametric components, restricted estimators under the null hypothesis and test statistics are proposed. The asymptotic properties of the estimators and test statistics are established. We illustrate our proposals through simulations and an analysis of the QSAR fish bioconcentration factor data set. Our analysis provides strong evidence that the proposed kernel-density-based estimator is superior to the least squares estimator and the least product relative error estimator in the literature, particularly for multimodal, asymmetric, or heavy-tailed distributions of the model error.
- Published
- 2021
33. Causal order identification to address confounding: binary variables
- Author
-
Yusuke Inaoka and Joe Suzuki
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Kullback–Leibler divergence ,Computer science ,Computer Science - Information Theory ,Information Theory (cs.IT) ,Applied Mathematics ,Estimator ,Experimental and Cognitive Psychology ,Mutual information ,Independent component analysis ,Machine Learning (cs.LG) ,Clinical Psychology ,Bayes' theorem ,Shortest path problem ,Algorithm ,Analysis ,Independence (probability theory) ,Statistical hypothesis testing - Abstract
This paper considers an extension of the linear non-Gaussian acyclic model (LiNGAM) that determines the causal order among variables from a dataset when the variables are expressed by a set of linear equations, including noise. In particular, we assume that the variables are binary. The existing LiNGAM assumes that no confounding is present, which is restrictive in practice. Based on the concept of independent component analysis (ICA), this paper proposes an extended framework in which the mutual information among the noises is minimized. Another significant contribution is to reduce the realization to a shortest path problem, in which the distance between each pair of nodes expresses an associated mutual information value, and the path with the minimum sum (KL divergence) is sought. Although p! mutual information values would need to be compared, this paper dramatically reduces the computation when no confounding is present. The proposed algorithm finds the globally optimal solution, while existing approaches greedily seek a locally optimal order based on hypothesis testing. For mutual information estimation, we use the best estimator in the Bayes/MDL sense, which correctly detects independence. Experiments using artificial and real data show that the proposed version of LiNGAM achieves significantly better performance, particularly when confounding is present.
- Published
- 2021
34. Poisson QMLE for change-point detection in general integer-valued time series models
- Author
-
William Kengne and Mamadou Lamine Diop
- Subjects
Statistics and Probability ,Series (mathematics) ,05 social sciences ,Estimator ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Poisson distribution ,Conditional expectation ,01 natural sciences ,010104 statistics & probability ,symbols.namesake ,Wiener process ,0502 economics and business ,FOS: Mathematics ,symbols ,Applied mathematics ,0101 mathematics ,Statistics, Probability and Uncertainty ,Change detection ,050205 econometrics ,Mathematics ,Statistical hypothesis testing ,Integer (computer science) - Abstract
We consider together the retrospective and the sequential change-point detection in a general class of integer-valued time series. The conditional mean of the process depends on a parameter $\theta^*$ which may change over time. We propose procedures based on the Poisson quasi-maximum likelihood estimator of the parameter, where in the sequential framework the updated estimator is computed without the historical observations. For both the retrospective and the sequential detection, the test statistics converge to distributions obtained from the standard Brownian motion under the null hypothesis of no change and diverge to infinity under the alternative; that is, these procedures are consistent. Results of simulations as well as a real data application are provided.
- Published
- 2021
35. Distance-covariance-based tests for heteroscedasticity in nonlinear regressions
- Author
-
Mingxiang Cao and Kai Xu
- Subjects
Score test ,Heteroscedasticity ,General Mathematics ,Null (mathematics) ,Applied mathematics ,Regression analysis ,Covariance ,Nonlinear regression ,Statistical hypothesis testing ,Mathematics ,Parametric statistics - Abstract
We use distance covariance to introduce novel consistent tests of heteroscedasticity for nonlinear regression models in multidimensional spaces. The proposed tests require no user-defined regularization, are simple to implement based only on pairwise distances between points in the sample, and are applicable even with non-normal errors and many covariates in the regression model. We establish the asymptotic distributions of the proposed test statistics under the null and alternative hypotheses and under a sequence of local alternatives converging to the null at the fastest possible parametric rate. In particular, we focus on whether and how the estimation of the finite-dimensional unknown parameter vector in the regression functions affects the distribution theory. It turns out that the asymptotic null distributions of the suggested test statistics depend on the data-generating process, so a bootstrap scheme and its validity are considered. Simulation studies demonstrate the versatility of our tests in comparison with the score test, the Cramér-von Mises test, the Kolmogorov-Smirnov test, and the Zheng-type test. We also use the ultrasonic reference block data set from the National Institute of Standards and Technology (NIST) to illustrate the practicability of our proposals.
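The sample distance covariance on which such tests build is itself computed from pairwise distances with double centering; a minimal sketch of the squared V-statistic form (an illustration of the ingredient, not the paper's full test with bootstrap calibration):

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic) for 1-D samples.
    Extends to multivariate data by using Euclidean distance matrices."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # double-center both distance matrices
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

x = np.linspace(0.0, 1.0, 50)
dep = dcov2(x, x ** 2)            # strictly positive: y is a function of x
indep = dcov2(x, np.zeros(50))    # exactly zero: a constant y carries no signal
```

Distance covariance vanishes (in the population) exactly under independence, which is what makes it attractive for detecting heteroscedasticity of unspecified form.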
- Published
- 2021
36. A new test for tail index with application to Danish fire loss data
- Author
-
Wai Keung Li and Tony S. T. Wong
- Subjects
Statistics and Probability ,Applied Mathematics ,Order statistic ,Inference ,Estimator ,language.human_language ,Test (assessment) ,Danish ,Sequential method ,Modeling and Simulation ,Statistics ,language ,Statistics, Probability and Uncertainty ,Tail index ,Statistical hypothesis testing ,Mathematics - Abstract
The conditional approach of Hill's estimator depends on a threshold choice. This may give different results in a statistical test when different thresholds are used. Motivated by the uniformly most...
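The threshold sensitivity that motivates this work is visible in Hill's estimator itself, which averages log-excesses over the k largest order statistics; a minimal illustrative sketch (of the classical estimator, not the authors' new test):

```python
import math

def hill_estimator(sample, k):
    """Hill's estimator of the extreme value index gamma = 1/alpha,
    based on the k largest order statistics above the (k+1)-th."""
    xs = sorted(sample, reverse=True)  # descending order statistics
    threshold = xs[k]                  # the (k+1)-th largest value
    return sum(math.log(xs[i] / threshold) for i in range(k)) / k

# Exact Pareto(alpha=2) quantiles: X = (1 - u)^(-1/2); true gamma = 0.5.
n = 10_000
sample = [(1 - (i + 0.5) / n) ** (-0.5) for i in range(n)]
gamma_hat = hill_estimator(sample, k=500)  # close to the true value 0.5
```

Different choices of k (the threshold) generally give different estimates, and hence potentially different test conclusions, which is the instability the abstract refers to.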
- Published
- 2021
37. A two‐step procedure for testing partial parameter stability in cointegrated regression models
- Author
-
Mohitosh Kejriwal, Pierre Perron, and Xuewen Yu
- Subjects
Statistics and Probability ,Sample size determination ,Applied Mathematics ,Linear regression ,Applied mathematics ,Regression analysis ,Limit (mathematics) ,Statistics, Probability and Uncertainty ,Invariant (mathematics) ,Asymptotic theory (statistics) ,Stability (probability) ,Statistical hypothesis testing ,Mathematics - Abstract
Kejriwal and Perron (2010, KP) provided a comprehensive treatment of the problem of testing multiple structural changes in cointegrated regression models. A variety of models were considered depending on whether all regression coefficients are allowed to change (pure structural change) or a subset of the coefficients is held fixed (partial structural change). In this note, we first show that the limit distributions of the test statistics in the latter case are not invariant to changes in the coefficients not being tested; in fact, they diverge as the sample size increases. To address this issue, we propose a simple two-step procedure to test for partial parameter stability. The first step entails the application of a joint test of stability for all coefficients, as in KP. Upon a rejection, the second step conducts a stability test on the subset of coefficients of interest while allowing the other coefficients to change at the estimated breakpoints. Its limit distribution is standard chi-square. The relevant asymptotic theory is provided, along with simulations that illustrate the usefulness of the procedure in finite samples.
- Published
- 2021
38. A Non-Probabilistic Neutrosophic Entropy-Based Method For High-Order Fuzzy Time-Series Forecasting
- Author
-
Sibarama Panigrahi, Himansu Sekhar Behera, and Radha Mohan Pattanayak
- Subjects
Range (mathematics) ,Multidisciplinary ,Wilcoxon signed-rank test ,Fuzzy set ,Probabilistic logic ,Applied mathematics ,Entropy (information theory) ,Time series ,Fuzzy logic ,Mathematics ,Statistical hypothesis testing - Abstract
Over the years, numerous fuzzy time-series forecasting (FTSF) models have been developed to handle the uncertainty and non-determinism in time-series (TS) data. To handle non-determinism and indeterminacy, researchers have considered either intuitionistic fuzzy set or hesitant fuzzy set theory. However, in both fuzzy set theories, the degree of indeterminacy is a dependent value and always lies in the range [0, 1]. Hence, these two fuzzy set theories fail to model the indeterminacy value when the degree of non-membership fluctuates due to hesitancy. Motivated by this, we consider a neutrosophic entropy-based fuzzy time-series forecasting (NEBFTSF) model in which the neutrosophic entropy of each observation in the TS is used to capture the indeterminacy. In addition, the triangular membership value of each observation is used to represent the non-probabilistic uncertainty in the TS. The present research focuses on three concepts: (1) an adaptive method is used to partition the universe of discourse (UOD) into intervals of unequal length; (2) for the first time, the fuzzy logical relationships (FLRs) are established by considering the ratio trend variation (RTV) data with the mean of the aggregated entropy value of each crisp observation; and (3) both de-trending and de-normalization are employed to obtain the forecasted values. To assess the forecasting performance of the proposed model, eleven TS datasets and ten well-known forecasting models are considered. The Friedman and Nemenyi hypothesis tests and the Wilcoxon signed-rank test confirm the forecasting efficiency and reliability of the NEBFTSF model.
- Published
- 2021
39. Semiparametric inference on general functionals of two semicontinuous populations
- Author
-
Meng Yuan, Pengfei Li, Chunlin Wang, and Boxi Lin
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,education.field_of_study ,Population ,Nonparametric statistics ,Asymptotic distribution ,Estimator ,Mathematics - Statistics Theory ,Context (language use) ,Statistics Theory (math.ST) ,01 natural sciences ,Methodology (stat.ME) ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Empirical likelihood ,Distribution (mathematics) ,FOS: Mathematics ,Applied mathematics ,030212 general & internal medicine ,0101 mathematics ,education ,Statistics - Methodology ,Statistical hypothesis testing ,Mathematics - Abstract
In this paper, we propose new semiparametric procedures for making inference on linear functionals, and functions thereof, of two semicontinuous populations. The distribution of each population is usually characterized by a mixture of a discrete point mass at zero and a continuous skewed positive component, and hence such a distribution is semicontinuous in nature. To utilize the information from both populations, we model the positive components of the two mixture distributions via a semiparametric density ratio model. Under this model setup, we construct the maximum empirical likelihood estimators of the linear functionals and their functions, and establish the asymptotic normality of the proposed estimators. We show that the proposed estimators of the linear functionals are more efficient than the fully nonparametric ones. The developed asymptotic results enable us to construct confidence regions and perform hypothesis tests for the linear functionals and their functions. We further apply these results to several important summary quantities such as the moments, the mean ratio, the coefficient of variation, and the generalized entropy class of inequality measures. Simulation studies demonstrate the advantages of our proposed semiparametric method over some existing methods. Two real data examples are provided for illustration.
- Published
- 2021
40. Hypotheses testing and posterior concentration rates for semi-Markov processes
- Author
-
Nikolaos Limnios, Vlad Stefan Barbu, Ghislaine Gayraud, Irene Votsi, Le Mans Université (UM), Université de Technologie de Compiègne (UTC), Université de Rouen Normandie (UNIROUEN), and Normandie Université (NU)
- Subjects
Statistics and Probability ,Hellinger distance ,Posterior probability ,Markov process ,Mathematics - Statistics Theory ,semi-Markov kernel ,Statistics Theory (math.ST) ,Space (mathematics) ,01 natural sciences ,Bayesian nonparametrics ,010104 statistics & probability ,symbols.namesake ,0502 economics and business ,Prior probability ,FOS: Mathematics ,Countable set ,State space ,Applied mathematics ,testing procedure ,0101 mathematics ,050205 econometrics ,Statistical hypothesis testing ,Mathematics ,posterior concentration rate ,05 social sciences ,[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH] ,semi-Markov kernels ,robust statistical tests ,symbols ,Bayesian nonparametric statistics ,posterior concentration rates ,semi-Markov processes - Abstract
In this paper, we adopt a nonparametric Bayesian approach and investigate the asymptotic behavior of the posterior distribution in continuous-time and general state space semi-Markov processes. In particular, we obtain posterior concentration rates for semi-Markov kernels. For the purposes of this study, we construct robust statistical tests between Hellinger balls around semi-Markov kernels and present some specifications to particular cases, including discrete-time semi-Markov processes and countable state space Markov processes. The objective of this paper is to provide sufficient conditions on priors and semi-Markov kernels that enable us to establish posterior concentration rates.
- Published
- 2021
41. Two-sample Behrens–Fisher problems for high-dimensional data: A normal reference approach
- Author
-
Jin-Ting Zhang, Jia Guo, Tianming Zhu, and Bu Zhou
- Subjects
Statistics and Probability ,Clustering high-dimensional data ,Applied Mathematics ,Null distribution ,Test statistic ,Applied mathematics ,Asymptotic distribution ,Statistics, Probability and Uncertainty ,Covariance ,Null hypothesis ,Behrens–Fisher problem ,Statistical hypothesis testing ,Mathematics - Abstract
High-dimensional data are frequently encountered with the development of modern data collection techniques. Testing the equality of the mean vectors of two high-dimensional samples with possibly different covariance matrices is usually referred to as a high-dimensional two-sample Behrens–Fisher (BF) problem. In the high-dimensional setting, the classical BF solutions are expected to perform poorly or become inapplicable due to the singularity of the sample covariance matrices. Several approaches have been proposed in the literature to address this challenging issue, but they all require strong regularity conditions on the underlying covariance matrices to guarantee that their test statistics are asymptotically normally distributed. To overcome this difficulty, an L2-norm-based test is proposed and studied in this article. It is shown that under some regularity conditions and the null hypothesis, the test statistic and a chi-square-type mixture have the same normal or non-normal limiting distribution. It is then natural to approximate the null distribution of the proposed test using that of the chi-square-type mixture, which is actually obtained from the proposed test statistic when the two high-dimensional samples are normally distributed. The resulting test is then referred to as a normal reference test. The distribution of the chi-square-type mixture can then be well approximated by the Welch–Satterthwaite χ2-approximation, with the approximation parameters consistently estimated from the data. The asymptotic power of the proposed test is established. Good performance of the proposed test against several existing competitors is demonstrated via several simulation studies and illustrated by a real data example.
- Published
- 2021
42. Robust parameter estimation of a PEMFC via optimization based on probabilistic model building
- Author
-
L.C. Ordoñez, S. Ivvan Valdez, Salvador Botello-Rionda, and Luis Blanco-Cocom
- Subjects
Numerical Analysis ,General Computer Science ,Mean squared error ,Estimation theory ,Applied Mathematics ,Statistical model ,Theoretical Computer Science ,Estimation of distribution algorithm ,Approximation error ,Modeling and Simulation ,Outlier ,Algorithm ,Metaheuristic ,Statistical hypothesis testing ,Mathematics - Abstract
In this work, we approximated a set of unknown physical parameters for a semi-empirical mathematical model of a PEMFC. We used an Estimation of Distribution Algorithm (EDA) known as UMDA_G to find the tuple that best reproduces the experimental polarization curve. We tackled non-differentiable objective functions to perform robust parameter estimation. We compared the sum of the squared errors with published results, and the sum and the median of the absolute error values were used to diminish or remove the effect of possible noise or outliers. Since the UMDA_G requires a single user-given parameter (the population size) and exhibits a natural reduction of the variance, it was possible to introduce a variance-based stopping criterion. The obtained results were compared with the most up-to-date evolutionary algorithms, demonstrating that this proposal is competitive. We used four previously reported experimental datasets to estimate the parameters or validate them. Two of them were used to test the method and to compare it with reported results of recent bio-inspired metaheuristics. Then, we used the identified parameters to simulate the cases of the remaining datasets, validating the correct estimation. Finally, we introduced a posterior statistical analysis (hypothesis test), which provided further information about dependencies and the impact of each parameter on the cell performance.
- Published
- 2021
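As a concrete illustration of the probabilistic-model-building idea above, here is a minimal Gaussian-UMDA sketch with the variance-based stopping rule mentioned in the abstract. It is a toy version under simplifying assumptions (independent Gaussian marginals, truncation selection, a quadratic test objective), not the authors' PEMFC code.

```python
import random

def umda_g(objective, dim, pop_size=60, elite_frac=0.5, min_std=1e-6):
    """Gaussian UMDA sketch: model each coordinate with an independent
    normal, refit it on the elite, and stop when the model variance
    collapses (variance-based stopping criterion)."""
    mu, sigma = [0.0] * dim, [2.0] * dim
    while max(sigma) > min_std:
        pop = [[random.gauss(mu[j], sigma[j]) for j in range(dim)]
               for _ in range(pop_size)]
        pop.sort(key=objective)                      # minimization
        elite = pop[:int(pop_size * elite_frac)]
        for j in range(dim):                         # refit univariate marginals
            vals = [ind[j] for ind in elite]
            mu[j] = sum(vals) / len(vals)
            sigma[j] = (sum((v - mu[j]) ** 2 for v in vals) / len(vals)) ** 0.5
    return mu

random.seed(1)
best = umda_g(lambda v: sum((vi - 1.0) ** 2 for vi in v), dim=3)
```

With truncation selection the marginal variances shrink every generation, so the loop terminates once the search has collapsed around an optimum; the only user-given parameter that matters in practice is the population size, as the abstract notes.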
43. Trajectories from Distribution-valued Functional Curves: A Unified Wasserstein Framework
- Author
-
Anuja Sharma and Guido Gerig
- Subjects
education.field_of_study ,Population ,Structure (category theory) ,Function (mathematics) ,01 natural sciences ,Synthetic data ,Article ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Distribution (mathematics) ,Trajectory ,Applied mathematics ,0101 mathematics ,education ,030217 neurology & neurosurgery ,Dykstra's projection algorithm ,Statistical hypothesis testing ,Mathematics - Abstract
Temporal changes in medical images are often evaluated along a parametrized function that represents a structure of interest (e.g. white matter tracts). By attributing samples along these functions with distributions of image properties in the local neighborhood, we create distribution-valued signatures for these functions. We propose a novel, comprehensive framework which models their temporal evolution trajectories. This is achieved under the unifying scheme of Wasserstein distance metric. The regression problem is formulated as a constrained optimization problem and solved using an alternating projection algorithm. The solution simultaneously preserves the functional characteristics of the curve, models the temporal change in distribution profiles and forces the estimated distributions to be valid. Hypothesis testing is applied in two ways using Wasserstein based test statistics. Validation is presented on synthetic data. Estimation of a population trajectory is shown using diffusion properties along DTI tracts from a healthy population of infants. Detection of delayed growth is shown using a case study.
- Published
- 2022
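The base ingredient of the framework above, the Wasserstein distance between one-dimensional distributions, has a simple closed form for equal-size empirical samples: sort both samples and average the pointwise differences. A minimal sketch for p = 1 (the paper's regression and alternating-projection machinery is not reproduced here):

```python
def wasserstein_1d(xs, ys):
    """p=1 Wasserstein distance between two equal-size empirical 1-D
    distributions: mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

For example, the distance between the samples {0, 1, 2} and {1, 2, 3} is 1: sorting aligns each point with its order-matched counterpart, which is the optimal transport plan in one dimension.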
44. Testing the eigenvalue structure of spot and integrated covariance
- Author
-
Abderrahim Taamouti, Julian Williams, and Prosper Dovonon
- Subjects
Economics and Econometrics ,Applied Mathematics ,05 social sciences ,Monte Carlo method ,Sampling (statistics) ,Covariance ,01 natural sciences ,010104 statistics & probability ,Semimartingale ,Dimension (vector space) ,0502 economics and business ,Range (statistics) ,Applied mathematics ,050207 economics ,0101 mathematics ,Eigenvalues and eigenvectors ,Statistical hypothesis testing ,Mathematics - Abstract
For vector Ito semimartingale dynamics, we derive the asymptotic distributions of likelihood-ratio-type test statistics for the purpose of identifying the eigenvalue structure of both integrated and spot covariance matrices estimated using high-frequency data. Unlike the existing approaches where the cross-section dimension grows to infinity, our tests do not necessarily require a large cross-section and thus allow for a wide range of applications. The tests, however, are based on non-standard asymptotic distributions with many nuisance parameters. Another contribution of this paper consists in proposing a bootstrap method to approximate these asymptotic distributions. While standard bootstrap methods focus on sampling point-wise returns, the proposed method replicates features of the asymptotic approximation of the statistics of interest that guarantee its validity. A Monte Carlo simulation study shows that the bootstrap-based test controls size and has power even for moderately sized samples.
- Published
- 2022
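To make the kind of question in the abstract above concrete, the sketch below tests whether the smallest eigenvalue of a 2 x 2 covariance matrix is zero (i.e. whether the covariance is rank-deficient), using a naive point-resampling bootstrap for the standard error. This is only an illustration: the paper's bootstrap deliberately does not resample point-wise returns but replicates the asymptotic approximation of the statistic, and the decision rule here (reject when the estimate exceeds twice its bootstrap standard error) is an assumption of this sketch.

```python
import random

def smallest_eig_2x2(m):
    # closed-form smaller eigenvalue of a symmetric 2x2 matrix
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr / 2 - max(tr * tr / 4 - det, 0.0) ** 0.5

def sample_cov_2d(data):
    n = len(data)
    mx = sum(d[0] for d in data) / n
    my = sum(d[1] for d in data) / n
    sxx = sum((d[0] - mx) ** 2 for d in data) / (n - 1)
    syy = sum((d[1] - my) ** 2 for d in data) / (n - 1)
    sxy = sum((d[0] - mx) * (d[1] - my) for d in data) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def rank_deficiency_test(data, B=200):
    """Returns (estimate, bootstrap SE, reject?) for H0: lambda_min = 0."""
    obs = smallest_eig_2x2(sample_cov_2d(data))
    boots = []
    for _ in range(B):
        res = [random.choice(data) for _ in data]   # naive point resampling
        boots.append(smallest_eig_2x2(sample_cov_2d(res)))
    mean = sum(boots) / B
    se = (sum((b - mean) ** 2 for b in boots) / (B - 1)) ** 0.5
    return obs, se, obs > 2 * se
```

On perfectly collinear data the estimated smallest eigenvalue is zero and the null is retained; on full-rank data it is bounded away from zero relative to its bootstrap spread.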
45. Graph drawing using tabu search coupled with path relinking.
- Author
-
Dib, Fadi K. and Rodgers, Peter
- Subjects
- *METAHEURISTIC algorithms, *GRAPH theory, *RANDOM graphs, *MULTIDISCIPLINARY design optimization, *SCALABILITY - Abstract
Graph drawing, or the automatic layout of graphs, is a challenging problem. There are several search-based methods for graph drawing which are based on optimizing an objective function formed from a weighted sum of multiple criteria. In this paper, we propose a new neighbourhood search method which uses tabu search coupled with path relinking to optimize such objective functions for general graph layouts with undirected straight-line edges. To our knowledge, neither of these methods has previously been used in general multi-criteria graph drawing. Tabu search uses a memory list to speed up searching by avoiding previously tested solutions, while the path relinking method generates new solutions by exploring paths that connect high-quality solutions. We use path relinking periodically within the tabu search procedure to speed up the identification of good solutions. We have evaluated our new method against the commonly used neighbourhood search optimization techniques: hill climbing and simulated annealing. Our evaluation examines the quality of the graph layout (the objective function’s value) and the speed of layout in terms of the number of evaluated solutions required to draw a graph. We also examine the relative scalability of each method. We ran our experiments on both random graphs and a real-world dataset. We show that our method outperforms both hill climbing and simulated annealing by producing a better layout in a lower number of evaluated solutions. In addition, we demonstrate that our method has greater scalability, as it can lay out larger graphs than the state-of-the-art neighbourhood search methods. Finally, we show that similar results can be produced in a real-world setting by testing our method against a standard public graph dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
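A stripped-down sketch of the tabu search component described above is shown below (the path relinking step is omitted). The objective, a weighted sum of an ideal-edge-length term and a node-overlap penalty, the move rule, and the tabu tenure are all simplified assumptions of this illustration, not the authors' exact criteria.

```python
import math
import random

def layout_tabu(nodes, edges, iters=500, tabu_len=2, step=0.5):
    """Tabu search sketch for multi-criteria graph layout: move one node at
    a time, accept only improving moves, and keep recently moved nodes in a
    short tabu list so they are skipped for a few iterations."""
    random.seed(0)
    pos = {v: [random.uniform(0, 10), random.uniform(0, 10)] for v in nodes}

    def cost():
        target = 3.0  # criterion 1: edge lengths close to an ideal length
        c = sum((math.dist(pos[a], pos[b]) - target) ** 2 for a, b in edges)
        c += sum(max(0.0, 1.0 - math.dist(pos[a], pos[b]))  # criterion 2: overlap
                 for a in nodes for b in nodes if a < b)
        return c

    tabu, costs = [], [cost()]
    for _ in range(iters):
        v = random.choice([u for u in nodes if u not in tabu])
        old = list(pos[v])
        pos[v] = [old[0] + random.uniform(-step, step),
                  old[1] + random.uniform(-step, step)]
        if cost() >= costs[-1]:
            pos[v] = old                 # reject worsening move
        else:
            costs.append(cost())
            tabu.append(v)               # accepted: mark v tabu for a while
            if len(tabu) > tabu_len:
                tabu.pop(0)
    return pos, costs

pos, costs = layout_tabu(list("abcd"), [("a", "b"), ("b", "c"), ("c", "d")])
```

The returned cost history is monotone non-increasing because only improving moves are accepted; in the full method, path relinking would periodically combine elite layouts to escape the plateaus this greedy rule gets stuck on.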
46. Reliability demonstration test for load-sharing systems with exponential and Weibull components.
- Author
-
Xu, Jianyu, Hu, Qingpei, Yu, Dan, and Xie, Min
- Subjects
- *MECHANICAL loads, *RELIABILITY in engineering, *WEIBULL distribution, *EXPONENTIAL functions, *FAILURE analysis - Abstract
Conducting a Reliability Demonstration Test (RDT) is a crucial step in production. Products are tested under certain schemes to demonstrate whether their reliability indices reach pre-specified thresholds. Test schemes for RDT have been studied in different situations, e.g., lifetime testing, degradation testing and accelerated testing. Systems designed with several structures are also investigated in many RDT plans. Despite the availability of a range of test plans for different systems, RDT planning for load-sharing systems hasn’t yet received the attention it deserves. In this paper, we propose a demonstration method for two specific types of load-sharing systems with components subject to two distributions: exponential and Weibull. Based on the assumptions and interpretations made in several previous works on such load-sharing systems, we set the mean time to failure (MTTF) of the total system as the demonstration target. We represent the MTTF as a summation of mean time between successive component failures. Next, we introduce generalized test statistics for both the underlying distributions. Finally, RDT plans for the two types of systems are established on the basis of these test statistics. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
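The MTTF-as-a-sum-of-stage-means representation described above is easy to state for the exponential case. The sketch below assumes an n-component system that fails when the last component fails, with a load-sharing rule in which each of the k survivors fails at rate base_rate * load_factor(k); the function and its parameters are illustrative, not the paper's notation.

```python
def mttf_load_sharing_exp(base_rate, n, load_factor):
    """MTTF sketch for an n-component load-sharing system with exponential
    lifetimes. With k survivors each component fails at rate
    base_rate * load_factor(k), so the time to the next failure is the
    minimum of k exponentials, i.e. exponential with rate
    k * base_rate * load_factor(k); the system MTTF is the sum of these
    stage means."""
    total = 0.0
    for k in range(n, 0, -1):              # stage with k surviving components
        stage_rate = k * base_rate * load_factor(k)
        total += 1.0 / stage_rate
    return total

# equal-load rule: a fixed total load is split evenly, so each of the
# k survivors carries a load factor of n/k
mttf = mttf_load_sharing_exp(0.01, n=3, load_factor=lambda k: 3.0 / k)
```

Under the equal-total-load rule the stage rates are constant (k cancels), so here each of the three stages contributes 1/0.03 and the MTTF is 100; without load sharing (load factor 1) the stages slow down as components fail and the MTTF is larger.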
47. Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets.
- Author
-
Pyo, Sujin, Lee, Jaewook, Cha, Mincheol, and Jang, Huisu
- Subjects
- *MACHINE learning, *STATISTICAL hypothesis testing, *STOCK exchanges, *PREDICTION models, *ARTIFICIAL neural networks - Abstract
The prediction of the trends of stock and index prices is an important issue for market participants. Investors set trading or fiscal strategies based on these trends, and considerable research in various academic fields has been conducted on forecasting financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: artificial neural networks and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about them. Notably, our results are inconsistent with those of the precedent research, whose models are generally considered to have high prediction performance. Moreover, Google Trends proved not to be an effective factor in predicting the KOSPI 200 index prices in our frameworks. Furthermore, the ensemble methods did not improve the accuracy of the prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
48. Evolutionary dynamics of group formation.
- Author
-
Javarone, Marco Alberto and Marinazzo, Daniele
- Subjects
- *GROUP formation, *GROUP size, *STATISTICAL hypothesis testing, *GAME theory, *PHASE diagrams - Abstract
Group formation is a quite ubiquitous phenomenon across different animal species, whose individuals cluster together forming communities of diverse size. Previous investigations suggest that, in general, this phenomenon might have similar underlying reasons across the species concerned, despite genetic and behavioral differences: for instance, improving individual safety (e.g. from predators) and increasing the probability of obtaining food resources. Remarkably, the group size might vary strongly from species to species, e.g. shoals of fishes and herds of lions, and sometimes even within the same species, e.g. tribes and families in human societies. Here we build on previous theories stating that the dynamics of group formation may have evolutionary roots, and we explore this fascinating hypothesis from a purely theoretical perspective, with a model using the framework of Evolutionary Game Theory. In our model we hypothesize that homogeneity constitutes a fundamental ingredient in these dynamics. Accordingly, we study a population that tries to form homogeneous groups, i.e. groups composed of similar agents. The formation of a group can be interpreted as a strategy. Notably, agents can form a group (receiving a ‘group payoff’) or can act individually (receiving an ‘individual payoff’). The phase diagram of the modeled population shows a sharp transition between the ‘group phase’ and the ‘individual phase’, characterized by a critical ‘individual payoff’. Our results then support the hypothesis that the phenomenon of group formation has evolutionary roots. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
49. Knowledge of Religion and Religiosity of Santri and Their Influence on the Pluralism
- Author
-
Ahmad Nashiruddin and Latifah Nuraini
- Subjects
Variables ,Applied Mathematics ,General Mathematics ,media_common.quotation_subject ,Test (assessment) ,Religiosity ,Correlation ,Linear regression ,Pluralism (philosophy) ,Psychology ,Social psychology ,Value (mathematics) ,Statistical hypothesis testing ,media_common - Abstract
This study aims to determine the influence of religious knowledge and religiosity on the pluralism of Kajen students. The research is quantitative with three variables: religious knowledge and religiosity as independent variables, and pluralism as the dependent variable. The data were analyzed using the product-moment correlation for the partial tests and a linear regression test for the simultaneous test. The analysis of the 136 samples showed a positive and significant correlation between religious knowledge and pluralism, as the sig value is smaller than the alpha value. The second hypothesis test showed the same result: a positive and significant relationship between religiosity and pluralism. The simultaneous test (sig 0,000; 3,00) indicates that religious knowledge and religiosity jointly influence pluralism. The regression line Y = 0,511X_1 + 0,274X_2 + 44,728 shows that the coefficients of X1 and X2 are positive, meaning that both influence pluralism (Y) positively.
- Published
- 2021
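The regression line reported in the abstract has the standard multiple-linear-regression form Y = b0 + b1*X1 + b2*X2, which can be fit by ordinary least squares via the normal equations. The sketch below is a generic pure-Python OLS solver, not the authors' analysis; the usage example checks it on synthetic, noise-free data generated from the abstract's reported coefficients (decimal commas in the abstract read as decimal points here).

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares: solve (X'X) b = X'y
    with Gauss-Jordan elimination and partial pivoting."""
    A = [[1.0] + list(row) for row in X]           # prepend intercept column
    p = len(A[0])
    XtX = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    Xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    M = [XtX[i] + [Xty[i]] for i in range(p)]      # augmented system
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]            # pivot for stability
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]   # [b0, b1, b2, ...]

# synthetic check against the reported line Y = 0.511*X1 + 0.274*X2 + 44.728
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 9)]
y = [44.728 + 0.511 * x1 + 0.274 * x2 for x1, x2 in X]
b0, b1, b2 = ols(X, y)
```

Because the synthetic data are noise-free, the solver recovers the generating coefficients to machine precision; with real survey data the fit would also come with the sig values the abstract reports.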
50. Testing High-Dimensional Nonparametric Behrens-Fisher Problem
- Author
-
Ao Yuan, Na Li, and Zhen Meng
- Subjects
Sample size determination ,Outlier ,Computer Science (miscellaneous) ,Explained sum of squares ,Nonparametric statistics ,Applied mathematics ,Asymptotic distribution ,Cauchy distribution ,Behrens–Fisher problem ,Information Systems ,Statistical hypothesis testing ,Mathematics - Abstract
For the high-dimensional nonparametric Behrens-Fisher problem, in which the data dimension is larger than the sample size, the authors propose two test statistics: a U-statistic Rank-based Test (URT) and a Cauchy Combination Test (CCT). CCT is analogous to a maximum-type test, while URT takes into account the sum of squares of differences of ranked samples in different dimensions, which makes it free of distributional shape assumptions and robust to outliers. The asymptotic distribution of URT is derived, and a closed form for calculating the statistical significance of CCT is given. Extensive simulation studies are conducted to evaluate the finite-sample power performance of the statistics by comparing them with the existing method. The simulation results show that URT is a robust and powerful method, and its practicability and effectiveness are illustrated by an application to gene expression data.
- Published
- 2021
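The Cauchy Combination Test used as CCT above has a simple closed form: map each p-value to a standard Cauchy variable via tan((0.5 − p)·π), take a weighted average, and map back. A minimal sketch (equal weights by default; this is the generic CCT recipe, not necessarily the authors' exact weighting):

```python
import math

def cauchy_combination(pvalues, weights=None):
    """Cauchy Combination Test: transform each p-value to a standard
    Cauchy variable, average, and convert the average back to a p-value.
    The Cauchy tail makes the result valid even under dependence."""
    n = len(pvalues)
    w = weights or [1.0 / n] * n
    t = sum(wi * math.tan((0.5 - pi) * math.pi) for wi, pi in zip(w, pvalues))
    return 0.5 - math.atan(t) / math.pi
```

Because the Cauchy transform blows up near zero, the combined p-value is dominated by the smallest input, which is why CCT behaves like a maximum-type test.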