To improve the effect of agricultural economic structure optimisation, this paper builds an analysis system for optimising the agricultural economic structure based on factor analysis and intelligent algorithms, using a low-dimensional vector of latent factors to describe the covariance structure of high-dimensional observation vectors. The algorithm is then combined with the practical situation to obtain an optimisation analysis process for the agricultural economic structure. On this basis, the paper analyses the optimisation of the agricultural economic structure in practice through a case study, processes the agricultural economic structure data with mathematical analysis, and visualises the data-processing steps with mathematical statistics. In addition, parameters are processed with a weighted average and, combined with comparative analysis, the agricultural economic outcome parameters are obtained and analysed. Finally, on this basis, corresponding strategies for optimising the agricultural economic structure are proposed. The research results show that the proposed method is effective. [ABSTRACT FROM AUTHOR]
In this paper, an up-to-date overview is provided of data-driven fault diagnosis (FD) and remaining useful life (RUL) prediction for petroleum machinery and equipment (PME). First, FD and RUL prediction for five key components, including bearings, gears, motors, pumps and pipelines, are discussed using mathematical statistics and shallow learning. Then, four widely used deep learning (DL) models, i.e. deep neural networks, deep belief networks, convolutional neural networks and recurrent neural networks, are surveyed, and their applications in the field of PME are highlighted. Finally, possible challenges are identified and some corresponding directions for future research are presented. [ABSTRACT FROM AUTHOR]
Liu, Junqi, Tang, Huiming, Li, Qi, Su, Aijun, Liu, Qianhui, and Zhong, Cheng
Subjects
MULTISENSOR data fusion, LANDSLIDES, GLOBAL Positioning System, LANDSLIDE prediction, SIMULATION methods & models
Abstract
Hidden relationships exist among the various monitoring data collected on a landslide, and mining these relationships is of significance to landslide research. In this paper, we first collect multiple monitoring data of riverside 1# slump-mass of Huangtupo landslide, the Three Gorges Reservoir Region, China, including Global Positioning System (GPS) monitoring data, inclinometer data, reservoir water level, rainfall, water content, crack width, groundwater level and temperature data, etc. By adopting a combination of quantitative statistics and qualitative simulation for multi-sensor fusion monitoring data analysis, we overcome the one-sidedness of using a single method or single data type. The fusion analysis indicates that in periods with low rainfall, or when rainfall is not the major factor, the main factors affecting landslide movement are crack development, the water content of the landslide and the water level of the Three Gorges Reservoir. Compared with the actual monitoring data, the fusion analysis results have a maximum error of 1.9%, which shows a good effect. [ABSTRACT FROM AUTHOR]
Application of exact statistical inference frequently leads to non-standard probability distributions of the estimators or test statistics under consideration. The exact distributions of many estimators and test statistics can be specified by their characteristic functions, as is the case for the null distribution of Bartlett's test statistic. However, analytical inversion of the characteristic function, if possible at all, frequently leads to complicated expressions for computing the distribution function and the corresponding quantiles. An efficient alternative is the well-known method based on numerical inversion of the characteristic function, which is, however, absent from popular statistical software packages. In this paper, we present the explicit characteristic function of the corrected Bartlett's test statistic together with a computationally fast and efficient implementation of the numerical-inversion approach, suggested for evaluating the exact null distribution used for testing homogeneity of variances in several normal populations with possibly unequal sample sizes. [ABSTRACT FROM AUTHOR]
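A minimal sketch of the numerical-inversion machinery this abstract describes, via the Gil-Pelaez formula; the chi-square characteristic function below is an illustrative stand-in for the paper's (more involved) exact characteristic function of the corrected Bartlett statistic:

```python
# Gil-Pelaez inversion: F(x) = 1/2 - (1/pi) * int_0^inf Im(e^{-itx} cf(t)) / t dt.
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi2

def chisq_cf(t, df):
    """Characteristic function of a chi-square(df) random variable."""
    return (1 - 2j * t) ** (-df / 2)

def cdf_gil_pelaez(x, cf, **cf_args):
    integrand = lambda t: np.imag(np.exp(-1j * t * x) * cf(t, **cf_args)) / t
    integral, _ = quad(integrand, 0, np.inf, limit=200)
    return 0.5 - integral / np.pi

print(cdf_gil_pelaez(4.0, chisq_cf, df=4))   # numerical inversion
print(chi2.cdf(4.0, df=4))                   # exact CDF for comparison, ~0.594
```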
GAUSSIAN distribution, MONTE Carlo method, MATHEMATICAL statistics, FISHER information, BAYES' estimation, PARETO distribution
Abstract
In this paper, we consider the problem of making statistical inference for a truncated normal distribution under progressive type-I interval censoring. We obtain maximum likelihood estimators of the unknown parameters using the expectation-maximization algorithm and, in the sequel, also compute the corresponding midpoint estimates of the parameters. Estimation based on the probability plot method is also considered. Asymptotic confidence intervals of the unknown parameters are constructed based on the observed Fisher information matrix. We obtain Bayes estimators of the parameters with respect to informative and non-informative prior distributions under squared error and LINEX loss functions, computed using an importance sampling procedure. The highest posterior density intervals of the unknown parameters are constructed as well. We present a Monte Carlo simulation study to compare the performance of the proposed point and interval estimators. Analysis of a real data set is also performed for illustration purposes. Finally, inspection times and optimal censoring plans based on the expected Fisher information matrix are discussed. [ABSTRACT FROM AUTHOR]
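A simplified sketch of the interval-censored likelihood at the heart of this setting: a normal distribution truncated below at 0, with multinomial-type interval counts maximized directly rather than by the authors' EM algorithm, and with progressive removals ignored; the inspection times and counts are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0, np.inf])   # inspection times (hypothetical)
counts = np.array([11, 18, 25, 16, 10])           # failures per interval (hypothetical)

def trunc_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) truncated to (0, inf)."""
    a = norm.cdf(-mu / sigma)                     # mass cut off below 0
    return (norm.cdf((x - mu) / sigma) - a) / (1 - a)

def neg_loglik(par):
    mu, sigma = par[0], np.exp(par[1])            # log-parameterize sigma > 0
    p = np.diff([trunc_cdf(v, mu, sigma) for v in t])
    return -np.sum(counts * np.log(np.clip(p, 1e-300, None)))

fit = minimize(neg_loglik, x0=[1.0, 0.0], method="Nelder-Mead")
print(fit.x[0], np.exp(fit.x[1]))                 # MLEs of mu and sigma
```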
Mixture experiments usually involve various constraints on the proportions of the ingredients of the mixture under study. In this paper, inspired by the fact that the available stock of certain ingredients is often limited, we focus on a new type of constraint, which we refer to as an ingredient availability constraint. This type of constraint substantially complicates the search for optimal designs for mixture experiments. One difficulty, for instance, is that the optimal number of experimental runs is not known a priori. To deal with this complication, we propose a variable neighborhood search algorithm to find I-optimal designs for mixture experiments in case there is a limited stock of certain ingredients. [ABSTRACT FROM AUTHOR]
ANALYSIS of means, FACTORIAL experiment designs, MATHEMATICAL statistics, ANALYSIS of variance, EXPERIMENTAL design
Abstract
Multi-way (multifactor) models with significant interaction can be analyzed using simple effect comparisons. These F-tests are multiple comparisons, which are referred to as slice tests (e.g., in a two-factor study one slices by factor A by comparing the levels of factor B for each level of A). Slicing uses the full model degrees of freedom and mean squared error (MSE). This paper shows how to use analysis of means (ANOM) methods analogous to ANOVA F-test slicing to perform multiple comparisons. This approach results in a set of powerful decision charts that can be used to assess both statistical and practical significance. [ABSTRACT FROM AUTHOR]
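A rough sketch of ANOM-style decision limits for one slice, under stated assumptions: the usual ANOM limit form, grand mean ± h·s·√((k−1)/(kn)), with a Bonferroni-adjusted t quantile standing in for the exact ANOM critical value h, and simulated data:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(13)
k, n = 4, 10                                      # k levels of B within one level of A
data = rng.normal(10, 1, size=(k, n))
data[2] += 1.2                                    # one shifted level

means = data.mean(axis=1)
grand = data.mean()
df = k * (n - 1)
s = np.sqrt(data.var(axis=1, ddof=1).mean())      # root mean squared error
h = t_dist.ppf(1 - 0.05 / (2 * k), df)            # Bonferroni stand-in for the ANOM h
half = h * s * np.sqrt((k - 1) / (k * n))
print(means.round(2))                             # level means
print(round(grand - half, 2), round(grand + half, 2))  # decision limits flag level 2
```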
In many applications, the quality of products or services tends to be measured by multiple categorical characteristics, each of which is classified into attribute levels such as good, marginal, and bad. Here there is usually natural order among these attribute levels. However, traditional monitoring techniques ignore such order among them. By assuming that each ordinal categorical quality characteristic is determined by a latent continuous variable, this paper incorporates the ordinal information into an extended log-linear model and proposes a multivariate ordinal categorical control chart based on a generalized likelihood-ratio test. The proposed chart is efficient in detecting location shifts and dependence shifts in the corresponding latent continuous variables of ordinal categorical characteristics based on merely the attribute-level counts of the ordinal characteristics. [ABSTRACT FROM AUTHOR]
NEURAL circuitry, TIME series analysis, MULTILAYER perceptrons, SIMULATION methods & models, MATHEMATICAL statistics
Abstract
This paper presents the use of immune-based neural networks, including the multilayer perceptron (MLP) and a functional neural network, for the prediction of financial time series signals. Extensive simulations for one- and five-step-ahead prediction of stationary and non-stationary time series were performed, indicating that in most cases immune-based neural networks demonstrated advantages in capturing chaotic movement in the financial signals, with an improvement in profit return and more rapid convergence than MLPs. [ABSTRACT FROM AUTHOR]
Arendarczyk, Marek, Kozubowski, Tomasz J., and Panorska, Anna K.
Subjects
MATHEMATICAL statistics, STATISTICAL models, LAW of large numbers, MATHEMATICAL models, REGRESSION analysis
Abstract
We provide tools for the identification and exploration of data with very large variability and power law tails. Such data describe extreme features of processes such as fire losses, floods, droughts, financial gains/losses, hurricanes, and the populations of cities, among others. Prediction and quantification of extreme events are at the forefront of current research needs, as these events have the strongest impact on our lives, safety, economics, and the environment. We concentrate on an intuitive, rather than rigorous, mathematical treatment of models with heavy tails. Our goal is to introduce instructors to these important models and provide some tools for their identification and exploration. The methods we provide may be incorporated into courses such as probability, mathematical statistics, statistical modeling, or regression methods. Our examples come from ecology and census data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
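A minimal sketch of two standard diagnostics for the power-law tails this article introduces, a log-log survival plot and the Hill estimator of the tail index, on simulated Pareto data (the sample and the choice of k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.pareto(2.0, size=2000) + 1.0         # Pareto sample, tail index alpha = 2

# Log-log empirical survival: near-linearity with slope -alpha suggests
# a power-law tail (plot log(xs) against log(surv) to inspect).
xs = np.sort(x)
surv = 1.0 - np.arange(1, len(xs) + 1) / len(xs)

def hill(x, k):
    """Hill estimator of alpha from the k largest observations."""
    xs = np.sort(x)[::-1]                    # descending order statistics
    return k / np.sum(np.log(xs[:k] / xs[k]))

print(hill(x, k=200))                        # should be close to 2
```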
In this paper, a new compound continuous distribution, the Gompertz Fréchet distribution, which extends the Fréchet distribution, was developed. Its various statistical properties were derived, and estimation of the model parameters was considered using the maximum likelihood method. An application of the Gompertz Fréchet distribution was provided using real-life data sets, and its performance was compared with the Gompertz Weibull, Gompertz Lomax and Gompertz Burr XII distributions. [ABSTRACT FROM AUTHOR]
CONTINGENCY tables, DISTRIBUTION (Probability theory), MATHEMATICAL statistics, DATA analysis, ALGORITHMS
Abstract
In an informal way, some dilemmas in connection with hypothesis testing in contingency tables are discussed. The body of the article concerns the numerical evaluation of Cochran's Rule about the minimum expected value in r × c contingency tables with fixed margins when testing independence with Pearson's X2 statistic using the χ2 distribution. [ABSTRACT FROM AUTHOR]
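A small sketch of the quantities involved: expected counts under independence with fixed margins, a common textbook form of Cochran's rule, and Pearson's X² from SciPy; the example table is illustrative:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 5, 3],
                  [18, 9, 4]])               # illustrative r x c counts
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n   # E_ij = r_i c_j / n

# One textbook reading of Cochran's rule: no expected count below 1
# and at most 20% of cells below 5.
ok = expected.min() >= 1 and np.mean(expected < 5) <= 0.20
print(expected.round(2))
print("Cochran's rule satisfied:", ok)

X2, p, dof, _ = chi2_contingency(table, correction=False)       # Pearson's X^2
print(X2, p, dof)
```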
Kaplan and Meier's 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size and calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times. [ABSTRACT FROM AUTHOR]
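A minimal sketch of the Kaplan-Meier estimator itself on a tiny right-censored dataset, which makes the support question concrete: the estimator steps only at uncensored failure times. The data are made up:

```python
import numpy as np

times = np.array([3.0, 5.0, 5.0, 8.0, 10.0, 12.0])
event = np.array([1, 1, 0, 1, 0, 1])         # 1 = observed failure, 0 = censored

s, surv = 1.0, {}
for tt in np.unique(times[event == 1]):      # steps only at failure times
    at_risk = np.sum(times >= tt)            # n_i: subjects still at risk
    deaths = np.sum((times == tt) & (event == 1))   # d_i: failures at tt
    s *= 1.0 - deaths / at_risk
    surv[tt] = s

print(surv)   # support of the estimator = the distinct uncensored times
```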
In this paper, we introduce the notion of sBCI/sBCK/eBCI/eBCK-algebras as a generalization of the notion of BCI/BCK-algebras. These structures are studied in detail. We also introduce a way to construct an eBCK-algebra from a BCK-algebra and vice versa. [ABSTRACT FROM AUTHOR]
This paper presents nonparametric estimation of the first and second infinitesimal moments of an underlying jump-diffusion model using asymmetric kernel functions. In particular, we use asymmetric kernel estimators characterized by the gamma distribution. This approach reconciles the use of asymmetric kernels with jump-diffusion models. We show that the proposed estimators are consistent and asymptotically normally distributed under conditions of recurrence and stationarity. [ABSTRACT FROM AUTHOR]
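A rough sketch of a gamma-kernel estimator of the first infinitesimal moment (the drift), using Chen-style gamma kernels in a Nadaraya-Watson ratio; the simulated mean-reverting path, the bandwidth b, and the kernel parameterization are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(7)
dt, n = 0.01, 20000
x = np.empty(n); x[0] = 1.0
for i in range(n - 1):   # Euler scheme for dX = (1 - X) dt + 0.3 sqrt(X) dW
    x[i + 1] = abs(x[i] + (1 - x[i]) * dt
                   + 0.3 * np.sqrt(x[i] * dt) * rng.standard_normal())

def drift_hat(pt, b=0.1):
    """Nadaraya-Watson drift estimate at pt with a gamma kernel."""
    w = gamma.pdf(x[:-1], a=pt / b + 1, scale=b)   # Chen-style gamma kernel weights
    return np.sum(w * (x[1:] - x[:-1]) / dt) / np.sum(w)

print(drift_hat(0.8))   # true drift at 0.8 is 0.2; estimates are noisy
print(drift_hat(1.2))   # true drift at 1.2 is -0.2
```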
I believe that it is most important to seek the posterior distribution of the quantity of interest. Suppose this quantity is a linear functional with domain contained in the space of all real-valued functions. The general result based on it has a lot of flexibility, since it can be any linear functional. This equivalence is used to obtain AMARI confidence intervals for the quantity through constructing confidence intervals on the linear functional itself. [Extracted from the article]
Wood, Simon N., Pya, Natalya, and Säfken, Benjamin
Subjects
BOUNDARY element methods, NUMERICAL analysis, STATISTICS, MATHEMATICAL statistics, REGRESSION analysis, ANALYSIS of covariance
Abstract
The article focuses on the study of the boundary of the smoothing parameter space in statistical analysis. It mentions several papers by different authors featuring different approaches and methods for determining smoothing parameters on the edge of the feasible parameter space. It also describes the researchers' proposed fixes, which offer substantial improvement in handling the phase transition from unpenalized to smooth estimates as the smoothing penalty becomes nonzero.
We establish a general framework for statistical inferences with nonprobability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigorous procedure for estimating the propensity scores for units in the nonprobability sample, and construct doubly robust estimators for the finite population mean. Variance estimation is discussed under the proposed framework. Results from simulation studies show the robustness and the efficiency of our proposed estimators as compared to existing methods. The proposed method is used to analyze a nonprobability survey sample collected by the Pew Research Center with auxiliary information from the Behavioral Risk Factor Surveillance System and the Current Population Survey. Our results illustrate a general approach to inference with nonprobability samples and highlight the importance and usefulness of auxiliary information from probability survey samples. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
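A simplified sketch of a doubly robust estimator combining a nonprobability sample with a reference probability sample; the pooled logistic fit below is a crude stand-in for the paper's rigorous propensity-score procedure, and all data, sample sizes, and design weights are simulated assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
N = 100000
x = rng.normal(size=N)
y = 1 + 2 * x + rng.normal(size=N)               # population mean of y is ~1

# Nonprobability sample A: unknown selection depending on x.
A = rng.random(N) < 1 / (1 + np.exp(-(-3 + 1.5 * x)))
# Reference probability sample B: SRS with known design weights d = N / n_B.
B = rng.choice(N, size=1000, replace=False)
d = np.full(1000, N / 1000)

# Outcome model m(x) from A; propensity from a pooled logistic fit
# (a simplification of the paper's estimation procedure).
m = LinearRegression().fit(x[A, None], y[A])
z = np.r_[x[A], x[B]][:, None]
lab = np.r_[np.ones(A.sum()), np.zeros(len(B))]
pi = LogisticRegression().fit(z, lab).predict_proba(x[A, None])[:, 1]

mu_dr = (np.sum((y[A] - m.predict(x[A, None])) / pi)
         + np.sum(d * m.predict(x[B, None]))) / N
print(mu_dr)                                     # doubly robust mean estimate, ~1
```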
Barbu, Vlad Stefan, Karagrigoriou, Alex, and Makrides, Andreas
Subjects
MATHEMATICAL statistics, GEOMETRIC distribution, RANDOM variables, CONTINUOUS distributions, MAXIMUM likelihood statistics
Abstract
In this article we are interested in a general class of distributions for independent, but not necessarily identically distributed, random variables that is closed under minima and includes a number of discrete and continuous distributions such as the Geometric, Exponential, Weibull and Pareto. The main parameter involved in this class of distributions is assumed to be time-varying, with several possible modeling options. This is of particular interest in reliability and survival analysis for describing the time to event or failure. Maximum likelihood estimation of the parameters is addressed and the asymptotic properties of the estimators are discussed. We provide real and simulated examples, and we explore the accuracy of the estimation procedure as well as the performance of classical model selection criteria in choosing the correct model among a number of competing models for the time-varying parameters of interest. [ABSTRACT FROM AUTHOR]
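A toy sketch of maximum likelihood with a time-varying parameter for one member of such a min-closed class, assuming exponential lifetimes with a log-linear rate lambda(t) = exp(b0 + b1*t); the model form and data are purely illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(12)
t = rng.uniform(0, 1, 300)                    # observed covariate (e.g., time)
lam = np.exp(0.5 + 1.0 * t)                   # true time-varying rate
x = rng.exponential(1 / lam)

def neg_loglik(b):
    lam = np.exp(b[0] + b[1] * t)
    return -np.sum(np.log(lam) - lam * x)     # exponential log-likelihood

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
print(fit.x)                                  # close to (0.5, 1.0)
```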
The case of size-biased sampling of known order from a finite population without replacement is considered. The behavior of such a sampling scheme is studied with respect to the sampling fraction. Based on a simulation study, it is concluded that such a sample cannot be treated either as a random sample from the parent distribution or as a random sample from the corresponding r-size weighted distribution; as the sampling fraction increases, the bias in the sample decreases, resulting in a transition from an r-size-biased sample to a random sample. A modified version of a likelihood-free method is adopted for making statistical inference about the unknown population parameters, as well as the size of the population when it is unknown. A simulation study that takes the sampling fraction into consideration demonstrates that the proposed method exhibits better and more robust behavior than approaches that treat the r-size-biased sample either as a random sample from the parent distribution or as a random sample from the corresponding r-size weighted distribution. Finally, a numerical example that motivated this study illustrates our results. [ABSTRACT FROM AUTHOR]
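A minimal sketch of a likelihood-free (ABC rejection) approach to a size-biased sample of known order drawn without replacement, as a generic stand-in for the modified method the abstract proposes; the lognormal population, summary statistics, and tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def size_biased_sample(pop, n):
    """Successive sampling without replacement, P(pick i) proportional to size."""
    pop = pop.copy(); out = []
    for _ in range(n):
        i = rng.choice(len(pop), p=pop / pop.sum())
        out.append(pop[i]); pop = np.delete(pop, i)
    return np.array(out)

# "Observed" size-biased sample from a lognormal(1, 0.5) population.
obs = size_biased_sample(rng.lognormal(1.0, 0.5, size=500), n=50)
s_obs = np.array([np.log(obs).mean(), np.log(obs).std()])

accepted = []
for _ in range(1000):
    mu, sig = rng.uniform(0, 2), rng.uniform(0.1, 1.0)   # prior draws
    sim = size_biased_sample(rng.lognormal(mu, sig, size=500), n=50)
    s = np.array([np.log(sim).mean(), np.log(sim).std()])
    if np.linalg.norm(s - s_obs) < 0.1:                  # ABC tolerance
        accepted.append((mu, sig))

print(len(accepted), np.mean(accepted, axis=0))          # approximate posterior draws
```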
GIBBS sampling, POLYTOPES, STATISTICS, VECTOR autoregression model, MATHEMATICAL statistics, CONFIDENCE regions (Mathematics), RANDOM sets
Abstract
In particular, the mapping is the inverse of the data-generating Equation (1). The elegantly constructed quantity in Proposition 3.2 has components proportional to the inverse of the exponentiated directed graph path "value", minimized over ratios of components. The exposition of the development of the ideas is a series of geometric arguments that directly relate ratios of components of the one quantity to ratios of the values of components of the other. [Extracted from the article]
We designed a sequence of courses for the DataCamp online learning platform that approximates the content of a typical introductory statistics course. We discuss the design and implementation of these courses and illustrate how they can be successfully integrated into a brick-and-mortar class. We reflect on the process of creating content for online consumers, ruminate on the pedagogical considerations we faced, and describe an R package for statistical inference that became a by-product of this development process. We discuss the pros and cons of creating the course sequence and express our view that some aspects were particularly problematic. The issues raised should be relevant to nearly all statistics instructors. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Networks are often characterized by node heterogeneity for which nodes exhibit different degrees of interaction and link homophily for which nodes sharing common features tend to associate with each other. In this article, we rigorously study a directed network model that captures the former via node-specific parameterization and the latter by incorporating covariates. In particular, this model quantifies the extent of heterogeneity in terms of outgoingness and incomingness of each node by different parameters, thus allowing the number of heterogeneity parameters to be twice the number of nodes. We study the maximum likelihood estimation of the model and establish the uniform consistency and asymptotic normality of the resulting estimators. Numerical studies demonstrate our theoretical findings and two data analyses confirm the usefulness of our model. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Sibley, Alexander B., Li, Zhiguo, Jiang, Yu, Li, Yi-Ju, Chan, Cliburn, Allen, Andrew, and Owzar, Kouros
Subjects
GENOMICS, ALGEBRA software, LIKELIHOOD ratio tests, ALGORITHMS, COMPUTATIONAL complexity
Abstract
The score statistic continues to be a fundamental tool for statistical inference. In the analysis of data from high-throughput genomic assays, inference on the basis of the score usually enjoys greater stability, considerably higher computational efficiency, and lends itself more readily to the use of resampling methods than the asymptotically equivalent Wald or likelihood ratio tests. The score function often depends on a set of unknown nuisance parameters that have to be replaced by estimators; the resulting statistic can be improved by calculating the efficient score, which accounts for the variability induced by estimating these parameters. Manual derivation of the efficient score is tedious and error-prone, so we illustrate the use of computer algebra to facilitate this derivation. We demonstrate this process within the context of a standard example from genetic association analyses, though the techniques shown here could be applied to any derivation, and have a place in the toolbox of any modern statistician. We further show how the resulting symbolic expressions can be readily ported to compiled languages, to develop fast numerical algorithms for high-throughput genomic analysis. We conclude by considering extensions of this approach. The code featured in this report is available online as part of the supplementary material. [ABSTRACT FROM AUTHOR]
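A minimal sketch of the kind of derivation this report describes, done in SymPy rather than the computer algebra tools used in the paper: the efficient score for the shape parameter of a Gamma(alpha, beta) model with beta as nuisance, chosen because its log-likelihood Hessian happens to be nonrandom, so the Fisher information is just the negative Hessian:

```python
import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta', positive=True)
loglik = (alpha * sp.log(beta) - sp.log(sp.gamma(alpha))
          + (alpha - 1) * sp.log(x) - beta * x)

s_a = sp.diff(loglik, alpha)                 # score for the parameter of interest
s_b = sp.diff(loglik, beta)                  # nuisance score
I_ab = -sp.diff(loglik, alpha, beta)         # cross information (Hessian is nonrandom)
I_bb = -sp.diff(loglik, beta, beta)          # nuisance information

s_eff = sp.simplify(s_a - (I_ab / I_bb) * s_b)   # project out the nuisance direction
print(s_eff)
# sympy.lambdify or codegen can then port the expression to fast
# compiled numerical code, as the report advocates.
```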
Duchi, John C., Jordan, Michael I., and Wainwright, Martin J.
Subjects
ESTIMATION theory, STATISTICS, MATHEMATICAL statistics, INFERENTIAL statistics, APPROXIMATION theory
Abstract
The authors offer a response to the commentaries made on their article on statistical estimation and privacy. They discuss the issues raised by the discussants including the advantages and disadvantages of local notions of privacy as compared to classical differential and approximate differential privacy, the issue of interactivity, and the importance of local differential privacy. The goal of statistical inference is also mentioned.
Fossaluza, Victor, Izbicki, Rafael, da Silva, Gustavo Miranda, and Esteves, Luís Gustavo
Subjects
HYPOTHESIS, ANALYSIS of variance, MATHEMATICAL statistics, STATISTICAL hypothesis testing, PROBABILITY theory, DATA analysis, NUMERICAL analysis
Abstract
Multiple hypothesis testing, an important quantitative tool for reporting the results of scientific inquiries, frequently leads to contradictory conclusions. For instance, in an analysis of variance (ANOVA) setting, the same dataset can lead one to reject the equality of two means, say μ1 = μ2, but at the same time not to reject the hypothesis that μ1 = μ2 = 0. These two conclusions violate the coherence principle introduced by Gabriel in 1969, and lead to results that are difficult to communicate and, many times, embarrassing for practitioners of statistical methods. Although this situation is common in the daily life of statisticians, it is usually not discussed in statistics courses. In this work, we enrich the teaching and discussion of this important topic by investigating, through a few examples, whether several standard test procedures are coherent. We also discuss the relationship between coherent tests and measures of support. Finally, we show how a Bayesian decision-theoretic framework can be used to build coherent tests. These approaches to coherence clarify when such a property is appealing in multiple testing and provide means of obtaining it. [ABSTRACT FROM PUBLISHER]
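A small numerical illustration of the incoherence described above, under simplifying assumptions: known variance, simulated group means forced to ±0.25, and chi-square reference distributions. The same statistic value is then significant against 1 degree of freedom but not against 2:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, eps, sigma = 40, 0.25, 1.0
x1 = rng.normal(0, sigma, n); x1 += eps - x1.mean()    # mean forced to +eps
x2 = rng.normal(0, sigma, n); x2 += -eps - x2.mean()   # mean forced to -eps

# Both statistics equal 2*n*eps^2/sigma^2 = 5 here, but the reference
# distributions differ: 1 df for mu1 = mu2, 2 df for mu1 = mu2 = 0.
stat_eq = n * (x1.mean() - x2.mean()) ** 2 / (2 * sigma ** 2)
stat_zero = n * (x1.mean() ** 2 + x2.mean() ** 2) / sigma ** 2

print("p(mu1=mu2)   =", chi2.sf(stat_eq, df=1))    # ~0.025 -> rejected at 5%
print("p(mu1=mu2=0) =", chi2.sf(stat_zero, df=2))  # ~0.082 -> not rejected
```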
Multiple testing problems arising in modern scientific applications can involve simultaneously testing thousands or even millions of hypotheses, with relatively few true signals. In this article, we consider the multiple testing problem where prior information is available (for instance, from an earlier study under different experimental conditions) that can allow us to test the hypotheses as a ranked list to increase the number of discoveries. Given an ordered list of n hypotheses, the aim is to select a data-dependent cutoff k and declare the first k hypotheses to be statistically significant while bounding the false discovery rate (FDR). Generalizing several existing methods, we develop a family of “accumulation tests” to choose a cutoff k that adapts to the amount of signal at the top of the ranked list. We introduce a new method in this family, the HingeExp method, which offers higher power to detect true signals compared to existing techniques. Our theoretical results prove that these methods control a modified FDR on finite samples, and characterize the power of the methods in the family. We apply the tests to simulated data, including a high-dimensional model selection problem for linear regression. We also compare accumulation tests to existing methods for multiple testing on a real data problem of identifying differential gene expression over a dosage gradient. Supplementary materials for this article are available online. [ABSTRACT FROM PUBLISHER]
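A minimal sketch of a generic accumulation test on an ordered p-value list; the ForwardStop penalty h(p) = log(1/(1-p)) is used for concreteness, with the understanding that the paper's HingeExp method corresponds to a different (hinge-type) choice of h. The simulated p-values are illustrative:

```python
import numpy as np

def accumulation_cutoff(pvals, h, alpha=0.1):
    """k-hat = max{k : (1/k) * sum_{i<=k} h(p_i) <= alpha}."""
    acc = np.cumsum(h(pvals)) / np.arange(1, len(pvals) + 1)
    below = np.nonzero(acc <= alpha)[0]
    return below[-1] + 1 if below.size else 0

forward_stop = lambda p: np.log(1.0 / (1.0 - p))       # one admissible penalty

rng = np.random.default_rng(2)
pvals = np.r_[rng.uniform(0, 1e-3, 20), rng.uniform(0, 1, 80)]  # signals ranked first
k_hat = accumulation_cutoff(pvals, forward_stop, alpha=0.1)
print(k_hat)   # hypotheses 1..k_hat are declared significant
```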
Electricity load forecasts are an integral part of many decision-making processes in the electricity market. However, most literature on electricity load forecasting concentrates on deterministic forecasts, neglecting possibly important information about uncertainty. A more complete picture of future demand can be obtained by using distributional forecasts, allowing for more efficient decision-making. A predictive density can be fully characterized by tail measures such as quantiles and expectiles. Furthermore, interest often lies in the accurate estimation of tail events rather than in the mean or median. We propose a new methodology to obtain probabilistic forecasts of electricity load that is based on functional data analysis of generalized quantile curves. The core of the methodology is dimension reduction based on functional principal components of tail curves with dependence structure. The approach has several advantages, such as flexible inclusion of explanatory variables like meteorological forecasts and no distributional assumptions. The methodology is applied to load data from a transmission system operator (TSO) and a balancing unit in Germany. Our forecast method is evaluated against other models including the TSO forecast model. It outperforms them in terms of mean absolute percentage error and mean squared error. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
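A minimal sketch of the dimension-reduction step at the core of this methodology: functional PCA of curves observed on a common grid, computed via an SVD of the centered curve matrix. The simulated two-mode curves stand in for the estimated generalized quantile curves of the load data:

```python
import numpy as np

rng = np.random.default_rng(11)
days, grid = 100, 48                           # 100 curves on a 48-point grid
t = np.linspace(0, 1, grid)
scores = rng.normal(size=(days, 2))
curves = (scores[:, :1] * np.sin(2 * np.pi * t)      # two smooth modes
          + scores[:, 1:] * np.cos(2 * np.pi * t)
          + 0.1 * rng.normal(size=(days, grid)))     # noise

mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
fpc = Vt[:2]                                   # first two functional PCs
fpc_scores = (curves - mean_curve) @ fpc.T     # low-dimensional day scores
recon = mean_curve + fpc_scores @ fpc
print(np.mean((recon - curves) ** 2))          # small reconstruction error
```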
We consider two new approaches to nonparametric estimation of the leverage effect. The first approach uses stock prices alone. The second approach uses the data on stock prices as well as a certain volatility instrument, such as the Chicago Board Options Exchange (CBOE) volatility index (VIX) or the Black–Scholes implied volatility. The theoretical justification for the instrument-based estimator relies on a certain invariance property, which can be exploited when high-frequency data are available. The price-only estimator is more robust since it is valid under weaker assumptions. However, in the presence of a valid volatility instrument, the price-only estimator is inefficient as the instrument-based estimator has a faster rate of convergence. We consider an empirical application, in which we study the relationship between the leverage effect and the debt-to-equity ratio, credit risk, and illiquidity. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
The article presents a comment on the study regarding the generalized additive model (GAM) technique. It outlines the general settings and implementation of the model along with its computational tricks and inferential products. It also commends the researchers for elevating the methods to higher levels of generality and functionality.
STATISTICAL sampling, MATHEMATICAL statistics, ERROR functions, ANALYSIS of means, ANALYSIS of variance, PROBABILITY theory, MATHEMATICAL models
Abstract
Matching estimators are commonly used to estimate causal effects in nonexperimental settings. Covariate measurement error can be problematic for matching estimators when observational treatment groups differ on latent quantities observed only through error-prone surrogates. We establish necessary and sufficient conditions for matching and weighting with functions of observed covariates to yield unconfounded causal effect estimators, generalizing results from the standard (i.e., no measurement error) case. We establish that in common covariate measurement error settings, including continuous variables with continuous measurement error, discrete variables with misclassification, and factor and item response theory models, no single function of the observed covariates computed for all units in a study is appropriate for matching. However, we demonstrate that in some circumstances, it is possible to create different functions of the observed covariates for treatment and control units to construct a variable appropriate for matching. We also demonstrate the counterintuitive result that in some settings, it is possible to selectively contaminate the covariates with additional measurement error to construct a variable appropriate for matching. We discuss the implications of our results for the choice between matching and weighting estimators with error-prone covariates. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
ANALYSIS of covariance, ESTIMATION theory, PROBABILITY theory, MATHEMATICAL statistics, DATA analysis
Abstract
Weighting methods that adjust for observed covariates, such as inverse probability weighting, are widely used for causal inference and estimation with incomplete outcome data. Part of the appeal of such methods is that one set of weights can be used to estimate a range of treatment effects based on different outcomes, or a variety of population means for several variables. However, this appeal can be diminished in practice by the instability of the estimated weights and by the difficulty of adequately adjusting for observed covariates in some settings. To address these limitations, this article presents a new weighting method that finds the weights of minimum variance that adjust or balance the empirical distribution of the observed covariates up to levels prespecified by the researcher. This method allows the researcher to balance very precisely the means of the observed covariates and other features of their marginal and joint distributions, such as variances and correlations and also, for example, the quantiles of interactions of pairs and triples of observed covariates, thus balancing entire two- and three-way marginals. Since the weighting method is based on a well-defined convex optimization problem, duality theory provides insight into the behavior of the variance of the optimal weights in relation to the level of covariate balance adjustment, answering the question: how much does tightening a balance constraint increase the variance of the weights? Also, the weighting method runs in polynomial time, so relatively large datasets can be handled quickly. An implementation of the method is provided in the new package sbw for R. This article shows some theoretical properties of the resulting weights and illustrates their use by analyzing both a dataset from the 2010 Chilean earthquake and a simulated example. [ABSTRACT FROM PUBLISHER]
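A minimal sketch of the convex program described here, minimizing the sum of squared weights subject to mean-balance constraints, solved with SciPy's SLSQP as a stand-in for the paper's sbw package for R; the data and the tolerance delta are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 3))
target = np.zeros(3)                           # covariate means to balance toward
delta = 0.02                                   # balance tolerance

cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}]
for j in range(X.shape[1]):                    # |w'X_j - target_j| <= delta
    cons.append({'type': 'ineq', 'fun': lambda w, j=j: delta - (w @ X[:, j] - target[j])})
    cons.append({'type': 'ineq', 'fun': lambda w, j=j: delta + (w @ X[:, j] - target[j])})

res = minimize(lambda w: (w ** 2).sum(),       # minimum-variance criterion
               x0=np.full(n, 1.0 / n),
               bounds=[(0, None)] * n,
               constraints=cons, method='SLSQP')
w = res.x
print(w.var(), np.abs(w @ X - target).max())   # stable weights, tight balance
```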
SOCIAL networks, MATHEMATICAL statistics, ACQUISITION of data, CLASSICAL statistics, BAYESIAN analysis, MATHEMATICAL models
Abstract
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to weigh the benefits and costs of information acquisition. Conflicts of interest and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data-collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game-theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether, with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long-run trend of aggregate inference quality as the population grows. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Meta-analysis is a valuable tool for combining information from independent studies. However, most common meta-analysis techniques rely on distributional assumptions that are difficult, if not impossible, to verify. For instance, in the commonly used fixed-effects and random-effects models, we take for granted that the underlying study-level parameters are either exactly the same across individual studies or that they are realizations of a random sample from a population, often under a parametric distributional assumption. In this article, we present a new framework for summarizing information obtained from multiple studies and make inference that is not dependent on any distributional assumption for the study-level parameters. Specifically, we assume the study-level parameters are unknown, fixed parameters and draw inferences about, for example, the quantiles of this set of parameters using study-specific summary statistics. This type of problem is known to be quite challenging (see Hall and Miller). We use a novel resampling method via the confidence distributions of the study-level parameters to construct confidence intervals for the above quantiles. We justify the validity of the interval estimation procedure asymptotically and compare the new procedure with the standard bootstrapping method. We also illustrate our proposal with the data from a recent meta-analysis of the treatment effect from an antioxidant on the prevention of contrast-induced nephropathy. [ABSTRACT FROM AUTHOR]
Abadie, Alberto, Imbens, Guido W., and Zheng, Fanyin
Subjects
STATISTICAL bootstrapping, MATHEMATICAL statistics, CONFIDENCE intervals, DISTRIBUTION (Probability theory), APPROXIMATION theory, ANALYSIS of covariance
Abstract
Following the work by Eicker, Huber, and White, it is common in empirical work to report standard errors that are robust against general misspecification. In a regression setting, these standard errors are valid for the parameter that minimizes the squared difference between the conditional expectation and a linear approximation, averaged over the population distribution of the covariates. Here, we discuss an alternative parameter that corresponds to the approximation to the conditional expectation based on minimization of the squared difference averaged over the sample, rather than the population, distribution of the covariates. We argue that in some cases this may be a more interesting parameter. We derive the asymptotic variance for this parameter, which is generally smaller than the Eicker–Huber–White robust variance, and propose a consistent estimator for this asymptotic variance. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
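A minimal NumPy sketch of the Eicker-Huber-White (HC0) sandwich variance that this article takes as its point of departure, applied to an intentionally misspecified linear fit of a nonlinear conditional mean; the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
y = np.sin(x) + rng.normal(scale=0.5, size=n)   # nonlinear truth
X = np.column_stack([np.ones(n), x])            # linear approximation

beta = np.linalg.solve(X.T @ X, X.T @ y)        # OLS fit
e = y - X @ beta
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * e[:, None] ** 2)
V_hc0 = bread @ meat @ bread                    # EHW sandwich variance
print(beta)
print(np.sqrt(np.diag(V_hc0)))                  # robust standard errors
```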
PRESSURE groups, DISTRIBUTION (Probability theory), MATHEMATICAL statistics, STATISTICS, SOCIAL services, PUBLIC health
Abstract
The geographic distribution of nonprofit antipoverty organizations has important implications for economic development, social services, public health, and policy efforts. With counts of antipoverty nonprofits at the census tract level in Greater Hartford, Connecticut, we examine whether these organizations are located in areas with high levels of poverty with a spatial zero-inflated-Poisson model. Covariates that measure need, resources, urban structure, and demographic characteristics are incorporated into both the zero-inflation component and the Poisson component of the model. Variation not explained by the covariates is captured by the combination of a spatial random effect and an unstructured random effect. Statistical inferences are done within the Bayesian framework. Model comparison with the conditional predictive ordinate suggests that the random effects and the zero-inflation are both important components in fitting the data. All three need measures—proportion of people below the poverty line, unemployment rate, and rental occupancy—are found to have significantly positive effect on the mean of the count, providing evidence that antipoverty nonprofits tend to locate where they are needed. The dataset and R/OpenBUGS code are available in supplementary materials online. [ABSTRACT FROM AUTHOR]
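A minimal sketch of the zero-inflated Poisson likelihood at the core of such a model, stripped of covariates and of the spatial and unstructured random effects, and fit by direct maximization rather than the article's Bayesian/OpenBUGS machinery; the data are simulated:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(6)
lam_true, pi_true, n = 2.0, 0.3, 1000
y = rng.poisson(lam_true, n)
y[rng.random(n) < pi_true] = 0                  # inject structural zeros

def neg_loglik(par):
    pi, lam = expit(par[0]), np.exp(par[1])     # map to (0,1) and (0,inf)
    log_p0 = np.log(pi + (1 - pi) * np.exp(-lam))          # P(Y = 0)
    log_pk = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, log_p0, log_pk))

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
print(expit(fit.x[0]), np.exp(fit.x[1]))        # close to 0.3 and 2.0
```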
FEATURE selection, MAXIMUM likelihood statistics, GENERALIZED spaces, LINEAR statistical models, MATHEMATICAL statistics, SPARSE matrices
Abstract
Feature selection is fundamental for modeling high-dimensional data, where the number of features can be huge and much larger than the sample size. Since the feature space is so large, many traditional procedures become numerically infeasible. It is hence essential to first remove most apparently noninfluential features before any elaborate analysis. Recently, several procedures have been developed for this purpose, including sure independence screening (SIS) as a widely used technique. To gain computational efficiency, SIS screens features based on their individual predictive power. In this article, we propose a new screening method via the sparsity-restricted maximum likelihood estimator (SMLE). The new method naturally takes the joint effects of features into account in the screening process, which gives it an edge over existing methods in potential performance. This conjecture is further supported by simulation studies under a number of modeling settings. We show that the proposed method is screening consistent in the context of ultrahigh-dimensional generalized linear models. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
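A minimal sketch of sparsity-restricted estimation in the spirit of SMLE, for the linear-model special case: iterative hard thresholding keeps the k largest coefficients after each gradient step, so the retained features reflect joint rather than purely marginal effects. The dimensions and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, k = 200, 2000, 20                        # p >> n; retain k features
X = rng.normal(size=(n, p)) / np.sqrt(n)       # columns have roughly unit norm
beta_true = np.zeros(p); beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta = np.zeros(p)
for _ in range(100):
    b = beta + X.T @ (y - X @ beta)            # gradient step (step size 1)
    keep = np.argsort(np.abs(b))[-k:]          # hard-threshold to the top k
    beta = np.zeros(p); beta[keep] = b[keep]

screened = np.nonzero(beta)[0]
print(set(range(5)) <= set(screened))          # True: real signals retained
```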
Jones-Farmer, L. Allison, Woodall, William H., Steiner, Stefan H., and Champ, Charles W.
Subjects
STATISTICAL process control, DATA analysis, CHANGE-point problems, MATHEMATICAL statistics, BEST practices
Abstract
We provide an overview and perspective on the Phase I collection and analysis of data for use in process improvement and control charting. In Phase I, the focus is on understanding the process variability, assessing the stability of the process, investigating process-improvement ideas, selecting an appropriate in-control model, and providing estimates of the in-control model parameters. In our article, we review and synthesize many of the important developments that pertain to the analysis of process data in Phase I. We give our view of the major issues and developments in Phase I analysis. We identify the current best practices and some opportunities for future research in this area. [ABSTRACT FROM AUTHOR]
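A minimal sketch of one Phase I step this overview covers: retrospective X-bar limits from subgroup data using a pooled standard deviation. The plain 3-sigma form below omits the unbiasing constants and the more careful Phase I adjustments the article surveys, and the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(10)
m, n = 25, 5                                   # 25 subgroups of size 5
data = rng.normal(10, 1, size=(m, n))
data[7] += 2.5                                 # one unstable subgroup

xbar = data.mean(axis=1)
s_pooled = np.sqrt(data.var(axis=1, ddof=1).mean())
center = xbar.mean()
ucl = center + 3 * s_pooled / np.sqrt(n)
lcl = center - 3 * s_pooled / np.sqrt(n)

flagged = np.nonzero((xbar > ucl) | (xbar < lcl))[0]
print(round(float(center), 2), (round(float(lcl), 2), round(float(ucl), 2)))
print("out-of-control subgroups:", flagged)    # should flag subgroup 7
```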