778 results
Search Results
2. On the Estimation of the Mean with Random Number of Observations (Comments on a Paper by M. Singh and V. K. Gupta)
- Author
-
B. G. Lindqvist
- Subjects
Statistics and Probability ,Estimation ,Statistics ,Estimator ,General Medicine ,Statistics, Probability and Uncertainty ,Expected value ,Unit (ring theory) ,Random variable ,Mathematics ,Arithmetic mean - Abstract
In this paper we study the problem of estimating the mean when the number of observations taken on each unit is a random variable. The estimator proposed by M. Singh and V. K. Gupta (1980) is discussed and modified. It is argued, however, that the arithmetic mean is a more appropriate choice of estimator.
- Published
- 2007
- Full Text
- View/download PDF
3. Preface to the Translation of Eckart Sonnemann's 1982 Paper on General Solutions to Multiple Testing Problems
- Author
-
Helmut Finner
- Subjects
Statistics and Probability ,Multiple comparisons problem ,Calculus ,General Medicine ,Statistics, Probability and Uncertainty ,Translation (geometry) ,Mathematics - Published
- 2008
- Full Text
- View/download PDF
4. Testing for Treatment Effects on Subsets of Endpoints (The views presented in this paper are those of the authors and do not necessarily represent those of the U.S. Food and Drug Administration)
- Author
-
James J. Chen and Sue-Jane Wang
- Subjects
Statistics and Probability ,Difference set ,Univariate ,General Medicine ,Statistics ,Multiple comparisons problem ,Test statistic ,p-value ,Statistics, Probability and Uncertainty ,Null hypothesis ,Algorithm ,Statistic ,Mathematics ,Statistical hypothesis testing - Abstract
Multiple endpoints are tested to assess an overall treatment effect and also to identify which endpoints or subsets of endpoints contributed to treatment differences. The conventional p-value adjustment methods, such as single-step, step-up, or step-down procedures, sequentially identify each significant individual endpoint. Closed test procedures can also detect individual endpoints that have effects via a step-by-step closed strategy. This paper proposes a global-based statistic for testing an a priori number, say, r of the k endpoints, as opposed to the conventional approach of testing one (r = 1) endpoint. The proposed test statistic is an extension of the single-step p-value-based statistic based on the distribution of the smallest p-value. The test maintains strong control of the FamilyWise Error (FWE) rate under the null hypothesis of no difference in any (sub)set of r endpoints among all possible combinations of the k endpoints. After rejecting the null hypothesis, the individual endpoints in the sets that are rejected can be tested further, using a univariate test statistic in a second step, if desired. However, the second step test only weakly controls the FWE. The proposed method is illustrated by application to a psychosis data set.
- Published
- 2002
- Full Text
- View/download PDF
5. A Bootstrap Procedure for Adaptive Selection of the Test Statistic in Flexible Two-Stage Designs (This paper is based on a presentation given at the Workshop 'Frontiers in Adaptive Designs', 17–18 May 2001, Vienna, Austria)
- Author
-
Brit Schneider, Tim Friede, and Meinhard Kieser
- Subjects
Statistics and Probability ,Exact test ,Wilcoxon signed-rank test ,Sample size determination ,Likelihood-ratio test ,Ancillary statistic ,Statistics ,Test statistic ,Chi-square test ,General Medicine ,Statistics, Probability and Uncertainty ,Statistic ,Mathematics - Abstract
Adaptive two-stage designs allow a data-driven change of design characteristics during the ongoing trial. One of the available options is an adaptive choice of the test statistic for the second stage of the trial based on the results of the interim analysis. Since there is often only a vague knowledge of the distribution shape of the primary endpoint in the planning phase of a study, a change of the test statistic may then be considered if the data indicate that the assumptions underlying the initial choice of the test are not correct. Collings and Hamilton proposed a bootstrap method for the estimation of the power of the two-sample Wilcoxon test for shift alternatives. We use this approach for the selection of the test statistic. By means of a simulation study, we show that the gain in terms of power may be considerable when the initial assumption about the underlying distribution was wrong, whereas the loss is relatively small when in the first instance the optimal test statistic was chosen. The results also hold true for comparison with a one-stage design. Application of the method is illustrated by a clinical trial example.
- Published
- 2002
- Full Text
- View/download PDF
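The bootstrap power estimation referred to above (attributed to Collings and Hamilton) can be sketched as follows. This is a minimal illustration of the idea, not the paper's adaptive two-stage procedure; the pilot data, sample sizes, and shift are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bootstrap_wilcoxon_power(pilot, n1, n2, shift, alpha=0.05, n_boot=2000, seed=None):
    """Estimate the power of the two-sample Wilcoxon (Mann-Whitney) test for a
    location shift by resampling with replacement from pooled pilot observations."""
    rng = np.random.default_rng(seed)
    pilot = np.asarray(pilot, dtype=float)
    rejections = 0
    for _ in range(n_boot):
        x = rng.choice(pilot, size=n1, replace=True)           # control arm
        y = rng.choice(pilot, size=n2, replace=True) + shift   # treatment arm under the alternative
        p = mannwhitneyu(x, y, alternative="two-sided").pvalue
        rejections += p < alpha
    return rejections / n_boot

# Hypothetical skewed pilot data; estimated power for a shift of 1.0 with 30 subjects per arm
pilot_data = np.random.default_rng(1).exponential(scale=2.0, size=40)
print(bootstrap_wilcoxon_power(pilot_data, n1=30, n2=30, shift=1.0, seed=2))
```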
6. On a Distribution Associated with a Stochastic Process in Ecology (An invited paper presented at the Ninth Lukacs Symposium, Frontiers of Environmental and Ecological Statistics for the 21st Century, at Bowling Green State University, Bowling Green, OH, on April 25, 1999)
- Author
-
K. G. Janardan
- Subjects
Statistics and Probability ,education.field_of_study ,Distribution (number theory) ,Stochastic modelling ,Ecology ,Population ,Poisson binomial distribution ,General Medicine ,Poisson distribution ,Binomial distribution ,Continuous-time Markov chain ,symbols.namesake ,Compound Poisson distribution ,symbols ,Statistics, Probability and Uncertainty ,education ,Mathematics - Abstract
Poisson processes {X(t), t > 0} are suitable models for a broad variety of counting processes in ecology. For example, when analyzing data that apparently came from a Poisson population, over-dispersion [i.e. V(X(t)) > E(X(t))] or under-dispersion [i.e. V(X(t)) < E(X(t))] is encountered. This led CONSUL and JAIN (1973) and JANARDAN and SCHAEFFER (1977) to consider a generalization of the Poisson distribution called the Lagrangian Poisson distribution. JANARDAN (1980) modified the Poisson process and derived a stochastic model for the number of eggs laid by a parasite on a host. This distribution is very suitable for fitting data with over- (or under-) dispersion. JANARDAN et al. (1981) considered this stochastic model and applied it to study the variation of the distribution of chromosome aberrations in human and animal cells subject to radiation or chemical insults. Here, we present a new approach for the derivation of this distribution and provide some alternative chance mechanisms for the genesis of the distribution. Moments, moment properties, and some applications are also given.
- Published
- 2002
- Full Text
- View/download PDF
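A minimal sketch of the generalized (Lagrangian) Poisson distribution mentioned in this entry, assuming the Consul-Jain form of the pmf, P(X = x) = θ(θ + λx)^(x−1) e^−(θ+λx)/x!. It only illustrates over- and under-dispersion relative to the ordinary Poisson; it is not the derivation given in the paper.

```python
import math

def gen_poisson_pmf(x, theta, lam):
    """Assumed Consul-Jain generalized (Lagrangian) Poisson pmf."""
    rate = theta + lam * x
    if rate <= 0:              # the support is truncated when lam < 0
        return 0.0
    log_p = math.log(theta) + (x - 1) * math.log(rate) - rate - math.lgamma(x + 1)
    return math.exp(log_p)

def moments(theta, lam, upper=400):
    """Mean and variance by direct summation over the support."""
    probs = [gen_poisson_pmf(x, theta, lam) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

# lam = 0 recovers the ordinary Poisson; lam > 0 gives over-dispersion,
# lam < 0 under-dispersion (commonly cited moments: mean theta/(1-lam), var theta/(1-lam)**3)
for lam in (-0.2, 0.0, 0.2):
    m, v = moments(theta=3.0, lam=lam)
    print(f"lam={lam:+.1f}: mean={m:.3f} variance={v:.3f}")
```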
7. Applied Stochastic Geometry: A Survey (Invited Paper)
- Author
-
D. Stoyan
- Subjects
Statistics and Probability ,Mathematical optimization ,General Medicine ,Statistics, Probability and Uncertainty ,Stochastic geometry ,Mathematical economics ,Mathematics - Abstract
The paper is a survey on some practical aspects of stochastic geometry in the sense of D. G. KENDALL's and G. MATHERON's & J. SERRA's schools.
- Published
- 1979
- Full Text
- View/download PDF
8. A Remark on a Paper by Misra
- Author
-
D. G. Kabe
- Subjects
Statistics and Probability ,Exact test ,Linear regression ,Root test ,Statistics ,Multivariate normal distribution ,General Medicine ,Statistics, Probability and Uncertainty ,Confidence interval ,Student's t-test ,Statistic ,Mathematics ,t-statistic - Abstract
MISRA (1978) sets confidence intervals for a double linear compound of multivariate normal regression coefficients by using ROY'S maximum root test criterion. The exact test statistic to be used is STUDENT'S t. The t statistic gives narrower confidence bounds than those given by ROY's maximum root statistic. A result given by MORRISON (1975, p. 18, equation 10) for profile analysis is also obtained by using the STUDENT'S t test.
- Published
- 1980
- Full Text
- View/download PDF
9. Effect size measures and their benchmark values for quantifying benefit or risk of medicinal products
- Author
-
Volker W. Rahlfs and Helmuth Zimmermann
- Subjects
Statistics and Probability ,effect size measures ,Biometry ,Drug-Related Side Effects and Adverse Reactions ,Distribution (number theory) ,transformation of measures ,binary ,Value (computer science) ,Risk Assessment ,01 natural sciences ,Measure (mathematics) ,Normal distribution ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,030212 general & internal medicine ,0101 mathematics ,continuous data ,Proportional Hazards Models ,Mathematics ,General Biometry ,Stochastic Processes ,Absolute risk reduction ,Mann–Whitney measure ,ordinal ,General Medicine ,clinical relevance ,Benchmarking ,Strictly standardized mean difference ,Binary data ,Benchmark (computing) ,Statistics, Probability and Uncertainty ,Research Paper - Abstract
The standardized mean difference is a well‐known effect size measure for continuous, normally distributed data. In this paper we present a general basis for important other distribution families. As a general concept, usable for every distribution family, we introduce the relative effect, also called Mann–Whitney effect size measure of stochastic superiority. This measure is a truly robust measure, needing no assumptions about a distribution family. It is thus the preferred tool for assumption‐free, confirmatory studies. For normal distribution shift, proportional odds, and proportional hazards, we show how to derive many global values such as risk difference average, risk difference extremum, and odds ratio extremum. We demonstrate that the well‐known benchmark values of Cohen with respect to group differences—small, medium, large—can be translated easily into corresponding Mann–Whitney values. From these, we get benchmarks for parameters of other distribution families. Furthermore, it is shown that local measures based on binary data (2 × 2 tables) can be associated with the Mann–Whitney measure: The concept of stochastic superiority can always be used. It is a general statistical value in every distribution family. It therefore yields a procedure for standardizing the assessment of effect size measures. We look at the aspect of relevance of an effect size and—introducing confidence intervals—present some examples for use in statistical practice.
- Published
- 2019
- Full Text
- View/download PDF
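To illustrate the relative effect (Mann-Whitney measure of stochastic superiority) discussed above: assuming it is defined as P(X < Y) + 0.5 P(X = Y), and that under a normal shift with standardized difference d it equals Φ(d/√2), the sketch below translates Cohen's benchmarks into Mann-Whitney values and estimates the measure nonparametrically from two hypothetical samples.

```python
import numpy as np
from scipy.stats import norm, rankdata

def relative_effect(x, y):
    """Nonparametric estimate of P(X < Y) + 0.5 * P(X = Y) via mid-ranks."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    ranks = rankdata(np.concatenate([x, y]))        # mid-ranks handle ties
    mean_rank_y = ranks[len(x):].mean()
    return (mean_rank_y - (len(y) + 1) / 2) / len(x)

# Translating Cohen's benchmarks (d = 0.2, 0.5, 0.8) into Mann-Whitney values,
# assuming two normal distributions that differ only by a shift:
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: relative effect = {norm.cdf(d / np.sqrt(2)):.3f}")

# Hypothetical data example (true relative effect is about 0.64 here)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = rng.normal(0.5, 1.0, 200)
print("estimated relative effect:", round(relative_effect(x, y), 3))
```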
10. Bayesian variable selection logistic regression with paired proteomic measurements
- Author
-
Bart Mertens and Alexia Kakourou
- Subjects
Proteomics ,0301 basic medicine ,Statistics and Probability ,Inference ,Feature selection ,isotope clusters ,Bayesian inference ,Logistic regression ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,Component (UML) ,Humans ,paired measurements ,added-value assessment ,0101 mathematics ,Selection (genetic algorithm) ,mass spectrometry ,Mathematics ,General Biometry ,Bayesian variable selection ,Models, Statistical ,business.industry ,Bayes Theorem ,Pattern recognition ,prediction ,General Medicine ,Expression (mathematics) ,Pancreatic Neoplasms ,Logistic Models ,030104 developmental biology ,added‐value assessment ,Predictive power ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,Research Paper - Abstract
We explore the problem of variable selection in a case‐control setting with mass spectrometry proteomic data consisting of paired measurements. Each pair corresponds to a distinct isotope cluster and each component within pair represents a summary of isotopic expression based on either the intensity or the shape of the cluster. Our objective is to identify a collection of isotope clusters associated with the disease outcome and at the same time assess the predictive added‐value of shape beyond intensity while maintaining predictive performance. We propose a Bayesian model that exploits the paired structure of our data and utilizes prior information on the relative predictive power of each source by introducing multiple layers of selection. This allows us to make simultaneous inference on which are the most informative pairs and for which—and to what extent—shape has a complementary value in separating the two groups. We evaluate the Bayesian model on pancreatic cancer data. Results from the fitted model show that most predictive potential is achieved with a subset of just six (out of 1289) pairs while the contribution of the intensity components is much higher than the shape components. To demonstrate how the method behaves under a controlled setting we consider a simulation study. Results from this study indicate that the proposed approach can successfully select the truly predictive pairs and accurately estimate the effects of both components although, in some cases, the model tends to overestimate the inclusion probability of the second component.
- Published
- 2018
- Full Text
- View/download PDF
11. Bayesian hierarchical modelling of continuous non‐negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter
- Author
-
Mike P. Toms, Ruth King, Benjamin Thomas Swallow, Stephen T. Buckland, EPSRC, University of St Andrews. School of Mathematics and Statistics, University of St Andrews. Scottish Oceans Institute, University of St Andrews. Centre for Research into Ecological & Environmental Modelling, University of St Andrews. Marine Alliance for Science & Technology Scotland, and University of St Andrews. St Andrews Sustainability Institute
- Subjects
0106 biological sciences ,Statistics and Probability ,MCMC ,QH301 Biology ,Bayesian probability ,NDAS ,reversible jump MCMC ,010603 evolutionary biology ,01 natural sciences ,Birds ,QH301 ,Tweedie distributions ,010104 statistics & probability ,symbols.namesake ,Surveys and Questionnaires ,Tweedie distribution ,Statistics ,Econometrics ,Animals ,Bayesian hierarchical modeling ,QA Mathematics ,0101 mathematics ,QA ,R2C ,Bayesian hierarchical model ,Continuous nonnegative data ,Mathematics ,Models, Statistical ,Behavior, Animal ,Population size ,Bayes Theorem ,Regression analysis ,Markov chain Monte Carlo ,General Medicine ,Research Papers ,Excess zeros ,Markov Chains ,symbols ,Regression Analysis ,Spike (software development) ,Seasons ,Statistics, Probability and Uncertainty ,BDC ,Gardens ,Algorithms ,Research Paper ,Count data - Abstract
The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, which are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of the sparrowhawk increase.
- Published
- 2015
- Full Text
- View/download PDF
12. Median estimation of chemical constituents for sampling on two occasions under a log‐normal model
- Author
-
Athanassios Kondylis
- Subjects
Statistics and Probability ,Time Factors ,Composite median ,Regression estimator ,Population ,Monte Carlo method ,Miscellanea ,Chemical compounds ,Smoke ,Statistics ,Statistical inference ,education ,Mathematics ,Estimation ,education.field_of_study ,Models, Statistical ,Sampling (statistics) ,Regression analysis ,Tobacco Products ,General Medicine ,Sampling with partial replacement ,Population model ,Chemical constituents ,Regression Analysis ,Model‐assisted inference ,Statistics, Probability and Uncertainty ,Monte Carlo Method ,Research Paper - Abstract
Sampling from a finite population on multiple occasions introduces dependencies between the successive samples when overlap is designed. Such sampling designs lead to efficient statistical estimates, while they allow estimating changes over time for the targeted outcomes. This makes them very popular in real-world statistical practice. Sampling with partial replacement can also be very efficient in biological and environmental studies where estimation of toxicants and their trends over time is the main interest. Sampling with partial replacement is designed here on two occasions in order to estimate the median concentration of chemical constituents quantified by means of liquid chromatography coupled with tandem mass spectrometry. Such data represent relative peak areas resulting from the chromatographic analysis. They are therefore positive-valued and skewed data, and are commonly fitted very well by the log-normal model. A log-normal model is assumed here for chemical constituents quantified in mainstream cigarette smoke in a real case study. Combining design-based and model-based approaches for statistical inference, we seek the median estimate of chemical constituents by sampling with partial replacement on two time occasions. We also discuss the limitations of extending the proposed approach to other skewed population models. The latter is investigated by means of a Monte Carlo simulation study.
- Published
- 2015
- Full Text
- View/download PDF
13. One‐two dependence and probability inequalities between one‐ and two‐sided union‐intersection tests
- Author
-
Helmut Finner and Markus Roters
- Subjects
Statistics and Probability ,Inequality ,Intersection (set theory) ,media_common.quotation_subject ,Mathematical statistics ,General Medicine ,Type (model theory) ,01 natural sciences ,Empirical distribution function ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Goodness of fit ,Multiple comparisons problem ,Econometrics ,030212 general & internal medicine ,0101 mathematics ,Statistics, Probability and Uncertainty ,Random variable ,Probability ,Mathematics ,media_common - Abstract
In a paper published in 1939 in The Annals of Mathematical Statistics, Wald and Wolfowitz discussed the possible validity of a probability inequality between one- and two-sided coverage probabilities for the empirical distribution function. Twenty-eight years later, Vandewiele and Noé proved this inequality for Kolmogorov-Smirnov type goodness of fit tests. We refer to this type of inequality as one-two inequality. In this paper, we generalize their result for one- and two-sided union-intersection tests based on positively associated random variables and processes. Thereby, we give a brief review of different notions of positive association and corresponding results. Moreover, we introduce the notion of one-two dependence and discuss relationships with other dependence concepts. While positive association implies one-two dependence, the reverse implication fails. Last but not least, the Bonferroni inequality and the one-two inequality yield lower and upper bounds for two-sided acceptance/rejection probabilities which differ only slightly for significance levels not too large. We discuss several examples where the one-two inequality applies. Finally, we briefly discuss the possible impact of the validity of a one-two inequality on directional error control in multiple testing.
- Published
- 2021
- Full Text
- View/download PDF
14. Generalized estimating equations approach for spatial lattice data: A case study in adoption of improved maize varieties in Mozambique
- Author
-
Lourenço Manuel and João Domingos Scalon
- Subjects
Statistics and Probability ,Generalized linear model ,Covariance matrix ,General Medicine ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Quasi-likelihood ,Autoregressive model ,Statistics ,Covariate ,030212 general & internal medicine ,0101 mathematics ,Statistics, Probability and Uncertainty ,Generalized estimating equation ,Random variable ,Spatial analysis ,Mathematics - Abstract
Generalized estimating equations (GEE) are an extension of generalized linear models (GLM) widely applied in longitudinal data analysis. GEE are also applied in spatial data analysis using geostatistical methods. In this paper, we advocate the application of GEE to spatial lattice data by modeling the spatial working correlation matrix using Moran's index and the spatial weight matrix. We present theoretical developments and results for simulated and actual data as well. For the former case, 1,000 samples of a random variable (response variable) defined in the (0, 1) interval were generated using different values of Moran's index. In addition, 1,000 samples of a binary and a continuous variable were also randomly generated as covariates. In each sample, three structures of spatial working correlation matrices were used while modeling: the independent, autoregressive, and Toeplitz structures. Two measures were used to evaluate the performance of each of the spatial working correlation structures: the asymptotic relative efficiency and the working correlation selection criteria. Both measures indicated that the autoregressive spatial working correlation matrix proposed in this paper presents the best performance in general. For the actual data case, the proportion of small farmers who used improved maize varieties was considered as the response variable and a set of nine variables were used as covariates. Two structures of spatial working correlation matrices were used and the results were consistent with those obtained in the simulation study.
- Published
- 2020
- Full Text
- View/download PDF
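For orientation, a small sketch of Moran's index with a spatial weight matrix, which the abstract uses to build the working correlation. This is only the standard Moran's I formula on hypothetical lattice data, not the authors' GEE machinery.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for values y on a lattice with spatial weight matrix W (zero diagonal)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical 6-cell transect with rook (chain) contiguity
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])   # e.g. adoption proportions per cell
print("Moran's I:", round(morans_i(y, W), 3))   # positive: neighbouring cells are alike
```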
15. Assessment of local influence for the analysis of agreement
- Author
-
Felipe Osorio, Manuel Galea, and Carla Leal
- Subjects
Data Analysis ,Sleep Wake Disorders ,Statistics and Probability ,Biometry ,Maximum likelihood ,Monte Carlo method ,Multivariate normal distribution ,Diagnostic tools ,01 natural sciences ,Clinical study ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Humans ,030212 general & internal medicine ,0101 mathematics ,Probability ,Mathematics ,Clinical Trials as Topic ,Likelihood Functions ,Measurement method ,Models, Statistical ,Estimator ,General Medicine ,Concordance correlation coefficient ,Statistics, Probability and Uncertainty ,Monte Carlo Method - Abstract
The concordance correlation coefficient (CCC) and the probability of agreement (PA) are two frequently used measures for evaluating the degree of agreement between measurements generated by two different methods. In this paper, we consider the CCC and the PA using the bivariate normal distribution for modeling the observations obtained by two measurement methods. The main aim of this paper is to develop diagnostic tools for the detection of those observations that are influential on the maximum likelihood estimators of the CCC and the PA using the local influence methodology but not based on the likelihood displacement. Thus, we derive first- and second-order measures considering the case-weight perturbation scheme. The proposed methodology is illustrated through a Monte Carlo simulation study and using a dataset from a clinical study on transient sleep disorder. Empirical results suggest that under certain circumstances first-order local influence measures may be more powerful than second-order measures for the detection of influential observations.
- Published
- 2019
- Full Text
- View/download PDF
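A brief sketch of sample versions of the two agreement measures named above, the concordance correlation coefficient and the probability of agreement. The moment-based CCC formula and the normality assumption for the paired differences are standard choices assumed here; the data and the agreement threshold are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def ccc(x, y):
    """Sample concordance correlation coefficient (moment version, divisor n)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def prob_agreement(x, y, c):
    """P(|X - Y| < c), assuming the paired differences are normally distributed."""
    d = np.asarray(x, float) - np.asarray(y, float)
    mu, sd = d.mean(), d.std(ddof=1)
    return norm.cdf((c - mu) / sd) - norm.cdf((-c - mu) / sd)

# Hypothetical paired measurements from two methods
rng = np.random.default_rng(7)
truth = rng.normal(50, 10, 100)
x = truth + rng.normal(0, 2, 100)
y = truth + 1.0 + rng.normal(0, 2, 100)    # method 2 reads slightly higher
print("CCC:", round(ccc(x, y), 3), " PA(|X-Y|<5):", round(prob_agreement(x, y, 5.0), 3))
```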
16. K‐Sample comparisons using propensity analysis
- Author
-
Hyun Joo Ahn, Sin-Ho Jung, and Sang Ah Chi
- Subjects
Statistics and Probability ,Biometry ,Endpoint Determination ,Population ,Kaplan-Meier Estimate ,Dunnett's test ,01 natural sciences ,Article ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Humans ,030212 general & internal medicine ,0101 mathematics ,Propensity Score ,education ,Multinomial logistic regression ,Mathematics ,education.field_of_study ,Inverse probability weighting ,Decision Trees ,Regression analysis ,General Medicine ,Observational Studies as Topic ,Propensity score matching ,Regression Analysis ,Observational study ,Statistics, Probability and Uncertainty - Abstract
In this paper, we investigate K-group comparisons on survival endpoints for observational studies. In clinical databases for observational studies, treatments for patients are chosen with probabilities that vary depending on their baseline characteristics. This often results in non-comparable treatment groups because of imbalance in the baseline characteristics of patients among treatment groups. In order to overcome this issue, we conduct a propensity analysis and match the subjects with similar propensity scores across treatment groups, or compare weighted group means (or weighted survival curves for censored outcome variables) using inverse probability weighting (IPW). To this end, multinomial logistic regression has been a popular propensity analysis method to estimate the weights. We propose to use the decision tree method as an alternative propensity analysis due to its simplicity and robustness. We also propose IPW rank statistics, called the Dunnett-type test and the ANOVA-type test, to compare three or more treatment groups on survival endpoints. Using simulations, we evaluate the finite sample performance of the weighted rank statistics combined with these propensity analysis methods. We demonstrate these methods with a real data example. The IPW method also allows unbiased estimation of population parameters of each treatment group. In this paper, we limit our discussion to survival outcomes, but all the methods can easily be modified for any type of outcome, such as binary or continuous variables.
- Published
- 2019
- Full Text
- View/download PDF
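A sketch of the inverse probability weighting step described in this entry: fit a multinomial logistic propensity model for K treatment groups and weight each subject by the inverse of the estimated probability of the treatment actually received. The data are simulated and hypothetical, and the authors' Dunnett-type and ANOVA-type rank statistics are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, K = 300, 3
X = rng.normal(size=(n, 2))                          # baseline covariates
# treatment assignment probabilities depend on covariates (confounding)
logits = np.column_stack([np.zeros(n), 0.8 * X[:, 0], -0.8 * X[:, 1]])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
treat = np.array([rng.choice(K, p=p) for p in probs])

# multinomial logistic propensity model and inverse probability weights
ps_model = LogisticRegression(max_iter=1000).fit(X, treat)
ps = ps_model.predict_proba(X)                       # n x K estimated propensities
ipw = 1.0 / ps[np.arange(n), treat]

# weighted group means of a hypothetical continuous outcome
y = X[:, 0] + rng.normal(size=n)
for k in range(K):
    m = treat == k
    print(f"group {k}: weighted mean = {np.average(y[m], weights=ipw[m]):.3f}")
```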
17. Comparing dependent kappa coefficients obtained on multilevel data
- Author
-
Sophie Vanbelle
- Subjects
Statistics and Probability ,Scale (ratio) ,General Medicine ,Estimating equations ,01 natural sciences ,030218 nuclear medicine & medical imaging ,010104 statistics & probability ,03 medical and health sciences ,Delta method ,0302 clinical medicine ,Cohen's kappa ,Statistics ,Binary data ,0101 mathematics ,Statistics, Probability and Uncertainty ,Categorical variable ,Kappa ,Reliability (statistics) ,Mathematics - Abstract
Reliability and agreement are two notions of paramount importance in medical and behavioral sciences. They provide information about the quality of the measurements. When the scale is categorical, reliability and agreement can be quantified through different kappa coefficients. The present paper provides two simple alternatives to more advanced modeling techniques, which are not always adequate in case of a very limited number of subjects, when comparing several dependent kappa coefficients obtained on multilevel data. This situation frequently arises in medical sciences, where multilevel data are common. Dependent kappa coefficients can result from the assessment of the same individuals at various occasions or when each member of a group is compared to an expert, for example. The method is based on simple matrix calculations and is available in the R package "multiagree". Moreover, the statistical properties of the proposed method are studied using simulations. Although this paper focuses on kappa coefficients, the method easily extends to other statistical measures.
- Published
- 2017
- Full Text
- View/download PDF
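For orientation only, a minimal computation of a single Cohen's kappa from a square cross-classification table; the paper's comparison of dependent kappas on multilevel data and the "multiagree" package go well beyond this sketch, and the ratings below are hypothetical.

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square cross-classification table of two raters."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    po = np.trace(table) / n                                  # observed agreement
    pe = (table.sum(axis=1) @ table.sum(axis=0)) / n ** 2     # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical ratings of 100 subjects by two raters on a binary scale
table = [[40, 10],
         [ 5, 45]]
print("kappa =", round(cohens_kappa(table), 3))   # 0.70 for this table
```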
18. Clustering multiply imputed multivariate high-dimensional longitudinal profiles
- Author
-
Paul Dendale, Liesbeth Bruckers, and Geert Molenberghs
- Subjects
Statistics and Probability ,Dimensionality reduction ,Functional data analysis ,02 engineering and technology ,General Medicine ,Function (mathematics) ,Missing data ,computer.software_genre ,01 natural sciences ,Data set ,010104 statistics & probability ,Principal component analysis ,Consensus clustering ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,0101 mathematics ,Statistics, Probability and Uncertainty ,Cluster analysis ,Algorithm ,computer ,Mathematics - Abstract
In this paper, we propose a method to cluster multivariate functional data with missing observations. Analysis of functional data often encompasses dimension reduction techniques such as principal component analysis (PCA). These techniques require complete data matrices. In this paper, the data are completed by means of multiple imputation, and subsequently each imputed data set is submitted to a cluster procedure. The final partition of the data, summarizing the partitions obtained for the imputed data sets, is obtained by means of ensemble clustering. The uncertainty in cluster membership, due to missing data, is characterized by means of the agreement between the members of the ensemble and the fuzziness of the consensus clustering. The potential of the method is demonstrated on the heart failure (HF) data. Daily measurements of four biomarkers (heart rate, diastolic and systolic blood pressure, and weight) were used to cluster the patients. To normalize the distributions of the longitudinal outcomes, the data were transformed with a natural logarithm function. A cubic spline basis with 69 basis functions was employed to smooth the profiles. The proposed algorithm indicates the existence of a latent structure and divides the HF patients into two clusters, showing a different evolution in blood pressure values and weight. In general, cluster results are sensitive to the choices made. Likewise for the proposed approach, alternative choices for the distance measure, the procedure to optimize the objective function, the scree-test threshold, or the number of principal components to be used in the approximation of the surrogate density could all influence the final partition. For the HF data set, the final partition depends on the number of principal components used in the procedure.
- Published
- 2017
- Full Text
- View/download PDF
19. Detection of spatial change points in the mean and covariances of multivariate simultaneous autoregressive models
- Author
-
Philipp E. Otto and Wolfgang Schmid
- Subjects
Statistics and Probability ,Nonlinear autoregressive exogenous model ,05 social sciences ,Asymptotic distribution ,General Medicine ,01 natural sciences ,Empirical distribution function ,010104 statistics & probability ,Gumbel distribution ,Autoregressive model ,0502 economics and business ,Statistics ,Test statistic ,Autoregressive integrated moving average ,0101 mathematics ,Statistics, Probability and Uncertainty ,Algorithm ,STAR model ,050205 econometrics ,Mathematics - Abstract
In this paper, we propose a test procedure to detect change points of multidimensional autoregressive processes. The considered process differs from typical applied spatial autoregressive processes in that it is assumed to evolve from a predefined center into every dimension. Additionally, structural breaks in the process can occur at a certain distance from the predefined center. The main aim of this paper is to detect such spatial changes. In particular, we focus on shifts in the mean and the autoregressive parameter. The proposed test procedure is based on the likelihood-ratio approach. Eventually, the goodness-of-fit values of the estimators are compared for different shifts. Moreover, the empirical distribution of the test statistic of the likelihood-ratio test is obtained via Monte Carlo simulations. We show that the generalized Gumbel distribution seems to be a suitable limiting distribution of the proposed test statistic. Finally, we discuss the detection of lung cancer in computed tomography scans and illustrate the proposed test procedure.
- Published
- 2016
- Full Text
- View/download PDF
20. Parameter redundancy in discrete state-space and integrated models
- Author
-
Rachel S. McCrea and Diana J. Cole
- Subjects
0106 biological sciences ,Statistics and Probability ,Joint likelihood ,General Medicine ,010603 evolutionary biology ,01 natural sciences ,010104 statistics & probability ,Multiple data ,Redundancy (information theory) ,Population model ,Statistics ,Identifiability ,0101 mathematics ,Statistics, Probability and Uncertainty ,Probability of survival ,Algorithm ,Mathematics - Abstract
Discrete state-space models are used in ecology to describe the dynamics of wild animal populations, with parameters, such as the probability of survival, being of ecological interest. For a particular parametrization of a model it is not always clear which parameters can be estimated. This inability to estimate all parameters is known as parameter redundancy or a model is described as nonidentifiable. In this paper we develop methods that can be used to detect parameter redundancy in discrete state-space models. An exhaustive summary is a combination of parameters that fully specify a model. To use general methods for detecting parameter redundancy a suitable exhaustive summary is required. This paper proposes two methods for the derivation of an exhaustive summary for discrete state-space models using discrete analogues of methods for continuous state-space models. We also demonstrate that combining multiple data sets, through the use of an integrated population model, may result in a model in which all parameters are estimable, even though models fitted to the separate data sets may be parameter redundant.
- Published
- 2016
- Full Text
- View/download PDF
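The symbolic approach alluded to above checks the rank of the derivative matrix of an exhaustive summary with respect to the parameters; a rank below the number of parameters indicates parameter redundancy. The sketch below uses a toy two-parameter example in which only the product of the parameters enters the model. It illustrates the general idea only, not the paper's discrete state-space derivations.

```python
import sympy as sp

phi, p = sp.symbols("phi p", positive=True)
theta = [phi, p]

# Toy exhaustive summary: the model only ever uses the product phi*p,
# e.g. quantities proportional to phi*p and (phi*p)**2
kappa = sp.Matrix([phi * p, (phi * p) ** 2])

D = kappa.jacobian(theta)          # derivative matrix of the exhaustive summary
rank = D.rank()
print("rank =", rank, "with", len(theta), "parameters")
if rank < len(theta):
    print("model is parameter redundant (deficiency", len(theta) - rank, ")")
```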
21. Class probability estimation for medical studies
- Author
-
Richard Simon
- Subjects
Statistics and Probability ,Class (set theory) ,Probability estimation ,Econometrics ,General Medicine ,Statistics, Probability and Uncertainty ,Logistic regression ,Multicategory ,Mathematical economics ,Outcome (probability) ,Mathematics - Abstract
I provide a commentary on two papers "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gerard Biau, Michael Kohler, Inke R. Konig, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. Konig, and Andreas Ziegler. Those papers provide an up-to-date review of some popular machine learning methods for class probability estimation and compare those methods to logistic regression modeling in real and simulated datasets.
- Published
- 2014
- Full Text
- View/download PDF
22. General theory of mixture procedures for gatekeeping
- Author
-
Alex Dmitrienko and Ajit C. Tamhane
- Subjects
Statistics and Probability ,Mathematical optimization ,Quantitative Biology::Molecular Networks ,Nonparametric statistics ,Data interpretation ,Familywise error rate ,Computer Science::Social and Information Networks ,General Medicine ,Computer Science::Computers and Society ,Gatekeeping ,Theory based ,General theory ,Statistics, Probability and Uncertainty ,Algorithm ,Mathematics ,Parametric statistics - Abstract
The paper introduces a general approach to constructing mixture-based gatekeeping procedures in multiplicity problems with two or more families of hypotheses. Mixture procedures serve as extensions of, and overcome limitations of, some previous gatekeeping approaches such as parallel gatekeeping and tree-structured gatekeeping. This paper offers a general theory of mixture procedures constructed from procedures ranging from nonparametric (p-value based) to parametric (normal theory based) and studies their properties. It is also shown that the mixture procedure for parallel gatekeeping is equivalent to the multistage gatekeeping procedure. A clinical trial example is used to illustrate the mixture approach and the implementation of mixture procedures.
- Published
- 2013
- Full Text
- View/download PDF
23. Optimal weight in estimating and comparing areas under the receiver operating characteristic curve using longitudinal data
- Author
-
Xiaofei Wang and Yougui Wu
- Subjects
Keratin-19 ,Vascular Endothelial Growth Factor A ,Statistics and Probability ,Analysis of Variance ,Longitudinal study ,Lung Neoplasms ,Receiver operating characteristic ,General Medicine ,Variance (accounting) ,Models, Theoretical ,Prognosis ,Outcome (probability) ,Weighting ,Correlation ,ROC Curve ,Area Under Curve ,Carcinoma, Non-Small-Cell Lung ,Statistics ,Disease Progression ,Humans ,Longitudinal Studies ,Analysis of variance ,Sensitivity (control systems) ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
In the setting of longitudinal study, subjects are followed for the occurrence of some dichotomous outcome. In many of these studies, some markers are also obtained repeatedly during the study period. Emir et al. introduced a non-parametric approach to the estimation of the area under the ROC curve of a repeated marker. Their non-parametric estimate involves assigning a weight to each subject. There are two weighting schemes suggested in their paper: one for the case when within-patient correlation is low, and the other for the case when within-subject correlation is high. However, it is not clear how to assign weights to marker measurements when within-patient correlation is modest. In this paper, we consider the optimal weights that minimize the variance of the estimate of the area under the ROC curve (AUC) of a repeated marker, as well as the optimal weights that minimize the variance of the AUC difference between two repeated markers. Our results in this paper show that the optimal weights depend not only on the within-patient control–case correlation in the longitudinal data, but also on the proportion of subjects that become cases. More importantly, we show that the loss of efficiency by using the two weighting schemes suggested by Emir et al. instead of our optimal weights can be severe when there is a large within-subject control–case correlation and the proportion of subjects that become cases is small, which is often the case in longitudinal study settings.
- Published
- 2011
- Full Text
- View/download PDF
24. Assessing inter-rater reliability when the raters are fixed: Two concepts and two estimates
- Author
-
Valentin Rousson
- Subjects
Statistics and Probability ,education.field_of_study ,Multivariate analysis ,Intraclass correlation ,Population ,Sample (statistics) ,Context (language use) ,Statistical model ,General Medicine ,Inter-rater reliability ,Statistics ,Econometrics ,Statistics, Probability and Uncertainty ,education ,Reliability (statistics) ,Mathematics - Abstract
Intraclass correlation (ICC) is an established tool to assess inter-rater reliability. In a seminal paper published in 1979, Shrout and Fleiss considered three statistical models for inter-rater reliability data with a balanced design. In their first two models, an infinite population of raters was considered, whereas in their third model, the raters in the sample were considered to be the whole population of raters. In the present paper, we show that the two distinct estimates of ICC developed for the first two models can both be applied to the third model and we discuss their different interpretations in this context.
- Published
- 2011
- Full Text
- View/download PDF
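A sketch of the two ICC estimates discussed in this entry, computed from the usual two-way ANOVA mean squares for an n-subjects-by-k-raters table. The formulas follow my reading of Shrout and Fleiss's ICC(2,1) and ICC(3,1), and the ratings below are hypothetical.

```python
import numpy as np

def icc_shrout_fleiss(data):
    """ICC(2,1) and ICC(3,1) from an n x k matrix (rows = subjects, cols = raters)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31

# Hypothetical ratings: 6 subjects scored by 3 fixed raters (rater 1 scores high, rater 2 low)
ratings = [[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2], [10, 5, 6], [6, 2, 4]]
print([round(v, 3) for v in icc_shrout_fleiss(ratings)])   # the two estimates differ markedly here
```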
25. Duncan's k-Ratio Bayes Rule Approach to Multiple Comparisons: An Overview
- Author
-
Gene Pennello
- Subjects
Statistics and Probability ,Bayes' rule ,Omnibus test ,Bayes Theorem ,General Medicine ,Bayes' theorem ,Decision Theory ,Frequentist inference ,Data Interpretation, Statistical ,Prior probability ,Statistics ,Multiple comparisons problem ,Statistics, Probability and Uncertainty ,Marginal distribution ,Mathematics ,Type I and type II errors - Abstract
An alternative to frequentist approaches to multiple comparisons is Duncan's k-ratio Bayes rule approach. The purpose of this paper is to compile key results on k-ratio Bayes rules for a number of multiple comparison problems that heretofore, have only been available in separate papers or doctoral dissertations. Among other problems, multiple comparisons for means in one-way, two-way, and treatments-vs.-control structures will be reviewed. In the k-ratio approach, the optimal joint rule for a multiple comparisons problem is derived under the assumptions of additive losses and prior exchangeability for the component comparisons. In the component loss function for a comparison, a balance is achieved between the decision losses due to Type I and Type II errors by assuming that their ratio is k. The component loss is also linear in the magnitude of the error. Under the assumption of additive losses, the joint Bayes rule for the component comparisons applies to each comparison the Bayes test for that comparison considered alone. That is, a comparisonwise approach is optimal. However, under prior exchangeability of the comparisons, the component test critical regions adapt to omnibus patterns in the data. For example, for a balanced one-way array of normally distributed means, the Bayes critical t value for a difference between means is inversely related to the F ratio measuring heterogeneity among the means, resembling a continuous version of Fisher's F-protected least significant difference rule. For more complicated treatment structures, the Bayes critical t value for a difference depends intuitively on multiple F ratios and marginal difference(s) (if applicable), such that the critical t value warranted for the difference can range from being as conservative as that given by a familywise rule to actually being anti-conservative relative to that given by the unadjusted 5%-level Student's t test.
- Published
- 2007
- Full Text
- View/download PDF
26. A Decomposition of Linear Diagonals-Parameter Symmetry Model for Square Contingency Tables with Ordered Categories
- Author
-
Sadao Tomizawa
- Subjects
Statistics and Probability ,Contingency table ,Combinatorics ,Likelihood-ratio test ,Diagonal ,Decomposition (computer science) ,General Medicine ,Statistics, Probability and Uncertainty ,Likelihood ratio statistic ,Symmetry (geometry) ,Square (algebra) ,Mathematics ,Monod-Wyman-Changeux model - Abstract
For square contingency tables with ordered categories, this paper gives a decomposition for AGRESTI'S (1983) linear diagonals-parameter symmetry (LDPS) model into GOODMAN'S (1979) diagonals-parameter symmetry model and the linear diagonals-parameter marginal symmetry model introduced in this paper. It is also pointed out that the likelihood ratio statistic for the LDPS model is equal to the sum of those for the decomposed two models.
- Published
- 2007
- Full Text
- View/download PDF
27. Comparing the Means of Two Independent Groups
- Author
-
Rand R. Wilcox
- Subjects
Statistics and Probability ,Heteroscedasticity ,Mean estimation ,Sample size determination ,Maximum likelihood ,Statistics ,Trimming ,General Medicine ,Statistics, Probability and Uncertainty ,Welch's t-test ,Cornish–Fisher expansion ,Mathematics ,Group treatment - Abstract
Recently, CRESSIE and WHITFORD (1986) showed that Welch's test of H0 :μ1 = μ2 can be biased, under nonnormality, where μ1 and μ2 are the means of two independent treatment groups. They suggested, therefore, that a two-sample analog of Johnson's test be used instead. One goal in this paper is to examine the extent to which a two-sample analog of Johnson's test improves upon Welch's technique in terms of Type I errors and power when sample sizes are small or moderately large. Several alternative procedures are also considered including an additional modification of Johnson's procedure, a procedure suggested by DUNNETT (1982) that uses Tiku's modified maximum likelihood estimate with 10% trimming, two versions of Efron's bootstrap, and a test recently proposed by WILCOX (1989). The paper also describes situations where Welch's procedure is not robust in terms of Type I errors. This is important because based on results currently available, Welch's procedure is thought to be nonrobust in terms of power, but robust in terms of Type I errors.
- Published
- 2007
- Full Text
- View/download PDF
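For reference, a minimal sketch of Welch's statistic itself (the unequal-variance t with Welch-Satterthwaite degrees of freedom), cross-checked against SciPy. The Johnson-type modifications, trimming, and bootstrap alternatives compared in the paper are not implemented here, and the data are hypothetical.

```python
import numpy as np
from scipy import stats

def welch_test(x, y):
    """Welch's two-sample t statistic, Welch-Satterthwaite df, and two-sided p-value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    v1, v2 = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df, 2 * stats.t.sf(abs(t), df)

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 20)          # small samples with unequal variances
y = rng.normal(0.8, 2.0, 25)
print(welch_test(x, y))
print(stats.ttest_ind(x, y, equal_var=False))   # cross-check with SciPy
```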
28. Maximum Likelihood Estimates for Binary Data with Random Effects
- Author
-
Haiganoush K. Preisler
- Subjects
Statistics and Probability ,Restricted maximum likelihood ,Logit ,GLIM ,General Medicine ,Maximum likelihood sequence estimation ,Random effects model ,Statistics::Computation ,Overdispersion ,Statistics ,Binary data ,Statistics::Methodology ,Statistics, Probability and Uncertainty ,Likelihood function ,Mathematics - Abstract
The purpose of this paper is to present a procedure for obtaining approximate maximum likelihood estimates for compound binary response models. The extra binomial variation is incorporated into the model by adding random effects to the fixed effects on the probit (or logit) scale. Numerical integration techniques are used to arrive at a solution of the likelihood equations. The paper also presents an illustrating numerical example based on a large toxicological data set. The computations are carried out within the GLIM statistical package.
- Published
- 2007
- Full Text
- View/download PDF
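A sketch of the numerical-integration step for a probit model with a normal random intercept: each cluster's marginal likelihood is an integral over the random effect, approximated here with Gauss-Hermite quadrature. The model, data, and optimizer are hypothetical choices made for illustration; the original work used GLIM rather than Python.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Gauss-Hermite nodes/weights for integrals against the standard normal density:
# integral of f(u) phi(u) du  ~=  sum_k (w_k / sqrt(pi)) * f(sqrt(2) * x_k)
nodes, weights = np.polynomial.hermite.hermgauss(20)
U = np.sqrt(2.0) * nodes
W = weights / np.sqrt(np.pi)

def neg_loglik(params, clusters):
    """Marginal negative log-likelihood of a probit model with random intercepts."""
    beta0, beta1, log_sigma = params
    sigma = np.exp(log_sigma)
    ll = 0.0
    for x, y in clusters:                      # x, y: arrays for one cluster
        eta = beta0 + beta1 * x                # fixed-effect linear predictor
        p = norm.cdf(eta[:, None] + sigma * U[None, :])       # conditional on each node
        cond = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
        ll += np.log(np.dot(cond, W))          # quadrature approximation of the integral
    return -ll

# Hypothetical clustered binary data
rng = np.random.default_rng(2)
clusters = []
for _ in range(40):
    x = rng.normal(size=8)
    b = rng.normal(scale=1.0)                  # true random intercept
    y = (rng.uniform(size=8) < norm.cdf(-0.5 + 1.0 * x + b)).astype(float)
    clusters.append((x, y))

fit = minimize(neg_loglik, x0=[0.0, 0.0, 0.0], args=(clusters,), method="Nelder-Mead")
print("beta0, beta1, sigma:", fit.x[0], fit.x[1], np.exp(fit.x[2]))
```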
29. Internal Covariance Structure of Some Coronary Heart Disease Risk Variables Investigated in Two Groups of Data
- Author
-
Anna Bartkowiak, S. Lukasik, Mrukowicz M, Chwistecki K, and Morgenstern W
- Subjects
Statistics and Probability ,Data set ,Section (archaeology) ,Coordinate system ,Principal component analysis ,Statistics ,Structure (category theory) ,Context (language use) ,General Medicine ,Rotation matrix ,Statistics, Probability and Uncertainty ,Covariance ,Mathematics - Abstract
In a former paper, BARTKOWIAK (1987) showed some aspects of the simultaneous diagonalization of two covariance matrices in the context of a medical data set in which the considered variables were strongly dependent. In this paper we show a real example (considering some coronary disease risk factors) in which the considered variables are nearly independent and for which the problem of equal covariance structure is relevant from both a statistical and a medical point of view. In Section 1 we present the medical problem and the motivation for performing this study. In Section 2 we recall the problem of seeking a common principal component coordinate system by simultaneous diagonalization of two covariance matrices using one common rotation matrix. In Section 3 we show the effects of a simultaneous diagonalization applied to two groups of individuals (“Alive” and “CHD death”); it turns out, with P = 0.68, that these groups can have a common principal component coordinate system. In Section 4 we look more closely at the correlations between the considered variables, looking for peculiarities occurring in the recorded data.
- Published
- 2007
- Full Text
- View/download PDF
30. Testing Equality between Two Diagnostic Procedures in Paired-Sample Ordinal Data
- Author
-
Kung-Jong Lui, Xiao-Hua Zhou, and Chii-Dean Lin
- Subjects
Statistics and Probability ,Ordinal data ,Ordinal Scale ,Monte Carlo method ,Parametric model ,Sample (statistics) ,General Medicine ,Sensitivity (control systems) ,Statistics, Probability and Uncertainty ,Algorithm ,Type I and type II errors ,Statistical hypothesis testing ,Mathematics - Abstract
When a new diagnostic procedure is developed, it is important to assess whether the diagnostic accuracy of the new procedure is different from that of the standard procedure. For paired-sample ordinal data, this paper develops two test statistics for testing equality of the diagnostic accuracy between two procedures without assuming any parametric models. One is derived on the basis of the probability of correctly identifying the case for a randomly selected pair of a case and a non-case over all possible cutoff points, and the other is derived on the basis of the sensitivity and specificity directly. To illustrate the practical use of the proposed test procedures, this paper includes an example regarding the use of digitized and plain films for screening breast cancer. This paper also applies Monte Carlo simulation to evaluate the finite sample performance of the two statistics developed here and notes that they can perform well in a variety of situations. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
- Published
- 2004
- Full Text
- View/download PDF
31. Bayesian Modeling of Joint Regressions for the Mean and Covariance Matrix
- Author
-
Dani Gamerman and Edilberto C. Cepeda
- Subjects
Statistics and Probability ,Generalized linear model ,Covariance matrix ,Bayesian probability ,Regression analysis ,General Medicine ,Bayesian inference ,Growth curve (statistics) ,symbols.namesake ,Joint probability distribution ,Statistics ,symbols ,Econometrics ,Statistics, Probability and Uncertainty ,Fisher information ,Mathematics - Abstract
An important problem in agronomy is the study of longitudinal data on the growth curve of the weight of cattle through time, possibly taking into account the effect of other explanatory variables such as treatments and time. In this paper, a Bayesian approach for analysing longitudinal data is proposed. It takes into account regression structures on the mean and the variance-covariance matrix of normal observations. The approach is based on the modeling strategy suggested by Pourahmadi (1999, Biometrika 86, 667–690). After reviewing this methodology, we present the Bayesian approach used to fit the models, based on a generalization of the Metropolis-Hastings algorithm of Cepeda and Gamerman (2000, Brazilian Journal of Probability and Statistics, 14, 207–221). The approach is applied to the study of the growth and development of a group of deaf children. The paper is concluded with a few proposed extensions. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
- Published
- 2004
- Full Text
- View/download PDF
32. The Mean of the Inverse of a Punctured Normal Distribution and Its Application
- Author
-
Graham R. Wood, C. D. Lai, and C. G. Qiao
- Subjects
Statistics and Probability ,Estimator ,Inverse ,General Medicine ,Normal distribution ,Combinatorics ,Distribution (mathematics) ,Geometric–harmonic mean ,Statistics ,Truncation (statistics) ,Statistics, Probability and Uncertainty ,Random variable ,Mathematics ,Arithmetic mean - Abstract
Summary: The fundamental properties of a punctured normal distribution are studied. The results are applied to three issues concerning X/Y, where X and Y are independent normal random variables with means μX and μY respectively. First, estimation of μX/μY as a surrogate for E(X/Y) is justified, then the reason for preference of a weighted average, over an arithmetic average, as an estimator of μX/μY is given. Finally, an approximate confidence interval for μX/μY is provided. A grain yield data set is used to illustrate the results. Key words: Arithmetic average; Coefficient of variation; Inverse mean; Punctured normal; Ratio; Truncated; Weighted average. 1 Introduction: In agricultural research, sample surveys and other areas, one is often concerned with the ratio R = X/Y of two independent normal random variables, X ~ N(μX, σX²) and Y ~ N(μY, σY²), where μX, μY > 0. Properties of the distribution of this ratio, in particular its moments, are of interest to researchers. The mean E(X/Y) of the ratio of two independent normal variables, however, does not exist (see for example, Springer, 1979, p. 139). The root of the problem is the non-existence of E(1/Y); this occurs because Y can in theory assume values arbitrarily close to zero. In order to resolve a number of issues surrounding this problem, in this paper we study Y for |Y| > ε, with ε a small positive number, a "punctured normal" distribution; a small neighbourhood of zero is removed from consideration. We examine the fundamental properties of this distribution; in particular, we show that the inverse mean of a punctured normal does exist and an explicit expression is obtained. Approximations for E(1/Y) have been developed for a left-truncated normal random variable Y (Nahmias and Wang, 1978; Hall, 1979), but the expression for the inverse mean of a punctured normal given in this paper is exact. We then apply our results to three issues surrounding R = X/Y. First, we justify estimation of μX/μY as a surrogate for E(X/Y).
- Published
- 2004
- Full Text
- View/download PDF
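The central point above is that E(1/Y) fails to exist for a normal Y but does exist once a small neighbourhood (−ε, ε) is removed. The sketch below evaluates this punctured inverse mean by direct numerical integration, interpreting the punctured normal as Y conditioned on |Y| > ε (an assumption on my part, not the paper's exact expression), and shows its insensitivity to ε when the mean is well away from zero.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def punctured_inverse_mean(mu, sigma, eps):
    """E(1/Y) when the normal(mu, sigma) variable Y is 'punctured',
    i.e. conditioned on |Y| > eps (direct numerical integration)."""
    f = lambda y: norm.pdf(y, mu, sigma) / y
    pieces = [integrate.quad(f, -np.inf, -1.0)[0],
              integrate.quad(f, -1.0, -eps)[0],
              integrate.quad(f, eps, 1.0)[0],
              integrate.quad(f, 1.0, np.inf)[0]]
    mass = norm.cdf(-eps, mu, sigma) + norm.sf(eps, mu, sigma)
    return sum(pieces) / mass

# With mu well away from zero the punctured inverse mean is close to 1/mu and is
# almost insensitive to eps; without the puncture, E(1/Y) does not exist at all.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, punctured_inverse_mean(mu=5.0, sigma=1.0, eps=eps))
print("1/mu =", 1 / 5.0)
```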
33. Asymptotical Tests on the Equivalence, Substantial Difference and Non-inferiority Problems with Two Proportions
- Author
-
I. Herranz Tejedor and A. Martín Andrés
- Subjects
Statistics and Probability ,Alternative hypothesis ,General Medicine ,Confidence interval ,Combinatorics ,Exact test ,Non inferiority ,Sample size determination ,Statistics ,Statistics, Probability and Uncertainty ,Null hypothesis ,Equivalence (measure theory) ,Statistical hypothesis testing ,Mathematics - Abstract
Let d = p2 − p1 be the difference between two binomial proportions obtained from two independent trials. For the parameter d, three pairs of hypotheses may be of interest: H1: d ≤ δ vs. K1: d > δ; H2: d ∉ (δ1, δ2) vs. K2: d ∈ (δ1, δ2); and H3: d ∈ [δ1, δ2] vs. K3: d ∉ [δ1, δ2], where Hi is the null hypothesis and Ki is the alternative hypothesis. These tests are useful in clinical trials, pharmacological and vaccine studies, and in statistics generally. The three problems may be investigated by exact unconditional tests when the sample sizes are moderate. Otherwise, one should use approximate (or asymptotic) tests, generally based on Z-statistics like those suggested in the paper. The article defines a new procedure for testing H2 or H3, demonstrates that this is more powerful than tests based on confidence intervals (the classic TOST, or two one-sided tests, procedure), defines two corrections for continuity which reduce the liberality of the three tests, and selects the one that behaves better. The programs for executing the unconditional exact and asymptotic tests described in the paper can be downloaded at http://www.ugr.es/~bioest/software.htm. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
- Published
- 2004
- Full Text
- View/download PDF
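A sketch of the classic asymptotic TOST comparison that the paper benchmarks against: two one-sided Z tests for H2 versus K2 using a Wald standard error. The equivalence margins and counts are hypothetical, and the authors' new procedure and continuity corrections are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def tost_two_proportions(x1, n1, x2, n2, delta1, delta2, alpha=0.05):
    """Asymptotic TOST for equivalence of two proportions: conclude d = p2 - p1
    lies in (delta1, delta2) when both one-sided Z tests reject."""
    p1, p2 = x1 / n1, x2 / n2
    d = p2 - p1
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # Wald standard error
    z_lower = (d - delta1) / se      # tests d <= delta1 against d > delta1
    z_upper = (d - delta2) / se      # tests d >= delta2 against d < delta2
    p_lower = norm.sf(z_lower)
    p_upper = norm.cdf(z_upper)
    return d, max(p_lower, p_upper) < alpha

# Hypothetical trial with an equivalence margin of +/- 0.10
print(tost_two_proportions(x1=312, n1=400, x2=296, n2=400, delta1=-0.10, delta2=0.10))
```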
34. Incorporating Inter-item Correlations in Item Response Data Analysis
- Author
-
Atanu Biswas, Keumhee C. Carriere, and Xiaoming Sheng
- Subjects
Statistics and Probability ,Ordinal data ,Rasch model ,Ordinal analysis ,Polytomous Rasch model ,General Medicine ,behavioral disciplines and activities ,Ordinal regression ,Rating scale ,Statistics ,Econometrics ,Statistics, Probability and Uncertainty ,Latent variable model ,Categorical variable ,Mathematics - Abstract
This paper is concerned with the analysis of item response data, which are usually measured on a rating scale and are therefore ordinal. Such study items tend to be highly inter-correlated. Rasch models, which convert ordinal categorical scales into linear measurements, are widely used in ordinal data analysis. In this paper, we improve the current methodology in order to incorporate inter-item correlations. We advocate the latent variable approach for this purpose, in combination with generalized estimating equations to estimate the Rasch model parameters. Data from a study of families of lung cancer patients demonstrate the utility of our methods.
- Published
- 2003
- Full Text
- View/download PDF
35. Optimal Representation of Supplementary Variables in Biplots from Principal Component Analysis and Correspondence Analysis
- Author
-
Jan Graffelman and Tomàs Aluja-Banet
- Subjects
Statistics and Probability ,Goodness of fit ,Biplot ,Principal component analysis ,Statistics ,Context (language use) ,General Medicine ,Statistics, Probability and Uncertainty ,Representation (mathematics) ,Least squares ,Correspondence analysis ,Mathematics ,Variable (mathematics) - Abstract
This paper treats the topic of representing supplementary variables in biplots obtained by principal component analysis (PCA) and correspondence analysis (CA). We follow a geometrical approach where we minimize errors that are obtained when the scores of the PCA or CA solution are projected onto a vector that represents a supplementary variable. This paper shows that optimal directions for supplementary variables can be found by solving a regression problem, and justifies that earlier formulae from Gabriel are optimal in the least squares sense. We derive new results regarding the geometrical properties, goodness of fit statistics and the interpretation of supplementary variables. It is shown that supplementary variables can be represented by plotting their correlation coefficients with the axes of the biplot only when the proper type of scaling is used. We discuss supplementary variables in an ecological context and give illustrations with data from an environmental monitoring survey.
- Published
- 2003
- Full Text
- View/download PDF
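The abstract above states that optimal directions for supplementary variables are found by solving a regression problem. A minimal sketch under that reading follows: the centered supplementary variable is regressed on the two-dimensional PCA scores, and the fitted coefficients give the biplot vector. The data and variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))        # data analysed by PCA (illustrative)
y = rng.normal(size=50)             # supplementary variable (illustrative)

# Two-dimensional PCA scores of the column-centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F = U[:, :2] * s[:2]                # principal component scores

# Direction for the supplementary variable: least squares regression of the
# centered variable on the scores; the coefficients give the biplot vector.
yc = y - y.mean()
b, *_ = np.linalg.lstsq(F, yc, rcond=None)
print("supplementary-variable biplot vector:", b)
```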
36. A Revisit on Comparing the Asymptotic Interval Estimators of Odds Ratio in a Single 2 × 2 Table
- Author
-
Chii-Dean Lin and Kung-Jong Lui
- Subjects
Statistics and Probability ,Interval estimation ,Logit ,Coverage probability ,Estimator ,Continuity correction ,General Medicine ,Interval (mathematics) ,Confidence interval ,Sample size determination ,Statistics ,Econometrics ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
It is well known that Cornfield's confidence interval of the odds ratio with the continuity correction can mimic the performance of the exact method. Furthermore, because the calculation procedure for the former is much simpler than that for the latter, Cornfield's confidence interval with the continuity correction is highly recommended by many publications. However, all the papers that draw this conclusion do so on the basis of examining the coverage probability exclusively; the efficiency of the resulting confidence intervals is completely ignored. This paper calculates and compares the coverage probability and the average length of Woolf's logit interval estimator, Gart's logit interval estimator (adding 0.50 to each cell), Cornfield's interval estimator with the continuity correction, and Cornfield's interval estimator without the continuity correction in a variety of situations. This paper notes that Cornfield's interval estimator with the continuity correction is too conservative, while Cornfield's method without the continuity correction can improve efficiency without sacrificing the accuracy of the coverage probability. This paper further notes that when the sample size is small (say, 20 or 30 per group) and the probability of exposure in the control group is small (say, 0.10) or large (say, 0.90), using Cornfield's method without the continuity correction is likely preferable to all the other estimators considered here. When the sample size is large (say, 100 per group) or when the probability of exposure in the control group is moderate (say, 0.50), Gart's logit interval estimator is probably the best.
- Published
- 2003
- Full Text
- View/download PDF
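Woolf's logit interval and Gart's 0.50-corrected variant, two of the estimators compared in the entry above, have simple closed forms; a sketch follows. Cornfield's intervals, with and without the continuity correction, require iterative computation and are not reproduced here; the table counts are illustrative.

```python
import math

def woolf_ci(a, b, c, d, z=1.959964, add=0.0):
    """Logit (Woolf) confidence interval for the odds ratio of a 2x2 table
    [[a, b], [c, d]].  With add=0.5 this becomes Gart's estimator, which
    adds 0.50 to every cell before computation."""
    a, b, c, d = (x + add for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

print(woolf_ci(10, 90, 4, 96))            # Woolf
print(woolf_ci(10, 90, 4, 96, add=0.5))   # Gart (0.50 added to each cell)
```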
37. Notes on Estimation of the General Odds Ratio and the General Risk Difference for Paired-Sample Data
- Author
-
Kung-Jong Lui
- Subjects
Statistics and Probability ,Ordinal data ,Cross-sectional study ,Ordinal Scale ,Interval estimation ,Case-control study ,Absolute risk reduction ,Estimator ,General Medicine ,Odds ratio ,Statistics ,Econometrics ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
Under the matched-pair design, this paper discusses estimation of the general odds ratio OR_G for ordinal exposure in case-control studies and the general risk difference RD_G for ordinal outcomes in cross-sectional or cohort studies. To illustrate the practical usefulness of the interval estimators of OR_G and RD_G developed here, this paper uses data from a case-control study investigating the effect of the number of beverages drunk at burning-hot temperature on the risk of developing esophageal cancer, and data from a cross-sectional study comparing the grade distributions of unaided distance vision between two eyes. Finally, this paper notes that collapsing the ordinal exposure into two categories (exposure versus non-exposure) and applying the commonly used odds-ratio statistics for dichotomous data tends to be less efficient than using the statistics related to OR_G proposed herein.
- Published
- 2002
- Full Text
- View/download PDF
38. On boundary solutions and identifiability in categorical regression with non-ignorable non-response
- Author
-
Paul Clarke
- Subjects
Statistics and Probability ,Estimation theory ,Boundary (topology) ,Regression analysis ,General Medicine ,Logistic regression ,Regression ,Econometrics ,Identifiability ,Applied mathematics ,Statistics, Probability and Uncertainty ,Categorical variable ,Variable (mathematics) ,Mathematics - Abstract
This paper considers the regression analysis of categorical variables when the response variable is incompletely observed and the non-response mechanism is assumed to be non-ignorable. Maximum likelihood estimation of the model parameters can lead to substantively implausible boundary solutions where the estimated proportion of non-respondents taking certain values of the response variable is zero. A geometric explanation of why boundary solutions occur was given in a previous paper for a simple model. By extending this explanation, it is possible to define the sub-class of non-ignorable models whose parameters are identified, and to show all models not in this sub-class are non-identified. The conditions under which a model is a member of this class are easily established.
- Published
- 2002
- Full Text
- View/download PDF
39. Survival Tests for r Groups
- Author
-
Emilio Letón and Pilar Zuluaga
- Subjects
Statistics and Probability ,Score test ,Survival function ,Test score ,Multiple comparisons problem ,Statistics ,General Medicine ,Statistics, Probability and Uncertainty ,Matrix form ,Censoring (statistics) ,Mathematics ,Weighting - Abstract
In this paper we give the generalization of the score tests covering the case of ties, and we give examples where the expressions in matrix form are completely specified for the weighted tests and the score tests for the case of r groups. It is worth mentioning that although the score tests are not generally included in commercial software, these tests should be used if it can be assumed that the censoring mechanism is equal in the r groups or if there is no censoring (Lawless, 1982). We establish the equivalence between the "numerators" of these families of tests. As a result of this equivalence we define four new tests that complete the classification of score and weighted tests. The Kruskal-Wallis test (1952) appears as a particular case of the score tests in the case of no censoring. A simulation study has been done in order to compare the performance of the tests described in this paper. An example is included to make the paper easier to understand.
- Published
- 2002
- Full Text
- View/download PDF
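The abstract above notes that the Kruskal-Wallis test is the special case of the score tests when there is no censoring. That baseline case is directly available in SciPy (assuming SciPy is installed); the score and weighted tests for censored data themselves are not reproduced here, and the group data are illustrative.

```python
from scipy.stats import kruskal

# Uncensored survival times in r = 3 groups (illustrative data):
# with no censoring, the score-test family reduces to Kruskal-Wallis.
g1 = [4.1, 5.3, 6.0, 7.2, 8.5]
g2 = [3.0, 3.8, 4.4, 5.1, 6.6]
g3 = [6.9, 7.7, 8.8, 9.4, 10.2]

stat, p = kruskal(g1, g2, g3)
print(f"Kruskal-Wallis H = {stat:.3f}, p = {p:.4f}")
```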
40. Analysis of Longitudinal Data of Epileptic Seizure Counts – A Two‐State Hidden Markov Regression Approach
- Author
-
Martin L. Puterman and Peiming Wang
- Subjects
Statistics and Probability ,Markov chain ,Variable-order Markov model ,Markov process ,Markov chain Monte Carlo ,General Medicine ,Markov model ,Continuous-time Markov chain ,symbols.namesake ,Statistics ,symbols ,Markov property ,Hidden semi-Markov model ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
This paper discusses a two-state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows for the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two-state Markov chain with its transition probabilities associated with covariates through a logit link function. This paper also considers a two-state hidden Markov negative binomial regression (MNBR) model, as an alternative, by using the negative binomial instead of Poisson distribution in the proposed MPR model when there exists extra-Poisson variation conditional on the states of the Markov chain. The two proposed models in this paper relax the stationary requirement of the Markov chain, allow for over-dispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi-Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
- Published
- 2001
- Full Text
- View/download PDF
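As a companion to the entry above, here is a minimal sketch of the scaled forward-algorithm log-likelihood for a two-state Poisson hidden Markov model without covariates. The exponential and logit covariate links of the MPR model, the negative binomial variant, and the EM/quasi-Newton estimation are all omitted; the counts and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def hmm_poisson_loglik(counts, lam, trans, init):
    """Log-likelihood of a two-state Poisson HMM via the scaled forward
    algorithm.  lam: state-specific Poisson rates (2,), trans: 2x2 transition
    matrix, init: initial state probabilities (2,).  Covariate links used in
    the paper's MPR/MNBR models are not implemented here."""
    counts = np.asarray(counts)
    emis = poisson.pmf(counts[:, None], np.asarray(lam)[None, :])  # (T, 2)
    alpha = np.asarray(init) * emis[0]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            alpha = (alpha @ trans) * emis[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik

counts = [0, 2, 1, 7, 9, 6, 1, 0, 2, 8]        # illustrative seizure counts
print(hmm_poisson_loglik(counts, lam=[1.0, 7.0],
                         trans=np.array([[0.9, 0.1], [0.2, 0.8]]),
                         init=[0.5, 0.5]))
```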
41. Interval Estimation of Simple Difference in Dichotomous Data with Repeated Measurements
- Author
-
Kung-Jong Lui
- Subjects
Statistics and Probability ,Sample size determination ,Interval estimation ,Monte Carlo method ,Statistics ,Estimator ,Probability distribution ,General Medicine ,Interval (mathematics) ,Statistics, Probability and Uncertainty ,Outcome (probability) ,Confidence interval ,Mathematics - Abstract
When comparing two treatments, we often use the simple difference between the probabilities of response to measure the efficacy of one treatment over the other. When the measurement of outcome is unreliable or the cost of obtaining additional subjects is high relative to that of additional measurements from the obtained subjects, we may consider taking more than one measurement per subject to increase the precision of an interval estimator. This paper focuses on interval estimation of the simple difference when we take repeated measurements per subject. This paper develops four asymptotic interval estimators of the simple difference for any finite number of measurements per subject. This paper further applies Monte Carlo simulation to evaluate the finite-sample performance of these estimators in a variety of situations. Finally, this paper includes a discussion of sample size determination on the basis of both the average length and the probability of controlling the length of the resulting interval estimate proposed elsewhere.
- Published
- 2001
- Full Text
- View/download PDF
42. An Efficient Alternative to Average Ranks for Testing with Incomplete Ranking Data
- Author
-
Dong Hoon Lim and Douglas A. Wolfe
- Subjects
Statistics and Probability ,Mathematical optimization ,Estimation theory ,Computation ,General Medicine ,Maximization ,Missing data ,computer.software_genre ,Ranking ,Complete information ,Minification ,Data mining ,Statistics, Probability and Uncertainty ,computer ,Statistic ,Mathematics - Abstract
In this paper we consider the setting where a group of n judges are to independently rank a series of k objects, but the intended complete rankings are not realized and we are faced with analyzing randomly incomplete ranking vectors. In this paper we propose a new testing procedure for dealing with such data realizations. We concentrate on the problem of testing for no differences in the objects being ranked (i.e., they are indistinguishable) against general alternatives, but our approach could easily be extended to restricted (e.g., ordered or umbrella) alternatives. Using an improvement of a preliminary screening approach previously proposed by the authors, we present an algorithm for computation of the relevant Friedman-type statistic in the general alternatives setting and present the results of an extensive simulation study comparing the new procedure with the standard approach of imputing average within-judge ranks to the unranked objects.
- Published
- 2001
- Full Text
- View/download PDF
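For context on the entry above, the complete-ranking baseline that the proposed procedure generalizes is the ordinary Friedman test, available in SciPy (assuming SciPy is installed). The paper's algorithm for randomly incomplete rankings is not reproduced; the rankings below are illustrative.

```python
from scipy.stats import friedmanchisquare

# Complete rankings of k = 4 objects by n = 5 judges (illustrative);
# each argument holds one object's ranks across the judges.  The paper's
# method addresses the harder case where some ranks are missing.
obj_a = [1, 2, 1, 1, 2]
obj_b = [2, 1, 3, 2, 1]
obj_c = [3, 4, 2, 4, 3]
obj_d = [4, 3, 4, 3, 4]

stat, p = friedmanchisquare(obj_a, obj_b, obj_c, obj_d)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```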
43. A Note on Interval Estimation of the Simple Difference in Data with Correlated Matched Pairs
- Author
-
Kung-Jong Lui
- Subjects
Statistics and Probability ,Estimation theory ,Intraclass correlation ,Interclass correlation ,Interval estimation ,Statistics ,Coverage probability ,Estimator ,Multinomial distribution ,Cluster sampling ,General Medicine ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
When we employ cluster sampling to collect data with matched pairs, the assumption of independence between all matched pairs is not likely to be true. This paper notes that applying interval estimators that do not account for the intraclass correlation between matched pairs to estimate the simple difference between two proportions of response can be quite misleading, especially when both the number of matched pairs per cluster and the intraclass correlation between matched pairs within clusters are large. This paper develops two asymptotic interval estimators of the simple difference that accommodate data from cluster sampling with correlated matched pairs. This paper further applies Monte Carlo simulation to compare the finite-sample performance of these estimators and demonstrates that the interval estimator derived from a quadratic equation proposed here can actually perform quite well in a variety of situations.
- Published
- 2001
- Full Text
- View/download PDF
44. Undominated p-Values and Property C for Unconditional One-Sided Two-Sample Binomial Tests
- Author
-
H. Frick
- Subjects
Statistics and Probability ,Discrete mathematics ,Property (philosophy) ,Binomial (polynomial) ,Context (language use) ,General Medicine ,Type (model theory) ,Binomial distribution ,Calculus ,Statistics, Probability and Uncertainty ,Statistical theory ,Statistic ,Statistical hypothesis testing ,Mathematics - Abstract
In a recent paper ROHMEL and MANSMANN (1999) discussed p-values for unconditional two-sample binomial tests of the one-sided type. Since uniformly smallest p-values do not exist, they considered undominated or, in their notation, acceptable p-values. Rohmel and Mansmann showed that any p-value is dominated by a p-value induced by an appropriate statistic. For investigating acceptable p-values it is therefore sufficient to consider the set of statistics. In this paper necessary and sufficient conditions and construction methods for statistics inducing acceptable p-values are discussed. The concept of property C of BARNARD (1947) is examined in this context. For the classical null-hypothesis p2 < p1 it turns out that property C and acceptability are mutually exclusive criteria. In order to reconcile both ideas, C-acceptable p-values are introduced, i.e. p-values which are undominated in the set of all p-values with property C. Barnard's celebrated unconditional test is shown to be of this type. A necessary and sufficient condition for statistics to induce C-acceptable p-values for the classical null-hypothesis is formulated. Furthermore some numerical consequences of property C are discussed. Finally the p-value π_R induced by Rohmel and Mansmann's recently developed statistic is investigated. It is proven that π_R is C-acceptable for all null-hypotheses.
- Published
- 2000
- Full Text
- View/download PDF
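Barnard's unconditional test, mentioned in the abstract above, is available in SciPy (as scipy.stats.barnard_exact, present in SciPy 1.7 and later; the version requirement is an assumption worth checking). The acceptable and C-acceptable p-value constructions analysed in the paper are theoretical refinements and are not implemented here; the 2x2 counts are illustrative.

```python
from scipy.stats import barnard_exact   # available in SciPy >= 1.7

# One-sided unconditional test for a 2x2 table
# [[successes1, failures1], [successes2, failures2]] (illustrative counts).
table = [[7, 3], [2, 8]]
res = barnard_exact(table, alternative="greater")
print(res.statistic, res.pvalue)
```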
45. A Note on the Log-Rank Test in Life Table Analysis with Correlated Observations
- Author
-
Kung-Jong Lui
- Subjects
Statistics and Probability ,Intraclass correlation ,General Medicine ,Test (assessment) ,Log-rank test ,Correlation ,Statistics ,Econometrics ,Multinomial distribution ,Statistics, Probability and Uncertainty ,Special case ,Statistical hypothesis testing ,Type I and type II errors ,Mathematics - Abstract
Survival data consisting of independent sets of correlated failure times may arise in many situations. For example, we may take repeated observations of the failure time of interest from each patient, observations of the failure time on siblings, or the failure times on littermates in toxicological experiments. Because failure times taken on the same patient, on related family members, or from the same litter are likely correlated, use of the classical log-rank test in these situations can be quite misleading with respect to Type I error. To avoid this concern, this paper develops two closed-form asymptotic summary tests that account for the intraclass correlation between the failure times within patients or units. In fact, one of these two tests includes the classical log-rank test as a special case when the intraclass correlation equals 0. Furthermore, to evaluate the finite-sample performance of the two tests developed here, this paper applies Monte Carlo simulation and notes that they can actually perform quite well in a variety of situations considered here.
- Published
- 2000
- Full Text
- View/download PDF
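The baseline that the entry above generalizes is the classical log-rank test for independent observations. A sketch using the third-party lifelines package follows (its availability is an assumption); the correlation-adjusted tests proposed in the paper are not implemented, and the survival times are illustrative.

```python
from lifelines.statistics import logrank_test   # assumes lifelines is installed

# Classical two-group log-rank test for independent observations.
# Illustrative survival times (months) and event indicators (1 = failure).
t_a = [6, 7, 10, 15, 19, 25]; e_a = [1, 0, 1, 1, 0, 1]
t_b = [5, 8, 9, 12, 14, 17];  e_b = [1, 1, 1, 1, 0, 1]

res = logrank_test(t_a, t_b, event_observed_A=e_a, event_observed_B=e_b)
print(res.test_statistic, res.p_value)
```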
46. Symbols and Terminology in Biometry
- Author
-
David J. Finney
- Subjects
Statistics and Probability ,Vocabulary ,Standardization ,Statement (logic) ,Interpretation (philosophy) ,media_common.quotation_subject ,General Medicine ,Literacy ,Terminology ,law.invention ,Epistemology ,Symbol ,law ,CLARITY ,Statistics, Probability and Uncertainty ,Algorithm ,Mathematics ,media_common - Abstract
The many biologists whose work requires statistical science must be concerned for sound management and interpretation of quantitative data. Biometricians also need increased care for literacy and clarity in speech and writing, with precise phrasing for every numerical statement. This paper illustrates common confusions that arise from inexact terminology, or from words and symbols used without adequate definition. Careless statements on quantitative relations, or on probabilities, may be ambiguous; bad practices seriously pollute scientific journals and obstruct the transmission of information. Such faults can affect daily life for a modern citizen. Pedantry is unwanted, and to be dogmatic about corrective measures would be stupid. This paper suggests that biologists and biometricians should examine the practicability of a system, yet to be devised, for standardizing the use of symbols and the generally accepted terminology for the methods, techniques, and processes of statistical analysis. The outcome should influence all that we biometricians say and do - as authors, as consultants, and as referees for journals.
- Published
- 2000
- Full Text
- View/download PDF
47. Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection
- Author
-
Kung-Jong Lui
- Subjects
Statistics and Probability ,Fieller's theorem ,Secondary infection ,Interval estimation ,Coverage probability ,Estimator ,General Medicine ,Wald test ,Confidence interval ,Statistics ,Econometrics ,Tolerance interval ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
This paper discusses interval estimation of the simple difference (SD) between the proportions of the primary infection and the secondary infection, given the primary infection, by developing three asymptotic interval estimators using Wald's test statistic, the likelihood-ratio test, and the basic principle of Fieller's theorem. This paper further evaluates and compares the performance of these interval estimators with respect to the coverage probability and the expected length of the resulting confidence intervals. This paper finds that the asymptotic confidence interval using the likelihood ratio test consistently performs well in all situations considered here. When the underlying SD is within 0.10 and the total number of subjects is not large (say, 50), this paper further finds that the interval estimators using Fieller's theorem would be preferable to the estimator using Wald's test statistic if the primary infection probability were moderate (say, 0.30), but the latter is preferable to the former if this probability were large (say, 0.80). When the total number of subjects is large (say, 200), all three interval estimators perform well in almost all situations considered in this paper. In these cases, for simplicity, we may apply either of the two interval estimators using Wald's test statistic or Fieller's theorem without losing much accuracy or efficiency as compared with the interval estimator using the asymptotic likelihood ratio test.
- Published
- 2000
- Full Text
- View/download PDF
48. A Simulation Study of a Partitioning Procedure for Signal Detection with an Application in Medical Imaging
- Author
-
Pinyuen Chen and Michael C. Wicks
- Subjects
Statistics and Probability ,education.field_of_study ,Covariance matrix ,Population ,Identity matrix ,Multivariate normal distribution ,General Medicine ,Covariance ,Statistical power ,Complex normal distribution ,Statistics ,Applied mathematics ,Detection theory ,Statistics, Probability and Uncertainty ,education ,Mathematics - Abstract
We study the performance of a screening procedure R of CHEN, MELVIN, and WICKS (1999) when it is adopted prior to the signal detection procedure R_kkr proposed independently by KELLY (1986) and KHATRI and RAO (1987). Through simulation results, we show that the probability of detection for R_kkr is significantly improved when procedure R is first used to screen out nonhomogeneous data. Procedure R is a selection procedure which compares k (≥1) experimental populations with a control population and eliminates the dissimilar experimental populations. An experimental population with covariance matrix Σ is said to be similar to the control population with covariance matrix Σ₀ if Σ₀Σ⁻¹ is close to the identity matrix in a sense of closeness defined later in the paper. As mentioned in CHEN et al. (1999), procedure R can be used as a screening process prior to any traditional signal processing detection algorithm which requires the experimental populations to have the same covariance matrix as the control population. A commonly used such detection algorithm is the one proposed independently by KELLY (1986) and KHATRI and RAO (1987). In this paper, we first simulate data from k + 1 complex multivariate normal populations which all have zero mean vectors and have different covariance matrices. One of the populations represents the control population and the remaining populations are the experimental populations. Then we apply procedure R to the simulated experimental populations to screen out the dissimilar populations. Finally, R_kkr is used to detect a target using, respectively, the unscreened and the screened data. We present simulation results on the power of the test R_kkr when it is applied to the unscreened data and the screened data. The results illustrate that, under the nonhomogeneous environment where the covariance matrices of the experimental populations differ from the covariance matrix of the control population, we can always improve the power of the test R_kkr by employing the procedure R. We also present an example illustrating the potential application of our study in medical imaging.
- Published
- 2000
- Full Text
- View/download PDF
49. Randomization of Neighbour Balanced Designs
- Author
-
Joachim Kunert
- Subjects
Statistics and Probability ,Statistics ,General Medicine ,Statistics, Probability and Uncertainty ,Statistical hypothesis testing ,Mathematics ,Block design - Abstract
This paper is a simulation study of the influence of interference between treatments in field trials. The considerations in the paper are based on a simple model which includes additive neighbour effects of treatments. We use uniformity data to which neighbour effects are added to demonstrate the influence that these effects have on the validity of comparisons between treatments. The simulations illustrate that the influence of the neighbour effects is reduced if a neighbour balanced or a partially neighbour balanced design is used.
- Published
- 2000
- Full Text
- View/download PDF
50. Testing Hypotheses about Regression Parameters When the Error Term Is Heteroscedastic
- Author
-
Rand R. Wilcox
- Subjects
Statistics and Probability ,Heteroscedasticity ,F-test ,Ordinary least squares ,Linear regression ,Statistics ,Robust statistics ,Linear model ,Estimator ,Regression analysis ,General Medicine ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
The paper considers methods for testing H0: β1 = ... = βp = 0, where β1, ..., βp are the slope parameters in a linear regression model, with an emphasis on p = 2. It is known that even when the usual error term is normal but heteroscedastic, control over the probability of a Type I error can be poor when using the conventional F test in conjunction with the least squares estimator. When the error term is nonnormal, the situation gets worse. Another practical problem is that power can be poor under even slight departures from normality. LIU and SINGH (1997) describe a general bootstrap method for making inferences about parameters in a multivariate setting that is based on the general notion of depth. This paper studies the small-sample properties of their method when applied to the problem at hand. It is found that there is a practical advantage to using Tukey's depth versus the Mahalanobis depth when using a particular robust estimator. When using the ordinary least squares estimator, the method improves upon the conventional F test, but practical problems remain when the sample size is less than 60. In simulations, using Tukey's depth with the robust estimator gave the best results, in terms of Type I errors, among the five methods studied.
- Published
- 1999
- Full Text
- View/download PDF
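As a companion to the entry above, here is a simplified pairs-bootstrap sketch that tests H0: β1 = ... = βp = 0 using the Mahalanobis distance of the origin within the bootstrap cloud of OLS slope estimates. This is a stand-in for the Liu-Singh depth-based procedure, not the paper's exact method (which favours Tukey's depth and a robust estimator); the function name, data, and settings are illustrative.

```python
import numpy as np

def bootstrap_slope_test(X, y, n_boot=999, seed=0):
    """Pairs-bootstrap test of H0: all slopes = 0 for OLS, using the
    Mahalanobis distance of 0 within the bootstrap cloud of slope estimates.
    A simplified sketch of a depth-style bootstrap test, not the paper's
    Tukey-depth / robust-estimator procedure."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.column_stack([np.ones(n), X])          # design with intercept

    def slopes(idx):
        beta, *_ = np.linalg.lstsq(Z[idx], y[idx], rcond=None)
        return beta[1:]                           # drop the intercept

    boot = np.array([slopes(rng.integers(0, n, n)) for _ in range(n_boot)])
    center = boot.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(boot, rowvar=False))

    def maha(b):
        d = b - center
        return d @ cov_inv @ d

    d0 = maha(np.zeros(boot.shape[1]))
    # p-value: share of bootstrap slope vectors at least as far out as 0
    return np.mean([maha(b) >= d0 for b in boot])

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = 0.5 * X[:, 0] + rng.normal(size=60) * (1 + np.abs(X[:, 1]))  # heteroscedastic errors
print(bootstrap_slope_test(X, y))
```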