1,798 results
Search Results
2. Identifying key Statistical papers from 1985 to 2002 using citation data for applied biostatisticians
- Author
-
Schell, Michael J.
- Subjects
Variables (Mathematics) -- Usage ,Fractions -- Analysis ,Statistical sampling -- Usage ,Science and technology ,Social sciences - Abstract
Dissemination of ideas from theory to practice is a significant challenge in statistics. Quick identification of articles useful to practitioners would greatly assist in this dissemination, thereby improving science. This article uses the citation count history of articles to identify key papers from 1985 to 2002 from 12 statistics journals for applied biostatisticians. One feature requiring attention in order to appropriately rank an article's impact is assessment of the citation accrual patterns over time. Citation counts in statistics differ dramatically from fields such as medicine. In statistics, most articles receive few citations, with 15-year-old articles from five key journals receiving a median of 13 citations compared to 66 in the Journal of Clinical Oncology. However, statistics articles in the top 2%-3% continue to gain citations at a high rate past 15 years, exceeding those in JCO, whose counts slow dramatically around 8 years past publication. Articles with the highest expected applied use 20 years post publication were identified using joinpoint regression. In this evaluation, the fraction of citations that represent applied use was defined and estimated. The false discovery rate, quantification of heterogeneity in meta-analysis, and generalized estimating equations rank as the ideas with the greatest estimated applied impact. KEY WORDS: Applied fraction; Citation count; Statistical practice.
- Published
- 2010
3. Einstein's first published paper
- Author
-
Iglewicz, Boris
- Subjects
Physicists -- Works ,Science and technology ,Social sciences - Abstract
This article reviews Albert Einstein's first published paper, submitted for publication in 1900. At that time, Einstein was 21 and a recent college graduate. His paper uses modeling and least squares to analyze data in support of a scientific proposition. Einstein is shown to be well trained, for his day, in using statistics as a tool in his scientific research. This paper also shows his ability to make trivial arithmetic mistakes and some clumsiness in data recording. A major aim of this article is to help provide a better appreciation of Einstein as an active user of statistical arguments in this and other of his important publications. KEY WORDS: Floor function; Least squares; Molecules; Regression; Rounding errors; Scientific method.
- Published
- 2007
4. Rethinking the paper helicopter: combining statistical and engineering knowledge
- Author
-
Annis, David H.
- Subjects
Helicopters -- Design and construction ,Helicopters -- Study and teaching ,Engineering mathematics -- Study and teaching ,Engineering students -- Education ,Science and technology ,Social sciences - Abstract
Box's paper helicopter has been used to teach experimental design for more than a decade. It is simple, inexpensive, and provides real data for an involved, multifactor experiment. Unfortunately it can also further an all-too-common practice that Professor Box himself has repeatedly cautioned against, namely ignoring the fundamental science while rushing to solve problems that may not be sufficiently understood. Often this slighting of the science so as to get on with the statistics is justified by referring to Box's oft-quoted maxim that 'All models are wrong, but some are useful.' Nevertheless, what is equally true, to paraphrase both Professor Box and George Orwell, is that 'All models are wrong, but some are more wrong than others.' To experiment effectively it is necessary to understand the relevant science so as to distinguish between what is usefully wrong, and what is dangerously wrong. This article presents an improved analysis of Box's helicopter problem relying on statistical and engineering knowledge and shows that this leads to an enhanced paper helicopter, requiring fewer experimental trials and achieving superior performance. In fact, of the 20 experimental trials run for validation--10 each of the proposed aerodynamic design and the conventional full factorial optimum--the longest 10 flight times all belong to the aerodynamic optimum, while the shortest 10 all belong to the conventional full factorial optimum. I further discuss how ancillary engineering knowledge can be incorporated into thinking about--and teaching--experimental design. KEY WORDS: Aerodynamics; Engineering education; Experimental design; Factorial; Nonlinear; Quadratic.
- Published
- 2005
5. Tukey's paper after 40 years
- Author
-
Mallows, Colin, Brillinger, David R., Buja, Andreas, Efron, Bradley, Huber, Peter J., and Landwehr, James M.
- Subjects
Statisticians -- Works ,Statistics -- Analysis ,Engineering and manufacturing industries ,Mathematics ,Science and technology - Abstract
A brief overview of Tukey's paper 'The Future of Data Analysis,' 40 years after its publication, is given by several authors, who consider the debates over whether statistics is a science and ways to attract bright students by conveying the excitement and rewards of applied work. It is argued that, beyond the ideas of data analysis, one should look at how data are analyzed by others, and that the key concept should be statistical thinking.
- Published
- 2006
6. Statistics, Probability and Game Theory: Papers in Honor of David Blackwell
- Author
-
Ferguson, T. S., Shapley, L. S., and MacQueen, J. B.
- Subjects
Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (Book) ,Books -- Book reviews ,Mathematics - Abstract
T. S. FERGUSON, L. S. SHAPLEY, and J. B. MACQUEEN (Eds.). Hayward, CA: Institute of Mathematical Statistics, 1996. ISBN 0-940600-42-0. xiv + 407 PP. $32 (P). This book is a [...]
- Published
- 1999
7. The Science of Bradley Efron, Selected Papers
- Author
-
Ahmed, S. Ejaz
- Subjects
The Science of Bradley Efron, Selected Papers (Nonfiction work) -- Book reviews ,Books -- Book reviews ,Engineering and manufacturing industries ,Mathematics ,Science and technology - Published
- 2009
8. Selected Papers of Frederick Mosteller
- Author
-
Bradstreet, Thomas E.
- Subjects
Selected Papers of Frederick Mosteller (Book) -- Book reviews ,Books -- Book reviews ,Mathematics - Published
- 2008
9. Celebrating Statistics: Papers in Honour of Sir David Cox on His 80th Birthday
- Author
-
Ghosh, Subir
- Subjects
Celebrating Statistics: Papers in Honour of Sir David Cox on His 80th Birthday (Book) -- Book reviews ,Books -- Book reviews ,Engineering and manufacturing industries ,Mathematics ,Science and technology - Published
- 2007
10. Selected Papers of Frederick Mosteller
- Subjects
Selected Papers of Frederick Mosteller (Book) -- Book reviews ,Books -- Book reviews ,Engineering and manufacturing industries ,Mathematics ,Science and technology - Published
- 2007
11. Essays in Econometrics: Collected Papers of Clive W. J. Granger (Vols. 1 and 2).*
- Author
-
Embrechts, Paul A.L.
- Subjects
Essays in Econometrics: Collected Papers of Clive W. J. Granger (Vols. 1 and 2).* (Book) -- Book reviews ,Books -- Book reviews ,Mathematics - Abstract
Essays in Econometrics: Collected Papers of Clive W. J. Granger (Vols. 1 and 2).* Eric GHYSELS, Norman R. SWANSON, and Mark W. WATSON (Eds.). New York: Cambridge University Press, 2001. [...]
- Published
- 2004
12. Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics
- Author
-
Kafadar, Karen
- Subjects
Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (Book) -- Book reviews ,Books -- Book reviews ,Mathematics - Abstract
This festschrift in honor of Professor Le Cam's 70th birthday attests to the wide-ranging impact of his work in probability theory, specifically on convergence and contiguity of probability measures, superefficiency, [...]
- Published
- 1997
13. Celebrating Statistics: Papers in Honour of Sir David Cox on His 80th Birthday
- Author
-
Hall, Daniel B.
- Subjects
Oxford University Press (Oxford, England) ,Book publishing -- Statistics ,Mathematics - Published
- 2006
14. Annis, D. H. (2005), 'Rethinking the Paper Helicopter: Combining Statistical and Engineering Knowledge,' The American Statistician, 59, 320-326: comment by Pardo and reply
- Subjects
Science and technology ,Social sciences - Published
- 2006
15. E.T. Jaynes; papers on probability, statistics, and statistical physics
- Author
-
Rodriguez, Carlos C.
- Subjects
E.T. Jaynes; Papers on Probability, Statistics, and Statistical Physics (Book) -- Book reviews ,Books -- Book reviews ,Mathematics - Published
- 1985
16. Robust data-driven inference for density-weighted average derivatives
- Author
-
Cattaneo, Matias D., Crump, Richard K., and Jansson, Michael
- Subjects
Derivatives (Financial instruments) -- Forecasts and trends ,Robust statistics -- Usage ,Market trend/market analysis ,Mathematics - Abstract
This paper presents a novel data-driven bandwidth selector compatible with the small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for density-weighted average derivatives. The new bandwidth selector is of the plug-in variety, and is obtained based on a mean squared error expansion of the estimator of interest. An extensive Monte Carlo experiment shows a remarkable improvement in performance when the bandwidth-dependent robust inference procedures proposed by Cattaneo, Crump, and Jansson (2009) are coupled with this new data-driven bandwidth selector. The resulting robust data-driven confidence intervals compare favorably to the alternative procedures available in the literature. The online supplemental material to this paper contains further results from the simulation study. KEY WORDS: Averaged derivative; Bandwidth selection; Robust inference; Small bandwidth asymptotics.
- Published
- 2010
17. Empirical likelihood in missing data problems
- Author
-
Qin, Jing, Zhang, Biao, and Leung, Denis H.Y.
- Subjects
Estimation theory -- Usage ,Combinatorial probabilities -- Usage ,Geometric probabilities -- Usage ,Probabilities -- Usage ,Mathematics - Abstract
Missing data is a ubiquitous problem in medical and social sciences. It is well known that inferences based only on the complete data may not only lose efficiency, but may also lead to biased results if the data is not missing completely at random (MCAR). The inverse-probability weighting method proposed by Horvitz and Thompson (1952) is a popular alternative when the data is not MCAR. The Horvitz-Thompson method, however, is sensitive to the inverse weights and may suffer from loss of efficiency. In this paper, we propose a unified empirical likelihood approach to missing data problems and explore the use of empirical likelihood to effectively combine unbiased estimating equations when the number of estimating equations is greater than the number of unknown parameters. One important feature of this approach is the separation of the complete data unbiased estimating equations from the incomplete data unbiased estimating equations. The proposed method can achieve semiparametric efficiency if the probability of missingness is correctly specified. Simulation results show that the proposed method has better finite sample performance than its competitors. Supplemental materials for this paper, including proofs of the main theoretical results and the R code used for the NHANES example, are available online on the journal website. KEY WORDS: Empirical likelihood; Estimating functions; Missing data; Surrogate end point.
- Published
- 2009
18. Learn from thy neighbor: parallel-chain and regional adaptive MCMC
- Author
-
Craiu, Radu V., Rosenthal, Jeffrey, and Yang, Chao
- Subjects
Markov processes -- Usage ,Algorithms -- Usage ,Monte Carlo method -- Usage ,Algorithm ,Mathematics - Abstract
Starting with the seminal paper of Haario, Saksman, and Tamminen (Haario, Saksman, and Tamminen 2001), a substantial amount of work has been done to validate adaptive Markov chain Monte Carlo algorithms. In this paper we focus on two practical aspects of adaptive Metropolis samplers. First, we draw attention to the deficient performance of standard adaptation when the target distribution is multimodal. We propose a parallel chain adaptation strategy that incorporates multiple Markov chains which are run in parallel. Second, we note that the current adaptive MCMC paradigm implicitly assumes that the adaptation is uniformly efficient on all regions of the state space. However, in many practical instances, different 'optimal' kernels are needed in different regions of the state space. We propose here a regional adaptation algorithm in which we account for possible errors made in defining the adaptation regions. This corresponds to the more realistic case in which one does not know exactly the optimal regions for adaptation. The methods focus on the random walk Metropolis sampling algorithm but their scope is much wider. We provide theoretical justification for the two adaptive approaches using the existing theory built for adaptive Markov chain Monte Carlo. We illustrate the performance of the methods using simulations and analyze a mixture model for real data using an algorithm that combines the two approaches. KEY WORDS: Adaptive Markov chain Monte Carlo; Metropolis sampling; Parallel chains; Random walk Metropolis sampling; Regional adaptation.
- Published
- 2009
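A minimal sketch of the plain Haario-style adaptive random walk Metropolis sampler that the entry above builds on (not the parallel-chain or regional algorithms proposed in the paper). The target, tuning constants, and dimension are illustrative assumptions.

import numpy as np

def adaptive_rw_metropolis(log_target, x0, n_iter=5000, adapt_start=500, eps=1e-6, seed=None):
    # Adaptive random walk Metropolis (illustrative sketch): the Gaussian proposal
    # covariance is periodically rebuilt from the chain's own history.
    # Assumes a target of dimension d >= 2.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    d = len(x)
    chain = np.zeros((n_iter, d))
    lp = log_target(x)
    cov = np.eye(d)                       # initial proposal covariance
    scale = 2.38 ** 2 / d                 # classical random walk scaling
    for t in range(n_iter):
        prop = rng.multivariate_normal(x, scale * cov + eps * np.eye(d))
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[t] = x
        if t >= adapt_start:                           # adapt from accumulated history
            cov = np.cov(chain[: t + 1].T) + eps * np.eye(d)
    return chain

# Example: sample from a standard bivariate normal target.
chain = adaptive_rw_metropolis(lambda z: -0.5 * np.sum(z ** 2), np.zeros(2), seed=0)
print(chain.mean(axis=0))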
19. The use of statistics in medical research: a comparison of The New England Journal of Medicine and Nature Medicine
- Author
-
Strasak, Alexander M., Zaman, Qamruz, Marinell, Gerhard, Pfeiffer, Karl P., and Ulmer, Hanno
- Subjects
Medical research -- Analysis ,Medicine, Experimental -- Analysis ,Statistical methods -- Usage ,Chaos theory -- Analysis ,Error analysis (Mathematics) -- Usage ,Medical journals -- Technology application ,Technology application ,Science and technology ,Social sciences - Abstract
There is widespread evidence of the extensive use of statistical methods in medical research. Just the same, standards are generally low and a growing body of literature points to statistical errors in most medical journals. However, there is no comprehensive study contrasting the top medical journals of basic and clinical science for recent practice in their use of statistics. All original research articles in Volume 10, Numbers 1-6 of Nature Medicine (Nat Med) and Volume 350, Numbers 1-26 of The New England Journal of Medicine (NEJM) were screened for their statistical content. Types, frequencies, and complexity of applied statistical methods were systematically recorded. A 46-item checklist was used to evaluate statistical quality for a subgroup of papers. 94.5 percent (95% CI 87.6-98.2) of NEJM articles and 82.4 percent (95% CI 65.5-93.2) of Nat Med articles contained inferential statistics. NEJM papers were significantly more likely to use advanced statistical methods (p < 0.0001). Statistical errors were identified in a considerable proportion of articles, although not always serious in nature. Documentation of applied statistical methods was generally poor and insufficient, particularly in Nat Med. Compared to 1983, a vast increase in usage and complexity of statistical methods could be observed for NEJM papers. This does not necessarily hold true for Nat Med papers, as the results of the study indicate that basic science sticks with basic analysis. As statistical errors seem to remain common in medical literature, closer attention to statistical methodology should be seriously considered to raise standards. KEY WORDS: Complexity; Errors and shortcomings; Statistical methods in medical journals; Techniques.
- Published
- 2007
20. Statistical research: some advice for beginners
- Author
-
Hamada, Michael and Sitter, Randy
- Subjects
Statistics -- Methods ,Statistics -- Usage ,Statistics -- Research ,Science and technology ,Social sciences - Abstract
Editor's Note: Research is essential to the health and growth of the statistics discipline. The following article discusses some basic strategies for doing and presenting research based on the authors' experience and conversations with other statisticians. The August 2004 issue of The American Statistician will feature a discussion on the topic 'How to do Statistical Research.' All readers are invited to contribute to this special section. Discussion about this article or general perspectives on being a researcher in the discipline of statistics are welcome. Because of space limitations, we ask that your contribution not exceed 500 words. Articles received by the TAS editorial office (tas@bgnet.bgsu.edu) by June 4, 2004, will be considered for publication.--James Albert, Editor, The American Statistician For new graduate students, we discuss issues and aspects of doing statistical research and provide advice. We answer questions that we had when we were beginners, like 'When do I start?', 'How do I start?', 'How do I find out what has already been done?', 'How do I make progress?', 'How do I finish?', and 'What else can I do?'. KEY WORDS: Finding problems; Identifying literature; Presenting; Reading papers; Writing., 1. INTRODUCTION In an academic environment, where most researchers start, it is easy for the beginner to focus too narrowly on a thesis, a paper in a journal, and/or a [...]
- Published
- 2004
21. Fast estimators of the jackknife
- Author
-
Buzas, J.S.
- Subjects
Curves, Algebraic -- Research ,Error analysis -- Research ,Statistical sampling -- Research ,Science and technology ,Social sciences - Abstract
The jackknife is a reliable method for estimating standard error nonparametrically. The method is easy to use, but is computationally intensive. The time required to compute the jackknife standard error for an estimator $\hat{\theta}$ will depend on the time required to compute $\hat{\theta}$ itself. For some estimators the required time is prohibitive. This may be especially true in simulation studies where $\hat{\theta}$ and its standard error are computed for a large number of datasets. Let $X_1, X_2, \ldots, X_N$ be a random sample and $\hat{\theta}_{(i)}$ the estimator computed with $X_i$ removed. Then $\widehat{\mathrm{se}}_J = \{\frac{N-1}{N} \sum_{i=1}^{N} (\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)})^2\}^{1/2}$ is the jackknife estimator of the variability of $\hat{\theta}$, where $\bar{\theta}_{(\cdot)} = N^{-1} \sum_{i=1}^{N} \hat{\theta}_{(i)}$. In this paper estimators of $\widehat{\mathrm{se}}_J$ are defined that can be computed quickly while sacrificing little precision or accuracy. The method requires that random variables be available that can be computed quickly and are strongly correlated with the $\hat{\theta}_{(i)}$. It is described how such variables can generally be obtained, and the method is illustrated with two examples. The paper focuses on the jackknife estimator for standard error, but the method can also be applied to quickly compute the jackknife estimator of bias. KEY WORDS: Bootstrap; Influence curve; Measurement error; Sampling., 1. INTRODUCTION The ordinary jackknife is a reliable method for estimating the standard error of an estimator nonparametrically. Efron and Gong (1983) discuss properties of the jackknife and compare it [...]
- Published
- 1997
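A minimal sketch of the ordinary delete-one jackknife standard error quoted in the abstract above; the paper's fast approximations are not reproduced here. The example estimator and data are assumptions.

import numpy as np

def jackknife_se(x, estimator):
    # Ordinary delete-one jackknife standard error:
    # sqrt((N-1)/N * sum_i (theta_(i) - theta_bar)^2), where theta_(i) is the
    # estimate with observation i removed and theta_bar is their average.
    x = np.asarray(x)
    n = len(x)
    theta_i = np.array([estimator(np.delete(x, i)) for i in range(n)])   # leave-one-out estimates
    theta_bar = theta_i.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_bar) ** 2))

# Example: jackknife standard error of the sample median of 100 exponential draws.
rng = np.random.default_rng(0)
print(jackknife_se(rng.exponential(size=100), np.median))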
22. Computer-aided teaching of probabilistic modeling for biological phenomena
- Author
-
Goel, Prem K., Peruggia, Mario, and An, Baoshe
- Subjects
Stochastic processes -- Study and teaching ,Computer-assisted instruction -- Evaluation ,Biology -- Study and teaching ,Science and technology ,Social sciences - Abstract
College students majoring in science and engineering need to learn how to model key features of the driving mechanisms of natural, scientific, and engineering phenomena. A rigorous treatment of these topics requires a thorough understanding of advanced mathematical concepts and probability theory. However, we believe that carefully designed computer simulation software offers a means of conveying fundamental ideas of probabilistic modeling, while minimizing the need for underlying mathematical analyses. Based on this premise we have initiated the development of a software system that will be incorporated into a novel, introductory course in probabilistic modeling for undergraduate students in the biological and environmental sciences. In this paper we describe the preliminary version of our system that implements simulation, real-time animation, and calculations of dynamic statistical summaries for several prototypical stochastic models for a variety of biological systems. KEY WORDS: Dynamic graphics; Stochastic processes; Stochastic simulation., 1. INTRODUCTION In this paper we describe the preliminary version of a software system for the computer simulation of prototypical stochastic models for a variety of biological systems. Our primary [...]
- Published
- 1997
23. A concrete strategy for teaching hypothesis testing
- Author
-
Loosen, Franz
- Subjects
Statistical hypothesis testing -- Study and teaching ,Statistics -- Study and teaching ,Teaching -- Methods ,Science and technology ,Social sciences - Abstract
This paper describes a physical device that can be used as a teaching aid for hypothesis testing instruction. The accompanying verbal commentary is sketched, and advantages over traditional teaching methods are discussed. KEY WORDS: Hypothesis testing; Statistical teaching aids., 1. INTRODUCTION This paper presents a simple device (hereafter referred to as the demonstrator) that can be used as a teaching aid for introducing the basic concepts in the 'classical' [...]
- Published
- 1997
24. Elemental subsets: the building blocks of regression
- Author
-
Mayo, Matthew S. and Gray, J. Brian
- Subjects
Regression analysis -- Research ,Robust statistics -- Research ,Science and technology ,Social sciences - Abstract
In a regression dataset an elemental subset consists of the minimum number of cases required to estimate the unknown parameters of a regression model. The resulting elemental regression provides an exact fit to the cases in the elemental subset. Early methods of regression estimation were based on combining the results of elemental regressions. This approach was abandoned because of its computational infeasibility in all but the smallest datasets and because of the arrival of the least squares method. With the computing power available today, there has been renewed interest in making use of the elemental regressions for model fitting and diagnostic purposes. In this paper we consider the elemental subsets and their associated elemental regressions as useful 'building blocks' for the estimation of regression models. Many existing estimators can be expressed in terms of the elemental regressions. We introduce a new classification of regression estimators that generalizes a characterization of ordinary least squares (OLS) based on elemental regressions. Estimators in this class are weighted averages of the elemental regressions, where the weights are determined by leverage and residual information associated with the elemental subsets. The new classification incorporates many existing estimators and provides a framework for developing new alternatives to least squares regression, including the trimmed elemental estimators (TEE) proposed in this paper. KEY WORDS: Elemental regression; Leverage; Residual; Robust regression., 1. INTRODUCTION The multiple linear regression model is of the form $Y = X\beta + \epsilon$ (1.1) where Y is an n x 1 vector of random observations, X is [...]
- Published
- 1997
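A numerical illustration of the classical characterization mentioned in the abstract above: the OLS fit equals a weighted average of all elemental regressions, with each elemental subset weighted in proportion to det(X_J' X_J). This is a sketch on made-up toy data, not the trimmed elemental estimators proposed in the paper.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])        # intercept + one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

num = np.zeros(p)
den = 0.0
for J in combinations(range(n), p):                           # all elemental subsets of size p
    XJ, yJ = X[list(J)], y[list(J)]
    w = np.linalg.det(XJ.T @ XJ)                              # weight for this subset
    if w > 1e-12:
        num += w * np.linalg.solve(XJ, yJ)                    # exact-fit elemental regression
        den += w
beta_elemental = num / den
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_elemental, beta_ols))                  # the two fits agree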
25. Practical small-sample asymptotics for regression problems
- Author
-
Strawderman, Robert L., Casella, George, and Wells, Martin T.
- Subjects
Regression analysis -- Research ,Asymptotic expansions -- Analysis ,Mathematics - Abstract
A general method for estimating the distribution of a solution to an estimating equation defined through a sum of independent random variables is proposed. Suggested expansions are valid for the distribution of approximated parameters from general regression models and are not different from saddlepoint approximations in the case of iid random variables. A main advantage of the proposed methodology is that it requires only the conditional cumulant-generating function to be specified., 1. INTRODUCTION Daniels's (1954) seminal paper on saddlepoint approximations in statistics spawned a great deal of research in the general area of small-sample asymptotic approximations. His original paper concentrated primarily [...]
- Published
- 1996
26. Generating multivariate categorical variates using the iterative proportional fitting algorithm
- Author
-
Gange, Stephen J.
- Subjects
Multivariate analysis -- Methods ,Random number generators -- Analysis ,Science and technology ,Social sciences - Abstract
Two recent papers have suggested methods for generating correlated binary data with fixed marginal distributions and specified degrees of pairwise association. Emrich and Piedmonte suggested a method based on the existence of a multivariate normal distribution, while Lee suggested methods based on linear programming and Archimedian copulas. In this paper, a simpler method is described using the iterative proportional fitting algorithm for generating an n-dimensional distribution of correlated categorical data with specified margins of dimension 1, 2, ..., k < n. An example of generating a distribution for a generalized estimating equations (GEE) model is discussed. KEY WORDS: Correlated outcomes; Generalized estimating equations; Log-linear models; Random number generation., 1. INTRODUCTION Over the past several years, a significant amount of research has explored the use of regression models for analyzing correlated categorical data. In these models, it is assumed [...]
- Published
- 1995
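A minimal sketch of the two-way iterative proportional fitting step underlying the entry above (the paper works with general n-dimensional tables; the seed table and target margins below are illustrative assumptions).

import numpy as np

def ipf_2d(seed, row_margin, col_margin, n_iter=100):
    # Iterative proportional fitting for a two-way table: alternately rescale the
    # rows and columns of a seed table until both target margins are matched.
    # IPF preserves the odds ratios (association structure) of the seed table.
    T = np.asarray(seed, float).copy()
    r = np.asarray(row_margin, float)
    c = np.asarray(col_margin, float)
    for _ in range(n_iter):
        T *= (r / T.sum(axis=1))[:, None]   # match row totals
        T *= (c / T.sum(axis=0))[None, :]   # match column totals
    return T

# Example: a 2 x 3 joint distribution with the stated margins; the seed encodes a
# positive association that is retained while the margins are enforced. Sampling
# cell indices from the fitted table yields correlated categorical pairs.
joint = ipf_2d(seed=[[4, 1, 1], [1, 1, 4]], row_margin=[0.4, 0.6], col_margin=[0.2, 0.3, 0.5])
print(joint.round(3), joint.sum(axis=1), joint.sum(axis=0))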
27. Use of nested orthogonal contrasts in analyzing rank data
- Author
-
Marden, John I.
- Subjects
Decision-making -- Analysis ,Mathematical models -- Analysis ,Standard deviations -- Analysis ,Mathematics - Abstract
A data set consisting of 143 rankings of 10 occupations from a survey of Goldberg has been analyzed in a number of recent papers. The purpose of this paper is to use so-called 'nested orthogonal contrasts' of the occupations to gain further insight into the data. A contrast is a comparison of subsets of the occupations based on their relative ranks; contrasts are orthogonal if the comparisons they represent are not confounded. Various models based on sets of orthogonal contrasts-including contingency table models, models analogous to those of Fligner and Verducci, and latent class models--are applied to the data. It is found that there are three main groups of occupations based on overall prestige, and within each group are distinctions between managerial and technical occupations. KEY WORDS: Contingency tables; Discordant pairs; Kendall's tau; Mallows's π model; Orthogonal contrasts; Permutations; Rank statistics., 1. BACKGROUND One question on a survey of engineering graduates discussed by Goldberg (1976) asked people to rank m = 10 occupations according to their perceived social prestige, with 1 [...]
- Published
- 1992
28. Recent publications in JSS
- Subjects
Mathematics ,Science and technology - Abstract
The following abstracts come from papers recently accepted to the Journal of Statistical Software (JSS). JSS is free; code and papers may be downloaded at no cost. Code means source [...]
- Published
- 2004
29. How to read the statistical methods literature: a guide for students
- Author
-
Murphy, James R.
- Subjects
Statistics -- Study and teaching ,Teaching -- Methods ,Science and technology ,Social sciences - Abstract
Statistical methods papers are densely written. The writers assume that the readers already have sophisticated knowledge of the topic. In addition, a standard statistical notation has not been developed. Students who learn a technique in one notation may be confused when reading articles written with a different notation. This paper contains suggestions for making the student's task easier and more productive. KEY WORDS: Pedagogy; Reading statistical methods; Teaching statistics., 1. INTRODUCTION Several guides tell the nonstatistician how to read and interpret applied statistical results. Huff (1954) gives five points for the skeptical reader to keep in mind. Sackett (1991) [...]
- Published
- 1997
30. Integrating scientific writing into a statistics curriculum: a course in statistically based scientific writing
- Author
-
Samsa, Gregory and Oddone, Eugene Z.
- Subjects
Statistics -- Curricula ,Technical writing -- Curricula ,Science and technology ,Social sciences - Abstract
A course in writing and critical appraisal of medical papers that uses statistics is described, and its relationship to the goal of better integrating scientific writing into the statistics curriculum is discussed. It is concluded that writing should play an increased role in statistical education and that this can best be accomplished by distributing exercises in writing and critical appraisal throughout the curriculum. Writing assignments, such as simulated practice in producing components of scientific papers and grants, should reflect students' likely uses of statistics. KEY WORDS: Communication; Critical appraisal; Curriculum development., 1. INTRODUCTION Effective communication, including scientific writing, is an essential element of statistical practice. Statisticians (and other users of statistics) write to describe, to explain, to clarify, to interpret, to [...]
- Published
- 1994
31. Recent Publications in JSS
- Subjects
Statistics -- Methods ,Graph theory -- Methods ,Statistical software ,Mathematics ,Science and technology - Abstract
The following abstract comes from a paper recently accepted to the Journal of Statistical Software (JSS). JSS is free; code and papers may be downloaded at no cost. Code means [...]
- Published
- 2001
32. Statistical and Probabilistic Models in Reliability
- Author
-
ZIEGEL, ERIC R.
- Subjects
Statistical and Probabilistic Models in Reliability (Book) ,Books -- Book reviews ,Engineering and manufacturing industries ,Mathematics ,Science and technology - Abstract
This book contains 24 papers chosen from among 61 papers that were presented at the First International Conference on Mathematical Methods in Reliability. The conference was held in Bucharest. The [...]
- Published
- 2000
33. Nonlinear Modeling and Forecasting: Proceedings of the Workshop on Nonlinear Modeling and Forecasting
- Author
-
KAFADAR, KAREN
- Subjects
Nonlinear Modeling and Forecasting (Book) ,Books -- Book reviews ,Mathematics - Abstract
M. CASDAGLI and S. EUBANK, eds. Reading, MA: Addison-Wesley, 1992. xxiii + 533 pp. $49.50 (hardcover); $34.50 (paper). This book is the proceedings of papers presented at a workshop in [...]
- Published
- 1994
34. Modeling longitudinal data using a pair-copula decomposition of serial dependence
- Author
-
Smith, Michael, Min, Aleksey, Almeida, Carlos, and Czado, Claudia
- Subjects
Markov processes -- Usage ,Bayesian statistical decision theory -- Models ,Mathematics - Abstract
Copulas have proven to be very successful tools for the flexible modeling of cross-sectional dependence. In this paper we express the dependence structure of continuous-valued time series data using a sequence of bivariate copulas. This corresponds to a type of decomposition recently called a 'vine' in the graphical models literature, where each copula is entitled a 'pair-copula.' We propose a Bayesian approach for the estimation of this dependence structure for longitudinal data. Bayesian selection ideas are used to identify any independence pair-copulas, with the end result being a parsimonious representation of a time-inhomogeneous Markov process of varying order. Estimates are Bayesian model averages over the distribution of the lag structure of the Markov process. Using a simulation study we show that the selection approach is reliable and can improve the estimates of both conditional and unconditional pairwise dependencies substantially. We also show that a vine with selection outperforms a Gaussian copula with a flexible correlation matrix. The advantage of the pair-copula formulation is further demonstrated using a longitudinal model of intraday electricity load. Using Gaussian, Gumbel, and Clayton pair-copulas we identify parsimonious decompositions of intraday serial dependence, which improve the accuracy of intraday load forecasts. We also propose a new diagnostic for measuring the goodness of fit of high-dimensional multivariate copulas. Overall, the pair-copula model is very general and the Bayesian method generalizes many previous approaches for the analysis of longitudinal data. Supplemental materials for the article are also available online. KEY WORDS: Bayesian model selection; Copula diagnostic; Covariance selection; D-vine; Goodness of fit; Inhomogeneous Markov process; Intraday electricity load; Longitudinal copulas.
- Published
- 2010
35. Dimension reduction in regressions through cumulative slicing estimation
- Author
-
Zhu, Li-Ping, Zhu, Li-Xing, and Feng, Zheng-Hui
- Subjects
Regression analysis -- Usage ,Mathematics - Abstract
In this paper we offer a complete methodology of cumulative slicing estimation to sufficient dimension reduction. In parallel to the classical slicing estimation, we develop three methods that are termed, respectively, as cumulative mean estimation, cumulative variance estimation, and cumulative directional regression. The strong consistency for $p = O(n^{1/2}/\log n)$ and the asymptotic normality for $p = o(n^{1/2})$ are established, where p is the dimension of the predictors and n is sample size. Such asymptotic results improve the rate $p = o(n^{1/3})$ in many existing contexts of semiparametric modeling. In addition, we propose a modified BIC-type criterion to estimate the structural dimension of the central subspace. Its consistency is established when $p = o(n^{1/2})$. Extensive simulations are carried out for comparison with existing methods and a real data example is presented for illustration. KEY WORDS: Inverse regression; Slicing estimation; Sufficient dimension reduction; Ultrahigh dimensionality.
- Published
- 2010
36. Localized realized volatility modeling
- Author
-
Chen, Ying, Hardle, Wolfgang Karl, and Pigorsch, Uta
- Subjects
Autoregression (Statistics) -- Usage ,Mathematics - Abstract
With the recent availability of high-frequency financial data the long-range dependence of volatility regained researchers' interest and has led to the consideration of long-memory models for volatility. The long-range diagnosis of volatility, however, is usually stated for long sample periods, while for small sample sizes, such as one year, the volatility dynamics appears to be better described by short-memory processes. The ensemble of these seemingly contradictory phenomena points towards short-memory models of volatility with nonstationarities, such as structural breaks or regime switches, that spuriously generate a long memory pattern. In this paper we adopt this view on the dependence structure of volatility and propose a localized procedure for modeling realized volatility. That is, at each point in time we determine a past interval over which volatility is approximated by a local linear process. A simulation study shows that long memory processes as well as short memory processes with structural breaks can be well approximated by this local approach. Furthermore, using S&P500 data we find that our local modeling approach outperforms long-memory type models and models with structural breaks in terms of predictability. KEY WORDS: Adaptive procedure; Localized autoregressive modeling.
- Published
- 2010
37. Approximate Bayesian Computation: A nonparametric perspective
- Author
-
Blum, Michael G. B.
- Subjects
Polynomials -- Analysis ,Kernel functions -- Usage ,Regression analysis -- Usage ,Mathematics - Abstract
Approximate Bayesian Computation is a family of likelihood-free inference techniques that are well suited to models defined in terms of a stochastic generating mechanism. In a nutshell, Approximate Bayesian Computation proceeds by computing summary statistics $s_{\mathrm{obs}}$ from the data and simulating summary statistics for different values of the parameter $\theta$. The posterior distribution is then approximated by an estimator of the conditional density $g(\theta \mid s_{\mathrm{obs}})$. In this paper, we derive the asymptotic bias and variance of the standard estimators of the posterior distribution which are based on rejection sampling and linear adjustment. Additionally, we introduce an original estimator of the posterior distribution based on quadratic adjustment and we show that its bias contains fewer terms than the estimator with linear adjustment. Although we find that the estimators with adjustment are not universally superior to the estimator based on rejection sampling, we find that they can achieve better performance when there is a nearly homoscedastic relationship between the summary statistics and the parameter of interest. To make this relationship as homoscedastic as possible, we propose to use transformations of the summary statistics. In different examples borrowed from the population genetics and epidemiological literature, we show the potential of the methods with adjustment and of the transformations of the summary statistics. Supplemental materials containing the details of the proofs are available online. KEY WORDS: Conditional density estimation; Implicit statistical model; Kernel regression; Local polynomial; Simulation-based inference
- Published
- 2010
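A generic sketch of ABC rejection followed by linear adjustment, the two standard estimators analyzed in the entry above. The tolerance rule (a quantile of the distances), the single summary statistic, and the toy normal-mean model are assumptions.

import numpy as np

def abc_linear_adjustment(s_obs, theta_sim, s_sim, quantile=0.05):
    # Rejection step: keep the prior draws whose simulated summary is closest to
    # the observed one; adjustment step: regress theta on s locally and shift the
    # kept draws to s_obs (Beaumont-style linear adjustment).
    d = np.abs(s_sim - s_obs)
    keep = d <= np.quantile(d, quantile)
    theta_k, s_k = theta_sim[keep], s_sim[keep]
    slope = np.polyfit(s_k, theta_k, 1)[0]
    return theta_k - slope * (s_k - s_obs)

# Toy example: normal mean with a normal prior; the summary is the sample mean of n = 20.
rng = np.random.default_rng(2)
theta_sim = rng.normal(0, 2, size=50000)                  # draws from the prior
s_sim = rng.normal(theta_sim, 1 / np.sqrt(20))            # simulated sample means
posterior = abc_linear_adjustment(s_obs=0.8, theta_sim=theta_sim, s_sim=s_sim)
print(posterior.mean(), posterior.std())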
38. Least absolute relative error estimation
- Author
-
Chen, Kani, Guo, Shaojun, Lin, Yuanyuan, and Ying, Zhiliang
- Subjects
Regression analysis -- Models ,Logarithmic functions -- Models ,Weighting (Statistics) -- Usage ,Mathematics - Abstract
The multiplicative regression model, or accelerated failure time model, which becomes a linear regression model after logarithmic transformation, is useful in analyzing data with positive responses, such as stock prices or life times, that are particularly common in economic/financial or biomedical studies. Least squares and least absolute deviation are among the most widely used criteria in statistical estimation for the linear regression model. However, in many practical applications, especially in treating, for example, stock price data, the size of relative error, rather than that of error itself, is the central concern of the practitioners. This paper offers an alternative to the traditional estimation methods by considering minimizing the least absolute relative errors for multiplicative regression models. We prove consistency and asymptotic normality and provide an inference approach via random weighting. We also specify the error distribution under which the proposed least absolute relative error estimation is efficient. Supportive evidence is shown in simulation studies. Application is illustrated in an analysis of stock returns in Hong Kong Stock Exchange. KEY WORDS: Logarithm transformation; Multiplicative regression model; Random weighting.
- Published
- 2010
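A hedged sketch of fitting a multiplicative model y = exp(X @ beta) * error by minimizing a sum of absolute relative errors. The particular two-sided criterion below (relative error measured against both y and its fit), the simulated data, and the optimizer settings are assumptions; the paper's exact criterion is not reproduced here.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = np.exp(X @ beta_true) * np.exp(rng.normal(scale=0.2, size=n))     # positive responses

def relative_error_loss(beta):
    # Sum of absolute relative errors, measured against the response and the fit.
    fit = np.exp(X @ beta)
    return np.sum(np.abs((y - fit) / y) + np.abs((y - fit) / fit))

beta_hat = minimize(relative_error_loss, x0=np.zeros(2), method="Nelder-Mead").x
print(beta_hat)                                                        # roughly close to beta_true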
39. Tests for error correlation in the functional linear model
- Author
-
Gabrys, Robertas, Horvath, Lajos, and Kokoszka, Piotr
- Subjects
Regression analysis -- Usage ,Correlation (Statistics) -- Usage ,Principal components analysis -- Usage ,Mathematics - Abstract
The paper proposes two inferential tests for error correlation in the functional linear model, which complement the available graphical goodness-of-fit checks. To construct them, finite dimensional residuals are computed in two different ways, and then their autocorrelations are suitably defined. From these autocorrelation matrices, two quadratic forms are constructed whose limiting distributions are chi-squared with known numbers of degrees of freedom (different for the two forms). The asymptotic approximations are suitable for moderate sample sizes. The test statistics can be relatively easily computed using the R package fda, or similar MATLAB software. Application of the tests is illustrated on magnetometer and financial data. The asymptotic theory emphasizes the differences between the standard vector linear regression and the functional linear regression. To understand the behavior of the residuals obtained from the functional linear model, the interplay of three types of approximation errors must be considered, whose sources are: projection on a finite dimensional subspace, estimation of the optimal subspace, and estimation of the regression kernel. KEY WORDS: Correlated errors; Functional regression; Principal components.
- Published
- 2010
40. Correlated z-values and the accuracy of large-scale statistical estimates
- Author
-
Efron, Bradley
- Subjects
Acceleration (Mechanics) -- Analysis ,Correlation (Statistics) -- Usage ,Combinatorial probabilities -- Analysis ,Geometric probabilities -- Analysis ,Probabilities -- Analysis ,Mathematics - Abstract
We consider large-scale studies in which there are hundreds or thousands of correlated cases to investigate, each represented by its own normal variate, typically a z-value. A familiar example is provided by a microarray experiment comparing healthy with sick subjects' expression levels for thousands of genes. This paper concerns the accuracy of summary statistics for the collection of normal variates, such as their empirical cdf or a false discovery rate statistic. It seems like we must estimate an N by N correlation matrix, N the number of cases, but our main result shows that this is not necessary: good accuracy approximations can be based on the root mean square correlation over all N(N - 1)/2 pairs, a quantity often easily estimated. A second result shows that z-values closely follow normal distributions even under nonnull conditions, supporting application of the main theorem. Practical application of the theory is illustrated for a large leukemia microarray study. KEY WORDS: Acceleration; Correlation penalty; Empirical process; Mehler's identity; Nonnull z-values; Rms correlation.
- Published
- 2010
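A minimal sketch of the root mean square correlation over all N(N - 1)/2 pairs referred to in the abstract above, computed directly from a full sample correlation matrix on toy data (the data-generating setup is an assumption; in the large-N settings of the paper this quantity would instead be estimated without forming the full matrix).

import numpy as np

rng = np.random.default_rng(4)
cov = [[1, .5, .5], [.5, 1, .5], [.5, .5, 1]]
data = rng.multivariate_normal(np.zeros(3), cov, size=1000).T   # N = 3 correlated cases, 1000 replicates each
R = np.corrcoef(data)                                           # N x N sample correlation matrix
off_diag = R[np.triu_indices_from(R, k=1)]                      # the N(N-1)/2 pairwise correlations
rms_corr = np.sqrt(np.mean(off_diag ** 2))
print(rms_corr)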
41. An ensemble Kalman filter and smoother for satellite data assimilation
- Author
-
Stroud, Jonathan R., Stein, Michael L., Lesht, Barry M., Schwab, David J., and Beletsky, Dmitry
- Subjects
Algorithms -- Usage ,Kalman filtering -- Analysis ,Analysis of covariance -- Usage ,Algorithm ,Mathematics - Abstract
This paper proposes a methodology for combining satellite images with advection-diffusion models for interpolation and prediction of environmental processes. We propose a dynamic state-space model and an ensemble Kalman filter and smoothing algorithm for on-line and retrospective state estimation. Our approach addresses the high dimensionality, measurement bias, and nonlinearities inherent in satellite data. We apply the method to a sequence of SeaWiFS satellite images in Lake Michigan from March 1998, when a large sediment plume was observed in the images following a major storm event. Using our approach, we combine the images with a sediment transport model to produce maps of sediment concentrations and uncertainties over space and time. We show that our approach improves out-of-sample RMSE by 20%-30% relative to standard approaches. This article has supplementary material online. KEY WORDS: Circulant embedding; Covariance tapering; Gaussian random field; Nonlinear state-space model; Spatial statistics; Spatiotemporal model; Variogram.
- Published
- 2010
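A textbook sketch of one stochastic ensemble Kalman filter analysis step, the basic building block of the methodology in the entry above; the satellite-specific state-space model, bias handling, and smoother are not reproduced. The toy state dimension, observation operator, and noise levels are assumptions.

import numpy as np

def enkf_update(ensemble, H, y, R, seed=None):
    # Stochastic EnKF analysis step: estimate the forecast covariance from the
    # ensemble, form the Kalman gain, and nudge each member toward a perturbed
    # observation.
    rng = np.random.default_rng(seed)
    n_ens, d = ensemble.shape
    A = ensemble - ensemble.mean(axis=0)                    # ensemble anomalies
    P = A.T @ A / (n_ens - 1)                               # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)            # Kalman gain
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T       # analysis ensemble

# Toy usage: two-dimensional state, only the first component observed.
rng = np.random.default_rng(5)
ens = rng.normal(size=(100, 2))
H = np.array([[1.0, 0.0]])
updated = enkf_update(ens, H, y=np.array([0.7]), R=np.array([[0.1]]), seed=6)
print(updated.mean(axis=0))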
42. Optimal partitioning for linear mixed effects models: applications to identifying placebo responders
- Author
-
Tarpey, Thaddeus, Petkova, Eva, Lu, Yimeng, and Govindarajulu, Usha
- Subjects
Cluster analysis -- Usage ,Event history analysis -- Usage ,Mathematics - Abstract
A longstanding problem in clinical research is distinguishing drug-treated subjects that respond due to specific effects of the drug from those that respond to nonspecific (or placebo) effects of the treatment. Linear mixed effects models are commonly used to model longitudinal clinical trial data. In this paper we present a solution to the problem of identifying placebo responders using an optimal partitioning methodology for linear mixed effects models. Since individual outcomes in a longitudinal study correspond to curves, the optimal partitioning methodology produces a set of prototypical outcome profiles. The optimal partitioning methodology can accommodate both continuous and discrete covariates. The proposed partitioning strategy is compared and contrasted with the growth mixture modeling approach. The methodology is applied to a two-phase depression clinical trial where subjects in a first phase were treated openly for 12 weeks with fluoxetine followed by a double blind discontinuation phase where responders to treatment in the first phase were randomized to either stay on fluoxetine or switched to a placebo. The optimal partitioning methodology is applied to the first phase to identify prototypical outcome profiles. Using time to relapse in the second phase of the study, a survival analysis is performed on the partitioned data. The optimal partitioning results identify prototypical profiles that distinguish whether subjects relapse depending on whether or not they stay on the drug or are randomized to a placebo. KEY WORDS: B-spline; Cluster analysis; Finite mixture models; Functional data; Kaplan-Meier functions; Orthonormal basis; Principal components; Repeated measures; Survival analysis.
- Published
- 2010
43. Informative retesting
- Author
-
Bilder, Christopher R., Tebbs, Joshua M., and Chen, Peng
- Subjects
Medical screening -- Methods ,Medical tests -- Methods ,Mathematics - Abstract
In situations where individuals are screened for an infectious disease or other binary characteristic and where resources for testing are limited, group testing can offer substantial benefits. Group testing, where subjects are tested in groups (pools) initially, has been successfully applied to problems in blood bank screening, public health, drug discovery, genetics, and many other areas. In these applications, often the goal is to identify each individual as positive or negative using initial group tests and subsequent retests of individuals within positive groups. Many group testing identification procedures have been proposed; however, the vast majority of them fail to incorporate heterogeneity among the individuals being screened. In this paper, we present a new approach to identify positive individuals when covariate information is available on each. This covariate information is used to structure how retesting is implemented within positive groups; therefore, we call this new approach 'informative retesting.' We derive closed-form expressions and implementation algorithms for the probability mass functions for the number of tests needed to decode positive groups. These informative retesting procedures are illustrated through a number of examples and are applied to chlamydia and gonorrhea testing in Nebraska for the Infertility Prevention Project. Overall, our work shows compelling evidence that informative retesting can dramatically decrease the number of tests while providing accuracy similar to established noninformative retesting procedures. This article has supplementary material online. KEY WORDS: Binary response; Chlamydia; Gonorrhea; Group testing; Identification; Pooled testing.
- Published
- 2010
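A baseline sketch related to the entry above: the expected number of tests for one pool under classical two-stage (Dorfman) group testing with heterogeneous, covariate-informed risk estimates and a perfect assay. This is not the informative-retesting algorithms of the paper, only the standard benchmark they improve upon; the example probabilities are assumptions.

import numpy as np

def dorfman_expected_tests(p):
    # One pooled test is always used; if the pool is positive (at least one member
    # positive), every member is retested individually.
    p = np.asarray(p, float)
    prob_pool_positive = 1 - np.prod(1 - p)
    return 1 + len(p) * prob_pool_positive

# Example: a pool of five individuals with individual risk estimates.
print(dorfman_expected_tests([0.01, 0.02, 0.05, 0.01, 0.03]))   # about 1.6 tests vs. 5 individual tests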
44. Prediction of functional status for the elderly based on a new ordinal regression model
- Author
-
Hong, Hyokyoung Grace and He, Xuming
- Subjects
Aged -- Health aspects ,Monte Carlo method -- Usage ,Nonparametric tests -- Usage ,Quantile regression -- Usage ,Mathematics - Abstract
The functional mobility of the elderly is a very important factor in aging research, and prognostic information is valuable in making clinical and health care policy decisions. We develop a predictive model for the functional status of the elderly based on data from the Second Longitudinal Study of Aging (LSOA II). The functional status is an ordinal response variable. The ordered probit model has been moderately successful in analyzing such data; however, its reliance on the normal distribution for its latent variable hinders its accuracy and potential. In this paper, we focus on the prediction of conditional quantiles of the functional status based on a more general transformation model. The proposed estimation procedure does not rely on any parametric specification of the conditional distribution functions, aiming to reduce model misspecification errors in the prediction. Cross-validation within the LSOA II data shows that our prediction intervals are more informative than those from the ordered probit model. Monte Carlo simulations also demonstrate the merits of our approach in the analysis of ordinal response variables. KEY WORDS: Nonparametric transformation model; Ordinal data; Quantile regression; Second Longitudinal Study of Aging.
- Published
- 2010
45. Causal effects of treatments for informative missing data due to progression/death
- Author
-
Lee, Keunbaik, Daniels, Michael J., and Sargent, Daniel J.
- Subjects
Levels of measurement (Statistics) -- Usage ,Mathematics - Abstract
In longitudinal clinical trials, when outcome variables at later time points are only defined for patients who survive to those times, the evaluation of the causal effect of treatment is complicated. In this paper, we describe an approach that can be used to obtain the causal effect of three treatment arms with ordinal outcomes in the presence of death using a principal stratification approach. We introduce a set of flexible assumptions to identify the causal effect and implement a sensitivity analysis for nonidentifiable assumptions which we parameterize parsimoniously. Methods are illustrated on quality of life data from a recent colorectal cancer clinical trial. This article has supplementary material online. KEY WORDS: Ordinal data; Principal stratification; QOL; Sensitivity analysis.
- Published
- 2010
46. A Bayesian vector multidimensional scaling procedure for the analysis of ordered preference data
- Author
-
Fong, Duncan K. H., DeSarbo, Wayne S., Park, Joonwook, and Scott, Crystal J.
- Subjects
Bayesian statistical decision theory -- Usage ,Multidimensional scaling -- Usage ,Mathematics - Abstract
Multidimensional scaling (MDS) comprises a family of geometric models for the multidimensional representation of data and a corresponding set of methods for fitting such models to actual data. In this paper, we develop a new Bayesian vector MDS model to analyze ordered successive categories preference/dominance data commonly collected in many social science and business studies. A joint spatial representation of the row and column elements of the input data matrix is provided in a reduced dimensionality such that the geometric relationship of the row and column elements renders insight into the utility structure underlying the data. Unlike classical deterministic MDS procedures, the Bayesian method includes a probability based criterion to determine the number of dimensions of the derived joint space map and provides posterior interval as well as point estimates for parameters of interest. Also, our procedure models the raw integer successive categories data which ameliorates the need of any data preprocessing as required for many metric MDS procedures. Furthermore, the proposed Bayesian procedure allows external information in the form of an intractable posterior distribution derived from a related dataset to be incorporated as a prior in deriving the spatial representation of the preference data. An actual commercial application dealing with consumers' intentions to buy new luxury sport utility vehicles is presented to illustrate the proposed methodology. Favorable comparisons are made with more traditional MDS approaches. KEY WORDS: Bayesian analysis; Multidimensional scaling; Preference analysis; Sports utility vehicles.
- Published
- 2010
47. Using evidence of mixed populations to select variables for clustering very high-dimensional data
- Author
-
Chan, Yao-Ban and Hall, Peter
- Subjects
Nonparametric statistics -- Usage ,Population -- Analysis ,Signal processing -- Analysis ,Digital signal processor ,Mathematics - Abstract
In this paper we develop a nonparametric approach to clustering very high-dimensional data, designed particularly for problems where the mixture nature of a population is expressed through multimodality of its density. Therefore, a technique based implicitly on mode testing can be particularly effective. In principle, several alternative approaches could be used to assess the extent of multimodality, but in the present problem the excess mass method has important advantages. We show that the resulting methodology for determining clusters is particularly effective in cases where the data are relatively heavy tailed or show a moderate to high degree of correlation, or when the number of important components is relatively small. Conversely, in the case of light-tailed, almost-independent components when there are many clusters, clustering in terms of modality can be less reliable than more conventional approaches. This article has supplementary material online. KEY WORDS: Bandwidth test; Bootstrap; Density estimation; Excess mass; Mode test; Multimodality.
- Published
- 2010
48. Posterior simulation in countable mixture models for large datasets
- Author
-
Guha, Subharup
- Subjects
Markov processes -- Usage ,Monte Carlo method -- Usage ,Data compression -- Analysis ,Distribution (Probability theory) -- Usage ,Mathematics - Abstract
Mixture models, or convex combinations of a countable number of probability distributions, offer an elegant framework for inference when the population of interest can be subdivided into latent clusters having random characteristics that are heterogeneous between, but homogeneous within, the clusters. Traditionally, the different kinds of mixture models have been motivated and analyzed from very different perspectives, and their common characteristics have not been fully appreciated. The inferential techniques developed for these models usually necessitate heavy computational burdens that make them difficult, if not impossible, to apply to the massive data sets increasingly encountered in real world studies. This paper introduces a flexible class of models called generalized Polya urn (GPU) processes. Many common mixture models, including finite mixtures, hidden Markov models, and Dirichlet processes, are obtained as special cases of GPU processes. Other important special cases include finite-dimensional Dirichlet priors, infinite hidden Markov models, analysis of densities models, nested Chinese restaurant processes, hierarchical DP models, nonparametric density models, spatial Dirichlet processes, weighted mixtures of DP priors, and nested Dirichlet processes. An investigation of the theoretical properties of GPU processes offers new insight into asymptotics that form the basis of cost-effective Markov chain Monte Carlo (MCMC) strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different mixture models. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric Bayesian analysis of high-resolution comparative genomic hybridization data on lung cancer. The appendixes are available online as supplemental material. KEY WORDS: Data squashing; Dirichlet process; Generalized Polya urn process; Hidden Markov model; Markov chain Monte Carlo; Semiparametric Bayes.
- Published
- 2010
49. Dynamic nonparametric Bayesian models for analysis of music
- Author
-
Ren, Lu, Dunson, David, Lindroth, Scott, and Carin, Lawrence
- Subjects
Monte Carlo method -- Usage ,Markov processes -- Usage ,Bayesian statistical decision theory -- Usage ,Mathematics - Abstract
The dynamic hierarchical Dirichlet process (dHDP) is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden Markov model (HMM) with time-evolving parameters. The dHDP imposes the belief that observations that are temporally proximate are more likely to be drawn from HMMs with similar parameters, while also allowing for 'innovation' associated with abrupt changes in the music texture. The sharing mechanisms of the time-evolving model are derived, and for inference a relatively simple Markov chain Monte Carlo sampler is developed. Segmentation of a given musical piece is constituted via the model inference. Detailed examples are presented on several pieces, with comparisons to other models. The dHDP results are also compared with a conventional music-theoretic analysis. All the supplemental materials used by this paper are available online. KEY WORDS: Dynamic Dirichlet process; Hidden Markov Model; Mixture Model; Segmentation; Sequential data; Time series.
- Published
- 2010
50. Variable selection with the strong heredity constraint and its oracle property
- Author
-
Choi, Nam Hee, Li, William, and Zhu, Ji
- Subjects
Genetics -- Analysis ,Heredity -- Models ,Mathematics - Abstract
In this paper, we extend the LASSO method (Tibshirani 1996) for simultaneously fitting a regression model and identifying important interaction terms. Unlike most of the existing variable selection methods, our method automatically enforces the heredity constraint, that is, an interaction term can be included in the model only if the corresponding main terms are also included in the model. Furthermore, we extend our method to generalized linear models, and show that it performs as well as if the true model were given in advance, that is, the oracle property as in Fan and Li (2001) and Fan and Peng (2004). The proof of the oracle property is given in online supplemental materials. Numerical results on both simulation data and real data indicate that our method tends to remove irrelevant variables more effectively and provides better prediction performance than previous work (Yuan, Joseph, and Lin 2007 and Zhao, Rocha, and Yu 2009, as well as the classical LASSO method). KEY WORDS: Heredity structure; LASSO; Regularization.
- Published
- 2010
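A hedged sketch related to the entry above: fit an ordinary LASSO to main effects plus all two-way interactions and then check the strong heredity constraint (an interaction may be kept only if both of its main effects are kept). The paper's method enforces the constraint within the fit itself; this post-hoc check only illustrates the constraint. The simulated data, penalty level, and threshold are assumptions.

import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p = 200, 5
X = rng.normal(size=(n, p))
pairs = list(combinations(range(p), 2))
X_int = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])       # all two-way interactions
y = 2 * X[:, 0] + 1.5 * X[:, 1] + 2 * X[:, 0] * X[:, 1] + rng.normal(size=n)

fit = Lasso(alpha=0.05).fit(np.column_stack([X, X_int]), y)
main_sel = {j for j in range(p) if abs(fit.coef_[j]) > 1e-8}          # selected main effects
for k, (i, j) in enumerate(pairs):
    if abs(fit.coef_[p + k]) > 1e-8 and not {i, j} <= main_sel:
        print(f"interaction ({i},{j}) violates strong heredity")
print("selected main effects:", sorted(main_sel))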