125 results for "Ian T. Jolliffe"
Search Results
2. Forecast Verification: A Practitioner's Guide in Atmospheric Science
- Author
-
Ian T. Jolliffe and David B. Stephenson
- Published
- 2012
3. A 50-year personal journey through time with principal component analysis
- Author
-
Ian T. Jolliffe
- Subjects
Statistics and Probability, Numerical Analysis, Multivariate statistics, Principal component analysis, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Principal component analysis (PCA) is one of the most widely used multivariate techniques. A little more than 50 years ago I first encountered PCA and it has played an important role in my career and beyond, for many years since. I have been persuaded that an account of my 50-year journey through time with PCA would be a suitable topic for inclusion in the Jubilee Issue of JMVA and this is the result.
- Published
- 2022
4. Dark Data: Why What You Don’t Know Matters
- Author
-
Ian T. Jolliffe
- Subjects
Statistics and Probability, Economics and Econometrics, Statistics, Probability and Uncertainty, Social Sciences (miscellaneous)
- Published
- 2021
5. Behaviour of verification measures for deterministic binary forecasts with respect to random changes and thresholding
- Author
-
Ian T. Jolliffe and Agostino Manzato
- Subjects
Atmospheric Science, Posterior probability, Statistics, Binary number, Thresholding, Forecast verification, Mathematics
- Published
- 2017
6. Detection of weekly cycles in atmospheric data
- Author
-
Ian T. Jolliffe
- Subjects
Atmospheric Science, Ensemble forecasting, Flatness, Rank, Data type, Histogram, Econometrics, Mathematics, Statistical hypothesis testing
- Abstract
There is considerable interest in whether weekly cycles are present in certain types of atmospheric and hydrologic data. Seven days is not a natural period for cyclic behaviour, but rather a human construct. Hence the presence of weekly cycles can help to confirm the presence and nature of anthropogenic climate change. Several statistical tests have been used to investigate the presence of such cycles. In this short note, a test previously used for ‘flatness’ of rank histograms in ensemble forecasting is shown to be useful for this task for certain types of data, with some advantages compared with existing techniques. Some disadvantages and extensions are also described.
- Published
- 2017
7. Probability forecasts with observation error: what should be forecast?
- Author
-
Ian T. Jolliffe
- Subjects
Atmospheric Science, Forecast skill, Expected value, Brier score, Statistics, Econometrics, Hedging, Mathematics, Event (probability theory)
- Abstract
When probability forecasts are made of a binary event, a commonly used measure for assessing the forecasts is the Brier score. One of its properties is that it is proper, meaning that its expected value cannot be improved by the forecaster issuing a probability other than his/her true belief. This property assumes that the occurrence or otherwise of the forecast event is recorded without error. This note investigates what forecast should be made in order to minimize the expected value of the Brier score when errors are present in the observations. Should it still be the forecaster's true belief or should it be something else, implying that the forecaster should hedge his/her forecast? The answer is that it depends on whether the forecaster can model the error mechanism or whether the error mechanism is unknown. It is shown that in the former case the forecaster's true belief of the probability of the event should still be forecast. However, in the case of an unknown error mechanism, the forecaster should attempt to forecast the probability that the erroneous observation indicates that the event has occurred, rather than the true probability of the event.
- Published
- 2017
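Note: a minimal numerical sketch (not code or data from the paper; the belief and error rates below are invented) of the conclusion for an unknown error mechanism: the expected Brier score against error-prone observations is minimised by forecasting the probability that the observation records the event, not the true event probability.

```python
import numpy as np

p = 0.7                       # forecaster's true belief that the event occurs (assumed value)
hit, false_alarm = 0.8, 0.3   # assumed error rates: P(obs says event | event), P(obs says event | no event)
p_obs = p * hit + (1 - p) * false_alarm    # probability the observation records the event

q = np.linspace(0, 1, 1001)                # candidate forecast probabilities
expected_brier = (1 - p_obs) * q**2 + p_obs * (1 - q)**2   # E[(q - Y_obs)^2]

print("true belief p =", p, "  P(observation records event) =", p_obs)
print("q minimising the expected Brier score =", q[np.argmin(expected_brier)])
# The minimiser is p_obs (0.65 here), not p, so with an unmodelled error mechanism the
# forecaster should target the probability of what will be recorded, not the true belief.
```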
8. Use of generalized additive modelling techniques to create synthetic daily temperature networks for benchmarking homogenization algorithms
- Author
-
Ian T. Jolliffe, Kate M. Willett, and Rachel Killick
- Subjects
Mathematical optimization, Homogenization (climate), Environmental science, Benchmarking
- Published
- 2018
9. A clustering approach to interpretable principal components
- Author
-
Nickolay T. Trendafilov, Doyo Gragn Enki, and Ian T. Jolliffe
- Subjects
Statistics and Probability, Dimensionality reduction, Correlation clustering, Pattern recognition, Explained variation, Correlation, Principal component analysis, Statistics, Feature (machine learning), Artificial intelligence, Statistics, Probability and Uncertainty, Cluster analysis, Eigenvalues and eigenvectors, Mathematics
- Abstract
A new method for constructing interpretable principal components is proposed. The method first clusters the variables, and then interpretable (sparse) components are constructed from the correlation matrices of the clustered variables. For the first step of the method, a new weighted-variances method for clustering variables is proposed. It reflects the nature of the problem that the interpretable components should maximize the explained variance and thus provide sparse dimension reduction. An important feature of the new clustering procedure is that the optimal number of clusters (and components) can be determined in a non-subjective manner. The new method is illustrated using well-known simulated and real data sets. It clearly outperforms many existing methods for sparse principal component analysis in terms of both explained variance and sparseness.
- Published
- 2013
10. Principal component analysis: a review and recent developments
- Author
-
Ian T. Jolliffe and Jorge Cadima
- Subjects
Computer science, General Mathematics, Dimensionality reduction, General Engineering, Sparse PCA, General Physics and Astronomy, Variance, Data type, Principal component analysis, A priori and a posteriori, Data mining, Interpretability, Curse of dimensionality
- Abstract
Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
- Published
- 2016
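Note: a minimal NumPy sketch (synthetic data, not taken from the article) of the computation the abstract describes: centre the data, solve the eigenvalue/eigenvector problem via the SVD, and obtain uncorrelated components that successively maximise variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 observations of 5 variables (synthetic)
Xc = X - X.mean(axis=0)                # centre each variable

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt                          # rows are eigenvectors of the covariance matrix
scores = Xc @ Vt.T                     # principal component scores (uncorrelated new variables)
explained = s**2 / (X.shape[0] - 1)    # eigenvalues = variances of the components

print(explained / explained.sum())     # proportion of total variance per component
```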
11. Epilogue: New Directions in Forecast Verification
- Author
-
David B. Stephenson and Ian T. Jolliffe
- Subjects
Forecast error, Computer science, Econometrics, Forecast skill, Forecast verification
- Published
- 2011
12. Introduction
- Author
-
Ian T. Jolliffe and David B. Stephenson
- Published
- 2011
13. Independent Component Analysis for Three-Way Data With an Application From Atmospheric Science
- Author
-
Ian T. Jolliffe, Steffen Unkel, Nickolay T. Trendafilov, and Abdel Hannachi
- Subjects
Statistics and Probability, Geopotential, Computer science, Applied Mathematics, Mode (statistics), Atmospheric sciences, Agricultural and Biological Sciences (miscellaneous), Independent component analysis, Component analysis, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Rotation (mathematics), General Environmental Science
- Abstract
In this paper, a new approach to independent component analysis (ICA) for three-way data is considered. The rotational freedom of the three-mode component analysis (Tucker3) model is exploited to implement ICA in one mode of the data. The performance of the proposed approach is evaluated by means of numerical experiments. An illustration with real data from atmospheric science is presented, where the first mode is spatial location, the second is time and the third is a set of different meteorological variables representing geopotential heights at various vertical pressure levels. The results show that the three-mode decomposition finds spatial patterns of climate anomalies which can be interpreted in a meteorological sense and as such gives an insightful low-dimensional representation of the data.
- Published
- 2011
14. Independent exploratory factor analysis with application to atmospheric science data
- Author
-
Abdel Hannachi, Nickolay T. Trendafilov, Ian T. Jolliffe, and Steffen Unkel
- Subjects
Statistics and Probability, North Pacific Oscillation, Matrix (mathematics), North Atlantic Oscillation, Statistics, Northern Hemisphere, Model parameters, Statistics, Probability and Uncertainty, Rotation (mathematics), Exploratory factor analysis, Mathematics
- Abstract
The independent exploratory factor analysis method is introduced for recovering independent latent sources from their observed mixtures. The new model is viewed as a method of factor rotation in exploratory factor analysis (EFA). First, estimates for all EFA model parameters are obtained simultaneously. Then, an orthogonal rotation matrix is sought that minimizes the dependence between the common factors. The rotation of the scores is compensated by a rotation of the initial loading matrix. The proposed approach is applied to study winter monthly sea-level pressure anomalies over the Northern Hemisphere. The North Atlantic Oscillation, the North Pacific Oscillation, and the Scandinavian pattern are identified among the rotated spatial patterns with a physically interpretable structure.
- Published
- 2010
15. Equitability Revisited: Why the 'Equitable Threat Score' Is Not Equitable
- Author
-
Robin J. Hogan, David B. Stephenson, Christopher A. T. Ferro, and Ian T. Jolliffe
- Subjects
Contingency table, Atmospheric Science, Sample size determination, Statistics, Econometrics, Rare events, Forecast skill, Base rate, Mathematics
- Abstract
In the forecasting of binary events, verification measures that are “equitable” were defined by Gandin and Murphy to satisfy two requirements: 1) they award all random forecasting systems, including those that always issue the same forecast, the same expected score (typically zero), and 2) they are expressible as the linear weighted sum of the elements of the contingency table, where the weights are independent of the entries in the table, apart from the base rate. The authors demonstrate that the widely used “equitable threat score” (ETS), as well as numerous others, satisfies neither of these requirements and only satisfies the first requirement in the limit of an infinite sample size. Such measures are referred to as “asymptotically equitable.” In the case of ETS, the expected score of a random forecasting system is always positive and only falls below 0.01 when the number of samples is greater than around 30. Two other asymptotically equitable measures are the odds ratio skill score and the symmetric extreme dependency score, which are more strongly inequitable than ETS, particularly for rare events; for example, when the base rate is 2% and the sample size is 1000, random but unbiased forecasting systems yield an expected score of around −0.5, reducing in magnitude to −0.01 or smaller only for sample sizes exceeding 25 000. This presents a problem since these nonlinear measures have other desirable properties, in particular being reliable indicators of skill for rare events (provided that the sample size is large enough). A potential way to reconcile these properties with equitability is to recognize that Gandin and Murphy’s two requirements are independent, and the second can be safely discarded without losing the key advantages of equitability that are embodied in the first. This enables inequitable and asymptotically equitable measures to be scaled to make them equitable, while retaining their nonlinearity and other properties such as being reliable indicators of skill for rare events. It also opens up the possibility of designing new equitable verification measures.
- Published
- 2010
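Note: a small sketch (synthetic forecasts; the base rate and sample size are illustrative choices, not values from the paper) of the equitable threat score computed from a 2x2 contingency table, showing that random unbiased forecasts receive a positive mean score for a small sample, i.e. the measure is only asymptotically equitable.

```python
import numpy as np

def ets(h, f, m, c):
    """Equitable threat score from hits h, false alarms f, misses m, correct negatives c."""
    n = h + f + m + c
    h_random = (h + m) * (h + f) / n          # hits expected from random forecasts
    return (h - h_random) / (h + m + f - h_random)

rng = np.random.default_rng(1)
base_rate, n_cases, scores = 0.2, 30, []
for _ in range(5000):
    obs = rng.random(n_cases) < base_rate
    fcst = rng.random(n_cases) < base_rate     # random, unbiased forecasts
    h = np.sum(fcst & obs); f = np.sum(fcst & ~obs)
    m = np.sum(~fcst & obs); c = np.sum(~fcst & ~obs)
    if h + m + f > 0:                          # skip degenerate tables
        scores.append(ets(h, f, m, c))
print(np.mean(scores))                          # positive rather than zero for this small sample
```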
16. Some recent developments in cluster analysis
- Author
-
Ian T. Jolliffe and Andreas Philipp
- Subjects
Clustering high-dimensional data, Computer science, Weather and climate, Hierarchical clustering, Geophysics, Geochemistry and Petrology, Model-based clustering, Data mining, Cluster analysis
- Abstract
Cluster analysis has been used for many years in weather and climate research but most applications have concentrated on a handful of techniques. However, the subject is vast with many methods scattered across a number of literatures, and research on new variants continues. This paper describes a few of these new developments, concentrating on those that have appeared in the statistical and atmospheric science literatures. Their relevance to applications in weather and climate is discussed.
- Published
- 2010
17. Independent Component Analysis of Climate Data: A New Look at EOF Rotation
- Author
-
Nickolay T. Trendafilov, Abdel Hannachi, Ian T. Jolliffe, and Steffen Unkel
- Subjects
Atmospheric Science, Meteorology, Arctic Oscillation, Climatology, Mode (statistics), Empirical orthogonal functions, Time domain, Rotation matrix, Rotation (mathematics), Independent component analysis, Algorithm, Independence (probability theory), Mathematics
- Abstract
The complexity inherent in climate data makes it necessary to introduce more than one statistical tool to the researcher to gain insight into the climate system. Empirical orthogonal function (EOF) analysis is one of the most widely used methods to analyze weather/climate modes of variability and to reduce the dimensionality of the system. Simple structure rotation of EOFs can enhance interpretability of the obtained patterns but cannot provide anything more than temporal uncorrelatedness. In this paper, an alternative rotation method based on independent component analysis (ICA) is considered. The ICA is viewed here as a method of EOF rotation. Starting from an initial EOF solution rather than rotating the loadings toward simplicity, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain. If the underlying climate signals have an independent forcing, one can expect to find loadings with interpretable patterns whose time coefficients have properties that go beyond simple noncorrelation observed in EOFs. The methodology is presented and an application to the monthly mean sea level pressure (SLP) field is discussed. Among the rotated (to independence) EOFs, the North Atlantic Oscillation (NAO) pattern, an Arctic Oscillation–like pattern, and a Scandinavian-like pattern have been identified. There is the suggestion that the NAO is an intrinsic mode of variability independent of the Pacific.
- Published
- 2009
18. Calibration of Probabilistic Forecasts of Binary Events
- Author
-
Ian T. Jolliffe, Christopher A. T. Ferro, Cristina Primo, and David B. Stephenson
- Subjects
Atmospheric Science, Computer science, Calibration (statistics), Statistics, Bayesian probability, Econometrics, Binary number, Consensus forecast, Logistic regression
- Abstract
Probabilistic forecasts of atmospheric variables are often given as relative frequencies obtained from ensembles of deterministic forecasts. The detrimental effects of imperfect models and initial conditions on the quality of such forecasts can be mitigated by calibration. This paper shows that Bayesian methods currently used to incorporate prior information can be written as special cases of a beta-binomial model and correspond to a linear calibration of the relative frequencies. These methods are compared with a nonlinear calibration technique (i.e., logistic regression) using real precipitation forecasts. Calibration is found to be advantageous in all cases considered, and logistic regression is preferable to linear methods.
- Published
- 2009
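Note: a hedged sketch of the nonlinear calibration compared in the paper, logistic regression of observed outcomes on raw ensemble relative frequencies; the data below are synthetic stand-ins, not the precipitation forecasts used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 500)                    # raw ensemble relative frequencies (synthetic)
true_p = np.clip(0.1 + 0.7 * x, 0, 1)         # assume the raw frequencies are miscalibrated
y = rng.random(500) < true_p                  # observed binary outcomes

model = LogisticRegression().fit(x.reshape(-1, 1), y)
calibrated = model.predict_proba(x.reshape(-1, 1))[:, 1]   # calibrated event probabilities
print(calibrated[:5])
```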
19. Spatial Weighting and Iterative Projection Methods for EOFs
- Author
-
Mark P. Baldwin, Ian T. Jolliffe, and David B. Stephenson
- Subjects
Atmospheric Science, Matrix (mathematics), Iterative method, Computer science, Empirical orthogonal functions, Data mining, Grid, Projection, Weighting
- Abstract
Often there is a need to consider spatial weighting in methods for finding spatial patterns in climate data. The focus of this paper is on techniques that maximize variance, such as empirical orthogonal functions (EOFs). A weighting matrix is introduced into a generalized framework for dealing with spatial weighting. One basic principle in the design of the weighting matrix is that the resulting spatial patterns are independent of the grid used to represent the data. A weighting matrix can also be used for other purposes, such as to compensate for the neglect of unrepresented subgrid-scale variance or, in the form of a prewhitening filter, to maximize the signal-to-noise ratio of EOFs. The new methodology is applicable to other types of climate pattern analysis, such as extended EOF analysis and maximum covariance analysis. The increasing availability of large datasets of three-dimensional gridded variables (e.g., reanalysis products and model output) raises special issues for data-reduction methods such as EOFs. Fast, memory-efficient methods are required in order to extract leading EOFs from such large datasets. This study proposes one such approach based on a simple iteration of successive projections of the data onto time series and spatial maps. It is also demonstrated that spatial weighting can be combined with the iterative methods. Throughout the paper, multivariate statistics notation is used, simplifying implementation as matrix commands in high-level computing languages.
- Published
- 2009
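Note: a simplified sketch of the two ideas in the abstract, spatial weighting and iterative projection, using synthetic anomalies; the sqrt(cos(latitude)) weights and power-iteration style loop are illustrative stand-ins for the generalised framework described in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
ntime, nspace = 200, 50
lat = np.linspace(-80, 80, nspace)             # latitude of each gridpoint (synthetic grid)
X = rng.normal(size=(ntime, nspace))
X -= X.mean(axis=0)                            # anomalies

w = np.sqrt(np.cos(np.deg2rad(lat)))           # area weights
Xw = X * w                                     # weighted data matrix

eof = rng.normal(size=nspace)                  # arbitrary starting pattern
for _ in range(100):                           # alternate projections onto time series and maps
    pc = Xw @ eof                              # project data onto the spatial pattern
    eof = Xw.T @ pc                            # project data onto the time series
    eof /= np.linalg.norm(eof)                 # renormalise; converges to the leading EOF

pc = Xw @ eof                                  # leading principal component time series
```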
20. Two Extra Components in the Brier Score Decomposition
- Author
-
Caio A. S. Coelho, Ian T. Jolliffe, and David B. Stephenson
- Subjects
Atmospheric Science, Brier score, Statistics, Econometrics, Forecast skill, Statistical model, Forecast verification, Reliability (statistics), Mathematics
- Abstract
The Brier score is widely used for the verification of probability forecasts. It also forms the basis of other frequently used probability scores such as the rank probability score. By conditioning (stratifying) on the issued forecast probabilities, the Brier score can be decomposed into the sum of three components: uncertainty, reliability, and resolution. This Brier score decomposition can provide useful information to the forecast provider about how the forecasts can be improved. Rather than stratify on all values of issued probability, it is common practice to calculate the Brier score components by first partitioning the issued probabilities into a small set of bins. This note shows that for such a procedure, an additional two within-bin components are needed in addition to the three traditional components of the Brier score. The two new components can be combined with the resolution component to make a generalized resolution component that is less sensitive to choice of bin width than is the traditional resolution component. The difference between the generalized resolution term and the conventional resolution term also quantifies how forecast skill is degraded when issuing categorized probabilities to users. The ideas are illustrated using an example of multimodel ensemble seasonal forecasts of equatorial sea surface temperatures.
- Published
- 2008
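Note: a sketch of the standard binned three-component decomposition (reliability, resolution, uncertainty) on synthetic forecasts; the small residual between the Brier score and the three-term sum is what the two extra within-bin components introduced in the paper account for.

```python
import numpy as np

rng = np.random.default_rng(4)
p = rng.random(1000)                        # issued probabilities (synthetic)
y = rng.random(1000) < p                    # binary outcomes

bs = np.mean((p - y) ** 2)                  # Brier score

bins = np.clip((p * 10).astype(int), 0, 9)  # partition issued probabilities into 10 bins
obar = y.mean()
rel = res = 0.0
for k in range(10):
    idx = bins == k
    if idx.any():
        nk = idx.sum()
        pk = p[idx].mean()                  # mean forecast probability in bin k
        ok = y[idx].mean()                  # observed relative frequency in bin k
        rel += nk * (pk - ok) ** 2
        res += nk * (ok - obar) ** 2
rel /= len(p); res /= len(p)
unc = obar * (1 - obar)

print(bs, rel - res + unc)                  # the discrepancy is the within-bin contribution
```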
21. Evaluating Rank Histograms Using Decompositions of the Chi-Square Test Statistic
- Author
-
Ian T. Jolliffe and Cristina Primo
- Subjects
Atmospheric Science, Ensemble forecasting, Histogram, Flatness, Statistics, Chi-square test, Ensemble simulation, Test statistic, Mathematics
- Abstract
Rank histograms are often plotted to evaluate the forecasts produced by an ensemble forecasting system—an ideal rank histogram is “flat” or uniform. It has been noted previously that the obvious test of “flatness,” the well-known χ2 goodness-of-fit test, spreads its power thinly and hence is not good at detecting specific alternatives to flatness, such as bias or over- or underdispersion. Members of the Cramér–von Mises family of tests do much better in this respect. An alternative to using the Cramér–von Mises family is to decompose the χ2 test statistic into components that correspond to specific alternatives. This approach is described in the present paper. It is arguably easier to use and more flexible than the Cramér–von Mises family of tests, and does at least as well as it in detecting alternatives corresponding to bias and over- or underdispersion.
- Published
- 2008
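Note: a sketch of the decomposition idea on an invented rank histogram: project the scaled deviations from flatness onto orthonormal linear (bias) and U-shaped (dispersion) contrasts, each squared projection being approximately chi-square with 1 degree of freedom under flatness. The contrast vectors here follow the general idea only; the exact contrasts used in the paper may differ in detail.

```python
import numpy as np

counts = np.array([30, 22, 18, 15, 14, 13, 14, 17, 25, 32])  # illustrative rank histogram
k, N = len(counts), counts.sum()
e = N / k                                   # expected count per bin under flatness
d = (counts - e) / np.sqrt(e)               # scaled deviations; sum(d**2) is the chi-square statistic

lin = np.arange(k) - (k - 1) / 2            # linear contrast: detects bias (sloped histogram)
lin /= np.linalg.norm(lin)
quad = lin**2 - np.mean(lin**2)             # U-shaped contrast: detects over/under-dispersion
quad /= np.linalg.norm(quad)

chi2_total = np.sum(d**2)
chi2_bias = (lin @ d) ** 2                  # ~ chi-square(1) under flatness
chi2_dispersion = (quad @ d) ** 2           # ~ chi-square(1) under flatness
print(chi2_total, chi2_bias, chi2_dispersion)
```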
22. Proper Scores for Probability Forecasts Can Never Be Equitable
- Author
-
Ian T. Jolliffe and David B. Stephenson
- Subjects
Atmospheric Science, Statistics, Econometrics, Forecast skill, Binary number, Consensus forecast, Forecast verification, Mathematics, Event (probability theory)
- Abstract
Verification is an important part of any forecasting system. It is usually achieved by computing the value of some measure or score that indicates how good the forecasts are. Many possible verification measures have been proposed, and to choose between them a number of desirable properties have been defined. For probability forecasts of a binary event, two of the best known of these properties are propriety and equitability. A proof that the two properties are incompatible for a wide class of verification measures is given in this paper, after briefly reviewing the two properties and some recent attempts to improve properties for the well-known Brier skill score.
- Published
- 2008
23. The impenetrable hedge: a note on propriety, equitability and consistency
- Author
-
Ian T. Jolliffe
- Subjects
Atmospheric Science, Consistency, Judgement, Econometrics, Economics, Weather and climate, Hedging, Forecast verification
- Abstract
In weather and climate forecasting, hedging is said to occur whenever a forecaster's judgement and the forecast differ, and it is usually taken as evident that hedging is undesirable. Forecasts are often judged by computing a verification measure or score. A number of different scores is available in most circumstances, and to choose between them, various desirable properties of scores have been defined. It is generally accepted that it is undesirable to use a score for which hedging can improve the score or its expected value. Three ‘desirable’ properties of scores are linked to the idea that hedging should be avoided, namely propriety, equitability and consistency. It is fair to say that none of these properties is fully understood. The aim of this article is to provide some clarification and new insights, as well as some historical background. Nearly as many questions are raised as are answered. Copyright © 2008 Royal Meteorological Society
- Published
- 2008
24. Uncertainty and Inference for Verification Measures
- Author
-
Ian T. Jolliffe
- Subjects
Atmospheric Science, Multiple comparisons problem, Statistics, Confidence distribution, Estimation statistics, Econometrics, Statistical inference, Prediction interval, Confidence interval, Confidence and prediction bands, Statistical hypothesis testing, Mathematics
- Abstract
When a forecast is assessed, a single value for a verification measure is often quoted. This is of limited use, as it needs to be complemented by some idea of the uncertainty associated with the value. If this uncertainty can be quantified, it is then possible to make statistical inferences based on the value observed. There are two main types of inference: confidence intervals can be constructed for an underlying “population” value of the measure, or hypotheses can be tested regarding the underlying value. This paper will review the main ideas of confidence intervals and hypothesis tests, together with the less well known “prediction intervals,” concentrating on aspects that are often poorly understood. Comparisons will be made between different methods of constructing confidence intervals—exact, asymptotic, bootstrap, and Bayesian—and the difference between prediction intervals and confidence intervals will be explained. For hypothesis testing, multiple testing will be briefly discussed, together with connections between hypothesis testing, prediction intervals, and confidence intervals.
- Published
- 2007
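Note: a sketch of one of the approaches reviewed, a percentile bootstrap confidence interval for a verification measure (here the proportion correct of a deterministic binary forecast); all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
obs = rng.random(200) < 0.3
fcst = np.where(rng.random(200) < 0.8, obs, ~obs)    # forecasts that are right 80% of the time

pc = np.mean(fcst == obs)                            # observed value of the measure
boot = []
for _ in range(2000):
    idx = rng.integers(0, 200, 200)                  # resample forecast/observation pairs
    boot.append(np.mean(fcst[idx] == obs[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(pc, (lo, hi))                                  # point estimate and 95% bootstrap interval
```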
25. DALASS: Variable selection in discriminant analysis via the LASSO
- Author
-
Nickolay T. Trendafilov and Ian T. Jolliffe
- Subjects
Statistics and Probability, Unit sphere, Mathematical optimization, Applied Mathematics, Feature selection, Linear discriminant analysis, Computational Mathematics, Computational Theory and Mathematics, Discriminant function analysis, Lasso (statistics), Penalty method, Mathematics
- Abstract
The objective of DALASS is to simplify the interpretation of Fisher's discriminant function coefficients. The DALASS problem, discriminant analysis (DA) modified so that the canonical variates satisfy the LASSO constraint, is formulated as a dynamical system on the unit sphere. Both standard and orthogonal canonical variates are considered. The globally convergent continuous-time algorithms are illustrated numerically and applied to some well-known data sets.
- Published
- 2007
26. Modelling seasonally varying data: A case study for Sudden Infant Death Syndrome (SIDS)
- Author
-
Ian T. Jolliffe, Peter Joseph Benedict Helms, and Jennifer Mooney
- Subjects
Statistics and Probability, Geography, Cosinor analysis, Statistics, Probability distribution, Statistics, Probability and Uncertainty, Seasonality, Sudden infant death syndrome, Regression, Demography
- Abstract
Many time series are measured monthly, either as averages or totals, and such data often exhibit seasonal variability – the values of the series are consistently larger for some months of the year than for others. A typical series of this type is the number of deaths each month attributed to SIDS (Sudden Infant Death Syndrome). Seasonality can be modelled in a number of ways. This paper describes and discusses various methods for modelling seasonality in SIDS data, though much of the discussion is relevant to other seasonally varying data. There are two main approaches, either fitting a circular probability distribution to the data, or using regression-based techniques to model the mean seasonal behaviour. Both are discussed in this paper.
- Published
- 2006
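Note: a minimal cosinor-style sketch, regressing monthly counts on sine and cosine terms with a 12-month period to estimate the amplitude and peak of the seasonal cycle; the counts below are invented for illustration, not SIDS data.

```python
import numpy as np

counts = np.array([58, 52, 47, 40, 35, 30, 28, 31, 37, 44, 50, 56], dtype=float)  # fake monthly counts
theta = 2 * np.pi * np.arange(12) / 12

X = np.column_stack([np.ones(12), np.cos(theta), np.sin(theta)])
beta, *_ = np.linalg.lstsq(X, counts, rcond=None)    # [mean level, cosine coeff, sine coeff]

amplitude = np.hypot(beta[1], beta[2])               # size of the seasonal swing
acrophase = np.arctan2(beta[2], beta[1])             # angle at which the fitted cycle peaks
peak_month = (acrophase / (2 * np.pi) * 12) % 12     # month index (0 = first month) of the peak
print(amplitude, peak_month)
```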
27. In search of simple structures in climate: simplifying EOFs
- Author
-
Nickolay T. Trendafilov, Abdelwaheb Hannachi, David B. Stephenson, and Ian T. Jolliffe
- Subjects
Atmospheric Science, Climate pattern, Meteorology, Climatology, Principal component analysis, Applied mathematics, Empirical orthogonal functions, Variance, Time series, Rotation (mathematics), Mathematics, Curse of dimensionality
- Abstract
Empirical orthogonal functions (EOFs) are widely used in climate research to identify dominant patterns of variability and to reduce the dimensionality of climate data. EOFs, however, can be difficult to interpret. Rotated empirical orthogonal functions (REOFs) have been proposed as more physical entities with simpler patterns than EOFs. This study presents a new approach for finding climate patterns with simple structures that overcomes the problems encountered with rotation. The method achieves simplicity of the patterns by using the main properties of EOFs and REOFs simultaneously. Orthogonal patterns that maximise variance subject to a constraint that induces a form of simplicity are found. The simplified empirical orthogonal function (SEOF) patterns, being more 'local', are constrained to have zero loadings outside the main centre of action. The method is applied to winter Northern Hemisphere (NH) monthly mean sea level pressure (SLP) reanalyses over the period 1948-2000. The 'simplified' leading patterns of variability are identified and compared to the leading patterns obtained from EOFs and REOFs. Copyright (C) 2005 Royal Meteorological Society.
- Published
- 2006
28. Projected gradient approach to the numerical solution of the SCoTLASS
- Author
-
Nickolay T. Trendafilov and Ian T. Jolliffe
- Subjects
Statistics and Probability, Shrinkage estimator, Unit sphere, Mathematical optimization, Applied Mathematics, Constrained optimization, Dynamical system, Computational Mathematics, Computational Theory and Mathematics, Lasso (statistics), Method of steepest descent, Penalty method, Mathematics
- Abstract
The SCoTLASS problem, principal component analysis modified so that the components satisfy the Least Absolute Shrinkage and Selection Operator (LASSO) constraint, is reformulated as a dynamical system on the unit sphere. The LASSO inequality constraint is tackled by an exterior penalty function. A globally convergent algorithm is developed based on the projected gradient approach. The algorithm is illustrated numerically and discussed on a well-known data set.
- Published
- 2006
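Note: a rough sketch of the flavour of the approach only: gradient ascent on w'Sw with an exterior penalty for the LASSO constraint and renormalisation to the unit sphere after each step. This simplifies the projected-gradient dynamical-system formulation in the paper; the bound, penalty weight and step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(100, 8))
S = np.corrcoef(A, rowvar=False)              # correlation matrix of 8 synthetic variables

t, mu, step = 2.0, 10.0, 0.01                 # LASSO bound, penalty weight, step size (arbitrary)
w = rng.normal(size=8)
w /= np.linalg.norm(w)

for _ in range(2000):
    grad = 2 * S @ w                          # gradient of the variance w'Sw
    excess = np.abs(w).sum() - t
    if excess > 0:                            # exterior penalty for the constraint ||w||_1 <= t
        grad -= 2 * mu * excess * np.sign(w)
    w += step * grad
    w /= np.linalg.norm(w)                    # project back onto the unit sphere

print(np.round(w, 3))                         # loadings of a sparse-ish leading component
```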
29. Comments on 'Discussion of Verification Concepts in Forecast Verification: A Practitioner’s Guide in Atmospheric Science'
- Author
-
Ian T. Jolliffe and David B. Stephenson
- Subjects
Atmospheric Science, Meteorology, Computer science, Forecast verification
- Published
- 2005
30. Variable selection for discriminant analysis of fish sounds using matrix correlations
- Author
-
Mark Wood, Graham W. Horgan, and Ian T. Jolliffe
- Subjects
Statistics and Probability, Multiple discriminant analysis, Multivariate statistics, Applied Mathematics, Feature selection, Pattern recognition, Linear discriminant analysis, Agricultural and Biological Sciences (miscellaneous), Canonical analysis, Optimal discriminant analysis, Principal component analysis, Statistics, Artificial intelligence, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, General Environmental Science, Mathematics
- Abstract
Discriminant analysis is a widely used multivariate technique. In some applications the number of variables available is very large and, as with other multivariate techniques, it is desirable to simplify matters by selecting a subset of the variables in such a way that little useful information is lost in doing so. Many methods have been suggested for variable selection in discriminant analysis; this article introduces a new one, based on matrix correlation, an idea that has proved useful in the context of principal component analysis. The method is illustrated on an example involving fish sounds. It is important to discriminate between the sounds made by different species of fish, and even by individual fish, but the nature of the data is such that many potential variables are available.
- Published
- 2005
31. Variable selection and interpretation in correlation principal components
- Author
-
Ian T. Jolliffe and Noriah M. Al-Kandari
- Subjects
Statistics and Probability, Multivariate statistics, Ecological Modeling, Dimensionality reduction, Feature selection, Interpretation, Data set, Principal component analysis, Data mining, Procrustes analysis, Mathematics
- Abstract
Principal component analysis (PCA) is a dimension-reducing tool that replaces the variables in a multivariate data set by a smaller number of derived variables. Dimension reduction is often undertaken to help in interpreting the data set but, as each principal component usually involves all the original variables, interpretation of a PCA can still be difficult. One way to overcome this difficulty is to select a subset of the original variables and use this subset to approximate the principal components. This article reviews a number of techniques for choosing subsets of the variables and examines their merits in terms of preserving the information in the PCA, and in aiding interpretation of the main sources of variation in the data. Copyright © 2005 John Wiley & Sons, Ltd.
- Published
- 2005
32. Seasonality of type 1 diabetes mellitus in children and its modification by weekends and holidays: retrospective observational study
- Author
-
Peter Joseph Benedict Helms, Ian T. Jolliffe, P Smail, and Jennifer Mooney
- Subjects
Male, Pediatrics, Adolescent, Age Distribution, Age groups, Diabetes mellitus, Epidemiology, Humans, Medicine, Sex Distribution, Date of birth, Child, Holidays, Retrospective Studies, Type 1 diabetes, Infant, Newborn, Infant, Retrospective cohort study, Seasonality, Diabetes Mellitus, Type 1, Scotland, Child, Preschool, Pediatrics, Perinatology and Child Health, Female, Seasons
- Abstract
Background: Diagnoses of type 1 insulin dependent diabetes mellitus are generally more common in winter, although this seasonal pattern has not been observed in children of preschool age (0–4 years) or in all countries. Aims: To confirm the persistence of seasonality and the influence of age, holidays, and weekends. Methods: We extracted data on date of birth, date of presentation, age, and sex of children diagnosed with diabetes and registered with the Scottish Study Group for the Care of Diabetes in the Young. Cosinor analysis was applied to monthly and mid-monthly data. Two sample Z tests were used to compare the epochs 1984–1992 and 1993–2001. Results: Some 4517 children between 0 and 14 years of age (2407 male and 2110 female) presented with IDDM between 1 January 1984 and 31 December 2001. Seasonality was evident in children above 4 years of age with amplitudes of 19.5–25.7% and peaks between mid December and mid January. Presentation was strongly influenced by weekends and holiday periods, with reduced presentations in December compared with November and January, and with the lowest presentations in July (the main Scottish holiday month). Using mid-month to mid-month data did not change the overall seasonality but did improve the fits for cosinor analysis. Mondays and Fridays were the most common days for presentation. Conclusion: Initial presentation of IDDM in Scotland follows a stable seasonal pattern in all but the youngest children with lower rates of presentation in holiday periods and at weekends for all age groups.
- Published
- 2004
33. Estimating common trends in multivariate time series using dynamic factor analysis
- Author
-
Jan J. Beukema, Rob Dekker, Ian T. Jolliffe, Alain F. Zuur, and R. J. Fryer
- Subjects
Statistics and Probability, Multivariate statistics, Multivariate analysis, Computer science, Ecological Modeling, Missing data, Dynamic factor analysis, Expectation–maximization algorithm, Statistics, Econometrics, Autoregressive integrated moving average
- Abstract
This article discusses dynamic factor analysis, a technique for estimating common trends in multivariate time series. Unlike more common time series techniques such as spectral analysis and ARIMA models, dynamic factor analysis can analyse short, non-stationary time series containing missing values. Typically, the parameters in dynamic factor analysis are estimated by direct optimization, which means that only small data sets can be analysed if computing time is not to become prohibitively long and the chances of obtaining sub-optimal estimates are to be avoided. This article shows how the parameters of dynamic factor analysis can be estimated using the EM algorithm, allowing larger data sets to be analysed. The technique is illustrated on a marine environmental data set. Copyright © 2003 John Wiley & Sons, Ltd.
- Published
- 2003
34. Fitting mixtures of von Mises distributions: a case study involving sudden infant death syndrome
- Author
-
Jennifer Mooney, Ian T. Jolliffe, and Peter Joseph Benedict Helms
- Subjects
Statistics and Probability, Uniform distribution (continuous), Applied Mathematics, Population, Northern Ireland, Sudden infant death syndrome, Computational Mathematics, Computational Theory and Mathematics, Statistics, von Mises distribution, Mathematics
- Abstract
Sudden infant death syndrome (SIDS) exhibits a seasonal pattern with a winter peak. This pattern is not symmetric. It rises rapidly to a winter peak before falling more slowly to a dip in the summer. It has been suggested that the relatively flat peak may be due to the presence of more than one population, where each population corresponds to a different cause of SIDS. Various models based on the von Mises distribution are fitted to monthly data for England, Wales, Scotland and Northern Ireland for the years 1983-1998, including a single von Mises distribution, a mixture of a von Mises and a uniform distribution and a mixture of two von Mises distributions. There are a number of ways of fitting such models (Fisher, 1993; Spurr and Koutbeiy, 1991). Various computational problems arise with the fitting procedures. Attempts to tackle these problems for the SIDS data are discussed. A bootstrap likelihood ratio approach (Polymenis and Titterington, 1998) is used to assess how many components are required in the model. Its properties are investigated by simulation. The improvement in fit of two components compared to one is not significant in most years, and hence there is little evidence of two populations in the seasonality of SIDS. In most years, it was also impossible to fit a mixture of von Mises and uniform components, with a single von Mises distribution being sufficient.
- Published
- 2003
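Note: a sketch of the simplest model considered, a single von Mises distribution fitted to dates converted to angles on the yearly circle; the sample below is simulated rather than SIDS data, and the mixture models in the paper would be fitted by EM with a bootstrap likelihood-ratio test for the number of components.

```python
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(7)
angles = vonmises.rvs(kappa=1.2, loc=0.3, size=500, random_state=rng)  # simulated "dates" as angles

kappa, loc, scale = vonmises.fit(angles, fscale=1)   # scale fixed at 1 for a circular fit
print(kappa, loc)                                     # concentration and direction of the seasonal peak
```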
35. Does the North Atlantic current affect spatial distribution of whiting? Testing environmental hypotheses using statistical and GIS techniques
- Author
-
David G. Reid, Graham J. Pierce, Ian T. Jolliffe, and Xiaohong Zheng
- Subjects
Ecology, Generalized additive model, Aquatic Science, Oceanography, Spatial distribution, Whiting, Sea surface temperature, Merlangius merlangus, Abundance (ecology), Spatial ecology, Environmental science, Ecology, Evolution, Behavior and Systematics
- Abstract
This paper describes spatial relationships between whiting, Merlangius merlangus (Linnaeus, 1758), abundance in the northern North Sea and contemporaneous measures of environmental conditions: sea surface temperature (SST), sea bottom temperature (SBT), and depth, with particular reference to the processes underlying patterns in SST. Generalised additive models (GAMs) were used to provide quantitative descriptions of the relationships between local abundance and environmental conditions. GIS (geographic information system) techniques were used to provide qualitative description of spatial patterns and to confirm the results revealed from GAMs. GAMs fitted to both long-term averaged and individual years’ data revealed marked seasonal changes in the spatial relationships between whiting abundance and environmental variables. The GAM results were supported by GIS analysis. In winter and spring (December–April) in the northern North Sea, the spatial pattern of SST apparently has an important influence on the spatial distribution of whiting at the same time. Where the water is relatively warm whiting abundance is relatively high, probably reflecting the indirect influence of North Atlantic waters entering the northern North Sea. However, there are no consistent optimum SST bands for whiting. These positive relationships between abundance and SST disappear in summer.
- Published
- 2002
36. Simplified EOFs-three alternatives to rotation
- Author
-
Ian T. Jolliffe, S. K. Vines, and Mudassir Uddin
- Subjects
Data set, Atmospheric Science, Geography, Lasso (statistics), Principal component analysis, Statistics, Environmental Chemistry, Empirical orthogonal functions, Algorithm, Rotation (mathematics), General Environmental Science, Interpretation
- Abstract
Principal component analysis (PCA) is widely used in atmospheric science, and the resulting empirical orthogonal functions (EOFs) are often rotated to aid interpretation. In this paper 3 methods are described which provide alternatives to the standard 2-stage procedure of PCA followed by rotation. The techniques are illustrated on a small example involving sea-surface temperatures in the Mediterranean. Each method is shown to give different simplified interpretations for the major sources of variation in the data set. All 3 techniques have advantages over standard rotation.
- Published
- 2002
37. Principal Component Analysis.
- Author
-
Ian T. Jolliffe
- Published
- 2011
38. Principal Component Analysis
- Author
-
Ian T. Jolliffe
- Subjects
Normalization (statistics), Multivariate statistics, Multivariate analysis, Dimensionality reduction, Statistics, Principal component analysis, Sparse PCA, Maximization, Covariance, Linear combination, Mathematics
- Abstract
When large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality. Principal component analysis is one technique for doing this. It replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with q very much smaller than p. Despite its apparent simplicity, principal component analysis has a number of subtleties, and it has many uses and extensions. A number of choices associated with the technique are briefly discussed, namely, covariance or correlation, how many components, and different normalization constraints, as well as confusion with factor analysis. Various uses and extensions are outlined. Keywords: dimension reduction; factor analysis; multivariate analysis; variance maximization
- Published
- 2014
39. Eigenvalues and eigenvectors in statistics
- Author
-
Ian T. Jolliffe
- Subjects
Optimization problem, Multivariate analysis, Multivariate analysis of variance, Statistics, Multivariate statistics, Eigenvalues and eigenvectors, Mathematics
- Abstract
The words eigenvalue and eigenvector often appear in computer output for multivariate statistical techniques. An explanation is given of what is meant by these terms for some specific techniques and more generally. Keywords: matrices; multivariate analysis; optimization problems
- Published
- 2014
40. Concepts for benchmarking of homogenisation algorithm performance on the global scale
- Author
-
Claude N. Williams, Victor Venema, Zeke Hausfather, Ian T. Jolliffe, Robert Lund, M. J. Menne, Lisa V. Alexander, Steve Easterbrook, David I. Berry, Enric Aguilar, Peter Thorne, S. Brönniman, Thordis L. Thorarinsdottir, Renate Auchmann, K. M. Willett, Rachel Warren, Colin M. Gallagher, Giuseppina Lopardo, and Lucie A. Vincent
- Subjects
Environmental science, Benchmarking
- Abstract
The International Surface Temperature Initiative (ISTI) is striving towards substantively improving our ability to robustly understand historical land surface air temperature change at all scales. A key recently completed first step has been collating all available records into a comprehensive open access, traceable and version-controlled databank. The crucial next step is to maximise the value of the collated data through a robust international framework of benchmarking and assessment for product intercomparison and uncertainty estimation. We focus on uncertainties arising from the presence of inhomogeneities in monthly surface temperature data and the varied methodological choices made by various groups in building homogeneous temperature products. The central facet of the benchmarking process is the creation of global scale synthetic analogs to the real-world database where both the "true" series and inhomogeneities are known (a luxury the real world data do not afford us). Hence algorithmic strengths and weaknesses can be meaningfully quantified and conditional inferences made about the real-world climate system. Here we discuss the necessary framework for developing an international homogenisation benchmarking system on the global scale for monthly mean temperatures. The value of this framework is critically dependent upon the number of groups taking part and so we strongly advocate involvement in the benchmarking exercise from as many data analyst groups as possible to make the best use of this substantial effort.
- Published
- 2014
41. A framework for benchmarking of homogenisation algorithm performance on the global scale
- Author
-
Robert Lund, Lucie A. Vincent, David I. Berry, Steve Easterbrook, Giuseppina Lopardo, Claude N. Williams, Ian T. Jolliffe, Colin M. Gallagher, Thordis L. Thorarinsdottir, Lisa V. Alexander, Enric Aguilar, Peter Thorne, M. J. Menne, Zeke Hausfather, Victor Venema, Renate Auchmann, Rachel Warren, Stefan Brönnimann, and Kate M. Willett
- Subjects
Atmospheric Science, Geology, Benchmarking, Oceanography, Data science, Geography, Homogeneous, Data mining
- Abstract
The International Surface Temperature Initiative (ISTI) is striving towards substantively improving our ability to robustly understand historical land surface air temperature change at all scales. A key recently completed first step has been collating all available records into a comprehensive open access, traceable and version-controlled databank. The crucial next step is to maximise the value of the collated data through a robust international framework of benchmarking and assessment for product intercomparison and uncertainty estimation. We focus on uncertainties arising from the presence of inhomogeneities in monthly mean land surface temperature data and the varied methodological choices made by various groups in building homogeneous temperature products. The central facet of the benchmarking process is the creation of global-scale synthetic analogues to the real-world database where both the "true" series and inhomogeneities are known (a luxury the real-world data do not afford us). Hence, algorithmic strengths and weaknesses can be meaningfully quantified and conditional inferences made about the real-world climate system. Here we discuss the necessary framework for developing an international homogenisation benchmarking system on the global scale for monthly mean temperatures. The value of this framework is critically dependent upon the number of groups taking part and so we strongly advocate involvement in the benchmarking exercise from as many data analyst groups as possible to make the best use of this substantial effort.
- Published
- 2014
42. A Comparison of Multivariate Outlier Detection Methods for Clinical Laboratory Safety Data
- Author
-
Ian T. Jolliffe and Kay I Penny
- Subjects
Statistics and Probability, Multivariate statistics, Multivariate analysis, Computer science, Multivariate normal distribution, Data set, Outlier, Laboratory safety, Data mining
- Abstract
Summary. During a clinical trial of a new treatment, a large number of variables are measured to monitor the safety of the treatment. It is important to detect outlying observations which may indicate that something abnormal is happening. To do this effectively, techniques are needed for finding multivariate outliers. Six techniques of this sort are described and illustrated on a typical laboratory safety data set. Their properties are investigated more thoroughly by means of a simulation study. The results show that some methods do better than others depending on whether or not the data set is multivariate normal, the dimension of the data set, the type of outlier, the proportion of outliers in a data set and the degree of contamination, i.e. 'outlyingness'. The results indicate that it is desirable to run a battery of multivariate methods on a particular data set in an attempt to highlight possible outliers.
- Published
- 2001
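Note: a minimal sketch of one distance-based technique of the kind compared in the study, the squared Mahalanobis distance with a chi-square cut-off, on synthetic data with a single planted multivariate outlier (the paper compares six methods; this is not its code).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
X = rng.multivariate_normal(np.zeros(4), np.eye(4), size=200)   # 200 patients, 4 laboratory variables
X[0] = [6.0, -5.0, 4.0, -6.0]                                   # plant an obvious multivariate outlier

centre = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', X - centre, cov_inv, X - centre)  # squared Mahalanobis distances

cutoff = chi2.ppf(0.999, df=4)                                  # flag only very extreme cases
print(np.where(d2 > cutoff)[0])                                 # indices of flagged observations
```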
43. VARIABLE SELECTION AND INTERPRETATION OF COVARIANCE PRINCIPAL COMPONENTS
- Author
-
Noriah M. Al-Kandari and Ian T. Jolliffe
- Subjects
Statistics and Probability, Modeling and Simulation, Principal component analysis, Statistics, Closeness, Feature selection, Covariance, Procrustes analysis, Linear combination, Correspondence analysis, Mathematics
- Abstract
In practice, when a principal component analysis is applied on a large number of variables the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.
- Published
- 2001
44. Variable selection and the interpretation of principal subspaces
- Author
-
Ian T. Jolliffe and Jorge Cadima
- Subjects
Statistics and Probability, Variables, Applied Mathematics, Dimensionality reduction, Design matrix, Feature selection, Agricultural and Biological Sciences (miscellaneous), Exploratory factor analysis, Principal component analysis, Statistics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, General Environmental Science, Mathematics, Factor analysis
- Abstract
Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.
- Published
- 2001
45. Early detection of the start of the wet season in semiarid tropical climates of western Africa
- Author
-
Doris E.S. Dodd and Ian T. Jolliffe
- Subjects
Wet season, Atmospheric Science, Geography, Climatology, Tropical climate, Early detection, Linear discriminant analysis, West Africa
- Abstract
An earlier paper (Jolliffe IT, Sarria-Dodd DE. 1994. International Journal of Climatology 14: 71–76) investigated the problem of deciding when the wet season has started in tropical and sub-tropical climates. In particular, methodology based on linear discriminant analysis was developed for using data from the current season to make the decision, rather than relying only on information from previous seasons. It was shown, for three stations in eastern Africa, that the methodology was potentially valuable. The present study is much larger, using data from 24 stations, covering a range of annual rainfall totals, in western Africa. It is confirmed that linear discriminant analysis can indeed be useful in detecting when the wet season has started, and hence in deciding when to plant crops. As well as being a larger analysis than that reported previously, the present study also extends the previous work by investigating an alternative definition of the start of the wet season and by including ‘date’ as a potential explanatory variable. Copyright © 2001 Royal Meteorological Society.
- Published
- 2001
46. Multivariate outlier detection applied to multiply imputed laboratory data
- Author
-
Kay I Penny and Ian T. Jolliffe
- Subjects
Statistics and Probability, Mahalanobis distance, Multivariate analysis, Epidemiology, Computer science, Missing data, Data set, Outlier, Statistics, Anomaly detection, Data mining, Imputation (statistics), Statistical hypothesis testing
- Abstract
In clinical laboratory safety data, multivariate outlier detection methods may highlight a patient whose laboratory measurements do not follow the same pattern of relationships as the majority of patients, although their individual measurements are not found to be outlying when considered one at a time. Missing data problems are often dealt with by imputing a single value as an estimate of the missing value. The completed data set may then be analysed using traditional methods. A disadvantage of using single imputation is the underestimation of variability, with a corresponding distortion of power in hypothesis testing. Multiple imputation methods attempt to overcome this problem, and in this paper a study is described which considers the application of multivariate outlier detection methods to multiply imputed clinical laboratory safety data sets. Three different proportions of missing data are generated in laboratory data sets of dimensions 4, 7, 12 and 30, and a comparison of eight multiple imputation methods is carried out. Two outlier detection techniques, Mahalanobis distance and generalized principal component analysis, are applied to the multiply imputed data sets, and their performances are discussed. Measures are introduced for assessing the accuracy of the missing data results, depending on which method of analysis is used.
- Published
- 1999
47. The Dice co-efficient: a neglected verification performance measure for deterministic forecasts of binary events
- Author
-
Ian T. Jolliffe
- Subjects
Atmospheric Science, Computer science, Binary number, Dice coefficient, Data mining
- Published
- 2015
48. Time series modelling of surface pressure data
- Author
-
Shafeeqah Al-Awadhi and Ian T. Jolliffe
- Subjects
Atmospheric Science, Transformation, Autoregressive model, Statistics, Barograph, Autoregressive moving-average model, Surface pressure, Mathematics
- Abstract
In this paper we examine time series modelling of surface pressure data, as measured by a barograph, at Herne Bay, England, during the years 1981‐1989. Autoregressive moving average (ARMA) models have been popular in many fields over the past 20 years, although applications in climatology have been rather less widespread than in some disciplines. Some recent examples are Milionis and Davies (Int. J. Climatol., 14, 569‐579) and Seleshi et al. (Int. J. Climatol., 14, 911‐923). We fit standard ARMA models to the pressure data separately for each of six 2-month natural seasons. Differences between the best fitting models for different seasons are discussed. Barograph data are recorded continuously, whereas ARMA models are fitted to discretely recorded data. The effect of different spacings between the fitted data on the models chosen is discussed briefly. Often, ARMA models can give a parsimonious and interpretable representation of a time series, but for many series the assumptions underlying such models are not fully satisfied, and more complex models may be considered. A specific feature of surface pressure data in the UK is that its behaviour is different at high and at low pressures: day-to-day changes are typically larger at low pressure levels than at higher levels. This means that standard assumptions used in fitting ARMA models are not valid, and two ways of overcoming this problem are investigated. Transformation of the data to better satisfy the usual assumptions is considered, as is the use of non-linear, specifically threshold autoregressive (TAR), models. © 1998 Royal Meteorological Society.
- Published
- 1998
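Note: a sketch of fitting an ARMA model (an ARIMA with d = 0 in statsmodels) to a simulated pressure-anomaly series; the order (2, 0, 1) is only an example, whereas the paper selects orders separately for each two-month season and also considers transformations and threshold AR models.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
e = rng.normal(size=400)
y = np.zeros(400)
for i in range(2, 400):                       # simulate an AR(2)-like anomaly series
    y[i] = 0.7 * y[i - 1] - 0.2 * y[i - 2] + e[i]

fit = ARIMA(y, order=(2, 0, 1)).fit()         # ARMA(2,1) fitted by maximum likelihood
print(fit.summary())
```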
49. Assessment of descriptive weather forecasts
- Author
-
Nils Jolliffe and Ian T. Jolliffe
- Subjects
Atmospheric Science, Meteorology, Environmental science
- Published
- 1997
50. Variable selection and interpretation in canonical correlation analysis
- Author
-
Ian T. Jolliffe and Noriah M. Al-Kandari
- Subjects
Statistics and Probability, Reification (statistics), Canonical correspondence analysis, Modeling and Simulation, Linear regression, Statistics, Feature selection, Canonical correlation, Mathematics, Canonical analysis
- Abstract
The canonical variates in canonical correlation analysis are often interpreted by looking at the weights or loadings of the variables in each canonical variate and effectively ignoring those variables whose weights or loadings are small. It is shown that such a procedure can be misleading. The related problem of selecting a subset of the original variables which preserves the information in the most important canonical variates is also examined. Because of different possible definitions of ‘the information in canonical variates’, any such subset selection needs very careful consideration.
- Published
- 1997