157 results on '"Charles C Taylor"'
Search Results
2. Interval forecasts based on regression trees for streaming data.
- Author
-
Xin Zhao, Stuart Barber, Charles C. Taylor, and Zoka Milan
- Published
- 2021
- Full Text
- View/download PDF
3. A New Approach to Measuring Distances in Dense Graphs.
- Author
-
Fatimah A. Almulhim, Peter A. Thwaites, and Charles C. Taylor
- Published
- 2018
- Full Text
- View/download PDF
4. Sparse modelling of cancer patients' survival based on genomic copy number alterations.
- Author
-
Khaled Alqahtani, Charles C. Taylor, Henry M. Wood, and Arief Gusnanto
- Published
- 2022
- Full Text
- View/download PDF
5. Classification tree methods for panel data using wavelet-transformed time series.
- Author
-
Xin Zhao, Stuart Barber, Charles C. Taylor, and Zoka Milan
- Published
- 2018
- Full Text
- View/download PDF
6. Cross-validation is safe to use.
- Author
-
Ross D. King, Oghenejokpeme I. Orhobor, and Charles C. Taylor
- Published
- 2021
- Full Text
- View/download PDF
7. Kernel regression for errors-in-variables problems in the circular domain
- Author
-
Marco Di Marzio, Stefania Fensore, and Charles C. Taylor
- Subjects
Statistics and Probability; Statistics, Probability and Uncertainty
- Abstract
We study the problem of estimating a regression function when the predictor and/or the response are circular random variables in the presence of measurement errors. We propose estimators whose weight functions are deconvolution kernels defined according to the nature of the involved variables. We derive the asymptotic properties of the proposed estimators and consider possible generalizations and extensions. We provide some simulation results and a real data case study to illustrate and compare the proposed methods.
- Published
- 2023
- Full Text
- View/download PDF
8. Estimating optimal window size for analysis of low-coverage next-generation sequence data.
- Author
-
Arief Gusnanto, Charles C. Taylor, Ibrahim Nafisah, Henry M. Wood, Pamela Rabbitts, and Stefano Berri
- Published
- 2014
- Full Text
- View/download PDF
9. The package: nonparametric regression using local rotation matrices in
- Author
-
Giovanni Lafratta, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability; 021103 operations research; Applied Mathematics; 0211 other engineering and technologies; Nonparametric statistics; 02 engineering and technology; Rotation matrix; 01 natural sciences; Regression; Bias reduction; Nonparametric regression; 010104 statistics & probability; Modeling and Simulation; Statistics; Singular value decomposition; 0101 mathematics; Statistics, Probability and Uncertainty; MIT License; Mathematics
- Abstract
The package implements nonparametric (smooth) regression for spherical data in R, and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the MIT License. It can be use...
- Published
- 2021
- Full Text
- View/download PDF
10. Spatio-temporal forecasting using wavelet transform-based decision trees with application to air quality and covid-19 forecasting
- Author
-
Xin Zhao, Stuart Barber, Charles C Taylor, Xiaokai Nie, and Wenqian Shen
- Subjects
Statistics and Probability; Articles; Statistics, Probability and Uncertainty
- Abstract
We develop a new method that combines a decision tree with a wavelet transform to forecast time series data with spatial spillover effects. The method can not only improve prediction but also give good interpretability of the time series mechanism. As a feature exploration method, the wavelet transform represents information at different resolution levels, which may improve the performance of decision trees. The method is applied to simulated data, air pollution and COVID time series data sets. In the simulation, Haar, LA8, D4 and D6 wavelets are compared, with the Haar wavelet having the best performance. In the air pollution application, by using wavelet transform-based decision trees, the temporal effect of the air quality index, including autoregressive and seasonal effects, can be described as well as the spatial correlation effect. To describe the spillover spatial effect in contiguous regions, a spatial weight is constructed to improve the modeling performance. The results show that the air quality index has autoregressive, seasonal and spatial spillover effects. The wavelet-transformed variables have better forecasting performance and greater interpretability than the original variables. For the COVID time series of cumulative cases, spatial weighted variables are not selected, which shows the lock-down policies are truly effective.
- Published
- 2022
11. A comparison of block and semi-parametric bootstrap methods for variance estimation in spatial statistics.
- Author
-
N. Iranpanah, Mohsen Mohammadzadeh, and Charles C. Taylor
- Published
- 2011
- Full Text
- View/download PDF
12. Properties and approximate p-value calculation of the Cramer test
- Author
-
Arief Gusnanto, Charles C. Taylor, Alison Telford, and Henry M. Wood
- Subjects
Statistics and Probability; Anderson–Darling test; Applied Mathematics; Cumulative distribution function; Variance (accounting); Test (assessment); Distribution (mathematics); Modeling and Simulation; Cramér–von Mises criterion; Statistics; p-value; Statistics, Probability and Uncertainty; Null hypothesis; Mathematics
- Abstract
Two-sample tests are probably the most commonly used tests in statistics. These tests generally address one aspect of the samples' distribution, such as mean or variance. When the null hypothesis is that two distributions are equal, the Anderson–Darling (AD) test, which is developed from the Cramér–von Mises (CvM) test, is generally employed. Unfortunately, we find that the AD test often fails to identify true differences when the differences are complex: they are not only in terms of mean, variance and/or skewness but also in terms of multi-modality. In such cases, we find that the Cramer test, a modification of the CvM test, performs well. However, the adoption of the Cramer test in routine analysis is hindered by the fact that the mean, variance and skewness of the test statistic are not available, which creates a problem in calculating the associated p-value. For this purpose, we propose a new method for obtaining a p-value by approximating the distribution of the test statistic by a generalized Pareto distribution. By approximating the distribution in this way, the calculation of the p-value is much faster than, e.g., the bootstrap method, especially for large n. We have observed that this approximation enables the Cramer test to have proper control of type-I error. A simulation study indicates that the Cramer test is as powerful as other tests in simple cases and more powerful in more complicated cases.
- Published
- 2020
- Full Text
- View/download PDF
13. Density estimation for circular data observed with errors
- Author
-
Charles C. Taylor, Stefania Fensore, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability; General Immunology and Microbiology; Applied Mathematics; Estimator; General Medicine; Density estimation; General Biochemistry, Genetics and Molecular Biology; Bias; Simple (abstract algebra); Kernel (statistics); Computer Simulation; Deconvolution; General Agricultural and Biological Sciences; Equivalence (measure theory); Fourier series; Algorithm; Smoothing; Mathematics
- Abstract
Until now the problem of estimating circular densities when data are observed with errors has been mainly treated by Fourier series methods. We propose kernel-based estimators exhibiting simple construction and easy implementation. Specifically, we consider three different approaches: the first one is based on the equivalence between kernel estimators using data corrupted with different levels of error. This proposal appears to be totally unexplored, despite its potential for application also in the Euclidean setting. The second approach relies on estimators whose weight functions are circular deconvolution kernels. Due to the periodicity of the involved densities, it requires ad hoc mathematical tools. Finally, the third one is based on the idea of correcting extra bias of kernel estimators which use contaminated data and is essentially an adaptation of the standard theory to the circular case. For all the proposed estimators, we derive asymptotic properties, provide some simulation results, and also discuss some possible generalizations and extensions. Real data case studies are also included.
- Published
- 2022
14. Evaluating Usefulness for Dynamic Classification.
- Author
-
Gholamreza Nakhaeizadeh, Charles C. Taylor, and Carsten Lanquillon
- Published
- 1998
15. Automatic bandwidth selection for circular density estimation.
- Author
-
Charles C. Taylor
- Published
- 2008
- Full Text
- View/download PDF
16. Learning in Dynamically Changing Domains: Theory Revision and Context Dependence Issues.
- Author
-
Charles C. Taylor and Gholamreza Nakhaeizadeh
- Published
- 1997
- Full Text
- View/download PDF
17. Statistical Aspects of Classification in Drifting Populations.
- Author
-
Charles C. Taylor, Gholamreza Nakhaeizadeh, and G. Kunisch
- Published
- 1997
18. The Poisson Index: a new probabilistic model for protein-ligand binding site similarity.
- Author
-
J. R. Davies, Richard M. Jackson, Kanti V. Mardia, and Charles C. Taylor
- Published
- 2007
- Full Text
- View/download PDF
19. Classification of type I-censored bivariate data.
- Author
-
Matthew J. Langdon, Charles C. Taylor, and Robert M. West
- Published
- 2007
- Full Text
- View/download PDF
20. Hierarchical Bayesian modelling of spatial age-dependent mortality.
- Author
-
N. Miklós Arató, Ian L. Dryden, and Charles C. Taylor
- Published
- 2006
- Full Text
- View/download PDF
21. An Understanding of Muscle Fibre Images.
- Author
-
Charles C. Taylor, Mohammed Reza Faghihi, and Ian L. Dryden
- Published
- 1995
- Full Text
- View/download PDF
22. On boosting kernel density methods for multivariate data: density estimation and classification.
- Author
-
Marco Di Marzio and Charles C. Taylor
- Published
- 2005
- Full Text
- View/download PDF
23. Kernel density classification and boosting: an L2 analysis.
- Author
-
Marco Di Marzio and Charles C. Taylor
- Published
- 2005
- Full Text
- View/download PDF
24. Chain plot: a tool for exploiting bivariate temporal structures.
- Author
-
Charles C. Taylor and András Zempléni
- Published
- 2004
- Full Text
- View/download PDF
25. Statistical Methods in Learning.
- Author
-
A. Sutherland, Bob Henery, Rafael Molina 0001, Charles C. Taylor, and Ross D. King
- Published
- 1992
- Full Text
- View/download PDF
26. Procrustes shape analysis of triangulations of a two coloured point pattern.
- Author
-
Mohammed Reza Faghihi, Charles C. Taylor, and Ian L. Dryden
- Published
- 1999
- Full Text
- View/download PDF
27. Interval forecasts based on regression trees for streaming data
- Author
-
Stuart Barber, Charles C. Taylor, Zoka Milan, and Xin Zhao
- Subjects
Statistics and Probability; Computer science; Test data generation; Applied Mathematics; Autoregressive conditional heteroskedasticity; CPU time; Inference; 02 engineering and technology; Interval (mathematics); 01 natural sciences; Regression; Computer Science Applications; 010104 statistics & probability; Tree (data structure); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Autoregressive integrated moving average; 0101 mathematics; Algorithm
- Abstract
In forecasting, we often require interval forecasts instead of just a specific point forecast. To track streaming data effectively, this interval forecast should reliably cover the observed data and yet be as narrow as possible. To achieve this, we propose two methods based on regression trees: one ensemble method and one method based on a single tree. For the ensemble method, we use weighted results from the most recent models, and for the single-tree method, we retain one model until it becomes necessary to train a new model. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture long time variable information and conditional inference trees for the underlying regression tree model. Results show that both methods perform well, having good coverage without the intervals being excessively wide. When the underlying data generation mechanism changes, their performance is initially affected but can recover relatively quickly as time proceeds. The method based on a single tree performs the best in computational (CPU) time compared to the ensemble method. When compared to ARIMA and GARCH modelling, our methods achieve better or similar coverage and width but require considerably less CPU time.
- Published
- 2019
- Full Text
- View/download PDF
28. Fluid shear stress stimulates breast cancer cells to display invasive and chemoresistant phenotypes while upregulating PLAU in a 3D bioreactor
- Author
-
Caymen Novak, Catherine Z. Liu, Eric N. Horst, Charles C. Taylor, and Geeta Mehta
- Subjects
0106 biological sciences; 0301 basic medicine; Breast Neoplasms; Bioengineering; 01 natural sciences; Applied Microbiology and Biotechnology; Article; Metastasis; Extracellular matrix; 03 medical and health sciences; Bioreactors; Breast cancer; Downregulation and upregulation; 010608 biotechnology; medicine; Shear stress; Humans; Neoplasm Invasiveness; Mechanotransduction; Tumor microenvironment; Chemistry; Membrane Proteins; medicine.disease; Neoplasm Proteins; Up-Regulation; Gene Expression Regulation, Neoplastic; 030104 developmental biology; Drug Resistance, Neoplasm; Cancer cell; MCF-7 Cells; Cancer research; Female; Stress, Mechanical; Shear Strength; Biotechnology
- Abstract
Breast cancer cells experience a range of shear stresses in the tumor microenvironment (TME). However, most current in vitro three-dimensional (3D) models fail to systematically probe the effects of these biophysical stimuli on cancer cell metastasis, proliferation and chemoresistance. To investigate the roles of shear stress within the mammary and lung pleural effusion TME, a bioreactor capable of applying shear stress to cells within a 3D extracellular matrix was designed and characterized. Breast cancer cells were encapsulated within an interpenetrating network (IPN) hydrogel and subjected to shear stress of 5.4 dynes cm⁻² for 72 hours. Finite element modeling assessed shear stress profiles within the bioreactor. Cells exposed to shear stress had significantly higher cellular area and significantly lower circularity, indicating a motile phenotype. Stimulated cells were more proliferative than static controls and showed higher rates of chemoresistance to the anti-neoplastic drug paclitaxel. Fluid shear stress induced significant upregulation of the PLAU gene, and elevated urokinase activity was confirmed through zymography and activity assay. Overall, these results indicate that pulsatile shear stress promotes breast cancer cell proliferation, invasive potential, chemoresistance, and PLAU signaling.
- Published
- 2019
- Full Text
- View/download PDF
29. Kernel Circular Deconvolution Density Estimation
- Author
-
Marco Di Marzio, Stefania Fensore, Charles C. Taylor, and Agnese Panzera
- Subjects
Observational error; Kernel (statistics); Euclidean geometry; Estimator; Applied mathematics; Deconvolution; Density estimation; Data application; Mathematics
- Abstract
We consider the problem of nonparametrically estimating a circular density from data contaminated by angular measurement errors. Specifically, we obtain a kernel-type estimator with weight functions that are reminiscent of deconvolution kernels. Here, differently from the Euclidean setting, discrete Fourier coefficients are involved rather than characteristic functions. We provide some simulation results along with a real data application.
- Published
- 2020
- Full Text
- View/download PDF
30. A New Approach to Measuring Distances in Dense Graphs
- Author
-
Charles C. Taylor, Peter A. Thwaites, and Fatimah A. Almulhim
- Subjects
Discrete mathematics; Computer science; k-means clustering; Graph theory; 01 natural sciences; Graph; 010305 fluids & plasmas; Hierarchical clustering; Vertex (geometry); Search algorithm; 0103 physical sciences; Adjacency matrix; 010306 general physics; Cluster analysis; MathematicsofComputing_DISCRETEMATHEMATICS
- Abstract
The problem of computing distances and shortest paths between vertices in graphs is one of the fundamental issues in graph theory. It is of great importance in many different applications, for example, transportation and social network analysis. However, efficient shortest distance algorithms are still desired in many disciplines. Basically, the majority of dense graphs have ties between the shortest distances. Therefore, we consider a different approach and introduce a new measure to solve all-pairs shortest paths for undirected and unweighted graphs. This measures the shortest distance between any two vertices by considering the length and the number of all possible paths between them. The main aim of this new approach is to break the ties between equal shortest paths (SP), which can be obtained by the breadth-first search (BFS) algorithm, and distinguish meaningfully between these equal distances. Moreover, using the new measure in clustering produces higher quality results compared with SP. In our study, we apply two different clustering techniques, hierarchical clustering and K-means clustering, with four different graph models, and for various numbers of clusters. We compare the results using a modularity function to check the quality of our clustering results.
- Published
- 2019
- Full Text
- View/download PDF
31. Kernel density classification for spherical data
- Author
-
Agnese Panzera, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability; 010104 statistics & probability; Field (physics); Global climate; 010102 general mathematics; Kernel density estimation; Nonparametric statistics; Applied mathematics; Decision rule; 0101 mathematics; Statistics, Probability and Uncertainty; 01 natural sciences; Mathematics
- Abstract
Classifying observations coming from two different spherical populations by using a nonparametric method appears to be an unexplored field, although clearly worth pursuing. We propose some decision rules based on spherical kernel density estimation and we provide asymptotic L2 properties. A real-data application using global climate data is finally discussed.
- Published
- 2019
32. Geometry-based distance for clustering amino acids
- Author
-
Arief Gusnanto, Charles C. Taylor, and Samira F. Abushilah
- Subjects
Statistics and Probability; chemistry.chemical_classification; Quantitative Biology::Biomolecules; business.industry; Squared euclidean distance; Pattern recognition; Articles; Quantitative Biology::Genomics; Amino acid; Hierarchical clustering; chemistry; Artificial intelligence; Statistics, Probability and Uncertainty; Cluster analysis; business; Mathematics
- Abstract
Clustering amino acids is one of the most challenging problems in functional and structural prediction of protein. Previous studies have proposed clusters based on measurements of physical and biochemical characteristics of the amino acids such as volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. These characteristics, although important, are less directly related to the protein structure compared to geometrical characteristics such as dihedral angles between amino acids. We propose using the p-value from a test of equality of dihedral-angle distributions as the basis of a distance measure for the clustering. In this novel approach, an energy test is modified to deal with bivariate angular data and the p-value is obtained via a permutation method. The results indicate that the clusters of amino acids have sensible interpretation where Glycine, Proline, and Asparagine each forms a distinct cluster. A simulation study suggests that this approach has good working characteristics to cluster amino acids.
- Published
- 2019
- Full Text
- View/download PDF
33. Local binary regression with spherical predictors
- Author
-
Agnese Panzera, Marco Di Marzio, Charles C. Taylor, and Stefania Fensore
- Subjects
Statistics and Probability; Polynomial regression; Statistics::Theory; 010102 general mathematics; Kernel density estimation; Local regression; Binary number; Estimator; 01 natural sciences; 010104 statistics & probability; Applied mathematics; Statistics::Methodology; Binary regression; 0101 mathematics; Statistics, Probability and Uncertainty; Mathematics
- Abstract
We discuss local regression estimators when the predictor lies on the d-dimensional sphere and the response is binary. Unlike Di Marzio et al. (2018b), who introduce spherical kernel density classification, we build on the theory of local polynomial regression and local likelihood. Simulations and a real-data application illustrate the effectiveness of the proposals.
- Published
- 2019
34. Cross-validation is safe to use
- Author
-
Oghenejokpeme I. Orhobor, Ross D. King, and Charles C. Taylor
- Subjects
Human-Computer Interaction; Artificial Intelligence; Computer Networks and Communications; business.industry; Medicine; Computer Vision and Pattern Recognition; business; Software; Cross-validation; Reliability engineering
- Published
- 2021
- Full Text
- View/download PDF
35. Classification of form under heterogeneity and non-isotropic errors
- Author
-
Arief Gusnanto, Farag Shuweihdi, and Charles C. Taylor
- Subjects
Statistics and Probability; business.industry; Computation; Diagonal; Estimator; Pattern recognition; Euclidean distance matrix; computer.software_genre; Form classification; Weighting; Data mining; Artificial intelligence; Statistics, Probability and Uncertainty; business; computer; Classifier (UML); Shape analysis (digital geometry); Mathematics
- Abstract
A number of areas related to learning under supervision have not been fully investigated, particularly the possibility of incorporating the method of classification into shape analysis. In this regard, practical ideas conducive to the improvement of form classification are the focus of interest. Our proposal is to employ a hybrid classifier built on Euclidean Distance Matrix Analysis (EDMA) and Procrustes distance, rather than generalised Procrustes analysis (GPA). In empirical terms, it has been demonstrated that there is notable difference between the estimated form and the true form when EDMA is used as the basis for computation. However, this does not seem to be the case when GPA is employed. With the assumption that no association exists between landmarks, EDMA and GPA are used to calculate the mean form and diagonal weighting matrix to build superimposing classifiers. As our findings indicate, with the use of EDMA estimators, the superimposing classifiers we propose work extremely well, as opposed to the use of GPA, as far as both simulated and real datasets are concerned.
- Published
- 2016
- Full Text
- View/download PDF
36. Practical performance of local likelihood for circular density estimation
- Author
-
Agnese Panzera, Stefania Fensore, M. Di Marzio, and Charles C. Taylor
- Subjects
Statistics and Probability; Normalization (statistics); education.field_of_study; Mathematical optimization; Estimation theory; Applied Mathematics; 05 social sciences; Population; Probability and statistics; Density estimation; 01 natural sciences; Likelihood principle; 010104 statistics & probability; Sample size determination; Modeling and Simulation; 0502 economics and business; 0101 mathematics; Statistics, Probability and Uncertainty; education; Likelihood function; Algorithm; 050205 econometrics; Mathematics
- Abstract
Local likelihood has been mainly developed from an asymptotic point of view, with little attention to finite sample size issues. The present paper provides simulation evidence of how likelihood density estimation practically performs from two points of view. First, we explore the impact of the normalization step of the final estimate; second, we show the effectiveness of higher order fits in identifying modes present in the population when small sample sizes are available. We refer to circular data; nevertheless, it is easily seen that our findings straightforwardly extend to the Euclidean setting, where they appear to be somewhat new.
- Published
- 2016
- Full Text
- View/download PDF
37. Nonparametric circular quantile regression
- Author
-
Charles C. Taylor, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability; Circular distribution; Applied Mathematics; 05 social sciences; Nonparametric statistics; Estimator; Inversion (meteorology); Conditional probability distribution; 01 natural sciences; Quantile regression; 010104 statistics & probability; Circular conditional distribution function, circular conditional quantiles, circular kernels, optimal smoothing degree, wind directions; 0502 economics and business; Statistics; Applied mathematics; Minification; 0101 mathematics; Statistics, Probability and Uncertainty; 050205 econometrics; Mathematics; Quantile
- Abstract
We discuss nonparametric estimation of conditional quantiles of a circular distribution when the conditioning variable is either linear or circular. Two different approaches are pursued: inversion of a conditional distribution function estimator, and minimization of a smoothed check function. Local constant and local linear versions of both estimators are discussed. Simulation experiments and a real data case study are used to illustrate the usefulness of the methods.
- Published
- 2016
- Full Text
- View/download PDF
38. A note on nonparametric estimation of circular conditional densities
- Author
-
M. Di Marzio, Charles C. Taylor, Agnese Panzera, and Stefania Fensore
- Subjects
Statistics and Probability; Polynomial; Applied Mathematics; 05 social sciences; Nonparametric statistics; Estimator; Conditional probability distribution; Conditional expectation; 01 natural sciences; Quantile regression; 010104 statistics & probability; Modeling and Simulation; 0502 economics and business; Statistics; Applied mathematics; 0101 mathematics; Statistics, Probability and Uncertainty; Conditional variance; 050205 econometrics; Quantile; Mathematics
- Abstract
The conditional density offers the most informative summary of the relationship between explanatory and response variables. We need to estimate it in place of the simple conditional mean when its shape is not well-behaved. A motivation for estimating conditional densities, specific to the circular setting, lies in the fact that a natural alternative to it, such as quantile regression, could be considered problematic because circular quantiles are not rotationally equivariant. We treat conditional density estimation as a local polynomial fitting problem as proposed by Fan et al. [Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika. 1996;83:189–206] in the Euclidean setting, and discuss a class of estimators in the cases when the conditioning variable is either circular or linear. Asymptotic properties for some members of the proposed class are derived. The effectiveness of the methods for finite sample sizes is illustrated by simulation experiments a...
- Published
- 2016
- Full Text
- View/download PDF
39. Classification tree methods for panel data using wavelet-transformed time series
- Author
-
Charles C. Taylor, Xin Zhao, Zoka Milan, and Stuart Barber
- Subjects
Statistics and Probability; Interpretation (logic); Series (mathematics); business.industry; Computer science; Applied Mathematics; Decision tree learning; Pattern recognition; 02 engineering and technology; 01 natural sciences; Data type; 010104 statistics & probability; Computational Mathematics; Wavelet; Computational Theory and Mathematics; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; 0101 mathematics; Representation (mathematics); Scale (map); business; Panel data
- Abstract
Wavelet-transformed variables can have better classification performance for panel data than using variables on their original scale. Examples are provided showing the types of data where using a wavelet-based representation is likely to improve classification accuracy. Results show that in most cases wavelet-transformed data have better or similar classification accuracy to the original data, and only select genuinely useful explanatory variables. Use of wavelet-transformed data provides localized mean and difference variables which can be more effective than the original variables, provide a means of separating “signal” from “noise”, and bring the opportunity for improved interpretation via the consideration of which resolution scales are the most informative. Panel data with multiple observations on each individual require some form of aggregation to classify at the individual level. Three different aggregation schemes are presented and compared using simulated data and real data gathered during liver transplantation. Methods based on aggregating individual level data before classification outperform methods which rely solely on the combining of time-point classifications.
- Published
- 2018
40. Statistical Estimate of Radon Concentration from Passive and Active Detectors in Doha
- Author
-
Rifaat Hassona, Adil Yousef, Kassim Mwitondi, Ibrahim Al Sadig, and Charles C. Taylor
- Subjects
Radon detection ,spatio-temporal analyses ,Information Systems and Management ,Meteorology ,0211 other engineering and technologies ,chemistry.chemical_element ,Radon ,unsupervised modelling ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Clustering ,Local regression ,radon detection ,Data modeling ,Spatio-temporal analyses ,Visualisation ,visualisation ,Cluster analysis ,0105 earth and related environmental sciences ,Potential impact ,021103 operations research ,Data collection ,estimation ,Detector ,lcsh:Z ,lcsh:Bibliography. Library science. Information resources ,Computer Science Applications ,local regression ,chemistry ,Work (electrical) ,Environmental science ,Estimation methods ,Estimation ,Unsupervised modelling ,clustering ,Information Systems - Abstract
Harnessing knowledge on the physical and natural conditions that affect our health, general livelihood and sustainability has long been at the core of scientific research. Health risks of ionising radiation from exposure to radon and radon decay products in homes, work and other public places entail developing novel approaches to modelling occurrence of the gas and its decaying products, in order to cope with the physical and natural dynamics in human habitats. Various data modelling approaches and techniques have been developed and applied to identify potential relationships among individual local meteorological parameters with a potential impact on radon concentrations&mdash, i.e., temperature, barometric pressure and relative humidity. In this first research work on radon concentrations in the State of Qatar, we present a combination of exploratory, visualisation and algorithmic estimation methods to try and understand the radon variations in and around the city of Doha. Data were obtained from the Central Radiation Laboratories (CRL) in Doha, gathered from 36 passive radon detectors deployed in various schools, residential and work places in and around Doha as well as from one active radon detector located at the CRL. Our key findings show high variations mainly attributable to technical variations in data gathering, as the equipment and devices appear to heavily influence the levels of radon detected. A parameter maximisation method applied to simulate data with similar behaviour to the data from the passive detectors in four of the neighbourhoods appears appropriate for estimating parameters in cases of data limitation. Data from the active detector exhibit interesting seasonal variations&mdash, with data clustering exhibiting two clearly separable groups, with passive and active detectors exhibiting a huge disagreement in readings. 
These patterns highlight challenges related to detection methods, in particular ensuring that deployed detectors and calculations of radon concentrations are adapted to local conditions. The study doesn’t dwell much on building materials and makes rather fundamental assumptions, including an equal exhalation rate of radon from the soil across neighbourhoods, based on Doha’s homogeneous underlying geological formation. The study also highlights potential extensions into the broader category of pollutants such as hydrocarbon, air particulate, carbon monoxide and nitrogen dioxide at specific time periods of the year and particularly how they may tie in with global health institutions’ requirements.
- Published
- 2018
41. Statistical analysis of particulate matter data in Doha, Qatar
- Author
-
Charles C. Taylor, Kassim Mwitondi, and Adil Yousif
- Subjects
Pollution ,Data collection ,Meteorology ,media_common.quotation_subject ,Outlier ,Analyser ,Environmental science ,Sampling (statistics) ,Sample (statistics) ,Missing data ,Wind speed ,media_common - Abstract
Pollution in Doha is measured using passive, active and automatic sampling. In this paper we consider data automatically sampled in which various pollutants were continually collected and analysed every hour. At each station the sample is analysed on-line and in real time and the data is stored within the analyser, or a separate logger so it can be downloaded remotely by a modem. The accuracy produced enables pollution episodes to be analysed in detail and related to traffic flows, meteorology and other variables. Data has been collected hourly over more than 6 years at 3 different locations, with measurements available for various pollutants – for example, ozone, nitrogen oxides, sulphur dioxide, carbon monoxide, THC, methane and particulate matter (PM1.0, PM2.5 and PM10), as well as meteorological data such as humidity, temperature, and wind speed and direction. Despite much care in the data collection process, the resultant data has long stretches of missing values, when the equipment has malfunctioned – often as a result of more extreme conditions. Our analysis is twofold. Firstly, we consider ways to “clean” the data, by imputing missing values, including identified outliers. The second aspect specifically considers prediction of each particulate (PM1.0, PM2.5 and PM10) 24 hours ahead, using current (and previous) pollution and meteorological data. In this case, we use vector autoregressive models, compare with decision trees and propose variable selection criteria which explicitly adapt to missing data. Our results show that the regression tree models, with no variable transformations, perform the best, and that attempts to impute missing values are hampered by non-random missingness.
- Published
- 2018
42. Circular local likelihood
- Author
-
Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability ,Polynomial ,Bessel functions. Circular data. Density estimation. Log-likelihood. von Mises density ,Logarithm ,Basis (linear algebra) ,05 social sciences ,Kernel density estimation ,Estimator ,Density estimation ,Function (mathematics) ,01 natural sciences ,0506 political science ,010104 statistics & probability ,050602 political science & public administration ,Applied mathematics ,0101 mathematics ,Statistics, Probability and Uncertainty ,Special case ,Mathematics - Abstract
We introduce a class of local likelihood circular density estimators, which includes the kernel density estimator as a special case. The idea lies in optimizing a spatially weighted version of the log-likelihood function, where the logarithm of the density is locally approximated by a periodic polynomial. The use of von Mises density functions as weights reduces the computational burden. Also, we propose closed-form estimators which could form the basis of counterparts in the multidimensional Euclidean setting. Simulation results and a real data case study are used to evaluate the performance and illustrate the results.
- Published
- 2018
43. Nonparametric Rotations for Sphere-Sphere Regression
- Author
-
Marco Di Marzio, Charles C. Taylor, and Agnese Panzera
- Subjects
Statistics and Probability ,Wahba's problem ,05 social sciences ,Nonparametric statistics ,Hypersphere ,01 natural sciences ,Regression ,Bias Reduction, Fisher’s Method of Scoring, Local Smoothing, Non-Rigid Rotation Estimation, Singular Value Decomposition, Skew-symmetric Matrices, Spherical Kernels, Wahba’s Problem ,010104 statistics & probability ,Simple (abstract algebra) ,0502 economics and business ,Singular value decomposition ,Applied mathematics ,0101 mathematics ,Statistics, Probability and Uncertainty ,Rotation (mathematics) ,050205 econometrics ,Parametric statistics ,Mathematics - Abstract
Regression of data represented as points on a hypersphere has traditionally been treated using parametric families of transformations that include the simple rigid rotation as an important, special case. On the other hand, nonparametric methods have generally focused on modeling a scalar response through a spherical predictor by representing the regression function as a polynomial, leading to component-wise estimation of a spherical response. We propose a very flexible, simple regression model where for each location of the manifold a specific rotation matrix is to be estimated. To make this approach tractable, we assume continuity of the regression function that, in turn, allows for approximations of rotation matrices based on a series expansion. It is seen that the nonrigidity of our technique motivates an iterative estimation within a Newton–Raphson learning scheme, which exhibits bias reduction properties. Extensions to general shape matching are also outlined. Both simulations and real data are used to illustrate the results. Supplementary materials for this article are available online.
- Published
- 2018
- Full Text
- View/download PDF
44. Nonparametric estimating equations for circular probability density functions and their derivatives
- Author
-
Agnese Panzera, Charles C. Taylor, Stefania Fensore, and Marco Di Marzio
- Subjects
Statistics and Probability ,Mathematical optimization ,Population ,Fourier coefficients ,Probability density function ,Estimating equations ,trigonometric moments ,01 natural sciences ,010104 statistics & probability ,Circular kernels ,Density estimation ,Jackknife ,Sin-polynomials ,Trigonometric moments ,Von mises density ,density estimation ,0502 economics and business ,Applied mathematics ,0101 mathematics ,education ,von Mises density ,050205 econometrics ,Mathematics ,education.field_of_study ,05 social sciences ,Nonparametric statistics ,Estimator ,Probability and statistics ,jackknife ,Delta method ,sin-polynomials ,Statistics, Probability and Uncertainty - Abstract
We propose estimating equations whose unknown parameters are the values taken by a circular density and its derivatives at a point. Specifically, we solve equations which relate local versions of population trigonometric moments with their sample counterparts. Major advantages of our approach are: higher order bias without asymptotic variance inflation, closed form for the estimators, and absence of numerical tasks. We also investigate situations where the observed data are dependent. Theoretical results along with simulation experiments are provided.
- Published
- 2017
- Full Text
- View/download PDF
45. Estimating optimal window size for analysis of low-coverage next-generation sequence data
- Author
-
Ibrahim Nafisah, Charles C. Taylor, Henry M. Wood, Stefano Berri, Arief Gusnanto, and Pamela Rabbitts
- Subjects
Statistics and Probability ,Lung Neoplasms ,Computer science ,Context (language use) ,computer.software_genre ,Biochemistry ,Humans ,Molecular Biology ,Likelihood Functions ,Sequence ,Genome, Human ,High-Throughput Nucleotide Sequencing ,Window (computing) ,Contrast (statistics) ,Genomics ,Sequence Analysis, DNA ,Function (mathematics) ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Step function ,Data mining ,Akaike information criterion ,computer ,Algorithm ,Next generation sequence - Abstract
Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing ( Results: We assume the reads density to be a step function. Given this model, we propose a data-based estimation of optimal window size based on Akaike’s information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curve as a function of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are of general purpose and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets. Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/~arief/R/win/ . Contact: a.gusnanto@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2014
- Full Text
- View/download PDF
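The window-size selection described in the abstract of entry 45 can be illustrated with a minimal sketch. It assumes that the step-function read density is a plain histogram over a region of known length and that AIC is computed as -2 log-likelihood plus twice the number of free bin heights. The function names (`histogram_aic`, `best_window`) and the toy data are hypothetical and are not taken from the authors' R package.

```python
import math


def histogram_aic(positions, length, n_bins):
    """AIC of a step-function (histogram) density with n_bins equal windows."""
    window = length / n_bins
    counts = [0] * n_bins
    for p in positions:
        # Clamp so that a read at the right edge falls in the last window.
        counts[min(int(p // window), n_bins - 1)] += 1
    n = len(positions)
    # Log-likelihood of the histogram density n_j / (n * window) at the reads.
    loglik = sum(c * math.log(c / (n * window)) for c in counts if c > 0)
    # Bin heights must integrate to one, so n_bins - 1 of them are free.
    return -2.0 * loglik + 2.0 * (n_bins - 1)


def best_window(positions, length, candidate_bins):
    """Return the window size whose histogram minimises AIC."""
    best = min(candidate_bins, key=lambda k: histogram_aic(positions, length, k))
    return length / best


# Toy data (assumption): sparse reads on the first half, dense on the second.
reads = [10.0 * i for i in range(50)] + [500.0 + i * 500.0 / 450 for i in range(450)]
chosen = best_window(reads, 1000.0, [1, 2, 4, 8, 16])
print(chosen)
```

Restricting the candidates to bin counts that divide the region evenly keeps every window the same width, so the fitted density integrates to one without special handling of a short final window.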
46. Smooth estimation of circular cumulative distribution functions and quantiles
- Author
-
Marco Di Marzio, Charles C. Taylor, and Agnese Panzera
- Subjects
Statistics and Probability ,Kernel method ,Location parameter ,Cumulative distribution function ,Statistics ,Nonparametric statistics ,Estimator ,Applied mathematics ,Statistics, Probability and Uncertainty ,Covariance ,Empirical distribution function ,Quantile ,Mathematics - Abstract
Smooth nonparametric estimators based on a kernel method are proposed for cumulative distribution functions (CDFs) and quantiles of circular data. A sound motivation for this is that although for euclidean data similar estimators have been widely studied, for circular data nothing similar seems to exist; albeit, remarkably, in the circular-setting local methods are implemented more easily because of the absence of boundaries on the circle. The only alternative to our method seems to be the empirical CDF, that does not take into account circularity of data when the estimate is near the cut-point, as our local method naturally does. The definition of circular CDF is different from its euclidean counterpart in many respects, and this will give rise to estimators exhibiting some ‘unusual’ features such as, for example, global efficiency measures containing a location parameter and a covariance term. Simulations along with real data case studies illustrate the findings.
- Published
- 2012
- Full Text
- View/download PDF
47. Mixtures of concentrated multivariate sine distributions with applications to bioinformatics
- Author
-
Zhengzheng Zhang, Thomas Hamelryck, John T. Kent, Kanti V. Mardia, and Charles C. Taylor
- Subjects
Statistics and Probability ,Wishart distribution ,Univariate distribution ,Inverse-Wishart distribution ,Matrix t-distribution ,Statistics::Methodology ,Matrix normal distribution ,Multivariate t-distribution ,Statistics, Probability and Uncertainty ,Bioinformatics ,Elliptical distribution ,Normal-Wishart distribution ,Mathematics - Abstract
Motivated by examples in protein bioinformatics, we study a mixture model of multivariate angular distributions. The distribution treated here (multivariate sine distribution) is a multivariate extension of the well-known von Mises distribution on the circle. The density of the sine distribution has an intractable normalizing constant and here we propose to replace it in the concentrated case by a simple approximation. We study the EM algorithm for this distribution and apply it to a practical example from protein bioinformatics.
- Published
- 2012
- Full Text
- View/download PDF
48. Validating protein structure using kernel density estimates
- Author
-
Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Kanti V. Mardia
- Subjects
Statistics and Probability ,Quantitative Biology::Biomolecules ,Mathematical optimization ,Kernel density estimation ,Conditional probability distribution ,Density estimation ,Multivariate kernel density estimation ,Kernel embedding of distributions ,Variable kernel density estimation ,Test set ,Kernel (statistics) ,Statistics, Probability and Uncertainty ,Algorithm ,Mathematics - Abstract
Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then, by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.
- Published
- 2012
- Full Text
- View/download PDF
49. Non-parametric smoothing and prediction for nonlinear circular time series
- Author
-
Agnese Panzera, Charles C. Taylor, and Marco Di Marzio
- Subjects
Statistics and Probability ,Mathematical optimization ,Field (physics) ,Series (mathematics) ,Applied Mathematics ,Nonparametric statistics ,Nonlinear system ,Applied mathematics ,Time domain ,Statistics, Probability and Uncertainty ,Constant (mathematics) ,Cross-spectrum ,Smoothing ,Mathematics - Abstract
Not much research has been done in the field of circular time-series analysis. We propose a non-parametric theory for smoothing and prediction in the time domain for circular time-series data. Our model is based on local constant and local linear fitting estimates of a minimizer of an angular risk function. Both asymptotic arguments and empirical examples are used to describe the accuracy of our methods.
- Published
- 2012
- Full Text
- View/download PDF
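The local constant fit mentioned in the abstract of entry 49 can be sketched as follows. The sketch assumes a von Mises-shaped weight with concentration `kappa` and takes the fitted angle to be the direction of the kernel-weighted mean of the unit vectors of the responses. The function names and the use of the previous angle as the only predictor are illustrative assumptions, not the authors' implementation.

```python
import math


def local_constant_circular(x0, xs, ys, kappa):
    """Kernel-weighted mean direction of the angles ys, localised at x0."""
    # Von Mises-shaped weights; subtracting 1 keeps exp() from overflowing.
    weights = [math.exp(kappa * (math.cos(x0 - x) - 1.0)) for x in xs]
    s = sum(w * math.sin(y) for w, y in zip(weights, ys))
    c = sum(w * math.cos(y) for w, y in zip(weights, ys))
    return math.atan2(s, c)


def predict_next(series, kappa):
    """One-step-ahead prediction using the previous angle as predictor."""
    return local_constant_circular(series[-1], series[:-1], series[1:], kappa)
```

Averaging sines and cosines rather than the raw angles is what keeps the estimate sensible near the cut-point: two observations at 3.1 and -3.1 radians average to a direction near π, not to 0.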
50. Kernel density estimation on the torus
- Author
-
Marco Di Marzio, Agnese Panzera, and Charles C. Taylor
- Subjects
Statistics and Probability ,Applied Mathematics ,Kernel density estimation ,Torus ,Density estimation ,Multivariate kernel density estimation ,Kernel method ,Variable kernel density estimation ,Calculus ,Partial derivative ,Applied mathematics ,Statistics, Probability and Uncertainty ,Smoothing ,Mathematics - Abstract
Kernel density estimation for multivariate, circular data has been formulated only when the sample space is the sphere, but theory for the torus would also be useful. For data lying on a d-dimensional torus (d ≥ 1), we discuss kernel estimation of a density, its mixed partial derivatives, and their squared functionals. We introduce a specific class of product kernels whose order is suitably defined in such a way to obtain L2-risk formulas whose structure can be compared to their Euclidean counterparts. Our kernels are based on circular densities; however, we also discuss smaller bias estimation involving negative kernels which are functions of circular densities. Practical rules for selecting the smoothing degree, based on cross-validation, bootstrap and plug-in ideas are derived. Moreover, we provide specific results on the use of kernels based on the von Mises density. Finally, real-data examples and simulation studies illustrate the findings.
- Published
- 2011
- Full Text
- View/download PDF
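As a rough illustration of the product-kernel idea in the abstract of entry 50, the sketch below evaluates a density estimate on a d-dimensional torus using a product of von Mises kernels. It assumes a single concentration `kappa` shared by all coordinates and computes the Bessel function I0 from its power series. The names `bessel_i0` and `torus_kde` are hypothetical, and bandwidth selection is not covered.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def torus_kde(point, sample, kappa):
    """Product von Mises kernel density estimate at a point on the d-torus."""
    d = len(point)
    # Each coordinate contributes a factor 2*pi*I0(kappa) to the normaliser.
    norm = (2.0 * math.pi * bessel_i0(kappa)) ** d
    total = 0.0
    for obs in sample:
        total += math.exp(kappa * sum(math.cos(p - o) for p, o in zip(point, obs)))
    return total / (len(sample) * norm)
```

Because the kernel depends on the data only through cosines of angular differences, the estimate is periodic in every coordinate and needs no boundary correction.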