85 results for "Charles C. Taylor"
Search Results
2. The package: nonparametric regression using local rotation matrices in R
- Author
- Giovanni Lafratta, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Applied Mathematics, Nonparametric statistics, Rotation matrix, Regression, Bias reduction, Nonparametric regression, Modeling and Simulation, Statistics, Singular value decomposition, Statistics, Probability and Uncertainty, MIT License, Mathematics
- Abstract
The package implements nonparametric (smooth) regression for spherical data in R, and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the MIT License. It can be use...
- Published
- 2021
- Full Text
- View/download PDF
3. Spatio-temporal forecasting using wavelet transform-based decision trees with application to air quality and COVID-19 forecasting
- Author
- Xin Zhao, Stuart Barber, Charles C. Taylor, Xiaokai Nie, and Wenqian Shen
- Subjects
Statistics and Probability, Articles, Statistics, Probability and Uncertainty
- Abstract
We develop a new method that combines a decision tree with a wavelet transform to forecast time series data with spatial spillover effects. The method not only improves prediction but also offers good interpretability of the time series mechanism. As a feature-exploration method, the wavelet transform represents information at different resolution levels, which may improve the performance of decision trees. The method is applied to simulated data, and to air pollution and COVID-19 time series data sets. In the simulation, the Haar, LA8, D4 and D6 wavelets are compared, with the Haar wavelet having the best performance. In the air pollution application, by using wavelet transform-based decision trees, the temporal effects of the air quality index, including autoregressive and seasonal effects, can be described, as well as the spatial correlation effect. To describe the spillover spatial effect in contiguous regions, a spatial weight is constructed to improve the modeling performance. The results show that the air quality index has autoregressive, seasonal and spatial spillover effects. The wavelet-transformed variables have better forecasting performance and enhanced interpretability than the original variables. For the COVID-19 time series of cumulative cases, spatially weighted variables are not selected, which suggests the lockdown policies are truly effective.
- Published
- 2022
4. Properties and approximate p-value calculation of the Cramér test
- Author
- Arief Gusnanto, Charles C. Taylor, Alison Telford, and Henry M. Wood
- Subjects
Statistics and Probability, Anderson–Darling test, Applied Mathematics, Cumulative distribution function, Variance (accounting), Test (assessment), Distribution (mathematics), Modeling and Simulation, Cramér–von Mises criterion, Statistics, p-value, Statistics, Probability and Uncertainty, Null hypothesis, Mathematics
- Abstract
Two-sample tests are probably the most commonly used tests in statistics. These tests generally address one aspect of the samples' distribution, such as mean or variance. When the null hypothesis is that two distributions are equal, the Anderson–Darling (AD) test, which is developed from the Cramér–von Mises (CvM) test, is generally employed. Unfortunately, we find that the AD test often fails to identify true differences when the differences are complex: they are not only in terms of mean, variance and/or skewness but also in terms of multi-modality. In such cases, we find that the Cramér test, a modification of the CvM test, performs well. However, the adoption of the Cramér test in routine analysis is hindered by the fact that the mean, variance and skewness of the test statistic are not available, which makes calculation of the associated p-value problematic. For this purpose, we propose a new method for obtaining a p-value by approximating the distribution of the test statistic by a generalized Pareto distribution. By approximating the distribution in this way, the calculation of the p-value is much faster than, e.g., the bootstrap method, especially for large n. We have observed that this approximation enables the Cramér test to have proper control of the type-I error. A simulation study indicates that the Cramér test is as powerful as other tests in simple cases and more powerful in more complicated cases.
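The paper approximates the null distribution of the statistic directly; as a generic illustration of the same tail-approximation idea, the sketch below fits a generalized Pareto distribution to the upper tail of simulated null statistics and reads far-tail p-values off the fitted tail, which is cheaper than a full bootstrap at extreme significance levels. The function name, the threshold rule and the use of simulated null statistics are all assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_tail_pvalue(t_obs, null_stats, n_exceed=250):
    """Approximate a far-tail p-value by fitting a generalized Pareto
    distribution (GPD) to the largest simulated null statistics."""
    null_stats = np.sort(np.asarray(null_stats))
    threshold = null_stats[-n_exceed]                   # exceedance threshold
    exceedances = null_stats[-n_exceed:] - threshold
    c, _, scale = genpareto.fit(exceedances, floc=0.0)  # ML fit to the tail
    if t_obs <= threshold:                              # bulk: empirical p-value
        return float(np.mean(null_stats >= t_obs))
    p_tail = n_exceed / len(null_stats)                 # P(T > threshold)
    return p_tail * genpareto.sf(t_obs - threshold, c, loc=0.0, scale=scale)
```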
- Published
- 2020
- Full Text
- View/download PDF
5. Density estimation for circular data observed with errors
- Author
- Charles C. Taylor, Stefania Fensore, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability, General Immunology and Microbiology, Applied Mathematics, Estimator, General Medicine, Density estimation, General Biochemistry, Genetics and Molecular Biology, Bias, Simple (abstract algebra), Kernel (statistics), Computer Simulation, Deconvolution, General Agricultural and Biological Sciences, Equivalence (measure theory), Fourier series, Algorithm, Smoothing, Mathematics
- Abstract
Until now, the problem of estimating circular densities when data are observed with errors has been mainly treated by Fourier series methods. We propose kernel-based estimators exhibiting simple construction and easy implementation. Specifically, we consider three different approaches: the first is based on the equivalence between kernel estimators using data corrupted with different levels of error. This proposal appears to be totally unexplored, despite its potential for application in the Euclidean setting as well. The second approach relies on estimators whose weight functions are circular deconvolution kernels. Due to the periodicity of the involved densities, it requires ad hoc mathematical tools. Finally, the third is based on the idea of correcting the extra bias of kernel estimators which use contaminated data, and is essentially an adaptation of the standard theory to the circular case. For all the proposed estimators, we derive asymptotic properties, provide some simulation results, and also discuss some possible generalizations and extensions. Real-data case studies are also included.
- Published
- 2022
6. Interval forecasts based on regression trees for streaming data
- Author
- Stuart Barber, Charles C. Taylor, Zoka Milan, and Xin Zhao
- Subjects
Statistics and Probability, Computer science, Test data generation, Applied Mathematics, Autoregressive conditional heteroskedasticity, CPU time, Inference, Interval (mathematics), Regression, Computer Science Applications, Tree (data structure), Autoregressive integrated moving average, Algorithm
- Abstract
In forecasting, we often require interval forecasts instead of just a specific point forecast. To track streaming data effectively, this interval forecast should reliably cover the observed data and yet be as narrow as possible. To achieve this, we propose two methods based on regression trees: one ensemble method and one method based on a single tree. For the ensemble method, we use weighted results from the most recent models, and for the single-tree method, we retain one model until it becomes necessary to train a new one. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture variation over long time scales, and conditional inference trees as the underlying regression tree model. Results show that both methods perform well, having good coverage without the intervals being excessively wide. When the underlying data-generation mechanism changes, their performance is initially affected but recovers relatively quickly as time proceeds. The method based on a single tree requires the least computational (CPU) time compared to the ensemble method. When compared to ARIMA and GARCH modelling, our methods achieve better or similar coverage and width but require considerably less CPU time.
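A minimal sketch of the adaptive width update described in the abstract, assuming a Gaussian-style multiplier z and a fixed batch window (both hypothetical parameters); the tree-based point forecaster itself is omitted.

```python
import numpy as np
from collections import deque

class AdaptiveInterval:
    """Interval forecast: point forecast +/- z * RMSE, where the RMSE is
    recomputed from only the most recent batch of prediction errors."""
    def __init__(self, window=50, z=1.96):
        self.errors = deque(maxlen=window)   # latest batch of errors
        self.z = z

    def update(self, y_true, y_pred):
        self.errors.append(y_true - y_pred)

    def interval(self, y_pred):
        if not self.errors:
            return y_pred, y_pred
        rmse = float(np.sqrt(np.mean(np.square(self.errors))))
        return y_pred - self.z * rmse, y_pred + self.z * rmse
```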
- Published
- 2019
- Full Text
- View/download PDF
7. Fluid shear stress stimulates breast cancer cells to display invasive and chemoresistant phenotypes while upregulating PLAU in a 3D bioreactor
- Author
- Caymen Novak, Catherine Z. Liu, Eric N. Horst, Charles C. Taylor, and Geeta Mehta
- Subjects
Breast Neoplasms, Bioengineering, Applied Microbiology and Biotechnology, Article, Metastasis, Extracellular matrix, Bioreactors, Breast cancer, Downregulation and upregulation, Medicine, Shear stress, Humans, Neoplasm Invasiveness, Mechanotransduction, Tumor microenvironment, Chemistry, Membrane Proteins, Neoplasm Proteins, Up-Regulation, Gene Expression Regulation, Neoplastic, Drug Resistance, Neoplasm, Cancer cell, MCF-7 Cells, Cancer research, Female, Stress, Mechanical, Shear Strength, Biotechnology
- Abstract
Breast cancer cells experience a range of shear stresses in the tumor microenvironment (TME). However, most current in vitro three-dimensional (3D) models fail to systematically probe the effects of these biophysical stimuli on cancer cell metastasis, proliferation and chemoresistance. To investigate the roles of shear stress within the mammary and lung pleural effusion TME, a bioreactor capable of applying shear stress to cells within a 3D extracellular matrix was designed and characterized. Breast cancer cells were encapsulated within an interpenetrating network (IPN) hydrogel and subjected to a shear stress of 5.4 dynes cm⁻² for 72 hours. Finite element modeling assessed shear stress profiles within the bioreactor. Cells exposed to shear stress had significantly higher cellular area and significantly lower circularity, indicating a motile phenotype. Stimulated cells were more proliferative than static controls and showed higher rates of chemoresistance to the anti-neoplastic drug paclitaxel. Fluid shear stress induced significant upregulation of the PLAU gene, and elevated urokinase activity was confirmed through zymography and an activity assay. Overall, these results indicate that pulsatile shear stress promotes breast cancer cell proliferation, invasive potential, chemoresistance, and PLAU signaling.
- Published
- 2019
- Full Text
- View/download PDF
8. Kernel Circular Deconvolution Density Estimation
- Author
- Marco Di Marzio, Stefania Fensore, Charles C. Taylor, and Agnese Panzera
- Subjects
Observational error, Kernel (statistics), Euclidean geometry, Estimator, Applied mathematics, Deconvolution, Density estimation, Data application, Mathematics
- Abstract
We consider the problem of nonparametrically estimating a circular density from data contaminated by angular measurement errors. Specifically, we obtain a kernel-type estimator with weight functions that are reminiscent of deconvolution kernels. Here, differently from the Euclidean setting, discrete Fourier coefficients are involved rather than characteristic functions. We provide some simulation results along with a real data application.
- Published
- 2020
- Full Text
- View/download PDF
9. A New Approach to Measuring Distances in Dense Graphs
- Author
- Charles C. Taylor, Peter A. Thwaites, and Fatimah A. Almulhim
- Subjects
Discrete mathematics, Computer science, k-means clustering, Graph theory, Graph, Hierarchical clustering, Vertex (geometry), Search algorithm, Adjacency matrix, Cluster analysis
- Abstract
The problem of computing distances and shortest paths between vertices in graphs is one of the fundamental issues in graph theory. It is of great importance in many different applications, for example transportation and social network analysis. However, efficient shortest-distance algorithms are still desired in many disciplines. Fundamentally, the majority of dense graphs have ties between the shortest distances. Therefore, we consider a different approach and introduce a new measure to solve the all-pairs shortest path problem for undirected and unweighted graphs. This measures the shortest distance between any two vertices by considering both the length and the number of all possible paths between them. The main aim of this new approach is to break the ties between equal shortest-path (SP) distances, which can be obtained by the breadth-first search (BFS) algorithm, and to distinguish meaningfully between these equal distances. Moreover, using the new measure in clustering produces higher-quality results compared with SP. In our study, we apply two different clustering techniques, hierarchical clustering and k-means clustering, with four different graph models and various numbers of clusters. We compare the results using a modularity function to check the quality of our clustering results.
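The abstract does not give the measure's exact definition; the sketch below is a hypothetical instantiation in which the integer BFS distance is shrunk slightly when many shortest paths exist, so that tied SP values become distinguishable. The function names and the eps tie-breaking form are assumptions.

```python
from collections import deque

def bfs_dist_counts(adj, s):
    """Standard path-counting BFS: shortest distance from s to every
    reachable vertex, plus the number of distinct shortest paths."""
    dist, count, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                 # first time v is reached
                dist[v], count[v] = dist[u] + 1, count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # another shortest path into v
                count[v] += count[u]
    return dist, count

def refined_distance(adj, u, v, eps=0.5):
    """Tie-broken distance: more shortest paths give a slightly smaller value."""
    dist, count = bfs_dist_counts(adj, u)     # assumes v is reachable from u
    return dist[v] - eps * (1.0 - 1.0 / count[v])
```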
- Published
- 2019
- Full Text
- View/download PDF
10. Kernel density classification for spherical data
- Author
- Agnese Panzera, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Field (physics), Global climate, Kernel density estimation, Nonparametric statistics, Applied mathematics, Decision rule, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Classifying observations coming from two different spherical populations by using a nonparametric method appears to be an unexplored field, although clearly worth pursuing. We propose some decision rules based on spherical kernel density estimation and we provide asymptotic L2 properties. A real-data application using global climate data is finally discussed.
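A minimal sketch of this kind of decision rule, assuming unit vectors on S² and a von Mises–Fisher kernel with a user-chosen concentration kappa; the kernel choice and all names here are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def vmf_kde(x, data, kappa):
    """Kernel density estimate on the unit sphere S^2 using a
    von Mises-Fisher kernel; data has one unit vector per row."""
    c = kappa / (4.0 * np.pi * np.sinh(kappa))   # vMF normalising constant
    return float(np.mean(c * np.exp(kappa * data @ x)))

def classify(x, sample0, sample1, kappa, prior0=0.5):
    # Pick the class with the larger prior-weighted density estimate at x.
    f0 = prior0 * vmf_kde(x, sample0, kappa)
    f1 = (1.0 - prior0) * vmf_kde(x, sample1, kappa)
    return 0 if f0 >= f1 else 1
```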
- Published
- 2019
11. Geometry-based distance for clustering amino acids
- Author
- Arief Gusnanto, Charles C. Taylor, and Samira F. Abushilah
- Subjects
Statistics and Probability, Squared Euclidean distance, Pattern recognition, Articles, Amino acid, Hierarchical clustering, Chemistry, Artificial intelligence, Statistics, Probability and Uncertainty, Cluster analysis, Mathematics
- Abstract
Clustering amino acids is one of the most challenging problems in the functional and structural prediction of proteins. Previous studies have proposed clusters based on measurements of physical and biochemical characteristics of the amino acids such as volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. These characteristics, although important, are less directly related to protein structure than geometrical characteristics such as the dihedral angles between amino acids. We propose using the p-value from a test of equality of dihedral-angle distributions as the basis of a distance measure for the clustering. In this novel approach, an energy test is modified to deal with bivariate angular data, and the p-value is obtained via a permutation method. The results indicate that the clusters of amino acids have a sensible interpretation, where Glycine, Proline, and Asparagine each form a distinct cluster. A simulation study suggests that this approach has good working characteristics for clustering amino acids.
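A rough sketch of the p-value computation, assuming samples of bivariate angles in radians; the wrapped componentwise metric and the permutation scheme below are illustrative guesses at the paper's modified energy test (a clustering dissimilarity could then be taken as, say, 1 − p).

```python
import numpy as np

def ang_dist(a, b):
    """Pairwise distances between bivariate angles, wrapping each
    coordinate around the circle before taking the Euclidean norm."""
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, 2.0 * np.pi - d)
    return np.linalg.norm(d, axis=-1)

def energy_stat(x, y):
    return (2.0 * ang_dist(x, y).mean()
            - ang_dist(x, x).mean() - ang_dist(y, y).mean())

def perm_pvalue(x, y, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    obs, z, nx = energy_stat(x, y), np.vstack([x, y]), len(x)
    hits = sum(energy_stat(z[p[:nx]], z[p[nx:]]) >= obs
               for p in (rng.permutation(len(z)) for _ in range(n_perm)))
    return (hits + 1) / (n_perm + 1)
```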
- Published
- 2019
- Full Text
- View/download PDF
12. Local binary regression with spherical predictors
- Author
- Agnese Panzera, Marco Di Marzio, Charles C. Taylor, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial regression, Kernel density estimation, Local regression, Binary number, Estimator, Applied mathematics, Binary regression, Statistics, Probability and Uncertainty, Mathematics
- Abstract
We discuss local regression estimators when the predictor lies on the d-dimensional sphere and the response is binary. Unlike Di Marzio et al. (2018b), who introduced spherical kernel density classification, we build on the theory of local polynomial regression and local likelihood. Simulations and a real-data application illustrate the effectiveness of the proposals.
- Published
- 2019
13. Cross-validation is safe to use
- Author
- Oghenejokpeme I. Orhobor, Ross D. King, and Charles C. Taylor
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Medicine, Computer Vision and Pattern Recognition, Software, Cross-validation, Reliability engineering
- Published
- 2021
- Full Text
- View/download PDF
14. Classification of form under heterogeneity and non-isotropic errors
- Author
- Arief Gusnanto, Farag Shuweihdi, and Charles C. Taylor
- Subjects
Statistics and Probability, Computation, Diagonal, Estimator, Pattern recognition, Euclidean distance matrix, Form classification, Weighting, Data mining, Artificial intelligence, Statistics, Probability and Uncertainty, Classifier (UML), Shape analysis (digital geometry), Mathematics
- Abstract
A number of areas related to learning under supervision have not been fully investigated, particularly the possibility of incorporating classification methods into shape analysis. In this regard, practical ideas conducive to the improvement of form classification are the focus of interest. Our proposal is to employ a hybrid classifier built on Euclidean Distance Matrix Analysis (EDMA) and Procrustes distance, rather than generalised Procrustes analysis (GPA). In empirical terms, it has been demonstrated that there is a notable difference between the estimated form and the true form when EDMA is used as the basis for computation. However, this does not seem to be the case when GPA is employed. Under the assumption that no association exists between landmarks, EDMA and GPA are used to calculate the mean form and a diagonal weighting matrix to build superimposing classifiers. As our findings indicate, the superimposing classifiers we propose work extremely well with EDMA estimators, as opposed to GPA, on both simulated and real datasets.
- Published
- 2016
- Full Text
- View/download PDF
15. Nonparametric circular quantile regression
- Author
- Charles C. Taylor, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability, Circular distribution, Applied Mathematics, Nonparametric statistics, Estimator, Inversion (meteorology), Conditional probability distribution, Quantile regression, Circular conditional distribution function, circular conditional quantiles, circular kernels, optimal smoothing degree, wind directions, Statistics, Applied mathematics, Minification, Statistics, Probability and Uncertainty, Mathematics, Quantile
- Abstract
We discuss nonparametric estimation of conditional quantiles of a circular distribution when the conditioning variable is either linear or circular. Two different approaches are pursued: inversion of a conditional distribution function estimator, and minimization of a smoothed check function. Local constant and local linear versions of both estimators are discussed. Simulation experiments and a real data case study are used to illustrate the usefulness of the methods.
- Published
- 2016
- Full Text
- View/download PDF
16. A note on nonparametric estimation of circular conditional densities
- Author
- M. Di Marzio, Charles C. Taylor, Agnese Panzera, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial, Applied Mathematics, Nonparametric statistics, Estimator, Conditional probability distribution, Conditional expectation, Quantile regression, Modeling and Simulation, Statistics, Applied mathematics, Statistics, Probability and Uncertainty, Conditional variance, Quantile, Mathematics
- Abstract
The conditional density offers the most informative summary of the relationship between explanatory and response variables. We need to estimate it in place of the simple conditional mean when its shape is not well-behaved. A motivation for estimating conditional densities, specific to the circular setting, lies in the fact that a natural alternative of it, like quantile regression, could be considered problematic because circular quantiles are not rotationally equivariant. We treat conditional density estimation as a local polynomial fitting problem as proposed by Fan et al. [Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika. 1996;83:189–206] in the Euclidean setting, and discuss a class of estimators in the cases when the conditioning variable is either circular or linear. Asymptotic properties for some members of the proposed class are derived. The effectiveness of the methods for finite sample sizes is illustrated by simulation experiments a...
- Published
- 2016
- Full Text
- View/download PDF
17. Classification tree methods for panel data using wavelet-transformed time series
- Author
- Charles C. Taylor, Xin Zhao, Zoka Milan, and Stuart Barber
- Subjects
Statistics and Probability, Interpretation (logic), Series (mathematics), Computer science, Applied Mathematics, Decision tree learning, Pattern recognition, Data type, Computational Mathematics, Wavelet, Computational Theory and Mathematics, Artificial intelligence, Representation (mathematics), Scale (map), Panel data
- Abstract
Wavelet-transformed variables can have better classification performance for panel data than variables on their original scale. Examples are provided showing the types of data where using a wavelet-based representation is likely to improve classification accuracy. Results show that in most cases wavelet-transformed data have better or similar classification accuracy to the original data, and that only genuinely useful explanatory variables are selected. Use of wavelet-transformed data provides localized mean and difference variables which can be more effective than the original variables, provides a means of separating "signal" from "noise", and brings the opportunity for improved interpretation via consideration of which resolution scales are the most informative. Panel data with multiple observations on each individual require some form of aggregation to classify at the individual level. Three different aggregation schemes are presented and compared using simulated data and real data gathered during liver transplantation. Methods based on aggregating individual-level data before classification outperform methods which rely solely on combining time-point classifications.
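A minimal sketch of the feature construction, assuming series lengths that are powers of two and a plain Haar transform; the helper names are assumptions, and the paper's aggregation schemes are not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def haar_features(x):
    """Full Haar transform of a 1-D series (length a power of two):
    localized differences at every resolution plus the overall smooth."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        pairs = x.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))  # detail
        x = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)              # smooth
    details.append(x)
    return np.concatenate(details)

# X: (n_series, 2**k) array of time series, y: class labels
# X_wavelet = np.apply_along_axis(haar_features, 1, X)
# clf = DecisionTreeClassifier().fit(X_wavelet, y)
```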
- Published
- 2018
18. Statistical Estimate of Radon Concentration from Passive and Active Detectors in Doha
- Author
- Rifaat Hassona, Adil Yousef, Kassim Mwitondi, Ibrahim Al Sadig, and Charles C. Taylor
- Subjects
Radon detection, Spatio-temporal analyses, Information Systems and Management, Meteorology, Radon, Unsupervised modelling, Clustering, Local regression, Data modeling, Visualisation, Cluster analysis, Potential impact, Data collection, Estimation, Detector, Computer Science Applications, Chemistry, Work (electrical), Environmental science, Estimation methods, Information Systems
- Abstract
Harnessing knowledge on the physical and natural conditions that affect our health, general livelihood and sustainability has long been at the core of scientific research. Health risks of ionising radiation from exposure to radon and radon decay products in homes, work and other public places entail developing novel approaches to modelling occurrence of the gas and its decaying products, in order to cope with the physical and natural dynamics in human habitats. Various data modelling approaches and techniques have been developed and applied to identify potential relationships among individual local meteorological parameters with a potential impact on radon concentrations, i.e., temperature, barometric pressure and relative humidity. In this first research work on radon concentrations in the State of Qatar, we present a combination of exploratory, visualisation and algorithmic estimation methods to try and understand the radon variations in and around the city of Doha. Data were obtained from the Central Radiation Laboratories (CRL) in Doha, gathered from 36 passive radon detectors deployed in various schools, residential and work places in and around Doha, as well as from one active radon detector located at the CRL. Our key findings show high variations mainly attributable to technical variations in data gathering, as the equipment and devices appear to heavily influence the levels of radon detected. A parameter maximisation method applied to simulate data with similar behaviour to the data from the passive detectors in four of the neighbourhoods appears appropriate for estimating parameters in cases of data limitation. Data from the active detector exhibit interesting seasonal variations, with data clustering exhibiting two clearly separable groups, and passive and active detectors exhibiting a huge disagreement in readings. These patterns highlight challenges related to detection methods, in particular ensuring that deployed detectors and calculations of radon concentrations are adapted to local conditions. The study doesn't dwell much on building materials and makes rather fundamental assumptions, including an equal exhalation rate of radon from the soil across neighbourhoods, based on Doha's homogeneous underlying geological formation. The study also highlights potential extensions into the broader category of pollutants such as hydrocarbons, air particulate carbon monoxide and nitrogen dioxide at specific time periods of the year, and particularly how they may tie in with global health institutions' requirements.
- Published
- 2018
19. Statistical analysis of particulate matter data in Doha, Qatar
- Author
- Charles C. Taylor, Kassim Mwitondi, and Adil Yousif
- Subjects
Pollution, Data collection, Meteorology, Outlier, Analyser, Environmental science, Sampling (statistics), Sample (statistics), Missing data, Wind speed
- Abstract
Pollution in Doha is measured using passive, active and automatic sampling. In this paper we consider data automatically sampled in which various pollutants were continually collected and analysed every hour. At each station the sample is analysed on-line and in real time and the data is stored within the analyser, or a separate logger so it can be downloaded remotely by a modem. The accuracy produced enables pollution episodes to be analysed in detail and related to traffic flows, meteorology and other variables. Data has been collected hourly over more than 6 years at 3 different locations, with measurements available for various pollutants – for example, ozone, nitrogen oxides, sulphur dioxide, carbon monoxide, THC, methane and particulate matter (PM1.0, PM2.5 and PM10), as well as meteorological data such as humidity, temperature, and wind speed and direction. Despite much care in the data collection process, the resultant data has long stretches of missing values, when the equipment has malfunctioned – often as a result of more extreme conditions. Our analysis is twofold. Firstly, we consider ways to “clean” the data, by imputing missing values, including identified outliers. The second aspect specifically considers prediction of each particulate (PM1.0, PM2.5 and PM10) 24 hours ahead, using current (and previous) pollution and meteorological data. In this case, we use vector autoregressive models, compare with decision trees and propose variable selection criteria which explicitly adapt to missing data. Our results show that the regression tree models, with no variable transformations, perform the best, and that attempts to impute missing values are hampered by non-random missingness.
- Published
- 2018
20. Circular local likelihood
- Author
- Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial, Bessel functions, Circular data, Density estimation, Log-likelihood, von Mises density, Logarithm, Basis (linear algebra), Kernel density estimation, Estimator, Function (mathematics), Applied mathematics, Statistics, Probability and Uncertainty, Special case, Mathematics
- Abstract
We introduce a class of local likelihood circular density estimators, which includes the kernel density estimator as a special case. The idea lies in optimizing a spatially weighted version of the log-likelihood function, where the logarithm of the density is locally approximated by a periodic polynomial. The use of von Mises density functions as weights reduces the computational burden. Also, we propose closed-form estimators which could form the basis of counterparts in the multidimensional Euclidean setting. Simulation results and a real data case study are used to evaluate the performance and illustrate the results.
- Published
- 2018
21. Nonparametric Rotations for Sphere-Sphere Regression
- Author
- Marco Di Marzio, Charles C. Taylor, and Agnese Panzera
- Subjects
Statistics and Probability, Wahba's problem, Nonparametric statistics, Hypersphere, Regression, Bias Reduction, Fisher's Method of Scoring, Local Smoothing, Non-Rigid Rotation Estimation, Singular Value Decomposition, Skew-symmetric Matrices, Spherical Kernels, Simple (abstract algebra), Applied mathematics, Statistics, Probability and Uncertainty, Rotation (mathematics), Parametric statistics, Mathematics
- Abstract
Regression of data represented as points on a hypersphere has traditionally been treated using parametric families of transformations that include the simple rigid rotation as an important, special case. On the other hand, nonparametric methods have generally focused on modeling a scalar response through a spherical predictor by representing the regression function as a polynomial, leading to component-wise estimation of a spherical response. We propose a very flexible, simple regression model where for each location of the manifold a specific rotation matrix is to be estimated. To make this approach tractable, we assume continuity of the regression function that, in turn, allows for approximations of rotation matrices based on a series expansion. It is seen that the nonrigidity of our technique motivates an iterative estimation within a Newton–Raphson learning scheme, which exhibits bias reduction properties. Extensions to general shape matching are also outlined. Both simulations and real data are used to illustrate the results. Supplementary materials for this article are available online.
- Published
- 2018
- Full Text
- View/download PDF
22. Nonparametric estimating equations for circular probability density functions and their derivatives
- Author
- Agnese Panzera, Charles C. Taylor, Stefania Fensore, and Marco Di Marzio
- Subjects
Statistics and Probability, Mathematical optimization, Population, Fourier coefficients, Probability density function, Estimating equations, Trigonometric moments, Circular kernels, Density estimation, Jackknife, Sin-polynomials, von Mises density, Applied mathematics, Mathematics, Nonparametric statistics, Estimator, Probability and statistics, Delta method, Statistics, Probability and Uncertainty
- Abstract
We propose estimating equations whose unknown parameters are the values taken by a circular density and its derivatives at a point. Specifically, we solve equations which relate local versions of population trigonometric moments with their sample counterparts. Major advantages of our approach are: higher order bias without asymptotic variance inflation, closed form for the estimators, and absence of numerical tasks. We also investigate situations where the observed data are dependent. Theoretical results along with simulation experiments are provided.
- Published
- 2017
- Full Text
- View/download PDF
23. Estimating optimal window size for analysis of low-coverage next-generation sequence data
- Author
- Ibrahim Nafisah, Charles C. Taylor, Henry M. Wood, Stefano Berri, Arief Gusnanto, and Pamela Rabbitts
- Subjects
Statistics and Probability, Lung Neoplasms, Computer science, Context (language use), Biochemistry, Humans, Molecular Biology, Likelihood Functions, Sequence, Genome, Human, High-Throughput Nucleotide Sequencing, Window (computing), Contrast (statistics), Genomics, Sequence Analysis, DNA, Function (mathematics), Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Step function, Data mining, Akaike information criterion, Algorithm, Next generation sequence
- Abstract
Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing (
Results: We assume the reads density to be a step function. Given this model, we propose a data-based estimation of optimal window size based on Akaike's information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curves as functions of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are of general purpose and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets.
Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/∼arief/R/win/
Contact: a.gusnanto@leeds.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
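A crude stand-in for the idea, not the paper's exact step-function likelihood: bin mapped read positions into windows of width w, score each candidate w by a Poisson log-likelihood with one rate per window, and pick the w minimizing AIC. All names and the per-window-rate simplification are assumptions.

```python
import numpy as np
from scipy.stats import poisson

def aic_for_window(read_pos, genome_len, w):
    """AIC when read counts in windows of width w are modelled as
    Poisson, with each window's rate set to its observed count (MLE)."""
    edges = np.arange(0.0, genome_len + w, w)
    counts, _ = np.histogram(read_pos, bins=edges)
    rates = np.clip(counts, 1e-9, None)          # avoid log(0) in empty bins
    loglik = poisson.logpmf(counts, rates).sum()
    return -2.0 * loglik + 2.0 * len(counts)     # penalty: one rate per window

# Scan candidate widths and keep the AIC minimiser:
# best_w = min(candidate_widths, key=lambda w: aic_for_window(pos, L, w))
```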
- Published
- 2014
- Full Text
- View/download PDF
24. Validating protein structure using kernel density estimates
- Author
- Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Kanti V. Mardia
- Subjects
Statistics and Probability, Mathematical optimization, Kernel density estimation, Conditional probability distribution, Density estimation, Multivariate kernel density estimation, Kernel embedding of distributions, Variable kernel density estimation, Test set, Kernel (statistics), Statistics, Probability and Uncertainty, Algorithm, Mathematics
- Abstract
Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then, by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.
- Published
- 2012
- Full Text
- View/download PDF
25. Kernel density estimation on the torus
- Author
- Marco Di Marzio, Agnese Panzera, and Charles C. Taylor
- Subjects
Statistics and Probability, Applied Mathematics, Kernel density estimation, Torus, Density estimation, Multivariate kernel density estimation, Kernel method, Variable kernel density estimation, Calculus, Partial derivative, Applied mathematics, Statistics, Probability and Uncertainty, Smoothing, Mathematics
- Abstract
Kernel density estimation for multivariate, circular data has been formulated only when the sample space is the sphere, but theory for the torus would also be useful. For data lying on a d-dimensional torus (d ≥ 1), we discuss kernel estimation of a density, its mixed partial derivatives, and their squared functionals. We introduce a specific class of product kernels whose order is suitably defined in such a way as to obtain L2-risk formulas whose structure can be compared to their Euclidean counterparts. Our kernels are based on circular densities; however, we also discuss smaller-bias estimation involving negative kernels which are functions of circular densities. Practical rules for selecting the smoothing degree, based on cross-validation, bootstrap and plug-in ideas, are derived. Moreover, we provide specific results on the use of kernels based on the von Mises density. Finally, real-data examples and simulation studies illustrate the findings.
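A minimal sketch of a product von Mises kernel estimator on the d-torus; the function name and the single shared concentration are assumptions, and bias-reduced variants and bandwidth selection are omitted.

```python
import numpy as np
from scipy.special import i0

def torus_kde(theta, data, kappa):
    """Product von Mises kernel density estimate on the d-torus.
    theta: (d,) evaluation point; data: (n, d) angles in radians."""
    log_kern = kappa * np.cos(data - theta) - np.log(2.0 * np.pi * i0(kappa))
    return float(np.exp(log_kern.sum(axis=1)).mean())   # product over coords
```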
- Published
- 2011
- Full Text
- View/download PDF
26. Local polynomial regression for circular predictors
- Author
- Marco Di Marzio, Agnese Panzera, and Charles C. Taylor
- Subjects
Statistics and Probability, Polynomial regression, Polynomial, Probability theory, Calculus, Applied mathematics, Torus, Statistics, Probability and Uncertainty, Design space, Smoothing, Mathematics, Variable (mathematics)
- Abstract
We consider local smoothing of datasets where the design space is the d-dimensional (d ≥ 1) torus and the response variable is real-valued. Our purpose is to extend least squares local polynomial fitting to this situation. We give both theoretical and empirical results.
- Published
- 2009
- Full Text
- View/download PDF
27. Using small bias nonparametric density estimators for confidence interval estimation
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Bootstrapping (electronics), Kernel (statistics), Statistics, Econometrics, Nonparametric statistics, Estimator, Statistics, Probability and Uncertainty, U-statistic, Confidence interval, CDF-based nonparametric confidence interval, Multivariate kernel density estimation, Mathematics
- Abstract
Confidence intervals for densities built on the basis of standard nonparametric theory are doomed to have poor coverage rates due to bias. Studies on coverage improvement exist, but reasonably behaved interval estimators are needed. We explore the use of small bias kernel-based methods to construct confidence intervals, in particular using a geometric density estimator that seems better suited for this purpose.
- Published
- 2009
- Full Text
- View/download PDF
28. Maximum likelihood estimation using composite likelihoods for closed exponential families
- Author
- Kanti V. Mardia, Charles C. Taylor, Gareth Hughes, and John T. Kent
- Subjects
Statistics and Probability, Pseudolikelihood, Restricted maximum likelihood, Applied Mathematics, General Mathematics, Normalizing constant, Bivariate von Mises distribution, Maximum likelihood sequence estimation, Agricultural and Biological Sciences (miscellaneous), Exponential family, Expectation–maximization algorithm, Statistics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Likelihood function, Mathematics
- Abstract
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
- Published
- 2009
- Full Text
- View/download PDF
29. On boosting kernel regression
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Analysis of covariance, Boosting (machine learning), Iterative method, Applied Mathematics, Estimator, Cross-validation, Kernel method, Statistics, Kernel regression, Applied mathematics, Statistics, Probability and Uncertainty, Smoothing, Mathematics
- Abstract
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya–Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast, and the variance diverges exponentially slowly. The first boosting step is analyzed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
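A minimal sketch of the estimator, assuming a Gaussian kernel and a fixed number of boosting steps; the names and defaults are assumptions.

```python
import numpy as np

def nw_smooth(x_eval, x, y, h):
    """Nadaraya-Watson regression estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def l2_boost_nw(x, y, h, n_steps=10):
    """L2 boosting: repeatedly smooth the current residuals with the
    Nadaraya-Watson weak learner and accumulate the fits."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(n_steps):
        fit += nw_smooth(x, x, y - fit, h)
    return fit
```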
- Published
- 2008
- Full Text
- View/download PDF
30. A multivariate von Mises distribution with applications to bioinformatics
- Author
- Kanti V. Mardia, Gareth Hughes, Harshinder Singh, and Charles C. Taylor
- Subjects
Statistics and Probability, Multivariate statistics, Univariate, Multivariate normal distribution, Bivariate analysis, Conditional probability distribution, Wald test, Statistics, von Mises distribution, Applied mathematics, Statistics, Probability and Uncertainty, Marginal distribution, Mathematics
- Abstract
Motivated by problems of modelling torsional angles in molecules, Singh, Hnizdo & Demchuk (2002) proposed a bivariate circular model which is a natural torus analogue of the bivariate normal distribution and a natural extension of the univariate von Mises distribution to the bivariate case. The authors present here a multivariate extension of the bivariate model of Singh, Hnizdo & Demchuk (2002). They study the conditional distributions and investigate the shapes of marginal distributions for a special case. The methods of moments and pseudo-likelihood are considered for the estimation of parameters of the new distribution. The authors investigate the efficiency of the pseudo-likelihood approach in three dimensions. They illustrate their methods with protein data of conformational angles.
- Published
- 2008
- Full Text
- View/download PDF
31. Automatic bandwidth selection for circular density estimation
- Author
- Charles C. Taylor
- Subjects
Statistics and Probability, Alternative methods, Applied Mathematics, Bandwidth (signal processing), Concentration parameter, Estimator, Density estimation, Bivariate analysis, Computational Mathematics, Computational Theory and Mathematics, Euclidean geometry, Statistics, von Mises distribution, Applied mathematics, Mathematics
- Abstract
Given angular data θ1, …, θn ∈ [0, 2π), a common objective is to estimate the density. In case a kernel estimator is used, bandwidth selection is crucial to the performance. A "plug-in rule" for the bandwidth, which is based on the concentration of a reference density, namely the von Mises distribution, is obtained. It is seen that this is equivalent to the usual Euclidean plug-in rule in the case where the concentration becomes large. In case the concentration parameter is unknown, alternative methods are explored which are intended to be robust to departures from the reference density. Simulations indicate that "wrapped estimators" can perform well in this context. The methods are applied to a real bivariate dataset concerning protein structure.
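Rather than risk misquoting the paper's plug-in formula, the sketch below shows a generic likelihood cross-validation alternative for choosing the von Mises kernel concentration; the grid, names and defaults are all assumptions.

```python
import numpy as np
from scipy.special import i0

def vm_kde(theta, data, nu):
    """Circular kernel density estimate with a von Mises kernel."""
    return float(np.mean(np.exp(nu * np.cos(theta - data))
                         / (2.0 * np.pi * i0(nu))))

def lcv_concentration(data, grid=None):
    """Pick the kernel concentration maximizing the leave-one-out
    cross-validated log-likelihood."""
    grid = np.linspace(1.0, 100.0, 100) if grid is None else grid
    def loo_loglik(nu):
        return sum(np.log(vm_kde(t, np.delete(data, i), nu))
                   for i, t in enumerate(data))
    return max(grid, key=loo_loglik)
```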
- Published
- 2008
- Full Text
- View/download PDF
32. The Poisson Index: a new probabilistic model for protein–ligand binding site similarity
- Author
- J.R. Davies, Richard M. Jackson, Charles C. Taylor, and Kanti V. Mardia
- Subjects
Statistics and Probability, Matching (graph theory), Structural similarity, Molecular Sequence Data, Ligands, Poisson distribution, Biochemistry, Measure (mathematics), Similarity (network science), Sequence Analysis, Protein, Protein Interaction Mapping, Statistics, Computer Simulation, Amino Acid Sequence, Molecular Biology, Mathematics, Binding Sites, Models, Statistical, Sequence Homology, Amino Acid, Proteins, Contrast (statistics), Pattern recognition, Statistical model, Similitude, Computer Science Applications, Computational Mathematics, Models, Chemical, Computational Theory and Mathematics, Artificial intelligence, Algorithms, Protein Binding
- Abstract
Motivation: The large-scale comparison of protein–ligand binding sites is problematic, in that measures of structural similarity are difficult to quantify and are not easily understood in terms of statistical similarity that can ultimately be related to structure and function. We present a binding-site matching score, the Poisson Index (PI), based upon a well-defined statistical model. PI requires only the number of matching atoms between two sites and the size of the two sites, the same information used by the Tanimoto Index (TI), a comparable and widely used measure for molecular similarity. We apply PI and TI to a previously automatically extracted set of binding sites to determine the robustness and usefulness of both scores.
Results: We found that PI outperforms TI; moreover, site similarity is poorly defined for TI at values around the 99.5% confidence level, for which PI is well defined. A difference map at this confidence level shows that PI gives much more meaningful information than TI. We show individual examples where TI fails to distinguish either a false or a true site pairing, in contrast to PI, which performs much better. TI cannot handle large or small sites very well, or the comparison of large and small sites, in contrast to PI, which is shown to be much more robust. Despite the difficulty of determining a biological 'ground truth' for binding site similarity, we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding-site classification scheme comparable to existing protein domain classification schemes.
Availability: PI is implemented in SitesBase (www.modelling.leeds.ac.uk/sb/)
Contact: r.m.jackson@leeds.ac.uk
- Published
- 2007
- Full Text
- View/download PDF
33. Classification of type I-censored bivariate data
- Author
- Matthew J. Langdon, Robert West, and Charles C. Taylor
- Subjects
Statistics and Probability, Applied Mathematics, Pattern recognition, Bivariate analysis, Bayes classifier, Censoring (statistics), Computational Mathematics, Bayes' theorem, Computational Theory and Mathematics, Bivariate data, Decision boundary, Artificial intelligence, Random variable, Classifier (UML), Mathematics
- Abstract
Type I, or limits-of-detection, censoring occurs when a random variable is only observable between fixed and known limits. We consider the classification problem in which the feature vectors used for classification are bivariate type I-censored observations. A Bayes-optimal classifier is constructed under the assumption that the underlying distribution is Gaussian, and it is shown that the decision boundary between classes is not continuous, as it is in the uncensored case. Examples of the decision boundary are presented, and simulation studies are used to illustrate the methods described. The resultant classifier is applied to simulated electrical impedance tomography data and a medical data set as illustrations.
- Published
- 2007
- Full Text
- View/download PDF
34. Hierarchical Bayesian modelling of spatial age-dependent mortality
- Author
- Ian L. Dryden, N. Miklós Arató, and Charles C. Taylor
- Subjects
Statistics and Probability, Markov chain, Applied Mathematics, Posterior probability, Markov chain Monte Carlo, Conditional probability distribution, Markov model, Binomial distribution, Computational Mathematics, Metropolis–Hastings algorithm, Computational Theory and Mathematics, Prior probability, Statistics, Econometrics, Mathematics
- Abstract
Hierarchical Bayesian modelling is considered for the number of age-dependent deaths in different geographic regions. The model uses a conditional binomial distribution for the number of age-dependent deaths, a new family of zero mean Gaussian Markov random field models for incorporating spatial correlations between neighbouring regions, and an intrinsic Gaussian model for including correlations between age-dependent mortality rates. Age-dependent mortality rates are estimated for each region, and approximate credibility intervals based on summaries of samples from the posterior distribution are obtained from Markov chain Monte Carlo simulation. The consequent maps of mortality rates are less variable and smoother than those which would be obtained from naive estimates, and various inferences may be drawn from the results. The prior spatial model includes some of the common conditional autoregressive spatial models used in epidemiology, and so model uncertainty in this family can be accounted for. The methodology is illustrated with an actuarial data set of age-dependent deaths in 150 geographic regions of Hungary. Sensitivity to the prior distributions is discussed, as well as relative risks for certain covariates (males in towns, females in towns, males in villages, females in villages).
- Published
- 2006
- Full Text
- View/download PDF
35. Kernel density classification and boosting: an L2 analysis
- Author
- M. Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Kernel density estimation, Pattern recognition, Multivariate kernel density estimation, Theoretical Computer Science, Kernel method, Computational Theory and Mathematics, Variable kernel density estimation, Kernel embedding of distributions, Polynomial kernel, Radial basis function kernel, Kernel regression, Artificial intelligence, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is "boosting", and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
- Published
- 2005
- Full Text
- View/download PDF
36. Boosted Regression Estimates of Spatial Data: Pointwise Inference
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Pointwise, Statistics and Probability, Boosting (machine learning), General Mathematics, Statistics, Econometrics, Nonparametric statistics, Estimator, Inference, Spatial analysis, Cross-validation, Regression, Mathematics
- Abstract
In this study simple nonparametric techniques have been adopted to estimate the trend surface of the Swiss rainfall data. In particular, we employed the Nadaraya–Watson smoother and, in addition, a version of it adapted by boosting. Additionally, we have explored the use of the Nadaraya–Watson estimator for the construction of pointwise confidence intervals. Overall, boosting does seem to improve the estimate, as in previous examples, and the results indicate that cross-validation can be successfully used for parameter selection on real datasets. In addition, our estimators compare favorably with most of the techniques previously used on this dataset.
- Published
- 2005
- Full Text
- View/download PDF
37. Non-Stationary Spatiotemporal Analysis of Karst Water Levels
- Author
- J. Kovács, Charles C. Taylor, Ian L. Dryden, and L. Márkus
- Subjects
Statistics and Probability, Data set, Covariance function, Kriging, Stochastic modelling, Econometrics, Estimator, Applied mathematics, Hydrograph, Statistics, Probability and Uncertainty, Covariance, Cross-validation, Mathematics
- Abstract
Summary: We consider non-stationary spatiotemporal modelling in an investigation into karst water levels in western Hungary. A strong feature of the data set is the extraction of large amounts of water from mines, which caused the water levels to reduce until about 1990 when the mining ceased, and then the levels increased quickly. We discuss some traditional hydrogeological models which might be considered to be appropriate for this situation, and various alternative stochastic models. In particular, a separable space–time covariance model is proposed which is then deformed in time to account for the non-stationary nature of the lagged correlations between sites. Suitable covariance functions are investigated and then the models are fitted by using weighted least squares and cross-validation. Forecasting and prediction are carried out by using spatiotemporal kriging. We assess the performance of the method with one-step-ahead forecasting and make comparisons with naïve estimators. We also consider spatiotemporal prediction at a set of new sites. The new model performs favourably compared with the deterministic model and the naïve estimators, and the deformation by time shifting is worthwhile.
- Published
- 2005
- Full Text
- View/download PDF
38. Chain plot: a tool for exploiting bivariate temporal structures
- Author
- András Zempléni and Charles C. Taylor
- Subjects
Statistics and Probability, Probability plot, Partial residual plot, Applied Mathematics, Bivariate analysis, Probability plot correlation coefficient plot, Computational Mathematics, Exploratory data analysis, Computational Theory and Mathematics, Chain (algebraic topology), Bivariate data, Statistics, Q–Q plot, Algorithm, Mathematics
- Abstract
In this paper we present a graphical tool useful for visualizing the cyclic behaviour of bivariate time series. We investigate its properties and link it to the asymmetry of the two variables concerned. We also suggest adding approximate confidence bounds to the points on the plot and investigate the effect of lagging on the chain plot. We conclude our paper with some standard Fourier analysis, relating and comparing this to the chain plot.
- Published
- 2004
- Full Text
- View/download PDF
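The rendering below is only a plausible reconstruction, under the assumption that the chain plot joins the successive bivariate observations (x_t, y_t) in time order so that common cycles trace out loops; the paper's exact construction, confidence bounds and lagging analysis are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.05, t.size)
y = np.sin(2 * np.pi * (t - 5) / 50) + rng.normal(0, 0.05, t.size)  # phase-shifted partner

# join consecutive (x_t, y_t) points in time order; cycles appear as loops
plt.plot(x, y, "-o", markersize=2, linewidth=0.5)
plt.xlabel("$x_t$")
plt.ylabel("$y_t$")
plt.title("Chain-plot-style rendering of a bivariate cycle")
plt.show()
```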
39. Boosting kernel density estimates: A bias reduction technique?
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability ,Boosting (machine learning) ,Applied Mathematics ,General Mathematics ,Statistics ,Kernel density estimation ,Statistics, Probability and Uncertainty ,General Agricultural and Biological Sciences ,Agricultural and Biological Sciences (miscellaneous) ,Bias reduction ,Mathematics - Abstract
Summary. This paper proposes an algorithm for boosting kernel density estimates. We show that boosting is closely linked to a previously proposed method of bias reduction and indicate how it should enjoy similar properties. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
- Published
- 2004
- Full Text
- View/download PDF
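The following is a minimal multiplicative-boosting sketch in the spirit of the paper: each stage fits a weighted KDE, upweights points the current stage covers poorly (via a leave-one-out ratio), and the final estimate is the renormalized product of the stage estimates. The update rule, step count and renormalization here are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def boosted_kde(x, h=0.3, n_steps=3, n_grid=400):
    """Multiplicatively boosted kernel density estimate (illustrative)."""
    n = len(x)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, n_grid)
    Kxx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    Kgx = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    w = np.full(n, 1.0 / n)                    # initial uniform weights
    log_f = np.zeros(n_grid)
    for _ in range(n_steps):
        log_f += np.log(Kgx @ w)               # accumulate log of stage estimates
        f_at_x = Kxx @ w                       # stage estimate at the data points
        f_loo = f_at_x - w * np.diag(Kxx)      # drop each point's own contribution
        w *= f_at_x / f_loo                    # upweight poorly covered points
        w /= w.sum()                           # keep the weights a distribution
    f = np.exp(log_f)
    return grid, f / (f.sum() * (grid[1] - grid[0]))  # renormalize the product

grid, f = boosted_kde(np.random.default_rng(0).normal(size=300))
```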
40. Bayesian Texture Segmentation of Weed and Crop Images Using Reversible Jump Markov Chain Monte Carlo Methods
- Author
- Mark R. Scarr, Charles C. Taylor, and Ian L. Dryden
- Subjects
Statistics and Probability ,Random field ,Markov random field ,business.industry ,Posterior probability ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Reversible-jump Markov chain Monte Carlo ,Mixture model ,Markov model ,Computer Science::Graphics ,Metropolis–Hastings algorithm ,Computer Science::Computer Vision and Pattern Recognition ,Prior probability ,Statistics ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
Summary. A Bayesian method for segmenting weed and crop textures is described and implemented. The work forms part of a project to identify weeds and crops in images so that selective crop spraying can be carried out. An image is subdivided into blocks and each block is modelled as a single texture. The number of different textures in the image is assumed unknown. A hierarchical Bayesian procedure is used where the texture labels have a Potts model (colour Ising Markov random field) prior and the pixels within a block are distributed according to a Gaussian Markov random field, with the parameters dependent on the type of texture. We simulate from the posterior distribution by using a reversible jump Metropolis–Hastings algorithm, where the number of different texture components is allowed to vary. The methodology is applied to a simulated image and then we carry out texture segmentation on the weed and crop images that motivated the work.
- Published
- 2003
- Full Text
- View/download PDF
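The Potts-prior part of the sampler can be illustrated compactly. The sketch below performs single-site Metropolis updates with a Potts prior and, as a deliberate simplification, a single Gaussian likelihood per class; the paper's actual likelihood is a Gaussian Markov random field per block, and the number of textures varies via reversible jump moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def potts_sweep(labels, data, means, beta, sigma=1.0):
    """One single-site Metropolis sweep for a Potts-prior segmentation.

    Illustrative simplification: each class is a single Gaussian mean,
    not the paper's Gaussian Markov random field texture model.
    """
    k, (rows, cols) = len(means), labels.shape
    for i in range(rows):
        for j in range(cols):
            cur, prop = labels[i, j], rng.integers(k)
            nb = [labels[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                  if 0 <= a < rows and 0 <= b < cols]
            # change in Potts prior (neighbour agreement) and in log-likelihood
            d_prior = beta * (sum(n == prop for n in nb) - sum(n == cur for n in nb))
            d_lik = ((data[i, j] - means[cur]) ** 2
                     - (data[i, j] - means[prop]) ** 2) / (2 * sigma ** 2)
            if np.log(rng.uniform()) < d_prior + d_lik:
                labels[i, j] = prop
    return labels

# two-class toy image: left half darker than right half
data = np.hstack([rng.normal(0, 1, (20, 10)), rng.normal(3, 1, (20, 10))])
labels = rng.integers(2, size=(20, 20))
for _ in range(20):
    labels = potts_sweep(labels, data, means=np.array([0.0, 3.0]), beta=0.8)
```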
41. Nonparametric regression for spherical data
- Author
- Charles C. Taylor, Agnese Panzera, and Marco Di Marzio
- Subjects
Statistics and Probability ,Polynomial regression ,Polynomial ,Mathematical optimization ,Dimension (vector space) ,Statistics, Probability and Uncertainty ,Local polynomial fitting ,Spherical-linear regression ,Spherical-spherical regression ,Regression ,Nonparametric regression ,Curse of dimensionality ,Interpretability ,Mathematics ,Parametric statistics - Abstract
We develop nonparametric smoothing for regression when both the predictor and the response variables are defined on a sphere of arbitrary dimension. A local polynomial fitting approach is pursued, which retains all the advantages in terms of rate optimality, interpretability, and ease of implementation widely observed in the standard setting. Our estimates have a multi-output nature, meaning that each coordinate is separately estimated within a scheme analogous to regression with a linear response. The main properties include linearity and rotational equivariance. This research is motivated by the fact that very few models describe this kind of regression, and current methods are not widely employable since they are parametric in nature and require the same dimensionality for the predictor and response spaces, along with a nonrandom design. Our approach suffers none of these limitations. Real-data case studies and simulation experiments are used to illustrate the effectiveness of the method.
- Published
- 2014
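A local-constant special case of such an estimator is easy to sketch: weight the response unit vectors by a kernel in the predictor's inner product with the query point, average, and project back to the sphere. The von Mises-Fisher-style weight and the concentration value are illustrative assumptions; the paper develops full local polynomial fits with the stated optimality properties.

```python
import numpy as np

def sphere_nw(X, Y, x0, kappa=10.0):
    """Local-constant regression for a spherical response on a spherical
    predictor: kernel-weighted mean of the responses, projected back
    onto the sphere.

    X, Y : (n, d) arrays of unit vectors; x0 : (d,) unit query vector.
    The weight exp(kappa * <x_i, x0>) is an illustrative kernel choice.
    """
    w = np.exp(kappa * (X @ x0))       # larger weight for predictors near x0
    m = w @ Y                          # weighted Euclidean mean of responses
    return m / np.linalg.norm(m)       # project back onto the unit sphere

# toy usage on the 2-sphere: the "signal" is a fixed coordinate rotation
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = np.roll(X, 1, axis=1)              # cyclic coordinate shift (a rotation)
print(sphere_nw(X, Y, x0=np.array([1.0, 0.0, 0.0])))
```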
42. The K ‐Function for Nearly Regular Point Processes
- Author
- Charles C. Taylor, Ian L. Dryden, and Rahman Farnoosh
- Subjects
Statistics and Probability ,Biometry ,Movement ,Gaussian ,Equilateral triangle ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,Square (algebra) ,Point process ,Regular grid ,Combinatorics ,symbols.namesake ,Animals ,Computer Simulation ,Mathematics ,Models, Statistical ,General Immunology and Microbiology ,Estimation theory ,Applied Mathematics ,Chlamydomonas ,Mathematical analysis ,Estimator ,General Medicine ,Grid ,symbols ,General Agricultural and Biological Sciences - Abstract
Summary. We propose modeling a nearly regular point pattern by a generalized Neyman-Scott process in which the offspring are Gaussian perturbations from a regular mean configuration. The mean configuration of interest is an equilateral grid, but our results can be used for any stationary regular grid. The case of uniformly distributed points is first studied as a benchmark. By considering the square of the interpoint distances, we can evaluate the first two moments of the K-function. These results can be used for parameter estimation, and simulations are used both to verify the theory and to assess the accuracy of the estimators. The methodology is applied to an investigation of regularity in plumes observed from swimming microorganisms.
- Published
- 2001
- Full Text
- View/download PDF
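For reference, a basic empirical K-function is shown below; edge correction is omitted for brevity, and the analytic moment results derived in the paper for Gaussian perturbations of a regular grid are not reproduced.

```python
import numpy as np

def ripley_k(points, r_values, area):
    """Naive (edge-effect-ignoring) estimate of Ripley's K-function.

    K(r) is, per unit intensity, the expected number of further points
    within distance r of a typical point of the pattern.
    """
    n = len(points)
    lam = n / area                                     # intensity estimate
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                        # exclude self-pairs
    return np.array([(d < r).sum() / (n * lam) for r in r_values])

# nearly regular pattern: unit square grid plus small Gaussian perturbations
rng = np.random.default_rng(0)
g = np.stack(np.meshgrid(np.arange(10), np.arange(10)), -1).reshape(-1, 2).astype(float)
pts = g + rng.normal(0, 0.1, g.shape)
print(ripley_k(pts, r_values=[0.5, 1.0, 1.5], area=100.0))
```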
43. Procrustes Shape Analysis of Planar Point Subsets
- Author
- Ian L. Dryden, M. R. Faghihi, and Charles C. Taylor
- Subjects
Statistics and Probability ,Delaunay triangulation ,Gaussian ,Mathematical analysis ,Isotropy ,Covariance ,Equilateral triangle ,Combinatorics ,symbols.namesake ,symbols ,Statistics, Probability and Uncertainty ,Statistic ,Mathematics ,Shape analysis (digital geometry) ,Central limit theorem - Abstract
Summary. Consider a set of points in the plane randomly perturbed about a mean configuration by Gaussian errors. In this paper a Procrustes statistic based on the shapes of subsets of the points is studied, and its approximate distribution is found for small variations. We derive various properties of the distribution, including the first two moments, a central limit result and a scaled χ² approximation. We concentrate on the independent isotropic Gaussian error case, although the results are valid for general covariance structures. We investigate triangle subsets in detail, in particular the situation where the population mean is regular (i.e. a Delaunay triangulation of the mean of the process consists of equilateral triangles of the same size). We examine the variance of the statistic for differently shaped regions and provide an asymptotic result for generally shaped regions. The results are applied to an investigation of regularity in human muscle fibre cross-sections.
- Published
- 1997
- Full Text
- View/download PDF
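The basic ingredient, the full Procrustes distance between two planar configurations, has a closed form in complex coordinates, sketched below; the paper's statistic and its distribution theory build on this quantity over triangle subsets.

```python
import numpy as np

def full_procrustes_dist2(a, b):
    """Squared full Procrustes distance between two planar configurations.

    Configurations are (k, 2) arrays; in complex coordinates the optimal
    translation, scaling and rotation have closed forms, giving
    d^2 = 1 - |<w, z>|^2 for the centred, unit-norm configurations.
    """
    z = a[:, 0] + 1j * a[:, 1]
    w = b[:, 0] + 1j * b[:, 1]
    z -= z.mean(); w -= w.mean()                      # remove translation
    z /= np.linalg.norm(z); w /= np.linalg.norm(w)    # remove scale
    return 1.0 - np.abs(np.vdot(w, z)) ** 2           # optimal rotation absorbed

# distance of a noisy triangle from the equilateral reference triangle
equilateral = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
noisy = equilateral + np.random.default_rng(0).normal(0, 0.05, (3, 2))
print(full_procrustes_dist2(equilateral, noisy))
```

A regularity statistic in the spirit of the paper could average this quantity over the Delaunay triangles of a point pattern, compared against the equilateral reference.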
44. Matching markers and unlabeled configurations in protein gels
- Author
- Kanti V. Mardia, Charles C. Taylor, and Emma M. Petty
- Subjects
Statistics and Probability ,High probability ,Electrophoresis ,FOS: Computer and information sciences ,Computer science ,business.industry ,Pattern recognition ,shape ,Statistics - Applications ,Western Blots ,Modeling and Simulation ,Expectation–maximization algorithm ,Applications (stat.AP) ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,Shape analysis (digital geometry) - Abstract
Unlabeled shape analysis is a rapidly emerging and challenging area of statistics, driven by various novel applications in bioinformatics. We consider here the situation where two configurations are matched under various constraints, namely, the configurations have a subset of manually located "markers" with high probability of matching each other, while a larger subset consists of unlabeled points. We consider a plausible model and give an implementation using the EM algorithm. The work is motivated by a real experiment with gels for renal cancer, and our approach allows for the possibility of missing and misallocated markers. The methodology is successfully used to automatically locate and remove a grossly misallocated marker within the given data set. (Published at http://dx.doi.org/10.1214/12-AOAS544 in the Annals of Applied Statistics, http://www.imstat.org/aoas/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
- Published
- 2012
- Full Text
- View/download PDF
45. A comparison of block and semi-parametric bootstrap methods for variance estimation in spatial statistics
- Author
- Mohsen Mohammadzadeh, N. Iranpanah, and Charles C. Taylor
- Subjects
Statistics and Probability ,Statistics::Theory ,Estimation theory ,Applied Mathematics ,Bootstrap aggregating ,Estimator ,Semiparametric model ,Computational Mathematics ,Computational Theory and Mathematics ,Kriging ,Statistics ,Statistics::Methodology ,Block size ,Spatial analysis ,Block (data storage) ,Mathematics - Abstract
Efron (1979) introduced the bootstrap method for independent data, but it cannot easily be applied to spatial data because of their dependence. For spatial data that are correlated according to their locations in the underlying space, the moving block bootstrap method is usually used to estimate the precision measures of estimators. The precision of the moving block bootstrap estimators depends on the block size, which is difficult to select; moreover, the moving block bootstrap tends to underestimate the variance. In this paper, we first use the semi-parametric bootstrap, which exploits an estimate of the spatial correlation structure, to estimate the precision measures of estimators in spatial data analysis. We then compare the semi-parametric bootstrap with the moving block bootstrap for variance estimation in a simulation study. Finally, we use the semi-parametric bootstrap to analyze the coal-ash data.
- Published
- 2011
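The moving block bootstrap is easy to sketch in one dimension, which keeps the mechanics visible; the paper's comparison concerns spatial (two-dimensional) data and its semi-parametric competitor, neither of which is reproduced here.

```python
import numpy as np

def mbb_variance(x, block_len, n_boot=500, stat=np.mean, seed=0):
    """Moving block bootstrap estimate of Var(stat) for dependent data.

    Overlapping blocks are resampled with replacement and concatenated so
    that short-range dependence within blocks is preserved.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, block_len)
    k = int(np.ceil(n / block_len))
    reps = [stat(blocks[rng.integers(len(blocks), size=k)].ravel()[:n])
            for _ in range(n_boot)]
    return np.var(reps, ddof=1)

# AR(1) series as a stand-in for spatially correlated data
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(mbb_variance(x, block_len=20))
```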
46. Estimating the Dimension of a Fractal
- Author
- Charles C. Taylor and James R. Taylor
- Subjects
Statistics and Probability ,010102 general mathematics ,01 natural sciences ,010104 statistics & probability ,Box counting ,Fractal ,Dimension (vector space) ,Complete information ,Statistics ,Statistical analysis ,Limit (mathematics) ,0101 mathematics ,Algorithm ,Mathematics - Abstract
Summary. We suggest refinements of the box counting method which address the obvious problems caused by incomplete information and the inaccessibility of the limit. A method for the statistical analysis of these corrected data is developed and tested on simulated and real data.
- Published
- 1991
- Full Text
- View/download PDF
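The uncorrected box counting method that the paper refines is shown below: count occupied boxes at several scales and regress log counts on log inverse box size. The paper's statistical corrections for finite data and the unreachable limit are not reproduced.

```python
import numpy as np

def box_count_dimension(points, box_sizes):
    """Estimate fractal dimension by (uncorrected) box counting.

    For each box size eps, count occupied boxes N(eps); the least-squares
    slope of log N(eps) against log(1/eps) estimates the dimension.
    """
    logs = []
    for eps in box_sizes:
        occupied = {tuple(np.floor(p / eps).astype(int)) for p in points}
        logs.append((np.log(1.0 / eps), np.log(len(occupied))))
    lx, ly = np.array(logs).T
    return np.polyfit(lx, ly, 1)[0]    # slope = dimension estimate

# points on a line segment in the plane: dimension should be near 1
t = np.linspace(0, 1, 2000)
pts = np.c_[t, 0.5 * t]
print(box_count_dimension(pts, box_sizes=[0.1, 0.05, 0.02, 0.01]))
```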
47. A generative, probabilistic model of local protein structure
- Author
- Anders Krogh, Kanti V. Mardia, Jesper Ferkinghoff-Borg, Thomas Hamelryck, Wouter Boomsma, and Charles C. Taylor
- Subjects
Models, Molecular ,Multidisciplinary ,Theoretical computer science ,Models, Statistical ,Continuous modelling ,Computer science ,Amino Acid Motifs ,Probabilistic logic ,Proteins ,Statistical model ,Protein structure prediction ,Biological Sciences ,Bioinformatics ,Prime (order theory) ,Generative model ,Fragment (logic) ,Generative grammar - Abstract
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence–structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
- Published
- 2008
48. Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data
- Author
- Kanti V. Mardia, Ganesh Subramaniam, and Charles C. Taylor
- Subjects
Statistics and Probability ,Likelihood Functions ,Models, Statistical ,General Immunology and Microbiology ,Myoglobin ,Protein Conformation ,Applied Mathematics ,Directional statistics ,Computational Biology ,Proteins ,Multivariate normal distribution ,General Medicine ,Bivariate analysis ,Bioinformatics ,General Biochemistry, Genetics and Molecular Biology ,Protein Structure, Secondary ,Malate Dehydrogenase ,Expectation–maximization algorithm ,von Mises distribution ,von Mises yield criterion ,Marginal distribution ,General Agricultural and Biological Sciences ,Algorithms ,Mathematics ,Ramachandran plot - Abstract
Summary. A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions, referred to as Sine and Cosine models, which have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distribution, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.
- Published
- 2007
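For concreteness, here is the unnormalized Sine-model density in one common parameterization (two means, two concentrations and a dependence parameter); the values used are arbitrary illustrations, and the paper's preferred Cosine model instead couples the angles through a cosine of their difference.

```python
import numpy as np

def sine_model_unnorm(phi, psi, mu=0.0, nu=0.0, k1=2.0, k2=2.0, lam=1.5):
    """Unnormalized bivariate von Mises 'Sine' density:

        f(phi, psi) ∝ exp{k1 cos(phi-mu) + k2 cos(psi-nu)
                          + lam sin(phi-mu) sin(psi-nu)}

    For large k1, k2 this approaches a bivariate normal; certain
    parameter combinations make the joint density bimodal.
    """
    return np.exp(k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
                  + lam * np.sin(phi - mu) * np.sin(psi - nu))

# evaluate on a grid of conformational angles (a synthetic Ramachandran plot)
phi, psi = np.meshgrid(np.linspace(-np.pi, np.pi, 181),
                       np.linspace(-np.pi, np.pi, 181))
density = sine_model_unnorm(phi, psi)
```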
49. Learning in dynamically changing domains: Theory revision and context dependence issues
- Author
- Charles C. Taylor and Gholamreza Nakhaeizadeh
- Subjects
Computational learning theory ,Computer science ,business.industry ,Algorithmic learning theory ,Context (language use) ,State (computer science) ,Artificial intelligence ,business ,Data science - Abstract
Dealing with dynamically changing domains is an important topic in machine learning (ML) with very interesting practical applications. Some attempts have already been made, in both the statistical and machine learning communities, to address some of the issues. In this paper we survey the state of the art from the available literature in this area. We argue that much further research is still needed, outline the directions such research should take and describe the expected results. We also argue that most of the problems in this area can be solved only through interaction between researchers from both the statistical and ML communities.
- Published
- 1997
- Full Text
- View/download PDF
50. An understanding of muscle fibre images
- Author
- M. R. Faghihi, Ian L. Dryden, and Charles C. Taylor
- Subjects
Delaunay triangulation ,business.industry ,media_common.quotation_subject ,Isotropy ,Pattern recognition ,Geometry ,Equilateral triangle ,Normal muscle ,Test statistic ,Artificial intelligence ,Cluster analysis ,business ,Random variable ,Normality ,Mathematics ,media_common - Abstract
Images of muscle biopsies reveal a mosaic pattern of two fibre types (slow-twitch and fast-twitch). An analysis of such images can indicate the presence of a neuromuscular disorder. We briefly review some methods which analyse the arrangement of the fibres (e.g. clustering of fibre type) and the fibre sizes. The proposed methodology uses the cell centres as a set of landmarks from which a Delaunay triangulation is created. The shapes of these (correlated) triangles are then used in a test statistic to ascertain normality of a muscle. Our "normal muscle" model supposes that the fibres are hexagonal (so that the triangulation is made up of equilateral triangles of the same size) with a perturbation of specified isotropic variance of the fibre centres. We obtain the distribution of the test statistic as an approximate function of a χ² random variable, so that a formal test can be carried out.
- Published
- 1995
- Full Text
- View/download PDF
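The geometric pipeline is easy to illustrate with scipy's Delaunay triangulation. The per-triangle regularity score below (longest side over shortest side, minus one) is a simple stand-in chosen for brevity; the paper instead uses a formal Procrustes-based statistic with a χ² approximation that accounts for the correlation between triangles.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_shape_scores(centres):
    """Delaunay-triangle regularity scores for fibre centres.

    Each score is zero for an equilateral triangle and grows with
    departure from equilaterality (an illustrative measure only).
    """
    tri = Delaunay(centres)
    scores = []
    for simplex in tri.simplices:
        p = centres[simplex]
        sides = np.sort([np.linalg.norm(p[i] - p[(i + 1) % 3]) for i in range(3)])
        scores.append(sides[2] / sides[0] - 1.0)
    return np.array(scores)

# perturbed hexagonal-like lattice standing in for fibre centres
rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.arange(8), np.arange(8)), -1).reshape(-1, 2).astype(float)
grid[:, 0] += 0.5 * (grid[:, 1] % 2)      # offset alternate rows (hexagonal packing)
centres = grid + rng.normal(0, 0.05, grid.shape)
print(triangle_shape_scores(centres).mean())
```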