85 results for "Charles C. Taylor"
Search Results
2. The package: nonparametric regression using local rotation matrices in R
- Author
- Giovanni Lafratta, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Applied Mathematics, Nonparametric statistics, Rotation matrix, Regression, Bias reduction, Nonparametric regression, Modeling and Simulation, Statistics, Singular value decomposition, Statistics, Probability and Uncertainty, MIT License, Mathematics
- Abstract
The package implements nonparametric (smooth) regression for spherical data in R, and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the MIT License. It can be use...
- Published
- 2021
- Full Text
- View/download PDF
3. Spatio-temporal forecasting using wavelet transform-based decision trees with application to air quality and COVID-19 forecasting
- Author
- Xin Zhao, Stuart Barber, Charles C. Taylor, Xiaokai Nie, and Wenqian Shen
- Subjects
Statistics and Probability, Articles, Statistics, Probability and Uncertainty
- Abstract
We develop a new method that combines a decision tree with a wavelet transform to forecast time series data with spatial spillover effects. The method not only improves prediction but also offers good interpretability of the time series mechanism. As a feature-exploration method, the wavelet transform represents information at different resolution levels, which may improve the performance of decision trees. The method is applied to simulated data, and to air pollution and COVID-19 time series data sets. In the simulation, the Haar, LA8, D4 and D6 wavelets are compared, with the Haar wavelet having the best performance. In the air pollution application, by using wavelet transform-based decision trees, the temporal effects of the air quality index, including autoregressive and seasonal effects, can be described, as well as the spatial correlation effect. To describe the spillover spatial effect in contiguous regions, a spatial weight is constructed to improve the modeling performance. The results show that the air quality index has autoregressive, seasonal and spatial spillover effects. The wavelet-transformed variables have better forecasting performance and enhanced interpretability than the original variables. For the COVID-19 time series of cumulative cases, spatially weighted variables are not selected, which suggests the lockdown policies are truly effective.
- Published
- 2022
4. Properties and approximate p-value calculation of the Cramér test
- Author
- Arief Gusnanto, Charles C. Taylor, Alison Telford, and Henry M. Wood
- Subjects
Statistics and Probability, Anderson–Darling test, Applied Mathematics, Cumulative distribution function, Variance (accounting), Test (assessment), Distribution (mathematics), Modeling and Simulation, Cramér–von Mises criterion, Statistics, p-value, Statistics, Probability and Uncertainty, Null hypothesis, Mathematics
- Abstract
Two-sample tests are probably the most commonly used tests in statistics. These tests generally address one aspect of the samples' distribution, such as mean or variance. When the null hypothesis is that two distributions are equal, the Anderson–Darling (AD) test, which is developed from the Cramér–von Mises (CvM) test, is generally employed. Unfortunately, we find that the AD test often fails to identify true differences when the differences are complex: they are not only in terms of mean, variance and/or skewness but also in terms of multi-modality. In such cases, we find that the Cramér test, a modification of the CvM test, performs well. However, the adoption of the Cramér test in routine analysis is hindered by the fact that the mean, variance and skewness of the test statistic are not available, which makes calculation of the associated p-value problematic. For this purpose, we propose a new method for obtaining a p-value by approximating the distribution of the test statistic by a generalized Pareto distribution. By approximating the distribution in this way, the calculation of the p-value is much faster than, e.g., the bootstrap method, especially for large n. We have observed that this approximation enables the Cramér test to have proper control of the type-I error. A simulation study indicates that the Cramér test is as powerful as other tests in simple cases and more powerful in more complicated cases.
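The paper approximates the null distribution of the statistic directly; as a generic illustration of the same tail-approximation idea, the sketch below fits a generalized Pareto distribution to the upper tail of simulated null statistics and reads far-tail p-values off the fitted tail, which is cheaper than a full bootstrap at extreme significance levels. The function name, the threshold rule and the use of simulated null statistics are all assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_tail_pvalue(t_obs, null_stats, n_exceed=250):
    """Approximate a far-tail p-value by fitting a generalized Pareto
    distribution (GPD) to the largest simulated null statistics."""
    null_stats = np.sort(np.asarray(null_stats))
    threshold = null_stats[-n_exceed]                   # exceedance threshold
    exceedances = null_stats[-n_exceed:] - threshold
    c, _, scale = genpareto.fit(exceedances, floc=0.0)  # ML fit to the tail
    if t_obs <= threshold:                              # bulk: empirical p-value
        return float(np.mean(null_stats >= t_obs))
    p_tail = n_exceed / len(null_stats)                 # P(T > threshold)
    return p_tail * genpareto.sf(t_obs - threshold, c, loc=0.0, scale=scale)
```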
- Published
- 2020
- Full Text
- View/download PDF
5. Density estimation for circular data observed with errors
- Author
- Charles C. Taylor, Stefania Fensore, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability, General Immunology and Microbiology, Applied Mathematics, Estimator, General Medicine, Density estimation, General Biochemistry, Genetics and Molecular Biology, Bias, Simple (abstract algebra), Kernel (statistics), Computer Simulation, Deconvolution, General Agricultural and Biological Sciences, Equivalence (measure theory), Fourier series, Algorithm, Smoothing, Mathematics
- Abstract
Until now, the problem of estimating circular densities when data are observed with errors has been mainly treated by Fourier series methods. We propose kernel-based estimators exhibiting simple construction and easy implementation. Specifically, we consider three different approaches: the first is based on the equivalence between kernel estimators using data corrupted with different levels of error. This proposal appears to be totally unexplored, despite its potential for application in the Euclidean setting as well. The second approach relies on estimators whose weight functions are circular deconvolution kernels. Due to the periodicity of the involved densities, it requires ad hoc mathematical tools. Finally, the third is based on the idea of correcting the extra bias of kernel estimators which use contaminated data, and is essentially an adaptation of the standard theory to the circular case. For all the proposed estimators, we derive asymptotic properties, provide some simulation results, and also discuss some possible generalizations and extensions. Real-data case studies are also included.
- Published
- 2022
6. Interval forecasts based on regression trees for streaming data
- Author
- Stuart Barber, Charles C. Taylor, Zoka Milan, and Xin Zhao
- Subjects
Statistics and Probability, Computer science, Test data generation, Applied Mathematics, Autoregressive conditional heteroskedasticity, CPU time, Inference, Interval (mathematics), Regression, Computer Science Applications, Tree (data structure), Autoregressive integrated moving average, Algorithm
- Abstract
In forecasting, we often require interval forecasts instead of just a specific point forecast. To track streaming data effectively, this interval forecast should reliably cover the observed data and yet be as narrow as possible. To achieve this, we propose two methods based on regression trees: one ensemble method and one method based on a single tree. For the ensemble method, we use weighted results from the most recent models, and for the single-tree method, we retain one model until it becomes necessary to train a new one. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture variation over long time scales, and conditional inference trees as the underlying regression tree model. Results show that both methods perform well, having good coverage without the intervals being excessively wide. When the underlying data-generation mechanism changes, their performance is initially affected but recovers relatively quickly as time proceeds. The method based on a single tree requires the least computational (CPU) time compared to the ensemble method. When compared to ARIMA and GARCH modelling, our methods achieve better or similar coverage and width but require considerably less CPU time.
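A minimal sketch of the adaptive width update described in the abstract, assuming a Gaussian-style multiplier z and a fixed batch window (both hypothetical parameters); the tree-based point forecaster itself is omitted.

```python
import numpy as np
from collections import deque

class AdaptiveInterval:
    """Interval forecast: point forecast +/- z * RMSE, where the RMSE is
    recomputed from only the most recent batch of prediction errors."""
    def __init__(self, window=50, z=1.96):
        self.errors = deque(maxlen=window)   # latest batch of errors
        self.z = z

    def update(self, y_true, y_pred):
        self.errors.append(y_true - y_pred)

    def interval(self, y_pred):
        if not self.errors:
            return y_pred, y_pred
        rmse = float(np.sqrt(np.mean(np.square(self.errors))))
        return y_pred - self.z * rmse, y_pred + self.z * rmse
```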
- Published
- 2019
- Full Text
- View/download PDF
7. Fluid shear stress stimulates breast cancer cells to display invasive and chemoresistant phenotypes while upregulating PLAU in a 3D bioreactor
- Author
- Caymen Novak, Catherine Z. Liu, Eric N. Horst, Charles C. Taylor, and Geeta Mehta
- Subjects
Breast Neoplasms, Bioengineering, Applied Microbiology and Biotechnology, Article, Metastasis, Extracellular matrix, Bioreactors, Breast cancer, Downregulation and upregulation, Medicine, Shear stress, Humans, Neoplasm Invasiveness, Mechanotransduction, Tumor microenvironment, Chemistry, Membrane Proteins, Neoplasm Proteins, Up-Regulation, Gene Expression Regulation, Neoplastic, Drug Resistance, Neoplasm, Cancer cell, MCF-7 Cells, Cancer research, Female, Stress, Mechanical, Shear Strength, Biotechnology
- Abstract
Breast cancer cells experience a range of shear stresses in the tumor microenvironment (TME). However, most current in vitro three-dimensional (3D) models fail to systematically probe the effects of these biophysical stimuli on cancer cell metastasis, proliferation and chemoresistance. To investigate the roles of shear stress within the mammary and lung pleural effusion TME, a bioreactor capable of applying shear stress to cells within a 3D extracellular matrix was designed and characterized. Breast cancer cells were encapsulated within an interpenetrating network (IPN) hydrogel and subjected to a shear stress of 5.4 dynes cm⁻² for 72 hours. Finite element modeling assessed shear stress profiles within the bioreactor. Cells exposed to shear stress had significantly higher cellular area and significantly lower circularity, indicating a motile phenotype. Stimulated cells were more proliferative than static controls and showed higher rates of chemoresistance to the anti-neoplastic drug paclitaxel. Fluid shear stress induced significant upregulation of the PLAU gene, and elevated urokinase activity was confirmed through zymography and an activity assay. Overall, these results indicate that pulsatile shear stress promotes breast cancer cell proliferation, invasive potential, chemoresistance, and PLAU signaling.
- Published
- 2019
- Full Text
- View/download PDF
8. Kernel Circular Deconvolution Density Estimation
- Author
- Marco Di Marzio, Stefania Fensore, Charles C. Taylor, and Agnese Panzera
- Subjects
Observational error, Kernel (statistics), Euclidean geometry, Estimator, Applied mathematics, Deconvolution, Density estimation, Data application, Mathematics
- Abstract
We consider the problem of nonparametrically estimating a circular density from data contaminated by angular measurement errors. Specifically, we obtain a kernel-type estimator with weight functions that are reminiscent of deconvolution kernels. Here, differently from the Euclidean setting, discrete Fourier coefficients are involved rather than characteristic functions. We provide some simulation results along with a real data application.
- Published
- 2020
- Full Text
- View/download PDF
9. A New Approach to Measuring Distances in Dense Graphs
- Author
- Charles C. Taylor, Peter A. Thwaites, and Fatimah A. Almulhim
- Subjects
Discrete mathematics, Computer science, k-means clustering, Graph theory, Graph, Hierarchical clustering, Vertex (geometry), Search algorithm, Adjacency matrix, Cluster analysis
- Abstract
The problem of computing distances and shortest paths between vertices in graphs is one of the fundamental issues in graph theory. It is of great importance in many different applications, for example transportation and social network analysis. However, efficient shortest-distance algorithms are still desired in many disciplines. Fundamentally, the majority of dense graphs have ties between the shortest distances. Therefore, we consider a different approach and introduce a new measure to solve the all-pairs shortest path problem for undirected and unweighted graphs. This measures the shortest distance between any two vertices by considering both the length and the number of all possible paths between them. The main aim of this new approach is to break the ties between equal shortest-path (SP) distances, which can be obtained by the breadth-first search (BFS) algorithm, and to distinguish meaningfully between these equal distances. Moreover, using the new measure in clustering produces higher-quality results compared with SP. In our study, we apply two different clustering techniques, hierarchical clustering and k-means clustering, with four different graph models and various numbers of clusters. We compare the results using a modularity function to check the quality of our clustering results.
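The abstract does not give the measure's exact definition; the sketch below is a hypothetical instantiation in which the integer BFS distance is shrunk slightly when many shortest paths exist, so that tied SP values become distinguishable. The function names and the eps tie-breaking form are assumptions.

```python
from collections import deque

def bfs_dist_counts(adj, s):
    """Standard path-counting BFS: shortest distance from s to every
    reachable vertex, plus the number of distinct shortest paths."""
    dist, count, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                 # first time v is reached
                dist[v], count[v] = dist[u] + 1, count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # another shortest path into v
                count[v] += count[u]
    return dist, count

def refined_distance(adj, u, v, eps=0.5):
    """Tie-broken distance: more shortest paths give a slightly smaller value."""
    dist, count = bfs_dist_counts(adj, u)     # assumes v is reachable from u
    return dist[v] - eps * (1.0 - 1.0 / count[v])
```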
- Published
- 2019
- Full Text
- View/download PDF
10. Kernel density classification for spherical data
- Author
- Agnese Panzera, Charles C. Taylor, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Field (physics), Global climate, Kernel density estimation, Nonparametric statistics, Applied mathematics, Decision rule, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Classifying observations coming from two different spherical populations by using a nonparametric method appears to be an unexplored field, although clearly worth pursuing. We propose some decision rules based on spherical kernel density estimation and we provide asymptotic L2 properties. A real-data application using global climate data is finally discussed.
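A minimal sketch of this kind of decision rule, assuming unit vectors on S² and a von Mises–Fisher kernel with a user-chosen concentration kappa; the kernel choice and all names here are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def vmf_kde(x, data, kappa):
    """Kernel density estimate on the unit sphere S^2 using a
    von Mises-Fisher kernel; data has one unit vector per row."""
    c = kappa / (4.0 * np.pi * np.sinh(kappa))   # vMF normalising constant
    return float(np.mean(c * np.exp(kappa * data @ x)))

def classify(x, sample0, sample1, kappa, prior0=0.5):
    # Pick the class with the larger prior-weighted density estimate at x.
    f0 = prior0 * vmf_kde(x, sample0, kappa)
    f1 = (1.0 - prior0) * vmf_kde(x, sample1, kappa)
    return 0 if f0 >= f1 else 1
```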
- Published
- 2019
11. Geometry-based distance for clustering amino acids
- Author
- Arief Gusnanto, Charles C. Taylor, and Samira F. Abushilah
- Subjects
Statistics and Probability, Squared Euclidean distance, Pattern recognition, Articles, Amino acid, Hierarchical clustering, Chemistry, Artificial intelligence, Statistics, Probability and Uncertainty, Cluster analysis, Mathematics
- Abstract
Clustering amino acids is one of the most challenging problems in the functional and structural prediction of proteins. Previous studies have proposed clusters based on measurements of physical and biochemical characteristics of the amino acids such as volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. These characteristics, although important, are less directly related to protein structure than geometrical characteristics such as the dihedral angles between amino acids. We propose using the p-value from a test of equality of dihedral-angle distributions as the basis of a distance measure for the clustering. In this novel approach, an energy test is modified to deal with bivariate angular data, and the p-value is obtained via a permutation method. The results indicate that the clusters of amino acids have a sensible interpretation, where Glycine, Proline, and Asparagine each form a distinct cluster. A simulation study suggests that this approach has good working characteristics for clustering amino acids.
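A rough sketch of the p-value computation, assuming samples of bivariate angles in radians; the wrapped componentwise metric and the permutation scheme below are illustrative guesses at the paper's modified energy test (a clustering dissimilarity could then be taken as, say, 1 − p).

```python
import numpy as np

def ang_dist(a, b):
    """Pairwise distances between bivariate angles, wrapping each
    coordinate around the circle before taking the Euclidean norm."""
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, 2.0 * np.pi - d)
    return np.linalg.norm(d, axis=-1)

def energy_stat(x, y):
    return (2.0 * ang_dist(x, y).mean()
            - ang_dist(x, x).mean() - ang_dist(y, y).mean())

def perm_pvalue(x, y, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    obs, z, nx = energy_stat(x, y), np.vstack([x, y]), len(x)
    hits = sum(energy_stat(z[p[:nx]], z[p[nx:]]) >= obs
               for p in (rng.permutation(len(z)) for _ in range(n_perm)))
    return (hits + 1) / (n_perm + 1)
```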
- Published
- 2019
- Full Text
- View/download PDF
12. Local binary regression with spherical predictors
- Author
- Agnese Panzera, Marco Di Marzio, Charles C. Taylor, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial regression, Kernel density estimation, Local regression, Binary number, Estimator, Applied mathematics, Binary regression, Statistics, Probability and Uncertainty, Mathematics
- Abstract
We discuss local regression estimators when the predictor lies on the d-dimensional sphere and the response is binary. Unlike Di Marzio et al. (2018b), who introduced spherical kernel density classification, we build on the theory of local polynomial regression and local likelihood. Simulations and a real-data application illustrate the effectiveness of the proposals.
- Published
- 2019
13. Cross-validation is safe to use
- Author
- Oghenejokpeme I. Orhobor, Ross D. King, and Charles C. Taylor
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Medicine, Computer Vision and Pattern Recognition, Software, Cross-validation, Reliability engineering
- Published
- 2021
- Full Text
- View/download PDF
14. Classification of form under heterogeneity and non-isotropic errors
- Author
- Arief Gusnanto, Farag Shuweihdi, and Charles C. Taylor
- Subjects
Statistics and Probability, Computation, Diagonal, Estimator, Pattern recognition, Euclidean distance matrix, Form classification, Weighting, Data mining, Artificial intelligence, Statistics, Probability and Uncertainty, Classifier (UML), Shape analysis (digital geometry), Mathematics
- Abstract
A number of areas related to learning under supervision have not been fully investigated, particularly the possibility of incorporating classification methods into shape analysis. In this regard, practical ideas conducive to the improvement of form classification are the focus of interest. Our proposal is to employ a hybrid classifier built on Euclidean Distance Matrix Analysis (EDMA) and Procrustes distance, rather than generalised Procrustes analysis (GPA). In empirical terms, it has been demonstrated that there is a notable difference between the estimated form and the true form when EDMA is used as the basis for computation. However, this does not seem to be the case when GPA is employed. Under the assumption that no association exists between landmarks, EDMA and GPA are used to calculate the mean form and a diagonal weighting matrix to build superimposing classifiers. As our findings indicate, the superimposing classifiers we propose work extremely well with EDMA estimators, as opposed to GPA, on both simulated and real datasets.
- Published
- 2016
- Full Text
- View/download PDF
15. Nonparametric circular quantile regression
- Author
- Charles C. Taylor, Marco Di Marzio, and Agnese Panzera
- Subjects
Statistics and Probability, Circular distribution, Applied Mathematics, Nonparametric statistics, Estimator, Inversion (meteorology), Conditional probability distribution, Quantile regression, Circular conditional distribution function, circular conditional quantiles, circular kernels, optimal smoothing degree, wind directions, Statistics, Applied mathematics, Minification, Statistics, Probability and Uncertainty, Mathematics, Quantile
- Abstract
We discuss nonparametric estimation of conditional quantiles of a circular distribution when the conditioning variable is either linear or circular. Two different approaches are pursued: inversion of a conditional distribution function estimator, and minimization of a smoothed check function. Local constant and local linear versions of both estimators are discussed. Simulation experiments and a real data case study are used to illustrate the usefulness of the methods.
- Published
- 2016
- Full Text
- View/download PDF
16. A note on nonparametric estimation of circular conditional densities
- Author
- M. Di Marzio, Charles C. Taylor, Agnese Panzera, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial, Applied Mathematics, Nonparametric statistics, Estimator, Conditional probability distribution, Conditional expectation, Quantile regression, Modeling and Simulation, Statistics, Applied mathematics, Statistics, Probability and Uncertainty, Conditional variance, Quantile, Mathematics
- Abstract
The conditional density offers the most informative summary of the relationship between explanatory and response variables. We need to estimate it in place of the simple conditional mean when its shape is not well-behaved. A motivation for estimating conditional densities, specific to the circular setting, lies in the fact that a natural alternative of it, like quantile regression, could be considered problematic because circular quantiles are not rotationally equivariant. We treat conditional density estimation as a local polynomial fitting problem as proposed by Fan et al. [Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika. 1996;83:189–206] in the Euclidean setting, and discuss a class of estimators in the cases when the conditioning variable is either circular or linear. Asymptotic properties for some members of the proposed class are derived. The effectiveness of the methods for finite sample sizes is illustrated by simulation experiments a...
- Published
- 2016
- Full Text
- View/download PDF
17. Classification tree methods for panel data using wavelet-transformed time series
- Author
- Charles C. Taylor, Xin Zhao, Zoka Milan, and Stuart Barber
- Subjects
Statistics and Probability, Interpretation (logic), Series (mathematics), Computer science, Applied Mathematics, Decision tree learning, Pattern recognition, Data type, Computational Mathematics, Wavelet, Computational Theory and Mathematics, Artificial intelligence, Representation (mathematics), Scale (map), Panel data
- Abstract
Wavelet-transformed variables can have better classification performance for panel data than variables on their original scale. Examples are provided showing the types of data where using a wavelet-based representation is likely to improve classification accuracy. Results show that in most cases wavelet-transformed data have better or similar classification accuracy to the original data, and that only genuinely useful explanatory variables are selected. Use of wavelet-transformed data provides localized mean and difference variables which can be more effective than the original variables, provides a means of separating "signal" from "noise", and brings the opportunity for improved interpretation via consideration of which resolution scales are the most informative. Panel data with multiple observations on each individual require some form of aggregation to classify at the individual level. Three different aggregation schemes are presented and compared using simulated data and real data gathered during liver transplantation. Methods based on aggregating individual-level data before classification outperform methods which rely solely on combining time-point classifications.
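A minimal sketch of the feature construction, assuming series lengths that are powers of two and a plain Haar transform; the helper names are assumptions, and the paper's aggregation schemes are not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def haar_features(x):
    """Full Haar transform of a 1-D series (length a power of two):
    localized differences at every resolution plus the overall smooth."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        pairs = x.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))  # detail
        x = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)              # smooth
    details.append(x)
    return np.concatenate(details)

# X: (n_series, 2**k) array of time series, y: class labels
# X_wavelet = np.apply_along_axis(haar_features, 1, X)
# clf = DecisionTreeClassifier().fit(X_wavelet, y)
```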
- Published
- 2018
18. Statistical Estimate of Radon Concentration from Passive and Active Detectors in Doha
- Author
- Rifaat Hassona, Adil Yousef, Kassim Mwitondi, Ibrahim Al Sadig, and Charles C. Taylor
- Subjects
Radon detection, Spatio-temporal analyses, Information Systems and Management, Meteorology, Radon, Unsupervised modelling, Clustering, Local regression, Data modeling, Visualisation, Cluster analysis, Potential impact, Data collection, Estimation, Detector, Computer Science Applications, Chemistry, Work (electrical), Environmental science, Estimation methods, Information Systems
- Abstract
Harnessing knowledge on the physical and natural conditions that affect our health, general livelihood and sustainability has long been at the core of scientific research. Health risks of ionising radiation from exposure to radon and radon decay products in homes, work and other public places entail developing novel approaches to modelling occurrence of the gas and its decaying products, in order to cope with the physical and natural dynamics in human habitats. Various data modelling approaches and techniques have been developed and applied to identify potential relationships among individual local meteorological parameters with a potential impact on radon concentrations, i.e., temperature, barometric pressure and relative humidity. In this first research work on radon concentrations in the State of Qatar, we present a combination of exploratory, visualisation and algorithmic estimation methods to try and understand the radon variations in and around the city of Doha. Data were obtained from the Central Radiation Laboratories (CRL) in Doha, gathered from 36 passive radon detectors deployed in various schools, residential and work places in and around Doha, as well as from one active radon detector located at the CRL. Our key findings show high variations mainly attributable to technical variations in data gathering, as the equipment and devices appear to heavily influence the levels of radon detected. A parameter maximisation method applied to simulate data with similar behaviour to the data from the passive detectors in four of the neighbourhoods appears appropriate for estimating parameters in cases of data limitation. Data from the active detector exhibit interesting seasonal variations, with data clustering exhibiting two clearly separable groups, and passive and active detectors exhibiting a huge disagreement in readings. These patterns highlight challenges related to detection methods, in particular ensuring that deployed detectors and calculations of radon concentrations are adapted to local conditions. The study doesn't dwell much on building materials and makes rather fundamental assumptions, including an equal exhalation rate of radon from the soil across neighbourhoods, based on Doha's homogeneous underlying geological formation. The study also highlights potential extensions into the broader category of pollutants such as hydrocarbons, air particulate carbon monoxide and nitrogen dioxide at specific time periods of the year, and particularly how they may tie in with global health institutions' requirements.
- Published
- 2018
19. Statistical analysis of particulate matter data in Doha, Qatar
- Author
- Charles C. Taylor, Kassim Mwitondi, and Adil Yousif
- Subjects
Pollution, Data collection, Meteorology, Outlier, Analyser, Environmental science, Sampling (statistics), Sample (statistics), Missing data, Wind speed
- Abstract
Pollution in Doha is measured using passive, active and automatic sampling. In this paper we consider data automatically sampled in which various pollutants were continually collected and analysed every hour. At each station the sample is analysed on-line and in real time and the data is stored within the analyser, or a separate logger so it can be downloaded remotely by a modem. The accuracy produced enables pollution episodes to be analysed in detail and related to traffic flows, meteorology and other variables. Data has been collected hourly over more than 6 years at 3 different locations, with measurements available for various pollutants – for example, ozone, nitrogen oxides, sulphur dioxide, carbon monoxide, THC, methane and particulate matter (PM1.0, PM2.5 and PM10), as well as meteorological data such as humidity, temperature, and wind speed and direction. Despite much care in the data collection process, the resultant data has long stretches of missing values, when the equipment has malfunctioned – often as a result of more extreme conditions. Our analysis is twofold. Firstly, we consider ways to “clean” the data, by imputing missing values, including identified outliers. The second aspect specifically considers prediction of each particulate (PM1.0, PM2.5 and PM10) 24 hours ahead, using current (and previous) pollution and meteorological data. In this case, we use vector autoregressive models, compare with decision trees and propose variable selection criteria which explicitly adapt to missing data. Our results show that the regression tree models, with no variable transformations, perform the best, and that attempts to impute missing values are hampered by non-random missingness.
- Published
- 2018
20. Circular local likelihood
- Author
- Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Stefania Fensore
- Subjects
Statistics and Probability, Polynomial, Bessel functions, Circular data, Density estimation, Log-likelihood, von Mises density, Logarithm, Basis (linear algebra), Kernel density estimation, Estimator, Function (mathematics), Applied mathematics, Statistics, Probability and Uncertainty, Special case, Mathematics
- Abstract
We introduce a class of local likelihood circular density estimators, which includes the kernel density estimator as a special case. The idea lies in optimizing a spatially weighted version of the log-likelihood function, where the logarithm of the density is locally approximated by a periodic polynomial. The use of von Mises density functions as weights reduces the computational burden. Also, we propose closed-form estimators which could form the basis of counterparts in the multidimensional Euclidean setting. Simulation results and a real data case study are used to evaluate the performance and illustrate the results.
- Published
- 2018
21. Nonparametric Rotations for Sphere-Sphere Regression
- Author
- Marco Di Marzio, Charles C. Taylor, and Agnese Panzera
- Subjects
Statistics and Probability, Wahba's problem, Nonparametric statistics, Hypersphere, Regression, Bias Reduction, Fisher's Method of Scoring, Local Smoothing, Non-Rigid Rotation Estimation, Singular Value Decomposition, Skew-symmetric Matrices, Spherical Kernels, Simple (abstract algebra), Applied mathematics, Statistics, Probability and Uncertainty, Rotation (mathematics), Parametric statistics, Mathematics
- Abstract
Regression of data represented as points on a hypersphere has traditionally been treated using parametric families of transformations that include the simple rigid rotation as an important, special case. On the other hand, nonparametric methods have generally focused on modeling a scalar response through a spherical predictor by representing the regression function as a polynomial, leading to component-wise estimation of a spherical response. We propose a very flexible, simple regression model where for each location of the manifold a specific rotation matrix is to be estimated. To make this approach tractable, we assume continuity of the regression function that, in turn, allows for approximations of rotation matrices based on a series expansion. It is seen that the nonrigidity of our technique motivates an iterative estimation within a Newton–Raphson learning scheme, which exhibits bias reduction properties. Extensions to general shape matching are also outlined. Both simulations and real data are used to illustrate the results. Supplementary materials for this article are available online.
- Published
- 2018
- Full Text
- View/download PDF
22. Nonparametric estimating equations for circular probability density functions and their derivatives
- Author
- Agnese Panzera, Charles C. Taylor, Stefania Fensore, and Marco Di Marzio
- Subjects
Statistics and Probability, Mathematical optimization, Population, Fourier coefficients, Probability density function, Estimating equations, Trigonometric moments, Circular kernels, Density estimation, Jackknife, Sin-polynomials, von Mises density, Applied mathematics, Mathematics, Nonparametric statistics, Estimator, Probability and statistics, Delta method, Statistics, Probability and Uncertainty
- Abstract
We propose estimating equations whose unknown parameters are the values taken by a circular density and its derivatives at a point. Specifically, we solve equations which relate local versions of population trigonometric moments with their sample counterparts. Major advantages of our approach are: higher order bias without asymptotic variance inflation, closed form for the estimators, and absence of numerical tasks. We also investigate situations where the observed data are dependent. Theoretical results along with simulation experiments are provided.
- Published
- 2017
- Full Text
- View/download PDF
23. Estimating optimal window size for analysis of low-coverage next-generation sequence data
- Author
- Ibrahim Nafisah, Charles C. Taylor, Henry M. Wood, Stefano Berri, Arief Gusnanto, and Pamela Rabbitts
- Subjects
Statistics and Probability, Lung Neoplasms, Computer science, Context (language use), Biochemistry, Humans, Molecular Biology, Likelihood Functions, Sequence, Genome, Human, High-Throughput Nucleotide Sequencing, Window (computing), Contrast (statistics), Genomics, Sequence Analysis, DNA, Function (mathematics), Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Step function, Data mining, Akaike information criterion, Algorithm, Next generation sequence
- Abstract
Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing (
Results: We assume the reads density to be a step function. Given this model, we propose a data-based estimation of optimal window size based on Akaike's information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curves as functions of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are of general purpose and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets.
Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/∼arief/R/win/
Contact: a.gusnanto@leeds.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
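A crude stand-in for the idea, not the paper's exact step-function likelihood: bin mapped read positions into windows of width w, score each candidate w by a Poisson log-likelihood with one rate per window, and pick the w minimizing AIC. All names and the per-window-rate simplification are assumptions.

```python
import numpy as np
from scipy.stats import poisson

def aic_for_window(read_pos, genome_len, w):
    """AIC when read counts in windows of width w are modelled as
    Poisson, with each window's rate set to its observed count (MLE)."""
    edges = np.arange(0.0, genome_len + w, w)
    counts, _ = np.histogram(read_pos, bins=edges)
    rates = np.clip(counts, 1e-9, None)          # avoid log(0) in empty bins
    loglik = poisson.logpmf(counts, rates).sum()
    return -2.0 * loglik + 2.0 * len(counts)     # penalty: one rate per window

# Scan candidate widths and keep the AIC minimiser:
# best_w = min(candidate_widths, key=lambda w: aic_for_window(pos, L, w))
```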
- Published
- 2014
- Full Text
- View/download PDF
24. Validating protein structure using kernel density estimates
- Author
- Charles C. Taylor, Agnese Panzera, Marco Di Marzio, and Kanti V. Mardia
- Subjects
Statistics and Probability, Mathematical optimization, Kernel density estimation, Conditional probability distribution, Density estimation, Multivariate kernel density estimation, Kernel embedding of distributions, Variable kernel density estimation, Test set, Kernel (statistics), Statistics, Probability and Uncertainty, Algorithm, Mathematics
- Abstract
Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then, by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.
- Published
- 2012
- Full Text
- View/download PDF
25. Kernel density estimation on the torus
- Author
- Marco Di Marzio, Agnese Panzera, and Charles C. Taylor
- Subjects
Statistics and Probability, Applied Mathematics, Kernel density estimation, Torus, Density estimation, Multivariate kernel density estimation, Kernel method, Variable kernel density estimation, Calculus, Partial derivative, Applied mathematics, Statistics, Probability and Uncertainty, Smoothing, Mathematics
- Abstract
Kernel density estimation for multivariate, circular data has been formulated only when the sample space is the sphere, but theory for the torus would also be useful. For data lying on a d-dimensional torus (d ≥ 1), we discuss kernel estimation of a density, its mixed partial derivatives, and their squared functionals. We introduce a specific class of product kernels whose order is suitably defined in such a way as to obtain L2-risk formulas whose structure can be compared to their Euclidean counterparts. Our kernels are based on circular densities; however, we also discuss smaller-bias estimation involving negative kernels which are functions of circular densities. Practical rules for selecting the smoothing degree, based on cross-validation, bootstrap and plug-in ideas, are derived. Moreover, we provide specific results on the use of kernels based on the von Mises density. Finally, real-data examples and simulation studies illustrate the findings.
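A minimal sketch of a product von Mises kernel estimator on the d-torus; the function name and the single shared concentration are assumptions, and bias-reduced variants and bandwidth selection are omitted.

```python
import numpy as np
from scipy.special import i0

def torus_kde(theta, data, kappa):
    """Product von Mises kernel density estimate on the d-torus.
    theta: (d,) evaluation point; data: (n, d) angles in radians."""
    log_kern = kappa * np.cos(data - theta) - np.log(2.0 * np.pi * i0(kappa))
    return float(np.exp(log_kern.sum(axis=1)).mean())   # product over coords
```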
- Published
- 2011
- Full Text
- View/download PDF
26. Local polynomial regression for circular predictors
- Author
- Marco Di Marzio, Agnese Panzera, and Charles C. Taylor
- Subjects
Statistics and Probability, Polynomial regression, Polynomial, Probability theory, Calculus, Applied mathematics, Torus, Statistics, Probability and Uncertainty, Design space, Smoothing, Mathematics, Variable (mathematics)
- Abstract
We consider local smoothing of datasets where the design space is the d-dimensional (d ≥ 1) torus and the response variable is real-valued. Our purpose is to extend least squares local polynomial fitting to this situation. We give both theoretical and empirical results.
- Published
- 2009
- Full Text
- View/download PDF
27. Using small bias nonparametric density estimators for confidence interval estimation
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Bootstrapping (electronics), Kernel (statistics), Statistics, Econometrics, Nonparametric statistics, Estimator, Statistics, Probability and Uncertainty, U-statistic, Confidence interval, CDF-based nonparametric confidence interval, Multivariate kernel density estimation, Mathematics
- Abstract
Confidence intervals for densities built on the basis of standard nonparametric theory are doomed to have poor coverage rates due to bias. Studies on coverage improvement exist, but reasonably behaved interval estimators are needed. We explore the use of small bias kernel-based methods to construct confidence intervals, in particular using a geometric density estimator that seems better suited for this purpose.
- Published
- 2009
- Full Text
- View/download PDF
28. Maximum likelihood estimation using composite likelihoods for closed exponential families
- Author
- Kanti V. Mardia, Charles C. Taylor, Gareth Hughes, and John T. Kent
- Subjects
Statistics and Probability, Pseudolikelihood, Restricted maximum likelihood, Applied Mathematics, General Mathematics, Normalizing constant, Bivariate von Mises distribution, Maximum likelihood sequence estimation, Agricultural and Biological Sciences (miscellaneous), Exponential family, Expectation–maximization algorithm, Statistics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Likelihood function, Mathematics
- Abstract
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
- Published
- 2009
- Full Text
- View/download PDF
29. On boosting kernel regression
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Analysis of covariance, Boosting (machine learning), Iterative method, Applied Mathematics, Estimator, Cross-validation, Kernel method, Statistics, Kernel regression, Applied mathematics, Statistics, Probability and Uncertainty, Smoothing, Mathematics
- Abstract
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya–Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast, and the variance diverges exponentially slowly. The first boosting step is analyzed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
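A minimal sketch of the estimator, assuming a Gaussian kernel and a fixed number of boosting steps; the names and defaults are assumptions.

```python
import numpy as np

def nw_smooth(x_eval, x, y, h):
    """Nadaraya-Watson regression estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def l2_boost_nw(x, y, h, n_steps=10):
    """L2 boosting: repeatedly smooth the current residuals with the
    Nadaraya-Watson weak learner and accumulate the fits."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(n_steps):
        fit += nw_smooth(x, x, y - fit, h)
    return fit
```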
- Published
- 2008
- Full Text
- View/download PDF
30. A multivariate von Mises distribution with applications to bioinformatics
- Author
- Kanti V. Mardia, Gareth Hughes, Harshinder Singh, and Charles C. Taylor
- Subjects
Statistics and Probability, Multivariate statistics, Univariate, Multivariate normal distribution, Bivariate analysis, Conditional probability distribution, Wald test, Statistics, von Mises distribution, Applied mathematics, Statistics, Probability and Uncertainty, Marginal distribution, Mathematics
- Abstract
Motivated by problems of modelling torsional angles in molecules, Singh, Hnizdo & Demchuk (2002) proposed a bivariate circular model which is a natural torus analogue of the bivariate normal distribution and a natural extension of the univariate von Mises distribution to the bivariate case. The authors present here a multivariate extension of the bivariate model of Singh, Hnizdo & Demchuk (2002). They study the conditional distributions and investigate the shapes of marginal distributions for a special case. The methods of moments and pseudo-likelihood are considered for the estimation of parameters of the new distribution. The authors investigate the efficiency of the pseudo-likelihood approach in three dimensions. They illustrate their methods with protein data of conformational angles.
- Published
- 2008
- Full Text
- View/download PDF
31. Automatic bandwidth selection for circular density estimation
- Author
- Charles C. Taylor
- Subjects
Statistics and Probability, Alternative methods, Applied Mathematics, Bandwidth (signal processing), Concentration parameter, Estimator, Density estimation, Bivariate analysis, Computational Mathematics, Computational Theory and Mathematics, Euclidean geometry, Statistics, von Mises distribution, Applied mathematics, Mathematics
- Abstract
Given angular data θ1, …, θn ∈ [0, 2π), a common objective is to estimate the density. In case a kernel estimator is used, bandwidth selection is crucial to the performance. A "plug-in rule" for the bandwidth, which is based on the concentration of a reference density, namely the von Mises distribution, is obtained. It is seen that this is equivalent to the usual Euclidean plug-in rule in the case where the concentration becomes large. In case the concentration parameter is unknown, alternative methods are explored which are intended to be robust to departures from the reference density. Simulations indicate that "wrapped estimators" can perform well in this context. The methods are applied to a real bivariate dataset concerning protein structure.
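Rather than risk misquoting the paper's plug-in formula, the sketch below shows a generic likelihood cross-validation alternative for choosing the von Mises kernel concentration; the grid, names and defaults are all assumptions.

```python
import numpy as np
from scipy.special import i0

def vm_kde(theta, data, nu):
    """Circular kernel density estimate with a von Mises kernel."""
    return float(np.mean(np.exp(nu * np.cos(theta - data))
                         / (2.0 * np.pi * i0(nu))))

def lcv_concentration(data, grid=None):
    """Pick the kernel concentration maximizing the leave-one-out
    cross-validated log-likelihood."""
    grid = np.linspace(1.0, 100.0, 100) if grid is None else grid
    def loo_loglik(nu):
        return sum(np.log(vm_kde(t, np.delete(data, i), nu))
                   for i, t in enumerate(data))
    return max(grid, key=loo_loglik)
```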
- Published
- 2008
- Full Text
- View/download PDF
32. The Poisson Index: a new probabilistic model for protein–ligand binding site similarity
- Author
- J.R. Davies, Richard M. Jackson, Charles C. Taylor, and Kanti V. Mardia
- Subjects
Statistics and Probability, Matching (graph theory), Structural similarity, Molecular Sequence Data, Ligands, Poisson distribution, Biochemistry, Measure (mathematics), Similarity (network science), Sequence Analysis, Protein, Protein Interaction Mapping, Statistics, Computer Simulation, Amino Acid Sequence, Molecular Biology, Mathematics, Binding Sites, Models, Statistical, Sequence Homology, Amino Acid, Proteins, Contrast (statistics), Pattern recognition, Statistical model, Similitude, Computer Science Applications, Computational Mathematics, Models, Chemical, Computational Theory and Mathematics, Artificial intelligence, Algorithms, Protein Binding
- Abstract
Motivation: The large-scale comparison of protein–ligand binding sites is problematic, in that measures of structural similarity are difficult to quantify and are not easily understood in terms of statistical similarity that can ultimately be related to structure and function. We present a binding-site matching score, the Poisson Index (PI), based upon a well-defined statistical model. PI requires only the number of matching atoms between two sites and the size of the two sites, the same information used by the Tanimoto Index (TI), a comparable and widely used measure for molecular similarity. We apply PI and TI to a previously automatically extracted set of binding sites to determine the robustness and usefulness of both scores.
Results: We found that PI outperforms TI; moreover, site similarity is poorly defined for TI at values around the 99.5% confidence level, for which PI is well defined. A difference map at this confidence level shows that PI gives much more meaningful information than TI. We show individual examples where TI fails to distinguish either a false or a true site pairing, in contrast to PI, which performs much better. TI cannot handle large or small sites very well, or the comparison of large and small sites, in contrast to PI, which is shown to be much more robust. Despite the difficulty of determining a biological 'ground truth' for binding site similarity, we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding-site classification scheme comparable to existing protein domain classification schemes.
Availability: PI is implemented in SitesBase (www.modelling.leeds.ac.uk/sb/)
Contact: r.m.jackson@leeds.ac.uk
- Published
- 2007
- Full Text
- View/download PDF
33. Classification of type I-censored bivariate data
- Author
- Matthew J. Langdon, Robert West, and Charles C. Taylor
- Subjects
Statistics and Probability, Applied Mathematics, Pattern recognition, Bivariate analysis, Bayes classifier, Censoring (statistics), Computational Mathematics, Bayes' theorem, Computational Theory and Mathematics, Bivariate data, Decision boundary, Artificial intelligence, Random variable, Classifier (UML), Mathematics
- Abstract
Type I, or limits-of-detection, censoring occurs when a random variable is only observable between fixed and known limits. We consider the classification problem in which the feature vectors used for classification are bivariate type I-censored observations. A Bayes-optimal classifier is constructed under the assumption that the underlying distribution is Gaussian, and it is shown that the decision boundary between classes is not continuous, as it is in the uncensored case. Examples of the decision boundary are presented, and simulation studies are used to illustrate the methods described. The resultant classifier is applied to simulated electrical impedance tomography data and a medical data set as illustrations.
- Published
- 2007
- Full Text
- View/download PDF
34. Hierarchical Bayesian modelling of spatial age-dependent mortality
- Author
- Ian L. Dryden, N. Miklós Arató, and Charles C. Taylor
- Subjects
Statistics and Probability, Markov chain, Applied Mathematics, Posterior probability, Markov chain Monte Carlo, Conditional probability distribution, Markov model, Binomial distribution, Computational Mathematics, Metropolis–Hastings algorithm, Computational Theory and Mathematics, Prior probability, Statistics, Econometrics, Mathematics
- Abstract
Hierarchical Bayesian modelling is considered for the number of age-dependent deaths in different geographic regions. The model uses a conditional binomial distribution for the number of age-dependent deaths, a new family of zero mean Gaussian Markov random field models for incorporating spatial correlations between neighbouring regions, and an intrinsic Gaussian model for including correlations between age-dependent mortality rates. Age-dependent mortality rates are estimated for each region, and approximate credibility intervals based on summaries of samples from the posterior distribution are obtained from Markov chain Monte Carlo simulation. The consequent maps of mortality rates are less variable and smoother than those which would be obtained from naive estimates, and various inferences may be drawn from the results. The prior spatial model includes some of the common conditional autoregressive spatial models used in epidemiology, and so model uncertainty in this family can be accounted for. The methodology is illustrated with an actuarial data set of age-dependent deaths in 150 geographic regions of Hungary. Sensitivity to the prior distributions is discussed, as well as relative risks for certain covariates (males in towns, females in towns, males in villages, females in villages).
- Published
- 2006
- Full Text
- View/download PDF
35. Kernel density classification and boosting: an L2 analysis
- Author
- M. Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability, Kernel density estimation, Pattern recognition, Multivariate kernel density estimation, Theoretical Computer Science, Kernel method, Computational Theory and Mathematics, Variable kernel density estimation, Kernel embedding of distributions, Polynomial kernel, Radial basis function kernel, Kernel regression, Artificial intelligence, Statistics, Probability and Uncertainty, Mathematics
- Abstract
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is "boosting", and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
- Published
- 2005
- Full Text
- View/download PDF
36. Boosted Regression Estimates of Spatial Data: Pointwise Inference
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Pointwise, Statistics and Probability, Boosting (machine learning), General Mathematics, Statistics, Econometrics, Nonparametric statistics, Estimator, Inference, Spatial analysis, Cross-validation, Regression, Mathematics
- Abstract
In this study simple nonparametric techniques have been adopted to estimate the trend surface of the Swiss rainfall data. In particular, we employed the Nadaraya–Watson smoother and, in addition, a version of it adapted by boosting. Additionally, we have explored the use of the Nadaraya–Watson estimator for the construction of pointwise confidence intervals. Overall, boosting does seem to improve the estimate, as in previous examples, and the results indicate that cross-validation can be successfully used for parameter selection on real datasets. In addition, our estimators compare favorably with most of the techniques previously used on this dataset.
- Published
- 2005
- Full Text
- View/download PDF
37. Non-Stationary Spatiotemporal Analysis of Karst Water Levels
- Author
- J. Kovács, Charles C. Taylor, Ian L. Dryden, and L. Márkus
- Subjects
Statistics and Probability, Data set, Covariance function, Kriging, Stochastic modelling, Econometrics, Estimator, Applied mathematics, Hydrograph, Statistics, Probability and Uncertainty, Covariance, Cross-validation, Mathematics
- Abstract
Summary: We consider non-stationary spatiotemporal modelling in an investigation into karst water levels in western Hungary. A strong feature of the data set is the extraction of large amounts of water from mines, which caused the water levels to reduce until about 1990 when the mining ceased, and then the levels increased quickly. We discuss some traditional hydrogeological models which might be considered to be appropriate for this situation, and various alternative stochastic models. In particular, a separable space–time covariance model is proposed which is then deformed in time to account for the non-stationary nature of the lagged correlations between sites. Suitable covariance functions are investigated and then the models are fitted by using weighted least squares and cross-validation. Forecasting and prediction are carried out by using spatiotemporal kriging. We assess the performance of the method with one-step-ahead forecasting and make comparisons with naïve estimators. We also consider spatiotemporal prediction at a set of new sites. The new model performs favourably compared with the deterministic model and the naïve estimators, and the deformation by time shifting is worthwhile.
- Published
- 2005
- Full Text
- View/download PDF
38. Chain plot: a tool for exploiting bivariate temporal structures
- Author
- András Zempléni and Charles C. Taylor
- Subjects
Statistics and Probability, Probability plot, Partial residual plot, Applied Mathematics, Bivariate analysis, Probability plot correlation coefficient plot, Computational Mathematics, Exploratory data analysis, Computational Theory and Mathematics, Chain (algebraic topology), Bivariate data, Statistics, Q–Q plot, Algorithm, Mathematics
- Abstract
In this paper we present a graphical tool useful for visualizing the cyclic behaviour of bivariate time series. We investigate its properties and link it to the asymmetry of the two variables concerned. We also suggest adding approximate confidence bounds to the points on the plot and investigate the effect of lagging on the chain plot. We conclude our paper with some standard Fourier analysis, relating and comparing this to the chain plot.
- Published
- 2004
- Full Text
- View/download PDF
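The rendering below is only a plausible reconstruction, under the assumption that the chain plot joins the successive bivariate observations (x_t, y_t) in time order so that common cycles trace out loops; the paper's exact construction, confidence bounds and lagging analysis are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.05, t.size)
y = np.sin(2 * np.pi * (t - 5) / 50) + rng.normal(0, 0.05, t.size)  # phase-shifted partner

# join consecutive (x_t, y_t) points in time order; cycles appear as loops
plt.plot(x, y, "-o", markersize=2, linewidth=0.5)
plt.xlabel("$x_t$")
plt.ylabel("$y_t$")
plt.title("Chain-plot-style rendering of a bivariate cycle")
plt.show()
```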
39. Boosting kernel density estimates: A bias reduction technique?
- Author
- Marco Di Marzio and Charles C. Taylor
- Subjects
Statistics and Probability ,Boosting (machine learning) ,Applied Mathematics ,General Mathematics ,Statistics ,Kernel density estimation ,Statistics, Probability and Uncertainty ,General Agricultural and Biological Sciences ,Agricultural and Biological Sciences (miscellaneous) ,Bias reduction ,Mathematics - Abstract
Summary. This paper proposes an algorithm for boosting kernel density estimates. We show that boosting is closely linked to a previously proposed method of bias reduction and indicate how it should enjoy similar properties. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
- Published
- 2004
- Full Text
- View/download PDF
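The following is a minimal multiplicative-boosting sketch in the spirit of the paper: each stage fits a weighted KDE, upweights points the current stage covers poorly (via a leave-one-out ratio), and the final estimate is the renormalized product of the stage estimates. The update rule, step count and renormalization here are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def boosted_kde(x, h=0.3, n_steps=3, n_grid=400):
    """Multiplicatively boosted kernel density estimate (illustrative)."""
    n = len(x)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, n_grid)
    Kxx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    Kgx = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    w = np.full(n, 1.0 / n)                    # initial uniform weights
    log_f = np.zeros(n_grid)
    for _ in range(n_steps):
        log_f += np.log(Kgx @ w)               # accumulate log of stage estimates
        f_at_x = Kxx @ w                       # stage estimate at the data points
        f_loo = f_at_x - w * np.diag(Kxx)      # drop each point's own contribution
        w *= f_at_x / f_loo                    # upweight poorly covered points
        w /= w.sum()                           # keep the weights a distribution
    f = np.exp(log_f)
    return grid, f / (f.sum() * (grid[1] - grid[0]))  # renormalize the product

grid, f = boosted_kde(np.random.default_rng(0).normal(size=300))
```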
40. Bayesian Texture Segmentation of Weed and Crop Images Using Reversible Jump Markov Chain Monte Carlo Methods
- Author
- Mark R. Scarr, Charles C. Taylor, and Ian L. Dryden
- Subjects
Statistics and Probability ,Random field ,Markov random field ,business.industry ,Posterior probability ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Reversible-jump Markov chain Monte Carlo ,Mixture model ,Markov model ,Computer Science::Graphics ,Metropolis–Hastings algorithm ,Computer Science::Computer Vision and Pattern Recognition ,Prior probability ,Statistics ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
Summary. A Bayesian method for segmenting weed and crop textures is described and implemented. The work forms part of a project to identify weeds and crops in images so that selective crop spraying can be carried out. An image is subdivided into blocks and each block is modelled as a single texture. The number of different textures in the image is assumed unknown. A hierarchical Bayesian procedure is used where the texture labels have a Potts model (colour Ising Markov random field) prior and the pixels within a block are distributed according to a Gaussian Markov random field, with the parameters dependent on the type of texture. We simulate from the posterior distribution by using a reversible jump Metropolis–Hastings algorithm, where the number of different texture components is allowed to vary. The methodology is applied to a simulated image and then we carry out texture segmentation on the weed and crop images that motivated the work.
- Published
- 2003
- Full Text
- View/download PDF
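The Potts-prior part of the sampler can be illustrated compactly. The sketch below performs single-site Metropolis updates with a Potts prior and, as a deliberate simplification, a single Gaussian likelihood per class; the paper's actual likelihood is a Gaussian Markov random field per block, and the number of textures varies via reversible jump moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def potts_sweep(labels, data, means, beta, sigma=1.0):
    """One single-site Metropolis sweep for a Potts-prior segmentation.

    Illustrative simplification: each class is a single Gaussian mean,
    not the paper's Gaussian Markov random field texture model.
    """
    k, (rows, cols) = len(means), labels.shape
    for i in range(rows):
        for j in range(cols):
            cur, prop = labels[i, j], rng.integers(k)
            nb = [labels[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                  if 0 <= a < rows and 0 <= b < cols]
            # change in Potts prior (neighbour agreement) and in log-likelihood
            d_prior = beta * (sum(n == prop for n in nb) - sum(n == cur for n in nb))
            d_lik = ((data[i, j] - means[cur]) ** 2
                     - (data[i, j] - means[prop]) ** 2) / (2 * sigma ** 2)
            if np.log(rng.uniform()) < d_prior + d_lik:
                labels[i, j] = prop
    return labels

# two-class toy image: left half darker than right half
data = np.hstack([rng.normal(0, 1, (20, 10)), rng.normal(3, 1, (20, 10))])
labels = rng.integers(2, size=(20, 20))
for _ in range(20):
    labels = potts_sweep(labels, data, means=np.array([0.0, 3.0]), beta=0.8)
```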
41. Nonparametric regression for spherical data
- Author
- Charles C. Taylor, Agnese Panzera, and Marco Di Marzio
- Subjects
Statistics and Probability ,Polynomial regression ,Polynomial ,Mathematical optimization ,Dimension (vector space) ,Statistics, Probability and Uncertainty ,Local polynomial fitting ,Spherical-linear regression ,Spherical-spherical regression ,Regression ,Nonparametric regression ,Curse of dimensionality ,Interpretability ,Mathematics ,Parametric statistics - Abstract
We develop nonparametric smoothing for regression when both the predictor and the response variables are defined on a sphere of arbitrary dimension. A local polynomial fitting approach is pursued, which retains all the advantages in terms of rate optimality, interpretability, and ease of implementation widely observed in the standard setting. Our estimates have a multi-output nature, meaning that each coordinate is separately estimated within a scheme analogous to regression with a linear response. The main properties include linearity and rotational equivariance. This research is motivated by the fact that very few models describe this kind of regression, and current methods are not widely employable since they are parametric in nature and require the same dimensionality for the predictor and response spaces, along with a nonrandom design. Our approach suffers none of these limitations. Real-data case studies and simulation experiments are used to illustrate the effectiveness of the method.
- Published
- 2014
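A local-constant special case of such an estimator is easy to sketch: weight the response unit vectors by a kernel in the predictor's inner product with the query point, average, and project back to the sphere. The von Mises-Fisher-style weight and the concentration value are illustrative assumptions; the paper develops full local polynomial fits with the stated optimality properties.

```python
import numpy as np

def sphere_nw(X, Y, x0, kappa=10.0):
    """Local-constant regression for a spherical response on a spherical
    predictor: kernel-weighted mean of the responses, projected back
    onto the sphere.

    X, Y : (n, d) arrays of unit vectors; x0 : (d,) unit query vector.
    The weight exp(kappa * <x_i, x0>) is an illustrative kernel choice.
    """
    w = np.exp(kappa * (X @ x0))       # larger weight for predictors near x0
    m = w @ Y                          # weighted Euclidean mean of responses
    return m / np.linalg.norm(m)       # project back onto the unit sphere

# toy usage on the 2-sphere: the "signal" is a fixed coordinate rotation
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = np.roll(X, 1, axis=1)              # cyclic coordinate shift (a rotation)
print(sphere_nw(X, Y, x0=np.array([1.0, 0.0, 0.0])))
```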
42. The K ‐Function for Nearly Regular Point Processes
- Author
- Charles C. Taylor, Ian L. Dryden, and Rahman Farnoosh
- Subjects
Statistics and Probability ,Biometry ,Movement ,Gaussian ,Equilateral triangle ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,Square (algebra) ,Point process ,Regular grid ,Combinatorics ,symbols.namesake ,Animals ,Computer Simulation ,Mathematics ,Models, Statistical ,General Immunology and Microbiology ,Estimation theory ,Applied Mathematics ,Chlamydomonas ,Mathematical analysis ,Estimator ,General Medicine ,Grid ,symbols ,General Agricultural and Biological Sciences - Abstract
Summary. We propose modeling a nearly regular point pattern by a generalized Neyman-Scott process in which the offspring are Gaussian perturbations from a regular mean configuration. The mean configuration of interest is an equilateral grid, but our results can be used for any stationary regular grid. The case of uniformly distributed points is first studied as a benchmark. By considering the square of the interpoint distances, we can evaluate the first two moments of the K-function. These results can be used for parameter estimation, and simulations are used both to verify the theory and to assess the accuracy of the estimators. The methodology is applied to an investigation of regularity in plumes observed from swimming microorganisms.
- Published
- 2001
- Full Text
- View/download PDF
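For reference, a basic empirical K-function is shown below; edge correction is omitted for brevity, and the analytic moment results derived in the paper for Gaussian perturbations of a regular grid are not reproduced.

```python
import numpy as np

def ripley_k(points, r_values, area):
    """Naive (edge-effect-ignoring) estimate of Ripley's K-function.

    K(r) is, per unit intensity, the expected number of further points
    within distance r of a typical point of the pattern.
    """
    n = len(points)
    lam = n / area                                     # intensity estimate
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                        # exclude self-pairs
    return np.array([(d < r).sum() / (n * lam) for r in r_values])

# nearly regular pattern: unit square grid plus small Gaussian perturbations
rng = np.random.default_rng(0)
g = np.stack(np.meshgrid(np.arange(10), np.arange(10)), -1).reshape(-1, 2).astype(float)
pts = g + rng.normal(0, 0.1, g.shape)
print(ripley_k(pts, r_values=[0.5, 1.0, 1.5], area=100.0))
```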
43. Procrustes Shape Analysis of Planar Point Subsets
- Author
- Ian L. Dryden, M. R. Faghihi, and Charles C. Taylor
- Subjects
Statistics and Probability ,Delaunay triangulation ,Gaussian ,Mathematical analysis ,Isotropy ,Covariance ,Equilateral triangle ,Combinatorics ,symbols.namesake ,symbols ,Statistics, Probability and Uncertainty ,Statistic ,Mathematics ,Shape analysis (digital geometry) ,Central limit theorem - Abstract
Summary. Consider a set of points in the plane randomly perturbed about a mean configuration by Gaussian errors. In this paper a Procrustes statistic based on the shapes of subsets of the points is studied, and its approximate distribution is found for small variations. We derive various properties of the distribution, including the first two moments, a central limit result and a scaled χ² approximation. We concentrate on the independent isotropic Gaussian error case, although the results are valid for general covariance structures. We investigate triangle subsets in detail, in particular the situation where the population mean is regular (i.e. a Delaunay triangulation of the mean of the process consists of equilateral triangles of the same size). We examine the variance of the statistic for differently shaped regions and provide an asymptotic result for generally shaped regions. The results are applied to an investigation of regularity in human muscle fibre cross-sections.
- Published
- 1997
- Full Text
- View/download PDF
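The basic ingredient, the full Procrustes distance between two planar configurations, has a closed form in complex coordinates, sketched below; the paper's statistic and its distribution theory build on this quantity over triangle subsets.

```python
import numpy as np

def full_procrustes_dist2(a, b):
    """Squared full Procrustes distance between two planar configurations.

    Configurations are (k, 2) arrays; in complex coordinates the optimal
    translation, scaling and rotation have closed forms, giving
    d^2 = 1 - |<w, z>|^2 for the centred, unit-norm configurations.
    """
    z = a[:, 0] + 1j * a[:, 1]
    w = b[:, 0] + 1j * b[:, 1]
    z -= z.mean(); w -= w.mean()                      # remove translation
    z /= np.linalg.norm(z); w /= np.linalg.norm(w)    # remove scale
    return 1.0 - np.abs(np.vdot(w, z)) ** 2           # optimal rotation absorbed

# distance of a noisy triangle from the equilateral reference triangle
equilateral = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
noisy = equilateral + np.random.default_rng(0).normal(0, 0.05, (3, 2))
print(full_procrustes_dist2(equilateral, noisy))
```

A regularity statistic in the spirit of the paper could average this quantity over the Delaunay triangles of a point pattern, compared against the equilateral reference.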
44. Matching markers and unlabeled configurations in protein gels
- Author
- Kanti V. Mardia, Charles C. Taylor, and Emma M. Petty
- Subjects
Statistics and Probability ,High probability ,Electrophoresis ,FOS: Computer and information sciences ,Computer science ,business.industry ,Pattern recognition ,shape ,Statistics - Applications ,Western Blots ,Modeling and Simulation ,Expectation–maximization algorithm ,Applications (stat.AP) ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,Shape analysis (digital geometry) - Abstract
Unlabeled shape analysis is a rapidly emerging and challenging area of statistics, driven by various novel applications in bioinformatics. We consider here the situation where two configurations are matched under various constraints, namely, the configurations have a subset of manually located "markers" with high probability of matching each other, while a larger subset consists of unlabeled points. We consider a plausible model and give an implementation using the EM algorithm. The work is motivated by a real experiment with gels for renal cancer, and our approach allows for the possibility of missing and misallocated markers. The methodology is successfully used to automatically locate and remove a grossly misallocated marker within the given data set. (Published at http://dx.doi.org/10.1214/12-AOAS544 in the Annals of Applied Statistics, http://www.imstat.org/aoas/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
- Published
- 2012
- Full Text
- View/download PDF
45. A comparison of block and semi-parametric bootstrap methods for variance estimation in spatial statistics
- Author
- Mohsen Mohammadzadeh, N. Iranpanah, and Charles C. Taylor
- Subjects
Statistics and Probability ,Statistics::Theory ,Estimation theory ,Applied Mathematics ,Bootstrap aggregating ,Estimator ,Semiparametric model ,Computational Mathematics ,Computational Theory and Mathematics ,Kriging ,Statistics ,Statistics::Methodology ,Block size ,Spatial analysis ,Block (data storage) ,Mathematics - Abstract
Efron (1979) introduced the bootstrap method for independent data, but it cannot easily be applied to spatial data because of their dependence. For spatial data that are correlated according to their locations in the underlying space, the moving block bootstrap method is usually used to estimate the precision measures of estimators. The precision of the moving block bootstrap estimators depends on the block size, which is difficult to select; moreover, the moving block bootstrap tends to underestimate the variance. In this paper, we first use the semi-parametric bootstrap, which exploits an estimate of the spatial correlation structure, to estimate the precision measures of estimators in spatial data analysis. We then compare the semi-parametric bootstrap with the moving block bootstrap for variance estimation in a simulation study. Finally, we use the semi-parametric bootstrap to analyze the coal-ash data.
- Published
- 2011
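The moving block bootstrap is easy to sketch in one dimension, which keeps the mechanics visible; the paper's comparison concerns spatial (two-dimensional) data and its semi-parametric competitor, neither of which is reproduced here.

```python
import numpy as np

def mbb_variance(x, block_len, n_boot=500, stat=np.mean, seed=0):
    """Moving block bootstrap estimate of Var(stat) for dependent data.

    Overlapping blocks are resampled with replacement and concatenated so
    that short-range dependence within blocks is preserved.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, block_len)
    k = int(np.ceil(n / block_len))
    reps = [stat(blocks[rng.integers(len(blocks), size=k)].ravel()[:n])
            for _ in range(n_boot)]
    return np.var(reps, ddof=1)

# AR(1) series as a stand-in for spatially correlated data
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(mbb_variance(x, block_len=20))
```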
46. Estimating the Dimension of a Fractal
- Author
- Charles C. Taylor and James R. Taylor
- Subjects
Statistics and Probability ,010102 general mathematics ,01 natural sciences ,010104 statistics & probability ,Box counting ,Fractal ,Dimension (vector space) ,Complete information ,Statistics ,Statistical analysis ,Limit (mathematics) ,0101 mathematics ,Algorithm ,Mathematics - Abstract
Summary. We suggest refinements of the box counting method which address the obvious problems caused by incomplete information and the inaccessibility of the limit. A method for the statistical analysis of these corrected data is developed and tested on simulated and real data.
- Published
- 1991
- Full Text
- View/download PDF
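The uncorrected box counting method that the paper refines is shown below: count occupied boxes at several scales and regress log counts on log inverse box size. The paper's statistical corrections for finite data and the unreachable limit are not reproduced.

```python
import numpy as np

def box_count_dimension(points, box_sizes):
    """Estimate fractal dimension by (uncorrected) box counting.

    For each box size eps, count occupied boxes N(eps); the least-squares
    slope of log N(eps) against log(1/eps) estimates the dimension.
    """
    logs = []
    for eps in box_sizes:
        occupied = {tuple(np.floor(p / eps).astype(int)) for p in points}
        logs.append((np.log(1.0 / eps), np.log(len(occupied))))
    lx, ly = np.array(logs).T
    return np.polyfit(lx, ly, 1)[0]    # slope = dimension estimate

# points on a line segment in the plane: dimension should be near 1
t = np.linspace(0, 1, 2000)
pts = np.c_[t, 0.5 * t]
print(box_count_dimension(pts, box_sizes=[0.1, 0.05, 0.02, 0.01]))
```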
47. A generative, probabilistic model of local protein structure
- Author
- Anders Krogh, Kanti V. Mardia, Jesper Ferkinghoff-Borg, Thomas Hamelryck, Wouter Boomsma, and Charles C. Taylor
- Subjects
Models, Molecular ,Multidisciplinary ,Theoretical computer science ,Models, Statistical ,Continuous modelling ,Computer science ,Amino Acid Motifs ,Probabilistic logic ,Proteins ,Statistical model ,Protein structure prediction ,Biological Sciences ,Bioinformatics ,Prime (order theory) ,Generative model ,Fragment (logic) ,Generative grammar - Abstract
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence–structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
- Published
- 2008
48. Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data
- Author
- Kanti V. Mardia, Ganesh Subramaniam, and Charles C. Taylor
- Subjects
Statistics and Probability ,Likelihood Functions ,Models, Statistical ,General Immunology and Microbiology ,Myoglobin ,Protein Conformation ,Applied Mathematics ,Directional statistics ,Computational Biology ,Proteins ,Multivariate normal distribution ,General Medicine ,Bivariate analysis ,Bioinformatics ,General Biochemistry, Genetics and Molecular Biology ,Protein Structure, Secondary ,Malate Dehydrogenase ,Expectation–maximization algorithm ,von Mises distribution ,von Mises yield criterion ,Marginal distribution ,General Agricultural and Biological Sciences ,Algorithms ,Mathematics ,Ramachandran plot - Abstract
Summary. A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions, referred to as Sine and Cosine models, which have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distribution, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.
- Published
- 2007
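For concreteness, here is the unnormalized Sine-model density in one common parameterization (two means, two concentrations and a dependence parameter); the values used are arbitrary illustrations, and the paper's preferred Cosine model instead couples the angles through a cosine of their difference.

```python
import numpy as np

def sine_model_unnorm(phi, psi, mu=0.0, nu=0.0, k1=2.0, k2=2.0, lam=1.5):
    """Unnormalized bivariate von Mises 'Sine' density:

        f(phi, psi) ∝ exp{k1 cos(phi-mu) + k2 cos(psi-nu)
                          + lam sin(phi-mu) sin(psi-nu)}

    For large k1, k2 this approaches a bivariate normal; certain
    parameter combinations make the joint density bimodal.
    """
    return np.exp(k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
                  + lam * np.sin(phi - mu) * np.sin(psi - nu))

# evaluate on a grid of conformational angles (a synthetic Ramachandran plot)
phi, psi = np.meshgrid(np.linspace(-np.pi, np.pi, 181),
                       np.linspace(-np.pi, np.pi, 181))
density = sine_model_unnorm(phi, psi)
```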
49. Learning in dynamically changing domains: Theory revision and context dependence issues
- Author
- Charles C. Taylor and Gholamreza Nakhaeizadeh
- Subjects
Computational learning theory ,Computer science ,business.industry ,Algorithmic learning theory ,Context (language use) ,State (computer science) ,Artificial intelligence ,business ,Data science - Abstract
Dealing with dynamically changing domains is an important topic in machine learning (ML) with very interesting practical applications. Some attempts have already been made, in both the statistical and machine learning communities, to address some of the issues. In this paper we survey the state of the art from the available literature in this area. We argue that much further research is still needed, outline the directions such research should take and describe the expected results. We also argue that most of the problems in this area can be solved only through interaction between researchers from both the statistical and ML communities.
- Published
- 1997
- Full Text
- View/download PDF
50. An understanding of muscle fibre images
- Author
- M. R. Faghihi, Ian L. Dryden, and Charles C. Taylor
- Subjects
Delaunay triangulation ,business.industry ,media_common.quotation_subject ,Isotropy ,Pattern recognition ,Geometry ,Equilateral triangle ,Normal muscle ,Test statistic ,Artificial intelligence ,Cluster analysis ,business ,Random variable ,Normality ,Mathematics ,media_common - Abstract
Images of muscle biopsies reveal a mosaic pattern of two fibre types (slow-twitch and fast-twitch). An analysis of such images can indicate the presence of a neuromuscular disorder. We briefly review some methods which analyse the arrangement of the fibres (e.g. clustering of fibre type) and the fibre sizes. The proposed methodology uses the cell centres as a set of landmarks from which a Delaunay triangulation is created. The shapes of these (correlated) triangles are then used in a test statistic to ascertain normality of a muscle. Our "normal muscle" model supposes that the fibres are hexagonal (so that the triangulation is made up of equilateral triangles of the same size) with a perturbation of specified isotropic variance of the fibre centres. We obtain the distribution of the test statistic as an approximate function of a χ² random variable, so that a formal test can be carried out.
- Published
- 1995
- Full Text
- View/download PDF
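The geometric pipeline is easy to illustrate with scipy's Delaunay triangulation. The per-triangle regularity score below (longest side over shortest side, minus one) is a simple stand-in chosen for brevity; the paper instead uses a formal Procrustes-based statistic with a χ² approximation that accounts for the correlation between triangles.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_shape_scores(centres):
    """Delaunay-triangle regularity scores for fibre centres.

    Each score is zero for an equilateral triangle and grows with
    departure from equilaterality (an illustrative measure only).
    """
    tri = Delaunay(centres)
    scores = []
    for simplex in tri.simplices:
        p = centres[simplex]
        sides = np.sort([np.linalg.norm(p[i] - p[(i + 1) % 3]) for i in range(3)])
        scores.append(sides[2] / sides[0] - 1.0)
    return np.array(scores)

# perturbed hexagonal-like lattice standing in for fibre centres
rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.arange(8), np.arange(8)), -1).reshape(-1, 2).astype(float)
grid[:, 0] += 0.5 * (grid[:, 1] % 2)      # offset alternate rows (hexagonal packing)
centres = grid + rng.normal(0, 0.05, grid.shape)
print(triangle_shape_scores(centres).mean())
```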