78 results for "Bertrand Clarke"
Search Results
2. Point Prediction for Streaming Data.
- Author
-
Aleena Chanda, N. V. Vinodchandran, and Bertrand Clarke
- Published
- 2024
- Full Text
- View/download PDF
3. Predictive stability criteria for penalty selection in linear models
- Author
-
Dean Dustin, Bertrand Clarke, and Jennifer Clarke
- Subjects
Statistics and Probability ,Computational Mathematics ,Statistics, Probability and Uncertainty - Abstract
Choosing a shrinkage method can be done by selecting a penalty from a list of pre-specified penalties or by constructing a penalty based on the data. If a list of penalties for a class of linear models is given, we introduce a predictive stability criterion based on data perturbation to select a shrinkage method from the list. Simulation studies show that our predictive method identifies shrinkage methods that usually agree with existing literature and helps explain heuristically when a given shrinkage method can be expected to perform well. If the preference is to construct a penalty customized for a given problem, then we propose a technique based on genetic algorithms, again using a predictive criterion. We find that, in general, a custom penalty never performs worse than any commonly used penalty, and there are cases in which the custom penalty reduces to a recognizable penalty. Since penalty selection is mathematically equivalent to prior selection, our method also constructs priors. Our methodology allows us to observe that the oracle property typically holds for penalties that satisfy basic regularity conditions and is therefore not restrictive enough to play a direct role in penalty selection. In addition, our methodology can be immediately applied to real data problems and permits us to take model mis-specification into account.
- Published
- 2023
- Full Text
- View/download PDF
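A minimal sketch (not the authors' implementation) of a perturbation-based predictive stability comparison of the kind the abstract above describes: two off-the-shelf penalties (lasso and ridge), synthetic data, and small response perturbations, with the less perturbation-sensitive penalty preferred. All helper names, noise scales, and tuning values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic sparse linear-model data (illustrative only).
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def perturbation_instability(model, n_perturb=50, noise_scale=0.1):
    """Average change in held-out predictions when the training response
    is perturbed -- a crude stand-in for a predictive stability criterion."""
    base = model.fit(X_tr, y_tr).predict(X_te)
    shifts = []
    for _ in range(n_perturb):
        y_pert = y_tr + rng.normal(scale=noise_scale * y_tr.std(), size=y_tr.shape)
        pred = model.fit(X_tr, y_pert).predict(X_te)
        shifts.append(np.mean((pred - base) ** 2))
    return np.mean(shifts)

for name, model in [("lasso", Lasso(alpha=0.1)), ("ridge", Ridge(alpha=1.0))]:
    print(name, "instability:", perturbation_instability(model))
# Under a stability-based selection rule of this general type, the
# penalty with the smaller instability would be preferred.
```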
4. Interpreting uninterpretable predictors: kernel methods, Shtarkov solutions, and random forests
- Author
-
Bertrand Clarke and Tri Le
- Subjects
Statistics and Probability ,Kernel method ,Computational Theory and Mathematics ,Uninterpretable ,Computer science ,Applied Mathematics ,Statistics ,Statistics, Probability and Uncertainty ,Analysis ,Random forest - Published
- 2021
- Full Text
- View/download PDF
5. In praise of partially interpretable predictors
- Author
-
Tri Le and Bertrand Clarke
- Subjects
Mean squared error ,media_common.quotation_subject ,Statistics ,Praise ,Analysis ,Computer Science Applications ,Information Systems ,media_common ,Mathematics - Published
- 2020
- Full Text
- View/download PDF
6. Discussion of ‘Prior-based Bayesian Information Criterion (PBIC)’
- Author
-
Bertrand Clarke
- Subjects
Statistics and Probability ,Computational Theory and Mathematics ,business.industry ,Computer science ,Bayesian information criterion ,Applied Mathematics ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,Machine learning ,computer.software_genre ,computer ,Analysis - Published
- 2019
- Full Text
- View/download PDF
7. On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers
- Author
-
Tri Le and Bertrand Clarke
- Subjects
Boosting (machine learning) ,Computer science ,business.industry ,Pattern recognition ,02 engineering and technology ,Library and Information Sciences ,Bayes classifier ,Logistic regression ,01 natural sciences ,Ensemble learning ,Random forest ,010104 statistics & probability ,Bayes' theorem ,Mathematics (miscellaneous) ,0202 electrical engineering, electronic engineering, information engineering ,Statistics::Methodology ,020201 artificial intelligence & image processing ,Psychology (miscellaneous) ,Artificial intelligence ,0101 mathematics ,Statistics, Probability and Uncertainty ,business ,Classifier (UML) ,Interpretability - Abstract
Many of the best classifiers are ensemble methods such as bagging, random forests, boosting, and Bayes model averaging. We give conditions under which each of these four classifiers can be regarded as a Bayes classifier. We also give conditions under which stacking achieves the minimal Bayes risk. We compare the four classifiers with a logistic regression classifier to assess the cost of interpretability. First we characterize the increase in risk from using an ensemble method in a logistic classifier versus using it directly. Second, we characterize the change in risk from applying logistic regression to an ensemble method versus using the logistic classifier itself. Third, we give necessary and sufficient conditions for the logistic classifier to be worse than combining the logistic classifier and the Bayes classifier. Hence these results extend to ensemble classifiers that are asymptotically Bayes.
- Published
- 2018
- Full Text
- View/download PDF
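A small illustrative comparison, under an assumed synthetic dataset and default hyperparameters, of the empirical 0-1 risk of the ensemble classifiers named in the abstract against a logistic regression classifier; it mirrors the "cost of interpretability" comparison only loosely and is not the paper's analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

classifiers = {
    "bagging": BaggingClassifier(random_state=1),
    "random forest": RandomForestClassifier(random_state=1),
    "boosting": GradientBoostingClassifier(random_state=1),
    "logistic (interpretable)": LogisticRegression(max_iter=1000),
}

# Empirical 0-1 risk on a held-out set; the gap between the ensembles and
# the logistic classifier is one crude proxy for the cost of interpretability.
for name, clf in classifiers.items():
    risk = 1.0 - clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: empirical risk {risk:.3f}")
```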
8. Predicting antibiotic resistance gene abundance in activated sludge using shotgun metagenomics and machine learning
- Author
-
Jennifer Clarke, Xu Li, Bertrand Clarke, and Yuepeng Sun
- Subjects
Environmental Engineering ,0208 environmental biotechnology ,Candidatus Accumulibacter ,Indicator bacteria ,02 engineering and technology ,Wastewater ,010501 environmental sciences ,Biology ,Machine learning ,computer.software_genre ,01 natural sciences ,Machine Learning ,Clostridium ,Microbiome ,Waste Management and Disposal ,0105 earth and related environmental sciences ,Water Science and Technology ,Civil and Structural Engineering ,Sewage ,business.industry ,Ecological Modeling ,Drug Resistance, Microbial ,biology.organism_classification ,Pollution ,Anti-Bacterial Agents ,020801 environmental engineering ,Resistome ,Genes, Bacterial ,Metagenomics ,Artificial intelligence ,Bacteroides ,business ,Nitrospira ,computer - Abstract
While the microbiome of activated sludge (AS) in wastewater treatment plants (WWTPs) plays a vital role in shaping the resistome, identifying the potential bacterial hosts of antibiotic resistance genes (ARGs) in WWTPs remains challenging. The objective of this study is to explore the feasibility of using a machine learning approach, random forests (RF's), to identify the strength of associations between ARGs and bacterial taxa in metagenomic datasets from the activated sludge of WWTPs. Our results show that the abundance of select ARGs can be predicted by RF's using abundant genera (Candidatus Accumulibacter, Dechloromonas, Pseudomonas, and Thauera, etc.), (opportunistic) pathogens and indicators (Bacteroides, Clostridium, and Streptococcus, etc.), and nitrifiers (Nitrosomonas and Nitrospira, etc.) as explanatory variables. The correlations between predicted and observed abundance of ARGs (erm(B), tet(O), tet(Q), etc.) were in the medium range (0.400–0.600) when validated on testing datasets. Compared to those belonging to the other two groups, individual genera in the group of (opportunistic) pathogens and indicator bacteria had more positive functional relationships with select ARGs, suggesting genera in this group (e.g., Bacteroides, Clostridium, and Streptococcus) may be hosts of select ARGs. Furthermore, RF's with (opportunistic) pathogens and indicators as explanatory variables were used to predict the abundance of select ARGs in a full-scale WWTP successfully. Machine learning approaches such as RF's can potentially identify bacterial hosts of ARGs and reveal possible functional relationships between the ARGs and the microbial community in the AS of WWTPs.
- Published
- 2021
- Full Text
- View/download PDF
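A hedged sketch of the modelling step described in the abstract above: a random forest regressor predicting one ARG's abundance from genus abundances, evaluated by the correlation between predicted and observed values on a held-out set. The data below are synthetic stand-ins, not the study's metagenomic tables.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Stand-in data: rows are samples, columns are relative abundances of
# bacterial genera; the response is the abundance of one ARG.
n_samples, n_genera = 120, 30
genera = rng.dirichlet(np.ones(n_genera), size=n_samples)
arg_abundance = 5 * genera[:, 0] + 3 * genera[:, 1] + rng.normal(scale=0.02, size=n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(genera, arg_abundance, test_size=0.3, random_state=2)

rf = RandomForestRegressor(n_estimators=500, random_state=2).fit(X_tr, y_tr)
r, _ = pearsonr(rf.predict(X_te), y_te)
print("correlation between predicted and observed ARG abundance:", round(r, 3))

# Feature importances point to the genera most strongly associated with
# the ARG, i.e. candidate host taxa.
top = np.argsort(rf.feature_importances_)[::-1][:5]
print("top candidate genera (column indices):", top)
```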
9. Preface for the Jayanta K. Ghosh Memorial Volume of Sankhya, Series B
- Author
-
Bertrand Clarke and Gauri Sankar Datta
- Subjects
Statistics and Probability ,Combinatorics ,Series (mathematics) ,Applied Mathematics ,Statistics, Probability and Uncertainty ,Volume (compression) ,Mathematics - Published
- 2020
- Full Text
- View/download PDF
10. Modeling association in microbial communities with clique loglinear models
- Author
-
Bertrand Clarke, Adrian Dobra, Jennifer Clarke, Dragana Ajdic, and Camilo Valdes
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,model selection ,Computer science ,microbiome ,Computational biology ,01 natural sciences ,Statistics - Applications ,010104 statistics & probability ,03 medical and health sciences ,Bayes' theorem ,graphical models ,Applications (stat.AP) ,Taxonomic rank ,Microbiome ,0101 mathematics ,030304 developmental biology ,Clique ,next generation sequencing ,0303 health sciences ,Model selection ,Contingency tables ,Metagenomics ,Modeling and Simulation ,62H17 ,Log-linear model ,Statistics, Probability and Uncertainty ,Human Microbiome Project - Abstract
There is a growing awareness of the important roles that microbial communities play in complex biological processes. Modern investigation of these often uses next generation sequencing of metagenomic samples to determine community composition. We propose a statistical technique based on clique loglinear models and Bayes model averaging to identify microbial components in a metagenomic sample at various taxonomic levels that have significant associations. We describe the model class, a stochastic search technique for model selection, and the calculation of estimates of posterior probabilities of interest. We demonstrate our approach using data from the Human Microbiome Project and from a study of the skin microbiome in chronic wound healing. Our technique also identifies significant dependencies among microbial components as evidence of possible microbial syntrophy.
- Published
- 2019
11. Using the Bayesian Shtarkov solution for predictions
- Author
-
Tri Le and Bertrand Clarke
- Subjects
Statistics and Probability ,Bayes ,Model average ,Computation ,Bayesian probability ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,010104 statistics & probability ,symbols.namesake ,Bayes' theorem ,Bagging ,0202 electrical engineering, electronic engineering, information engineering ,Prequential ,0101 mathematics ,Additive model ,Gaussian process ,Mathematics ,business.industry ,Applied Mathematics ,Estimator ,020206 networking & telecommunications ,Support vector machine ,Computational Mathematics ,Stacking ,Computational Theory and Mathematics ,Shtarkov predictor ,symbols ,Artificial intelligence ,Variety (universal algebra) ,business ,computer ,Algorithm - Abstract
The Bayes Shtarkov predictor can be defined and used for a variety of data sets that are exceedingly hard if not impossible to model in any detailed fashion. Indeed, this is the setting in which the derivation of the Shtarkov solution is most compelling. The computations show that anytime the numerical approximation to the Shtarkov solution is ‘reasonable’, it is better in terms of predictive error than a variety of other general predictive procedures. These include two forms of additive model as well as bagging or stacking with support vector machines, Nadaraya–Watson estimators, or draws from a Gaussian Process Prior.
- Published
- 2016
- Full Text
- View/download PDF
12. Models and Predictors: A Bickering Couple
- Author
-
Jennifer Clarke and Bertrand Clarke
- Published
- 2018
- Full Text
- View/download PDF
13. Defining a Predictive Paradigm
- Author
-
Bertrand Clarke and Jennifer Clarke
- Published
- 2018
- Full Text
- View/download PDF
14. Designing Bacteria
- Author
-
Jay E. Mittenthal, Bertrand Clarke, and Mark Levinthal
- Published
- 2018
- Full Text
- View/download PDF
15. Regular, median and Huber cross‐validation: A computational comparison
- Author
-
Bertrand Clarke and Chi Wai Yu
- Subjects
Model selection ,Linear model ,Estimator ,Residual ,Cross-validation ,Computer Science Applications ,Huber loss ,Skewness ,Statistics ,Outlier ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Analysis ,circulatory and respiratory physiology ,Information Systems ,Mathematics - Abstract
We present a new technique for comparing models using a median form of cross-validation and least median of squares estimation (MCV-LMS). Rather than minimizing the sums of squares of residual errors, we minimize the median of the squared residual errors. We compare this with a robustified form of cross-validation using the Huber loss function and robust coefficient estimators (HCV). Through extensive simulations we find that for linear models MCV-LMS outperforms HCV for data that is representative of the data generator when the tails of the noise distribution are heavy enough and asymmetric enough. We also find that MCV-LMS is often better able to detect the presence of small terms. Otherwise, HCV typically outperforms MCV-LMS for 'good' data. MCV-LMS also outperforms HCV in the presence of enough severe outliers.
- Published
- 2015
- Full Text
- View/download PDF
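A rough sketch of the median-of-squared-residuals idea behind median cross-validation: the same cross-validated residuals are scored by their mean (regular CV) and by their median (median CV). For simplicity the coefficients are estimated by ordinary least squares rather than by least median of squares or Huber estimators, so this is only a simplified illustration under assumed data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)

# Heavy-tailed noise, where a median-based criterion is expected to help.
n = 100
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=n)

candidates = {"linear": [x], "quadratic": [x, x**2]}  # two nested candidate models

def cv_squared_residuals(columns):
    """Squared out-of-fold residuals from 10-fold cross-validation."""
    X = np.column_stack(columns)
    res = []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=3).split(X):
        fit = LinearRegression().fit(X[tr], y[tr])
        res.extend((y[te] - fit.predict(X[te])) ** 2)
    return np.array(res)

for name, cols in candidates.items():
    r2 = cv_squared_residuals(cols)
    print(f"{name}: mean-CV {r2.mean():.2f}  median-CV {np.median(r2):.2f}")
# Regular CV ranks models by the mean of the squared residuals; the median
# form ranks them by the median, which is less sensitive to outlying points.
```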
16. Detecting bacterial genomes in a metagenomic sample using NGS reads
- Author
-
Bertrand Clarke, Camilo Valdes, Jennifer Clarke, and Meghan Brennan
- Subjects
Statistics and Probability ,Metagenomics ,Applied Mathematics ,Earth Microbiome Project ,Computational biology ,Bacterial genome size ,Biology ,Bioinformatics ,Sample (graphics) ,DNA sequencing ,Human Microbiome Project - Published
- 2015
- Full Text
- View/download PDF
17. Statistical Problem Classes and Their Links to Information Theory
- Author
-
Jennifer Clarke, Chi Wai Yu, and Bertrand Clarke
- Subjects
Economics and Econometrics ,Theoretical computer science ,Kullback–Leibler divergence ,Bayesian information criterion ,Principle of maximum entropy ,Model selection ,Econometrics ,Entropy (information theory) ,Akaike information criterion ,Information theory ,Information diagram ,Mathematics - Abstract
We begin by recalling the tripartite division of statistical problems into three classes, M-closed, M-complete, and M-open, and then reviewing the key ideas of introductory Shannon theory. Focusing on the related but distinct goals of model selection and prediction, we argue that different techniques for these two goals are appropriate for the three different problem classes. For M-closed problems we give a relative entropy justification that the Bayes information criterion (BIC) is appropriate for model selection and that the Bayes model average is information optimal for prediction. For M-complete problems, we discuss the principle of maximum entropy and a way to use the rate distortion function to bypass the inaccessibility of the true distribution. For prediction in the M-complete class, there is little work done on information based model averaging, so we discuss the Akaike information criterion (AIC) and its properties and variants. For the M-open class, we argue that essentially only predictive criteria…
- Published
- 2013
- Full Text
- View/download PDF
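Since the abstract turns on AIC and BIC, here is a short worked sketch computing both for Gaussian linear models via the standard formulas AIC = 2k − 2 log L̂ and BIC = k log n − 2 log L̂; the data and candidate models are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Gaussian linear model fit by least squares; k counts the regression
# coefficients plus the noise variance.
n = 200
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

def gaussian_ic(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)                     # MLE of the noise variance
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1
    aic = 2 * k - 2 * loglik
    bic = k * np.log(len(y)) - 2 * loglik
    return round(aic, 1), round(bic, 1)

X1 = np.column_stack([np.ones(n), x])                   # true (linear) model
X2 = np.column_stack([np.ones(n), x, x**2, x**3])       # overfit (cubic) model
print("linear AIC/BIC:", gaussian_ic(X1, y))
print("cubic  AIC/BIC:", gaussian_ic(X2, y))
# BIC's heavier k*log(n) penalty reflects its model-selection (M-closed)
# orientation discussed in the abstract; AIC is geared toward prediction.
```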
18. A Bayesian criterion for cluster stability
- Author
-
Bertrand Clarke and Hoyt Koepke
- Subjects
Bayesian probability ,Perturbation (astronomy) ,computer.software_genre ,Computer Science Applications ,Data set ,Distance matrix ,Prior probability ,Cluster (physics) ,Statistical analysis ,Data mining ,Cluster analysis ,Algorithm ,computer ,Analysis ,Information Systems ,Mathematics - Abstract
We present a technique for evaluating and comparing how clusterings reveal structure inherent in the data set. Our technique is based on a criterion evaluating how much point-to-cluster distances may be perturbed without affecting the membership of the points. Although similar to some existing perturbation methods, our approach distinguishes itself in five ways. First, the strength of the perturbations is indexed by a prior distribution controlling how close to boundary regions a point may be before it is considered unstable. Second, our approach is exact in that we integrate over all the perturbations; in practice, this can be done efficiently for well-chosen prior distributions. Third, we provide a rigorous theoretical treatment of the approach, showing that it is consistent for estimating the correct number of clusters. Fourth, it yields a detailed picture of the behavior and structure of the clustering. Finally, it is computationally tractable and easy to use, requiring only a point-to-cluster distance matrix as input. In a simulation study, we show that it outperforms several existing methods in terms of recovering the correct number of clusters. We also illustrate the technique in three real data sets.
- Published
- 2013
- Full Text
- View/download PDF
19. EnsCat: clustering of categorical data via ensembling
- Author
-
Jennifer Clarke, Bertrand Clarke, and Saeid Amiri
- Subjects
0301 basic medicine ,Clustering high-dimensional data ,Computer science ,High dimensional data ,Population ,Inference ,computer.software_genre ,01 natural sciences ,Biochemistry ,Clustering ,010104 statistics & probability ,03 medical and health sciences ,Structural Biology ,Cluster Analysis ,0101 mathematics ,education ,Cluster analysis ,Molecular Biology ,Categorical variable ,Categorical data ,education.field_of_study ,Applied Mathematics ,Computational Biology ,Ensembling methods ,Genomics ,Computer Science Applications ,Data set ,030104 developmental biology ,ComputingMethodologies_PATTERNRECOGNITION ,Metric (mathematics) ,Unsupervised learning ,Data mining ,computer ,Algorithms ,Software ,Curse of dimensionality - Abstract
Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: the distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. Up to the present, however, the literature and software addressing clustering for categorical data have not yet led to a standard approach. Results: We present software for an ensemble method that performs well in comparison with other methods regardless of the dimensionality of the data. In an ensemble method a variety of instantiations of a statistical object are found and then combined into a consensus value. It has been known for decades that ensembling generally outperforms the components that comprise it in many settings. Here, we apply this ensembling principle to clustering. We begin by generating many hierarchical clusterings with different clustering sizes. When the dimension of the data is high, we also randomly select subspaces of variable size to generate clusterings. Then, we combine these clusterings into a single membership matrix and use this to obtain a new, ensembled dissimilarity matrix using Hamming distance. Conclusions: Ensemble clustering, as implemented in R and called EnsCat, gives more clearly separated clusters than other clustering techniques for categorical data. The latest version with manual and examples is available at https://github.com/jlp2duke/EnsCat.
- Published
- 2016
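A sketch of the ensembling idea described in the abstract above (not the EnsCat package itself): many hierarchical clusterings on random subspaces and random cluster counts are stacked into a membership matrix, which is then turned into an ensembled Hamming dissimilarity and clustered once more. The data, subspace sizes, and linkage choices are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)

# Toy categorical data: two groups with different symbol frequencies.
n, p = 60, 40
data = np.where(rng.random((n, p)) < 0.3, 1, 0)
data[:30, :10] = np.where(rng.random((30, 10)) < 0.8, 2, 0)

labels = []
for _ in range(100):                                   # many base clusterings
    cols = rng.choice(p, size=rng.integers(5, p), replace=False)   # random subspace
    k = rng.integers(2, 6)                             # random clustering size
    d = pdist(data[:, cols], metric="hamming")
    labels.append(fcluster(linkage(d, method="average"), k, criterion="maxclust"))

membership = np.column_stack(labels)                   # n x 100 membership matrix
ens_dist = pdist(membership, metric="hamming")         # ensembled dissimilarity
final = fcluster(linkage(ens_dist, method="average"), 2, criterion="maxclust")
print(final)
```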
20. A Bayes interpretation of stacking for M-complete and M-open settings
- Author
-
Bertrand Clarke and Tri Le
- Subjects
Statistics and Probability ,Bayes estimator ,Basis (linear algebra) ,Applied Mathematics ,Bayesian probability ,Stacking ,Mathematics - Statistics Theory ,02 engineering and technology ,Statistics Theory (math.ST) ,01 natural sciences ,62F15, 62C10 ,Cross-validation ,Interpretation (model theory) ,Constraint (information theory) ,010104 statistics & probability ,Bayes' theorem ,Statistics ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Mathematics ,020201 artificial intelligence & image processing ,0101 mathematics ,Algorithm ,Mathematics - Abstract
In M-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in M-complete problems, taking a predictive approach can be very useful. Stacking is a model averaging procedure that gives a composite predictor by combining individual predictors from a list of models using weights that optimize a cross-validation criterion. We show that the stacking weights also asymptotically minimize a posterior expected loss. Hence we formally provide a Bayesian justification for cross-validation. Often the weights are constrained to be positive and sum to one. For greater generality, we omit the positivity constraint and relax the 'sum to one' constraint. A key question is 'What predictors should be in the average?' We first verify that the stacking error depends only on the span of the models. Then we propose using bootstrap samples from the data to generate empirical basis elements that can be used to form models. We use this in two computed examples to give stacking predictors that are (i) data driven, (ii) optimal with respect to the number of component predictors, and (iii) optimal with respect to the weight each predictor gets.
- Published
- 2016
- Full Text
- View/download PDF
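A minimal sketch of stacking weights computed from cross-validated predictions by unconstrained least squares, matching the abstract's relaxation of the positivity and sum-to-one constraints; the component models and data are illustrative, and this is not the paper's bootstrap-basis construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(x[:, 0]) + 0.3 * rng.normal(size=n)

models = [LinearRegression(), KNeighborsRegressor(10), DecisionTreeRegressor(max_depth=4)]

# Cross-validated predictions from each candidate model.
Z = np.zeros((n, len(models)))
for tr, te in KFold(5, shuffle=True, random_state=6).split(x):
    for j, m in enumerate(models):
        Z[te, j] = m.fit(x[tr], y[tr]).predict(x[te])

# Unconstrained stacking weights: least-squares fit of y on the
# cross-validated predictions (no positivity or sum-to-one constraint).
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("stacking weights:", np.round(w, 3))

# Stacking prediction at a new point: weighted combination of the
# component predictors refit on the full data.
x_new = np.array([[1.0]])
preds = np.array([m.fit(x, y).predict(x_new)[0] for m in models])
print("stacked prediction at x=1:", float(preds @ w))
```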
21. Median loss decision theory
- Author
-
Bertrand Clarke and Chi Wai Yu
- Subjects
Statistics and Probability ,Heavy-tailed distribution ,Robustness (computer science) ,Applied Mathematics ,Decision theory ,Outlier ,Posterior probability ,Statistics ,Estimator ,Minification ,Statistics, Probability and Uncertainty ,Statistical theory ,Mathematics - Abstract
In this paper, we argue that replacing the expectation of the loss in statistical decision theory with the median of the loss leads to a viable and useful alternative to conventional risk minimization particularly because it can be used with heavy tailed distributions. We investigate three possible definitions for such medloss estimators and derive examples of them in several standard settings. We argue that the medloss definition based on the posterior distribution is better than the other two definitions that do not permit optimization over large classes of estimators. We argue that median loss minimizing estimates often yield improved performance, have resistance to outliers as high as the usual robust estimates, and are resistant to the specific loss used to form them. In simulations with the posterior medloss formulation, we show how the estimates can be obtained numerically and that they can have better robustness properties than estimates derived from risk minimization.
- Published
- 2011
- Full Text
- View/download PDF
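A hedged numerical sketch of a posterior 'medloss' estimate in the spirit of the abstract above: given posterior draws, pick the action minimizing the posterior median of the loss rather than its expectation. The posterior sample and squared-error loss below are stand-ins, not the paper's settings.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

# Stand-in posterior draws for a scalar parameter (e.g. from MCMC);
# a heavy-tailed posterior is the setting where medloss is meant to help.
theta_draws = rng.standard_t(df=2, size=5000) + 1.0

def median_loss(a, loss=lambda a, t: (a - t) ** 2):
    """Posterior median of the loss of action a, estimated from the draws."""
    return np.median(loss(a, theta_draws))

# Posterior medloss estimate: the action minimizing the posterior median
# of the loss, versus the usual posterior-mean (expected squared loss) estimate.
medloss_est = minimize_scalar(median_loss, bounds=(-10, 10), method="bounded").x
print("posterior medloss estimate:", round(medloss_est, 3))
print("posterior mean (expected-loss estimate):", round(theta_draws.mean(), 3))
```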
22. Asymptotics of Bayesian median loss estimation
- Author
-
Bertrand Clarke and Chi Wai Yu
- Subjects
Statistics and Probability ,Hodges–Lehmann estimator ,Median ,Least trimmed squares ,Posterior ,01 natural sciences ,010104 statistics & probability ,Frequentist inference ,0502 economics and business ,Statistics ,Least median of squares estimator ,0101 mathematics ,050205 econometrics ,Mathematics ,Numerical Analysis ,05 social sciences ,Estimator ,Trimmed estimator ,Loss function ,Regression ,Least trimmed squares estimator ,Efficient estimator ,Statistics, Probability and Uncertainty ,Minimax estimator ,Asymptotics ,Invariant estimator - Abstract
We establish the consistency, asymptotic normality, and efficiency for estimators derived by minimizing the median of a loss function in a Bayesian context. We contrast this procedure with the behavior of two Frequentist procedures, the least median of squares (LMS) and the least trimmed squares (LTS) estimators, in regression problems. The LMS estimator is the Frequentist version of our estimator, and the LTS estimator approaches a median-based estimator as the trimming approaches 50% on each side. We argue that the Bayesian median-based method is a good tradeoff between the two Frequentist estimators.
- Published
- 2010
- Full Text
- View/download PDF
23. Bias-variance trade-off for prequential model list selection
- Author
-
Bertrand Clarke and Ernest Fokoué
- Subjects
Statistics and Probability ,Mathematical optimization ,Computer science ,Model selection ,Statistics ,Variance (accounting) ,Data generator ,Function (mathematics) ,Statistics, Probability and Uncertainty ,Decision problem ,Space (commercial competition) ,Trade-off ,Selection (genetic algorithm) - Abstract
The prequential approach to statistics leads naturally to model list selection because the sequential reformulation of the problem is a guided search over model lists drawn from a model space. That is, continually updating the action space of a decision problem to achieve optimal prediction forces the collection of models under consideration to grow neither too fast nor too slow to avoid excess variance and excess bias, respectively. At the same time, the goal of good predictive performance forces the search over good predictors formed from a model list to close in on the data generator. Taken together, prequential model list re-selection favors model lists which provide an effective approximation to the data generator but do so by making the approximation match the unknown function on important regions as determined by empirical bias and variance.
- Published
- 2009
- Full Text
- View/download PDF
24. Prequential analysis of complex data with adaptive model reselection
- Author
-
Bertrand Clarke and Jennifer Clarke
- Subjects
Complex data type ,Computer science ,Data mining ,computer.software_genre ,computer ,Analysis ,Computer Science Applications ,Information Systems - Published
- 2009
- Full Text
- View/download PDF
25. Information conversion, effective samples, and parameter size
- Author
-
Xiaodong Lin, J. Pittman, and Bertrand Clarke
- Subjects
Kullback–Leibler divergence ,Bayesian probability ,Library and Information Sciences ,Article ,Statistics::Computation ,Computer Science Applications ,Mixture theory ,Sample size determination ,Statistics ,Statistics::Methodology ,Bayesian hierarchical modeling ,Entropy (information theory) ,Nuisance parameter ,Algorithm ,Random variable ,Information Systems ,Mathematics - Abstract
Consider the relative entropy between a posterior density for a parameter given a sample and a second posterior density for the same parameter, based on a different model and a different data set. Then the relative entropy can be minimized over the second sample to get a virtual sample that would make the second posterior as close as possible to the first in an informational sense. If the first posterior is based on a dependent dataset and the second posterior uses an independence model, the effective inferential power of the dependent sample is transferred into the independent sample by the optimization. Examples of this optimization are presented for models with nuisance parameters, finite mixture models, and models for correlated data. Our approach is also used to choose the effective parameter size in a Bayesian hierarchical model.
- Published
- 2007
- Full Text
- View/download PDF
26. Information optimality and Bayesian modelling
- Author
-
Bertrand Clarke
- Subjects
Economics and Econometrics ,business.industry ,Applied Mathematics ,Model selection ,Conditional mutual information ,Bayesian probability ,Information processing ,Pointwise mutual information ,Machine learning ,computer.software_genre ,Information theory ,Prior probability ,Econometrics ,Multivariate mutual information ,Artificial intelligence ,business ,computer ,Mathematics - Abstract
The general approach of treating a statistical problem as one of information processing led to the Bayesian method of moments, reference priors, minimal information likelihoods, and stochastic complexity. These techniques rest on quantities that have physical interpretations from information theory. Current work includes: the role of prediction, the emergence of data dependent priors, the role of information measures in model selection, and the use of conditional mutual information to incorporate partial information.
- Published
- 2007
- Full Text
- View/download PDF
27. Clustering categorical data via ensembling dissimilarity matrices
- Author
-
Saeid Amiri, Bertrand Clarke, and Jennifer Clarke
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Phylogenetic tree ,business.industry ,Machine Learning (stat.ML) ,Context (language use) ,Pattern recognition ,02 engineering and technology ,High dimensional ,01 natural sciences ,010104 statistics & probability ,ComputingMethodologies_PATTERNRECOGNITION ,Statistics - Machine Learning ,0202 electrical engineering, electronic engineering, information engineering ,Discrete Mathematics and Combinatorics ,020201 artificial intelligence & image processing ,Artificial intelligence ,0101 mathematics ,Statistics, Probability and Uncertainty ,Cluster analysis ,business ,Categorical variable ,Mathematics - Abstract
We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other techniques that have been proposed. Then we give conditions under which our method should yield good results in general. Our method extends to high dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context we compare our method with two other methods. Finally, we extend our method to high dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give examples to show that our method continues to provide good results, in particular, better in the context of genome sequences than clusterings suggested by phylogenetic trees.
- Published
- 2015
28. A General Hybrid Clustering Technique
- Author
-
Jennifer Clarke, Hoyt Koepke, Saeid Amiri, and Bertrand Clarke
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Single Linkage ,business.industry ,05 social sciences ,k-means clustering ,Pattern recognition ,Machine Learning (stat.ML) ,01 natural sciences ,Machine Learning (cs.LG) ,Computer Science - Learning ,010104 statistics & probability ,ComputingMethodologies_PATTERNRECOGNITION ,Statistics - Machine Learning ,Consistency (statistics) ,0502 economics and business ,Outlier ,Discrete Mathematics and Combinatorics ,Artificial intelligence ,0101 mathematics ,Statistics, Probability and Uncertainty ,Cluster analysis ,business ,050205 econometrics ,Mathematics - Abstract
Here, we propose a clustering technique for general clustering problems including those that have non-convex clusters. For a given desired number of clusters $K$, we use three stages to find a clustering. The first stage uses a hybrid clustering technique to produce a series of clusterings of various sizes (randomly selected). The key steps are to find a $K$-means clustering using $K_\ell$ clusters, where $K_\ell \gg K$, and then to join these small clusters by using single linkage clustering. The second stage stabilizes the result of stage one by reclustering via the `membership matrix' under Hamming distance to generate a dendrogram. The third stage is to cut the dendrogram to get $K^*$ clusters where $K^* \geq K$ and then prune back to $K$ to give a final clustering. A variant on our technique also gives a reasonable estimate for $K_T$, the true number of clusters. We provide a series of arguments to justify the steps in the stages of our methods and we provide numerous examples involving real and simulated data to compare our technique with other related techniques.
- Published
- 2015
- Full Text
- View/download PDF
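A sketch of the first stage only, as the abstract above outlines it: a K-means clustering with K_ell much larger than K, whose small clusters are then joined by single-linkage clustering. The stabilization and dendrogram-pruning stages are omitted, and the dataset is an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Non-convex clusters, where plain K-means with K=2 fails.
X, _ = make_moons(n_samples=400, noise=0.05, random_state=8)

K, K_ell = 2, 40                      # desired clusters vs. many small ones
km = KMeans(n_clusters=K_ell, n_init=10, random_state=8).fit(X)

# Join the K_ell centroids by single-linkage clustering into K groups,
# then map each point to the group of its centroid.
centroid_groups = fcluster(linkage(km.cluster_centers_, method="single"), K, criterion="maxclust")
labels = centroid_groups[km.labels_]
print(np.bincount(labels))
```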
29. Netscan: a procedure for generating reaction networks by size
- Author
-
Jay E. Mittenthal, Bertrand Clarke, and Glenn Fawcett
- Subjects
Statistics and Probability ,File Transfer Protocol ,Theoretical computer science ,General Immunology and Microbiology ,business.industry ,Computer science ,Systems Biology ,Applied Mathematics ,Network identification ,General Medicine ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,Constraint (information theory) ,Software ,Software Design ,Modeling and Simulation ,Animals ,Humans ,General Agricultural and Biological Sciences ,business ,Algorithms ,Signal Transduction - Abstract
In this paper, we describe an algorithm which can be used to generate the collection of networks, in order of increasing size, that are compatible with a list of chemical reactions and that satisfy a constraint. Our algorithm has been encoded and the software, called Netscan, can be freely downloaded from ftp://ftp.stat.ubc.ca/pub/riffraff/Netscanfiles, along with a manual, for general scientific use. Our algorithm may require pre-processing to ensure that the quantities it acts on are physically relevant and because it outputs sets of reactions, which we call canonical networks, that must be elaborated into full networks.
- Published
- 2004
- Full Text
- View/download PDF
30. Improvement over bayes prediction in small samples in the presence of model uncertainty
- Author
-
Bertrand Clarke and Hubert Wong
- Subjects
Statistics and Probability ,Bayes' theorem ,Statistics ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
In an online prediction context, the authors introduce a new class of mongrel criteria that allow for the weighing of candidate models and the combination of their predictions based both on model-based and empirical measures of their performance. They present simulation results which show that model averaging using the mongrel-derived weights leads, in small samples, to predictions that are more accurate than those obtained by Bayesian weight updating, provided that none of the candidate models is too distant from the data generator.
- Published
- 2004
- Full Text
- View/download PDF
31. Partial information reference priors: derivation and interpretations
- Author
-
Bertrand Clarke and Ao Yuan
- Subjects
Statistics and Probability ,Calibration (statistics) ,Applied Mathematics ,Estimator ,Asymptotic distribution ,Function (mathematics) ,Conditional probability distribution ,Density estimation ,Mutual information ,Combinatorics ,Prior probability ,Calculus ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
Suppose X_1, …, X_n are IID p(·|θ, ψ), where (θ, ψ) ∈ R^d is distributed according to the prior density w(·). For estimators S_n = S(X) and T_n = T(X) assumed to be consistent for some function of θ and asymptotically normal, we examine the conditional Shannon mutual information (CSMI) between Θ and T_n given Ψ and S_n, namely I(Θ, T_n | Ψ, S_n). It is seen there are several important special cases of this CSMI. We establish asymptotic formulas for various cases and identify the resulting noninformative reference priors. As a consequence, we develop the notion of data-dependent priors and a calibration for how close an estimator is to sufficiency.
- Published
- 2004
- Full Text
- View/download PDF
32. A characterization of consistency of model weights given partial information in normal linear models
- Author
-
Hubert Wong and Bertrand Clarke
- Subjects
Statistics and Probability ,Probability theory ,Consistency (statistics) ,Statistics ,Linear regression ,Consistent estimator ,Posterior probability ,Linear model ,Conditional probability ,Applied mathematics ,Affine transformation ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
We characterize the consistency of posterior model probabilities that are computed conditional on affine functions of the outcome variable for normal linear models.
- Published
- 2004
- Full Text
- View/download PDF
33. Asymptotic normality of the posterior given a statistic
- Author
-
Bertrand Clarke and Ao Yuan
- Subjects
Statistics and Probability ,Combinatorics ,Prior probability ,Statistics ,Asymptotic distribution ,Limiting ,Statistics, Probability and Uncertainty ,Edgeworth series ,Posterior density ,Random variable ,Statistic ,Central limit theorem ,Mathematics - Abstract
The authors establish the asymptotic normality and determine the limiting variance of the posterior density for a multivariate parameter, given the value of a consistent and asymptotically Gaussian statistic satisfying a uniform local central limit theorem. Their proof is given in the continuous case but generalizes to lattice-valued random variables. It hinges on a uniform Edgeworth expansion used to control the behaviour of the conditioning statistic. They provide examples and show how their result can help in identifying reference priors.
- Published
- 2004
- Full Text
- View/download PDF
34. Decomposing posterior variance
- Author
-
Bertrand Clarke and Paul Gustafson
- Subjects
Statistics and Probability ,Standard error ,Bayesian robustness ,Applied Mathematics ,Bayesian probability ,Statistics ,Prior probability ,Parametric model ,Posterior variance ,Decomposition method (constraint satisfaction) ,Statistics, Probability and Uncertainty ,Uncertainty analysis ,Mathematics - Abstract
We propose a decomposition of posterior variance somewhat in the spirit of an ANOVA decomposition. Terms in this decomposition come in pairs. Given a single parametric model, for instance, one term describes uncertainty arising because the parameter value is unknown while the other describes uncertainty propagated via uncertainty about which prior distribution is appropriate for the parameter. In the context of multiple candidate models and model-averaged estimates, two additional terms emerge resulting in a four-term decomposition. In the context of multiple spaces of models, six terms result. The value of the decomposition is twofold. First, it yields a fuller accounting of uncertainty than methods which condition on data-driven choices of models or model spaces. Second, it constitutes a novel approach to the study of prior influence in Bayesian analysis.
- Published
- 2004
- Full Text
- View/download PDF
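A hedged LaTeX sketch of a two-term split of posterior variance in the spirit of the abstract above, written as a law-of-total-variance decomposition over an index λ for the candidate priors; the paper's exact terms and notation may differ.

```latex
% Law-of-total-variance style split of posterior variance: the first term
% is within-prior parameter uncertainty, the second is uncertainty
% propagated through uncertainty about which prior (indexed by lambda)
% is appropriate.  Notation is illustrative, not the paper's.
\[
  \operatorname{Var}(\theta \mid y)
  = \underbrace{\mathbb{E}_{\lambda \mid y}\bigl[\operatorname{Var}(\theta \mid y, \lambda)\bigr]}_{\text{parameter unknown}}
  + \underbrace{\operatorname{Var}_{\lambda \mid y}\bigl(\mathbb{E}[\theta \mid y, \lambda]\bigr)}_{\text{prior uncertain}}
\]
```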
35. HOW CELLS AVOID ERRORS IN METABOLIC AND SIGNALING NETWORKS
- Author
-
Jay E. Mittenthal, Alexander Scheeline, and Bertrand Clarke
- Subjects
Computer science ,Negative feedback ,Statistical and Nonlinear Physics ,Sigmoid function ,Kinetic proofreading ,Condensed Matter Physics ,Algorithm ,Reliability (statistics) ,Sequence (medicine) - Abstract
We examine features of intracellular networks that make errors less probable and beneficial responses more probable. In a false negative (F−) error, a network does not respond to input. A network is reliable if it operates with a low probability of an F− error. Features that promote reliability include fewer reactions in sequence, more alternative pathways, no side reactions and negative feedback. In a false positive (F+) error, a network produces output without input. Here, a network is specific if it has a low probability of an F+ error. Conjunctions of signals within or between pathways can improve specificity through sigmoid steady-state response curves, kinetic proofreading and checkpoints. Both reliability and specificity are important in networks that regulate the fate of a cell and in networks with hubs or modules, and this includes scale-free networks. Some networks discriminate among several inputs by responding to each input through a different combination of pathways.
- Published
- 2003
- Full Text
- View/download PDF
36. A minimally informative likelihood for decision analysis: Illustration and robustness
- Author
-
Bertrand Clarke and Ao Yuan
- Subjects
Statistics and Probability ,Blahut–Arimoto algorithm ,Robustness (computer science) ,Statistics, Probability and Uncertainty ,Rate distortion ,Algorithm ,Mathematical economics ,Decision analysis ,Mathematics - Abstract
The authors discuss a class of likelihood functions involving weak assumptions on data generating mechanisms. These likelihoods may be appropriate when it is difficult to propose models for the data. The properties of these likelihoods are given and it is shown how they can be computed numerically by use of the Blahut-Arimoto algorithm. The authors then show how these likelihoods can give useful inferences using a data set for which no plausible physical model is apparent. The plausibility of the inferences is enhanced by the extensive robustness analysis these likelihoods permit.
- Published
- 1999
- Full Text
- View/download PDF
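Because the abstract leans on the Blahut-Arimoto algorithm, here is a minimal sketch of its standard iteration for a finite-alphabet rate-distortion problem; the source distribution, distortion matrix, and slope parameter are illustrative choices, not the paper's.

```python
import numpy as np

# Blahut-Arimoto iteration for the rate-distortion problem on a finite
# alphabet: given source p(x), distortion d(x, y) and slope beta, it
# converges to the conditional density q(y|x) achieving the bound.
p = np.array([0.2, 0.5, 0.3])                 # source distribution (illustrative)
d = 1.0 - np.eye(3)                           # Hamming distortion
beta = 3.0                                    # Lagrange multiplier / slope

q_y = np.full(3, 1 / 3)                       # initial output marginal
for _ in range(200):
    w = q_y * np.exp(-beta * d)               # unnormalized q(y|x), rows indexed by x
    q_y_given_x = w / w.sum(axis=1, keepdims=True)
    q_y = p @ q_y_given_x                     # updated output marginal

print("q(y|x) achieving the rate-distortion bound (rows = x):")
print(np.round(q_y_given_x, 3))
```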
37. An information criterion for likelihood selection
- Author
-
Ao Yuan and Bertrand Clarke
- Subjects
Mathematical optimization ,Conditional probability distribution ,Mutual information ,Library and Information Sciences ,Upper and lower bounds ,Computer Science Applications ,Rate–distortion theory ,Bayes' theorem ,Distortion ,Applied mathematics ,Limit (mathematics) ,Parametric family ,Information Systems ,Mathematics - Abstract
For a given source distribution, we establish properties of the conditional density achieving the rate distortion function lower bound as the distortion parameter varies. In the limit as the distortion tolerated goes to zero, the conditional density achieving the rate distortion function lower bound becomes degenerate in the sense that the channel it defines becomes error-free. As the permitted distortion increases to its limit, the conditional density achieving the rate distortion function lower bound defines a channel which no longer depends on the source distribution. In addition to the data compression motivation, we establish two results, one asymptotic and one nonasymptotic, showing that the conditional densities achieving the rate distortion function lower bound make relatively weak assumptions on the dependence between the source and its representation. This corresponds, in Bayes estimation, to choosing a likelihood which makes relatively weak assumptions on the data generating mechanism if the source is regarded as a prior. Taken together, these results suggest one can use the conditional density obtained from the rate distortion function in data analysis. That is, when it is impossible to identify a "true" parametric family on the basis of physical modeling, our results provide both data compression and channel coding justification for using the conditional density achieving the rate distortion function lower bound as a likelihood.
- Published
- 1999
- Full Text
- View/download PDF
38. Asymptotics of the Expected Posterior
- Author
-
Bertrand Clarke and Dongchu Sun
- Subjects
Statistics and Probability ,Kullback–Leibler divergence ,Expected value ,Square (algebra) ,Term (time) ,Combinatorics ,symbols.namesake ,Exponential family ,Calculus ,symbols ,Fisher information ,Asymptotic expansion ,Confidence and prediction bands ,Mathematics - Abstract
Let X_1, ..., X_n be independently and identically distributed observations from an exponential family p_θ equipped with a smooth prior density w on a real d-dimensional parameter θ. We give conditions under which the expected value of the posterior density evaluated at the true value of the parameter, θ_0, admits an asymptotic expansion in terms of the Fisher information I(θ_0), the prior w, and their first two derivatives. The leading term of the expansion is of the form n^{d/2} c_1(d, θ_0) and the second-order term is of the form n^{d/2−1} c_2(d, θ_0, w), with an error term that is o(n^{d/2−1}). We identify the functions c_1 and c_2 explicitly. A modification of the proof of this expansion gives an analogous result for the expectation of the square of the posterior evaluated at θ_0. As a consequence we can give a confidence band for the expected posterior, and we suggest a frequentist refinement for Bayesian testing.
- Published
- 1999
- Full Text
- View/download PDF
39. Asymptotic normality of the posterior in relative entropy
- Author
-
Bertrand Clarke
- Subjects
Independent and identically distributed random variables ,Kullback–Leibler divergence ,Computational complexity theory ,Asymptotic distribution ,Library and Information Sciences ,Computer Science Applications ,Sample size determination ,Statistics ,Maximum entropy probability distribution ,Entropy (information theory) ,Applied mathematics ,Entropy rate ,Information Systems ,Mathematics - Abstract
We show that the relative entropy between a posterior density formed from a smooth likelihood and prior and a limiting normal form tends to zero in the independent and identically distributed case. The mode of convergence is in probability and in mean. Applications to code lengths in stochastic complexity and to sample size selection are discussed.
- Published
- 1999
- Full Text
- View/download PDF
40. Designing Metabolism: Alternative Connectivities for the Pentose Phosphate Pathway
- Author
-
Alexander Scheeline, Ao Yuan, Bertrand Clarke, and Jay E. Mittenthal
- Subjects
Pharmacology ,General Mathematics ,General Neuroscience ,Immunology ,Carbon skeleton ,Metabolism ,Pentose phosphate pathway ,Biology ,General Biochemistry, Genetics and Molecular Biology ,Set (abstract data type) ,Metabolic pathway ,Computational Theory and Mathematics ,Biochemistry ,General Agricultural and Biological Sciences ,Biological system ,Flux (metabolism) ,General Environmental Science - Abstract
We present a method for generating alternative biochemical pathways between specified compounds. We systematically generated diverse alternatives to the nonoxidative stage of the pentose phosphate pathway, by first finding pathways between 5-carbon and 6-carbon skeletons. Each solution of the equations for the stoichiometric coefficients of skeleton-changing reactions defines a set of networks. Within each set we selected networks with modules; a module is a coupled set of reactions that occurs more than once in a network. The networks can be classified into at least 53 families in at least seven superfamilies, according to the number, the input-output relations, and the internal structure of their modules. We then assigned classes of enzymes to mediate transformations of carbon skeletons and modifications of functional groups. The ensemble of candidate networks was too large to allow complete determination of the optimal network. However, among the networks we studied the real pathway is especially favorable in several respects. It has few steps, uses no reducing or oxidizing compounds, requires only one ATP in one direction of flux, and does not depend on recurrent inputs.
- Published
- 1998
- Full Text
- View/download PDF
41. On the overall sensitivity of the posterior distribution to its inputs
- Author
-
Paul Gustafson and Bertrand Clarke
- Subjects
Statistics and Probability ,Kullback–Leibler divergence ,Posterior predictive distribution ,Applied Mathematics ,Posterior probability ,Prior probability ,Bayesian probability ,Parametric model ,Statistics ,Statistics, Probability and Uncertainty ,Conjugate prior ,Mathematics ,Parametric statistics - Abstract
In a parametric Bayesian analysis, the posterior distribution of the parameter is determined by three inputs: the prior distribution of the parameter, the model distribution of the data given the parameter, and the data themselves. Working in the framework of two particular families of parametric models with conjugate priors, we develop a method for quantifying the local sensitivity of the posterior to simultaneous perturbations of all three inputs. The method uses relative entropy to measure discrepancies between pairs of posterior distributions, model distributions, and prior distributions. It also requires a measure of discrepancy between pairs of data sets. The fundamental sensitivity measure is taken to be the maximum discrepancy between a baseline posterior and a perturbed posterior, given a constraint on the size of the discrepancy between the baseline set of inputs and the perturbed inputs. We also examine the perturbed inputs which attain this maximum sensitivity, to see how influential the prior, model, and data are relative to one another. An empirical study highlights some interesting connections between sensitivity and the extent to which the data conflict with both the prior and the model.
- Published
- 1998
- Full Text
- View/download PDF
42. Prediction in M-complete Problems with Limited Sample Size
- Author
-
Bertrand Clarke, Jennifer Clarke, and Chi Wai Yu
- Subjects
Statistics and Probability ,Bayes' theorem ,Mean squared error ,Sample size determination ,Applied Mathematics ,Model selection ,Bayesian probability ,Statistics ,Range (statistics) ,Weighted median ,Akaike information criterion ,Mathematics - Abstract
We define a new Bayesian predictor called the posterior weighted median (PWM) and compare its performance to several other predictors including the Bayes model average under squared error loss, the Barbieri-Berger median model predictor, the stacking predictor, and the model average predictor based on Akaike’s information criterion. We argue that PWM generally gives better performance than other predictors over a range of M-complete problems. This range is between the M-closed-M-complete boundary and the M-complete-M-open boundary. Indeed, as a problem gets closer to M-open, it seems that M-complete predictive methods begin to break down. Our comparisons rest on extensive simulations and real data examples. As a separate issue, we introduce the concepts of the ‘Bail out effect’ and the ‘Bail in effect’. These occur when a predictor gives not just poor results but defaults to the simplest model (‘bails out’) or to the most complex model (‘bails in’) on the model list. Either can occur in M-complete problems when the complexity of the data generator is too high for the predictor scheme to accommodate.
- Published
- 2013
- Full Text
- View/download PDF
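A hedged sketch of a posterior-weighted-median style prediction: the weighted median of candidate models' point predictions under (approximate) posterior model weights, contrasted with the Bayes model average. The paper's exact definition of the PWM may differ, and the numbers below are made up for illustration.

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: smallest value whose cumulative weight reaches 1/2."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / w.sum()
    return v[np.searchsorted(cum, 0.5)]

# Point predictions from a list of candidate models at one new x,
# together with (approximate) posterior model probabilities.
model_predictions = np.array([2.1, 2.4, 3.0, 5.9])     # illustrative values
posterior_weights = np.array([0.35, 0.30, 0.25, 0.10])

print("posterior weighted median prediction:",
      weighted_median(model_predictions, posterior_weights))
print("Bayes model average (posterior mean) prediction:",
      float(model_predictions @ posterior_weights))
# The weighted median is less sensitive to an outlying model (here 5.9)
# than the squared-error Bayes model average.
```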
43. A Markov Model for the Assembly of Heterochromatic Regions in Position Effect Variegation
- Author
-
Ao Yuan, Bertrand Clarke, T. Grigliatti, V. Lloyd, and I. McKay
- Subjects
Statistics and Probability ,Genetics ,Models, Genetic ,General Immunology and Microbiology ,Markov chain ,Heterochromatin ,Applied Mathematics ,Conditional probability ,Statistical model ,General Medicine ,Position-effect variegation ,Biology ,Markov model ,Markov Chains ,General Biochemistry, Genetics and Molecular Biology ,Chromatin ,Genes, Reporter ,Modeling and Simulation ,Variegation (histology) ,Animals ,General Agricultural and Biological Sciences ,Biological system ,Mathematics - Abstract
Here we give a mathematical model for the assembly of heterochromatic regions at the heterochromatin-euchromatin interface in position effect variegation. This probabilistic model predicts the proportions of cells in which a gene is active in cells with one and two variegating chromosomes. The association of heterochromatic proteins to form remodeled chromatin following DNA replication is mainly described by accumulation independent conditional probabilities. These probabilities are conditional on the boundary of the sites to which the proteins can bind; they give the relative attractiveness of the sites to a protein complex chosen at random from a pool of available complexes. The number of complexes available is assumed to be limited and rates of reaction are implicitly modeled by the conditional probabilities. In general, these conditional probabilities are not known, however, they can be experimentally determined. By comparing double variegation situations to single variegation, this model shows that there may be an effect on the expression of reporter genes located near the interfaces due to different sites competing for heterochromatic proteins. In addition, this model suggests that in some cases the attractiveness of sites may change in the presence of other chemical species. Consequently, the model distinguishes between two sorts of data obtained from competition experiments using position effect variegation. The two sorts of data differ as to whether there is a change in the attractiveness of sites in addition to an effect from different sites competing for the same constituents of heterochromatin. Subject to the fact that some of its parameters are not known precisely, this model replicates data from several experiments and can give predictions in other cases.
- Published
- 1996
- Full Text
- View/download PDF
44. Implications of Reference Priors for Prior Information and for Sample Size
- Author
-
Bertrand Clarke
- Subjects
Statistics and Probability ,Kullback–Leibler divergence ,Sample size determination ,Statistics ,Prior probability ,Posterior probability ,Entropy (information theory) ,Asymptotic distribution ,Minification ,Statistics, Probability and Uncertainty ,Prior information ,Mathematics - Abstract
Here we use posterior densities based on relative entropy reference priors for two purposes. The first purpose is to identify data implicit in the use of informative priors. We represent an informative prior as the posterior from an experiment with a known likelihood and a reference prior. Minimizing the relative entropy distance between this posterior and the informative prior over choices of data results in a data set that can be regarded as representative of the information in the informative prior. The second implication from reference priors is obtained by replacing the informative prior with a class of densities from which one might wish to make inferences. For each density in this class, one can obtain a data set that minimizes a relative entropy. The maximum of these sample sizes as the inferential density varies over its class can be used as a guess as to how much data is required for the desired inferences. We bound this sample size above and below by other techniques that permit it to ...
- Published
- 1996
- Full Text
- View/download PDF
45. Information tradeoff
- Author
-
Bertrand Clarke and Larry Wasserman
- Subjects
Statistics and Probability ,Mathematical optimization ,Kullback–Leibler divergence ,Computer science ,Component (UML) ,Prior probability ,Statistics, Probability and Uncertainty ,Parametric family ,Closed-form expression ,Jeffreys prior ,Term (time) - Abstract
A prior may be noninformative for one parameter at the cost of being informative for another parameter. This leads to the idea of tradeoff priors: priors that give up noninformativity for some parameters to achieve noninformativity for others. We propose a general framework where priors are selected by optimizing a functional with two components. The first component formalizes the requirement that the optimal prior be noninformative for the parameter of interest. The second component is a penalty term that forces the optimizing prior to be close to some target prior. Optimizing such a functional results in a parameterized family of priors from which a specific prior may be selected as the tradeoff prior. An important particular example of such functionals is provided by choosing the first term to be the marginal missing information for the parameter of interest (generalizing Bernardo’s notion of missing information) and the second term to be the relative entropy between the unknown prior and the Jeffreys prior. In this case we find a closed form expression for the tradeoff prior and we make explicit connections with the Berger-Bernardo prior. In particular, we show that under certain conditions, the Berger-Bernardo prior and the Jeffreys prior are special cases of the tradeoff prior. We consider several examples.
- Published
- 1995
- Full Text
- View/download PDF
46. Comment on Article by Sancetta
- Author
-
Bertrand Clarke
- Subjects
Statistics and Probability ,Applied Mathematics - Published
- 2012
- Full Text
- View/download PDF
47. Jeffreys' prior is asymptotically least favorable under entropy risk
- Author
-
Bertrand Clarke and Andrew R. Barron
- Subjects
Statistics and Probability ,Independent and identically distributed random variables ,Kullback–Leibler divergence ,Applied Mathematics ,Principle of maximum entropy ,symbols.namesake ,Bayes' theorem ,Prior probability ,Statistics ,symbols ,Entropy (information theory) ,Applied mathematics ,Statistics, Probability and Uncertainty ,Fisher information ,Jeffreys prior ,Mathematics - Abstract
We provide a rigorous proof that Jeffreys' prior asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter. This was conjectured by Bernardo (1979) and, despite the absence of a proof, forms the basis of the reference prior method in Bayesian statistical analysis. Our proof rests on an examination of large sample decision theoretic properties associated with the relative entropy or the Kullback-Leibler distance between probability density functions for independent and identically distributed random variables. For smooth finite-dimensional parametric families we derive an asymptotic expression for the minimax risk and for the related maximin risk. As a result, we show that, among continuous positive priors, Jeffreys' prior uniquely achieves the asymptotic maximin value. In the discrete parameter case we show that, asymptotically, the Bayes risk reduces to the entropy of the prior so that the reference prior is seen to be the maximum entropy prior. We identify the physical significance of the risks by giving two information-theoretic interpretations in terms of probabilistic coding. AMS Subject Classification: Primary 62C10, 62C20; secondary 62F12, 62F15.
- Published
- 1994
- Full Text
- View/download PDF
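For orientation, a LaTeX sketch of the asymptotic mutual information expression that this result concerns, as it is usually stated for smooth d-dimensional families; regularity conditions are omitted and the precise statement is in the paper.

```latex
% Asymptotic mutual information between the parameter and an i.i.d.
% sample of size n, for a smooth d-dimensional family with Fisher
% information I(theta) and prior w (sketch only; conditions omitted):
\[
  I(\Theta; X^n)
  = \frac{d}{2}\log\frac{n}{2\pi e}
  + \int w(\theta)\,\log\frac{\sqrt{\det I(\theta)}}{w(\theta)}\,d\theta
  + o(1),
\]
% which is maximized over priors by Jeffreys' prior
% w^*(\theta) \propto \sqrt{\det I(\theta)}.
```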
48. Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction*
- Author
-
Bertrand Clarke, Nicholas G. Polson, C. Severinski, and James G. Scott
- Subjects
Mathematical optimization ,Location parameter ,Regularization (physics) ,Prior probability ,Standard normal table ,Lévy process ,Bayesian interpretation of regularization ,Algorithm ,Shrinkage ,Mathematics - Abstract
We study the classic problem of choosing a prior distribution for a location parameter β = (β_1, ..., β_p) as p grows large. First, we study the standard “global-local shrinkage” approach, based on scale mixtures of normals. Two theorems are presented which characterize certain desirable properties of shrinkage priors for sparse problems. Next, we review some recent results showing how Lévy processes can be used to generate infinite-dimensional versions of standard normal scale-mixture priors, along with new priors that have yet to be seriously studied in the literature. This approach provides an intuitive framework both for generating new regularization penalties and shrinkage rules, and for performing asymptotic analysis on existing models.
- Published
- 2011
- Full Text
- View/download PDF
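A LaTeX sketch of the generic "global-local" scale-mixture-of-normals form referred to in the abstract above, with the horseshoe's half-Cauchy scales shown as one standard example; the notation is illustrative.

```latex
% Generic global-local shrinkage prior as a scale mixture of normals:
% a local scale lambda_j per coefficient and one global scale tau.
\[
  \beta_j \mid \lambda_j, \tau \sim N(0, \lambda_j^2 \tau^2),
  \qquad
  \lambda_j \sim \pi(\lambda_j), \qquad \tau \sim \pi(\tau),
  \qquad j = 1, \dots, p.
\]
% Example (horseshoe): \lambda_j \sim C^{+}(0,1) and \tau \sim C^{+}(0,1),
% i.e. half-Cauchy local and global scales.
```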
49. Noninformative Priors and Nuisance Parameters
- Author
-
Bertrand Clarke and Larry Wasserman
- Subjects
Statistics and Probability ,Degenerate energy levels ,Prior probability ,Posterior probability ,Statistics ,Nuisance parameter ,Statistics, Probability and Uncertainty ,Marginal distribution ,Constant (mathematics) ,Term (time) ,Mathematics ,Jeffreys prior - Abstract
We study the conflict between priors that are noninformative for a parameter of interest versus priors that are noninformative for the whole parameter. Our investigation leads us to maximize a functional that has two terms: an asymptotic approximation to a standardized expected Kullback-Leibler distance between the marginal prior and marginal posterior for a parameter of interest, and a penalty term measuring the distance of the prior from the Jeffreys prior. A positive constant multiplying the second term determines the tradeoff between noninformativity for the parameter of interest and noninformativity for the entire parameter. As the constant increases, the prior tends to the Jeffreys prior. When the constant tends to 0, the prior becomes degenerate except in special cases. This prior does not have a closed-form solution, but we present a simple, numerical algorithm for finding the prior. We compare this prior to the Berger-Bernardo prior.
- Published
- 1993
- Full Text
- View/download PDF
50. Prequential Analysis of Complex Data with Adaptive Model Reselection
- Author
-
Jennifer Clarke and Bertrand Clarke
- Subjects
Article - Abstract
In Prequential analysis, an inference method is viewed as a forecasting system, and the quality of the inference method is based on the quality of its predictions. This is an alternative approach to more traditional statistical methods that focus on the inference of parameters of the data generating distribution. In this paper, we introduce adaptive combined average predictors (ACAPs) for the Prequential analysis of complex data. That is, we use convex combinations of two different model averages to form a predictor at each time step in a sequence. A novel feature of our strategy is that the models in each average are re-chosen adaptively at each time step. To assess the complexity of a given data set, we introduce measures of data complexity for continuous response data. We validate our measures in several simulated contexts prior to using them in real data examples. The performance of ACAPs is compared with the performances of predictors based on stacking or likelihood weighted averaging in several model classes and in both simulated and real data sets. Our results suggest that ACAPs achieve a better trade-off between model list bias and model list variability in cases where the data is very complex. This implies that the choices of model class and averaging method should be guided by a concept of complexity matching, i.e. the analysis of a complex data set may require a more complex model class and averaging strategy than the analysis of a simpler data set. We propose that complexity matching is akin to a bias–variance tradeoff in statistical modeling.
- Published
- 2010