18 results on '"Girolami, Mark"'
Search Results
2. Statistical analysis of differential equations: introducing probability measures on numerical solutions.
- Author
-
Conrad, Patrick, Girolami, Mark, Särkkä, Simo, Stuart, Andrew, and Zygalakis, Konstantinos
- Abstract
In this paper, we present a formal quantification of uncertainty induced by numerical solutions of ordinary and partial differential equation models. Numerical solutions of differential equations contain inherent uncertainties due to the finite-dimensional approximation of an unknown and implicitly defined function. When statistically analysing models based on differential equations describing physical, or other naturally occurring, phenomena, it can be important to explicitly account for the uncertainty introduced by the numerical method. Doing so enables objective determination of this source of uncertainty, relative to other uncertainties, such as those caused by data contaminated with noise or model error induced by missing physical or inadequate descriptors. As ever larger scale mathematical models are being used in the sciences, often sacrificing complete resolution of the differential equation on the grids used, formally accounting for the uncertainty in the numerical method is becoming increasingly more important. This paper provides the formal means to incorporate this uncertainty in a statistical model and its subsequent analysis. We show that a wide variety of existing solvers can be randomised, inducing a probability measure over the solutions of such differential equations. These measures exhibit contraction to a Dirac measure around the true unknown solution, where the rates of convergence are consistent with the underlying deterministic numerical method. Furthermore, we employ the method of modified equations to demonstrate enhanced rates of convergence to stochastic perturbations of the original deterministic problem. Ordinary differential equations and elliptic partial differential equations are used to illustrate the approach to quantify uncertainty in both the statistical analysis of the forward and inverse problems. [ABSTRACT FROM AUTHOR]
- Published
- 2017
3. Probabilistic Model Checking of DTMC Models of User Activity Patterns.
- Author
-
Andrei, Oana, Calder, Muffy, Higgs, Matthew, and Girolami, Mark
- Published
- 2014
4. Bayesian Approaches for Mechanistic Ion Channel Modeling.
- Author
-
Calderhead, Ben, Epstein, Michael, Sivilotti, Lucia, and Girolami, Mark
- Published
- 2013
5. Using Higher-Order Dynamic Bayesian Networks to Model Periodic Data from the Circadian Clock of Arabidopsis Thaliana.
- Author
-
Daly, Rónán, Edwards, Kieron D., O'Neill, John S., Aitken, Stuart, Millar, Andrew J., and Girolami, Mark
- Abstract
Modelling gene regulatory networks in organisms is an important task that has recently become possible due to large scale assays using technologies such as microarrays. In this paper, the circadian clock of Arabidopsis thaliana is modelled by fitting dynamic Bayesian networks to luminescence data gathered from experiments. This work differs from previous modelling attempts by using higher-order dynamic Bayesian networks to explicitly model the time lag between the various genes being expressed. In order to achieve this goal, new techniques in preprocessing the data and in evaluating a learned model are proposed. It is shown that it is possible, to some extent, to model these time delays using a higher-order dynamic Bayesian network. [ABSTRACT FROM AUTHOR]
- Published
- 2009
6. Classification of Protein Interaction Sentences via Gaussian Processes.
- Author
-
Polajnar, Tamara, Rogers, Simon, and Girolami, Mark
- Abstract
The increase in the availability of protein interaction studies in textual format coupled with the demand for easier access to the key results has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, the Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework make GPs an appealing alternative worthy of further adoption. [ABSTRACT FROM AUTHOR]
- Published
- 2009
7. Definition of Valid Proteomic Biomarkers: A Bayesian Solution.
- Author
-
Harris, Keith, Girolami, Mark, and Mischak, Harald
- Abstract
Clinical proteomics is suffering from high hopes generated by reports on apparent biomarkers, most of which could not be later substantiated via validation. This has brought into focus the need for improved methods of finding a panel of clearly defined biomarkers. To examine this problem, urinary proteome data was collected from healthy adult males and females, and analysed to find biomarkers that differentiated between genders. We believe that models that incorporate sparsity in terms of variables are desirable for biomarker selection, as proteomics data typically contains a huge number of variables (peptides) and few samples making the selection process potentially unstable. This suggests the application of a two-level hierarchical Bayesian probit regression model for variable selection which assumes a prior that favours sparseness. The classification performance of this method is shown to improve on that of the Probabilistic K-Nearest Neighbour model. [ABSTRACT FROM AUTHOR]
- Published
- 2009
8. Inferring Meta-covariates in Classification.
- Author
-
Harris, Keith, McMillan, Lisa, and Girolami, Mark
- Abstract
This paper develops an alternative method for gene selection that combines model based clustering and binary classification. By averaging the covariates within the clusters obtained from model based clustering, we define "meta-covariates" and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes, aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function which rewards good performance at both classification and clustering. We explore the performance of our methodology on a well known leukaemia dataset and use the Gene Ontology to interpret our results. [ABSTRACT FROM AUTHOR]
- Published
- 2009
9. Semi-supervised Prediction of Protein Interaction Sentences Exploiting Semantically Encoded Metrics.
- Author
-
Polajnar, Tamara and Girolami, Mark
- Abstract
Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process. [ABSTRACT FROM AUTHOR]
- Published
- 2009
10. Class Prediction from Disparate Biological Data Sources Using an Iterative Multi-Kernel Algorithm.
- Author
-
Ying, Yiming, Campbell, Colin, Damoulas, Theodoros, and Girolami, Mark
- Abstract
For many biomedical modelling tasks a number of different types of data may influence predictions made by the model. An established approach to pursuing supervised learning with multiple types of data is to encode these different types of data into separate kernels and use multiple kernel learning. In this paper we propose a simple iterative approach to multiple kernel learning (MKL), focusing on multi-class classification. This approach uses a block L1-regularization term leading to a jointly convex formulation. It solves a standard multi-class classification problem for a single kernel, and then updates the kernel combinatorial coefficients based on mixed RKHS norms. As opposed to other MKL approaches, our iterative approach delivers a largely ignored message that MKL does not require sophisticated optimization methods while keeping competitive training times and accuracy across a variety of problems. We show that the proposed method outperforms state-of-the-art results on an important protein fold prediction dataset and gives competitive performance on a protein subcellular localization task. [ABSTRACT FROM AUTHOR]
- Published
- 2009
11. Population MCMC methods for history matching and uncertainty quantification.
- Author
-
Mohamed, Linah, Calderhead, Ben, Filippone, Maurizio, Christie, Mike, and Girolami, Mark
- Abstract
This paper presents the application of a population Markov Chain Monte Carlo (MCMC) technique to generate history-matched models. The technique has been developed and successfully adopted in challenging domains such as computational biology but has not yet seen application in reservoir modelling. In population MCMC, multiple Markov chains are run on a set of response surfaces that form a bridge from the prior to posterior. These response surfaces are formed from the product of the prior with the likelihood raised to a varying power less than one. The chains exchange positions, with the probability of a swap being governed by a standard Metropolis accept/reject step, which allows for large steps to be taken with high probability. We show results of population MCMC on the IC Fault Model, a simple three-parameter model that is known to have a highly irregular misfit surface and hence be difficult to match. Our results show that population MCMC is able to generate samples from the complex, multi-modal posterior probability distribution of the IC Fault Model very effectively. By comparison, previous results from stochastic sampling algorithms often focus on only part of the region of high posterior probability depending on algorithm settings and starting points. [ABSTRACT FROM AUTHOR]
- Published
- 2012
12. Semi-parametric analysis of multi-rater data.
- Author
-
Rogers, Simon, Girolami, Mark, and Polajnar, Tamara
- Abstract
Datasets that are subjectively labeled by a number of experts are becoming more common in tasks such as biological text annotation where class definitions are necessarily somewhat subjective. Standard classification and regression models are not suited to multiple labels and typically a pre-processing step (normally assigning the majority class) is performed. We propose Bayesian models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive distributions. The models make use of Gaussian process priors, resulting in great flexibility and particular suitability to text based problems where the number of covariates can be far greater than the number of data instances. We show that using all labels rather than just the majority improves performance on a recent biological dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2010
13. Infinite factorization of multiple non-parametric views.
- Author
-
Rogers, Simon, Klami, Arto, Sinkkonen, Janne, Girolami, Mark, and Kaski, Samuel
- Subjects
FACTORIZATION, MULTIVARIATE analysis, DIRICHLET principle, UNIVERSAL algebra, CONTINGENCY tables, PROTEINS
- Abstract
Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale to the generative and non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the sources. The commonalities between the sources are modeled by an infinite component model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. We discover complex relationships between the marginals (that are multimodal in both marginals) that would remain undetected by simpler models. Cluster analysis of co-expression is a standard method of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation. [ABSTRACT FROM AUTHOR]
- Published
- 2010
14. Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains.
- Author
-
Girolami, Mark and Kabán, Ata
- Subjects
MARKOV processes, ALGORITHMS, STOCHASTIC processes, DIRICHLET forms, MATHEMATICAL forms
- Abstract
To provide a parsimonious generative representation of the sequential activity of a number of individuals within a population there is a necessary tradeoff between the definition of individual specific and global representations. A linear-time algorithm is proposed that defines a distributed predictive model for finite state symbolic sequences which represent the traces of the activity of a number of individuals within a group. The algorithm is based on a straightforward generalization of latent Dirichlet allocation to time-invariant Markov chains of arbitrary order. The modelling assumption made is that the possibly heterogeneous behavior of individuals may be represented by a relatively small number of simple and common behavioral traits which may interleave randomly according to an individual-specific distribution. The results of an empirical study on three different application domains indicate that this modelling approach provides an efficient low-complexity and intuitively interpretable representation scheme which is reflected by improved prediction performance over comparable models. [ABSTRACT FROM AUTHOR]
- Published
- 2005
15. Kernel PCA for Feature Extraction and De-Noising in Nonlinear Regression.
- Author
-
Rosipal, Roman, Girolami, Mark, Trejo, Leonard J., and Cichocki, Andrzej
- Subjects
NONLINEAR systems, REGRESSION analysis, TIME series analysis
- Abstract
In this paper, we propose the application of the Kernel Principal Component Analysis (PCA) technique for feature selection in a high-dimensional feature space, where input variables are mapped by a Gaussian kernel. The extracted features are employed in the regression problems of chaotic Mackey–Glass time-series prediction in a noisy environment and estimating human signal detection performance from brain event-related potentials elicited by task relevant signals. We compared results obtained using either Kernel PCA or linear PCA as data preprocessing steps. On the human signal detection task, we report the superiority of Kernel PCA feature extraction over linear PCA. Similar to linear PCA, we demonstrate de-noising of the original data by the appropriate selection of various nonlinear principal components. The theoretical relation and experimental comparison of Kernel Principal Components Regression, Kernel Ridge Regression and ε-insensitive Support Vector Regression are also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2001
16. The Latent Variable Data Model for Exploratory Data Analysis and Visualisation: A Generalisation of the Nonlinear Infomax Algorithm.
- Author
-
Girolami, Mark
- Abstract
This paper presents a generalisation of the nonlinear 'Infomax' algorithm based on the linear latent variable model of factor analysis. The algorithm is based on an information theoretic index for projection pursuit which defines linear projections of observed data onto subspaces of lower dimension. This is applied to the visualisation and interpretation of complex high dimensional data and is empirically compared with the recently developed Generative Topographic Mapping. [ABSTRACT FROM AUTHOR]
- Published
- 1998
17. A temporal model of linear anti-Hebbian learning.
- Author
-
Girolami, Mark and Fyfe, Colin
- Abstract
A temporal variant of Foldiak's first model with lateral inhibitory synaptic weights is proposed. The usual symmetric scalar values of the lateral weights are replaced with data driven asymmetric memory based lateral weights, which take the form of Finite Impulse Response (FIR) coefficients. Linear anti-Hebbian learning, as defined by Foldiak (IEEE/INNS International Joint Conference on Neural Networks, 1989) and Matsuoka et al. (Neural Networks, Vol. 8, pp. 411-419, 1995), is employed in the self-organisation of the network weights. The temporal anti-Hebbian learning, when applied to the separation of convolved mixtures of signals, causes the network weights to converge to the truncated FIR filter coefficients of the unmixing transfer function and so recover the original signals. Simulation results are presented for separating two natural speech sources convolved and mixed by a priori unknown direct and cross-coupled transfer functions. We compare temporal anti-Hebbian learning with information maximisation learning when applied to the blind separation of convolved sources. [ABSTRACT FROM AUTHOR]
- Published
- 1996
18. Analysis of free text in electronic health records for identification of cancer patient trajectories.
- Author
-
Jensen, Kasper, Soguero-Ruiz, Cristina, Oyvind Mikalsen, Karl, Lindsetmo, Rolv-Ole, Kouskoumvekaki, Irene, Girolami, Mark, Olav Skrovseth, Stein, and Magne Augestad, Knut
- Abstract
With an aging patient population and increasing complexity in patient disease trajectories, physicians are often met with complex patient histories from which clinical decisions must be made. Due to the increasing rate of adverse events and hospitals facing financial penalties for readmission, there has never been a greater need to enforce evidence-led medical decision-making using available health care data. In the present work, we studied a cohort of 7,741 patients, of whom 4,080 were diagnosed with cancer, surgically treated at a University Hospital in the years 2004-2012. We have developed a methodology that allows disease trajectories of the cancer patients to be estimated from free text in electronic health records (EHRs). By using these disease trajectories, we predict 80% of patient events ahead in time. By control of confounders from 8,326 quantified events, we identified 557 events that constitute high subsequent risks (risk > 20%), including six events for cancer and seven events for metastasis. We believe that the presented methodology and findings could be used to improve clinical decision support and personalize trajectories, thereby decreasing adverse events and optimizing cancer treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2017