165 results for "Oberhauser, Harald"
Search Results
2. Learning to Forget: Bayesian Time Series Forecasting using Recurrent Sparse Spectrum Signature Gaussian Processes
- Author
- Tóth, Csaba, Adachi, Masaki, Osborne, Michael A., and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning such as covariance functions for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features, develop Random Fourier Decayed Signature Features (RFDSF), and combine them with Gaussian processes (GPs). The result is a scalable Bayesian time series forecasting algorithm with variational inference that processes and transforms a time series into a joint predictive distribution over time steps in a single recurrent pass; for example, it processes a sequence of $10^4$ steps in $\approx 10^{-2}$ seconds and in $< 1\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.
- Published
- 2024
3. A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Hamid, Saad, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility in dealing with discrete and continuous variables simultaneously, model misspecification, and, lastly, fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) versatility in downstream tasks under a unified approach; (2) a gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean spaces); (3) flexibility in the domain prior distribution; (4) adaptive batch size (autonomous determination of the optimal batch size); (5) robustness against a misspecified reproducing kernel Hilbert space; (6) a natural stopping criterion.
- Comment: This work is the journal extension of the workshop paper (arXiv:2301.11832) and AISTATS paper (arXiv:2306.05843). 48 pages, 11 figures
- Published
- 2024
4. Random Fourier Signature Features
- Author
- Toth, Csaba, Oberhauser, Harald, and Szabo, Zoltan
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Methodology
- Abstract
Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, the signature kernel, which comes with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets of up to a million time series.
- Published
- 2023
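As context for record 4: the random Fourier feature idea that the paper lifts from vectors to sequences can be sketched in its classical Euclidean form, via Bochner's theorem for a Gaussian kernel. This is a minimal illustration under our own naming, not the paper's sequence-level construction.

```python
import numpy as np

def random_fourier_features(X, D, lengthscale=1.0, seed=0):
    """Map X (n, d) to D random features whose inner products approximate
    the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    Frequencies are drawn from the kernel's spectral density (Bochner)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / lengthscale, size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2 * np.pi, size=D)               # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The Monte Carlo approximation error decays like 1/sqrt(D):
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Phi = random_fourier_features(X, D=5000)
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Phi @ Phi.T - K_exact).max())  # small, shrinks as D grows
```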
5. Random Surfaces and Higher Algebra
- Author
- Lee, Darrick and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Algebraic Topology, Mathematics - Category Theory, Mathematics - Differential Geometry
- Abstract
The space of surfaces $\mathbf{X}: [0,1]^2 \to \mathbb{R}^d$ is equipped with natural horizontal and vertical concatenation operations, and we study representations of surfaces which preserve this algebraic structure. The framework of higher category theory allows us to formalize this notion in terms of a functor between double groupoids. We construct such functors by generalizing the classical concept of the path-ordered exponential as a nonabelian path integral. This generalization yields a notion of nonabelian surface integration, and is known in higher gauge theory as surface holonomy (or higher parallel transport). We extend these constructions from smooth surfaces to bounded controlled $p$-variation surfaces in the Young regime, with $p < 2$. By focusing on surface holonomy valued in higher generalizations of the classical matrix groups, called matrix double groups, we obtain explicit computations for piecewise linear approximations. Furthermore, we generalize the nonabelian Stokes' and Fubini theorems to the Young regime. Finally, we use these matrix surface holonomy functors to construct characteristic functions for random surfaces such as fractional Brownian sheets with Hurst parameter $h > \frac{1}{2}$.
- Comment: 102 pages (56 pages main text) - minor changes (rewritten introduction and improved presentation)
- Published
- 2023
6. HADES: Fast Singularity Detection with Local Measure Comparison
- Author
- Lim, Uzu, Oberhauser, Harald, and Nanda, Vidit
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Algebraic Topology, Mathematics - Differential Geometry, Mathematics - Statistics Theory, 55N31, 32S50
- Abstract
We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scalable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data.
- Published
- 2023
7. Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Wan, Xingchen, Nguyen, Vu, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes. By framing batch selection as a quadrature task, our integration-error-aware algorithm facilitates the automatic tuning of batch sizes to meet predefined quadrature precision objectives, akin to how typical optimizers terminate based on convergence thresholds. This approach obviates the necessity for exhaustive searches across all potential batch sizes. We also extend this to scenarios with constrained active learning and constrained optimization, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction. Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications.
- Comment: Accepted at AISTATS 2024. 33 pages, 6 figures
- Published
- 2023
8. The Signature Kernel
- Author
- Lee, Darrick and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
The signature kernel is a positive definite kernel for sequential data. It inherits theoretical guarantees from stochastic analysis, has efficient algorithms for computation, and shows strong empirical performance. In this short survey paper for a forthcoming Springer handbook, we give an elementary introduction to the signature kernel and highlight these theoretical and computational properties.
- Comment: 31 pages, 2 figures
- Published
- 2023
9. Kernelized Cumulants: Beyond Kernel Mean Embeddings
- Author
- Bonnier, Patric, Oberhauser, Harald, and Szabó, Zoltán
- Subjects
- Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
- Abstract
In $\mathbb{R}^d$, it is well-known that cumulants provide an alternative to moments that can achieve the same goals with numerous benefits such as lower variance estimators. In this paper we extend cumulants to reproducing kernel Hilbert spaces (RKHS) using tools from tensor algebras and show that they are computationally tractable by a kernel trick. These kernelized cumulants provide a new set of all-purpose statistics; the classical maximum mean discrepancy and Hilbert-Schmidt independence criterion arise as the degree one objects in our general construction. We argue both theoretically and empirically (on synthetic, environmental, and traffic data analysis) that going beyond degree one has several advantages and can be achieved with the same computational complexity and minimal overhead in our experiments.
- Comment: 19 pages, 8 figures
- Published
- 2023
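The degree-one object recovered by record 9 is the classical (squared) maximum mean discrepancy. For orientation, a standard quadratic-time V-statistic estimator with a Gaussian kernel, a textbook baseline rather than the paper's higher-degree statistics, looks like this:

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of MMD^2 with kernel exp(-gamma ||.||^2):
    MMD^2 = E k(x, x') - 2 E k(x, y) + E k(y, y'), with each expectation
    replaced by the empirical average over the samples X and Y."""
    def k(A, B):
        return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))
    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = rng.normal(loc=0.5, size=(100, 2))   # shifted distribution
Z = rng.normal(size=(100, 2))            # same distribution as X
print(mmd2_biased(X, Y), mmd2_biased(X, Z))  # first value is clearly larger
```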
10. SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Hamid, Saad, Jørgensen, Martin, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
- Comment: 34 pages, 12 figures
- Published
- 2023
11. Sampling-based Nyström Approximation and Kernel Quadrature
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Numerical Analysis, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations.
- Comment: 22 pages, ICML 2023 camera-ready version. Typos fixed
- Published
- 2023
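The conventional scheme analysed in record 11 admits a very short sketch: sample m landmark points i.i.d., then approximate the full Gram matrix through the landmark block. This is the baseline the paper improves on, with illustrative names of ours.

```python
import numpy as np

def nystrom(K_nm, K_mm, jitter=1e-10):
    """Rank-m Nystrom approximation K ~ K_nm @ pinv(K_mm) @ K_nm.T, where
    K_nm holds kernel values between all n points and the m landmarks and
    K_mm is the landmark Gram matrix."""
    m = K_mm.shape[0]
    return K_nm @ np.linalg.pinv(K_mm + jitter * np.eye(m)) @ K_nm.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
idx = rng.choice(200, size=20, replace=False)     # i.i.d. landmark sampling
K_hat = nystrom(K[:, idx], K[np.ix_(idx, idx)])
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # small relative error
```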
12. Hypercontractivity Meets Random Convex Hulls: Analysis of Randomized Multivariate Cubatures
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Probability, Mathematics - Numerical Analysis
- Abstract
Given a probability measure $\mu$ on a set $\mathcal{X}$ and a vector-valued function $\varphi$, a common problem is to construct a discrete probability measure on $\mathcal{X}$ such that the push-forward of these two probability measures under $\varphi$ is the same. This construction is at the heart of numerical integration methods that run under various names such as quadrature, cubature, or recombination. A natural approach is to sample points from $\mu$ until the convex hull of their image under $\varphi$ contains the mean of $\varphi$. Here we analyze the computational complexity of this approach when $\varphi$ exhibits a graded structure by using so-called hypercontractivity. The resulting theorem not only covers the classical cubature case of multivariate polynomials, but also integration on pathspace, as well as kernel quadrature for product measures.
- Comment: 20 pages
- Published
- 2022
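The sampling procedure analysed in record 12 rests on one computational primitive: testing whether the mean of $\varphi$ lies in the convex hull of the sampled images. That test is a linear-programming feasibility problem; a minimal sketch, with an illustrative graded $\varphi$ of our choosing:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, target):
    """Is `target` in the convex hull of the rows of `points`? Feasibility
    LP: find w >= 0 with sum(w) = 1 and points.T @ w = target (linprog's
    default bounds already enforce w >= 0)."""
    n = len(points)
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.append(target, 1.0)
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, method="highs").success

# Sample from mu = N(0, I_2) until conv{phi(x_1), ..., phi(x_n)} contains
# E[phi]; phi below is an illustrative graded choice (degrees 1, 1, 2).
rng = np.random.default_rng(0)
phi = lambda x: np.array([x[0], x[1], x[0] * x[1]])
mean_phi = np.zeros(3)  # E[phi(X)] vanishes for independent standard normals
pts = [phi(rng.normal(size=2)) for _ in range(4)]
while not in_convex_hull(np.array(pts), mean_phi):
    pts.append(phi(rng.normal(size=2)))
print(f"mean captured after {len(pts)} samples")
```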
13. A topological approach to mapping space signatures
- Author
- Giusti, Chad, Lee, Darrick, Nanda, Vidit, and Oberhauser, Harald
- Published
- 2025
14. Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses an empirically exponential convergence rate. Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and Nested Sampling in various real-world datasets, including lithium-ion battery analytics.
- Comment: 38 pages, 6 figures
- Published
- 2022
15. Capturing Graphs with Hypo-Elliptic Diffusions
- Author
- Toth, Csaba, Lee, Darrick, Hacker, Celia, and Oberhauser, Harald
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability
- Abstract
Convolutional layers within graph neural networks operate by aggregating information about local neighbourhood structures; one common way to encode such substructures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypo-elliptic diffusions. This results in a novel tensor-valued graph operator, which we call the hypo-elliptic graph Laplacian. We provide theoretical guarantees and efficient low-rank approximation algorithms. In particular, this gives a structured approach to capture long-range dependencies on graphs that is robust to pooling. Besides the attractive theoretical properties, our experiments show that this method competes with graph transformers on datasets requiring long-range reasoning but scales only linearly in the number of edges, as opposed to quadratically in the number of nodes.
- Comment: 30 pages
- Published
- 2022
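Record 15 lifts the scalar random-walk diffusion to a tensor-valued, hypo-elliptic one. The scalar baseline it starts from, evolving random-walk distributions with the transition matrix $P = D^{-1}A$ (equivalently, the random-walk Laplacian $L = I - P$), takes a few lines; the lifted operator itself is not shown here.

```python
import numpy as np

def walk_distributions(A, steps):
    """Distributions of a simple random walk on a graph with adjacency
    matrix A: row i of the k-th output is the k-step distribution of a
    walk started at node i, i.e. a solution of the discrete diffusion
    equation driven by the random-walk graph Laplacian L = I - P."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    dist, out = np.eye(A.shape[0]), []
    for _ in range(steps):
        dist = dist @ P
        out.append(dist.copy())
    return out

A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
print(walk_distributions(A, 3)[-1][0])  # 3-step law from node 0 of a 4-cycle
```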
16. A Topological Approach to Mapping Space Signatures
- Author
- Giusti, Chad, Lee, Darrick, Nanda, Vidit, and Oberhauser, Harald
- Subjects
- Mathematics - Functional Analysis, Mathematics - Algebraic Topology, Mathematics - Probability, 60L10, 55U15, 28C20
- Abstract
A common approach for describing classes of functions and probability measures on a topological space $\mathcal{X}$ is to construct a suitable map $\Phi$ from $\mathcal{X}$ into a vector space, where linear methods can be applied to address both problems. The case where $\mathcal{X}$ is a space of paths $[0,1] \to \mathbb{R}^n$ and $\Phi$ is the path signature map has received much attention in stochastic analysis and related fields. In this article we develop a generalized $\Phi$ for the case where $\mathcal{X}$ is a space of maps $[0,1]^d \to \mathbb{R}^n$ for any $d \in \mathbb{N}$, and show that the map $\Phi$ generalizes many of the desirable algebraic and analytic properties of the path signature to $d \ge 2$. The key ingredient to our approach is topological; in particular, our starting point is a generalisation of K-T Chen's path space cochain construction to the setting of cubical mapping spaces.
- Comment: 58 pages, comments welcome!
- Published
- 2022
17. Proper Scoring Rules, Gradients, Divergences, and Entropies for Paths and Time Series
- Author
- Bonnier, Patric and Oberhauser, Harald
- Subjects
- Statistics - Methodology, Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
Many forecasts are not point predictions but concern the evolution of quantities. For example, a central bank might predict the interest rates during the next quarter, an epidemiologist might predict trajectories of infection rates, a clinician might predict the behaviour of medical markers over the next day, and so on. The situation is further complicated since these forecasts sometimes only concern the approximate "shape of the future evolution" or "order of events". Formally, such forecasts can be seen as probability measures on spaces of equivalence classes of paths modulo time-parametrization. We leverage the statistical framework of proper scoring rules together with classical mathematical results to derive a principled approach to decision making with such forecasts. In particular, we introduce notions of gradients, entropy, and divergence that are tailor-made to respect the underlying non-Euclidean structure.
- Published
- 2021
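A worked classical instance of the calculus that record 17 transports to path space: for the logarithmic score, the associated entropy is Shannon entropy and the associated divergence is Kullback-Leibler, with propriety exactly the statement that the divergence is non-negative.

```latex
\[
  S(P, y) = -\log p(y), \qquad
  e(Q) := \mathbb{E}_{Y \sim Q}\,[S(Q, Y)] = H(Q),
\]
\[
  d(P, Q) := \mathbb{E}_{Y \sim Q}\,[S(P, Y)] - e(Q)
           = \mathrm{KL}(Q \,\|\, P) \;\ge\; 0,
\]
% with equality iff P = Q; the paper builds the analogous triple
% (score, entropy, divergence) for laws on unparametrized path space.
```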
18. Markov Chain Approximations to Stochastic Differential Equations by Recombination on Lattice Trees
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Mathematics - Probability
- Abstract
We revisit the classical problem of approximating a stochastic differential equation by a discrete-time and discrete-space Markov chain. Our construction iterates Carathéodory's theorem over time to match the moments of the increments locally. This allows us to construct a Markov chain with a sparse transition matrix where the number of attainable states grows at most polynomially as time increases. Moreover, the Markov chain evolves on a tree whose nodes lie on a "universal lattice", in the sense that an arbitrary number of different SDEs can be approximated on the same tree. The construction is not tailored to specific models; we discuss both the univariate and the multivariate case, and provide an implementation and numerical experiments.
- Published
- 2021
19. Tangent Space and Dimension Estimation with the Wasserstein Distance
- Author
- Lim, Uzu, Oberhauser, Harald, and Nanda, Vidit
- Subjects
- Mathematics - Statistics Theory, Computer Science - Machine Learning
- Abstract
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniform data distributions in which the noise may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bound are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound to quantify the nonlinearity of the underlying manifold and the non-uniformity of the probability measure.
- Comment: Main theorems rewritten. Introduction is written more compactly
- Published
- 2021
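The estimator analysed in record 19 is easy to state concretely: take the k nearest neighbours of a query point, centre them, and read dimension and tangent space off the spectrum. A minimal sketch with our own parameter choices, not the paper's calibrated thresholds:

```python
import numpy as np

def local_dimension(data, query, k=50, threshold=0.9):
    """Local PCA: the estimated dimension is the number of principal
    components explaining a `threshold` fraction of the variance of the
    k nearest neighbours; the leading components span the tangent space."""
    nbrs = data[np.argsort(((data - query) ** 2).sum(axis=1))[:k]]
    evals = np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False) ** 2
    ratio = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(ratio, threshold) + 1)

# Noisy samples near the unit 2-sphere in R^3 should report dimension 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X += 0.01 * rng.normal(size=X.shape)
print(local_dimension(X, X[0]))  # typically 2
```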
20. Positively Weighted Kernel Quadrature via Subsampling
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Numerical Analysis, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This yields effective algorithms that construct convex quadrature rules with small worst-case error, using only access to i.i.d. samples from the underlying measure and evaluations of the kernel. In addition to our theoretical results and the benefits resulting from convex weights, our experiments indicate that this construction can compete with the optimal bounds in well-known examples.
- Comment: 29 pages, NeurIPS 2022 camera-ready version
- Published
- 2021
21. Grid-Free Computation of Probabilistic Safety with Malliavin Calculus
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Mathematics - Probability, Computer Science - Computational Engineering, Finance, and Science
- Abstract
This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDEs). It presents a new approach to computing probabilistic safety regions, namely sets of initial conditions of the SDE associated with trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at the border of the probabilistic safety region and then solves an optimisation problem, using techniques from Malliavin calculus, to compute this region. Unlike existing results in the literature, the new approach allows one to compute probabilistic safety regions without gridding the state space of the SDE.
- Published
- 2021
22. Neural SDEs as Infinite-Dimensional GANs
- Author
- Kidger, Patrick, Foster, James, Li, Xuechen, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Computer Science - Machine Learning
- Abstract
Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing Neural SDEs has sought to solve. Here, we show that the classical approach to fitting SDEs may be viewed as a special case of (Wasserstein) GANs, and in doing so the neural and classical regimes may be brought together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising a discriminator as a Neural Controlled Differential Equation (CDE), we obtain Neural SDEs as (in modern machine learning parlance) continuous-time generative time series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, and since the Wasserstein loss has a unique global minimum, in the infinite data limit any SDE may be learnt. Example code has been made available as part of the \texttt{torchsde} repository.
- Comment: Published at ICML 2021
- Published
- 2021
23. Nonlinear Independent Component Analysis for Discrete-Time and Continuous-Time Signals
- Author
- Schell, Alexander and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability, 62H25, 62M99, 62H05, 60L10, 62M45, 62R10
- Abstract
We study the classical problem of recovering a multidimensional source signal from observations of nonlinear mixtures of this signal. We show that this recovery is possible (up to a permutation and monotone scaling of the source's original component signals) if the mixture is due to a sufficiently differentiable and invertible but otherwise arbitrarily nonlinear function and the component signals of the source are statistically independent with 'non-degenerate' second-order statistics. The latter assumption requires the source signal to meet one of three regularity conditions which essentially ensure that the source is sufficiently far away from the non-recoverable extremes of being deterministic or constant in time. These assumptions, which cover many popular time series models and stochastic processes, allow us to reformulate the initial problem of nonlinear blind source separation as a simple-to-state problem of optimisation-based function approximation. We propose to solve this approximation problem by minimizing a novel type of objective function that efficiently quantifies the mutual statistical dependence between multiple stochastic processes via cumulant-like statistics. This yields a scalable and direct new method for nonlinear Independent Component Analysis with widely applicable theoretical guarantees and for which our experiments indicate good performance.
- Comment: 89 pages, 10 figures; thoroughly revised presentation (including newly added Sections 2, 10, A.19). To appear in the Annals of Statistics
- Published
- 2021
24. Estimating the probability that a given vector is in the convex hull of a random sample
- Author
- Hayakawa, Satoshi, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
For a $d$-dimensional random vector $X$, let $p_{n, X}(\theta)$ be the probability that the convex hull of $n$ independent copies of $X$ contains a given point $\theta$. We provide several sharp inequalities regarding $p_{n, X}(\theta)$ and $N_X(\theta)$ denoting the smallest $n$ for which $p_{n, X}(\theta)\ge1/2$. As a main result, we derive the totally general inequality $1/2 \le \alpha_X(\theta)N_X(\theta)\le 3d + 1$, where $\alpha_X(\theta)$ (a.k.a. the Tukey depth) is the minimum probability that $X$ is in a fixed closed halfspace containing the point $\theta$. We also show several applications of our general results: one is a moment-based bound on $N_X(\mathbb{E}[X])$, which is an important quantity in randomized approaches to cubature construction and the measure reduction problem. Another application is the determination of the canonical convex body included in a random convex polytope given by independent copies of $X$, where our combinatorial approach allows us to generalize existing results from the random matrix community significantly.
- Comment: 34 pages
- Published
- 2021
25. The shifted ODE method for underdamped Langevin MCMC
- Author
- Foster, James, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Numerical Analysis, Mathematics - Probability, Mathematics - Statistics Theory, 60J22, 60L90, 65C30
- Abstract
In this paper, we consider the underdamped Langevin diffusion (ULD) and propose a numerical approximation using its associated ordinary differential equation (ODE). When used as a Markov Chain Monte Carlo (MCMC) algorithm, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{1}{3}}/\varepsilon^{\frac{2}{3}}\big)$ steps under the standard smoothness and strong convexity assumptions on the target distribution. This matches the complexity of the randomized midpoint method proposed by Shen and Lee [NeurIPS 2019] which was shown to be order optimal by Cao, Lu and Wang. However, the main feature of the proposed numerical method is that it can utilize additional smoothness of the target log-density $f$. More concretely, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{2}{5}}/\varepsilon^{\frac{2}{5}}\big)$ and $\mathcal{O}\big(\sqrt{d}/\varepsilon^{\frac{1}{3}}\big)$ steps when Lipschitz continuity is assumed for the Hessian and third derivative of $f$. By discretizing this ODE using a third order Runge-Kutta method, we can obtain a practical MCMC method that uses just two additional gradient evaluations per step. In our experiment, where the target comes from a logistic regression, this method shows faster convergence compared to other unadjusted Langevin MCMC algorithms.
- Published
- 2021
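For contrast with record 25's higher-order shifted ODE discretisation, the underdamped Langevin diffusion itself and its plain Euler-Maruyama baseline are easy to write down. This sketch is the crude reference scheme, not the paper's method.

```python
import numpy as np

def uld_euler(grad_f, x0, v0, gamma=2.0, h=0.05, n_steps=5000, seed=0):
    """Euler-Maruyama steps for the underdamped Langevin diffusion
        dx = v dt,   dv = -grad_f(x) dt - gamma v dt + sqrt(2 gamma) dW,
    whose x-marginal targets the density proportional to exp(-f(x))."""
    rng = np.random.default_rng(seed)
    x, v = np.array(x0, float), np.array(v0, float)
    samples = []
    for _ in range(n_steps):
        x = x + h * v
        v = (v - h * (grad_f(x) + gamma * v)
             + np.sqrt(2 * gamma * h) * rng.normal(size=x.shape))
        samples.append(x.copy())
    return np.array(samples)

# Target N(0, I_2): f(x) = ||x||^2 / 2, so grad_f(x) = x.
traj = uld_euler(lambda x: x, np.zeros(2), np.zeros(2))
print(traj[1000:].mean(0), traj[1000:].var(0))  # near 0 and near 1 (O(h) bias)
```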
26. Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections
- Author
- Toth, Csaba, Bonnier, Patric, and Oberhauser, Harald
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object -- the tensor algebra -- to capture such dependencies. To address the innate computational complexity of high degree tensors, we use compositions of low-rank tensor projections. This yields modular and scalable building blocks for neural networks that give state-of-the-art performance on standard benchmarks such as multivariate time series classification and generative models for video.
- Comment: 37 pages, 6 figures, 8 tables
- Published
- 2020
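The computational core of record 26 is that a rank-1, degree-m functional of the tensor-algebra lift of a sequence can be evaluated with cumulative sums instead of explicit degree-m tensors. A minimal sketch of that one primitive; the full model stacks and learns many such projections.

```python
import numpy as np

def low_rank_seq_feature(x, us):
    """Evaluate f(x) = sum over i1 < ... < im of the product
    <u1, x_{i1}> * ... * <um, x_{im}> for a sequence x of shape (T, d),
    in O(T m d) via an exclusive cumulative-sum recursion."""
    r = x @ us[0]                                 # level 1: <u1, x_i>
    for u in us[1:]:
        prefix = np.concatenate([[0.0], np.cumsum(r)[:-1]])  # sum over j < i
        r = (x @ u) * prefix
    return r.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))                 # length-100 sequence in R^4
us = [rng.normal(size=4) for _ in range(3)]   # one rank-1 degree-3 functional
print(low_rank_seq_feature(x, us))
```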
27. Carathéodory Sampling for Stochastic Gradient Descent
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability, Statistics - Machine Learning
- Abstract
Many problems require optimizing empirical risk functions over large data sets. Gradient descent methods that calculate the full gradient in every descent step do not scale to such datasets. Various flavours of Stochastic Gradient Descent (SGD) replace the expensive summation that computes the full gradient by approximating it with a small sum over a randomly selected subsample of the data set, which in turn suffers from high variance. We present a different approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction. These results allow one to replace an empirical measure with another, carefully constructed probability measure that has much smaller support but preserves certain statistics such as the expected gradient. To turn this into scalable algorithms, we first adaptively select the descent steps where the measure reduction is carried out; second, we combine this with Block Coordinate Descent so that the measure reduction can be done very cheaply. This makes the resulting methods scalable to high-dimensional spaces. Finally, we provide an experimental validation and comparison.
- Published
- 2020
28. A Randomized Algorithm to Reduce the Support of Discrete Measures
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability, Statistics - Machine Learning
- Abstract
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $N \gg n$ this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling". We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N \gg n$ regime. A Python implementation is available at \url{https://github.com/FraCose/Recombination_Random_Algos}.
- Published
- 2020
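The existence statement opening record 28 is constructive: walking along null directions of the moment matrix removes atoms one at a time while all prescribed integrals stay fixed. A minimal dense-linear-algebra sketch of this recombination step; the paper's contribution is a faster randomized variant, not this naive loop.

```python
import numpy as np

def reduce_support(atoms, weights, funcs, tol=1e-12):
    """Tchakaloff/Caratheodory reduction: return new weights supported on
    at most len(funcs) + 1 of the original atoms with the same integrals
    of every f in funcs (and the same total mass)."""
    w = weights.astype(float).copy()
    while True:
        idx = np.where(w > tol)[0]
        if len(idx) <= len(funcs) + 1:
            return w
        # Moment matrix: one row per test function, plus the mass row.
        M = np.array([[f(atoms[i]) for i in idx] for f in funcs]
                     + [[1.0] * len(idx)])
        v = np.linalg.svd(M)[2][-1]          # null direction, M @ v ~ 0
        if not np.any(v > 0):
            v = -v                           # mass row forces both signs
        t = np.min(w[idx][v > 0] / v[v > 0])
        w[idx] = w[idx] - t * v              # zeroes >= 1 atom, keeps w >= 0
        w[w < tol] = 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
w = np.full(1000, 1e-3)                      # uniform measure on 1000 atoms
w2 = reduce_support(x, w, [lambda t: t, lambda t: t * t])
print((w2 > 0).sum(), x @ w2 - x @ w, x**2 @ w2 - x**2 @ w)  # 3 atoms, ~0, ~0
```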
29. Adapted Topologies and Higher Rank Signatures
- Author
- Bonnier, Patric, Liu, Chong, and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
The topology of weak convergence does not account for the growth of information over time that is captured in the filtration of an adapted stochastic process. For example, two adapted stochastic processes can have very similar laws but give completely different results in applications such as optimal stopping, queuing theory, or stochastic programming. To address such discontinuities, Aldous introduced the extended weak topology, and subsequently Hoover and Keisler showed that both the weak topology and the extended weak topology are just the first two topologies in a sequence of topologies that get increasingly finer. We use higher rank expected signatures to embed adapted processes into graded linear spaces and show that these embeddings induce the adapted topologies of Hoover--Keisler.
- Comment: 39 pages, 6 figures
- Published
- 2020
30. Signature Cumulants, Ordered Partitions, and Independence of Stochastic Processes
- Author
- Bonnier, Patric and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
The sequence of so-called signature moments describes the laws of many stochastic processes in analogy with how the sequence of moments describes the laws of vector-valued random variables. However, even for vector-valued random variables, the sequence of cumulants is much better suited for many tasks than the sequence of moments. This motivates us to study so-called signature cumulants. To do so, we develop an elementary combinatorial approach and show that in the same way that cumulants relate to the lattice of partitions, signature cumulants relate to the lattice of so-called "ordered partitions". We use this to give a new characterisation of independence of multivariate stochastic processes; finally we construct a family of unbiased minimum-variance estimators of signature cumulants.
- Comment: 31 pages, 2 figures
- Published
- 2019
31. Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances
- Author
- Toth, Csaba and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability
- Abstract
We develop a Bayesian approach to learning from sequential data by using Gaussian processes (GPs) with so-called signature kernels as covariance functions. This makes sequences of different lengths comparable and allows us to rely on strong theoretical results from stochastic analysis. Signatures capture sequential structure with tensors that can scale unfavourably in sequence length and state space dimension. To deal with this, we introduce a sparse variational approach with inducing tensors. We then combine the resulting GP with LSTMs and GRUs to build larger models that leverage the strengths of each of these approaches, and benchmark the resulting GPs on multivariate time series (TS) classification datasets. Code available at https://github.com/tgcsaba/GPSig.
- Comment: Near camera-ready version for ICML 2020. Previous title: "Variational Gaussian Processes with Signature Covariances"
- Published
- 2019
32. A Free Boundary Characterisation of the Root Barrier for Markov Processes
- Author
- Gassiat, Paul, Oberhauser, Harald, and Zou, Christina Z.
- Subjects
- Mathematics - Probability, 60G40, 60J45
- Abstract
We study the existence, optimality, and construction of non-randomised stopping times that solve the Skorokhod embedding problem (SEP) for Markov processes which satisfy a duality assumption. These stopping times are hitting times of space-time subsets, so-called Root barriers. Our main result is, besides the existence and optimality, a potential-theoretic characterisation of this Root barrier as a free boundary. If the generator of the Markov process is sufficiently regular, this reduces to an obstacle PDE that has the Root barrier as free boundary and thereby generalises previous results from one-dimensional diffusions to Markov processes. However, our characterisation always applies and allows, at least in principle, to compute the Root barrier by dynamic programming, even when the well-posedness of the informally associated obstacle PDE is not clear. Finally, we demonstrate the flexibility of our method by replacing time by an additive functional in Root's construction. Already for multi-dimensional Brownian motion this leads to a new class of constructive solutions of (SEP).
- Comment: 31 pages, 14 figures
- Published
- 2019
33. An optimal polynomial approximation of Brownian motion
- Author
- Foster, James, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Numerical Analysis, Mathematics - Probability, 41A10, 60J65, 60L90, 65C30
- Abstract
In this paper, we will present a strong (or pathwise) approximation of standard Brownian motion by a class of orthogonal polynomials. The coefficients that are obtained from the expansion of Brownian motion in this polynomial basis are independent Gaussian random variables. Therefore it is practical (requires $N$ independent Gaussian coefficients) to generate an approximate sample path of Brownian motion that respects integration of polynomials with degree less than $N$. Moreover, since these orthogonal polynomials appear naturally as eigenfunctions of an integral operator defined by the Brownian bridge covariance function, the proposed approximation is optimal in a certain weighted $L^{2}(\mathbb{P})$ sense. In addition, discretizing Brownian paths as piecewise parabolas gives a locally higher order numerical method for stochastic differential equations (SDEs) when compared to the standard piecewise linear approach. We shall demonstrate these ideas by simulating Inhomogeneous Geometric Brownian Motion (IGBM). This numerical example will also illustrate the deficiencies of the piecewise parabola approximation when compared to a new version of the asymptotically efficient log-ODE (or Castell-Gaines) method.
- Comment: 27 pages, 8 figures
- Published
- 2019
34. Signature moments to characterize laws of stochastic processes
- Author
- Chevyrev, Ilya and Oberhauser, Harald
- Subjects
- Mathematics - Statistics Theory, Mathematics - Probability, Statistics - Machine Learning, 62M99 (Primary) 60G35, 62M07 (Secondary)
- Abstract
The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.
- Comment: 42 pages. Restructured the text, changed experiments in final section. To appear in Journal of Machine Learning Research
- Published
- 2018
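The two-sample test in record 34 pairs a kernelized statistic with a permutation null. The permutation wrapper itself is generic; a sketch that could wrap, for instance, an MMD-type statistic built from a signature kernel (or the toy mmd2_biased under record 9):

```python
import numpy as np

def permutation_test(X, Y, statistic, n_perm=1000, seed=0):
    """Non-parametric two-sample test: the p-value is the fraction of
    random relabellings of the pooled sample whose statistic is at least
    the observed one (with the +1 correction for validity)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([X, Y])
    n, obs = len(X), statistic(X, Y)
    hits = sum(
        statistic(pooled[p[:n]], pooled[p[n:]]) >= obs
        for p in (rng.permutation(len(pooled)) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)

# e.g. p = permutation_test(X, Y, mmd2_biased) with the sketch from record 9.
```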
35. Persistence paths and signature features in topological data analysis
- Author
- Chevyrev, Ilya, Nanda, Vidit, and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
We introduce a new feature map for barcodes that arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature which takes values in the tensor algebra of that vector space. The composition of these two operations - barcode to path, path to tensor series - results in a feature map that has several desirable properties for statistical learning, such as universality and characteristicness, and achieves state-of-the-art results on common classification benchmarks.
- Comment: Additional experiment and further details. To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Published
- 2018
36. Probabilistic supervised learning
- Author
- Gressmann, Frithjof, Király, Franz J., Mateen, Bilal, and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, or deep learning a.k.a. neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions' uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads to (c) a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy - be it frequentist, Bayesian, or other.
- Published
- 2018
37. Sketching the order of events
- Author
- Lyons, Terry and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Data Structures and Algorithms, Mathematics - Statistics Theory
- Abstract
We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms.
- Published
- 2017
38. Kernels for sequentially ordered data
- Author
- Király, Franz J. and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Discrete Mathematics, Computer Science - Learning, Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features, which can be seen as an ordered variant of sample (cross-)moments; it allows one to obtain a "sequentialized" version of any static kernel. The sequential kernels are efficiently computable for discrete sequences and are shown to approximate a continuous moment form in a sampling sense. A number of known kernels for sequences arise as "sequentializations" of suitable static kernels: string kernels may be obtained as a special case, and alignment kernels are closely related, up to a modification that resolves their open non-definiteness issue. Our experiments indicate that our signature-based sequential kernel framework may be a promising approach to learning with sequential data, such as time series, that avoids extensive manual pre-processing.
- Published
- 2016
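One concrete instance of the "sequentialization" in record 38 is a truncated kernel computed by a dynamic program over pairs of increasing index tuples of second-order increments of the static kernel. A minimal sketch covering truncation only; normalisation and approximation schemes from the paper are not shown.

```python
import numpy as np

def seq_kernel(x, y, static_k, order=4):
    """Truncated sequentialized kernel between sequences x (T1, d), y (T2, d):
        K = 1 + sum_m sum_{i1<..<im, j1<..<jm} prod_l inc[i_l, j_l],
    where inc is the second-order increment of the static Gram matrix."""
    K = static_k(x, y)
    inc = K[1:, 1:] - K[1:, :-1] - K[:-1, 1:] + K[:-1, :-1]
    R, total = inc.copy(), 1.0 + inc.sum()
    for _ in range(order - 1):
        S = np.zeros_like(R)                 # strict 2-d prefix sums of R
        S[1:, 1:] = R.cumsum(0).cumsum(1)[:-1, :-1]
        R = inc * S
        total += R.sum()
    return total

rbf = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None]) ** 2).sum(-1))
rng = np.random.default_rng(0)
x, y = rng.normal(size=(20, 2)), rng.normal(size=(30, 2))
print(seq_kernel(x, y, rbf))
```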
39. A free boundary characterisation of the Root barrier for Markov processes
- Author
- Gassiat, Paul, Oberhauser, Harald, and Zou, Christina Z.
- Published
- 2021
40. An integral equation for Root's barrier and the generation of Brownian increments
- Author
- Gassiat, Paul, Mijatović, Aleksandar, and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
We derive a nonlinear integral equation to calculate Root's solution of the Skorokhod embedding problem for atom-free target measures. We then use this to efficiently generate bounded time-space increments of Brownian motion and give a parabolic version of Muller's classic "random walk over spheres" algorithm.
- Comment: Published at http://dx.doi.org/10.1214/14-AAP1042 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
- 2013
41. A Lévy area between Brownian motion and rough paths with applications to robust non-linear filtering and RPDEs
- Author
- Diehl, Joscha, Oberhauser, Harald, and Riedel, Sebastian
- Subjects
- Mathematics - Probability
- Abstract
We give meaning to differential equations with a rough path term and a Brownian noise term as driving signals. Such differential equations as well as the question of regularity of the solution map arise naturally and we discuss two applications: one revisits Clark's robustness problem in nonlinear filtering, the other is a Feynman--Kac type representation of linear RPDEs. En passant, we give a short and direct argument that implies integrability estimates for rough differential equations with Gaussian driving signals which is of independent interest.
- Comment: New coauthor, new section on integrability for Gaussian RDEs, minor changes
- Published
- 2013
42. Root's barrier, viscosity solutions of obstacle problems and reflected FBSDEs
- Author
- Gassiat, Paul, Oberhauser, Harald, and Reis, Goncalo dos
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs
- Abstract
We revisit work of Rost, Dupire and Cox--Wang on connections between Root's solution of the Skorokhod embedding problem and obstacle problems. We develop an approach based on viscosity sub- and supersolutions and an accompanying comparison principle. This gives a complete characterization of (reversed) Root barriers and leads to new proofs of existence as well as minimality of such barrier solutions by pure PDE methods. The approach is self-contained and general enough to cover martingale diffusions with degenerate elliptic or time-dependent volatility; it also provides insights about the dynamics of general Skorokhod embeddings.
- Published
- 2013
43. An extension of the functional Itô formula under a family of non-dominated measures
- Author
- Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
Motivated by questions arising in financial mathematics, Dupire introduced a notion of smoothness for functionals of paths (different from the usual Fréchet--Gâteaux derivatives) and arrived at a generalization of Itô's formula applicable to functionals which have a pathwise continuous dependence on the trajectories of the underlying process. We study nonlinear functionals which do not have such pathwise continuity and further work simultaneously under the family of continuous semimartingale measures on path-space. We do this without introducing a second component, as carried out by Cont--Fournié, but by using old work of Bichteler which allows us to keep a pathwise picture even for complex functionals.
- Published
- 2012
44. A Chen-Fliess approximation for diffusion functionals
- Author
- Litterer, Christian and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
We show that an interesting class of functionals of stochastic differential equations can be approximated by a Chen-Fliess series of iterated stochastic integrals and give an $L^{2}$ error estimate, thus generalizing the standard stochastic Taylor expansion. The coefficients in this series are given a very intuitive meaning by using functional derivatives, recently introduced by B. Dupire.
- Comment: extended introduction, references added, changed title, minor typos fixed
- Published
- 2011
45. Parabolic comparison revisited and applications
- Author
- Diehl, Joscha, Friz, Peter K., and Oberhauser, Harald
- Subjects
- Mathematics - Analysis of PDEs, Mathematics - Probability, 35R99
- Abstract
We consider the Cauchy-Dirichlet problem $\partial_t u - F(t,x,u,Du,D^2 u) = 0$ on $(0,T)\times \mathbb{R}^n$ in the viscosity sense. Comparison is established for bounded semi-continuous (sub-/super-)solutions under the structural assumption (3.14) of the User's Guide plus a mild condition on $F$ so as to cope with the unbounded domain. Comparison on $(0,T]$, space-time regularity, and existence are also discussed. Our analysis passes through an extension of the parabolic theorem of sums which appears to be useful in its own right.
- Published
- 2011
46. On the splitting-up method for rough (partial) differential equations
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs, 60H15
- Abstract
This article introduces the splitting method to systems responding to rough paths as external stimuli. The focus is on nonlinear partial differential equations with rough noise but we also cover rough differential equations. Applications to stochastic partial differential equations arising in control theory and nonlinear filtering are given.
- Published
- 2010
47. Rough path stability of (semi-)linear SPDEs
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs, 60H15
- Abstract
We give meaning to linear and semi-linear (possibly degenerate) parabolic partial differential equations with (affine) linear rough path noise and establish stability in a rough path metric. In the case of enhanced Brownian motion (Brownian motion with its Lévy area) as rough path noise, the solution coincides with the standard variational solution of the SPDE.
- Comment: Final submitted version, to appear in PTRF. The previous title was "Rough path stability of SPDEs arising in non-linear filtering"
- Published
- 2010
48. A generalized Fernique theorem and applications
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Functional Analysis
- Abstract
We prove a generalisation of Fernique's theorem which applies to a class of (measurable) functionals on abstract Wiener spaces by using the isoperimetric inequality. Our motivation comes from rough path theory where one deals with iterated integrals of Gaussian processes (which are generically not Gaussian). Gaussian integrability with explicitly given constants for variation and Hölder norms of the (fractional) Brownian rough path, Gaussian rough paths, and the Banach space valued Wiener process enhanced with its Lévy area [Ledoux, Lyons, Qian. "Lévy area of Wiener processes in Banach spaces". Ann. Probab., 30(2):546--578, 2002] then all follow from applying our main theorem.
- Comment: To be published in the Proceedings of the AMS.
- Published
- 2010
49. A (rough) pathwise approach to a class of non-linear stochastic partial differential equations
- Author
- Caruana, Michael, Friz, Peter, and Oberhauser, Harald
- Subjects
- Mathematics - Analysis of PDEs, Mathematics - Probability, 35R99, 60H15
- Abstract
We consider nonlinear parabolic evolution equations of the form $\partial_{t}u=F(t,x,Du,D^{2}u)$, subject to noise of the form $H(x,Du) \circ dB$ where $H$ is linear in $Du$ and $\circ dB$ denotes the Stratonovich differential of a multidimensional Brownian motion. Motivated by the essentially pathwise results of [Lions, P.-L. and Souganidis, P.E.; Fully nonlinear stochastic partial differential equations. C. R. Acad. Sci. Paris Sér. I Math. 326 (1998), no. 9], we propose the use of rough path analysis [Lyons, T. J.; Differential equations driven by rough signals. Rev. Mat. Iberoamericana 14 (1998), no. 2, 215--310] in this context. Although the core arguments are entirely deterministic, a continuity theorem allows for various probabilistic applications (limit theorems, support, large deviations, ...).
- Comment: minor changes and some more details in the appendix; this version to appear in Annales de l'Institut Henri Poincaré / Analyse non linéaire
- Published
- 2009
50. Non-standard approximations of the Itô-map
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Classical Analysis and ODEs, 60H99, 34K20
- Abstract
The Wong-Zakai theorem asserts that ODEs driven by "reasonable" (e.g. piecewise linear) approximations of Brownian motion converge to the corresponding Stratonovich stochastic differential equation. With the aid of rough path analysis, we study "non-reasonable" approximations and go beyond a well-known criterion of [Ikeda--Watanabe, North Holland 1989] in the sense that our result applies to perturbations on all levels, exhibiting additional drift terms involving any iterated Lie brackets of the driving vector fields. In particular, this applies to the approximations by McShane ('72) and Sussmann ('91). Our approach is not restricted to Brownian driving signals. Finally, these ideas can be used to prove optimality of certain rough path estimates.
- Comment: 18 pages
- Published
- 2008