165 results for "Oberhauser, Harald"
Search Results
2. Learning to Forget: Bayesian Time Series Forecasting using Recurrent Sparse Spectrum Signature Gaussian Processes
- Author
- Tóth, Csaba, Adachi, Masaki, Osborne, Michael A., and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning such as covariance functions for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features, develop Random Fourier Decayed Signature Features (RFDSF), and combine them with Gaussian processes (GPs). The result is a scalable Bayesian time series forecasting algorithm with variational inference that processes and transforms a time series into a joint predictive distribution over time steps in a single recurrent pass; for example, it processes a sequence of $10^4$ steps in $\approx 10^{-2}$ seconds and in $< 1\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.
- Published
- 2024
3. A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Hamid, Saad, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility in dealing with discrete and continuous variables simultaneously, model misspecification, and, lastly, fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) versatility in downstream tasks under a unified approach; (2) a gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean spaces); (3) flexibility in the domain prior distribution; (4) adaptive batch size (autonomous determination of the optimal batch size); (5) robustness against a misspecified reproducing kernel Hilbert space; (6) a natural stopping criterion.
- Comment: This work is the journal extension of the workshop paper (arXiv:2301.11832) and AISTATS paper (arXiv:2306.05843). 48 pages, 11 figures
- Published
- 2024
4. Random Fourier Signature Features
- Author
- Toth, Csaba, Oberhauser, Harald, and Szabo, Zoltan
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Methodology
- Abstract
Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, the signature kernel, which comes with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets of up to a million time series.
- Published
- 2023
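As context for record 4: the random Fourier feature idea that the paper lifts from vectors to sequences can be sketched in its classical Euclidean form, via Bochner's theorem for a Gaussian kernel. This is a minimal illustration under our own naming, not the paper's sequence-level construction.

```python
import numpy as np

def random_fourier_features(X, D, lengthscale=1.0, seed=0):
    """Map X (n, d) to D random features whose inner products approximate
    the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    Frequencies are drawn from the kernel's spectral density (Bochner)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / lengthscale, size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2 * np.pi, size=D)               # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The Monte Carlo approximation error decays like 1/sqrt(D):
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Phi = random_fourier_features(X, D=5000)
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Phi @ Phi.T - K_exact).max())  # small, shrinks as D grows
```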
5. Random Surfaces and Higher Algebra
- Author
- Lee, Darrick and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Algebraic Topology, Mathematics - Category Theory, Mathematics - Differential Geometry
- Abstract
The space of surfaces $\mathbf{X}: [0,1]^2 \to \mathbb{R}^d$ is equipped with natural horizontal and vertical concatenation operations, and we study representations of surfaces which preserve this algebraic structure. The framework of higher category theory allows us to formalize this notion in terms of a functor between double groupoids. We construct such functors by generalizing the classical concept of the path-ordered exponential as a nonabelian path integral. This generalization yields a notion of nonabelian surface integration, and is known in higher gauge theory as surface holonomy (or higher parallel transport). We extend these constructions from smooth surfaces to bounded controlled $p$-variation surfaces in the Young regime, with $p < 2$. By focusing on surface holonomy valued in higher generalizations of the classical matrix groups, called matrix double groups, we obtain explicit computations for piecewise linear approximations. Furthermore, we generalize the nonabelian Stokes' and Fubini theorems to the Young regime. Finally, we use these matrix surface holonomy functors to construct characteristic functions for random surfaces such as fractional Brownian sheets with Hurst parameter $h > \frac{1}{2}$.
- Comment: 102 pages (56 pages main text) - minor changes (rewritten introduction and improved presentation)
- Published
- 2023
6. HADES: Fast Singularity Detection with Local Measure Comparison
- Author
- Lim, Uzu, Oberhauser, Harald, and Nanda, Vidit
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Algebraic Topology, Mathematics - Differential Geometry, Mathematics - Statistics Theory, 55N31, 32S50
- Abstract
We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scalable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data.
- Published
- 2023
7. Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Wan, Xingchen, Nguyen, Vu, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes. By framing batch selection as a quadrature task, our integration-error-aware algorithm facilitates the automatic tuning of batch sizes to meet predefined quadrature precision objectives, akin to how typical optimizers terminate based on convergence thresholds. This approach obviates the necessity for exhaustive searches across all potential batch sizes. We also extend this to scenarios with constrained active learning and constrained optimization, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction. Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications.
- Comment: Accepted at AISTATS 2024. 33 pages, 6 figures
- Published
- 2023
8. The Signature Kernel
- Author
- Lee, Darrick and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
The signature kernel is a positive definite kernel for sequential data. It inherits theoretical guarantees from stochastic analysis, has efficient algorithms for computation, and shows strong empirical performance. In this short survey paper for a forthcoming Springer handbook, we give an elementary introduction to the signature kernel and highlight these theoretical and computational properties.
- Comment: 31 pages, 2 figures
- Published
- 2023
9. Kernelized Cumulants: Beyond Kernel Mean Embeddings
- Author
- Bonnier, Patric, Oberhauser, Harald, and Szabó, Zoltán
- Subjects
- Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
- Abstract
In $\mathbb{R}^d$, it is well-known that cumulants provide an alternative to moments that can achieve the same goals with numerous benefits such as lower variance estimators. In this paper we extend cumulants to reproducing kernel Hilbert spaces (RKHS) using tools from tensor algebras and show that they are computationally tractable by a kernel trick. These kernelized cumulants provide a new set of all-purpose statistics; the classical maximum mean discrepancy and Hilbert-Schmidt independence criterion arise as the degree one objects in our general construction. We argue both theoretically and empirically (on synthetic, environmental, and traffic data analysis) that going beyond degree one has several advantages and can be achieved with the same computational complexity and minimal overhead in our experiments.
- Comment: 19 pages, 8 figures
- Published
- 2023
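The degree-one object recovered by record 9 is the classical (squared) maximum mean discrepancy. For orientation, a standard quadratic-time V-statistic estimator with a Gaussian kernel, a textbook baseline rather than the paper's higher-degree statistics, looks like this:

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of MMD^2 with kernel exp(-gamma ||.||^2):
    MMD^2 = E k(x, x') - 2 E k(x, y) + E k(y, y'), with each expectation
    replaced by the empirical average over the samples X and Y."""
    def k(A, B):
        return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))
    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = rng.normal(loc=0.5, size=(100, 2))   # shifted distribution
Z = rng.normal(size=(100, 2))            # same distribution as X
print(mmd2_biased(X, Y), mmd2_biased(X, Z))  # first value is clearly larger
```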
10. SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Hamid, Saad, Jørgensen, Martin, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
- Comment: 34 pages, 12 figures
- Published
- 2023
11. Sampling-based Nyström Approximation and Kernel Quadrature
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Numerical Analysis, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations.
- Comment: 22 pages, ICML 2023 camera-ready version. Typos fixed
- Published
- 2023
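The conventional scheme analysed in record 11 admits a very short sketch: sample m landmark points i.i.d., then approximate the full Gram matrix through the landmark block. This is the baseline the paper improves on, with illustrative names of ours.

```python
import numpy as np

def nystrom(K_nm, K_mm, jitter=1e-10):
    """Rank-m Nystrom approximation K ~ K_nm @ pinv(K_mm) @ K_nm.T, where
    K_nm holds kernel values between all n points and the m landmarks and
    K_mm is the landmark Gram matrix."""
    m = K_mm.shape[0]
    return K_nm @ np.linalg.pinv(K_mm + jitter * np.eye(m)) @ K_nm.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
idx = rng.choice(200, size=20, replace=False)     # i.i.d. landmark sampling
K_hat = nystrom(K[:, idx], K[np.ix_(idx, idx)])
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # small relative error
```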
12. Hypercontractivity Meets Random Convex Hulls: Analysis of Randomized Multivariate Cubatures
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Probability, Mathematics - Numerical Analysis
- Abstract
Given a probability measure $\mu$ on a set $\mathcal{X}$ and a vector-valued function $\varphi$, a common problem is to construct a discrete probability measure on $\mathcal{X}$ such that the push-forward of these two probability measures under $\varphi$ is the same. This construction is at the heart of numerical integration methods that run under various names such as quadrature, cubature, or recombination. A natural approach is to sample points from $\mu$ until the convex hull of their image under $\varphi$ contains the mean of $\varphi$. Here we analyze the computational complexity of this approach when $\varphi$ exhibits a graded structure by using so-called hypercontractivity. The resulting theorem not only covers the classical cubature case of multivariate polynomials, but also integration on pathspace, as well as kernel quadrature for product measures.
- Comment: 20 pages
- Published
- 2022
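The sampling procedure analysed in record 12 rests on one computational primitive: testing whether the mean of $\varphi$ lies in the convex hull of the sampled images. That test is a linear-programming feasibility problem; a minimal sketch, with an illustrative graded $\varphi$ of our choosing:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, target):
    """Is `target` in the convex hull of the rows of `points`? Feasibility
    LP: find w >= 0 with sum(w) = 1 and points.T @ w = target (linprog's
    default bounds already enforce w >= 0)."""
    n = len(points)
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.append(target, 1.0)
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, method="highs").success

# Sample from mu = N(0, I_2) until conv{phi(x_1), ..., phi(x_n)} contains
# E[phi]; phi below is an illustrative graded choice (degrees 1, 1, 2).
rng = np.random.default_rng(0)
phi = lambda x: np.array([x[0], x[1], x[0] * x[1]])
mean_phi = np.zeros(3)  # E[phi(X)] vanishes for independent standard normals
pts = [phi(rng.normal(size=2)) for _ in range(4)]
while not in_convex_hull(np.array(pts), mean_phi):
    pts.append(phi(rng.normal(size=2)))
print(f"mean captured after {len(pts)} samples")
```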
13. A topological approach to mapping space signatures
- Author
- Giusti, Chad, Lee, Darrick, Nanda, Vidit, and Oberhauser, Harald
- Published
- 2025
14. Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
- Author
- Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Computation, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses an empirically exponential convergence rate. Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and Nested Sampling in various real-world datasets, including lithium-ion battery analytics.
- Comment: 38 pages, 6 figures
- Published
- 2022
15. Capturing Graphs with Hypo-Elliptic Diffusions
- Author
- Toth, Csaba, Lee, Darrick, Hacker, Celia, and Oberhauser, Harald
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability
- Abstract
Convolutional layers within graph neural networks operate by aggregating information about local neighbourhood structures; one common way to encode such substructures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypo-elliptic diffusions. This results in a novel tensor-valued graph operator, which we call the hypo-elliptic graph Laplacian. We provide theoretical guarantees and efficient low-rank approximation algorithms. In particular, this gives a structured approach to capture long-range dependencies on graphs that is robust to pooling. Besides the attractive theoretical properties, our experiments show that this method competes with graph transformers on datasets requiring long-range reasoning but scales only linearly in the number of edges, as opposed to quadratically in the number of nodes.
- Comment: 30 pages
- Published
- 2022
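Record 15 lifts the scalar random-walk diffusion to a tensor-valued, hypo-elliptic one. The scalar baseline it starts from, evolving random-walk distributions with the transition matrix $P = D^{-1}A$ (equivalently, the random-walk Laplacian $L = I - P$), takes a few lines; the lifted operator itself is not shown here.

```python
import numpy as np

def walk_distributions(A, steps):
    """Distributions of a simple random walk on a graph with adjacency
    matrix A: row i of the k-th output is the k-step distribution of a
    walk started at node i, i.e. a solution of the discrete diffusion
    equation driven by the random-walk graph Laplacian L = I - P."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    dist, out = np.eye(A.shape[0]), []
    for _ in range(steps):
        dist = dist @ P
        out.append(dist.copy())
    return out

A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
print(walk_distributions(A, 3)[-1][0])  # 3-step law from node 0 of a 4-cycle
```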
16. A Topological Approach to Mapping Space Signatures
- Author
- Giusti, Chad, Lee, Darrick, Nanda, Vidit, and Oberhauser, Harald
- Subjects
- Mathematics - Functional Analysis, Mathematics - Algebraic Topology, Mathematics - Probability, 60L10, 55U15, 28C20
- Abstract
A common approach for describing classes of functions and probability measures on a topological space $\mathcal{X}$ is to construct a suitable map $\Phi$ from $\mathcal{X}$ into a vector space, where linear methods can be applied to address both problems. The case where $\mathcal{X}$ is a space of paths $[0,1] \to \mathbb{R}^n$ and $\Phi$ is the path signature map has received much attention in stochastic analysis and related fields. In this article we develop a generalized $\Phi$ for the case where $\mathcal{X}$ is a space of maps $[0,1]^d \to \mathbb{R}^n$ for any $d \in \mathbb{N}$, and show that the map $\Phi$ generalizes many of the desirable algebraic and analytic properties of the path signature to $d \ge 2$. The key ingredient to our approach is topological; in particular, our starting point is a generalisation of K-T Chen's path space cochain construction to the setting of cubical mapping spaces.
- Comment: 58 pages, comments welcome!
- Published
- 2022
17. Proper Scoring Rules, Gradients, Divergences, and Entropies for Paths and Time Series
- Author
- Bonnier, Patric and Oberhauser, Harald
- Subjects
- Statistics - Methodology, Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
Many forecasts are not point predictions but concern the evolution of quantities. For example, a central bank might predict the interest rates during the next quarter, an epidemiologist might predict trajectories of infection rates, a clinician might predict the behaviour of medical markers over the next day, and so on. The situation is further complicated since these forecasts sometimes only concern the approximate "shape of the future evolution" or "order of events". Formally, such forecasts can be seen as probability measures on spaces of equivalence classes of paths modulo time-parametrization. We leverage the statistical framework of proper scoring rules together with classical mathematical results to derive a principled approach to decision making with such forecasts. In particular, we introduce notions of gradients, entropy, and divergence that are tailor-made to respect the underlying non-Euclidean structure.
- Published
- 2021
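A worked classical instance of the calculus that record 17 transports to path space: for the logarithmic score, the associated entropy is Shannon entropy and the associated divergence is Kullback-Leibler, with propriety exactly the statement that the divergence is non-negative.

```latex
\[
  S(P, y) = -\log p(y), \qquad
  e(Q) := \mathbb{E}_{Y \sim Q}\,[S(Q, Y)] = H(Q),
\]
\[
  d(P, Q) := \mathbb{E}_{Y \sim Q}\,[S(P, Y)] - e(Q)
           = \mathrm{KL}(Q \,\|\, P) \;\ge\; 0,
\]
% with equality iff P = Q; the paper builds the analogous triple
% (score, entropy, divergence) for laws on unparametrized path space.
```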
18. Markov Chain Approximations to Stochastic Differential Equations by Recombination on Lattice Trees
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Mathematics - Probability
- Abstract
We revisit the classical problem of approximating a stochastic differential equation by a discrete-time and discrete-space Markov chain. Our construction iterates Carathéodory's theorem over time to match the moments of the increments locally. This allows us to construct a Markov chain with a sparse transition matrix where the number of attainable states grows at most polynomially as time increases. Moreover, the Markov chain evolves on a tree whose nodes lie on a "universal lattice", in the sense that an arbitrary number of different SDEs can be approximated on the same tree. The construction is not tailored to specific models; we discuss both the univariate and the multivariate case, and provide an implementation and numerical experiments.
- Published
- 2021
19. Tangent Space and Dimension Estimation with the Wasserstein Distance
- Author
- Lim, Uzu, Oberhauser, Harald, and Nanda, Vidit
- Subjects
- Mathematics - Statistics Theory, Computer Science - Machine Learning
- Abstract
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniform data distributions in which the noise may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bound are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound to quantify the nonlinearity of the underlying manifold and the non-uniformity of the probability measure.
- Comment: Main theorems rewritten. Introduction is written more compactly
- Published
- 2021
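The estimator analysed in record 19 is easy to state concretely: take the k nearest neighbours of a query point, centre them, and read dimension and tangent space off the spectrum. A minimal sketch with our own parameter choices, not the paper's calibrated thresholds:

```python
import numpy as np

def local_dimension(data, query, k=50, threshold=0.9):
    """Local PCA: the estimated dimension is the number of principal
    components explaining a `threshold` fraction of the variance of the
    k nearest neighbours; the leading components span the tangent space."""
    nbrs = data[np.argsort(((data - query) ** 2).sum(axis=1))[:k]]
    evals = np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False) ** 2
    ratio = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(ratio, threshold) + 1)

# Noisy samples near the unit 2-sphere in R^3 should report dimension 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X += 0.01 * rng.normal(size=X.shape)
print(local_dimension(X, X[0]))  # typically 2
```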
20. Positively Weighted Kernel Quadrature via Subsampling
- Author
- Hayakawa, Satoshi, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Mathematics - Numerical Analysis, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This yields effective algorithms that construct convex quadrature rules with small worst-case error, using only access to i.i.d. samples from the underlying measure and evaluations of the kernel. In addition to our theoretical results and the benefits resulting from convex weights, our experiments indicate that this construction can compete with the optimal bounds in well-known examples.
- Comment: 29 pages, NeurIPS 2022 camera-ready version
- Published
- 2021
21. Grid-Free Computation of Probabilistic Safety with Malliavin Calculus
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Mathematics - Probability, Computer Science - Computational Engineering, Finance, and Science
- Abstract
This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDEs). It presents a new approach to computing probabilistic safety regions, namely sets of initial conditions of the SDE associated with trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at the border of the probabilistic safety region and then solves an optimisation problem, using techniques from Malliavin calculus, to compute this region. Unlike existing results in the literature, the new approach allows one to compute probabilistic safety regions without gridding the state space of the SDE.
- Published
- 2021
22. Neural SDEs as Infinite-Dimensional GANs
- Author
- Kidger, Patrick, Foster, James, Li, Xuechen, Oberhauser, Harald, and Lyons, Terry
- Subjects
- Computer Science - Machine Learning
- Abstract
Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing Neural SDEs has sought to solve. Here, we show that the classical approach to fitting SDEs may be viewed as a special case of (Wasserstein) GANs, and in doing so the neural and classical regimes may be brought together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising a discriminator as a Neural Controlled Differential Equation (CDE), we obtain Neural SDEs as (in modern machine learning parlance) continuous-time generative time series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, and since the Wasserstein loss has a unique global minimum, in the infinite data limit any SDE may be learnt. Example code has been made available as part of the \texttt{torchsde} repository.
- Comment: Published at ICML 2021
- Published
- 2021
23. Nonlinear Independent Component Analysis for Discrete-Time and Continuous-Time Signals
- Author
- Schell, Alexander and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability, 62H25, 62M99, 62H05, 60L10, 62M45, 62R10
- Abstract
We study the classical problem of recovering a multidimensional source signal from observations of nonlinear mixtures of this signal. We show that this recovery is possible (up to a permutation and monotone scaling of the source's original component signals) if the mixture is due to a sufficiently differentiable and invertible but otherwise arbitrarily nonlinear function and the component signals of the source are statistically independent with 'non-degenerate' second-order statistics. The latter assumption requires the source signal to meet one of three regularity conditions which essentially ensure that the source is sufficiently far away from the non-recoverable extremes of being deterministic or constant in time. These assumptions, which cover many popular time series models and stochastic processes, allow us to reformulate the initial problem of nonlinear blind source separation as a simple-to-state problem of optimisation-based function approximation. We propose to solve this approximation problem by minimizing a novel type of objective function that efficiently quantifies the mutual statistical dependence between multiple stochastic processes via cumulant-like statistics. This yields a scalable and direct new method for nonlinear Independent Component Analysis with widely applicable theoretical guarantees and for which our experiments indicate good performance.
- Comment: 89 pages, 10 figures; thoroughly revised presentation (including newly added Sections 2, 10, A.19). To appear in the Annals of Statistics
- Published
- 2021
24. Estimating the probability that a given vector is in the convex hull of a random sample
- Author
- Hayakawa, Satoshi, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
For a $d$-dimensional random vector $X$, let $p_{n, X}(\theta)$ be the probability that the convex hull of $n$ independent copies of $X$ contains a given point $\theta$. We provide several sharp inequalities regarding $p_{n, X}(\theta)$ and $N_X(\theta)$ denoting the smallest $n$ for which $p_{n, X}(\theta)\ge1/2$. As a main result, we derive the totally general inequality $1/2 \le \alpha_X(\theta)N_X(\theta)\le 3d + 1$, where $\alpha_X(\theta)$ (a.k.a. the Tukey depth) is the minimum probability that $X$ is in a fixed closed halfspace containing the point $\theta$. We also show several applications of our general results: one is a moment-based bound on $N_X(\mathbb{E}[X])$, which is an important quantity in randomized approaches to cubature construction and the measure reduction problem. Another application is the determination of the canonical convex body included in a random convex polytope given by independent copies of $X$, where our combinatorial approach allows us to generalize existing results from the random matrix community significantly.
- Comment: 34 pages
- Published
- 2021
25. The shifted ODE method for underdamped Langevin MCMC
- Author
- Foster, James, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Numerical Analysis, Mathematics - Probability, Mathematics - Statistics Theory, 60J22, 60L90, 65C30
- Abstract
In this paper, we consider the underdamped Langevin diffusion (ULD) and propose a numerical approximation using its associated ordinary differential equation (ODE). When used as a Markov Chain Monte Carlo (MCMC) algorithm, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{1}{3}}/\varepsilon^{\frac{2}{3}}\big)$ steps under the standard smoothness and strong convexity assumptions on the target distribution. This matches the complexity of the randomized midpoint method proposed by Shen and Lee [NeurIPS 2019] which was shown to be order optimal by Cao, Lu and Wang. However, the main feature of the proposed numerical method is that it can utilize additional smoothness of the target log-density $f$. More concretely, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{2}{5}}/\varepsilon^{\frac{2}{5}}\big)$ and $\mathcal{O}\big(\sqrt{d}/\varepsilon^{\frac{1}{3}}\big)$ steps when Lipschitz continuity is assumed for the Hessian and third derivative of $f$. By discretizing this ODE using a third order Runge-Kutta method, we can obtain a practical MCMC method that uses just two additional gradient evaluations per step. In our experiment, where the target comes from a logistic regression, this method shows faster convergence compared to other unadjusted Langevin MCMC algorithms.
- Published
- 2021
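For contrast with record 25's higher-order shifted ODE discretisation, the underdamped Langevin diffusion itself and its plain Euler-Maruyama baseline are easy to write down. This sketch is the crude reference scheme, not the paper's method.

```python
import numpy as np

def uld_euler(grad_f, x0, v0, gamma=2.0, h=0.05, n_steps=5000, seed=0):
    """Euler-Maruyama steps for the underdamped Langevin diffusion
        dx = v dt,   dv = -grad_f(x) dt - gamma v dt + sqrt(2 gamma) dW,
    whose x-marginal targets the density proportional to exp(-f(x))."""
    rng = np.random.default_rng(seed)
    x, v = np.array(x0, float), np.array(v0, float)
    samples = []
    for _ in range(n_steps):
        x = x + h * v
        v = (v - h * (grad_f(x) + gamma * v)
             + np.sqrt(2 * gamma * h) * rng.normal(size=x.shape))
        samples.append(x.copy())
    return np.array(samples)

# Target N(0, I_2): f(x) = ||x||^2 / 2, so grad_f(x) = x.
traj = uld_euler(lambda x: x, np.zeros(2), np.zeros(2))
print(traj[1000:].mean(0), traj[1000:].var(0))  # near 0 and near 1 (O(h) bias)
```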
26. Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections
- Author
- Toth, Csaba, Bonnier, Patric, and Oberhauser, Harald
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object -- the tensor algebra -- to capture such dependencies. To address the innate computational complexity of high degree tensors, we use compositions of low-rank tensor projections. This yields modular and scalable building blocks for neural networks that give state-of-the-art performance on standard benchmarks such as multivariate time series classification and generative models for video.
- Comment: 37 pages, 6 figures, 8 tables
- Published
- 2020
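The computational core of record 26 is that a rank-1, degree-m functional of the tensor-algebra lift of a sequence can be evaluated with cumulative sums instead of explicit degree-m tensors. A minimal sketch of that one primitive; the full model stacks and learns many such projections.

```python
import numpy as np

def low_rank_seq_feature(x, us):
    """Evaluate f(x) = sum over i1 < ... < im of the product
    <u1, x_{i1}> * ... * <um, x_{im}> for a sequence x of shape (T, d),
    in O(T m d) via an exclusive cumulative-sum recursion."""
    r = x @ us[0]                                 # level 1: <u1, x_i>
    for u in us[1:]:
        prefix = np.concatenate([[0.0], np.cumsum(r)[:-1]])  # sum over j < i
        r = (x @ u) * prefix
    return r.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))                 # length-100 sequence in R^4
us = [rng.normal(size=4) for _ in range(3)]   # one rank-1 degree-3 functional
print(low_rank_seq_feature(x, us))
```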
27. Carathéodory Sampling for Stochastic Gradient Descent
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability, Statistics - Machine Learning
- Abstract
Many problems require optimizing empirical risk functions over large data sets. Gradient descent methods that calculate the full gradient in every descent step do not scale to such datasets. Various flavours of Stochastic Gradient Descent (SGD) replace the expensive summation that computes the full gradient by approximating it with a small sum over a randomly selected subsample of the data set, which in turn suffers from high variance. We present a different approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction. These results allow one to replace an empirical measure with another, carefully constructed probability measure that has much smaller support but preserves certain statistics such as the expected gradient. To turn this into scalable algorithms, we first adaptively select the descent steps where the measure reduction is carried out; second, we combine this with Block Coordinate Descent so that the measure reduction can be done very cheaply. This makes the resulting methods scalable to high-dimensional spaces. Finally, we provide an experimental validation and comparison.
- Published
- 2020
28. A Randomized Algorithm to Reduce the Support of Discrete Measures
- Author
- Cosentino, Francesco, Oberhauser, Harald, and Abate, Alessandro
- Subjects
- Computer Science - Machine Learning, Mathematics - Probability, Statistics - Machine Learning
- Abstract
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $N \gg n$ this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling". We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N \gg n$ regime. A Python implementation is available at \url{https://github.com/FraCose/Recombination_Random_Algos}.
- Published
- 2020
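The existence statement opening record 28 is constructive: walking along null directions of the moment matrix removes atoms one at a time while all prescribed integrals stay fixed. A minimal dense-linear-algebra sketch of this recombination step; the paper's contribution is a faster randomized variant, not this naive loop.

```python
import numpy as np

def reduce_support(atoms, weights, funcs, tol=1e-12):
    """Tchakaloff/Caratheodory reduction: return new weights supported on
    at most len(funcs) + 1 of the original atoms with the same integrals
    of every f in funcs (and the same total mass)."""
    w = weights.astype(float).copy()
    while True:
        idx = np.where(w > tol)[0]
        if len(idx) <= len(funcs) + 1:
            return w
        # Moment matrix: one row per test function, plus the mass row.
        M = np.array([[f(atoms[i]) for i in idx] for f in funcs]
                     + [[1.0] * len(idx)])
        v = np.linalg.svd(M)[2][-1]          # null direction, M @ v ~ 0
        if not np.any(v > 0):
            v = -v                           # mass row forces both signs
        t = np.min(w[idx][v > 0] / v[v > 0])
        w[idx] = w[idx] - t * v              # zeroes >= 1 atom, keeps w >= 0
        w[w < tol] = 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
w = np.full(1000, 1e-3)                      # uniform measure on 1000 atoms
w2 = reduce_support(x, w, [lambda t: t, lambda t: t * t])
print((w2 > 0).sum(), x @ w2 - x @ w, x**2 @ w2 - x**2 @ w)  # 3 atoms, ~0, ~0
```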
29. Adapted Topologies and Higher Rank Signatures
- Author
- Bonnier, Patric, Liu, Chong, and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
The topology of weak convergence does not account for the growth of information over time that is captured in the filtration of an adapted stochastic process. For example, two adapted stochastic processes can have very similar laws but give completely different results in applications such as optimal stopping, queuing theory, or stochastic programming. To address such discontinuities, Aldous introduced the extended weak topology, and subsequently Hoover and Keisler showed that both the weak topology and the extended weak topology are just the first two topologies in a sequence of topologies that get increasingly finer. We use higher rank expected signatures to embed adapted processes into graded linear spaces and show that these embeddings induce the adapted topologies of Hoover--Keisler.
- Comment: 39 pages, 6 figures
- Published
- 2020
30. Signature Cumulants, Ordered Partitions, and Independence of Stochastic Processes
- Author
- Bonnier, Patric and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
The sequence of so-called signature moments describes the laws of many stochastic processes in analogy with how the sequence of moments describes the laws of vector-valued random variables. However, even for vector-valued random variables, the sequence of cumulants is much better suited for many tasks than the sequence of moments. This motivates us to study so-called signature cumulants. To do so, we develop an elementary combinatorial approach and show that in the same way that cumulants relate to the lattice of partitions, signature cumulants relate to the lattice of so-called "ordered partitions". We use this to give a new characterisation of independence of multivariate stochastic processes; finally we construct a family of unbiased minimum-variance estimators of signature cumulants.
- Comment: 31 pages, 2 figures
- Published
- 2019
31. Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances
- Author
- Toth, Csaba and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability
- Abstract
We develop a Bayesian approach to learning from sequential data by using Gaussian processes (GPs) with so-called signature kernels as covariance functions. This makes sequences of different lengths comparable and allows us to rely on strong theoretical results from stochastic analysis. Signatures capture sequential structure with tensors that can scale unfavourably in sequence length and state space dimension. To deal with this, we introduce a sparse variational approach with inducing tensors. We then combine the resulting GP with LSTMs and GRUs to build larger models that leverage the strengths of each of these approaches, and benchmark the resulting GPs on multivariate time series (TS) classification datasets. Code available at https://github.com/tgcsaba/GPSig.
- Comment: Near camera-ready version for ICML 2020. Previous title: "Variational Gaussian Processes with Signature Covariances"
- Published
- 2019
32. A Free Boundary Characterisation of the Root Barrier for Markov Processes
- Author
- Gassiat, Paul, Oberhauser, Harald, and Zou, Christina Z.
- Subjects
- Mathematics - Probability, 60G40, 60J45
- Abstract
We study the existence, optimality, and construction of non-randomised stopping times that solve the Skorokhod embedding problem (SEP) for Markov processes which satisfy a duality assumption. These stopping times are hitting times of space-time subsets, so-called Root barriers. Our main result is, besides the existence and optimality, a potential-theoretic characterisation of this Root barrier as a free boundary. If the generator of the Markov process is sufficiently regular, this reduces to an obstacle PDE that has the Root barrier as free boundary and thereby generalises previous results from one-dimensional diffusions to Markov processes. However, our characterisation always applies and allows, at least in principle, to compute the Root barrier by dynamic programming, even when the well-posedness of the informally associated obstacle PDE is not clear. Finally, we demonstrate the flexibility of our method by replacing time by an additive functional in Root's construction. Already for multi-dimensional Brownian motion this leads to a new class of constructive solutions of (SEP).
- Comment: 31 pages, 14 figures
- Published
- 2019
33. An optimal polynomial approximation of Brownian motion
- Author
- Foster, James, Lyons, Terry, and Oberhauser, Harald
- Subjects
- Mathematics - Numerical Analysis, Mathematics - Probability, 41A10, 60J65, 60L90, 65C30
- Abstract
In this paper, we will present a strong (or pathwise) approximation of standard Brownian motion by a class of orthogonal polynomials. The coefficients that are obtained from the expansion of Brownian motion in this polynomial basis are independent Gaussian random variables. Therefore it is practical (requires $N$ independent Gaussian coefficients) to generate an approximate sample path of Brownian motion that respects integration of polynomials with degree less than $N$. Moreover, since these orthogonal polynomials appear naturally as eigenfunctions of an integral operator defined by the Brownian bridge covariance function, the proposed approximation is optimal in a certain weighted $L^{2}(\mathbb{P})$ sense. In addition, discretizing Brownian paths as piecewise parabolas gives a locally higher order numerical method for stochastic differential equations (SDEs) when compared to the standard piecewise linear approach. We shall demonstrate these ideas by simulating Inhomogeneous Geometric Brownian Motion (IGBM). This numerical example will also illustrate the deficiencies of the piecewise parabola approximation when compared to a new version of the asymptotically efficient log-ODE (or Castell-Gaines) method.
- Comment: 27 pages, 8 figures
- Published
- 2019
34. Signature moments to characterize laws of stochastic processes
- Author
- Chevyrev, Ilya and Oberhauser, Harald
- Subjects
- Mathematics - Statistics Theory, Mathematics - Probability, Statistics - Machine Learning, 62M99 (Primary) 60G35, 62M07 (Secondary)
- Abstract
The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.
- Comment: 42 pages. Restructured the text, changed experiments in final section. To appear in Journal of Machine Learning Research
- Published
- 2018
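The two-sample test in record 34 pairs a kernelized statistic with a permutation null. The permutation wrapper itself is generic; a sketch that could wrap, for instance, an MMD-type statistic built from a signature kernel (or the toy mmd2_biased under record 9):

```python
import numpy as np

def permutation_test(X, Y, statistic, n_perm=1000, seed=0):
    """Non-parametric two-sample test: the p-value is the fraction of
    random relabellings of the pooled sample whose statistic is at least
    the observed one (with the +1 correction for validity)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([X, Y])
    n, obs = len(X), statistic(X, Y)
    hits = sum(
        statistic(pooled[p[:n]], pooled[p[n:]]) >= obs
        for p in (rng.permutation(len(pooled)) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)

# e.g. p = permutation_test(X, Y, mmd2_biased) with the sketch from record 9.
```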
35. Persistence paths and signature features in topological data analysis
- Author
- Chevyrev, Ilya, Nanda, Vidit, and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability, Mathematics - Statistics Theory
- Abstract
We introduce a new feature map for barcodes that arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature which takes values in the tensor algebra of that vector space. The composition of these two operations - barcode to path, path to tensor series - results in a feature map that has several desirable properties for statistical learning, such as universality and characteristicness, and achieves state-of-the-art results on common classification benchmarks.
- Comment: Additional experiment and further details. To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Published
- 2018
36. Probabilistic supervised learning
- Author
- Gressmann, Frithjof, Király, Franz J., Mateen, Bilal, and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, or deep learning a.k.a. neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions' uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads to (c) a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy - be it frequentist, Bayesian, or other.
- Published
- 2018
37. Sketching the order of events
- Author
- Lyons, Terry and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Data Structures and Algorithms, Mathematics - Statistics Theory
- Abstract
We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms.
- Published
- 2017
38. Kernels for sequentially ordered data
- Author
- Király, Franz J. and Oberhauser, Harald
- Subjects
- Statistics - Machine Learning, Computer Science - Discrete Mathematics, Computer Science - Learning, Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features, which can be seen as an ordered variant of sample (cross-)moments; it allows one to obtain a "sequentialized" version of any static kernel. The sequential kernels are efficiently computable for discrete sequences and are shown to approximate a continuous moment form in a sampling sense. A number of known kernels for sequences arise as "sequentializations" of suitable static kernels: string kernels may be obtained as a special case, and alignment kernels are closely related, up to a modification that resolves their open non-definiteness issue. Our experiments indicate that our signature-based sequential kernel framework may be a promising approach to learning with sequential data, such as time series, that avoids extensive manual pre-processing.
- Published
- 2016
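One concrete instance of the "sequentialization" in record 38 is a truncated kernel computed by a dynamic program over pairs of increasing index tuples of second-order increments of the static kernel. A minimal sketch covering truncation only; normalisation and approximation schemes from the paper are not shown.

```python
import numpy as np

def seq_kernel(x, y, static_k, order=4):
    """Truncated sequentialized kernel between sequences x (T1, d), y (T2, d):
        K = 1 + sum_m sum_{i1<..<im, j1<..<jm} prod_l inc[i_l, j_l],
    where inc is the second-order increment of the static Gram matrix."""
    K = static_k(x, y)
    inc = K[1:, 1:] - K[1:, :-1] - K[:-1, 1:] + K[:-1, :-1]
    R, total = inc.copy(), 1.0 + inc.sum()
    for _ in range(order - 1):
        S = np.zeros_like(R)                 # strict 2-d prefix sums of R
        S[1:, 1:] = R.cumsum(0).cumsum(1)[:-1, :-1]
        R = inc * S
        total += R.sum()
    return total

rbf = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None]) ** 2).sum(-1))
rng = np.random.default_rng(0)
x, y = rng.normal(size=(20, 2)), rng.normal(size=(30, 2))
print(seq_kernel(x, y, rbf))
```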
39. A free boundary characterisation of the Root barrier for Markov processes
- Author
- Gassiat, Paul, Oberhauser, Harald, and Zou, Christina Z.
- Published
- 2021
40. An integral equation for Root's barrier and the generation of Brownian increments
- Author
- Gassiat, Paul, Mijatović, Aleksandar, and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
We derive a nonlinear integral equation to calculate Root's solution of the Skorokhod embedding problem for atom-free target measures. We then use this to efficiently generate bounded time-space increments of Brownian motion and give a parabolic version of Muller's classic "random walk over spheres" algorithm.
- Comment: Published at http://dx.doi.org/10.1214/14-AAP1042 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
- 2013
41. A Lévy area between Brownian motion and rough paths with applications to robust non-linear filtering and RPDEs
- Author
- Diehl, Joscha, Oberhauser, Harald, and Riedel, Sebastian
- Subjects
- Mathematics - Probability
- Abstract
We give meaning to differential equations with a rough path term and a Brownian noise term as driving signals. Such differential equations as well as the question of regularity of the solution map arise naturally and we discuss two applications: one revisits Clark's robustness problem in nonlinear filtering, the other is a Feynman--Kac type representation of linear RPDEs. En passant, we give a short and direct argument that implies integrability estimates for rough differential equations with Gaussian driving signals which is of independent interest.
- Comment: New coauthor, new section on integrability for Gaussian RDEs, minor changes
- Published
- 2013
42. Root's barrier, viscosity solutions of obstacle problems and reflected FBSDEs
- Author
- Gassiat, Paul, Oberhauser, Harald, and Reis, Goncalo dos
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs
- Abstract
We revisit work of Rost, Dupire and Cox--Wang on connections between Root's solution of the Skorokhod embedding problem and obstacle problems. We develop an approach based on viscosity sub- and supersolutions and an accompanying comparison principle. This gives a complete characterization of (reversed) Root barriers and leads to new proofs of existence as well as minimality of such barrier solutions by pure PDE methods. The approach is self-contained and general enough to cover martingale diffusions with degenerate elliptic or time-dependent volatility; it also provides insights about the dynamics of general Skorokhod embeddings.
- Published
- 2013
43. An extension of the functional Itô formula under a family of non-dominated measures
- Author
- Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
Motivated by questions arising in financial mathematics, Dupire introduced a notion of smoothness for functionals of paths (different from the usual Fréchet--Gâteaux derivatives) and arrived at a generalization of Itô's formula applicable to functionals which have a pathwise continuous dependence on the trajectories of the underlying process. We study nonlinear functionals which do not have such pathwise continuity and further work simultaneously under the family of continuous semimartingale measures on path-space. We do this without introducing a second component, as carried out by Cont--Fournié, but by using old work of Bichteler which allows us to keep a pathwise picture even for complex functionals.
- Published
- 2012
44. A Chen-Fliess approximation for diffusion functionals
- Author
- Litterer, Christian and Oberhauser, Harald
- Subjects
- Mathematics - Probability
- Abstract
We show that an interesting class of functionals of stochastic differential equations can be approximated by a Chen-Fliess series of iterated stochastic integrals and give an $L^{2}$ error estimate, thus generalizing the standard stochastic Taylor expansion. The coefficients in this series are given a very intuitive meaning by using functional derivatives, recently introduced by B. Dupire.
- Comment: extended introduction, references added, changed title, minor typos fixed
- Published
- 2011
45. Parabolic comparison revisited and applications
- Author
- Diehl, Joscha, Friz, Peter K., and Oberhauser, Harald
- Subjects
- Mathematics - Analysis of PDEs, Mathematics - Probability, 35R99
- Abstract
We consider the Cauchy-Dirichlet problem $\partial_t u - F(t,x,u,Du,D^2 u) = 0$ on $(0,T)\times \mathbb{R}^n$ in the viscosity sense. Comparison is established for bounded semi-continuous (sub-/super-)solutions under the structural assumption (3.14) of the User's Guide plus a mild condition on $F$ so as to cope with the unbounded domain. Comparison on $(0,T]$, space-time regularity, and existence are also discussed. Our analysis passes through an extension of the parabolic theorem of sums which appears to be useful in its own right.
- Published
- 2011
46. On the splitting-up method for rough (partial) differential equations
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs, 60H15
- Abstract
This article introduces the splitting method to systems responding to rough paths as external stimuli. The focus is on nonlinear partial differential equations with rough noise but we also cover rough differential equations. Applications to stochastic partial differential equations arising in control theory and nonlinear filtering are given.
- Published
- 2010
47. Rough path stability of (semi-)linear SPDEs
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Analysis of PDEs, 60H15
- Abstract
We give meaning to linear and semi-linear (possibly degenerate) parabolic partial differential equations with (affine) linear rough path noise and establish stability in a rough path metric. In the case of enhanced Brownian motion (Brownian motion with its Lévy area) as rough path noise, the solution coincides with the standard variational solution of the SPDE.
- Comment: Final submitted version, to appear in PTRF. The previous title was "Rough path stability of SPDEs arising in non-linear filtering"
- Published
- 2010
48. A generalized Fernique theorem and applications
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Functional Analysis
- Abstract
We prove a generalisation of Fernique's theorem which applies to a class of (measurable) functionals on abstract Wiener spaces by using the isoperimetric inequality. Our motivation comes from rough path theory where one deals with iterated integrals of Gaussian processes (which are generically not Gaussian). Gaussian integrability with explicitly given constants for variation and Hölder norms of the (fractional) Brownian rough path, Gaussian rough paths, and the Banach space valued Wiener process enhanced with its Lévy area [Ledoux, Lyons, Qian. "Lévy area of Wiener processes in Banach spaces". Ann. Probab., 30(2):546--578, 2002] then all follow from applying our main theorem.
- Comment: To be published in the Proceedings of the AMS.
- Published
- 2010
49. A (rough) pathwise approach to a class of non-linear stochastic partial differential equations
- Author
- Caruana, Michael, Friz, Peter, and Oberhauser, Harald
- Subjects
- Mathematics - Analysis of PDEs, Mathematics - Probability, 35R99, 60H15
- Abstract
We consider nonlinear parabolic evolution equations of the form $\partial_{t}u=F(t,x,Du,D^{2}u)$, subject to noise of the form $H(x,Du) \circ dB$ where $H$ is linear in $Du$ and $\circ dB$ denotes the Stratonovich differential of a multidimensional Brownian motion. Motivated by the essentially pathwise results of [Lions, P.-L. and Souganidis, P.E.; Fully nonlinear stochastic partial differential equations. C. R. Acad. Sci. Paris Sér. I Math. 326 (1998), no. 9], we propose the use of rough path analysis [Lyons, T. J.; Differential equations driven by rough signals. Rev. Mat. Iberoamericana 14 (1998), no. 2, 215--310] in this context. Although the core arguments are entirely deterministic, a continuity theorem allows for various probabilistic applications (limit theorems, support, large deviations, ...).
- Comment: minor changes and some more details in the appendix; this version to appear in Annales de l'Institut Henri Poincaré / Analyse non linéaire
- Published
- 2009
50. Non-standard approximations of the Itô-map
- Author
- Friz, Peter and Oberhauser, Harald
- Subjects
- Mathematics - Probability, Mathematics - Classical Analysis and ODEs, 60H99, 34K20
- Abstract
The Wong-Zakai theorem asserts that ODEs driven by "reasonable" (e.g. piecewise linear) approximations of Brownian motion converge to the corresponding Stratonovich stochastic differential equation. With the aid of rough path analysis, we study "non-reasonable" approximations and go beyond a well-known criterion of [Ikeda--Watanabe, North Holland 1989] in the sense that our result applies to perturbations on all levels, exhibiting additional drift terms involving any iterated Lie brackets of the driving vector fields. In particular, this applies to the approximations by McShane ('72) and Sussmann ('91). Our approach is not restricted to Brownian driving signals. Finally, these ideas can be used to prove optimality of certain rough path estimates.
- Comment: 18 pages
- Published
- 2008