Author: "Katzfuss, Matthias" / Language: english - Searchworks@Jio Institute Digital Library Search Results

Author: Katzfuss, Matthias and Schäfer, Florian
Subjects: *DISTRIBUTION (Probability theory), *STOCHASTIC processes, *BAYESIAN field theory
Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and uncertainty quantification of the map estimation, while resulting in a closed-form and invertible posterior map. We then focus on inferring the distribution of a nonstationary spatial field from a small number of replicates. We develop specific transport-map priors that are highly flexible and are motivated by the behavior of a large class of stochastic processes. Our approach is scalable to high-dimensional distributions due to data-dependent sparsity and parallel computations. We also discuss extensions, including Dirichlet process mixtures for flexible marginals. We present numerical results to demonstrate the accuracy, scalability, and usefulness of our methods, including statistical emulation of non-Gaussian climate-model output. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering

Author: Jurek, Marcin and Katzfuss, Matthias
Published: 2022
Full Text: View/download PDF

11. A Case Study Competition Among Methods for Analyzing Large Spatial Data

Author: Heaton, Matthew J., Datta, Abhirup, Finley, Andrew O., Furrer, Reinhard, Guinness, Joseph, Guhaniyogi, Rajarshi, Gerber, Florian, Gramacy, Robert B., Hammerling, Dorit, Katzfuss, Matthias, Lindgren, Finn, Nychka, Douglas W., Sun, Furong, and Zammit-Mangion, Andrew
Published: 2019

12. Functional analysis of variance (ANOVA) for carbon flux estimates from remote sensing data.

Author: Hobbs, Jonathan, Katzfuss, Matthias, Nguyen, Hai, Yadav, Vineet, and Liu, Junjie
Subjects: *REMOTE sensing, *FUNCTIONAL analysis, *CARBON cycle, *ATMOSPHERIC transport, *STATISTICAL models
Abstract: The constellation of Earth-observing satellites has now produced atmospheric greenhouse gas concentration estimates covering a period of several years. Their global coverage is providing additional information on the global carbon cycle. These products can be combined with complex inversion systems to infer the magnitude of carbon sources and sinks around the globe. Multiple factors, including the atmospheric transport model and satellite product aggregation method, can impact such flux estimates. Analysis of variance (ANOVA) is a well-established statistical framework for estimating common signals while partitioning variability across factors in the analysis of experiments. Functional ANOVA extends this approach with a statistical model that incorporates spatiotemporal correlation for each ANOVA component. The approach is illustrated on inversion experiments with different satellite retrieval aggregation methods and identifies consistent significant patterns in flux increments that span large spatial scales. Functional ANOVA identifies these patterns while accounting for the uncertainty at small spatial scales that is attributed to differences in the aggregation method. Functional ANOVA is also applied to a recent flux model intercomparison project (MIP), and the relative magnitudes of inversion system effects and data source (satellite versus in situ) are similar but exhibit slightly different importance for fluxes over different continents. In all examples, the unexplained residual variability is locally sizable in magnitude but with limited spatial and temporal correlation. These common behaviors across flux inversion experiments demonstrate the diagnostic capability for functional ANOVA to simultaneously distinguish the spatiotemporal coherence of carbon cycle processes and algorithmic factors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Sparse Cholesky factorization by greedy conditional selection

Author: Huan, Stephen, Guinness, Joseph, Katzfuss, Matthias, Owhadi, Houman, and Schäfer, Florian
Subjects: FOS: Computer and information sciences, 65F08, 65F55, 62-08, FOS: Mathematics, Numerical Analysis (math.NA), Mathematics - Numerical Analysis, Statistics - Computation, Computation (stat.CO)
Abstract: Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work in constructing sparse approximate inverse Cholesky factors of such matrices by minimizing Kullback-Leibler divergence recovers the Vecchia approximation for Gaussian processes. These methods rely only on the geometry of the evaluation points to construct the sparsity pattern. In this work, we instead construct the sparsity pattern by leveraging a greedy selection algorithm that maximizes mutual information with target points, conditional on all points previously selected. For selecting $k$ points out of $N$, the naive time complexity is $\mathcal{O}(N k^4)$, but by maintaining a partial Cholesky factor we reduce this to $\mathcal{O}(N k^2)$. Furthermore, for multiple ($m$) targets we achieve a time complexity of $\mathcal{O}(N k^2 + N m^2 + m^3)$, which is maintained in the setting of aggregated Cholesky factorization where a selected point need not condition every target. We apply the selection algorithm to image classification and recovery of sparse Cholesky factors. By minimizing Kullback-Leibler divergence, we apply the algorithm to Cholesky factorization, Gaussian process regression, and preconditioning with the conjugate gradient, improving over $k$-nearest neighbors selection.
Published: 2023

14. A Multi-Resolution Approximation for Massive Spatial Datasets

Author: Katzfuss, Matthias
Published: 2017

15. Understanding the Ensemble Kalman Filter

Author: Katzfuss, Matthias, Stroud, Jonathan R., and Wikle, Christopher K.
Published: 2016

16. Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

Author: Jimenez, Felix and Katzfuss, Matthias
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning. We propose to synergistically combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN. GP scalability is achieved via Vecchia approximations that exploit nearest-neighbor conditional independence. The resulting deep Vecchia ensemble not only imbues the DNN with uncertainty quantification but can also provide more accurate and robust predictions. We demonstrate the utility of our model on several datasets and carry out experiments to understand the inner workings of the proposed method. (16 pages, 7 figures)
Published: 2023

17. Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization

Author: Cao, Jian, Kang, Myeongjong, Jimenez, Felix, Sang, Huiyan, Schäfer, Florian, and Katzfuss, Matthias
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Statistics - Computation, Computation (stat.CO), Machine Learning (cs.LG)
Abstract: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate for stationary kernels than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity. (Accepted at the 2023 International Conference on Machine Learning (ICML). 18 pages with references and appendices, 14 figures)
Published: 2023

18. Phenomic data-driven biological prediction of maize through field-based high-throughput phenotyping integration with genomic data.

Author: Adak, Alper, Kang, Myeongjong, Anderson, Steven L, Murray, Seth C, Jarquin, Diego, Wong, Raymond K W, and Katzfuß, Matthias
Subjects: SINGLE nucleotide polymorphisms, FLOWERING time, GENOME-wide association studies, FLOWERING of plants, BIOLOGICAL fitness, PLANT populations
Abstract: High-throughput phenotyping (HTP) has expanded the dimensionality of data in plant research; however, HTP has resulted in few novel biological discoveries to date. Field-based HTP (FHTP), using small unoccupied aerial vehicles (UAVs) equipped with imaging sensors, can be deployed routinely to monitor segregating plant population interactions with the environment under biologically meaningful conditions. Here, flowering dates and plant height, important phenological fitness traits, were collected on 520 segregating maize recombinant inbred lines (RILs) in both irrigated and drought stress trials in 2018. Using UAV phenomic, single nucleotide polymorphism (SNP) genomic, as well as combined data, flowering times were predicted using several scenarios. Untested genotypes were predicted with 0.58, 0.59, and 0.41 prediction ability for anthesis, silking, and terminal plant height, respectively, using genomic data, but prediction ability increased to 0.77, 0.76, and 0.58 when phenomic and genomic data were used together. Using the phenomic data in a genome-wide association study, a heat-related candidate gene (GRMZM2G083810 ; hsp18f) was discovered using temporal reflectance phenotypes belonging to flowering times (both irrigated and drought) trials where heat stress also peaked. Thus, a relationship between plants and abiotic stresses belonging to a specific time of growth was revealed only through use of temporal phenomic data. Overall, this study showed that (i) it is possible to predict complex traits using high dimensional phenomic data between different environments, and (ii) temporal phenomic data can reveal a time-dependent association between genotypes and abiotic stresses, which can help understand mechanisms to develop resilient plants. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

19. Parallel inference for massive distributed spatial data using low-rank models

Author: Katzfuss, Matthias and Hammerling, Dorit
Published: 2017
Full Text: View/download PDF

20. Locally anisotropic covariance functions on the sphere

Author: Cao, Jian, Zhang, Jingjie, Sun, Zhuoer, and Katzfuss, Matthias
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Statistics - Methodology
Abstract: Rapid developments in satellite remote-sensing technology have enabled the collection of geospatial data on a global scale, hence increasing the need for covariance functions that can capture spatial dependence on spherical domains. We propose a general method of constructing nonstationary, locally anisotropic covariance functions on the sphere based on covariance functions in R^3. We also provide theorems that specify the conditions under which the resulting correlation function is isotropic or axially symmetric. For large datasets on the sphere commonly seen in modern applications, the Vecchia approximation is used to achieve higher scalability on statistical inference. The importance of flexible covariance structures is demonstrated numerically using simulated data and a precipitation dataset.
Published: 2022

21. Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition

Author: Jurek, Marcin and Katzfuss, Matthias
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Statistics - Computation, Computation (stat.CO), Statistics - Methodology
Abstract: We propose an approximation to the forward-filter-backward-sampler (FFBS) algorithm for large-scale spatio-temporal smoothing. FFBS is commonly used in Bayesian statistics when working with linear Gaussian state-space models, but it requires inverting covariance matrices which have the size of the latent state vector. The computational burden associated with this operation effectively prohibits its applications in high-dimensional settings. We propose a scalable spatio-temporal FFBS approach based on the hierarchical Vecchia approximation of Gaussian processes, which has been previously successfully used in spatial statistics. On simulated and real data, our approach outperformed a low-rank FFBS approximation.
Published: 2022

22. Spatio-Temporal Data Fusion for Very Large Remote Sensing Datasets

Author: Nguyen, Hai, Cressie, Noel, Katzfuss, Matthias, and Braverman, Amy
Published: 2014

23. Scalable Gaussian-process regression and variable selection using Vecchia approximations

Author: Cao, Jian, Guinness, Joseph, Genton, Marc G., and Katzfuss, Matthias
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, ComputingMethodologies_PATTERNRECOGNITION, Statistics - Machine Learning, MathematicsofComputing_NUMERICALANALYSIS, Statistics::Methodology, Machine Learning (stat.ML), Statistics - Methodology
Abstract: Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods. (30 pages, 9 figures)
Published: 2022

24. Scalable spatio‐temporal smoothing via hierarchical sparse Cholesky decomposition.

Author: Jurek, Marcin and Katzfuss, Matthias
Subjects: GAUSSIAN processes, COVARIANCE matrices, KALMAN filtering
Abstract: We propose an approximation to the forward filter backward sampler (FFBS) algorithm for large‐scale spatio‐temporal smoothing. FFBS is commonly used in Bayesian statistics when working with linear Gaussian state‐space models, but it requires inverting covariance matrices which have the size of the latent state vector. The computational burden associated with this operation effectively prohibits its applications in high‐dimensional settings. We propose a scalable spatio‐temporal FFBS approach based on the hierarchical Vecchia approximation of Gaussian processes, which has been previously successfully used in spatial statistics. On simulated and real data, our approach outperformed a low‐rank FFBS approximation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

25. Data fusion and spatial inference for remote sensing

Author: Braverman, Amy, Nguyen, Hai, Kang, Emily, Katzfuss, Matthias, Ma, Pulong, Michalak, Anna, Cressie, Noel, Stough, Tim, and Yadav, Vineet
Published: 2017

26. Data fusion and spatial inference for remote sensing

Author: Yadav, Vineet, Stough, Tim, Cressie, Noel, Michalak, Anna, Ma, Pulong, Katzfuss, Matthias, Kang, Emily, Nguyen, Hai, and Braverman, Amy
Published: 2017

27. Functional ANOVA for Carbon Flux Estimates from Remote Sensing Data.

Author: Hobbs, Jonathan, Katzfuss, Matthias, Nguyen, Hai, Yadav, Vineet, and Liu, Junjie
Subjects: *REMOTE sensing, *CARBON cycle, *ATMOSPHERIC transport, *ANALYSIS of variance, *STATISTICAL models
Abstract: The constellation of Earth-observing satellites now produces atmospheric greenhouse gas concentration estimates across multiple years. Their global coverage is providing additional information on the global carbon cycle. These products are combined with complex inversion systems to infer the magnitude of carbon sources and sinks around the globe. Multiple factors, including the atmospheric transport model and satellite product aggregation method, can impact flux estimates. Functional analysis of variance (ANOVA) invokes a spatio-temporal statistical model to efficiently estimate common flux signals across multiple inversions, and partitions variability across the discrete factors considered. The approach is illustrated on inversion experiments with different satellite retrieval aggregation methods and identifies significant flux anomalies in the presence of mode differences across aggregation methods. Functional ANOVA is also applied to a recent flux model intercomparison project (MIP), and the relative magnitudes of transport model effects and data source (satellite versus in situ) are similar but exhibit slightly different importance for inversions over different continents. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

28. High-Dimensional Nonlinear Spatio-Temporal Filtering by Compressing Hierarchical Sparse Cholesky Factors.

Author: Chakraborty, Anirban and Katzfuss, Matthias
Subjects: *COLUMNS, *COVARIANCE matrices, *KALMAN filtering, *FILTERS & filtration
Abstract: Spatio-temporal filtering is a common and challenging task in many environmental applications, where the evolution is often nonlinear and the dimension of the spatial state may be very high. We propose a scalable filtering approach based on a hierarchical sparse Cholesky representation of the filtering covariance matrix. At each time point, we compress the sparse Cholesky factor into a dense matrix with a small number of columns. After applying the evolution to each of these columns, we decompress to obtain a hierarchical sparse Cholesky factor of the forecast covariance, which can then be updated based on newly available data. We illustrate the Cholesky evolution via an equivalent representation in terms of spatial basis functions. We also demonstrate the advantage of our method in numerical comparisons, including using a high-dimensional and nonlinear Lorenz model. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

29. Bayesian nonstationary and nonparametric covariance estimation for large spatial data

Author: Kidd, Brian and Katzfuss, Matthias
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Applications (stat.AP), Statistics - Applications, Statistics - Computation, Computation (stat.CO), Statistics - Methodology
Abstract: In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the number of spatial locations) entries of the covariance matrix, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matern-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model.
Published: 2020

30. Spatial Surface Reflectance Retrievals for Visible/Shortwave Infrared Remote Sensing via Gaussian Process Priors.

Author: Zilber, Daniel, Thompson, David R., Katzfuss, Matthias, Natraj, Vijay, Hobbs, Jonathan, and Braverman, Amy
Subjects: GAUSSIAN processes, REMOTE sensing, REFLECTANCE, SPECTRAL reflectance, SURFACE of the earth
Abstract: Remote Visible/Shortwave Infrared (VSWIR) imaging spectroscopy is a powerful tool for measuring the composition of Earth's surface over wide areas. This compositional information is captured by the spectral surface reflectance, where distinct shapes and absorption features indicate the chemical, bio- and geophysical properties of the materials in the scene. Estimating this surface reflectance requires removing the influence of atmospheric distortions caused by water vapor and particles. Traditionally reflectance is estimated by considering one location at a time, disentangling atmospheric and surface effects independently at all locations in a scene. However, this approach does not take advantage of spatial correlations between contiguous pixels. We propose an extension to a common Bayesian approach, Optimal Estimation, by introducing atmospheric correlations into the multivariate Gaussian prior. We show how this approach can be implemented as a small change to the traditional estimation procedure, thus limiting the additional computational burden. We demonstrate a simple version of the technique using simulations and multiple airborne radiance data sets. Our results show that the predicted atmospheric fields are smoother and more realistic than independent inversions given the assumption of spatial correlation and may reduce bias in the surface reflectance retrievals compared to post-process smoothing. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

31. Spatial Statistical Data Fusion for Remote Sensing Applications

Author: Nguyen, Hai, Katzfuss, Matthias, Cressie, Noel, and Braverman, Amy
Subjects: Earth Resources And Remote Sensing
Published: 2012

32. Multi-Resolution Filters for Massive Spatio-Temporal Data.

Author: Jurek, Marcin and Katzfuss, Matthias
Subjects: *SPATIOTEMPORAL processes, *GAUSSIAN processes, *MATHEMATICS
Abstract: Spatio-temporal datasets are rapidly growing in size. For example, environmental variables are measured with increasing resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with uncertainty quantification. We focus here on real-time filtering inference in linear Gaussian state-space models. At each time point, the state is a spatial field evaluated on a very large spatial grid, making exact inference using the Kalman filter computationally infeasible. Instead, we propose a multi-resolution filter (MRF), a highly scalable and fully probabilistic filtering method that resolves spatial features at all scales. We prove that the MRF matrices exhibit a particular block-sparse multi-resolution structure that is preserved under filtering operations through time. We describe connections to existing methods, including hierarchical matrices from numerical mathematics. We also discuss inference on time-varying parameters using an approximate Rao-Blackwellized particle filter, in which the integrated likelihood is computed using the MRF. Using a simulation study and a real satellite-data application, we show that the MRF strongly outperforms competing approaches. Supplementary materials include Python code for reproducing the simulations, some detailed properties of the MRF and auxiliary theoretical results. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

33. Interpretation of point forecasts with unknown directive.

Author: Schmidt, Patrick, Katzfuss, Matthias, and Gneiting, Tilmann
Subjects: ECONOMIC forecasting, GENERALIZED method of moments, FORECASTING, TIME series analysis
Abstract: Point forecasts can be interpreted as functionals (i.e., point summaries) of predictive distributions. We extend methodology for the identification of the functional based on time series of point forecasts and associated realizations. Focusing on state‐dependent quantiles and expectiles, we provide a generalized method of moments estimator for the functional, along with tests of optimality under general joint hypotheses of functional relationships and information bases. Our tests are more flexible, and in simulations better calibrated and more powerful than existing solutions. In empirical examples, economic growth forecasts and model output for precipitation are indicative of overstatement in anticipation of extreme events. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

34. Ensemble Kalman Filter Updates Based on Regularized Sparse Inverse Cholesky Factors.

Author: Boyles, Will and Katzfuss, Matthias
Subjects: *KALMAN filtering, *MATRIX inversion, *COVARIANCE matrices, *FORECASTING
Abstract: The ensemble Kalman filter (EnKF) is a popular technique for data assimilation in high-dimensional nonlinear state-space models. The EnKF represents distributions of interest by an ensemble, which is a form of dimension reduction that enables straightforward forecasting even for complicated and expensive evolution operators. However, the EnKF update step involves estimation of the forecast covariance matrix based on the (often small) ensemble, which requires regularization. Many existing regularization techniques rely on spatial localization, which may ignore long-range dependence. Instead, our proposed approach assumes a sparse Cholesky factor of the inverse covariance matrix, and the nonzero Cholesky entries are further regularized. The resulting method is highly flexible and computationally scalable. In our numerical experiments, our approach was more accurate and less sensitive to misspecification of tuning parameters than tapering-based localization. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

35. Sparse Cholesky Factorization by Kullback-Leibler Minimization.

Author: Schäfer, Florian, Katzfuss, Matthias, and Owhadi, Houman
Subjects: *GREEN'S functions, *ELLIPTIC functions, *FACTORIZATION, *GAUSSIAN distribution, *COMPUTATIONAL complexity, *GAUSSIAN processes
Abstract: We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $\Theta$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, \Theta)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $\Theta$ obtained from pairwise evaluation of Green's functions of elliptic boundary-value problems at points $\{x_i\}_{1 \leq i \leq N} \subset \mathbb{R}^d$, we propose an elimination ordering and sparsity pattern that allows us to compute $\epsilon$-approximate inverse Cholesky factors of such $\Theta$ in computational complexity $\mathcal{O}(N \log(N/\epsilon)^d)$ in space and $\mathcal{O}(N \log(N/\epsilon)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by its optimality properties, we propose applying our method to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost. Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, compromising neither accuracy nor computational complexity. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
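
Note on record 35: the closed-form solution mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code. It assumes the commonly stated column-wise formula, in which column i of the factor is obtained from the sub-block of the covariance matrix restricted to that column's sparsity set (with i listed first). The function name `kl_sparse_inverse_cholesky`, the 1D exponential kernel, and the banded pattern are illustrative assumptions, and the ordering and sparsity pattern proposed in the paper are not reproduced here.

```python
import numpy as np

def kl_sparse_inverse_cholesky(theta, sparsity):
    """Sparse lower-triangular L with L @ L.T approximating inv(theta).

    sparsity[i] lists the row indices allowed to be nonzero in column i.
    It must start with i and contain only indices >= i (hypothetical
    input convention chosen for this sketch).
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, s in enumerate(sparsity):
        s = np.asarray(s)
        sub = theta[np.ix_(s, s)]        # covariance restricted to the pattern
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        col = np.linalg.solve(sub, e1)   # sub^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])  # normalise by sqrt(e_1^T sub^{-1} e_1)
    return L

# Toy covariance: exponential kernel on 8 ordered points in [0, 1].
x = np.linspace(0.0, 1.0, 8)
theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

# Dense pattern: every index >= i is allowed in column i.
dense = [list(range(i, len(x))) for i in range(len(x))]
# Banded pattern: column i may only use itself and the next two points.
banded = [list(range(i, min(i + 3, len(x)))) for i in range(len(x))]

L_dense = kl_sparse_inverse_cholesky(theta, dense)
L_banded = kl_sparse_inverse_cholesky(theta, banded)
```

With the dense pattern the formula reproduces an exact Cholesky factor of the precision matrix. Restricting the pattern trades accuracy for sparsity, and each column can be computed independently of the others, which is what makes the approach parallel.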

36. Fine-Scale Spatiotemporal Air Pollution Analysis Using Mobile Monitors on Google Street View Vehicles.

Author: Guan, Yawen, Johnson, Margaret C., Katzfuss, Matthias, Mannshardt, Elizabeth, Messier, Kyle P., Reich, Brian J., and Song, Joon J.
Subjects: AIR pollution, AIR analysis, AIR pollution measurement, AIR quality, CITY traffic, STREETS
Abstract: People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. To make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift in measurement technologies. A methodological framework utilizing these increasingly fine-scale measurements to provide real-time air pollution maps and short-term air quality forecasts on a fine-resolution spatial scale could prove to be instrumental in increasing public awareness and understanding. The Google Street View study provides a unique source of data with spatial and temporal complexities, with the potential to provide information about commuter exposure and hot spots within city streets with high traffic. We develop a computationally efficient spatiotemporal model for these data and use the model to make short-term forecasts and high-resolution maps of current air pollution levels. We also show via an experiment that mobile networks can provide more nuanced information than an equally sized fixed-location network. This modeling framework has important real-world implications in understanding citizens' personal environments, as data production and real-time availability continue to be driven by the ongoing development and improvement of mobile measurement technologies. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

37. Ensemble Kalman Methods for High-Dimensional Hierarchical Dynamic Space-Time Models.

Author: Katzfuss, Matthias, Stroud, Jonathan R., and Wikle, Christopher K.
Subjects: *DYNAMIC models, *PARAMETER estimation, *KALMAN filtering, *STATISTICAL smoothing, *GIBBS sampling, *GEOPHYSICS, *STATE-space methods, *MARKOV chain Monte Carlo
Abstract: We propose a new class of filtering and smoothing methods for inference in high-dimensional, nonlinear, non-Gaussian, spatio-temporal state-space models. The main idea is to combine the ensemble Kalman filter and smoother, developed in the geophysics literature, with state-space algorithms from the statistics literature. Our algorithms address a variety of estimation scenarios, including online and off-line state and parameter estimation. We take a Bayesian perspective, for which the goal is to generate samples from the joint posterior distribution of states and parameters. The key benefit of our approach is the use of ensemble Kalman methods for dimension reduction, which allows inference for high-dimensional state vectors. We compare our methods to existing ones, including ensemble Kalman filters, particle filters, and particle MCMC. Using a real data example of cloud motion and data simulated under a number of nonlinear and non-Gaussian scenarios, we show that our approaches outperform these existing methods. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

38. Multiscale Data Fusion for Surface Soil Moisture Estimation: A Spatial Hierarchical Approach.

Author: Kathuria, Dhruva, Mohanty, Binayak P., and Katzfuss, Matthias
Subjects: MULTISENSOR data fusion, SOIL moisture measurement, MEASUREMENT errors, SOIL moisture
Abstract: Surface soil moisture (SSM) has been identified as a key climate variable governing hydrologic and atmospheric processes across multiple spatial scales at local, regional, and global levels. The global burgeoning of SSM datasets in the past decade holds a significant potential in improving our understanding of multiscale SSM dynamics. The primary issues that hinder the fusion of SSM data from disparate instruments are (1) different spatial resolutions of the data instruments, (2) inherent spatial variability in SSM caused due to atmospheric and land surface controls, and (3) measurement errors caused due to imperfect retrievals of instruments. We present a data fusion scheme which takes all the above three factors into account using a Bayesian spatial hierarchical model (SHM), combining a geostatistical approach with a hierarchical model. The applicability of the fusion scheme is demonstrated by fusing point, airborne, and satellite data for a watershed exhibiting high spatial variability in Manitoba, Canada. We demonstrate that the proposed data fusion scheme is adept at assimilating and predicting SSM distribution across all three scales while accounting for potential measurement errors caused due to imperfect retrievals. Further validation of the algorithm is required in different hydroclimates and surface heterogeneity as well as for other data platforms for wider applicability. Plain Language Summary: Surface soil moisture (SSM) is an essential climate‐variable governing land‐atmosphere interactions. SSM is spatially variable in the presence of changing atmospheric factors such as rainfall and land‐surface characteristics such as soil, vegetation, and topography. SSM is measured using various instruments from point to satellite resolutions (25–40 km) and each instrument is accompanied by its own set of errors. 
Due to the importance of SSM, it would be beneficial to combine the SSM measurements from all available instruments in a region while accounting for the spatially varying nature of SSM and the measurement errors caused due to instruments. We present a novel framework to achieve the abovementioned objective and successfully apply it to a watershed in Manitoba, Canada to combine data from point, airborne, and satellite instruments. We demonstrate that the proposed framework can be used to optimally combine and predict SSM across different spatial resolutions in the presence of uncertainty. Key Points: (1) Proposed a multi‐scale data fusion framework accounting for spatial variance/correlation of soil moisture; (2) The proposed framework optimally separates the inherent soil moisture dynamics and measurement errors in instruments; (3) The framework is applied to combine point, airborne and satellite data in a heterogeneous watershed. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

39. BADER: Bayesian analysis of differential expression in RNA sequencing data

Author: Katzfuss, Matthias, Neudecker, Andreas, Anders, Simon, and Gagneur, Julien
Subjects: Methodology (stat.ME), FOS: Computer and information sciences, Applications (stat.AP), Statistics - Applications, Statistics - Methodology
Abstract: Identifying differentially expressed genes from RNA sequencing data remains a challenging task because of the considerable uncertainties in parameter estimation and the small sample sizes in typical applications. Here we introduce Bayesian Analysis of Differential Expression in RNA-sequencing data (BADER). Due to our choice of data and prior distributions, full posterior inference for BADER can be carried out efficiently. The method appropriately takes uncertainty in gene variance into account, leading to higher power than existing methods in detecting differentially expressed genes. Moreover, we show that the posterior samples can be naturally integrated into downstream gene set enrichment analyses, with excellent performance in detecting enriched sets. An open-source R package (BADER) that provides a user-friendly interface to a C++ back-end is available on Bioconductor. (14 pages, 3 figures, 1 table)
Published: 2014

40. A Nonstationary Geostatistical Framework for Soil Moisture Prediction in the Presence of Surface Heterogeneity.

Author: Kathuria, Dhruva, Mohanty, Binayak P., and Katzfuss, Matthias
Subjects: SOIL moisture, GEOLOGICAL statistics
Abstract: Soil moisture is spatially variable due to complex interactions between geologic, topographic, vegetation, and atmospheric variables. Correct representation of subgrid soil moisture variability is crucial in improving land surface modeling schemes and remote sensing retrievals. In addition to the mean structure, the variance and correlation of soil moisture are affected by the underlying land surface heterogeneity. This often violates the underlying assumption of stationarity/isotropy made by classical geostatistical models. The present study proposes a geostatistical framework to predict and upscale soil moisture in a nonstationary setting using a flexible spatial model whose variance/correlation structure varies with changing land surface characteristics. The proposed framework is applied to model soil moisture distribution using in situ data in the Red River watershed in Southern Manitoba, Canada. It is seen that both the variance and correlation structure exhibit spatial nonstationarity for the given surface heterogeneity driven primarily by vegetation and soil texture. At the beginning of the crop season, soil texture plays a critical role in the drying cycle by decreasing variance and increasing correlation as the soil becomes drier. Once the crops begin to mature, vegetation becomes the dominant driver, promoting spatial correlation and reducing SM variance. We upscale our point scale soil moisture predictions to the airborne extent (∼1.5 km) and find that the upscaled soil moisture agrees well with the observed airborne data with root‐mean‐square error values ranging from 0.04 to 0.08 (v/v). The proposed framework can be used to predict and upscale soil moisture in heterogeneous environments. Plain Language Summary: Soil moisture (SM) is a critical variable governing the global water and energy cycles. Understanding how SM varies in space is therefore critical. 
This spatial variation of SM can be typically defined by three statistical quantities: mean (average value), variance (how far the individual SM values are from the average value), and correlation (how individual SM values are related to each other). Variance/correlation of SM are typically assumed to be constant in traditional geostatistics methods. This is a major shortcoming because it has been well established that land surface characteristics such as soil, vegetation, and topography affect the spatial variability of SM. In this study, we propose a framework that accounts for the effect of these characteristics on the variance/correlation of SM. We apply our framework to a watershed in Manitoba, Canada, and find that our framework performs significantly better than the traditional method. We find that soil texture and vegetation affect SM distribution at different stages of crop growth. We aggregate our point scale SM predictions to 1.5‐km (airborne) scale and find that our predictions mimic observed SM data at this scale. We conclude that our framework can be used to predict and aggregate SM using surface data. Key Points: (1) Proposed a framework to assess spatial nonstationarity of soil moisture; (2) Optimal prediction and upscaling of soil moisture under nonstationarity; (3) Quantified the effects of soil texture and vegetation on the spatial variance/correlation of soil moisture. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

41. Statistical Inference for Massive Distributed Spatial Data Using Low-Rank Models

Author: Katzfuss, Matthias
Abstract: Due to rapid data growth, it is increasingly becoming infeasible to move massive datasets, and statistical analyses have to be carried out where the data reside. If several massive datasets stored in separate physical locations are all relevant to a given problem, the challenge is to obtain valid inference based on all data without moving the datasets. This distributed data problem frequently arises in the geophysical and environmental sciences, for example when a spatial process of interest is measured by several satellite instruments. We show that for the widely used class of spatial low-rank models, which contain a component that can be written as a linear combination of spatial basis functions, computationally feasible spatial inference and prediction for massive distributed data can be carried out exactly and in parallel. The required number of floating-point operations is linear in the number of data points, while the required amount of communication does not depend on the data sizes at all. After discussing several extensions and special cases of this result, we apply our methodology to carry out spatio-temporal filtering inference on total precipitable water measured by three different sensor systems.
Published: 2014
Full Text: View/download PDF

42. A Bayesian Adaptive Ensemble Kalman Filter for Sequential State and Parameter Estimation.

Author: Stroud, Jonathan R., Katzfuss, Matthias, and Wikle, Christopher K.
Subjects: *KALMAN filtering, *PARAMETER estimation, *BAYESIAN analysis, *DISTRIBUTION (Probability theory), *SEQUENTIAL analysis
Abstract: This paper proposes new methodology for sequential state and parameter estimation within the ensemble Kalman filter. The method is fully Bayesian and propagates the joint posterior distribution of states and parameters over time. To implement the method, the authors consider three representations of the marginal posterior distribution of the parameters: a grid-based approach, a Gaussian approximation, and a sequential importance sampling (SIR) approach with kernel resampling. In contrast to existing online parameter estimation algorithms, the new method explicitly accounts for parameter uncertainty and provides a formal way to combine information about the parameters from data at different time periods. The method is illustrated and compared to existing approaches using simulated and real data. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

43. A Bayesian hierarchical model for climate change detection and attribution.

Author: Katzfuss, Matthias, Hammerling, Dorit, and Smith, Richard L.
Published: 2017
Full Text: View/download PDF

44. Bayesian nonstationary spatial modeling for very large datasets.

Author: Katzfuss, Matthias
Subjects: BAYESIAN analysis, ANALYSIS of covariance, APPROXIMATION theory, LOW-rank matrices, STATISTICS
Abstract: With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles, and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets provide two main challenges: first, traditional spatial-statistical techniques are often unable to handle large numbers of observations in a computationally feasible way; second, for large and heterogeneous spatial domains, it is often not appropriate to assume that a process of interest is stationary over the entire domain. We address the first challenge by using a model combining a low-rank component, which allows for flexible modeling of medium-to-long-range dependence via a set of spatial basis functions, with a tapered remainder component, which allows for modeling of local dependence using a compactly supported covariance function. Addressing the second challenge, we propose two extensions to this model that result in increased flexibility: first, the model is parameterized on the basis of a nonstationary Matérn covariance, where the parameters vary smoothly across space; second, in our fully Bayesian model, all components and parameters are considered random, including the number, locations, and shapes of the basis functions used in the low-rank component. Using simulated data and a real-world dataset of high-resolution soil measurements, we show that both extensions can result in substantial improvements over the current state-of-the-art. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

45. Bayesian hierarchical spatio-temporal smoothing for very large datasets.

Author: Katzfuss, Matthias and Cressie, Noel
Subjects: BAYESIAN analysis, ANALYSIS of covariance, MONTE Carlo method, EXPECTATION-maximization algorithms, AUTOREGRESSION (Statistics), STOCHASTIC processes
Abstract: Spatio-temporal statistics is prone to the curse of dimensionality: one manifestation of this is inversion of the data-covariance matrix, which is not in general feasible for very-large-to-massive datasets, such as those observed by satellite instruments. This becomes even more of a problem in fully Bayesian statistical models, where the inversion typically has to be carried out many times in Markov chain Monte Carlo samplers. Here, we propose a Bayesian hierarchical spatio-temporal random effects (STRE) model that offers fast computation: Dimension reduction is achieved by projecting the process onto a basis-function space of low, fixed dimension, and the temporal evolution is modeled using a dynamical autoregressive model in time. We develop a multiresolutional prior for the propagator matrix that allows for unknown (random) sparsity and shrinkage, and we describe how sampling from the posterior distribution can be achieved in a feasible way, even if this matrix is very large. Finally, we compare inference based on our fully Bayesian STRE model with that based on an empirical-Bayesian STRE-model approach, where parameters are estimated via an expectation-maximization algorithm. The comparison is carried out in a simulation study and on a real-world dataset of global satellite CO2 measurements. Copyright © 2011 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

46. Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets.

Author: Katzfuss, Matthias and Cressie, Noel
Subjects: *SMOOTHING (Numerical analysis), *REMOTE sensing, *ALGORITHMS, *DIMENSION reduction (Statistics), *ESTIMATION theory, *SIMULATION methods & models, *PARAMETER estimation
Abstract: The use of satellite measurements in climate studies promises many new scientific insights if those data can be efficiently exploited. Due to sparseness of daily data sets, there is a need to fill spatial gaps and to borrow strength from adjacent days. Nonetheless, these satellites are typically capable of conducting on the order of 100,000 retrievals per day, which makes it impossible to apply traditional spatio-temporal statistical methods, even in supercomputing environments. To overcome these challenges, we make use of a spatio-temporal mixed-effects model. For each massive daily data set, dimension reduction is achieved by essentially modelling the underlying process as a linear combination of spatial basis functions on the globe. The application of a dynamical autoregressive model in time, over the reduced space, allows rapid sequential computation of optimal smoothing predictions via the Kalman smoother; this is known as Fixed Rank Smoothing (FRS). The dimension-reduced mixed-effects model contains a number of unknown parameters, including covariance and propagator matrices, which describe the spatial and temporal dependence structure in the reduced-dimensional process. We take an empirical-Bayes approach to inference, which involves estimating the parameters and substituting them into the optimal predictors. Method-of-moments (MM) parameter estimation (currently used in FRS) is typically inefficient compared to maximum likelihood (ML) estimation and can result in large sampling variability. Here, we develop ML estimation via an expectation-maximization (EM) algorithm, which offers stable computation of valid estimators and makes efficient use of spatial and temporal dependence in the data. The two parameter-estimation approaches, MM and ML, are compared in a simulation study. 
We also apply our methodology to global satellite CO2 measurements: We optimally smooth the sparse daily CO2 maps obtained by the Atmospheric InfraRed Sounder (AIRS) instrument on the Aqua satellite; then, using FRS with EM-estimated parameters, a complete sequence of the daily global CO2 fields can be obtained, together with their associated prediction uncertainties. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

47. Spatio-temporal models for large-scale indicators of extreme weather.

Author: Heaton, Matthew J., Katzfuss, Matthias, Ramachandar, Shahla, Pedings, Kathryn, Gilleland, Eric, Mannshardt-Shamseldin, Elizabeth, and Smith, Richard L.
Subjects: WEATHER, SPATIO-temporal variation, THUNDERSTORMS, TORNADOES
Abstract: Extreme weather events such as thunderstorms and tornadoes are of great concern as these events pose a significant threat to life, property, and economic stability. Because of the difficulty of gathering data on extreme events, this paper proposes modeling the conditions for extreme weather through large-scale indicators. The advantage of using large-scale indicators is that climate models can be used to generate data whereas climate models cannot generate data on extreme events themselves. This paper focuses on comparing spatio-temporal models for reanalysis data of large-scale indicators for extreme weather observed across the continental United States and Mexico. Results indicate that rigorous treatment of spatial and temporal dynamics is necessary. The models find that the intensity of conditions for extreme weather is particularly high for the central United States and the intensity of these conditions is increasing over time but the amount of increase may not be practically significant. Copyright © 2010 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

48. Spatial Retrievals of Atmospheric Carbon Dioxide from Satellite Observations.

Author: Hobbs, Jonathan, Katzfuss, Matthias, Zilber, Daniel, Brynjarsdóttir, Jenný, Mondal, Anirban, Berrocal, Veronica, and Dubovik, Oleg
Subjects: *ATMOSPHERIC carbon dioxide, *WEATHER, *TRACE gases
Abstract: Modern remote-sensing retrievals often invoke a Bayesian approach to infer atmospheric properties from observed radiances. In this approach, plausible mean states and variability for the quantities of interest are encoded in a prior distribution. Recent developments have devised prior assumptions for the correlation among atmospheric constituents and across observing locations. This work formulates a spatial statistical framework for simultaneous multi-footprint retrievals of carbon dioxide (CO2) with application to the Orbiting Carbon Observatory-2/3 (OCO-2/3). Formally, the retrieval state vector is extended to include atmospheric and surface conditions at many footprints in a small region, and a prior distribution that assumes spatial correlation across these locations is assumed. This spatial prior allows the length-scale, or range, of spatial correlation to vary between different elements of the state vector. Various single- and multi-footprint retrievals are compared in a simulation study. A spatial prior that also includes relatively large prior variances for CO2 results in posterior inferences that most accurately represent the true state and that reduce the correlation in retrieval error across locations. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

49. Scalable Gaussian-process regression and variable selection using Vecchia approximations.

Author: Jian Cao, Guinness, Joseph, Genton, Marc G., and Katzfuss, Matthias
Subjects: *GAUSSIAN processes, *NUMERICAL analysis, *MATHEMATICAL variables, *SCALABILITY, *STATISTICS, *SPARSE approximations
Abstract: Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods. [ABSTRACT FROM AUTHOR]
Published: 2022

50. Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide.

Author: Messier KP and Katzfuss M
Abstract: Nitrogen dioxide (NO2) is a primary constituent of traffic-related air pollution and has well established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO2 is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regression (LUR). We develop a scalable approach for simultaneous variable selection and estimation of LUR models with spatiotemporally correlated errors, by combining a general-Vecchia Gaussian-process approximation with a penalty on the LUR coefficients. In comparisons to existing methods using simulated data, our approach resulted in higher model-selection specificity and sensitivity and in better prediction in terms of calibration and sharpness, for a wide range of relevant settings. In our spatiotemporal analysis of daily, US-wide, ground-level NO2 data, our approach was more accurate, and produced a sparser and more interpretable model. Our daily predictions elucidate spatiotemporal patterns of NO2 concentrations across the United States, including significant variations between cities and intra-urban variation. Thus, our predictions will be useful for epidemiological and risk-assessment studies seeking daily, national-scale predictions, and they can be used in acute-outcome health-risk assessments.
Published: 2021
Full Text: View/download PDF

50 results on '"Katzfuss, Matthias"'
