129 results for "Gu, Mengyang"
Search Results
2. The inverse Kalman filter
- Author
- Fang, Xinyi and Gu, Mengyang
- Subjects
- Statistics - Methodology; Statistics - Applications; Statistics - Computation
- Abstract
In this study, we introduce a new approach, the inverse Kalman filter (IKF), which enables accurate matrix-vector multiplication between a covariance matrix from a dynamic linear model and any real-valued vector at linear computational cost. We incorporate the IKF with the conjugate gradient algorithm, which substantially accelerates the computation of matrix inversion for a general form of covariance matrices to which other approximation approaches may not be directly applicable. We demonstrate the scalability and efficiency of the IKF approach through distinct applications, including nonparametric estimation of particle interaction functions and prediction of incomplete lattices of correlated data, using both simulations and real-world observations, including cell trajectories and satellite radar interferograms.
- Published
- 2024
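As background for the abstract above: pairing a fast, matrix-free covariance-vector product with the conjugate gradient algorithm can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation; the explicit `Sigma` below merely stands in for the IKF's linear-cost multiplication.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 500
rng = np.random.default_rng(0)
L = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = L @ L.T + np.eye(n)        # a positive-definite covariance matrix

def fast_matvec(v):
    # Stand-in for the IKF: any routine returning Sigma @ v at linear cost.
    return Sigma @ v

A = LinearOperator((n, n), matvec=fast_matvec)
b = rng.standard_normal(n)
x, info = cg(A, b)                 # solves Sigma x = b without forming Sigma^{-1}
print(info, np.linalg.norm(Sigma @ x - b))
```

Conjugate gradient only ever touches `Sigma` through `fast_matvec`, which is why a linear-cost multiplication routine such as the IKF directly accelerates matrix inversion.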
3. Learning from landmarks, curves, surfaces, and shapes in Geomstats
- Author
- Pereira, Luís F., Brigant, Alice Le, Myers, Adele, Hartman, Emmanuel, Khan, Amil, Tuerkoen, Malik, Dold, Trey, Gu, Mengyang, Suárez-Serrato, Pablo, and Miolane, Nina
- Subjects
- Computer Science - Graphics; Computer Science - Mathematical Software; Mathematics - Differential Geometry
- Abstract
We introduce the shape module of the Python package Geomstats to analyze shapes of objects represented as landmarks, curves, and surfaces across fields of the natural sciences and engineering. The shape module first implements widely used shape spaces, such as the Kendall shape space, as well as elastic spaces of discrete curves and surfaces. It further implements the abstract mathematical structures of group actions, fiber bundles, quotient spaces, and associated Riemannian metrics, which allow users to build their own shape spaces. The Riemannian geometry tools enable users to compare, average, and interpolate between shapes inside a given shape space. These essential operations can then be leveraged to perform statistics and machine learning on shape data. We present the object-oriented implementation of the shape module along with illustrative examples and show how it can be used to perform statistics and machine learning on shape spaces.
- Published
- 2024
4. Perfecting Liquid-State Theories with Machine Intelligence
- Author
- Wu, Jianzhong and Gu, Mengyang
- Subjects
- Physics - Chemical Physics; Condensed Matter - Soft Condensed Matter; Computer Science - Machine Learning; Physics - Computational Physics; Statistics - Applications
- Abstract
Recent years have seen a significant increase in the use of machine intelligence for predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems. However, substantial challenges remain in developing a comprehensive framework capable of handling a wide range of atomic compositions and thermodynamic conditions. This perspective discusses potential future developments in liquid-state theories leveraging recent advancements in functional machine learning. By harnessing the strengths of theoretical analysis and machine learning techniques, including surrogate models, dimension reduction, and uncertainty quantification, we envision that liquid-state theories will gain significant improvements in accuracy, scalability, and computational efficiency, enabling their broader applications across diverse materials and chemical systems.
- Published
- 2023
5. Sequential Kalman filter for fast online changepoint detection in longitudinal health records
- Author
- Li, Hanmo, Wang, Yuedong, and Gu, Mengyang
- Subjects
- Statistics - Applications; Statistics - Methodology
- Abstract
This article introduces the sequential Kalman filter, a computationally scalable approach for online changepoint detection with temporally correlated data. Temporal correlation is not considered in the Bayesian online changepoint detection approach because of its large computational cost. Motivated by detecting COVID-19 infections of dialysis patients from massive longitudinal health records with a large number of covariates, we develop a scalable approach to detect multiple changepoints from correlated data by sequentially stitching Kalman filters of subsequences to compute the joint distribution of the observations. At each time point, the computation is linear in the number of observations between the last detected changepoint and the current observation, and the likelihood function is computed without approximation. Compared with other online changepoint detection methods, simulated experiments show that our approach is more precise in detecting single or multiple changes in mean, variance, or correlation for temporally correlated data. Furthermore, we propose a new way to integrate classification and changepoint detection that improves the detection delay and accuracy for detecting COVID-19 infection compared with other alternatives.
- Published
- 2023
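For context on the linear cost claimed above: a Kalman filter evaluates the joint likelihood through the one-step-ahead predictive decomposition $\log p(y_{1:n}) = \sum_t \log p(y_t \mid y_{1:t-1})$, at constant cost per step. Below is a minimal sketch for a local-level model with illustrative parameters; this is textbook background, not the paper's sequential stitching of filters.

```python
import numpy as np

def kf_loglik(y, sigma_w2=1.0, sigma_v2=1.0, m0=0.0, P0=10.0):
    """Exact log-likelihood of the local-level model
    x_t = x_{t-1} + w_t,  y_t = x_t + v_t, computed in O(n) time."""
    m, P, ll = m0, P0, 0.0
    for yt in y:
        P_pred = P + sigma_w2          # prediction step
        S = P_pred + sigma_v2          # predictive variance of y_t
        e = yt - m                     # innovation
        ll += -0.5 * (np.log(2 * np.pi * S) + e**2 / S)
        K = P_pred / S                 # Kalman gain and update step
        m, P = m + K * e, (1 - K) * P_pred
    return ll

print(kf_loglik(np.random.default_rng(1).standard_normal(100)))
```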
6. Analyzing Disparity and Temporal Progression of Internet Quality through Crowdsourced Measurements with Bias-Correction
- Author
- Lee, Hyeongseong, Paul, Udit, Gupta, Arpit, Belding, Elizabeth, and Gu, Mengyang
- Subjects
- Statistics - Applications; Computer Science - Networking and Internet Architecture
- Abstract
Crowdsourced speedtest measurements are an important tool for studying internet performance from the end-user perspective. Despite the accuracy of individual measurements, however, naive aggregation of these data points is problematic because of their intrinsic sampling bias. In this work, we utilize a dataset of nearly 1 million individual Ookla Speedtest measurements, correlate each data point with 2019 Census demographic data, and develop new methods to quantify regional sampling bias and the relationship of internet performance to demographic profile. Based on a statistical test of homogeneity, we find that the crowdsourced Ookla Speedtest data points contain significant sampling bias across census block groups. We introduce two methods that correct the regional bias by the population of each census block group. Although the sampling bias leads to a discrepancy between the overall cumulative distribution function of internet speed in a city estimated from the original samples and the bias-corrected estimate, this discrepancy is small compared with the sampling heterogeneity across regions. Further, we show that the sampling bias is strongly associated with a few demographic variables, such as income, education level, age, and ethnic distribution. Through regression analysis, we find that regions with higher income, younger populations, and lower representation of Hispanic residents tend to measure faster internet speeds, with substantial collinearity among socioeconomic attributes and ethnic composition. Finally, based on both linear and nonlinear analyses with state space models, we find that average internet speed increases over time, though the regional sampling bias may lead to a small overestimation of this temporal increase.
- Published
- 2023
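The population-based bias correction described above amounts to reweighting regional averages by population shares, as in this minimal post-stratification sketch (the column names and numbers are hypothetical):

```python
import pandas as pd

# Speed tests per census block group; block group 'a' is heavily oversampled.
tests = pd.DataFrame({
    "block_group": ["a", "a", "a", "b"],
    "speed_mbps":  [100.0, 120.0, 110.0, 30.0],
})
population = pd.Series({"a": 1000, "b": 3000})

naive = tests["speed_mbps"].mean()                     # biased estimate: 90.0
by_region = tests.groupby("block_group")["speed_mbps"].mean()
weights = population / population.sum()                # population shares
corrected = (by_region * weights).sum()                # weighted estimate: 50.0
print(naive, corrected)
```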
7. Ab initio uncertainty quantification in scattering analysis of microscopy
- Author
- Gu, Mengyang, He, Yue, Liu, Xubo, and Luo, Yimin
- Subjects
- Physics - Computational Physics; Condensed Matter - Soft Condensed Matter; Physics - Data Analysis, Statistics and Probability; Statistics - Applications; Statistics - Methodology
- Abstract
Estimating parameters from data is a fundamental problem, customarily done by minimizing a loss function between a model and observed statistics. In scattering-based analysis, researchers often employ their domain expertise to select a specific range of wave vectors for analysis, a choice that can vary from case to case. We introduce another paradigm that defines a probabilistic generative model from the beginning of data processing and propagates the uncertainty through to parameter estimation, termed ab initio uncertainty quantification (AIUQ). As an illustrative example, we demonstrate this approach with differential dynamic microscopy (DDM), which extracts dynamical information through Fourier analysis at a selected range of wave vectors. We first show that the conventional way of estimation in DDM is equivalent to fitting a temporal variogram in reciprocal space using a latent factor model. We then derive the maximum marginal likelihood estimator, which optimally weighs the information at all wave vectors, thereby eliminating the need to select a range of wave vectors. Furthermore, we substantially reduce the computational cost by utilizing the generalized Schur algorithm for Toeplitz covariances, without approximation. Simulation studies validate that AIUQ improves estimation accuracy and enables model selection with automated analysis. The utility of AIUQ is also demonstrated by three distinct sets of experiments: first in an isotropic Newtonian fluid, pushing the limits of optically dense systems compared to multiple particle tracking; next in a system undergoing a sol-gel transition, automating the determination of the gelling point and critical exponent; and lastly, in discerning anisotropic diffusive behavior of colloids in a liquid crystal. These outcomes collectively underscore AIUQ's versatility in capturing system dynamics in an efficient and automated manner.
- Comment: 23 pages, 9 figures
- Published
- 2023
- Full Text
- View/download PDF
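For readers unfamiliar with DDM: the image structure function that conventional DDM fits at selected wave vectors is the time-averaged squared modulus of Fourier-transformed frame differences. Below is a minimal numpy sketch of this quantity (illustrative background only; AIUQ's contribution is to weigh all wave vectors through a marginal likelihood instead of selecting a range):

```python
import numpy as np

def image_structure_function(images, lags):
    """D(q, dt) = < |FFT2(I_{t+dt}) - FFT2(I_t)|^2 >_t for each lag dt."""
    fts = np.fft.fft2(images, axes=(1, 2))   # Fourier transform each frame
    T = images.shape[0]
    return {dt: np.mean(np.abs(fts[dt:] - fts[:T - dt]) ** 2, axis=0)
            for dt in lags}

frames = np.random.default_rng(2).standard_normal((50, 64, 64))
D = image_structure_function(frames, lags=[1, 2, 5])
print(D[1].shape)   # one 64 x 64 map over wave vectors per lag
```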
8. Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification
- Author
- Gu, Mengyang, Lin, Yizi, Lee, Victor Chang, and Qiu, Diana Y
- Subjects
- Applied Mathematics; Mathematical Sciences; Bayesian prior; Generative models; Dynamic mode decomposition; Forecast; Gaussian processes; Uncertainty quantification; Fluids & Plasmas; Applied mathematics; Mathematical physics; Numerical and computational mathematics
- Published
- 2024
9. Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification
- Author
- Gu, Mengyang, Lin, Yizi, Lee, Victor Chang, and Qiu, Diana
- Subjects
- Statistics - Methodology; Physics - Data Analysis, Statistics and Probability; Statistics - Applications
- Abstract
Data-driven modeling is useful for reconstructing nonlinear dynamical systems when the underlying process is unknown or too expensive to compute. Reliable uncertainty assessment of the forecast enables such tools to be deployed for predicting previously unobserved scenarios. In this work, we first extend parallel partial Gaussian processes to predict the vector-valued transition function that links the observations between the current and next time points, and we quantify the uncertainty of predictions by posterior sampling. Second, we show the equivalence between dynamic mode decomposition and the maximum likelihood estimator of the linear mapping matrix in a linear state space model. This connection provides a probabilistic generative model of dynamic mode decomposition, from which the uncertainty of predictions can be obtained. Furthermore, we draw close connections between different data-driven models for approximating nonlinear dynamics through a unified view of generative models. We study two numerical examples, where the inputs of the dynamics are known in the first example and unknown in the second. The examples indicate that forecast uncertainty can be properly quantified, whereas model or input misspecification can degrade the accuracy of uncertainty quantification.
- Published
- 2023
- Full Text
- View/download PDF
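The dynamic mode decomposition estimate that the paper identifies with the maximum likelihood estimator in a linear state space model is a least-squares fit of the one-step linear map. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
X = np.zeros((2, 200))
X[:, 0] = rng.standard_normal(2)
for t in range(199):                       # simulate x_{t+1} = A x_t + noise
    X[:, t + 1] = A_true @ X[:, t] + 0.05 * rng.standard_normal(2)

Y0, Y1 = X[:, :-1], X[:, 1:]
A_hat = Y1 @ np.linalg.pinv(Y0)            # DMD / least-squares estimate of A
print(np.round(A_hat, 2))
```

Viewing `A_hat` as an MLE in a probabilistic generative model is what allows forecast uncertainty to be attached to DMD predictions.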
10. A Nonparametric Mixed-Effects Mixture Model for Patterns of Clinical Measurements Associated with COVID-19
- Author
- Ma, Xiaoran, Guo, Wensheng, Gu, Mengyang, Usvyat, Len, Kotanko, Peter, and Wang, Yuedong
- Subjects
- Statistics - Methodology
- Abstract
Some patients with COVID-19 show changes in signs and symptoms such as temperature and oxygen saturation days before testing positive for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand what biological and clinical predictors are related to them. This information will provide insights into how the immune system may respond differently to infection and can further be used to identify infected individuals. We propose a flexible nonparametric mixed-effects mixture model that identifies risk factors and classifies patients with biological changes. We model the latent probability of biological changes using a logistic regression model and the trajectories in the latent groups using smoothing splines. We develop an EM algorithm to maximize the penalized likelihood for estimating all parameters and mean functions. We evaluate our methods by simulations and apply the proposed model to investigate changes in temperature in a cohort of COVID-19-infected hemodialysis patients.
- Published
- 2023
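As background for the EM algorithm mentioned above, here is a minimal EM for a two-component Gaussian mixture; the paper's model is considerably richer, replacing the component means with smoothing splines and adding mixed effects and a logistic model for the latent class probability:

```python
import numpy as np

def em_gmm2(y, iters=200):
    """EM for a two-component univariate Gaussian mixture (minimal sketch)."""
    pi = 0.5
    mu = np.array([y.min(), y.max()])
    sd = np.array([y.std(), y.std()])
    for _ in range(iters):
        # E-step: posterior responsibility of component 2 for each point
        d1 = np.exp(-0.5 * ((y - mu[0]) / sd[0]) ** 2) / sd[0]
        d2 = np.exp(-0.5 * ((y - mu[1]) / sd[1]) ** 2) / sd[1]
        r = pi * d2 / ((1 - pi) * d1 + pi * d2)
        # M-step: update mixing weight, means, and standard deviations
        pi = r.mean()
        mu = np.array([np.average(y, weights=1 - r), np.average(y, weights=r)])
        sd = np.sqrt([np.average((y - mu[0]) ** 2, weights=1 - r),
                      np.average((y - mu[1]) ** 2, weights=r)])
    return pi, mu, sd

rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])
print(em_gmm2(y))
```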
11. Data-Driven Model Construction for Anisotropic Dynamics of Active Matter
- Author
- Gu, Mengyang, Fang, Xinyi, and Luo, Yimin
- Published
- 2023
12. Molecular-scale substrate anisotropy and crowding drive long-range nematic order of cell monolayers
- Author
- Luo, Yimin, Gu, Mengyang, Park, Minwook, Fang, Xinyi, Kwon, Younghoon, Urueña, Juan Manuel, de Alaniz, Javier Read, Helgeson, Matthew E., Marchetti, M. Cristina, and Valentine, Megan T.
- Subjects
- Physics - Biological Physics; Condensed Matter - Soft Condensed Matter
- Abstract
The ability of cells to reorganize in response to external stimuli is important in areas ranging from morphogenesis to tissue engineering. Elongated cells can co-align due to steric effects, forming states with local order. We show that molecular-scale substrate anisotropy can direct cell organization, resulting in the emergence of nematic order on tissue scales. To quantitatively examine the disorder-order transition, we developed a high-throughput imaging platform to analyze velocity and orientational correlations for several thousand cells over days. The establishment of global, seemingly long-ranged order is facilitated by enhanced cell division along the substrate's nematic axis, and associated extensile stresses that restructure the cells' actomyosin networks. Our work, which connects to a class of systems known as active dry nematics, provides a new understanding of the dynamics of cellular remodeling and organization in weakly interacting cell collectives. This enables data-driven discovery of cell-cell interactions and points to strategies for tissue engineering.
- Comment: 29 pages, 7 figures
- Published
- 2022
- Full Text
- View/download PDF
13. Reliable emulation of complex functionals by active learning with error control
- Author
- Fang, Xinyi, Gu, Mengyang, and Wu, Jianzhong
- Subjects
- Physics - Chemical Physics; Statistics - Applications
- Abstract
A statistical emulator can be used as a surrogate for complex physics-based calculations to drastically reduce the computational cost. Its successful implementation hinges on an accurate representation of the nonlinear response surface with a high-dimensional input space. Conventional "space-filling" designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensionality of the input variables increases, and the predictive accuracy of the emulator can degrade substantially for a test input distant from the training input set. To address this fundamental challenge, we develop a reliable emulator for predicting complex functionals by active learning with error control (ALEC). The algorithm is applicable to infinite-dimensional mappings, with high-fidelity predictions and a controlled predictive error. Its computational efficiency is demonstrated by emulating classical density functional theory (cDFT) calculations, a statistical-mechanical method widely used in modeling the equilibrium properties of complex molecular systems. We show that ALEC is much more accurate than conventional emulators based on Gaussian processes with "space-filling" designs and alternative active learning methods. Moreover, it is computationally more efficient than direct cDFT calculations. ALEC can be a reliable building block for emulating expensive functionals owing to its minimal computational cost, controllable predictive error, and fully automatic features.
- Comment: 15 pages, 10 figures
- Published
- 2022
- Full Text
- View/download PDF
14. Scalable marginalization of correlated latent variables with applications to learning particle interaction kernels
- Author
- Gu, Mengyang, Liu, Xubo, Fang, Xinyi, and Tang, Sui
- Subjects
- Statistics - Computation
- Abstract
Marginalization of latent variables or nuisance parameters is a fundamental aspect of Bayesian inference and uncertainty quantification. In this work, we focus on scalable marginalization of latent variables in modeling correlated data, such as spatio-temporal or functional observations. We first introduce Gaussian processes (GPs) for modeling correlated data and highlight the computational challenge: the computational complexity increases cubically with the number of observations. We then review the connection between the state space model and GPs with Matérn covariance for temporal inputs. The Kalman filter and Rauch-Tung-Striebel smoother are introduced as a scalable marginalization technique for computing the likelihood and making predictions of GPs without approximation. We then introduce recent efforts to extend the scalable marginalization idea to the linear model of coregionalization for multivariate correlated output and spatio-temporal observations. In the final part of this work, we introduce a novel marginalization technique for estimating interaction kernels and forecasting particle trajectories. The key advance lies in a sparse representation of the covariance function, combined with the conjugate gradient algorithm for solving the computational challenges and improving predictive accuracy. The computational advances achieved in this work outline a wide range of applications in molecular dynamics simulation, cellular migration, and agent-based models.
- Published
- 2022
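The cubic cost highlighted above comes from the standard marginal likelihood of a zero-mean GP with $n \times n$ covariance matrix $K_{\boldsymbol{\theta}}$:

$$\log p(\mathbf{y} \mid \boldsymbol{\theta}) = -\frac{1}{2}\,\mathbf{y}^\top K_{\boldsymbol{\theta}}^{-1}\mathbf{y} - \frac{1}{2}\log \det K_{\boldsymbol{\theta}} - \frac{n}{2}\log(2\pi).$$

Direct evaluation requires a Cholesky factorization of $K_{\boldsymbol{\theta}}$ at $O(n^3)$ cost; for Matérn covariances with temporal inputs, the Kalman filter recursion reviewed above evaluates the same quantity exactly in $O(n)$.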
15. RobustCalibration: Robust Calibration of Computer Models in R
- Author
- Gu, Mengyang
- Subjects
- Statistics - Computation; Statistics - Applications
- Abstract
Two fundamental research tasks in science and engineering are forward prediction and data inversion. This article introduces the recent R package RobustCalibration for Bayesian data inversion and model calibration using experiments and field observations. Mathematical models for forward predictions are often written in computer code, and they can be computationally expensive to run. To overcome this computational bottleneck, we implement a statistical emulator from the RobustGaSP package for emulating both scalar-valued and vector-valued computer model outputs. Both posterior sampling and the maximum likelihood approach are implemented in the RobustCalibration package for parameter estimation. For imperfect computer models, we implement the Gaussian stochastic process and the scaled Gaussian stochastic process for modeling the discrepancy function between reality and the mathematical model. The package is applicable to various types of field observations, such as repeated experiments and multiple sources of measurements. We discuss numerical examples of calibrating mathematical models that have closed-form expressions, as well as differential equations solved by numerical methods.
- Published
- 2022
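The discrepancy formulation described above follows the standard form of Bayesian model calibration (notation ours, schematically):

$$y^F(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol{\theta}) + \delta(\mathbf{x}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma_0^2),$$

where $y^F$ is a field observation at input $\mathbf{x}$, $f^M$ is the computer model output at calibration parameters $\boldsymbol{\theta}$, $\delta$ is the discrepancy function modeled by a Gaussian stochastic process or scaled Gaussian stochastic process, and $\epsilon$ is measurement noise.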
16. Characterizing Performance Inequity Across U.S. Ookla Speedtest Users
- Author
- Paul, Udit, Liu, Jiamo, Adarsh, Vivek, Gu, Mengyang, Gupta, Arpit, and Belding, Elizabeth
- Subjects
- Computer Science - Networking and Internet Architecture
- Abstract
The Internet has become indispensable to daily activities, such as work, education, and health care. Many of these activities require Internet access data rates that support real-time video conferencing. However, digital inequality persists across the United States, not only in who has access but in the quality of that access. Speedtest by Ookla allows users to run network diagnostic tests to better understand the current performance of their network. In this work, we leverage an Internet performance dataset from Ookla, together with an ESRI demographic dataset, to conduct a comprehensive analysis that characterizes performance differences between Speedtest users across the U.S. Our analysis shows that median download speeds for Speedtest users can differ by over 150 Mbps between states. Further, there are important distinctions between user categories. For instance, all but one state showed statistically significant differences in performance between Speedtest users in urban and rural areas. In 27 states, the difference also exists between high- and low-income users within urban areas. Our analysis reveals that the states demonstrating this disparity in Speedtest results are geographically larger, more populous, and have a wider dispersion of median household income. We conclude by highlighting several challenges in the complex problem space of digital inequality characterization and provide recommendations for furthering research on this topic.
- Comment: 10 pages, 5 figures, 3 tables
- Published
- 2021
17. Efficient force field and energy emulation through partition of permutationally equivalent atoms
- Author
- Li, Hao, Zhou, Musen, Sebastian, Jessalyn, Wu, Jianzhong, and Gu, Mengyang
- Subjects
- Physics - Chemical Physics; Statistics - Applications
- Abstract
The Gaussian process (GP) emulator has been used as a surrogate model for predicting force fields and molecular potentials, to overcome the computational bottleneck of molecular dynamics simulation. Integrating both atomic force and energy in predictions was found to be more accurate than using energy alone, yet it requires $O((NM)^3)$ computational operations for computing the likelihood function and making predictions, where $N$ is the number of atoms and $M$ is the number of simulated configurations in the training sample, due to the inversion of a large covariance matrix. This large computational need limits applications to the emulation of small molecules. The computational challenge of using both gradient information and function values in GPs was recently noticed in the statistics and machine learning communities, where conventional approximation methods, such as low-rank decomposition or sparse approximation, may not work well. Here we introduce a new approach, the atomized force field (AFF) model, that integrates both force and energy in the emulator with many fewer computational operations. The drastic reduction in computation is achieved by utilizing the naturally sparse structure of the covariance that satisfies the constraints of energy conservation and permutation symmetry of atoms. The efficient machine learning algorithm extends the limits of its applications to larger molecules under the same computational budget, with nearly no loss of predictive accuracy. Furthermore, our approach provides uncertainty assessment of the predicted atomic forces and potentials, useful for developing a sequential design over the chemical input space, with almost no increase in computational cost.
- Published
- 2021
- Full Text
- View/download PDF
18. Correction: Evaluation of Schlemm’s canal with swept-source optical coherence tomography in primary angle-closure disease
- Author
- Ding, Xuming, Huang, Lulu, Peng, Cheng, Xu, Li, Liu, Yixin, Yang, Yijie, Wang, Ning, Gu, Mengyang, Sun, Chengyang, Wu, Yue, and Guo, Wenyi
- Published
- 2023
- Full Text
- View/download PDF
19. Evaluation of Schlemm’s canal with swept-source optical coherence tomography in primary angle-closure disease
- Author
- Ding, Xuming, Huang, Lulu, Peng, Cheng, Xu, Li, Liu, Yixin, Yang, Yijie, Wang, Ning, Gu, Mengyang, Sun, Chengyang, Wu, Yue, and Guo, Wenyi
- Published
- 2023
- Full Text
- View/download PDF
20. Uncertainty quantification and estimation in differential dynamic microscopy
- Author
- Gu, Mengyang, Luo, Yimin, He, Yue, Helgeson, Matthew E., and Valentine, Megan T.
- Subjects
- Condensed Matter - Soft Condensed Matter; Physics - Data Analysis, Statistics and Probability; Statistics - Applications
- Abstract
Differential dynamic microscopy (DDM) is a form of video image analysis that combines the sensitivity of scattering with the direct visualization benefits of microscopy. DDM is broadly useful for determining dynamical properties, including the intermediate scattering function, for many spatiotemporally correlated systems. Despite its straightforward analysis, DDM has not been fully adopted as a routine characterization tool, largely due to computational cost and lack of algorithmic robustness. We present a statistical analysis that quantifies the noise, reduces the computational order, and enhances the robustness of DDM analysis. We propagate the image noise through the Fourier analysis, which allows us to comprehensively study the bias in different estimators of model parameters, and we derive a way to detect whether the bias is negligible. Furthermore, through the use of Gaussian process regression (GPR), we find that predictive samples of the image structure function require only around 0.5%-5% of the Fourier transforms of the observed quantities. This vastly reduces computational cost, while preserving information about the quantities of interest, such as quantiles of the image scattering function, for subsequent analysis. The approach, which we call DDM with uncertainty quantification (DDM-UQ), is validated using both simulations and experiments with respect to accuracy and computational efficiency, as compared with conventional DDM and multiple particle tracking. Overall, we propose that DDM-UQ lays the foundation for important new applications of DDM, as well as high-throughput characterization. We implement the fast computation tool in a new, publicly available MATLAB software package.
- Comment: Published in Physical Review E. 24 pages, 12 figures. Typos in Section 2B are corrected.
- Published
- 2021
- Full Text
- View/download PDF
21. Gut microbiota compositional profile in patients with posner-schlossman syndrome
- Author
- Wang, Ning, Sun, Chengyang, Ju, Yahan, Huang, Lulu, Liu, Yixin, Gu, Mengyang, Xu, Chenrui, Wang, Minghan, Wu, Yue, Zhang, Dandan, Xu, Li, and Guo, Wenyi
- Published
- 2024
- Full Text
- View/download PDF
22. Gaussian orthogonal latent factor processes for large incomplete matrices of correlated data
- Author
- Gu, Mengyang and Li, Hanmo
- Subjects
- Statistics - Methodology; Statistics - Applications; Statistics - Computation
- Abstract
We introduce Gaussian orthogonal latent factor processes for modeling and predicting large correlated data. To handle the computational challenge, we first decompose the likelihood function of the Gaussian random field with a multi-dimensional input domain into a product of densities at the orthogonal components with lower-dimensional inputs. The continuous-time Kalman filter is implemented to compute the likelihood function efficiently without making approximations. We also show that the posterior distributions of the factor processes are independent, as a consequence of the prior independence of the factor processes and the orthogonality of the factor loading matrix. For studies with large sample sizes, we propose a flexible way to model the mean, and we derive the marginal posterior distribution to solve identifiability issues in sampling these parameters. Both simulated and real data applications confirm the outstanding performance of this method.
- Published
- 2020
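Schematically (notation ours), the model combines an orthogonal factor loading matrix with independent factor processes:

$$\mathbf{y}(\mathbf{x}) = \mathbf{A}\,\mathbf{z}(\mathbf{x}) + \boldsymbol{\epsilon}(\mathbf{x}), \qquad \mathbf{A}^\top \mathbf{A} = \mathbf{I}_d,$$

and the orthogonality of $\mathbf{A}$ is what allows the likelihood to factor into densities of the lower-dimensional projected processes, each of which the continuous-time Kalman filter can evaluate efficiently.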
23. Robust estimation of SARS-CoV-2 epidemic in US counties
- Author
- Li, Hanmo and Gu, Mengyang
- Subjects
- Statistics - Applications
- Abstract
The COVID-19 outbreak is asynchronous across US counties. Mitigating COVID-19 transmission requires not only state and federal-level orders of protective measures, such as social distancing and testing, but also public awareness of time-dependent risk and reactions at the county and community levels. We propose a robust approach to estimate the heterogeneous progression of SARS-CoV-2 in all US counties with no fewer than 2 COVID-19-associated deaths, and we use the daily probability of contracting (PoC) SARS-CoV-2 for a susceptible individual to quantify the risk of SARS-CoV-2 transmission in a community. We found that shortening the infectious period of SARS-CoV-2 by 5% can reduce around 39% (or 78K, 95% CI: [66K, 89K]) of the COVID-19-associated deaths in the US as of 20 September 2020. Our findings also indicate that reducing infections and deaths by a shortened infectious period is more pronounced for areas with an effective reproduction number close to 1, suggesting that testing should be used along with other mitigation measures, such as social distancing and facial mask-wearing, to reduce the transmission rate. Our deliverable includes a dynamic county-level map for local officials to determine optimal policy responses and for the public to better understand the risk of contracting SARS-CoV-2 on each day.
- Published
- 2020
24. Emulating the First Principles of Matter: A Probabilistic Roadmap
- Author
- Wu, Jianzhong and Gu, Mengyang
- Subjects
- Computer Science - Computational Engineering, Finance, and Science; Physics - Computational Physics; Statistics - Applications
- Abstract
This chapter provides a tutorial overview of first-principles methods for describing the properties of matter at the ground state or equilibrium. It begins with a brief introduction to quantum and statistical mechanics for predicting the electronic structure and diverse static properties of many-particle systems useful for practical applications. Pedagogical examples are given to illustrate the basic concepts and simple applications of quantum Monte Carlo and density functional theory, two representative methods commonly used in the literature of first-principles modeling. In addition, this chapter highlights the practical need for integrating physics-based modeling and data-science approaches to reduce the computational cost and expand the scope of applicability. A special emphasis is placed on recent developments of statistical surrogate models that emulate first-principles calculations from a probabilistic point of view. The probabilistic approach provides an internal assessment of the approximation accuracy of emulation that quantifies the uncertainty in predictions. Various recent advances in this direction establish a new marriage between Gaussian processes and first-principles calculations, with physical properties, such as translational, rotational, and permutation symmetry, naturally encoded in new kernel functions. Finally, it concludes with some prospects for future advances in the field toward faster yet more accurate computation, leveraging a synergistic combination of novel theoretical concepts and efficient numerical algorithms.
- Published
- 2020
25. Robust estimation of SARS-CoV-2 epidemic in US counties.
- Author
- Li, Hanmo and Gu, Mengyang
- Subjects
- Humans; Masks; United States; Basic Reproduction Number; COVID-19; SARS-CoV-2; Physical Distancing
- Abstract
The COVID-19 outbreak is asynchronous across US counties. Mitigating COVID-19 transmission requires not only state and federal-level orders of protective measures, such as social distancing and testing, but also public awareness of time-dependent risk and reactions at the county and community levels. We propose a robust approach to estimate the heterogeneous progression of SARS-CoV-2 in all US counties with no fewer than 2 COVID-19-associated deaths, and we use the daily probability of contracting (PoC) SARS-CoV-2 for a susceptible individual to quantify the risk of SARS-CoV-2 transmission in a community. We found that shortening the infectious period of SARS-CoV-2 by 5% can reduce around 39% (or 78K, 95% CI: [66K, 89K]) of the COVID-19-associated deaths in the US as of 20 September 2020. Our findings also indicate that reducing infections and deaths by a shortened infectious period is more pronounced for areas with an effective reproduction number close to 1, suggesting that testing should be used along with other mitigation measures, such as social distancing and facial mask-wearing, to reduce the transmission rate. Our deliverable includes a dynamic county-level map for local officials to determine optimal policy responses and for the public to better understand the risk of contracting SARS-CoV-2 on each day.
- Published
- 2021
26. Aryl hydrocarbon receptor dependent anti-inflammation and neuroprotective effects of tryptophan metabolites on retinal ischemia/reperfusion injury
- Author
- Yang, Yijie, Wang, Ning, Xu, Li, Liu, Yixin, Huang, Lulu, Gu, Mengyang, Wu, Yue, Guo, Wenyi, and Sun, Hao
- Published
- 2023
- Full Text
- View/download PDF
27. Calibration of imperfect geophysical models by multiple satellite interferograms with measurement bias
- Author
- Gu, Mengyang, Anderson, Kyle, and McPhillips, Erika
- Subjects
- Statistics - Methodology
- Abstract
Model calibration consists of using experimental or field data to estimate the unknown parameters of a mathematical model. The presence of model discrepancy and measurement bias in the data complicates this task. Satellite interferograms, for instance, are widely used for calibrating geophysical models in geological hazard quantification. In this work, we used satellite interferograms to relate ground deformation observations to the properties of the magma chamber at Kīlauea Volcano in Hawai'i. We derived closed-form marginal likelihoods and implemented posterior sampling procedures that simultaneously estimate the model discrepancy of physical models and the measurement bias from the atmospheric error in satellite interferograms. We found that model calibration by aggregating multiple interferograms and downsampling the pixels in the interferograms can reduce the computational complexity compared to calibration approaches based on multiple data sets. The conditions that lead to no loss of information from data aggregation and downsampling are studied. Simulations illustrate that both discrepancy and measurement bias can be estimated, and real applications demonstrate that modeling both effects helps obtain a reliable estimate of a physical model's unobserved parameters and enhances its predictive accuracy. We implement the computational tools in the RobustCalibration package available on CRAN.
- Published
- 2018
- Full Text
- View/download PDF
28. Generalized probabilistic principal component analysis of correlated data
- Author
- Gu, Mengyang and Shen, Weining
- Subjects
- Statistics - Methodology
- Abstract
Principal component analysis (PCA) is a well-established tool in machine learning and data processing. The principal axes in PCA were shown to be equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independence assumption may be unrealistic for many scenarios, such as modeling multiple time series, spatial processes, and functional data, where the outcomes are correlated. In this paper, we introduce generalized probabilistic principal component analysis (GPPCA) to study the latent factor model for multiple correlated outcomes, where each factor is modeled by a Gaussian process. Our method generalizes the previous probabilistic formulation of PCA (PPCA) by providing the closed-form maximum marginal likelihood estimator of the factor loadings and other parameters. Based on the explicit expression of the precision matrix in the marginal likelihood that we derived, the number of computational operations is linear in the number of output variables. Furthermore, we also provide the closed-form expression of the marginal likelihood when other covariates are included in the mean structure. We highlight the advantages of GPPCA in terms of practical relevance, estimation accuracy, and computational convenience. Numerical studies of simulated and real data confirm the excellent finite-sample performance of the proposed approach.
- Published
- 2018
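In latent factor notation (ours, for orientation), GPPCA replaces PPCA's independent standard-normal factors with Gaussian processes over the shared input:

$$\mathbf{y}(x) = \mathbf{A}\,\mathbf{z}(x) + \boldsymbol{\epsilon}, \qquad z_j(\cdot) \sim \mathcal{GP}(0, k_j), \quad j = 1, \dots, d,$$

so that PPCA is recovered when each $z_j(x)$ is independent standard normal across inputs $x$, while correlated outcomes such as time series are handled through the kernels $k_j$.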
29. Nonparametric estimation of utility functions
- Author
- Gu, Mengyang, Bhattacharjya, Debarun, and Subramanian, Dharmashankar
- Subjects
- Statistics - Applications
- Abstract
Inferring a decision maker's utility function typically involves an elicitation phase where the decision maker responds to a series of elicitation queries, followed by an estimation phase where the state-of-the-art is to either fit the response data to a parametric form (such as the exponential or power function) or perform linear interpolation. We introduce a Bayesian nonparametric method involving Gaussian stochastic processes for estimating a utility function. Advantages include the flexibility to fit a large class of functions, favorable theoretical properties, and a fully probabilistic view of the decision maker's preference properties including risk attitude. Using extensive simulation experiments as well as two real datasets from the literature, we demonstrate that the proposed approach yields estimates with lower mean squared errors. While our focus is primarily on single-attribute utility functions, one of the real datasets involves three attributes; the results indicate that nonparametric methods also seem promising for multi-attribute utility function estimation.
- Published
- 2018
30. A theoretical framework of the scaled Gaussian stochastic process in prediction and calibration
- Author
- Gu, Mengyang, Xie, Fangzheng, and Wang, Long
- Subjects
- Mathematics - Statistics Theory
- Abstract
Model calibration or data inversion is one of the fundamental tasks in uncertainty quantification. In this work, we study the theoretical properties of the scaled Gaussian stochastic process (S-GaSP) for modeling the discrepancy between reality and an imperfect mathematical model. We establish an explicit connection between the Gaussian stochastic process (GaSP) and the S-GaSP through an orthogonal series representation. The predictive mean estimator in the S-GaSP calibration model converges to the reality at the same rate as the GaSP with a suitable choice of the regularization and scaling parameters. We also show that the calibrated mathematical model in the S-GaSP calibration converges to the one that minimizes the $L_2$ loss between the reality and the mathematical model, whereas the GaSP model with other widely used covariance functions does not have this property. Numerical examples confirm the excellent finite-sample performance of our approaches compared with a few recent alternatives.
- Published
- 2018
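The $L_2$-minimizing parameter to which the calibrated model is shown to converge can be written as (notation ours):

$$\boldsymbol{\theta}^*_{L_2} = \operatorname*{argmin}_{\boldsymbol{\theta}} \int_{\mathcal{X}} \big(y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol{\theta})\big)^2 \, d\mathbf{x},$$

where $y^R$ denotes the reality and $f^M$ the mathematical model.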
31. Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection
- Author
- Gu, Mengyang
- Subjects
- Statistics - Methodology
- Abstract
The Gaussian stochastic process (GaSP) has been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, have been studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly robust prior, for both emulation and calibration. This prior is designed to maintain various advantages of the reference prior. In emulation, the jointly robust prior has an appropriate tail decay rate, like the reference prior, and is computationally simpler than the reference prior in parameter estimation. Moreover, the marginal posterior mode estimation with the jointly robust prior can separate the influential and inert inputs in mathematical models, while the reference prior does not have this property. We establish posterior propriety for a large class of priors in calibration, including the reference prior and the jointly robust prior in general scenarios, but the jointly robust prior is preferred because the calibrated mathematical model typically predicts the reality well. The jointly robust prior is used as the default prior in two new R packages, called "RobustGaSP" and "RobustCalibration", available on CRAN for emulation and calibration, respectively.
- Published
- 2018
32. RobustGaSP: Robust Gaussian Stochastic Process Emulation in R
- Author
- Gu, Mengyang, Palomo, Jesús, and Berger, James O.
- Subjects
- Statistics - Computation
- Abstract
Gaussian stochastic process (GaSP) emulation is a powerful tool for approximating computationally intensive computer models. However, estimation of parameters in the GaSP emulator is a challenging task. No closed-form estimator is available, and many numerical problems arise with standard estimates, e.g., the maximum likelihood estimator. In this package, we implement a marginal posterior mode estimator for special priors and parameterizations, an estimation method that meets the robust parameter estimation criteria discussed in Gu et al. (2018); mathematical reasons are provided therein to explain why robust parameter estimation can greatly improve the predictive performance of the emulator. In addition, inert inputs (inputs that have almost no effect on the variability of a function) can be identified from the marginal posterior mode estimation at no extra computational cost. The package also implements the parallel partial Gaussian stochastic process (PP GaSP) emulator (Gu and Berger, 2016) for the scenario where the computer model has multiple outputs on, e.g., spatial-temporal coordinates. The package can be operated in a default mode, but also allows numerous user specifications, such as the capability of specifying trend functions and noise terms. Examples are studied herein to highlight the performance of the package in terms of out-of-sample prediction.
- Published
- 2018
33. Magnolol limits NFκB-dependent inflammation by targeting PPARγ relieving retinal ischemia/reperfusion injury
- Author
- Wang, Ning, Yang, Yijie, Liu, Yixin, Huang, Lulu, Gu, Mengyang, Wu, Yue, Xu, Li, Sun, Hao, and Guo, Wenyi
- Published
- 2022
- Full Text
- View/download PDF
34. Fast Nonseparable Gaussian Stochastic Process with Application to Methylation Level Interpolation
- Author
- Gu, Mengyang and Xu, Yanxun
- Subjects
- Statistics - Methodology
- Abstract
The Gaussian stochastic process (GaSP) has been widely used as a prior over functions due to its flexibility and tractability in modeling. However, the computational cost of evaluating the likelihood is $O(n^3)$, where $n$ is the number of observed points in the process, as it requires inverting the covariance matrix. This bottleneck prevents the GaSP from being widely used with large-scale data. We propose a general class of nonseparable GaSP models for multiple functional observations with a fast and exact algorithm, in which the computation is linear ($O(n)$) and exact, requiring no approximation to compute the likelihood. We show that the commonly used linear regression and separable models are special cases of the proposed nonseparable GaSP model. Through the study of an epigenetic application, the proposed nonseparable GaSP model accurately predicts genome-wide DNA methylation levels and compares favorably to alternative methods, such as linear regression, random forest, and the localized Kriging method. The algorithm for fast computation is implemented in the FastGaSP R package on CRAN.
- Comment: Published version of the paper. Typos in the joint distribution in the supplementary materials are corrected.
- Published
- 2017
- Full Text
- View/download PDF
35. Robust Gaussian Stochastic Process Emulation
- Author
- Gu, Mengyang, Wang, Xiaojing, and Berger, James O.
- Subjects
- Mathematics - Statistics Theory
- Abstract
We consider estimation of the parameters of a Gaussian Stochastic Process (GaSP), in the context of emulation (approximation) of computer models for which the outcomes are real-valued scalars. The main focus is on estimation of the GaSP parameters through various generalized maximum likelihood methods, mostly involving finding posterior modes; this is because full Bayesian analysis in computer model emulation is typically prohibitively expensive. The posterior modes that are studied arise from objective priors, such as the reference prior. These priors have been studied in the literature for the situation of an isotropic covariance function or under the assumption of separability in the design of inputs for model runs used in the GaSP construction. In this paper, we consider more general designs (e.g., a Latin Hypercube Design) with a class of commonly used anisotropic correlation functions, which can be written as a product of isotropic correlation functions, each having an unknown range parameter and a fixed roughness parameter. We discuss properties of the objective priors and marginal likelihoods for the parameters of the GaSP and establish the posterior propriety of the GaSP parameters, but our main focus is to demonstrate that certain parameterizations result in more robust estimation of the GaSP parameters than others, and that some parameterizations that are in common use should clearly be avoided. These results are applicable to many frequently used covariance functions, e.g., power exponential, Matérn, rational quadratic and spherical covariance. We also generalize the results to the GaSP model with a nugget parameter. Both theoretical and numerical evidence is presented concerning the performance of the studied procedures.
- Published
- 2017
36. Scaled Gaussian Stochastic Process for Computer Model Calibration and Prediction
- Author
- Gu, Mengyang and Wang, Long
- Subjects
- Statistics - Methodology
- Abstract
We consider the problem of calibrating an imperfect computer model using experimental data. To compensate for the misspecification of the computer model and make more accurate predictions, a discrepancy function is often included and modeled via a Gaussian stochastic process (GaSP). The calibrated computer model alone, however, sometimes fits the experimental data poorly, as the calibration parameters can become unidentifiable. In this work, we propose the scaled Gaussian stochastic process (S-GaSP), a novel stochastic process that bridges the gap between two predominant methods, namely $L_2$ calibration and GaSP calibration. It is shown that our approach performs well in both calibration and prediction. A computationally feasible approach is introduced for this new model under the Bayesian paradigm. Compared with GaSP calibration, S-GaSP calibration enables the calibrated computer model itself to predict the reality well, based on the posterior distribution of the calibration parameters. Numerical comparisons on simulated and real data are provided to illustrate the connections and differences between the proposed S-GaSP and other alternative approaches.
- Published
- 2017
37. Robust Gaussian Stochastic Process Emulation
- Author
- Gu, Mengyang, Wang, Xiaojing, and Berger, James O.
- Published
- 2018
38. Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification
- Author
- Gu, Mengyang, Lin, Yizi, Lee, Victor Chang, and Qiu, Diana Y.
- Published
- 2023
- Full Text
- View/download PDF
39. Efficient force field and energy emulation through partition of permutationally equivalent atoms.
- Author
- Li, Hao, Zhou, Musen, Sebastian, Jessalyn, Wu, Jianzhong, and Gu, Mengyang
- Subjects
- FORCE & energy; MOLECULAR force constants; NUCLEAR energy; MOLECULAR dynamics; NUCLEAR forces (Physics)
- Abstract
The Gaussian process (GP) emulator has been used as a surrogate model for predicting force fields and molecular potentials, to overcome the computational bottleneck of ab initio molecular dynamics simulation. Integrating both atomic force and energy in predictions was found to be more accurate than using energy alone, yet it requires $O((NM)^3)$ computational operations for computing the likelihood function and making predictions, where $N$ is the number of atoms and $M$ is the number of simulated configurations in the training sample, due to the inversion of a large covariance matrix. The high computational cost limits its applications to the simulation of small molecules. The computational challenge of using both gradient information and function values in GPs was recently noticed in the machine learning community, where conventional approximation methods may not work well. Here, we introduce a new approach, the atomized force field model, that integrates both force and energy in the emulator with many fewer computational operations. The drastic reduction in computation is achieved by utilizing the naturally sparse covariance structure that satisfies the constraints of the energy conservation and permutation symmetry of atoms. The efficient machine learning algorithm extends the limits of its applications to larger molecules under the same computational budget, with nearly no loss of predictive accuracy. Furthermore, our approach contains an uncertainty assessment of predictions of atomic forces and energies, useful for developing a sequential design over the chemical input space.
- Published
- 2022
- Full Text
- View/download PDF
40. Predicting SARS-CoV-2 infection among hemodialysis patients using multimodal data
- Author
- Duan, Juntao, Li, Hanmo, Ma, Xiaoran, Zhang, Hanjie, Lasky, Rachel, Monaghan, Caitlin K., Chaudhuri, Sheetal, Usvyat, Len A., Gu, Mengyang, Guo, Wensheng, Kotanko, Peter, and Wang, Yuedong
- Published
- 2023
- Full Text
- View/download PDF
41. Calibration of Imperfect Geophysical Models by Multiple Satellite Interferograms with Measurement Bias.
- Author
- Gu, Mengyang, Anderson, Kyle, and McPhillips, Erika
- Subjects
- SAMPLING (Process); GEOLOGICAL modeling; CALIBRATION; MEASUREMENT
- Abstract
Model calibration consists of using experimental or field data to estimate the unknown parameters of a mathematical model. The presence of model discrepancy and measurement bias in the data complicates this task. Satellite interferograms, for instance, are widely used for calibrating geophysical models in geological hazard quantification. In this work, we used satellite interferograms to relate ground deformation observations to the properties of the magma chamber at Kīlauea Volcano in Hawai'i. We derived closed-form marginal likelihoods and implemented posterior sampling procedures that simultaneously estimate the model discrepancy of physical models and the measurement bias from the atmospheric error in satellite interferograms. We found that model calibration by aggregating multiple interferograms and downsampling the pixels in the interferograms can reduce the computational complexity compared to calibration approaches based on multiple data sets. The conditions that lead to no loss of information from data aggregation and downsampling are studied. Simulations illustrate that both discrepancy and measurement bias can be estimated, and real applications demonstrate that modeling both effects helps obtain a reliable estimate of a physical model's unobserved parameters and enhances its predictive accuracy. We implement the computational tools in the RobustCalibration package available on CRAN.
- Published
- 2023
- Full Text
- View/download PDF
42. A High Computationally Efficient Parallel Partial Gaussian Process for Large-Scale Power System Probabilistic Transient Stability Assessment
- Author
- Ye, Ketian, Zhao, Junbo, Li, Hanmo, and Gu, Mengyang
- Abstract
This article proposes a specifically designed parallel partial Gaussian process (PPGP) for large-scale probabilistic power system transient stability assessment (TSA). The differential and algebraic equations with uncertain resources are defined and reformulated for uncertainty quantification. The challenges of probabilistic TSA in large-scale systems are investigated, and the necessity of introducing efficient modeling approaches is emphasized. PPGP inherits the advantages of Gaussian process modeling, including data-driven modeling that does not require knowledge of the input distribution and can effectively quantify the confidence interval of the estimate. It accelerates model construction and evaluation by reducing the number of parameters to be estimated. In addition, thanks to enhancement techniques based on an isotropic kernel, reparameterization, and composite likelihood, PPGP is able to achieve probabilistic TSA under high-uncertainty conditions while maintaining very high computational efficiency. A theoretical analysis of the time complexity of PPGP is provided. Comparison results with the Latin hypercube sampling method, many single Gaussian processes, and sparse Gaussian process methods on the Texas 2000-bus system with 471 PV and 470 wind generation units highlight the significantly improved model efficacy and efficiency of the proposed method. Furthermore, experimental validation demonstrates the proposed method's efficacy in handling the co-existence of both stable and unstable cases.
- Published
- 2024
- Full Text
- View/download PDF
43. Physics-informed Gaussian process regression of in operando capacitance for carbon supercapacitors
- Author
- Pan, Runtong, Gu, Mengyang, and Wu, Jianzhong
- Published
- 2023
- Full Text
- View/download PDF
44. A High Computationally Efficient Parallel Partial Gaussian Process for Large-Scale Power System Probabilistic Transient Stability Assessment
- Author
- Ye, Ketian, Zhao, Junbo, Li, Hanmo, and Gu, Mengyang
- Published
- 2023
- Full Text
- View/download PDF
45. Combined Trabeculotomy-Non-Penetrating Deep Sclerectomy for Glaucoma in Sturge-Weber Syndrome
- Author
- Huang, Lulu, Xu, Li, Liu, Yixin, Yang, Yijie, Wang, Ning, Gu, Mengyang, Sun, Chengyang, Wu, Yue, and Guo, Wenyi
- Published
- 2023
- Full Text
- View/download PDF
46. Reliable emulation of complex functionals by active learning with error control
- Author
- Fang, Xinyi, Gu, Mengyang, and Wu, Jianzhong
- Published
- 2022
- Full Text
- View/download PDF
47. Gaussian Orthogonal Latent Factor Processes for Large Incomplete Matrices of Correlated Data
- Author
- Gu, Mengyang and Li, Hanmo
- Published
- 2022
- Full Text
- View/download PDF
48. Non-Gaussian and anisotropic fluctuations mediate the progression of global cellular order: a data-driven study
- Author
- Gu, Mengyang, Fang, Xinyi, and Luo, Yimin
- Subjects
- FOS: Computer and information sciences; Biological Physics (physics.bio-ph); Physics - Data Analysis, Statistics and Probability; FOS: Physical sciences; Applications (stat.AP); Physics - Biological Physics; Statistics - Applications; Data Analysis, Statistics and Probability (physics.data-an)
- Abstract
The dynamics of cellular pattern formation are crucial for understanding embryonic development and tissue morphogenesis. Recent studies have shown that human dermal fibroblasts cultured on liquid crystal elastomers can exhibit an increase in orientational alignment over time, accompanied by cell proliferation, under the weak guidance of a molecularly aligned substrate. However, a comprehensive understanding of how this order arises remains largely unknown. This knowledge gap may be attributed, in part, to a scarcity of mechanistic models that can capture the temporal progression of the complex nonequilibrium dynamics during the cellular alignment process. To fill this gap, we develop a hybrid procedure that utilizes statistical learning approaches to select individual-level features for extending state-of-the-art physics models. The maximum likelihood estimator of the model was derived and implemented in computationally scalable algorithms for model calibration and simulation. By including these features, such as the non-Gaussian, anisotropic fluctuations, and by limiting alignment interactions to neighboring cells with the same velocity direction, this model is able to reproduce system-level parameters: the temporal progression of the velocity orientational order parameters and the variability of velocity vectors. Unlike other data-driven approaches, we do not rely on a loss function to tune model parameters to match these system-level characteristics. Furthermore, we develop a computational toolbox for automating model construction and calibration that can be applied to other systems of active matter.
- Published
- 2023
- Full Text
- View/download PDF
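The physics models extended above are Vicsek-type alignment models. For orientation, here is a minimal classic Vicsek update in Python; this is background only, since the paper's extension adds non-Gaussian, anisotropic fluctuations and restricts alignment to neighbors with the same velocity direction:

```python
import numpy as np

def vicsek_step(pos, theta, rng, L=10.0, r=1.0, v0=0.03, eta=0.2):
    """One update of the classic Vicsek model in a periodic box of size L."""
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)                     # minimum-image convention
    nbr = (d ** 2).sum(-1) < r ** 2              # neighbors within radius r
    # align to the mean heading of neighbors, plus uniform angular noise
    mean_theta = np.arctan2((nbr * np.sin(theta)).sum(1),
                            (nbr * np.cos(theta)).sum(1))
    theta = mean_theta + eta * rng.uniform(-np.pi, np.pi, len(theta))
    pos = (pos + v0 * np.stack([np.cos(theta), np.sin(theta)], -1)) % L
    return pos, theta

rng = np.random.default_rng(5)
pos = rng.uniform(0, 10, (200, 2))
theta = rng.uniform(-np.pi, np.pi, 200)
pos, theta = vicsek_step(pos, theta, rng)
```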
49. Supplementary Materials from Molecular-scale substrate anisotropy, crowding and division drive collective behaviours in cell monolayers
- Author
- Luo, Yimin, Gu, Mengyang, Park, Minwook, Fang, Xinyi, Kwon, Younghoon, Urueña, Juan Manuel, Read de Alaniz, Javier, Helgeson, Matthew E., Marchetti, Cristina M., and Valentine, Megan T.
- Abstract
The ability of cells to reorganize in response to external stimuli is important in areas ranging from morphogenesis to tissue engineering. While nematic order is common in biological tissues, it typically only extends to small regions of cells interacting via steric repulsion. On isotropic substrates, elongated cells can co-align due to steric effects, forming ordered but randomly oriented finite-size domains. However, we have discovered that flat substrates with nematic order can induce global nematic alignment of dense, spindle-like cells, thereby influencing cell organization and collective motion and driving alignment on the scale of the entire tissue. Remarkably, single cells are not sensitive to the substrate’s anisotropy. Rather, the emergence of global nematic order is a collective phenomenon that requires both steric effects and molecular-scale anisotropy of the substrate. To quantify the rich set of behaviours afforded by this system, we analyse velocity, positional and orientational correlations for several thousand cells over days. The establishment of global order is facilitated by enhanced cell division along the substrate’s nematic axis, and associated extensile stresses that restructure the cells’ actomyosin networks. Our work provides a new understanding of the dynamics of cellular remodelling and organization among weakly interacting cells.
- Published
- 2023
- Full Text
- View/download PDF
50. A Theoretical Framework of the Scaled Gaussian Stochastic Process in Prediction and Calibration
- Author
- Gu, Mengyang, Xie, Fangzheng, and Wang, Long
- Published
- 2022
- Full Text
- View/download PDF