560 results on '"0104 Statistics"'
Search Results
2. Gaussian Variational Approximations for High-dimensional State Space Models
- Author
-
Quiroz, M, Nott, DJ, and Kohn, R
- Abstract
We consider a Gaussian variational approximation of the posterior density in high-dimensional state space models. The number of parameters in the covariance matrix of the variational approximation grows as the square of the number of model parameters, so it is necessary to find simple yet effective parametrisations of the covariance structure when the number of model parameters is large. We approximate the joint posterior density of the state vectors by a dynamic factor model, having Markovian time dependence and a factor covariance structure for the states. This gives a reduced description of the dependence structure for the states, as well as a temporal conditional independence structure similar to that in the true posterior. We illustrate the methodology on two examples. The first is a spatio-temporal model for the spread of the Eurasian collared-dove across North America. Our approach compares favorably to a recently proposed ensemble Kalman filter method for approximate inference in high-dimensional hierarchical spatio-temporal models. Our second example is a Wishart-based multivariate stochastic volatility model for financial returns, which is outside the class of models the ensemble Kalman filter method can handle.
- Published
- 2023
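To make the covariance parametrisation in entry 2 concrete, here is a minimal sketch of a factor-structured Gaussian variational family, assuming a generic covariance of the form BB' + D; the paper's actual parametrisation is a dynamic factor model with Markovian time dependence, and all names and values below are illustrative.

```python
import numpy as np

# Factor-structured Gaussian variational family q(theta) = N(mu, B B' + D):
# with n parameters and k << n factors, the covariance needs n*(k+1) values
# instead of n*(n+1)/2 for a dense matrix.
rng = np.random.default_rng(0)

n, k = 1000, 5                            # state dimension, number of factors
mu = np.zeros(n)                          # variational mean
B = 0.1 * rng.standard_normal((n, k))     # factor loadings
d = np.full(n, 0.5)                       # square root of the diagonal part

def sample_q(num_draws):
    """Reparameterised draws theta = mu + B @ e1 + d * e2, which avoids
    ever forming the full n x n covariance matrix."""
    e1 = rng.standard_normal((num_draws, k))
    e2 = rng.standard_normal((num_draws, n))
    return mu + e1 @ B.T + d * e2

draws = sample_q(10_000)
# The empirical covariance approaches B B' + diag(d**2).
print(np.allclose(np.cov(draws[:, :3].T),
                  (B @ B.T + np.diag(d**2))[:3, :3], atol=0.05))
```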
3. The Kolmogorov Inequality for the Maximum of the Sum of Random Variables and Its Martingale Analogues
- Author
-
Kordzakhia, NE, Novikov, AA, and Shiryaev, AN
- Abstract
We give a survey of the results related to extensions of the Kolmogorov inequality for the distribution of the absolute value of the maximum of the sum of centered independent random variables to the case of martingales considered at random stopping times.
- Published
- 2023
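For orientation, the classical inequality whose extensions entry 3 surveys can be stated in standard textbook form (my notation, not the paper's):

```latex
% For independent, centered X_1, ..., X_n with partial sums
% S_k = X_1 + \cdots + X_k and \operatorname{Var}(S_n) < \infty:
\[
  \mathbb{P}\Bigl(\max_{1 \le k \le n} |S_k| \ge \varepsilon\Bigr)
  \;\le\; \frac{\operatorname{Var}(S_n)}{\varepsilon^{2}},
  \qquad \varepsilon > 0.
\]
% The martingale analogues replace S_k by a martingale M_k and the fixed
% horizon n by a random stopping time, under suitable integrability conditions.
```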
4. A platform trial design for preventive vaccines against Marburg virus and other emerging infectious disease threats
- Author
-
Ira M Longini, Yang Yang, Thomas R Fleming, César Muñoz-Fontela, Rui Wang, Susan S Ellenberg, George Qian, M Elizabeth Halloran, Martha Nason, Victor De Gruttola, Sabue Mulangu, Yunda Huang, Christl A Donnelly, Ana-Maria Henao Restrepo, Medical Research Council (MRC), and National Institute for Health Research
- Subjects
Statistics & Probability ,Research & Experimental Medicine ,Communicable Diseases, Emerging ,emerging infectious disease threat ,Animals ,Humans ,Marburg Virus Disease ,Marburg virus ,Pharmacology ,Vaccines ,Science & Technology ,Randomized placebo-controlled vaccine trial ,SARS-CoV-2 ,0104 Statistics ,COVID-19 ,1103 Clinical Sciences ,vaccine efficacy ,General Medicine ,cluster-randomized vaccine trial ,TIME ,CLUSTER-RANDOMIZED-TRIALS ,Medicine, Research & Experimental ,EVENT ,Marburgvirus ,SAMPLE-SIZE ,Life Sciences & Biomedicine ,CLINICAL-TRIALS - Abstract
Background: The threat of a possible Marburg virus disease outbreak in Central and Western Africa is growing. While no Marburg virus vaccines are currently available for use, several candidates are in the pipeline. Building on knowledge and experiences in the designs of vaccine efficacy trials against other pathogens, including SARS-CoV-2, we develop designs of randomized Phase 3 vaccine efficacy trials for Marburg virus vaccines. Methods: A core protocol approach will be used, allowing multiple vaccine candidates to be tested against controls. The primary objective of the trial will be to evaluate the effect of each vaccine on the rate of virologically confirmed Marburg virus disease, although Marburg infection assessed via seroconversion could be the primary objective in some cases. The overall trial design will be a mixture of individually and cluster-randomized designs, with individual randomization done whenever possible. Clusters will consist of either contacts and contacts of contacts of index cases, that is, ring vaccination, or other transmission units. Results: The primary efficacy endpoint will be analysed as a time-to-event outcome. A vaccine will be considered successful if its estimated efficacy is greater than 50% and has sufficient precision to rule out that true efficacy is less than 30%. This will require approximately 150 total endpoints, that is, cases of confirmed Marburg virus disease, per vaccine/comparator combination. Interim analyses will be conducted after 50 and after 100 events. Statistical analysis of the trial will be blended across the different types of designs. Under the assumption of a 6-month attack rate of 1% of the participants in the placebo arm for both the individually and cluster-randomized populations, the most likely sample size is about 20,000 participants per arm. Conclusion: This event-driven design takes into account the potentially sporadic spread of Marburg virus. The proposed trial design may be applicable for other pathogens against which effective vaccines are not yet available.
- Published
- 2022
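A back-of-envelope check (my own illustration, not the protocol's calculation) of why roughly 150 endpoints fit the stated success criterion, using the standard approximation SE(log HR) ≈ 2/√D for an event-driven time-to-event analysis:

```python
import numpy as np

# Success criterion from the abstract: estimated vaccine efficacy (VE)
# above 50%, with true VE below 30% ruled out. VE = 1 - hazard ratio (HR).
D = 150                               # total confirmed-disease endpoints
se_log_hr = 2 / np.sqrt(D)            # approximate SE of log(HR)

true_ve = 0.60                        # an assumed true efficacy
log_hr = np.log(1 - true_ve)
upper_hr = np.exp(log_hr + 1.96 * se_log_hr)
lower_ve_bound = 1 - upper_hr         # lower 95% confidence bound on VE

print(f"SE(log HR) = {se_log_hr:.3f}")
print(f"lower 95% VE bound at true VE = 60%: {lower_ve_bound:.2f}")
# ~0.45 > 0.30: with ~150 events, a 60%-efficacious vaccine can be
# expected to rule out VE <= 30%.
```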
5. Inferring the Sources of HIV Infection in Africa from Deep-Sequence Data with Semi-Parametric Bayesian Poisson Flow Models
- Author
-
Xi, Xiaoyue, Spencer, Simon E. F., Hall, Matthew, Grabowski, M. Kate, Kagaayi, Joseph, Ratmann, Oliver, and Bill & Melinda Gates Foundation
- Subjects
Statistics and Probability ,Science & Technology ,TRANSMISSION ,Statistics & Probability ,infectious disease epidemiology ,0104 Statistics ,MEN ,phylodynamics ,PREVENTION ,Stan ,DISEASE ,flow models ,NETWORKS ,Physical Sciences ,origin-destination models ,INFERENCE ,Statistics, Probability and Uncertainty ,Gaussian process ,RA ,Mathematics - Abstract
Pathogen deep-sequencing is an increasingly routinely used technology in infectious disease surveillance. We present a semi-parametric Bayesian Poisson model to exploit these emerging data for inferring infectious disease transmission flows and the sources of infection at the population level. The framework is computationally scalable in high-dimensional flow spaces thanks to Hilbert Space Gaussian process approximations, allows for sampling bias adjustments, and estimation of gender- and age-specific transmission flows at finer resolution than previously possible. We apply the approach to densely sampled, population-based HIV deep-sequence data from Rakai, Uganda, and find substantive evidence that adolescent and young women were predominantly infected through age-disparate relationships in the study period 2009–2015.
- Published
- 2022
6. A weak law of large numbers for realised covariation in a Hilbert space setting
- Author
-
Fred Espen Benth, Dennis Schroers, and Almut E.D. Veraart
- Subjects
Statistics and Probability ,Statistics & Probability ,60F99, 62M99 ,Applied Mathematics ,0104 Statistics ,Probability (math.PR) ,05 social sciences ,1502 Banking, Finance and Investment ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,01 natural sciences ,010104 statistics & probability ,0102 Applied Mathematics ,Modeling and Simulation ,0502 economics and business ,FOS: Mathematics ,0101 mathematics ,Mathematics - Probability ,050205 econometrics - Abstract
This article generalises the concept of realised covariation to Hilbert-space-valued stochastic processes. More precisely, based on high-frequency functional data, we construct an estimator of the trace-class operator-valued integrated volatility process arising in general mild solutions of Hilbert space-valued stochastic evolution equations in the sense of Da Prato and Zabczyk (2014). We prove a weak law of large numbers for this estimator, where the convergence is uniform on compacts in probability with respect to the Hilbert-Schmidt norm. In addition, we show that the conditions on the volatility process are valid for most common stochastic volatility models in Hilbert spaces.
- Published
- 2022
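A plausible reading of the estimator in entry 6, written in generic notation (the paper's precise definition may differ):

```latex
% X is an H-valued process observed at t_i = i/n on [0, t];
% \otimes is the tensor product on the Hilbert space H.
\[
  \mathrm{RC}^{n}_{t}
  = \sum_{i=1}^{\lfloor nt \rfloor}
    \bigl(X_{t_i} - X_{t_{i-1}}\bigr) \otimes \bigl(X_{t_i} - X_{t_{i-1}}\bigr)
  \;\longrightarrow\;
  \int_0^t \Sigma_s \, ds,
\]
% with convergence uniform on compacts in probability in the
% Hilbert-Schmidt norm, \Sigma_s denoting the operator-valued volatility.
```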
7. Scoring predictions at extreme quantiles
- Author
-
Axel Gandy, Kaushik Jana, Almut E. D. Veraart, and Lloyd's Register Foundation
- Subjects
SELECTION ,FOS: Computer and information sciences ,Statistics and Probability ,Economics and Econometrics ,Science & Technology ,60G70 ,Economics ,Statistics & Probability ,Applied Mathematics ,0104 Statistics ,Statistics - Applications ,High quantile ,Extreme value ,Modeling and Simulation ,Physical Sciences ,REGRESSION ,Applications (stat.AP) ,Quantile score ,stat.AP ,Mathematics ,Social Sciences (miscellaneous) ,Analysis - Abstract
Prediction of quantiles at extreme tails is of interest in numerous applications. Extreme value modelling provides various competing predictors for this point prediction problem. A common method of assessment of a set of competing predictors is to evaluate their predictive performance in a given situation. However, due to the extreme nature of this inference problem, it is possible that the predicted quantiles are not seen in the historical records, particularly when the sample size is small. This situation poses a problem to the validation of the prediction with its realisation. In this article, we propose two non-parametric scoring approaches to assess extreme quantile prediction mechanisms. The proposed assessment methods are based on predicting a sequence of equally extreme quantiles on different parts of the data. We then use the quantile scoring function to evaluate the competing predictors. The performance of the scoring methods is compared with the conventional scoring method and the superiority of the former methods is demonstrated in a simulation study. The methods are then applied to reanalyse cyber Netflow data from Los Alamos National Laboratory and daily precipitation data at a station in California available from Global Historical Climatology Network.
- Published
- 2022
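The quantile scoring function used in entry 7 is the standard pinball loss; a minimal sketch with toy data (not the paper's applications):

```python
import numpy as np

# Pinball (quantile) score at level tau for prediction q and outcome y:
#   S_tau(q, y) = (1{y <= q} - tau) * (q - y);
# lower average score indicates a better quantile predictor.
def quantile_score(q, y, tau):
    y = np.asarray(y, dtype=float)
    return ((y <= q).astype(float) - tau) * (q - y)

rng = np.random.default_rng(1)
y = rng.exponential(size=100_000)          # toy data with known quantiles
true_q99 = -np.log(0.01)                   # 0.99 quantile of Exp(1), ~4.605
for label, q in [("true 0.99 quantile", true_q99), ("biased-low guess", 3.0)]:
    print(f"{label}: mean score = {quantile_score(q, y, 0.99).mean():.5f}")
# The true quantile attains the smaller mean score.
```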
8. A Dynamic Choice Model to Estimate the User Cost of Crowding with Large-Scale Transit Data
- Author
-
Prateek Bansal, Daniel Hörcher, Daniel J. Graham, and The Leverhulme Trust
- Subjects
Statistics and Probability ,Economics and Econometrics ,Science & Technology ,Statistics & Probability ,VALUATION ,LATENT MARKOV-MODELS ,0104 Statistics ,Social Sciences ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Social Sciences, Mathematical Methods ,inertia ,dynamic preferences ,ROUTE CHOICE ,1603 Demography ,STATE ,expectation-maximization ,Physical Sciences ,1403 Econometrics ,LABEL SWITCHING PROBLEM ,Statistics, Probability and Uncertainty ,Mathematical Methods In Social Sciences ,Mathematics ,crowding valuation ,Social Sciences (miscellaneous) ,smart card data - Abstract
Efficient mass transit provision should be responsive to the behaviour of passengers. Operators often conduct surveys to elicit passenger perspectives, but these can be expensive to administer and can suffer from hypothetical biases. With the advent of smart card and automated vehicle location data, operators have reliable sources of revealed preference (RP) data that can be utilized to estimate transit riders' valuation of service attributes. To date, effective use of RP data has been limited due to modelling complexities. We propose a dynamic choice model (DCM) for population-level longitudinal RP data to address prominent challenges. In the DCM, riders are assumed to follow different decision rules (compensatory and inertia/habit), and temporal switching between decision rules based on experience-based learning is also formulated. We develop an expectation–maximization algorithm to estimate the DCM and apply our model to estimate passenger valuation of crowding. Using two months of large-scale data, covering over four million daily trips on an Asian metro, our DCM estimates show an increase of 47% in passengers' valuation of travel time under extremely crowded conditions. Furthermore, the average passenger follows the compensatory rule on only 25.5% or fewer trips. These results are valuable for supply-side decisions of transit operators.
- Published
- 2022
9. Observation-driven models for discrete-valued time series
- Author
-
Mirko Armillotta, Alessandra Luati, Monia Lupparelli, and Econometrics and Data Science
- Subjects
Statistics and Probability ,Science & Technology ,Statistics & Probability ,0104 Statistics ,Link-function ,ERGODICITY ,Count data, generalized ARMA models, likelihood inference, link-function ,Likelihood inference ,Generalized ARMA models ,Physical Sciences ,REGRESSION ,Statistics, Probability and Uncertainty ,Mathematics ,Count data - Abstract
Statistical inference for discrete-valued time series has not been developed as fully as traditional methods for time series generated by continuous random variables. Some relevant models exist, but the lack of a homogeneous framework raises some critical issues. For instance, it is not trivial to explore whether models are nested, and it is quite arduous to derive stochastic properties which simultaneously hold across different specifications. In this paper, inference for a general class of first-order observation-driven models for discrete-valued processes is developed. Stochastic properties such as stationarity and ergodicity are derived under easy-to-check conditions, which can be directly applied to all the models encompassed in the class and for every distribution which satisfies mild moment conditions. Consistency and asymptotic normality of quasi-maximum likelihood estimators are established, with the focus on the exponential family. Finite sample properties and the use of information criteria for model selection are investigated through Monte Carlo studies. An empirical application to count data is discussed, concerning a test-bed time series on the spread of an infection.
- Published
- 2022
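A canonical member of the model class described in entry 9 is the first-order Poisson INGARCH recursion; this is a standard textbook example for orientation, not necessarily the paper's exact specification:

```latex
\[
  Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t),
  \qquad
  \lambda_t = \omega + \alpha\,\lambda_{t-1} + \beta\,Y_{t-1},
  \qquad \omega > 0,\ \alpha, \beta \ge 0.
\]
% The conditional mean is driven by its own past and the last observation;
% easy-to-check conditions of the kind the paper derives reduce here to
% \alpha + \beta < 1, and applying a link function g to \lambda_t gives the
% general first-order observation-driven class.
```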
10. Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold
- Author
-
S Ward, H S Battey, E A K Cohen, Wellcome Trust, Engineering and Physical Sciences Research Council, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
Statistics and Probability ,Applied Mathematics ,General Mathematics ,0103 Numerical and Computational Mathematics ,Statistics & Probability ,0104 Statistics ,1403 Econometrics ,Statistics, Probability and Uncertainty ,General Agricultural and Biological Sciences ,Agricultural and Biological Sciences (miscellaneous) - Abstract
This paper is concerned with nonparametric estimation of the intensity function of a point process on a Riemannian manifold. It provides a first-order asymptotic analysis of the proposed kernel estimator for Poisson processes, supplemented by empirical work to probe the behaviour in finite samples and under other generative regimes. The investigation highlights the scope for finite-sample improvements by allowing the bandwidth to adapt to local curvature.
- Published
- 2023
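A plausible generic form of the kernel intensity estimator studied in entry 10, in my notation (the paper specifies the estimator and its corrections more carefully):

```latex
% Point pattern {x_1, ..., x_N} on a d-dimensional Riemannian manifold,
% geodesic distance d_g, kernel K, bandwidth h:
\[
  \hat{\lambda}_h(x)
  = \frac{1}{h^{d}} \sum_{i=1}^{N}
    K\!\left(\frac{d_g(x, x_i)}{h}\right) c_h(x_i)^{-1},
\]
% where c_h(x_i) normalises the kernel mass to account for the manifold's
% volume distortion; letting h adapt to local curvature is the
% finite-sample refinement highlighted in the abstract.
```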
11. Wind energy forecasting with missing values within a fully conditional specification framework
- Author
-
Honglin Wen, Pierre Pinson, Jie Gu, and Zhijian Jin
- Subjects
FOS: Computer and information sciences ,0104 Statistics ,1403 Econometrics ,FOS: Electrical engineering, electronic engineering, information engineering ,Applications (stat.AP) ,Econometrics ,Systems and Control (eess.SY) ,Business and International Management ,Statistics - Applications ,Electrical Engineering and Systems Science - Systems and Control ,1505 Marketing ,Physics::Atmospheric and Oceanic Physics - Abstract
Wind power forecasting is essential to power system operation and electricity markets. As abundant data became available thanks to the deployment of measurement infrastructures and the democratization of meteorological modelling, extensive data-driven approaches have been developed within both point and probabilistic forecasting frameworks. These models usually assume that the dataset at hand is complete and overlook missing value issues that often occur in practice. In contrast to that common approach, we rigorously consider here the wind power forecasting problem in the presence of missing values, by jointly accommodating imputation and forecasting tasks. Our approach allows inferring the joint distribution of input features and target variables at the model estimation stage based on incomplete observations only. We place emphasis on a fully conditional specification method owing to its desirable properties, e.g., being assumption-free when it comes to these joint distributions. Then, at the operational forecasting stage, with available features at hand, one can issue forecasts by implicitly imputing all missing entries. The approach is applicable to both point and probabilistic forecasting, while yielding competitive forecast quality within both simulation and real-world case studies. It confirms that by using a powerful universal imputation method like fully conditional specification, the proposed approach is superior to the common approach, especially in the context of probabilistic forecasting.
- Published
- 2023
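A minimal sketch of the fully conditional specification idea from entry 11, assuming simple linear conditionals; the paper's method is richer and is coupled to the forecasting model, and scikit-learn's experimental IterativeImputer implements the same chained-equations pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fully conditional specification (chained equations): regress each
# incomplete variable on all the others and cycle the fitted
# conditionals until the imputations stabilise.
def fcs_impute(X, n_cycles=10):
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # crude initialisation
    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            fit = LinearRegression().fit(others[obs], X[obs, j])
            X[miss[:, j], j] = fit.predict(others[miss[:, j]])
    return X

# Usage on correlated wind-power-like features, 20% missing at random.
rng = np.random.default_rng(2)
Z = rng.multivariate_normal([0, 0, 0],
                            [[1, .8, .5], [.8, 1, .6], [.5, .6, 1]], 500)
Z_miss = Z.copy()
Z_miss[rng.random(Z.shape) < 0.2] = np.nan
print("mean abs. imputation error:", np.abs(fcs_impute(Z_miss) - Z).mean())
```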
12. Standardized Partial Sums and Products of p-Values
- Author
-
Nicholas A. Heard
- Subjects
HIGHER CRITICISM ,Statistics and Probability ,Science & Technology ,Statistics & Probability ,0104 Statistics ,Scientific literature ,Meta-analysis ,DRAWN ,Physical Sciences ,Statistics ,1403 Econometrics ,Range (statistics) ,Discrete Mathematics and Combinatorics ,Fisher's method ,Statistics, Probability and Uncertainty ,Truncated product ,Mathematics ,POPULATION - Abstract
In meta-analysis, a diverse range of methods for combining multiple p-values have been applied throughout the scientific literature. For sparse signals where only a small proportion of the p-values are truly significant, a technique called higher criticism has previously been shown to have asymptotic consistency and more power than Fisher's original method. However, higher criticism and other related methods can still lack power. Three new, simple-to-compute statistics are now proposed for detecting sparse signals, based on standardizing partial sums or products of p-value order statistics. The use of standardization is theoretically justified with results demonstrating asymptotic normality, and avoids the computational difficulties encountered when working with analytic forms of the distributions of the partial sums and products. In particular, the standardized partial product demonstrates more power than existing methods for both the standard Gaussian mixture model and a real data example from computer network modeling.
- Published
- 2021
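A generic construction in the spirit of entry 12, standardising the partial sum of the k smallest p-values with the exact moments of uniform order statistics under the global null; the paper's statistics and normalisations may differ in detail.

```python
import numpy as np

# Under the global null the sorted p-values are uniform order statistics:
#   E[U_(i)] = i/(n+1),
#   Cov(U_(i), U_(j)) = i(n+1-j) / ((n+1)^2 (n+2))  for i <= j.
def standardized_partial_sum(pvals, k):
    n = len(pvals)
    s_k = np.sort(pvals)[:k].sum()
    i = np.arange(1, k + 1)
    mean = i.sum() / (n + 1)
    ii, jj = np.meshgrid(i, i)
    lo, hi = np.minimum(ii, jj), np.maximum(ii, jj)
    var = (lo * (n + 1 - hi)).sum() / ((n + 1) ** 2 * (n + 2))
    return (s_k - mean) / np.sqrt(var)       # ~ N(0,1) under the null

rng = np.random.default_rng(3)
null_p = rng.random(1000)
sparse_p = null_p.copy()
sparse_p[:20] = rng.random(20) * 1e-4        # a few strong signals
print("null  :", standardized_partial_sum(null_p, 100))
print("sparse:", standardized_partial_sum(sparse_p, 100))  # large negative
```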
13. A flexible hierarchical framework for improving inference in area-referenced environmental health studies
- Author
-
Marta Blangiardo, Monica Pirani, Anna Hansell, Alexina J. Mason, Sylvia Richardson, Medical Research Council, and Apollo - University of Cambridge Repository
- Subjects
Statistics and Probability ,Lung Neoplasms ,Statistics & Probability ,Bayesian probability ,Bayesian inference ,Nitrogen Dioxide ,Inference ,area-referenced studies ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,missing data ,0302 clinical medicine ,LUNG-CANCER ,Econometrics ,Humans ,030212 general & internal medicine ,EXPOSURE ,0101 mathematics ,uncertainty ,Propensity Score ,data integration ,NITROGEN-DIOXIDE ,Science & Technology ,MORTALITY ,Confounding ,0104 Statistics ,Bayes Theorem ,General Medicine ,AIR-POLLUTION ,Missing data ,RISKS ,MODEL ,BIAS ,England ,Propensity score matching ,Physical Sciences ,Survey data collection ,MISSING CONFOUNDERS ,Mathematical & Computational Biology ,Statistics, Probability and Uncertainty ,Life Sciences & Biomedicine ,Environmental Health ,Mathematics ,Environmental epidemiology - Abstract
Study designs where data have been aggregated by geographical areas are popular in environmental epidemiology. These studies are commonly based on administrative databases and, since they provide complete spatial coverage, are particularly appealing for making inference on the entire population. However, the resulting estimates are often biased and difficult to interpret due to unmeasured confounders, which typically are not available from routinely collected data. We propose a framework to improve inference drawn from such studies exploiting information derived from individual-level survey data. The latter are summarized in an area-level scalar score by mimicking at ecological level the well-known propensity score methodology. The literature on propensity score for confounding adjustment is mainly based on individual-level studies and assumes a binary exposure variable. Here, we generalize its use to cope with area-referenced studies characterized by a continuous exposure. Our approach is based upon Bayesian hierarchical structures specified into a two-stage design: (i) geolocated individual-level data from survey samples are up-scaled at ecological level, then the latter are used to estimate a generalized ecological propensity score (EPS) in the in-sample areas; (ii) the generalized EPS is imputed in the out-of-sample areas under different assumptions about the missingness mechanisms, then it is included into the ecological regression, linking the exposure of interest to the health outcome. This delivers area-level risk estimates, which allow a fuller adjustment for confounding than traditional areal studies. The methodology is illustrated by using simulations and a case study investigating the risk of lung cancer mortality associated with nitrogen dioxide in England (UK).
- Published
- 2022
- Full Text
- View/download PDF
14. On inference in high-dimensional regression
- Author
-
Heather S Battey, Nancy Reid, Engineering and Physical Sciences Research Council, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
Statistics and Probability ,0102 Applied Mathematics ,Statistics & Probability ,0104 Statistics ,1403 Econometrics ,Statistics, Probability and Uncertainty - Abstract
This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation, inducing sparsity on the relevant blocks of the notional Fisher information matrix. The induced sparsity is exploited through a marginal least-squares analysis for each variable, as in a factorial experiment, thereby avoiding penalization. One parameterization of the problem is found to be particularly convenient, both computationally and mathematically. In particular, it permits an analytic solution to the optimal transformation problem, facilitating theoretical analysis and comparison to other work. In contrast to regularized regression, such as the lasso and its extensions, neither adjustment for selection nor rescaling of the explanatory variables is needed, ensuring the physical interpretation of regression coefficients is retained. Recommended usage is within a broader set of inferential statements, so as to reflect uncertainty over the model as well as over the parameters. The considerations involved in extending the work to other regression models are briefly discussed.
- Published
- 2022
15. A hierarchical meta-analysis for settings involving multiple outcomes across multiple cohorts
- Author
-
Hocagil, Tugba Akkaya, Ryan, Louise M., Cook, Richard J., Richardson, Gale A., Day, Nancy L., Coles, Claire D., Olson, Heather Carmichael, Jacobson, Sandra W., and Jacobson, Joseph L.
- Subjects
Methodology (stat.ME) ,FOS: Computer and information sciences ,0104 Statistics ,Statistics - Methodology - Abstract
Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioral deficits. However, there is virtually no information in the scientific literature regarding the levels of PAE associated with an increased risk of clinically significant adverse effects. During the period from 1975-1993, several prospective longitudinal cohort studies were conducted in the U.S., in which maternal reports regarding alcohol use were obtained during pregnancy and the cognitive development of the offspring was assessed from early childhood through early adulthood. The sample sizes in these cohorts did not provide sufficient power to examine effects associated with different levels and patterns of PAE. To address this critical public health issue, we have developed a hierarchical meta-analysis to synthesize information regarding the effects of PAE on cognition, integrating data on multiple endpoints from six U.S. longitudinal cohort studies. Our approach involves estimating the dose-response coefficients for each endpoint and then pooling these correlated dose-response coefficients to obtain an estimated 'global' effect of exposure on cognition. In the first stage, we use individual participant data to derive estimates of the effects of PAE by fitting regression models that adjust for potential confounding variables using propensity scores. The correlation matrix characterizing the dependence between the endpoint-specific dose-response coefficients estimated within each cohort is then estimated, while accommodating incomplete information on some endpoints. We also compare and discuss inferences based on the proposed approach to inferences based on a full multivariate analysis.
- Published
- 2022
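The pooling step in entry 15 can be sketched as generalised least squares applied to the correlated endpoint-specific coefficients; this is the standard second-stage multivariate meta-analysis formula with made-up numbers, not the authors' code.

```python
import numpy as np

# Global effect as a GLS combination of endpoint-specific estimates:
#   beta_global = (1' V^{-1} 1)^{-1} 1' V^{-1} beta_hat,
# where V is the covariance matrix of the coefficients beta_hat.
def pool_correlated(beta_hat, V):
    ones = np.ones(len(beta_hat))
    Vinv = np.linalg.inv(V)
    denom = ones @ Vinv @ ones
    return (ones @ Vinv @ beta_hat) / denom, np.sqrt(1.0 / denom)

beta_hat = np.array([-0.12, -0.08, -0.15])       # three cognitive endpoints
ses = np.array([0.05, 0.04, 0.06])               # their standard errors
corr = np.array([[1, .6, .5], [.6, 1, .4], [.5, .4, 1]])
V = corr * np.outer(ses, ses)
est, se = pool_correlated(beta_hat, V)
print(f"global effect = {est:.3f} (SE {se:.3f})")
```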
16. A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression
- Author
-
Yazhe Li, Niall M. Adams, and Tony Bellotti
- Subjects
Statistics and Probability ,Science & Technology ,Computer science ,Statistics & Probability ,0104 Statistics ,Logistic regression ,Minority class ,CLASSIFICATION ,EXISTENCE ,Standard procedure ,Class imbalance ,ComputingMethodologies_PATTERNRECOGNITION ,Relabeling ,EM ,Physical Sciences ,Statistics ,1403 Econometrics ,Discrete Mathematics and Combinatorics ,High imbalance ,ALGORITHM ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen [2007] has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this paper, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An Expectation-Maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real data sets.
- Published
- 2021
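A hedged sketch of the relabeling idea in entry 16: split a clustered minority class into sub-labels (here with an off-the-shelf Gaussian-mixture EM), fit a multinomial logistic regression on the relabeled data, and recombine sub-class probabilities at prediction time. The paper's EM performs the relabeling within the logistic model itself, so this is only an approximation, and any gain depends on the minority class actually being clustered.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

# Highly imbalanced two-class problem (~3% minority).
X, y = make_classification(n_samples=5000, weights=[0.97], flip_y=0,
                           n_informative=4, random_state=0)

# Relabel the minority class into sub-classes found by a mixture model.
X_min = X[y == 1]
gm = GaussianMixture(n_components=2, random_state=0).fit(X_min)
y_re = y.copy()
y_re[y == 1] = 1 + gm.predict(X_min)          # minority -> labels 1 and 2

clf = LogisticRegression(max_iter=1000).fit(X, y_re)
p_minority = clf.predict_proba(X)[:, 1:].sum(axis=1)  # recombine sub-classes

base = LogisticRegression(max_iter=1000).fit(X, y)
print("plain AUC    :", round(roc_auc_score(y, base.predict_proba(X)[:, 1]), 3))
print("relabeled AUC:", round(roc_auc_score(y, p_minority), 3))
```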
17. Real World Evidence in Medical Cannabis Research
- Author
-
Mikael H. Sodergren, Oliver Salazar, Rishi Banerjee, Daniel Couch, Simon Erridge, Barbara Pacchetti, and Nagina Mangal
- Subjects
Delta(9)-tetrahydrocannabinol ,Biomedical Research ,Process (engineering) ,Computer science ,Statistics & Probability ,Pharmacy ,Medical Marijuana ,Real world evidence ,1117 Public Health and Health Services ,Delta-9-tetrahydrocannabinol ,Pharmacovigilance ,Cannabidiol ,Electronic Health Records ,Humans ,Pharmacology (medical) ,Pharmacology & Pharmacy ,Registries ,Cannabis-based medicinal products ,Pharmacology, Toxicology and Pharmaceutics (miscellaneous) ,Cannabis ,Data collection ,Science & Technology ,Medicinal Cannabis ,business.industry ,0104 Statistics ,Public Health, Environmental and Occupational Health ,PAIN ,Risk analysis (engineering) ,Medical cannabis ,PATTERNS ,Commentary ,Evidence collection ,business ,Life Sciences & Biomedicine ,Medical Informatics - Abstract
Background Whilst access to cannabis-based medicinal products (CBMPs) has increased globally following the relaxation of scheduling laws, one of the main barriers to appropriate patient access remains a paucity of high-quality evidence surrounding their clinical effects. Discussion Whilst randomised controlled trials (RCTs) remain the gold standard for clinical evaluation, there are notable barriers to their implementation. Development of CBMPs requires novel approaches of evidence collection to address these challenges. Real world evidence (RWE) presents a solution that can not only provide immediate impact on clinical care, but also inform well-conducted RCTs. RWE is defined as evidence derived from health data sourced from non-interventional studies, registries, electronic health records and insurance data. Currently it is used mostly to monitor post-approval safety requirements allowing for long-term pharmacovigilance. However, RWE has the potential to be used in conjunction with or as an extension to RCTs to both broaden and streamline the process of evidence generation. Conclusion Novel approaches of data collection and analysis will be integral to improving clinical evidence on CBMPs. RWE can be used in conjunction with or as an extension to RCTs to increase the speed of evidence generation, as well as reduce costs. Currently there is an abundance of potential data; however, whilst a number of platforms now exist to capture real world data, it is important that the right tools and analyses are utilised to unlock potential insights from these.
- Published
- 2021
18. Trustworthiness of Statistical Inference
- Author
-
David J. Hand
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Economics and Econometrics ,Statistics & Probability ,Social Sciences ,Face (sociological concept) ,bans ,1603 Demography ,Methodology (stat.ME) ,BAYES ,hypothesis testing ,1403 Econometrics ,Statistical inference ,Positive economics ,Statistics - Methodology ,significance testing ,Science & Technology ,0104 Statistics ,trust ,Social Sciences, Mathematical Methods ,trustworthiness ,Trustworthiness ,Physical Sciences ,Damages ,Position (finance) ,Criticism ,p-values ,Statistics, Probability and Uncertainty ,Psychology ,Mathematical Methods In Social Sciences ,Mathematics ,Social Sciences (miscellaneous) - Abstract
We examine the role of trustworthiness and trust in statistical inference, arguing that it is the extent of trustworthiness in inferential statistical tools which enables trust in the conclusions. Certain tools, such as the p-value and significance test, have recently come under renewed criticism, with some arguing that they damage trust in statistics. We argue the contrary, beginning from the position that the central role of these methods is to form the basis for trusted conclusions in the face of uncertainty in the data, and noting that it is the misuse and misunderstanding of these tools which damages trustworthiness and hence trust. We go on to argue that recent calls to ban these tools tackle the symptom, not the cause, and themselves risk damaging the capability of science to advance, as well as risking feeding into public suspicion of the discipline of statistics. The consequence could be aggravated mistrust of our discipline and of science more generally. In short, the very proposals could work in quite the contrary direction from that intended. We make some alternative proposals for tackling the misuse and misunderstanding of these methods, and for how trust in our discipline might be promoted.
- Published
- 2021
19. Variational inference for Markovian queueing networks
- Author
-
Iker Perez and Giuliano Casale
- Subjects
Statistics and Probability ,Mathematical optimization ,Optimization problem ,Statistics & Probability ,MODELS ,variational methods ,Inference ,Markov process ,symbols.namesake ,0102 Applied Mathematics ,CHAINS ,Markov jump process ,Mathematics ,Queueing theory ,Science & Technology ,Counting process ,Applied Mathematics ,0104 Statistics ,Probabilistic logic ,Conditional probability distribution ,JUMP-PROCESSES ,Queueing networks ,Physical Sciences ,SIMULATION ,symbols ,Routing (electronic design automation) ,BAYESIAN-INFERENCE - Abstract
Queueing networks are stochastic systems formed by interconnected resources routing and serving jobs. They induce jump processes with distinctive properties, and find widespread use in inferential tasks. Here, service rates for jobs and potential bottlenecks in the routing mechanism must be estimated from a reduced set of observations. However, this calls for the derivation of complex conditional density representations, over both the stochastic network trajectories and the rates, which is considered an intractable problem. Numerical simulation procedures designed for this purpose do not scale, because of high computational costs; furthermore, variational approaches relying on approximating measures and full independence assumptions are unsuitable. In this paper, we offer a probabilistic interpretation of variational methods applied to inference tasks with queueing networks, and show that approximating measure choices routinely used with jump processes yield ill-defined optimization problems. Yet we demonstrate that it is still possible to enable a variational inferential task, by considering a novel space expansion treatment over an analogous counting process for job transitions. We present and compare exemplary use cases with practical queueing networks, showing that our framework offers an efficient and improved alternative where existing variational or numerically intensive solutions fail.
- Published
- 2021
20. Changepoint detection in non-exchangeable data
- Author
-
Hallgren, KL, Heard, NA, and Adams, NM
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Technology ,Science & Technology ,Statistics & Probability ,Dependent data ,Reversible jump MCMC ,MODELS ,0104 Statistics ,SERIES ,Theoretical Computer Science ,Methodology (stat.ME) ,Computational Theory and Mathematics ,Computer Science, Theory & Methods ,Physical Sciences ,Computer Science ,BINARY SEGMENTATION ,Statistics, Probability and Uncertainty ,BAYESIAN-INFERENCE ,Mathematics ,Changepoint detection ,Statistics - Methodology ,0802 Computation Theory and Mathematics - Abstract
Changepoint models typically assume the data within each segment are independent and identically distributed conditional on some parameters that change across segments. This construction may be inadequate when data are subject to local correlation patterns, often resulting in many more changepoints fitted than preferable. This article proposes a Bayesian changepoint model that relaxes the assumption of exchangeability within segments. The proposed model supposes data within a segment are m-dependent for some unknown m ≥ 0 that may vary between segments, resulting in a model suitable for detecting clear discontinuities in data that are subject to different local temporal correlations. The approach is suited to both continuous and discrete data. A novel reversible jump Markov chain Monte Carlo algorithm is proposed to sample from the model; in particular, a detailed analysis of the parameter space is exploited to build proposals for the orders of dependence. Two applications demonstrate the benefits of the proposed model: computer network monitoring via change detection in count data, and segmentation of financial time series.
- Published
- 2022
21. The SSA is 60!
- Author
-
Ryan, L, Henstridge, J, and Kasza, J
- Abstract
To celebrate the 60th anniversary of the Statistical Society of Australia, we put the spotlight on four members past and present. Interviews and profiles by Louise Ryan, John Henstridge and Jessica Kasza.
- Published
- 2022
22. Sparse linear mixed model selection via streamlined variational Bayes
- Author
-
Degani, E, Maestrini, L, Toczydłowska, D, and Wand, MP
- Abstract
Linear mixed models are a versatile statistical tool to study data by accounting for fixed effects and random effects from multiple sources of variability. In many situations, a large number of candidate fixed effects is available and it is of interest to select a parsimonious subset of those being effectively relevant for predicting the response variable. Variational approximations facilitate fast approximate Bayesian inference for the parameters of a variety of statistical models, including linear mixed models. However, for models having a high number of fixed or random effects, simple application of standard variational inference principles does not lead to fast approximate inference algorithms, due to the size of model design matrices and inefficient treatment of sparse matrix problems arising from the required approximating density parameters updates. We illustrate how recently developed streamlined variational inference procedures can be generalized to make fast and accurate inference for the parameters of linear mixed models with nested random effects and global-local priors for Bayesian fixed effects selection. Our variational inference algorithms achieve convergence to the same optima of their standard implementations, although with significantly lower computational effort, memory usage and time, especially for large numbers of random effects. Using simulated and real data examples, we assess the quality of automated procedures for fixed effects selection that are free from hyperparameters tuning and only rely upon variational posterior approximations. Moreover, we show high accuracy of variational approximations against model fitting via Markov Chain Monte Carlo sampling.
- Published
- 2022
23. Density estimation via Bayesian inference engines
- Author
-
Wand, MP and Yu, JCF
- Abstract
We explain how effective automatic probability density function estimates can be constructed using contemporary Bayesian inference engines such as those based on no-U-turn sampling and expectation propagation. Extensive simulation studies demonstrate that the proposed density estimates have excellent comparative performance and scale well to very large sample sizes due to a binning strategy. Moreover, the approach is fully Bayesian and all estimates are accompanied by point-wise credible intervals. An accompanying package in the R language facilitates easy use of the new density estimates.
- Published
- 2022
24. A hierarchical meta-analysis for settings involving multiple outcomes across multiple cohorts
- Author
-
Akkaya Hocagil, T, Ryan, LM, Cook, RJ, Jacobson, SW, Richardson, GA, Day, NL, Coles, CD, Carmichael Olson, H, and Jacobson, JL
- Abstract
Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioural deficits. However, there is a paucity of evidence regarding the nature and levels of PAE associated with increased risk of clinically significant cognitive deficits. To derive robust and efficient estimates of the effects of PAE on cognitive function, we have developed a hierarchical meta-analysis approach to synthesize information regarding the effects of PAE on cognition, integrating data on multiple outcomes from six U.S. longitudinal cohort studies. A key assumption of standard methods of meta-analysis, that effect sizes are independent, is violated when multiple intercorrelated outcomes are synthesized across studies. Our approach involves estimating the dose–response coefficients for each outcome and then pooling these correlated dose–response coefficients to obtain an estimated "global" effect of exposure on cognition. In the first stage, we use individual participant data to derive estimates of the effects of PAE by fitting regression models that adjust for potential confounding variables using propensity scores. The correlation matrix characterizing the dependence between the outcome-specific dose–response coefficients estimated within each cohort is then estimated, while accommodating incomplete information on some outcomes. We also compare inferences based on the proposed approach to inferences based on a full multivariate analysis.
- Published
- 2022
25. High-frequency estimation of the Lévy-driven graph Ornstein-Uhlenbeck process
- Author
-
Courgeau, Valentin and Veraart, Almut E. D.
- Subjects
Methodology (stat.ME) ,FOS: Computer and information sciences ,FOS: Economics and business ,Statistics and Probability ,Statistical Finance (q-fin.ST) ,0104 Statistics ,FOS: Mathematics ,Statistics Theory (math.ST) ,Statistics, Probability and Uncertainty ,62E20, 62-09, 62F12, 62P12, 65F50 - Abstract
We consider the Graph Ornstein-Uhlenbeck (GrOU) process observed on a non-uniform discrete time grid and introduce discretised maximum likelihood estimators with parameters specific to the whole graph or specific to each component of the graph. Under a high-frequency sampling scheme, we study the asymptotic behaviour of those estimators as the mesh size of the observation grid goes to zero. We prove two stable central limit theorems to the same distribution as in the continuously-observed case under both finite and infinite jump activity for the Lévy driving noise. In addition to providing the consistency of the estimators, the stable convergence allows us to consider probabilistic sparse inference procedures on the edges themselves when a graph structure is not explicitly available, while preserving the asymptotic properties. In particular, we also show the asymptotic normality and consistency of an Adaptive Lasso scheme. We apply the new estimators to wind capacity factor measurements, i.e. the ratio of the wind power produced locally to its rated peak power, across fifty locations in Northern Spain and Portugal. We compare those estimators to the standard least squares estimator through a simulation study extending known univariate results across graph configurations, noise types and amplitudes.
- Published
- 2022
26. Nonparametric Bayesian inference for reversible multi-dimensional diffusions
- Author
-
Giordano, Matteo and Ray, Kolyan
- Subjects
0102 Applied Mathematics ,Statistics & Probability ,Probability (math.PR) ,0104 Statistics ,FOS: Mathematics ,1403 Econometrics ,Mathematics - Statistics Theory ,Mathematics - Numerical Analysis ,Statistics Theory (math.ST) ,Numerical Analysis (math.NA) ,Mathematics - Probability - Abstract
We study nonparametric Bayesian models for reversible multi-dimensional diffusions with periodic drift. For continuous observation paths, reversibility is exploited to prove a general posterior contraction rate theorem for the drift gradient vector field under approximation-theoretic conditions on the induced prior for the invariant measure. The general theorem is applied to Gaussian priors and $p$-exponential priors, which are shown to converge to the truth at the minimax optimal rate over Sobolev smoothness classes in any dimension.
- Published
- 2022
27. A parameter estimation method for multivariate aggregated Hawkes processes
- Author
-
Shlomovich, Leigh, Cohen, Edward A. K., and Adams, Niall
- Subjects
Methodology (stat.ME) ,FOS: Computer and information sciences ,Statistics & Probability ,0104 Statistics ,Statistics - Methodology ,62M09 ,0802 Computation Theory and Mathematics - Abstract
It is often assumed that events cannot occur simultaneously when modelling data with point processes. This raises a problem as real-world data often contains synchronous observations due to aggregation or rounding, resulting from limitations on recording capabilities and the expense of storing high volumes of precise data. In order to gain a better understanding of the relationships between processes, we consider modelling the aggregated event data using multivariate Hawkes processes, which offer a description of mutually-exciting behaviour and have found wide applications in areas including seismology and finance. Here we generalise existing methodology on parameter estimation of univariate aggregated Hawkes processes to the multivariate case using a Monte Carlo Expectation Maximization (MC-EM) algorithm and through a simulation study illustrate that alternative approaches to this problem can be severely biased, with the multivariate MC-EM method outperforming them in terms of MSE in all considered cases.
- Published
- 2022
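For reference, the conditional intensity of a multivariate Hawkes process with exponential kernels, in its standard form (entry 27's data are aggregated counts of such events; notation is generic):

```latex
\[
  \lambda_i(t) \;=\; \mu_i \;+\; \sum_{j=1}^{d}\ \sum_{t^{(j)}_k < t}
  \alpha_{ij}\, e^{-\beta_{ij}\bigl(t - t^{(j)}_k\bigr)},
  \qquad i = 1, \dots, d,
\]
% mu_i is the background rate of component i, and alpha_ij, beta_ij govern
% how strongly and how persistently events in component j excite component i;
% aggregation replaces the exact event times t_k with bin counts, which is
% what the MC-EM scheme accommodates.
```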
28. Reflection on modern methods: constructing directed acyclic graphs (DAGs) with domain experts for health services research
- Author
-
Daniela Rodrigues, Noemi Kreif, Anna Lawrence-Jones, Mauricio Barahona, Erik Mayer, Imperial College Healthcare NHS Trust- BRC Funding, National Institute for Health Research, NHS North West London CCG, Engineering & Physical Science Research Council (EPSRC), and Nuffield Foundation
- Subjects
potential outcomes ,Science & Technology ,directed acyclic graphs ,Epidemiology ,0104 Statistics ,Confounding Factors, Epidemiologic ,General Medicine ,SENSITIVITY-ANALYSIS ,policy evaluation ,State Medicine ,health services research ,1117 Public Health and Health Services ,Causality ,Data Interpretation, Statistical ,OBSERVATIONAL RESEARCH ,CAUSAL INFERENCE ,Humans ,KNOWLEDGE ,Life Sciences & Biomedicine ,Public, Environmental & Occupational Health - Abstract
Directed acyclic graphs (DAGs) are a useful tool to represent, in a graphical format, researchers’ assumptions about the causal structure among variables while providing a rationale for the choice of confounding variables to adjust for. With origins in the field of probabilistic graphical modelling, DAGs are yet to be widely adopted in applied health research, where causal assumptions are frequently made for the purpose of evaluating health services initiatives. In this context, there is still limited practical guidance on how to construct and use DAGs. Some progress has recently been made in terms of building DAGs based on studies from the literature, but an area that has received less attention is how to create DAGs from information provided by domain experts, an approach of particular importance when there is limited published information about the intervention under study. This approach offers the opportunity for findings to be more robust and relevant to patients, carers and the public, and more likely to inform policy and clinical practice. This article draws lessons from a stakeholder workshop involving patients, health care professionals, researchers, commissioners and representatives from industry, whose objective was to draw DAGs for a complex intervention—online consultation, i.e. written exchange between the patient and health care professional using an online system—in the context of the English National Health Service. We provide some initial, practical guidance to those interested in engaging with domain experts to develop DAGs.
- Published
- 2022
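A toy version of what such a DAG encodes, with hypothetical variable names for an online-consultation evaluation (my illustration, not the workshop's DAG): a common cause of exposure and outcome opens a backdoor path that the analysis should adjust for.

```python
import networkx as nx

# Exposure: OnlineConsult; outcome: Resolution; Complexity is a confounder
# affecting both (all names are invented for illustration).
g = nx.DiGraph([
    ("Complexity", "OnlineConsult"),
    ("Complexity", "Resolution"),
    ("OnlineConsult", "Resolution"),      # causal path of interest
])
assert nx.is_directed_acyclic_graph(g)

# A backdoor path is an exposure-outcome path starting with an arrow
# *into* the exposure; enumerate paths in the undirected skeleton.
skeleton = g.to_undirected()
for path in nx.all_simple_paths(skeleton, "OnlineConsult", "Resolution"):
    if len(path) > 2 and g.has_edge(path[1], path[0]):
        print("backdoor path:", " - ".join(path), "=> adjust for", path[1])
```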
29. Variational Bayes for High-Dimensional Linear Regression With Sparse Priors
- Author
-
Kolyan Ray, Botond Szabo, and Mathematics
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,62G20 (Primary), 62G05, 65K10 (secondary) ,Statistics & Probability ,Mathematics - Statistics Theory ,Machine Learning (stat.ML) ,Statistics Theory (math.ST) ,Model selection ,Bayesian inference ,1603 Demography ,Methodology (stat.ME) ,Bayes' theorem ,Statistics - Machine Learning ,EMPIRICAL BAYES ,MODEL SELECTION, ORACLE INEQUALITIES, SPARSITY, SPIKE-AND-SLAB PRIOR, VARIATIONAL BAYES ,Prior probability ,Linear regression ,1403 Econometrics ,FOS: Mathematics ,SPIKE ,Statistics::Methodology ,Applied mathematics ,NEEDLES ,Statistics - Methodology ,Selection (genetic algorithm) ,Mathematics ,Science & Technology ,0104 Statistics ,Spike-and-slab prior ,Statistics::Computation ,VARIABLE SELECTION ,Oracle inequalities ,Physical Sciences ,SDG 1 - No Poverty ,Compatibility (mechanics) ,CONVERGENCE-RATES ,INFERENCE ,STRAW ,Spike (software development) ,Statistics, Probability and Uncertainty ,Variational Bayes ,Sparsity ,POSTERIOR CONCENTRATION - Abstract
We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the design matrix, oracle inequalities are derived for the mean-field VB approximation, implying that it converges to the sparse truth at the optimal rate and gives optimal prediction of the response vector. The empirical performance of our algorithm is studied, showing that it works comparably well as other state-of-the-art Bayesian variable selection methods. We also numerically demonstrate that the widely used coordinate-ascent variational inference (CAVI) algorithm can be highly sensitive to the parameter updating order, leading to potentially poor performance. To mitigate this, we propose a novel prioritized updating scheme that uses a data-driven updating order and performs better in simulations. The variational algorithm is implemented in the R package 'sparsevb'.
- Published
- 2021
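For orientation, the prior family behind entry 29 is typically of spike-and-slab form; the notation below is the standard one, with the exact slab and variational family specified in the paper.

```latex
% Sparse linear regression Y = X beta + eps with p possibly much larger
% than n; each coefficient mixes a point mass at zero with a slab:
\[
  \beta_j \;\overset{\mathrm{iid}}{\sim}\;
  (1 - w)\,\delta_0 \;+\; w\,\mathrm{Laplace}(\lambda),
  \qquad j = 1, \dots, p,
\]
% and the mean-field variational family factorises across coordinates,
% e.g. q(beta_j) = gamma_j N(mu_j, sigma_j^2) + (1 - gamma_j) delta_0,
% with gamma_j an approximate posterior inclusion probability.
```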
30. Forecasting recovery rates on non-performing loans with machine learning
- Author
-
Frédéric Vrins, Damiano Brigo, Paolo Gambetti, Anthony Bellotti, and UCL - SSH/LIDAM/LFIN - Louvain Finance
- Subjects
Economics ,Computer science ,media_common.quotation_subject ,MODELS ,Social Sciences ,Machine learning ,computer.software_genre ,Loss given default ,Business & Economics ,Debt ,REGRESSION ,0502 economics and business ,1403 Econometrics ,Econometrics ,050207 economics ,Business and International Management ,Set (psychology) ,Credit risk ,1505 Marketing ,Risk management ,050205 econometrics ,media_common ,Risk Management ,Actuarial science ,business.industry ,0104 Statistics ,05 social sciences ,Non-performing loans ,Superior set of models ,Debt collection ,Management ,Random forest ,Recovery rate ,Loan ,Portfolio ,Default ,Artificial intelligence ,business ,Non-performing loan ,Defaulted loans ,computer ,Forecasting - Abstract
We compare the performance of a wide set of regression techniques and machine learning algorithms for predicting recovery rates on non-performing loans, using a private database from a European debt collection agency. We find that rule-based algorithms such as Cubist, boosted trees and random forests perform significantly better than other approaches. In addition to loan contract specificities, the predictors referring to the bank recovery process, prior to the portfolio's sale to the debt collector, are also proven to strongly enhance forecasting performance. These variables, derived from the time series of contacts to defaulted clients and clients' reimbursements to the bank, help all algorithms to better identify debtors with different repayment ability and/or commitment, and in general with different recovery potential.
- Published
- 2021
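A minimal sketch of the prediction task in entry 30 on synthetic data (the paper's database is private and all feature names here are invented); the recovery-process features are made predictive by construction, mirroring the reported finding.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(4)
n = 5000
loan_amount = rng.lognormal(8, 1, n)          # contract feature
early_repaid = rng.beta(1, 8, n)              # share repaid before the sale
contacts = rng.poisson(3, n)                  # contacts with the debtor

# Synthetic recovery rate in [0, 1], driven mostly by process features.
recovery = np.clip(0.3 + 0.6 * early_repaid + 0.02 * contacts
                   - 0.02 * np.log(loan_amount)
                   + rng.normal(0, 0.08, n), 0, 1)

X = np.column_stack([loan_amount, early_repaid, contacts])
X_tr, X_te, y_tr, y_te = train_test_split(X, recovery, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("test MAE:", round(mean_absolute_error(y_te, rf.predict(X_te)), 4))
print("feature importances:", rf.feature_importances_.round(2))
```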
31. The global burden of chronic hepatitis B virus infection: comparison of country-level prevalence estimates from four research groups
- Author
-
Nora Schmit, Shevanthi Nayagam, Mark Thursz, Timothy B. Hallett, Medical Research Council (MRC), and National Institute for Health Research
- Subjects
sub-Saharan Africa ,Other Infectious Diseases ,Epidemiology ,viral-hepatitis elimination ,prevalence ,030231 tropical medicine ,Global Health ,World Health Organization ,infectious diseases ,medicine.disease_cause ,Representativeness heuristic ,1117 Public Health and Health Services ,disease burden ,modelling ,03 medical and health sciences ,Hepatitis B, Chronic ,0302 clinical medicine ,Seroepidemiologic Studies ,medicine ,Humans ,Seroprevalence ,AcademicSubjects/MED00860 ,030212 general & internal medicine ,Child ,Africa South of the Sahara ,Disease burden ,Hepatitis B virus ,business.industry ,indicator ,0104 Statistics ,Percentage point ,General Medicine ,Hepatitis B ,medicine.disease ,monitoring ,Child, Preschool ,Pairwise comparison ,business ,Viral hepatitis ,Demography - Abstract
Background Progress towards viral hepatitis elimination goals relies on accurate estimates of chronic hepatitis B virus (HBV)-infection prevalence. We compared existing sources of country-level estimates from 2013 to 2017 to investigate the extent and underlying drivers of differences between them. Methods The four commonly cited sources of global-prevalence estimates, i.e. the Institute for Health Metrics and Evaluation, Schweitzer et al., the World Health Organization (WHO) and the CDA Foundation, were compared by calculating pairwise differences between sets of estimates and assessing their within-country variation. Differences in underlying empirical data and modelling methods were investigated as contributors to differences in sub-Saharan African estimates. Results The four sets of estimates across all ages were comparable overall and agreed on the global distribution of HBV burden. The WHO and the CDA produced the most similar estimates, differing by a median of 0.8 percentage points. Larger discrepancies were seen in estimates of prevalence in children under 5 years of age and in sub-Saharan African countries, where the median pairwise differences were 2.7 percentage and 2.4 percentage points for all-age prevalence and in children, respectively. Recency and representativeness of included data, and different modelling assumptions of the age distribution of HBV burden, seemed to contribute to these differences. Conclusion Current prevalence estimates, particularly those from the WHO and the CDA based on more recent empirical data, provide a useful resource to assess the population-level burden of chronic HBV-infection. However, further seroprevalence data in young children are needed particularly in sub-Saharan Africa. This is a priority, as monitoring progress towards elimination depends on improved knowledge of prevalence in this age group.
- Published
- 2020
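The headline comparison in entry 31 reduces to pairwise differences between sources of country-level estimates; a sketch with placeholder numbers (not the study's data):

```python
import pandas as pd
from itertools import combinations

# Prevalence estimates (%) by source and country; all values are invented.
est = pd.DataFrame({
    "IHME":       [6.1, 0.9, 8.3],
    "Schweitzer": [7.0, 1.2, 9.5],
    "WHO":        [6.5, 1.0, 8.8],
    "CDA":        [6.4, 1.1, 8.7],
}, index=["country_A", "country_B", "country_C"])

# Median absolute pairwise difference, in percentage points, per source pair.
for a, b in combinations(est.columns, 2):
    print(f"{a} vs {b}: median |diff| = {(est[a] - est[b]).abs().median():.1f} pp")
```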
32. Applying generalized funnel plots to help design statistical analyses
- Author
-
Janet Aisbett, Eric J. Drinkwater, Kenneth L. Quarrie, and Stephen Woodcock
- Subjects
Statistics and Probability ,Statistics & Probability ,0104 Statistics ,Statistics, Probability and Uncertainty - Abstract
Researchers across many fields routinely analyze trial data using Null Hypothesis Significance Tests with zero null and p
- Published
- 2022
33. Interoperability of statistical models in pandemic preparedness: principles and reality
- Author
-
George Nicholson, Marta Blangiardo, Mark Briers, Peter J. Diggle, Tor Erlend Fjelde, Hong Ge, Robert J. B. Goudie, Radka Jersakova, Ruairidh E. King, Brieuc C. L. Lehmann, Ann-Marie Mallon, Tullia Padellini, Yee Whye Teh, Chris Holmes, Sylvia Richardson, Apollo - University of Cambridge Repository, and National Institute of Child Health and Human Development
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Statistics & Probability ,multi-source inference ,General Mathematics ,ENGLAND ,interoperability ,SARS-COV-2 ,Statistics - Applications ,Article ,Methodology (stat.ME) ,Bayesian graphical models ,INFECTION ,EPIDEMIOLOGY ,Applications (stat.AP) ,stat.AP ,Statistics - Methodology ,Modularization ,Science & Technology ,0104 Statistics ,COVID-19 ,evidence synthesis ,PREVALENCE ,stat.ME ,62P10 ,Physical Sciences ,INFERENCE ,Statistics, Probability and Uncertainty ,Mathematics ,Bayesian melding - Abstract
We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of statistical models for disease surveillance using probabilistic reasoning. We illustrate this through case studies for inferring spatial-temporal coronavirus disease 2019 (COVID-19) prevalence and reproduction numbers in England.
- Published
- 2022
34. Stable isotopic signatures of methane from waste sources through atmospheric measurements
- Author
-
Bakkaloglu, Semra, Lowry, Dave, Fisher, Rebecca E., Menoud, Malika, Lanoisellé, Mathias, Chen, Huilin, Röckmann, Thomas, Nisbet, Euan G., Sub Atmospheric physics and chemistry, Marine and Atmospheric Research, and Isotope Research
- Subjects
0907 Environmental Engineering ,Atmospheric Science ,Methane emissions ,Environmental Science(all) ,Waste isotopic signature ,0104 Statistics ,Greenhouse gas emissions ,Carbon isotopes ,Meteorology & Atmospheric Sciences ,0401 Atmospheric Sciences ,Deuterium ,General Environmental Science - Abstract
This study aimed to characterize the carbon isotopic signatures (δ13C-CH4) of several methane waste sources, predominantly in the UK, and during field campaigns in the Netherlands and Turkey. CH4 plumes emitted from waste sources were detected during mobile surveys using a cavity ring-down spectroscopy (CRDS) analyser. Air samples were collected in the plumes for subsequent isotope analysis by gas chromatography isotope ratio mass spectrometry (GC-IRMS) to characterize δ13C-CH4. The isotopic signatures were determined through a Keeling plot approach and the bivariate correlated errors and intrinsic scatter (BCES) fitting method. The δ13C-CH4 and δ2H-CH4 signatures were identified from biogas plants (−54.6 ± 5.6‰, n = 34; −314.4 ± 23‰, n = 3), landfills (−56.8 ± 2.3‰, n = 43; −268.2 ± 2.1‰, n = 2), sewage treatment plants (−51.6 ± 2.2‰, n = 15; −303.9 ± 22‰, n = 6), composting facilities (−54.7 ± 3.9‰, n = 6), a landfill leachate treatment plant (−57.1 ± 1.8‰, n = 2), one water treatment plant (−53.7 ± 0.1‰) and a waste recycling facility (−53.2 ± 0.2‰). The overall signatures of the 71 waste sources ranged from −64.4 to −44.3‰, with an average of −55.1 ± 4.1‰ (n = 102), for δ13C, and from −341 to −267‰, with an average of −300.3 ± 25‰ (n = 11), for δ2H; these can be distinguished from other source types in the UK such as gas leaks and ruminants. The study also demonstrates that δ2H-CH4 signatures, in addition to δ13C-CH4, can aid in better waste source apportionment and increase the granularity of isotope data required to improve regional modelling.
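The Keeling plot step lends itself to a short sketch: regress measured δ13C-CH4 on the reciprocal of the CH4 mole fraction, and the intercept estimates the source signature. The sketch uses ordinary least squares in place of the BCES fit, and the data are synthetic.

import numpy as np

ch4 = np.array([2.05, 2.40, 3.10, 4.00, 5.20])       # ppm, background + plume
d13c = np.array([-47.8, -49.6, -51.9, -53.4, -54.5])  # permil

# Keeling relation: delta_obs = delta_source + slope * (1 / concentration)
X = np.column_stack([np.ones_like(ch4), 1.0 / ch4])
beta, *_ = np.linalg.lstsq(X, d13c, rcond=None)
print(f"estimated source signature: {beta[0]:.1f} permil")  # the intercept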
- Published
- 2022
35. The least favorable noise
- Author
-
Ernst, Philip A., Kagan, Abram M., and Rogers, L. C. G.
- Subjects
Statistics and Probability ,Statistics & Probability ,Probability (math.PR) ,0104 Statistics ,FOS: Mathematics ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,60E07, 60E10, 60E05 ,Statistics, Probability and Uncertainty ,Mathematics - Probability - Abstract
Suppose that a random variable $X$ of interest is observed perturbed by independent additive noise $Y$. This paper concerns the "least favorable perturbation" $\hat Y_\varepsilon$, which maximizes the prediction error $E(X-E(X|X+Y))^2$ in the class of $Y$ with $\mathrm{Var}(Y)\leq \varepsilon$. We find a characterization of the answer to this question, and show by example that it can be surprisingly complicated. However, in the special case where $X$ is infinitely divisible, the solution is complete and simple. We also explore the conjecture that noisier $Y$ makes prediction worse.
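In the Gaussian special case the objects in this abstract have closed forms, which makes a quick numerical check possible: for independent X ~ N(0, v) and Y ~ N(0, eps), the conditional mean E(X | X+Y) is linear and the prediction error equals v*eps/(v+eps). The sketch below compares that formula with a Monte Carlo estimate.

import numpy as np

rng = np.random.default_rng(0)
var_x, eps, n = 2.0, 0.5, 10**6
x = rng.normal(0.0, np.sqrt(var_x), n)
y = rng.normal(0.0, np.sqrt(eps), n)
s = x + y
cond_mean = var_x / (var_x + eps) * s      # E(X | X+Y) for Gaussian X, Y
mc = np.mean((x - cond_mean) ** 2)         # Monte Carlo prediction error
closed = var_x * eps / (var_x + eps)       # closed-form prediction error
print(mc, closed)                          # should agree to ~3 decimals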
- Published
- 2022
36. Robust estimation of large panels with factor structures
- Author
-
Avarucci, M and Zaffaroni, P
- Subjects
Statistics and Probability ,GLS ,Science & Technology ,Factor structure ,Statistics & Probability ,0104 Statistics ,1603 Demography ,REGRESSION-MODELS ,Physical Sciences ,1403 Econometrics ,Panel ,GROWTH ,INFERENCE ,Statistics, Probability and Uncertainty ,Robustness ,Mathematics ,Weighted least squares estimation ,ERROR - Abstract
This article studies estimation of linear panel regression models with heterogeneous coefficients using a class of weighted least squares estimators, when both the regressors and the error possibly contain a common latent factor structure. Our theory is robust to the specification of such a factor structure because it does not require any information on the number of factors or estimation of the factor structure itself. Moreover, our theory is efficient, in certain circumstances, because it nests the GLS principle. We first show how our unfeasible weighted estimator provides a bias-adjusted estimator with the conventional limiting distribution, for situations in which the OLS is affected by a first-order bias. The technical challenge resolved in the article consists of showing how these properties are preserved for the feasible weighted estimator in a double-asymptotics setting. Our theory is illustrated by extensive Monte Carlo experiments and an empirical application that investigates the link between capital accumulation and economic growth in an international setting. Supplementary materials for this article are available online.
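The weighted least squares principle at the heart of the abstract can be illustrated with a deliberately simplified sketch: unit-by-unit OLS, feasible weights from first-pass residual variances, and a weighted mean-group average of the heterogeneous coefficients. This ignores the factor structure entirely, so it illustrates the weighting idea only, not the article's estimator.

import numpy as np

rng = np.random.default_rng(1)
N, T, k = 50, 100, 2
betas = rng.normal(1.0, 0.3, (N, k))             # heterogeneous coefficients
X = rng.normal(size=(N, T, k))
sig = rng.uniform(0.5, 3.0, N)                   # unit-specific error scale
y = np.einsum("ntk,nk->nt", X, betas) + sig[:, None] * rng.normal(size=(N, T))

# First-pass OLS per unit; residual variances give feasible weights 1/sigma_i^2
beta_hat = np.stack([np.linalg.lstsq(X[i], y[i], rcond=None)[0] for i in range(N)])
resid = y - np.einsum("ntk,nk->nt", X, beta_hat)
w = 1.0 / resid.var(axis=1)

# Weighted mean-group estimate of the average coefficient
beta_mg = (w[:, None] * beta_hat).sum(axis=0) / w.sum()
print(beta_mg)                                   # compare with the mean of betas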
- Published
- 2022
37. Large and moderate deviations for stochastic Volterra systems
- Author
-
Alexandre Pannier, Antoine Jacquier, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
Statistics and Probability ,Applied Mathematics ,Statistics & Probability ,Probability (math.PR) ,0104 Statistics ,60F10, 60G22, 91G20 ,1502 Banking, Finance and Investment ,math.PR ,FOS: Economics and business ,Modeling and Simulation ,0102 Applied Mathematics ,FOS: Mathematics ,Pricing of Securities (q-fin.PR) ,Quantitative Finance - Pricing of Securities ,q-fin.PR ,Mathematics - Probability - Abstract
We provide a unified treatment of pathwise large and moderate deviations principles for a general class of multidimensional stochastic Volterra equations with singular kernels, not necessarily of convolution form. Our methodology is based on the weak convergence approach of Budhiraja, Dupuis and Ellis. We show in particular how this framework encompasses most rough volatility models used in mathematical finance and generalises many recent results in the literature.
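A stochastic Volterra equation with a singular kernel can be simulated with a left-point Euler-type scheme, as in the purely illustrative sketch below; the fractional kernel K(t, s) = (t − s)^(H − 1/2) and the tanh coefficient are arbitrary choices, not the paper's setting.

import numpy as np

rng = np.random.default_rng(2)
H, T, n, x0 = 0.1, 1.0, 1000, 0.0
dt = T / n
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(dt), n)
sigma = lambda x: 0.3 * (1.0 + np.tanh(x))    # any Lipschitz diffusion coefficient

# X_t = x0 + int_0^t K(t,s) sigma(X_s) dW_s, approximated by a left-point sum
X = np.full(n + 1, x0)
for i in range(1, n + 1):
    s = t[:i]                                  # left endpoints of past steps
    K = (t[i] - s) ** (H - 0.5)                # singular as s -> t[i]
    X[i] = x0 + np.sum(K * sigma(X[:i]) * dW[:i])
print(X[-5:])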
- Published
- 2022
38. Discussion of assumption-lean inference for generalised linear model parameters by Vansteelandt and Dukes
- Author
-
Battey, H, Engineering and Physical Sciences Research Council, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
0102 Applied Mathematics ,Statistics & Probability ,0104 Statistics ,1403 Econometrics - Published
- 2022
39. Obituary: David Cox
- Author
-
Battey, H, Reid, N, Engineering and Physical Sciences Research Council, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
0104 Statistics - Published
- 2022
40. SDEs with uniform distributions: Peacocks, conic martingales and mean reverting uniform diffusions
- Author
-
Monique Jeanblanc, Frédéric Vrins, Damiano Brigo, UCL - SSH/LIDAM/LFIN - Louvain Finance, UCL - SSH/LIDAM/CORE - Center for operations research and econometrics, Department of Mathematics [Imperial College London], Imperial College London, Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), Institut National de la Recherche Agronomique (INRA)-Université d'Évry-Val-d'Essonne (UEVE)-ENSIIE-Centre National de la Recherche Scientifique (CNRS), and Université Catholique de Louvain = Catholic University of Louvain (UCL)
- Subjects
Statistics and Probability ,Statistics & Probability ,Applied Mathematics ,Modeling and Simulation ,Pure mathematics ,Mathematics ,0101 mathematics ,0102 Applied Mathematics ,0104 Statistics ,1502 Banking, Finance and Investment ,01 natural sciences ,010102 general mathematics ,010104 statistics & probability ,math.PR ,[MATH.MATH-PR] Mathematics [math]/Probability [math.PR] ,60H10, 60J60 ,Stochastic differential equation ,Uniformly distributed stochastic differential equation ,Uniformly distributed diffusion ,Uniform martingale diffusions ,Mean reverting uniform diffusion ,Mean reverting uniform SDE ,Uniform SDE simulation ,Conic martingales ,Conic section ,Peacock process ,Martingale (probability theory) ,Weak solution ,Local time ,Ergodic theory ,Regular polygon ,Principle of maximum entropy ,Maximum entropy stochastic recovery rates ,Maximum entropy stochastic correlation ,Stochastic recovery rates ,Mathematics::Probability
It is known since Kellerer (1972) that for any peacock process there exist martingales with the same marginal laws. Nevertheless, there is no general method for finding such martingales that yields diffusions. Indeed, Kellerer's proof is not constructive: finding the dynamics of processes associated to a given peacock is not trivial in general. In this paper we are interested in the uniform peacock, that is, the peacock with uniform law at all times on a generic time-varying support [a(t), b(t)]. We derive explicitly the corresponding stochastic differential equations (SDEs) and prove that, under certain conditions on the boundaries a(t) and b(t), they admit a unique strong solution yielding the relevant diffusion process. We discuss the relationship between our result and the previous derivation of diffusion processes associated to square-root and linear time-boundaries, emphasizing the cases where our approach adds strong uniqueness, and study the local time and activity of the solution processes. We then study the peacock with uniform law at all times on a constant support [−1, 1] and derive the SDE of an associated mean-reverting diffusion process with uniform margins that is not a martingale. For the related SDE we prove existence of a solution in [0, T]. Finally, we provide a numerical case study showing that these processes have the desired uniform behaviour. These results may be used to model random probabilities, random recovery rates or random correlations.
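A short simulation makes the constant-support case tangible. The SDE dX = −θX dt + sqrt(θ(1 − X²)) dW is one specification whose stationary Fokker-Planck solution is the uniform density on (−1, 1); it is used here as an illustrative stand-in and is not claimed to be the paper's exact equation.

import numpy as np

rng = np.random.default_rng(3)
theta, dt, n_steps, n_paths = 1.0, 1e-3, 20000, 5000
x = rng.uniform(-1.0, 1.0, n_paths)
for _ in range(n_steps):
    # Euler-Maruyama step; clip keeps the discretized paths inside the support
    x += -theta * x * dt \
         + np.sqrt(theta * np.clip(1 - x**2, 0, None) * dt) * rng.normal(size=n_paths)
    x = np.clip(x, -1.0, 1.0)
print(np.histogram(x, bins=10, range=(-1, 1))[0])   # roughly flat bin counts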
- Published
- 2020
41. Multiresolution analysis of point processes and statistical thresholding for Haar wavelet-based intensity estimation
- Author
-
Youssef Taleb and Edward A. K. Cohen
- Subjects
Statistics and Probability ,business.industry ,Statistics & Probability ,Homogeneity (statistics) ,Multiresolution analysis ,0104 Statistics ,05 social sciences ,Estimator ,Pattern recognition ,01 natural sciences ,Thresholding ,Point process ,Haar wavelet ,010104 statistics & probability ,Wavelet ,Likelihood-ratio test ,0502 economics and business ,Artificial intelligence ,0101 mathematics ,business ,050205 econometrics ,Mathematics - Abstract
We take a wavelet-based approach to the analysis of point processes and the estimation of the first-order intensity under a continuous-time setting. A Haar wavelet multiresolution analysis of a point process is formulated, which motivates the definition of homogeneity at different scales of resolution, termed $J$-th level homogeneity. Further to this, the activity in a point process's first-order behavior at different scales of resolution is also defined and termed $L$-th level innovation. Likelihood ratio tests for both these properties are proposed with asymptotic distributions provided, even when only a single realization of the point process is observed. The test for $L$-th level innovation forms the basis for a collection of statistical strategies for thresholding coefficients in a wavelet-based estimator of the intensity function. These thresholding strategies outperform the existing local hard thresholding strategy on a range of simulation scenarios. The presented methodology is applied to NetFlow data to demonstrate its effectiveness at characterizing multiscale behavior on computer networks.
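The estimation pipeline (bin counts, Haar transform, threshold details, invert) is easy to sketch. The version below uses a generic universal threshold rather than the paper's likelihood-ratio-based strategies, and the event times are simulated.

import numpy as np

rng = np.random.default_rng(4)
events = np.sort(rng.uniform(0, 1, 400))        # stand-in point process on [0, 1)
J = 6
counts, _ = np.histogram(events, bins=2**J, range=(0.0, 1.0))
c = counts.astype(float)

details = []
for _ in range(J):                              # forward Haar transform
    even, odd = c[0::2], c[1::2]
    details.append((even - odd) / np.sqrt(2))
    c = (even + odd) / np.sqrt(2)

# Generic universal hard threshold on the detail coefficients
thresh = np.sqrt(2 * np.log(2**J)) * np.std(np.concatenate(details))
details = [np.where(np.abs(d) > thresh, d, 0.0) for d in details]

for d in reversed(details):                     # inverse Haar transform
    new = np.empty(2 * c.size)
    new[0::2] = (c + d) / np.sqrt(2)
    new[1::2] = (c - d) / np.sqrt(2)
    c = new
intensity = c * 2**J                            # counts per bin -> events per unit time
print(intensity[:8])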
- Published
- 2020
42. Relative semi‐ampleness in positive characteristic
- Author
-
Hiromu Tanaka and Paolo Cascini
- Subjects
Pure mathematics ,Statement (logic) ,General Mathematics ,Invertible sheaf ,Space (mathematics) ,01 natural sciences ,0101 Pure Mathematics ,14C20, 14G17 ,math.AG ,Mathematics - Algebraic Geometry ,Mathematics::Algebraic Geometry ,0103 physical sciences ,FOS: Mathematics ,14C20 (primary) ,0101 mathematics ,Algebraic Geometry (math.AG) ,Mathematics ,Science & Technology ,0104 Statistics ,010102 general mathematics ,Zero (complex analysis) ,PROJECTIVITY ,Physical Sciences ,010307 mathematical physics ,14G17 (secondary) - Abstract
Given an invertible sheaf on a fibre space between projective varieties of positive characteristic, we show that fibrewise semi-ampleness implies relative semi-ampleness. The same statement fails in characteristic zero.
- Published
- 2020
43. Advances in spatiotemporal models for non-communicable disease surveillance
- Author
-
Areti Boulieri, Frédéric B. Piel, Paul Elliott, Marta Blangiardo, Gavin Shaddick, Peter J. Diggle, Medical Research Council (MRC), Wellcome Trust, and Public Health England
- Subjects
Exploit ,Epidemiology ,Computer science ,Surveillance Methods ,Supplement Articles ,Bayesian hierarchical models ,01 natural sciences ,1117 Public Health and Health Services ,010104 statistics & probability ,03 medical and health sciences ,Spatio-Temporal Analysis ,0302 clinical medicine ,Humans ,Bayesian hierarchical modeling ,030212 general & internal medicine ,0101 mathematics ,Noncommunicable Diseases ,Disease surveillance ,Surveillance ,Warning system ,Model selection ,0104 Statistics ,Bayes Theorem ,General Medicine ,non-communicable diseases ,Data science ,Modifiable areal unit problem ,Conceptual framework ,Population Surveillance ,spatio-temporal modelling
Surveillance systems are commonly used to provide early warning detection or to assess an impact of an intervention/policy. Traditionally, the methodological and conceptual frameworks for surveillance have been designed for infectious diseases, but the rising burden of non-communicable diseases (NCDs) worldwide suggests a pressing need for surveillance strategies to detect unusual patterns in the data and to help unveil important risk factors in this setting. Surveillance methods need to be able to detect meaningful departures from expectation and exploit dependencies within such data to produce unbiased estimates of risk as well as future forecasts. This has led to the increasing development of a range of space-time methods specifically designed for NCD surveillance. We present an overview of recent advances in spatiotemporal disease surveillance for NCDs, using hierarchically specified models. This provides a coherent framework for modelling complex data structures, dealing with data sparsity, exploiting dependencies between data sources and propagating the inherent uncertainties present in both the data and the modelling process. We then focus on three commonly used models within the Bayesian Hierarchical Model (BHM) framework and, through a simulation study, we compare their performance. We also discuss some challenges faced by researchers when dealing with NCD surveillance, including how to account for false detection and the modifiable areal unit problem. Finally, we consider how to use and interpret the complex models, how model selection may vary depending on the intended user group and how best to communicate results to stakeholders and the general public.
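A minimal example of the Bayesian hierarchical models discussed here: area counts modelled as Poisson with exchangeable normal random effects on the log relative risk, fitted by a crude random-walk Metropolis sampler. There is no spatial (e.g. CAR) term and the random-effect scale is fixed, so this is a toy, not a surveillance-grade model.

import numpy as np

rng = np.random.default_rng(5)
n_areas = 30
E = rng.uniform(50, 200, n_areas)                  # expected counts per area
u_true = rng.normal(0.0, 0.3, n_areas)
y = rng.poisson(E * np.exp(0.1 + u_true))          # observed counts

def log_post(mu, u, tau=0.3):
    # Poisson log-likelihood up to constants, plus N(0, tau^2) prior on u
    lam = E * np.exp(mu + u)
    return np.sum(y * np.log(lam) - lam) - np.sum(u**2) / (2 * tau**2)

mu, u, keep = 0.0, np.zeros(n_areas), []
for it in range(20000):
    mu_p = mu + 0.02 * rng.normal()                # random-walk proposals
    u_p = u + 0.05 * rng.normal(size=n_areas)
    if np.log(rng.uniform()) < log_post(mu_p, u_p) - log_post(mu, u):
        mu, u = mu_p, u_p
    if it > 5000:
        keep.append(mu + u)                        # area log relative risks
print(np.exp(np.mean(keep, axis=0))[:5])           # posterior mean relative risks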
- Published
- 2020
44. High‐dimensional Statistics: A Non‐asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019, xvii + 552 pages, £57.99, hardback ISBN: 978‐1‐1084‐9802‐9
- Author
-
G. Alastair Young
- Subjects
Statistics and Probability ,Wainwright ,Science & Technology ,0199 Other Mathematical Sciences ,Statistics & Probability ,Philosophy ,Physical Sciences ,0104 Statistics ,High-dimensional statistics ,Statistics, Probability and Uncertainty ,Humanities ,Mathematics - Published
- 2020
45. Gender Differences in the Perception of Safety in Public Transport
- Author
-
Alexander Barron, Mark Trompet, Laila Ait Bihi Ouali, and Daniel J. Graham
- Subjects
VICTIMIZATION ,Statistics and Probability ,Economics and Econometrics ,Statistics & Probability ,media_common.quotation_subject ,Social Sciences ,Customer Satisfaction ,Behavioural Responses ,PERSONAL SECURITY ,1603 Demography ,Perception ,0502 economics and business ,1403 Econometrics ,WOMENS FEAR ,050207 economics ,media_common ,Perceived safety ,Metros ,FEELING UNSAFE ,050210 logistics & transportation ,Science & Technology ,Buses ,business.industry ,0104 Statistics ,05 social sciences ,Gender ,Public Transport ,Social Sciences, Mathematical Methods ,CRIME ,Test (assessment) ,Safety ,Feeling ,Public transport ,Physical Sciences ,Demographic economics ,Customer satisfaction ,Gender gap ,Statistics, Probability and Uncertainty ,Psychology ,business ,Inclusion (education) ,Mathematical Methods In Social Sciences ,Mathematics ,Social Sciences (miscellaneous) - Abstract
Concerns over women's safety on public transport systems are commonly reported in the media. We develop statistical models to test for gender differences in the perception of safety and satisfaction on urban metros and buses by using large-scale unique customer satisfaction data for 28 world cities over the period 2009–2018. Results indicate a significant gender gap in the perception of safety, with women being 10% more likely than men to feel unsafe in metros (6% for buses). This gender gap is larger for safety than for overall satisfaction (3% in metros and 2.5% in buses), which is consistent with safety being one dimension of overall satisfaction. Results are stable across specifications and robust to inclusion of city-level and time controls. We find heterogeneous responses by sociodemographic characteristics. Data indicate that 45% of women feel secure in trains and metro stations (and 55% in buses). The gender gap thus reflects broader differences in transport perception between men and women rather than an intrinsic fear of the network. Additional models test for the influence of metro characteristics on perceived safety levels and find that more acts of violence, larger carriages and emptier vehicles decrease women's feeling of safety.
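The kind of model behind such a gender gap estimate can be sketched as a logistic regression of a feels-unsafe indicator on a female dummy with city and year controls. Everything below is synthetic: the data, the 28-city layout and the coefficient values.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 20000
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "city": rng.integers(0, 28, n),
    "year": rng.integers(2009, 2019, n),
})
# Simulated response: women are more likely to report feeling unsafe
logit = -1.0 + 0.45 * df["female"] + 0.02 * (df["city"] % 5)
df["unsafe"] = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X = pd.get_dummies(df[["city", "year"]].astype("category"), drop_first=True)
X = sm.add_constant(pd.concat([df[["female"]], X], axis=1)).astype(float)
fit = sm.Logit(df["unsafe"].astype(float), X).fit(disp=0)
print(fit.params["female"])   # log-odds gap; exponentiate for an odds ratio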
- Published
- 2020
46. Propensity score-based methods for causal inference in observational studies with non-binary treatments
- Author
-
Shandong Zhao, Kosuke Imai, and David A. van Dyk
- Subjects
Statistics and Probability ,Models, Statistical ,Epidemiology ,Statistics & Probability ,0104 Statistics ,05 social sciences ,Causal effect ,MEDLINE ,Binary number ,01 natural sciences ,1117 Public Health and Health Services ,Causality ,010104 statistics & probability ,Health Information Management ,Causal inference ,0502 economics and business ,Propensity score matching ,Observational study ,050207 economics ,0101 mathematics ,Propensity Score ,Psychology ,Clinical psychology - Abstract
Propensity score methods are a part of the standard toolkit for applied researchers who wish to ascertain causal effects from observational data. While they were originally developed for binary treatments, several researchers have proposed generalizations of the propensity score methodology for non-binary treatment regimes. Such extensions have widened the applicability of propensity score methods and are indeed becoming increasingly popular themselves. In this article, we closely examine two methods that generalize propensity scores in this direction, namely, the propensity function (PF), and the generalized propensity score (GPS), along with two extensions of the GPS that aim to improve its robustness. We compare the assumptions, theoretical properties, and empirical performance of these methods. On a theoretical level, the GPS and its extensions are advantageous in that they are designed to estimate the full dose response function rather than the average treatment effect that is estimated with the PF. We compare GPS with a new PF method, both of which estimate the dose response function. We illustrate our findings and proposals through simulation studies, including one based on an empirical study about the effect of smoking on healthcare costs. While our proposed PF-based estimator performs well, we generally advise caution in that all available methods can be biased by model misspecification and extrapolation.
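A compact sketch of the generalized propensity score in the Hirano-Imbens style, on synthetic data: fit a normal treatment model, evaluate the GPS at the observed treatment, fit a flexible outcome model in (T, GPS), then average over the sample to trace a dose-response curve. This illustrates the GPS mechanics only, not the extensions compared in the article.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
t = 0.8 * x + rng.normal(size=n)                   # continuous treatment
y = 1.0 + 0.5 * t + 0.3 * t**2 + x + rng.normal(size=n)

# Stage 1: treatment model T | X, GPS = density of the observed treatment
A = np.column_stack([np.ones(n), x])
ab, *_ = np.linalg.lstsq(A, t, rcond=None)
sd = np.std(t - A @ ab)
gps = norm.pdf(t, loc=A @ ab, scale=sd)

# Stage 2: flexible outcome model in (T, GPS)
B = np.column_stack([np.ones(n), t, t**2, gps, gps**2, t * gps])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

# Stage 3: dose-response at level t0, averaging the GPS over the sample
for t0 in (-1.0, 0.0, 1.0):
    g0 = norm.pdf(t0, loc=A @ ab, scale=sd)
    B0 = np.column_stack([np.ones(n), np.full(n, t0), np.full(n, t0**2),
                          g0, g0**2, t0 * g0])
    print(t0, (B0 @ coef).mean())                  # estimated E[Y(t0)]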
- Published
- 2020
47. Understanding Relationships Between Chlamydial Infection, Symptoms, and Testing Behavior
- Author
-
Peter J White, Joanna Lewis, National Institute for Health Research, and Medical Research Council (MRC)
- Subjects
Male ,medicine.medical_specialty ,Adolescent ,Epidemiology ,Population ,Bayesian analysis ,01 natural sciences ,1117 Public Health and Health Services ,Young Adult ,010104 statistics & probability ,03 medical and health sciences ,Mathematical model ,0302 clinical medicine ,medicine ,Humans ,Sex organ ,030212 general & internal medicine ,Chlamydia ,0101 mathematics ,education ,Mass screening ,Reproductive health ,education.field_of_study ,business.industry ,0104 Statistics ,Sexual risk behavior ,Bayes Theorem ,Chlamydia Infections ,Partner notification ,medicine.disease ,United Kingdom ,Confidence interval ,3. Good health ,Infectious Diseases ,Female ,Statistical model ,Symptom Assessment ,business ,Demography
Background: Genital chlamydia is the most commonly diagnosed sexually transmitted infection worldwide and can have serious long-term sequelae. Numerous countries invest substantially in testing, but evidence for programs' effectiveness is inconclusive. It is important to understand the effects of testing programs in different groups of people. Methods: We analyzed data on sexual behavior and chlamydia tests from 16- to 24-year-olds in Britain's third National Survey of Sexual Attitudes and Lifestyles, considering test setting, reason, and result. We conducted descriptive analysis accounting for survey design and nonresponse, and Bayesian analysis using a mathematical model. Results: Most men testing due to symptoms tested in sexual health settings (63%; 95% confidence interval 43%–84%) but most women testing due to symptoms were tested by general practitioners (59%; 43%–76%). Within behavioral groups, positivity of chlamydia screens (tests not prompted by symptoms or partner notification) was similar to population prevalence. Screening rates were higher in women and in those reporting more partners: median (95% credible interval) rates per year in men were 0.30 (0.25–0.36) (0 new partners), 0.45 (0.37–0.54) (1 new partner), and 0.60 (0.50–0.73) (≥2 new partners). In women, they were 0.61 (0.53–0.69) (0 new partners), 0.89 (0.75–1.04) (1 new partner), and 1.2 (1.0–1.4) (≥2 new partners). Conclusions: Proportion of testing occurring in sexual health settings is not a proxy for proportion prompted by symptoms. Test positivity depends on a combination of force of infection and screening rate and does not simply reflect prevalence or behavioral risk. The analysis highlights the value of recording testing reason and behavioral characteristics to inform cost-effective control.
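The per-year screening rates reported above invite a small conjugate sketch: with y tests observed over E person-years and a Gamma(a0, b0) prior, the posterior for the rate is Gamma(a0 + y, b0 + E). The counts below are illustrative, not the survey's.

from scipy import stats

a0, b0 = 1.0, 1.0                  # weak Gamma prior on the rate
y_tests, pyears = 270, 450.0       # hypothetical tests and person-years at risk
post = stats.gamma(a=a0 + y_tests, scale=1.0 / (b0 + pyears))
print(post.mean(), post.interval(0.95))   # rate/year and 95% credible interval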
- Published
- 2020
48. Adding value to core outcome set development using multimethod systematic reviews
- Author
-
Ginny Brunton, Sandy Oliver, James Webbe, Chris Gale, and Medical Research Council
- Subjects
STRESS ,Quality Assurance, Health Care ,Psychological intervention ,Delphi method ,THERAPY ,01 natural sciences ,Outcome (game theory) ,010104 statistics & probability ,0302 clinical medicine ,Outcome Assessment, Health Care ,030212 general & internal medicine ,PUBLIC INVOLVEMENT ,Qualitative Research ,Randomized Controlled Trials as Topic ,Evidence-Based Medicine ,MOTHERS ,0104 Statistics ,CONTROLLED-TRIALS ,Multidisciplinary Sciences ,Treatment Outcome ,Systematic review ,Research Design ,Science & Technology - Other Topics ,Psychology ,Life Sciences & Biomedicine ,CLINICAL-TRIALS ,Endpoint Determination ,RESEARCH PRIORITIES ,1117 Public Health and Health Services ,Education ,03 medical and health sciences ,Meta-Analysis as Topic ,Added value ,Humans ,QUALITY ,0101 mathematics ,Set (psychology) ,Medical education ,Science & Technology ,EXTREMELY PRETERM INFANTS ,Infant, Newborn ,Infant ,FRAMEWORK ,Clinical trial ,Sample Size ,Intensive Care, Neonatal ,Mathematical & Computational Biology ,Neonatology ,Systematic Reviews as Topic ,Qualitative research - Abstract
Trials evaluating the same interventions rarely measure or report identical outcomes. This limits the possibility of aggregating effect sizes across studies to generate high-quality evidence through systematic reviews and meta-analyses. To address this problem, Core Outcome Sets (COS) establish agreed sets of outcomes to be used in all future trials. When developing COS, potential outcome domains are identified by systematically reviewing the outcomes of trials and, increasingly, through primary qualitative research exploring the experiences of key stakeholders, with relevant outcome domains subsequently determined through transdisciplinary consensus development. However, the primary qualitative component can be time consuming with unclear impact. We aimed to examine the potential added value of a qualitative systematic review alongside a quantitative systematic review of trial outcomes to inform COS development in neonatal care, using case analysis methods. We compared the methods and findings of a scoping review of neonatal trial outcomes and a scoping review of qualitative research on parents', patients' and professional caregivers' perspectives of neonatal care. Together, these identified a wider range and greater depth of health and social outcome domains, some unique to each review, which were incorporated into the subsequent Delphi process and informed the final set of core outcome domains. Qualitative scoping reviews of participant perspectives research, used in conjunction with quantitative scoping reviews of trials, could identify more outcome domains for consideration and could provide greater depth of understanding to inform stakeholder group discussion in COS development. This is an innovation in the application of research synthesis methods.
- Published
- 2020
49. Exploring the role of genetic confounding in the association between maternal and offspring body mass index: evidence from three birth cohorts
- Author
-
Sirkka Keinänen-Kiukaanniemi, Jian Yang, Matthias Wielscher, Marc J. Gunter, Sylvain Sebert, Marjo-Riitta Järvelin, Inga Prokopenko, Peter M. Visscher, Ville Karhunen, Janine F. Felix, David M. Evans, Tom Bond, Paul F. O'Reilly, Juha Auvinen, Alex Lewin, Debbie A Lawlor, Minna Männikkö, Epidemiology, Erasmus MC other, and Pediatrics
- Subjects
0301 basic medicine ,Adult ,Male ,Pediatric Obesity ,Epidemiology ,Restricted maximum likelihood ,Offspring ,Birth weight ,Mothers ,Single-nucleotide polymorphism ,Maternal ,Biology ,1117 Public Health and Health Services ,Body Mass Index ,BMI ,03 medical and health sciences ,0302 clinical medicine ,Pregnancy ,Genotype ,Birth Weight ,Humans ,030212 general & internal medicine ,Obesity ,genetic confounding ,Child ,2. Zero hunger ,offspring ,0104 Statistics ,Confounding ,General Medicine ,ALSPAC ,NFBCs ,Confidence interval ,030104 developmental biology ,Female ,Body mass index ,Demography - Abstract
Background: Maternal pre-pregnancy body mass index (BMI) is positively associated with offspring birth weight (BW) and BMI in childhood and adulthood. Each of these associations could be due to causal intrauterine effects, or confounding (genetic or environmental), or some combination of these. Here we estimate the extent to which the association between maternal BMI and offspring body size is explained by offspring genotype, as a first step towards establishing the importance of genetic confounding. Methods: We examined the associations of maternal pre-pregnancy BMI with offspring BW and BMI at 1, 5, 10 and 15 years, in three European birth cohorts (n ≤ 11 498). Bivariate genomic-relatedness-based restricted maximum likelihood, implemented in the GCTA software (GCTA-GREML), was used to estimate the extent to which phenotypic covariance was explained by offspring genotype as captured by common imputed single nucleotide polymorphisms (SNPs). We merged individual participant data from all cohorts, enabling calculation of pooled estimates. Results: Phenotypic covariance (equivalent here to Pearson's correlation coefficient) between maternal BMI and offspring phenotype was 0.15 [95% confidence interval (CI): 0.13, 0.17] for offspring BW, increasing to 0.29 (95% CI: 0.26, 0.31) for offspring 15-year BMI. Covariance explained by offspring genotype was negligible for BW [−0.04 (95% CI: −0.09, 0.01)], but increased to 0.12 (95% CI: 0.04, 0.21) at 15 years, which is equivalent to 43% (95% CI: 15%, 72%) of the phenotypic covariance. Sensitivity analyses using weight, BMI and ponderal index as the offspring phenotype at all ages showed similar results. Conclusions: Offspring genotype explains a substantial fraction of the covariance between maternal BMI and offspring adolescent BMI. This is consistent with a potentially important role for genetic confounding as a driver of the maternal BMI–offspring BMI association.
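A lighter-weight, method-of-moments cousin of the bivariate GREML used here is cross-trait Haseman-Elston regression: regress phenotype cross-products for pairs of individuals on their genomic relatedness, and the slope estimates the genetic covariance. The sketch below simulates everything (genotypes, shared effects, both phenotypes) and is only an illustration of the idea, not the study's analysis.

import numpy as np

rng = np.random.default_rng(8)
n, m = 1000, 500                                  # individuals, SNPs
G = rng.binomial(2, 0.5, (n, m)).astype(float)
Z = (G - G.mean(0)) / G.std(0)                    # standardized genotypes
A = Z @ Z.T / m                                   # genomic relatedness matrix

b = rng.normal(0, np.sqrt(0.3 / m), m)            # shared causal effects
y1 = Z @ b + rng.normal(0, np.sqrt(0.7), n)       # "maternal BMI" stand-in
y2 = Z @ b + rng.normal(0, np.sqrt(0.7), n)       # offspring BMI stand-in
y1 = (y1 - y1.mean()) / y1.std()
y2 = (y2 - y2.mean()) / y2.std()

iu = np.triu_indices(n, k=1)                      # off-diagonal pairs only
cp = np.outer(y1, y2)[iu]                         # cross-products y1_i * y2_j
slope = np.cov(A[iu], cp)[0, 1] / np.var(A[iu])   # genetic covariance estimate
print(slope)                                      # compare with the true 0.3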
- Published
- 2020
50. Some Perspectives on Inference in High Dimensions
- Author
-
Battey, H, Cox, DR, Engineering and Physical Sciences Research Council, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
Statistics and Probability ,Statistics & Probability ,General Mathematics ,0104 Statistics ,Statistics, Probability and Uncertainty - Abstract
With very large amounts of data, important aspects of statistical analysis may appear largely descriptive in that the role of probability sometimes seems limited or totally absent. The main emphasis of the present paper lies on contexts where formulation in terms of a probabilistic model is feasible and fruitful but to be at all realistic large numbers of unknown parameters need consideration. Then many of the standard approaches to statistical analysis, for instance direct application of the method of maximum likelihood, or the use of flat priors, often encounter difficulties. After a brief discussion of broad conceptual issues and the use of asymptotic analysis in statistical inference, we provide some new perspectives on aspects of high-dimensional statistical theory, emphasizing particularly a number of important open problems.
- Published
- 2022