20,453 results on '"Models, Statistical"'
Search Results
2. Covariate-adjusted generalized pairwise comparisons in small samples.
- Author
- Jaspers S, Verbeeck J, and Thas O
- Subjects
- Humans, Sample Size, Data Interpretation, Statistical, Bias, Models, Statistical, Computer Simulation
- Abstract
Semiparametric probabilistic index models allow for the comparison of two groups of observations, whilst adjusting for covariates, thereby fitting nicely within the framework of generalized pairwise comparisons (GPC). As with most regression approaches in this setting, the limited amount of data results in invalid inference as the asymptotic normality assumption is not met. In addition, separation issues might arise when considering small samples. In this article, we show that the parameters of the probabilistic index model can be estimated using generalized estimating equations, for which adjustments exist that lead to estimators of the sandwich variance-covariance matrix with improved finite sample properties and that can deal with bias due to separation. In this way, appropriate inference can be performed as is shown through extensive simulation studies. The known relationships between the probabilistic index and other GPC statistics also allow valid inference to be provided for, for example, the net treatment benefit or the success odds., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
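The closing sentence of this abstract relies on identities linking the probabilistic index (PI) to other GPC statistics. A minimal numpy sketch of those unadjusted identities on two raw samples with possible ties (not the covariate-adjusted model itself; the data and the wins-to-losses definition of the success odds are illustrative assumptions):

```python
import numpy as np

def gpc_statistics(treatment, control):
    """Pairwise win/loss/tie proportions and derived GPC statistics.

    Illustrates only the unadjusted identities (e.g., NTB = 2*PI - 1,
    which holds even with ties), not the probabilistic index model.
    """
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    diff = t[:, None] - c[None, :]          # all treatment-control pairs
    p_win = np.mean(diff > 0)               # P(T > C)
    p_loss = np.mean(diff < 0)              # P(T < C)
    p_tie = np.mean(diff == 0)              # P(T = C)
    pi = p_win + 0.5 * p_tie                # probabilistic index
    ntb = p_win - p_loss                    # net treatment benefit
    odds = p_win / p_loss                   # success odds (wins vs losses)
    return {"PI": pi, "NTB": ntb, "success_odds": odds}

rng = np.random.default_rng(1)
print(gpc_statistics(rng.normal(0.5, 1, 30), rng.normal(0.0, 1, 30)))
```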
3. BHAFT: Bayesian heredity-constrained accelerated failure time models for detecting gene-environment interactions in survival analysis.
- Author
- Sun N, Chu J, He Q, Wang Y, Han Q, Yi N, Zhang R, and Shen Y
- Subjects
- Humans, Survival Analysis, Models, Statistical, Prognosis, Adenocarcinoma of Lung genetics, Adenocarcinoma of Lung mortality, Algorithms, Bayes Theorem, Gene-Environment Interaction, Lung Neoplasms genetics, Lung Neoplasms mortality, Computer Simulation
- Abstract
In addition to considering the main effects, understanding gene-environment (G × E) interactions is imperative for determining the etiology of diseases and the factors that affect their prognosis. In the existing statistical framework for censored survival outcomes, there are several challenges in detecting G × E interactions, such as handling high-dimensional omics data, diverse environmental factors, and algorithmic complications in survival analysis. The effect heredity principle has widely been used in studies involving interaction identification because it incorporates the dependence of the main and interaction effects. However, Bayesian survival models that incorporate the assumption of this principle have not been developed. Therefore, we propose Bayesian heredity-constrained accelerated failure time (BHAFT) models for identifying main and interaction (M-I) effects with novel spike-and-slab or regularized horseshoe priors to incorporate the assumption of effect heredity principle. The R package rstan was used to fit the proposed models. Extensive simulations demonstrated that BHAFT models outperformed other existing models in terms of signal identification, coefficient estimation, and prognosis prediction. Biologically plausible G × E interactions associated with the prognosis of lung adenocarcinoma were identified using our proposed model. Notably, BHAFT models incorporating the effect heredity principle could identify both main and interaction effects, which are highly useful in exploring G × E interactions in high-dimensional survival analysis. The code and data used in our paper are available at https://github.com/SunNa-bayesian/BHAFT., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
4. Design considerations for Factorial Adaptive Multi-Arm Multi-Stage (FAST) clinical trials.
- Author
- Beall J, Elm J, Semler MW, Wang L, Rice T, Kamel H, Mack W, and Mistry AM
- Subjects
- Humans, Data Interpretation, Statistical, Time Factors, Treatment Outcome, Endpoint Determination, Sample Size, Models, Statistical, Computer Simulation, Clinical Trials as Topic methods, Research Design
- Abstract
Background: Multi-Arm, Multi-Stage (MAMS) clinical trial designs allow for multiple therapies to be compared across a spectrum of clinical trial phases. MAMS designs fall under several overarching design groups, including adaptive designs (AD) and multi-arm (MA) designs. Combining factorial and MAMS trial designs can provide increased efficiency relative to fixed, traditional designs. We explore design choices associated with Factorial Adaptive Multi-Arm Multi-Stage (FAST) designs, which represent this combination of factorial and MAMS designs., Methods: Simulation studies were conducted to assess the impact of the type of analyses, the timing of analyses, and the effect size observed across multiple outcomes on trial operating characteristics for a FAST design. Given the multiple outcome types assessed within the hypothetical trial, the primary analysis approach for each assessment varied depending on data type., Results: The simulation studies demonstrate that the proposed class of FAST trial designs can offer a framework to potentially provide improvements relative to other trial designs, such as a MAMS or factorial trial. Further, we note that design implementation decisions, such as the timing and type of analyses conducted throughout the trial, can have a great impact on trial operating characteristics., Conclusions: Motivated by a trial currently under design, our work shows that the FAST category of trial can potentially offer benefits similar to both MAMS and factorial designs; however, the chosen design aspects which can be included in a FAST trial need to be thoroughly explored during the planning phase., (© 2024. The Author(s).)
- Published
- 2024
5. Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.
- Author
- Zhao K, Oualkacha K, Zeng Y, Shen C, Klein K, Lakhal-Chaieb L, Labbe A, Pastinen T, Hudson M, Colmegna I, Bernatsky S, and Greenwood CMT
- Subjects
- Humans, Multivariate Analysis, Arthritis, Rheumatoid genetics, Likelihood Functions, Sulfites chemistry, Sequence Analysis, DNA methods, DNA Methylation, Algorithms, Computer Simulation, Models, Statistical
- Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS.", (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
6. A multivariate to multivariate approach for voxel-wise genome-wide association analysis.
- Author
- Wu Q, Zhang Y, Huang X, Ma T, Hong LE, Kochunov P, and Chen S
- Subjects
- Humans, Multivariate Analysis, White Matter diagnostic imaging, Connectome methods, Models, Statistical, Brain diagnostic imaging, Corpus Callosum diagnostic imaging, Genome-Wide Association Study methods, Polymorphism, Single Nucleotide, Computer Simulation, Algorithms
- Abstract
The joint analysis of imaging-genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel-wise genome-wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)-voxel pairs. We attempt to identify underlying organized association patterns of SNP-voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi-clique graph structure (i.e., a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP-voxel bi-cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel-level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures on splenium and genu of the corpus callosum., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
7. Propensity score weighted multi-source exchangeability models for incorporating external control data in randomized clinical trials.
- Author
- Wei W, Zhang Y, and Roychoudhury S
- Subjects
- Humans, Data Interpretation, Statistical, Bias, Propensity Score, Randomized Controlled Trials as Topic methods, Models, Statistical, Computer Simulation
- Abstract
Among clinical trialists, there has been a growing interest in using external data to improve decision-making and accelerate drug development in randomized clinical trials (RCTs). Here we propose a novel approach that combines the propensity score weighting (PW) and the multi-source exchangeability modelling (MEM) approaches to augment the control arm of an RCT in the rare disease setting. First, propensity score weighting is used to construct weighted external controls that have similar observed pre-treatment characteristics as the current trial population. Next, the MEM approach evaluates the similarity in outcome distributions between the weighted external controls and the concurrent control arm. The amount of external data we borrow is determined by the similarities in pre-treatment characteristics and outcome distributions. The proposed approach can be applied to binary, continuous and count data. We evaluate the performance of the proposed PW-MEM method and several competing approaches based on simulation and re-sampling studies. Our results show that the PW-MEM approach improves the precision of treatment effect estimates while reducing the biases associated with borrowing data from external sources., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
8. Renewable risk assessment of heterogeneous streaming time-to-event cohorts.
- Author
- Ding J, Li J, and Wang X
- Subjects
- Humans, Risk Assessment methods, Cohort Studies, Models, Statistical, Time Factors, Lung Neoplasms, Computer Simulation
- Abstract
The analysis of streaming time-to-event cohorts has garnered significant research attention. Most existing methods require observed cohorts from a study sequence to be independently and identically sampled from a common model. This assumption may be easily violated in practice. Our methodology operates within the framework of online data updating, where risk estimates for each cohort of interest are continuously refreshed using the latest observations and historical summary statistics. At each streaming stage, we introduce parameters to quantify the potential discrepancy between batch-specific effects from adjacent cohorts. We then employ penalized estimation techniques to identify nonzero discrepancy parameters, allowing us to adaptively adjust risk estimates based on current data and historical trends. We illustrate our proposed method through extensive empirical simulations and a lung cancer data analysis., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
9. Model driven method for exploring individual and confounding effects in spontaneous adverse event reporting databases.
- Author
- Lv B, Li Y, Shi A, and Pan J
- Subjects
- Humans, Male, Female, United States, United States Food and Drug Administration, Sex Factors, Odds Ratio, Bias, Drug Interactions, Confounding Factors, Epidemiologic, Models, Statistical, Adverse Drug Reaction Reporting Systems statistics & numerical data, Databases, Factual, Drug-Related Side Effects and Adverse Reactions epidemiology, Data Mining methods, Computer Simulation, Product Surveillance, Postmarketing methods, Product Surveillance, Postmarketing statistics & numerical data
- Abstract
Background: Spontaneous Adverse Event Reporting (SAER) databases play a crucial role in post-marketing drug surveillance. However, traditional model-free disproportionality analysis has been challenged for its insufficiency in investigating subgroups and confounders. These issues result in substantial imprecision and bias in data mining of SAER databases., Methods: The Model-Driven Reporting Odds Ratio (MD-ROR) was proposed to bridge the gap between SAER databases and explainable models for exploring individual and confounding effects. MD-ROR is grounded in a well-designed model, rather than a 2 × 2 cross table, for estimating AE-drug signals. Consequently, individual and confounding effects can be parameterized based on these models. We employed simulation data and the FDA Adverse Event Reporting System (FAERS) database., Result: The simulated data indicated that the subgroup effects estimated by MD-ROR were unbiased and efficient. Moreover, the adjusted MD-ROR demonstrated greater robustness against confounding biases than the crude ROR. Applying our method to the FAERS database suggested higher occurrences of drug interactions and cardiac adverse events induced by Midazolam in females compared to males., Conclusion: The study underscored that MD-ROR holds promise as a method for investigating individual and confounding effects in SAER databases.
- Published
- 2024
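A hedged sketch of the contrast this abstract draws between a crude 2 × 2 reporting odds ratio and a model-based, covariate-adjusted one. The simulated confounding structure and the plain logistic model below are illustrative assumptions, not the MD-ROR specification itself:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20000
sex = rng.binomial(1, 0.5, n)                # hypothetical confounder
drug = rng.binomial(1, 0.05 + 0.10 * sex)    # exposure depends on sex
logit = -3.0 + 0.8 * drug + 0.7 * sex        # AE risk depends on both
ae = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Crude ROR from the 2x2 table (model-free disproportionality).
a = np.sum((drug == 1) & (ae == 1)); b = np.sum((drug == 1) & (ae == 0))
c = np.sum((drug == 0) & (ae == 1)); d = np.sum((drug == 0) & (ae == 0))
print("crude ROR:", (a * d) / (b * c))

# Model-based ROR: exp(drug coefficient) in a logistic model adjusting
# for sex, in the spirit of (but not identical to) MD-ROR.
X = sm.add_constant(np.column_stack([drug, sex]))
fit = sm.Logit(ae, X).fit(disp=0)
print("adjusted ROR:", np.exp(fit.params[1]))   # close to exp(0.8)
```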
10. Simultaneous multi-transient linear-combination modeling of MRS data improves uncertainty estimation.
- Author
- Zöllner HJ, Davies-Jenkins C, Simicic D, Tal A, Sulam J, and Oeltzschner G
- Subjects
- Humans, Reproducibility of Results, Linear Models, Sensitivity and Specificity, Signal-To-Noise Ratio, gamma-Aminobutyric Acid metabolism, Models, Statistical, Magnetic Resonance Spectroscopy methods, Computer Simulation, Monte Carlo Method, Algorithms
- Abstract
Purpose: The interest in applying and modeling dynamic MRS has recently grown. Two-dimensional modeling yields advantages for the precision of metabolite estimation in interrelated MRS data. However, it is unknown whether including all transients simultaneously in a 2D model without averaging (presuming a stable signal) performs similarly to one-dimensional (1D) modeling of the averaged spectrum. Therefore, we systematically investigated the accuracy, precision, and uncertainty estimation of both described model approaches., Methods: Monte Carlo simulations of synthetic MRS data were used to compare the accuracy and uncertainty estimation of simultaneous 2D multitransient linear-combination modeling (LCM) with 1D-LCM of the average. A total of 2,500 data sets per condition with different noise representations of a 64-transient MRS experiment at six signal-to-noise levels for two separate spin systems (scyllo-inositol and gamma-aminobutyric acid) were analyzed. Additional data sets with different levels of noise correlation were also analyzed. Modeling accuracy was assessed by determining the relative bias of the estimated amplitudes against the ground truth, and modeling precision was determined by SDs and Cramér-Rao lower bounds (CRLBs)., Results: Amplitude estimates for 1D- and 2D-LCM agreed well and showed a similar level of bias compared with the ground truth. Estimated CRLBs agreed well between both models and with ground-truth CRLBs. For correlated noise, the estimated CRLBs increased with the correlation strength for the 1D-LCM but remained stable for the 2D-LCM., Conclusion: Our results indicate that the model performance of 2D multitransient LCM is similar to averaged 1D-LCM. This validation on a simplified scenario serves as a necessary basis for further applications of 2D modeling., (© 2024 International Society for Magnetic Resonance in Medicine.)
- Published
- 2024
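For the linear, white-noise special case, the equivalence of averaged 1D fitting and simultaneous 2D fitting can be seen directly. The toy Gaussian "basis spectra" below are assumptions standing in for metabolite basis functions; real linear-combination modeling also involves nonlinear lineshape and baseline parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
n_points, n_transients = 256, 64
x = np.linspace(0, 1, n_points)
# Two toy basis spectra standing in for metabolite basis functions.
B = np.column_stack([np.exp(-((x - 0.3) / 0.02) ** 2),
                     np.exp(-((x - 0.6) / 0.02) ** 2)])
amps_true = np.array([2.0, 0.5])
transients = B @ amps_true + rng.normal(0, 0.3, (n_transients, n_points))

# 1D approach: average the transients, then fit the amplitudes.
amps_1d, *_ = np.linalg.lstsq(B, transients.mean(axis=0), rcond=None)

# 2D approach: fit all transients simultaneously with shared amplitudes
# (stack the design matrix once per transient).
B_stacked = np.tile(B, (n_transients, 1))
amps_2d, *_ = np.linalg.lstsq(B_stacked, transients.ravel(), rcond=None)

print(amps_1d, amps_2d)   # identical for white noise in this linear model
```

The two estimates coincide because the stacked normal equations reduce to those of the averaged fit; the paper's question is what happens beyond this idealized case, for example under correlated noise.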
11. deepAFT: A nonlinear accelerated failure time model with artificial neural network.
- Author
- Norman PA, Li W, Jiang W, and Chen BE
- Subjects
- Humans, Survival Analysis, Deep Learning, Models, Statistical, Neural Networks, Computer, Proportional Hazards Models, Algorithms, Computer Simulation, Nonlinear Dynamics
- Abstract
The Cox regression model or accelerated failure time regression models are often used for describing the relationship between survival outcomes and potential explanatory variables. These models assume the studied covariates are connected to the survival time or its distribution or their transformations through a function of a linear regression form. In this article, we propose nonparametric, nonlinear algorithms (deepAFT methods) based on deep artificial neural networks to model survival outcome data in the broad distribution family of accelerated failure time models. The proposed methods predict survival outcomes directly and tackle the problem of censoring via an imputation algorithm as well as re-weighting and transformation techniques based on the inverse probabilities of censoring. Through extensive simulation studies, we confirm that the proposed deepAFT methods achieve accurate predictions. They outperform the existing regression models in prediction accuracy, while being flexible and robust in modeling covariate effects of various nonlinear forms. Their prediction performance is comparable to other established deep learning methods such as deepSurv and random survival forest methods. Even though the direct output is the expected survival time, the proposed AFT methods also provide predictions for distributional functions such as the cumulative hazard and survival functions without additional learning efforts. For situations where the popular Cox regression model may not be appropriate, the deepAFT methods provide useful and effective alternatives, as shown in simulations, and demonstrated in applications to a lymphoma clinical trial study., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
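A sketch of the inverse-probability-of-censoring weights mentioned in this abstract, using a hand-rolled Kaplan-Meier estimate of the censoring distribution. The exponential data and the evaluation of G at the observed times (rather than just before them) are simplifying assumptions:

```python
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier survival estimate over the ordered observation times."""
    order = np.argsort(times)
    t, e = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(1.0 - e / at_risk)   # product-limit estimator
    return t, surv

rng = np.random.default_rng(3)
n = 500
event_time = rng.exponential(2.0, n)
censor_time = rng.exponential(3.0, n)
obs = np.minimum(event_time, censor_time)
delta = (event_time <= censor_time).astype(float)

# Censoring distribution G(t): treat censorings as the "events".
t_sorted, G = km_survival(obs, 1.0 - delta)
G_at = np.interp(obs, t_sorted, G)
# IPCW weight for each uncensored subject: delta_i / G(T_i).
weights = np.where(delta == 1, 1.0 / np.clip(G_at, 1e-8, None), 0.0)
print(weights[:5])
```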
12. Nonparametric empirical Bayes biomarker imputation and estimation.
- Author
- Barbehenn A and Zhao SD
- Subjects
- Humans, Models, Statistical, Statistics, Nonparametric, Data Interpretation, Statistical, Bayes Theorem, Biomarkers analysis, Computer Simulation
- Abstract
Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that downstream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes g-modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and with real data, providing useful biomarker measurement estimates for downstream analysis., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
13. A note on the Wilcoxon-Mann-Whitney test and tied observations.
- Author
- Neuhäuser M and Ruxton GD
- Subjects
- Statistics, Nonparametric, Humans, Data Interpretation, Statistical, Models, Statistical, Computer Simulation
- Abstract
Recently, it was recommended to omit tied observations before applying the two-sample Wilcoxon-Mann-Whitney test (McGee et al., 2018). Using a simulation study, we argue for exact tests using all the data (including tied values) as a preferable approach. Exact tests with tied observations included guarantee the type I error rate, with a better exploitation of the significance level and a larger power than the corresponding tests after the omission of tied observations. The omission of ties can produce a considerable change in the shape of the sample, and so can violate underlying test assumptions. Thus, on both theoretical and practical grounds, the recommendation to omit tied values cannot be supported, relative to analysing the whole data set in the same way whether or not ties occur, preferably with an exact permutation test., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Neuhäuser, Ruxton. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
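A sketch of the recommended analysis: a permutation test of the rank-sum statistic with tied observations retained via midranks. The Monte Carlo approximation and the two-sided rule based on distance from the permutation mean are assumptions; the paper advocates exact tests where feasible:

```python
import numpy as np
from scipy.stats import rankdata

def perm_wmw_pvalue(x, y, n_perm=10000, rng=None):
    """Two-sided Monte Carlo permutation test of the rank-sum statistic,
    using midranks so that tied observations are retained."""
    rng = rng or np.random.default_rng(0)
    ranks = rankdata(np.concatenate([x, y]))   # midranks handle ties
    n_x = len(x)
    obs = ranks[:n_x].sum()
    null = np.array([rng.permutation(ranks)[:n_x].sum()
                     for _ in range(n_perm)])
    return np.mean(np.abs(null - null.mean()) >= abs(obs - null.mean()))

rng = np.random.default_rng(42)
x = rng.integers(0, 5, 20).astype(float)   # coarse scale -> many ties
y = rng.integers(1, 6, 20).astype(float)
print("p, ties retained:", perm_wmw_pvalue(x, y))
```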
14. A rigorous and versatile statistical test for correlations between stationary time series.
- Author
- Yuan AE and Shou W
- Subjects
- Time Factors, Animals, Models, Statistical, Statistics, Nonparametric, Data Interpretation, Statistical, Computer Simulation
- Abstract
In disciplines from biology to climate science, a routine task is to compute a correlation between a pair of time series and determine whether the correlation is statistically significant (i.e., unlikely under the null hypothesis that the time series are independent). This problem is challenging because time series typically exhibit autocorrelation and thus cannot be properly analyzed with the standard iid-oriented statistical tests. Although there are well-known parametric tests for time series, these are designed for linear correlation statistics and thus not suitable for the increasingly popular nonlinear correlation statistics. There are also nonparametric tests that can be used with any correlation statistic, but for these, the conditions that guarantee correct false positive rates are either restrictive or unclear. Here, we describe the truncated time-shift (TTS) test, a nonparametric procedure to test for dependence between 2 time series. We prove that this test correctly controls the false positive rate as long as one of the time series is stationary, a minimally restrictive requirement among current tests. The TTS test is versatile because it can be used with any correlation statistic. Using synthetic data, we demonstrate that this test performs correctly even while other tests suffer high false positive rates. In simulation examples, simple guidelines for parameter choices allow high statistical power to be achieved with sufficient data. We apply the test to datasets from climatology, animal behavior, and microbiome science, verifying previously discovered dependence relationships and detecting additional relationships., Competing Interests: WS is a member of the PLOS Biology Editorial Board., (Copyright: © 2024 Yuan, Shou. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
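A bare-bones illustration of the general time-shift idea: compare the observed correlation with correlations at large relative shifts of one series. The circular shifting and the minimum-shift parameter below are simplifying assumptions and do not reproduce the truncated time-shift (TTS) construction or its false-positive guarantees:

```python
import numpy as np

def time_shift_pvalue(x, y, min_shift,
                      corr=lambda a, b: abs(np.corrcoef(a, b)[0, 1])):
    """Compare the observed correlation with correlations obtained after
    shifting y by at least min_shift steps (circularly, for simplicity)."""
    n = len(x)
    obs = corr(x, y)
    null = np.array([corr(x, np.roll(y, s))
                     for s in range(min_shift, n - min_shift)])
    return (1 + np.sum(null >= obs)) / (1 + len(null))

rng = np.random.default_rng(0)
n = 400
driver = np.cumsum(rng.normal(size=n)) * 0.1   # autocorrelated signal
x = driver + rng.normal(0, 0.5, n)
y = driver + rng.normal(0, 0.5, n)             # dependent on x via driver
z = np.cumsum(rng.normal(size=n)) * 0.1        # independent of x
print("dependent pair:  ", time_shift_pvalue(x, y, min_shift=50))
print("independent pair:", time_shift_pvalue(x, z, min_shift=50))
```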
15. Two-stage randomized clinical trials with a right-censored endpoint: Comparison of frequentist and Bayesian adaptive designs.
- Author
- Boumendil L, Chevret S, Lévy V, and Biard L
- Subjects
- Humans, Sample Size, Research Design, Endpoint Determination, Leukemia, Lymphocytic, Chronic, B-Cell drug therapy, Models, Statistical, Bayes Theorem, Randomized Controlled Trials as Topic statistics & numerical data, Computer Simulation
- Abstract
Adaptive randomized clinical trials are of major interest when dealing with a time-to-event outcome in a prolonged observation window. No consensus exists either to define stopping boundaries or to combine p values or test statistics in the terminal analysis in the case of a frequentist design and sample size adaptation. In a one-sided setting, we compared three frequentist approaches using stopping boundaries relying on α-spending functions and a Bayesian monitoring setting with boundaries based on the posterior distribution of the log-hazard ratio. All designs comprised a single interim analysis with an efficacy stopping rule and the possibility of sample size adaptation at this interim step. Three frequentist approaches were defined based on the terminal analysis: combination of stagewise statistics (Wassmer) or of p values (Desseaux), or on patientwise splitting (Jörgens), and we compared the results with those of the Bayesian monitoring approach (Freedman). These different approaches were evaluated in a simulation study and then illustrated on a real dataset from a randomized clinical trial conducted in elderly patients with chronic lymphocytic leukemia. All approaches controlled the type I error rate, except for the Bayesian monitoring approach, and yielded satisfactory power. It appears that the frequentist approaches are the best in underpowered trials. The power of all the approaches was affected by the violation of the proportional hazards (PH) assumption. For adaptive designs with a survival endpoint and a one-sided alternative hypothesis, the Wassmer and Jörgens approaches after sample size adaptation should be preferred, unless violation of PH is suspected., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
16. Non-parametric inference on calibration of predicted risks.
- Author
- Sadatsafavi M and Petkau J
- Subjects
- Humans, Risk Assessment methods, Myocardial Infarction mortality, Statistics, Nonparametric, Calibration, Probability, Models, Statistical, Computer Simulation
- Abstract
Moderate calibration, the property that the expected event probability among observations with predicted probability z equals z, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of risk prediction models are mostly based on smoothing or grouping the data. As well, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently developed, and propose novel, methods for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge which enables joint inference on mean and moderate calibration, leading to a unified "bridge" test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study we consider a prediction model for short-term mortality after a heart attack, where we provide suggestions on graphical presentation and the interpretation of results. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
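A sketch of the building block this abstract describes, standardized partial sums of prediction errors ordered by predicted risk; the full bridge test, its joint inference on the mean, and its null distribution are not reproduced here:

```python
import numpy as np

def calibration_cusum(y, p):
    """Standardized cumulative sum of prediction errors (y - p), with
    observations ordered by predicted risk. Under moderate calibration
    this path behaves asymptotically like a Brownian-motion functional."""
    order = np.argsort(p)
    resid = y[order] - p[order]
    scale = np.sqrt(np.sum(p * (1 - p)))   # total Bernoulli variance
    return np.cumsum(resid) / scale

rng = np.random.default_rng(5)
n = 2000
p = rng.uniform(0.05, 0.95, n)
y_cal = rng.binomial(1, p)                    # well-calibrated outcomes
y_mis = rng.binomial(1, np.clip(p ** 1.5, 0, 1))   # miscalibrated outcomes
print("max |path|, calibrated:  ",
      np.max(np.abs(calibration_cusum(y_cal, p))))
print("max |path|, miscalibrated:",
      np.max(np.abs(calibration_cusum(y_mis, p))))
```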
17. Conditional score approaches to errors-in-variables competing risks data in discrete time.
- Author
- Wen CC and Chen YH
- Subjects
- Humans, Survival Analysis, Algorithms, Models, Statistical, Regression Analysis, Risk Assessment methods, Scleroderma, Systemic, Computer Simulation, Proportional Hazards Models
- Abstract
Analysis of competing risks data has been an important topic in survival analysis due to the need to account for the dependence among the competing events. Also, event times are often recorded on discrete time scales, rendering models tailored to this discrete-time nature useful in the practice of survival analysis. In this work, we focus on regression analysis with discrete-time competing risks data, and consider the errors-in-variables issue where the covariates are prone to measurement errors. Viewing the true covariate value as a parameter, we develop the conditional score methods for various discrete-time competing risks models, including the cause-specific and subdistribution hazards models that have been popular in competing risks data analysis. The proposed estimators can be implemented by efficient computation algorithms, and the associated large sample theories can be simply obtained. Simulation results show satisfactory finite sample performances, and the application with the competing risks data from the scleroderma lung study reveals the utility of the proposed methods., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
18. Categorical linkage-data analysis.
- Author
- Zhang LC and Tuoto T
- Subjects
- Humans, Data Interpretation, Statistical, Probability, Medical Record Linkage methods, Models, Statistical, Computer Simulation
- Abstract
Analysis of integrated data often requires record linkage in order to join together the data residing in separate sources. In case linkage errors cannot be avoided, due to the lack of a unique identity key that can be used to link the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match-key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file and our approach allows the probabilities of correct linkage to vary across the records without requiring that one is able to estimate this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
19. Familywise error for multiple time-to-event endpoints in a group sequential design.
- Author
- Thomsen HF, Lausvig NL, Pipper CB, Andersen S, Damgaard LH, Emerson SS, and Ravn H
- Subjects
- Humans, Research Design, Models, Statistical, Proportional Hazards Models, Data Interpretation, Statistical, Computer Simulation, Endpoint Determination methods
- Abstract
We investigate the familywise error rate (FWER) for time-to-event endpoints evaluated using a group sequential design with a hierarchical testing procedure for secondary endpoints. We show that, in this setup, the correlation between the log-rank test statistics at interim and at end of study is not congruent with the canonical correlation derived for normally distributed endpoints. We show, both theoretically and by simulation, that the correlation also depends on the level of censoring, the hazard rates of the endpoints, and the hazard ratio. To optimize operating characteristics in this complex scenario, we propose a simulation-based method to assess the FWER which, better than the alpha-spending approach, can inform the choice of critical values for testing secondary endpoints., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
20. Model-based bioequivalence approach for sparse pharmacokinetic bioequivalence studies: Model selection or model averaging?
- Author
- Philipp M, Tessier A, Donnelly M, Fang L, Feng K, Zhao L, Grosser S, Sun G, Sun W, Mentré F, and Bertrand J
- Subjects
- Humans, Pharmacokinetics, Therapeutic Equivalency, Computer Simulation, Cross-Over Studies, Models, Statistical
- Abstract
Conventional pharmacokinetic (PK) bioequivalence (BE) studies aim to compare the rate and extent of drug absorption from a test (T) and reference (R) product using non-compartmental analysis (NCA) and the two one-sided test (TOST). Recently published regulatory guidance recommends alternative model-based (MB) approaches for BE assessment when NCA is challenging, as for long-acting injectables and products which require sparse PK sampling. However, our previous research on MB-TOST approaches showed that model misspecification can lead to inflated type I error. The objective of this research was to compare the performance of model selection (MS) on R product arm data and model averaging (MA) from a pool of candidate structural PK models in MBBE studies with sparse sampling. Our simulation study was inspired by a real case BE study using a two-way crossover design. PK data were simulated using three structural models under the null hypothesis and one model under the alternative hypothesis. MB-TOST was applied either using each of the five candidate models or following MS and MA with or without the simulated model in the pool. Assuming T and R have the same PK model, our simulation shows that following MS and MA, MB-TOST controls type I error rates at or below 0.05 and attains similar or even higher power than when using the simulated model. Thus, we propose to use MS prior to MB-TOST for BE studies with sparse PK sampling and to consider MA when candidate models have similar Akaike information criterion., (© 2024 Servier and The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.)
- Published
- 2024
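For reference, the core TOST decision rule that MB-TOST generalizes, sketched for a parallel-group comparison on the log scale; a crossover analysis would additionally model subject and period effects, and the data here are hypothetical:

```python
import numpy as np
from scipy import stats

def tost_be(log_auc_test, log_auc_ref, alpha=0.05, limits=(0.8, 1.25)):
    """Two one-sided tests on the log scale: conclude bioequivalence if
    the 90% CI for the geometric mean ratio lies within the BE limits."""
    diff = log_auc_test.mean() - log_auc_ref.mean()
    se = np.sqrt(log_auc_test.var(ddof=1) / len(log_auc_test)
                 + log_auc_ref.var(ddof=1) / len(log_auc_ref))
    df = len(log_auc_test) + len(log_auc_ref) - 2
    t_crit = stats.t.ppf(1 - alpha, df)
    lo, hi = np.exp(diff - t_crit * se), np.exp(diff + t_crit * se)
    return (lo, hi), limits[0] <= lo and hi <= limits[1]

rng = np.random.default_rng(11)
test = rng.normal(4.00, 0.30, 24)   # hypothetical log-AUC, test product
ref = rng.normal(4.02, 0.30, 24)    # hypothetical log-AUC, reference
ci, be = tost_be(test, ref)
print("90% CI for GMR:", ci, "bioequivalent:", be)
```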
21. A Bayesian method to detect drug-drug interaction using external information for spontaneous reporting system.
- Author
- Tada K, Maruo K, and Gosho M
- Subjects
- Humans, Drug-Related Side Effects and Adverse Reactions, Databases, Factual, Models, Statistical, United States, Bayes Theorem, Drug Interactions, Adverse Drug Reaction Reporting Systems statistics & numerical data, Computer Simulation
- Abstract
Because safety assessments from pre-marketing clinical trials are insufficient, further assessment of drugs is required after marketing. In addition to adverse drug reactions (ADRs) induced by one drug, drug-drug interaction (DDI)-induced ADRs should also be investigated. The spontaneous reporting system (SRS) is a powerful tool for evaluating the safety of drugs continually. In this study, we propose a novel Bayesian method for detecting potential DDIs in a database collected by the SRS. By applying a power prior, the proposed method can borrow information from similar drugs for a drug assessed for DDI, to increase sensitivity of detection. The proposed method can also adjust the amount of the information borrowed by tuning the parameters in the power prior. In the simulation study, we demonstrate the aforementioned increase in sensitivity. Depending on the scenario, the sensitivity of the proposed method increases by up to approximately 20 percentage points over an existing method. We also indicate the possibility of early detection of potential DDIs by the proposed method through analysis of the database shared by the Food and Drug Administration. In conclusion, the proposed method has a higher sensitivity and a novel criterion to detect potential DDIs early, provided similar drugs have similar observed-expected ratios to the drug under assessment., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
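The power-prior mechanism can be illustrated with a conjugate toy model in which a0 tunes how much external information is borrowed. The gamma-Poisson likelihood below is an assumption for illustration, not the authors' SRS model:

```python
import numpy as np

def power_prior_gamma_poisson(y_cur, e_cur, y_ext, e_ext, a0,
                              a=0.5, b=0.5):
    """Posterior of a reporting rate under a power prior.

    With a Gamma(a, b) initial prior and Poisson counts y with exposure
    e, raising the external likelihood to the power a0 stays conjugate:
    posterior = Gamma(a + y_cur + a0*y_ext, b + e_cur + a0*e_ext).
    """
    shape = a + y_cur + a0 * y_ext
    rate = b + e_cur + a0 * e_ext
    return shape, rate

# Sparse current data, richer data on a similar drug; a0 controls borrowing.
for a0 in (0.0, 0.3, 1.0):
    shape, rate = power_prior_gamma_poisson(y_cur=3, e_cur=2.0,
                                            y_ext=40, e_ext=20.0, a0=a0)
    print(f"a0={a0}: posterior mean {shape / rate:.2f}")
```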
22. A simple and effective method for simulating nested exchangeable correlated binary data for longitudinal cluster randomised trials.
- Author
- Bowden RA, Kasza J, and Forbes AB
- Subjects
- Humans, Longitudinal Studies, Cluster Analysis, Research Design statistics & numerical data, Models, Statistical, Data Interpretation, Statistical, Algorithms, Randomized Controlled Trials as Topic methods, Randomized Controlled Trials as Topic statistics & numerical data, Computer Simulation, Cross-Over Studies
- Abstract
Background: Simulation is an important tool for assessing the performance of statistical methods for the analysis of data and for the planning of studies. While methods are available for the simulation of correlated binary random variables, all have significant practical limitations for simulating outcomes from longitudinal cluster randomised trial designs, such as the cluster randomised crossover and the stepped wedge trial designs. For these trial designs, as the number of observations in each cluster increases, these methods either become computationally infeasible or their range of allowable correlations rapidly shrinks to zero., Methods: In this paper we present a simple method for simulating binary random variables with a specified vector of prevalences and correlation matrix. This method allows for the outcome prevalence to change due to treatment or over time, and for a 'nested exchangeable' correlation structure, in which observations in the same cluster are more highly correlated if they are measured in the same time period than in different time periods, and where different individuals are measured in each time period. This means that our method is also applicable to more general hierarchical clustered data contexts, such as students within classrooms within schools. The method is demonstrated by simulating 1000 datasets with parameters matching those derived from data from a cluster randomised crossover trial assessing two variants of stress ulcer prophylaxis., Results: Our method is orders of magnitude faster than the best-known general simulation method while also allowing a much wider range of correlations than alternative methods. An implementation of our method is available in an R package NestBin., Conclusions: This simulation method is the first to allow for practical and efficient simulation of large datasets of binary outcomes with the commonly used nested exchangeable correlation structure. This will allow for much more effective testing of designs and inference methods for longitudinal cluster randomised trials with binary outcomes., (© 2024. The Author(s).)
- Published
- 2024
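One common way to obtain a nested exchangeable structure is a latent-variable construction with cluster and cluster-period random effects, sketched below. This is an assumed illustration on the latent probit scale, not the paper's faster, binary-scale method implemented in NestBin:

```python
import numpy as np
from scipy.stats import norm

def simulate_nested_binary(n_clusters, n_periods, m_per_period,
                           prevalence, var_cluster, var_cp, rng):
    """Binary outcomes with a nested exchangeable structure on the latent
    probit scale: observations share a cluster effect, plus an extra
    cluster-period effect if measured in the same period."""
    total_var = 1.0 + var_cluster + var_cp
    intercept = norm.ppf(prevalence) * np.sqrt(total_var)
    u = rng.normal(0, np.sqrt(var_cluster), (n_clusters, 1, 1))
    v = rng.normal(0, np.sqrt(var_cp), (n_clusters, n_periods, 1))
    e = rng.normal(0, 1.0, (n_clusters, n_periods, m_per_period))
    return (intercept + u + v + e > 0).astype(int)

rng = np.random.default_rng(2024)
y = simulate_nested_binary(20, 4, 50, prevalence=0.3,
                           var_cluster=0.05, var_cp=0.02, rng=rng)
print(y.shape, y.mean())   # (20, 4, 50), marginal prevalence near 0.3
```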
23. Clinical trials with mechanism evaluation of intervention(s): mind the power and sample size calculation.
- Author
- Lee KM, Hellier J, and Emsley R
- Subjects
- Sample Size, Humans, Mediation Analysis, Intention to Treat Analysis, Treatment Outcome, Data Interpretation, Statistical, Linear Models, Models, Statistical, Computer Simulation, Randomized Controlled Trials as Topic methods, Research Design
- Abstract
Background: Mediation analysis, often completed as secondary analysis to estimating the main treatment effect, investigates situations where an exposure may affect an outcome both directly and indirectly through intervening mediator variables. Although there has been much research on power in mediation analyses, most of this has focused on the power to detect indirect effects. Little consideration has been given to the extent to which the strength of the mediation pathways, i.e., the intervention-mediator path and the mediator-outcome path respectively, may affect the power to detect the total effect, which would correspond to the intention-to-treat effect in a randomized trial., Methods: We conduct a simulation study to evaluate the relation between the mediation pathways and the power of testing the total treatment effect, i.e., the intention-to-treat effect. Consider a sample size that is computed based on the usual formula for testing the total effect in a two-arm trial. We generate data for a continuous mediator and a normal outcome using the conventional mediation models. We estimate the total effect using simple linear regression and evaluate the power of a two-sided test. We explore multiple data generating scenarios by varying the magnitude of the mediation paths whilst keeping the total effect constant., Results: Simulations show the estimated total effect is unbiased across the considered scenarios as expected, but the mean of its standard error increases with the magnitude of the mediator-outcome path and the variability in the residual error of the mediator, respectively. Consequently, this affects the power of testing the total effect, which is always lower than planned when the mediator-outcome path is non-trivial and a naive sample size was employed. Analytical explanation confirms that the intervention-mediator path does not affect the power of testing the total effect but the mediator-outcome path. The usual effect size consideration can be adjusted to account for the magnitude of the mediator-outcome path and its residual error., Conclusions: The sample size calculation for studies with efficacy and mechanism evaluation should account for the mediator-outcome association or risk the power to detect the total effect/intention-to-treat effect being lower than planned., (© 2024. The Author(s).)
- Published
- 2024
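A sketch of the simulation described in the Methods above: hold the total effect fixed while varying the mediator-outcome path b, and watch the power of the total-effect test fall. Parameter values are assumptions:

```python
import numpy as np
from scipy import stats

def power_total_effect(a, b, c_prime, sd_m, n=200, n_sim=2000,
                       alpha=0.05, rng=None):
    """Empirical power of the two-sided test of the total effect when
    data follow the standard mediation model:
        M = a*X + e_m,   Y = c_prime*X + b*M + e_y.
    The total effect c = c_prime + a*b is held fixed by the caller."""
    rng = rng or np.random.default_rng(0)
    hits = 0
    for _ in range(n_sim):
        x = rng.binomial(1, 0.5, n)              # randomized treatment
        m = a * x + rng.normal(0, sd_m, n)
        y = c_prime * x + b * m + rng.normal(0, 1.0, n)
        # simple two-sample t test of the total effect of x on y
        _, p = stats.ttest_ind(y[x == 1], y[x == 0])
        hits += p < alpha
    return hits / n_sim

# Same total effect c = 0.4 throughout; only b varies.
a = 0.4                                           # intervention-mediator path
for b in (0.0, 0.5, 1.0):
    pw = power_total_effect(a, b, c_prime=0.4 - a * b, sd_m=1.0)
    print(f"b={b}: power {pw:.2f}")
```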
24. Simulating survival data when one subgroup lacks information.
- Author
- Zhao Y, Yan P, and Yang X
- Subjects
- Humans, Survival Analysis, Models, Statistical, Research Design statistics & numerical data, Data Interpretation, Statistical, Clinical Trials as Topic statistics & numerical data, Clinical Trials as Topic methods, Computer Simulation
- Abstract
In this paper, we aim to show the process of simulating survival data when the distribution of the overall population and one subgroup (called "positive subgroup") as well as the proportion of the subgroup is known, while the distribution of the other subgroup (called "negative subgroup") is unknown. We propose a combination method which generates survival data of the positive subgroup and negative subgroup, respectively, and survival data of the overall population are the combination of the two subgroups. The parameters of the overall population and the positive subgroup need to satisfy certain constraints, otherwise the parameters may lead to contradictions. From simulation, we show that our proposed combination method can reflect the correlation between the test statistics of the overall population and the positive subgroup, which makes the simulated data more realistic and the results of simulation more reliable. Moreover, for a multiplicity control in trial design, the combination method can help to determine the α splitting strategy between primary endpoints, and is helpful in the design of clinical trials, as shown in three applications.
- Published
- 2024
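A sketch of the combination idea: recover the negative subgroup's survival function from the overall and positive-subgroup distributions and the subgroup proportion, check the validity constraint, and sample by numerical inversion. The exponential distributions, rates, and truncation at t_max are assumptions:

```python
import numpy as np

def sample_negative_subgroup(n, s_all, s_pos, prop_pos,
                             t_max=50.0, rng=None):
    """Sample the unknown negative subgroup by numerically inverting
        S_neg(t) = (S_all(t) - prop_pos * S_pos(t)) / (1 - prop_pos),
    which must be a valid survival function (decreasing, in [0, 1]);
    this is the parameter constraint noted in the abstract.
    Times are truncated at t_max for simplicity."""
    rng = rng or np.random.default_rng(0)
    grid = np.linspace(0, t_max, 20000)
    s_neg = (s_all(grid) - prop_pos * s_pos(grid)) / (1 - prop_pos)
    if np.any(np.diff(s_neg) > 1e-12) or np.any(s_neg < -1e-12):
        raise ValueError("contradictory parameters: S_neg is not valid")
    u = rng.uniform(max(s_neg[-1], 0.0), 1.0, n)
    return np.interp(u, s_neg[::-1], grid[::-1])  # invert decreasing S_neg

# Hypothetical exponential overall and positive-subgroup distributions.
s_all = lambda t: np.exp(-0.10 * t)
s_pos = lambda t: np.exp(-0.20 * t)
t_neg = sample_negative_subgroup(1000, s_all, s_pos, prop_pos=0.4)
print(t_neg.mean())
```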
25. On variance estimation of target population created by inverse probability weighting.
- Author
- Chen J, Chen R, Feng Y, Tan M, Chen P, and Wu Y
- Subjects
- Humans, Data Interpretation, Statistical, Observational Studies as Topic statistics & numerical data, Observational Studies as Topic methods, Sarcopenia epidemiology, Sarcopenia physiopathology, Propensity Score, Probability, Computer Simulation, Models, Statistical
- Abstract
Inverse probability weighting (IPW) is frequently used to reduce or minimize the observed confounding in observational studies. IPW creates a pseudo-sample by weighting each individual by the inverse of the conditional probability of receiving the treatment level that he/she has actually received. In the pseudo-sample there is no variation among the multiple individuals generated by weighting the same individual in the original sample. This would reduce the variability of the data and therefore bias the variance estimate in the target population. Conventional variance estimation methods for IPW estimators generally ignore this underestimation and tend to produce biased estimates of variance. We here propose a more reasonable method that incorporates this source of variability by using parametric bootstrapping based on intra-stratum variability estimates. This approach first uses propensity score stratification and intra-stratum standard deviation to approximate the variability among multiple individuals generated based on a single individual whose propensity score falls within the corresponding stratum. The parametric bootstrapping is then used to incorporate the target variability by re-generating outcomes after adding a random error term to the original data. The performance of the proposed method is compared with three existing methods, including the naïve model-based variance estimator, the nonparametric bootstrap variance estimator, and the robust variance estimator, in the simulation section. An example of patients with sarcopenia is used to illustrate the implementation of the proposed approach. According to the results, the proposed approach has desirable statistical properties and can be easily implemented using the provided R code.
- Published
- 2024
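For context, a sketch of plain IPW estimation with a logistic propensity score and the nonparametric bootstrap, one of the comparators discussed; the paper's proposal (intra-stratum variability plus parametric bootstrapping) is not reproduced here:

```python
import numpy as np
import statsmodels.api as sm

def ipw_ate(y, a, x):
    """IPW (Hajek) estimate of the average treatment effect with a
    logistic propensity score model."""
    X = sm.add_constant(x)
    ps = sm.Logit(a, X).fit(disp=0).predict(X)
    w = a / ps + (1 - a) / (1 - ps)
    return (np.average(y[a == 1], weights=w[a == 1])
            - np.average(y[a == 0], weights=w[a == 0]))

rng = np.random.default_rng(9)
n = 1000
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-x)))     # confounded treatment
y = 1.0 * a + 0.5 * x + rng.normal(size=n)    # true ATE = 1.0

est = ipw_ate(y, a, x)
# Nonparametric bootstrap (a comparator in the paper, not its proposal).
boot = [ipw_ate(y[i], a[i], x[i])
        for i in (rng.integers(0, n, n) for _ in range(500))]
print(f"ATE {est:.2f}, bootstrap SE {np.std(boot, ddof=1):.2f}")
```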
26. A systematic approach to adaptive sequential design for clinical trials: using simulations to select a design with desired operating characteristics.
- Author
- Gao P and Zhang W
- Subjects
- Humans, Sample Size, Models, Statistical, Clinical Trials, Phase III as Topic statistics & numerical data, Clinical Trials, Phase III as Topic methods, Data Interpretation, Statistical, Computer Simulation, Research Design statistics & numerical data
- Abstract
The failure rates of phase 3 trials are high. Incorrect sample size due to uncertainty of effect size could be a critical contributing factor. Adaptive sequential design (ASD), which may include one or more sample size re-estimations (SSR), has been a popular approach for dealing with such uncertainties. The operating characteristics (OCs) of ASD, including the unconditional power and mean sample size, can be substantially affected by many factors, including the planned sample size, the interim analysis schedule and choice of critical boundaries and rules for interim analysis. We propose a systematic, comprehensive strategy which uses iterative simulations to investigate the operating characteristics of adaptive designs and help achieve adequate unconditional power and cost-effective mean sample size if the effect size is in a pre-identified range.
- Published
- 2024
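A sketch of the kind of iterative simulation this abstract proposes: for a candidate two-stage design with one interim analysis and conditional-power-based sample size re-estimation, estimate the unconditional power and mean sample size. The boundaries, inverse-normal combination, and SSR rule are illustrative assumptions, not a recommended design:

```python
import numpy as np
from scipy import stats

def simulate_asd(delta, n1, n2, n2_max, z_eff=2.8, alpha=0.025,
                 cp_target=0.9, n_sim=20000, rng=None):
    """Operating characteristics (unconditional power, mean total sample
    size) of a two-stage design: stop early for efficacy if z1 > z_eff,
    else re-estimate the stage-2 size from conditional power and combine
    stages by pre-fixed inverse-normal weights. Illustrative boundaries;
    real designs derive them from an alpha-spending function."""
    rng = rng or np.random.default_rng(0)
    w1, w2 = np.sqrt(n1), np.sqrt(n2)
    reject, n_total = 0, 0.0
    for _ in range(n_sim):
        z1 = rng.normal(delta * np.sqrt(n1 / 4), 1)  # two-arm z, n1 total
        if z1 > z_eff:
            reject += 1; n_total += n1; continue
        # conditional-power-based SSR using the interim estimate of delta
        d_hat = max(2 * z1 / np.sqrt(n1), 1e-6)
        n2_new = min(n2_max, max(n2, 4 * ((stats.norm.ppf(cp_target)
                     + stats.norm.ppf(1 - alpha)) / d_hat) ** 2))
        z2 = rng.normal(delta * np.sqrt(n2_new / 4), 1)
        z_comb = (w1 * z1 + w2 * z2) / np.sqrt(w1 ** 2 + w2 ** 2)
        reject += z_comb > stats.norm.ppf(1 - alpha)
        n_total += n1 + n2_new
    return reject / n_sim, n_total / n_sim

print(simulate_asd(delta=0.3, n1=100, n2=100, n2_max=400))
```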
27. Multiple test procedures of disease prevalence based on stratified partially validated series in the presence of a gold standard.
- Author
- Qiu SF, Zhang XL, Qu YQ, and Han YQ
- Subjects
- Humans, Prevalence, Data Interpretation, Statistical, Reproducibility of Results, Research Design statistics & numerical data, Computer Simulation, Models, Statistical
- Abstract
This paper discusses the problem of disease prevalence in clinical studies, focusing on multiple comparisons based on stratified partially validated series in the presence of a gold standard. Five test statistics (two Wald-type test statistics, the inverse hyperbolic tangent transformation test statistic, the likelihood ratio test statistic, and the score test statistic) are proposed to conduct multiple comparisons. To control the overall type I error rate, several adjustment procedures are developed, namely the Bonferroni, single-step adjusted MaxT, single-step adjusted MinP, Holm's step-down, and Hochberg's step-up procedures, based on these test statistics. The performance of the proposed methods is evaluated through simulation studies in terms of the empirical type I error rate and empirical power. Simulation results show that the single-step adjusted MaxT procedure and single-step adjusted MinP procedure generally outperform the other three procedures, and these two test procedures based on all test statistics have satisfactory performance. Notably, the single-step adjusted MinP procedure tends to exhibit higher empirical power than the single-step adjusted MaxT procedure. Furthermore, Holm's step-down and Hochberg's step-up procedures show greater power compared to the Bonferroni method. The study also observes that as the validated ratio increases, the empirical type I errors of all test procedures approach the nominal level while maintaining higher power. Two real examples are presented to illustrate the proposed methods.
- Published
- 2024
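Two of the adjustment procedures evaluated, Holm's step-down and Hochberg's step-up, in a few lines each (generic p-value versions; the paper applies them to its five specific test statistics):

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: reject sequentially while the i-th
    smallest p value satisfies p_(i) <= alpha / (m - i) (0-indexed)."""
    m = len(pvals)
    order = np.argsort(pvals)
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - i):
            reject[idx] = True
        else:
            break
    return reject

def hochberg(pvals, alpha=0.05):
    """Hochberg's step-up procedure: find the largest i (0-indexed) with
    p_(i) <= alpha / (m - i) and reject that and all smaller p values."""
    m = len(pvals)
    order = np.argsort(pvals)
    reject = np.zeros(m, dtype=bool)
    for i in range(m - 1, -1, -1):
        if pvals[order[i]] <= alpha / (m - i):
            reject[order[:i + 1]] = True
            break
    return reject

p = np.array([0.001, 0.010, 0.012, 0.040, 0.300])
print(holm(p), hochberg(p))
```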
28. Estimation of treatment effects in early-phase randomized clinical trials involving external control data.
- Author
- Götte H, Kirchner M, Krisam J, Allignol A, Schüler A, and Kieser M
- Subjects
- Humans, Treatment Outcome, Data Interpretation, Statistical, Sample Size, Research Design statistics & numerical data, Models, Statistical, Proportional Hazards Models, Randomized Controlled Trials as Topic statistics & numerical data, Randomized Controlled Trials as Topic methods, Computer Simulation
- Abstract
There are good reasons to perform a randomized controlled trial (RCT) even in early phases of clinical development. However, the low sample sizes in those settings lead to high variability of the treatment effect estimate. The variability could be reduced by adding external control data if available. For the common setting in which suitable subject-level control group data are available from only one external (clinical trial or real-world) data source, we evaluate different analysis options for estimating the treatment effect via hazard ratios. The impact of the external control data is usually guided by the level of similarity with the current RCT data. Such level of similarity can be determined via outcome and/or baseline covariate data comparisons. We provide an overview over existing methods, propose a novel option for a combined assessment of outcome and baseline data, and compare a selected set of approaches in a simulation study under varying assumptions regarding observable and unobservable confounder distributions using a time-to-event model. Our various simulation scenarios also reflect the differences between external clinical trial and real-world data. Data combinations via simple outcome-based borrowing or simple propensity score weighting with baseline covariate data are not recommended. Analysis options which conflate outcome and baseline covariate data perform best in our simulation study.
- Published
- 2024
29. Assessing the performance of methods for central statistical monitoring of a binary or continuous outcome in multi-center trials: A simulation study.
- Author
- Ge L, Wang Z, Liu CC, Childress S, Wildfire J, and Wu G
- Subjects
- Humans, Data Interpretation, Statistical, Research Design, Sample Size, Computer Simulation, Models, Statistical, Multicenter Studies as Topic methods, Area Under Curve
- Abstract
Background: Quality study monitoring is fundamental to patient safety and data integrity. Regulators and industry consortia have increasingly advocated for risk-based monitoring (RBM) and central statistical monitoring (CSM) for more effective and efficient monitoring. It is important to assess which of the statistical methods underpinning these approaches can best identify unusual data patterns in multi-center clinical trials that may be driven by potential systematic errors., Methods: We assessed various CSM techniques, including cross-tests, fixed-effects, mixed-effects, and finite mixture models, across scenarios with different sample sizes, contamination rates, and overdispersion via simulation. Our evaluation utilized threshold-independent metrics such as the area under the curve (AUC) and average precision (AP), offering a fuller picture of CSM performance., Results: All CSM methods showed consistent characteristics across center sizes or overdispersion. The adaptive finite mixture model outperformed others in AUC and AP, especially at 30% contamination, upholding high specificity unless converging to a single-component model due to low contamination or deviation. The mixed-effects model performed well at lower contamination rates. However, it became conservative in specificity and exhibited declined performance for binary outcomes under high deviation. Cross-tests and fixed-effects methods underperformed, especially when deviation increased., Conclusion: Our evaluation explored the merits and drawbacks of multiple CSM methods, and found that relying on sensitivity and specificity alone is likely insufficient to fully measure predictive performance. The finite mixture method demonstrated more consistent performance across scenarios by mitigating the influence of outliers. In practice, considering the study-specific costs of false positives/negatives with available resources for monitoring is important., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024. Published by Elsevier Inc.)
- Published
- 2024
30. Balancing versus modelling in weighted analysis of non-randomised studies with survival outcomes: A simulation study.
- Author
- Filla T, Schwender H, and Kuss O
- Subjects
- Humans, Survival Analysis, Computer Simulation, Propensity Score, Models, Statistical
- Abstract
Weighting methods are widely used for causal effect estimation in non-randomised studies. In general, these methods use the propensity score (PS), the probability of receiving the treatment given the covariates, to arrive at the respective weights. All of these "modelling" methods actually optimize prediction of the respective outcome, which is, in the PS model, treatment assignment. However, this does not match with the actual aim of weighting, which is eliminating the association between covariates and treatment assignment. In the "balancing" approach, covariates are thus balanced directly by solving systems of numerical equations, explicitly without fitting a PS model. To compare modelling, balancing and hybrid approaches to weighting we performed a large simulation study for a binary treatment and a survival outcome. For maximal practical relevance all simulation parameters were selected after a systematic review of medical studies that used PS methods for analysis. We also introduce a new hybrid method that uses the idea of the covariate balancing propensity score and matching weights, thus avoiding extreme weights. In addition, we present a corrected robust variance estimator for some of the methods. Overall, our simulation results indicate that balancing methods work worse than expected. However, among the considered balancing methods, entropy balancing consistently outperforms the variance balancing approach. All methods estimating the average treatment effect in the overlap population perform well with very little bias and small standard errors even in settings with misspecified propensity score models. Finally, the coverage using the standard robust variance estimator was too high for all methods, with the proposed corrected robust variance estimator improving coverage in a variety of settings., (© 2024 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
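A sketch of entropy balancing, the balancing method reported to perform best among those considered: control weights closest to uniform in KL divergence subject to exact mean balance, found via the unconstrained dual problem. The bivariate normal covariates are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(x_control, target_means):
    """Entropy balancing: control-group weights closest to uniform in
    KL divergence whose weighted covariate means equal target_means,
    obtained by minimizing the unconstrained dual objective."""
    def dual(lam):
        return np.log(np.sum(np.exp(x_control @ lam))) - target_means @ lam
    res = minimize(dual, np.zeros(x_control.shape[1]), method="BFGS")
    w = np.exp(x_control @ res.x)
    return w / w.sum()

rng = np.random.default_rng(8)
x_treat = rng.normal(0.5, 1, (200, 2))
x_ctrl = rng.normal(0.0, 1, (400, 2))
w = entropy_balance(x_ctrl, x_treat.mean(axis=0))
print("weighted control means:", w @ x_ctrl)   # matches treated means
print("treated means:         ", x_treat.mean(axis=0))
```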
31. Fractional accumulative calibration-free odds (f-aCFO) design for delayed toxicity in phase I clinical trials.
- Author
- Fang J and Yin G
- Subjects
- Humans, Research Design, Dose-Response Relationship, Drug, Calibration, Drug-Related Side Effects and Adverse Reactions, Models, Statistical, Time Factors, Clinical Trials, Phase I as Topic methods, Computer Simulation
- Abstract
The calibration-free odds (CFO) design has been demonstrated to be robust, model-free, and practically useful, but it faces challenges when dealing with late-onset toxicity. The emergence of the time-to-event (TITE) method and the fractional method led to the development of the TITE-CFO and fractional CFO (fCFO) designs to accommodate delayed toxicity. Nevertheless, existing CFO-type designs have untapped potential because they primarily consider dose information from the current position and its two neighboring positions. To incorporate information from all doses, we propose the accumulative CFO (aCFO) design, which utilizes data at all dose levels, much as players distant from the center still contribute their strength in a tug-of-war game. This approach makes fuller use of the available information while still preserving the model-free and calibration-free characteristics. Extensive simulation studies demonstrate performance improvement over the original CFO design, emphasizing the advantages of incorporating information from a broader range of dose levels. Furthermore, we extend the aCFO design to accommodate late-onset outcomes, yielding the TITE-aCFO and f-aCFO designs, with f-aCFO displaying superior performance over existing methods in both fixed and random simulation scenarios. In conclusion, the aCFO and f-aCFO designs can be considered robust, efficient, and user-friendly approaches for conducting phase I trials with or without late-onset toxicity., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
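The following toy sketch conveys the general idea of down-weighting patients whose toxicity assessment is still pending, in the spirit of the TITE/fractional schemes referenced above; it uses a simple linear follow-up weight and is not the authors' f-aCFO algorithm, whose fractional imputation is more sophisticated.

```python
# Sketch: handling pending (late-onset) toxicity outcomes by fractional
# weighting of incomplete follow-up; simplified linear-weight illustration.
import numpy as np

window = 90.0                                 # assessment window (days), assumed
followup = np.array([90, 90, 30, 60, 10])     # days observed so far per patient
tox = np.array([1, 0, 0, 0, 0])               # 1 = DLT observed, 0 = none so far
pending = followup < window                   # still under observation, no DLT yet

# Completed patients contribute 1; pending DLT-free ones contribute followup/window.
effective_n = np.where(pending & (tox == 0), followup / window, 1.0).sum()
print("effective sample size:", effective_n)
print("estimated DLT rate   :", tox.sum() / effective_n)
```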
32. Assessing heterogeneity in surrogacy using censored data.
- Author
-
Parast L, Tian L, and Cai T
- Subjects
- Humans, Female, Models, Statistical, Male, Gonadal Steroid Hormones blood, Gonadal Steroid Hormones therapeutic use, Statistics, Nonparametric, Data Interpretation, Statistical, Diabetes Mellitus, Biomarkers blood, Computer Simulation, Blood Glucose analysis
- Abstract
Determining whether a surrogate marker can be used to replace a primary outcome in a clinical study is complex. While many statistical methods have been developed to formally evaluate a surrogate marker, they generally do not provide a way to examine heterogeneity in the utility of a surrogate marker. Similar to treatment effect heterogeneity, where the effect of a treatment varies based on a patient characteristic, heterogeneity in surrogacy means that the strength or utility of the surrogate marker varies based on a patient characteristic. The few methods that have been recently developed to examine such heterogeneity cannot accommodate censored data. Studies with a censored outcome are typically the studies that could most benefit from a surrogate because the follow-up time is often long. In this paper, we develop a robust nonparametric approach to assess heterogeneity in the utility of a surrogate marker with respect to a baseline variable in a censored time-to-event outcome setting. In addition, we propose and evaluate a testing procedure to formally test for heterogeneity at a single time point or across multiple time points simultaneously. The finite sample performance of our estimation and testing procedures is examined in a simulation study. We use our proposed method to investigate the complex relationship between change in fasting plasma glucose, diabetes, and sex hormones using data from the diabetes prevention program study., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
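As a rough illustration of heterogeneity in surrogate strength, the sketch below kernel-smooths a crude surrogate-outcome correlation across a baseline covariate; it ignores censoring, which the paper's method handles, and all data are simulated.

```python
# Sketch: kernel-smoothed surrogate-outcome correlation as a crude,
# censoring-free stand-in for heterogeneity in surrogacy.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
w = rng.uniform(0, 1, size=n)               # baseline covariate (e.g., age)
s = rng.normal(size=n)                      # surrogate marker change
noise = rng.normal(size=n)
y = w * s + (1 - w) * noise                 # surrogate informative only at high w

def kernel_corr(w0, h=0.1):
    k = np.exp(-0.5 * ((w - w0) / h) ** 2)  # Gaussian kernel weights around w0
    sc = s - np.average(s, weights=k)
    yc = y - np.average(y, weights=k)
    cov = np.average(sc * yc, weights=k)
    return cov / np.sqrt(np.average(sc**2, weights=k) * np.average(yc**2, weights=k))

for w0 in (0.2, 0.5, 0.8):
    print(f"corr(S, Y | W near {w0}) = {kernel_corr(w0):.2f}")
```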
33. Flexible cost-penalized Bayesian model selection: Developing inclusion paths with an application to diagnosis of heart disease.
- Author
-
Porter EM, Franck CT, and Adams S
- Subjects
- Humans, Health Care Costs statistics & numerical data, Male, Bayes Theorem, Heart Diseases economics, Heart Diseases diagnosis, Models, Statistical, Computer Simulation
- Abstract
We propose a Bayesian model selection approach that allows medical practitioners to select among predictor variables while taking their respective costs into account. Medical procedures almost always incur costs in time and/or money, and a predictor's cost might exceed its usefulness for modeling the outcome of interest. We develop a Bayesian model selection procedure that uses flexible model priors to penalize costly predictors a priori and select a subset of predictors useful relative to their costs. Our approach (i) gives the practitioner control over the magnitude of cost penalization, (ii) enables the prior to scale well with sample size, and (iii) enables the creation of our proposed inclusion path visualization, which can be used to make decisions about individual candidate predictors using both probabilistic and visual tools. We demonstrate the effectiveness of our inclusion path approach, and the importance of being able to adjust the magnitude of the prior's cost penalization, through a dataset pertaining to heart disease diagnosis in patients at the Cleveland Clinic Foundation, where several candidate predictors with various costs were recorded for patients, and through simulated data., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
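A minimal sketch of the cost-penalization idea: a model prior proportional to exp(-lambda * cost) is combined with a BIC approximation to the marginal likelihood over all predictor subsets. The costs, lambda, and data below are hypothetical, and the paper's actual priors and inclusion paths are more flexible than this stand-in.

```python
# Sketch: cost-penalized Bayesian model selection via a BIC approximation;
# the prior pi(M) proportional to exp(-lam * cost(M)) is an assumed stand-in
# for the paper's flexible cost-penalizing model priors.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(size=n)
cost = np.array([1.0, 5.0, 1.0, 10.0])       # hypothetical per-predictor costs
lam = 0.3                                    # cost-penalization magnitude

posts = {}
for k in range(p + 1):
    for S in itertools.combinations(range(p), k):
        Xs = sm.add_constant(X[:, list(S)]) if S else np.ones((n, 1))
        fit = sm.OLS(y, Xs).fit()
        log_prior = -lam * cost[list(S)].sum()
        posts[S] = -0.5 * fit.bic + log_prior  # unnormalized log posterior

norm = max(posts.values())
probs = {S: np.exp(v - norm) for S, v in posts.items()}
total = sum(probs.values())
best = max(probs, key=probs.get)
print("MAP model:", best, "posterior prob:", probs[best] / total)
```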
34. HIV estimation using population-based surveys with non-response: A partial identification approach.
- Author
-
Adegboye OA, Fujii T, Leung DH, and Siyu L
- Subjects
- Humans, Kenya epidemiology, Prevalence, Malawi epidemiology, Models, Statistical, Zambia epidemiology, Male, Female, Bias, Data Interpretation, Statistical, HIV Infections epidemiology, Computer Simulation, Health Surveys
- Abstract
HIV estimation using data from the demographic and health surveys (DHS) is limited by the presence of non-response and test refusals. Conventional adjustments such as imputation require the data to be missing at random. Methods that use instrumental variables allow for the possibility that prevalence differs between respondents and non-respondents, but their performance depends critically on the validity of the instrument. Using Manski's partial identification approach, we form instrumental variable bounds for HIV prevalence from a pool of candidate instruments. Our method does not require all candidate instruments to be valid. We use a simulation study to evaluate and compare our method against its competitors, and we illustrate the proposed method using DHS data from Zambia, Malawi, and Kenya. Our simulations show that imputation leads to seriously biased results even under mild violations of non-random missingness. Using worst-case identification bounds that make no assumptions about the non-response mechanism is robust but not informative. Taking the union of instrumental variable bounds balances the informativeness of the bounds against robustness to the inclusion of some invalid instruments. Non-response and refusals are ubiquitous in population-based HIV data such as those collected under the DHS. Partial identification bounds provide a robust solution to HIV prevalence estimation without strong assumptions, and the union bounds are significantly more informative than the worst-case bounds without sacrificing credibility., (© 2024 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
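The worst-case (no-assumption) bounds referenced above have a simple closed form, sketched below with hypothetical survey numbers. The paper goes further by pooling instrumental variable bounds, but this shows the baseline that the union bounds improve upon.

```python
# Sketch: Manski-style worst-case bounds for a prevalence with non-response.
# With response rate r and prevalence p_obs among respondents, the true
# prevalence must lie in [p_obs * r, p_obs * r + (1 - r)].
def worst_case_bounds(p_obs, response_rate):
    lower = p_obs * response_rate                        # all non-respondents negative
    upper = p_obs * response_rate + (1 - response_rate)  # all non-respondents positive
    return lower, upper

# Hypothetical survey: 80% respond, 12% of respondents test HIV-positive.
print(worst_case_bounds(p_obs=0.12, response_rate=0.80))  # (0.096, 0.296)
```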
35. Convergence, sampling and total order estimator effects on parameter orthogonality in global sensitivity analysis.
- Author
-
Saxton H, Xu X, Schenkel T, Clayton RH, and Halliday I
- Subjects
- Humans, Computational Biology methods, Models, Cardiovascular, Models, Statistical, Algorithms, Models, Biological, Uncertainty, Computer Simulation
- Abstract
Dynamical system models typically involve numerous input parameters whose "effects" and orthogonality need to be quantified through sensitivity analysis to identify the inputs contributing the greatest uncertainty. Whilst prior art has compared the role of total-order estimators in recovering "true" effects, their ability to recover robust parameter orthogonality for use in identifiability metrics has not been investigated. In this paper, we perform: (i) an assessment using a different class of numerical models representing the cardiovascular system, (ii) a wider evaluation of sampling methodologies and their interactions with estimators, (iii) an investigation of the consequences of permuting estimators and sampling methodologies on input parameter orthogonality, (iv) a study of sample convergence through resampling, and (v) an assessment of whether positive outcomes are sustained when model input dimensionality increases. Our results indicate that the Jansen and Janon estimators display efficient convergence with minimum uncertainty when coupled with Sobol and lattice rule sampling, making them prime choices for calculating parameter orthogonality and influence. This study reveals that global sensitivity analysis is convergence driven: unconverged indices are subject to error, and therefore the true influence or orthogonality of the input parameters is not recovered. This investigation importantly clarifies the interactions of the estimator and the sampling methodology by reducing the associated ambiguities, defining novel practices for modelling in the life sciences., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Saxton et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
- Full Text
- View/download PDF
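For readers unfamiliar with the total-order estimators compared above, here is a minimal Monte Carlo implementation of Jansen's estimator on a standard test function, used here in place of a cardiovascular model; sample size and sampling scheme (plain Monte Carlo rather than Sobol or lattice sequences) are illustrative choices.

```python
# Sketch: Jansen's total-order Sobol index estimator,
# S_T[i] = E[(f(A) - f(A_B^i))^2] / (2 * Var(f)).
import numpy as np

def model(x):  # Ishigami-style toy function standing in for a real model
    return (np.sin(x[:, 0]) + 7 * np.sin(x[:, 1]) ** 2
            + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0]))

rng = np.random.default_rng(5)
N, d = 2 ** 14, 3
A = rng.uniform(-np.pi, np.pi, size=(N, d))
B = rng.uniform(-np.pi, np.pi, size=(N, d))
fA = model(A)
var = np.var(np.concatenate([fA, model(B)]))

for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                      # replace column i of A with B's
    ST = np.mean((fA - model(ABi)) ** 2) / (2 * var)
    print(f"S_T[{i}] approx {ST:.3f}")
```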
36. A unified Bayesian framework for bias adjustment in multiple comparisons from clinical trials.
- Author
-
Du Y, Li J, Raha S, and Qu Y
- Subjects
- Humans, Models, Statistical, Double-Blind Method, Selection Bias, Bias, Multicenter Studies as Topic, Clinical Trials as Topic statistics & numerical data, Bayes Theorem, Computer Simulation, Randomized Controlled Trials as Topic statistics & numerical data
- Abstract
In clinical trials, multiple comparisons arising from various treatments/doses, subgroups, or endpoints are common. Typically, trial teams focus on the comparison showing the largest observed treatment effect, often involving a specific treatment pair and endpoint within a subgroup. These findings frequently lead to follow-up pivotal studies, many of which do not confirm the initial positive results. Selection bias occurs when the most promising treatment, subgroup, or endpoint is chosen for further development, potentially skewing subsequent investigations. Such bias can be defined as the deviation of the observed treatment effects from the underlying truth. In this article, we propose a general and unified Bayesian framework to address selection bias in clinical trials with multiple comparisons. Our approach does not require a priori specification of a parametric distribution for the prior, offering a more flexible and generalized solution. The proposed method facilitates a more accurate interpretation of clinical trial results by adjusting for such selection bias. Through simulation studies, we compared several methods and demonstrated their superior performance over the normal shrinkage estimator. We recommend the Bayesian model averaging estimator, which averages over Gaussian mixture models as the prior distribution, based on its performance and flexibility. We applied the method to a multicenter, randomized, double-blind, placebo-controlled study investigating the cardiovascular effects of dulaglutide., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
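A toy illustration of the selection bias at issue and of the normal shrinkage baseline the paper improves on: an empirical-Bayes normal prior pulls the "winning" comparison back toward the grand mean. All quantities below are simulated.

```python
# Sketch: winner's curse across K comparisons and a simple normal
# shrinkage adjustment (the baseline estimator discussed in the abstract).
import numpy as np

rng = np.random.default_rng(6)
K, se = 10, 1.0
true = rng.normal(0.0, 0.5, size=K)          # true effects of K comparisons
est = true + rng.normal(0.0, se, size=K)     # observed estimates

best = est.argmax()                          # comparison taken forward
tau2 = max(est.var() - se ** 2, 0.0)         # crude prior-variance estimate
shrunk = est.mean() + tau2 / (tau2 + se ** 2) * (est - est.mean())

print("selected raw estimate :", est[best])
print("shrinkage-adjusted    :", shrunk[best])
print("underlying true effect:", true[best])
```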
37. Causal mediation analysis with a three-dimensional image mediator.
- Author
-
Chen M and Zhou Y
- Subjects
- Humans, Female, Magnetic Resonance Imaging, Imaging, Three-Dimensional methods, Cesarean Section statistics & numerical data, Models, Statistical, Intelligence, Child, Pregnancy, Child Development, White Matter diagnostic imaging, Mediation Analysis, Causality, Computer Simulation
- Abstract
Causal mediation analysis is increasingly common in biology, psychology, epidemiology, and related fields. In particular, with the advent of the big data era, the issue of high-dimensional mediators is becoming more prevalent. In neuroscience, with the widespread application of magnetic resonance technology in the field of brain imaging, studies treating an image as a mediator have emerged. In this study, a novel causal mediation analysis method with a three-dimensional image mediator is proposed. We define the average causal effects under the potential outcome framework, explore several sufficient conditions for valid identification, and develop techniques for estimation and inference. To verify the effectiveness of the proposed method, a series of simulations under various scenarios is performed. Finally, the proposed method is applied to a study on the causal effect of mother's delivery mode on child's IQ development. It is found that cesarean section may have a negative effect on intellectual performance and that this effect is mediated by white matter development. Additional prospective and longitudinal studies may be necessary to validate these emerging findings., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
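For orientation, the sketch below shows the classical product-of-coefficients mediation decomposition with a scalar mediator standing in for the paper's three-dimensional image; the paper's identification conditions and estimators are substantially more general, and all data here are simulated.

```python
# Sketch: product-of-coefficients mediation decomposition with a scalar
# "white-matter summary" as a toy stand-in for a 3D image mediator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 800
a = rng.binomial(1, 0.5, size=n)             # exposure (toy delivery mode)
m = 0.5 * a + rng.normal(size=n)             # mediator: white-matter summary
y = 0.3 * a + 0.8 * m + rng.normal(size=n)   # outcome: IQ-like score

fit_m = sm.OLS(m, sm.add_constant(a)).fit()
fit_y = sm.OLS(y, sm.add_constant(np.column_stack([a, m]))).fit()
nie = fit_m.params[1] * fit_y.params[2]      # indirect (mediated) effect
nde = fit_y.params[1]                        # direct effect
print("indirect effect:", nie, " direct effect:", nde)
```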
38. A fast bootstrap algorithm for causal inference with large data.
- Author
-
Kosko M, Wang L, and Santacatterina M
- Subjects
- Humans, Female, Confidence Intervals, Coronary Disease epidemiology, Models, Statistical, Data Interpretation, Statistical, Bias, Observational Studies as Topic methods, Observational Studies as Topic statistics & numerical data, Algorithms, Causality, Computer Simulation
- Abstract
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique for constructing standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed for large data in non-causal settings but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm, called the causal bag of little bootstraps, for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
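A minimal sketch of the bag of little bootstraps on which the proposed algorithm builds: each small subset is resampled to full size via multinomial weights, so an n-row resample is never materialized. The subset counts and the mean-difference estimator below are illustrative choices, not the paper's causal estimator.

```python
# Sketch: bag of little bootstraps (BLB) standard error for a
# mean-difference effect estimate on simulated large data.
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
treat = rng.binomial(1, 0.5, size=n)
y = 1.0 * treat + rng.normal(size=n)         # true effect = 1

def effect(yv, tv, w):
    return (np.average(yv[tv == 1], weights=w[tv == 1])
            - np.average(yv[tv == 0], weights=w[tv == 0]))

b = int(n ** 0.6)                            # subset size n^0.6, as in BLB
s, r = 10, 50                                # number of subsets, resamples each
ses = []
for _ in range(s):
    idx = rng.choice(n, size=b, replace=False)
    ys, ts = y[idx], treat[idx]
    # Multinomial weights emulate a full-size (n) resample of the subset.
    reps = [effect(ys, ts, rng.multinomial(n, np.ones(b) / b))
            for _ in range(r)]
    ses.append(np.std(reps, ddof=1))
print("BLB standard error:", np.mean(ses))
```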
39. Novel non-linear models for clinical trial analysis with longitudinal data: A tutorial using SAS for both frequentist and Bayesian methods.
- Author
-
Wang G, Wang W, Mangal B, Liao Y, Schneider L, Li Y, Xiong C, McDade E, Kennedy R, Bateman R, and Cutter G
- Subjects
- Humans, Longitudinal Studies, Nonlinear Dynamics, Proportional Hazards Models, Data Interpretation, Statistical, Bayes Theorem, Clinical Trials as Topic methods, Models, Statistical, Computer Simulation
- Abstract
Longitudinal data from clinical trials are commonly analyzed using mixed models for repeated measures (MMRM) when the time variable is categorical or linear mixed-effects models (ie, random effects model) when the time variable is continuous. In these models, statistical inference is typically based on the absolute difference in the adjusted mean change (for categorical time) or the rate of change (for continuous time). Previously, we proposed a novel approach: modeling the percentage reduction in disease progression associated with the treatment relative to the placebo decline using proportional models. This concept of proportionality provides an innovative and flexible method for simultaneously modeling different cohorts, multivariate endpoints, and jointly modeling continuous and survival endpoints. Through simulated data, we demonstrate the implementation of these models using SAS procedures in both frequentist and Bayesian approaches. Additionally, we introduce a novel method for implementing MMRM models (ie, analysis of response profile) using the nlmixed procedure., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
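The proportional-decline idea in the abstract above can be sketched outside SAS as well. Below, a nonlinear least-squares fit in Python recovers the percentage reduction theta from simulated data in which the treatment mean change at each visit is (1 - theta) times the placebo change; the visit schedule and effect sizes are invented, and the paper's MMRM machinery (correlated within-subject errors, Bayesian fitting) is not reproduced.

```python
# Sketch: proportional-decline model, treatment mean = (1 - theta) * placebo mean.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)
visits = np.array([6.0, 12.0, 18.0, 24.0])    # months, hypothetical schedule
mu_pbo = np.array([-1.0, -2.2, -3.5, -5.0])   # true placebo mean decline
theta_true = 0.30                             # 30% slowing of progression

t = np.tile(visits, 40)                       # 40 subjects x 4 visits
arm = np.repeat([0.0, 1.0], 80)               # 20 subjects per arm
mean = np.where(arm == 1, 1 - theta_true, 1.0) * np.tile(mu_pbo, 40)
y = mean + rng.normal(0, 0.8, size=mean.size)

def prop_model(X, m6, m12, m18, m24, theta):
    tt, a = X
    mu = np.select([tt == 6, tt == 12, tt == 18, tt == 24], [m6, m12, m18, m24])
    return np.where(a == 1, (1 - theta) * mu, mu)

params, _ = curve_fit(prop_model, np.vstack([t, arm]), y, p0=[-1, -2, -3, -5, 0.1])
print("estimated proportional slowing:", params[-1])  # should be near 0.30
```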
40. A new type of generalized information criterion for regularization parameter selection in penalized regression with application to treatment process data.
- Author
-
Ghatari AH and Aminghafari M
- Subjects
- Humans, Male, Female, Prostatic Neoplasms drug therapy, Breast Neoplasms drug therapy, Regression Analysis, Parkinson Disease drug therapy, Parkinson Disease diagnosis, Data Interpretation, Statistical, Computer Simulation, Models, Statistical
- Abstract
We propose a new approach to selecting the regularization parameter in penalized regression, using a new version of the generalized information criterion (GIC). We prove the identifiability of the bridge regression model as a prerequisite of statistical modeling. We then propose an asymptotically efficient generalized information criterion (AGIC) and prove that it has asymptotic loss efficiency, and we verify that AGIC performs better than older versions of GIC. Furthermore, based on numerical studies, we propose MSE search paths to order the features selected by lasso regression; the MSE search paths compensate for the lack of feature ordering in the lasso regression model. The performance of AGIC relative to other types of GIC is compared using MSE and model utility in a simulation study. We apply AGIC and the other criteria to analyze breast cancer, prostate cancer, and Parkinson's disease datasets. The results confirm the superiority of AGIC in almost all situations.
- Published
- 2024
- Full Text
- View/download PDF
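A minimal sketch of regularization parameter selection with a GIC-type criterion, GIC(lambda) = log(RSS/n) + a_n * df/n; the paper's AGIC concerns the asymptotically efficient choice of the penalty a_n, which is left here as a BIC-like placeholder on simulated data.

```python
# Sketch: choosing the lasso regularization parameter by a GIC-type criterion.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(10)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)

a_n = np.log(n)                       # BIC-like penalty; AGIC refines this choice
best = None
for lam in np.logspace(-3, 0.5, 40):
    fit = Lasso(alpha=lam).fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    df = np.count_nonzero(fit.coef_)  # nonzero coefficients as a df proxy
    gic = np.log(rss / n) + a_n * df / n
    if best is None or gic < best[0]:
        best = (gic, lam, df)
print("selected lambda:", best[1], "with", best[2], "active features")
```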
41. A Bayesian phase I-II clinical trial design to find the biological optimal dose on drug combination.
- Author
-
Wang Z, Zhang J, Xia T, He R, and Yan F
- Subjects
- Humans, Antineoplastic Combined Chemotherapy Protocols administration & dosage, Antineoplastic Combined Chemotherapy Protocols therapeutic use, Antineoplastic Combined Chemotherapy Protocols adverse effects, Logistic Models, Neoplasms drug therapy, Linear Models, Models, Statistical, Bayes Theorem, Clinical Trials, Phase I as Topic statistics & numerical data, Clinical Trials, Phase I as Topic methods, Clinical Trials, Phase II as Topic statistics & numerical data, Clinical Trials, Phase II as Topic methods, Dose-Response Relationship, Drug, Research Design statistics & numerical data, Computer Simulation
- Abstract
In recent years, combined therapies have shown promising treatment effects in antitumor therapy, as they increase dose intensity, act on multiple targets, and benefit more patients. However, dose-finding designs for combined therapies face a number of challenges. Therefore, under the phase I-II framework, we propose a two-stage dose-finding design to identify the biologically optimal dose combination (BODC), defined as the dose combination with the maximum posterior mean utility under acceptable safety. We model the probabilities of toxicity and efficacy using linear logistic regression models and conduct a Bayesian model selection (BMS) procedure to determine the most likely pattern of the dose-response surface. The BMS can adaptively select the most suitable model during the trial, making the results robust. We investigated the operating characteristics of the proposed design through simulation studies under various practical scenarios and showed that the proposed design is robust and performs well.
- Published
- 2024
- Full Text
- View/download PDF
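The utility-based selection step can be sketched with conjugate Beta updates: rank dose combinations by posterior mean utility subject to a safety constraint. The paper instead uses logistic dose-response models with Bayesian model selection; the counts, tradeoff weight, and safety limit below are hypothetical.

```python
# Sketch: posterior mean utility U = P(efficacy) - w * P(toxicity) with
# Beta(1,1) priors, screened by a posterior safety criterion.
import numpy as np
from scipy import stats

n = np.array([12, 12, 12])                    # patients per dose combination
tox = np.array([1, 3, 6])
eff = np.array([2, 6, 8])
w, tox_limit = 0.6, 0.35                      # tradeoff weight, toxicity cap

post_tox = [stats.beta(1 + t, 1 + m - t) for t, m in zip(tox, n)]
post_eff = [stats.beta(1 + e, 1 + m - e) for e, m in zip(eff, n)]
safe = [pt.cdf(tox_limit) > 0.5 for pt in post_tox]   # acceptable safety?
utility = [pe.mean() - w * pt.mean() if ok else -np.inf
           for pe, pt, ok in zip(post_eff, post_tox, safe)]
print("posterior mean utilities:", np.round(utility, 3))
print("selected dose (BODC-style):", int(np.argmax(utility)))
```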
42. Two-stage response adaptive randomization designs for multi-arm trials with binary outcome.
- Author
-
Lu X and Shan G
- Subjects
- Humans, Sample Size, Clinical Trials, Phase II as Topic statistics & numerical data, Clinical Trials, Phase II as Topic methods, Pancreatic Neoplasms drug therapy, Pancreatic Neoplasms genetics, Random Allocation, Treatment Outcome, BRCA2 Protein genetics, BRCA1 Protein genetics, Models, Statistical, Data Interpretation, Statistical, Research Design statistics & numerical data, Randomized Controlled Trials as Topic statistics & numerical data, Randomized Controlled Trials as Topic methods, Computer Simulation
- Abstract
In recent years, adaptive randomization methods have gained significant popularity in clinical research and trial design due to their ability to provide both efficiency and flexibility in adjusting the statistical procedures of ongoing clinical trials. In a study comparing multiple treatments, a multi-arm two-stage design can be utilized to select the best treatment in the first stage and to further compare that treatment with control in the second stage. Traditional designs used equal randomization in both stages. To better utilize the interim results from the first stage, we develop two-stage response-adaptive randomization designs for multi-arm clinical trials with a binary outcome. Two allocation methods are considered: (1) an optimal allocation based on a sequential design; and (2) the play-the-winner rule. Optimal multi-arm two-stage designs are obtained under three criteria: minimizing the expected number of failures, minimizing the average expected sample size, and minimizing the expected sample size under the null hypothesis. Simulation studies show that the proposed adaptive design based on the play-the-winner rule has good performance. A phase II trial for patients with pancreatic adenocarcinoma and a germline BRCA/PALB2 mutation is used to illustrate the application of the proposed response-adaptive randomization designs.
- Published
- 2024
- Full Text
- View/download PDF
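One common version of the play-the-winner rule mentioned above is the randomized urn sketched below: a response adds a ball for the winning arm, while a failure reinforces the competing arms. The response rates and urn increments are illustrative, not those of the paper's optimal designs.

```python
# Sketch: randomized play-the-winner urn for a 3-arm stage-1 allocation.
import numpy as np

rng = np.random.default_rng(11)
p_true = [0.2, 0.35, 0.5]            # hypothetical response rates
urn = np.ones(3)                     # start with one ball per arm

n_assigned = np.zeros(3, dtype=int)
for _ in range(120):                 # stage-1 patients
    arm = rng.choice(3, p=urn / urn.sum())
    n_assigned[arm] += 1
    if rng.random() < p_true[arm]:   # response: reinforce the winning arm
        urn[arm] += 1
    else:                            # failure: reinforce the other arms
        urn[np.arange(3) != arm] += 0.5
print("allocations per arm:", n_assigned)   # skews toward better arms
```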
43. Post-selection inference in regression models for group testing data.
- Author
-
Shen Q, Gregory K, and Huang X
- Subjects
- Likelihood Functions, Humans, Logistic Models, Data Interpretation, Statistical, Biometry methods, Models, Statistical, Computer Simulation, Algorithms
- Abstract
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
44. Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.
- Author
-
Song Y, Hughes JP, and Ye T
- Subjects
- Humans, Data Interpretation, Statistical, Child, Biometry methods, Adenoidectomy statistics & numerical data, Tonsillectomy statistics & numerical data, Randomized Controlled Trials as Topic statistics & numerical data, Randomized Controlled Trials as Topic methods, Computer Simulation, Models, Statistical
- Abstract
In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
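The missingness-indicator method compared in the paper above is simple to state in code: impute a constant for the missing baseline values and adjust for the missingness indicator alongside treatment. The sketch below uses simulated trial data; the paper's cross-world imputation framework generalizes this.

```python
# Sketch: missingness-indicator method (MIM) for a partially missing
# baseline covariate in a randomized trial.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 500
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)             # randomized treatment
y = 1.0 * a + 0.8 * x + rng.normal(size=n)   # true effect = 1

miss = rng.random(n) < 0.3                   # 30% of baseline x missing
x_imp = np.where(miss, 0.0, x)               # single-value imputation
design = sm.add_constant(np.column_stack([a, x_imp, miss.astype(float)]))
fit = sm.OLS(y, design).fit()
print("adjusted treatment effect:", fit.params[1])  # coefficient on a
```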
45. Sensitivity analysis for publication bias in meta-analysis of sparse data based on exact likelihood.
- Author
-
Hu T, Zhou Y, and Hattori S
- Subjects
- Humans, Likelihood Functions, Linear Models, Data Interpretation, Statistical, Models, Statistical, Sensitivity and Specificity, Biometry methods, Meta-Analysis as Topic, Publication Bias statistics & numerical data, Computer Simulation
- Abstract
Meta-analysis is a powerful tool for synthesizing findings from multiple studies. The normal-normal random-effects model is widely used to account for between-study heterogeneity. However, meta-analyses of sparse data, which may arise when the event rate is low for binary or count outcomes, pose a challenge to the normal-normal random-effects model in terms of the accuracy and stability of inference, since the normal approximation in the within-study model may be poor. To reduce bias arising from data sparsity, the generalized linear mixed model can be used, replacing the approximate normal within-study model with an exact model. Publication bias is one of the most serious threats in meta-analysis. Several quantitative sensitivity analysis methods for evaluating the potential impact of selective publication are available for the normal-normal random-effects model. We propose a sensitivity analysis method that extends the likelihood-based sensitivity analysis with the t-statistic selection function of Copas to several generalized linear mixed-effects models. Through applications of our proposed method to several real-world meta-analyses and through simulation studies, the proposed method is shown to outperform the likelihood-based sensitivity analysis based on the normal-normal model. The proposed method should give useful guidance for addressing publication bias in the meta-analysis of sparse data., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
46. A Bayesian latent-subgroup platform design for dose optimization.
- Author
-
Mu R, Zhan X, Tang RS, and Yuan Y
- Subjects
- Humans, Drug Development methods, Drug Development statistics & numerical data, Models, Statistical, United States, United States Food and Drug Administration, Neoplasms drug therapy, Research Design, Biometry methods, Bayes Theorem, Maximum Tolerated Dose, Computer Simulation, Dose-Response Relationship, Drug, Antineoplastic Agents administration & dosage
- Abstract
The US Food and Drug Administration launched Project Optimus to reform the dose optimization and dose selection paradigm in oncology drug development, calling for a paradigm shift from finding the maximum tolerated dose to identifying the optimal biological dose (OBD). Motivated by a real-world drug development program, we propose a master-protocol-based platform trial design to simultaneously identify the OBDs of a new drug, combined with standards of care or other novel agents, in multiple indications. We propose a Bayesian latent subgroup model to accommodate treatment heterogeneity across indications and employ Bayesian hierarchical models to borrow information within subgroups. At each interim analysis, we update the subgroup membership, the dose-toxicity and dose-efficacy estimates, and the estimated utility for the risk-benefit tradeoff, based on the observed data across treatment arms, to inform arm-specific decisions on dose escalation and de-escalation and to identify the OBD for each combination partner and indication. A simulation study shows that the proposed design has desirable operating characteristics, providing a highly flexible and efficient way to optimize dose. The design has great potential to shorten the drug development timeline, save costs by reducing overlapping infrastructure, and speed up regulatory approval., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
47. Visibility graph-based covariance functions for scalable spatial analysis in non-convex partially Euclidean domains.
- Author
-
Gilbert B and Datta A
- Subjects
- Models, Statistical, Normal Distribution, Biometry methods, Algorithms, Spatial Analysis, Computer Simulation
- Abstract
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
48. Unit information Dirichlet process prior.
- Author
-
Gu J and Yin G
- Subjects
- Survival Analysis, Humans, Algorithms, Biometry methods, Data Interpretation, Statistical, Bayes Theorem, Markov Chains, Monte Carlo Method, Models, Statistical, Computer Simulation
- Abstract
Prior distributions, which represent one's belief in the distributions of unknown parameters before observing the data, impact Bayesian inference in a critical and fundamental way. With the ability to incorporate external information from expert opinions or historical datasets, the priors, if specified appropriately, can improve the statistical efficiency of Bayesian inference. In survival analysis, based on the concept of unit information (UI) under parametric models, we propose the unit information Dirichlet process (UIDP) as a new class of nonparametric priors for the underlying distribution of time-to-event data. By deriving the Fisher information in terms of the differential of the cumulative hazard function, the UIDP prior is formulated to match its prior UI with the weighted average of UI in historical datasets and thus can utilize both parametric and nonparametric information provided by historical datasets. With a Markov chain Monte Carlo algorithm, simulations and real data analysis demonstrate that the UIDP prior can adaptively borrow historical information and improve statistical efficiency in survival analysis., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
49. High-dimensional multivariate analysis of variance via geometric median and bootstrapping.
- Author
-
Cheng G, Lin R, and Peng L
- Subjects
- Humans, Multivariate Analysis, Models, Statistical, Female, Data Interpretation, Statistical, Gene Expression Profiling statistics & numerical data, Sample Size, Biometry methods, Breast Neoplasms genetics, Computer Simulation, Algorithms
- Abstract
The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
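The geometric median at the heart of the test statistic above can be computed with Weiszfeld's algorithm, sketched below on simulated data with gross outliers; the paper's Gaussian-approximation theory and wild bootstrap calibration are not reproduced here.

```python
# Sketch: geometric median via Weiszfeld's algorithm, a robust
# multi-dimensional location estimator.
import numpy as np

def geometric_median(X, tol=1e-8, max_iter=500):
    m = X.mean(axis=0)                       # start from the coordinate mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.where(d < tol, tol, d)        # guard against zero distances
        m_new = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

rng = np.random.default_rng(13)
X = rng.normal(size=(200, 50))               # high-dimensional sample, center 0
X[:10] += 25.0                               # gross outliers
print("mean distance from truth  :", np.linalg.norm(X.mean(axis=0)))
print("median distance from truth:", np.linalg.norm(geometric_median(X)))
```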
50. Integrating external summary information in the presence of prior probability shift: an application to assessing essential hypertension.
- Author
-
Chen C, Han P, Chen S, Shardell M, and Qin J
- Subjects
- Humans, Models, Statistical, Risk Factors, Hypertension, Data Interpretation, Statistical, Biometry methods, Computer Simulation, Probability, Essential Hypertension
- Abstract
Recent years have witnessed a rise in the popularity of information integration without sharing of raw data. By leveraging and incorporating summary information from external sources, internal studies can achieve enhanced estimation efficiency and prediction accuracy. However, a noteworthy challenge in utilizing summary-level information is accommodating the inherent heterogeneity across diverse data sources. In this study, we delve into the issue of prior probability shift between two cohorts, wherein the difference between the two data distributions depends on the outcome. We introduce a novel semi-parametric constrained optimization-based approach to integrate information within this framework, which has not been extensively explored in the existing literature. Our proposed method tackles the prior probability shift by introducing an outcome-dependent selection function and effectively addresses the estimation uncertainty associated with summary information from the external source. Our approach facilitates valid inference even in the absence of a known variance-covariance estimate from the external source. Through extensive simulation studies, we observe the superiority of our method over existing ones, showcasing minimal estimation bias and reduced variance for both binary and continuous outcomes. We further demonstrate the utility of our method through its application in investigating risk factors related to essential hypertension, where reduced estimation variability is observed after integrating summary information from an external data source., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF