475 results for "Sample size determination"
Search Results
2. Optimal multiple testing and design in clinical trials.
- Author
Heller, Ruth, Krieger, Abba, and Rosset, Saharon
- Subjects
Experimental design, Test design, Null hypothesis, False positive error, Sample size (Statistics)
- Abstract
A central goal in designing clinical trials is to find the test that maximizes power (or, equivalently, minimizes the required sample size) for detecting a false null hypothesis subject to a constraint on type I error. When there is more than one test, such as in clinical trials with multiple endpoints, the issues of optimal design and optimal procedures become more complex. In this paper, we address the question of how such optimal tests should be defined and how they can be found. We review different notions of power and how they relate to study goals, and also consider the requirements of type I error control and the nature of the procedures. This leads us to an explicit optimization problem with an objective and constraints that describe its specific desiderata. We present a complete solution for deriving optimal procedures for two hypotheses; the resulting procedures have desired monotonicity properties and are computationally simple. For some of the optimization formulations this yields optimal procedures that are identical to existing procedures, such as Hommel's procedure or the procedure of Bittman et al. (2009), while for other cases it yields novel procedures that are more powerful than existing ones. We demonstrate the nature of our novel procedures and their improved power extensively in simulations and on the APEX study (Cohen et al., 2016).
- Published
- 2023
- Full Text
- View/download PDF
3. Power analysis for cluster randomized trials with continuous coprimary endpoints.
- Author
Yang, Siyun, Moerbeek, Mirjam, Taljaard, Monica, and Li, Fan
- Subjects
Cluster randomized controlled trials, Pragmatics, Intraclass correlation, Cluster analysis (Statistics), Expectation-maximization algorithms, Null hypothesis, Sample size (Statistics)
- Abstract
Pragmatic trials evaluating health care interventions often adopt cluster randomization due to scientific or logistical considerations. Systematic reviews have shown that coprimary endpoints are not uncommon in pragmatic trials but are seldom recognized in sample size or power calculations. While methods for power analysis based on K (K ≥ 2) binary coprimary endpoints are available for cluster randomized trials (CRTs), to our knowledge, methods for continuous coprimary endpoints are not yet available. Assuming a multivariate linear mixed model (MLMM) that accounts for multiple types of intraclass correlation coefficients among the observations in each cluster, we derive the closed-form joint distribution of the K treatment effect estimators to facilitate sample size and power determination with different types of null hypotheses under equal cluster sizes. We characterize the relationship between the power of each test and the different types of correlation parameters. We further relax the equal cluster size assumption and approximate the joint distribution of the K treatment effect estimators through the mean and coefficient of variation of the cluster sizes. Our simulation studies with a finite number of clusters indicate that the power predicted by our method agrees well with the empirical power when the parameters in the MLMM are estimated via the expectation-maximization algorithm. An application to a real CRT is presented to illustrate the proposed method. (An illustrative code sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
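As a companion to the entry above, the sketch below approximates the joint power for K continuous coprimary endpoints in a cluster randomized trial by treating the K test statistics as multivariate normal and inflating the per-endpoint variance with the usual design effect. It is a generic approximation under simplifying assumptions (equal cluster sizes, a single ICC, and a common equicorrelation for the test statistics), not the closed-form joint distribution derived by Yang et al.

```python
# A minimal sketch, assuming a common ICC and a shared correlation between the
# K test statistics: approximate joint power that ALL K coprimary continuous
# endpoints reach one-sided significance in a cluster randomized trial.
import numpy as np
from scipy.stats import norm, multivariate_normal

def coprimary_power(effect_sizes, icc, stat_corr, clusters_per_arm,
                    cluster_size, alpha=0.025):
    """P(all K one-sided Z tests exceed the critical value)."""
    d = np.asarray(effect_sizes, dtype=float)        # standardized effect sizes
    K = len(d)
    n_per_arm = clusters_per_arm * cluster_size
    deff = 1.0 + (cluster_size - 1.0) * icc          # usual design effect
    ncp = d * np.sqrt(n_per_arm / (2.0 * deff))      # approx. E[Z_k] under H1
    z_crit = norm.ppf(1.0 - alpha)
    R = np.full((K, K), stat_corr)                   # assumed equicorrelation
    np.fill_diagonal(R, 1.0)
    # P(Z_k > z_crit for all k) = P(-Z_k < -z_crit for all k), with -Z ~ N(-ncp, R)
    return multivariate_normal.cdf(np.full(K, -z_crit), mean=-ncp, cov=R)

# Example: two endpoints with standardized effects 0.30 and 0.25
print(coprimary_power([0.30, 0.25], icc=0.05, stat_corr=0.4,
                      clusters_per_arm=15, cluster_size=20))
```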
4. Sample size considerations for stepped wedge designs with subclusters
- Author
Fan Li, Monica Taljaard, and Kendra Davis-Plourde
- Subjects
Statistics and Probability, Canonical link element, General Immunology and Microbiology, Covariance matrix, Applied Mathematics, Gaussian, General Medicine, General Biochemistry, Genetics and Molecular Biology, Generalized linear mixed model, Sample size determination, Linearization, General Agricultural and Biological Sciences, Cluster analysis, Algorithm, Eigenvalues and eigenvectors, Mathematics
- Abstract
The stepped wedge cluster randomized trial (SW-CRT) is an increasingly popular design for evaluating health service delivery or policy interventions. An essential consideration of this design is the need to account for both within-period and between-period correlations in sample size calculations. Especially when embedded in health care delivery systems, many SW-CRTs may have subclusters nested in clusters, within which outcomes are collected longitudinally. However, existing sample size methods that account for between-period correlations have not allowed for multiple levels of clustering. We present computationally efficient sample size procedures that properly differentiate within-period and between-period intracluster correlation coefficients in SW-CRTs in the presence of subclusters. We introduce an extended block exchangeable correlation matrix to characterize the complex dependencies of outcomes within clusters. For Gaussian outcomes, we derive a closed-form sample size expression that depends on the correlation structure only through two eigenvalues of the extended block exchangeable correlation structure. For non-Gaussian outcomes, we present a generic sample size algorithm based on linearization and elucidate simplifications under canonical link functions. For example, we show that the approximate sample size formula under a logistic linear mixed model depends on three eigenvalues of the extended block exchangeable correlation matrix. We provide an extension to accommodate unequal cluster sizes and validate the proposed methods via simulations. Finally, we illustrate our methods in two real SW-CRTs with subclusters.
- Published
- 2021
5. Interim monitoring in sequential multiple assignment randomized trials
- Author
Liwen Wu, Junyao Wang, and Abdus S. Wahed
- Subjects
Statistics and Probability, Computer science, Machine learning, Wald test, General Biochemistry, Genetics and Molecular Biology, Dynamic treatment regime, Randomized controlled trial, Interim, General Immunology and Microbiology, Applied Mathematics, Inverse probability weighting, General Medicine, Clinical trial, Sample size determination, Artificial intelligence, General Agricultural and Biological Sciences, Type I and type II errors
- Abstract
A sequential multiple assignment randomized trial (SMART) facilitates the comparison of multiple adaptive treatment strategies (ATSs) simultaneously. Previous studies have established a framework to test the homogeneity of multiple ATSs by a global Wald test through inverse probability weighting. SMARTs are generally lengthier than classical clinical trials due to the sequential nature of treatment randomization in multiple stages. Thus, it would be beneficial to add interim analyses allowing for an early stop if overwhelming efficacy is observed. We introduce group sequential methods to SMARTs to facilitate interim monitoring based on the multivariate chi-square distribution. Simulation studies demonstrate that the proposed interim monitoring in SMART (IM-SMART) maintains the desired type I error and power with reduced expected sample size compared to the classical SMART. Finally, we illustrate our method by reanalyzing a SMART assessing the effects of cognitive behavioral and physical therapies in patients with knee osteoarthritis and comorbid subsyndromal depressive symptoms.
- Published
- 2021
6. Group sequential testing for cluster randomized trials with time‐to‐event endpoint
- Author
Sin-Ho Jung and Jianghao Li
- Subjects
Statistics and Probability, General Immunology and Microbiology, Computer science, Applied Mathematics, General Medicine, Interim analysis, General Biochemistry, Genetics and Molecular Biology, Log-rank test, CRTs, Joint probability distribution, Sample size determination, Statistics, Fraction (mathematics), General Agricultural and Biological Sciences, Statistical hypothesis testing, Event (probability theory)
- Abstract
We propose group sequential methods for cluster randomized trials (CRTs) with a time-to-event endpoint. The alpha spending function approach is used for sequential data monitoring. The key to this approach is determining the joint distribution of the test statistics and the information fraction at the time of interim analysis. We prove that the sequentially computed log-rank statistics in CRTs do not have the independent increments property. We also propose an information fraction for group sequential trials with clustered survival data and a corresponding sample size determination approach. Extensive simulation studies are conducted to evaluate the performance of our proposed testing procedure with some existing alpha spending functions in terms of expected sample size and maximal sample size. Real study examples are used to demonstrate our method.
- Published
- 2021
7. On polygenic risk scores for complex traits prediction
- Author
Fei Zou and Bingxin Zhao
- Subjects
Statistics and Probability, Multifactorial Inheritance, Single-nucleotide polymorphism, Genome-wide association study, Polymorphism, Single Nucleotide, General Biochemistry, Genetics and Molecular Biology, Risk Factors, Statistics, Humans, Computer Simulation, Genetic Predisposition to Disease, Genetic association, Mathematics, General Immunology and Microbiology, Applied Mathematics, General Medicine, Heritability, Genetic architecture, Polygene, Sample size determination, Trait, General Agricultural and Biological Sciences
- Abstract
Polygenic risk scores (PRS) have gained substantial attention for complex trait prediction in genome-wide association studies (GWAS). Motivated by the polygenic model of complex traits, we study the statistical properties of PRS under the high-dimensional but sparsity-free setting where the triplet (n, p, m) → (∞, ∞, ∞), with n, p, m being the sample size, the number of assayed single-nucleotide polymorphisms (SNPs), and the number of assayed causal SNPs, respectively. First, we derive asymptotic results on the out-of-sample (prediction) R-squared for PRS. These results help explain the widely observed gap between the in-sample heritability (or partial R-squared due to the genetic features) estimate and the out-of-sample R-squared for most complex traits. Next, we investigate how features should be selected (e.g., by a p-value threshold) for constructing optimal PRS. We reveal that the optimal threshold depends largely on the genetic architecture underlying the complex trait and the sample size of the training GWAS, or the m/n ratio. For highly polygenic traits with a large m/n ratio, it is difficult to separate causal and null SNPs, and stringent feature selection often leads to poor PRS prediction. We numerically illustrate the theoretical results with intensive simulation studies and real data analysis on 33 complex traits with a wide range of genetic architectures in the UK Biobank database. (An illustrative code sketch follows this entry.)
- Published
- 2021
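To make the p-value thresholding discussed in the entry above concrete, here is a small simulated sketch: marginal GWAS effect estimates are computed in a training set, a polygenic risk score is built from SNPs passing a p-value threshold, and the score is evaluated out of sample. The data-generating model, sizes, and thresholds are illustrative assumptions; the code does not reproduce the authors' asymptotic analysis.

```python
# Hedged sketch: p-value-thresholded polygenic risk score (PRS) and its
# out-of-sample R-squared on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_train, n_test, p, m = 2000, 500, 1000, 50      # samples, SNPs, causal SNPs
maf = rng.uniform(0.05, 0.5, p)
G = rng.binomial(2, maf, size=(n_train + n_test, p)).astype(float)
beta = np.zeros(p); beta[:m] = rng.normal(0, 0.05, m)   # sparse-ish truth
y = G @ beta + rng.normal(0, 1, n_train + n_test)
G_tr, y_tr, G_te, y_te = G[:n_train], y[:n_train], G[n_train:], y[n_train:]

# Marginal GWAS in the training set: per-SNP regression slope and p-value.
bhat = np.empty(p); pval = np.empty(p)
for j in range(p):
    slope, _, _, pv, _ = stats.linregress(G_tr[:, j], y_tr)
    bhat[j], pval[j] = slope, pv

def prs_r2(threshold):
    """Out-of-sample R^2 of the PRS built from SNPs with p < threshold."""
    keep = pval < threshold
    if not keep.any():
        return 0.0
    score = G_te[:, keep] @ bhat[keep]
    return np.corrcoef(score, y_te)[0, 1] ** 2

for t in (5e-8, 1e-3, 0.05, 1.0):
    print(f"threshold {t:g}: out-of-sample R^2 = {prs_r2(t):.3f}")
```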
8. Multivariate survival analysis in big data: A divide‐and‐combine approach
- Author
Minge Xie, Shou-En Lu, John B. Kostis, Wei Wang, and Jerry Q Cheng
- Subjects
Statistics and Probability, Multivariate statistics, Computer science, Feature selection, Marginal model, General Biochemistry, Genetics and Molecular Biology, Consistency (statistics), Statistics, Computer Simulation, Equivalence (measure theory), Proportional Hazards Models, General Immunology and Microbiology, Applied Mathematics, Estimator, General Medicine, Survival Analysis, Sample size determination, Sample Size, Multivariate Analysis, Confidence distribution, General Agricultural and Biological Sciences
- Abstract
Multivariate failure time data are frequently analyzed using marginal proportional hazards models and frailty models. When the sample size is extraordinarily large, either approach can face computational challenges. In this paper, we focus on the marginal model approach and propose a divide-and-combine method to analyze large-scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We randomly divide the full data into multiple subsets and combine the estimators obtained from the individual subsets using a weighted method with three choices of weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator obtained from the full data as if the data were analyzed all at once. In addition, to screen out risk factors with weak signals, we propose performing regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide-and-combine approach and the full data approach, are studied. The performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular-related health outcomes. (An illustrative code sketch follows this entry.)
- Published
- 2021
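The sketch below illustrates the divide-and-combine idea from the entry above in its simplest form: split a large dataset, estimate the same coefficient in each subset, and pool the subset estimates with inverse-variance weights. Ordinary least squares stands in for the marginal proportional hazards model, and the weighting shown is just one possibility, so this is an illustration of the strategy rather than the authors' estimator.

```python
# Hedged sketch of divide-and-combine with inverse-variance weighting; OLS on
# simulated data stands in for the marginal survival model used in the paper.
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 1_000_000, 0.5
x = rng.normal(size=n)
y = beta_true * x + rng.normal(size=n)

def fit_subset(xs, ys):
    """Return (slope estimate, its variance) from simple OLS on one subset."""
    xc = xs - xs.mean()
    slope = np.dot(xc, ys) / np.dot(xc, xc)
    resid = ys - ys.mean() - slope * xc
    var = resid.var(ddof=2) / np.dot(xc, xc)
    return slope, var

K = 20
estimates, variances = zip(*(fit_subset(xs, ys)
                             for xs, ys in zip(np.array_split(x, K),
                                               np.array_split(y, K))))
w = 1.0 / np.asarray(variances)
combined = np.sum(w * np.asarray(estimates)) / np.sum(w)
combined_se = np.sqrt(1.0 / np.sum(w))
print(f"combined estimate {combined:.4f} (SE {combined_se:.4f})")
```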
9. Power and sample size for observational studies of point exposure effects
- Author
Bonnie E. Shook-Sa and Michael G. Hudgens
- Subjects
Statistics and Probability, General Biochemistry, Genetics and Molecular Biology, Statistical power, Design effect, Statistics, Probability, Mathematics, Models, Statistical, General Immunology and Microbiology, Applied Mathematics, Inverse probability weighting, Confounding, Estimator, General Medicine, Weighting, Causality, Research Design, Sample size determination, Sample Size, Causal inference, General Agricultural and Biological Sciences
- Abstract
Inverse probability of treatment weights are commonly used to control for confounding when estimating causal effects of point exposures from observational data. When planning a study that will be analyzed with inverse probability of treatment weights, determining the required sample size for a given level of statistical power is challenging because of the effect of weighting on the variance of the estimated causal means. This paper considers the utility of the design effect to quantify the effect of weighting on the precision of causal estimates. The design effect is defined as the ratio of the variance of the causal mean estimator to the variance of a naïve estimator if, counter to fact, no confounding had been present and weights were not needed. A simple, closed-form approximation of the design effect is derived that is outcome invariant and can be estimated during the study design phase. Once the design effect is approximated for each treatment group, sample size calculations are conducted as for a randomized trial, but with variances inflated by the design effects to account for weighting. Simulations demonstrate the accuracy of the design effect approximation, and practical considerations are discussed. (An illustrative code sketch follows this entry.)
- Published
- 2020
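The entry above uses a design effect to carry a randomized-trial sample size calculation over to an IPTW analysis. The sketch below shows the general workflow with Kish's classical approximation, deff = 1 + CV^2 of the weights, as a commonly cited stand-in; the paper derives its own closed-form, outcome-invariant approximation, which is not reproduced here. The weight distribution and effect size are illustrative assumptions.

```python
# Hedged sketch: inflate a two-sample z-test sample size by a weighting design
# effect (Kish's approximation is used here, not the paper's formula).
import numpy as np
from scipy.stats import norm

def kish_design_effect(weights):
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w)**2

def required_n_per_group(effect, sd, deff, alpha=0.05, power=0.9):
    """Two-sample z-test sample size per group, inflated by the design effect."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_rct = 2 * (z * sd / effect) ** 2          # unweighted randomized-trial n
    return int(np.ceil(n_rct * deff))

# Illustrative inverse-probability-of-treatment weights for one group
rng = np.random.default_rng(7)
propensity = rng.beta(2, 5, size=500)
weights = 1.0 / propensity
deff = kish_design_effect(weights)
print(f"design effect ~ {deff:.2f}")
print("n per group:", required_n_per_group(effect=0.3, sd=1.0, deff=deff))
```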
10. Restricted mean survival time as a function of restriction time
- Author
Yingchao Zhong and Douglas E. Schaubel
- Subjects
Statistics and Probability, Generalized linear model, Computer science, General Biochemistry, Genetics and Molecular Biology, Covariate, Statistics, Truncation (statistics), Proportional Hazards Models, General Immunology and Microbiology, Applied Mathematics, Regression analysis, General Medicine, Survival Analysis, Regression, Survival Rate, Sample size determination, Sample Size, RMST, Metric (mathematics), General Agricultural and Biological Sciences
- Abstract
Restricted mean survival time (RMST) is a clinically interpretable and meaningful survival metric that has gained popularity in recent years. Several methods are available for regression modeling of RMST, most based on pseudo-observations or on what is essentially an inverse-weighted complete-case analysis. No existing RMST regression method allows the covariate effects to be expressed as functions over time. This is a considerable limitation, in light of the many hazard regression methods that do accommodate such effects. To address this void in the literature, we propose RMST methods that permit estimating time-varying effects. In particular, we propose an inference framework for directly modeling RMST as a continuous function of the restriction time L. Large-sample properties are derived. Simulation studies are performed to evaluate the performance of the methods in finite sample sizes. The proposed framework is applied to kidney transplant data obtained from the Scientific Registry of Transplant Recipients (SRTR). (An illustrative code sketch follows this entry.)
- Published
- 2020
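To fix ideas for the entry above, the sketch below computes the nonparametric RMST, the area under the Kaplan-Meier curve up to a restriction time L, at several values of L. It illustrates the estimand being modeled as a function of L; the regression framework and large-sample theory of the paper are not reproduced, and the simulated data are an assumption.

```python
# Hedged sketch: RMST(L) as the area under a Kaplan-Meier curve on [0, L].
import numpy as np

def kaplan_meier(time, event):
    """Return (sorted event times, KM survival estimates at those times)."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    surv, s = [], 1.0
    uniq = np.unique(time[event == 1])
    for t in uniq:
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        s *= 1.0 - d / at_risk
        surv.append(s)
    return uniq, np.asarray(surv)

def rmst(time, event, L):
    """Step-function integral of the KM survival curve from 0 to L."""
    t, s = kaplan_meier(time, event)
    grid = np.concatenate(([0.0], t[t < L], [L]))
    steps = np.concatenate(([1.0], s[t < L]))
    return np.sum(steps * np.diff(grid))

rng = np.random.default_rng(3)
latent = rng.exponential(10, 300)
censor = rng.exponential(15, 300)
obs, event = np.minimum(latent, censor), (latent <= censor).astype(int)
for L in (5, 10, 20):
    print(f"RMST({L}) = {rmst(obs, event, L):.2f}")
```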
11. Poststratification fusion learning in longitudinal data analysis
- Author
Peter X.-K. Song and Lu Tang
- Subjects
Data Analysis, Statistics and Probability, Computer science, Feature selection, Machine learning, Regularization (mathematics), General Biochemistry, Genetics and Molecular Biology, Statistical power, Statistical inference, Humans, Computer Simulation, Attrition, Longitudinal Studies, Generalized estimating equation, Models, Statistical, General Immunology and Microbiology, Applied Mathematics, General Medicine, Sample size determination, Parametric model, Artificial intelligence, General Agricultural and Biological Sciences
- Abstract
Stratification is a very commonly used approach in biomedical studies to handle sample heterogeneity arising from, for example, clinical units, patient subgroups, or missing data. A key rationale behind such an approach is to overcome potential sampling biases in statistical inference. Two issues with such a stratification-based strategy are (i) whether individual strata are sufficiently distinctive to warrant stratification, and (ii) sample size attrition resulting from the stratification, which may potentially lead to loss of statistical power. To address these issues, we propose a penalized generalized estimating equations approach to reducing the complexity of parametric model structures due to excessive stratification. Specifically, we develop a data-driven fusion learning approach for longitudinal data that improves estimation efficiency by integrating information across similar strata, yet still allows necessary separation for stratum-specific conclusions. The proposed method is evaluated by simulation studies and applied to a motivating example from a psychiatric study to demonstrate its usefulness in real-world settings.
- Published
- 2020
12. Discussion on 'Predictively consistent prior effective sample sizes,' by Beat Neuenschwander, Sebastian Weber, Heinz Schmidli, and Anthony O'Hagan
- Author
Atanu Biswas and Jean-François Angers
- Subjects
Statistics and Probability, Multivariate statistics, Binary number, General Biochemistry, Genetics and Molecular Biology, Dirichlet distribution, Response vector, Clinical Trials, Phase II as Topic, Prior probability, Applied mathematics, Mathematics, General Immunology and Microbiology, Applied Mathematics, General Medicine, Effective sample size, Sample size determination, Sample Size, General Agricultural and Biological Sciences
- Abstract
We extend the approach of finding the effective sample size for a typical phase II clinical trial having efficacy and toxicity as the two components of the response vector. The case of binary efficacy and binary toxicity is illustrated under Dirichlet and multivariate t priors. (An illustrative code sketch follows this entry.)
- Published
- 2020
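For readers new to the prior effective sample size discussed in this entry and the related discussions and rejoinder listed below, the sketch shows the classical conjugate-prior version: a Beta(a, b) prior acts like a + b pseudo-observations, and a Dirichlet prior like the sum of its concentration parameters. The discussed paper and its discussants generalize well beyond this simple case; the bivariate efficacy/toxicity example is an illustrative assumption.

```python
# Hedged sketch: classical conjugate-prior effective sample size (ESS).
import numpy as np

def beta_ess(a: float, b: float) -> float:
    """ESS of a Beta(a, b) prior on a binomial probability."""
    return a + b

def dirichlet_ess(alpha) -> float:
    """ESS of a Dirichlet(alpha) prior on multinomial cell probabilities."""
    return float(np.sum(alpha))

# Example: four efficacy/toxicity cells (eff&tox, eff&no-tox, no-eff&tox,
# no-eff&no-tox) with a weakly informative Dirichlet prior.
print(beta_ess(2, 3))                         # 5.0 "pseudo-patients"
print(dirichlet_ess([0.5, 1.0, 0.5, 1.0]))    # 3.0 "pseudo-patients"
```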
13. Discussion on 'Predictively consistent prior effective sample sizes,' by Beat Neuenschwander, Sebastian Weber, Heinz Schmidli, and Anthony O'Hagan
- Author
Ruitao Lin and Yeonhee Park
- Subjects
Statistics and Probability, General Immunology and Microbiology, Sample size determination, Applied Mathematics, General Medicine, General Agricultural and Biological Sciences, General Biochemistry, Genetics and Molecular Biology, Mathematics
- Published
- 2020
14. Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer‐Lemeshow test
- Author
Giovanni Nattino, Stanley Lemeshow, and Michael L. Pennell
- Subjects
Statistics and Probability, General Immunology and Microbiology, Calibration (statistics), Applied Mathematics, General Medicine, Logistic regression, General Biochemistry, Genetics and Molecular Biology, Confidence interval, Hosmer–Lemeshow test, Goodness of fit, Sample size determination, Statistics, General Agricultural and Biological Sciences, Statistic, Statistical hypothesis testing, Mathematics
- Abstract
Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness of fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities are increasingly likely to cause the rejection of the hypothesis of perfect fit in larger and larger samples. This phenomenon has been widely documented for popular goodness of fit tests, such as the Hosmer-Lemeshow test. To address this limitation, we propose a modification of the Hosmer-Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer-Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. The proposed method is compared in a simulation study with a competing modification of the Hosmer-Lemeshow test based on repeated subsampling. We provide a step-by-step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300 000 observations. (An illustrative code sketch follows this entry.)
- Published
- 2020
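For reference alongside the entry above, the sketch below computes the classical Hosmer-Lemeshow statistic with groups formed from the sorted predicted probabilities; it is this statistic's noncentrality parameter that the paper standardizes. The grouping rule and toy data are illustrative assumptions, and the proposed modification itself is not implemented here.

```python
# Hedged sketch: the standard Hosmer-Lemeshow goodness-of-fit statistic.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Return (HL statistic, p-value) using groups of sorted predicted risk."""
    y, p_hat = np.asarray(y, float), np.asarray(p_hat, float)
    order = np.argsort(p_hat)
    groups = np.array_split(order, n_groups)
    stat = 0.0
    for g in groups:
        obs = y[g].sum()            # observed events in the group
        exp = p_hat[g].sum()        # expected events in the group
        n_g = len(g)
        pbar = exp / n_g
        stat += (obs - exp) ** 2 / (n_g * pbar * (1 - pbar))
    return stat, chi2.sf(stat, df=n_groups - 2)

# Toy data: well-calibrated predictions should give a large p-value.
rng = np.random.default_rng(11)
p_true = rng.uniform(0.05, 0.95, 5000)
y = rng.binomial(1, p_true)
print(hosmer_lemeshow(y, p_true))
```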
15. Power analysis for cluster randomized trials with multiple binary co‐primary endpoints
- Author
Song Zhang, Dateng Li, and Jing Cao
- Subjects
Statistics and Probability, General Immunology and Microbiology, Computer science, Applied Mathematics, General Medicine, General Biochemistry, Genetics and Molecular Biology, Correlation, Clinical trial, Randomized controlled trial, Joint probability distribution, Sample size determination, Sample Size, Cluster Analysis, Computer Simulation, Data mining, General Agricultural and Biological Sciences, Generalized estimating equation, Randomized Controlled Trials as Topic, Statistical hypothesis testing, Type I and type II errors
- Abstract
Cluster randomized trials (CRTs) are widely used in different areas of medicine and public health. Recently, with increasing complexity of medical therapies and technological advances in monitoring multiple outcomes, many clinical trials attempt to evaluate multiple co-primary endpoints. In this study, we present a power analysis method for CRTs with K≥2 binary co-primary endpoints. It is developed based on the GEE (generalized estimating equation) approach, and three types of correlations are considered: inter-subject correlation within each endpoint, intra-subject correlation across endpoints, and inter-subject correlation across endpoints. A closed-form joint distribution of the K test statistics is derived, which facilitates the evaluation of power and type I error for arbitrarily constructed hypotheses. We further present a theorem that characterizes the relationship between various correlations and testing power. We assess the performance of the proposed power analysis method based on extensive simulation studies. An application example to a real clinical trial is presented.
- Published
- 2020
16. Sample size and power for the weighted log‐rank test and Kaplan‐Meier based tests with allowance for nonproportional hazards
- Author
Godwin Yung and Yi Liu
- Subjects
Statistics and Probability, Randomization Ratio, Percentile, Kaplan-Meier Estimate, Wald test, General Biochemistry, Genetics and Molecular Biology, Statistics, Humans, Proportional Hazards Models, Statistical hypothesis testing, Mathematics, General Immunology and Microbiology, Applied Mathematics, Nonparametric statistics, General Medicine, Asymptotic theory (statistics), Survival Analysis, Survival Rate, Log-rank test, Sample size determination, Sample Size, General Agricultural and Biological Sciences
- Abstract
Asymptotic distributions under alternative hypotheses and their corresponding sample size and power equations are derived for nonparametric test statistics commonly used to compare two survival curves. Test statistics include the weighted log-rank test and the Wald test for the difference in (or ratio of) Kaplan-Meier survival probability, percentile survival, and restricted mean survival time. Accrual, survival, and loss to follow-up are allowed to follow any arbitrary continuous distribution. We show that Schoenfeld's equation, often used by practitioners to calculate the required number of events for the unweighted log-rank test, can be inaccurate even when the proportional hazards (PH) assumption holds. In fact, it can mislead one to believe that 1:1 is the optimal randomization ratio (RR), when actually power can be gained by assigning more patients to the active arm. Meaningful improvements to Schoenfeld's equation are made. The present theory should be useful in designing clinical trials, particularly in immuno-oncology, where nonproportional hazards are frequently encountered. We illustrate the application of our theory with an example exploring the optimal RR under PH and a second example examining the impact of delayed treatment effect. A companion R package npsurvSS is available for download on CRAN. (An illustrative code sketch follows this entry.)
- Published
- 2019
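The entry above critiques Schoenfeld's classical events formula; for context, the sketch below implements that benchmark formula for the unweighted log-rank test under proportional hazards. Note that under this formula a 1:1 allocation always minimizes the required number of events, which is exactly the conclusion the paper shows can be misleading. The hazard ratio and allocation ratios are illustrative assumptions.

```python
# Hedged sketch: Schoenfeld's classical required-events formula for the
# unweighted log-rank test under proportional hazards.
import numpy as np
from scipy.stats import norm

def schoenfeld_events(hr, r=1.0, alpha=0.05, power=0.9):
    """Total events for a two-sided level-alpha log-rank test; r = active:control."""
    p_active = r / (1.0 + r)                 # allocation proportion, active arm
    p_control = 1.0 - p_active
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(z**2 / (p_active * p_control * np.log(hr) ** 2)))

for r in (1.0, 1.5, 2.0):
    print(f"allocation {r}:1 -> events = {schoenfeld_events(hr=0.7, r=r)}")
```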
17. Discussion on 'Predictively Consistent Prior Effective Sample Sizes' by Beat Neuenschwander, Sebastian Weber, Heinz Schmidli, and Anthony O'Hagan
- Author
Alexander M. Kaizer and John Kittelson
- Subjects
Statistics and Probability, General Immunology and Microbiology, Computer science, Sample size determination, Applied Mathematics, General Medicine, General Agricultural and Biological Sciences, General Biochemistry, Genetics and Molecular Biology
- Published
- 2020
18. Power calculation for cross‐sectional stepped wedge cluster randomized trials with variable cluster sizes
- Author
Linda Harrison, Tom Chen, and Rui Wang
- Subjects
Statistics and Probability, Upper and lower bounds, General Biochemistry, Genetics and Molecular Biology, Design effect, Statistics, Cluster Analysis, Randomized Controlled Trials as Topic, Mathematics, General Immunology and Microbiology, Applied Mathematics, Estimator, General Medicine, Cross-Sectional Studies, Efficiency, Research Design, Sample size determination, Sample Size, General Agricultural and Biological Sciences, Algorithms, Realization (probability), Arithmetic mean
- Abstract
Standard sample size calculation formulas for stepped wedge cluster randomized trials (SW-CRTs) assume that cluster sizes are equal. When cluster sizes vary substantially, ignoring this variation may lead to an under-powered study. We investigate the relative efficiency of a SW-CRT with varying cluster sizes to equal cluster sizes, and derive variance estimators for the intervention effect that account for this variation under a mixed effects model; a commonly-used approach for analyzing data from cluster randomized trials. When cluster sizes vary, the power of a SW-CRT depends on the order in which clusters receive the intervention, which is determined through randomization. We first derive a variance formula that corresponds to any particular realization of the randomized sequence and propose efficient algorithms to identify upper and lower bounds of the power. We then obtain an “expected” power based on a first-order approximation to the variance formula, where the expectation is taken with respect to all possible randomization sequences. Finally, we provide a variance formula for more general settings where only the cluster size arithmetic mean and coefficient of variation, instead of exact cluster sizes, are known in the design stage. We evaluate our methods through simulations and illustrate that the average power of a SW-CRT decreases as the variation in cluster sizes increases, and the impact is largest when the number of clusters is small.
- Published
- 2019
19. Quantification of prior impact in terms of effective current sample size
- Author
Manuel Wiesenfarth and Silvia Calderazzo
- Subjects
Statistics and Probability, Biometry, Computer science, Bayesian probability, Machine learning, General Biochemistry, Genetics and Molecular Biology, Current sample, Equating, Prior probability, Humans, Computer Simulation, Prior information, Measure (data warehouse), Models, Statistical, General Immunology and Microbiology, Adaptive Clinical Trials as Topic, Applied Mathematics, Bayes Theorem, General Medicine, Data model, Sample size determination, Data Interpretation, Statistical, Sample Size, Artificial intelligence, General Agricultural and Biological Sciences
- Abstract
Bayesian methods allow borrowing of historical information through prior distributions. The concept of prior effective sample size (prior ESS) facilitates quantification and communication of such prior information by equating it to a sample size. Prior information can arise from historical observations; thus, the traditional approach identifies the ESS with such a historical sample size. However, this measure is independent of newly observed data, and thus would not capture an actual "loss of information" induced by the prior in case of prior-data conflict. We build on a recent work to relate prior impact to the number of (virtual) samples from the current data model and introduce the effective current sample size (ECSS) of a prior, tailored to the application in Bayesian clinical trial designs. Special emphasis is put on robust mixture, power, and commensurate priors. We apply the approach to an adaptive design in which the number of recruited patients is adjusted depending on the effective sample size at an interim analysis. We argue that the ECSS is the appropriate measure in this case, as the aim is to save current (as opposed to historical) patients from recruitment. Furthermore, the ECSS can help overcome lack of consensus in the ESS assessment of mixture priors and can, more broadly, provide further insights into the impact of priors. An R package accompanies the paper.
- Published
- 2019
20. Simulation‐selection‐extrapolation: Estimation in high‐dimensional errors‐in‐variables models
- Author
Linh Nghiem and Cornelis J. Potgieter
- Subjects
Statistics and Probability, Generalized linear model, Computer science, Feature selection, Wilms Tumor, General Biochemistry, Genetics and Molecular Biology, Covariate, Methods, Humans, Proportional Hazards Models, Models, Statistical, General Immunology and Microbiology, Gene Expression Profiling, Applied Mathematics, Linear model, Estimator, Regression analysis, General Medicine, Microarray Analysis, Logistic Models, Sample size determination, Sample Size, Linear Models, Errors-in-variables models, Scientific Experimental Error, General Agricultural and Biological Sciences, Algorithm
- Abstract
Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.
- Published
- 2019
21. Analysis of covariance in randomized trials: More precision and valid confidence intervals, without model assumptions
- Author
Michael Rosenblum, Bingkai Wang, and Elizabeth L. Ogburn
- Subjects
Statistics and Probability, Average treatment effect, General Biochemistry, Genetics and Molecular Biology, Covariate, Statistics, Confidence Intervals, Humans, Computer Simulation, Point estimation, Randomized Controlled Trials as Topic, Mathematics, Analysis of covariance, Analysis of Variance, Models, Statistical, General Immunology and Microbiology, Mental Disorders, Applied Mathematics, Linear model, General Medicine, Confidence interval, Treatment Outcome, Standard error, Sample size determination, Data Interpretation, Statistical, Sample Size, Linear Models, General Agricultural and Biological Sciences
- Abstract
"Covariate adjustment" in the randomized trial context refers to an estimator of the average treatment effect that adjusts for chance imbalances between study arms in baseline variables (called "covariates"). The baseline variables could include, for example, age, sex, disease severity, and biomarkers. According to two surveys of clinical trial reports, there is confusion about the statistical properties of covariate adjustment. We focus on the analysis of covariance (ANCOVA) estimator, which involves fitting a linear model for the outcome given the treatment arm and baseline variables, and trials that use simple randomization with equal probability of assignment to treatment and control. We prove the following new (to the best of our knowledge) robustness property of ANCOVA to arbitrary model misspecification: Not only is the ANCOVA point estimate consistent (as proved by Yang and Tsiatis, 2001) but so is its standard error. This implies that confidence intervals and hypothesis tests conducted as if the linear model were correct are still asymptotically valid even when the linear model is arbitrarily misspecified, for example, when the baseline variables are nonlinearly related to the outcome or there is treatment effect heterogeneity. We also give a simple, robust formula for the variance reduction (equivalently, sample size reduction) from using ANCOVA. By reanalyzing completed randomized trials for mild cognitive impairment, schizophrenia, and depression, we demonstrate how ANCOVA can achieve variance reductions of 4 to 32%.
- Published
- 2019
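The simulation sketch below illustrates the message of the entry above: in a simulated randomized trial, the ANCOVA estimator with heteroskedasticity-robust standard errors remains usable under a deliberately misspecified (linear) working model while giving a smaller standard error than the unadjusted difference in means. The data-generating process is an illustrative assumption, not one analyzed in the paper.

```python
# Hedged sketch: unadjusted vs. ANCOVA treatment effect estimates in a
# simulated trial with a nonlinear baseline-outcome relationship.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
baseline = rng.normal(size=n)                       # prognostic baseline variable
treat = rng.binomial(1, 0.5, size=n)                # simple 1:1 randomization
outcome = (0.4 * treat + 1.5 * baseline + 0.5 * baseline**2
           + rng.normal(size=n))                    # working model is misspecified

# Unadjusted analysis: outcome ~ treatment
unadj = sm.OLS(outcome, sm.add_constant(treat)).fit(cov_type="HC1")
# ANCOVA: outcome ~ treatment + baseline (linear working model)
X = sm.add_constant(np.column_stack([treat, baseline]))
ancova = sm.OLS(outcome, X).fit(cov_type="HC1")

print(f"unadjusted: {unadj.params[1]:.3f} (SE {unadj.bse[1]:.3f})")
print(f"ANCOVA:     {ancova.params[1]:.3f} (SE {ancova.bse[1]:.3f})")
```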
22. Improved methods for estimating abundance and related demographic parameters from mark‐resight data
- Author
Gary C. White, Brett T. McClintock, and Moira A. Pryde
- Subjects
Population size, Statistics and Probability, Computer science, Poisson distribution, Models, Biological, Survival, General Biochemistry, Genetics and Molecular Biology, Songbirds, Mark and recapture, Capture-recapture, Statistics, Animals, Program NOREMARK, Demography, Population Density, Models, Statistical, General Immunology and Microbiology, Applied Mathematics, Estimator, Sampling (statistics), General Medicine, Random effects model, Capture-resight, Confidence interval, Petroica, Sample size determination, Sample Size, Biometric Methodology, General Agricultural and Biological Sciences
- Abstract
Over the past decade, there has been much methodological development for the estimation of abundance and related demographic parameters using mark‐resight data. Often viewed as a less‐invasive and less‐expensive alternative to conventional mark recapture, mark‐resight methods jointly model marked individual encounters and counts of unmarked individuals, and recent extensions accommodate common challenges associated with imperfect detection. When these challenges include both individual detection heterogeneity and an unknown marked sample size, we demonstrate several deficiencies associated with the most widely used mark‐resight models currently implemented in the popular capture‐recapture freeware Program MARK. We propose a composite likelihood solution based on a zero‐inflated Poisson log‐normal model and find the performance of this new estimator to be superior in terms of bias and confidence interval coverage. Under Pollock's robust design, we also extend the models to accommodate individual‐level random effects across sampling occasions as a potentially more realistic alternative to models that assume independence. As a motivating example, we revisit a previous analysis of mark‐resight data for the New Zealand Robin (Petroica australis) and compare inferences from the proposed estimators. For the all‐too‐common situation where encounter rates are low, individual detection heterogeneity is non‐negligible, and the number of marked individuals is unknown, we recommend practitioners use the zero‐inflated Poisson log‐normal mark‐resight estimator as now implemented in Program MARK.
- Published
- 2019
23. Double‐wavelet transform for multisubject task‐induced functional magnetic resonance imaging data
- Author
Hakmook Kang, David Badre, and Minchun Zhou
- Subjects
Statistics and Probability, Spatial correlation, Covariance function, Computer science, Models, Neurological, Wavelet Analysis, General Biochemistry, Genetics and Molecular Biology, Correlation, Spatio-Temporal Analysis, Wavelet, Humans, Brain Mapping, General Immunology and Microbiology, Covariance matrix, Applied Mathematics, Reproducibility of Results, Wavelet transform, Pattern recognition, General Medicine, Magnetic Resonance Imaging, Sample size determination, Artificial intelligence, General Agricultural and Biological Sciences, Functional magnetic resonance imaging, Algorithms
- Abstract
The goal of this article is to model multisubject task-induced functional magnetic resonance imaging (fMRI) responses among predefined regions of interest (ROIs) of the human brain. Conventional approaches to fMRI analysis only take into account temporal correlations but do not rigorously model the underlying spatial correlation, due to the complexity of estimating and inverting the high-dimensional spatio-temporal covariance matrix. Other spatio-temporal model approaches estimate the covariance matrix under the assumption of stationary time series, which is not always feasible. To address these limitations, we propose a double-wavelet approach for modeling the spatio-temporal brain process. Working with wavelet coefficients simplifies the temporal and spatial covariance structure because, under regularity conditions, wavelet coefficients are approximately uncorrelated. Different wavelet functions were used to capture different correlation structures in the spatio-temporal model. The main advantages of the wavelet approach are that it is scalable and that it deals with nonstationarity in brain signals. Simulation studies showed that our method could reduce false-positive and false-negative rates by taking into account spatial and temporal correlations simultaneously. We also applied our method to fMRI data to study activation in prespecified ROIs in the prefrontal cortex. Data analysis showed that the result using the double-wavelet approach was more consistent than that of the conventional approach as the sample size decreased.
- Published
- 2019
24. Sample size formula for general win ratio analysis
- Author
Lu Mao, KyungMann Kim, and Xinran Miao
- Subjects
Statistics and Probability, Population, U-statistic, General Biochemistry, Genetics and Molecular Biology, Odds, Statistics, Test statistic, Odds Ratio, Humans, Computer Simulation, Mathematics, Proportional Hazards Models, General Immunology and Microbiology, Applied Mathematics, Nonparametric statistics, General Medicine, Outcome (probability), Sample size determination, Cardiovascular Diseases, Sample Size, Mann–Whitney U test, General Agricultural and Biological Sciences
- Abstract
Originally proposed for the analysis of prioritized composite endpoints, the win ratio has now expanded into a broad class of methodology based on general pairwise comparisons. Complicated by the non-i.i.d. structure of the test statistic, however, sample size estimation for the win ratio has lagged behind. In this article, we develop general and easy-to-use formulas to calculate sample size for win ratio analysis of different outcome types. In a nonparametric setting, the null variance of the test statistic is derived using U-statistic theory in terms of a dispersion parameter called the standard rank deviation, an intrinsic characteristic of the null outcome distribution and the user-defined rule of comparison. The effect size can be hypothesized either on the original scale of the population win ratio, or on the scale of a "usual" effect size suited to the outcome type. The latter approach allows one to measure the effect size by, for example, the odds/continuation ratio for totally/partially ordered outcomes and hazard ratios for composite time-to-event outcomes. Simulation studies show that the derived formulas provide accurate estimates of the required sample size across different settings. As illustration, real data from two clinical studies of hepatic and cardiovascular diseases are used as pilot data to calculate sample sizes for future trials. (An illustrative code sketch follows this entry.)
- Published
- 2021
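To make the pairwise-comparison construction in the entry above concrete, the sketch below computes a sample win ratio for a prioritized composite with a survival-type first component and a continuous tie-breaker. It ignores censoring and uses made-up data and an assumed rule of comparison; the paper's sample size formula based on the U-statistic null variance is not reproduced.

```python
# Hedged sketch: sample win ratio from all treatment-control pairs.
import numpy as np

def pairwise_win_ratio(trt, ctl):
    """trt, ctl: arrays of shape (n, 2) = (survival time, secondary outcome).
    Treatment 'wins' a pair if its survival time is longer; exact ties on the
    first component are broken by the larger secondary outcome."""
    wins = losses = 0
    for t_surv, t_sec in trt:
        for c_surv, c_sec in ctl:
            if t_surv > c_surv or (t_surv == c_surv and t_sec > c_sec):
                wins += 1
            elif t_surv < c_surv or (t_surv == c_surv and t_sec < c_sec):
                losses += 1
    return wins / losses if losses else np.inf

rng = np.random.default_rng(5)
treated = np.column_stack([rng.exponential(12, 100), rng.normal(1, 1, 100)])
control = np.column_stack([rng.exponential(10, 100), rng.normal(0, 1, 100)])
print(f"win ratio = {pairwise_win_ratio(treated, control):.2f}")
```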
25. Information-incorporated Gaussian graphical model for gene expression data
- Author
Cunjie Lin, Huangdi Yi, Qingzhao Zhang, and Shuangge Ma
- Subjects
Statistics and Probability, Computer science, Gaussian, Normal Distribution, Gene Expression, General Biochemistry, Genetics and Molecular Biology, Gene Regulatory Networks, Graphical model, Conditional dependence, Models, Statistical, General Immunology and Microbiology, Applied Mathematics, General Medicine, Sample size determination, Data mining, General Agricultural and Biological Sciences, Algorithms, Network analysis
- Abstract
In the analysis of gene expression data, network approaches take a system perspective and have played an irreplaceably important role. Gaussian graphical models (GGMs) have been popular in the network analysis of gene expression data. They investigate the conditional dependence between genes and "transform" the problem of estimating network structures into a sparse estimation of precision matrices. When there is a moderate to large number of genes, the number of parameters to be estimated may overwhelm the limited sample size, leading to unreliable estimation and selection. In this article, we propose incorporating information from previous studies (for example, those deposited at PubMed) to assist estimating the network structure in the present data. It is recognized that such information can be partial, biased, or even wrong. A penalization-based estimation approach is developed, shown to have consistency properties, and realized using an effective computational algorithm. Simulation demonstrates its competitive performance under various information accuracy scenarios. The analysis of TCGA lung cancer prognostic genes leads to network structures different from the alternatives.
- Published
- 2020
26. Improving precision and power in randomized trials for COVID-19 treatments using covariate adjustment, for binary, ordinal, and time-to-event outcomes
- Author
Iván Díaz, David Benkeser, Daniel O. Scharfstein, Jodi B Segal, Michael Rosenblum, and Alex Luedtke
- Subjects
Statistics and Probability, Coronavirus disease 2019 (COVID-19), Inference, General Biochemistry, Genetics and Molecular Biology, Survival analysis, Biometric Practice, Randomized controlled trial, COVID-19, Statistics, Covariate, Randomized trial, Medicine, Humans, Event (probability theory), Covariate adjustment, Ordinal outcomes, Randomized Controlled Trials as Topic, General Immunology and Microbiology, SARS-CoV-2, Applied Mathematics, Estimator, General Medicine, United States, COVID-19 Drug Treatment, Clinical trial, Hospitalization, Treatment Outcome, Sample size determination, General Agricultural and Biological Sciences
- Abstract
We aim to help inform the choice of estimand (i.e., target of inference) and analysis method to be used in future COVID-19 treatment trials. To this end, we describe estimands for outcome types of particular interest in these trials (ordinal and time-to-event). When the outcome is ordinal, the estimands that we consider are the difference between study arms in the mean outcome, the Mann-Whitney (rank-based) estimand, and the average of the cumulative log odds ratios over the levels of the outcome. For time-to-event outcomes, we consider the difference between arms in the restricted mean survival time, the difference between arms in the cumulative incidence, and the relative risk. Advantageously, the interpretability of these estimands does not rely on proportional odds or proportional hazards assumptions. For each estimand, we evaluate the potential value added by using estimators that leverage information in baseline variables to improve precision and reduce the required sample size to achieve a desired power. These are called covariate-adjusted estimators. To evaluate the performance of the covariate-adjusted and unadjusted estimators that we present, we simulate two-arm randomized trials comparing a hypothetical COVID-19 treatment versus standard of care, where the primary outcome is ordinal or time-to-event. Our simulated distributions are derived from two sources: longitudinal data on over 500 patients hospitalized at Weill Cornell Medicine New York Presbyterian Hospital prior to March 28, 2020, and a CDC preliminary description of 2449 cases reported to the CDC from February 12 to March 16, 2020. We focus on hospitalized, COVID-19 positive patients and consider the following outcomes: intubation, ventilator use, and death. We conduct simulations using all three estimands when the outcome is ordinal, but only evaluate the restricted mean survival time when the outcome is time-to-event. Our simulations showed that, in trials with at least 200 participants, the precision gains due to covariate adjustment are equivalent to requiring 10-20% fewer participants to achieve the same power as a trial that uses the unadjusted estimator; this was the case for each outcome and estimand that we considered.
- Published
- 2020
27. Rejoinder to 'Predictively consistent prior effective sample sizes,' by Beat Neuenschwander, Sebastian Weber, Heinz Schmidli, and Anthony O'Hagan
- Author
Sebastian Weber, Beat Neuenschwander, Anthony O'Hagan, and Heinz Schmidli
- Subjects
Statistics and Probability, General Immunology and Microbiology, Sample size determination, Research Design, Applied Mathematics, Sample Size, General Medicine, General Agricultural and Biological Sciences, General Biochemistry, Genetics and Molecular Biology
- Published
- 2020
28. On assessing binary regression models based on ungrouped data
- Author
Chunling Lu and Yuhong Yang
- Subjects
Statistics and Probability, Computer science, General Biochemistry, Genetics and Molecular Biology, Hosmer–Lemeshow test, Bias, Goodness of fit, Voting, Humans, Computer Simulation, Lack-of-fit sum of squares, Probability, General Immunology and Microbiology, Applied Mathematics, Health Care Costs, General Medicine, Logistic Models, Sample size determination, Data Interpretation, Statistical, Sample Size, Binary regression, Data mining, General Agricultural and Biological Sciences
- Abstract
Assessing a binary regression model based on ungrouped data is a commonly encountered but very challenging problem. Although tests such as the Hosmer-Lemeshow test and the le Cessie-van Houwelingen test have been devised and widely used in applications, they often have low power in detecting lack of fit, and there is little theoretical justification for when they can work well. In this article, we propose a new approach based on a cross-validation voting system to address the problem. In addition to a theoretical guarantee that the probabilities of type I and II errors both converge to zero as the sample size increases under proper conditions, our simulation results demonstrate that the new method performs very well. (An illustrative code sketch follows this entry.)
- Published
- 2018
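The sketch below illustrates a cross-validation voting system in the spirit of the entry above: across repeated random splits, two candidate binary regression models are fit on the training part, the held-out log-likelihood casts a vote, and the vote tally flags lack of fit of the simpler model. The candidate models, split scheme, and decision rule are illustrative assumptions rather than the authors' exact procedure.

```python
# Hedged sketch: cross-validation voting between two candidate logistic models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

def heldout_loglik(model, X_tr, y_tr, X_te, y_te):
    """Held-out Bernoulli log-likelihood of a fitted classifier."""
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(y_te * np.log(p) + (1 - y_te) * np.log(1 - p))

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 1))
logit = -0.5 + x[:, 0] + 0.8 * x[:, 0] ** 2          # true model is quadratic
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_linear = x                                         # candidate A: linear only
X_quad = np.column_stack([x, x**2])                  # candidate B: adds x^2
votes = 0
splitter = ShuffleSplit(n_splits=50, test_size=0.3, random_state=0)
for tr, te in splitter.split(x):
    llA = heldout_loglik(LogisticRegression(), X_linear[tr], y[tr], X_linear[te], y[te])
    llB = heldout_loglik(LogisticRegression(), X_quad[tr], y[tr], X_quad[te], y[te])
    votes += int(llB > llA)
print(f"richer model preferred in {votes}/50 splits")
```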
29. Sample size determination for GEE analyses of stepped wedge cluster randomized trials
- Author
John S. Preisser, Fan Li, and Elizabeth L. Turner
- Subjects
Statistics and Probability, Biometry, Estimating equations, General Biochemistry, Genetics and Molecular Biology, Correlation, Bias, Statistics, Cluster Analysis, Humans, Computer Simulation, Generalized estimating equation, Eigenvalues and eigenvectors, Randomized Controlled Trials as Topic, Mathematics, General Immunology and Microbiology, Covariance matrix, Applied Mathematics, Estimator, General Medicine, Cross-Sectional Studies, Research Design, Sample size determination, Sample Size, General Agricultural and Biological Sciences
- Abstract
In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator.
- Published
- 2018
30. Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data
- Author
Sandra E. Safo, Yongho Jeon, Sungkyu Jung, and Jeongyoun Ahn
- Subjects
Statistics and Probability, Biometry, Multivariate analysis, Linear programming, Carcinogenesis, Computer science, Breast Neoplasms, General Biochemistry, Genetics and Molecular Biology, Dimension (vector space), Humans, Computer Simulation, Eigendecomposition of a matrix, General Immunology and Microbiology, Applied Mathematics, General Medicine, DNA Methylation, Covariance, Sample size determination, Sample Size, Multivariate Analysis, Transcriptome, General Agricultural and Biological Sciences, Canonical correlation, Algorithm, Data integration
- Abstract
We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate of a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals under various underlying covariance structures.
- Published
- 2018
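For readers unfamiliar with the generalized eigenvalue formulation of canonical correlation analysis that SELP sparsifies, here is a minimal dense (non-sparse) sketch on simulated placeholder data; it is not the SELP linear-programming estimator itself.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))    # stand-in for methylation features
Y = rng.standard_normal((100, 4))    # stand-in for expression features
p, q = X.shape[1], Y.shape[1]
S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Syy, Sxy = S[:p, :p], S[p:, p:], S[:p, p:]

# CCA written as a generalized eigenvalue problem  A v = rho B v
A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
B = np.block([[Sxx, np.zeros((p, q))], [np.zeros((q, p)), Syy]])
rho, V = eigh(A, B)                  # dense solution; SELP would sparsify this
print(rho[-1])                       # leading canonical correlation
print(V[:, -1])                      # stacked (non-sparse) canonical directions
```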
31. Pseudo and conditional score approach to joint analysis of current count and current status data
- Author
-
Chi-Chung Wen and Yi-Hau Chen
- Subjects
Statistics and Probability ,Biometry ,Frail Elderly ,Asymptotic distribution ,Interval (mathematics) ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Fractures, Bone ,010104 statistics & probability ,Recurrence ,Risk Factors ,Statistics ,Humans ,Computer Simulation ,Poisson Distribution ,0101 mathematics ,Aged ,Event (probability theory) ,Mathematics ,Aged, 80 and over ,Likelihood Functions ,Frailty ,General Immunology and Microbiology ,Applied Mathematics ,010102 general mathematics ,Estimator ,General Medicine ,Regression ,Delta method ,Survival function ,Sample size determination ,Data Interpretation, Statistical ,Sample Size ,Osteoporosis ,General Agricultural and Biological Sciences - Abstract
We develop a joint analysis approach for recurrent and nonrecurrent event processes subject to case I interval censorship, which are known in the literature as current count and current status data, respectively. We use a shared frailty to link the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty fully unspecified. Conditional on the frailty, the recurrent event is assumed to follow a nonhomogeneous Poisson process, and the mean function of the recurrent event and the survival function of the nonrecurrent event are assumed to follow some general form of semiparametric transformation models. Estimation of the models is based on the pseudo-likelihood and the conditional score techniques. The resulting estimators for the regression parameters and the unspecified baseline functions are shown to be consistent with rates of the square and cubic roots of the sample size, respectively. Asymptotic normality with a closed-form asymptotic variance is derived for the estimator of the regression parameters. We apply the proposed method to data from a fracture-osteoporosis survey to identify risk factors jointly for fracture and osteoporosis in the elderly, while accounting for the association between the two events within a subject.
- Published
- 2018
32. Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression
- Author
-
Chirag J. Patel, Corwin M. Zigler, Francesca Dominici, and Ander Wilson
- Subjects
Statistics and Probability ,Multivariate statistics ,General Immunology and Microbiology ,Mean squared error ,Applied Mathematics ,Confounding ,Regression analysis ,General Medicine ,Bayesian inference ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Sample size determination ,Linear regression ,Covariate ,Statistics ,030212 general & internal medicine ,0101 mathematics ,General Agricultural and Biological Sciences ,Mathematics - Abstract
In environmental and nutritional epidemiology, and in many other fields, there is increasing interest in estimating the effect of simultaneous exposure to several agents (e.g., multiple nutrients, pesticides, or air pollutants) on a health outcome. We consider estimating the effect of a multivariate exposure, comprising several continuous agents and their interactions, on an outcome when the true confounding variables are an unknown subset of a potentially large (relative to the sample size) set of measured covariates. Our approach is rooted in the ideas of Bayesian model averaging: the exposure effect is estimated as a weighted average of the estimated exposure effects obtained under several linear regression models that include different sets of the potential confounders. We introduce a data-driven prior that assigns to the likely confounders a higher probability of being included in the regression model. We show that our approach can also be formulated as a penalized likelihood with an interpretable tuning parameter. Through a simulation study, we demonstrate that the proposed approach identifies parsimonious models that are fully adjusted for observed confounding and estimates the multivariate exposure effect with smaller mean squared error than several alternatives. We apply the method to an Environmental Wide Association Study using the National Health and Nutrition Examination Survey to estimate the effect of mixtures of nutrients and pesticides on lipid levels.
- Published
- 2018
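The following sketch illustrates the general model-averaging idea behind the record above on toy data, using simple BIC-based weights and a flat model prior rather than the paper's data-driven prior; the variable names and data-generating values are hypothetical.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
C = rng.standard_normal((n, 4))                 # potential confounders
x = C[:, 0] + rng.standard_normal(n)            # exposure
y = 0.5 * x + C[:, 0] + rng.standard_normal(n)  # outcome; only the first C confounds

effects, bics = [], []
for k in range(C.shape[1] + 1):
    for idx in itertools.combinations(range(C.shape[1]), k):
        Z = sm.add_constant(np.column_stack([x] + [C[:, j] for j in idx]))
        fit = sm.OLS(y, Z).fit()
        effects.append(fit.params[1])            # coefficient on the exposure
        bics.append(fit.bic)

bics = np.array(bics)
w = np.exp(-0.5 * (bics - bics.min()))           # crude BIC-based model weights
w /= w.sum()
print(float(np.dot(w, effects)))                 # model-averaged exposure effect
```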
33. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size
- Author
-
Chung-Wei Shen and Yi-Hau Chen
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,business.industry ,Applied Mathematics ,Model selection ,Accounting ,Regression analysis ,Feature selection ,General Medicine ,Disease cluster ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Regression ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Sample size determination ,Resampling ,030212 general & internal medicine ,0101 mathematics ,General Agricultural and Biological Sciences ,business ,Generalized estimating equation ,Mathematics - Abstract
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study of the physical frailty outcome in the elderly, where the cluster size, that is, the number of observed outcomes for each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called the Resampling Cluster Information Criterion (RCIC), is based on the resampling idea used in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, does not require actually resampling the data and hence is computationally convenient. Compared with existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and it leads to marked improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify female sex, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly.
- Published
- 2018
34. Motivating sample sizes in adaptive Phase I trials via Bayesian posterior credible intervals
- Author
-
Thomas Braun
- Subjects
Statistics and Probability ,Optimal design ,General Immunology and Microbiology ,Computer science ,Applied Mathematics ,Bayesian probability ,Nonparametric statistics ,Contrast (statistics) ,Context (language use) ,Sample (statistics) ,General Medicine ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Sample size determination ,030220 oncology & carcinogenesis ,Statistics ,0101 mathematics ,General Agricultural and Biological Sciences ,Beta distribution - Abstract
In contrast with typical Phase III clinical trials, there is little existing methodology for determining the appropriate number of patients to enroll in adaptive Phase I trials. As stated by Dennis Lindley in a more general context, "[t]he simple practical question of 'What size of sample should I take' is often posed to a statistician, and it is a question that is embarrassingly difficult to answer." Historically, simulation has been the primary option for determining sample sizes for adaptive Phase I trials; although useful, it can be problematic and time-consuming when a sample size is needed relatively quickly. We propose a computationally fast and simple approach that uses Beta distributions to approximate the posterior distributions of the dose-limiting toxicity (DLT) rate at each dose and determines an appropriate sample size through posterior coverage rates. We provide the sample sizes produced by our methods for a wide range of realistic Phase I trial settings and demonstrate that our sample sizes are generally larger than those produced by a competing approach based upon the nonparametric optimal design.
- Published
- 2018
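A minimal sketch of the kind of Beta-posterior coverage calculation described above, under assumed values for the true DLT rate, a uniform prior, and a coverage half-width; the authors' actual sample size criterion may differ.

```python
import numpy as np
from scipy.stats import beta, binom

def expected_coverage(n, p_true, half_width=0.10, a=1.0, b=1.0):
    """Average posterior probability that the DLT rate lies within +/- half_width
    of the truth, averaging over Binomial(n, p_true) outcomes under a Beta(a, b) prior."""
    x = np.arange(n + 1)
    post_mass = beta.cdf(p_true + half_width, a + x, b + n - x) - \
                beta.cdf(p_true - half_width, a + x, b + n - x)
    return float(np.dot(binom.pmf(x, n, p_true), post_mass))

for n in range(10, 61, 5):
    print(n, round(expected_coverage(n, p_true=0.25), 3))   # coverage grows with n
```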
35. Experimental design for multi-drug combination studies using signaling networks
- Author
-
Hong-Bin Fang, Ming Tan, and Hengzhen Huang
- Subjects
0301 basic medicine ,Statistics and Probability ,Drug ,Optimal design ,General Immunology and Microbiology ,Computer science ,Applied Mathematics ,media_common.quotation_subject ,In silico ,General Medicine ,01 natural sciences ,Laboratory testing ,General Biochemistry, Genetics and Molecular Biology ,Toxicology ,010104 statistics & probability ,03 medical and health sciences ,Signaling network ,030104 developmental biology ,Drug development ,Sample size determination ,Biochemical engineering ,0101 mathematics ,General Agricultural and Biological Sciences ,media_common - Abstract
Combinations of multiple drugs are an important approach to maximizing the chance of therapeutic success by inhibiting multiple pathways/targets. Analytic methods for studying drug combinations have received increasing attention because major advances in biomedical research have made available a large number of potential agents for testing. Preclinical experiments on multi-drug combinations play a key role in (especially cancer) drug development because of the complex nature of the disease and the need to reduce development time and costs. Despite recent progress in statistical methods for assessing drug interaction, there is an acute lack of methods for designing experiments on multi-drug combinations. The number of combinations grows exponentially with the number of drugs and dose levels and quickly precludes exhaustive laboratory testing. In this article, utilizing experimental dose-response data of single drugs and a few combinations, along with pathway/network information, to obtain an in silico estimate of the functional structure of the dose-response relationship, we propose an optimal design that allows exploration of the dose-effect surface with the smallest possible sample size. Simulation studies show that the proposed methods perform well.
- Published
- 2017
36. Adaptive designs for the one-sample log-rank test
- Author
-
Robert Kwiecien, Rene Schmidt, and Andreas Faldum
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,Applied Mathematics ,Martingale central limit theorem ,Sample (statistics) ,General Medicine ,Separation principle ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Log-rank test ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Sample size determination ,030220 oncology & carcinogenesis ,Statistics ,Computerized adaptive testing ,0101 mathematics ,General Agricultural and Biological Sciences ,Null hypothesis ,Statistic ,Mathematics - Abstract
Traditional designs in phase IIa cancer trials are single-arm designs with a binary outcome, for example, tumor response. In some settings, however, a time-to-event endpoint might appear more appropriate, particularly in the presence of loss to follow-up. The one-sample log-rank test might then be the method of choice. It allows the survival curve of the patients under treatment to be compared with a prespecified reference survival curve, which usually represents the expected survival under the standard of care. In this work, convergence of the one-sample log-rank statistic to Brownian motion is proven using Rebolledo's martingale central limit theorem while accounting for staggered entry times of the patients. On this basis, a confirmatory adaptive one-sample log-rank test is proposed in which provision is made for data-dependent sample size reassessment. The focus is on applying the inverse normal method, which is done in two different ways. The first strategy exploits the independent-increments property of the one-sample log-rank statistic. The second strategy is based on the patient-wise separation principle. It is shown by simulation that the proposed adaptive test might help to rescue an underpowered trial and at the same time lowers the average sample number (ASN) under the null hypothesis compared with a single-stage fixed-sample design.
- Published
- 2017
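For orientation, here is a fixed-sample (non-adaptive) one-sample log-rank statistic against a user-supplied reference cumulative hazard, computed on simulated data with hypothetical rates; the adaptive inverse-normal machinery of the record above is not reproduced.

```python
import numpy as np

def one_sample_logrank(time, event, cumhaz0):
    """One-sample log-rank test against a reference cumulative hazard.
    time: follow-up times; event: 1 = event, 0 = censored;
    cumhaz0: reference cumulative hazard function (a callable)."""
    O = event.sum()                      # observed events
    E = np.sum(cumhaz0(time))            # expected events under the reference
    z = (O - E) / np.sqrt(E)             # approximately standard normal under H0
    return O, E, z

rng = np.random.default_rng(2)
t = rng.exponential(scale=1 / 0.10, size=40)    # survival times under treatment
c = rng.uniform(1, 24, size=40)                 # administrative censoring times
time, event = np.minimum(t, c), (t <= c).astype(int)
print(one_sample_logrank(time, event, lambda u: 0.15 * u))  # assumed reference hazard 0.15
```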
37. Sample size determination for multilevel hierarchical designs using generalized linear mixed models
- Author
-
Anup Amatya and Dulal K. Bhaumik
- Subjects
Statistics and Probability ,Research design ,General Immunology and Microbiology ,Applied Mathematics ,Linear model ,General Medicine ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Generalized linear mixed model ,Power (physics) ,010104 statistics & probability ,03 medical and health sciences ,Power analysis ,0302 clinical medicine ,Sample size determination ,Probability distribution ,030212 general & internal medicine ,Data mining ,0101 mathematics ,General Agricultural and Biological Sciences ,computer ,Type I and type II errors ,Mathematics - Abstract
A unified statistical methodology for sample size determination is developed for hierarchical designs that are frequently used in many areas, particularly in medical and health research studies. The solid foundation of the proposed methodology opens a new horizon for power analysis under a variety of conditions. Important features such as joint significance testing, unequal allocation of clusters across intervention groups, and differential attrition rates over follow-up time points are integrated to address useful questions that investigators often encounter while conducting such studies. The proposed methodology is shown to perform well in terms of maintaining type I error rates and achieving the target power under various conditions. The proposed method is also shown to be robust to violations of the distributional assumptions on the random effects.
- Published
- 2017
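As a simple point of reference for the record above, the sketch below applies the textbook two-level design-effect approximation for a continuous outcome with assumed effect size, cluster size, and intraclass correlation; the paper's unified GLMM-based methodology is far more general than this.

```python
from math import ceil
from scipy.stats import norm

def clusters_per_arm(delta, sigma, m, icc, alpha=0.05, power=0.80):
    """Two-level continuous-outcome approximation: individuals needed per arm,
    inflated by the design effect 1 + (m - 1) * ICC, then converted to clusters."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_ind = 2 * (z * sigma / delta) ** 2          # per arm, ignoring clustering
    deff = 1 + (m - 1) * icc
    return ceil(n_ind * deff / m)

print(clusters_per_arm(delta=0.3, sigma=1.0, m=20, icc=0.05))
```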
38. On estimation of time-dependent attributable fraction from population-based case-control studies
- Author
-
Li Hsu, Wei Zhao, and Ying Qing Chen
- Subjects
Statistics and Probability ,education.field_of_study ,General Immunology and Microbiology ,Applied Mathematics ,Population ,Estimator ,General Medicine ,Odds ratio ,Logistic regression ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,Delta method ,0302 clinical medicine ,Sample size determination ,Attributable risk ,Statistics ,Kernel smoother ,030212 general & internal medicine ,0101 mathematics ,General Agricultural and Biological Sciences ,education ,Mathematics - Abstract
Population attributable fraction (PAF) is widely used to quantify the disease burden associated with a modifiable exposure in a population. It has been extended to a time-varying measure that provides additional information on when and how the exposure's impact varies over time in cohort studies. However, there is no estimation procedure for PAF using data collected from population-based case-control studies, which, because of time and cost efficiency, are commonly used for studying genetic and environmental risk factors of disease incidence. In this article, we show that the time-varying PAF is identifiable from a case-control study and develop a novel estimator of PAF. Our estimator combines odds ratio estimates from logistic regression models with kernel-smoothed density estimates of the risk factor distribution conditional on failure times in cases. The proposed estimator is shown to be consistent and asymptotically normal, with an asymptotic variance that can be estimated empirically from the data. Simulation studies demonstrate that the proposed estimator performs well at finite sample sizes. Finally, the method is illustrated with a population-based case-control study of colorectal cancer.
- Published
- 2017
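A time-constant analogue of the attributable fraction estimated in the record above can be computed from case-control summaries via Miettinen's formula, shown below with made-up exposure prevalences and odds ratios; the paper's time-varying, kernel-smoothed estimator is considerably richer.

```python
import numpy as np

def paf_miettinen(p_cases, odds_ratios):
    """Attributable fraction from case-control data via Miettinen's formula:
    sum over exposure levels of P(level | case) * (RR - 1) / RR, with the
    odds ratio standing in for the relative risk (rare-disease approximation)."""
    p_cases, odds_ratios = np.asarray(p_cases), np.asarray(odds_ratios)
    return float(np.sum(p_cases * (odds_ratios - 1.0) / odds_ratios))

# two hypothetical exposure levels among cases, with ORs relative to the unexposed
print(paf_miettinen(p_cases=[0.30, 0.10], odds_ratios=[2.0, 4.0]))   # 0.225
```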
39. Conditional estimation in two-stage adaptive designs
- Author
-
Frank Miller and Per Broberg
- Subjects
Statistics and Probability ,Estimation ,General Immunology and Microbiology ,Computer science ,Applied Mathematics ,Maximum likelihood ,Estimator ,Conditional maximum likelihood ,General Medicine ,Conditional estimation ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Bias of an estimator ,Sample size determination ,Statistics ,030212 general & internal medicine ,Stage (hydrology) ,0101 mathematics ,General Agricultural and Biological Sciences - Abstract
We consider conditional estimation in two-stage sample-size-adjustable designs and the bias that can result. More specifically, we consider a design that permits raising the sample size when interim results look rather promising and keeps the originally planned sample size when results look very promising. The estimation procedures reported comprise the unconditional maximum likelihood estimator, the conditionally unbiased Rao-Blackwell estimator, the conditional median unbiased estimator, and the conditional maximum likelihood estimator with and without bias correction. We compare these estimators based on analytical results and a simulation study, and we show in a real clinical trial setting how they can be applied.
- Published
- 2017
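The small simulation below illustrates why conditional estimation matters in the record above: under a hypothetical promising-zone rule for raising the second-stage sample size, the naive overall maximum likelihood estimator picks up a small systematic bias. The rule and parameter values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2_plan, n2_max = 50, 50, 150
theta, sims = 0.20, 20000

est = []
for _ in range(sims):
    x1 = rng.normal(theta, 1, n1).mean()
    # raise the second-stage size only when the interim estimate looks mildly promising
    n2 = n2_max if 0.0 < x1 < 0.25 else n2_plan
    x2 = rng.normal(theta, 1, n2).mean()
    est.append((n1 * x1 + n2 * x2) / (n1 + n2))   # naive overall estimator
print(np.mean(est) - theta)   # nonzero because the weights depend on the interim estimate
```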
40. Estimation and testing problems in auditory neuroscience via clustering
- Author
-
Youngdeok Hwang, Bret Hanlon, and Samantha Wright
- Subjects
Statistics and Probability ,Quantitative Biology::Neurons and Cognition ,General Immunology and Microbiology ,Computer science ,Applied Mathematics ,05 social sciences ,Experimental data ,Estimator ,General Medicine ,Mixture model ,050105 experimental psychology ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,0302 clinical medicine ,medicine.anatomical_structure ,Sample size determination ,Singular value decomposition ,medicine ,Auditory system ,0501 psychology and cognitive sciences ,General Agricultural and Biological Sciences ,Cluster analysis ,Neuroscience ,030217 neurology & neurosurgery ,Statistical hypothesis testing - Abstract
The processing of auditory information in neurons is an important area in neuroscience. We consider statistical analysis for an electrophysiological experiment related to this area. The recorded synaptic current responses from the experiment are observed as clusters, where the number of clusters is related to an important characteristic of the auditory system. This number is difficult to estimate visually because the clusters are blurred by biological variability. Using singular value decomposition and a Gaussian mixture model, we develop an estimator for the number of clusters. Additionally, we provide a method for hypothesis testing and sample size determination in the two-sample problem. We illustrate our approach with both simulated and experimental data.
- Published
- 2017
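A bare-bones sketch of the clustering step in the record above: fit Gaussian mixture models with increasing numbers of components to surrogate one-dimensional response data and pick the number minimizing BIC. The authors additionally use a singular value decomposition and address testing and sample size, none of which is shown here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# surrogate data: three blurred clusters of synaptic response amplitudes
X = np.concatenate([rng.normal(mu, 0.4, 150) for mu in (1.0, 2.0, 3.2)]).reshape(-1, 1)

bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 7)}
print(min(bic, key=bic.get))   # estimated number of clusters
```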
41. Estimating individualized treatment regimes from crossover designs
- Author
-
Crystal T. Nguyen, Jaimie N. Davis, Donna Spruijt-Metz, Michael R. Kosorok, Daniel J. Luckett, Anna R. Kahkoska, and Grace E. Shearrer
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer science ,Population ,Crossover ,Decision tree ,Machine Learning (stat.ML) ,01 natural sciences ,Statistics - Applications ,General Biochemistry, Genetics and Molecular Biology ,Article ,Machine Learning (cs.LG) ,Methodology (stat.ME) ,Machine Learning ,010104 statistics & probability ,03 medical and health sciences ,Statistics - Machine Learning ,Statistics ,Humans ,Learning ,Applications (stat.AP) ,0101 mathematics ,Precision Medicine ,education ,Statistics - Methodology ,030304 developmental biology ,Statistical hypothesis testing ,0303 health sciences ,education.field_of_study ,Cross-Over Studies ,General Immunology and Microbiology ,Applied Mathematics ,Inverse probability weighting ,General Medicine ,Decision rule ,Crossover study ,3. Good health ,Sample size determination ,Research Design ,General Agricultural and Biological Sciences - Abstract
Personalized medicine is the practice of tailoring treatment to account for patient heterogeneity (Chakraborty and Murphy, 2014). Health care providers have practiced personalized medicine by adjusting doses or prescriptions based on a patient's medical history or demographics for centuries (Ashley, 2015). Precision medicine is an emerging field that aims to support personalized medicine decisions with reproducible research (Collins and Varmus, 2015). Such research is imperative, particularly when diseases are expressed with great heterogeneity across patients. A topic of interest in precision medicine is the individualized treatment regime (ITR): a set of decision rules for one or more decision time points that can be used to assign patients to treatment that is tailored by their patient-specific factors (Lavori and Dawson, 2014). One objective in precision medicine is to estimate the optimal ITR, that is, the ITR that maximizes the mean of some desirable outcome (Kosorok and Moodie, 2015). Crossover clinical trials are uniquely suited to precision medicine, because they allow responses to multiple treatments to be observed for each patient. This paper introduces a method to estimate optimal ITRs using data from a crossover study by extending generalized outcome weighted learning (GOWL) (Chen et al., 2018) to deal with correlated outcomes.
In a crossover study, patients are randomized to a sequence of treatments rather than a single treatment. Thus, multiple outcomes are observed, one per subject from each treatment period, and each subject acts as his or her own control, reducing between-subject variability (Wellek and Blettner, 2012). Such designs are popular in pilot studies, where only a small sample of individuals is available and/or affordable, because a crossover design is able to achieve the same level of power as a parallel design with a larger sample size. To address precision medicine aims concurrently with hypothesis-testing aims, it is imperative that ITR estimation methods be broadened to crossover designs in an accessible and easy-to-implement way, in order to better take advantage of the additional data available in such studies.
There have been many developments in machine learning methods for answering precision medicine questions from parallel study designs. For example, Qian and Murphy (2011) indirectly estimate the decision rule using L1-penalized least squares; Zhang et al. (2012a) maximize a doubly robust augmented inverse probability weighted estimator for the population mean outcome; Athey and Wager (2017) maximize a doubly robust score that may take into account instrumental variables; Kallus (2018) employs a weighting algorithm similar to inverse probability weighting but minimizes the worst-case mean squared error; Laber and Zhao (2015) propose the use of decision trees, which prove to be both flexible and easily interpretable; and Zhao et al. (2012), Zhang et al. (2012b), Zhou et al. (2017), and Chen et al. (2018) directly estimate the decision rule by viewing the problem from a weighted classification standpoint. However, little work has been done to develop precision medicine methods that handle correlated observations in the single-stage decision setting, such as those that arise from crossover designs.
Kulasekera and Siriwardhana (2019) propose a weighted ranking algorithm to estimate a decision rule that maximizes either the expected outcome or the probability of selecting the best treatment, but they assume that there are no carryover effects present. Because the intended effect of the washout period can be difficult to achieve in practice (Wellek and Blettner, 2012), it is imperative that methods for crossover designs be applicable when carryover effects are present. In this paper, we show that the difference in response to two treatments from a 2 × 2 crossover trial can be used as the reward in the generalized outcome weighted learning (GOWL) objective function to estimate an optimal ITR. We introduce a plug-in estimator that can be used with the proposed method to account for carryover effects. Additionally, we show that using a crossover design with the proposed method results in improvements in misclassification rate and estimated value when compared with standard methods for a parallel design at the same sample size.
As a clinical example, consider nutritional recommendations surrounding the intake of dietary fiber for the purpose of weight loss. Although increased fiber is recommended across the population for a myriad of health benefits (US Department of Agriculture, 2010), evidence of the impact of the consumption of dietary fiber on improved satiety and reduction in body weight is mixed (Halliday et al., 2018). Heterogeneity in response to dietary fiber may be leveraged to develop targeted fiber interventions to promote feelings of satiety. We use data from a crossover study in which Hispanic and African American adolescents who are overweight and obese were fed breakfast and lunch under a typical Western high-sugar diet condition and a high-fiber diet condition. From these data, we estimate a decision rule with which clinical care providers can input patient characteristics, including demographics and clinical measures, and receive a recommendation to maximize the change in measures of perceived satiety from before breakfast to after lunch. This type of analysis could be useful in identifying a subgroup of at-risk adolescents for whom targeting specific dietary recommendations is expected to lead to an increase in patient-reported satiety, thereby helping to decrease caloric intake in a population with great clinical need for effective, non-invasive weight loss strategies.
The rest of this paper is organized as follows. In Section 2, we review outcome weighted learning (OWL) (Zhao et al., 2012) and present the proposed method for estimating an optimal ITR from a crossover study regardless of the presence of carryover effects. Section 3 establishes Fisher and global consistency. Section 4 demonstrates the performance of the proposed method in simulation studies, with results on misclassification rate and estimated value. Section 5 presents an analysis of data from a feeding trial with overweight and obese Hispanic and African American adolescents, and we conclude with a discussion in Section 6. Additional simulation results, feeding trial results and their discussion, and theoretical derivations may be found in the web-based supporting materials.
- Published
- 2018
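To convey the weighted-classification view used in the record above, the sketch below turns simulated 2 x 2 crossover outcomes (assuming no carryover) into labels and weights and fits a weighted linear support vector machine; GOWL uses a different loss and accommodates carryover, so this is only a caricature.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n = 120
X = rng.standard_normal((n, 2))                  # baseline covariates
# outcomes under treatments A and B for the same subject (no carryover assumed)
yA = X[:, 0] + rng.normal(0, 0.5, n)
yB = -X[:, 0] + rng.normal(0, 0.5, n)
diff = yA - yB
labels = np.where(diff > 0, 1, -1)               # which treatment did better (1 = A)
clf = SVC(kernel="linear").fit(X, labels, sample_weight=np.abs(diff))
print(clf.predict(np.array([[1.0, 0.0], [-1.0, 0.0]])))   # recommendation by profile
```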
42. Model averaged double robust estimation
- Author
-
Nils D. Arvold, Francesca Dominici, Giovanni Parmigiani, and Matthew Cefalu
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,Mean squared error ,Applied Mathematics ,Confounding ,Estimator ,General Medicine ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Robustness (computer science) ,Sample size determination ,Causal inference ,Statistics ,Covariate ,Propensity score matching ,Econometrics ,030212 general & internal medicine ,0101 mathematics ,General Agricultural and Biological Sciences ,Mathematics - Abstract
Researchers estimating causal effects are increasingly challenged with decisions on how best to control for a potentially high-dimensional set of confounders. Typically, a single propensity score model is chosen and used to adjust for confounding, while the uncertainty surrounding which covariates to include in the propensity score model is often ignored; failure to include even one important confounder will result in bias. We propose a practical and generalizable approach that overcomes these limitations through the use of model averaging. We develop and evaluate this approach in the context of double robust estimation. More specifically, we introduce the model averaged double robust (MA-DR) estimators, which account for model uncertainty in both the propensity score and the outcome model through the use of model averaging. The MA-DR estimators are defined as weighted averages of double robust estimators, where each double robust estimator corresponds to a specific choice of the outcome model and the propensity score model. The MA-DR estimators extend the desirable double robustness property by achieving consistency under the much weaker assumption that either the true propensity score model or the true outcome model is within a specified, possibly large, class of models. Using simulation studies, we also assessed small-sample properties and found that MA-DR estimators can reduce mean squared error substantially, particularly when the set of potential confounders is large relative to the sample size. We apply the methodology to estimate the average causal effect of temozolomide plus radiotherapy versus radiotherapy alone on one-year survival in a cohort of 1887 Medicare enrollees who were diagnosed with glioblastoma between June 2005 and December 2009.
- Published
- 2016
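The record above averages double robust estimators over candidate models; the sketch below computes a single augmented-inverse-probability-weighted estimate on simulated data, which is the building block being averaged. The models and data-generating values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def aipw(y, a, X):
    """One augmented-inverse-probability-weighted (doubly robust) estimate of the
    average treatment effect; MA-DR averages such estimates over candidate models."""
    ps = sm.Logit(a, sm.add_constant(X)).fit(disp=0).predict()
    m1 = sm.OLS(y[a == 1], sm.add_constant(X[a == 1])).fit().predict(sm.add_constant(X))
    m0 = sm.OLS(y[a == 0], sm.add_constant(X[a == 0])).fit().predict(sm.add_constant(X))
    return np.mean(a * (y - m1) / ps + m1) - np.mean((1 - a) * (y - m0) / (1 - ps) + m0)

rng = np.random.default_rng(6)
n = 500
X = rng.standard_normal((n, 3))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # confounded treatment assignment
y = 1.0 * a + X[:, 0] + rng.standard_normal(n)       # true effect is 1.0
print(aipw(y, a, X))
```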
43. Integrative genetic risk prediction using non-parametric empirical Bayes classification
- Author
-
Sihai Dave Zhao
- Subjects
0301 basic medicine ,Statistics and Probability ,Computer science ,Disease ,Machine learning ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,Bayes' theorem ,Component (UML) ,0101 mathematics ,Genetic risk ,General Immunology and Microbiology ,business.industry ,Applied Mathematics ,Nonparametric statistics ,General Medicine ,Effective sample size ,030104 developmental biology ,Sample size determination ,Personalized medicine ,Artificial intelligence ,General Agricultural and Biological Sciences ,business ,computer - Abstract
Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This paper proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free nonparametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.
- Published
- 2016
44. An adaptive Mantel-Haenszel test for sensitivity analysis in observational studies
- Author
-
Paul R. Rosenbaum and Dylan S. Small
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,Applied Mathematics ,General Medicine ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Cochran–Mantel–Haenszel statistics ,Test (assessment) ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Joint probability distribution ,Sample size determination ,030220 oncology & carcinogenesis ,Multiple comparisons problem ,Statistics ,Computerized adaptive testing ,Sensitivity (control systems) ,Data mining ,0101 mathematics ,General Agricultural and Biological Sciences ,computer ,Mathematics ,Statistical hypothesis testing - Abstract
In a sensitivity analysis in an observational study with a binary outcome, is it better to use all of the data or to focus on subgroups that are expected to experience the largest treatment effects? The answer depends on features of the data that may be difficult to anticipate, a trade-off between unknown effect sizes and known sample sizes. We propose a sensitivity analysis for an adaptive test similar to the Mantel-Haenszel test. The adaptive test performs two highly correlated analyses, one focused analysis using a subgroup and one combined analysis using all of the data, correcting for multiple testing using the joint distribution of the two test statistics. Because the two component tests are highly correlated, this correction for multiple testing is small compared with, for instance, the Bonferroni inequality. The test has the maximum design sensitivity of the two component tests. A simulation evaluates the power of a sensitivity analysis using the adaptive test. Two examples are presented. An R package, sensitivity2x2xk, implements the procedure.
- Published
- 2016
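The multiplicity correction in the record above rests on the joint distribution of two highly correlated test statistics. Assuming, purely for illustration, that the focused and combined statistics are bivariate normal with a known correlation, the sketch below finds the common critical value for their maximum; it is not the paper's sensitivity-analysis procedure.

```python
from scipy.stats import multivariate_normal, norm

def adaptive_critical_value(rho, alpha=0.05):
    """Critical value c with P(max(Z1, Z2) > c) = alpha under H0 when
    (Z1, Z2) are standard bivariate normal with correlation rho."""
    mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
    lo, hi = norm.ppf(1 - alpha), norm.ppf(1 - alpha / 2)   # bracketing values
    for _ in range(60):                                     # bisection
        c = (lo + hi) / 2
        if 1 - mvn.cdf([c, c]) > alpha:
            lo = c
        else:
            hi = c
    return c

print(adaptive_critical_value(rho=0.8))   # lies between 1.645 and 1.96
```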
45. A Bayesian integrative approach for multi-platform genomic data: A kidney cancer case study
- Author
-
James D. Doecke, Thierry Chekouo, Kim Anh Do, and Francesco C. Stingo
- Subjects
0301 basic medicine ,Statistics and Probability ,Computer science ,Bayesian probability ,Genomics ,Machine learning ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Statistical power ,010104 statistics & probability ,03 medical and health sciences ,Bayes' theorem ,0101 mathematics ,Markov random field ,General Immunology and Microbiology ,business.industry ,Applied Mathematics ,Regression analysis ,General Medicine ,Data science ,Identification (information) ,ComputingMethodologies_PATTERNRECOGNITION ,030104 developmental biology ,Sample size determination ,Artificial intelligence ,General Agricultural and Biological Sciences ,business ,computer - Abstract
Integration of genomic data from multiple platforms has the capability to increase precision, accuracy, and statistical power in the identification of prognostic biomarkers. A fundamental problem faced in many multi-platform studies is unbalanced sample sizes due to the inability to obtain measurements from all the platforms for all the patients in the study. We have developed a novel Bayesian approach that integrates multi-regression models to identify a small set of biomarkers that can accurately predict time-to-event outcomes. This method fully exploits the amount of available information across platforms and does not exclude any of the subjects from the analysis. Through simulations, we demonstrate the utility of our method and compare its performance to that of methods that do not borrow information across regression models. Motivated by The Cancer Genome Atlas kidney renal cell carcinoma dataset, our methodology provides novel insights missed by non-integrative models.
- Published
- 2016
46. Testing violations of the exponential assumption in cancer clinical trials with survival endpoints
- Author
-
Daniel Zelterman, Christos Hatzis, Heping Zhang, Kerin B. Adelson, Lajos Pusztai, Gang Han, and Michael J. Schell
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,Cancer clinical trial ,Applied Mathematics ,Failure rate ,General Medicine ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Exponential function ,Clinical trial ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Sample size determination ,030220 oncology & carcinogenesis ,Censoring (clinical trials) ,Statistics ,Econometrics ,0101 mathematics ,General Agricultural and Biological Sciences ,Survival analysis ,Type I and type II errors ,Mathematics - Abstract
Personalized cancer therapy requires clinical trials with smaller sample sizes than trials involving unselected populations that have not been divided into biomarker subgroups. The use of exponential survival modeling for survival endpoints has the potential to gain 35% efficiency or save 28% of the required sample size (Miller, 1983), making personalized therapy trials more feasible. However, exponential survival modeling has not been fully accepted in cancer research practice because of uncertainty about whether or not the exponential assumption holds. We propose a test for identifying violations of the exponential assumption using a reduced piecewise exponential approach. Compared with an alternative goodness-of-fit test, which suffers from an inflated type I error rate under various censoring mechanisms, the proposed test maintains the correct type I error rate. We conduct power analyses using simulated data based on different types of cancer survival distributions in the SEER registry database, and we demonstrate the implementation of this approach in existing cancer clinical trials.
- Published
- 2016
47. Sequential multiple assignment randomization trials with enrichment design
- Author
-
Ying Liu, Donglin Zeng, and Yuanjia Wang
- Subjects
Statistics and Probability ,Research design ,Computer science ,Sample (statistics) ,Machine learning ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,law.invention ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Resource (project management) ,Randomized controlled trial ,law ,Statistics ,0101 mathematics ,Duration (project management) ,Dropout (neural networks) ,General Immunology and Microbiology ,business.industry ,Applied Mathematics ,Clinical study design ,technology, industry, and agriculture ,General Medicine ,humanities ,Sample size determination ,Artificial intelligence ,General Agricultural and Biological Sciences ,business ,computer ,030217 neurology & neurosurgery - Abstract
The sequential multiple assignment randomization trial (SMART) is a powerful design for studying dynamic treatment regimes (DTRs) and allows causal comparisons of DTRs. To handle practical challenges of SMART, we propose a SMART with Enrichment (SMARTer) design, which performs stage-wise enrichment for SMART. SMARTer can improve design efficiency, shorten the recruitment period, and partially reduce trial duration, making SMART more practical with limited time and resources. Specifically, at each subsequent stage of a SMART, we enrich the study sample with new patients who have received the previous stages' treatments in a naturalistic fashion without randomization, and we randomize them only among the current-stage treatment options. One extreme case of the SMARTer is to synthesize separate independent single-stage randomized trials with patients who have received the previous stages' treatments. We show that, under certain assumptions, data from a SMARTer allow unbiased estimation of DTRs just as data from a SMART do. Furthermore, we show analytically that the efficiency gain of the new design over SMART can be significant, especially when the dropout rate is high. Lastly, extensive simulation studies are performed to demonstrate the performance of the SMARTer design, and sample size estimation in a scenario informed by real data from a SMART study is presented.
- Published
- 2016
48. Likelihood ratio tests for a dose-response effect using multiple nonlinear regression models
- Author
-
Björn Bornkamp and Georg Gutjahr
- Subjects
Statistics and Probability ,Mathematical optimization ,General Immunology and Microbiology ,Distribution (number theory) ,Applied Mathematics ,Asymptotic distribution ,General Medicine ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,03 medical and health sciences ,Nonlinear system ,0302 clinical medicine ,Sample size determination ,Null distribution ,Applied mathematics ,Z-test ,030212 general & internal medicine ,0101 mathematics ,General Agricultural and Biological Sciences ,Null hypothesis ,Nonlinear regression ,Mathematics - Abstract
We consider the problem of testing for a dose-related effect based on a candidate set of (typically nonlinear) dose-response models using likelihood-ratio tests. For the models considered, this reduces to assessing whether the slope parameter in these nonlinear regression models is zero or not. A technical problem is that the null distribution (when the slope is zero) depends on non-identifiable parameters, so that standard asymptotic results on the distribution of the likelihood-ratio test no longer apply. Asymptotic solutions to this problem have been extensively discussed in the literature. The resulting approximations, however, are not of simple form and require simulation to calculate the asymptotic distribution. In addition, their adequacy may be doubtful for small sample sizes. Direct simulation to approximate the null distribution is numerically unstable due to the non-identifiability of some parameters. In this article, we derive a numerical algorithm to approximate the exact distribution of the likelihood-ratio test under multiple models for normally distributed data. The algorithm uses methods from differential geometry and can be used to evaluate the distribution under the null hypothesis, but it also allows for power and sample size calculations. We compare the proposed testing approach with the MCP-Mod methodology and with alternative methods for testing for a dose-related trend in a dose-finding example dataset and in simulations.
- Published
- 2016
49. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering
- Author
-
Wen-Xin Zhou, Lan Wang, Jinyuan Chang, and Wen Zhou
- Subjects
0301 basic medicine ,Statistics and Probability ,General Immunology and Microbiology ,Computer science ,Group (mathematics) ,Applied Mathematics ,Structure (category theory) ,General Medicine ,Covariance ,Quantitative Biology::Genomics ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Moment (mathematics) ,010104 statistics & probability ,03 medical and health sciences ,Matrix (mathematics) ,030104 developmental biology ,Sample size determination ,Feature (machine learning) ,0101 mathematics ,General Agricultural and Biological Sciences ,Cluster analysis ,Algorithm - Abstract
Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices; hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm that shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights into the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in the R package HDtest, which is available on CRAN.
- Published
- 2016
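A maximum-type statistic of the flavor used for high-dimensional two-sample covariance testing (as in the record above) can be sketched as below; the calibration of its null distribution, which the paper handles rigorously, is omitted, and the data are simulated placeholders.

```python
import numpy as np

def max_type_stat(X, Y):
    """Maximum standardized entrywise difference between two sample covariance
    matrices; large values suggest the population covariances differ."""
    def cov_and_var(Z):
        n = len(Z)
        Zc = Z - Z.mean(axis=0)
        S = Zc.T @ Zc / n
        # rough elementwise variance estimates of the sample covariance entries
        V = np.einsum('ij,ik->jk', Zc ** 2, Zc ** 2) / n - S ** 2
        return S, V / n
    S1, V1 = cov_and_var(np.asarray(X))
    S2, V2 = cov_and_var(np.asarray(Y))
    return float(np.max((S1 - S2) ** 2 / (V1 + V2)))

rng = np.random.default_rng(7)
print(max_type_stat(rng.standard_normal((80, 50)), rng.standard_normal((80, 50))))
```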
50. Nonparametric analysis of bivariate gap time with competing risks
- Author
-
Chenguang Wang, Chiung Yu Huang, and Mei Cheng Wang
- Subjects
Statistics and Probability ,General Immunology and Microbiology ,Applied Mathematics ,05 social sciences ,Nonparametric statistics ,Estimator ,General Medicine ,Bivariate analysis ,Conditional probability distribution ,01 natural sciences ,Censoring (statistics) ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,Sample size determination ,0502 economics and business ,Statistics ,Econometrics ,Cumulative incidence ,0101 mathematics ,General Agricultural and Biological Sciences ,Statistic ,050205 econometrics ,Mathematics - Abstract
This article considers nonparametric methods for studying recurrent disease and death with competing risks. We first point out that comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events, and that comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrent disease. We then propose nonparametric estimators for the conditional cumulative incidence function as well as the conditional bivariate cumulative incidence function for the bivariate gap times, that is, the time to disease recurrence and the residual lifetime after recurrence. To quantify the association between the two gap times in the competing risks setting, a modified Kendall's tau statistic is proposed. The proposed estimators for the conditional bivariate cumulative incidence distribution and the association measure account for the induced dependent censoring of the second gap time. Uniform consistency and weak convergence of the proposed estimators are established. Hypothesis testing procedures for two-sample comparisons are discussed. Numerical simulation studies with practical sample sizes are conducted to evaluate the performance of the proposed nonparametric estimators and tests. An application to data from a pancreatic cancer study is presented to illustrate the methods developed in this article.
- Published
- 2016