1,448 results for "type I error"
Search Results
152. What is the proper way to apply the multiple comparison test?
- Author
-
Sangseok Lee and Dong Kyu Lee
- Subjects
alpha inflation ,analysis of variance ,bonferroni ,dunnett ,multiple comparison ,scheffé ,statistics ,tukey ,type i error ,type ii error ,Anesthesiology ,RD78.3-87.3 - Abstract
Multiple comparison tests (MCTs) are performed several times on the means of experimental conditions. When the null hypothesis is rejected in an overall analysis, MCTs are used to determine which experimental conditions have a statistically significant mean difference, or whether there is a specific pattern among the group means. A problem arises because the error rate increases when multiple hypothesis tests are performed simultaneously. Consequently, in an MCT, it is necessary to control the error rate at an appropriate level. In this paper, we discuss how to test multiple hypotheses simultaneously while limiting the type I error rate that results from α inflation. To choose an appropriate test, a balance must be maintained between statistical power and the type I error rate. If a test is too conservative, a type I error is unlikely to occur, but the test may then have insufficient power, resulting in an increased probability of a type II error. Most researchers hope to adjust the type I error rate so as to detect real differences in the observed data without sacrificing too much statistical power. It is expected that this paper will help researchers understand the differences between MCTs and apply them appropriately.
- Published
- 2018
- Full Text
- View/download PDF
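As a companion to the abstract above, a minimal Python sketch of α inflation under repeated pairwise t-tests and of the Bonferroni adjustment. The number of groups, group size, and simulation settings are illustrative assumptions, not values from the paper.

```python
# Simulate the family-wise error rate (FWER) of all pairwise t-tests among k groups
# when every null hypothesis is true, with and without a Bonferroni adjustment.
import numpy as np
from scipy import stats
from itertools import combinations

rng = np.random.default_rng(0)
k, n, alpha, n_sim = 5, 20, 0.05, 2000     # 5 groups -> 10 pairwise comparisons
n_pairs = k * (k - 1) // 2

any_unadjusted = any_bonferroni = 0
for _ in range(n_sim):
    groups = [rng.normal(0.0, 1.0, n) for _ in range(k)]   # all group means equal
    pvals = [stats.ttest_ind(groups[i], groups[j]).pvalue
             for i, j in combinations(range(k), 2)]
    any_unadjusted += min(pvals) < alpha              # at least one false positive
    any_bonferroni += min(pvals) < alpha / n_pairs    # Bonferroni-adjusted threshold

print(f"FWER, unadjusted : {any_unadjusted / n_sim:.3f}")   # well above 0.05
print(f"FWER, Bonferroni : {any_bonferroni / n_sim:.3f}")   # at or below 0.05
```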
153. Value of information methods to design a clinical trial in a small population to optimise a health economic utility function
- Author
-
Michael Pearce, Siew Wan Hee, Jason Madan, Martin Posch, Simon Day, Frank Miller, Sarah Zohar, and Nigel Stallard
- Subjects
Decision theory ,Health economics ,Power ,Rare disease ,Regulator ,Type I error ,Medicine (General) ,R5-920 - Abstract
Abstract Background Most confirmatory randomised controlled clinical trials (RCTs) are designed with specified power, usually 80% or 90%, for a hypothesis test conducted at a given significance level, usually 2.5% for a one-sided test. Approval of the experimental treatment by regulatory agencies is then based on the result of such a significance test with other information to balance the risk of adverse events against the benefit of the treatment to future patients. In the setting of a rare disease, recruiting sufficient patients to achieve conventional error rates for clinically reasonable effect sizes may be infeasible, suggesting that the decision-making process should reflect the size of the target population. Methods We considered the use of a decision-theoretic value of information (VOI) method to obtain the optimal sample size and significance level for confirmatory RCTs in a range of settings. We assume the decision maker represents society. For simplicity we assume the primary endpoint to be normally distributed with unknown mean following some normal prior distribution representing information on the anticipated effectiveness of the therapy available before the trial. The method is illustrated by an application in an RCT in haemophilia A. We explicitly specify the utility in terms of improvement in primary outcome and compare this with the costs of treating patients, both financial and in terms of potential harm, during the trial and in the future. Results The optimal sample size for the clinical trial decreases as the size of the population decreases. For non-zero cost of treating future patients, either monetary or in terms of potential harmful effects, stronger evidence is required for approval as the population size increases, though this is not the case if the costs of treating future patients are ignored. Conclusions Decision-theoretic VOI methods offer a flexible approach with both type I error rate and power (or equivalently trial sample size) depending on the size of the future population for whom the treatment under investigation is intended. This might be particularly suitable for small populations when there is considerable information about the patient population.
- Published
- 2018
- Full Text
- View/download PDF
154. Testing hypotheses under covariate-adaptive randomisation and additive models
- Author
-
Ting Ye
- Subjects
biased coin ,clinical trials ,robust test ,t-test ,type i error ,variance estimator ,Probabilities. Mathematical statistics ,QA273-280 - Abstract
Covariate-adaptive randomisation has a long history of applications in clinical trials. Shao, Yu, and Zhong [(2010). A theory for testing hypotheses under covariate-adaptive randomization. Biometrika, 97, 347–360] and Shao and Yu [(2013). Validity of tests under covariate-adaptive biased coin randomization and generalized linear models. Biometrics, 69, 960–969] showed that the simple t-test is conservative under covariate-adaptive biased coin (CABC) randomisation in terms of type I error, and proposed a valid test using the bootstrap. Under a general additive model with CABC randomisation, we construct a calibrated t-test that shares the same property as the bootstrap method in Shao et al. (2010), but do not need large computation required by the bootstrap method. Some simulation results are presented to show the finite sample performance of the calibrated t-test.
- Published
- 2018
- Full Text
- View/download PDF
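A small simulation sketch of the conservativeness phenomenon the abstract above builds on: under a simple covariate-adaptive scheme (balanced treatment assignment within strata of a prognostic covariate), the plain two-sample t-test rejects less often than its nominal level. The calibrated t-test itself is not implemented here, and all numerical settings are illustrative assumptions.

```python
# Empirical type I error of the simple t-test under stratified (covariate-adaptive)
# randomisation with a prognostic covariate and no true treatment effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, n_sim = 200, 0.05, 2000
rejections = 0
for _ in range(n_sim):
    stratum = rng.integers(0, 2, size=n)          # prognostic covariate with 2 levels
    trt = np.empty(n, dtype=int)
    for s in (0, 1):                              # force treatment balance within each stratum
        idx = np.flatnonzero(stratum == s)
        labels = np.tile([0, 1], len(idx) // 2 + 1)[:len(idx)]
        trt[idx] = rng.permutation(labels)
    y = 2.0 * stratum + rng.normal(size=n)        # covariate matters; treatment does not
    p = stats.ttest_ind(y[trt == 1], y[trt == 0]).pvalue
    rejections += p < alpha

print(f"Empirical type I error of the simple t-test: {rejections / n_sim:.3f}  (nominal {alpha})")
```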
155. Ranked Set Two-Sample Permutation Test
- Author
-
Monjed H. Samuh
- Subjects
Permutation test ,Ranked set sampling ,Statistical power ,Type I error ,Statistics ,HA1-4737 - Abstract
In this paper, a ranked set two-sample permutation test for comparing two independent groups in terms of a measure of location is presented. Three test statistics are proposed, and their statistical power is evaluated numerically. The results are compared with the statistical power of the usual two-sample permutation test under simple random sampling and with the classical independent two-sample t-test.
- Published
- 2018
- Full Text
- View/download PDF
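For reference, a minimal version of the baseline procedure the abstract above compares against: the usual two-sample permutation test under simple random sampling, using the difference in means as the test statistic. The ranked set sampling step is not implemented, and the data are simulated for illustration.

```python
# Two-sample permutation test (simple random sampling version) on simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 15)          # group 1
y = rng.normal(0.7, 1.0, 15)          # group 2 (location shift)
observed = x.mean() - y.mean()

pooled = np.concatenate([x, y])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)                      # reshuffle group labels
    diff = perm[:len(x)].mean() - perm[len(x):].mean()
    count += abs(diff) >= abs(observed)

p_value = (count + 1) / (n_perm + 1)   # add-one correction keeps the p-value above 0
print(f"two-sided permutation p-value: {p_value:.4f}")
```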
156. Is There a Consensus? An Experimental Trial to Test the Sufficiency of Methodologies Used to Measure Economic Impact.
- Author
-
Rascher, Daniel A., Hyun, Giseob, and Nagel, Mark S.
- Subjects
ECONOMIC impact ,FALSE positive error ,NULL hypothesis ,ECONOMIC activity - Abstract
This research utilizes the local GDP of 383 MSAs in the U.S. to determine whether the methods historically used in the academic literature to measure the economic impact of sports are sensitive enough to generate conclusive results. An experiment is created and shows that commonly used methods fail to detect the built-in-by-design injections of economic activity for the experimental group until very high levels of treatment, of at least $300 million to $1 billion annually, are present, thus providing evidence that Type I errors (rejecting a true null hypothesis) are likely to have occurred in some of the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
157. Comparative evaluation of goodness of fit tests for normal distribution using simulation and empirical data.
- Author
-
Anastasiou, Achilleas, Karagrigoriou, Alex, and Katsileros, Anastasios
- Subjects
- *
GAUSSIAN distribution , *KURTOSIS , *FALSE positive error , *T-test (Statistics) , *DISTRIBUTION (Probability theory) , *GOODNESS-of-fit tests , *SAMPLE size (Statistics) - Abstract
The normal distribution is considered to be one of the most important distributions, with numerous applications in various fields, including the agricultural sciences. The purpose of this study is to evaluate the most popular normality tests, comparing their performance in terms of size (type I error) and power against a large spectrum of alternative distributions, using simulations for various sample sizes and significance levels as well as empirical data from agricultural experiments. The simulation results show that the power of all normality tests is low for small sample sizes, but as the sample size increases, the power increases as well. The results also show that the Shapiro–Wilk test is powerful over a wide range of alternative distributions and sample sizes, especially for asymmetric distributions. Moreover, the D'Agostino–Pearson omnibus test is powerful for small sample sizes against symmetric alternative distributions, while the same is true of the kurtosis test for moderate and large sample sizes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
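A small Monte Carlo sketch of the kind of size/power comparison described above, for two of the tests named in the abstract (Shapiro–Wilk and the D'Agostino–Pearson omnibus test). The sample size, alternative distribution, and replication count are illustrative assumptions.

```python
# Estimate the size (type I error) and power of two normality tests by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, alpha, n_sim = 50, 0.05, 2000

def rejection_rate(sampler):
    sw = dp = 0
    for _ in range(n_sim):
        x = sampler(n)
        sw += stats.shapiro(x).pvalue < alpha
        dp += stats.normaltest(x).pvalue < alpha   # D'Agostino-Pearson omnibus
    return sw / n_sim, dp / n_sim

size_sw, size_dp = rejection_rate(lambda size: rng.normal(size=size))       # data truly normal
pow_sw, pow_dp = rejection_rate(lambda size: rng.exponential(size=size))    # skewed alternative

print(f"size : Shapiro-Wilk {size_sw:.3f}, D'Agostino-Pearson {size_dp:.3f}")
print(f"power: Shapiro-Wilk {pow_sw:.3f}, D'Agostino-Pearson {pow_dp:.3f}")
```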
158. Robust tests for treatment effect in survival analysis under covariate‐adaptive randomization.
- Author
-
Ye, Ting and Shao, Jun
- Subjects
RANDOMIZATION (Statistics) ,TREATMENT effectiveness ,SURVIVAL analysis (Biometry) ,FALSE positive error ,ASYMPTOTIC efficiencies ,LOG-rank test - Abstract
Summary: Covariate‐adaptive randomization is popular in clinical trials with sequentially arrived patients for balancing treatment assignments across prognostic factors that may have influence on the response. However, existing theory on tests for the treatment effect under covariate‐adaptive randomization is limited to tests under linear or generalized linear models, although the covariate‐adaptive randomization method has been used in survival analysis for a long time. Often, practitioners will simply adopt a conventional test to compare two treatments, which is controversial since tests derived under simple randomization may not be valid in terms of type I error under other randomization schemes. We derive the asymptotic distribution of the partial likelihood score function under covariate‐adaptive randomization and a working model that is subject to possible model misspecification. Using this general result, we prove that the partial likelihood score test that is robust against model misspecification under simple randomization is no longer robust but conservative under covariate‐adaptive randomization. We also show that the unstratified log‐rank test is conservative and the stratified log‐rank test remains valid under covariate‐adaptive randomization. We propose a modification to variance estimation in the partial likelihood score test, which leads to a score test that is valid and robust against arbitrary model misspecification under a large family of covariate‐adaptive randomization schemes including simple randomization. Furthermore, we show that the modified partial likelihood score test derived under a correctly specified model is more powerful than log‐rank‐type tests in terms of Pitman's asymptotic relative efficiency. Simulation studies about the type I error and power of various tests are presented under several popular randomization schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
159. Multiple imputation score tests and an application to Cochran‐Mantel‐Haenszel statistics.
- Author
-
Lu, Kaifeng
- Subjects
- *
TEST scoring , *FALSE positive error , *PARAMETER estimation , *STATISTICS - Abstract
The standard multiple imputation technique focuses on parameter estimation. In this study, we describe a method for conducting score tests following multiple imputation. As an important application, we use the Cochran‐Mantel‐Haenszel (CMH) test as a score test and compare the proposed multiple imputation method with a method based on the Wilson‐Hilferty transformation of the CMH statistic. We show that the proposed multiple imputation method preserves the nominal significance level for three types of alternative hypotheses, whereas that based on the Wilson‐Hilferty transformation inflates type I error for the "row means differ" and "general association" alternative hypotheses. Moreover, we find that this type I error inflation worsens as the amount of missing data increases. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
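As background to the abstract above, a plain-NumPy computation of the Cochran–Mantel–Haenszel chi-square statistic for a set of stratified 2x2 tables. The multiple imputation machinery of the paper is not shown, and the tables are made up for illustration.

```python
# Cochran-Mantel-Haenszel (CMH) statistic for stratified 2x2 tables, without continuity correction.
import numpy as np
from scipy import stats

# one 2x2 table per stratum: rows = treatment/control, columns = response yes/no
tables = [np.array([[12,  8], [ 7, 13]]),
          np.array([[20, 10], [14, 16]]),
          np.array([[ 9, 11], [ 5, 15]])]

num = 0.0   # sum of (observed - expected) for the (1,1) cell
den = 0.0   # sum of hypergeometric variances
for t in tables:
    a = t[0, 0]
    row1, col1, n = t[0].sum(), t[:, 0].sum(), t.sum()
    num += a - row1 * col1 / n
    den += row1 * (n - row1) * col1 * (n - col1) / (n**2 * (n - 1))

cmh = num**2 / den                      # approximately chi-square with 1 df under the null
p = stats.chi2.sf(cmh, df=1)
print(f"CMH statistic = {cmh:.3f}, p = {p:.4f}")
```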
160. Empirical profile Bayesian estimation for extrapolation of historical adult data to pediatric drug development.
- Author
-
Wu, Yaoshi, Hui, Jianan, and Deng, Qiqi
- Subjects
- *
DRUG development , *FALSE positive error , *CHILD patients , *TREATMENT effectiveness , *PEDIATRIC therapy - Abstract
Summary: For pediatric drug development, the clinical effectiveness of the study medication in the adult population has already been demonstrated. Given that it is usually not feasible to enroll a large number of pediatric patients, appropriately leveraging historical adult data in the pediatric evaluation may be critical to the success of pediatric drug development. In this manuscript, we propose a new empirical Bayesian approach, profile Bayesian estimation, to dynamically borrow adult information for the evaluation of the treatment effect in pediatric patients. The new approach demonstrates an attractive balance between type I error control and power gain under the transferability assumption that the pediatric treatment effect size may differ from the adult treatment effect size. The decision-making boundary mimics real-world practice in pediatric drug development. In addition, the posterior mean of the proposed empirical profile Bayesian estimator is unbiased for the true pediatric treatment effect. We compare our approach with the robust mixture prior (with the prior weight for informative borrowing set to 0.5 or 0.9), the regular Bayesian approach, and the frequentist approach in terms of both type I error and power. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
161. Assessing via Simulation the Operating Characteristics of the WHO Scale for COVID-19 Endpoints.
- Author
-
O'Kelly, Michael and Li, Siying
- Subjects
- *
COVID-19 , *FALSE positive error - Abstract
Many clinical trials of treatments for patients hospitalised for COVID-19 use an ordinal scale recommended by the World Health Organisation. The scale represents the intensity of medical intervention, with higher scores for interventions more burdensome for the patient and the highest score for death. There is uncertainty about the use of this ordinal scale in testing hypotheses. With the objective of assessing the power and Type I error of potential endpoints and analyses based on the ordinal scale, trajectories of the score over 28 days were simulated for scenarios based closely on the results of two recently published trials. The simulation used transition probabilities for the ordinal scale over time. No one endpoint was optimal across scenarios, but a ranked measure of trajectory fared moderately well in all scenarios. Type I error was controlled at close to the nominal level for all endpoints. Because it is not tied to a particular population with regard to baseline severity, the use of transition probabilities allows plausible assessment of endpoints in populations with configurations of baseline score for which data are not yet published, provided some data on the relevant transition probabilities are available. The results could support experts in the choice of an endpoint based on the ordinal scale. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
162. Best (but oft-forgotten) practices: sample size and power calculation for a dietary intervention trial with episodically consumed foods.
- Author
-
Zhang, Wei, Liu, Aiyi, Zhang, Zhiwei, Nansel, Tonja, and Halabi, Susan
- Subjects
EXPERIMENTAL design ,INGESTION ,MEDICAL research ,STATISTICAL hypothesis testing ,STATISTICS ,SAMPLE size (Statistics) ,STATISTICAL power analysis ,DATA analysis - Abstract
Dietary interventions often target foods that are underconsumed relative to dietary guidelines, such as vegetables, fruits, and whole grains. Because these foods are consumed only episodically by some participants, data from such a study often contain a disproportionately large number of zeros due to study participants who do not consume any of the target foods on the days that dietary intake is assessed, thus generating semicontinuous data. These zeros need to be properly accounted for when calculating sample sizes to ensure that the study is adequately powered to detect a meaningful intervention effect size. Nonetheless, this issue has not been well addressed in the literature. Instead, methods that are common for continuous outcomes are typically used to compute the sample sizes, resulting in a substantially under- or overpowered study. We propose proper approaches to calculating the sample size needed for dietary intervention studies that target episodically consumed foods. Sample size formulae are derived for detecting the mean difference in the amount of intake of an episodically consumed food between an intervention and a control group. Numerical studies are conducted to investigate the accuracy of the sample size formulae as compared with the ad hoc methods. The simulation results show that the proposed formulae are appropriate for estimating the sample sizes needed to achieve the desired power for the study. The proposed method for sample size calculation is recommended for designing dietary intervention studies targeting episodically consumed foods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
163. Reflexões sobre o viés de publicação: um guia para praticantes de estatística para a análise de dados e uso inapropriado do coeficiente de correlação em ciências da saúde [Reflections on publication bias: a guide for statistics practitioners on data analysis and the inappropriate use of the correlation coefficient in the health sciences].
- Author
-
da Cunha Nascimento, Dahan, Alves de Almeida, Jeeser, Delgado Carvalho, Thiago Hanyel, and Prestes, Jonato
- Published
- 2020
164. Handling Skewed Data: A Comparison of Two Popular Methods.
- Author
-
Hammouri, Hanan M., Sabo, Roy T., Alsaadawi, Rasha, and Kheirallah, Khalid A.
- Subjects
MONTE Carlo method ,FALSE positive error ,GAMMA distributions ,BETA distribution ,MEDICAL scientists ,ERROR rates - Abstract
Scientists in biomedical and psychosocial research need to deal with skewed data all the time. In the case of comparing means from two groups, the log transformation is commonly used as a traditional technique to normalize skewed data before utilizing the two-group t-test. An alternative method that does not assume normality is the generalized linear model (GLM) combined with an appropriate link function. In this work, the two techniques are compared using Monte Carlo simulations; each consists of many iterations that simulate two groups of skewed data for three different sampling distributions: gamma, exponential, and beta. Afterward, both methods are compared regarding Type I error rates, power rates and the estimates of the mean differences. We conclude that the t-test with log transformation had superior performance over the GLM method for any data that are not normal and follow beta or gamma distributions. Alternatively, for exponentially distributed data, the GLM method had superior performance over the t-test with log transformation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
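A sketch of the two approaches compared above, applied to one simulated gamma-distributed dataset: a two-sample t-test on log-transformed values versus a gamma GLM with a log link. statsmodels and SciPy are assumed to be available; the shape and scale parameters are illustrative.

```python
# Compare a log-transformed t-test with a gamma GLM (log link) on simulated skewed data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
n = 60
g1 = rng.gamma(shape=2.0, scale=1.0, size=n)     # group 1
g2 = rng.gamma(shape=2.0, scale=1.5, size=n)     # group 2 (larger mean)

# (1) classical route: log-transform, then two-sample t-test
t_res = stats.ttest_ind(np.log(g1), np.log(g2))
print(f"log-transformed t-test: p = {t_res.pvalue:.4f}")

# (2) gamma GLM with log link; the group indicator is the only covariate
y = np.concatenate([g1, g2])
group = np.concatenate([np.zeros(n), np.ones(n)])
X = sm.add_constant(group)
glm = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(f"gamma GLM group effect: p = {glm.pvalues[1]:.4f}")
```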
165. Revisit of test‐then‐pool methods and some practical considerations.
- Author
-
Li, Wen, Liu, Frank, and Snavely, Duane
- Subjects
- *
FALSE positive error , *DRUG development - Abstract
Summary: Test‐then‐pool is a simple statistical method that borrows historical information to improve efficiency of the drug development process. The original test‐then‐pool method examines the difference between the historical and current information and then pools the information if there is no significant difference. One drawback of this method is that a nonsignificant difference may not always imply consistency between the historical and current information. As a result, the original test‐then‐pool method is more likely to incorrectly borrow information from the historical control when the current trial has a small sample size. Statistically, it is more natural to use an equivalence test for examining the consistency. This manuscript develops an equivalence‐based test‐then‐pool method for a continuous endpoint, explains the relationship between the two test‐then‐pool methods, explores the choice of an equivalence margin through the overlap probability, and proposes an adjustment to the nominal testing level for controlling type I error under the true consistency scenario. Furthermore, the analytical forms of the type I error and power for the two test‐then‐pool methods are derived, and practical considerations for using them are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
166. Evaluating sources of baseline data using dual‐criteria and conservative dual‐criteria methods: A quantitative analysis.
- Author
-
Falligant, John Michael, McNulty, Molly K., Kranak, Michael P., Hausman, Nicole L., and Rooker, Griffin W.
- Subjects
- *
DATABASE management , *RESEARCH methodology , *STATISTICS , *DECISION making in clinical medicine , *DATA analysis , *QUANTITATIVE research , *CONTENT mining - Abstract
Scheithauer et al. (2020) recently demonstrated that differences in the source of baseline data extracted from a functional analysis (FA) may not affect subsequent clinical decision‐making in comparison to a standard baseline. These outcomes warrant additional quantitative examination, as correspondence of visual analysis has sometimes been reported to be unreliable. In the current study, we quantified the occurrence of false positives within a dataset of FA and baseline data using the dual‐criteria (DC) and conservative dual‐criteria (CDC) methods. Results of this quantitative analysis suggest that false positives were more likely when using FA data (rather than original baseline data) as the initial treatment baseline. However, both sources of baseline data may have acceptably low levels of false positives for practical use. Overall, the findings provide preliminary quantitative support for the conclusion that determinations of effective treatment may be easily obtained using different sources of baseline data. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
167. Identifying error and accurately interpreting environmental DNA metabarcoding results: A case study to detect vertebrates at arid zone waterholes.
- Author
-
Furlan, Elise M., Davis, Jenny, and Duncan, Richard P.
- Subjects
- *
ARID regions , *DNA , *FALSE positive error , *DNA replication - Abstract
Environmental DNA (eDNA) metabarcoding surveys enable rapid, noninvasive identification of taxa from trace samples with wide‐ranging applications from characterizing local biodiversity to identifying food‐web interactions. However, the technique is prone to error from two major sources: (a) contamination through foreign DNA entering the workflow, and (b) misidentification of DNA within the workflow. Both types of error have the potential to obscure true taxon presence or to increase taxonomic richness by incorrectly identifying taxa as present at sample sites, but multiple error sources can remain unaccounted for in metabarcoding studies. Here, we use data from an eDNA metabarcoding study designed to detect vertebrate species at waterholes in Australia's arid zone to illustrate where and how in the workflow errors can arise, and how to mitigate those errors. We detected the DNA of 36 taxa spanning 34 families, 19 orders and five vertebrate classes in water samples from waterholes, demonstrating the potential for eDNA metabarcoding surveys to provide rapid, noninvasive detection in remote locations, and to widely sample taxonomic diversity from aquatic through to terrestrial taxa. However, we initially identified 152 taxa in the samples, meaning there were many false positive detections. We identified the sources of these errors, allowing us to design a stepwise process to detect and remove error, and provide a template to minimize similar errors that are likely to arise in other metabarcoding studies. Our findings suggest eDNA metabarcoding surveys need to be carefully conducted and screened for errors to ensure their accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
168. Strategies in adjusting for multiple comparisons: A primer for pediatric surgeons.
- Author
-
Staffa, Steven J. and Zurakowski, David
- Abstract
In pediatric surgery research, the issue of multiple comparisons commonly arises when there are multiple patient or experimental groups being compared two at a time on an outcome of interest. Performing multiple statistical comparisons increases the likelihood of finding a false positive result when there truly are no statistically significant group differences (falsely rejecting the null hypothesis when it is true). In order to control for the risk of false positive results, there are several statistical approaches that surgeons should consider in collaboration with a biostatistician when performing a study that is prone to the issue of false discovery related to multiple comparisons. It is becoming increasingly common for high impact journals to require authors to carefully consider multiplicity in their studies. Therefore, the objective of this primer is to provide surgeons with a useful guide and recommendations on how to go about taking multiple comparisons into account to keep false positive results at an acceptable level. We provide background on the issue of multiple comparisons and risk of type I error and guidance on statistical approaches (i.e. multiple comparisons procedures) that can be implemented to control the type I false positive error rate based on the statistical analysis plan. These include, but are not limited to, the Bonferroni correction, the False Discovery Rate (FDR) approach, Tukey's procedure, Scheffé's procedure, Holm's procedure, and Dunnett's procedure. We present the results of the various approaches following one-way analysis of variance (ANOVA) using a hypothetical surgical research example of the comparison between three experimental groups of rats on skin defect coverage for experimental spina bifida: the TRASCET group, sham control, and saline control. The ultimate decision in accounting for multiple comparisons is situation-dependent and surgeons should work with their statistical colleagues to ensure the best approach for controlling the type I error rate and interpreting the evidence when making multiple inferences and comparisons. The risk of falsely rejecting the null hypothesis increases when multiple hypotheses are tested using the same data. Surgeons should be aware of the available approaches and considerations to take into account multiplicity in the statistical plan or protocol of their clinical and basic science research studies. This strategy will improve their study design and ensure the most appropriate analysis of their data. Not adjusting for multiple comparisons can lead to misleading presentation of evidence to the surgical research community by exaggerating treatment differences or effects. Review article. N/A [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
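A brief illustration of several of the adjustment procedures named in the primer above, using statsmodels. The p-values and group data are hypothetical, not the TRASCET/sham/saline data from the article.

```python
# Adjust a small set of pairwise p-values with Bonferroni, Holm and Benjamini-Hochberg FDR,
# and run Tukey's HSD on a made-up three-group, one-way ANOVA layout.
import numpy as np
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.multicomp import pairwise_tukeyhsd

raw_p = np.array([0.012, 0.030, 0.041])           # three pairwise comparisons
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(f"{method:10s} adjusted p = {np.round(adj_p, 3)}  reject = {reject}")

# Tukey's HSD for three groups
rng = np.random.default_rng(5)
values = np.concatenate([rng.normal(10, 2, 8), rng.normal(12, 2, 8), rng.normal(10.5, 2, 8)])
groups = ["A"] * 8 + ["B"] * 8 + ["C"] * 8
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```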
169. A Three-Treatment Two-Stage Design for Selection of a Candidate Formulation and Subsequent Demonstration of Bioequivalence.
- Author
-
Fuglsang, Anders
- Abstract
This paper introduces a two-stage bioequivalence design involving the selection of one out of two candidate formulations at an initial stage and quantifies the overall power (chance of ultimately showing bioequivalence) in a range of scenarios with CVs ranging from 0.1 to 1. The methods introduced are derivatives of the methods invented in 2008 by Diane Potvin and co-workers (Pharm Stat. 7(4): 245-262, 2008). The idea is to test the two candidate formulations independently in an initial stage, selecting one of these formulations on the basis of the observed point estimates, and to run, when necessary, a second stage of the trial with pooling of data. Alpha levels are identified which are shown to control the maximum type I error at 5%. Results, expressed as powers and sample sizes, are also published for scenarios where the two formulations are far apart in terms of the match against the reference (one GMR being 0.80, the other GMR being 0.95) and in scenarios where the two test formulations have a better actual match (one GMR being 0.90, the other GMR being 0.95). The methods seem to be compliant with the wording of current guidelines from EMA, FDA, WHO, and Health Canada. Therefore, the work presented here may be useful for companies developing drugs whose approval hinges on in vivo proof of bioequivalence and where traditional in vitro screening methods (such as dissolution trials) may have poor ability to predict the best candidate. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
170. “Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher.
- Author
-
Rubin, Mark
- Abstract
Fisher (1945a, 1945b, 1955, 1956, 1960) criticised the Neyman-Pearson approach to hypothesis testing by arguing that it relies on the assumption of “repeated sampling from the same population.” The present article considers the responses to this criticism provided by Pearson (1947) and Neyman (1977). Pearson interpreted alpha levels in relation to imaginary replications of the original test. This interpretation is appropriate when test users are sure that their replications will be equivalent to one another. However, by definition, scientific researchers do not possess sufficient knowledge about the relevant and irrelevant aspects of their tests and populations to be sure that their replications will be equivalent to one another. Pearson also interpreted the alpha level as a personal rule that guides researchers’ behavior during hypothesis testing. However, this interpretation fails to acknowledge that the same researcher may use different alpha levels in different testing situations. Addressing this problem, Neyman proposed that the average alpha level adopted by a particular researcher can be viewed as an indicator of that researcher’s typical Type I error rate. Researchers’ average alpha levels may be informative from a metascientific perspective. However, they are not useful from a scientific perspective. Scientists are more concerned with the error rates of specific tests of specific hypotheses, rather than the error rates of their colleagues. It is concluded that neither Neyman nor Pearson adequately rebutted Fisher’s “repeated sampling” criticism. Fisher’s significance testing approach is briefly considered as an alternative to the Neyman-Pearson approach. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
171. Comparisons of global tests on intersection hypotheses and their application in matched parallel gatekeeping procedures.
- Author
-
Ouyang, John, Zhang, Peter, Carroll, Kevin J., Lee, Jennifer, and Koch, Gary
- Subjects
- *
FALSE positive error , *TREATMENT effectiveness , *DRUG control , *HYPOTHESIS - Abstract
A clinical trial often has primary and secondary endpoints and comparisons of high and low doses of a study drug to a control. Multiplicity is not only caused by the multiple comparisons of study drugs versus the control, but also from the hierarchical structure of the hypotheses. Closed test procedures were proposed as general methods to address multiplicity. Two commonly used tests for intersection hypotheses in closed test procedures are the Simes test and the average method. When the treatment effect of a less efficacious dose is not much smaller than the treatment effect of a more efficacious dose for a specific endpoint, the average method has better power than the Simes test for the comparison of two doses versus control. Accordingly, for inferences for primary and secondary endpoints, the matched parallel gatekeeping procedure based on the Simes test for testing intersection hypotheses is extended here to allow the average method for such testing. This procedure is further extended to clinical trials with more than two endpoints as well as to clinical trials with more than two active doses and a control. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
172. START: single‐to‐double arm transition design for phase II clinical trials.
- Author
-
Shi, Haolun, Zhang, Teng, and Yin, Guosheng
- Subjects
- *
CLINICAL trials , *PHASE transitions , *EXPERIMENTAL design , *TREATMENT effectiveness , *FALSE positive error - Abstract
Summary: Phase II clinical trials designed for evaluating a drug's treatment effect can be either single‐arm or double‐arm. A single‐arm design tests the null hypothesis that the response rate of a new drug is lower than a fixed threshold, whereas a double‐arm scheme takes a more objective comparison of the response rate between the new treatment and the standard of care through randomization. Although the randomized design is the gold standard for efficacy assessment, various situations may arise where a single‐arm pilot study prior to a randomized trial is necessary. To combine the single‐ and double‐arm phases and pool the information together for better decision making, we propose a Single‐To‐double ARm Transition design (START) with switching hypotheses tests, where the first stage compares the new drug's response rate with a minimum required level and imposes a continuation criterion, and the second stage utilizes randomization to determine the treatment's superiority. We develop a software package in R to calibrate the frequentist error rates and perform simulation studies to assess the trial characteristics. Finally, a metastatic pancreatic cancer trial is used for illustrating the decision rules under the proposed START design. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
173. Using dual‐criteria methods to supplement visual inspection: Replication and extension.
- Author
-
Falligant, John Michael, McNulty, Molly K., Hausman, Nicole L., and Rooker, Griffin W.
- Subjects
- *
MEDICAL research , *PROBABILITY theory , *STATISTICAL hypothesis testing , *DATA analysis , *RESEARCH bias , *CASE-control method , *PUBLICATION bias - Abstract
The dual‐criteria and conservative dual‐criteria methods effectively supplement visual analysis with both simulated and published datasets. However, extant research evaluating the probability of observing false positive outcomes with published data may be affected by case selection bias and publication bias. Thus, the probability of obtaining false positive outcomes using these methods with data collected in the course of clinical care is unknown. We extracted baseline data from clinical datasets using a consecutive controlled case‐series design and calculated the proportion of false positive outcomes for baseline phases of various lengths. Results replicated previous findings from Lanovaz, Huxley, and Dufour (2017), as the proportion of false positive outcomes generally decreased as the number of points in Phase B (but not Phase A) increased using both methods. Extending these findings, results also revealed differences in the rate of false positive outcomes across different types of baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
174. Assessing the Batch Effects on Design and Analysis of Equivalence and Noninferiority Studies.
- Author
-
Liao, Jason J. Z., Yu, Ziji, Jiang, Xinhua, and Heyse, Joseph F.
- Subjects
- *
FALSE positive error , *DATA analysis - Abstract
Batch effects are sources of variation in drug substances and drug products in terms of potency. Although the implications of batch effects have been widely recognized in the statistical literature, batch variability information is usually not considered in designing and analyzing clinical studies. In this article, the impact of batch variability is systematically explored for both the design and data analysis stages of equivalence and noninferiority clinical studies. Designing studies with more batches can increase the probability of success in demonstrating equivalence or noninferiority while maintaining control of the Type I error. Ignoring the batch effect in the data analysis may cause marked underestimation of the variability, which can lead to Type I error inflation. To achieve a desired precision of the treatment estimate, a formula is provided to select an appropriate number of batches. Datasets from a phase I oncology study and a phase III vaccine study are used to illustrate the importance of considering batch effects. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
175. P-values - a chronic conundrum.
- Author
-
Gao, Jian
- Subjects
- *
FALSE positive error , *STATISTICAL hypothesis testing , *NULL hypothesis , *MEDICAL research , *SCIENTIFIC community - Abstract
Background: In medical research and practice, the p-value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value. Main Text: The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part, because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p-values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%. Conclusions: A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, the calibrated p-values (the probability that a treatment does not work) should be reported in research papers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
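A worked version of the calibration mentioned at the end of the abstract above: the Sellke–Bayarri–Berger bound −e·p·ln(p) on the Bayes factor for the null, converted to a lower bound on the probability that the null is true under 50:50 prior odds. With p = 0.05 this reproduces the "at least 28.9%" figure quoted in the abstract.

```python
# Calibrated p-value: lower bound on P(H0 | data) from an observed p-value.
import math

def calibrated_null_probability(p, prior_null=0.5):
    """Lower bound on P(H0 | data) from a p-value (valid for 0 < p < 1/e)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("calibration requires 0 < p < 1/e")
    bf_null = -math.e * p * math.log(p)          # minimum Bayes factor in favour of H0
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bf_null
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} ->  P(H0 is true) >= {calibrated_null_probability(p):.3f}")
# p = 0.05 gives 0.289, i.e. the 28.9% figure in the abstract.
```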
176. Do we need to adjust for interim analyses in a Bayesian adaptive trial design?
- Author
-
Ryan, Elizabeth G., Brock, Kristian, Gates, Simon, and Slade, Daniel
- Subjects
- *
FALSE positive error , *BAYESIAN analysis , *EXPERIMENTAL design , *TREATMENT effectiveness , *DECISION making - Abstract
Background: Bayesian adaptive methods are increasingly being used to design clinical trials and offer several advantages over traditional approaches. Decisions at analysis points are usually based on the posterior distribution of the treatment effect. However, there is some confusion as to whether control of type I error is required for Bayesian designs as this is a frequentist concept. Methods: We discuss the arguments for and against adjusting for multiplicities in Bayesian trials with interim analyses. With two case studies we illustrate the effect of including interim analyses on type I/II error rates in Bayesian clinical trials where no adjustments for multiplicities are made. We propose several approaches to control type I error, and also alternative methods for decision-making in Bayesian clinical trials. Results: In both case studies we demonstrated that the type I error was inflated in the Bayesian adaptive designs through incorporation of interim analyses that allowed early stopping for efficacy and without adjustments to account for multiplicity. Incorporation of early stopping for efficacy also increased the power in some instances. An increase in the number of interim analyses that only allowed early stopping for futility decreased the type I error, but also decreased power. An increase in the number of interim analyses that allowed for either early stopping for efficacy or futility generally increased type I error and decreased power. Conclusions: Currently, regulators require demonstration of control of type I error for both frequentist and Bayesian adaptive designs, particularly for late-phase trials. To demonstrate control of type I error in Bayesian adaptive designs, adjustments to the stopping boundaries are usually required for designs that allow for early stopping for efficacy as the number of analyses increase. If the designs only allow for early stopping for futility then adjustments to the stopping boundaries are not needed to control type I error. If one instead uses a strict Bayesian approach, which is currently more accepted in the design and analysis of exploratory trials, then type I errors could be ignored and the designs could instead focus on the posterior probabilities of treatment effects of clinically-relevant values. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
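A simulation in the spirit of the case studies above: repeatedly analysing accumulating data and stopping for efficacy whenever the posterior probability of benefit exceeds 0.975 (vague prior, known variance) inflates the frequentist type I error under the null. The look schedule, sample sizes, and threshold are illustrative assumptions.

```python
# Type I error of an unadjusted "stop when posterior probability of benefit > 0.975" rule
# applied at several interim looks, compared with a single final analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
looks = [25, 50, 75, 100]        # cumulative patients per arm at each analysis
sigma, threshold, n_sim = 1.0, 0.975, 5000

multi_look = single_look = 0
for _ in range(n_sim):
    trt = rng.normal(0.0, sigma, looks[-1])      # null is true: no treatment effect
    ctl = rng.normal(0.0, sigma, looks[-1])
    stopped = False
    for n in looks:
        diff = trt[:n].mean() - ctl[:n].mean()
        se = sigma * np.sqrt(2.0 / n)
        post_prob = stats.norm.cdf(diff / se)    # P(effect > 0 | data) under a vague prior
        if post_prob > threshold:
            stopped = True
            break
    multi_look += stopped
    # comparator: the same rule applied only once, at the final analysis
    diff = trt.mean() - ctl.mean()
    single_look += stats.norm.cdf(diff / (sigma * np.sqrt(2.0 / looks[-1]))) > threshold

print(f"type I error with the rule applied at all four analyses: {multi_look / n_sim:.3f}")
print(f"type I error with a single final analysis               : {single_look / n_sim:.3f}")
```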
177. Methods to control the empirical type I error rate in average bioequivalence tests for highly variable drugs.
- Author
-
Deng, Yuhao and Zhou, Xiao-Hua
- Subjects
- *
FALSE positive error , *GENERIC drugs , *ERROR rates , *PHARMACOLOGY , *EMPIRICAL research , *DRUGS - Abstract
Average bioequivalence tests are used in clinical trials to determine whether a generic drug has the same effect as an original drug in the population. For highly variable drugs whose intra-subject variances of direct drug effects are high, extra criteria are needed in bioequivalence studies. Currently used average bioequivalence tests for highly variable drugs recommended by the European Medicines Agency and the US Food and Drug Administration use sample estimators in the null hypotheses of interest. They cannot control the empirical type I error rate, so the consumer's risk is higher than the predetermined level. In this paper, we propose two new statistically sound methods that can control the empirical type I error rate without involving any sample estimators in the null hypotheses. In the proposed methods, we consider the average level of direct drug effects and the intra-subject variance of the direct drug effects. The first proposed method tests the latter parameter first to determine whether a product should be regarded as a highly variable drug, and then tests the former using corresponding bioequivalence limits. The second proposed method tests these two parameters simultaneously to capture the bioequivalence region. Extensive simulations are done to compare these methods. The simulation results show that the proposed methods have good performance on controlling the empirical type I error rate. The proposed methods are useful for pharmaceutical manufacturers and regulators. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
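For orientation, a bare-bones two one-sided tests (TOST) check of average bioequivalence on the log scale with the conventional 0.80–1.25 limits. This is the standard unscaled procedure only; the reference-scaled criteria for highly variable drugs discussed above, and the methods proposed in the paper, are not implemented. A parallel-group layout and the simulated PK values are assumptions.

```python
# TOST for average bioequivalence on the log scale (parallel-group sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 24
test = np.log(rng.lognormal(mean=np.log(100) + np.log(0.95), sigma=0.25, size=n))
ref  = np.log(rng.lognormal(mean=np.log(100),                sigma=0.25, size=n))

diff = test.mean() - ref.mean()
se = np.sqrt(test.var(ddof=1) / n + ref.var(ddof=1) / n)
df = 2 * n - 2                                  # simple choice; Welch df would also be reasonable
lower, upper = np.log(0.80), np.log(1.25)       # conventional bioequivalence limits

t_lower = (diff - lower) / se                   # H0: ratio <= 0.80
t_upper = (diff - upper) / se                   # H0: ratio >= 1.25
p_tost = max(stats.t.sf(t_lower, df), stats.t.cdf(t_upper, df))

ci = np.exp(diff + np.array([-1, 1]) * stats.t.ppf(0.95, df) * se)   # 90% CI for the ratio
print(f"geometric mean ratio = {np.exp(diff):.3f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"TOST p-value = {p_tost:.4f}  ->  bioequivalent at the 5% level: {p_tost < 0.05}")
```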
178. Design, analysis and reporting of multi-arm trials and strategies to address multiple testing.
- Author
-
Odutayo, Ayodele, Gryaznov, Dmitry, Copsey, Bethan, Monk, Paul, Speich, Benjamin, Roberts, Corran, Vadher, Karan, Dutton, Peter, Briel, Matthias, Hopewell, Sally, Altman, Douglas G, and the ASPIRE study group
- Subjects
- *
FALSE positive error , *CRIME & the press , *BONFERRONI correction , *ERROR rates , *DECISION making , *RESEARCH ethics , *EXPERIMENTAL design , *STATISTICS , *CLINICAL trials , *LONGITUDINAL method - Abstract
Background: It is unclear how multiple treatment comparisons are managed in the analysis of multi-arm trials, particularly related to reducing type I (false positive) and type II (false negative) errors. Methods: We conducted a cohort study of clinical-trial protocols that were approved by research ethics committees in the UK, Switzerland, Germany and Canada in 2012. We examined the use of multiple-testing procedures to control the overall type I error rate. We created a decision tool to determine the need for multiple-testing procedures. We compared the result of the decision tool to the analysis plan in the protocol. We also compared the pre-specified analysis plans in trial protocols to their publications. Results: Sixty-four protocols for multi-arm trials were identified, of which 50 involved multiple testing. Nine of 50 trials (18%) used a single-step multiple-testing procedure such as a Bonferroni correction and 17 (38%) used an ordered sequence of primary comparisons to control the overall type I error. Based on our decision tool, 45 of 50 protocols (90%) required use of a multiple-testing procedure but only 28 of the 45 (62%) accounted for multiplicity in their analysis or provided a rationale if no multiple-testing procedure was used. We identified 32 protocol-publication pairs, of which 8 planned a global-comparison test and 20 planned a multiple-testing procedure in their trial protocol. However, four of these eight trials (50%) did not use the global-comparison test. Likewise, 3 of the 20 trials (15%) did not perform the multiple-testing procedure in the publication. The sample size of our study was small and we did not have access to statistical-analysis plans for the included trials in our study. Conclusions: Strategies to reduce type I and type II errors are inconsistently employed in multi-arm trials. Important analytical differences exist between planned analyses in clinical-trial protocols and subsequent publications, which may suggest selective reporting of analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
179. p-value Problems? An Examination of Evidential Value in Criminology.
- Author
-
Wooditch, Alese, Fisher, Ryan, Wu, Xiaoyun, and Johnson, Nicole J.
- Subjects
- *
MOTIVATIONAL interviewing , *FALSE positive error , *DRUG control , *CRIMINOLOGY , *POLICE legitimacy , *PROCEDURAL justice - Abstract
Objectives: This study aims to assess the evidential value of the knowledgebase in criminology after accounting for the presence of potential Type I errors. Methods: The present study examines the distribution of 1248 p-values (that inform 84 statistically significant outcomes across 26 systematic reviews) in meta-analyses on the topic of crime and justice published by the Campbell Collaboration (CC) using p-curve analysis. Results: The distribution of all CC p-values have a significant cluster of p-values immediately below 0.05, which is indicative of p-hacking. Evidential value (right skewed p-curves) is detected in most meta-analytic topic areas but not motivational interviewing (substance use outcome), sex offender treatment (sexual/general recidivism), police legitimacy (procedural justice), street-level drug law enforcement (total crime), and treatment effectiveness in secure corrections (juvenile recidivism). Conclusions: More studies, especially carefully designed and implemented randomized experiments with sufficiently large sample sizes, are needed before we are able to affirm the presence of evidential value and replicability of studies in all CC topic areas with confidence. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
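A sketch of one ingredient of p-curve analysis as used in the study above: among statistically significant results, a surplus of p-values just below .05 (rather than well below it) is the pattern associated with p-hacking. The p-values below are fabricated for illustration, and the full p-curve procedure includes additional continuous tests not shown here.

```python
# Binomial-test ingredient of a p-curve: are too many significant p-values in (.025, .05)?
from scipy import stats

significant_p = [0.003, 0.011, 0.021, 0.034, 0.041, 0.044, 0.046, 0.048, 0.049]

high = sum(0.025 < p < 0.05 for p in significant_p)   # "just under .05"
total = len(significant_p)

# Under a true null, significant p-values are uniform on (0, .05), so about half should
# fall above .025; a surplus in (.025, .05) is the left-skew / p-hacking signal.
binom_p = stats.binomtest(high, total, p=0.5, alternative="greater").pvalue
print(f"{high}/{total} significant p-values fall in (.025, .05); binomial test p = {binom_p:.3f}")
```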
180. Outcome Reporting Bias in Randomized Experiments on Substance Use Disorders.
- Author
-
Wooditch, Alese, Sloas, Lincoln B., Wu, Xiaoyun, and Key, Aleisha
- Subjects
- *
SUBSTANCE-induced disorders , *CLINICAL trial registries , *FALSE positive error , *SUBSTANCE abuse , *CLINICAL trials , *CRIMINOLOGY - Abstract
Objectives: This cohort study explores the prevalence and effect of suspected outcome reporting bias in clinical trials on substance use disorders. Methods: Protocols on the ClinicalTrials.gov registry are compared with the corresponding trial reports for 95 clinical trials across 3,162 outcomes. Variation in average effect size is examined by completeness and accuracy of reporting using ordinary least squares regression with robust standard errors (Eicker-Huber-White sandwich estimator). Results: Trial reports are frequently incomplete and inconsistent with their protocol. The most common biased reporting practices are adding outcomes that were not prespecified in the protocol, insufficiently prespecifying outcomes, and omitting outcomes that were prespecified in the protocol. There is a linear trend between the number of different biased reporting practices the trialist(s) engaged in and mean study-level Cohen's d (+ 0.214 with each additional type of biased reporting practice). Trials with omitted pre-specified outcomes have a significantly higher Cohen's d on average when compared to trials that did not omit such outcomes (+ 0.315). Added outcomes have a Cohen's d that is 0.385 higher in comparison to reported outcomes that were pre-specified on the protocol. Conclusions: The magnitude of outcome reporting bias raises considerable concern regarding inflated type I error rates. Implications for clinical trials on substance abuse, and randomized experiments in criminology more generally, are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
181. The harmonic mean χ2‐test to substantiate scientific findings.
- Author
-
Held, Leonhard
- Subjects
TREATMENT effectiveness ,MEDICAL equipment ,FALSE positive error ,CHI-squared test ,DEGREES of freedom ,HEART failure - Abstract
Summary: Statistical methodology plays a crucial role in drug regulation. Decisions by the US Food and Drug Administration or European Medicines Agency are typically made based on multiple primary studies testing the same medical product, where the two‐trials rule is the standard requirement, despite shortcomings. A new approach is proposed for this task based on the harmonic mean of the squared study‐specific test statistics. Appropriate scaling ensures that, for any number of independent studies, the null distribution is a χ2‐distribution with 1 degree of freedom. This gives rise to a new method for combining one‐sided p‐values and calculating confidence intervals for the overall treatment effect. Further properties are discussed and a comparison with the two‐trials rule is made, as well as with alternative research synthesis methods. An attractive feature of the new approach is that a claim of success requires each study to be convincing on its own to a certain degree depending on the overall level of significance and the number of studies. The new approach is motivated by and applied to data from five clinical trials investigating the effect of carvedilol for the treatment of patients with moderate to severe heart failure. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
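A quick numerical check of the scaling property stated in the abstract above: the harmonic mean of the squared study-specific test statistics, scaled by the number of studies, has a χ²(1) null distribution for any number of independent studies. Only that null distribution is verified by simulation here; the full procedure (one-sided p-values, confidence intervals, comparison with the two-trials rule) is not reproduced.

```python
# Simulate n^2 / sum(1/z_i^2) under the null and compare its quantiles with chi-square(1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_studies, n_sim = 5, 200_000
z = rng.normal(size=(n_sim, n_studies))                    # null: study z-statistics ~ N(0,1)
harmonic_stat = n_studies**2 / (1.0 / z**2).sum(axis=1)    # scaled harmonic mean of z_i^2

for q in (0.90, 0.95, 0.99):
    emp = np.quantile(harmonic_stat, q)
    ref = stats.chi2.ppf(q, df=1)
    print(f"{q:.2f} quantile: simulated {emp:.3f}  vs  chi2(1) {ref:.3f}")
```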
182. Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions.
- Author
-
Zhang, Min, Yu, Youfei, Wang, Shikun, Salvatore, Maxwell, G. Fritsche, Lars, He, Zihuai, and Mukherjee, Bhramar
- Subjects
- *
FALSE positive error , *ERROR rates , *REGRESSION analysis - Abstract
The statistical practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that the misspecification of main effects can potentially cause severe type I error inflation in tests for interactions, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize its impact on common tests for statistical interaction. We then propose some simple alternatives that fix the issue of potential type I error inflation in testing interaction due to main effect misspecification. We show that when using the sandwich variance estimator for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effect under a generalized additive model can largely reduce or often remove bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes regardless of the independence assumption. We show, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects does not lead to power loss asymptotically relative to a correctly specified main effect model. Our simulation study further demonstrates the empirical fact that using flexible models for the main effects does not result in a significant loss of power for testing interaction in general. Our results provide an improved understanding of the strengths and limitations for tests of interaction in the presence of main effect misspecification. Using data from a large biobank study "The Michigan Genomics Initiative", we present two examples of interaction analysis in support of our results. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
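A short illustration of one fix discussed above for a quantitative outcome and two independent factors: testing the interaction with a heteroskedasticity-robust (sandwich) covariance. The data-generating model (a nonlinear main effect and no true interaction) is an illustrative assumption, not the Michigan Genomics Initiative data.

```python
# Interaction test with model-based vs sandwich (HC3) standard errors under a
# misspecified (linear) main effect for x1.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 500
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.binomial(1, 0.5, n)})
df["y"] = df.x1**2 + rng.normal(size=n)          # nonlinear main effect, no true interaction

model = smf.ols("y ~ x1 * x2", data=df)          # fits x1, x2 and the x1:x2 product term
fit_naive = model.fit()                          # model-based (OLS) covariance
fit_robust = model.fit(cov_type="HC3")           # sandwich covariance

print(f"interaction p, model-based SE: {fit_naive.pvalues['x1:x2']:.3f}")
print(f"interaction p, sandwich SE   : {fit_robust.pvalues['x1:x2']:.3f}")
```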
183. Interim Monitoring for Futility in Clinical Trials With Two Co-Primary Endpoints Using Prediction.
- Author
-
Asakura, Koko, Evans, Scott R., and Hamasaki, Toshimitsu
- Subjects
- *
CLINICAL trials monitoring , *FORECASTING , *TREATMENT effectiveness , *FALSE positive error - Abstract
We discuss using prediction as a flexible and practical approach for monitoring futility in clinical trials with two co-primary endpoints (CPE). This approach is appealing in that it provides quantitative evaluation of potential effect sizes and associated precision, and can be combined with flexible error-spending strategies. We extend prediction of effect size estimates and the construction of predicted intervals to the two CPE case, and illustrate interim futility monitoring of treatment effects using prediction with an example. We also discuss alternative approaches based on the conditional and predictive powers, compare these methods and provide some guidance on the use of prediction for better decision in clinical trials with CPE. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
184. A Geometry-Based Multiple Testing Correction for Contingency Tables by Truncated Normal Distribution.
- Author
-
Basak, Tapati, Nagashima, Kazuhisa, Kajimoto, Satoshi, Kawaguchi, Takahisa, Tabara, Yasuharu, Matsuda, Fumihiko, and Yamada, Ryo
- Abstract
Inference procedure is a critical step of experimental research for drawing scientific conclusions, especially in multiple testing. The false positive rate increases unless the unadjusted marginal p-values are corrected. Therefore, a multiple testing correction is necessary to adjust the p-values based on the number of tests to control type I error. We propose a multiple testing correction of the MAX-test for a contingency table, where multiple χ2-tests are applied, based on a truncated normal distribution (TND) estimation method by Botev. The table and tests are defined geometrically by contour hyperplanes in the degrees of freedom (df) dimensional space. A linear algebraic method called spherization transforms the shape of the space defined by the contour hyperplanes of the distribution of tables sharing the same marginal counts, so that the stochastic distributions of these tables are transformed into a standard multivariate normal distribution in df-dimensional space. Geometrically, the p-value is defined by a convex polytope consisting of truncating hyperplanes of the test's contour lines in df-dimensional space. The TND approach of the Botev method was used to estimate the corrected p-value. Finally, the features of our approach were extracted using real GWAS data. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
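The entry above corrects the p-value of a MAX-type statistic over several correlated χ2-tests using Botev's truncated-normal estimator. The sketch below obtains a comparable correction by brute-force Monte Carlo from a multivariate normal with an assumed correlation matrix; the correlations and observed statistics are invented for illustration, and the geometric/TND machinery of the paper is not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical setting: three correlated 1-df tests on the same contingency
# table (e.g., dominant/recessive/additive codings of a GWAS marker).
corr = np.array([[1.0, 0.7, 0.8],
                 [0.7, 1.0, 0.9],
                 [0.8, 0.9, 1.0]])
z_obs = np.array([2.4, 2.9, 2.6])        # observed signed test statistics
max_obs = np.abs(z_obs).max()

# Brute-force Monte Carlo correction for the MAX statistic: the corrected p is
# the probability that the largest absolute component of a correlated normal
# vector exceeds the observed maximum under the null.
draws = rng.multivariate_normal(np.zeros(3), corr, size=200_000)
p_corrected = np.mean(np.abs(draws).max(axis=1) >= max_obs)
p_unadjusted = 2 * stats.norm.sf(max_obs)   # naive p-value for the best test alone
print(f"naive p = {p_unadjusted:.4f}, MAX-corrected p = {p_corrected:.4f}")
```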
185. Difficulties in benchmarking ecological null models: an assessment of current methods.
- Author
-
Molina, Chai and Stone, Lewi
- Subjects
- *
STATISTICAL hypothesis testing , *ECOLOGICAL models , *FALSE positive error , *BIOTIC communities , *BENCHMARKING (Management) - Abstract
Identifying species interactions and detecting when ecological communities are structured by them is an important problem in ecology and biogeography. Ecologists have developed specialized statistical hypothesis tests to detect patterns indicative of community‐wide processes in their field data. In this respect, null model approaches have proved particularly popular. The freedom allowed in choosing the null model and statistic to construct a hypothesis test leads to a proliferation of possible hypothesis tests from which ecologists can choose to detect these processes. Here, we point out some serious shortcomings of a popular approach to choosing the best hypothesis test for the ecological problem at hand, which involves benchmarking different hypothesis tests by assessing their performance on artificially constructed data sets. Terminological errors concerning the use of Type I and Type II errors that underlie these approaches are discussed. We argue that the key benchmarking methods proposed in the literature are not a sound guide for selecting null hypothesis tests, and further, that there is no simple way to benchmark null hypothesis tests. Surprisingly, the basic problems identified here do not appear to have been addressed previously, and these methods are still being used to develop and test new null models and summary statistics, from quantifying community structure (e.g., nestedness and modularity) to analyzing ecological networks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
186. Notes on misspecifying the random effects distribution regarding analysis under the AB/BA crossover trial in dichotomous data – a Monte Carlo evaluation.
- Author
-
Zhu, Lixia and Lui, Kung-Jong
- Subjects
- *
MONTE Carlo method , *CROSSOVER trials , *FALSE positive error , *RANDOM effects model , *LOGISTIC regression analysis , *REGRESSION analysis - Abstract
When patient responses are dichotomous under an AB/BA design, we commonly assume the normal random effects logistic regression model. This normality assumption for the random effects is, however, unlikely to hold. Based on the maximum likelihood estimator, we apply Monte Carlo simulation to investigate the impact of incorrectly assuming normal random effects on hypothesis testing and estimation. We find that the type I error rate is not affected by misspecifying the random effects distribution. We further find that the influence of this misspecification on power, bias, mean squared error, coverage probability and the average length of the confidence interval is generally minimal when the variation in responses between patients is small and the number of patients per group is large; it can be substantial when the variation in responses between patients is large and the number of patients per group is small. Thus, the estimated required sample size, or the accuracy of an interval estimator, obtained under the normal random effects assumption can be liberal. We use data comparing a drug with placebo in treating cerebrovascular deficiency to illustrate the potential differences in inference between the random effects distributions considered here. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
187. Response adaptive randomization procedures in seamless phase II/III clinical trials.
- Author
-
Zhu, Hongjian, Piao, Jin, Lee, J. Jack, Hu, Feifang, and Zhang, Lixin
- Subjects
- *
FALSE positive error , *CLINICAL trials , *MATHEMATICAL statistics , *ERROR rates , *ASYMPTOTIC distribution , *MARTINGALES (Mathematics) - Abstract
It is desirable to work efficiently and cost effectively to evaluate new therapies in a time-sensitive and ethical manner without compromising the integrity and validity of the development process. The seamless phase II/III clinical trial has been proposed to meet this need, and its efficient, ethical and economic advantages can be strengthened by its combination with innovative response adaptive randomization (RAR) procedures. In particular, well-designed frequentist RAR procedures can target theoretically optimal allocation proportions, and there are explicit asymptotic results. However, there has been little research into seamless phase II/III clinical trials with frequentist RAR because of the difficulty in performing valid statistical inference and controlling the type I error rate. In this paper, we propose the framework for a family of frequentist RAR designs for seamless phase II/III trials, derive the asymptotic distribution of the parameter estimators using martingale processes and offer solutions to control the type I error rate. The numerical studies demonstrate our theoretical findings and the advantages of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
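To make the idea of response adaptive randomization concrete, the sketch below sequentially skews a binary-outcome trial's allocation towards the arm with the higher estimated success rate, targeting the allocation proportion sqrt(pA)/(sqrt(pA)+sqrt(pB)). It is a generic RAR illustration under assumed success rates, not the frequentist seamless phase II/III framework proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
p_true = {"A": 0.45, "B": 0.45}   # equal true success rates: a null (type I error) scenario

# Counters with add-one smoothing so early allocation probabilities are stable
succ = {"A": 1, "B": 1}
fail = {"A": 1, "B": 1}
assignments = []

for _ in range(400):
    pA = succ["A"] / (succ["A"] + fail["A"])
    pB = succ["B"] / (succ["B"] + fail["B"])
    # RSIHR-type target allocation: sqrt(pA) / (sqrt(pA) + sqrt(pB))
    target_A = np.sqrt(pA) / (np.sqrt(pA) + np.sqrt(pB))
    arm = "A" if rng.random() < target_A else "B"
    response = rng.random() < p_true[arm]
    succ[arm] += int(response)
    fail[arm] += int(not response)
    assignments.append(arm)

n_A = assignments.count("A")
print(f"Final allocation: {n_A} patients to A, {len(assignments) - n_A} to B")
print(f"Estimated success rates: A={succ['A']/(succ['A']+fail['A']):.2f}, "
      f"B={succ['B']/(succ['B']+fail['B']):.2f}")
```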
188. MANOVA: A Procedure Whose Time Has Passed?
- Author
-
Huang, Francis L.
- Subjects
- *
MULTIVARIATE analysis , *ANALYSIS of variance , *EDUCATIONAL psychology , *FALSE positive error - Abstract
Multivariate analysis of variance (MANOVA) is a statistical procedure commonly used in fields such as education and psychology. However, MANOVA's popularity may actually be for the wrong reasons. The large majority of published research using MANOVA focuses on univariate research questions rather than on the multivariate questions that MANOVA is said to specifically address. Given the more complicated and limited nature of interpreting MANOVA effects (which researchers may not actually be interested in, given the post hoc strategies actually employed) and the availability of various flexible and well-known statistical alternatives, I suggest that researchers consult these better-known, robust, and flexible procedures instead, given a proper match with the research question of interest. Just because a researcher has multiple dependent variables of interest does not mean that a MANOVA should be used at all. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
189. Group-Sequential Three-Arm Non-inferiority Clinical Trials
- Author
-
Hamasaki, Toshimitsu, Asakura, Koko, Evans, Scott R., and Ochiai, Toshimitsu
- Published
- 2016
- Full Text
- View/download PDF
190. Interim Evaluation of Efficacy in Clinical Trials with Two Co-primary Endpoints
- Author
-
Hamasaki, Toshimitsu, Asakura, Koko, Evans, Scott R., and Ochiai, Toshimitsu
- Published
- 2016
- Full Text
- View/download PDF
191. Sample Size Recalculation in Clinical Trials with Two Co-primary Endpoints
- Author
-
Hamasaki, Toshimitsu, Asakura, Koko, Evans, Scott R., and Ochiai, Toshimitsu
- Published
- 2016
- Full Text
- View/download PDF
192. Interim Evaluation of Efficacy or Futility in Clinical Trials with Two Co-primary Endpoints
- Author
-
Hamasaki, Toshimitsu, Asakura, Koko, Evans, Scott R., and Ochiai, Toshimitsu
- Published
- 2016
- Full Text
- View/download PDF
193. Basics of Biostatistics
- Author
-
Iocca, Oreste and Iocca, Oreste, editor
- Published
- 2016
- Full Text
- View/download PDF
194. Hypothesis Testing
- Author
-
Lee, Cheng-Few, Lee, John, Chang, Jow-Ran, and Tai, Tzu
- Published
- 2016
- Full Text
- View/download PDF
195. The effect of autocorrelation when performing the approximated permutation test
- Author
-
Schouten, Amerik and Thor, Linus
- Abstract
In meteorology, permutation tests are a commonly recommended tool because standard parametric methods of inference are often insufficient when analysing weather. Other methods with less stringent assumptions, such as permutation testing, are therefore used. Time series of weather data contain both temporal and spatial autocorrelation, which may violate the exchangeability assumption of the permutation test. This paper explores the effect of autocorrelation in the data on the approximated permutation test. A literature study finds that the type I error of the test, i.e. rejecting the null hypothesis when it is true, increases when the exchangeability assumption is violated. A simulation study was then performed on three meteorological variables, temperature, wind and precipitation, in Iberia following a cold spell in the US. The increase in type I error was not found to be significant in the simulated data, with autocorrelation ranging from 0 to 0.9.
- Published
- 2023
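The entry above examines what autocorrelation does to the approximated (Monte Carlo) permutation test. The sketch below runs such a test on two AR(1) series generated under the null; repeating it many times and counting rejections would estimate the empirical type I error rate. The AR(1) model, series length, and autocorrelation value are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1(n, rho, rng):
    """Generate an AR(1) series to mimic temporally autocorrelated weather data."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

def perm_test_mean_diff(a, b, n_perm=5000, rng=None):
    """Approximated (Monte Carlo) permutation test for a difference in means."""
    rng = rng if rng is not None else np.random.default_rng()
    obs = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(perm[:a.size].mean() - perm[a.size:].mean()) >= obs
    return (hits + 1) / (n_perm + 1)

# Two periods of the same autocorrelated process, so the null hypothesis is true;
# repeating this experiment many times estimates the empirical type I error rate.
a = ar1(100, rho=0.6, rng=rng)
b = ar1(100, rho=0.6, rng=rng)
print("Permutation p-value:", perm_test_mean_diff(a, b, rng=rng))
```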
196. Improved family-wise error rate control in multiple equivalence testing
- Author
-
Leday, Gwenaël G.R., Hemerik, Jesse, Engel, Jasper, and van der Voet, Hilko
- Abstract
Equivalence testing is an important component of safety assessments, used for example by the European Food Safety Authority, to allow new food or feed products on the market. The aim of such tests is to demonstrate equivalence of characteristics of test and reference crops. Equivalence tests are typically univariate and applied to each measured analyte (characteristic) separately without multiplicity correction. This increases the probability of making false claims of equivalence (type I errors) when evaluating multiple analytes simultaneously. To solve this problem, familywise error rate (FWER) control using Hochberg's method has been proposed. This paper demonstrates that, in the context of equivalence testing, other FWER-controlling methods are more powerful than Hochberg's. Particularly, it is shown that Hommel's method is guaranteed to perform at least as well as Hochberg's and that an “adaptive” version of Bonferroni's method, which uses an estimator of the proportion of non-equivalent characteristics, often substantially outperforms Hommel's method. Adaptive Bonferroni takes better advantage of the particular context of food safety where a large proportion of true equivalences is expected, a situation where other methods are particularly conservative. The different methods are illustrated by their application to two compositional datasets and further assessed and compared using simulated data.
- Published
- 2023
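Following the entry above, the sketch below applies Bonferroni, Hochberg (Simes-Hochberg) and Hommel adjustments to a set of hypothetical per-analyte equivalence p-values with statsmodels, plus a simple Storey-type adaptive Bonferroni step; the paper's own adaptive estimator may differ, and the p-values are invented for illustration.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical per-analyte p-values from univariate equivalence (e.g., TOST) tests;
# rejecting the null here means claiming equivalence for that analyte.
pvals = np.array([0.001, 0.004, 0.012, 0.030, 0.041, 0.20, 0.55, 0.80])
alpha = 0.05

for method in ["bonferroni", "simes-hochberg", "hommel"]:
    reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method=method)
    print(f"{method:>14}: {reject.sum()} equivalence claims")

# Simple Storey-type adaptive Bonferroni: estimate the number of true nulls
# (non-equivalent analytes) and use alpha divided by that estimate.
lam = 0.5
m0_hat = max(1.0, (pvals > lam).sum() / (1.0 - lam))
print(f"adaptive Bonferroni: {(pvals <= alpha / m0_hat).sum()} equivalence claims")
```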
197. CONSTRUCT MEASUREMENT IN STRATEGIC MANAGEMENT RESEARCH: ILLUSION OR REALITY?
- Author
-
Boyd, Brian K., Gove, Steve, and Hitt, Michael A.
- Subjects
STRATEGIC planning ,MANAGEMENT ,RESEARCH methodology ,MANAGEMENT science ,MEASUREMENT-model comparison ,STATISTICAL reliability ,MEASUREMENT errors ,BUSINESS planning ,MATHEMATICAL models - Abstract
Strategic management research has been characterized as placing less emphasis on construct measurement than other management subfields. In this work, we document the state of the art of measurement in strategic management research, and discuss the implications for interpreting the results of research in this field. To assess the breadth of measurement issues in the discipline, we conducted a content analysis of empirical strategic management articles published in leading journals in the period of 1998-2000. We found that few studies discuss reliability and validity issues, and empirical research in the field commonly relies on single-indicator measures. Additionally, studies rarely address the problems of attenuation due to measurement error. We close with a discussion of the implications for future research and for interpreting prior work in strategic management. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
198. DFIT: An R Package for Raju's Differential Functioning of Items and Tests Framework
- Author
-
Víctor H. Cervantes
- Subjects
DFIT framework ,differential item functioning ,type I error ,power calculation ,analytical standard error ,Statistics ,HA1-4737 - Abstract
This paper presents DFIT, an R package that implements the differential functioning of items and tests framework as well as the Monte Carlo item parameter replication approach for producing cut-off points for differential item functioning indices. Furthermore, it illustrates how to use the package to calculate power for the NCDIF index, both post hoc, as has regularly been the case in empirical and simulation studies of differential item functioning, and a priori, given certain item parameters. The version reviewed here implements all DFIT indices and Raju's area measures for tests composed of items modeled with the same unidimensional parametric item response model (1-, 2-, and 3-parameter models, the generalized partial credit model, or the graded response model), the Mantel-Haenszel statistic with an underlying dichotomous item response model, and the item parameter replication method for any of the estimated indices with dichotomous item response models.
- Published
- 2017
- Full Text
- View/download PDF
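As background to the entry above, Raju's NCDIF index for an item is the mean squared difference between the focal- and reference-calibrated item response functions, averaged over focal-group ability values. The sketch below computes it for a hypothetical 2PL item from scratch; it does not use or reproduce the DFIT package's actual functions, and the item parameters are invented.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ncdif(theta_focal, a_ref, b_ref, a_foc, b_foc):
    """Noncompensatory DIF: mean squared difference between focal- and
    reference-calibrated item response functions over focal-group abilities."""
    d = p_2pl(theta_focal, a_foc, b_foc) - p_2pl(theta_focal, a_ref, b_ref)
    return float(np.mean(d ** 2))

rng = np.random.default_rng(0)
theta_focal = rng.normal(size=5000)      # hypothetical focal-group ability estimates

# Item with uniform DIF: same discrimination, shifted difficulty in the focal group
print("NCDIF:", ncdif(theta_focal, a_ref=1.2, b_ref=0.0, a_foc=1.2, b_foc=0.4))
```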
199. Formation, Testing of Hypothesis and Confidence Interval in Medical Research
- Author
-
Vasudevan, Senthilvel
- Subjects
type I error ,type II error ,confidence interval ,hypothesis testing ,medical research - Abstract
Background: Statistics helps us arrive at the criterion for such decisions in research. A hypothesis is an assumption, and hypothesis testing is an important activity in pharmacy and medical fields and their related research. Materials and Methods: Statistical inference plays an important role in biological statistical tests and in arriving at conclusions; suitable worked examples are included in this section. Results: Confidence intervals provide a method of stating the precision, or closeness, of sample statistics; each interval contains a lower and an upper limit. Conclusion: We conclude that hypothesis testing is a very useful and essential tool in medicine, nursing, pharmacy and other biomedical sciences, as well as in their research fields. Numerical illustrations with suitable examples are provided. Key Words: hypothesis testing, type I error, type II error, confidence interval, medical research
- Published
- 2022
- Full Text
- View/download PDF
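To accompany the entry above, a minimal worked example: a one-sample t-test and the corresponding 95% confidence interval for a hypothetical sample of systolic blood pressure measurements. The null value, sample size, and data are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
systolic_bp = rng.normal(132, 15, size=40)   # hypothetical sample of 40 patients

# One-sample t-test of H0: mean systolic BP = 140 mmHg (two-sided, alpha = 0.05)
t_stat, p_value = stats.ttest_1samp(systolic_bp, popmean=140)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# 95% confidence interval for the mean: lower and upper limits
mean = systolic_bp.mean()
sem = stats.sem(systolic_bp)
lower, upper = stats.t.interval(0.95, len(systolic_bp) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f}) mmHg")
```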
200. Optimizing cut-off grade considering grade estimation uncertainty - A case study of Witwatersrand gold-producing areas
- Author
-
Birch, C.C.
- Subjects
Type II error ,Excel Solver ,Type I error ,Materials Chemistry ,Metals and Alloys ,mixed-integer linear programming ,uncertainty ,simulation ,@Risk ,Geotechnical Engineering and Engineering Geology ,cut-off grade ,optimization ,NPV - Abstract
Due to grade estimation uncertainty, two statistical errors can occur. A Type I error occurs when material is classified as ore and mined, although its true grade is below the break-even grade; this material is dilution. A Type II error occurs when material is estimated to be below the cut-off grade and classified as waste, although its true grade is above the break-even grade; this material is not mined and is lost. A previous study assumed the uncertainty to follow a normal distribution. For this study, estimated block values are compared with those determined after mining (the best estimate of the true grade). These actual data from four mines show that the uncertainty follows a Laplace distribution. There is no single adjustment of the cut-off grade away from the break-even grade, accounting for estimation uncertainty, that can be applied to all gold mines. However, adjusting the cut-off grade downwards (by up to 22% for one mine) is indicated when optimizing profit in the presence of grade uncertainty. This type of adjustment could open up significant mining areas and extend the life of the mine.
- Published
- 2022
- Full Text
- View/download PDF
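Building on the entry above, the sketch below simulates Laplace-distributed grade-estimation errors around hypothetical true block grades and counts how often blocks are misclassified at a cut-off grade set below break-even: Type I (dilution, mined waste) versus Type II (lost ore). The grade distribution, error scale, and grade values are illustrative assumptions, not the paper's mine data.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical block model: lognormal true gold grades (g/t) plus
# Laplace-distributed estimation error, per the distribution reported above.
n_blocks = 100_000
true_grade = rng.lognormal(mean=1.0, sigma=0.6, size=n_blocks)
estimated_grade = true_grade + rng.laplace(loc=0.0, scale=0.8, size=n_blocks)

break_even = 4.0    # g/t, break-even grade (illustrative)
cut_off = 3.5       # g/t, cut-off grade adjusted below break-even

mined = estimated_grade >= cut_off
type_i = np.mean(mined & (true_grade < break_even))     # dilution: waste sent to the mill
type_ii = np.mean(~mined & (true_grade >= break_even))  # lost ore: payable blocks left behind
print(f"Type I (dilution) proportion:  {type_i:.3f}")
print(f"Type II (lost ore) proportion: {type_ii:.3f}")
```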