Search Results (148 results)
2. Letter to the editor regarding the paper 'New weighting methods when cases are only a subset of events in a nested case‐control study' by Qian M. Zhou, Xuan Wang, Yingye Zheng, and Tianxi Cai
- Author
-
Dominic Edelmann, Kristin Ohneberg, Natalia Becker, Axel Benner, and Martin Schumacher
- Subjects
Statistics and Probability, General Medicine, Statistics, Probability and Uncertainty - Published
- 2023
- Full Text
- View/download PDF
3. Regarding Paper “Multiple testing with discrete data: Proportion of true null hypotheses and two adaptive FDR procedures” by Xiongzhi Chen, Rebecca W. Doerge, and Joseph F. Heyse
- Author
-
Aniket Biswas
- Published
- 2020
- Full Text
- View/download PDF
4. Editorial for the discussion papers on the p-value controversy
- Author
-
Marco Alfò and Dankmar Böhning
- Published
- 2017
- Full Text
- View/download PDF
5. Contribution to the discussion of the paper by Stefan Wellek: “A critical evaluation of the current p-value controversy”
- Author
-
Alessio Farcomeni
- Published
- 2017
- Full Text
- View/download PDF
6. Contribution to the discussion of the paper by Stefan Wellek: 'A critical evaluation of the current p-value controversy'
- Author
-
Alessio Farcomeni
- Subjects
Statistics and Probability, Research design, MEDLINE, General Medicine, Type III error, Econometrics, P-hacking, p-value, Statistics, Probability and Uncertainty, Mathematics - Published
- 2017
- Full Text
- View/download PDF
7. Editorial for the discussion papers on the p-value controversy
- Author
-
Dankmar Böhning and Marco Alfò
- Subjects
Data Analysis, Statistics and Probability, Publications, General Medicine, Biostatistics, Periodicals as Topic, Statistics, Probability and Uncertainty - Published
- 2017
- Full Text
- View/download PDF
8. Modeling and computation of multistep batch testing for infectious diseases
- Author
-
Xiaolin Li, Haoran Jiang, and Hongshik Ahn
- Subjects
Statistics and Probability, Mathematical optimization, optimal batch size, Computer science, Monte Carlo method, coronavirus, sample pooling, specificity, Communicable Diseases, COVID-19 Testing, Humans, Computer Simulation, Pandemics, SARS-CoV-2, Numerical analysis, COVID-19, Statistical model, General Medicine, sensitivity, Research Papers, Nonlinear system, False positive rate, Statistics, Probability and Uncertainty, Research Paper - Abstract
We propose a mathematical model based on probability theory to optimize COVID‐19 testing by a multistep batch testing approach with variable batch sizes. This model and simulation tool dramatically increase the efficiency and efficacy of the tests in a large population at a low cost, particularly when the infection rate is low. The proposed method combines statistical modeling with numerical methods to solve nonlinear equations and obtain optimal batch sizes at each step of tests, with the flexibility to incorporate geographic and demographic information. In theory, this method substantially improves the false positive rate and positive predictive value as well. We also conducted a Monte Carlo simulation to verify this theory. Our simulation results show that our method significantly reduces the false negative rate. More accurate assessment can be made if the dilution effect or other practical factors are taken into consideration. The proposed method will be particularly useful for the early detection of infectious diseases and prevention of future pandemics. The proposed work will have broader impacts on medical testing for contagious diseases in general.
- Published
- 2021
- Full Text
- View/download PDF
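The batch-testing entry above (no. 8) builds on the classical idea of pooled testing. As a point of orientation only, here is a minimal single-stage (Dorfman-type) pooling sketch that finds the pool size minimising the expected number of tests per person under a perfect assay; it is not the authors' multistep, variable-batch-size procedure, and the prevalence values are purely illustrative.

```python
import numpy as np

def expected_tests_per_person(p, k):
    """Expected tests per person under single-stage Dorfman pooling with pool
    size k and prevalence p, assuming a perfect test: one pooled test per k
    people, plus k individual tests whenever the pool is positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def optimal_pool_size(p, k_max=100):
    """Grid-search the pool size that minimises the expected tests per person."""
    sizes = np.arange(2, k_max + 1)
    costs = expected_tests_per_person(p, sizes)
    return int(sizes[np.argmin(costs)]), float(costs.min())

if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):          # illustrative prevalence values
        k, c = optimal_pool_size(p)
        print(f"prevalence {p:.3f}: pool size {k}, {c:.3f} expected tests/person")
```

As the entry notes, the gains of pooling are largest when the infection rate is low; the sketch reproduces that qualitative behaviour.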
9. Construction and assessment of prediction rules for binary outcome in the presence of missing predictor data using multiple imputation and cross‐validation: Methodological approach and data‐based evaluation
- Author
-
Erika Banzato, Liesbeth C. de Wreede, and Bart Mertens
- Subjects
cross-validation, Statistics and Probability, Biometry, multiple imputation, Computer science, Logistic regression, binary outcome, Software, Imputation (statistics), Analysis of Variance, Estimation theory, prediction, General Medicine, Missing data, Research Papers, Brier score, Calibration, Data mining, Statistics, Probability and Uncertainty, Research Paper - Abstract
We investigate calibration and assessment of predictive rules when missing values are present in the predictors. Our paper has two key objectives. The first is to investigate how the calibration of the prediction rule can be combined with use of multiple imputation to account for missing predictor observations. The second objective is to propose such methods that can be implemented with current multiple imputation software, while allowing for unbiased predictive assessment through validation on new observations for which outcome is not yet available. We commence with a review of the methodological foundations of multiple imputation as a model estimation approach as opposed to a purely algorithmic description. We specifically contrast application of multiple imputation for parameter (effect) estimation with predictive calibration. Based on this review, two approaches are formulated, of which the second utilizes application of the classical Rubin's rules for parameter estimation, while the first approach averages probabilities from models fitted on single imputations to directly approximate the predictive density for future observations. We present implementations using current software that allow for validation and estimation of performance measures by cross‐validation, as well as imputation of missing data in predictors on the future data where outcome is missing by definition. To simplify, we restrict discussion to binary outcome and logistic regression throughout. Method performance is verified through application on two real data sets. Accuracy (Brier score) and variance of predicted probabilities are investigated. Results show substantial reductions in variation of calibrated probabilities when using the first approach.
- Published
- 2020
- Full Text
- View/download PDF
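The following sketch illustrates the first approach described in entry 9 above: fit a prediction model on each imputed data set and average the predicted probabilities, rather than pooling coefficients with Rubin's rules. The crude normal imputation, the simulated data, and all names are placeholders of my own, not the authors' implementation (which relies on standard MI software and cross-validation).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data with missing predictor values (illustrative only)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.25])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
X[rng.random(size=(n, p)) < 0.15] = np.nan          # ~15% missing at random

def crude_impute(X, rng):
    """Crude stochastic imputation: draw from the observed mean/sd per column.
    A stand-in for a proper MI engine such as chained equations."""
    Xi = X.copy()
    for j in range(X.shape[1]):
        obs = ~np.isnan(X[:, j])
        mu, sd = X[obs, j].mean(), X[obs, j].std()
        Xi[~obs, j] = rng.normal(mu, sd, size=(~obs).sum())
    return Xi

# "Approach 1" from the entry above: average predicted probabilities across
# models fitted on single imputations, instead of pooling coefficients.
M = 10
probs = []
for m in range(M):
    Xi = crude_impute(X, rng)
    fit = LogisticRegression(max_iter=1000).fit(Xi, y)
    probs.append(fit.predict_proba(Xi)[:, 1])
avg_prob = np.mean(probs, axis=0)
# Apparent (not cross-validated) Brier score of the averaged predictions
print("Brier score:", np.mean((avg_prob - y) ** 2))
```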
10. A comparison of methods for analysing multiple outcome measures in randomised controlled trials using a simulation study
- Author
-
Rumana Z Omar, Gareth Ambler, and Victoria Vickerstaff
- Subjects
Statistics and Probability, Multiple outcome, Multivariate statistics, Computer science, multiple endpoints, multivariate model, Intervention effect, randomised controlled trials, Bias, Outcome Assessment, Health Care, Statistics, Computer Simulation, Imputation (statistics), Randomized Controlled Trials as Topic, multiple outcomes, Univariate, Outcome measures, General Medicine, Missing data, Research Design, Data Interpretation, Statistical, Pairwise comparison, Statistics, Probability and Uncertainty, Trial and Survey Methodology, Research Paper - Abstract
Multiple primary outcomes are sometimes collected and analysed in randomised controlled trials (RCTs), and are used in favour of a single outcome. By collecting multiple primary outcomes, it is possible to fully evaluate the effect that an intervention has on a given disease process. A simple approach to analysing multiple outcomes is to consider each outcome separately; however, this approach does not account for any pairwise correlations between the outcomes. Any cases with missing values must be ignored, unless an additional imputation step is performed. Alternatively, multivariate methods that explicitly model the pairwise correlations between the outcomes may be more efficient when some of the outcomes have missing values. In this paper, we present an overview of relevant methods that can be used to analyse multiple outcome measures in RCTs, including methods based on multivariate multilevel (MM) models. We perform simulation studies to evaluate the bias in the estimates of the intervention effects and the power to detect true intervention effects when using selected methods. Different simulation scenarios were constructed by varying the number of outcomes, the type of outcomes, the degree of correlation between the outcomes, and the proportions and mechanisms of missing data. We compare multivariate methods to univariate methods with and without multiple imputation. When there are strong correlations between the outcome measures (ρ > .4), our simulation studies suggest that there are small power gains when using the MM model compared to analysing the outcome measures separately. In contrast, when there are weak correlations (ρ < .4), the power is reduced when using univariate methods with multiple imputation compared to analysing the outcome measures separately.
- Published
- 2020
- Full Text
- View/download PDF
11. Validation of discrete time‐to‐event prediction models in the presence of competing risks
- Author
-
Jean-François Timsit, Rachel Heyard, and Leonhard Held
- Subjects
Statistics and Probability, Biometry, area under the curve, Computer science, Calibration (statistics), Mean squared prediction error, Machine learning, Competing risks, Risk Assessment, competing events, Intensive care, Humans, discrete time-to-event model, Event (probability theory), validation, prediction error, Models, Statistical, Statistics, calibration slope, Pneumonia, Ventilator-Associated, Probability and statistics, General Medicine, Research Papers, Intensive Care Units, Discrete time and continuous time, dynamic prediction models, Calibration, Pseudomonas aeruginosa, Artificial intelligence, Statistics, Probability and Uncertainty, Predictive modelling, Research Paper - Abstract
Clinical prediction models play a key role in risk stratification, therapy assignment and many other fields of medical decision making. Before they can enter clinical practice, their usefulness has to be demonstrated using systematic validation. Methods to assess their predictive performance have been proposed for continuous, binary, and time-to-event outcomes, but the literature on validation methods for discrete time-to-event models with competing risks is sparse. The present paper tries to fill this gap and proposes new methodology to quantify discrimination, calibration, and prediction error (PE) for discrete time-to-event outcomes in the presence of competing risks. In our case study, the goal was to predict the risk of ventilator-associated pneumonia (VAP) attributed to Pseudomonas aeruginosa in intensive care units (ICUs). Competing events are extubation, death, and VAP due to other bacteria. The aim of this application is to validate complex prediction models developed in previous work on more recently available validation data.
- Published
- 2019
- Full Text
- View/download PDF
12. On the relation between the cause‐specific hazard and the subdistribution rate for competing risks data: The Fine–Gray model revisited
- Author
-
Hans C. van Houwelingen, Martin Schumacher, and Hein Putter
- Subjects
Statistics and Probability, Biometry, cause-specific hazard, Inference, Competing risks, Risk Assessment, Covariate, Econometrics, cumulative incidence, Reduction factor, competing risks, Analysis of Variance, Models, Statistical, subdistribution hazard, General Medicine, proportional hazards, Research Papers, Statistics, Probability and Uncertainty, Research Paper - Abstract
The Fine–Gray proportional subdistribution hazards model has been puzzling many people since its introduction. The main reason for the uneasy feeling is that the approach considers individuals still at risk for an event of cause 1 after they fell victim to the competing risk of cause 2. The subdistribution hazard and the extended risk sets, where subjects who failed from the competing risk remain in the risk set, are generally perceived as unnatural. One could say it is somewhat of a riddle why the Fine–Gray approach yields valid inference. To take away these uneasy feelings, we explore the link between the Fine–Gray and cause-specific approaches in more detail. We introduce the reduction factor as representing the proportion of subjects in the Fine–Gray risk set that has not yet experienced a competing event. In the presence of covariates, the dependence of the reduction factor on a covariate gives information on how the effects of the covariate on the cause-specific hazard and the subdistribution hazard relate. We discuss estimation and modeling of the reduction factor, and show how they can be used in various ways to estimate cumulative incidences, given the covariates. Methods are illustrated on data of the European Society for Blood and Marrow Transplantation.
- Published
- 2020
- Full Text
- View/download PDF
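In the notation standard for competing risks, and consistent with the description in entry 12 above, the reduction factor links the cause-specific and subdistribution hazards as follows. This is a reading of the abstract, not an excerpt from the paper.

```latex
% S(t) = P(T > t); F_k(t) = P(T <= t, cause k) is the cumulative incidence of cause k.
% Cause-specific and subdistribution hazards for cause 1:
\lambda_1(t) = \frac{f_1(t)}{S(t)}, \qquad
\bar{\lambda}_1(t) = \frac{f_1(t)}{1 - F_1(t)},
% hence
\bar{\lambda}_1(t) = r(t)\,\lambda_1(t), \qquad
r(t) = \frac{S(t)}{1 - F_1(t)} = \frac{S(t)}{S(t) + F_2(t)} \le 1,
% i.e. r(t) is the proportion of the Fine-Gray risk set at time t that has not
% yet experienced the competing event.
```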
13. Effect size measures and their benchmark values for quantifying benefit or risk of medicinal products
- Author
-
Volker W. Rahlfs and Helmuth Zimmermann
- Subjects
Statistics and Probability, effect size measures, Biometry, Drug-Related Side Effects and Adverse Reactions, transformation of measures, binary, Risk Assessment, Normal distribution, Statistics, continuous data, Proportional Hazards Models, Mathematics, General Biometry, Stochastic Processes, Absolute risk reduction, Mann–Whitney measure, ordinal, General Medicine, clinical relevance, Benchmarking, Strictly standardized mean difference, Binary data, Statistics, Probability and Uncertainty, Research Paper - Abstract
The standardized mean difference is a well‐known effect size measure for continuous, normally distributed data. In this paper we present a general basis for important other distribution families. As a general concept, usable for every distribution family, we introduce the relative effect, also called Mann–Whitney effect size measure of stochastic superiority. This measure is a truly robust measure, needing no assumptions about a distribution family. It is thus the preferred tool for assumption‐free, confirmatory studies. For normal distribution shift, proportional odds, and proportional hazards, we show how to derive many global values such as risk difference average, risk difference extremum, and odds ratio extremum. We demonstrate that the well‐known benchmark values of Cohen with respect to group differences—small, medium, large—can be translated easily into corresponding Mann–Whitney values. From these, we get benchmarks for parameters of other distribution families. Furthermore, it is shown that local measures based on binary data (2 × 2 tables) can be associated with the Mann–Whitney measure: The concept of stochastic superiority can always be used. It is a general statistical value in every distribution family. It therefore yields a procedure for standardizing the assessment of effect size measures. We look at the aspect of relevance of an effect size and—introducing confidence intervals—present some examples for use in statistical practice.
- Published
- 2019
- Full Text
- View/download PDF
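Entry 13 above translates Cohen's benchmarks into the Mann–Whitney measure of stochastic superiority. Under a normal shift model this translation has a closed form, P(X > Y) = Φ(d/√2); the short sketch below computes the resulting benchmark values as an illustration (the relation is standard, the printed values are my own computation, not quoted from the paper).

```python
from scipy.stats import norm

def mann_whitney_effect(d):
    """Relative effect P(X > Y) for two normal distributions whose means differ
    by d standard deviations (normal shift model): P(X > Y) = Phi(d / sqrt(2))."""
    return norm.cdf(d / 2 ** 0.5)

# Cohen's conventional benchmarks translated to the Mann-Whitney scale
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d = {d:.1f}  ->  P(X > Y) ~ {mann_whitney_effect(d):.3f}")
# prints roughly 0.556, 0.638 and 0.714
```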
14. A flexible design for advanced Phase I/II clinical trials with continuous efficacy endpoints
- Author
-
Thomas Jaki and Pavel Mozgunov
- Subjects
Statistics and Probability, nonmonotonic efficacy, Biometry, Endpoint Determination, Computer science, Machine learning, Clinical Trials, Phase II as Topic, Neoplasms, combination trial, Advanced phase, Humans, Clinical Trials, Molecular Targeted Therapy, Clinical Trials, Phase I as Topic, General Medicine, Clinical trial, continuous endpoint, Artificial intelligence, Statistics, Probability and Uncertainty, Phase I/II clinical trial, Research Paper - Abstract
There is growing interest in integrated Phase I/II oncology clinical trials involving molecularly targeted agents (MTAs). Among the main challenges of these trials are nontrivial dose–efficacy relationships and the administration of MTAs in combination with other agents. While some designs were recently proposed for such Phase I/II trials, the majority of them consider the case of binary toxicity and efficacy endpoints only. At the same time, a continuous efficacy endpoint can carry more information about the agent's mechanism of action, but corresponding designs have received very limited attention in the literature. In this work, an extension of a recently developed information-theoretic design for the case of a continuous efficacy endpoint is proposed. The design transforms the continuous outcome using the logistic transformation and uses an information-theoretic argument to govern selection during the trial. The performance of the design is investigated in settings of single-agent and dual-agent trials. It is found that the novel design leads to substantial improvements in operating characteristics compared to a model-based alternative under scenarios with nonmonotonic dose/combination–efficacy relationships. The robustness of the design to missing/delayed efficacy responses and to the correlation in toxicity and efficacy endpoints is also investigated.
- Published
- 2019
- Full Text
- View/download PDF
15. Interim analysis incorporating short‐ and long‐term binary endpoints
- Author
-
Cornelia Ursula Kunz, Julia Niewczas, and Franz König
- Subjects
Statistics and Probability, combination test, conditional power, Biometry, Time Factors, Endpoint Determination, Computer science, sample size reassessment, futility stopping, adaptive designs, Drug Discovery, Statistics, Clinical endpoint, Humans, Clinical Trials as Topic, Estimator, General Medicine, Interim analysis, Other Topics, Sample size determination, Statistics, Probability and Uncertainty, GLMs and Discrete Responses, Research Paper, Type I and type II errors - Abstract
Designs incorporating more than one endpoint have become popular in drug development. One such design allows for the incorporation of short-term information in an interim analysis if the long-term primary endpoint has not yet been observed for some of the patients. First, we consider a two-stage design with binary endpoints allowing for futility stopping only, based on conditional power under both fixed and observed effects. Design characteristics of three estimators are compared: using the primary long-term endpoint only, using the short-term endpoint only, and combining data from both. For each approach, equivalent cut-off values for fixed- and observed-effect conditional power calculations can be derived, resulting in the same overall power. While in trials stopping for futility the type I error rate cannot be inflated (it usually decreases), there is a loss of power. In this study, we consider different scenarios, including different thresholds for conditional power, different amounts of information available at the interim, and different correlations and probabilities of success. We further extend the methods to adaptive designs with unblinded sample size reassessment based on conditional power, with the inverse normal method as the combination function. Two different futility stopping rules are considered: one based on the conditional power, and one based on P-values of the Z-statistics of the estimators. Average sample size, probability of stopping for futility, and overall power of the trial are compared, and the influence of the choice of weights is investigated.
- Published
- 2019
- Full Text
- View/download PDF
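A minimal sketch of the two ingredients named in entry 15 above: the inverse normal combination function and conditional power after stage 1. The weights, boundary, and assumed stage-2 drift are illustrative choices of mine, not the authors' exact decision rules.

```python
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1, w2):
    """Combined p-value of two stage-wise one-sided p-values using the
    inverse normal method with prefixed weights (w1**2 + w2**2 = 1)."""
    z = w1 * norm.ppf(1 - p1) + w2 * norm.ppf(1 - p2)
    return 1 - norm.cdf(z)

def conditional_power(z1, w1, w2, drift2, alpha=0.025):
    """Probability of crossing the final one-sided boundary given the observed
    stage-1 z-statistic z1, when the stage-2 z-statistic has mean `drift2`
    (e.g. drift2 = delta / sigma * sqrt(n2 / 2) for a two-arm comparison).
    The trial rejects if w1*Z1 + w2*Z2 >= z_{1-alpha}."""
    z_alpha = norm.ppf(1 - alpha)
    threshold = (z_alpha - w1 * z1) / w2
    return 1 - norm.cdf(threshold - drift2)

# Example: equal weights, promising interim result, assumed effect carried forward
w = 2 ** -0.5
print(inverse_normal_combination(0.10, 0.03, w, w))
print(conditional_power(z1=1.5, w1=w, w2=w, drift2=2.0))
```

Futility stopping as discussed in the entry would compare such a conditional power value against a prespecified threshold.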
16. Implementation of AMNOG: An industry perspective
- Author
-
Friedhelm Leverkus and Christy Chuang-Stein
- Subjects
Statistics and Probability, Drug Industry, Operations research, Endpoint Determination, Early benefit assessment, Comparator, Statutory law, Humans, Actuarial science, AMNOG, General Medicine, Endpoint, Research Papers, Subgroup, Negotiation, Clinical Trials, Phase III as Topic, Additional benefit, Net benefit, New product development, Government Regulation, Business, Statistics, Probability and Uncertainty, Research Paper - Abstract
In 2010, the Federal Parliament (Bundestag) of Germany passed a new law (Arzneimittelmarktneuordnungsgesetz, AMNOG) on the regulation of medicinal products that applies to all pharmaceutical products with active ingredients that are launched beginning January 1, 2011. The law describes the process to determine the price at which an approved new product will be reimbursed by the statutory health insurance system. The process consists of two phases. The first phase assesses the additional benefit of the new product versus an appropriate comparator (zweckmäßige Vergleichstherapie, zVT). The second phase involves price negotiation. Focusing on the first phase, this paper investigates requirements of benefit assessment of a new product under this law with special attention on the methods applied by the German authorities on issues such as the choice of the comparator, patient relevant endpoints, subgroup analyses, extent of benefit, determination of net benefit, primary and secondary endpoints, and uncertainty of the additional benefit. We propose alternative approaches to address the requirements in some cases and invite other researchers to help develop solutions in other cases.
- Published
- 2015
- Full Text
- View/download PDF
17. Methodological approach to determine minor, considerable, and major treatment effects in the early benefit assessment of new drugs
- Author
-
Ralf Bender, Guido Skipka, Jürgen Windeler, Stefan Lange, Beate Wieseler, Thomas M. Kaiser, and Stefanie Thomas
- Subjects
Magnitude of effects, Statistics and Probability, Biometry, Drug Industry, Early benefit assessment, Risk Assessment, Drug Therapy, Statutory law, Health care, Humans, Drug Approval, Actuarial science, Operationalization, AMNOG, General Medicine, Research Papers, Clinical relevance, Government Regulation, Business, Statistics, Probability and Uncertainty, Monte Carlo Method, Shifted hypotheses, Research Paper, Added benefit - Abstract
At the beginning of 2011, the early benefit assessment of new drugs was introduced in Germany with the Act on the Reform of the Market for Medicinal Products (AMNOG). The Federal Joint Committee (G‐BA) generally commissions the Institute for Quality and Efficiency in Health Care (IQWiG) with this type of assessment, which examines whether a new drug shows an added benefit (a positive patient‐relevant treatment effect) over the current standard therapy. IQWiG is required to assess the extent of added benefit on the basis of a dossier submitted by the pharmaceutical company responsible. In this context, IQWiG was faced with the task of developing a transparent and plausible approach for operationalizing how to determine the extent of added benefit. In the case of an added benefit, the law specifies three main extent categories (minor, considerable, major). To restrict value judgements to a minimum in the first stage of the assessment process, an explicit and abstract operationalization was needed. The present paper is limited to the situation of binary data (analysis of 2 × 2 tables), using the relative risk as an effect measure. For the treatment effect to be classified as a minor, considerable, or major added benefit, the methodological approach stipulates that the (two‐sided) 95% confidence interval of the effect must exceed a specified distance to the zero effect. In summary, we assume that our approach provides a robust, transparent, and thus predictable foundation to determine minor, considerable, and major treatment effects on binary outcomes in the early benefit assessment of new drugs in Germany. After a decision on the added benefit of a new drug by G‐BA, the classification of added benefit is used to inform pricing negotiations between the umbrella organization of statutory health insurance and the pharmaceutical companies.
- Published
- 2015
- Full Text
- View/download PDF
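The operationalization described in entry 17 requires the confidence interval of the relative risk to clear a prespecified distance from the null effect. The sketch below shows the general shape of such a classification rule only; the numeric thresholds are placeholders for illustration and are not IQWiG's actual cut-offs, which depend on the outcome category.

```python
def extent_of_added_benefit(rr_ci_upper, thresholds=(0.95, 0.90, 0.75)):
    """Classify the extent of added benefit for a beneficial binary outcome
    (relative risk < 1 favours the new drug) by checking how far the upper
    95% confidence limit stays below 1. The thresholds are placeholders for
    illustration only; the real cut-offs vary with the outcome category."""
    minor, considerable, major = thresholds
    if rr_ci_upper < major:
        return "major"
    if rr_ci_upper < considerable:
        return "considerable"
    if rr_ci_upper < minor:
        return "minor"
    return "no proof of added benefit"

print(extent_of_added_benefit(0.88))   # "considerable" under these placeholder cut-offs
```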
18. Blinded and unblinded sample size reestimation in crossover trials balanced for period
- Author
-
Adrian Mander, Michael J. Grayling, and James Wason
- Subjects
Statistics and Probability, Biometry, Randomization, Blinding, Computer science, Crossover, Kaplan-Meier Estimate, Statistics, Nonparametric, sample size reestimation, Statistics, Humans, Clinical Trials as Topic, Cross-Over Studies, Models, Statistical, Clinical study design, Estimator, Issues in Complex Clinical Trials, General Medicine, Crossover study, blinded, Sample size determination, Sample Size, Adaptive design, Heart Transplantation, Regression Analysis, crossover trial, Statistics, Probability and Uncertainty, internal pilot study, Research Paper - Abstract
The determination of the sample size required by a crossover trial typically depends on the specification of one or more variance components. Uncertainty about the value of these parameters at the design stage means that there is often a risk a trial may be under‐ or overpowered. For many study designs, this problem has been addressed by considering adaptive design methodology that allows for the re‐estimation of the required sample size during a trial. Here, we propose and compare several approaches for this in multitreatment crossover trials. Specifically, regulators favor reestimation procedures to maintain the blinding of the treatment allocations. We therefore develop blinded estimators for the within and between person variances, following simple or block randomization. We demonstrate that, provided an equal number of patients are allocated to sequences that are balanced for period, the proposed estimators following block randomization are unbiased. We further provide a formula for the bias of the estimators following simple randomization. The performance of these procedures, along with that of an unblinded approach, is then examined utilizing three motivating examples, including one based on a recently completed four‐treatment four‐period crossover trial. Simulation results show that the performance of the proposed blinded procedures is in many cases similar to that of the unblinded approach, and thus they are an attractive alternative.
- Published
- 2018
- Full Text
- View/download PDF
19. Bayesian variable selection logistic regression with paired proteomic measurements
- Author
-
Bart Mertens and Alexia Kakourou
- Subjects
Proteomics, Statistics and Probability, Inference, Feature selection, isotope clusters, Bayesian inference, Logistic regression, Humans, paired measurements, added-value assessment, mass spectrometry, Mathematics, General Biometry, Bayesian variable selection, Models, Statistical, Bayes Theorem, Pattern recognition, prediction, General Medicine, Pancreatic Neoplasms, Logistic Models, Predictive power, Artificial intelligence, Statistics, Probability and Uncertainty, Research Paper - Abstract
We explore the problem of variable selection in a case‐control setting with mass spectrometry proteomic data consisting of paired measurements. Each pair corresponds to a distinct isotope cluster and each component within pair represents a summary of isotopic expression based on either the intensity or the shape of the cluster. Our objective is to identify a collection of isotope clusters associated with the disease outcome and at the same time assess the predictive added‐value of shape beyond intensity while maintaining predictive performance. We propose a Bayesian model that exploits the paired structure of our data and utilizes prior information on the relative predictive power of each source by introducing multiple layers of selection. This allows us to make simultaneous inference on which are the most informative pairs and for which—and to what extent—shape has a complementary value in separating the two groups. We evaluate the Bayesian model on pancreatic cancer data. Results from the fitted model show that most predictive potential is achieved with a subset of just six (out of 1289) pairs while the contribution of the intensity components is much higher than the shape components. To demonstrate how the method behaves under a controlled setting we consider a simulation study. Results from this study indicate that the proposed approach can successfully select the truly predictive pairs and accurately estimate the effects of both components although, in some cases, the model tends to overestimate the inclusion probability of the second component.
- Published
- 2018
- Full Text
- View/download PDF
20. Multiple imputation for discrete data: Evaluation of the joint latent normal model
- Author
-
Matteo Quartagno and James R. Carpenter
- Subjects
Statistics and Probability, Biometry, multiple imputation, Computer science, Breast Neoplasms, Multivariate normal distribution, Machine learning, missing data, Humans, Imputation (statistics), Categorical variable, joint model, General Biometry, Models, Statistical, Location model, General Medicine, Missing data, R package, Logistic Models, Multivariate Analysis, latent normal model, categorical data, Artificial intelligence, Statistics, Probability and Uncertainty, Research Paper, Count data - Abstract
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS‐MI is pragmatically appealing, because of its flexibility in handling different types of variables. JM‐MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application on data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation in four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single and multilevel multiple imputation.
- Published
- 2019
- Full Text
- View/download PDF
21. Does it help that efficacy has been proven once we start discussing (added) benefit?
- Author
-
Yvonne Ziert and Armin Koch
- Subjects
Statistics and Probability, Decision Making, Biostatistics, Assessment of clinical trials, Formal proof, Drug licensing, Humans, Reimbursement, Licensure, Actuarial science, General Medicine, Research Papers, Cardiovascular Diseases, Insurance, Health, Reimbursement, Business, Statistics, Probability and Uncertainty, Research Paper - Abstract
Since the introduction of benefit assessment to support reimbursement decisions in Germany there seems to be the impression that totally distinct methodology and strategies for decision making would apply in the field of drug licensing and reimbursement. In this article, the position is held that, while decisions may differ due to differing mandates of drug licensing and reimbursement bodies, the underlying strategies are quite similar. For this purpose, we briefly summarize the legal basis for decision making in both fields from a methodological point of view, and review two recent decisions about reimbursement regarding grounds for approval. We comment on two examples, where decision making was based on the same pivotal studies in the licensing and reimbursement process. We conclude that strategies in the field of reimbursement are (from a methodological standpoint) until now more liberal than established rules in the field of drug licensing, but apply the same principles. Formal proof of efficacy preceding benefit assessment can thus be understood as a gatekeeper against principally wrong decision making about efficacy and risks of new drugs in full recognition that more is needed. We elaborate on the differences between formal proof of efficacy on the one hand and the assessment of benefit/risk or added benefit on the other hand, because it is important for statisticians to understand the difference between the two approaches.
- Published
- 2015
- Full Text
- View/download PDF
22. Bayesian hierarchical modelling of continuous non‐negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter
- Author
-
Mike P. Toms, Ruth King, Benjamin Thomas Swallow, and Stephen T. Buckland
- Subjects
Statistics and Probability, MCMC, QH301 Biology, Bayesian probability, reversible jump MCMC, Birds, Tweedie distributions, Surveys and Questionnaires, Statistics, Econometrics, Animals, Bayesian hierarchical modeling, QA Mathematics, Bayesian hierarchical model, Continuous nonnegative data, Mathematics, Models, Statistical, Behavior, Animal, Population size, Bayes Theorem, Regression analysis, Markov chain Monte Carlo, General Medicine, Research Papers, Excess zeros, Markov Chains, Regression Analysis, Seasons, Statistics, Probability and Uncertainty, Gardens, Algorithms, Research Paper, Count data - Abstract
The development of methods for dealing with continuous data with a spike at zero has lagged behind that for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds; these averages are hence effectively continuous, bounded below by zero, but also carry a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in the behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of the sparrowhawk increase.
- Published
- 2015
- Full Text
- View/download PDF
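Entry 22 uses a Tweedie model for continuous, nonnegative data with a spike at zero. The standard compound Poisson-gamma facts below (general properties of the family, not taken from the paper) show why this distribution accommodates an exact point mass at zero.

```latex
% Tweedie exponential-dispersion family with power parameter 1 < p < 2
% (compound Poisson-gamma): mean mu, dispersion phi.
\operatorname{E}(Y) = \mu, \qquad \operatorname{Var}(Y) = \phi\,\mu^{p},
\qquad
P(Y = 0) = \exp\!\left(-\frac{\mu^{\,2-p}}{\phi\,(2-p)}\right) > 0,
% so the distribution is continuous on (0, infinity) yet places positive
% probability mass exactly at zero, matching data with a spike at zero.
```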
23. Median estimation of chemical constituents for sampling on two occasions under a log‐normal model
- Author
-
Athanassios Kondylis
- Subjects
Statistics and Probability, Time Factors, Composite median, Regression estimator, Population, Monte Carlo method, Miscellanea, Chemical compounds, Smoke, Statistics, Statistical inference, Mathematics, Estimation, Models, Statistical, Sampling (statistics), Regression analysis, Tobacco Products, General Medicine, Sampling with partial replacement, Population model, Chemical constituents, Regression Analysis, Model-assisted inference, Statistics, Probability and Uncertainty, Monte Carlo Method, Research Paper - Abstract
Sampling from a finite population on multiple occasions introduces dependencies between the successive samples when overlap is designed. Such sampling designs lead to efficient statistical estimates, while they allow estimating changes over time for the targeted outcomes. This makes them very popular in real-world statistical practice. Sampling with partial replacement can also be very efficient in biological and environmental studies where the estimation of toxicant levels and their trends over time is the main interest. Sampling with partial replacement is designed here on two occasions in order to estimate the median concentration of chemical constituents quantified by means of liquid chromatography coupled with tandem mass spectrometry. Such data represent relative peak areas resulting from the chromatographic analysis. They are therefore positive-valued and skewed data, and are commonly fitted very well by the log-normal model. A log-normal model is assumed here for chemical constituents quantified in mainstream cigarette smoke in a real case study. Combining design-based and model-based approaches for statistical inference, we estimate the median concentration of chemical constituents by sampling with partial replacement on two occasions. We also discuss the limitations of extending the proposed approach to other skewed population models. The latter is investigated by means of a Monte Carlo simulation study.
- Published
- 2015
- Full Text
- View/download PDF
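Entry 23 assumes a log-normal model, under which the population median equals exp(μ) with μ the mean on the log scale. The sketch below estimates the median with a simple back-transformed t-interval for a single simple random sample; it ignores the paper's two-occasion partial-replacement design and uses synthetic data.

```python
import numpy as np
from scipy import stats

def lognormal_median_estimate(x, level=0.95):
    """Estimate the median of a log-normally distributed, positive-valued sample.
    Under the log-normal model the median equals exp(mu), so a t-interval for
    the mean of log(x) back-transforms to an interval for the median."""
    logx = np.log(np.asarray(x, dtype=float))
    n = logx.size
    m, se = logx.mean(), logx.std(ddof=1) / np.sqrt(n)
    t = stats.t.ppf(0.5 + level / 2, df=n - 1)
    return np.exp(m), (np.exp(m - t * se), np.exp(m + t * se))

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=1.2, sigma=0.5, size=50)   # synthetic peak-area-like data
print(lognormal_median_estimate(sample))               # true median is exp(1.2) ~ 3.32
```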
24. Contribution to the discussion of 'When should meta-analysis avoid making hidden normality assumptions?'
- Author
-
Harald Heinzl and Martina Mittlboeck
- Subjects
Discussion Paper, Statistics and Probability, Research design, MEDLINE, General Medicine, Discussion: When should meta-analysis avoid making hidden normality assumptions?, Meta-analysis, Statistics, Probability and Uncertainty, Normality - Published
- 2018
- Full Text
- View/download PDF
25. Explaining the optimistic performance evaluation of newly proposed methods: A cross‐design validation experiment
- Author
-
Christina Nießl, Sabine Hoffmann, Theresa Ullmann, and Anne‐Laure Boulesteix
- Subjects
Statistics and Probability, General Medicine, Statistics, Probability and Uncertainty, Statistics - Methodology (stat.ME) - Abstract
The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then re-evaluate each method based on the study design (i.e., data sets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multi-omic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different data sets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the non-neutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application.
- Published
- 2023
- Full Text
- View/download PDF
26. Categories, components, and techniques in a modular construction of basket trials for application and further research
- Author
-
Meinhard Kieser, Johannes Krisam, and Moritz Pohl
- Subjects
Statistics and Probability, Computer science, General Medicine, Decision rule, Modular construction, Modular design, Precision medicine, Notation, Data science, Research Design, Frequentist inference, Humans, Statistics, Probability and Uncertainty, Medical Futility - Abstract
Basket trials have become a prominent topic in medical and statistical research during the last decade. Their core idea is to treat patients who share the same genetic predisposition, either personally or in their disease, with the same treatment, irrespective of the location of the disease. The location of the disease defines each basket, and the treatment pathway exploits the common genetic predisposition among the baskets. This opens the opportunity to share information among baskets, which can consequently increase the information on the basket-wise response to the investigated treatment. This further allows dynamic decisions regarding futility and efficacy of individual baskets during the ongoing trial. Several statistical designs have been proposed for how a basket trial can be conducted, which has left an unclear situation with many options. The different designs propose different mathematical and statistical techniques, different decision rules, and also different trial purposes. This paper presents a broad overview of existing designs, categorizes them, and elaborates their similarities and differences. A uniform and consistent notation facilitates the first contact with, introduction to, and understanding of the statistical methodologies and techniques used in basket trials. Finally, this paper presents a modular approach for the construction of basket trials in applied medical science and forms a base for further research on basket trial designs and their techniques.
- Published
- 2021
- Full Text
- View/download PDF
27. One‐two dependence and probability inequalities between one‐ and two‐sided union‐intersection tests
- Author
-
Helmut Finner and Markus Roters
- Subjects
Statistics and Probability, Inequality, Intersection (set theory), Mathematical statistics, General Medicine, Empirical distribution function, Goodness of fit, Multiple comparisons problem, Econometrics, Statistics, Probability and Uncertainty, Random variable, Probability, Mathematics - Abstract
In a paper published in 1939 in The Annals of Mathematical Statistics, Wald and Wolfowitz discussed the possible validity of a probability inequality between one- and two-sided coverage probabilities for the empirical distribution function. Twenty-eight years later, Vandewiele and Noé proved this inequality for Kolmogorov-Smirnov type goodness of fit tests. We refer to this type of inequality as one-two inequality. In this paper, we generalize their result for one- and two-sided union-intersection tests based on positively associated random variables and processes. Thereby, we give a brief review of different notions of positive association and corresponding results. Moreover, we introduce the notion of one-two dependence and discuss relationships with other dependence concepts. While positive association implies one-two dependence, the reverse implication fails. Last but not least, the Bonferroni inequality and the one-two inequality yield lower and upper bounds for two-sided acceptance/rejection probabilities which differ only slightly for significance levels not too large. We discuss several examples where the one-two inequality applies. Finally, we briefly discuss the possible impact of the validity of a one-two inequality on directional error control in multiple testing.
- Published
- 2021
- Full Text
- View/download PDF
28. Discussion on 'Correct and logical causal inference for binary and time‐to‐event outcomes in randomized controlled trials' by Yi Liu, Bushi Wang, Miao Yang, Jianan Hui, Heng Xu, Siyoen Kil, and Jason C. Hsu
- Author
-
Gene Pennello and Dandan Xu
- Subjects
Statistics and Probability, Confounding, Hazard ratio, Subgroup analysis, General Medicine, Odds, Conditional independence, Estimand, Causal inference, Statistics, Statistics, Probability and Uncertainty, Event (probability theory) - Abstract
In their paper, Liu et al. (2020) pointed out illogical discrepancies between subgroup and overall causal effects for some efficacy measures, in particular the odds and hazard ratios. As the authors show, the culprit is subgroups having prognostic effects within treatment arms. In response to their provocative findings, we found that the odds and hazard ratios are logic respecting when the subgroups are purely predictive, that is, the distribution of the potential outcome for the control treatment is homogeneous across subgroups. We also found that when we redefined the odds and hazards ratio causal estimands in terms of the joint distribution of the potential outcomes, the discrepancies are resolved under specific models in which the potential outcomes are conditionally independent. In response to other discussion points in the paper, we also provide remarks on association versus causation, confounding, statistical computing software, and dichotomania.
- Published
- 2020
- Full Text
- View/download PDF
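A worked numeric illustration (constructed here, not taken from the discussion) of the subgroup/overall discrepancy at issue in entry 28: the odds ratio is non-collapsible, so a common within-subgroup odds ratio of 2 does not reproduce itself marginally even in the absence of confounding.

```latex
% Two equally sized subgroups, control risks 0.10 and 0.50, and a common
% within-subgroup odds ratio of 2 for treatment. Treated risks per subgroup:
\frac{2\cdot(0.1/0.9)}{1+2\cdot(0.1/0.9)} \approx 0.182, \qquad
\frac{2\cdot(0.5/0.5)}{1+2\cdot(0.5/0.5)} \approx 0.667.
% Marginal risks: control (0.1+0.5)/2 = 0.30, treated (0.182+0.667)/2 \approx 0.424.
% Marginal odds ratio:
\frac{0.424/0.576}{0.30/0.70} \approx 1.72 \;\neq\; 2,
% although both subgroup odds ratios equal exactly 2 and there is no
% confounding: a prognostic subgroup effect, not bias.
```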
29. Generalized estimating equations approach for spatial lattice data: A case study in adoption of improved maize varieties in Mozambique
- Author
-
Lourenço Manuel and João Domingos Scalon
- Subjects
Statistics and Probability, Generalized linear model, Covariance matrix, General Medicine, Quasi-likelihood, Autoregressive model, Statistics, Covariate, Statistics, Probability and Uncertainty, Generalized estimating equation, Random variable, Spatial analysis, Mathematics - Abstract
Generalized estimating equations (GEE) are an extension of generalized linear models (GLMs) widely applied in longitudinal data analysis. GEE are also applied in spatial data analysis using geostatistical methods. In this paper, we advocate the application of GEE to spatial lattice data by modeling the spatial working correlation matrix using Moran's index and the spatial weight matrix. We present theoretical developments as well as results for simulated and actual data. For the former case, 1,000 samples of a random variable (the response variable) defined on the (0, 1) interval were generated using different values of Moran's index. In addition, 1,000 samples of a binary and a continuous variable were also randomly generated as covariates. In each sample, three structures of spatial working correlation matrices were used in the modeling: the independent, the autoregressive, and the Toeplitz structure. Two measures were used to evaluate the performance of each of the spatial working correlation structures: the asymptotic relative efficiency and the working correlation selection criteria. Both measures indicated that the autoregressive spatial working correlation matrix proposed in this paper performs best in general. For the actual data case, the proportion of small farmers who used improved maize varieties was taken as the response variable and a set of nine variables was used as covariates. Two structures of spatial working correlation matrices were used, and the results were consistent with those obtained in the simulation study.
- Published
- 2020
- Full Text
- View/download PDF
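Entry 29 builds its spatial working correlation from Moran's index and a spatial weight matrix. Below is a minimal Moran's I computation on a hypothetical chain of four areas; it covers only this ingredient, not the GEE fitting procedure or the Mozambique data.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x on a lattice with spatial weight matrix W
    (w_ij > 0 if areas i and j are neighbours, zero diagonal)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    n = x.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical chain of four adjacent areas: 1-2, 2-3, 3-4 are neighbours
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([0.8, 0.7, 0.3, 0.2])     # e.g. adoption proportions per area
print(round(morans_i(x, W), 3))        # positive value: similar areas are adjacent
```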
30. IV estimation without distributional assumptions
- Author
-
Theis Lange and Aksel Karl Georg Jensen
- Subjects
Statistics and Probability, Estimation, Biometry, Computer science, Instrumental variable, General Medicine, Outcome (probability), Causal inference, Econometrics, Point estimation, Statistics, Probability and Uncertainty - Abstract
It is widely known that instrumental variable (IV) estimation allows the researcher to estimate causal effects of an exposure on an outcome even in the face of serious uncontrolled confounding. The key requirements for IV estimation are the existence of a variable, the instrument, which affects the outcome only through its effect on the exposure, and that the instrument-outcome relationship is unconfounded. Countless papers have employed such techniques and carefully addressed the validity of the IV assumptions just mentioned. However, it is less appreciated that IV estimation also depends on a number of distributional assumptions, in particular linearities. In this paper, we propose a novel bounding procedure which can bound the true causal effect relying only on the key IV assumptions and not on any distributional assumptions. For the purely binary case (instrument, exposure, and outcome all binary), such bounds were proposed by Balke and Pearl in 1997. We extend these bounds to non-binary settings. In addition, our procedure offers a tuning parameter such that one can go from the traditional IV analysis, which provides a point estimate, to a completely unrestricted bound, and anything in between. Subject matter knowledge can be used when setting the tuning parameter. To the best of our knowledge, no such methods exist elsewhere. The method is illustrated using a pivotal study which introduced IV estimation to epidemiologists. Here, we demonstrate that the conclusion of that paper indeed hinges on these additional distributional assumptions. R code is provided in the Supporting Information.
- Published
- 2020
- Full Text
- View/download PDF
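For orientation alongside entry 30, this sketch shows the classical (Wald) IV point estimate on simulated data with an unmeasured confounder; this is the quantity whose extra linearity assumptions the paper relaxes. The paper's bounding procedure itself is not reproduced here, and the data and effect sizes are invented.

```python
import numpy as np

def wald_iv_estimate(z, x, y):
    """Classical instrumental-variable (Wald) point estimate for a binary
    instrument z: the instrument's effect on the outcome divided by its
    effect on the exposure. Valid only under the core IV assumptions plus
    the linearity assumptions the entry above calls into question."""
    z = np.asarray(z)
    num = y[z == 1].mean() - y[z == 0].mean()
    den = x[z == 1].mean() - x[z == 0].mean()
    return num / den

# Simulated example with unmeasured confounding U (illustrative only)
rng = np.random.default_rng(2)
n = 10_000
u = rng.normal(size=n)                        # unmeasured confounder
z = rng.binomial(1, 0.5, size=n)              # instrument
x = 0.8 * z + u + rng.normal(size=n)          # exposure
y = 1.5 * x + 2.0 * u + rng.normal(size=n)    # outcome; true causal effect = 1.5
print("naive slope:      ", np.cov(x, y)[0, 1] / np.var(x))   # confounded, too large
print("IV (Wald) estimate:", wald_iv_estimate(z, x, y))        # close to 1.5
```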
31. Predictive functional ANOVA models for longitudinal analysis of mandibular shape changes
- Author
-
Lara Fontanella, Luigi Ippoliti, and Pasquale Valentini
- Subjects
Statistics and Probability, Multivariate statistics, Biometry, Computer science, Bayesian probability, Normal Distribution, Mandible, Bayesian inference, Prior probability, Occlusion, Humans, Longitudinal Studies, Gaussian process, Models, Statistical, Pattern recognition, Statistical model, General Medicine, Artificial intelligence, Statistics, Probability and Uncertainty - Abstract
In this paper, we introduce a Bayesian statistical model for the analysis of functional data observed at several time points. Examples of such data include the Michigan growth study where we wish to characterize the shape changes of human mandible profiles. The form of the mandible is often used by clinicians as an aid in predicting the mandibular growth. However, whereas many studies have demonstrated the changes in size that may occur during the period of pubertal growth spurt, shape changes have been less well investigated. Considering a group of subjects presenting normal occlusion, in this paper we thus describe a Bayesian functional ANOVA model that provides information about where and when the shape changes of the mandible occur during different stages of development. The model is developed by defining the notion of predictive process models for Gaussian process (GP) distributions used as priors over the random functional effects. We show that the predictive approach is computationally appealing and that it is useful to analyze multivariate functional data with unequally spaced observations that differ among subjects and times. Graphical posterior summaries show that our model is able to provide a biological interpretation of the morphometric findings and that they comprehensively describe the shape changes of the human mandible profiles. Compared with classical cephalometric analysis, this paper represents a significant methodological advance for the study of mandibular shape changes in two dimensions.
- Published
- 2019
- Full Text
- View/download PDF
32. Enhancing estimation methods for integrating probability and nonprobability survey samples with machine‐learning techniques. An application to a Survey on the impact of the COVID‐19 pandemic in Spain
- Author
-
María del Mar Rueda, Sara Pasadas‐del‐Amo, Beatriz Cobo Rodríguez, Luis Castro‐Martín, and Ramón Ferri‐García
- Subjects
Statistics and Probability ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
Web surveys have replaced face-to-face and computer-assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced by COVID-19 pandemic-related restrictions. However, this mode still faces significant limitations in obtaining probability-based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability-based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability-based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID-19 pandemic in Spain (ESPACOV) that motivates this paper. Here, we propose a methodology for combining probability and nonprobability web-based survey samples with the help of machine-learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that this is the best option for reducing the biases observed in our data.
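As a rough sketch of one common way to combine the two sample types, the code below trains a machine-learning classifier to predict membership in the nonprobability sample and applies a propensity-adjustment weight to its units. The simulated covariates, the gradient boosting learner, and the particular weight form are illustrative assumptions, not the estimators evaluated for the ESPACOV data.

```python
# Sketch: propensity score adjustment for a nonprobability web sample using a
# machine-learning classifier, with a probability sample as the reference.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n_prob, n_nonprob = 500, 2000

# Hypothetical covariates and a target variable y observed in the nonprobability sample.
X_prob = rng.normal(size=(n_prob, 2))
X_nonprob = rng.normal(loc=0.4, size=(n_nonprob, 2))          # selection bias in covariates
y_nonprob = 2.0 + X_nonprob @ np.array([1.0, -0.5]) + rng.normal(size=n_nonprob)

# Stack both samples and model membership in the nonprobability sample.
X = np.vstack([X_prob, X_nonprob])
z = np.concatenate([np.zeros(n_prob, dtype=int), np.ones(n_nonprob, dtype=int)])
clf = GradientBoostingClassifier().fit(X, z)
pi_hat = clf.predict_proba(X_nonprob)[:, 1].clip(0.01, 0.99)

# One common propensity-adjustment weight for nonprobability units.
w = (1 - pi_hat) / pi_hat
print(f"naive mean: {y_nonprob.mean():.3f}, "
      f"propensity-adjusted mean: {np.average(y_nonprob, weights=w):.3f}")
```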
- Published
- 2022
- Full Text
- View/download PDF
33. Assessment of local influence for the analysis of agreement
- Author
-
Felipe Osorio, Manuel Galea, and Carla Leal
- Subjects
Data Analysis ,Sleep Wake Disorders ,Statistics and Probability ,Biometry ,Maximum likelihood ,Monte Carlo method ,Multivariate normal distribution ,Diagnostic tools ,01 natural sciences ,Clinical study ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Humans ,030212 general & internal medicine ,0101 mathematics ,Probability ,Mathematics ,Clinical Trials as Topic ,Likelihood Functions ,Measurement method ,Models, Statistical ,Estimator ,General Medicine ,Concordance correlation coefficient ,Statistics, Probability and Uncertainty ,Monte Carlo Method - Abstract
The concordance correlation coefficient (CCC) and the probability of agreement (PA) are two frequently used measures for evaluating the degree of agreement between measurements generated by two different methods. In this paper, we consider the CCC and the PA using the bivariate normal distribution for modeling the observations obtained by the two measurement methods. The main aim of the paper is to develop diagnostic tools for the detection of observations that are influential on the maximum likelihood estimators of the CCC and the PA, using the local influence methodology rather than the likelihood displacement. Thus, we derive first- and second-order measures under the case-weight perturbation scheme. The proposed methodology is illustrated through a Monte Carlo simulation study and using a dataset from a clinical study on transient sleep disorder. Empirical results suggest that, under certain circumstances, first-order local influence measures may be more powerful than second-order measures for the detection of influential observations.
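The sketch below computes the two target quantities of the abstract, the CCC and the PA, from maximum likelihood estimates under a bivariate normal model; the simulated data and the agreement tolerance are hypothetical, and the local influence diagnostics themselves are not reproduced.

```python
# Sketch: ML estimates of the concordance correlation coefficient (CCC) and the
# probability of agreement (PA) under a bivariate normal model for two methods.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(10, 2, n)
x2 = 0.9 * x1 + rng.normal(1.0, 1.0, n)   # second measurement method

mu1, mu2 = x1.mean(), x2.mean()
S = np.cov(x1, x2, ddof=0)                # ML covariance estimate (divisor n)
s11, s22, s12 = S[0, 0], S[1, 1], S[0, 1]

# Concordance correlation coefficient.
ccc = 2 * s12 / (s11 + s22 + (mu1 - mu2) ** 2)

# Probability of agreement: P(|X1 - X2| < delta), with D = X1 - X2 normal.
delta = 2.0
mu_d, sd_d = mu1 - mu2, np.sqrt(s11 + s22 - 2 * s12)
pa = norm.cdf((delta - mu_d) / sd_d) - norm.cdf((-delta - mu_d) / sd_d)

print(f"CCC = {ccc:.3f}, PA(delta={delta}) = {pa:.3f}")
```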
- Published
- 2019
- Full Text
- View/download PDF
34. K‐Sample comparisons using propensity analysis
- Author
-
Hyun Joo Ahn, Sin-Ho Jung, and Sang Ah Chi
- Subjects
Statistics and Probability ,Biometry ,Endpoint Determination ,Population ,Kaplan-Meier Estimate ,Dunnett's test ,01 natural sciences ,Article ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Humans ,030212 general & internal medicine ,0101 mathematics ,Propensity Score ,education ,Multinomial logistic regression ,Mathematics ,education.field_of_study ,Inverse probability weighting ,Decision Trees ,Regression analysis ,General Medicine ,Observational Studies as Topic ,Propensity score matching ,Regression Analysis ,Observational study ,Statistics, Probability and Uncertainty - Abstract
In this paper, we investigate K-group comparisons on survival endpoints for observational studies. In clinical databases for observational studies, treatments for patients are chosen with probabilities that vary depending on their baseline characteristics. This often results in non-comparable treatment groups because of imbalance in the baseline characteristics of patients among treatment groups. In order to overcome this issue, we conduct a propensity analysis and either match subjects with similar propensity scores across treatment groups or compare weighted group means (or weighted survival curves for censored outcome variables) using inverse probability weighting (IPW). To this end, multinomial logistic regression has been a popular propensity analysis method for estimating the weights. We propose to use the decision tree method as an alternative propensity analysis because of its simplicity and robustness. We also propose IPW rank statistics, called the Dunnett-type test and the ANOVA-type test, to compare three or more treatment groups on survival endpoints. Using simulations, we evaluate the finite sample performance of the weighted rank statistics combined with these propensity analysis methods. We demonstrate these methods with a real data example. The IPW method also allows unbiased estimation of population parameters for each treatment group. In this paper, we limit our discussion to survival outcomes, but all the methods can be easily modified for any type of outcome, such as binary or continuous variables.
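To illustrate the propensity step described in the abstract, the sketch below estimates propensity scores for three groups with a decision tree and forms IPW weights; the data are simulated, the outcome is continuous rather than a survival time, and the proposed Dunnett- and ANOVA-type rank statistics are not implemented here.

```python
# Sketch: decision-tree propensity scores for K = 3 treatment groups, followed by
# inverse probability weighting (IPW) of a continuous outcome. Simulated data only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 900
age = rng.normal(60, 10, n)
stage = rng.integers(1, 4, n)
X = np.column_stack([age, stage])

# Treatment assignment depends on baseline covariates (confounding by indication).
logits = np.column_stack([0.02 * age, 0.5 * stage, np.ones(n)])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
trt = np.array([rng.choice(3, p=p) for p in probs])
outcome = 5 + 0.05 * age - 0.3 * stage + 0.5 * (trt == 2) + rng.normal(size=n)

# Propensity model: decision tree instead of multinomial logistic regression.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(X, trt)
ps = tree.predict_proba(X)                        # n x 3 matrix of estimated propensities
w = 1.0 / ps[np.arange(n), trt].clip(0.05, 1.0)   # IPW weight for the received treatment

for k in range(3):
    mask = trt == k
    print(f"group {k}: unweighted mean = {outcome[mask].mean():.2f}, "
          f"IPW mean = {np.average(outcome[mask], weights=w[mask]):.2f}")
```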
- Published
- 2019
- Full Text
- View/download PDF
35. A probabilistic network for the diagnosis of acute cardiopulmonary diseases
- Author
-
Alessandro Magrini, Federico M. Stefanini, and Davide Luciani
- Subjects
FOS: Computer and information sciences ,Lung Diseases ,Statistics and Probability ,Biometry ,Heart Diseases ,Computer science ,Machine Learning (stat.ML) ,02 engineering and technology ,Machine learning ,computer.software_genre ,Statistics - Applications ,03 medical and health sciences ,symbols.namesake ,Bayesian inference ,Belief elicitation ,Beta regression ,Categorical logistic regression ,Latent variables ,Statistics, Probability and Uncertainty ,0302 clinical medicine ,Statistics - Machine Learning ,Joint probability distribution ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,Applications (stat.AP) ,62F15, 62P10 ,030212 general & internal medicine ,Medical diagnosis ,Categorical variable ,Probability ,Cardiopulmonary disease ,business.industry ,Univariate ,Probabilistic logic ,Markov chain Monte Carlo ,General Medicine ,Conditional probability distribution ,Hospitals ,Acute Disease ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Monte Carlo Method ,computer - Abstract
In this paper, the development of a probabilistic network for the diagnosis of acute cardiopulmonary diseases is presented. This paper is a draft version of the article published after peer review in 2018 (https://doi.org/10.1002/bimj.201600206). A panel of expert physicians collaborated to specify the qualitative part, that is, a directed acyclic graph defining a factorization of the joint probability distribution of the domain variables. The quantitative part, that is, the set of all conditional probability distributions defined by each factor, was estimated in the Bayesian paradigm: we applied a special formal representation, characterized by a low number of parameters and a parameterization intelligible to physicians, elicited the joint prior distribution of the parameters from medical experts, and updated it by conditioning on a dataset of hospital patient records using Markov chain Monte Carlo simulation. Refinement was performed cyclically until the probabilistic network provided satisfactory concordance index values for a selection of acute diseases and reasonable inference on six fictitious patient cases. The probabilistic network can be employed to perform medical diagnosis for a total of 63 diseases (38 acute and 25 chronic) on the basis of up to 167 patient findings.
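As a toy illustration of the factorization defined by a directed acyclic graph, the sketch below performs diagnosis by exact enumeration in a minimal network with one disease node and two findings; all probabilities are hypothetical placeholders and bear no relation to the paper's 63-disease network.

```python
# Sketch: exact inference by enumeration in a toy DAG, D -> F1 and D -> F2
# (one disease node, two finding nodes). Hypothetical probabilities only.
p_d = {1: 0.05, 0: 0.95}                 # prior probability of the disease
p_f1_given_d = {1: 0.80, 0: 0.10}        # P(F1 = 1 | D)
p_f2_given_d = {1: 0.60, 0: 0.20}        # P(F2 = 1 | D)

def joint(d, f1, f2):
    """Factorized joint probability P(D, F1, F2) = P(D) P(F1 | D) P(F2 | D)."""
    pf1 = p_f1_given_d[d] if f1 == 1 else 1 - p_f1_given_d[d]
    pf2 = p_f2_given_d[d] if f2 == 1 else 1 - p_f2_given_d[d]
    return p_d[d] * pf1 * pf2

# Posterior P(D = 1 | F1 = 1, F2 = 0) by enumeration over D.
evidence = dict(f1=1, f2=0)
num = joint(1, **evidence)
den = sum(joint(d, **evidence) for d in (0, 1))
print(f"P(disease | findings) = {num / den:.3f}")
```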
- Published
- 2017
- Full Text
- View/download PDF
36. Comparing dependent kappa coefficients obtained on multilevel data
- Subjects
PARADOXES ,HIGH AGREEMENT ,BINARY DATA ,Delta method ,WEIGHTED KAPPA ,Intraclass ,Hierarchical ,STATISTICS ,Clustered bootstrap ,PREVALENCE ,CLUSTERED DATA ,Rater ,RELIABILITY ,COHENS KAPPA ,ESTIMATING EQUATIONS - Abstract
Reliability and agreement are two notions of paramount importance in medical and behavioral sciences. They provide information about the quality of the measurements. When the scale is categorical, reliability and agreement can be quantified through different kappa coefficients. The present paper provides two simple alternatives to more advanced modeling techniques, which are not always adequate in case of a very limited number of subjects, when comparing several dependent kappa coefficients obtained on multilevel data. This situation frequently arises in medical sciences, where multilevel data are common. Dependent kappa coefficients can result from the assessment of the same individuals at various occasions or when each member of a group is compared to an expert, for example. The method is based on simple matrix calculations and is available in the R package multiagree. Moreover, the statistical properties of the proposed method are studied using simulations. Although this paper focuses on kappa coefficients, the method easily extends to other statistical measures.
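As a minimal illustration of the kappa coefficients discussed in the abstract, the sketch below computes Cohen's kappa from a single cross-classification table; the counts are hypothetical, and the paper's method for comparing several dependent kappas (implemented in the R package multiagree) is not reproduced.

```python
# Sketch: Cohen's kappa for two raters on a categorical scale, from a table of counts.
import numpy as np

# Rows: rater 1 categories, columns: rater 2 categories (hypothetical counts).
table = np.array([[40,  5,  2],
                  [ 6, 30,  4],
                  [ 1,  3, 20]], dtype=float)

n = table.sum()
po = np.trace(table) / n                                   # observed agreement
pe = (table.sum(axis=1) @ table.sum(axis=0)) / n ** 2      # chance-expected agreement
kappa = (po - pe) / (1 - pe)
print(f"observed = {po:.3f}, expected = {pe:.3f}, kappa = {kappa:.3f}")
```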
- Published
- 2017
- Full Text
- View/download PDF
37. Clustering multiply imputed multivariate high-dimensional longitudinal profiles
- Author
-
Paul Dendale, Liesbeth Bruckers, and Geert Molenberghs
- Subjects
Statistics and Probability ,Dimensionality reduction ,Functional data analysis ,02 engineering and technology ,General Medicine ,Function (mathematics) ,Missing data ,computer.software_genre ,01 natural sciences ,Data set ,010104 statistics & probability ,Principal component analysis ,Consensus clustering ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,0101 mathematics ,Statistics, Probability and Uncertainty ,Cluster analysis ,Algorithm ,computer ,Mathematics - Abstract
In this paper, we propose a method to cluster multivariate functional data with missing observations. Analysis of functional data often encompasses dimension reduction techniques such as principal component analysis (PCA), which require complete data matrices. In this paper, the data are completed by means of multiple imputation, and each imputed data set is subsequently submitted to a cluster procedure. The final partition of the data, summarizing the partitions obtained for the imputed data sets, is obtained by means of ensemble clustering. The uncertainty in cluster membership due to missing data is characterized by the agreement between the members of the ensemble and the fuzziness of the consensus clustering. The potential of the method is demonstrated on heart failure (HF) data. Daily measurements of four biomarkers (heart rate, diastolic blood pressure, systolic blood pressure, and weight) were used to cluster the patients. To normalize the distributions of the longitudinal outcomes, the data were transformed with a natural logarithm function. A cubic spline basis with 69 basis functions was employed to smooth the profiles. The proposed algorithm indicates the existence of a latent structure and divides the HF patients into two clusters showing a different evolution in blood pressure values and weight. In general, cluster results are sensitive to the choices made. Likewise, for the proposed approach, alternative choices for the distance measure, the procedure to optimize the objective function, the scree-test threshold, or the number of principal components used in the approximation of the surrogate density could all influence the final partition. For the HF data set, the final partition depends on the number of principal components used in the procedure.
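The sketch below illustrates the multiple imputation plus ensemble (consensus) clustering idea on simulated multivariate data: each imputed data set is clustered, co-memberships are averaged into a co-association matrix, and a consensus partition is extracted. The data, number of imputations, and clustering choices are hypothetical, and the functional smoothing and PCA steps of the paper are omitted.

```python
# Sketch: multiple imputation followed by consensus clustering via a co-association matrix.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n, p, M, K = 120, 10, 5, 2
X = np.vstack([rng.normal(0, 1, (60, p)), rng.normal(1.5, 1, (60, p))])
X[rng.random((n, p)) < 0.15] = np.nan                # 15% missing values

coassoc = np.zeros((n, n))
for m in range(M):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    Xm = imputer.fit_transform(X)
    labels = KMeans(n_clusters=K, n_init=10, random_state=m).fit_predict(Xm)
    coassoc += (labels[:, None] == labels[None, :])  # co-membership indicator
coassoc /= M                                         # proportion of runs clustered together

# Consensus partition: hierarchical clustering of the co-association "distance".
dist = 1.0 - coassoc
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
consensus = fcluster(Z, t=K, criterion="maxclust")
print("consensus cluster sizes:", np.bincount(consensus)[1:])
```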
- Published
- 2017
- Full Text
- View/download PDF
38. Do we consent to rules of consent and confidentiality?
- Author
-
Tim Friede and Karl Wegscheider
- Subjects
Statistics and Probability ,business.industry ,Internet privacy ,General Medicine ,computer.software_genre ,01 natural sciences ,Data sharing ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Informed consent ,Openness to experience ,030211 gastroenterology & hepatology ,Confidentiality ,Data mining ,0101 mathematics ,Statistics, Probability and Uncertainty ,business ,Psychology ,computer - Abstract
Confidentiality and informed consent are important concepts enabling the sharing of sensitive data. In a paper on this topic in this issue, Williams and Pigeot (2017) discuss how these have to be balanced with openness to ensure research standards. In this opinion paper, we give some background on how the paper by Williams and Pigeot evolved, reflect on their concepts, and provide some examples of the application of these concepts in various settings relevant to biostatisticians working in health research.
- Published
- 2017
- Full Text
- View/download PDF
39. Detection of spatial change points in the mean and covariances of multivariate simultaneous autoregressive models
- Author
-
Philipp E. Otto and Wolfgang Schmid
- Subjects
Statistics and Probability ,Nonlinear autoregressive exogenous model ,05 social sciences ,Asymptotic distribution ,General Medicine ,01 natural sciences ,Empirical distribution function ,010104 statistics & probability ,Gumbel distribution ,Autoregressive model ,0502 economics and business ,Statistics ,Test statistic ,Autoregressive integrated moving average ,0101 mathematics ,Statistics, Probability and Uncertainty ,Algorithm ,STAR model ,050205 econometrics ,Mathematics - Abstract
In this paper, we propose a test procedure to detect change points of multidimensional autoregressive processes. The considered process differs from typically applied spatial autoregressive processes in that it is assumed to evolve from a predefined center into every dimension. Additionally, structural breaks in the process can occur at a certain distance from the predefined center. The main aim of this paper is to detect such spatial changes. In particular, we focus on shifts in the mean and the autoregressive parameter. The proposed test procedure is based on the likelihood-ratio approach, and the goodness-of-fit values of the estimators are compared for different shifts. Moreover, the empirical distribution of the test statistic of the likelihood-ratio test is obtained via Monte Carlo simulations. We show that the generalized Gumbel distribution appears to be a suitable limiting distribution of the proposed test statistic. Finally, we discuss the detection of lung cancer in computed tomography scans and illustrate the proposed test procedure.
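As a simplified illustration of a likelihood-ratio change-point scan calibrated by Monte Carlo simulation, the sketch below detects a single mean shift in a one-dimensional Gaussian sequence with known variance; this toy setting is not the spatial simultaneous autoregressive model of the paper.

```python
# Sketch: likelihood-ratio scan for one mean shift in a Gaussian sequence (unit variance),
# with the null distribution of the maximal statistic obtained by Monte Carlo simulation.
import numpy as np

def max_lr_stat(y):
    """Maximum over split points k of the -2 log likelihood ratio for a mean shift at k."""
    n = len(y)
    stats = []
    for k in range(5, n - 5):                    # enforce a minimum segment length
        m1, m2 = y[:k].mean(), y[k:].mean()
        stats.append(k * (n - k) / n * (m1 - m2) ** 2)
    return max(stats)

rng = np.random.default_rng(42)
n = 200

# Monte Carlo null distribution and 95% critical value.
null_stats = [max_lr_stat(rng.normal(size=n)) for _ in range(500)]
crit = np.quantile(null_stats, 0.95)

# A sequence with a shift of size 0.8 after position 120.
y = np.concatenate([rng.normal(0, 1, 120), rng.normal(0.8, 1, 80)])
print(f"observed statistic = {max_lr_stat(y):.2f}, MC critical value = {crit:.2f}")
```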
- Published
- 2016
- Full Text
- View/download PDF
40. Parameter redundancy in discrete state-space and integrated models
- Author
-
Rachel S. McCrea and Diana J. Cole
- Subjects
0106 biological sciences ,Statistics and Probability ,Joint likelihood ,General Medicine ,010603 evolutionary biology ,01 natural sciences ,010104 statistics & probability ,Multiple data ,Redundancy (information theory) ,Population model ,Statistics ,Identifiability ,0101 mathematics ,Statistics, Probability and Uncertainty ,Probability of survival ,Algorithm ,Mathematics - Abstract
Discrete state-space models are used in ecology to describe the dynamics of wild animal populations, with parameters, such as the probability of survival, being of ecological interest. For a particular parametrization of a model it is not always clear which parameters can be estimated. This inability to estimate all parameters is known as parameter redundancy; equivalently, the model is described as nonidentifiable. In this paper we develop methods that can be used to detect parameter redundancy in discrete state-space models. An exhaustive summary is a combination of parameters that fully specifies a model, and a suitable exhaustive summary is required in order to use general methods for detecting parameter redundancy. This paper proposes two methods for the derivation of an exhaustive summary for discrete state-space models using discrete analogues of methods for continuous state-space models. We also demonstrate that combining multiple data sets, through the use of an integrated population model, may result in a model in which all parameters are estimable, even though models fitted to the separate data sets may be parameter redundant.
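Once an exhaustive summary is available, parameter redundancy is commonly checked via the symbolic rank of its derivative matrix; the sketch below does this for a deliberately simple two-parameter toy model in which survival and detection enter only through their product. The example is an assumption for illustration, not one of the paper's state-space exhaustive summaries.

```python
# Sketch: detecting parameter redundancy from an exhaustive summary by computing the
# symbolic rank of its derivative (Jacobian) matrix.
import sympy as sp

phi, p = sp.symbols("phi p", positive=True)
theta = sp.Matrix([phi, p])

# Exhaustive summary: cell probabilities of a minimal model where phi and p
# appear only through their product.
kappa = sp.Matrix([phi * p, 1 - phi * p])

D = kappa.jacobian(theta)          # derivative matrix
rank = D.rank()
print("number of parameters:", len(theta))
print("rank of derivative matrix:", rank)
print("parameter redundant" if rank < len(theta) else "full rank (all parameters estimable)")
```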
- Published
- 2016
- Full Text
- View/download PDF
41. Two‐stage screened selection designs for randomized phase II trials with time‐to‐event endpoints
- Author
-
Jianrong Wu, Haitao Pan, and Chia‐Wei Hsu
- Subjects
Statistics and Probability ,Research Design ,Neoplasms ,Sample Size ,Humans ,Computer Simulation ,General Medicine ,Statistics, Probability and Uncertainty ,Child ,Randomized Controlled Trials as Topic - Abstract
Phase II exploratory multiarm studies that randomize among new treatments are broadly useful and appear to be of value both scientifically and logistically, especially in areas of unmet need, for example, pediatric cancer. This multiarm design also has a faster recruitment rate because it provides patients with more treatment choices than traditional two-arm randomized controlled trials do. In contrast to direct formal comparisons in multiarm multistage designs, for example, umbrella or platform designs, the screened selection design (SSD) recommends a promising treatment arm by ranking according to effect size, which often requires smaller sample sizes than the former. In this paper, the usefulness of the phase II SSD design is exemplified by three real trials. However, the existing SSD methods can only handle binary endpoints. Motivated by real trials in the authors' respective institutions, we propose using the two-stage SSD and its variant for randomized phase II trials with time-to-event endpoints. The proposed methods not only provide a high probability of selecting a superior treatment arm but also control the type I error rate for testing the efficacy of each treatment arm versus a common external control. Sample size calculations have been derived, and simulation studies demonstrate desirable operating characteristics. The proposed design has been used for designing three real trials. An R package, frequentistSSD, has been developed and is freely accessible to practitioners.
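The sketch below conveys the basic "select by ranking" idea behind screened selection with a time-to-event endpoint: arms are simulated with exponential survival and the arm with the best estimated hazard is picked, so that the probability of correct selection can be approximated by simulation. The hazards, sample size, and follow-up are hypothetical, and the paper's two-stage rules, type I error control, and the frequentistSSD package are not reproduced.

```python
# Sketch: simulated probability of selecting the best arm by ranking estimated hazards
# under exponential survival with administrative censoring. Toy illustration only.
import numpy as np

rng = np.random.default_rng(2024)
hazards = [0.10, 0.08, 0.05]       # per-month hazards; arm 2 is truly best
n_per_arm, follow_up, n_sim = 40, 24.0, 2000

correct = 0
for _ in range(n_sim):
    est = []
    for lam in hazards:
        t = rng.exponential(1 / lam, n_per_arm)
        events = t <= follow_up
        obs = np.minimum(t, follow_up)
        est.append(events.sum() / obs.sum())     # MLE of the hazard under censoring
    correct += int(np.argmin(est) == np.argmin(hazards))

print(f"probability of selecting the truly best arm: {correct / n_sim:.3f}")
```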
- Published
- 2022
- Full Text
- View/download PDF
42. Bayesian two‐stage sequential enrichment design for biomarker‐guided phase II trials for anticancer therapies
- Author
-
Liwen Su, Xin Chen, Jingyi Zhang, Jun Gao, and Fangrong Yan
- Subjects
Statistics and Probability ,Random Allocation ,Research Design ,Humans ,Bayes Theorem ,Computer Simulation ,Molecular Targeted Therapy ,General Medicine ,Statistics, Probability and Uncertainty ,Biomarkers - Abstract
Biomarker-guided phase II trials have become increasingly important for personalized cancer treatment. In this paper, we propose a Bayesian two-stage sequential enrichment design for such biomarker-guided trials. We assumed that all patients were dichotomized as marker positive or marker negative based on their biomarker status; the positive patients were considered more likely to respond to the targeted drug. Early stopping rules and adaptive randomization methods were embedded in the design to control the number of patients receiving inferior treatment. At the same time, a Bayesian hierarchical model was used to borrow information between the positive and negative control arms to improve efficiency. Simulation results showed that the proposed design achieved higher empirical power while controlling the type I error and assigned more patients to the superior treatment arms. The operating characteristics suggested that the design has good performance and may be useful for biomarker-guided phase II trials for evaluating anticancer therapies.
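As a minimal illustration of the Bayesian early-stopping idea mentioned in the abstract, the sketch below evaluates a futility rule for a single arm with a binary response under a Beta-Binomial model. The prior, null response rate, and threshold are hypothetical, and the paper's hierarchical borrowing between marker groups and adaptive randomization are not shown.

```python
# Sketch: Beta-Binomial futility check for one arm with a binary response endpoint.
from scipy.stats import beta

a0, b0 = 0.5, 0.5          # Beta prior parameters
p0 = 0.20                  # uninteresting (null) response rate
futility_cut = 0.10        # stop if P(p > p0 | data) falls below this

n, responses = 15, 2       # interim data
post_prob = beta.sf(p0, a0 + responses, b0 + n - responses)   # P(p > p0 | data)
decision = "stop for futility" if post_prob < futility_cut else "continue"
print(f"P(p > {p0} | data) = {post_prob:.3f} -> {decision}")
```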
- Published
- 2022
- Full Text
- View/download PDF
43. gBOIN‐ET: The generalized Bayesian optimal interval design for optimal dose‐finding accounting for ordinal graded efficacy and toxicity in early clinical trials
- Author
-
Kentaro Takeda, Satoshi Morita, and Masataka Taguri
- Subjects
Statistics and Probability ,Dose-Response Relationship, Drug ,Maximum Tolerated Dose ,Research Design ,Neoplasms ,Humans ,Antineoplastic Agents ,Bayes Theorem ,Computer Simulation ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
One of the primary objectives of an oncology dose-finding trial for novel therapies, such as molecular targeted agents and immuno-oncology therapies, is to identify an optimal dose (OD) that is tolerable and therapeutically beneficial for subjects in subsequent clinical trials. These new therapeutic agents appear more likely to induce multiple low- or moderate-grade toxicities than dose-limiting toxicities. In addition, efficacy should be evaluated not only as overall response but also as stable disease in solid tumors, and as the difference between complete and partial remission in lymphoma. This paper proposes the generalized Bayesian optimal interval design for dose-finding accounting for efficacy and toxicity grades. The new design, named the "gBOIN-ET" design, is model-assisted and simpler and more straightforward to implement in actual oncology dose-finding trials than model-based approaches. These characteristics are quite valuable in practice. A simulation study shows that the gBOIN-ET design has advantages over other model-assisted designs in the percentage of correct OD selection and the average number of patients allocated to the ODs across various realistic settings.
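For orientation, the sketch below computes the escalation and de-escalation boundaries of the standard toxicity-only BOIN design, the simpler interval design that gBOIN-ET generalizes; the graded efficacy and toxicity rules of gBOIN-ET itself are not reproduced, and the interim data are hypothetical.

```python
# Sketch: dose-escalation boundaries of the standard (toxicity-only) BOIN design.
import numpy as np

def boin_boundaries(target, phi1=None, phi2=None):
    """Escalation/de-escalation cutoffs for the observed toxicity rate (standard BOIN)."""
    phi1 = 0.6 * target if phi1 is None else phi1   # default sub-therapeutic toxicity rate
    phi2 = 1.4 * target if phi2 is None else phi2   # default overly toxic rate
    lam_e = np.log((1 - phi1) / (1 - target)) / np.log(target * (1 - phi1) / (phi1 * (1 - target)))
    lam_d = np.log((1 - target) / (1 - phi2)) / np.log(phi2 * (1 - target) / (target * (1 - phi2)))
    return lam_e, lam_d

lam_e, lam_d = boin_boundaries(target=0.30)
n_treated, n_dlt = 9, 2                     # hypothetical interim data at the current dose
rate = n_dlt / n_treated
if rate <= lam_e:
    action = "escalate"
elif rate >= lam_d:
    action = "de-escalate"
else:
    action = "stay"
print(f"lambda_e = {lam_e:.3f}, lambda_d = {lam_d:.3f}, observed rate = {rate:.2f} -> {action}")
```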
- Published
- 2022
- Full Text
- View/download PDF
44. Optimization of adaptive designs with respect to a performance score
- Author
-
Carolin Herrmann, Meinhard Kieser, Geraldine Rauch, and Maximilian Pilz
- Subjects
Statistics and Probability ,Research Design ,Sample Size ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
Adaptive designs are an increasingly popular method for the adaptation of design aspects in clinical trials, such as the sample size. Scoring different adaptive designs helps to make an appropriate choice among the numerous existing adaptive design methods. Several scores have been proposed to evaluate adaptive designs. Moreover, it is possible to determine optimal two-stage adaptive designs with respect to a customized objective score by solving a constrained optimization problem. In this paper, we use the conditional performance score by Herrmann et al. (2020) as the optimization criterion to derive optimal adaptive two-stage designs. We investigate variations of the original performance score, for example, by assigning different weights to the score components and by incorporating prior assumptions on the effect size. We further investigate a setting in which the optimization framework is extended by a global power constraint and the critical value function is optimized in addition to the stage-two sample size. These evaluations of the sample size curves and the resulting design's performance can help facilitate the score's use in practice.
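As background for the conditional evaluation of two-stage designs, the sketch below computes conditional power for an inverse-normal combination of stage-wise z-statistics in a one-sample z-test with known unit variance; the design parameters are hypothetical, and the conditional performance score of Herrmann et al. (2020) is not reproduced.

```python
# Sketch: conditional power of a two-stage design combining stage-wise z-statistics
# with inverse-normal weights (one-sample z-test, known unit variance).
import numpy as np
from scipy.stats import norm

def conditional_power(z1, n1, n2, delta, alpha=0.025):
    """P(reject at stage two | stage-one z-statistic z1), assumed effect size delta."""
    w1 = np.sqrt(n1 / (n1 + n2))          # inverse-normal weights from planned stage sizes
    w2 = np.sqrt(n2 / (n1 + n2))
    c = norm.ppf(1 - alpha)               # critical value for the combined statistic
    # Reject if w1*z1 + w2*Z2 > c, where Z2 ~ N(sqrt(n2)*delta, 1) under the assumed effect.
    threshold = (c - w1 * z1) / w2
    return 1 - norm.cdf(threshold - np.sqrt(n2) * delta)

print(f"conditional power = {conditional_power(z1=1.0, n1=50, n2=80, delta=0.25):.3f}")
```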
- Published
- 2022
- Full Text
- View/download PDF
45. Missing data imputation in clinical trials using recurrent neural network facilitated by clustering and oversampling
- Author
-
Halimu N. Haliduola, Ulrich Mansmann, and Frank Bretz
- Subjects
Statistics and Probability ,Likelihood Functions ,Bias ,Data Interpretation, Statistical ,Cluster Analysis ,Humans ,Neural Networks, Computer ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
In clinical practice, the composition of missing data may be complex, for example, a mixture of missing at random (MAR) and missing not at random (MNAR) assumptions. Many methods under the assumption of MAR are available. Under the assumption of MNAR, likelihood-based methods require specification of the joint distribution of the data, and the missingness mechanism has been introduced as sensitivity analysis. These classic models heavily rely on the underlying assumption, and, in many realistic scenarios, they can produce unreliable estimates. In this paper, we develop a machine learning based missing data prediction framework with the aim of handling more realistic missing data scenarios. We use an imbalanced learning technique (i.e., oversampling of minority class) to handle the MNAR data. To implement oversampling in longitudinal continuous variable, we first perform clustering via
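As a minimal illustration of the imbalanced-learning ingredient mentioned in the abstract, the sketch below performs simple random oversampling of a minority class; the data are simulated, and the clustering-based oversampling and recurrent network of the paper are not reproduced.

```python
# Sketch: simple random oversampling of the minority class to balance a binary label.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)          # roughly 10% minority class

minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)

idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X[idx], y[idx]
print("class counts after oversampling:", np.bincount(y_bal))
```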
- Published
- 2022
- Full Text
- View/download PDF
46. Disease mapping method comparing the spatial distribution of a disease with a control disease
- Author
-
Oana Petrof, Thomas Neyens, Maren Vranckx, Valerie Nuyts, Benoit Nemery, Kristiaan Nackaerts, and Christel Faes
- Subjects
standardization ,Statistics and Probability ,Belgium ,Risk Factors ,case-control study ,mesothelioma ,Case-Control Studies ,disease mapping ,Uncertainty ,BYM model ,Computer Simulation ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
Small-area methods are being used in spatial epidemiology to understand the effect of location on health and to detect areas where the risk of a disease is significantly elevated. Disease mapping models relate the observed number of cases to an expected number of cases per area. Expected numbers are often calculated by internal standardization, which requires both accurate population numbers and disease rates per gender and/or age group. However, confidentiality issues or the absence of high-quality information about the characteristics of a population-at-risk can hamper those calculations. Based on methods in point process analysis for situations without accurate population data, we propose the use of a case-control approach in the context of lattice data, in which an unrelated, spatially unstructured disease is used as a control disease. We correct for the uncertainty in the estimation of the expected values, which arises from using the control disease's observed number of cases as a representation of a fraction of the total population. We apply our methods to a Belgian study of mesothelioma risk, where pancreatic cancer serves as the control disease. The analysis results are in close agreement with those from traditional disease mapping models based on internally standardized expected counts. The simulation study results confirm our findings for different spatial structures. We show that the proposed method can adequately address the problem of inaccurate or unavailable population data in disease mapping analysis.
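The sketch below illustrates the basic bookkeeping behind the control-disease idea: expected counts for the disease of interest are taken proportional to the control disease's spatial distribution, and standardized incidence ratios follow. The counts are hypothetical, and the paper's correction for the uncertainty in these expected values (and the full disease mapping model) is not included.

```python
# Sketch: area-level expected counts derived from a control disease, and the
# resulting standardized incidence ratios (SIRs). Hypothetical counts only.
import numpy as np

obs_cases = np.array([12, 4, 7, 20, 3])        # observed cases of the disease of interest
obs_control = np.array([30, 25, 18, 40, 12])   # observed cases of the control disease

# Expected counts proportional to the control disease's share of each area.
expected = obs_cases.sum() * obs_control / obs_control.sum()
sir = obs_cases / expected

for i, (e, s) in enumerate(zip(expected, sir)):
    print(f"area {i}: expected = {e:.1f}, SIR = {s:.2f}")
```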
- Published
- 2022
- Full Text
- View/download PDF
47. Sample size determination for comparing accuracies between two diagnostic tests under a paired design
- Author
-
Yi‐Ting Hwang and Nan‐Cheng Su
- Subjects
Statistics and Probability ,ROC Curve ,Diagnostic Tests, Routine ,Area Under Curve ,Sample Size ,General Medicine ,Statistics, Probability and Uncertainty - Abstract
With advancing technology, much medical research aims to develop diagnostic tests that can detect diseases faster and more accurately. The accuracy of a diagnostic test for classifying two groups is assessed through the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). When a paired design is considered, the sample size determination requires information about the two AUC estimates and the corresponding variance and covariance of the two AUC estimators. This paper derives nonparametric estimators of the variance and covariance of the two AUC estimators. The result is used to derive the sample size formula when a paired sample is planned. Since most of the results do not have a closed form, numerical results are provided under various scenarios.
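The sketch below computes the ingredients such a sample size formula needs: nonparametric AUC estimates for two paired tests and DeLong-type estimates of the variance and covariance of the AUC estimators. The scores are simulated placeholders, and this may differ in detail from the estimators derived in the paper.

```python
# Sketch: nonparametric (DeLong-type) AUCs and their covariance for two paired tests.
import numpy as np

def auc_components(cases, controls):
    """AUC and DeLong structural components for one test (higher score = more diseased)."""
    psi = (cases[:, None] > controls[None, :]) + 0.5 * (cases[:, None] == controls[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)   # AUC, V10 (cases), V01 (controls)

rng = np.random.default_rng(11)
m, n = 60, 80                                         # numbers of diseased and healthy subjects
d1, h1 = rng.normal(1.0, 1, m), rng.normal(0, 1, n)   # test 1 scores
d2, h2 = d1 + rng.normal(0, 0.8, m), h1 + rng.normal(0, 0.8, n)  # paired test 2 scores

auc1, v10_1, v01_1 = auc_components(d1, h1)
auc2, v10_2, v01_2 = auc_components(d2, h2)

s10 = np.cov(np.vstack([v10_1, v10_2]))         # 2x2 covariance across cases
s01 = np.cov(np.vstack([v01_1, v01_2]))         # 2x2 covariance across controls
cov_auc = s10 / m + s01 / n                     # covariance matrix of (AUC1_hat, AUC2_hat)

print(f"AUC1 = {auc1:.3f}, AUC2 = {auc2:.3f}")
print(f"var(AUC1) = {cov_auc[0, 0]:.5f}, cov(AUC1, AUC2) = {cov_auc[0, 1]:.5f}")
```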
- Published
- 2022
- Full Text
- View/download PDF
48. Modified score function for monotone likelihood in the semiparametric mixture cure model
- Author
-
Vinícius Diniz Mayrink, Enrico A. Colosimo, and Frederico M. Almeida
- Subjects
Statistics and Probability ,Likelihood Functions ,Models, Statistical ,Estimator ,Score ,Context (language use) ,General Medicine ,Survival Analysis ,Bias ,Sample size determination ,Expectation–maximization algorithm ,Covariate ,Statistics ,Humans ,Computer Simulation ,Statistics, Probability and Uncertainty ,Likelihood function ,Monte Carlo Method ,Categorical variable ,Algorithms ,Mathematics - Abstract
Cure fraction models are intended to analyze lifetime data from populations in which some individuals are immune to the event under study, and they allow joint estimation of the distributions related to the cured and susceptible subjects, as opposed to the usual approach that ignores the cure rate. In situations involving small sample sizes with many censored times, nonfinite coefficient estimates may arise under maximum likelihood. This phenomenon is commonly known as monotone likelihood (ML) and occurs in the Cox and logistic regression models when many categorical and unbalanced covariates are present. An existing solution to prevent the issue is based on the Firth correction, originally developed to reduce estimation bias. The method ensures finite estimates by penalizing the likelihood function. In the context of mixture cure models, the ML issue is rarely discussed in the literature; therefore, this topic can be seen as the first contribution of our paper. The second major contribution, not well addressed elsewhere, is the study of the ML issue in cure mixture modeling under the flexibility of a semiparametric framework to handle the baseline hazard. We derive the modified score function based on the Firth approach and explore finite sample properties of the estimators via a Monte Carlo scheme. The simulation results indicate that the performance of estimators for coefficients of binary covariates is strongly affected by the degree of imbalance. A real illustration is discussed using a relatively novel melanoma dataset collected in a Brazilian university hospital.
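To illustrate the Firth correction that the abstract builds on, the sketch below fits an ordinary logistic regression with a quasi-separated binary covariate by maximizing a penalized log-likelihood (half the log-determinant of the Fisher information), which keeps the estimates finite. The toy data are hypothetical, and the paper's semiparametric mixture cure extension is not shown.

```python
# Sketch: Firth-penalized logistic regression under quasi-complete separation
# (monotone likelihood), maximized numerically.
import numpy as np
from scipy.optimize import minimize

# Binary covariate whose x = 1 group contains only events (quasi-complete separation).
x = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

def neg_penalized_loglik(beta):
    eta = X @ beta
    p = 1 / (1 + np.exp(-eta))
    loglik = np.sum(y * eta - np.logaddexp(0, eta))
    W = p * (1 - p)
    info = X.T @ (W[:, None] * X)                 # Fisher information matrix
    _, logdet = np.linalg.slogdet(info)
    return -(loglik + 0.5 * logdet)               # Firth penalty: + 0.5 log|I(beta)|

fit = minimize(neg_penalized_loglik, x0=np.zeros(2), method="BFGS")
print("Firth-penalized estimates (intercept, slope):", np.round(fit.x, 3))
```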
- Published
- 2021
- Full Text
- View/download PDF
49. Estimation in multivariate linear mixed models for longitudinal data with multiple outputs: Application to PBCseq data analysis
- Author
-
Mozhgan Taavoni and Mohammad Arashi
- Subjects
Statistics and Probability ,Variable (computer science) ,Multivariate statistics ,Standard error ,Heavy-tailed distribution ,Computer science ,Statistics ,Feature selection ,Penalty method ,General Medicine ,Maximization ,Statistics, Probability and Uncertainty ,Generalized linear mixed model - Abstract
In many biomedical studies or clinical trials, we have data with more than one response variable on the same subject repeatedly measured over time. In analyzing such data, we adopt a multivariate linear mixed-effects longitudinal model. On the other hand, in longitudinal data we often find features that do not contribute to modeling the response variable and are eliminated from the study. In this paper, we consider the problem of simultaneous variable selection and estimation in a multivariate t linear mixed-effects model (MtLMM) for analyzing longitudinally measured multioutcome data. This work's motivation comes from a cohort study of patients with primary biliary cirrhosis. The interest is in eliminating insignificant variables using the smoothly clipped absolute deviation (SCAD) penalty function in the MtLMM. The proposed penalized model offers robustness and the flexibility to accommodate fat tails. An expectation conditional maximization algorithm is employed for the computation of maximum likelihood estimates of the parameters. Standard errors are calculated using an information-based method. The methodology is illustrated by analyzing the Mayo Clinic Primary Biliary Cirrhosis sequential (PBCseq) data and a simulation study. We found that the drug and sex variables can be eliminated from the PBCseq analysis and that the disease progresses over time.
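For reference, the sketch below evaluates the SCAD penalty used for the variable selection step on a grid of coefficient values, with the conventional tuning constant a = 3.7; the mixed-model estimation itself is not shown.

```python
# Sketch: the smoothly clipped absolute deviation (SCAD) penalty, evaluated elementwise.
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty p_lambda(|beta|) with tuning constant a > 2."""
    b = np.abs(beta)
    small = lam * b
    medium = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    large = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, small, np.where(b <= a * lam, medium, large))

grid = np.linspace(-3, 3, 7)
print(np.round(scad_penalty(grid, lam=0.5), 3))
```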
- Published
- 2021
- Full Text
- View/download PDF
50. Type I multivariate zero‐inflated COM–Poisson regression model
- Author
-
Carlos R. Diniz, Rogério A. Santana, Marinho G. Andrade, and Katiane S. Conceição
- Subjects
Statistics and Probability ,Multivariate statistics ,Models, Statistical ,COVID-19 ,Regression analysis ,General Medicine ,Type (model theory) ,Poisson distribution ,symbols.namesake ,Distribution (mathematics) ,Overdispersion ,Statistics ,symbols ,Statistics::Methodology ,Poisson Distribution ,Poisson regression ,Statistics, Probability and Uncertainty ,Mathematics ,Count data - Abstract
In this paper, we present the Type I multivariate zero-inflated Conway–Maxwell–Poisson distribution, whose development is based on an extension of the Type I multivariate zero-inflated Poisson distribution. We develop important properties of the distribution and present a regression model. The AIC and BIC criteria are used to select the best-fitting model. Two real data sets are used to illustrate the proposed model. We conclude that the Type I multivariate zero-inflated Conway–Maxwell–Poisson distribution produces a better-fitting model for multivariate count data with an excess of zeros.
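As background for the proposed distribution, the sketch below evaluates the probability mass function of the univariate Conway–Maxwell–Poisson building block, approximating its normalizing constant by truncating the infinite series; the multivariate zero-inflated construction of the paper is not reproduced.

```python
# Sketch: probability mass function of the univariate Conway-Maxwell-Poisson distribution,
# with the normalizing constant approximated by a truncated series.
import math
import numpy as np

def com_poisson_pmf(y, lam, nu, truncation=200):
    """P(Y = y) for a COM-Poisson(lam, nu) variable; nu < 1 gives overdispersion."""
    log_terms = np.array([j * math.log(lam) - nu * math.lgamma(j + 1)
                          for j in range(truncation)])
    log_z = np.logaddexp.reduce(log_terms)                      # log normalizing constant
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

# Overdispersed example (nu = 0.5) with rate-like parameter lam = 2.
probs = [com_poisson_pmf(y, lam=2.0, nu=0.5) for y in range(5)]
print(np.round(probs, 4))
```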
- Published
- 2021
- Full Text
- View/download PDF