178 results for "Uno, H."
Search Results
2. A flexible and coherent test/estimation procedure based on restricted mean survival times for censored time-to-event data in randomized clinical trials.
- Author
-
Horiguchi M, Cronin AM, Takeuchi M, and Uno H
- Subjects
- Humans, Neoplasms therapy, Proportional Hazards Models, Statistics, Nonparametric, Time-Lapse Imaging, Randomized Controlled Trials as Topic methods, Statistics as Topic methods, Survival Analysis
- Abstract
In randomized clinical trials where time-to-event is the primary outcome, almost routinely, the logrank test is prespecified as the primary test and the hazard ratio is used to quantify treatment effect. If the ratio of 2 hazard functions is not constant, the logrank test is not optimal and the interpretation of hazard ratio is not obvious. When such a nonproportional hazards case is expected at the design stage, the conventional practice is to prespecify another member of weighted logrank tests, eg, Peto-Prentice-Wilcoxon test. Alternatively, one may specify a robust test as the primary test, which can capture various patterns of difference between 2 event time distributions. However, most of those tests do not have companion procedures to quantify the treatment difference, and investigators have fallen back on reporting treatment effect estimates not associated with the primary test. Such incoherence in the "test/estimation" procedure may potentially mislead clinicians/patients who have to balance risk-benefit for treatment decision. To address this, we propose a flexible and coherent test/estimation procedure based on restricted mean survival time, where the truncation time τ is selected data dependently. The proposed procedure is composed of a prespecified test and an estimation of corresponding robust and interpretable quantitative treatment effect. The utility of the new procedure is demonstrated by numerical studies based on 2 randomized cancer clinical trials; the test is dramatically more powerful than the logrank, Wilcoxon tests, and the restricted mean survival time-based test with a fixed τ, for the patterns of difference seen in these cancer clinical trials., (Copyright © 2018 John Wiley & Sons, Ltd.)
- Published
- 2018
- Full Text
- View/download PDF
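A minimal sketch of the kind of RMST comparison described in the entry above, using the Python lifelines package on synthetic data. The simple rule used here for the truncation time tau (the smaller of the two arms' largest observed times) is only a placeholder for the paper's data-dependent selection procedure.

```python
# Sketch only (not the authors' procedure): estimate the RMST difference
# between two arms with lifelines, truncating at a data-chosen tau.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(1)
n = 200
# Synthetic arms: exponential event times with administrative censoring at 24 months.
t_ctrl = rng.exponential(scale=12.0, size=n)
t_trt = rng.exponential(scale=16.0, size=n)
cens = np.full(n, 24.0)
obs_c, ev_c = np.minimum(t_ctrl, cens), (t_ctrl <= cens).astype(int)
obs_t, ev_t = np.minimum(t_trt, cens), (t_trt <= cens).astype(int)

# Simple data-dependent truncation point: both KM curves are defined up to this time.
tau = min(obs_c.max(), obs_t.max())

km_c = KaplanMeierFitter().fit(obs_c, ev_c, label="control")
km_t = KaplanMeierFitter().fit(obs_t, ev_t, label="treatment")
rmst_c = restricted_mean_survival_time(km_c, t=tau)
rmst_t = restricted_mean_survival_time(km_t, t=tau)
print(f"tau = {tau:.1f}: RMST treatment - control = {rmst_t - rmst_c:.2f} months")
```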
3. Efficiency of two sample tests via the restricted mean survival time for analyzing event time observations.
- Author
-
Tian L, Fu H, Ruberg SJ, Uno H, and Wei LJ
- Subjects
- Humans, Observation, Time Factors, Proportional Hazards Models, Survival Analysis
- Abstract
In comparing two treatments with the event time observations, the hazard ratio (HR) estimate is routinely used to quantify the treatment difference. However, this model-dependent estimate may be difficult to interpret clinically, especially when the proportional hazards (PH) assumption is violated. An alternative estimation procedure for treatment efficacy based on the restricted mean survival time or t-year mean survival time (t-MST) has been discussed extensively in the statistical and clinical literature. On the other hand, a statistical test via the HR or its asymptotically equivalent counterpart, the logrank test, is asymptotically distribution-free. In this article, we assess the relative efficiency of the hazard ratio and t-MST tests with respect to the statistical power under various PH and non-PH models theoretically and empirically. When the PH assumption is valid, the t-MST test performs almost as well as the HR test. For non-PH models, the t-MST test can substantially outperform its HR counterpart. On the other hand, the HR test can be powerful when the true difference of two survival functions is quite large at the end but not at the beginning of the study. Unfortunately, for this case, the HR estimate may not have a simple clinical interpretation for the treatment effect due to the violation of the PH assumption., (© 2017, The International Biometric Society.)
- Published
- 2018
- Full Text
- View/download PDF
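An illustrative sketch, on synthetic data, of the non-proportional-hazards situation discussed above: under a delayed treatment effect the Cox hazard ratio and the logrank test can understate a benefit that the RMST difference summarizes directly. The lifelines package and all numerical settings below are illustrative assumptions, not values from the paper.

```python
# Delayed-effect (non-PH) illustration: compare the Cox HR / logrank p-value
# with the RMST difference at a fixed tau. Synthetic data only.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(7)
n = 400
# Control: exponential hazard 0.08/month. Treatment: same hazard for the first
# 6 months, then the hazard drops to 0.03/month (a delayed effect).
t0 = rng.exponential(1 / 0.08, n)
early = rng.exponential(1 / 0.08, n)
t1 = np.where(early < 6, early, 6 + rng.exponential(1 / 0.03, n))
cens = rng.uniform(12, 36, size=2 * n)
time = np.minimum(np.concatenate([t0, t1]), cens)
event = (np.concatenate([t0, t1]) <= cens).astype(int)
arm = np.repeat([0, 1], n)

df = pd.DataFrame({"time": time, "event": event, "arm": arm})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
lr = logrank_test(time[arm == 0], time[arm == 1], event[arm == 0], event[arm == 1])

tau = 24.0
km0 = KaplanMeierFitter().fit(time[arm == 0], event[arm == 0])
km1 = KaplanMeierFitter().fit(time[arm == 1], event[arm == 1])
d_rmst = restricted_mean_survival_time(km1, t=tau) - restricted_mean_survival_time(km0, t=tau)
print("Cox HR:", float(cph.hazard_ratios_["arm"]), " logrank p:", lr.p_value)
print(f"RMST difference at tau={tau}: {d_rmst:.2f} months")
```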
4. Nonparametric inference in the accelerated failure time model using restricted means.
- Author
-
Giurcanu MC and Karrison TG
- Subjects
- Computer Simulation, Humans, Probability, Survival Analysis
- Abstract
We propose a nonparametric estimate of the scale-change parameter for characterizing the difference between two survival functions under the accelerated failure time model using an estimating equation based on restricted means. Advantages of our restricted-means-based approach compared to current nonparametric procedures are the strictly monotone nature of the estimating equation as a function of the scale-change parameter, leading to a unique root, as well as the availability of a direct standard error estimate, avoiding the need for hazard function estimation or re-sampling to conduct inference. We derive the asymptotic properties of the proposed estimator for a fixed and for a random point of restriction. In a simulation study, we compare the performance of the proposed estimator with parametric and nonparametric competitors in terms of bias, efficiency, and accuracy of coverage probabilities. The restricted-means-based approach provides unbiased estimates and accurate confidence interval coverage rates with efficiency ranging from 81% to 95% relative to fitting the correct parametric model. An example from a randomized clinical trial in head and neck cancer is provided to illustrate an application of the methodology in practice., (© 2021. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.)
- Published
- 2022
- Full Text
- View/download PDF
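A heavily simplified sketch of a restricted-means estimating equation for the AFT scale change, solved by bracketed root finding. The specific equation U(theta) = mu1(tau) - theta * mu0(tau/theta) is one plausible form implied by S1(t) = S0(t/theta); it is not necessarily the estimator derived in the paper, and no standard error is computed here.

```python
# Simplified sketch: under an AFT scale change, S1(t) = S0(t/theta), so
# mu1(tau) = theta * mu0(tau/theta); solve U(theta) = 0 for theta.
import numpy as np
from scipy.optimize import brentq
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(3)
n = 300
theta_true = 1.5                      # treatment multiplies event times by 1.5
t0 = rng.weibull(1.3, n) * 10
t1 = theta_true * rng.weibull(1.3, n) * 10
c = rng.uniform(5, 30, size=n)
y0, e0 = np.minimum(t0, c), (t0 <= c).astype(int)
y1, e1 = np.minimum(t1, c), (t1 <= c).astype(int)

km0 = KaplanMeierFitter().fit(y0, e0)
km1 = KaplanMeierFitter().fit(y1, e1)
tau = 0.8 * min(y0.max(), y1.max())

def estimating_eq(theta):
    mu1 = restricted_mean_survival_time(km1, t=tau)
    mu0 = restricted_mean_survival_time(km0, t=tau / theta)
    return mu1 - theta * mu0

# The equation is monotone in theta, so the root is unique; the bracket must
# straddle it and keep tau/theta within the observed follow-up.
theta_hat = brentq(estimating_eq, 0.8, 2.5)
print("estimated scale change:", round(theta_hat, 3))
```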
5. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis.
- Author
-
Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, Schrag D, Takeuchi M, Uyama Y, Zhao L, Skali H, Solomon S, Jacobus S, Hughes M, Packer M, and Wei LJ
- Subjects
- Humans, Longitudinal Studies, Proportional Hazards Models, Survival Analysis
- Abstract
In a longitudinal clinical study to compare two groups, the primary end point is often the time to a specific event (eg, disease progression, death). The hazard ratio estimate is routinely used to empirically quantify the between-group difference under the assumption that the ratio of the two hazard functions is approximately constant over time. When this assumption is plausible, such a ratio estimate may capture the relative difference between two survival curves. However, the clinical meaning of such a ratio estimate is difficult, if not impossible, to interpret when the underlying proportional hazards assumption is violated (ie, the hazard ratio is not constant over time). Although this issue has been studied extensively and various alternatives to the hazard ratio estimator have been discussed in the statistical literature, such crucial information does not seem to have reached the broader community of health science researchers. In this article, we summarize several critical concerns regarding this conventional practice and discuss various well-known alternatives for quantifying the underlying differences between groups with respect to a time-to-event end point. The data from three recent cancer clinical trials, which reflect a variety of scenarios, are used throughout to illustrate our discussions. When there is not sufficient information about the profile of the between-group difference at the design stage of the study, we encourage practitioners to consider a prespecified, clinically meaningful, model-free measure for quantifying the difference and to use robust estimation procedures to draw primary inferences., (© 2014 by American Society of Clinical Oncology.)
- Published
- 2014
- Full Text
- View/download PDF
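A sketch of the practical workflow the article argues for: check the proportional hazards assumption behind the hazard ratio and, when it is questionable, report a model-free summary such as the RMST difference. lifelines' proportional_hazard_test and the synthetic crossing-hazards data are illustrative choices, not part of the article.

```python
# Check the PH assumption, then report a model-free treatment contrast.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.statistics import proportional_hazard_test
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(11)
n = 300
group = np.repeat([0, 1], n)
time = np.where(group == 1,
                14 * rng.weibull(0.7, 2 * n),    # treatment: decreasing hazard
                10 * rng.weibull(1.5, 2 * n))    # control: increasing hazard
cens = rng.uniform(5, 25, 2 * n)
df = pd.DataFrame({"T": np.minimum(time, cens),
                   "E": (time <= cens).astype(int),
                   "group": group})

cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
ph_check = proportional_hazard_test(cph, df, time_transform="rank")
ph_check.print_summary()            # small p-values flag non-proportional hazards

tau = 20.0
km0 = KaplanMeierFitter().fit(df.loc[df.group == 0, "T"], df.loc[df.group == 0, "E"])
km1 = KaplanMeierFitter().fit(df.loc[df.group == 1, "T"], df.loc[df.group == 1, "E"])
print("RMST difference up to tau:",
      restricted_mean_survival_time(km1, t=tau) - restricted_mean_survival_time(km0, t=tau))
```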
6. Restricted mean survival time for interval-censored data.
- Author
-
Zhang C, Wu Y, and Yin G
- Subjects
- Bias, Clinical Trials as Topic, Computer Simulation, Humans, Proportional Hazards Models, Survival Rate, Survival Analysis
- Abstract
Restricted mean survival time (RMST) evaluates the mean event-free survival time up to a prespecified time point. It has been used as an alternative measure of treatment effect owing to its model-free structure and clinically meaningful interpretation of treatment benefit for right-censored data. In clinical trials, another type of censoring called interval censoring may occur if subjects are examined at several discrete time points and the survival time falls into an interval rather than being exactly observed. The missingness of exact observations under interval-censored cases makes the nonparametric measure of treatment effect more challenging. Employing the linear smoothing technique to overcome the ambiguity, we propose a new model-free measure for the interval-censored RMST. As an alternative to the commonly used log-rank test, we further construct a hypothesis testing procedure to assess the survival difference between two groups. Simulation studies show that the bias of our proposed interval-censored RMST estimator is negligible and the testing procedure delivers promising performance in detecting between-group difference with regard to size and power under various configurations of survival curves. The proposed method is illustrated by reanalyzing two real datasets containing interval-censored observations., (© 2020 John Wiley & Sons, Ltd.)
- Published
- 2020
- Full Text
- View/download PDF
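A crude baseline only, not the paper's linear-smoothing estimator: midpoint imputation of interval-censored observations followed by an ordinary Kaplan-Meier RMST. It is shown merely to make the data structure concrete; a proper analysis would use a nonparametric MLE for interval-censored data.

```python
# Midpoint-imputation stand-in for interval-censored RMST (baseline only).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(5)
n = 150
true_t = rng.exponential(10, n)
# Examinations every 3 time units: the event is only known to lie in (L, R].
visits = np.arange(0, 33, 3)
L = np.array([visits[visits < t].max() for t in true_t])
R = np.array([visits[visits >= t].min() if (visits >= t).any() else np.inf for t in true_t])

event = np.isfinite(R)                               # R = inf means right-censored at L
imputed = np.where(event, (L + np.where(event, R, L)) / 2.0, L)

km = KaplanMeierFitter().fit(imputed, event.astype(int))
print("midpoint-imputed RMST up to t=15:", restricted_mean_survival_time(km, t=15.0))
```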
7. Model-free conditional screening for ultrahigh-dimensional survival data via conditional distance correlation.
- Author
-
Cui H, Liu Y, Mao G, and Zhang J
- Subjects
- Lymphoma, Large B-Cell, Diffuse mortality, Humans, Computer Simulation, Survival Analysis
- Abstract
How to select the active variables that have a significant impact on the event of interest is a very important and meaningful problem in the statistical analysis of ultrahigh-dimensional data. In many applications, researchers often know that a certain set of covariates are active variables from some previous investigations and experiences. With this important prior knowledge of active variables, we propose a model-free conditional screening procedure for ultrahigh-dimensional survival data based on conditional distance correlation. The proposed procedure can effectively detect the hidden active variables that are jointly important but are weakly correlated with the response. Moreover, it performs well when covariates are strongly correlated with each other. We establish the sure screening property and the ranking consistency of the proposed method and conduct extensive simulation studies, which suggest that the proposed procedure works well for practical situations. Then, we illustrate the new approach through a real dataset from the diffuse large-B-cell lymphoma study S1., (© 2022 Wiley-VCH GmbH.)
- Published
- 2023
- Full Text
- View/download PDF
8. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data.
- Author
-
Uno H, Tian L, Cai T, Kohane IS, and Wei LJ
- Subjects
- Biomarkers, Tumor genetics, Biostatistics, Breast Neoplasms genetics, Breast Neoplasms mortality, Computer Simulation, Evidence-Based Medicine statistics & numerical data, Female, Humans, Proportional Hazards Models, ROC Curve, Risk, Survival Analysis
- Abstract
Risk prediction procedures can be quite useful for the patient's treatment selection, prevention strategy, or disease management in evidence-based medicine. Often, potentially important new predictors are available in addition to the conventional markers. The question is how to quantify the improvement from the new markers for prediction of the patient's risk in order to aid cost-benefit decisions. The standard method, using the area under the receiver operating characteristic curve, to measure the added value may not be sensitive enough to capture incremental improvements from the new markers. Recently, some novel alternatives to area under the receiver operating characteristic curve, such as integrated discrimination improvement and net reclassification improvement, were proposed. In this paper, we consider a class of measures for evaluating the incremental values of new markers, which includes the preceding two as special cases. We present a unified procedure for making inferences about measures in the class with censored event time data. The large sample properties of our procedures are theoretically justified. We illustrate the new proposal with data from a cancer study to evaluate a new gene score for prediction of the patient's survival., (Copyright © 2012 John Wiley & Sons, Ltd.)
- Published
- 2013
- Full Text
- View/download PDF
9. Restricted mean survival time as a summary measure of time-to-event outcome.
- Author
-
Hasegawa T, Misawa S, Nakagawa S, Tanaka S, Tanase T, Ugai H, Wakana A, Yodo Y, Tsuchiya S, and Suganami H
- Subjects
- Humans, Lung Neoplasms mortality, Proportional Hazards Models, Sample Size, Time Factors, Treatment Outcome, Survival Analysis
- Abstract
Many clinical research studies evaluate a time-to-event outcome, illustrate survival functions, and conventionally report estimated hazard ratios to express the magnitude of the treatment effect when comparing between groups. However, it may not be straightforward to interpret the hazard ratio clinically and statistically when the proportional hazards assumption is invalid. In some recent papers published in clinical journals, the use of restricted mean survival time (RMST) or τ-year mean survival time is discussed as one of the alternative summary measures for the time-to-event outcome. The RMST is defined as the expected value of time to event limited to a specific time point corresponding to the area under the survival curve up to the specific time point. This article summarizes the necessary information to conduct statistical analysis using the RMST, including the definition and statistical properties of the RMST, adjusted analysis methods, sample size calculation, information fraction for the RMST difference, and clinical and statistical meaning and interpretation. Additionally, we discuss how to set the specific time point to define the RMST from two main points of view. We also provide developed SAS codes to determine the sample size required to detect an expected RMST difference with appropriate power and reconstruct individual survival data to estimate an RMST reference value from a reported survival curve., (© 2020 John Wiley & Sons Ltd.)
- Published
- 2020
- Full Text
- View/download PDF
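A back-of-envelope sketch of the sample-size calculation discussed above, treating the RMST difference as a two-sample comparison of means. The pilot variance and effect-size values are assumptions for illustration; the article itself supplies SAS code for this task.

```python
# n per arm = (z_{1-alpha/2} + z_{1-beta})^2 * (s1^2 + s2^2) / delta^2,
# where s^2 is the per-subject variance of the RMST estimator, i.e.
# Var(RMST_hat) is approximately s^2 / n.
import math
from scipy.stats import norm

delta = 2.0          # clinically relevant RMST difference (months) at tau
s1, s2 = 8.0, 9.0    # assumed per-subject SDs of the RMST estimator in each arm
alpha, power = 0.05, 0.80

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)
n_per_arm = (z_a + z_b) ** 2 * (s1 ** 2 + s2 ** 2) / delta ** 2
print("required n per arm:", math.ceil(n_per_arm))
```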
10. Utilizing the integrated difference of two survival functions to quantify the treatment contrast for designing, monitoring, and analyzing a comparative clinical study.
- Author
-
Zhao L, Tian L, Uno H, Solomon SD, Pfeffer MA, Schindler JS, and Wei LJ
- Subjects
- Confidence Intervals, Humans, Kaplan-Meier Estimate, Proportional Hazards Models, Sample Size, Survival Rate, Models, Theoretical, Randomized Controlled Trials as Topic methods, Research Design, Survival Analysis
- Abstract
Background: Consider a comparative, randomized clinical study with a specific event time as the primary end point. In the presence of censoring, standard methods of summarizing the treatment difference are based on Kaplan-Meier curves, the logrank test, and the point and interval estimates via Cox's procedure. Moreover, for designing and monitoring the study, one usually utilizes an event-driven scheme to determine the sample sizes and interim analysis time points., Purpose: When the proportional hazards (PHs) assumption is violated, the logrank test may not have sufficient power to detect the difference between two event time distributions. The resulting hazard ratio estimate is difficult, if not impossible, to interpret as a treatment contrast. When the event rates are low, the corresponding interval estimate for the 'hazard ratio' can be quite large due to the fact that the interval length depends on the observed numbers of events. This may indicate that there is not enough information for making inferences about the treatment comparison even when there is no difference between two groups. This situation is quite common for a postmarketing safety study. We need an alternative way to quantify the group difference., Methods: Instead of quantifying the treatment group difference using the hazard ratio, we consider an easily interpretable and model-free parameter, the integrated survival rate difference over a prespecified time interval, as an alternative. We present the inference procedures for such a treatment contrast. This approach is purely nonparametric and does not need any model assumption such as the PHs. Moreover, when we deal with equivalence or noninferiority studies and the event rates are low, our procedure would provide more information about the treatment difference. We used a cardiovascular trial data set to illustrate our approach., Results: The results using the integrated event rate differences have a heuristic interpretation for the treatment difference even when the PHs assumption is not valid. When the event rates are low, for example, for the cardiovascular study discussed in this article, the procedure for the integrated event rate difference provides tight interval estimates in contrast to those based on the event-driven inference method., Limitations: The design of a trial with the integrated event rate difference may be more complicated than that using the event-driven procedure. One may use simulation to determine the sample size and the estimated duration of the study., Conclusions: The procedure discussed in this article can be a useful alternative to the standard PHs method in the survival analysis.
- Published
- 2012
- Full Text
- View/download PDF
11. Graphical procedures for evaluating overall and subject-specific incremental values from new predictors with censored event time data.
- Author
-
Uno H, Cai T, Tian L, and Wei LJ
- Subjects
- Bias, Humans, Prevalence, Prognosis, Risk Factors, Survival Rate, Endpoint Determination methods, Outcome Assessment, Health Care methods, Proportional Hazards Models, Survival Analysis
- Abstract
Quantitative procedures for evaluating added values from new markers over a conventional risk scoring system for predicting event rates at specific time points have been extensively studied. However, a single summary statistic, for example, the area under the receiver operating characteristic curve or its derivatives, may not provide a clear picture about the relationship between the conventional and the new risk scoring systems. When there are no censored event time observations in the data, two simple scatterplots with individual conventional and new scores for "cases" and "controls" provide valuable information regarding the overall and the subject-specific level incremental values from the new markers. Unfortunately, in the presence of censoring, it is not clear how to construct such plots. In this article, we propose a nonparametric estimation procedure for the distributions of the differences between two risk scores conditional on the conventional score. The resulting quantile curves of these differences over the subject-specific conventional score provide extra information about the overall added value from the new marker. They also help us to identify a subgroup of future subjects who need the new predictors, especially when there is no unified utility function available for cost-risk-benefit decision making. The procedure is illustrated with two data sets. The first is from a well-known Mayo Clinic primary biliary cirrhosis liver study. The second is from a recent breast cancer study on evaluating the added value from a gene score, which is relatively expensive to measure compared with the routinely used clinical biomarkers for predicting the patient's survival after surgery., (© 2011, The International Biometric Society.)
- Published
- 2011
- Full Text
- View/download PDF
12. Assessing predictive accuracy of survival regressions subject to nonindependent censoring.
- Author
-
Wang M, Long Q, Chen C, and Zhang L
- Subjects
- Computer Simulation, Humans, Probability, Proportional Hazards Models, Survival Analysis
- Abstract
Survival regression is commonly applied in biomedical studies or clinical trials, and evaluating the predictive performance of such models plays an essential role in model diagnosis and selection. The presence of censored data, particularly if informative, may pose more challenges for the assessment of predictive accuracy. Existing literature mainly focuses on prediction of survival probabilities, with limited work on survival time. In this work, we focus on accuracy measures of predicted survival times adjusted for a potentially informative censoring mechanism (ie, coarsening at random (CAR); non-CAR) by adopting the technique of inverse probability of censoring weighting. Our proposed predictive metric can be adapted to various survival regression frameworks including but not limited to accelerated failure time models and proportional hazards models. Moreover, we provide the asymptotic properties of the inverse probability of censoring weighting estimators under CAR. We consider the settings of high-dimensional data under CAR or non-CAR for extensions. The performance of the proposed method is evaluated through extensive simulation studies and analysis of real data from the Critical Assessment of Microarray Data Analysis., (© 2019 John Wiley & Sons, Ltd.)
- Published
- 2020
- Full Text
- View/download PDF
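A minimal sketch of the inverse-probability-of-censoring-weighting idea for a survival-time prediction error, with weights taken from a Kaplan-Meier fit of the censoring distribution. This covariate-independent weighting corresponds to the simplest (CAR) setting; the paper's estimators also handle covariate-dependent censoring and other accuracy measures.

```python
# IPCW-weighted mean absolute error for predicted survival times (CAR setting).
import numpy as np
from lifelines import KaplanMeierFitter

def ipcw_mae(obs_time, event, pred_time):
    # G_hat(t): survival function of the censoring time (censorings are "events" here).
    km_cens = KaplanMeierFitter().fit(obs_time, 1 - event)
    G = km_cens.predict(obs_time).to_numpy()
    w = np.where(event == 1, 1.0 / np.clip(G, 1e-8, None), 0.0)
    return np.sum(w * np.abs(obs_time - pred_time)) / np.sum(w)

rng = np.random.default_rng(2)
n = 500
t = rng.exponential(10, n)
c = rng.exponential(15, n)
obs, ev = np.minimum(t, c), (t <= c).astype(int)
naive_pred = np.full(n, 10.0)          # e.g., a model that always predicts the mean
print("IPCW-weighted MAE:", round(ipcw_mae(obs, ev, naive_pred), 2))
```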
13. On null hypotheses in survival analysis.
- Author
-
Stensrud MJ, Røysland K, and Ryalen PC
- Subjects
- Chemotherapy, Adjuvant, Colonic Neoplasms drug therapy, Colonic Neoplasms mortality, Computer Simulation, Datasets as Topic, Humans, Randomized Controlled Trials as Topic, Proportional Hazards Models, Survival Analysis
- Abstract
The conventional nonparametric tests in survival analysis, such as the log-rank test, assess the null hypothesis that the hazards are equal at all times. However, hazards are hard to interpret causally, and other null hypotheses are more relevant in many scenarios with survival outcomes. To allow for a wider range of null hypotheses, we present a generic approach to define test statistics. This approach utilizes the fact that a wide range of common parameters in survival analysis can be expressed as solutions of differential equations. Thereby, we can test hypotheses based on survival parameters that solve differential equations driven by cumulative hazards, and it is easy to implement the tests on a computer. We present simulations, suggesting that our tests perform well for several hypotheses in a range of scenarios. As an illustration, we apply our tests to evaluate the effect of adjuvant chemotherapies in patients with colon cancer, using data from a randomized controlled trial., (© 2019 The International Biometric Society.)
- Published
- 2019
- Full Text
- View/download PDF
14. Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study.
- Author
-
Li, Yingxia, Herold, Tobias, Mansmann, Ulrich, and Hornung, Roman
- Subjects
- Multiomics, Survival rate, Prediction models, Survival analysis (Biometry), Databases
- Abstract
Background: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions. Methods: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives. Results: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures. Conclusions: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
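A small sketch of the two evaluation measures used in the benchmark (Harrell's C-index and the integrated Brier score), computed with scikit-survival for a single penalized Cox model on synthetic "omics-like" features; the study itself compares many learners across 31 combinations of TCGA data types.

```python
# C-index and integrated Brier score for one penalized Cox model (sketch).
import numpy as np
from sksurv.util import Surv
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.metrics import concordance_index_censored, integrated_brier_score

rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.normal(size=(n, p))
risk = 0.8 * X[:, 0] + 0.5 * X[:, 1]
t = rng.exponential(np.exp(-risk) * 10)
c = rng.exponential(12, n)
y = Surv.from_arrays(event=t <= c, time=np.minimum(t, c))
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

model = CoxnetSurvivalAnalysis(l1_ratio=0.9, fit_baseline_model=True).fit(X_tr, y_tr)

cindex = concordance_index_censored(y_te["event"], y_te["time"], model.predict(X_te))[0]
times = np.percentile(y_te["time"], [20, 40, 60, 80])
surv_fns = model.predict_survival_function(X_te)
preds = np.asarray([[fn(tt) for tt in times] for fn in surv_fns])
ibs = integrated_brier_score(y_tr, y_te, preds, times)
print(f"C-index = {cindex:.3f}, integrated Brier score = {ibs:.3f}")
```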
15. Comparison between asymptotic and re-randomisation tests under non-proportional hazards in a randomised controlled trial using the minimisation method.
- Author
-
Kimura, Ryusei, Nomura, Shogo, Nagashima, Kengo, and Sato, Yasunori
- Abstract
Background: Pocock-Simon's minimisation method has been widely used to balance treatment assignments across prognostic factors in randomised controlled trials (RCTs). Previous studies focusing on the survival outcomes have demonstrated that the conservativeness of asymptotic tests without adjusting for stratification factors, as well as the inflated type I error rate of adjusted asymptotic tests conducted in a small sample of patients, can be relaxed using re-randomisation tests. Although several RCTs using minimisation have suggested the presence of non-proportional hazards (non-PH) effects, the application of re-randomisation tests has been limited to the log-rank test and Cox PH models, which may result in diminished statistical power when confronted with non-PH scenarios. To address this issue, we proposed two re-randomisation tests based on a maximum combination of weighted log-rank tests (MaxCombo test) and the difference in restricted mean survival time (dRMST) up to a fixed time point τ , both of which can be extended to adjust for randomisation stratification factors. Methods: We compared the performance of asymptotic and re-randomisation tests using the MaxCombo test, dRMST, log-rank test, and Cox PH models, assuming various non-PH situations for RCTs using minimisation, with total sample sizes of 50, 100, and 500 at a 1:1 allocation ratio. We mainly considered null, and alternative scenarios featuring delayed, crossing, and diminishing treatment effects. Results: Across all examined null scenarios, re-randomisation tests maintained the type I error rates at the nominal level. Conversely, unadjusted asymptotic tests indicated excessive conservatism, while adjusted asymptotic tests in both the Cox PH models and dRMST indicated inflated type I error rates for total sample sizes of 50. The stratified MaxCombo-based re-randomisation test consistently exhibited robust power across all examined scenarios. Conclusions: The re-randomisation test is a useful alternative in non-PH situations for RCTs with minimisation using the stratified MaxCombo test, suggesting its robust power in various scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Prioritising deteriorating patients using time-to-event analysis: prediction model development and internal–external validation.
- Author
-
Blythe, Robin, Parsons, Rex, Barnett, Adrian G., Cook, David, McPhail, Steven M., and White, Nicole M.
- Abstract
Background: Binary classification models are frequently used to predict clinical deterioration, however they ignore information on the timing of events. An alternative is to apply time-to-event models, augmenting clinical workflows by ranking patients by predicted risks. This study examines how and why time-to-event modelling of vital signs data can help prioritise deterioration assessments using lift curves, and develops a prediction model to stratify acute care inpatients by risk of clinical deterioration. Methods: We developed and validated a Cox regression for time to in-hospital mortality. The model used time-varying covariates to estimate the risk of clinical deterioration. Adult inpatient medical records from 5 Australian hospitals between 1 January 2019 and 31 December 2020 were used for model development and validation. Model discrimination and calibration were assessed using internal–external cross validation. A discrete-time logistic regression model predicting death within 24 h with the same covariates was used as a comparator to the Cox regression model to estimate differences in predictive performance between the binary and time-to-event outcome modelling approaches. Results: Our data contained 150,342 admissions and 1016 deaths. Model discrimination was higher for Cox regression than for discrete-time logistic regression, with cross-validated AUCs of 0.96 and 0.93, respectively, for mortality predictions within 24 h, declining to 0.93 and 0.88, respectively, for mortality predictions within 1 week. Calibration plots showed that calibration varied by hospital, but this can be mitigated by ranking patients by predicted risks. Conclusion: Time-varying covariate Cox models can be powerful tools for triaging patients, which may lead to more efficient and effective care in time-poor environments when the times between observations are highly variable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
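A sketch of a time-varying-covariate Cox model of the kind described, using lifelines' CoxTimeVaryingFitter on a simulated long-format table; the variable names and hazard settings are invented for illustration and are not from the study.

```python
# Time-varying Cox model on long-format (start, stop] records (sketch).
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(4)
rows = []
for pid in range(200):
    age = rng.integers(40, 90)
    rr = 14 + rng.normal(0, 2)               # respiratory rate, drifts over time
    alive, start = True, 0
    while alive and start < 72:
        stop = start + 24
        rr = rr + rng.normal(0.5, 1.5)
        # Higher respiratory rate raises the hazard of death in this interval.
        p_death = 1 - np.exp(-0.02 * np.exp(0.15 * (rr - 16)))
        event = int(rng.random() < p_death)
        rows.append((pid, start, stop, rr, age, event))
        alive, start = event == 0, stop
long_df = pd.DataFrame(rows, columns=["id", "start", "stop", "resp_rate", "age", "event"])

ctv = CoxTimeVaryingFitter()
ctv.fit(long_df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
# Ranking current inpatients by predicted partial hazard is what supports triage.
long_df["partial_hazard"] = ctv.predict_partial_hazard(long_df)
```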
17. Methods for non-proportional hazards in clinical trials: A systematic review.
- Author
-
Bardo, Maximilian, Huber, Cynthia, Benda, Norbert, Brugger, Jonas, Fellinger, Tobias, Galaune, Vaidotas, Heinz, Judith, Heinzl, Harald, Hooker, Andrew C, Klinglmüller, Florian, König, Franz, Mathes, Tim, Mittlböck, Martina, Posch, Martin, Ristl, Robin, and Friede, Tim
- Subjects
- Proportional hazards models, Clinical trials, Log-rank test, Hazards
- Abstract
For the analysis of time-to-event data, frequently used methods such as the log-rank test or the Cox proportional hazards model are based on the proportional hazards assumption, which is often debatable. Although a wide range of parametric and non-parametric methods for non-proportional hazards has been proposed, there is no consensus on the best approaches. To close this gap, we conducted a systematic literature search to identify statistical methods and software appropriate under non-proportional hazard. Our literature search identified 907 abstracts, out of which we included 211 articles, mostly methodological ones. Review articles and applications were less frequently identified. The articles discuss effect measures, effect estimation and regression approaches, hypothesis tests, and sample size calculation approaches, which are often tailored to specific non-proportional hazard situations. Using a unified notation, we provide an overview of methods available. Furthermore, we derive some guidance from the identified articles. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Deep Survival Models Can Improve Long-Term Mortality Risk Estimates from Chest Radiographs.
- Author
-
Liu, Mingzhu, Nagpal, Chirag, and Dubrawski, Artur
- Subjects
- Survival analysis (Biometry), Death forecasting, Time perspective, Disease risk factors, Deep learning, Mortality, Chest X rays
- Abstract
Deep learning has recently demonstrated the ability to predict long-term patient risk and its stratification when trained on imaging data such as chest radiographs. However, existing methods formulate estimating patient risk as a binary classification, typically ignoring or limiting the use of temporal information, and not accounting for the loss of patient follow-up, which reduces the fidelity of estimation and limits the prediction to a certain time horizon. In this paper, we demonstrate that deep survival and time-to-event prediction models can outperform binary classifiers at predicting mortality and risk of adverse health events. In our study, deep survival models were trained to predict risk scores from chest radiographs and patient demographic information in the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial (25,433 patient data points used in this paper) for 2-, 5-, and 10-year time horizons. Binary classification models that predict mortality at these time horizons were built as baselines. Compared to the considered alternative, deep survival models improve the Brier score (5-year: 0.0455 [95% CI, 0.0427–0.0482] vs. 0.0555 [95% CI, 0.0535–0.0575], p < 0.05) and expected calibration error (ECE) (5-year: 0.0110 [95% CI, 0.0080–0.0141] vs. 0.0747 [95% CI, 0.0718–0.0776], p < 0.05) for those fixed time horizons and are able to generate predictions for any time horizon, without the need to retrain the models. Our study suggests that deep survival analysis tools can outperform binary classification in terms of both discriminative performance and calibration, offering a potentially plausible solution for forecasting risk in clinical practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Improved nonparametric survival prediction using CoxPH, Random Survival Forest & DeepHit Neural Network.
- Author
-
Asghar, Naseem, Khalil, Umair, Ahmad, Basheer, Alshanbari, Huda M., Hamraz, Muhammad, Ahmad, Bakhtiyar, and Khan, Dost Muhammad
- Subjects
- Survival analysis (Biometry), Feature selection, Forecasting, Prediction models
- Abstract
In recent times, time-to-event data such as time to failure or death are routinely collected alongside high-throughput covariates. These high-dimensional bioinformatics data often challenge classical survival models, which are either infeasible to fit or produce low prediction accuracy due to overfitting. To address this issue, the focus has shifted towards introducing novel approaches for feature selection and survival prediction. In this article, we propose a new hybrid feature selection approach that handles high-dimensional bioinformatics datasets for improved survival prediction. This study explores the efficacy of four distinct variable selection techniques: LASSO, RSF-vs, SCAD, and CoxBoost, in the context of non-parametric biomedical survival prediction. Leveraging these methods, we conducted comprehensive variable selection processes. Subsequently, survival analysis models—specifically CoxPH, RSF, and DeepHit NN—were employed to construct predictive models based on the selected variables. Furthermore, we introduce a novel approach wherein only variables consistently selected by a majority of the aforementioned feature selection techniques are considered. This innovative strategy, referred to as the proposed method, aims to enhance the reliability and robustness of variable selection, subsequently improving the predictive performance of the survival analysis models. To evaluate the effectiveness of the proposed method, we compare the performance of the proposed approach with the existing LASSO, RSF-vs, SCAD, and CoxBoost techniques using various performance metrics including the integrated Brier score (IBS), concordance index (C-Index), and integrated absolute error (IAE) for numerous high-dimensional survival datasets. The real data applications reveal that the proposed method outperforms the competing methods in terms of survival prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Cutting-plane algorithm for estimation of sparse Cox proportional hazards models.
- Author
-
Saishu, Hiroki, Kudo, Kota, and Takano, Yuichi
- Abstract
Survival analysis is a family of statistical methods for analyzing event occurrence times. We adopt a mixed-integer optimization approach to estimation of sparse Cox proportional hazards (PH) models for survival analysis. Specifically, we propose a high-performance cutting-plane algorithm based on a reformulation of our sparse estimation problem into a bilevel optimization problem. This algorithm solves the upper-level problem using cutting planes that are generated from the dual lower-level problem to approximate an upper-level nonlinear objective function. To solve the dual lower-level problem efficiently, we devise a quadratic approximation of the Fenchel conjugate of the loss function. We also develop a computationally efficient least-squares method for adjusting quadratic approximations to fit each dataset. Computational results demonstrate that our method outperforms regularized estimation methods in terms of accuracy for both prediction and subset selection especially for low-dimensional datasets. Moreover, our quadratic approximation of the Fenchel conjugate function accelerates the cutting-plane algorithm and maintains high generalization performance of sparse Cox PH models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Simultaneous inference procedures for the comparison of multiple characteristics of two survival functions.
- Author
-
Ristl, Robin, Götte, Heiko, Schüler, Armin, Posch, Martin, and König, Franz
- Subjects
- Multiple comparisons (Statistics), Survival rate, Log-rank test, False positive error, Inferential statistics
- Abstract
Survival time is the primary endpoint of many randomized controlled trials, and a treatment effect is typically quantified by the hazard ratio under the assumption of proportional hazards. Awareness is increasing that in many settings this assumption is a priori violated, for example, due to delayed onset of drug effect. In these cases, interpretation of the hazard ratio estimate is ambiguous and statistical inference for alternative parameters to quantify a treatment effect is warranted. We consider differences or ratios of milestone survival probabilities or quantiles, differences in restricted mean survival times, and an average hazard ratio to be of interest. Typically, more than one such parameter needs to be reported to assess possible treatment benefits, and in confirmatory trials, the corresponding inferential procedures need to be adjusted for multiplicity. A simple Bonferroni adjustment may be too conservative because the different parameters of interest typically show considerable correlation. Hence simultaneous inference procedures that take into account the correlation are warranted. By using the counting process representation of the mentioned parameters, we show that their estimates are asymptotically multivariate normal and we provide an estimate for their covariance matrix. We accordingly propose parametric multiple testing procedures and simultaneous confidence intervals. Also, the logrank test may be included in the framework. Finite sample type I error rate and power are studied by simulation. The methods are illustrated with an example from oncology. A software implementation is provided in the R package nph. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Survival analysis under imperfect record linkage using historic census data.
- Author
-
Marks-Anglin, Arielle K., Barg, Frances K., Ross, Michelle, Wiebe, Douglas J., and Hwang, Wei-Ting
- Subjects
- Census, Survival analysis (Biometry), Survival rate, Occupational mortality, Black men, Occupational exposure
- Abstract
Background: Advancements in linking publicly available census records with vital and administrative records have enabled novel investigations in epidemiology and social history. However, in the absence of unique identifiers, the linkage of the records may be uncertain or only be successful for a subset of the census cohort, resulting in missing data. For survival analysis, differential ascertainment of event times can impact inference on risk associations and median survival. Methods: We modify some existing approaches that are commonly used to handle missing survival times to accommodate this imperfect linkage situation including complete case analysis, censoring, weighting, and several multiple imputation methods. We then conduct simulation studies to compare the performance of the proposed approaches in estimating the associations of a risk factor or exposure in terms of hazard ratio (HR) and median survival times in the presence of missing survival times. The effects of different missing data mechanisms and exposure-survival associations on their performance are also explored. The approaches are applied to a historic cohort of residents in Ambler, PA, established using the 1930 US census, from which only 2,440 out of 4,514 individuals (54%) had death records retrievable from publicly available data sources and death certificates. Using this cohort, we examine the effects of occupational and paraoccupational asbestos exposure on survival and disparities in mortality by race and gender. Results: We show that imputation based on conditional survival results in less bias and greater efficiency relative to a complete case analysis when estimating log-hazard ratios and median survival times. When the approaches are applied to the Ambler cohort, we find a significant association between occupational exposure and mortality, particularly among black individuals and males, but not between paraoccupational exposure and mortality. Discussion: This investigation illustrates the strengths and weaknesses of different imputation methods for missing survival times due to imperfect linkage of the administrative or registry data. The performance of the methods may depend on the missingness process as well as the parameter being estimated and models of interest, and such factors should be considered when choosing the methods to address the missing event times. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Smoothed quantile residual life regression analysis with application to the Korea HIV/AIDS cohort study.
- Author
-
Kim, Soo Min, Choi, Yunsu, Kang, Sangwook, and HIV/AIDS cohort study, Korea
- Subjects
- Regression analysis, AIDS, HIV, CD4 lymphocyte count, Quantile regression
- Abstract
Background: The residual life of a patient with human immunodeficiency virus (HIV) is of major interest to patients and their physicians. While existing analyses of HIV patient survival focus mostly on data collected at baseline, residual life analysis allows for dynamic analysis based on additional data collected over a period of time. As survival times typically exhibit a right-skewed distribution, the median provides a more useful summary of the underlying distribution than the mean. In this paper, we propose an efficient inference procedure that fits a semiparametric quantile regression model assessing the effect of longitudinal biomarkers on the residual life of HIV patients until the development of dyslipidemia, a disease becoming more prevalent among those with HIV. Methods: For estimation of model parameters, we propose an induced smoothing method that smooths nonsmooth estimating functions based on check functions. For variance estimation, we propose an efficient resampling-based estimator. The proposed estimators are theoretically justified. Simulation studies are used to evaluate their finite sample performances, including their prediction accuracy. We analyze the Korea HIV/AIDS cohort study data to examine the effects of CD4 (cluster of differentiation 4) cell count on the residual life of HIV patients to the onset of dyslipidemia. Results: The proposed estimator is shown to be consistent and normally distributed asymptotically. Under various simulation settings, our estimates are approximately unbiased. Their variances estimates are close to the empirical variances and their computational efficiency is superior to that of the nonsmooth counterparts. Two measures of prediction performance indicate that our method adequately reflects the dynamic character of longitudinal biomarkers and residual life. The analysis of the Korea HIV/AIDS cohort study data shows that CD4 cell count is positively associated with residual life to the onset of dyslipidemia but the effect is not statistically significant. Conclusions: Our method enables direct prediction of residual lifetimes with a dynamic feature that accommodates data accumulated at different times. Our estimator significantly improves computational efficiency in variance estimation compared to the existing nonsmooth estimator. Analysis of the HIV/AIDS cohort study data reveals dynamic effects of CD4 cell count on the residual life to the onset of dyslipidemia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Intermediate-stage (BCLC stage B) infiltrative hepatocellular carcinoma: safety and efficacy of chemoembolization.
- Author
-
Kim, Seong Ho, Kim, Jin Hyoung, Kim, Gun Ha, Kim, Ji Hoon, Ko, Heung-Kyu, Chu, Hee Ho, Shin, Ji Hoon, Gwon, Dong Il, Ko, Gi-Young, Yoon, Hyun-Ki, Aljerdah, Shakir, and Kim, Nayoung
- Subjects
- Chemoembolization, Disease risk factors, Overall survival, Prediction models
- Abstract
Objectives: To evaluate the safety and efficacy of chemoembolization in patients with intermediate-stage infiltrative Hepatocellular carcinoma (HCC). Materials and methods: This retrospective study evaluated outcomes in treatment-naïve patients who received chemoembolization as first-line treatment for intermediate-stage infiltrative HCC between 2002 and 2022. Of the 2029 treatment-naïve patients who received chemoembolization as first-line treatment for intermediate-stage HCC, 244 (12%) were identified as having the infiltrative type. After excluding two patients lost to follow-up, 242 patients were evaluated. Results: Median post-chemoembolization overall survival (OS) was 16 months. Multivariable Cox analysis identified four factors predictive of OS: Child–Pugh class B (hazard ratio [HR], 1.84; p = 0.001), maximal tumor size ≥ 10 cm (HR, 1.67; p < 0.001), tumor number ≥ 4 (HR, 1.42; p = 0.037), and bilobar tumor involvement (HR, 1.64; p = 0.003). These four factors were used to create pretreatment prediction models, with risk scores of 0–1, 2–4, and 5–7 defined as low, intermediate, and high risk, respectively. Median OS times in these three groups were 34, 18, and 8 months, respectively (p < 0.001). The objective tumor response rate following chemoembolization was 53%. The major complication rate was 9% overall and was significantly higher in the high-risk group (22%) than in the low (2%) and intermediate (3%) risk groups (p < 0.001). Conclusion: Chemoembolization is safe and effective in selected patients with intermediate-stage infiltrative HCC. Chemoembolization is not recommended in high-risk patients with intermediate-stage infiltrative HCC because of poor OS and high rates of major complications. Clinical relevance statement: A pretreatment prediction model was developed using four risk factors associated with overall survival following chemoembolization for intermediate-stage infiltrative hepatocellular carcinoma. This model may provide valuable information for clinical decision-making. Key Points: • Four risk factors (Child–Pugh score B, maximal tumor size ≥ 10 cm, tumor number ≥ 4, and bilobar tumor involvement) were used to create pretreatment prediction models, with risk scores of 0–1, 2–4, and 5–7 defined as low, intermediate, and high risk, respectively. • Median overall survival (OS) times and major complication rate in these three groups were 34, 18, and 8 months, and 2%, 3%, and 22%, respectively (p < 0.001). Chemoembolization is not recommended in high-risk patients with intermediate-stage infiltrative Hepatocellular carcinoma (HCC) because of poor OS and high rates of major complications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. Explainability of random survival forests in predicting conversion risk from mild cognitive impairment to Alzheimer's disease.
- Author
-
Sarica, Alessia, Aracri, Federica, Bianco, Maria Giovanna, Arcuri, Fulvia, Quattrone, Andrea, and Quattrone, Aldo
- Subjects
- Alzheimer's disease, Mild cognitive impairment, Forest conversion, Neuropsychological tests
- Abstract
Random Survival Forests (RSF) has recently shown better performance than statistical survival methods such as the Cox proportional hazards (CPH) model in predicting conversion risk from mild cognitive impairment (MCI) to Alzheimer's disease (AD). However, RSF application in real-world clinical settings is still limited due to its black-box nature. For this reason, we aimed at providing a comprehensive study of RSF explainability with SHapley Additive exPlanations (SHAP) on biomarkers of stable and progressive patients (sMCI and pMCI) from the Alzheimer's Disease Neuroimaging Initiative. We evaluated three global explanations—RSF feature importance, permutation importance and SHAP importance—and we quantitatively compared them with Rank-Biased Overlap (RBO). Moreover, we assessed whether multicollinearity among variables may perturb the SHAP outcome. Lastly, we stratified pMCI test patients into high, medium and low risk grades, to investigate the individual SHAP explanation of one pMCI patient per risk group. We confirmed that RSF had higher accuracy (0.890) than CPH (0.819), and its stability and robustness were demonstrated by high overlap (RBO > 90%) between feature rankings within the first eight features. SHAP local explanations with and without correlated variables had no substantial difference, showing that multicollinearity did not alter the model. FDG, ABETA42 and HCI were the top-ranked features in global explanations, with the highest contribution also in local explanations. FAQ, mPACCdigit, mPACCtrailsB and RAVLT immediate had the highest influence among all clinical and neuropsychological assessments in increasing progression risk, as particularly evident in pMCI patients' individual explanations. In conclusion, our findings suggest that RSF represents a useful tool to support clinicians in estimating conversion-to-AD risk and that the SHAP explainer boosts its clinical utility with intelligible and interpretable individual outcomes that highlight key features associated with AD prognosis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
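A sketch of one of the global explanations compared in the paper, permutation importance for a random survival forest, implemented directly as the drop in C-index when a feature is shuffled. The SHAP analysis itself would additionally require the shap package, and the feature names below are placeholders, not ADNI variables.

```python
# Random survival forest + manual permutation importance (sketch).
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(8)
n = 400
X = rng.normal(size=(n, 5))
feature_names = ["FDG", "ABETA42", "HCI", "FAQ", "noise"]   # illustrative names only
risk = 0.9 * X[:, 0] + 0.6 * X[:, 1] + 0.4 * X[:, 2]
t = rng.exponential(np.exp(-risk) * 5)
c = rng.exponential(6, n)
y = Surv.from_arrays(event=t <= c, time=np.minimum(t, c))
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0).fit(X_tr, y_tr)
baseline = rsf.score(X_te, y_te)            # Harrell's C-index

# Permutation importance: mean drop in C-index when one feature is shuffled.
for j, name in enumerate(feature_names):
    drops = []
    for _ in range(10):
        Xp = X_te.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        drops.append(baseline - rsf.score(Xp, y_te))
    print(f"{name:8s} mean C-index drop: {np.mean(drops):.3f}")
```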
26. PathExpSurv: pathway expansion for explainable survival analysis and disease gene discovery.
- Author
-
Hou, Zhichao, Leng, Jiacheng, Yu, Jiating, Xia, Zheng, and Wu, Ling-Yun
- Subjects
- Survival analysis (Biometry), Machine learning, Neural pathways, Genes
- Abstract
Background: In the field of biology and medicine, the interpretability and accuracy are both important when designing predictive models. The interpretability of many machine learning models such as neural networks is still a challenge. Recently, many researchers utilized prior information such as biological pathways to develop neural networks-based methods, so as to provide some insights and interpretability for the models. However, the prior biological knowledge may be incomplete and there still exists some unknown information to be explored. Results: We proposed a novel method, named PathExpSurv, to gain an insight into the black-box model of neural network for cancer survival analysis. We demonstrated that PathExpSurv could not only incorporate the known prior information into the model, but also explore the unknown possible expansion to the existing pathways. We performed downstream analyses based on the expanded pathways and successfully identified some key genes associated with the diseases and original pathways. Conclusions: Our proposed PathExpSurv is a novel, effective and interpretable method for survival analysis. It has great utility and value in medical diagnosis and offers a promising framework for biological research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Sensitivity of Survival Analysis Metrics.
- Author
-
Vasilev, Iulii, Petrovskiy, Mikhail, and Mashechkin, Igor
- Subjects
- Sensitivity analysis, Data distribution, Kaplan-Meier estimator, Recursive partitioning, Data analysis, Survival analysis (Biometry)
- Abstract
Survival analysis models allow for predicting the probability of an event over time. The specificity of survival analysis data includes the distribution of events over time and the proportion of classes. Late events are often rare, do not correspond to the main distribution, and strongly affect both the quality of the models and the quality assessment. In this paper, we identify four cases of excessive sensitivity of survival analysis metrics and propose methods to overcome them. To equalize the impact of individual observations, we adjust the weights of events based on the target time and the censoring indicator. According to the sensitivity of the metrics, AUPRC (area under the Precision-Recall curve) is best suited for assessing the quality of survival models, and other metrics are used as loss functions. To evaluate the influence of the loss function, the Bagging model uses them to select the size and hyperparameters of the ensemble. The experimental study included eight real medical datasets. The proposed modifications of the IBS (Integrated Brier Score) improved the quality of Bagging compared to the classical loss functions. In addition, in seven out of eight datasets, Bagging with the new loss functions outperforms the existing models of the scikit-survival library. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. Concordance indices with left‐truncated and right‐censored data.
- Author
-
Hartman, Nicholas, Kim, Sehee, He, Kevin, and Kalbfleisch, John D.
- Subjects
- Censoring (Statistics), Chronic kidney failure, Survival analysis (Biometry), Prediction models, Scientific observation
- Abstract
In the context of time‐to‐event analysis, a primary objective is to model the risk of experiencing a particular event in relation to a set of observed predictors. The Concordance Index (C‐Index) is a statistic frequently used in practice to assess how well such models discriminate between various risk levels in a population. However, the properties of conventional C‐Index estimators when applied to left‐truncated time‐to‐event data have not been well studied, despite the fact that left‐truncation is commonly encountered in observational studies. We show that the limiting values of the conventional C‐Index estimators depend on the underlying distribution of truncation times, which is similar to the situation with right‐censoring as discussed in Uno et al. (2011) [On the C‐statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 30(10), 1105–1117]. We develop a new C‐Index estimator based on inverse probability weighting (IPW) that corrects for this limitation, and we generalize this estimator to settings with left‐truncated and right‐censored data. The proposed IPW estimators are highly robust to the underlying truncation distribution and often outperform the conventional methods in terms of bias, mean squared error, and coverage probability. We apply these estimators to evaluate a predictive survival model for mortality among patients with end‐stage renal disease. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
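For context, a sketch of the inverse-probability-weighted C-index of Uno et al. (2011) for right-censored data as implemented in scikit-survival; note that this function does not handle the left truncation that the paper's estimator is designed for.

```python
# Harrell's C vs. the IPCW (Uno) C-index under right censoring (sketch).
import numpy as np
from sksurv.util import Surv
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored, concordance_index_ipcw

rng = np.random.default_rng(10)
n = 600
X = rng.normal(size=(n, 3))
risk = X @ np.array([0.8, 0.4, 0.0])
t = rng.exponential(np.exp(-risk) * 8)
c = rng.exponential(6, n)                   # fairly heavy censoring
y = Surv.from_arrays(event=t <= c, time=np.minimum(t, c))
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]

model = CoxPHSurvivalAnalysis().fit(X_tr, y_tr)
scores = model.predict(X_te)

harrell = concordance_index_censored(y_te["event"], y_te["time"], scores)[0]
uno = concordance_index_ipcw(y_tr, y_te, scores, tau=np.percentile(y_te["time"], 80))[0]
print(f"Harrell's C = {harrell:.3f}, IPCW (Uno) C = {uno:.3f}")
```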
29. Individual risk prediction: Comparing random forests with Cox proportional‐hazards model by a simulation study.
- Author
-
Baralou, Valia, Kalpourtzi, Natasa, and Touloumi, Giota
- Abstract
With big data becoming widely available in healthcare, machine learning algorithms such as random forest (RF) that ignores time‐to‐event information and random survival forest (RSF) that handles right‐censored data are used for individual risk prediction alternatively to the Cox proportional hazards (Cox‐PH) model. We aimed to systematically compare RF and RSF with Cox‐PH. RSF with three split criteria [log‐rank (RSF‐LR), log‐rank score (RSF‐LRS), maximally selected rank statistics (RSF‐MSR)]; RF, Cox‐PH, and Cox‐PH with splines (Cox‐S) were evaluated through a simulation study based on real data. One hundred eighty scenarios were investigated assuming different associations between the predictors and the outcome (linear/linear and interactions/nonlinear/nonlinear and interactions), training sample sizes (500/1000/5000), censoring rates (50%/75%/93%), hazard functions (increasing/decreasing/constant), and number of predictors (seven, 15 including noise variables). Methods' performance was evaluated with time‐dependent area under curve and integrated Brier score. In all scenarios, RF had the worst performance. In scenarios with a low number of events (⩽70), Cox‐PH was at least noninferior to RSF, whereas under linearity assumption it outperformed RSF. Under the presence of interactions, RSF performed better than Cox‐PH as the number of events increased whereas Cox‐S reached at least similar performance with RSF under nonlinear effects. RSF‐LRS performed slightly worse than RSF‐LR and RSF‐MSR when including noise variables and interaction effects. When applied to real data, models incorporating survival time performed better. Although RSF algorithms are a promising alternative to conventional Cox‐PH as data complexity increases, they require a higher number of events for training. In time‐to‐event analysis, algorithms that consider survival time should be used. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
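A minimal Python analogue of the comparison in the record above, using scikit-survival's random survival forest and Cox model on simulated data with a nonlinear effect and an interaction. The simulation design, sample split, and tuning values are assumptions for illustration; they do not reproduce the study's 180 scenarios or its full metric set.

```python
import numpy as np
from sksurv.util import Surv
from sksurv.ensemble import RandomSurvivalForest
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 5))
# nonlinear effect plus an interaction, so the forest has something to exploit
lin = 0.8 * X[:, 0] ** 2 + 0.6 * X[:, 1] * X[:, 2]
t_event = rng.exponential(np.exp(-lin))
t_cens = rng.exponential(2.0, n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
y = Surv.from_arrays(event=event, time=time)

train, test = np.arange(n) < 700, np.arange(n) >= 700

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15, random_state=1)
rsf.fit(X[train], y[train])
cox = CoxPHSurvivalAnalysis().fit(X[train], y[train])

for name, model in [("RSF", rsf), ("Cox-PH", cox)]:
    c = concordance_index_censored(event[test], time[test], model.predict(X[test]))[0]
    print(f"{name}: test C-index = {c:.3f}")
```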
30. Investigating non-inferiority or equivalence in time-to-event data under non-proportional hazards.
- Author
-
Möllenhoff, Kathrin and Tresch, Achim
- Subjects
LOG-rank test, PROPORTIONAL hazards models, FALSE positive error, SURVIVAL analysis (Biometry), HAZARDS
- Abstract
The classical approach to analyzing time-to-event data, e.g. in clinical trials, is to fit Kaplan–Meier curves, yielding the treatment effect as the hazard ratio between treatment groups. Afterwards, a log-rank test is commonly performed to investigate whether there is a difference in survival or, depending on additional covariates, a Cox proportional hazards model is used. However, in numerous trials these approaches fail due to the presence of non-proportional hazards, resulting in difficulties in interpreting the hazard ratio and a loss of power. When considering equivalence or non-inferiority trials, the commonly performed log-rank based tests are similarly affected by a violation of this assumption. Here we propose a parametric framework to assess equivalence or non-inferiority for survival data. We derive pointwise confidence bands for both the hazard ratio and the difference of the survival curves. Further, we propose a test procedure addressing non-inferiority and equivalence by directly comparing the survival functions at certain time points or over an entire range of time. Once the model's suitability is established, the method provides a noticeable power benefit, irrespective of the shape of the hazard ratio. On the other hand, model selection should be carried out carefully, as misspecification may cause type I error inflation in some situations. We investigate the robustness and demonstrate the advantages and disadvantages of the proposed methods by means of a simulation study. Finally, we demonstrate the validity of the methods by a clinical trial example. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. Studentized permutation method for comparing two restricted mean survival times with small sample from randomized trials.
- Author
-
Ditzhaus, Marc, Yu, Menggang, and Xu, Jin
- Subjects
SURVIVAL rate, PERMUTATIONS, CONFIDENCE intervals
- Abstract
Recent observations, especially in cancer immunotherapy clinical trials with time‐to‐event outcomes, show that the commonly used proportional hazard assumption is often not justifiable, hampering an appropriate analysis of the data by hazard ratios. An attractive alternative advocated is given by the restricted mean survival time (RMST), which does not rely on any model assumption and can always be interpreted intuitively. Since methods for the RMST based on asymptotic theory suffer from inflated type‐I error under small sample sizes, a permutation test was proposed recently leading to more convincing results in simulations. However, classical permutation strategies require an exchangeable data setup between comparison groups which may be limiting in practice. Besides, it is not possible to invert related testing procedures to obtain valid confidence intervals, which can provide more in‐depth information. In this paper, we address these limitations by proposing a studentized permutation test as well as respective permutation‐based confidence intervals. In an extensive simulation study, we demonstrate the advantage of our new method, especially in situations with relatively small sample sizes and unbalanced groups. Finally, we illustrate the application of the proposed method by re‐analyzing data from a recent lung cancer clinical trial. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
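A rough sketch of a studentized permutation test for the RMST difference, in the spirit of the record above, using lifelines for the Kaplan–Meier-based RMST and its variance. The permutation scheme and studentization here are simplified relative to the paper; the truncation time tau and the toy data are assumptions.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def rmst_and_var(time, event, tau):
    km = KaplanMeierFitter().fit(time, event_observed=event)
    return restricted_mean_survival_time(km, t=tau, return_variance=True)

def studentized_perm_test(time, event, group, tau, n_perm=1000, seed=0):
    """Two-sided test of the RMST difference; group labels are permuted
    and the statistic is the difference divided by its estimated SE."""
    rng = np.random.default_rng(seed)

    def stat(g):
        m1, v1 = rmst_and_var(time[g == 1], event[g == 1], tau)
        m0, v0 = rmst_and_var(time[g == 0], event[g == 0], tau)
        return (m1 - m0) / np.sqrt(v1 + v0)

    t_obs = stat(group)
    perm = np.array([stat(rng.permutation(group)) for _ in range(n_perm)])
    return t_obs, np.mean(np.abs(perm) >= np.abs(t_obs))

# toy data: deliberately small, unbalanced-looking setting
rng = np.random.default_rng(2)
n = 60
time = np.concatenate([rng.exponential(1.0, n), rng.exponential(1.5, n)])
event = rng.binomial(1, 0.8, 2 * n).astype(bool)
group = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
print(studentized_perm_test(time, event, group, tau=2.0, n_perm=500))
```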
32. Pitfalls of the concordance index for survival outcomes.
- Author
-
Hartman, Nicholas, Kim, Sehee, He, Kevin, and Kalbfleisch, John D.
- Subjects
SURVIVAL rate, SURVIVAL analysis (Biometry), PATIENTS' attitudes, PROGNOSTIC models, MODEL validation
- Abstract
Prognostic models are useful tools for assessing a patient's risk of experiencing adverse health events. In practice, these models must be validated before implementation to ensure that they are clinically useful. The concordance index (C‐Index) is a popular statistic that is used for model validation, and it is often applied to models with binary or survival outcome variables. In this paper, we summarize existing criticism of the C‐Index and show that many limitations are accentuated when applied to survival outcomes, and to continuous outcomes more generally. We present several examples that show the challenges in achieving high concordance with survival outcomes, and we argue that the C‐Index is often not clinically meaningful in this setting. We derive a relationship between the concordance probability and the coefficient of determination under an ordinary least squares model with normally distributed predictors, which highlights the limitations of the C‐Index for continuous outcomes. Finally, we recommend existing alternatives that more closely align with common uses of survival models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
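The record above relates the concordance probability to the coefficient of determination under an ordinary least squares model with normal predictors. The paper's own expression is not reproduced here, but the classical bivariate-normal identity C = 1/2 + arcsin(rho)/pi, with rho = corr(Y, Yhat) so that rho^2 = R^2, conveys the same message and can be checked numerically.

```python
import numpy as np

def c_from_r2(r2):
    # classical bivariate-normal identity: C = 1/2 + arcsin(rho)/pi, rho = sqrt(R^2)
    return 0.5 + np.arcsin(np.sqrt(r2)) / np.pi

for r2 in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"R^2 = {r2:.1f}  ->  concordance C = {c_from_r2(r2):.3f}")

# Monte Carlo check at R^2 = 0.5, i.e. corr(Y, Yhat) = 1/sqrt(2)
rng = np.random.default_rng(3)
n = 20000
yhat = rng.normal(size=n)
y = yhat + rng.normal(size=n)                 # var(Y) = 2, so R^2 = 0.5
i, j = rng.integers(0, n, (2, 200000))
keep = i != j
emp = np.mean((y[i] > y[j])[keep] == (yhat[i] > yhat[j])[keep])
print(f"empirical C at R^2 = 0.5: {emp:.3f}, formula: {c_from_r2(0.5):.3f}")
```

Even a model explaining half of the outcome variance yields a concordance of only about 0.75, which illustrates the paper's point that high concordance is a demanding target for continuous and survival outcomes.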
33. Bayesian nonparametric analysis of restricted mean survival time.
- Author
-
Zhang, Chenyang and Yin, Guosheng
- Subjects
SURVIVAL rate, BAYESIAN analysis, DISTRIBUTION (Probability theory), FREQUENTIST statistics, NONPARAMETRIC estimation, CENSORING (Statistics), SURVIVAL analysis (Biometry)
- Abstract
The restricted mean survival time (RMST) evaluates the expectation of survival time truncated by a prespecified time point, because the mean survival time in the presence of censoring is typically not estimable. The frequentist inference procedure for RMST has been widely advocated for comparison of two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right‐ and interval‐censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and make comparisons with the frequentist nonparametric method. Simulation studies demonstrate that the Bayesian nonparametric RMST under diffuse MDP priors leads to robust estimation and under informative priors it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right‐ and interval‐censored data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Supervised two‐dimensional functional principal component analysis with time‐to‐event outcomes and mammogram imaging data.
- Author
-
Jiang, Shu, Cao, Jiguo, Rosner, Bernard, and Colditz, Graham A.
- Subjects
PRINCIPAL components analysis, MAMMOGRAMS, LEAST squares, MEDICAL screening, BREAST cancer
- Abstract
Screening mammography aims to identify breast cancer early and secondarily measures breast density to classify women at higher or lower than average risk for future breast cancer in the general population. Despite the strong association of individual mammography features to breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image‐based features, it is conducted independently of the time‐to‐event response variable. With the consideration of building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image‐based features associated with the failure time while accommodating the added complication from right censoring. Throughout the article, we hope to demonstrate that one method is favored over the other under different clinical setups. The proposed methods are applied to the motivating data set from the Joanne Knight Breast Health cohort at Siteman Cancer Center. Our approaches not only obtain the best prediction performance compared to the benchmark model, but also reveal different risk patterns within the mammograms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. Omnibus test for restricted mean survival time based on influence function.
- Author
-
Gu, Jiaqi, Fan, Yiwei, and Yin, Guosheng
- Subjects
SURVIVAL rate, SURVIVAL analysis (Biometry), LOG-rank test, ASYMPTOTIC distribution, KAPLAN-Meier estimator, COVARIANCE matrices
- Abstract
The restricted mean survival time (RMST), which evaluates the expected survival time up to a pre-specified time point τ , has been widely used to summarize the survival distribution due to its robustness and straightforward interpretation. In comparative studies with time-to-event data, the RMST-based test has been utilized as an alternative to the classic log-rank test because the power of the log-rank test deteriorates when the proportional hazards assumption is violated. To overcome the challenge of selecting an appropriate time point τ , we develop an RMST-based omnibus Wald test to detect the survival difference between two groups throughout the study follow-up period. Treating a vector of RMSTs at multiple quantile-based time points as a statistical functional, we construct a Wald χ 2 test statistic and derive its asymptotic distribution using the influence function. We further propose a new procedure based on the influence function to estimate the asymptotic covariance matrix in contrast to the usual bootstrap method. Simulations under different scenarios validate the size of our RMST-based omnibus test and demonstrate its advantage over the existing tests in power, especially when the true survival functions cross within the study follow-up period. For illustration, the proposed test is applied to two real datasets, which demonstrate its power and applicability in various situations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
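The omnibus Wald test in the record above combines RMSTs at several quantile-based time points. The sketch below reproduces only the structure of the statistic and substitutes a bootstrap covariance estimate for the paper's influence-function estimator (the bootstrap is the comparator the authors improve on); the time-point grid and data are assumptions.

```python
import numpy as np
from scipy import stats
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def rmst_vector(time, event, taus):
    km = KaplanMeierFitter().fit(time, event_observed=event)
    return np.array([restricted_mean_survival_time(km, t=t) for t in taus])

def omnibus_rmst_wald(t1, e1, t0, e0, taus, n_boot=300, seed=0):
    """Wald chi-square test that both groups share the same RMST at every
    time point in `taus`; the covariance matrix is bootstrapped here."""
    rng = np.random.default_rng(seed)
    d_hat = rmst_vector(t1, e1, taus) - rmst_vector(t0, e0, taus)
    boot = []
    for _ in range(n_boot):
        i1 = rng.integers(0, len(t1), len(t1))
        i0 = rng.integers(0, len(t0), len(t0))
        boot.append(rmst_vector(t1[i1], e1[i1], taus) - rmst_vector(t0[i0], e0[i0], taus))
    cov = np.cov(np.array(boot), rowvar=False)
    chi2 = d_hat @ np.linalg.solve(cov, d_hat)
    return chi2, stats.chi2.sf(chi2, df=len(taus))

# crossing-survival toy example
rng = np.random.default_rng(4)
t1 = rng.weibull(0.7, 200) * 1.2
e1 = rng.binomial(1, 0.85, 200).astype(bool)
t0 = rng.weibull(1.6, 200)
e0 = rng.binomial(1, 0.85, 200).astype(bool)
taus = np.quantile(np.concatenate([t0, t1]), [0.3, 0.5, 0.7])   # quantile-based time points
print(omnibus_rmst_wald(t1, e1, t0, e0, taus))
```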
36. Machine learning for optimized individual survival prediction in resectable upper gastrointestinal cancer.
- Author
-
Jung, Jin-On, Crnovrsanin, Nerma, Wirsik, Naita Maren, Nienhüser, Henrik, Peters, Leila, Popp, Felix, Schulze, André, Wagner, Martin, Müller-Stich, Beat Peter, Büchler, Markus Wolfgang, and Schmidt, Thomas
- Subjects
MACHINE learning, GASTROINTESTINAL cancer, GASTROINTESTINAL surgery, DISEASE risk factors, ESOPHAGEAL cancer, SURVIVAL rate
- Abstract
Purpose: Surgical oncologists are frequently confronted with the question of expected long-term prognosis. The aim of this study was to apply machine learning algorithms to optimize survival prediction after oncological resection of gastroesophageal cancers. Methods: Eligible patients underwent oncological resection of gastric or distal esophageal cancer between 2001 and 2020 at Heidelberg University Hospital, Department of General Surgery. Machine learning methods such as multi-task logistic regression and survival forests were compared with usual algorithms to establish an individual estimation. Results: The study included 117 variables with a total of 1360 patients. The overall missingness was 1.3%. Out of eight machine learning algorithms, the random survival forest (RSF) performed best with a concordance index of 0.736 and an integrated Brier score of 0.166. The RSF demonstrated a mean area under the curve (AUC) of 0.814 over a time period of 10 years after diagnosis. The most important long-term outcome predictor was lymph node ratio with a mean AUC of 0.730. A numeric risk score was calculated by the RSF for each patient and three risk groups were defined accordingly. Median survival time was 18.8 months in the high-risk group, 44.6 months in the medium-risk group and above 10 years in the low-risk group. Conclusion: The results of this study suggest that RSF is most appropriate to accurately answer the question of long-term prognosis. Furthermore, we could establish a compact risk score model with 20 input parameters and thus provide a clinical tool to improve prediction of oncological outcome after upper gastrointestinal surgery. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. Predicting the onset of breast cancer using mammogram imaging data with irregular boundary.
- Author
-
Jiang, Shu, Cao, Jiguo, Colditz, Graham A, and Rosner, Bernard
- Subjects
MAMMOGRAMS, BREAST cancer, TRIANGULATION, PRINCIPAL components analysis, FEATURE extraction, DISEASE risk factors
- Abstract
With mammography being the primary breast cancer screening strategy, it is essential to make full use of the mammogram imaging data to better identify women who are at higher and lower than average risk. Our primary goal in this study is to extract mammogram-based features that augment the well-established breast cancer risk factors to improve prediction accuracy. In this article, we propose a supervised functional principal component analysis (sFPCA) over triangulations method for extracting features that are ordered by the magnitude of association with the failure time outcome. The proposed method accommodates the irregular boundary issue posed by the breast area within the mammogram imaging data with flexible bivariate splines over triangulations. We also provide an eigenvalue decomposition algorithm that is computationally efficient. Compared to the conventional unsupervised FPCA method, the proposed method results in a lower Brier Score and higher area under the ROC curve (AUC) in simulation studies. We apply our method to data from the Joanne Knight Breast Health Cohort at Siteman Cancer Center. Our approach not only obtains the best prediction performance comparing to unsupervised FPCA and benchmark models but also reveals important risk patterns within the mammogram images. This demonstrates the importance of utilizing additional supervised image-based features to clarify breast cancer risk. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. Performance of the Matsumiya scoring system in cervical cancer patients with bone metastasis: an external validation study.
- Author
-
Wongyikul, Pakpoom, Wongchanudom, Sukaphong, Lumkul, Lalita, Isaradech, Natthanaphop, Phanphaisarn, Areerak, Phinyo, Phichayut, and Pruksakorn, Dumnoensun
- Subjects
BONE metastasis, BONE cancer, CERVICAL cancer, CANCER patients, UNIVERSITY hospitals
- Abstract
Background: Accurate prognostic prediction of survival in cervical cancer patients with bone metastasis is important for treatment planning. We aimed to externally validate the Matsumiya scoring system using external patient data. Methods: We collected a retrospective cohort of patients with cervical cancer diagnosed with bone metastasis at Chiang Mai University Hospital from 1st January 2007 to 31st December 2016. The Matsumiya score was composed of 5 predictors, including the presence of extraskeletal metastasis, ECOG performance status, history of previous chemo- or radiotherapy, the presence of multiple bone metastasis, and bone metastasis-free interval < 12 months. Harrell's C-statistics and score calibration plots were used to evaluate the score performance. We also reconstructed the development study to estimate apparent performance values for comparison during external validation. Results: A total of 124 cervical cancer patients with bone metastasis were included in this study. The 13-, 26-, and 52-week survival probabilities in the validation study were 70.1%, 50.5%, and 25.7%, respectively. Several differences were identified between development and validation studies regarding clinical characteristics, case-mix, and predictor–outcome associations. Harrell's C-statistics in the development and validation study were 0.714 and 0.567. The score showed poor agreement between the observed and the predicted survival probabilities in the validation study. Score reweighting and refitting showed only modest improvement in performance. Conclusion: A prognostic scoring system by Matsumiya et al. performed poorly in our cohort of Thai cervical cancer patients with bone metastasis. We suggested that the score should be sufficiently updated before being used. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
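External validation of a fixed prognostic score, as performed in the record above, typically starts with Harrell's C-statistic (plus calibration summaries) on the new cohort. The snippet below shows only the C-statistic step with lifelines on placeholder data; the Matsumiya score and the Chiang Mai cohort are not reproduced.

```python
import numpy as np
from lifelines.utils import concordance_index

# hypothetical external cohort: follow-up time in weeks, death indicator,
# and a pre-specified prognostic score (higher score = worse prognosis)
rng = np.random.default_rng(5)
n = 124
score = rng.integers(0, 6, n)                    # placeholder risk score, 0-5
time = rng.exponential(40 / (1 + 0.3 * score))   # weeks of follow-up
event = rng.binomial(1, 0.8, n).astype(bool)

# lifelines expects "higher predicted value = longer survival",
# so the score is negated before computing Harrell's C
c = concordance_index(time, -score, event_observed=event)
print(f"external Harrell's C = {c:.3f}")
```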
39. Multivariate longitudinal data for survival analysis of cardiovascular event prediction in young adults: insights from a comparative explainable study.
- Author
-
Nguyen, Hieu T., Vasconcellos, Henrique D., Keck, Kimberley, Reis, Jared P., Lewis, Cora E., Sidney, Steven, Lloyd-Jones, Donald M., Schreiner, Pamela J., Guallar, Eliseo, Wu, Colin O., Lima, João A.C., and Ambale-Venkatesh, Bharath
- Subjects
PANEL analysis, YOUNG adults, SURVIVAL analysis (Biometry), DATA analysis, FORECASTING
- Abstract
Background: Multivariate longitudinal data are under-utilized for survival analysis compared to cross-sectional data (CS - data collected once across cohort). Particularly in cardiovascular risk prediction, despite available methods of longitudinal data analysis, the value of longitudinal information has not been established in terms of improved predictive accuracy and clinical applicability. Methods: We investigated the value of longitudinal data over and above the use of cross-sectional data via 6 distinct modeling strategies from statistics, machine learning, and deep learning that incorporate repeated measures for survival analysis of the time-to-cardiovascular event in the Coronary Artery Risk Development in Young Adults (CARDIA) cohort. We then examined and compared the use of model-specific interpretability methods (Random Survival Forest Variable Importance) and model-agnostic methods (SHapley Additive exPlanation (SHAP) and Temporal Importance Model Explanation (TIME)) in cardiovascular risk prediction using the top-performing models. Results: In a cohort of 3539 participants, longitudinal information from 35 variables that were repeatedly collected in 6 exam visits over 15 years improved subsequent long-term (17 years after) risk prediction by up to 8.3% in C-index compared to using baseline data (0.78 vs. 0.72), and up to approximately 4% compared to using the last observed CS data (0.75). Time-varying AUC was also higher in models using longitudinal data (0.86–0.87 at 5 years, 0.79–0.81 at 10 years) than using baseline or last observed CS data (0.80–0.86 at 5 years, 0.73–0.77 at 10 years). Comparative model interpretability analysis revealed the impact of longitudinal variables on model prediction on both the individual and global scales among different modeling strategies, as well as identifying the best time windows and best timing within that window for event prediction. The best strategy to incorporate longitudinal data for accuracy was time series massive feature extraction, and the easiest interpretable strategy was trajectory clustering. Conclusion: Our analysis demonstrates the added value of longitudinal data in predictive accuracy and epidemiological utility in cardiovascular risk survival analysis in young adults via a unified, scalable framework that compares model performance and explainability. The framework can be extended to a larger number of variables and other longitudinal modeling methods. Trial registration: ClinicalTrials.gov Identifier: NCT00005130, Registration Date: 26/05/2000. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. Comparison of State-of-the-Art Neural Network Survival Models with the Pooled Cohort Equations for Cardiovascular Disease Risk Prediction.
- Author
-
Deng, Yu, Liu, Lei, Jiang, Hongmei, Peng, Yifan, Wei, Yishu, Zhou, Zhiyang, Zhong, Yizhen, Zhao, Yun, Yang, Xiaoyun, Yu, Jingzhi, Lu, Zhiyong, Kho, Abel, Ning, Hongyan, Allen, Norrina B., Wilkins, John T., Liu, Kiang, Lloyd-Jones, Donald M., and Zhao, Lihui
- Subjects
ARTIFICIAL neural networks, BLACK men, CARDIOVASCULAR diseases risk factors, CARDIOVASCULAR diseases, TEXT recognition
- Abstract
Background: The Pooled Cohort Equations (PCEs) are race- and sex-specific Cox proportional hazards (PH)-based models used for 10-year atherosclerotic cardiovascular disease (ASCVD) risk prediction with acceptable discrimination. In recent years, neural network models have gained increasing popularity with their success in image recognition and text classification. Various survival neural network models have been proposed by combining survival analysis and neural network architecture to take advantage of the strengths from both. However, the performance of these survival neural network models compared to each other and to PCEs in ASCVD prediction is unknown. Methods: In this study, we used 6 cohorts from the Lifetime Risk Pooling Project (with 5 cohorts as training/internal validation and one cohort as external validation) and compared the performance of the PCEs in 10-year ASCVD risk prediction with an all two-way interactions Cox PH model (Cox PH-TWI) and three state-of-the-art neural network survival models including Nnet-survival, Deepsurv, and Cox-nnet. For all the models, we used the same 7 covariates as used in the PCEs. We fitted each of the aforementioned models in white females, white males, black females, and black males, respectively. We evaluated models' internal and external discrimination power and calibration. Results: The training/internal validation sample comprised 23216 individuals. The average age at baseline was 57.8 years old (SD = 9.6); 16% developed ASCVD during average follow-up of 10.50 (SD = 3.02) years. Based on 10 × 10 cross-validation, the method that had the highest C-statistics was Deepsurv (0.7371) for white males, Deepsurv and Cox PH-TWI (0.7972) for white females, PCE (0.6981) for black males, and Deepsurv (0.7886) for black females. In the external validation dataset, Deepsurv (0.7032), Cox-nnet (0.7282), PCE (0.6811), and Deepsurv (0.7316) had the highest C-statistics for white male, white female, black male, and black female population, respectively. Calibration plots showed that in 10 × 10 validation, all models had good calibration in all race and sex groups. In external validation, all models overestimated the risk for 10-year ASCVD. Conclusions: We demonstrated the use of the state-of-the-art neural network survival models in ASCVD risk prediction. Neural network survival models had similar if not superior discrimination and calibration compared to PCEs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. Clinical effectiveness reporting of novel cancer drugs in the context of non-proportional hazards: a review of nice single technology appraisals.
- Author
-
Salmon, David and Melendez-Torres, G. J.
- Abstract
Objectives: The hazard ratio (HR) is a commonly used summary statistic when comparing time to event (TTE) data between trial arms, but assumes the presence of proportional hazards (PH). Non-proportional hazards (NPH) are increasingly common in NICE technology appraisals (TAs) due to an abundance of novel cancer treatments, which have differing mechanisms of action compared with traditional chemotherapies. The goal of this study is to understand how pharmaceutical companies, evidence review groups (ERGs) and appraisal committees (ACs) test for PH and report clinical effectiveness in the context of NPH. Methods: A thematic analysis of NICE TAs concerning novel cancer treatments published between 1 January 2020 and 31 December 2021 was undertaken. Data on PH testing and clinical effectiveness reporting for overall survival (OS) and progression-free survival (PFS) were obtained from company submissions, ERG reports, and final appraisal determinations (FADs). Results: NPH were present for OS or PFS in 28/40 appraisals, with log-cumulative hazard plots the most common testing methodology (40/40), supplemented by Schoenfeld residuals (20/40) and/or other statistical methods (6/40). In the context of NPH, the HR was ubiquitously reported by companies, inconsistently critiqued by ERGs (10/28), and commonly reported in FADs (23/28). Conclusions: There is inconsistency in PH testing methodology used in TAs. ERGs are inconsistent in critiquing use of the HR in the context of NPH, and even when critiqued it remains a commonly reported outcome measure in FADs. Other measures of clinical effectiveness should be considered, along with guidance on clinical effectiveness reporting when NPH are present. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
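The two proportional-hazards checks reported most often in the appraisals reviewed above, log-cumulative hazard plots and Schoenfeld-residual tests, can be sketched with lifelines as follows. The two-arm data and column names are placeholders, not data from any appraisal.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter

# placeholder two-arm trial data with non-proportional hazards
rng = np.random.default_rng(6)
n = 300
arm = rng.binomial(1, 0.5, n)
time = np.where(arm == 1, rng.weibull(0.8, n) * 14, rng.weibull(1.4, n) * 10)
event = rng.binomial(1, 0.8, n).astype(bool)
df = pd.DataFrame({"time": time, "event": event, "arm": arm})

# 1) log-cumulative hazard plot: log(-log S(t)) against log t;
#    roughly parallel curves are consistent with proportional hazards
fig, ax = plt.subplots()
for a, sub in df.groupby("arm"):
    km = KaplanMeierFitter().fit(sub["time"], sub["event"])
    s = km.survival_function_.iloc[:, 0]
    s = s[(s > 0) & (s < 1)]
    ax.plot(np.log(s.index.values), np.log(-np.log(s.values)),
            drawstyle="steps-post", label=f"arm {a}")
ax.set_xlabel("log time")
ax.set_ylabel("log(-log S(t))")
ax.legend()

# 2) Schoenfeld-residual-based check of the PH assumption
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.check_assumptions(df, p_value_threshold=0.05)
```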
42. Treatment effect measures under nonproportional hazards.
- Author
-
Snapinn, Steven, Jiang, Qi, and Ke, Chunlei
- Subjects
SURVIVAL rate, TREATMENT effectiveness, SURVIVAL analysis (Biometry), HAZARDS
- Abstract
In a clinical trial with a time‐to‐event endpoint the treatment effect can be measured in various ways. Under proportional hazards all reasonable measures (such as the hazard ratio and the difference in restricted mean survival time) are consistent in the following sense: Take any control group survival distribution such that the hazard rate remains above zero; if there is no benefit by any measure there is no benefit by all measures, and as the magnitude of treatment benefit increases by any measure it increases by all measures. Under nonproportional hazards, however, survival curves can cross, and the direction of the effect for any pair of measures can be inconsistent. In this paper we critically evaluate a variety of treatment effect measures in common use and identify flaws with them. In particular, we demonstrate that a treatment's benefit has two distinct and independent dimensions which can be measured by the difference in the survival rate at the end of follow‐up and the difference in restricted mean survival time, and that commonly used measures do not adequately capture both dimensions. We demonstrate that a generalized hazard difference, which can be estimated by the difference in exposure‐adjusted subject incidence rates, captures both dimensions, and that its inverse, the number of patient‐years of follow‐up that results in one fewer event (the NYNT), is an easily interpretable measure of the magnitude of clinical benefit. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
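The abstract above describes the generalized hazard difference as estimable by the difference in exposure-adjusted subject incidence rates, with the NYNT as its inverse. The arithmetic is simple; the event counts and patient-years below are invented solely to illustrate it.

```python
# Illustrative arithmetic only; the event counts and person-years are invented.
events_ctrl, py_ctrl = 150, 1200.0      # control arm: events, patient-years of follow-up
events_trt,  py_trt  = 110, 1300.0      # treatment arm

rate_ctrl = events_ctrl / py_ctrl       # exposure-adjusted incidence rate
rate_trt  = events_trt / py_trt
rate_diff = rate_ctrl - rate_trt        # generalized hazard difference, per patient-year
nynt = 1.0 / rate_diff                  # patient-years of follow-up per event avoided

print(f"rates: control {rate_ctrl:.4f}, treatment {rate_trt:.4f} per patient-year")
print(f"rate difference: {rate_diff:.4f}  ->  NYNT of roughly {nynt:.0f} patient-years")
```

On these invented numbers the treatment avoids roughly one event per 25 patient-years of follow-up; the abstract's point is that this rate difference and the end-of-follow-up survival difference capture two distinct dimensions of benefit.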
43. A Data-Driven Framework for Small Hydroelectric Plant Prognosis Using Tsfresh and Machine Learning Survival Models.
- Author
-
de Santis, Rodrigo Barbosa, Gontijo, Tiago Silveira, and Costa, Marcelo Azevedo
- Subjects
SURVIVAL analysis (Biometry), MACHINE learning, RENEWABLE energy sources, FAST Fourier transforms, ENGINEERING models, BOOSTING algorithms
- Abstract
Maintenance in small hydroelectric plants (SHPs) is essential for securing the expansion of clean energy sources and supplying the energy estimated to be required for the coming years. Identifying failures in SHPs before they happen is crucial for allowing better management of asset maintenance, lowering operating costs, and enabling the expansion of renewable energy sources. Most fault prognosis models proposed thus far for hydroelectric generating units are based on signal decomposition and regression models. In the specific case of SHPs, there is a high occurrence of data being censored, since the operation is not consistently steady and can be repeatedly interrupted due to transmission problems or scarcity of water resources. To overcome this, we propose a two-step, data-driven framework for SHP prognosis based on time series feature engineering and survival modeling. We compared two different strategies for feature engineering: one using higher-order statistics and the other using the Tsfresh algorithm. We adjusted three machine learning survival models—CoxNet, survival random forests, and gradient boosting survival analysis—for estimating the concordance index of these approaches. The best model presented a significant concordance index of 77.44%. We further investigated and discussed the importance of the monitored sensors and the feature extraction aggregations. The kurtosis and variance were the most relevant aggregations in the higher-order statistics domain, while the fast Fourier transform and continuous wavelet transform were the most frequent transformations when using Tsfresh. The most important sensors were related to the temperature at several points, such as the bearing generator, oil hydraulic unit, and turbine radial bushing. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
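A minimal sketch of the two-step framework described above, time-series feature engineering followed by a machine-learning survival model, using tsfresh and scikit-survival on synthetic sensor data. The SHP sensors, failure mechanism, and model choice are assumptions; the study itself compared several feature strategies and survival learners.

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters
from sksurv.util import Surv
from sksurv.ensemble import GradientBoostingSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

# synthetic "sensor" series: 80 units, 50 time steps of one temperature channel
rng = np.random.default_rng(7)
units, steps = 80, 50
frames = []
for u in range(units):
    drift = rng.uniform(0, 0.05)
    frames.append(pd.DataFrame({
        "id": u,
        "time": np.arange(steps),
        "temp": 60 + drift * np.arange(steps) + rng.normal(0, 1, steps),
    }))
long_df = pd.concat(frames, ignore_index=True)

# step 1: feature engineering (MinimalFCParameters keeps the run fast)
X = extract_features(long_df, column_id="id", column_sort="time",
                     default_fc_parameters=MinimalFCParameters(), n_jobs=0)

# step 2: survival modelling; failure time driven by the drift, with censoring
drifts = long_df.groupby("id")["temp"].apply(lambda s: np.polyfit(np.arange(steps), s, 1)[0])
t_fail = rng.exponential(1.0 / (0.2 + 5 * drifts.values))
t_cens = rng.exponential(3.0, units)
y = Surv.from_arrays(event=t_fail <= t_cens, time=np.minimum(t_fail, t_cens))

model = GradientBoostingSurvivalAnalysis(random_state=7).fit(X, y)
c = concordance_index_censored(y["event"], y["time"], model.predict(X))[0]
print(f"in-sample concordance index: {c:.3f}")
```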
44. Methods for handling missing data in serially sampled sputum specimens for mycobacterial culture conversion calculation.
- Author
-
Malatesta, Samantha, Weir, Isabelle R., Weber, Sarah E., Bouton, Tara C., Carney, Tara, Theron, Danie, Myers, Bronwyn, Horsburgh, C. Robert, Warren, Robin M., Jacobson, Karen R., and White, Laura F.
- Subjects
MISSING data (Statistics), SURVIVAL rate, SPUTUM, MYCOBACTERIUM tuberculosis, STATISTICAL power analysis
- Abstract
Background: The occurrence and timing of mycobacterial culture conversion is used as a proxy for tuberculosis treatment response. When researchers serially sample sputum during tuberculosis studies, contamination or missed visits leads to missing data points. Traditionally, this is managed by ignoring missing data or simple carry-forward techniques. Statistically advanced multiple imputation methods potentially decrease bias and retain sample size and statistical power. Methods: We analyzed data from 261 participants who provided weekly sputa for the first 12 weeks of tuberculosis treatment. We compared methods for handling missing data points in a longitudinal study with a time-to-event outcome. Our primary outcome was time to culture conversion, defined as two consecutive weeks with no Mycobacterium tuberculosis growth. Methods used to address missing data included: 1) available case analysis, 2) last observation carried forward, and 3) multiple imputation by fully conditional specification. For each method, we calculated the proportion culture converted and used survival analysis to estimate Kaplan-Meier curves, hazard ratios, and restricted mean survival times. We compared methods based on point estimates, confidence intervals, and conclusions to specific research questions. Results: The three missing data methods lead to differences in the number of participants achieving conversion; 78 (32.8%) participants converted with available case analysis, 154 (64.7%) converted with last observation carried forward, and 184 (77.1%) converted with multiple imputation. Multiple imputation resulted in smaller point estimates than simple approaches with narrower confidence intervals. The adjusted hazard ratio for smear negative participants was 3.4 (95% CI 2.3, 5.1) using multiple imputation compared to 5.2 (95% CI 3.1, 8.7) using last observation carried forward and 5.0 (95% CI 2.4, 10.6) using available case analysis. Conclusion: We showed that accounting for missing sputum data through multiple imputation, a statistically valid approach under certain conditions, can lead to different conclusions than naïve methods. Careful consideration for how to handle missing data must be taken and be pre-specified prior to analysis. We used data from a TB study to demonstrate these concepts, however, the methods we described are broadly applicable to longitudinal missing data. We provide valuable statistical guidance and code for researchers to appropriately handle missing data in longitudinal studies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. Doubly‐robust methods for differences in restricted mean lifetimes using pseudo‐observations.
- Author
-
Choi, Sangbum, Choi, Taehwa, Lee, Hye‐Young, Han, Sung Won, and Bandyopadhyay, Dipankar
- Subjects
MACHINE learning, SURVIVAL rate, SURVIVAL analysis (Biometry), CAUSAL inference, REGRESSION analysis
- Abstract
In clinical studies or trials comparing survival times between two treatment groups, the restricted mean lifetime (RML), defined as the expectation of the survival from time 0 to a prespecified time‐point, is often the quantity of interest that is readily interpretable to clinicians without any modeling restrictions. It is well known that if the treatments are not randomized (as in observational studies), covariate adjustment is necessary to account for treatment imbalances due to confounding factors. In this article, we propose a simple doubly‐robust pseudo‐value approach to effectively estimate the difference in the RML between two groups (akin to a metric for estimating average causal effects), while accounting for confounders. The proposed method combines two general approaches: (a) group‐specific regression models for the time‐to‐event and covariate information, and (b) inverse probability of treatment assignment weights, where the RMLs are replaced by the corresponding pseudo‐observations for survival outcomes, thereby mitigating the estimation complexities in presence of censoring. The proposed estimator is double‐robust, in the sense that it is consistent if at least one of the two working models remains correct. In addition, we explore the potential of available machine learning algorithms in causal inference to reduce possible bias of the causal estimates in presence of a complex association between the survival outcome and covariates. We conduct extensive simulation studies to assess the finite‐sample performance of the pseudo‐value causal effect estimators. Furthermore, we illustrate our methodology via application to a dataset from a breast cancer cohort study. The proposed method is implementable using the R package drRML, available in GitHub. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
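The drRML package cited above is an R implementation. The Python sketch below shows only the two building blocks named in the abstract (jackknife pseudo-observations for the restricted mean lifetime, and inverse-probability-of-treatment weights from a propensity model) combined in a weighted regression; this is a simplification, not the authors' doubly robust estimator, and the data are placeholders.

```python
import numpy as np
import statsmodels.api as sm
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def rml(time, event, tau):
    km = KaplanMeierFitter().fit(time, event_observed=event)
    return restricted_mean_survival_time(km, t=tau)

def pseudo_observations(time, event, tau):
    """Leave-one-out (jackknife) pseudo-values for the restricted mean lifetime."""
    n = len(time)
    theta = rml(time, event, tau)
    loo = np.array([rml(np.delete(time, i), np.delete(event, i), tau) for i in range(n)])
    return n * theta - (n - 1) * loo

# placeholder observational data: confounder x drives both treatment and survival
rng = np.random.default_rng(8)
n, tau = 300, 5.0
x = rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-x)))
t_event = rng.exponential(np.exp(0.5 * treat - 0.4 * x) * 2.0)
t_cens = rng.exponential(8.0, n)
time, event = np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)

po = pseudo_observations(time, event, tau)

# inverse probability of treatment weights from a logistic propensity model
ps = sm.Logit(treat, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))
w = treat / ps + (1 - treat) / (1 - ps)

# weighted regression of pseudo-values on treatment and the covariate
design = sm.add_constant(np.column_stack([treat, x]))
fit = sm.WLS(po, design, weights=w).fit()
print("adjusted difference in restricted mean lifetime:", round(fit.params[1], 3))
```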
46. Survival Regression with Accelerated Failure Time Model in XGBoost.
- Author
-
Barnwal, Avinash, Cho, Hyunsu, and Hocking, Toby
- Subjects
COMPUTER performance, COMMUNITIES, SURVIVAL analysis (Biometry), MACHINE learning
- Abstract
Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management, and sales management. Nonlinear tree based machine learning algorithms as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost are often more accurate in practice than linear models. However, existing state-of-the-art implementations of tree-based models have offered limited support for survival regression. In this work, we implement loss functions for learning accelerated failure time (AFT) models in XGBoost, to increase the support for survival modeling for different kinds of label censoring. We demonstrate with real and simulated experiments the effectiveness of AFT in XGBoost with respect to a number of baselines, in two respects: generalization performance and training speed. Furthermore, we take advantage of the support for NVIDIA GPUs in XGBoost to achieve substantial speedup over multi-core CPUs. To our knowledge, our work is the first implementation of AFT that uses the processing power of NVIDIA GPUs. Starting from the 1.2.0 release, the XGBoost package natively supports the AFT model. The addition of AFT in XGBoost has had significant impact in the open source community, and a few statistics packages now use the XGBoost AFT model. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
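The AFT interface described above is available in XGBoost from release 1.2.0. The snippet below follows the documented pattern of encoding censoring through label bounds, where right-censored rows receive an infinite upper bound; the data and hyperparameters are illustrative only.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(9)
n = 500
X = rng.normal(size=(n, 4))
log_t = 2.0 + 0.7 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, n)
t_event = np.exp(log_t)
t_cens = rng.exponential(np.exp(2.3), n)
time = np.minimum(t_event, t_cens)
observed = t_event <= t_cens

# AFT labels are an interval [lower, upper]; right-censored rows get upper = +inf
dtrain = xgb.DMatrix(X)
dtrain.set_float_info("label_lower_bound", time)
dtrain.set_float_info("label_upper_bound", np.where(observed, time, np.inf))

params = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 1.0,
    "learning_rate": 0.05,
    "max_depth": 3,
}
bst = xgb.train(params, dtrain, num_boost_round=200)
pred_time = bst.predict(dtrain)          # predicted survival times on the original scale
print(pred_time[:5])
```

The predictions are on the time scale (larger values mean longer predicted survival), so they can be ranked directly for concordance-type summaries.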
47. Default risk prediction and feature extraction using a penalized deep neural network.
- Author
-
Lin, Cunjie, Qiao, Nan, Zhang, Wenli, Li, Yang, and Ma, Shuangge
- Abstract
Online peer-to-peer lending platforms provide loans directly from lenders to borrowers without passing through traditional financial institutions. For lenders on these platforms to avoid loss, it is crucial that they accurately assess default risk so that they can make appropriate decisions. In this study, we develop a penalized deep learning model to predict default risk based on survival data. As opposed to simply predicting whether default will occur, we focus on predicting the probability of default over time. Moreover, by adding an additional one-to-one layer in the neural network, we achieve feature selection and estimation simultaneously by incorporating an L1-penalty into the objective function. The minibatch gradient descent algorithm makes it possible to handle massive data. An analysis of real-world loan data and simulations demonstrate the model's competitive practical performance, which suggests favorable potential applications in peer-to-peer lending platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
48. Restricted mean survival time regression model with time‐dependent covariates.
- Author
-
Zhang, Chengfeng, Huang, Baoyi, Wu, Hongji, Yuan, Hao, Hou, Yawen, and Chen, Zheng
- Abstract
In clinical or epidemiological follow‐up studies, methods based on time-scale measures such as the restricted mean survival time (RMST) have been developed to some extent. Compared with traditional hazard-rate-based methods, the RMST is easier to interpret and does not require the proportional hazards assumption. To date, regression models based on the RMST have related the RMST, directly or indirectly, to baseline covariates only. However, time‐dependent covariates are becoming increasingly common in follow‐up studies. Based on the inverse probability of censoring weighting (IPCW) method, we developed a regression model relating the RMST to time‐dependent covariates. Through Monte Carlo simulation, we verified the estimation performance of the regression parameters of the proposed model. Compared with the time‐dependent Cox model and the fixed (baseline) covariate RMST model, the time‐dependent RMST model has better predictive ability. Finally, an example of heart transplantation was used to verify the above conclusions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
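The model above handles time-dependent covariates; the sketch below shows only the simpler IPCW device it builds on, for baseline covariates: regress min(T, tau) on covariates among subjects whose restricted time is fully observed, weighting each by the inverse of the estimated censoring survival function. Variable names and data are assumptions, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(10)
n, tau = 500, 4.0
x = rng.normal(size=n)
t_event = rng.exponential(np.exp(0.5 * x) * 2.0)
t_cens = rng.exponential(6.0, n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

# censoring survival function G(t), estimated by Kaplan-Meier with the
# censoring indicator (1 - event) playing the role of the "event"
km_c = KaplanMeierFitter().fit(time, event_observed=~event)
G = lambda t: km_c.survival_function_at_times(t).values

y = np.minimum(time, tau)                      # restricted survival time
known = event | (time >= tau)                  # min(T, tau) is fully observed
w = np.zeros(n)
w[known] = 1.0 / np.clip(G(np.minimum(time[known], tau)), 1e-4, None)

design = sm.add_constant(x[known])
fit = sm.WLS(y[known], design, weights=w[known]).fit()
print(fit.params)                               # intercept and covariate effect on the RMST scale
```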
49. Nonparametric estimation in an illness‐death model with component‐wise censoring.
- Author
-
Eaton, Anne, Sun, Yifei, Neaton, James, and Luo, Xianghua
- Subjects
NONPARAMETRIC estimation, CENSORSHIP, CORONARY disease, HEART disease related mortality, CENSORING (Statistics), EARLY death
- Abstract
In disease settings where study participants are at risk for death and a serious nonfatal event, composite endpoints defined as the time until the earliest of death or the nonfatal event are often used as the primary endpoint in clinical trials. In practice, if the nonfatal event can only be detected at clinic visits and the death time is known exactly, the resulting composite endpoint exhibits "component‐wise censoring." The standard method used to estimate event‐free survival in this setting fails to account for component‐wise censoring. We apply a kernel smoothing method previously proposed for a marker process in a novel way to produce a nonparametric estimator for event‐free survival that accounts for component‐wise censoring. The key insight that allows us to apply this kernel method is thinking of nonfatal event status as an intermittently observed binary time‐dependent variable rather than thinking of time to the nonfatal event as interval‐censored. We also propose estimators for the probability in state and restricted mean time in state for reversible or irreversible illness‐death models, under component‐wise censoring, and derive their large‐sample properties. We perform a simulation study to compare our method to existing multistate survival methods and apply the methods on data from a large randomized trial studying a multifactor intervention for reducing morbidity and mortality among men at above average risk of coronary heart disease. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
50. A nonparametric statistical method for two crossing survival curves.
- Author
-
Huang, Xinghui, Lyu, Jingjing, Hou, Yawen, and Chen, Zheng
- Subjects
SURVIVAL analysis (Biometry), FALSE positive error, LOG-rank test, RECEIVER operating characteristic curves, ERROR rates, INFERENTIAL statistics, RELIABILITY in engineering
- Abstract
In comparative research on time-to-event data for two groups, when two survival curves cross each other, it may be difficult to use the log-rank test and hazard ratio (HR) to properly assess the treatment benefit. Our aim was to identify a method for evaluating the treatment benefits for two groups in the above situation. We quantified treatment benefits based on an intuitive measure called the area between two survival curves (ABS), which is a robust measure of treatment benefits in clinical trials regardless of whether the proportional hazards assumption is violated or two survival curves cross each other. Additionally, we propose a permutation test based on the ABS, and we evaluate the effectiveness and reliability of this test with simulated data. The ABS permutation test is a robust statistical inference method with an acceptable type I error rate and superior power to detect differences in treatment effects, especially when the proportional hazards assumption is violated. The ABS can be used to intuitively quantify treatment differences over time and provide reliable conclusions in complicated situations, such as crossing survival curves. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
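A small sketch of the area between two Kaplan–Meier curves (ABS) on [0, tau] and a label-permutation test, as described in the record above. The integration grid, truncation time, and toy data are assumptions, and the authors' permutation scheme may differ in detail.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def abs_statistic(time, event, group, tau, grid_size=200):
    """Area between the two Kaplan-Meier curves on [0, tau] (Riemann sum)."""
    grid = np.linspace(0, tau, grid_size)
    surv = []
    for g in (0, 1):
        km = KaplanMeierFitter().fit(time[group == g], event_observed=event[group == g])
        surv.append(km.survival_function_at_times(grid).values)
    return np.sum(np.abs(surv[1] - surv[0])) * (grid[1] - grid[0])

def abs_permutation_test(time, event, group, tau, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    obs = abs_statistic(time, event, group, tau)
    perm = np.array([abs_statistic(time, event, rng.permutation(group), tau)
                     for _ in range(n_perm)])
    return obs, np.mean(perm >= obs)

# crossing-curves toy example: early harm, late benefit in group 1
rng = np.random.default_rng(11)
n = 150
t0 = rng.weibull(1.5, n) * 3.0
t1 = np.where(rng.random(n) < 0.3, rng.exponential(0.5, n), rng.weibull(1.5, n) * 4.0)
time = np.concatenate([t0, t1])
event = rng.binomial(1, 0.85, 2 * n).astype(bool)
group = np.repeat([0, 1], n)
print(abs_permutation_test(time, event, group, tau=5.0))
```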