27,272 results for "Models, Statistical"
Search Results
2. Prefiltered component-based greedy (PreCoG) scan method.
- Author
- French JP, Meysami M, and Lipner EM
- Subjects
- Humans, Cluster Analysis, Computer Simulation, Risk Factors, Models, Statistical, Algorithms
- Abstract
The spatial distribution of disease cases can provide important insights into disease spread and its potential risk factors. Identifying disease clusters correctly can help us discover new risk factors and inform interventions to control and prevent the spread of disease as quickly as possible. In this study, we propose a novel scan method, the Prefiltered Component-based Greedy (PreCoG) scan method, which efficiently and accurately detects irregularly shaped clusters using a prefiltered component-based algorithm. The PreCoG scan method's flexibility allows it to perform well in detecting both regularly and irregularly-shaped clusters. Additionally, it is fast to apply while providing high power, sensitivity, and positive predictive value for the detected clusters compared to other scan methods. To confirm the effectiveness of the PreCoG method, we compare its performance to many other scan methods. Additionally, we have implemented this method in the smerc R package to make it publicly available to other researchers. Our proposed PreCoG scan method presents a unique and innovative process for detecting disease clusters and can improve the accuracy of disease surveillance systems., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
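The PreCoG entry above concerns spatial scan statistics for cluster detection. As a point of reference for what such methods evaluate, here is a minimal Python sketch of a Kulldorff-type Poisson likelihood-ratio statistic for a single candidate zone; it is not the PreCoG algorithm or the smerc implementation, and the counts are made up.

```python
import numpy as np

def poisson_scan_stat(cases_in, expected_in, cases_total, expected_total):
    """Kulldorff-style Poisson likelihood-ratio statistic for one candidate
    cluster: compares observed vs expected counts inside and outside the zone.
    Returns 0 when the zone shows no excess risk."""
    c_in, e_in = cases_in, expected_in
    c_out, e_out = cases_total - cases_in, expected_total - expected_in
    if c_in / e_in <= c_out / e_out:          # only high-rate zones count
        return 0.0
    llr = 0.0
    if c_in > 0:
        llr += c_in * np.log(c_in / e_in)
    if c_out > 0:
        llr += c_out * np.log(c_out / e_out)
    return llr

# Toy example: 30 of 100 cases fall in a zone expected to hold 15 of them.
print(poisson_scan_stat(cases_in=30, expected_in=15,
                        cases_total=100, expected_total=100))
```

Scan methods differ mainly in how candidate zones are generated and filtered; the statistic above is the shared scoring ingredient.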
3. Periodically correlated time series and the Variable Bandpass Periodic Block Bootstrap.
- Author
- Valachovic EL
- Subjects
- Time Factors, Computer Simulation, Models, Statistical, Algorithms
- Abstract
This research introduces a novel approach to resampling periodically correlated time series using bandpass filters for frequency separation called the Variable Bandpass Periodic Block Bootstrap and then examines the significant advantages of this new method. While bootstrapping allows estimation of a statistic's sampling distribution by resampling the original data with replacement, and block bootstrapping is a model-free resampling strategy for correlated time series data, both fail to preserve correlations in periodically correlated time series. Existing extensions of the block bootstrap help preserve the correlation structures of periodically correlated processes but suffer from flaws and inefficiencies. Analyses of time series data containing cyclic, seasonal, or periodically correlated principal components often seen in annual, daily, or other cyclostationary processes benefit from separating these components. The Variable Bandpass Periodic Block Bootstrap uses bandpass filters to separate a periodically correlated component from interference such as noise at other uncorrelated frequencies. A simulation study is presented, demonstrating near universal improvements obtained from the Variable Bandpass Periodic Block Bootstrap when compared with prior block bootstrapping methods for periodically correlated time series., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Edward L. Valachovic. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
- Full Text
- View/download PDF
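The Variable Bandpass Periodic Block Bootstrap entry above combines bandpass filtering with block bootstrapping for periodically correlated series. The Python sketch below shows the two ingredients in their simplest form, a Butterworth bandpass around the cyclic frequency followed by a moving-block bootstrap of the filtered series; the filter order, band edges, and block length are illustrative choices, not those of the paper.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)

# Synthetic periodically correlated series: a period-24 cycle plus noise.
n, period = 24 * 200, 24
t = np.arange(n)
x = 2.0 * np.sin(2 * np.pi * t / period) + rng.normal(size=n)

# 1) Bandpass filter around the cyclic frequency to separate the component.
f0 = 1.0 / period                                  # cycles per sample
sos = signal.butter(4, [0.8 * f0, 1.2 * f0], btype="bandpass",
                    fs=1.0, output="sos")
x_band = signal.sosfiltfilt(sos, x)

# 2) Moving-block bootstrap of the filtered series (block length = one period).
def block_bootstrap(series, block_len, rng):
    n = len(series)
    starts = rng.integers(0, n - block_len + 1,
                          size=int(np.ceil(n / block_len)))
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

boot_means = [block_bootstrap(x_band, period, rng).mean() for _ in range(500)]
print("bootstrap SE of the filtered-component mean:", np.std(boot_means))
```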
4. Latent classification model for censored longitudinal binary outcome.
- Author
- Kuo JC, Chan W, Leon-Novelo L, Lairson DR, Brown A, and Fujimoto K
- Subjects
- Humans, Longitudinal Studies, Computer Simulation, Models, Statistical, Texas epidemiology, SARS-CoV-2, Female, COVID-19 epidemiology, Markov Chains, Latent Class Analysis, Algorithms
- Abstract
Latent classification model is a class of statistical methods for identifying unobserved class membership among the study samples using some observed data. In this study, we proposed a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (ie, small bias) with reasonable confidence intervals (ie, achieving approximately 95% coverage probability). Furthermore, we compared our model to four other existing latent class models and demonstrated that our approach yields higher prediction accuracies for latent classes. We applied our proposed method to analyze the COVID-19 data in Houston, Texas, US collected between January first 2021 and December 31st 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly by cases. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were associated with the disease., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
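The latent classification entry above models a longitudinal binary outcome as a continuous-time Markov chain with class-specific transition rates. A minimal sketch of the core computation, converting a rate matrix into transition probabilities over an observation gap via the matrix exponential, follows; the rates and the time gap are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.linalg import expm

def ctmc_transition_probs(rate_01, rate_10, dt):
    """Transition probability matrix P(dt) = expm(Q * dt) for a two-state
    (binary outcome) continuous-time Markov chain with generator Q."""
    Q = np.array([[-rate_01, rate_01],
                  [rate_10, -rate_10]])
    return expm(Q * dt)

# Hypothetical class-specific rates (per unit time) for a "fast-progressing"
# and a "slow-progressing" latent class; probabilities over a gap of 2 units.
for label, r01, r10 in [("fast class", 0.8, 0.1), ("slow class", 0.2, 0.3)]:
    P = ctmc_transition_probs(r01, r10, dt=2.0)
    print(label, "P(0 -> 1 within dt):", round(P[0, 1], 3))
```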
5. Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.
- Author
- Zhao K, Oualkacha K, Zeng Y, Shen C, Klein K, Lakhal-Chaieb L, Labbe A, Pastinen T, Hudson M, Colmegna I, Bernatsky S, and Greenwood CMT
- Subjects
- Humans, Multivariate Analysis, Arthritis, Rheumatoid genetics, Likelihood Functions, Sulfites chemistry, Sequence Analysis, DNA methods, DNA Methylation, Algorithms, Computer Simulation, Models, Statistical
- Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS.", (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
6. A multivariate to multivariate approach for voxel-wise genome-wide association analysis.
- Author
- Wu Q, Zhang Y, Huang X, Ma T, Hong LE, Kochunov P, and Chen S
- Subjects
- Humans, Multivariate Analysis, White Matter diagnostic imaging, Connectome methods, Models, Statistical, Brain diagnostic imaging, Corpus Callosum diagnostic imaging, Genome-Wide Association Study methods, Polymorphism, Single Nucleotide, Computer Simulation, Algorithms
- Abstract
The joint analysis of imaging-genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel-wise genome-wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)-voxel pairs. We attempt to identify underlying organized association patterns of SNP-voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi-clique graph structure (ie, a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP-voxel bi-cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel-level white matter integrity data collected from 1052 participants of the human connectome project. The results demonstrate multiple genetic loci influencing white matter integrity measures on splenium and genu of the corpus callosum., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
7. Accelerating joint species distribution modelling with Hmsc-HPC by GPU porting.
- Author
- Rahman AU, Tikhonov G, Oksanen J, Rossi T, and Ovaskainen O
- Subjects
- Models, Biological, Machine Learning, Computer Graphics, Models, Statistical, Humans, Algorithms, Computational Biology methods, Software
- Abstract
Joint species distribution modelling (JSDM) is a widely used statistical method that analyzes combined patterns of all species in a community, linking empirical data to ecological theory and enhancing community-wide prediction tasks. However, fitting JSDMs to large datasets is often computationally demanding and time-consuming. Recent studies have introduced new statistical and machine learning techniques to provide more scalable fitting algorithms, but extending these to complex JSDM structures that account for spatial dependencies or multi-level sampling designs remains challenging. In this study, we aim to enhance JSDM scalability by leveraging high-performance computing (HPC) resources for an existing fitting method. Our work focuses on the Hmsc R-package, a widely used JSDM framework that supports the integration of various dataset types into a single comprehensive model. We developed a GPU-compatible implementation of its model-fitting algorithm using Python and the TensorFlow library. Despite these changes, our enhanced framework retains the original user interface of the Hmsc R-package. We evaluated the performance of the proposed implementation across various model configurations and dataset sizes. Our results show a significant increase in model fitting speed for most models compared to the baseline Hmsc R-package. For the largest datasets, we achieved speed-ups of over 1000 times, demonstrating the substantial potential of GPU porting for previously CPU-bound JSDM software. This advancement opens promising opportunities for better utilizing the rapidly accumulating new biodiversity data resources for inference and prediction., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Rahman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
- Full Text
- View/download PDF
8. A 2PLM-RANK multidimensional forced-choice model and its fast estimation algorithm.
- Author
- Zheng C, Liu J, Li Y, Xu P, Zhang B, Wei R, Zhang W, Liu B, and Huang J
- Subjects
- Humans, Computer Simulation, Models, Statistical, Models, Psychological, Algorithms, Choice Behavior physiology
- Abstract
High-stakes non-cognitive tests frequently employ forced-choice (FC) scales to deter faking. To mitigate the resulting issue of score ipsativity, many scoring models have been devised. Among them, the multi-unidimensional pairwise preference (MUPP) framework is highly flexible and commonly used. However, the original MUPP model was developed for an unfolding response process and can only handle paired comparisons. The present study proposes the 2PLM-RANK as a generalization of the MUPP model to accommodate dominance RANK-format responses. In addition, an improved stochastic EM (iStEM) algorithm is devised for more stable and efficient parameter estimation. Simulation results generally supported the efficiency and utility of the new algorithm in estimating the 2PLM-RANK when applied to both triplets and tetrads across various conditions. An empirical illustration with responses to a 24-dimensional personality test further supported the practicality of the proposed model. To further aid in the application of the new model, a user-friendly R package is also provided., (© 2024. The Psychonomic Society, Inc.)
- Published
- 2024
- Full Text
- View/download PDF
9. Simultaneous multi-transient linear-combination modeling of MRS data improves uncertainty estimation.
- Author
- Zöllner HJ, Davies-Jenkins C, Simicic D, Tal A, Sulam J, and Oeltzschner G
- Subjects
- Humans, Reproducibility of Results, Linear Models, Sensitivity and Specificity, Signal-To-Noise Ratio, gamma-Aminobutyric Acid metabolism, Models, Statistical, Magnetic Resonance Spectroscopy methods, Computer Simulation, Monte Carlo Method, Algorithms
- Abstract
Purpose: The interest in applying and modeling dynamic MRS has recently grown. Two-dimensional modeling yields advantages for the precision of metabolite estimation in interrelated MRS data. However, it is unknown whether including all transients simultaneously in a 2D model without averaging (presuming a stable signal) performs similarly to one-dimensional (1D) modeling of the averaged spectrum. Therefore, we systematically investigated the accuracy, precision, and uncertainty estimation of both described model approaches., Methods: Monte Carlo simulations of synthetic MRS data were used to compare the accuracy and uncertainty estimation of simultaneous 2D multitransient linear-combination modeling (LCM) with 1D-LCM of the average. A total of 2,500 data sets per condition with different noise representations of a 64-transient MRS experiment at six signal-to-noise levels for two separate spin systems (scyllo-inositol and gamma-aminobutyric acid) were analyzed. Additional data sets with different levels of noise correlation were also analyzed. Modeling accuracy was assessed by determining the relative bias of the estimated amplitudes against the ground truth, and modeling precision was determined by SDs and Cramér-Rao lower bounds (CRLBs)., Results: Amplitude estimates for 1D- and 2D-LCM agreed well and showed a similar level of bias compared with the ground truth. Estimated CRLBs agreed well between both models and with ground-truth CRLBs. For correlated noise, the estimated CRLBs increased with the correlation strength for the 1D-LCM but remained stable for the 2D-LCM., Conclusion: Our results indicate that the model performance of 2D multitransient LCM is similar to averaged 1D-LCM. This validation on a simplified scenario serves as a necessary basis for further applications of 2D modeling., (© 2024 International Society for Magnetic Resonance in Medicine.)
- Published
- 2024
- Full Text
- View/download PDF
10. Advanced OCTA imaging segmentation: Unsupervised, non-linear retinal vessel detection using modified self-organizing maps and joint MGRF modeling.
- Author
- Alksas A, Sharafeldeen A, Balaha HM, Haq MZ, Mahmoud A, Ghazal M, Alghamdi NS, Alhalabi M, Yousaf J, Sandhu H, and El-Baz A
- Subjects
- Humans, Image Processing, Computer-Assisted methods, Markov Chains, Retinal Diseases diagnostic imaging, Models, Statistical, Diagnosis, Computer-Assisted methods, Angiography methods, Retinal Vessels diagnostic imaging, Tomography, Optical Coherence methods, Algorithms
- Abstract
Background and Objective: This paper proposes a fully automated and unsupervised stochastic segmentation approach using two-level joint Markov-Gibbs Random Field (MGRF) to detect the vascular system from retinal Optical Coherence Tomography Angiography (OCTA) images, which is a critical step in developing Computer-Aided Diagnosis (CAD) systems for detecting retinal diseases., Methods: Using a new probabilistic model based on a Linear Combination of Discrete Gaussian (LCDG), the first level models the appearance of OCTA images and their spatially smoothed images. The parameters of the LCDG model are estimated using a modified Expectation Maximization (EM) algorithm. The second level models the maps of OCTA images, including the vascular system and other retina tissues, using MGRF with analytically estimated parameters from the input images. The proposed segmentation approach employs modified self-organizing maps as a MAP-based optimizer maximizing the joint likelihood and handles the Joint MGRF model in a new, unsupervised way. This approach deviates from traditional stochastic optimization approaches and leverages non-linear optimization to achieve more accurate segmentation results., Results: The proposed segmentation framework is evaluated quantitatively on a dataset of 204 subjects. The framework achieved a Dice similarity coefficient of 0.92 ± 0.03, a 95th-percentile bidirectional Hausdorff distance of 0.69 ± 0.25, and an accuracy of 0.93 ± 0.03, confirming the superior performance of the proposed approach., Conclusions: The conclusions drawn from the study highlight the superior performance of the proposed unsupervised and fully automated segmentation approach in detecting the vascular system from OCTA images. This approach not only deviates from traditional methods but also achieves more accurate segmentation results, demonstrating its potential in aiding the development of CAD systems for detecting retinal diseases., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024 Elsevier B.V. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
11. deepAFT: A nonlinear accelerated failure time model with artificial neural network.
- Author
- Norman PA, Li W, Jiang W, and Chen BE
- Subjects
- Humans, Survival Analysis, Deep Learning, Models, Statistical, Neural Networks, Computer, Proportional Hazards Models, Algorithms, Computer Simulation, Nonlinear Dynamics
- Abstract
The Cox regression model or accelerated failure time regression models are often used for describing the relationship between survival outcomes and potential explanatory variables. These models assume the studied covariates are connected to the survival time or its distribution or their transformations through a function of a linear regression form. In this article, we propose nonparametric, nonlinear algorithms (deepAFT methods) based on deep artificial neural networks to model survival outcome data in the broad distribution family of accelerated failure time models. The proposed methods predict survival outcomes directly and tackle the problem of censoring via an imputation algorithm as well as re-weighting and transformation techniques based on the inverse probabilities of censoring. Through extensive simulation studies, we confirm that the proposed deepAFT methods achieve accurate predictions. They outperform the existing regression models in prediction accuracy, while being flexible and robust in modeling covariate effects of various nonlinear forms. Their prediction performance is comparable to other established deep learning methods such as deepSurv and random survival forest methods. Even though the direct output is the expected survival time, the proposed AFT methods also provide predictions for distributional functions such as the cumulative hazard and survival functions without additional learning efforts. For situations where the popular Cox regression model may not be appropriate, the deepAFT methods provide useful and effective alternatives, as shown in simulations, and demonstrated in applications to a lymphoma clinical trial study., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
12. Bayesian mixture modelling with ranked set samples.
- Author
- Alvandi A, Omidvar S, Hatefi A, Jafari Jozani M, Ozturk O, and Nematollahi N
- Subjects
- Humans, Female, Middle Aged, Aged, Computer Simulation, Monte Carlo Method, Likelihood Functions, Markov Chains, Bayes Theorem, Models, Statistical, Algorithms
- Abstract
We consider the Bayesian estimation of the parameters of a finite mixture model from independent order statistics arising from imperfect ranked set sampling designs. As a cost-effective method, ranked set sampling enables us to incorporate easily attainable characteristics, as ranking information, into data collection and Bayesian estimation. To handle the special structure of the ranked set samples, we develop a Bayesian estimation approach exploiting the Expectation-Maximization (EM) algorithm in estimating the ranking parameters and Metropolis within Gibbs Sampling to estimate the parameters of the underlying mixture model. Our findings show that the proposed RSS-based Bayesian estimation method outperforms the commonly used Bayesian counterpart using simple random sampling. The developed method is finally applied to estimate the bone disorder status of women aged 50 and older., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
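The ranked set sampling entry above builds Bayesian mixture estimation on ranked set samples. Below is a minimal Python sketch of how a balanced ranked set sample is drawn under perfect ranking, which is the data structure the paper's estimators consume; the set size, number of cycles, and mixture population are illustrative, and the paper's imperfect-ranking and Bayesian machinery are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def ranked_set_sample(draw, set_size, n_cycles, rng):
    """Balanced ranked set sample under perfect ranking: in each cycle, draw
    `set_size` sets of `set_size` units, rank each set, and keep the r-th
    order statistic from the r-th set."""
    rss = []
    for _ in range(n_cycles):
        for r in range(set_size):
            candidates = np.sort(draw(set_size, rng))
            rss.append((r + 1, candidates[r]))   # (judged rank, measured value)
    return np.array(rss)

# Example population: a two-component Gaussian mixture.
def draw_mixture(n, rng):
    comp = rng.random(n) < 0.4
    return np.where(comp, rng.normal(0, 1, n), rng.normal(3, 1, n))

sample = ranked_set_sample(draw_mixture, set_size=3, n_cycles=50, rng=rng)
print(sample[:5])           # columns: rank stratum, measured value
```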
13. Distributed non-disclosive validation of predictive models by a modified ROC-GLM.
- Author
- Schalk D, Rehms R, Hoffmann VS, Bischl B, and Mansmann U
- Subjects
- Humans, Linear Models, Models, Statistical, Privacy, Databases, Factual statistics & numerical data, ROC Curve, Algorithms, Area Under Curve
- Abstract
Background: Distributed statistical analyses provide a promising approach for privacy protection when analyzing data distributed over several databases. Instead of directly operating on data, the analyst receives anonymous summary statistics, which are combined into an aggregated result. Further, in discrimination model (prognosis, diagnosis, etc.) development, it is key to evaluate a trained model with respect to its prognostic or predictive performance on new independent data. For binary classification, quantifying discrimination uses the receiver operating characteristics (ROC) and its area under the curve (AUC) as the aggregation measure. We are interested in calculating both, as well as basic indicators of calibration-in-the-large, for a binary classification task using a distributed and privacy-preserving approach., Methods: We employ DataSHIELD as the technology to carry out distributed analyses, and we use a newly developed algorithm to validate the prediction score by conducting distributed and privacy-preserving ROC analysis. Calibration curves are constructed from mean values over sites. The determination of ROC and its AUC is based on a generalized linear model (GLM) approximation of the true ROC curve, the ROC-GLM, as well as on ideas of differential privacy (DP). DP adds noise (quantified by the ℓ2 sensitivity Δ2(f̂)) to the data and enables a global handling of placement numbers. The impact of DP parameters was studied by simulations., Results: In our simulation scenario, the true and distributed AUC measures differ by ΔAUC < 0.01, depending heavily on the choice of the differential privacy parameters. It is recommended to check the accuracy of the distributed AUC estimator in specific simulation scenarios along with a reasonable choice of DP parameters. Here, the accuracy of the distributed AUC estimator may be impaired by too much artificial noise added from DP., Conclusions: The applicability of our algorithms depends on the ℓ2 sensitivity Δ2(f̂) of the underlying statistical/predictive model. The simulations carried out have shown that the approximation error is acceptable for the majority of simulated cases. For models with high Δ2(f̂), the privacy parameters must be set accordingly higher to ensure sufficient privacy protection, which affects the approximation error. This work shows that complex measures, such as the AUC, are applicable for validation in distributed setups while preserving an individual's privacy., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
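The distributed ROC-GLM entry above calibrates differential-privacy noise to the ℓ2 sensitivity of the released statistics. A minimal sketch of the standard Gaussian mechanism applied to a vector of summary values is given below; it is not DataSHIELD or ROC-GLM code, and the sensitivity and privacy parameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

def gaussian_mechanism(values, l2_sensitivity, epsilon, delta, rng):
    """(epsilon, delta)-DP Gaussian mechanism: add N(0, sigma^2) noise with
    sigma proportional to the L2 sensitivity of the released statistic."""
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return values + rng.normal(0.0, sigma, size=np.shape(values))

# Toy summary statistics (e.g. per-case placement values rescaled to [0, 1]).
placements = np.array([0.91, 0.75, 0.88, 0.60, 0.95])
noisy = gaussian_mechanism(placements, l2_sensitivity=0.2,
                           epsilon=1.0, delta=1e-5, rng=rng)
print(np.round(noisy, 3))
```

As the abstract notes, a large sensitivity forces a large noise scale, which is exactly where the distributed AUC approximation degrades.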
14. A practically efficient algorithm for identifying critical control proteins in directed probabilistic biological networks.
- Author
- Tokuhara Y, Akutsu T, Schwartz JM, and Nacher JC
- Subjects
- Humans, Computational Biology methods, Proteins metabolism, Proteins genetics, Probability, Models, Biological, Models, Statistical, Systems Biology methods, Algorithms, COVID-19, Signal Transduction physiology, Signal Transduction genetics, SARS-CoV-2
- Abstract
Network controllability unifies traditional control theory with the structural network information rooted in many large-scale biological systems of interest, from intracellular networks in molecular biology to brain neuronal networks. In controllability approaches, the set of minimum driver nodes is not unique, and critical nodes are the most important control elements because they appear in all possible solution sets. On the other hand, a common but largely unexplored feature in network control approaches is the probabilistic failure of edges or the uncertainty in the determination of interactions between molecules. This is particularly true when directed probabilistic interactions are considered. Until now, no efficient algorithm existed to determine critical nodes in probabilistic directed networks. Here we present a probabilistic control model based on a minimum dominating set framework that integrates the probabilistic nature of directed edges between molecules and determines the critical control nodes that drive the entire network functionality. The proposed algorithm, combined with the developed mathematical tools, offers practical efficiency in determining critical control nodes in large probabilistic networks. The method is then applied to the human intracellular signal transduction network, revealing that critical control nodes are associated with important biological features and perturbed sets of genes in human diseases, including SARS-CoV-2 target proteins and rare disorders. We believe that the proposed methodology can be useful to investigate multiple biological systems in which directed edges are probabilistic in nature, both in natural systems and when determined with large uncertainties in silico., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
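The network controllability entry above frames critical control nodes via a minimum dominating set on a directed network. For orientation, here is a minimal greedy Python sketch of the deterministic dominating-set problem on a toy directed graph; the paper's probabilistic edges, exact optimization, and identification of critical nodes are not reproduced.

```python
def greedy_dominating_set(nodes, out_edges):
    """Greedy heuristic for a directed dominating set: repeatedly pick the
    node that newly dominates the most still-uncovered nodes (itself plus
    its out-neighbours)."""
    uncovered = set(nodes)
    chosen = []
    while uncovered:
        best = max(nodes,
                   key=lambda v: len(({v} | out_edges.get(v, set())) & uncovered))
        chosen.append(best)
        uncovered -= {best} | out_edges.get(best, set())
    return chosen

# Toy signalling network: node -> set of downstream targets.
edges = {"A": {"B", "C"}, "B": {"D"}, "C": {"D", "E"}, "E": {"F"}}
print(greedy_dominating_set(["A", "B", "C", "D", "E", "F"], edges))
```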
15. rtestim: Time-varying reproduction number estimation with trend filtering.
- Author
- Liu J, Cai Z, Gustafson P, and McDonald DJ
- Subjects
- Humans, Computational Biology methods, Communicable Diseases epidemiology, Computer Simulation, Software, Epidemiological Models, Poisson Distribution, Models, Statistical, Algorithms, Basic Reproduction Number
- Abstract
To understand the transmissibility and spread of infectious diseases, epidemiologists turn to estimates of the instantaneous reproduction number. While many estimation approaches exist, their utility may be limited. Challenges of surveillance data collection, model assumptions that are unverifiable with data alone, and computationally inefficient frameworks are critical limitations for many existing approaches. We propose a discrete spline-based approach that solves a convex optimization problem (Poisson trend filtering) using the proximal Newton method. It produces a locally adaptive estimator for instantaneous reproduction number estimation with heterogeneous smoothness. Our methodology remains accurate even under some process misspecifications and is computationally efficient, even for large-scale data. The implementation is easily accessible in a lightweight R package rtestim., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
- Full Text
- View/download PDF
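The rtestim entry above estimates the instantaneous reproduction number by Poisson trend filtering. The Python sketch below shows only the renewal-equation quantity being targeted, a crude ratio of current incidence to generation-interval-weighted past incidence, not the rtestim estimator itself; the case counts and generation-interval weights are made up.

```python
import numpy as np

def naive_rt(cases, gen_interval):
    """Crude instantaneous reproduction number: R_t = I_t / sum_s w_s * I_{t-s}.
    (rtestim instead estimates a smooth R_t curve via Poisson trend filtering.)"""
    w = np.asarray(gen_interval) / np.sum(gen_interval)
    rt = np.full(len(cases), np.nan)
    for t in range(len(w), len(cases)):
        denom = np.dot(w, cases[t - len(w):t][::-1])   # weighted past incidence
        if denom > 0:
            rt[t] = cases[t] / denom
    return rt

cases = np.array([5, 8, 12, 20, 31, 45, 60, 70, 72, 68, 60, 50])
gen_interval = [0.2, 0.5, 0.3]          # weights for lags 1, 2, 3
print(np.round(naive_rt(cases, gen_interval), 2))
```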
16. jmBIG: enhancing dynamic risk prediction and personalized medicine through joint modeling of longitudinal and survival data in big routinely collected data.
- Author
- Bhattacharjee A, Rajbongshi BK, and Vishwakarma GK
- Subjects
- Humans, Longitudinal Studies, Survival Analysis, Risk Assessment methods, Risk Assessment statistics & numerical data, Models, Statistical, Software, Precision Medicine methods, Precision Medicine statistics & numerical data, Bayes Theorem, Algorithms, Big Data
- Abstract
We have introduced the R package jmBIG to facilitate the analysis of large healthcare datasets and the development of predictive models. This package provides a comprehensive set of tools and functions specifically designed for the joint modelling of longitudinal and survival data in the context of big data analytics. The jmBIG package offers efficient and scalable implementations of joint modelling algorithms, allowing for integrating large-scale healthcare datasets. By utilizing the capabilities of jmBIG, researchers and analysts can effectively handle the challenges associated with big healthcare data, such as high dimensionality and complex relationships between multiple outcomes. With the support of jmBIG, analysts can seamlessly fit Bayesian joint models, generate predictions, and evaluate the performance of the models. The package incorporates cutting-edge methodologies and harnesses the computational capabilities of parallel computing to accelerate the analysis of large-scale healthcare datasets significantly. In summary, jmBIG empowers researchers to gain deeper insights into disease progression and treatment response, fostering evidence-based decision-making and paving the way for personalized healthcare interventions that can positively impact patient outcomes on a larger scale., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
17. A Bayesian adaptive design approach for stepped-wedge cluster randomized trials.
- Author
- Wang J, Cao J, Ahn C, and Zhang S
- Subjects
- Humans, Cluster Analysis, Computer Simulation, Models, Statistical, Sample Size, Bayes Theorem, Research Design, Randomized Controlled Trials as Topic methods, Algorithms
- Abstract
Background: The Bayesian group sequential design has been applied widely in clinical studies, especially in Phase II and III studies. It allows early termination based on accumulating interim data. However, to date, its application to stepped-wedge cluster randomized trials, which are gaining popularity in pragmatic trials conducted by clinical and health care delivery researchers, remains undeveloped., Methods: We propose a Bayesian adaptive design approach for stepped-wedge cluster randomized trials, which makes adaptive decisions based on the predictive probability of declaring the intervention effective at the end of the study given interim data. The Bayesian models and the algorithms for posterior inference and trial conduct are presented., Results: We present how to determine design parameters through extensive simulations to achieve desired operational characteristics. We further evaluate how various design factors, such as the number of steps, cluster size, random variability in cluster size, and correlation structures, impact trial properties, including power, type I error, and the probability of early stopping. An application example is presented., Conclusion: This study presents the incorporation of Bayesian adaptive strategies into the design of stepped-wedge cluster randomized trials. The proposed approach provides the flexibility to stop the trial early if substantial evidence of efficacy or futility is observed, improving the flexibility and efficiency of stepped-wedge cluster randomized trials., Competing Interests: Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
- Published
- 2024
- Full Text
- View/download PDF
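The Bayesian adaptive design entry above makes interim decisions from the predictive probability of declaring the intervention effective at the final analysis. Below is a minimal Python sketch of that quantity for a deliberately simplified single-arm binomial setting with a Beta prior, rather than the paper's stepped-wedge model; all thresholds and counts are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def predictive_prob_success(x_obs, n_obs, n_max, p0=0.3, a=1, b=1,
                            final_threshold=0.95, n_draws=20_000, seed=0):
    """P(final analysis declares efficacy | interim data) for a single-arm
    binomial trial with a Beta(a, b) prior on the response rate."""
    rng = np.random.default_rng(seed)
    p_draws = rng.beta(a + x_obs, b + n_obs - x_obs, size=n_draws)   # posterior
    x_future = rng.binomial(n_max - n_obs, p_draws)                  # predictive
    x_final = x_obs + x_future
    # Posterior probability that the rate exceeds p0 given the completed data:
    post_prob = 1 - stats.beta.cdf(p0, a + x_final, b + n_max - x_final)
    return np.mean(post_prob > final_threshold)

# Interim look: 12 responses in 25 patients, with 60 planned in total.
print(predictive_prob_success(x_obs=12, n_obs=25, n_max=60))
```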
18. Model-agnostic unsupervised detection of bots in a Likert-type questionnaire.
- Author
- Ilagan MJ and Falk CF
- Subjects
- Humans, Surveys and Questionnaires, Models, Statistical, Computer Simulation, Data Interpretation, Statistical, Algorithms
- Abstract
To detect bots in online survey data, there is a wealth of literature on statistical detection using only responses to Likert-type items. There are two traditions in the literature. One tradition requires labeled data, forgoing strong model assumptions. The other tradition requires a measurement model, forgoing collection of labeled data. In the present article, we consider the problem where neither requirement is available, for an inventory that has the same number of Likert-type categories for all items. We propose a bot detection algorithm that is both model-agnostic and unsupervised. Our proposed algorithm involves a permutation test with leave-one-out calculations of outlier statistics. For each respondent, it outputs a p value for the null hypothesis that the respondent is a bot. Such an algorithm offers nominal sensitivity calibration that is robust to the bot response distribution. In a simulation study, we found our proposed algorithm to improve upon naive alternatives in terms of 95% sensitivity calibration and, in many scenarios, in terms of classification accuracy., (© 2023. The Psychonomic Society, Inc.)
- Published
- 2024
- Full Text
- View/download PDF
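The bot-detection entry above tests, per respondent, the null hypothesis that the respondent is a bot using a permutation scheme with leave-one-out statistics. The Python sketch below is one simplified reading of that idea: if a bot answers without regard to item content, its responses are exchangeable across items, so permuting them yields a null distribution for a leave-one-out agreement statistic. The statistic and the toy data are illustrative, not the authors' construction.

```python
import numpy as np

def bot_p_value(responses, respondent_idx, n_perm=2000, seed=0):
    """Permutation p-value for H0: the respondent answered the Likert items
    without regard to item content (a bot). The statistic is the leave-one-out
    agreement between the respondent's answers and the item means of everyone
    else; under H0 the answers are exchangeable across items."""
    rng = np.random.default_rng(seed)
    x = responses[respondent_idx].astype(float)
    others = np.delete(responses, respondent_idx, axis=0).mean(axis=0)
    observed = np.corrcoef(x, others)[0, 1]
    perm_stats = np.array([np.corrcoef(rng.permutation(x), others)[0, 1]
                           for _ in range(n_perm)])
    return (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)

# Toy data: 49 "humans" sharing an item profile, plus one uniform-random bot.
rng = np.random.default_rng(1)
profile = rng.integers(1, 6, size=20)
humans = np.clip(profile + rng.integers(-1, 2, size=(49, 20)), 1, 5)
bot = rng.integers(1, 6, size=(1, 20))
data = np.vstack([humans, bot])
print("human p:", bot_p_value(data, 0), " bot p:", bot_p_value(data, 49))
```

A small p value rejects the bot hypothesis, so genuine respondents should produce small values while bots produce roughly uniform ones.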
19. An enhanced cross-sectional HIV incidence estimator that incorporates prior HIV test results.
- Author
- Bannick M, Donnell D, Hayes R, Laeyendecker O, and Gao F
- Subjects
- Humans, Incidence, Cross-Sectional Studies, Computer Simulation, Models, Statistical, Male, Randomized Controlled Trials as Topic, HIV Testing statistics & numerical data, Female, Sensitivity and Specificity, HIV Infections epidemiology, Algorithms
- Abstract
Incidence estimation of HIV infection can be performed using recent infection testing algorithm (RITA) results from a cross-sectional sample. This allows practitioners to understand population trends in the HIV epidemic without having to perform longitudinal follow-up on a cohort of individuals. The utility of the approach is limited by its precision, driven by the (low) sensitivity of the RITA at identifying recent infection. By utilizing results of previous HIV tests that individuals may have taken, we consider an enhanced RITA with increased sensitivity (and specificity). We use it to propose an enhanced estimator for incidence estimation. We prove the theoretical properties of the enhanced estimator and illustrate its numerical performance in simulation studies. We apply the estimator to data from a cluster-randomized trial to study the effect of community-level HIV interventions on HIV incidence. We demonstrate that the enhanced estimator provides a more precise estimate of HIV incidence compared to the standard estimator., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
20. Sample size determination for prediction models via learning-type curves.
- Author
- Dayimu A, Simidjievski N, Demiris N, and Abraham J
- Subjects
- Humans, Sample Size, Learning Curve, Normal Distribution, Computer Simulation, Survival Analysis, Models, Statistical, Algorithms
- Abstract
This article is concerned with sample size determination methodology for prediction models. We propose to combine the individual calculations via learning-type curves. We suggest two distinct ways of doing so, a deterministic skeleton of a learning curve and a Gaussian process centered upon its deterministic counterpart. We employ several learning algorithms for modeling the primary endpoint and distinct measures for trial efficacy. We find that the performance may vary with the sample size, but borrowing information across sample size universally improves the performance of such calculations. The Gaussian process-based learning curve appears more robust and statistically efficient, while computational efficiency is comparable. We suggest that anchoring against historical evidence when extrapolating sample sizes should be adopted when such data are available. The methods are illustrated on binary and survival endpoints., (© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
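The sample size entry above extrapolates prediction performance over sample size using learning-type curves. Here is a minimal Python sketch of the deterministic-skeleton idea: fit an inverse power-law curve to pilot performance estimates and read off the sample size that reaches a target; the pilot AUC values, curve form, and target are assumptions for illustration, and the paper's Gaussian-process variant is not shown.

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, c):
    """Inverse power-law skeleton: expected performance at sample size n."""
    return a - b * n ** (-c)

# Hypothetical pilot estimates of AUC at a few training-set sizes.
n_pilot = np.array([100, 200, 400, 800])
auc_pilot = np.array([0.70, 0.74, 0.77, 0.79])

params, _ = curve_fit(learning_curve, n_pilot, auc_pilot,
                      p0=[0.85, 1.0, 0.5], maxfev=10_000)

# Smallest n whose extrapolated performance reaches the target.
target = 0.80
grid = np.arange(100, 20_001, 50)
reachable = grid[learning_curve(grid, *params) >= target]
print("estimated n for AUC >= 0.80:",
      reachable[0] if len(reachable) else "not reachable")
```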
21. A fast bootstrap algorithm for causal inference with large data.
- Author
- Kosko M, Wang L, and Santacatterina M
- Subjects
- Humans, Female, Confidence Intervals, Coronary Disease epidemiology, Models, Statistical, Data Interpretation, Statistical, Bias, Observational Studies as Topic methods, Observational Studies as Topic statistics & numerical data, Algorithms, Causality, Computer Simulation
- Abstract
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application however can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
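The causal bag of little bootstraps entry above adapts the bag of little bootstraps to causal estimators. The Python sketch below shows the generic BLB mechanic on which it builds, applied to a simple mean rather than a causal effect: small subsets are inflated to the full sample size through multinomial weights, and the subset-level confidence intervals are averaged. Subset size, replicate counts, and data are illustrative.

```python
import numpy as np

def bag_of_little_bootstraps(x, n_subsets=10, subset_exponent=0.6,
                             n_boot=100, alpha=0.05, seed=0):
    """BLB confidence interval for the mean: each subset of size b = n^0.6 is
    'inflated' to size n with multinomial weights, so no n-sized resample is
    ever materialized."""
    rng = np.random.default_rng(seed)
    n = len(x)
    b = int(n ** subset_exponent)
    ci_lows, ci_highs = [], []
    for _ in range(n_subsets):
        subset = rng.choice(x, size=b, replace=False)
        stats = []
        for _ in range(n_boot):
            w = rng.multinomial(n, np.full(b, 1.0 / b))   # weights summing to n
            stats.append(np.dot(w, subset) / n)           # weighted mean
        ci_lows.append(np.quantile(stats, alpha / 2))
        ci_highs.append(np.quantile(stats, 1 - alpha / 2))
    return np.mean(ci_lows), np.mean(ci_highs)            # average the subset CIs

x = np.random.default_rng(1).exponential(scale=2.0, size=100_000)
print(bag_of_little_bootstraps(x))
```

In the causal version described above, the weighted mean would be replaced by a weighted causal-effect estimator (for example, a weighted outcome regression or propensity-based estimator) evaluated on each subset.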
22. Post-selection inference in regression models for group testing data.
- Author
- Shen Q, Gregory K, and Huang X
- Subjects
- Likelihood Functions, Humans, Logistic Models, Data Interpretation, Statistical, Biometry methods, Models, Statistical, Computer Simulation, Algorithms
- Abstract
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
23. Visibility graph-based covariance functions for scalable spatial analysis in non-convex partially Euclidean domains.
- Author
- Gilbert B and Datta A
- Subjects
- Models, Statistical, Normal Distribution, Biometry methods, Algorithms, Spatial Analysis, Computer Simulation
- Abstract
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
24. High-dimensional multivariate analysis of variance via geometric median and bootstrapping.
- Author
- Cheng G, Lin R, and Peng L
- Subjects
- Humans, Multivariate Analysis, Models, Statistical, Female, Data Interpretation, Statistical, Gene Expression Profiling statistics & numerical data, Sample Size, Biometry methods, Breast Neoplasms genetics, Computer Simulation, Algorithms
- Abstract
The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
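The high-dimensional MANOVA entry above bases its test statistic on group-wise geometric medians. A minimal Python sketch of the geometric median itself, computed with Weiszfeld's iteratively re-weighted algorithm, follows; the max-type statistic and wild bootstrap from the paper are not reproduced, and the data are simulated.

```python
import numpy as np

def geometric_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld's algorithm: iteratively re-weighted mean converging to the
    point minimizing the sum of Euclidean distances to the rows of X."""
    m = X.mean(axis=0)                       # start from the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.where(d < 1e-12, 1e-12, d)    # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:5] += 20                                   # a few gross outliers
print("mean norm:", round(np.linalg.norm(X.mean(axis=0)), 2),
      "| geometric median norm:", round(np.linalg.norm(geometric_median(X)), 2))
```

The example illustrates the robustness motivating the paper: a handful of outliers drags the coordinate-wise mean away from the origin while the geometric median stays close to it.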
25. Summary statistics knockoffs inference with family-wise error rate control.
- Author
- Yu CX, Gu J, Chen Z, and He Z
- Subjects
- Humans, Models, Statistical, Data Interpretation, Statistical, Biometry methods, Algorithms, Alzheimer Disease genetics, Computer Simulation
- Abstract
Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
26. PathGPS: discover shared genetic architecture using GWAS summary data.
- Author
- Gao Z, Zhao Q, and Hastie T
- Subjects
- Humans, Metabolomics methods, Principal Component Analysis, Models, Genetic, Polymorphism, Single Nucleotide, Biological Specimen Banks, Computer Simulation, Models, Statistical, Genome-Wide Association Study statistics & numerical data, Algorithms
- Abstract
The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
27. Bayesian analysis of joint quantile regression for multi-response longitudinal data with application to primary biliary cirrhosis sequential cohort study.
- Author
- Tian YZ, Tang ML, Wong C, and Tian MZ
- Subjects
- Humans, Longitudinal Studies, Cohort Studies, Regression Analysis, Models, Statistical, Likelihood Functions, Bayes Theorem, Liver Cirrhosis, Biliary, Monte Carlo Method, Markov Chains, Algorithms
- Abstract
This article proposes a Bayesian approach for jointly estimating marginal conditional quantiles of multi-response longitudinal data with a multivariate mixed effects model. The multivariate asymmetric Laplace distribution is employed to construct the working likelihood of the considered model. Penalization priors on regression parameters are incorporated into the working likelihood to conduct Bayesian high-dimensional inference. A Markov chain Monte Carlo algorithm is used to obtain the fully conditional posterior distributions of all parameters and latent variables. Monte Carlo simulations are conducted to evaluate the sample performance of the proposed joint quantile regression approach. Finally, we analyze a longitudinal medical dataset of the primary biliary cirrhosis sequential cohort study to illustrate the real application of the proposed modeling method., Competing Interests: Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
- Published
- 2024
- Full Text
- View/download PDF
28. A structured iterative division approach for non-sparse regression models and applications in biological data analysis.
- Author
- Yu S and Yang Y
- Subjects
- Humans, Regression Analysis, Models, Statistical, Female, Prognosis, Algorithms, Breast Neoplasms genetics, Alzheimer Disease genetics
- Abstract
In this paper, we focus on the modeling problem of estimating data with non-sparse structures, with particular attention to biological data that exhibit a large number of relevant features. Various fields, such as biology and finance, face the challenge of non-sparse estimation. We address the problems using the proposed method, called structured iterative division. Structured iterative division effectively divides data into non-sparse and sparse structures and eliminates numerous irrelevant variables, significantly reducing the error while maintaining computational efficiency. Numerical and theoretical results demonstrate the competitive advantage of the proposed method on a wide range of problems, and the proposed method exhibits excellent statistical performance in numerical comparisons with several existing methods. We apply the proposed algorithm to two biological problems, gene microarray datasets and chimeric protein datasets, addressing the prognostic risk of distant metastasis in breast cancer and Alzheimer's disease, respectively. Structured iterative division provides insights into gene identification and selection, and we also provide meaningful results in anticipating cancer risk and identifying key factors., Competing Interests: Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
- Published
- 2024
- Full Text
- View/download PDF
29. Joint structure learning and causal effect estimation for categorical graphical models.
- Author
- Castelletti F, Consonni G, and Della Vedova ML
- Subjects
- Humans, Anxiety, Biometry methods, Markov Chains, Causality, Computer Simulation, Algorithms, Monte Carlo Method, Models, Statistical, Depression
- Abstract
The scope of this paper is a multivariate setting involving categorical variables. Following an external manipulation of one variable, the goal is to evaluate the causal effect on an outcome of interest. A typical scenario involves a system of variables representing lifestyle, physical and mental features, symptoms, and risk factors, with the outcome being the presence or absence of a disease. These variables are interconnected in complex ways, allowing the effect of an intervention to propagate through multiple paths. A distinctive feature of our approach is the estimation of causal effects while accounting for uncertainty in both the dependence structure, which we represent through a directed acyclic graph (DAG), and the DAG-model parameters. Specifically, we propose a Markov chain Monte Carlo algorithm that targets the joint posterior over DAGs and parameters, based on an efficient reversible-jump proposal scheme. We validate our method through extensive simulation studies and demonstrate that it outperforms current state-of-the-art procedures in terms of estimation accuracy. Finally, we apply our methodology to analyze a dataset on depression and anxiety in undergraduate students., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
30. Reduced-rank clustered coefficient regression for addressing multicollinearity in heterogeneous coefficient estimation.
- Author
- Zhong Y, He K, and Li G
- Subjects
- Humans, Cluster Analysis, Regression Analysis, SARS-CoV-2, Biometry methods, Data Interpretation, Statistical, Algorithms, COVID-19, Computer Simulation, Models, Statistical
- Abstract
Clustered coefficient regression (CCR) extends the classical regression model by allowing regression coefficients varying across observations and forming clusters of observations. It has become an increasingly useful tool for modeling the heterogeneous relationship between the predictor and response variables. A typical issue of existing CCR methods is that the estimation and clustering results can be unstable in the presence of multicollinearity. To address the instability issue, this paper introduces a low-rank structure of the CCR coefficient matrix and proposes a penalized non-convex optimization problem with an adaptive group fusion-type penalty tailor-made for this structure. An iterative algorithm is developed to solve this non-convex optimization problem with guaranteed convergence. An upper bound for the coefficient estimation error is also obtained to show the statistical property of the estimator. Empirical studies on both simulated datasets and a COVID-19 mortality rate dataset demonstrate the superiority of the proposed method to existing methods., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
31. Finite Mixtures of Latent Trait Analyzers With Concomitant Variables for Bipartite Networks: An Analysis of COVID-19 Data.
- Author
- Failli D, Marino MF, and Martella F
- Subjects
- Humans, Computer Simulation, Cluster Analysis, SARS-CoV-2, COVID-19, Algorithms, Models, Statistical
- Abstract
Networks consist of interconnected units, known as nodes, and allow interactions within a system to be described formally. Specifically, bipartite networks depict relationships between two distinct sets of nodes, designated as sending and receiving nodes. An integral aspect of bipartite network analysis often involves identifying clusters of nodes with similar behaviors. The computational complexity of models for large bipartite networks poses a challenge. To mitigate this challenge, we employ a Mixture of Latent Trait Analyzers (MLTA) for node clustering. Our approach extends the MLTA to include covariates and introduces a double EM algorithm for estimation. Applying our method to COVID-19 data, with sending nodes representing patients and receiving nodes representing preventive measures, enables dimensionality reduction and the identification of meaningful groups. We present simulation results demonstrating the accuracy of the proposed method.
- Published
- 2024
- Full Text
- View/download PDF
32. Hypothesis tests in ordinal predictive models with optimal accuracy.
- Author
- Liu Y, Luo S, and Li J
- Subjects
- Humans, Likelihood Functions, ROC Curve, Biometry methods, Algorithms, Computer Simulation, Models, Statistical
- Abstract
In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample U-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
33. The multivariate Bernoulli detector: change point estimation in discrete survival analysis.
- Author
-
van den Boom W, De Iorio M, Qian F, and Guglielmi A
- Subjects
- Humans, Survival Analysis, Models, Statistical, Multivariate Analysis, Biometry methods, Markov Chains, Computer Simulation, Monte Carlo Method, Bayes Theorem, Algorithms
- Abstract
Time-to-event data are often recorded on a discrete scale with multiple, competing risks as potential causes for the event. In this context, application of continuous survival analysis methods with a single risk suffers from biased estimation. Therefore, we propose the multivariate Bernoulli detector for competing risks with discrete times involving a multivariate change point model on the cause-specific baseline hazards. Through the prior on the number of change points and their location, we impose dependence between change points across risks, as well as allowing for data-driven learning of their number. Then, conditionally on these change points, a multivariate Bernoulli prior is used to infer which risks are involved. Focus of posterior inference is cause-specific hazard rates and dependence across risks. Such dependence is often present due to subject-specific changes across time that affect all risks. Full posterior inference is performed through a tailored local-global Markov chain Monte Carlo (MCMC) algorithm, which exploits a data augmentation trick and MCMC updates from nonconjugate Bayesian nonparametric methods. We illustrate our model in simulations and on ICU data, comparing its performance with existing approaches., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
34. Advancing unanchored simulated treatment comparisons: A novel implementation and simulation study.
- Author
-
Ren S, Ren S, Welton NJ, and Strong M
- Subjects
- Humans, Research Design, Models, Statistical, Treatment Outcome, Reproducibility of Results, Bias, Data Interpretation, Statistical, Computer Simulation, Algorithms, Technology Assessment, Biomedical
- Abstract
Population-adjusted indirect comparisons, developed in the 2010s, enable comparisons between two treatments in different studies by balancing patient characteristics in the case where individual patient-level data (IPD) are available for only one study. Health technology assessment (HTA) bodies increasingly rely on these methods to inform funding decisions, typically using unanchored indirect comparisons (i.e., without a common comparator), due to the need to evaluate comparative efficacy and safety for single-arm trials. Unanchored matching-adjusted indirect comparison (MAIC) and unanchored simulated treatment comparison (STC) are currently the only two approaches available for population-adjusted indirect comparisons based on single-arm trials. However, there is a notable underutilisation of unanchored STC in HTA, largely due to a lack of understanding of its implementation. We therefore develop a novel way to implement unanchored STC by incorporating standardisation/marginalisation and the NORmal To Anything (NORTA) algorithm for sampling covariates. This methodology aims to derive a suitable marginal treatment effect without aggregation bias for HTA evaluations. We use a non-parametric bootstrap and propose separately calculating the standard error for the IPD study and the comparator study to ensure the appropriate quantification of the uncertainty associated with the estimated treatment effect. The performance of our proposed unanchored STC approach is evaluated through a comprehensive simulation study focused on binary outcomes. Our findings demonstrate that the proposed approach is asymptotically unbiased. We argue that unanchored STC should be considered when conducting unanchored indirect comparisons with single-arm studies, presenting a robust approach for HTA decision-making., (© 2024 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.)
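For readers unfamiliar with the NORTA step mentioned above, the sketch below (Python, with illustrative marginals; not the authors' full unanchored STC workflow) draws correlated covariates by pushing multivariate normal draws through the normal CDF and then through the target inverse CDFs. Note that passing the target correlation directly to the underlying Gaussian is an approximation; full NORTA adjusts that correlation so the output hits the target exactly.

```python
import numpy as np
from scipy import stats

def norta_sample(n, corr, marginals, seed=0):
    """NORmal To Anything: draw correlated covariates with given marginals.
    `corr` is the correlation matrix for the underlying Gaussian copula
    (used here as an approximation to the target correlation); `marginals`
    is a list of scipy.stats frozen distributions supplying inverse CDFs."""
    rng = np.random.default_rng(seed)
    d = len(marginals)
    z = rng.multivariate_normal(mean=np.zeros(d), cov=corr, size=n)
    u = stats.norm.cdf(z)                      # correlated uniform marginals
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

# Example: a binary covariate and a right-skewed continuous covariate.
corr = np.array([[1.0, 0.4], [0.4, 1.0]])
marginals = [stats.bernoulli(0.3), stats.gamma(a=2.0, scale=1.5)]
X = norta_sample(1000, corr, marginals)
print(X.mean(axis=0))
```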
- Published
- 2024
- Full Text
- View/download PDF
35. Controlling false discovery rate for mediator selection in high-dimensional data.
- Author
-
Dai R, Li R, Lee S, and Liu Y
- Subjects
- Humans, Adolescent, Neuroimaging methods, Neuroimaging statistics & numerical data, Data Interpretation, Statistical, Models, Statistical, False Positive Reactions, Biometry methods, Cognition, Magnetic Resonance Imaging methods, Magnetic Resonance Imaging statistics & numerical data, Algorithms, Computer Simulation, Brain diagnostic imaging
- Abstract
The need to select mediators from a high-dimensional data source, such as neuroimaging data or genetic data, arises in many areas of scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set, and propose a method that extends recent developments in false discovery rate (FDR)-controlled variable selection with knockoffs to select mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. We present extensive simulation results to demonstrate the power and finite-sample performance compared with the existing method. Lastly, we demonstrate the method by analyzing the Adolescent Brain Cognitive Development (ABCD) study, in which the proposed method selects several resting-state functional magnetic resonance imaging connectivity markers as mediators for the relationship between adverse childhood events and the crystallized composite score in the NIH toolbox., (© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
36. Time-Aware Missing Healthcare Data Prediction Based on ARIMA Model.
- Author
-
Kong L, Li G, Rafique W, Shen S, He Q, Khosravi MR, Wang R, and Qi L
- Subjects
- Humans, Databases, Factual, Delivery of Health Care, Models, Statistical, Algorithms
- Abstract
Healthcare uses state-of-the-art technologies (such as wearable devices, blood glucose meters, and electrocardiographs), which results in the generation of large amounts of data. Healthcare data are essential in patient management and play a critical role in transforming healthcare services, medical scheme design, and scientific research. Missing data are a challenging problem in healthcare, arising from system failure and untimely filing and resulting in inaccurate diagnosis and treatment anomalies. Therefore, there is a need to accurately predict and impute missing data, as only complete data can provide a scientific and comprehensive basis for patients, doctors, and researchers. However, traditional approaches in this paradigm often neglect the effect of the time factor on forecasting results. This article proposes a time-aware missing healthcare data prediction approach based on the autoregressive integrated moving average (ARIMA) model. We combine a truncated singular value decomposition (SVD) with the ARIMA model to improve the prediction efficiency of the ARIMA model and remove data redundancy and noise. Through the improved ARIMA model, our proposed approach (named MHDP_SVD_ARIMA) can capture the underlying pattern of healthcare data changes over time and accurately predict missing data. The experiments conducted on the WISDM dataset show that the MHDP_SVD_ARIMA approach is effective and efficient in predicting missing healthcare data.
- Published
- 2024
- Full Text
- View/download PDF
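The sketch below illustrates only the ARIMA forecasting step underlying the approach in the entry above (Time-Aware Missing Healthcare Data Prediction), using statsmodels on simulated data; the truncated-SVD preprocessing, the time-aware weighting, and the WISDM experiments are not reproduced here.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Simulated daily measurements with a trend plus noise; the last value is "missing".
rng = np.random.default_rng(1)
series = pd.Series(80 + 0.3 * np.arange(120) + rng.normal(scale=2.0, size=120))
observed, missing_true = series.iloc[:-1], series.iloc[-1]

# Fit a simple ARIMA(1, 1, 1) on the observed prefix and forecast the missing point.
model = ARIMA(observed, order=(1, 1, 1)).fit()
imputed = model.forecast(steps=1).iloc[0]
print(f"imputed={imputed:.1f}, actual={missing_true:.1f}")
```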
37. LFK index does not reliably detect small-study effects in meta-analysis: A simulation study.
- Author
-
Schwarzer G, Rücker G, and Semaca C
- Subjects
- Humans, Bias, Data Interpretation, Statistical, Models, Statistical, Reproducibility of Results, Research Design, Sample Size, Algorithms, Computer Simulation, Meta-Analysis as Topic
- Abstract
The LFK index has been promoted as an improved method to detect bias in meta-analysis. Putatively, its performance does not depend on the number of studies in the meta-analysis. We conducted a simulation study comparing the LFK index test to three standard tests for funnel plot asymmetry in settings with smaller or larger group sample sizes. In general, false positive rates of the LFK index test markedly depended on the number and size of studies as well as the between-study heterogeneity, with values between 0% and almost 30%. Egger's test adhered well to the pre-specified significance level of 5% under homogeneity, but was too liberal (smaller groups) or conservative (larger groups) under heterogeneity. The rank test was too conservative for most simulation scenarios. The Thompson-Sharp test was too conservative under homogeneity, but adhered well to the significance level in case of heterogeneity. The true positive rate of the LFK index test was larger than that of the classic tests only when its false positive rate was inflated. The power of the classic tests was similar to or larger than that of the LFK index test when the false positive rate of the LFK index test was used as the significance level for the classic tests. Under ideal conditions, the false positive rate of the LFK index test markedly and unpredictably depends on the number and sample size of studies as well as the extent of between-study heterogeneity. The LFK index test in its current implementation should not be used to assess funnel plot asymmetry in meta-analysis., (© 2024 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.)
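The LFK index itself is built from the Doi plot and is not reproduced here; for orientation, the sketch below (Python, toy data) shows the simplest of the comparator tests mentioned above, Egger's regression test, in its standard form of regressing the standardized effect on precision and testing whether the intercept differs from zero.

```python
import numpy as np
import statsmodels.api as sm

def egger_test(effects, std_errors):
    """Egger's regression test for funnel plot asymmetry: regress the
    standardized effect (effect / SE) on precision (1 / SE) by OLS and
    test whether the intercept differs from zero."""
    effects, std_errors = np.asarray(effects), np.asarray(std_errors)
    z = effects / std_errors
    precision = 1.0 / std_errors
    fit = sm.OLS(z, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]        # intercept and its two-sided p-value

# Toy meta-analysis: 12 studies, true effect 0.3, no small-study effects built in.
rng = np.random.default_rng(2)
se = rng.uniform(0.1, 0.5, size=12)
y = rng.normal(loc=0.3, scale=se)
intercept, p = egger_test(y, se)
print(f"Egger intercept = {intercept:.2f}, p = {p:.3f}")
```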
- Published
- 2024
- Full Text
- View/download PDF
38. Group sequential methods based on supremum logrank statistics under proportional and nonproportional hazards.
- Author
-
Boher JM, Filleron T, Sfumato P, Bunouf P, and Cook RJ
- Subjects
- Humans, Immunotherapy statistics & numerical data, Models, Statistical, Clinical Trials as Topic statistics & numerical data, Data Interpretation, Statistical, Neoplasms therapy, Proportional Hazards Models, Monte Carlo Method, Algorithms
- Abstract
Despite the widespread use of Cox regression for modeling treatment effects in clinical trials, in immunotherapy oncology trials and other settings therapeutic benefits are not immediately realized, thereby violating the proportional hazards assumption. Weighted logrank tests and the so-called Maxcombo test, involving the combination of multiple logrank test statistics, have been advocated to increase power for detecting effects in these and other settings where hazards are nonproportional. We describe a testing framework based on supremum logrank statistics created by successively analyzing and excluding early events, or obtained using a moving time window. We then describe how such tests can be conducted in a group sequential trial with interim analyses conducted for potential early stopping for benefit. The crossing boundaries for the interim test statistics are determined using an easy-to-implement Monte Carlo algorithm. Numerical studies illustrate the good frequency properties of the proposed group sequential methods., Competing Interests: Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
- Published
- 2024
- Full Text
- View/download PDF
39. A comprehensive review and shiny application on the matching-adjusted indirect comparison.
- Author
-
Jiang Z, Cappelleri JC, Gamalo M, Chen Y, Thomas N, and Chu H
- Subjects
- Humans, Technology Assessment, Biomedical, Models, Statistical, Research Design, Software, Calibration, Regression Analysis, Data Interpretation, Statistical, Network Meta-Analysis, Cost-Benefit Analysis, Computer Simulation, Comparative Effectiveness Research, Algorithms
- Abstract
Population-adjusted indirect comparison (PAIC) is an increasingly used technique for estimating the comparative effectiveness of different treatments in health technology assessments when head-to-head trials are unavailable. Three commonly used PAIC methods are matching-adjusted indirect comparison (MAIC), simulated treatment comparison (STC), and multilevel network meta-regression (ML-NMR). MAIC enables researchers to achieve balanced covariate distributions across two independent trials when individual participant data are available in only one trial. In this article, we provide a comprehensive review of MAIC methods, including their theoretical derivation, implicit assumptions, and connection to calibration estimation in survey sampling. We discuss the nuances between anchored and unanchored MAIC, as well as their required assumptions. Furthermore, we implement various MAIC methods in a user-friendly R Shiny application, Shiny-MAIC. To our knowledge, it is the first Shiny application that implements various MAIC methods. The Shiny-MAIC application offers a choice between anchored and unanchored MAIC, a choice among different types of covariates and outcomes, and two variance estimators including bootstrap and robust standard errors. An example with simulated data is provided to demonstrate the utility of the Shiny-MAIC application, enabling a user-friendly approach to conducting MAIC for healthcare decision-making. Shiny-MAIC is freely available through the link: https://ziren.shinyapps.io/Shiny_MAIC/., (© 2024 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.)
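As background to the methods reviewed above, the sketch below (Python; variable names and data are illustrative, and this is not the Shiny-MAIC code) shows the standard method-of-moments estimation of MAIC weights, in which individual patient data are reweighted so that weighted covariate means match the comparator trial's aggregate means.

```python
import numpy as np
from scipy.optimize import minimize

def maic_weights(X_ipd, target_means):
    """Method-of-moments MAIC weights: reweight the IPD so that weighted
    covariate means match the aggregate means of the comparator trial.
    Weights are w_i = exp(x_i' a), with `a` found by minimizing
    Q(a) = sum_i exp((x_i - target)' a); its gradient is the balance condition."""
    Xc = np.asarray(X_ipd, dtype=float) - np.asarray(target_means, dtype=float)
    objective = lambda a: np.exp(Xc @ a).sum()
    gradient = lambda a: Xc.T @ np.exp(Xc @ a)
    res = minimize(objective, x0=np.zeros(Xc.shape[1]), jac=gradient, method="BFGS")
    w = np.exp(Xc @ res.x)
    ess = w.sum() ** 2 / (w ** 2).sum()          # effective sample size
    return w, ess

# Illustrative IPD: age and a binary covariate, rebalanced to aggregate targets.
rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(60, 10, 500), rng.binomial(1, 0.4, 500)])
w, ess = maic_weights(X, target_means=[65.0, 0.55])
print(np.average(X, axis=0, weights=w), round(ess, 1))   # weighted means ~ [65, 0.55]
```

The effective sample size printed at the end is the usual diagnostic for how much information the reweighting sacrifices.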
- Published
- 2024
- Full Text
- View/download PDF
40. A closer look at how experience, task domain, and self-confidence influence reliance towards algorithms.
- Author
-
Jessup SA, Alarcon GM, Willis SM, and Lee MA
- Subjects
- Humans, Male, Female, Adult, Young Adult, Task Performance and Analysis, Forecasting, Middle Aged, Adolescent, Models, Statistical, Algorithms, Intention, Self Concept
- Abstract
Prior research has demonstrated that experience with a forecasting algorithm decreases reliance behaviors (i.e., the action of relying on the algorithm). However, the influence of model experience on reliance intentions (i.e., an intention or willingness to rely on the algorithm) has not been explored. Additionally, other factors such as self-confidence and domain knowledge are posited to influence algorithm reliance. The objective of this research was to examine how experience with a statistical model, task domain (used car sales, college grade point average (GPA), GitHub pull requests), and self-confidence influence reliance intentions, reliance behaviors, and perceived accuracy of one's own estimates and the model's estimates. Participants (N = 347) were recruited online and completed a forecasting task. Results indicate a statistically significant effect of self-confidence and task domain on reliance intentions, reliance behaviors, and perceived accuracy. However, unlike previous findings, model experience did not significantly influence reliance behavior, nor did it lead to significant changes in reliance intentions or perceived accuracy of oneself or the model. Our data suggest that factors such as task domain and self-confidence influence algorithm use more than model experience does. Individual differences and situational factors should be considered important aspects that influence forecasters' decisions to rely on predictions from a model or to instead use their own estimates, which can lead to sub-optimal performance., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024 Elsevier Ltd. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
41. Many nonnormalities, one simulation: Do different data generation algorithms affect study results?
- Author
-
Fairchild AJ, Yin Y, Baraldi AN, Astivia OLO, and Shi D
- Subjects
- Humans, Likelihood Functions, Data Interpretation, Statistical, Models, Statistical, Reproducibility of Results, Algorithms, Monte Carlo Method, Computer Simulation
- Abstract
Monte Carlo simulation studies are among the primary scientific outputs contributed by methodologists, guiding application of various statistical tools in practice. Although methodological researchers routinely extend simulation study findings through follow-up work, few studies are ever replicated. Simulation studies are susceptible to factors that can contribute to replicability failures, however. This paper sought to conduct a meta-scientific study by replicating one highly cited simulation study (Curran et al., Psychological Methods, 1, 16-29, 1996) that investigated the robustness of normal theory maximum likelihood (ML)-based chi-square fit statistics under multivariate nonnormality. We further examined the generalizability of the original study findings across different nonnormal data generation algorithms. Our replication results were generally consistent with original findings, but we discerned several differences. Our generalizability results were more mixed. Only two results observed under the original data generation algorithm held completely across other algorithms examined. One of the most striking findings we observed was that results associated with the independent generator (IG) data generation algorithm vastly differed from other procedures examined and suggested that ML was robust to nonnormality for the particular factor model used in the simulation. Findings point to the reality that extant methodological recommendations may not be universally valid in contexts where multiple data generation algorithms exist for a given data characteristic. We recommend that researchers consider multiple approaches to generating a specific data or model characteristic (when more than one is available) to optimize the generalizability of simulation results., (© 2024. The Psychonomic Society, Inc.)
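To make the issue concrete, the sketch below (Python) contrasts two generic families of nonnormal data generation: a copula-style approach that correlates normal draws before transforming the margins, and an independent-generator-style approach that builds each variable from independently generated skewed components. These are simplified stand-ins, not the exact Vale-Maurelli, Fleishman, or IG implementations examined in the study, but they show how algorithms matched on correlation can still differ in the marginal and joint shapes they produce.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, rho = 100_000, 0.5

# Approach A: Gaussian-copula style -- correlate first, then transform each margin
# to a skewed chi-square(3) distribution via its inverse CDF.
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
xa = stats.chi2(df=3).ppf(stats.norm.cdf(z))

# Approach B: independent-generator style -- build each variable as a common
# factor plus an independent error, with the skewed ingredients generated first
# and standardized (chi-square(3) has mean 3 and variance 6).
lam = np.sqrt(rho)                              # loading chosen so Corr(x1, x2) = rho
f = stats.chi2(df=3).rvs(size=n, random_state=rng)
e = stats.chi2(df=3).rvs(size=(n, 2), random_state=rng)
xb = lam * ((f - 3) / np.sqrt(6))[:, None] + np.sqrt(1 - rho) * ((e - 3) / np.sqrt(6))

for name, x in [("copula-style", xa), ("independent-generator", xb)]:
    print(name, np.corrcoef(x.T)[0, 1].round(2), stats.skew(x[:, 0]).round(2))
```

Both approaches target the same correlation, yet the marginal skewness of the independent-generator variables is visibly attenuated relative to the copula-style variables, which is the kind of divergence between data generation algorithms the abstract describes.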
- Published
- 2024
- Full Text
- View/download PDF
42. Subgroup detection in linear growth curve models with generalized linear mixed model (GLMM) trees.
- Author
-
Fokkema M and Zeileis A
- Subjects
- Humans, Linear Models, Longitudinal Studies, Models, Statistical, Computer Simulation, Data Interpretation, Statistical, Cross-Sectional Studies, Algorithms
- Abstract
Growth curve models are popular tools for studying the development of a response variable within subjects over time. Heterogeneity between subjects is common in such models, and researchers are typically interested in explaining or predicting this heterogeneity. We show how generalized linear mixed-effects model (GLMM) trees can be used to identify subgroups with different trajectories in linear growth curve models. Originally developed for clustered cross-sectional data, GLMM trees are extended here to longitudinal data. The resulting extended GLMM trees are directly applicable to growth curve models as an important special case. In simulated and real-world data, we assess performance of the extensions and compare against other partitioning methods for growth curve models. Extended GLMM trees perform more accurately than the original algorithm and LongCART, and similarly accurate compared to structural equation model (SEM) trees. In addition, GLMM trees allow for modeling both discrete and continuous time series, are less sensitive to (mis-)specification of the random-effects structure and are much faster to compute., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
43. Multi-objective extensive hypothesis testing for the estimation of advanced crash frequency models.
- Author
-
Ahern Z, Corry P, Rabbani W, and Paz A
- Subjects
- Humans, Accidents, Traffic prevention & control, Accidents, Traffic statistics & numerical data, Algorithms, Models, Statistical
- Abstract
Analyzing crash data is a complex and labor-intensive process that requires careful consideration of multiple interdependent modeling aspects, such as functional forms, transformations, likely contributing factors, correlations, and unobserved heterogeneity. Limited time, knowledge, and experience may lead to over-simplified, over-fitted, or misspecified models overlooking important insights. This paper proposes an extensive hypothesis testing framework, including a multi-objective mathematical programming formulation and solution algorithms, to estimate crash frequency models considering simultaneously likely contributing factors, transformations, non-linearities, and correlated random parameters. The mathematical programming formulation minimizes criteria for both in-sample fit and out-of-sample prediction. To address the complexity and non-convexity of the mathematical program, the proposed solution framework utilizes a variety of metaheuristic solution algorithms. Specifically, Harmony Search demonstrated minimal sensitivity to hyperparameters, enabling an efficient search for solutions without being influenced by the choice of hyperparameters. The effectiveness of the framework was evaluated using two real-world datasets and one synthetic dataset. Comparative analyses were performed using the two real-world datasets and the corresponding models published in the literature by independent teams. The proposed framework showed its capability to pinpoint efficient model specifications, produce accurate estimates, and provide valuable insights for both researchers and practitioners. The proposed approach allows for the discovery of numerous insights while minimizing the time spent on model development. By considering a broader set of contributing factors, models with varied qualities can be generated. For instance, when applied to crash data from Queensland, the proposed approach revealed that the inclusion of medians on sharply curved roads can effectively reduce the occurrence of crashes; when applied to crash data from Washington, the simultaneous consideration of traffic volume and road curvature resulted in a notable reduction in crash variances but an increase in crash means., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024 The Author(s). Published by Elsevier Ltd. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
44. Order selection for heterogeneous semiparametric hidden Markov models.
- Author
-
Zou Y, Song X, and Zhao Q
- Subjects
- Humans, Models, Statistical, Longitudinal Studies, Neuroimaging statistics & numerical data, Markov Chains, Alzheimer Disease, Bayes Theorem, Monte Carlo Method, Computer Simulation, Algorithms
- Abstract
Hidden Markov models (HMMs), which can characterize dynamic heterogeneity, are valuable tools for analyzing longitudinal data. The order of an HMM (i.e., the number of hidden states) is typically assumed to be known or predetermined by some model selection criterion in conventional analysis. As prior information about the order is frequently lacking, pairwise comparisons under criterion-based methods become computationally expensive as the model space grows. A few studies have conducted order selection and parameter estimation simultaneously, but they considered only homogeneous parametric settings. This study proposes a Bayesian double penalization (BDP) procedure for simultaneous order selection and parameter estimation in heterogeneous semiparametric HMMs. To overcome the difficulties in updating the order, we create a new Markov chain Monte Carlo algorithm coupled with an effective adjust-bound reversible jump strategy. Simulation results reveal that the proposed BDP procedure performs well in estimation and works noticeably better than the conventional criterion-based approaches. Application of the suggested method to the Alzheimer's Disease Neuroimaging Initiative study further supports its usefulness., (© 2024 John Wiley & Sons Ltd.)
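For contrast with the simultaneous approach proposed above, the sketch below (Python, using the third-party hmmlearn package on simulated data with assumed well-separated state means) illustrates conventional criterion-based order selection: HMMs of increasing order are fitted one by one and their BIC values compared, the kind of repeated model comparison the BDP procedure is designed to avoid.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(6)
# One long sequence from a 3-state Gaussian HMM (states differ only in their means).
means, n = [-2.0, 0.0, 2.5], 600
trans = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])
states = [0]
for _ in range(n - 1):
    states.append(rng.choice(3, p=trans[states[-1]]))
y = rng.normal([means[s] for s in states], 0.7).reshape(-1, 1)

# Conventional criterion-based order selection: fit HMMs of increasing order
# and compare BIC = -2 logL + (#free parameters) * log n.
for k in range(1, 5):
    model = GaussianHMM(n_components=k, covariance_type="diag", n_iter=200,
                        random_state=0).fit(y)
    logl = model.score(y)
    n_params = (k - 1) + k * (k - 1) + k + k   # initial probs, transitions, means, variances
    print(k, round(-2 * logl + n_params * np.log(n), 1))
```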
- Published
- 2024
- Full Text
- View/download PDF
45. The 3 + 3 design in dose-finding studies with small sample sizes: Pitfalls and possible remedies.
- Author
-
Chiuzan C and Dehbi HM
- Subjects
- Humans, Sample Size, Models, Statistical, Maximum Tolerated Dose, Research Design, Bayes Theorem, Clinical Trials, Phase I as Topic methods, Algorithms, Computer Simulation, Dose-Response Relationship, Drug
- Abstract
In the last few years, numerous novel designs have been proposed to improve the efficiency and accuracy of phase I trials in identifying the maximum-tolerated dose (MTD) or the optimal biological dose (OBD) for noncytotoxic agents. However, the conventional 3+3 approach, known for its simplicity but poor performance, continues to be an attractive choice for many trials despite these alternative suggestions. This article seeks to underscore the importance of moving beyond the 3+3 design by highlighting a different key element in trial design: the estimation of sample size and its crucial role in predicting toxicity and determining the MTD. We use simulation studies to compare the performance of the most used phase I approaches: the 3+3, Continual Reassessment Method (CRM), Keyboard, and Bayesian Optimal Interval (BOIN) designs, with respect to three key operating characteristics: the percentage of correct selection of the true MTD, the average number of patients allocated per dose level, and the average total sample size. The simulation results consistently show that the 3+3 algorithm underperforms in comparison to model-based and model-assisted designs across all scenarios and metrics. The 3+3 method yields significantly lower (up to three times) probabilities of identifying the correct MTD, often selecting doses one or even two levels below the actual MTD. The 3+3 design allocates significantly fewer patients at the true MTD, assigns higher numbers to lower dose levels, and rarely explores doses above the target dose-limiting toxicity (DLT) rate. The overall performance of the 3+3 method is suboptimal, with a high level of unexplained uncertainty and significant implications for accurately determining the MTD. While the primary focus of the article is to demonstrate the limitations of the 3+3 algorithm, the question remains about the preferred alternative approach. The intention is not to definitively recommend one model-based or model-assisted method over others, as their performance can vary based on parameters and model specifications. However, the presented results indicate that the CRM, Keyboard, and BOIN designs consistently outperform the 3+3 and offer improved efficiency and precision in determining the MTD, which is crucial in early-phase clinical trials., Competing Interests: Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
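For readers unfamiliar with the design, the sketch below (Python) simulates one common variant of the 3+3 rule on hypothetical dose-toxicity scenarios; it is an illustration of the algorithm only, not the simulation engine or scenarios used in the article.

```python
import numpy as np

def simulate_3plus3(true_dlt_rates, seed=0):
    """Simulate one trial under a common variant of the 3+3 rule: escalate after
    0/3 DLTs, expand to 6 after 1/3, and stop when >=2 DLTs are observed at a dose,
    declaring the MTD as the dose below. Returns (MTD index or -1, patients per dose)."""
    rng = np.random.default_rng(seed)
    n_per_dose = np.zeros(len(true_dlt_rates), dtype=int)
    d = 0
    while d < len(true_dlt_rates):
        dlt = rng.binomial(3, true_dlt_rates[d])      # first cohort of 3
        n_per_dose[d] += 3
        if dlt == 0:
            d += 1                                    # 0/3: escalate
            continue
        if dlt >= 2:
            return d - 1, n_per_dose                  # >=2/3: MTD is the dose below
        dlt += rng.binomial(3, true_dlt_rates[d])     # exactly 1/3: expand cohort
        n_per_dose[d] += 3
        if dlt <= 1:
            d += 1                                    # <=1/6: escalate
        else:
            return d - 1, n_per_dose                  # >=2/6: MTD is the dose below
    return len(true_dlt_rates) - 1, n_per_dose        # highest dose cleared

# Selection frequencies over 2000 simulated trials (index 0 = no MTD identified).
selections = [simulate_3plus3([0.05, 0.12, 0.30, 0.50], seed=s)[0] for s in range(2000)]
print(np.bincount(np.array(selections) + 1, minlength=5) / 2000)
```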
- Published
- 2024
- Full Text
- View/download PDF
46. Variable selection for latent class analysis in the presence of missing data with application to record linkage.
- Author
-
Xu H, Li X, Zhang Z, and Grannis S
- Subjects
- Humans, Models, Statistical, Cluster Analysis, Computer Simulation, Latent Class Analysis, Medical Record Linkage methods, Algorithms
- Abstract
The Fellegi-Sunter model is a latent class model widely used in probabilistic linkage to identify records that belong to the same entity. Record linkage practitioners typically employ all available matching fields in the model, on the premise that more fields convey greater information about the true match status and hence result in improved match performance. In the context of model-based clustering, it is well known that such a premise is incorrect and that the inclusion of noisy variables can compromise the clustering. Variable selection procedures have therefore been developed to remove noisy variables. Although these procedures have the potential to improve record matching, they cannot be applied directly due to the ubiquity of missing data in record linkage applications. In this paper, we modify the stepwise variable selection procedure proposed by Fop, Smart, and Murphy and extend it to account for the missing data common in record linkage. Through simulation studies, our proposed method is shown to select the correct set of matching fields across various settings, leading to better-performing algorithms. The improved match performance is also seen in a real-world application. We therefore recommend the use of our proposed selection procedure to identify informative matching fields for probabilistic record linkage algorithms., Competing Interests: Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
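As background, the sketch below (Python, with made-up m- and u-probabilities) shows the core Fellegi-Sunter scoring that the latent class model estimates: each matching field contributes a log-likelihood-ratio weight, and the weights are summed into a match score. The stepwise variable selection procedure and its missing-data extension proposed in the paper are not implemented here.

```python
import numpy as np

# Illustrative m- and u-probabilities for three matching fields
# (probability the field agrees among true matches vs. among non-matches).
fields = ["last_name", "birth_year", "zip_code"]
m = np.array([0.95, 0.90, 0.80])
u = np.array([0.01, 0.05, 0.10])

def match_score(agreement):
    """Fellegi-Sunter composite weight: sum of log2 likelihood ratios,
    log2(m/u) when a field agrees and log2((1-m)/(1-u)) when it disagrees."""
    agreement = np.asarray(agreement, dtype=bool)
    weights = np.where(agreement, np.log2(m / u), np.log2((1 - m) / (1 - u)))
    return weights.sum()

# A record pair agreeing on last name and birth year but not ZIP code:
print(round(match_score([True, True, False]), 2))
```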
- Published
- 2024
- Full Text
- View/download PDF
47. Exploratory Procedure for Component-Based Structural Equation Modeling for Simple Structure by Simultaneous Rotation.
- Author
-
Yamashita N
- Subjects
- Humans, Computer Simulation, Models, Statistical, Latent Class Analysis, Factor Analysis, Statistical, Rotation, Psychometrics methods, Algorithms
- Abstract
Generalized structured component analysis (GSCA) is a structural equation modeling (SEM) procedure that constructs components as weighted sums of observed variables and confirmatorily examines their regression relationships. This research proposes an exploratory version of GSCA, called exploratory GSCA (EGSCA). EGSCA is analogous to exploratory SEM (ESEM), developed as an exploratory factor-based SEM procedure, and seeks the relationships between the observed variables and the components by orthogonal rotation of the parameter matrices. The indeterminacy of orthogonal rotation in GSCA is first shown as theoretical support for the proposed method. The whole EGSCA procedure is then presented, together with a new rotation algorithm specialized to EGSCA, which aims at simultaneous simplification of all parameter matrices. Two numerical simulation studies revealed that EGSCA with the subsequent rotation successfully recovered the true values of the parameter matrices and was superior to the existing GSCA procedure. EGSCA was applied to two real datasets, and the model suggested by the EGSCA results was shown to be better than the model proposed by previous research, which demonstrates the effectiveness of EGSCA in model exploration., (© 2023. The Author(s), under exclusive licence to The Psychometric Society.)
- Published
- 2024
- Full Text
- View/download PDF
48. Bayesian compositional models for ordinal response.
- Author
-
Zhang L, Zhang X, Leach JM, Rahman AF, and Yi N
- Subjects
- Humans, Inflammatory Bowel Diseases, Computer Simulation, Logistic Models, Bayes Theorem, Monte Carlo Method, Microbiota, Algorithms, Models, Statistical
- Abstract
Ordinal responses are commonly found in medicine, biology, and other fields. In many situations, the predictors for such an ordinal response are compositional, which means that the sum of the predictors for each sample is fixed. Examples of compositional data include the relative abundances of species in microbiome data and the relative frequencies of nutrient concentrations. Moreover, predictors that are strongly correlated tend to have similar influence on the response outcome. Conventional cumulative logistic regression models for ordinal responses ignore the fixed-sum constraint on predictors and their associated interrelationships, and thus are not appropriate for analyzing compositional predictors. To solve this problem, we propose Bayesian Compositional Models for Ordinal Response to analyze the relationship between compositional data and an ordinal response, with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on the coefficients imposed through the prior distribution. The method was implemented with the R package rstan using an efficient Hamiltonian Monte Carlo algorithm. We performed simulations to compare the proposed approach and existing methods for ordinal responses. Results revealed that our proposed method outperformed the existing methods in terms of parameter estimation and prediction. We also applied the proposed method to a microbiome study, HMP2Data, to find microorganisms linked to ordinal inflammatory bowel disease levels. To make this work reproducible, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCO., Competing Interests: Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
- Published
- 2024
- Full Text
- View/download PDF
49. Application of fused graphical lasso to statistical inference for multiple sparse precision matrices.
- Author
-
Zhang Q, Li L, and Yang H
- Subjects
- Humans, Lymphoma, Large B-Cell, Diffuse, Models, Statistical, Computer Simulation, Algorithms
- Abstract
In this paper, the fused graphical lasso (FGL) method is used to estimate multiple precision matrices from multiple populations simultaneously. The lasso penalty in the FGL model enforces sparsity of the precision matrices, while a moderate fusion penalty on pairs of precision matrices from distinct groups encourages similar structure across the groups. In high-dimensional settings, an oracle inequality is provided for FGL estimators, which is necessary to establish the central limit law. We not only focus on point estimation of a precision matrix but also address hypothesis testing for a linear combination of the entries of multiple precision matrices. We apply a de-biasing technique to obtain a new consistent estimator with a known distribution for implementing the statistical inference, and extend the statistical inference problem to multiple populations. The corresponding de-biased FGL estimator and its asymptotic theory are provided. A simulation study and an application to diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situations., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
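As a non-fused baseline for comparison, the sketch below (Python, scikit-learn, simulated data) estimates each group's sparse precision matrix separately with the ordinary graphical lasso; the fusion penalty that additionally shrinks differences between groups, and the de-biased inference developed in the paper, are not implemented here.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(5)
p = 8
# Two groups sharing a common sparse precision structure (a chain / AR(1)-type graph).
prec = np.eye(p) + np.diag(0.4 * np.ones(p - 1), 1) + np.diag(0.4 * np.ones(p - 1), -1)
cov = np.linalg.inv(prec)
X1 = rng.multivariate_normal(np.zeros(p), cov, size=200)
X2 = rng.multivariate_normal(np.zeros(p), cov, size=200)

# Separate (non-fused) graphical lasso fits, one per group.
fits = [GraphicalLasso(alpha=0.1).fit(X) for X in (X1, X2)]
for k, f in enumerate(fits, 1):
    nonzero = (np.abs(f.precision_) > 1e-4).sum() - p   # off-diagonal entries retained
    print(f"group {k}: {nonzero} nonzero off-diagonal entries")
```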
- Published
- 2024
- Full Text
- View/download PDF
50. A discrete approximation method for modeling interval-censored multistate data.
- Author
-
You L, Liu X, and Krischer J
- Subjects
- Humans, Likelihood Functions, Longitudinal Studies, Computer Simulation, Models, Statistical, Data Interpretation, Statistical, Algorithms, Proportional Hazards Models, Heart Transplantation statistics & numerical data
- Abstract
Many longitudinal studies are designed to monitor participants for major events related to the progression of diseases. Data arising from such longitudinal studies are usually subject to interval censoring since the events are only known to occur between two monitoring visits. In this work, we propose a new method to handle interval-censored multistate data within a proportional hazards model framework where the hazard rate of events is modeled by a nonparametric function of time and the covariates affect the hazard rate proportionally. The main idea of this method is to simplify the likelihood functions of a discrete-time multistate model through an approximation and the application of data augmentation techniques, where the assumed presence of censored information facilitates a simpler parameterization. Then the expectation-maximization algorithm is used to estimate the parameters in the model. The performance of the proposed method is evaluated by numerical studies. Finally, the method is employed to analyze a dataset on tracking the advancement of coronary allograft vasculopathy following heart transplantation., (© 2024 John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF