Descriptor: "Zero inflation" - Searchworks@Jio Institute Digital Library Search Results

Search for "Zero inflation" returned 596 results

1. A simulation study of the performance of statistical models for count outcomes with excessive zeros.

Author: Zhou, Zhengyang, Li, Dateng, Huh, David, Xie, Minge, and Mun, Eun‐Young
Subjects: *FALSE positive error, *STATISTICAL power analysis, *STATISTICAL models, *ERROR rates, *HEALTH behavior
Abstract: Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol‐related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero‐inflated, particularly compared with recently developed marginalized count regression approaches for such data. Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log‐transformed scales, respectively) and three prevailing count distribution‐based models (i.e., Poisson, negative binomial, and zero‐inflated Poisson (ZIP) models). We also considered the marginalized zero‐inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero‐inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes. Results: Under zero‐inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non‐zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, negative binomial model, and ZIP model. The performance of the linear model with a log‐transformed outcome variable was unsatisfactory. Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero‐inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros. [ABSTRACT FROM AUTHOR]
Published: 2024

2. Review and revamp of compositional data transformation: A new framework combining proportion conversion and contrast transformation

Author: Yiqian Zhang, Jonas Schluter, Lijun Zhang, Xuan Cao, Robert R. Jenq, Hao Feng, Jonathan Haines, and Liangliang Zhang
Subjects: Compositional data analysis, Contrast transformation, Conversion, Microbiome, Relative abundance, Zero inflation, Biotechnology, TP248.13-248.65
Abstract: Due to the development of next-generation sequencing technology and an increased appreciation of their role in modulating host immunity and their potential as therapeutic agents, the human microbiome has emerged as a key area of interest in various biological investigations of human health and disease. However, microbiome data present a number of statistical challenges not addressed by existing methods, such as the varying sequencing depth, the compositionality, and zero inflation. Solutions like scaling and transformation methods help to mitigate heterogeneity and release constraints, but often introduce biases and yield inconsistent results on the same data. To address these issues, we conduct a systematic review of compositional data transformation, with a particular focus on the connection and distinction of existing techniques. Additionally, we create a new framework that enables the development of new transformations by combining proportion conversion with contrast transformations. This framework includes well-known methods such as Additive Log Ratio (ALR) and Centered Log Ratio (CLR) as special cases. Using this framework, we develop two novel transformations—Centered Arcsine Contrast (CAC) and Additive Arcsine Contrast (AAC)—which show enhanced performance in scenarios with high zero-inflation. Moreover, our findings suggest that ALR and CLR transformations are more effective when zero values are less prevalent. This comprehensive review and the innovative framework provide microbiome researchers with a significant direction to enhance data transformation procedures and improve analytical outcomes.
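Illustrative note (not part of the catalogue record): the abstract above names the Additive Log Ratio (ALR) and Centered Log Ratio (CLR) transformations as special cases of its framework. The sketch below shows the standard textbook definitions of those two transforms only. The function names and the pseudocount used to avoid log(0) are assumptions made for illustration; the paper's own arcsine-based CAC and AAC transformations are not reproduced here.

```python
import math

def to_proportions(counts, pseudocount=0.5):
    """Convert raw counts to proportions.

    The pseudocount is an assumed, common workaround so that zero counts
    do not produce log(0); it is not taken from the paper.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]

def clr(props):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def alr(props, ref=-1):
    """Additive log ratio: log of each part relative to a reference part."""
    ref_index = ref % len(props)
    log_ref = math.log(props[ref_index])
    return [math.log(p) - log_ref
            for i, p in enumerate(props) if i != ref_index]

if __name__ == "__main__":
    sample = to_proportions([0, 3, 7, 10])  # one zero count in the sample
    print("CLR:", clr(sample))
    print("ALR:", alr(sample))
```

Both transforms take logarithms, so the result for a zero count depends entirely on the chosen pseudocount, which is one reason zero-heavy data are awkward for log-ratio methods.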
Published: 2024

3. Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining

Author: Sunghae Jun
Subjects: patent keyword data, zero inflation, zero-inflated Poisson regression model, Bayesian inference, text mining, Statistics, HA1-4737
Abstract: Patent keyword analysis is used to analyze the technology keywords extracted from collected patent documents for specific technological fields. Thus, various methods related to this type of analysis have been researched in the industrial engineering fields, such as technology management and new product development. To analyze the patent document data, we have to search for patents related to the target technology and preprocess them to construct the patent–keyword matrix for statistical and machine learning algorithms. In general, a patent–keyword matrix has an extreme zero-inflated problem. This is because each keyword occupies one column even if it is included in only one document among all patent documents. General zero-inflated models have a limit at which the performance of the model deteriorates when the proportion of zeros becomes extremely large. To solve this problem, we applied a Bayesian inference to a general zero-inflated model. In this paper, we propose a patent keyword analysis using a Bayesian zero-inflated model to overcome the extreme zero-inflated problem in the patent–keyword matrix. In our experiments, we collected practical patents related to digital therapeutics technology and used the patent–keyword matrix preprocessed from them. We compared the performance of our proposed method with other comparative methods. Finally, we showed the validity and improved performance of our patent keyword analysis. We expect that our research can contribute to solving the extreme zero-inflated problem that occurs not only in patent keyword analysis, but also in various text big data analyses.
Published: 2024

4. Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.

Author: Magnus, Brooke E.
Subjects: *ITEM response theory, *INDIVIDUAL differences, *LATENT variables, *PATHOLOGICAL psychology, *PSYCHOMETRICS
Abstract: Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining.

Author: Jun, Sunghae
Subjects: MACHINE learning, TEXT mining, STATISTICAL learning, NEW product development, POISSON regression
Abstract: Patent keyword analysis is used to analyze the technology keywords extracted from collected patent documents for specific technological fields. Thus, various methods related to this type of analysis have been researched in the industrial engineering fields, such as technology management and new product development. To analyze the patent document data, we have to search for patents related to the target technology and preprocess them to construct the patent–keyword matrix for statistical and machine learning algorithms. In general, a patent–keyword matrix has an extreme zero-inflated problem. This is because each keyword occupies one column even if it is included in only one document among all patent documents. General zero-inflated models have a limit at which the performance of the model deteriorates when the proportion of zeros becomes extremely large. To solve this problem, we applied a Bayesian inference to a general zero-inflated model. In this paper, we propose a patent keyword analysis using a Bayesian zero-inflated model to overcome the extreme zero-inflated problem in the patent–keyword matrix. In our experiments, we collected practical patents related to digital therapeutics technology and used the patent–keyword matrix preprocessed from them. We compared the performance of our proposed method with other comparative methods. Finally, we showed the validity and improved performance of our patent keyword analysis. We expect that our research can contribute to solving the extreme zero-inflated problem that occurs not only in patent keyword analysis, but also in various text big data analyses. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Fast Bayesian Inference for Spatial Mean-Parameterized Conway–Maxwell–Poisson Models.

Author: Kang, Bokgyeong, Hughes, John, and Haran, Murali
Subjects: *MARKOV chain Monte Carlo, *VACCINE refusal, *SPATIAL filters, *BAYESIAN field theory, *POISSON distribution
Abstract: Count data with complex features arise in many disciplines, including ecology, agriculture, criminology, medicine, and public health. Zero inflation, spatial dependence, and non-equidispersion are common features in count data. There are currently two classes of models that allow for these features: the mode-parameterized Conway–Maxwell–Poisson (COMP) distribution and the generalized Poisson model. However, both require the use of either constraints on the parameter space or a parameterization that leads to challenges in interpretability. We propose spatial mean-parameterized COMP models that retain the flexibility of these models while resolving the above issues. We use a Bayesian spatial filtering approach in order to efficiently handle high-dimensional spatial data and we use reversible-jump MCMC to automatically choose the basis vectors for spatial filtering. The COMP distribution poses two additional computational challenges: an intractable normalizing function in the likelihood and no closed-form expression for the mean. We propose a fast computational approach that addresses these challenges by, respectively, introducing an efficient auxiliary variable algorithm and pre-computing key approximations for fast likelihood evaluation. We illustrate the application of our methodology to simulated and real datasets, including Texas HPV-cancer data and US vaccine refusal data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
Published: 2024

7. Quantification and statistical modeling of droplet-based single-nucleus RNA-sequencing data.

Author: Kuo, Albert, Hansen, Kasper D, and Hicks, Stephanie C
Subjects: *NEGATIVE binomial distribution, *RNA sequencing, *STATISTICAL models, *GENE expression, *DISTRIBUTION (Probability theory)
Abstract: In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias. [ABSTRACT FROM AUTHOR]
Published: 2024

8. A Lindley–binomial model for analyzing the proportions with sparseness and excessive zeros.

Author: Deng, Dianliang and Zhang, Xiaoqing
Subjects: *EXPECTATION-maximization algorithms, *DISTRIBUTION (Probability theory), *URINARY tract infections, *GENERATING functions, *URINARY catheters, *GOODNESS-of-fit tests, *IMPLANTABLE catheters
Abstract: Proportional data arise frequently in a wide variety of fields of study. Such data often exhibit extra variation such as over/under dispersion, sparseness and zero inflation. For example, the hepatitis data present both sparseness and zero inflation with 19 contributing non-zero denominators of 5 or less and with 36 having zero seropositive out of 83 annual age groups. The whitefly data consist of 640 observations with 339 zeros (53%), which demonstrates extra zero inflation. The catheter management data involve excessive zeros, with more than 60% zeros on average across 193 outcomes of urinary tract infections, 194 outcomes of catheter blockages and 193 outcomes of catheter displacements. However, the existing models cannot always address such features appropriately. In this paper, a new two-parameter probability distribution called the Lindley–binomial (LB) distribution is proposed to analyze proportional data with such features. The probabilistic properties of the distribution, such as its moments and moment generating function, are derived. The Fisher scoring algorithm and EM algorithm are presented for the computation of estimates of parameters in the proposed LB regression model. The issues on goodness of fit for the LB model are discussed. A limited simulation study is also performed to evaluate the performance of the derived EM algorithms for the estimation of parameters in the model with/without covariates. The proposed model is illustrated through the three aforementioned proportional datasets. [ABSTRACT FROM AUTHOR]
Published: 2024

9. A Marginalized Zero‐Inflated Negative Binomial Model for Spatial Data: Modeling COVID‐19 Deaths in Georgia.

Author: Mutiso, Fedelis, Pearce, John L., Benjamin‐Neelon, Sara E., Mueller, Noel T., Li, Hong, and Neelon, Brian
Abstract: Spatial count data with an abundance of zeros arise commonly in disease mapping studies. Typically, these data are analyzed using zero‐inflated models, which comprise a mixture of a point mass at zero and an ordinary count distribution, such as the Poisson or negative binomial. However, due to their mixture representation, conventional zero‐inflated models are challenging to explain in practice because the parameter estimates have conditional latent‐class interpretations. As an alternative, several authors have proposed marginalized zero‐inflated models that simultaneously model the excess zeros and the marginal mean, leading to a parameterization that more closely aligns with ordinary count models. Motivated by a study examining predictors of COVID‐19 death rates, we develop a spatiotemporal marginalized zero‐inflated negative binomial model that directly models the marginal mean, thus extending marginalized zero‐inflated models to the spatial setting. To capture the spatiotemporal heterogeneity in the data, we introduce region‐level covariates, smooth temporal effects, and spatially correlated random effects to model both the excess zeros and the marginal mean. For estimation, we adopt a Bayesian approach that combines full‐conditional Gibbs sampling and Metropolis–Hastings steps. We investigate features of the model and use the model to identify key predictors of COVID‐19 deaths in the US state of Georgia during the 2021 calendar year. [ABSTRACT FROM AUTHOR]
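Illustrative note (not part of the catalogue record): the abstract above describes a zero-inflated model as a mixture of a point mass at zero and an ordinary count distribution. A minimal sketch of that mixture follows, assuming a Poisson count component for simplicity (the paper itself works with the negative binomial). The names `zip_pmf`, `zip_mean`, `pi` and `lam` are hypothetical and chosen for illustration.

```python
import math

def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson mixture.

    pi  : assumed probability of an excess (structural) zero
    lam : assumed rate of the Poisson count component
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    point_mass = pi if k == 0 else 0.0
    return point_mass + (1.0 - pi) * poisson

def zip_mean(pi, lam):
    """Marginal (population) mean of the mixture: (1 - pi) * lam."""
    return (1.0 - pi) * lam

if __name__ == "__main__":
    pi, lam = 0.3, 2.0
    for k in range(4):
        print(k, zip_pmf(k, pi, lam))
    print("marginal mean:", zip_mean(pi, lam))
```

In the conventional parameterization, covariates act on `lam`, the mean of the latent count class, which is the conditional latent-class interpretation the abstract mentions. Marginalized variants instead place the regression on the overall mean, here `(1 - pi) * lam`.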
Published: 2024

10. Application of zero-inflated Poisson model with heterogeneous random effects to evaluate the effect of oral health education on pregnant women's dental caries: A longitudinal experimental study.

Author: Ahmadi Gooraji, Somayeh, Zayeri, Farid, Sharifnejad, Yeganeh, Ghorbani, Zahra, Deghatipour, Marzie, Meymeh, Maryam Heydarpour, and Baghban, Alireza Akbarzadeh
Subjects: POISSON distribution, DENTAL health education, EDUCATIONAL outcomes, QUESTIONNAIRES, CLINICAL trials, PREGNANT women, DESCRIPTIVE statistics, CHI-squared test, MANN Whitney U Test, LONGITUDINAL method, EXPERIMENTAL design, DENTAL caries, DATA analysis software, QUALITY assurance, ORAL health, PREGNANCY
Abstract: Background: Pregnant women have poor knowledge of oral hygiene during pregnancy. One problem with the follow-up of dental caries in this group is zero accumulation in the decayed, missing, and filled teeth (DMFT) index, for which some models must be used to achieve valid results. The studied population may be heterogeneous in longitudinal studies, leading to biased estimates. We aimed to assess the impact of oral health education on dental caries in pregnant women using a suitable model in a longitudinal experimental study with heterogeneous random effects. Materials and Methods: This longitudinal, experimental research was carried out on pregnant women who visited medical centers in Tehran. The educational group (236 cases) received education for three sessions. The control group (200 cases) received only standard training. The DMFT index assessed oral and dental health at baseline, 6 months, and 24 months after delivery. The Chi-square test was used for comparing nominal variables and the Mann–Whitney U test for ordinal variables. The zero-inflated Poisson (ZIP) model was applied under heterogeneous and homogeneous random effects using R 4.2.1, SPSS 26, and SAS 9.4. The level of significance was set at 0.05. Results: Data from 436 women aged 15 years and older were analyzed. Zero accumulation in the DMFT was mainly related to the filled teeth (51%). The heterogeneous ZIP model fitted better to the data. On average, the intervention group exhibited a higher rate of change in filled teeth over time than the control group (P = 0.021). Conclusion: The proposed ZIP model is a suitable model for predicting filled teeth in pregnant women. An educational intervention during pregnancy can improve oral health in the long-term follow-up. [ABSTRACT FROM AUTHOR]
Published: 2024

11. Bayesian modeling of interaction between features in sparse multivariate count data with application to microbiome study

Author: Zhang, Shuangjie, Shen, Yuning, Chen, Irene A, and Lee, Juhee
Subjects: Mathematical Sciences, Statistics, Covariance matrix, differential abundance, factor model, joint sparsity, kernel model, zero inflation, multivariate count data, Econometrics, Statistics & Probability
Abstract: Many statistical methods have been developed for the analysis of microbial community profiles, but due to the complexity of typical microbiome measurements, inference of interactions between microbial features remains challenging. We develop a Bayesian zero-inflated rounded log-normal kernel method to model interaction between microbial features in a community using multivariate count data in the presence of covariates and excess zeros. The model carefully constructs the interaction structure by imposing joint sparsity on the covariance matrix of the kernel and obtains a reliable estimate of the structure with a small sample size. The model also includes zero inflation to account for excess zeros observed in data and infers differential abundance of microbial features associated with covariates through log-linear regression. We provide simulation studies and real data analysis examples to demonstrate the developed model. Comparison of the model to a simpler model and popular alternatives in simulation studies shows that, in addition to an added and important insight on the feature interaction, it yields superior parameter estimates and model fit in various settings.
Published: 2023

12. A GLM-based zero-inflated generalized Poisson factor model for analyzing microbiome data.

Author: Jinling Chi, Jimin Ye, and Ying Zhou
Subjects: LOW-rank matrices, FACTOR analysis, HUMAN microbiota, QUANTITATIVE research, NUCLEOTIDE sequencing, BIOMES, POISSON regression
Abstract: Motivation: High-throughput sequencing technology facilitates the quantitative analysis of microbial communities, improving the capacity to investigate the associations between the human microbiome and diseases. Our primary motivating application is to explore the association between gut microbes and obesity. The complex characteristics of microbiome data, including high dimensionality, zero inflation, and over-dispersion, pose new statistical challenges for downstream analysis. Results: We propose a GLM-based zero-inflated generalized Poisson factor analysis (GZIGPFA) model to analyze microbiome data with complex characteristics. The GZIGPFA model is based on a zero-inflated generalized Poisson (ZIGP) distribution for modeling microbiome count data. A link function between the generalized Poisson rate and the probability of excess zeros is established within the generalized linear model (GLM) framework. The latent parameters of the GZIGPFA model constitute a low-rank matrix comprising a low-dimensional score matrix and a loading matrix. An alternating maximum likelihood algorithm is employed to estimate the unknown parameters, and cross-validation is utilized to determine the rank of the model in this study. The proposed GZIGPFA model demonstrates superior performance and advantages through comprehensive simulation studies and real data applications. [ABSTRACT FROM AUTHOR]
Published: 2024

13. Bayesian semi-parametric inference for clustered recurrent events with zero inflation and a terminal event.

Author: Tian, Xinyuan, Ciarleglio, Maria, Cai, Jiachen, Greene, Erich J, Esserman, Denise, Li, Fan, and Zhao, Yize
Subjects: ACCIDENTAL fall prevention, BAYESIAN field theory, RANDOM effects model, CLUSTER randomized controlled trials, DATA structures, PRICE inflation
Abstract: Recurrent events are common in clinical studies and are often subject to terminal events. In pragmatic trials, participants are often nested in clinics and can be susceptible or structurally unsusceptible to the recurrent events. We develop a Bayesian shared random effects model to accommodate this complex data structure. To achieve robustness, we consider the Dirichlet processes to model the residual of the accelerated failure time model for the survival process as well as the cluster-specific shared frailty distribution, along with an efficient sampling algorithm for posterior inference. Our method is applied to a recent cluster randomized trial on fall injury prevention. [ABSTRACT FROM AUTHOR]
Published: 2024

14. New ridge parameter estimators for the zero-inflated Conway Maxwell Poisson ridge regression model.

Author: Ashraf, Bushra, Amin, Muhammad, and Akram, Muhammad Nauman
Subjects: *POISSON regression, *REGRESSION analysis, *MAXIMUM likelihood statistics
Abstract: One of the flexible count data models for dealing with over- and under-dispersion with extra zeros is the zero-inflated Conway–Maxwell Poisson (ZICOMP). The ZICOMP regression coefficients are generally estimated using the maximum likelihood estimator (MLE). In the ZICOMP regression model, when the explanatory variables are correlated, the MLE does not give efficient results. To overcome the effect of multicollinearity in the ZICOMP regression, we proposed the ridge regression estimator. To evaluate the performance of the estimator, we use mean squared error (MSE) as the performance evaluation criterion. A theoretical comparison of the ridge estimator with MLE is made to show the superiority of the estimator. The proposed estimator is evaluated with the help of a simulation study and a real application. The results of the simulation study and real application show the superiority of the proposed estimator because it produces a smaller MSE as compared to the MLE. [ABSTRACT FROM AUTHOR]
Published: 2024

15. Application of zero-inflated Poisson model with heterogeneous random effects to evaluate the effect of oral health education on pregnant women's dental caries: A longitudinal experimental study.

Author: Gooraji, Somayeh Ahmadi, Zayeri, Farid, Sharifnejad, Yeganeh, Ghorbani, Zahra, Deghatipour, Marzie, Meymeh, Maryam Heydarpour, and Baghban, Alireza Akbarzadeh
Subjects: CAVITY prevention, POISSON distribution, HEALTH literacy, EDUCATIONAL outcomes, CLINICAL trials, PREGNANT women, ORAL hygiene, CHI-squared test, MANN Whitney U Test, DESCRIPTIVE statistics, LONGITUDINAL method, EXPERIMENTAL design, TREATMENT effect heterogeneity, HEALTH education, COMPARATIVE studies, DATA analysis software, ORAL health, PREGNANCY
Abstract: Background: Pregnant women have poor knowledge of oral hygiene during pregnancy. One problem with the follow-up of dental caries in this group is zero accumulation in the decayed, missing, and filled teeth (DMFT) index, for which some models must be used to achieve valid results. The studied population may be heterogeneous in longitudinal studies, leading to biased estimates. We aimed to assess the impact of oral health education on dental caries in pregnant women using a suitable model in a longitudinal experimental study with heterogeneous random effects. Materials and Methods: This longitudinal, experimental research was carried out on pregnant women who visited medical centers in Tehran. The educational group (236 cases) received education for three sessions. The control group (200 cases) received only standard training. The DMFT index assessed oral and dental health at baseline, 6 months, and 24 months after delivery. The Chi-square test was used for comparing nominal variables and the Mann-Whitney U test for ordinal variables. The zero-inflated Poisson (ZIP) model was applied under heterogeneous and homogeneous random effects using R 4.2.1, SPSS 26, and SAS 9.4. The level of significance was set at 0.05. Results: Data from 436 women aged 15 years and older were analyzed. Zero accumulation in the DMFT was mainly related to the filled teeth (51%). The heterogeneous ZIP model fitted better to the data. On average, the intervention group exhibited a higher rate of change in filled teeth over time than the control group (P = 0.021). Conclusion: The proposed ZIP model is a suitable model for predicting filled teeth in pregnant women. An educational intervention during pregnancy can improve oral health in the long-term follow-up. [ABSTRACT FROM AUTHOR]
Published: 2024

16. A zero-inflated model for spatiotemporal count data with extra zeros: application to 1950–2015 tornado data in Kansas.

Author: Yang, Hong-Ding, Chang, Audrey, Hsu, Wei-Wen, and Chen, Chun-Shu
Subjects: POISSON regression, TORNADOES, AUTOREGRESSIVE models, GIBBS sampling
Abstract: In many tornado climate studies, the number of tornado touchdowns is often the primary outcome of interest. These outcome measures are usually generated under a spatiotemporal correlation structure and contain many zeros due to the rarity of tornado occurrence at a specific location and time interval. To model the spatiotemporal count data with excess zeros, we propose a spatiotemporal zero-inflated Poisson (ZIP) model, which lends itself to ease of interpretation and computational simplicity. Technically, we embed a modified conditional autoregressive model in the ZIP model to describe the spatial and temporal correlations, where the probability of a pure zero in the ZIP is purposely designed to depend on locations but independent of time. Illustrated with the longitudinal tornado touchdown data in the state of Kansas from 1950 to 2015, our model suggests that the spatial correlation among the counties and the corresponding temperature are significant factors attributed to the tornado touchdowns. Through the model, we can also estimate the probabilities of no tornado touchdowns for each county over time. These estimated probabilities substantially help us understand the pattern of touchdowns and further identify the risk areas across Kansas. Moreover, these estimates can be iteratively updated when more current touchdown data are available. The final model for Kansas tornado touchdown data is evaluated using more recent data. [ABSTRACT FROM AUTHOR]
Published: 2024

17. Keyword Data Analysis Using Generative Models Based on Statistics and Machine Learning Algorithms.

Author: Jun, Sunghae
Subjects: MACHINE learning, PROBABILISTIC generative models, DATA analysis, STATISTICS, BIG data, PROBLEM solving
Abstract: For text big data analysis, we preprocessed text data and constructed a document–keyword matrix. The elements of this matrix represent the frequencies of keywords occurring in a document. The matrix has a zero-inflation problem because many elements are zero values. Also, in the process of preprocessing, the data size of the document–keyword matrix is reduced. However, various machine learning algorithms require a large amount of data, so to solve the problems of data shortage and zero inflation, we propose the use of generative models based on statistics and machine learning. In our experimental tests, we compared the performance of the models using simulation and practical data sets. Thus, we verified the validity and contribution of our research for keyword data analysis. [ABSTRACT FROM AUTHOR]
Published: 2024

18. Application of zero-inflated Poisson model with heterogeneous random effects to evaluate the effect of oral health education on pregnant women’s dental caries: A longitudinal experimental study

Author: Somayeh Ahmadi Gooraji, Farid Zayeri, Yeganeh Sharifnejad, Zahra Ghorbani, Marzie Deghatipour, Maryam Heydarpour Meymeh, and Alireza Akbarzadeh Baghban
Subjects: dental caries, longitudinal studies, pregnancy, zero inflation, Dentistry, RK1-715
Abstract: Background: Pregnant women have poor knowledge of oral hygiene during pregnancy. One problem with the follow-up of dental caries in this group is zero accumulation in the decayed, missing, and filled teeth (DMFT) index, for which some models must be used to achieve valid results. The studied population may be heterogeneous in longitudinal studies, leading to biased estimates. We aimed to assess the impact of oral health education on dental caries in pregnant women using a suitable model in a longitudinal experimental study with heterogeneous random effects. Materials and Methods: This longitudinal, experimental research was carried out on pregnant women who visited medical centers in Tehran. The educational group (236 cases) received education for three sessions. The control group (200 cases) received only standard training. The DMFT index assessed oral and dental health at baseline, 6 months, and 24 months after delivery. The Chi-square test was used for comparing nominal variables and the Mann–Whitney U test for ordinal variables. The zero-inflated Poisson (ZIP) model was applied under heterogeneous and homogeneous random effects using R 4.2.1, SPSS 26, and SAS 9.4. The level of significance was set at 0.05. Results: Data from 436 women aged 15 years and older were analyzed. Zero accumulation in the DMFT was mainly related to the filled teeth (51%). The heterogeneous ZIP model fitted better to the data. On average, the intervention group exhibited a higher rate of change in filled teeth over time than the control group (P = 0.021). Conclusion: The proposed ZIP model is a suitable model for predicting filled teeth in pregnant women. An educational intervention during pregnancy can improve oral health in the long-term follow-up.
Published: 2024
Full Text: View/download PDF

19. Joint modeling the frequency and duration of accelerometer‐measured physical activity from a lifestyle intervention trial.

Author: Siddique, Juned, Daniels, Michael J., Inan, Gül, Battalio, Samuel, Spring, Bonnie, and Hedeker, Donald
Subjects: *PHYSICAL activity, *RESEARCH personnel
Abstract: Physical activity (PA) guidelines recommend that PA be accumulated in bouts of 10 minutes or more in duration. Recently, researchers have sought to better understand how participants in PA interventions increase their activity. Participants can increase their daily PA by increasing the number of PA bouts per day while keeping the duration of the bouts constant; they can keep the number of bouts constant but increase the duration of each bout; or participants can increase both the number of bouts and their duration. We propose a novel joint modeling framework for modeling PA bouts and their duration over time. Our joint model is comprised of two sub‐models: a mixed‐effects Poisson hurdle sub‐model for the number of bouts per day and a mixed‐effects location scale gamma regression sub‐model to characterize the duration of the bouts and their variance. The model allows us to estimate how daily PA bouts and their duration vary together over the course of an intervention and by treatment condition and is specifically designed to capture the unique distributional features of bouted PA as measured by accelerometer: frequent measurements, zero‐inflated bouts, and skewed bout durations. We apply our methods to the Make Better Choices study, a longitudinal lifestyle intervention trial to increase PA. We perform a simulation study to evaluate how well our model is able to estimate relationships between outcomes. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. A zero‐inflated endemic–epidemic model with an application to measles time series in Germany.

Author: Lu, Junyi and Meyer, Sebastian
Abstract: Count data with an excess of zeros are often encountered when modeling infectious disease occurrence. The degree of zero inflation can vary over time due to nonepidemic periods as well as by age group or region. A well‐established approach to analyze multivariate incidence time series is the endemic–epidemic modeling framework, also known as the HHH approach. However, it assumes Poisson or negative binomial distributions and is thus not tailored to surveillance data with excess zeros. Here, we propose a multivariate zero‐inflated endemic–epidemic model with random effects that extends HHH. Parameters of both the zero‐inflation probability and the HHH part of this mixture model can be estimated jointly and efficiently via (penalized) maximum likelihood inference using analytical derivatives. We found proper convergence and good coverage of confidence intervals in simulation studies. An application to measles counts in the 16 German states, 2005–2018, showed that zero inflation is more pronounced in the Eastern states characterized by a higher vaccination coverage. Probabilistic forecasts of measles cases improved when accounting for zero inflation. We anticipate zero‐inflated HHH models to be a useful extension also for other applications and provide an implementation in an R package. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

21. Predicting Flash Flood Economic Damage at the Community Scale: Empirical Zero-Inflated Model with Semicontinuous Data.

Author: Chang, Shi, Singh Wilkho, Rohan, Gharaibeh, Nasir, Lyle, Stacey, and Zou, Lei
Subjects: FLOOD damage, LEAD time (Supply chain management), PRICES, VALUATION of real property, DATA modeling
Abstract: Rainfall-induced flash floods are characterized by their rapid onset and small spatial scale. With little lead time for warning, floodwater can accumulate rapidly and its force can damage roads, swamp houses, destroy bridges, and scour out channels. Having data-driven estimates of potential economic losses from flash floods (before they occur) helps authorities make informed decisions about planning and prioritizing mitigation projects. This article provides a probabilistic predictive model to estimate flash flood economic damage at the census tract scale. To simplify model utilization and avoid strong assumptions about property value and replacement costs, the model predicts the total cost of property and infrastructure damages for individual census tracts (expressed in 2019 prices). The model was developed based on a flash flood data set for a 15-year period (2005–2019) in Texas. The data set was assembled by integrating disparate data from multiple platforms. The occurrence of economic damage was found to be a zero-inflated problem. Therefore, we developed a two-part mixed-effect model. The model first estimates the probability that economic damage will occur (zero-inflated part) and then predicts the dollar amount of the economic damage (continuous part). Utilization of the developed model was demonstrated in an application to Harris County, Texas. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

22. A Bayesian Nonparametric Analysis for Zero-Inflated Multivariate Count Data with Application to Microbiome Study

Author: Shuler, Kurtis, Verbanic, Samuel, Chen, Irene A, and Lee, Juhee
Subjects: Mathematical Sciences, Statistics, Human Genome, Genetics, Prevention, Bioengineering, Aetiology, 2.5 Research design and methodologies (aetiology), Bayesian nonparametrics, dependent Dirichlet process, high-throughput sequencing, microbiome, multivariate count, normalization, operational taxonomic unit, zero inflation, Statistics & Probability
Abstract: High-throughput sequencing technology has enabled researchers to profile microbial communities from a variety of environments, but analysis of multivariate taxon count data remains challenging. We develop a Bayesian nonparametric (BNP) regression model with zero inflation to analyse multivariate count data from microbiome studies. A BNP approach flexibly models microbial associations with covariates, such as environmental factors and clinical characteristics. The model produces estimates for probability distributions which relate microbial diversity and differential abundance to covariates, and facilitates community comparisons beyond those provided by simple statistical tests. We compare the model to simpler models and popular alternatives in simulation studies, showing, in addition to these additional community-level insights, it yields superior parameter estimates and model fit in various settings. The model's utility is demonstrated by applying it to a chronic wound microbiome data set and a Human Microbiome Project data set, where it is used to compare microbial communities present in different environments.
Published: 2021

23. Disease mapping for spatially semi‐continuous data by estimating equations with application to dengue control.

Author: Lin, Pei‐Sheng, Yu, Yih‐Jeng, and Zhu, Jun
Subjects: *DISEASE mapping, *DENGUE, *DENGUE hemorrhagic fever, *EQUATIONS, *REGRESSION analysis, *DATA analysis
Abstract: Disease mapping is a research field to estimate spatial pattern of disease risks so that areas with elevated risk levels can be identified. The motivation of this article is from a study of dengue fever infection, which causes seasonal epidemics in almost every summer in Taiwan. For analysis of zero‐inflated data with spatial correlation and covariates, current methods would either cause a computational burden or miss associations between zero and non‐zero responses. In this article, we develop estimating equations for a mixture regression model that accommodates spatial dependence and zero inflation for study of disease propagation. Asymptotic properties for the proposed estimates are established. A simulation study is conducted to evaluate performance of the mixture estimating equations; and a dengue dataset from southern Taiwan is used to illustrate the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. Regression models for count data with excess zeros: A comparison using survey data

Author: Bhaskar, Adhin, Thennarasu, K., Philip, Mariamma, and Jaisoorya, T. S.
Subjects: count data, poisson, negative binomial, zero inflation, hurdle regression, Psychology, BF1-990
Abstract: Presence of excess zeros and the distributions are major concerns in modeling count data. Zero inflated and hurdle models are regression techniques which can handle zero inflated count data. This study compares various count regression models for survey data observed with excess zeros. The data for the study were obtained from a survey conducted to assess the harms attributable to drinkers among children. Poisson, negative binomial and their zero inflated and hurdle versions were compared by fitting them to two count response variables, number of physical and number of psychological harms. The models were compared using fit indices, residual analysis and predicted values. The robustness of the models was also compared using simulated data sets. Results indicated that the Poisson regression was less robust to deviations from the distributional assumptions. The negative binomial regression and hurdle regression model were found to be suitable to model the number of physical and number of psychological harms respectively. The results showed that excess zeros in count data do not imply zero inflation. The zero inflated or hurdle models are suitable for zero inflated data. The selection between the zero inflated and hurdle models should be based on the assumed cause of zeros.
Published: 2023
Full Text: View/download PDF

25. Comparison of different count models for investigation of some environmental factors affecting stillbirth in holsteins

Author: Gevrekci, Y., Guneri, O.I., Takma, C., and Yesilova, A.
Published: 2022
Full Text: View/download PDF

26. Spatial correlated incidence modeling with zero inflation.

Author: Wang, Feifei, Li, Haofeng, Wang, Han, and Li, Yang
Abstract: Disease mapping models have been popularly used to model disease incidence with spatial correlation. In disease mapping models, zero inflation is an important issue, which often occurs in disease incidence datasets with high proportions of zero disease count. It is originated from limited survey coverage or unadvanced testing equipment, which makes some regions have no observed patients. Then excessive zeros recorded in the disease incidence dataset would mess up the true distributions of disease incidence and lead to inaccurate estimates. To address this issue, a zero‐inflated disease mapping model is developed in this work. In this model, a zero‐inflated process using Bernoulli indicators is assumed to characterize whether the zero inflation occurs for each region. For regions without zero inflation, a coherent and generative disease mapping model is applied for mapping the spatially correlated disease incidence. Independent spatial random effects are incorporated in both processes to account for the spatial patterns of zero inflation and disease incidence. External covariates are also considered in both processes to better explain the disease count data. To estimate the model, a Markov chain Monte Carlo algorithm is proposed. We evaluate model performance via a variety of simulation experiments. Finally, a Lyme disease dataset of Virginia is analyzed to illustrate the application of the proposed model. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. An alternative test for zero modification in the INAR(1) model with Poisson innovations.

Author: Huang, Jie and Zhu, Fukang
Subjects: *MARGINAL distributions, *POISSON distribution, *BINOMIAL distribution, *TIME series analysis
Abstract: Several methods have been proposed for detecting zero modification in the first-order integer-valued autoregressive (INAR(1)) process. A basic problem of these tests is that they rely upon asymptotic results. In this paper, an alternative test is introduced which makes direct use of the approximate distribution of the number of zeros, which can be described by a beta-binomial distribution. A hybrid estimator of the mean parameter of the marginal distribution of the Poisson INAR(1) process is given. A simulation study shows that power and size of the proposed test are competitive. Finally, real data examples are provided. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

28. scShapes: a statistical framework for identifying distribution shapes in single-cell RNA-sequencing data.

Author: Dharmaratne, Malindrie, Kulkarni, Ameya S, Taherian Fard, Atefeh, and Mar, Jessica C
Subjects: *GENE expression, *RNA sequencing, *PHENOTYPES, *GENETIC transcription regulation, *OPEN-ended questions
Abstract: Background: Single-cell RNA sequencing (scRNA-seq) methods have been advantageous for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. For scRNA-seq data, variability in gene expression reflects the degree of variation in gene expression from one cell to another. Analyses that focus on cell–cell variability therefore are useful for going beyond changes based on average expression and, instead, identifying genes with homogeneous expression versus those that vary widely from cell to cell. Results: We present a novel statistical framework, scShapes, for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, as single-cell data are driven by overdispersion and dropouts, moving beyond means and using distributions that can handle excess zeros is critical. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches. Conclusions: This analysis also draws attention to genes that switch distribution shapes from a unimodal distribution to a zero-inflated distribution and raises open questions about the plausible biological mechanisms that may give rise to this, such as transcriptional bursting. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html). [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

29. Modeling zero inflation is not necessary for spatial transcriptomics

Author: Peiyao Zhao, Jiaqiang Zhu, Ying Ma, and Xiang Zhou
Subjects: Spatial transcriptomics, Zero inflation, Overdispersion, Poisson model, Negative binomial model, Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Background: Spatial transcriptomics are a set of new technologies that profile gene expression on tissues with spatial localization information. With technological advances, recent spatial transcriptomics data are often in the form of sparse counts with an excessive amount of zero values. Results: We perform a comprehensive analysis on 20 spatial transcriptomics datasets collected from 11 distinct technologies to characterize the distributional properties of the expression count data and understand the statistical nature of the zero values. Across datasets, we show that a substantial fraction of genes displays overdispersion and/or zero inflation that cannot be accounted for by a Poisson model, with genes displaying overdispersion substantially overlapped with genes displaying zero inflation. In addition, we find that either the Poisson or the negative binomial model is sufficient for modeling the majority of genes across most spatial transcriptomics technologies. We further show major sources of overdispersion and zero inflation in spatial transcriptomics including gene expression heterogeneity across tissue locations and spatial distribution of cell types. In particular, when we focus on a relatively homogeneous set of tissue locations or control for cell type compositions, the number of detected overdispersed and/or zero-inflated genes is substantially reduced, and a simple Poisson model is often sufficient to fit the gene expression data there. Conclusions: Our study provides the first comprehensive evidence that excessive zeros in spatial transcriptomics are not due to zero inflation, supporting the use of count models without a zero inflation component for modeling spatial transcriptomics.
Published: 2022
Full Text: View/download PDF

30. Suicide variations between English neighbourhoods over 2017-21: The role of spatial scale.

Author: Congdon, Peter
Subjects: *SUICIDE risk factors, *RISK assessment, *SOCIOECONOMIC factors, *MENTAL illness, *POPULATION geography, *SOCIAL context, *SUICIDE, *METROPOLITAN areas, *RURAL conditions, *NEIGHBORHOOD characteristics, *REGRESSION analysis
Abstract: Geographic studies of suicide variation typically focus on predictors at the same level as the event rates, and the possible interplay between different spatial scales does not generally figure. In this paper we focus on suicide variations between 6856 small area census units in England, but against a background provided by nine regions, broad urban-rural categories, and 155 local labour markets. Suicide death totals vary considerably between the small areas, with more areas than expected having no deaths, so we apply zero inflated regression. With this framework, we consider the relative contribution of factors at higher and lower spatial scales in explaining small area suicide contrasts, and why some areas have unduly elevated or unduly low suicide rates. We find significantly lower suicide levels in English metropolitan regions, after allowing for neighbourhood influences, but considerable heterogeneity in risks within broader spatial units. Varying incidence in general is associated significantly with all observed neighbourhood risk factors (social fragmentation, socioeconomic status, mental ill-health, ethnic mix), but low fragmentation and low psychiatric morbidity are the only significant influences on unduly low incidence.
• Considers ecological suicide variations in England, considering region and neighbourhood contexts in tandem.
• Clear contrast between lower metropolitan suicide and other urban-rural categories.
• Social fragmentation and area morbidity leading influences on both overall risk and low incidence.
• Socioeconomic and ethnic mix are also major contextual influences on suicide levels.
• Identifies strong tendency to spatial clustering of risk, relevant to suicide prevention. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Inferring predator–prey interactions from camera traps: A Bayesian co‐abundance modeling approach.

Author: Amir, Zachary, Sovie, Adia, and Luskin, Matthew Scott
Subjects: *PREDATION, *TIGERS, *POISSON distribution, *TROPICAL forests, *CAMERAS, *LEOPARD, *DEER
Abstract: Predator–prey dynamics are a fundamental part of ecology, but directly studying interactions has proven difficult. The proliferation of camera trapping has enabled the collection of large datasets on wildlife, but researchers face hurdles inferring interactions from observational data. Recent advances in hierarchical co‐abundance models infer species interactions while accounting for two species' detection probabilities, shared responses to environmental covariates, and propagate uncertainty throughout the entire modeling process. However, current approaches remain unsuitable for interacting species whose natural densities differ by an order of magnitude and have contrasting detection probabilities, such as predator–prey interactions, which introduce zero inflation and overdispersion in count histories. Here, we developed a Bayesian hierarchical N‐mixture co‐abundance model that is suitable for inferring predator–prey interactions. We accounted for excessive zeros in count histories using an informed zero‐inflated Poisson distribution in the abundance formula and accounted for overdispersion in count histories by including a random effect per sampling unit and sampling occasion in the detection probability formula. We demonstrate that models with these modifications outperform alternative approaches, improve model goodness‐of‐fit, and overcome parameter convergence failures. We highlight its utility using 20 camera trapping datasets from 10 tropical forest landscapes in Southeast Asia and estimate four predator–prey relationships between tigers, clouded leopards, and muntjac and sambar deer. Tigers had a negative effect on muntjac abundance, providing support for top‐down regulation, while clouded leopards had a positive effect on muntjac and sambar deer, likely driven by shared responses to unmodelled covariates like hunting. This Bayesian co‐abundance modeling approach to quantify predator–prey relationships is widely applicable across species, ecosystems, and sampling approaches and may be useful in forecasting cascading impacts following widespread predator declines. Taken together, this approach facilitates a nuanced and mechanistic understanding of food‐web ecology. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

32. Joint modeling of zero‐inflated longitudinal proportions and time‐to‐event data with application to a gut microbiome study.

Author: Hu, Jiyuan, Wang, Chan, Blaser, Martin J., and Li, Huilin
Subjects: *GUT microbiome, *HUMAN microbiota, *LONGITUDINAL method, *DATA analysis, *SURVIVAL rate, *POISSON regression
Abstract: Recent studies have suggested that the temporal dynamics of the human microbiome may have associations with human health and disease. An increasing number of longitudinal microbiome studies, which record time to disease onset, aim to identify candidate microbes as biomarkers for prognosis. Owing to the ultra‐skewness and sparsity of microbiome proportion (relative abundance) data, directly applying traditional statistical methods may result in substantial power loss or spurious inferences. We propose a novel joint modeling framework [JointMM], which is comprised of two sub‐models: a longitudinal sub‐model called zero‐inflated scaled‐beta generalized linear mixed‐effects regression to depict the temporal structure of microbial proportions among subjects; and a survival sub‐model to characterize the occurrence of an event and its relationship with the longitudinal microbiome proportions. JointMM is specifically designed to handle the zero‐inflated and highly skewed longitudinal microbial proportion data and examine whether the temporal pattern of microbial presence and/or the nonzero microbial proportions are associated with differences in the time to an event. The longitudinal sub‐model of JointMM also provides the capacity to investigate how the (time‐varying) covariates are related to the temporal microbial presence/absence patterns and/or the changing trend in nonzero proportions. Comprehensive simulations and real data analyses are used to assess the statistical efficiency and interpretability of JointMM. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

33. Symptom Presence and Symptom Severity as Unique Indicators of Psychopathology: An Application of Multidimensional Zero-Inflated and Hurdle Graded Response Models.

Author: Magnus, Brooke E. and Liu, Yang
Subjects: *MATHEMATICAL models, *SEVERITY of illness index, *PATHOLOGICAL psychology, *THEORY
Abstract: Questionnaires inquiring about psychopathology symptoms often produce data with excess zeros or the equivalent (e.g., none, never, and not at all). This type of zero inflation is especially common in nonclinical samples in which many people do not exhibit psychopathology, and if unaccounted for, can result in biased parameter estimates when fitting latent variable models. In the present research, we adopt a maximum likelihood approach in fitting multidimensional zero-inflated and hurdle graded response models to data from a psychological distress measure. These models include two latent variables: susceptibility, which relates to the probability of endorsing the symptom at all, and severity, which relates to the frequency of the symptom, given its presence. After estimating model parameters, we compute susceptibility and severity scale scores and include them as explanatory variables in modeling health-related criterion measures (e.g., suicide attempts, diagnosis of major depressive disorder). Results indicate that susceptibility and severity uniquely and differentially predict other health outcomes, which suggests that symptom presence and symptom severity are unique indicators of psychopathology and both may be clinically useful. Psychometric and clinical implications are discussed, including scale score reliability. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

34. Analysis of Longitudinal Binomial Data with Positive Association between the Number of Successes and the Number of Failures: An Application to Stock Instability Study.

Author: Zhang, Xiaolei, Yan, Guohua, Ma, Renjun, and Li, Jiaxiu
Subjects: *RANDOM numbers, *SUCCESS, *BIVARIATE analysis
Abstract: Numerous methods have been developed for longitudinal binomial data in the literature. These traditional methods are reasonable for longitudinal binomial data with a negative association between the number of successes and the number of failures over time; however, a positive association may occur between the number of successes and the number of failures over time in some behaviour, economic, disease aggregation and toxicological studies as the numbers of trials are often random. In this paper, we propose a joint Poisson mixed modelling approach to longitudinal binomial data with a positive association between longitudinal counts of successes and longitudinal counts of failures. This approach can accommodate both a random and zero number of trials. It can also accommodate overdispersion and zero inflation in the number of successes and the number of failures. An optimal estimation method for our model has been developed using the orthodox best linear unbiased predictors. Our approach not only provides robust inference against misspecified random effects distributions, but also consolidates the subject-specific and population-averaged inferences. The usefulness of our approach is illustrated with an analysis of quarterly bivariate count data of stock daily limit-ups and limit-downs. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

35. A deep neural network to de-noise single-cell RNA sequencing data.

Author: Sharifitabar M, Kazempour S, Razavian J, Sajedi S, Solhjoo S, and Zare H
Abstract: Single-cell RNA sequencing (scRNA-seq), a powerful technique for investigating the transcriptome of individual cells, enables the discovery of heterogeneous cell populations, rare cell types, and transcriptional dynamics in separate cells. Yet, scRNA-seq data analysis is limited by the problem of measurement dropouts, i.e., genes displaying zero expression levels. We introduce ZiPo, a deep artificial neural network for rate estimation and library size prediction in scRNA-seq data which incorporates adjustable zero inflation in the distribution to capture the dropouts. ZiPo builds upon established concepts, including using deep autoencoders and adopting the Poisson and negative binomial distributions, by taking advantage of novel strategies, including library size prediction and residual connections, to improve the overall performance. A significant innovation of ZiPo is the introduction of a scale-invariant loss term, making the weights sparse and, hence, the model biologically more interpretable. ZiPo quickly handles vast singular and mixed datasets, with the processing time directly proportional to the number of cells. In this paper, we demonstrate the power of ZiPo on three datasets and show its advantages over other current techniques. The code used to produce the results in this manuscript is available at https://bitbucket.org/habilzare/alzheimer/src/master/code/deep/ZiPo/. Competing Interests: The authors have declared that no competing interests exist.
Published: 2024
Full Text: View/download PDF

36. Review and revamp of compositional data transformation: A new framework combining proportion conversion and contrast transformation.

Author: Zhang Y, Schluter J, Zhang L, Cao X, Jenq RR, Feng H, Haines J, and Zhang L
Abstract: Due to the development of next-generation sequencing technology and an increased appreciation of their role in modulating host immunity and their potential as therapeutic agents, the human microbiome has emerged as a key area of interest in various biological investigations of human health and disease. However, microbiome data present a number of statistical challenges not addressed by existing methods, such as the varying sequencing depth, the compositionality, and zero inflation. Solutions like scaling and transformation methods help to mitigate heterogeneity and release constraints, but often introduce biases and yield inconsistent results on the same data. To address these issues, we conduct a systematic review of compositional data transformation, with a particular focus on the connection and distinction of existing techniques. Additionally, we create a new framework that enables the development of new transformations by combining proportion conversion with contrast transformations. This framework includes well-known methods such as Additive Log Ratio (ALR) and Centered Log Ratio (CLR) as special cases. Using this framework, we develop two novel transformations, Centered Arcsine Contrast (CAC) and Additive Arcsine Contrast (AAC), which show enhanced performance in scenarios with high zero-inflation. Moreover, our findings suggest that ALR and CLR transformations are more effective when zero values are less prevalent. This comprehensive review and the innovative framework provide microbiome researchers with a significant direction to enhance data transformation procedures and improve analytical outcomes. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (© 2024 The Authors.)
Published: 2024
Full Text: View/download PDF

37. Copula Markov Models for Count Series with Excess Zeros

Author: Sun, Li-Hsien, Huang, Xin-Wei, Alqawba, Mohammed S., Kim, Jong-Min, and Emura, Takeshi
Published: 2020
Full Text: View/download PDF

38. A Spatiotemporal Analytical Outlook of the Exposure to Air Pollution and COVID-19 Mortality in the USA.

Author: Chakraborty, Sounak, Dey, Tanujit, Jun, Yoonbae, Lim, Chae Young, Mukherjee, Anish, and Dominici, Francesca
Subjects: *COVID-19, *SARS-CoV-2, *AIR pollution, *COVID-19 pandemic, *RANDOM effects model, *PANDEMICS
Abstract: The world is experiencing a pandemic due to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), also known as COVID-19. The USA is also suffering from a catastrophic death toll from COVID-19. Several studies are providing preliminary evidence that short- and long-term exposure to air pollution might increase the severity of COVID-19 outcomes, including a higher risk of death. In this study, we develop a spatiotemporal model to estimate the association between exposure to fine particulate matter PM2.5 and mortality accounting for several social and environmental factors. More specifically, we implement a Bayesian zero-inflated negative binomial regression model with random effects that vary in time and space. Our goal is to estimate the association between air pollution and mortality accounting for the spatiotemporal variability that remained unexplained by the measured confounders. We applied our model to four regions of the USA with weekly data available for each county within each region. We analyze the data separately for each region because each region shows a different disease spread pattern. We found a positive association between long-term exposure to PM2.5 and the mortality from the COVID-19 disease for all four regions with three of four being statistically significant. Data and code are available at our GitHub repository. Supplementary materials accompanying this paper appear on-line. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

39. A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions.

Author: Yang, Lu and Chen, Jun
Subjects: FALSE positive error, STATISTICAL power analysis, FALSE discovery rate
Abstract: Background: Differential abundance analysis (DAA) is one central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Numerous DAA tools have been proposed in the past decade addressing the special characteristics of microbiome data such as zero inflation and compositional effects. Disturbingly, different DAA tools could sometimes produce quite discordant results, opening to the possibility of cherry-picking the tool in favor of one's own hypothesis. To recommend the best DAA tool or practice to the field, a comprehensive evaluation, which covers as many biologically relevant scenarios as possible, is critically needed. Results: We performed by far the most comprehensive evaluation of existing DAA tools using real data-based simulations. We found that DAA methods explicitly addressing compositional effects such as ANCOM-BC, Aldex2, metagenomeSeq (fitFeatureModel), and DACOMP did have improved performance in false-positive control. But they are still not optimal: type 1 error inflation or low statistical power has been observed in many settings. The recent LDM method generally had the best power, but its false-positive control in the presence of strong compositional effects was not satisfactory. Overall, none of the evaluated methods is simultaneously robust, powerful, and flexible, which makes the selection of the best DAA tool difficult. To meet the analysis needs, we designed an optimized procedure, ZicoSeq, drawing on the strength of the existing DAA methods. We show that ZicoSeq generally controlled for false positives across settings, and the power was among the highest. Application of DAA methods to a large collection of real datasets revealed a similar pattern observed in simulation studies. 
Conclusions: Based on the benchmarking study, we conclude that none of the existing DAA methods evaluated can be applied blindly to any real microbiome dataset. The applicability of an existing DAA method depends on specific settings, which are usually unknown a priori. To circumvent the difficulty of selecting the best DAA tool in practice, we design ZicoSeq, which addresses the major challenges in DAA and remedies the drawbacks of existing DAA methods. ZicoSeq can be applied to microbiome datasets from diverse settings and is a useful DAA tool for robust microbiome biomarker discovery. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

40. Modeling rounded counts using a zero-inflated mixture of power series family of distributions.

Author: Mirkamali, Sayed Jamal
Subjects: *POWER series, *POISSON regression, *EXPECTATION-maximization algorithms, *SMOKING statistics, *SMOKING, *REGRESSION analysis, *COUNTING
Abstract: This paper proposes an extension of zero-inflated models for analyzing rounded counting outcomes. A zero-inflated mixture of power series is proposed, and the EM algorithm is developed to estimate parameters. The accuracy of estimators is evaluated using a simulation study. The results of simulations show that the estimation procedure is successful and estimates are accurate. An application of our models for analyzing the number of cigarettes smoked per day of respondents for the American's Changing Lives study is enclosed. The proposed model best fits the data and the relationships between rounded counts and other covariates revealed by proposed regression models. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

41. Modeling dynamic correlation in zero‐inflated bivariate count data with applications to single‐cell RNA sequencing data.

Author: Yang, Zhen and Ho, Yen‐Yi
Subjects: *RNA sequencing, *BIOMOLECULES, *NUCLEOTIDE sequencing, *DYNAMIC models, *LATENT variables, *CONSUMER price indexes, *INTERNET servers
Abstract: Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene coexpression patterns could often be observed. The advancements in next‐generation sequencing technologies bring new statistical challenges for studying these dynamic changes of gene coexpression. In recent years, methods have been developed to examine genomic information from individual cells. Single‐cell RNA sequencing (scRNA‐seq) data are count‐based, and often exhibit characteristics such as overdispersion and zero inflation. To explore the dynamic dependence structure in scRNA‐seq data and other zero‐inflated count data, new approaches are needed. In this paper, we consider overdispersion and zero inflation in count outcomes and propose a ZEro‐inflated negative binomial dynamic COrrelation model (ZENCO). The observed count data are modeled as a mixture of two components: success amplifications and dropout events in ZENCO. A latent variable is incorporated into ZENCO to model the covariate‐dependent correlation structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA‐seq data from a study of minimal residual disease in melanoma. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

42. Alternative Splicing of Neuropeptide Prohormone and Receptor Genes Associated with Pain Sensitivity Was Detected with Zero-Inflated Models.

Author: Southey, Bruce R. and Rodriguez-Zas, Sandra L.
Subjects: ALTERNATIVE RNA splicing, GENETIC engineering, GENES, NUCLEUS accumbens, CELLULAR signal transduction, PAIN medicine, SPREADING cortical depression
Abstract: Migraine is often accompanied by exacerbated sensitivity to stimuli and pain associated with alternative splicing of genes in signaling pathways. Complementary analyses of alternative splicing of neuropeptide prohormone and receptor genes involved in cell–cell communication in the trigeminal ganglia and nucleus accumbens regions of mice presenting nitroglycerin-elicited hypersensitivity and control mice were conducted. De novo sequence assembly detected 540 isoforms from 168 neuropeptide prohormone and receptor genes. A zero-inflated negative binomial model that accommodates for potential excess of zero isoform counts enabled the detection of 27, 202, and 12 differentially expressed isoforms associated with hypersensitivity, regions, and the interaction between hypersensitivity and regions, respectively. Skipped exons and alternative 3′ splice sites were the most frequent splicing events detected in the genes studied. Significant differential splicing associated with hypersensitivity was identified in CALCA and VGF neuropeptide prohormone genes and ADCYAP1R1, CRHR2, and IGF1R neuropeptide receptor genes. The prevalent region effect on differential isoform levels (202 isoforms) and alternative splicing (82 events) were consistent with the distinct splicing known to differentiate central nervous structures. Our findings highlight the changes in alternative splicing in neuropeptide prohormone and receptor genes associated with hypersensitivity to pain and the necessity to target isoform profiles for enhanced understanding and treatment of associated disorders such as migraine. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

43. Modified Poisson estimators for grouped and right-censored counts.

Author: Wang, Chendi
Subjects: *MAXIMUM likelihood statistics, *POISSON regression, *FISHER information, *REGRESSION analysis
Abstract: Grouped and right-censored (GRC) count data are widely adopted to study some sensitive topics or to collect information from less cognitive respondents in many research fields, such as psychology, sociology, and criminology. However, theoretical analysis of GRC counts is involved due to the co-existence of grouping schemes and right-censoring schemes. Recently, a modified Poisson regression model has been proposed to analyze GRC count data under the framework of maximum likelihood estimation. In this paper, I study the asymptotic properties of the maximum likelihood estimators of GRC counts that can cover the modified Poisson estimator. Existing results on modified Poisson estimators for GRC counts are only applicable to stochastic regressors with strictly positive definite Fisher information matrices. Results in this paper are derived under a milder condition that the information matrix of observations is divergent, which can cover the results for the stochastic case in the almost sure sense. Real data simulations are provided to investigate drug use in America. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

44. On the analysis of a discrete-time risk model with INAR(1) processes.

Author: Guan, Guohui and Hu, Xiang
Subjects: *RISK assessment, *PRICE inflation
Abstract: This paper considers an extension of the classical discrete-time risk model for which an INAR(1) process is utilized to model a temporal dependence between the number of claims. We apply a recursive method for deriving the Laplace transform of the aggregate claims with or without discounting in this framework. This methodology is implemented for the class of INAR(1) processes with an arbitrary innovations' distribution. Three risk models via specific INAR(1) processes are studied when the distribution of the individual claim sizes belongs to the class of mixed Erlang distributions. These different models allow us to discuss the frequent manifestations of equidispersion, overdispersion and zero inflation, and to evaluate the distribution of the (discounted) aggregate claims. Numerical examples are performed in order to illustrate the results obtained in this paper. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

45. Solving sampling bias problems in presence–absence or presence‐only species data using zero‐inflated models.

Author: Nolan, Victoria, Gilbert, Francis, and Reader, Tom
Subjects: *SPECIES distribution, *NATURE reserves, *SPECIES pools, *CITIZEN science, *GRID cells
Abstract: Aim: Large databases of species records such as those generated through citizen science projects, archives or museum collections are being used with increasing frequency in species distribution modelling (SDM) for conservation and land management. Despite the broad spatial and temporal coverage of the data, its application is often limited by the issue of sampling bias and consequently, zero inflation; there are more zeros (which are potentially 'false absences') in the data than expected. Here, we demonstrate how pooling species presence data into a 'pseudo‐abundance' count can allow identification and removal of sampling bias through the use of zero‐inflated (ZI) models, and thus solve a common SDM problem. Location: All locations Taxon: All taxa Methods: We present the results of a series of simulations based on hypothetical ecological scenarios of data collection using random and non‐random sampling strategies. Our simulations assume that the locations of occurrence records are known at a high spatial resolution, but that the absence of occurrence records may reflect under‐sampling. To simulate pooling of presence–absence or presence‐only data, we count occurrence records at intermediate and coarse spatial resolutions, and use ZI models to predict the counts (species abundance per grid cell) from environmental layers. Results: ZI models can successfully identify predictors of bias in species data and produce abundance prediction maps that are free from that bias. This phenomenon holds across multiple spatial scales, thereby presenting an advantage over presence‐only SDM methods such as binomial GLMs or MaxEnt, where information about species density is lost, and model performance declines at coarser scales. Main Conclusions: Our results highlight the value of converting presence–absence or presence‐only species data to 'pseudo‐abundance' and using ZI models to address the problem of sampling bias. 
This method has huge potential for ecological researchers when using large species datasets for research and conservation. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

46. Two-part joint model for a longitudinal semicontinuous marker and a terminal event with application to metastatic colorectal cancer data.

Author: Rustand, Denis, Briollais, Laurent, Tournigand, Christophe, and Rondeau, Virginie
Subjects: *COLORECTAL cancer, *METASTASIS, *PROPORTIONAL hazards models, *SURVIVAL analysis (Biometry), *PROGRESSION-free survival, *SURVIVAL rate, *COMPUTER simulation, *RESEARCH, *RESEARCH methodology, *EVALUATION research, *COMPARATIVE studies, *STATISTICAL models, *LONGITUDINAL method
Abstract: Joint models for a longitudinal biomarker and a terminal event have gained interests for evaluating cancer clinical trials because the tumor evolution reflects directly the state of the disease. A biomarker characterizing the tumor size evolution over time can be highly informative for assessing treatment options and could be taken into account in addition to the survival time. The biomarker often has a semicontinuous distribution, i.e., it is zero inflated and right skewed. An appropriate model is needed for the longitudinal biomarker as well as an association structure with the survival outcome. In this article, we propose a joint model for a longitudinal semicontinuous biomarker and a survival time. The semicontinuous nature of the longitudinal biomarker is specified by a two-part model, which splits its distribution into a binary outcome (first part) represented by the positive versus zero values and a continuous outcome (second part) with the positive values only. Survival times are modeled with a proportional hazards model for which we propose three association structures with the biomarker. Our simulation studies show some bias can arise in the parameter estimates when the semicontinuous nature of the biomarker is ignored, assuming the true model is a two-part model. An application to advanced metastatic colorectal cancer data from the GERCOR study is performed where our two-part model is compared to one-part joint models. Our results show that treatment arm B (FOLFOX6/FOLFIRI) is associated to higher SLD values over time and its positive association with the terminal event leads to an increased risk of death compared to treatment arm A (FOLFIRI/FOLFOX6). [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

47. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

Author: Kwangbom Choi, Yang Chen, Daniel A. Skelly, and Gary A. Churchill
Subjects: Single-cell RNA sequencing, Zero inflation, Bayesian model selection, Cell heterogeneity, Gene expression stochasticity, Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. Results: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Conclusions: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis.
Published: 2020
Full Text: View/download PDF

48. A Flexible Model for Time Series of Counts with Overdispersion or Underdispersion, Zero-Inflation and Heavy-Tailedness

Author: Qian, Lianyong and Zhu, Fukang
Published: 2023
Full Text: View/download PDF

49. Co-occurring lotic crayfishes exhibit variable long-term responses to extreme-flow events and temperature.

Author: Dunn, Corey G., Moore, Michael J., Sievert, Nicholas A., Paukert, Craig P., and DiStefano, Robert J.
Subjects: *CRAYFISH, *ATMOSPHERIC temperature, *AIR flow, *LIFE history theory, *TEMPERATURE, *HABITAT selection
Abstract: Crayfish serve critical roles in aquatic ecosystems as engineers, omnivores, and prey. It is unclear how increasingly frequent extreme-flow events and warming air temperatures will affect crayfish populations, partly because there are few long-term crayfish monitoring datasets. Using a unique 10-y dataset, we asked 1) whether recruitment of crayfishes in summer responded to extreme-flow events and air temperature during spring brooding and summer growing periods and 2) whether responses were similar among 3 co-occurring crayfish species. Golden (Faxonius luteus [Creaser, 1933]), Ozark (Faxonius ozarkae [Williams, 1952]), and Spothand (Faxonius punctimanus [Creaser, 1933]) crayfishes were sampled in quadrats at 2 sites each in the Big Piney (1993–2000) and Jacks Fork (1992–2001) rivers (Missouri, USA; n = 3355 1-m² quadrats). We used zero-inflated generalized linear models to relate variability in quadrat-level age-0 counts to mean daily maximum air temperatures and flow metrics (variability, magnitude, and frequency of extreme high- and low-flow events). Species ranged from a small-bodied, abundant habitat generalist (Golden Crayfish) to large-bodied, uncommon habitat specialists (Ozark and Spothand crayfishes). Golden Crayfish occurred in higher-velocity habitats (riffles, runs) and had variable recruitment that increased during years with few spring and summer high-flow events and summers with lower flows and warmer temperatures. In contrast, annual recruitment variability of Ozark and Spothand crayfishes was low and explained by positive effects of cooler summers and by different flow metrics. Spothand Crayfish recruitment decreased in years with frequent spring and summer high-flow events, whereas lower summer minimum flow was the only flow metric that explained slight increases in Ozark Crayfish recruitment. 
Relationships with the preceding year's recruitment were quadratic for Ozark and Spothand crayfishes, suggesting potential density dependence at higher recruitment levels. Species-specific responses suggest that closely related crayfishes could respond idiosyncratically to changes in temperature and flow. Temperature- and flow-related disturbances may be key mechanisms mediating competition and, thus, may help maintain crayfish diversity. However, warming air temperatures and increasingly frequent extreme-flow events could disadvantage some species, thereby altering future crayfish assemblages. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

50. Conway–Maxwell–Poisson regression models for dispersed count data.

Author: Sellers, Kimberly F. and Premeaux, Bailey
Subjects: *REGRESSION analysis, *LONGITUDINAL method
Abstract: While Poisson regression serves as a standard tool for modeling the association between a count response variable and explanatory variables, it is well‐documented that this approach is limited by the Poisson model's assumption of data equi‐dispersion. The Conway–Maxwell–Poisson (COM‐Poisson) distribution has demonstrated itself as a viable alternative for real count data that express data over‐ or under‐dispersion, and thus the COM‐Poisson regression can flexibly model associations involving a discrete count response variable and covariates. This work overviews the ongoing developmental knowledge and advancement of COM‐Poisson regression, introducing the reader to the underlying model (and its considered reparametrizations) and related regression constructs, including zero‐inflated models, and longitudinal studies. This manuscript further introduces readers to associated computing tools available to perform COM‐Poisson and related regressions. This article is categorized under: Statistical Models > Linear Models; Statistical Models > Generalized Linear Models. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
