476 results for "zero-inflation"
Search Results
252. countfitteR: efficient selection of count distributions to assess DNA damage.
- Author
-
Chilimoniuk J, Gosiewska A, Słowik J, Weiss R, Deckert PM, Rödiger S, and Burdukiewicz M
- Abstract
Background: DNA double-strand breaks can be counted as discrete foci by imaging techniques. In personalized medicine and pharmacology, the analysis of counting data is relevant for numerous applications, e.g., for cancer and aging research and the evaluation of drug efficacy. By default, it is assumed to follow the Poisson distribution. This assumption, however, may lead to biased results and faulty conclusions in datasets with excess zero values (zero-inflation), a variance larger than the mean (overdispersion), or both. In such cases, the assumption of a Poisson distribution would skew the estimation of mean and variance, and other models like the negative binomial (NB), zero-inflated Poisson or zero-inflated NB distributions should be employed. The model chosen has an influence on the parameter estimation (mean value and confidence interval). Yet the choice of the suitable distribution model is not trivial., Methods: To support, simplify and objectify this process, we have developed the countfitteR software as an R package. We used a Bayesian approach for distribution model selection and the shiny web application framework for interactive data analysis., Results: We show the application of our software based on examples of DNA double-strand break count data from phenotypic imaging by multiplex fluorescence microscopy. In analyzing numerous datasets of molecular pharmacological markers (phosphorylated histone H2AX and p53 binding protein), countfitteR demonstrated an equal or superior statistical performance compared to the usually employed two-step procedure, with an overall power of up to 98%. In addition, it still gave information in cases with no result at all from the two-step procedure. In our data sample we found that the NB distribution was the most frequent, with the Poisson distribution taking second place., Conclusions: countfitteR can perform an automated distribution model selection and thus support the data analysis and lead to objective statistically verifiable estimated values. Originally designed for the analysis of foci in biomedical image data, countfitteR can be used in a variety of areas where non-Poisson distributed counting data is prevalent., Competing Interests: Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm-20-6363). SR reports grants from Gesundheitscampus Brandenburg - Konsequenzen der altersassoziierten Zell - und Organfunktionen, grants from Initiative of the Brandenburgian Ministry of Science, Research and Culture (MWFK) during the conduct of the study; JC reports scholarship from Deutscher Akademischer Austauschdienst/German Academic Exchange Service (DAAD) during the conduct of part of the study. The other authors have no conflicts of interest to declare., (2021 Annals of Translational Medicine. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
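As a rough illustration of the model-selection problem addressed by countfitteR in result 252 (not the package's own Bayesian procedure), the Python sketch below fits Poisson and zero-inflated Poisson models to simulated foci-like counts by maximum likelihood and compares them with BIC. All data, parameter values and names are invented for the example.

```python
# Hypothetical sketch: compare Poisson vs. zero-inflated Poisson (ZIP) by BIC.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
# simulate foci-like counts: 30% structural zeros, Poisson(2.5) otherwise
y = np.where(rng.random(500) < 0.3, 0, rng.poisson(2.5, 500))

def zip_negloglik(params, y):
    """Negative log-likelihood of a ZIP(pi, lam) model, parameterized on the real line."""
    logit_pi, log_lam = params
    pi = 1.0 / (1.0 + np.exp(-logit_pi))
    lam = np.exp(log_lam)
    n0 = np.sum(y == 0)
    ll_zero = n0 * np.log(pi + (1.0 - pi) * np.exp(-lam))
    ll_pos = np.sum(np.log(1.0 - pi) + stats.poisson.logpmf(y[y > 0], lam))
    return -(ll_zero + ll_pos)

# Poisson: the MLE of the mean is the sample mean (one parameter)
ll_pois = stats.poisson.logpmf(y, y.mean()).sum()
bic_pois = -2.0 * ll_pois + 1.0 * np.log(y.size)

# ZIP: maximize the likelihood numerically (two parameters)
fit = optimize.minimize(zip_negloglik, x0=[0.0, np.log(y.mean() + 0.1)], args=(y,))
bic_zip = 2.0 * fit.fun + 2.0 * np.log(y.size)

print(f"BIC Poisson = {bic_pois:.1f}, BIC ZIP = {bic_zip:.1f} (lower is better)")
```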
253. A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data.
- Author
-
Kong Y, Kozik A, Nakatsu CH, Jones-Hall YL, and Chun H
- Subjects
- Algorithms, Bias, Computer Simulation, Microbiota genetics, Models, Statistical
- Abstract
A latent factor model for count data is widely used to deconvolute mixed signals in biological data, as exemplified by sequencing data in transcriptome or microbiome studies. When pure samples such as single-cell transcriptome data are available, the accuracy of the estimates can be much improved. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule (a sketch of the classical multiplicative updates follows this record). In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF., (© 2021 Walter de Gruyter GmbH, Berlin/Boston.)
- Published
- 2021
- Full Text
- View/download PDF
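For context on result 253, the sketch below shows the classical multiplicative updates for KL-divergence NMF (Lee and Seung); the paper's zero-inflated variant modifies these updates, and its exact update rule is not reproduced here. The data, factor sizes and iteration count are arbitrary.

```python
# Classical multiplicative-update NMF under a Poisson/KL loss (not the ZI-NMF rule).
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(50, 40)).astype(float)   # mixed count signals
k = 4                                                # number of latent factors
W = rng.random((50, k)) + 0.1
H = rng.random((k, 40)) + 0.1
eps = 1e-10

for _ in range(200):
    # alternate updates of H and W so that W @ H approximates X
    WH = W @ H + eps
    H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
    WH = W @ H + eps
    W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)

# generalized KL divergence between X and the reconstruction
resid = np.sum(X * np.log((X + eps) / (W @ H + eps)) - X + W @ H)
print("final KL residual:", round(float(resid), 2))
```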
254. The Log-Normal zero-inflated cure regression model for labor time in an African obstetric population.
- Author
-
Cavenague de Souza HC, Louzada F, de Oliveira MR, Fawole B, Akintan A, Oyeneyin L, Sanni W, and da Silva Castro Perdoná G
- Abstract
In obstetrics and gynecology, knowledge about how women's features are associated with childbirth is important. Such knowledge supports the establishment of guidelines and can help managers describe the dynamics of pregnant women's hospital stays. Labor time is therefore a variable of great importance and can be described by survival models. An issue that should be considered in the modeling is the inclusion of women for whom the duration of labor cannot be observed due to fetal death, generating a proportion of times equal to zero. Additionally, another proportion of women's time may be censored due to some intervention. The aim of this paper was to present the Log-Normal zero-inflated cure regression model and to evaluate likelihood-based parameter estimation by a simulation study. In general, the inference procedures showed a better performance for larger samples and low proportions of zero inflation and cure. To exemplify how this model can be an important tool for investigating the course of the childbirth process, we considered the Better Outcomes in Labor Difficulty project dataset and showed that parity and educational level are associated with the main outcomes. We acknowledge the World Health Organization for granting us permission to use the dataset., Competing Interests: No potential conflict of interest was reported by the author(s)., (© 2021 Informa UK Limited, trading as Taylor & Francis Group.)
- Published
- 2021
- Full Text
- View/download PDF
255. The Truth behind the Zeros: A New Approach to Principal Component Analysis of the Neuropsychiatric Inventory.
- Author
-
Hellton KH, Cummings J, Vik-Mo AO, Nordrehaug JE, Aarsland D, Selbaek G, and Giil LM
- Subjects
- Affect, Aggression, Anxiety, Humans, Neuropsychological Tests, Principal Component Analysis, Dementia diagnosis
- Abstract
Psychiatric syndromes in dementia are often derived from the Neuropsychiatric Inventory (NPI) using principal component analysis (PCA). The validity of this statistical approach can be questioned, since the excessive proportion of zeros and skewness of NPI items may distort the estimated relations between the items. We propose a novel version of PCA, ZIBP-PCA, where a zero-inflated bivariate Poisson (ZIBP) distribution models the pairwise covariance between the NPI items. We compared the performance of the method to classical PCA under zero-inflation using simulations, and in two dementia-cohorts (N = 830, N = 1349). Simulations showed that component loadings from PCA were biased due to zero-inflation, while the loadings of ZIBP-PCA remained unaffected. ZIBP-PCA obtained a simpler component structure of "psychosis," "mood" and "agitation" in both dementia-cohorts, compared to PCA. The principal components from ZIBP-PCA had component loadings as follows: First, the component interpreted as "psychosis" was loaded by the items delusions and hallucinations. Second, the "mood" component was loaded by depression and anxiety. Finally, the "agitation" component was loaded by irritability and aggression. In conclusion, PCA is not equipped to handle zero-inflation. Using the NPI, PCA fails to identify components with a valid interpretation, while ZIBP-PCA estimates simple and interpretable components to characterize the psychopathology of dementia.
- Published
- 2021
- Full Text
- View/download PDF
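A toy simulation of the phenomenon motivating result 255: independently zero-inflating two correlated count items attenuates their sample correlation, which is what distorts classical PCA loadings on NPI-style items. The ZIBP-PCA method itself is not shown; all numbers are illustrative.

```python
# Illustration only: excess zeros shrink the correlation that PCA relies on.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
shared = rng.poisson(2.0, n)                 # shared severity component
item1 = shared + rng.poisson(1.0, n)         # two correlated NPI-like count items
item2 = shared + rng.poisson(1.0, n)
print("correlation without zero-inflation:", round(np.corrcoef(item1, item2)[0, 1], 2))

# zero-inflate each item independently (e.g. the symptom is absent in many patients)
zi1 = np.where(rng.random(n) < 0.6, 0, item1)
zi2 = np.where(rng.random(n) < 0.6, 0, item2)
print("correlation with zero-inflation:   ", round(np.corrcoef(zi1, zi2)[0, 1], 2))
```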
256. Analysis of dental caries using generalized linear and count regression models
- Author
-
Javali M. Phil and Parameshwar V. Pandit
- Subjects
over-dispersion, ZIP and ZINB regression models, zero-inflation, DMFT Index data, GLM, Poisson, Negative Binomial, lcsh:Statistics, lcsh:HA1-4737
- Abstract
Generalized linear models (GLM) are a generalization of linear regression models that allow regression models to be fitted to response data following a general exponential family, and they are widely used across the sciences, especially the medical and dental sciences. They form a flexible class of models that can accommodate many types of response variable. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been used to handle dental caries count data with many zeros. We present a framework for evaluating the suitability of the GLM, Poisson, NB, ZIP and ZINB models for a dental caries data set in which the count data may exhibit many zeros and over-dispersion; a worked comparison in the same spirit follows this record. Estimation of the model parameters by the method of maximum likelihood is provided. Based on the Vuong test statistic and goodness-of-fit measures for the dental caries data, the NB and ZINB regression models perform better than the other count regression models.
- Published
- 2013
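The Vuong statistic mentioned in result 256 for comparing non-nested count models can be computed directly from per-observation log-likelihoods, as in this hedged sketch. For brevity the ZIP parameters in the toy example are set to the simulation truth rather than estimated; all values are illustrative.

```python
# Sketch of the Vuong statistic (no AIC/BIC correction) comparing ZIP vs. Poisson.
import numpy as np
from scipy import stats

def vuong(loglik_contrib_1, loglik_contrib_2):
    """Vuong statistic from per-observation log-likelihoods of two non-nested models."""
    m = loglik_contrib_1 - loglik_contrib_2
    n = len(m)
    v = np.sqrt(n) * m.mean() / m.std(ddof=1)
    # large positive v favours model 1, large negative favours model 2
    return v, 2 * stats.norm.sf(abs(v))

rng = np.random.default_rng(3)
# simulated DMFT-like counts with 40% structural zeros
y = np.where(rng.random(400) < 0.4, 0, rng.poisson(3.0, 400))

ll_pois = stats.poisson.logpmf(y, y.mean())                  # Poisson contributions

pi, lam_zip = 0.4, 3.0                                       # ZIP parameters fixed at the simulation truth
p0 = pi + (1 - pi) * np.exp(-lam_zip)
ll_zip = np.where(y == 0, np.log(p0),
                  np.log(1 - pi) + stats.poisson.logpmf(y, lam_zip))

v, pval = vuong(ll_zip, ll_pois)
print(f"Vuong statistic {v:.2f}, p-value {pval:.3g} (positive favours ZIP)")
```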
257. Joint Models for Spatial and Spatio-Temporal Point Processes
- Author
-
Albert-Green, Alisha
- Subjects
marked point processes, Statistics and Probability, zero-inflation, disease mapping, conditional autoregressive models, joint modelling, spatio-temporal point processes
- Abstract
In biostatistics and environmetrics, interest often centres around the development of models and methods for making inference on observed point patterns assumed to be generated by latent spatial or spatio-temporal processes. Such analyses, however, are challenging as these data are typically hierarchical with complex correlation structures. In instances where data are spatially aggregated by reporting region and rates are low, further complications may result from zero-inflation. In this research, motivated by the analysis of spatio-temporal storm cell data, we generalize the Neyman-Scott parent-child process to account for hierarchical clustering. This is accomplished by allowing the parents to follow a log-Gaussian Cox process thereby incorporating correlation and facilitating inference at all levels of the hierarchy. A primary focus for these data is to jointly model storm cell detection and trajectories. To do so, storm cell duration, speed and direction are included in a marked point process framework. The thesis also proposes a general approach for the joint modelling of multivariate spatially aggregated point processes with the observed outcomes being zero-inflated count random variables. For such models, we incorporate correlation between the random field assumed to generate events and mean event counts. This is applied to lung and bronchus cancer incidence by public health unit in Ontario and a study of Comandra blister rust infection of lodgepole pine trees in British Columbia. The key contributions from this thesis include the following: 1) developing a spatio-temporal hierarchical cluster process that incorporates correlation at all levels of the hierarchy, 2) joint modelling of a hierarchical cluster process and multivariate marks, 3) extending the framework for the joint modelling of multivariate lattice data to enable decomposition of the sources of shared spatial structure and 4) investigating aspects of the partial misspecification of joint spatial structure for multivariate lattice data.
- Published
- 2016
258. Joint Analysis of Zero-heavy Longitudinal Outcomes: Models and Comparison of Study Designs
- Author
-
Lundy, Erin R
- Subjects
joint modeling, heaped data, longitudinal data, random effect model, mixture model, Markov chain Monte Carlo, discrete data, zero-inflation, Applied Statistics
- Abstract
Understanding the patterns and mechanisms of the process of desistance from criminal activity is imperative for the development of effective sanctions and legal policy. Methodological challenges in the analysis of longitudinal criminal behaviour data include the need to develop methods for multivariate longitudinal discrete data, incorporating modulating exposure variables and several possible sources of zero-inflation. We develop new tools for zero-heavy joint outcome analysis which address these challenges and provide novel insights on processes related to offending patterns. Comparisons with existing approaches demonstrate the benefits of utilizing modeling frameworks which incorporate distinct sources of zeros. An additional concern in this context is heaping of self-reported counts where recorded counts are rounded to different levels of precision. Alternatively, more accurate data that is less burdensome on participants to record may be obtained by collecting information on presence/absence of events at periodic assessments. We compare these two study designs in the context of self-reported data related to criminal behaviour and provide insights on choice of design when heaping is expected. The contributions of this research work include the following: (i) Developing a general framework for joint modeling of multiple longitudinal zero-inflated count outcomes which incorporates a variety of probabilistic structures on the zero counts. (ii) Accommodating a subgroup of subjects who are not at-risk to engage in a particular outcome (iii) Incorporating the effect of a time-dependent exposure variable in settings where some outcomes are prohibited during exposure to a treatment. (iv) Illustrating the extent to which heaping of zero-inflated counts, arising from a variety of heaping mechanisms, can introduce bias, impeding the identification of important risk factors (v) Identifying situations where there is very little loss of efficiency in the analysis of presence/absence data, depending on the partition of the time for the presence/absence records and the underlying rate of events. (vi) Providing recommendations on the design of studies when heaping is a concern. (vii) Modeling of multiple longitudinal binary outcomes where a mixture model approach allows differential rates of recurrence of events, and where the underlying process generating events may resolve.
- Published
- 2016
259. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data
- Author
-
Shengbing Huang, Li Chen, James Reeve, Jun Chen, Xuefeng Wang, and Lujun Zhang
- Subjects
0301 basic medicine, Normalization (statistics), Bioinformatics, Computer science, Zero inflation, Sequencing data, lcsh:Medicine, RNA-Seq, Computational biology, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, Quantitative Biology::Populations and Evolution, Microbiome, General Neuroscience, lcsh:R, Statistics, Genomics, General Medicine, Quantitative Biology::Genomics, Normalization, Quantitative Biology::Quantitative Methods, 030104 developmental biology, Metagenomics, Pairwise comparison, RNA-seq, General Agricultural and Biological Sciences, Zero-inflation, Count data
- Abstract
Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios—a simple but effective normalization method—for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
- Published
- 2018
- Full Text
- View/download PDF
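A sketch of the "geometric mean of pairwise ratios" idea described in the GMPR abstract above (result 259): each sample's size factor is the geometric mean, over all other samples, of the median count ratio computed on taxa present in both samples. This is written from the abstract's description, so the published GMPR implementation may differ in details such as filtering thresholds; the simulated data are arbitrary.

```python
# Hedged sketch of a GMPR-like normalization for zero-inflated count matrices.
import numpy as np

def gmpr_like_size_factors(counts):
    """counts: (n_samples, n_taxa) count matrix with many zeros."""
    n = counts.shape[0]
    size = np.ones(n)
    for i in range(n):
        medians = []
        for j in range(n):
            if i == j:
                continue
            shared = (counts[i] > 0) & (counts[j] > 0)     # taxa observed in both samples
            if shared.any():
                medians.append(np.median(counts[i, shared] / counts[j, shared]))
        if medians:
            size[i] = np.exp(np.mean(np.log(medians)))      # geometric mean of the medians
    return size

rng = np.random.default_rng(7)
depth = rng.integers(1, 5, size=20)                         # unequal library sizes
counts = rng.poisson(depth[:, None] * rng.gamma(0.3, 2.0, size=(1, 100)))
print(gmpr_like_size_factors(counts).round(2))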
260. Compositional zero-inflated network estimation for microbiome data.
- Author
-
Ha MJ, Kim J, Galloway-Peña J, Do KA, and Peterson CB
- Subjects
- Humans, Leukemia microbiology, Computational Biology methods, Microbiota
- Abstract
Background: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated., Results: We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies which reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients., Conclusions: Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE .
- Published
- 2020
- Full Text
- View/download PDF
261. Copula-based Markov zero-inflated count time series models with application.
- Author
-
Alqawba M and Diawara N
- Abstract
Count time series data with excess zeros are observed in several applied disciplines. When these zero-inflated counts are sequentially recorded, they might exhibit serial dependence. Ignoring the zero-inflation and the serial dependence might produce inaccurate results. In this paper, Markov zero-inflated count time series models based on a joint distribution on consecutive observations are proposed (a simulation sketch of the copula construction follows this record). The joint distribution function of the consecutive observations is constructed through copula functions. First- and second-order Markov chains are considered with the univariate margins of zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), or zero-inflated Conway-Maxwell-Poisson (ZICMP) distributions. Under the Markov models, bivariate copula functions such as the bivariate Gaussian, Frank, and Gumbel are chosen to construct a bivariate distribution of two consecutive observations. Moreover, the trivariate Gaussian and max-infinitely divisible copula functions are considered to build the joint distribution of three consecutive observations. Likelihood-based inference is performed and asymptotic properties are studied. To evaluate the estimation method and the asymptotic results, simulated examples are studied. The proposed class of models is applied to a sandstorm count example. The results suggest that the proposed models have some advantages over some of the models in the literature for modeling zero-inflated count time series data., Competing Interests: No potential conflict of interest was reported by the author(s)., (© 2020 Informa UK Limited, trading as Taylor & Francis Group.)
- Published
- 2020
- Full Text
- View/download PDF
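To make the construction in result 261 concrete, this sketch simulates a first-order Markov chain with ZIP margins by pushing a Gaussian AR(1) series through the ZIP quantile function, i.e. a Gaussian copula on consecutive observations. Parameter values are illustrative and estimation is not shown.

```python
# Simulating a stationary Markov ZIP series via a Gaussian copula (illustration only).
import numpy as np
from scipy import stats

def zip_ppf(u, pi, lam, max_k=200):
    """Quantile function of a zero-inflated Poisson(pi, lam) distribution."""
    k = np.arange(max_k)
    cdf = pi + (1 - pi) * stats.poisson.cdf(k, lam)        # ZIP CDF on 0..max_k-1
    return np.searchsorted(cdf, u, side="left")            # smallest k with F(k) >= u

rng = np.random.default_rng(11)
n, rho, pi, lam = 300, 0.7, 0.3, 2.0

z = np.empty(n)                                            # latent Gaussian AR(1)
z[0] = rng.normal()
for t in range(1, n):
    z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

u = stats.norm.cdf(z)                                      # uniform margins (copula scale)
y = zip_ppf(u, pi, lam)                                    # serially dependent ZIP counts
print("proportion of zeros:", round((y == 0).mean(), 2),
      "lag-1 autocorrelation:", round(np.corrcoef(y[:-1], y[1:])[0, 1], 2))
```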
262. A study of alternative approaches to non-normal latent trait distributions in item response theory models used for health outcome measurement.
- Author
-
Smits N, Öğreden O, Garnier-Villarreal M, Terwee CB, and Chalmers RP
- Subjects
- Data Interpretation, Statistical, Models, Statistical, Outcome Assessment, Health Care
- Published
- 2020
- Full Text
- View/download PDF
263. A note on the weighting-type estimations of the zero-inflated Poisson regression model with missing data in covariates.
- Author
-
Lukusa, Martin T. and Phoa, Frederick Kin Hing
- Subjects
- *POISSON regression, *TRAFFIC accidents, *REGRESSION analysis, *MISSING data (Statistics), *DATA modeling, *STATISTICAL weighting
- Abstract
A two-step weighting-type method is proposed for the case where some covariates in the zero-inflated Poisson model are missing at random. Semiparametric and parametric weighting-type estimators are proposed accordingly, and their limiting behavior is studied theoretically and numerically. • Two consistent estimators of a ZIP model are developed when some covariates are MAR. • These weighting estimators are consistent and asymptotically normally distributed. • They perform better than the complete-case estimator. • They are more efficient than the estimator based on the true weight. • We apply the method to a real-life example, the analysis of traffic accident data. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
264. Statistical modeling of patterns in annual reproductive rates.
- Author
-
Brooks, Mollie E., Kristensen, Kasper, Darrigo, Maria Rosa, Rubim, Paulo, Uriarte, María, Bruna, Emilio, and Bolker, Benjamin M.
- Subjects
- *NEGATIVE binomial distribution, *STATISTICAL models, *LOGITS, *BINOMIAL distribution, *LOGGERHEAD turtle
- Abstract
Reproduction by individuals is typically recorded as count data (e.g., number of fledglings from a nest or inflorescences on a plant) and commonly modeled using Poisson or negative binomial distributions, which assume that variance is greater than or equal to the mean. However, distributions of reproductive effort are often underdispersed (i.e., variance < mean). When used in hypothesis tests, models that ignore underdispersion will be overly conservative and may fail to detect significant patterns. Here we show that generalized Poisson (GP) and Conway‐Maxwell‐Poisson (CMP) distributions are better choices for modeling reproductive effort because they can handle both overdispersion and underdispersion; we provide examples of how ecologists can use GP and CMP distributions in generalized linear models (GLMs) and generalized linear mixed models (GLMMs) to quantify patterns in reproduction. Using a new R package, glmmTMB, we construct GLMMs to investigate how rainfall and population density influence the number of fledglings in the warbler Oreothlypis celata and how flowering rate of Heliconia acuminata differs between fragmented and continuous forest. We also demonstrate how to deal with zero‐inflation, which occurs when there are more zeros than expected in the distribution, e.g., due to complete reproductive failure by some individuals. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
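Result 264 relies on the generalized Poisson and Conway-Maxwell-Poisson (CMP) distributions to capture underdispersion. The sketch below evaluates a truncated CMP pmf, whose probabilities are proportional to lambda^k / (k!)^nu, and shows that nu > 1 yields variance below the mean; the parameterization here is generic and not glmmTMB's, and all values are illustrative.

```python
# CMP pmf via a truncated normalizing sum; nu > 1 gives underdispersion.
import numpy as np
from scipy.special import gammaln

def cmp_pmf(k_max, lam, nu):
    """Return the CMP pmf on 0..k_max, normalized over that truncated support."""
    k = np.arange(k_max + 1)
    log_w = k * np.log(lam) - nu * gammaln(k + 1)          # log of lam^k / (k!)^nu
    w = np.exp(log_w - log_w.max())                        # stabilize before normalizing
    return w / w.sum()

for nu in (0.5, 1.0, 2.0):
    p = cmp_pmf(200, lam=3.0, nu=nu)
    k = np.arange(p.size)
    mean = np.sum(k * p)
    var = np.sum((k - mean) ** 2 * p)
    print(f"nu={nu}: mean={mean:.2f}, variance={var:.2f}")
```

With nu = 1 the distribution reduces to the Poisson (variance equal to the mean), while nu above 1 produces the underdispersed counts the abstract argues are common in reproductive data.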
265. A Flexible Zero-Inflated Poisson Regression Model
- Author
-
Roemmele, Eric S.
- Subjects
- Bootstrap, Count data, EM Algorithm, zero-inflation, semiparametric model, Statistical Models, Statistical Theory
- Abstract
A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can easily be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the counts. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are, perhaps, the most commonly used. However, the fully parametric ZIP regression model could sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from some of the recent literature on semiparametric mixture-of-regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an "EM-like" algorithm for estimation (a minimal parametric EM sketch follows this record) and a summary of asymptotic properties of the estimators. The proposed semiparametric models are then applied to a data set involving clandestine methamphetamine laboratories and Alzheimer's disease.
- Published
- 2019
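For orientation on result 265, here is a minimal EM algorithm for the ordinary, fully parametric ZIP mixture without covariates; the paper's semiparametric "EM-like" algorithm generalizes this by letting the mixing proportion vary flexibly. The data and starting values are invented.

```python
# EM for the covariate-free ZIP mixture: pi (structural-zero weight) and lam (Poisson mean).
import numpy as np

rng = np.random.default_rng(5)
y = np.where(rng.random(1000) < 0.25, 0, rng.poisson(2.0, 1000))

pi, lam = 0.5, 1.0                                         # starting values
for _ in range(100):
    # E-step: posterior probability that an observed zero is a structural zero
    p_zero = pi / (pi + (1 - pi) * np.exp(-lam))
    w = np.where(y == 0, p_zero, 0.0)
    # M-step: update mixing weight and Poisson mean given the responsibilities
    pi = w.mean()
    lam = np.sum((1 - w) * y) / np.sum(1 - w)

print(f"estimated pi = {pi:.3f}, lambda = {lam:.3f}")
```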
266. Zero-inflated models for identifying disease risk factors when case detection is imperfect: Application to highly pathogenic avian influenza H5N1 in Thailand (vol 114, pg 28, 2014)
- Author
-
Vergne, Timothée, Paul, Mathilde, Chaengprachak, Wanida, Durand, Benoit, Gilbert, Marius, Dufour, Barbara, Roger, François, Kasemsuwan, Suwicha, Grosbois, Vladimir, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad), Laboratoire de Santé Animale, Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES), University of London, Interactions hôtes-agents pathogènes [Toulouse] (IHAP), Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées, Department of Livestock Development, NationalInstitute of Animal Health (NIAH), Fonds national de la recherche scientifique, Université libre de Bruxelles (ULB), École nationale vétérinaire d'Alfort (ENVA), and Kasetsart University (KU)
- Subjects
Conditional autoregressive model, Surveillance, Under-detection, Risk factors, Bias, Count, [SDV]Life Sciences [q-bio], Avian influenza H5N1, Spatial, Capture-recapture, Evaluation, Zero-inflation
- Abstract
Logistic regression models integrating disease presence/absence data are widely used to identify risk factors for a given disease. However, when data arise from imperfect surveillance systems, the interpretation of results is confusing since explanatory variables can be related either to the occurrence of the disease or to the efficiency of the surveillance system. As an alternative, we present spatial and non-spatial zero-inflated Poisson (ZIP) regressions for modelling the number of highly pathogenic avian influenza (HPAI) H5N1 outbreaks that were reported at subdistrict level in Thailand during the second epidemic wave (July 3rd 2004 to May 5th 2005). The spatial ZIP model fitted the data more effectively than its non-spatial version. This model clarified the role of the different variables: for example, results suggested that human population density was not associated with the disease occurrence but was rather associated with the number of reported outbreaks given disease occurrence. In addition, these models allowed us to estimate that 902 (95% CI 881-922) subdistricts suffered at least one HPAI H5N1 outbreak in Thailand although only 779 were reported to veterinary authorities, leading to a general surveillance sensitivity of 86.4% (95% CI 84.5-88.4). Finally, the outputs of the spatial ZIP model revealed the spatial distribution of the probability that a subdistrict could have been a false negative. The methodology presented here can easily be adapted to other animal health contexts.
- Published
- 2015
- Full Text
- View/download PDF
267. Counting on count data models
- Author
-
Rainer Winkelmann
- Subjects
Estimation, zero-inflation, Negative binomial distribution, Poisson regression, hurdle model, Poisson distribution, Negative multinomial distribution, Quasi-likelihood, Linear regression, Econometrics, Mathematics, Count data, jel:C25, jel:C2, ddc:330
- Abstract
Often, economic policies are directed toward outcomes that are measured as counts. Examples of economic variables that use a basic counting scale are number of children as an indicator of fertility, number of doctor visits as an indicator of health care demand, and number of days absent from work as an indicator of employee shirking. Several econometric methods are available for analyzing such data, including the Poisson and negative binomial models. They can provide useful insights that cannot be obtained from standard linear regression models. Estimation and interpretation are illustrated in two empirical examples.
- Published
- 2015
- Full Text
- View/download PDF
268. A joint model for hierarchical continuous and zero-inflated overdispersed count data
- Author
-
Geert Molenberghs, Wondwosen Kassahun, Geert Verbeke, Thomas Neyens, and Christel Faes
- Subjects
Statistics and Probability, Generalized linear model, Sequence, Applied Mathematics, Random effects model, Outcome (probability), Distribution (mathematics), Overdispersion, Modeling and Simulation, Statistics, Statistics, Probability and Uncertainty, Cluster analysis, clustering, conjugate random effect, hurdle model, joint model, normal random effect, zero-inflation, Count data, Mathematics
- Abstract
© 2013 Taylor & Francis. Many applications in public health, medical and biomedical or other studies demand modelling of two or more longitudinal outcomes jointly to get better insight into their joint evolution. In this regard, a joint model for a longitudinal continuous and a count sequence, the latter possibly overdispersed and zero-inflated (ZI), will be specified that assembles aspects coming from each one of them into one single model. Further, a subject-specific random effect is included to account for the correlation in the continuous outcome. For the count outcome, clustering and overdispersion are accommodated through two distinct sets of random effects in a generalized linear model as proposed by Molenberghs et al. [A family of generalized linear models for repeated measures with normal and conjugate random effects. Stat Sci. 2010;25:325-347]; one is normally distributed, the other conjugate to the outcome distribution. The association among the two sequences is captured by correlating the normal random effects describing the continuous and count outcome sequences, respectively. An excessive number of zero counts is often accounted for by using a so-called ZI or hurdle model. ZI models combine either a Poisson or negative-binomial model with an atom at zero as a mixture, while the hurdle model separately handles the zero observations and the positive counts. This paper proposes a general joint modelling framework in which all these features can appear together. We illustrate the proposed method with a case study and examine it further with simulations. (Journal of Statistical Computation and Simulation, vol. 85, issue 3, pages 552-571.)
- Published
- 2015
269. Understanding the Distribution of Crime Victimization Using “British Crime Survey” Data: An Exercise in Statistical Reasoning
- Author
-
Hope, Tim, book editor
- Published
- 2012
- Full Text
- View/download PDF
270. Quantifying the impact of inter-site heterogeneity on the distribution of ChIP-seq data
- Author
-
Simon Tavaré, Andy G. Lynch, and Jonathan Cairns
- Subjects
mixture model, lcsh:QH426-470, Computer science, zero-inflation, Negative binomial distribution, Inference, high-throughput sequencing, Statistical model, Poisson distribution, Negative Binomial, ChIP-seq, lcsh:Genetics, Expectation–maximization algorithm, Genetics, Molecular Medicine, Original Research Article, Poisson regression, Data mining, Newton's method, Genetics (clinical)
- Abstract
Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modelling of the read counts. The simple Poisson model is attractive, but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g. the Negative Binomial), and/or exclude regions of the genome that do not conform to the model. Since many modelling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial - Negative Binomial (NB-NB) mixture model as a candidate for modelling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero inflation is driven by the model's inability to cope with both artefactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modelling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
- Published
- 2014
- Full Text
- View/download PDF
271. Random effect exponentiated-exponential geometric model for clustered/longitudinal zero-inflated count data.
- Author
-
Tapak L, Hamidi O, Amini P, and Verbeke G
- Abstract
For count responses, there are situations in biomedical and sociological applications in which extra zeroes occur. Modeling correlated (e.g. repeated measures and clustered) zero-inflated count data poses special challenges because the correlation between measurements for a subject or a cluster needs to be taken into account. Moreover, zero-inflated count data often also exhibit over- or under-dispersion. In this paper, we propose a random effect model for repeated measurements or clustered data with an over- or under-dispersed response, called the random effect zero-inflated exponentiated-exponential geometric regression model. The proposed method was illustrated through real examples. The performance of the model and the asymptotic properties of the estimators were investigated using simulation studies., Competing Interests: No potential conflict of interest was reported by the authors., (© 2019 Informa UK Limited, trading as Taylor & Francis Group.)
- Published
- 2019
- Full Text
- View/download PDF
272. Modelling of zero-inflation improves inference of metagenomic gene count data.
- Author
-
Jonsson V, Österlund T, Nerman O, and Kristiansson E
- Subjects
- Bayes Theorem, Humans, Linear Models, Monte Carlo Method, Poisson Distribution, Bias, Data Interpretation, Statistical, Metagenomics statistics & numerical data
- Abstract
Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data is inherently noisy and contains high levels of biological and technical variability as well as an excess of zeros due to non-detected genes. This makes the statistical analysis challenging. In this study, we present a new hierarchical Bayesian model for inference of metagenomic gene abundance data. The model uses a zero-inflated overdispersed Poisson distribution which is able to simultaneously capture the high gene-specific variability as well as zero observations in the data. By analysis of three comprehensive datasets, we show that zero-inflation is common in metagenomic data from the human gut and, if not correctly modelled, it can lead to substantial reductions in statistical power. We also show, by using resampled metagenomic data, that our model has, compared to other methods, a higher and more stable performance for detecting differentially abundant genes. We conclude that proper modelling of the gene-specific variability, including the excess of zeros, is necessary to accurately describe gene abundances in metagenomic data. The proposed model will thus pave the way for new biological insights into the structure of microbial communities.
- Published
- 2019
- Full Text
- View/download PDF
273. Comparative study of micronucleus assays and dicentric plus ring chromosomes for dose assessment in particular cases of partial-body exposure.
- Author
-
Mendes ME, Mendonça JCG, Barquinero JF, Higueras M, Gonzalez JE, Andrade AMG, Silva LM, Nascimento AMS, Lima JCF, Silva JCG, Hwang S, Melo AMMA, Santos N, and Lima FF
- Subjects
- Chromosome Aberrations, Dose-Response Relationship, Radiation, Humans, Poisson Distribution, Micronucleus Tests, Radiation Dosage, Ring Chromosomes
- Abstract
Purpose: The goal was to compare the micronucleus (MN) and dicentric plus ring chromosomes (D + R) assays for dose assessment in cases of partial-body irradiation (PBI). Materials and methods: We constructed calibration curves for each assay at doses ranging from 0 to 5 Gy of X-rays at a dose rate of 0.275 Gy/min. To simulate partial-body exposures, blood samples from two donors were irradiated with 0.5, 1, 2 and 4 Gy, and the ratios of irradiated to unirradiated blood were 25, 50, and 100%. Different tests were used to assess whether the samples were overdispersed or zero-inflated, and for partial-body dose assessment we used the Qdr, Dolphin and Bayesian methods (a sketch of a standard dispersion check follows this record). Results: In our samples for the D + R calibration curve, practically all doses agreed with the Poisson assumption, but MN exhibited overdispersed and zero-inflated cellular distributions. The exact Poisson tests and zero-inflation tests demonstrated that virtually all D + R samples from the PBI simulation fit the Poisson distribution and were not zero-inflated, whereas the MN samples were again overdispersed and zero-inflated. In the partial-body estimation, the D + R results were better than MN when the Qdr and Dolphin methods were used, and the dose estimates obtained with the Bayesian methodology were more accurate than those from the classical methods. Conclusions: Dicentric chromosomes continue to prove to be the best biological marker for dose assessment. However, in partial-body exposure scenarios overdispersion and zero-inflation may not occur, which is a critical point not only for dose assessment but also for confirming partial-body exposure. MN could be used as an alternative assay for partial-body dose estimation, but in the case of an accident without any information, the MN assay could not determine whether the accident was a whole-body irradiation (WBI) or a PBI.
- Published
- 2019
- Full Text
- View/download PDF
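A sketch of a standard dispersion check of the kind discussed in result 273: Fisher's dispersion index test of the Poisson assumption for per-cell aberration counts. The paper additionally applies exact Poisson and zero-inflation tests and dose-estimation methods, which are not shown; the simulated partial-body scenario and all values are illustrative.

```python
# Fisher's dispersion index test for per-cell counts (Poisson assumption check).
import numpy as np

rng = np.random.default_rng(2)
# simulated dicentric counts per cell after a partial-body exposure:
# 60% of cells unirradiated (count 0), the rest Poisson(0.8)
y = np.where(rng.random(500) < 0.6, 0, rng.poisson(0.8, 500))

n, ybar = len(y), y.mean()
D = np.sum((y - ybar) ** 2) / ybar          # dispersion index, ~ chi2(n-1) under Poisson
u = (D - (n - 1)) / np.sqrt(2 * (n - 1))    # normal approximation; |u| > 1.96 flags overdispersion
print(f"dispersion index {D:.1f} on {n - 1} df, u = {u:.2f}")
```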
274. Estimating the intensity of use by interacting predators and prey using camera traps.
- Author
-
Keim JL, Lele SR, DeWitt PD, Fitzpatrick JJ, and Jenni NS
- Subjects
- Animals, Ecosystem, Humans, Predatory Behavior, Deer, Reindeer, Wolves
- Abstract
Understanding how organisms distribute themselves in response to interacting species, ecosystems, climate, human development and time is fundamental to ecological study and practice. A measure to quantify the relationship among organisms and their environments is intensity of use: the rate of use of a specific resource in a defined unit of time. Estimating the intensity of use differs from estimating probabilities of occupancy or selection, which can remain constant even when the intensity of use varies. We describe a method to evaluate the intensity of use across conditions that vary in both space and time. We demonstrate its application on a large mammal community where linear developments and human activity are conjectured to influence the interactions between white-tailed deer (Odocoileus virginianus) and wolves (Canis lupus) with possible consequences on threatened woodland caribou (Rangifer tarandus caribou). We collect and quantify intensity of use data for multiple, interacting species with the goal of assessing management efficacy, including a habitat restoration strategy for linear developments. We test whether blocking linear developments by spreading logs across a 200-m interval can be applied as an immediate mitigation to reduce the intensities of use by humans, predator and prey species in a boreal caribou range. We deployed camera traps on linear developments with and without restoration treatments in a landscape exposed to both timber and oil development. We collected a three-year dataset and employed spatial recurrent event models to analyse intensity of use by an interacting human and large mammal community across a range of environmental and climatic conditions. Spatial recurrent event models revealed that intensity of use by humans influenced the intensity of use by all five large mammal species evaluated, and the intensities of use by wolves and deer were inextricably linked in space and time. Conditions that resist travel on linear developments had a strong negative effect on the intensity of human and large mammal use. Mitigation strategies that resist, or redirect, animal travel on linear developments can reduce the effects of resource development on interacting human and predator-prey interactions. Our approach is easily applied to other continuous time point-based survey methodologies and shows that measuring the intensity of use within animal communities can help scientists monitor, mitigate and understand ecological states and processes., (© 2019 The Authors. Journal of Animal Ecology © 2019 British Ecological Society.)
- Published
- 2019
- Full Text
- View/download PDF
275. A pathway for multivariate analysis of ecological communities using copulas.
- Author
-
Anderson MJ, de Valpine P, Punnett A, and Miller AE
- Abstract
We describe a new pathway for multivariate analysis of data consisting of counts of species abundances that includes two key components: copulas, to provide a flexible joint model of individual species, and dissimilarity-based methods, to integrate information across species and provide a holistic view of the community. Individual species are characterized using suitable (marginal) statistical distributions, with the mean, the degree of over-dispersion, and/or zero-inflation being allowed to vary among a priori groups of sampling units. Associations among species are then modeled using copulas, which allow any pair of disparate types of variables to be coupled through their cumulative distribution function, while maintaining entirely the separate individual marginal distributions appropriate for each species. A Gaussian copula smoothly captures changes in an index of association that excludes joint absences in the space of the original species variables. A permutation-based filter with exact family-wise error can optionally be used a priori to reduce the dimensionality of the copula estimation problem. We describe in detail a Monte Carlo expectation maximization algorithm for efficient estimation of the copula correlation matrix with discrete marginal distributions (counts). The resulting fully parameterized copula models can be used to simulate realistic ecological community data under fully specified null or alternative hypotheses. Distributions of community centroids derived from simulated data can then be visualized in ordinations of ecologically meaningful dissimilarity spaces. Multinomial mixtures of data drawn from copula models also yield smooth power curves in dissimilarity-based settings. Our proposed analysis pathway provides new opportunities to combine model-based approaches with dissimilarity-based methods to enhance understanding of ecological systems. We demonstrate implementation of the pathway through an ecological example, where associations among fish species were found to increase after the establishment of a marine reserve.
- Published
- 2019
- Full Text
- View/download PDF
276. Statistical methods for modelling the distribution and abundance of populations : application to a national survey of diurnal raptors in France
- Author
-
Le Rest, Kévin, Centre d'Études Biologiques de Chizé - UMR 7372 (CEBC), Université de La Rochelle (ULR)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Université de Poitiers, David Pinaud & Vincent Bretagnolle(david.pinaud@cebc.cnrs.fr & vincent.bretagnolle@cebc.cnrs.fr), Institut National de la Recherche Agronomique (INRA)-Université de La Rochelle (ULR)-Centre National de la Recherche Scientifique (CNRS), and Lacalle, Martine
- Subjects
[SDE] Environmental Sciences, Raptors, Overdispersion, Spatial autocorrelation, Model selection, Zero-inflation
- Abstract
In the context of global biodiversity loss, more and more surveys are conducted over broad spatial extents and long time periods in order to understand the processes driving the distribution, abundance and trends of populations at the relevant biological scales. These studies then allow more precise conservation statuses to be defined for species and pertinent conservation measures to be established. However, the statistical analysis of such datasets raises several concerns. Usually, generalized linear models (GLM) are used to link the variable of interest (e.g. presence/absence or abundance) with external variables suspected of influencing it (e.g. climatic and habitat variables). The main unresolved concern is the selection of these external variables from a spatial dataset. This thesis details several possibilities and proposes a widely usable method based on a cross-validation procedure that accounts for spatial dependencies. The method is evaluated through simulations and applied to several case studies, including datasets with higher than expected variability (overdispersion). Particular attention is also given to methods accounting for an excess of zeros (zero-inflation). The last part of the manuscript applies these methodological developments to modelling the distribution, abundance and trends of raptors breeding in France.
- Published
- 2013
277. Analysis of partial and complete protection in malaria cohort studies
- Author
-
Matthew Cairns, Kwaku Poku Asante, Seth Owusu-Agyei, Daniel Chandramohan, Brian Greenwood, and Paul Milligan
- Subjects
Male, Statistics as Topic, Population, Negative binomial distribution, Ghana, Overdispersion, Cohort Studies, Risk Factors, Environmental health, parasitic diseases, Humans, Medicine, Incidence (epidemiology), Research, Malaria epidemiology, Infant, Newborn, Infant, Regression analysis, Malaria, Infectious Diseases, Child, Preschool, Cohort, Female, Parasitology, Heterogeneity, Epidemiologic Methods, Zero-inflation, Cohort study
- Abstract
BACKGROUND: Malaria transmission is highly heterogeneous and analysis of incidence data must account for this for correct statistical inference. Less widely appreciated is the occurrence of a large number of zero counts (children without a malaria episode) in malaria cohort studies. Zero-inflated regression methods provide one means of addressing this issue, and also allow risk factors providing complete and partial protection to be disentangled. METHODS: Poisson, negative binomial (NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models were fitted to data from two cohort studies of malaria in children in Ghana. Multivariate models were used to understand risk factors for elevated incidence of malaria and for remaining malaria-free, and to estimate the fraction of the population not at risk of malaria. RESULTS: ZINB models, which account for both heterogeneity in individual risk and an unexposed sub-group within the population, provided the best fit to data in both cohorts. These approaches gave additional insight into the mechanism of factors influencing the incidence of malaria compared to simpler approaches, such as NB regression. For example, compared to urban areas, rural residence was found to both increase the incidence rate of malaria among exposed children, and increase the probability of being exposed. In Navrongo, 34% of urban residents were estimated to be at no risk, compared to 3% of rural residents. In Kintampo, 47% of urban residents and 13% of rural residents were estimated to be at no risk. CONCLUSION: These results illustrate the utility of zero-inflated regression methods for analysis of malaria cohort data that include a large number of zero counts. Specifically, these results suggest that interventions that reach mainly urban residents will have limited overall impact, since some urban residents are essentially at no risk, even in areas of high endemicity, such as in Ghana.
- Published
- 2013
- Full Text
- View/download PDF
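To illustrate the decomposition behind result 277, the snippet below takes assumed ZINB parameters (made-up values, not the paper's estimates) and splits the expected zero counts into children "not at risk" (structural zeros) and at-risk children who simply had no episode (negative binomial zeros).

```python
# Splitting expected zeros under a ZINB model into structural and sampling zeros.
import numpy as np
from scipy import stats

p_not_at_risk = 0.34          # assumed zero-inflation probability (e.g. urban residents)
mu, k = 1.8, 0.9              # assumed NB mean and dispersion for at-risk children

# scipy's NB uses (n, p); convert from the (mu, k) epidemiological parameterization
n_param = k
p_param = k / (k + mu)
p_zero_given_at_risk = stats.nbinom.pmf(0, n_param, p_param)

p_zero_total = p_not_at_risk + (1 - p_not_at_risk) * p_zero_given_at_risk
frac_structural = p_not_at_risk / p_zero_total
print(f"P(zero episodes) = {p_zero_total:.2f}; "
      f"{frac_structural:.0%} of those zeros come from the 'not at risk' group")
```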
278. A marginalized model for zero-inflated, overdispersed and correlated count data
- Author
-
Iddi, Samuel and Molenberghs, Geert
- Subjects
marginal multilevel model, maximum likelihood estimation, random effects model, negative binomial, overdispersion, partial marginalization, Poisson model, zero-inflation
- Abstract
Iddi and Molenberghs (2012) merged the attractive features of the so-called combined model of Molenberghs et al. (2010) and the marginalized model of Heagerty (1999) for hierarchical non-Gaussian data with overdispersion. In this model, the fixed-effect parameters retain their marginal interpretation. Lee et al. (2011) also developed an extension of Heagerty (1999) to handle zero-inflation from count data, using the hurdle model. To bring together all of these features, a marginalized, zero-inflated, overdispersed model for correlated count data is proposed. Using two empirical sets of data, it is shown that the proposed model leads to important improvements in model fit. The authors gratefully acknowledge the financial support from the IAP research Network P7/06 of the Belgian Government (Belgian Science Policy).
- Published
- 2013
279. A simulation study of maximum likelihood estimation in logistic regression with cured individuals
- Author
-
Diop, Aba, Diop, Aliou, Dupuy, Jean-François, Mathématiques, Image et Applications - EA 3165 (MIA), Université de La Rochelle (ULR), laboratoire d'Etudes et de recherches en Statistiques et Développement (LERSTAD), Université Gaston Bergé Sénégal, Institut de Recherche Mathématique de Rennes (IRMAR), AGROCAMPUS OUEST, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Université de Rennes 2 (UR2), Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), Dupuy, Jean-Francois, La Rochelle Université (ULR), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-École normale supérieure - Rennes (ENS Rennes)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-INSTITUT AGRO Agrocampus Ouest, and Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
- Subjects
mixture model, maximum likelihood estimation, Zero-inflation, [STAT.ME] Statistics [stat]/Methodology [stat.ME], [STAT.CO] Statistics [stat]/Computation [stat.CO]
- Abstract
The logistic regression model is widely used to investigate the relationship between a binary outcome Y and a set of potential predictors X. Diop et al. (2011) present some conditions under which the maximum likelihood estimator is consistent and asymptotically normal in the logistic regression model with a cure fraction. So far, however, only limited simulation results are available to judge the quality of this estimator in finite samples. Therefore in this paper, we conduct a detailed simulation study of its numerical properties. We evaluate its accuracy and the quality of the normal approximation of its asymptotic distribution. We also study the quality of the approximation for constructing asymptotic Wald-type tests of hypothesis. Finally, we consider the problem of estimating the conditional probability of the outcome. Our results indicate that when the proportion of cured individuals is moderate to moderately large, and the sample size is large enough, reliable statistical inferences can be obtained for the regression effects and the probability of the outcome. Our results also indicate that the approximations can be problematic when the cure fraction is very large.
- Published
- 2011
280. Comparing distribution models for small samples of overdispersed counts of freshwater fish
- Author
-
Jean-Michel Olivier, Lise Vaudor, Nicolas Lamouroux, Milieux aquatiques, écologie et pollutions (UR MALY), Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA), Biodiversité des Écosystèmes Lotiques, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés (LEHNA), Centre National de la Recherche Scientifique (CNRS)-Institut National de la Recherche Agronomique (INRA)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Recherche Agronomique (INRA)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
0106 biological sciences ,zero-inflation ,Negative binomial distribution ,Sample (statistics) ,poisson distribution ,Poisson distribution ,negative binomial distribution ,01 natural sciences ,010104 statistics & probability ,symbols.namesake ,Abundance (ecology) ,Bayesian information criterion ,Statistical inference ,14. Life underwater ,0101 mathematics ,Relative species abundance ,profile likelihood ,Ecology, Evolution, Behavior and Systematics ,confidence intervals ,Nature and Landscape Conservation ,Mathematics ,abundance ,Ecology ,010604 marine biology & hydrobiology ,Confidence interval ,symbols ,[SDE.BE]Environmental Sciences/Biodiversity and Ecology - Abstract
The study of species abundance often relies on repeated abundance counts whose number is limited by logistic or financial constraints. The distribution of abundance counts is generally right-skewed (i.e. with many zeros and few high values) and needs to be modelled for statistical inference. We used an extensive dataset involving about 100,000 fish individuals of 12 freshwater fish species collected in electrofishing points (7 m²) during 350 field surveys made in 25 stream sites, in order to compare the performance and the generality of four distribution models of counts (Poisson, negative binomial and their zero-inflated counterparts). The negative binomial distribution was the best model (Bayesian Information Criterion) for 58% of the samples (species-survey combinations) and was suitable for a variety of life histories, habitat, and sample characteristics. The performance of the models was closely related to samples' statistics such as total abundance and variance. Finally, we illustrated the consequences of a distribution assumption by calculating confidence intervals around the mean abundance, either based on the most suitable distribution assumption or on an asymptotical, distribution-free (Student's) method. Student's method generally corresponded to narrower confidence intervals, especially when there were few (
- Published
- 2011
- Full Text
- View/download PDF
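The model comparison in entry 280 amounts to fitting four count distributions by maximum likelihood and ranking them by BIC. A minimal Python sketch under simplified assumptions (a single simulated sample stands in for the fish counts; parameters are log- or logit-transformed so the optimizer is unconstrained):

    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize
    from scipy.special import expit

    def zip_logpmf(y, lam, pi):
        # zero-inflated Poisson: P(0) = pi + (1-pi)e^{-lam}, P(y>0) = (1-pi)Pois(y; lam)
        base = stats.poisson.logpmf(y, lam)
        return np.where(y == 0, np.log(pi + (1 - pi) * np.exp(base)),
                        np.log(1 - pi) + base)

    def zinb_logpmf(y, mu, r, pi):
        # negative binomial parameterized by mean mu and dispersion r
        base = stats.nbinom.logpmf(y, r, r / (r + mu))
        return np.where(y == 0, np.log(pi + (1 - pi) * np.exp(base)),
                        np.log(1 - pi) + base)

    def fit(y, nll, start):
        res = minimize(nll, start, args=(y,), method="Nelder-Mead")
        return 2 * res.fun + len(start) * np.log(len(y))   # BIC = -2 loglik + k log n

    y = np.concatenate([np.zeros(60, int),                  # toy right-skewed sample with many zeros
                        stats.nbinom.rvs(2, 0.3, size=40, random_state=0)])

    bic = {
        "Poisson": fit(y, lambda t, y: -np.sum(stats.poisson.logpmf(y, np.exp(t[0]))), [0.0]),
        "NB":      fit(y, lambda t, y: -np.sum(stats.nbinom.logpmf(y, np.exp(t[1]),
                          np.exp(t[1]) / (np.exp(t[1]) + np.exp(t[0])))), [0.0, 0.0]),
        "ZIP":     fit(y, lambda t, y: -np.sum(zip_logpmf(y, np.exp(t[0]), expit(t[1]))), [0.0, 0.0]),
        "ZINB":    fit(y, lambda t, y: -np.sum(zinb_logpmf(y, np.exp(t[0]), np.exp(t[1]),
                          expit(t[2]))), [0.0, 0.0, 0.0]),
    }
    print(sorted(bic.items(), key=lambda kv: kv[1]))        # smallest BIC = preferred model

Looping such a comparison over species-survey combinations reproduces, in outline, the per-sample selection exercise described in the abstract.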
281. Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model
- Author
-
Ulf Böckenholt, Maarten Cruyff, Peter G. M. van der Heijden, and Ardo van den Hout
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Variables ,zero-inflation ,media_common.quotation_subject ,Poisson distribution ,Statistics - Applications ,Outcome (probability) ,Social security ,Poisson regression ,symbols.namesake ,Variable (computer science) ,regulatory noncompliance ,Modeling and Simulation ,Statistics ,Respondent ,Randomized response ,symbols ,Zero-inflated model ,self-protective responses ,Applications (stat.AP) ,Statistics, Probability and Uncertainty ,Psychology ,media_common - Abstract
In 2004 the Dutch Department of Social Affairs conducted a survey to assess the extent of noncompliance with social security regulations. The survey was conducted among 870 recipients of social security benefits and included a series of sensitive questions about regulatory noncompliance. Due to the sensitive nature of the questions the randomized response design was used. Although randomized response protects the privacy of the respondent, it is unlikely that all respondents followed the design. In this paper we introduce a model that allows for respondents displaying self-protective response behavior by consistently giving the nonincriminating response, irrespective of the outcome of the randomizing device. The dependent variable denoting the total number of incriminating responses is assumed to be generated by the application of randomized response to a latent Poisson variable denoting the true number of rule violations. Since self-protective responses result in an excess of observed zeros in relation to the Poisson randomized response distribution, these are modeled as observed zero-inflation. The model includes predictors of the Poisson parameters, as well as predictors of the probability of self-protective response behavior., Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/07-AOAS135
- Published
- 2008
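The structural assumption in entry 281 can be written compactly. With $\pi$ the probability that a respondent is self-protective (always giving the non-incriminating answer) and $p^{*}(\cdot;\lambda)$ the distribution of the observed number of incriminating responses induced by applying the randomized response design to a latent Poisson($\lambda$) count of true rule violations, the observed counts follow a zero-inflated version of $p^{*}$ (the exact form of $p^{*}$ depends on the randomizing device and is not reproduced here):

\[
P(Y = y) = \pi\,\mathbf{1}\{y = 0\} + (1-\pi)\,p^{*}(y;\lambda), \qquad
\lambda = \exp(\mathbf{x}^{\top}\boldsymbol{\beta}), \quad
\pi = \frac{\exp(\mathbf{z}^{\top}\boldsymbol{\gamma})}{1+\exp(\mathbf{z}^{\top}\boldsymbol{\gamma})},
\]

which matches the abstract's description of predictors entering both the Poisson parameter and the probability of self-protective behavior.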
282. A Model Selection Paradigm for Modeling Recurrent Adenoma Data in Polyp Prevention Trials
- Author
-
Davidson, Christopher L. and Davidson, Christopher L.
- Abstract
Colorectal polyp prevention trials (PPTs) are randomized, placebo-controlled clinical trials that evaluate some chemo-preventive agent and include participants who will be followed for at least 3 years to compare the recurrence rates (counts) of adenomas. A large proportion of zero counts will likely be observed in both groups at the end of the observation period. Poisson generalized linear models (GLMs) are usually employed for estimation of recurrence in PPTs. Other models, including the negative binomial (NB2), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB), may be better suited to handle zero-inflation or other forms of overdispersion that are common in count data. A model selection paradigm that determines a statistical approach for choosing the best fitting model for recurrence data is described. An example using a subset from a large Phase III clinical trial indicated that the ZINB model was the best fitting model for the data.
- Published
- 2012
283. Score tests for zero-inflated double poisson regression models.
- Author
-
Xie, Feng-chang, Lin, Jin-guan, and Wei, Bo-cheng
- Abstract
Count data with excess zeros encountered in many applications often exhibit extra variation. Therefore, the zero-inflated Poisson (ZIP) model may fail to fit such data. In this paper, a zero-inflated double Poisson model (ZIDP), which is a generalization of the ZIP model, is studied and score tests for the significance of dispersion and zero-inflation in the ZIDP model are developed. Meanwhile, this work also develops homogeneous tests for the dispersion and/or zero-inflation parameter, and the corresponding score test statistics are obtained. One numerical example is given to illustrate our methodology, and the properties of the score test statistics are investigated through Monte Carlo simulations. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
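As a reminder for readers of entry 283 (this is the generic score-test construction, not anything specific to the ZIDP model), the score (Rao) test evaluates the score vector and information matrix only under the null model. With $\boldsymbol{\theta} = (\boldsymbol{\psi}, \boldsymbol{\phi})$ and $H_0: \boldsymbol{\phi} = \boldsymbol{\phi}_0$ (e.g., no extra dispersion or no zero-inflation), the statistic is

\[
S = U(\tilde{\boldsymbol{\theta}})^{\top}\, I(\tilde{\boldsymbol{\theta}})^{-1}\, U(\tilde{\boldsymbol{\theta}}),
\]

where $\tilde{\boldsymbol{\theta}}$ is the maximum likelihood estimate under $H_0$, $U$ is the score vector and $I$ the Fisher information. Under $H_0$, $S$ is asymptotically $\chi^2$ with degrees of freedom equal to the number of constrained parameters, so only the simpler null model (e.g., ZIP or Poisson) needs to be fitted.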
284. Zero-inflated generalized Poisson models with regression effects on the mean, dispersion and zero-inflation level applied to patent outsourcing rates
- Author
-
Czado, C., Erhardt, V., Min, A. and Wagner, S.
- Subjects
maximum likelihood estimator ,overdispersion ,patent outsourcing ,Vuong test ,zero-inflated generalized Poisson regression ,zero-inflation ,ddc - Abstract
This paper focuses on an extension of zero-inflated generalized Poisson (ZIGP) regression models for count data. We discuss generalized Poisson (GP) models where dispersion is modelled by an additional model parameter. Moreover, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. In addition to ZIGP models considered by several authors, we now allow for regression on the overdispersion and zero-inflation parameters. Consequently, we propose tools for an exploratory data analysis on the dispersion and zero-inflation level. An application dealing with outsourcing of patent filing processes will be used to compare these nonnested models. The model parameters are fitted by maximum likelihood using our R package "ZIGP" available on CRAN. Asymptotic normality of the ML estimates in this non-exponential setting is proven. Standard errors are estimated using the asymptotic normality of the estimates. Appropriate exploratory data analysis tools are developed. Also, a model comparison using AIC statistics and Vuong tests is carried out. For the given data, our extended ZIGP regression model will prove to be superior to GP and ZIP models and even to ZIGP models with constant overall dispersion and zero-inflation parameters, demonstrating the usefulness of our proposed extensions.
- Published
- 2006
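The Vuong test used in entry 284 (and again in entry 286) compares non-nested models through the per-observation differences in their fitted log-likelihood contributions. A minimal sketch of the uncorrected statistic, with placeholder inputs rather than output from the ZIGP package (AIC/BIC-type complexity corrections, which are sometimes applied, are omitted here):

    import numpy as np
    from scipy.stats import norm

    def vuong(loglik1, loglik2):
        """Vuong (1989) statistic from per-observation log-likelihoods of two
        non-nested models; positive values favour model 1."""
        m = np.asarray(loglik1) - np.asarray(loglik2)
        z = np.sqrt(len(m)) * m.mean() / m.std(ddof=1)
        return z, 2 * norm.sf(abs(z))          # statistic and two-sided p-value

    # placeholder per-observation log-likelihoods from two hypothetical fitted models
    ll_zigp = np.log(np.random.default_rng(0).uniform(0.2, 0.9, size=500))
    ll_zip = ll_zigp - np.random.default_rng(1).normal(0.02, 0.1, size=500)
    print(vuong(ll_zigp, ll_zip))

In practice the two input vectors would come from the fitted ZIGP, ZIP or GP models being compared.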
285. On Hinde-Demetrio Regression Models for Overdispersed Count Data
- Author
-
Kokonendji, Célestin, Demétrio, Clarice G.B., Zocchi, Silvio S., Laboratoire de Mathématiques et de leurs Applications [Pau] (LMAP), Université de Pau et des Pays de l'Adour (UPPA)-Centre National de la Recherche Scientifique (CNRS), Escola Superior de Agricultura 'Luiz de Queiroz' (ESALQ), and Universidade de São Paulo (USP)
- Subjects
model selection ,compound Poisson ,unit variance function ,zero-inflation ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,generalized linear model ,AMS classification: Primary 62J02 ,Secondary 62J12, 62F07 ,Additive exponential dispersion model - Abstract
In this paper we introduce the Hinde-Demétrio (HD) regression models for analyzing overdispersed count data and, mainly, investigate the effect of the dispersion parameter. The HD distributions are discrete additive exponential dispersion models (depending on canonical and dispersion parameters) with a third real index parameter p, and they have been characterized by their unit variance function $\mu+\mu^p$. For p equal to 2, 3, ..., the corresponding distributions are concentrated on nonnegative integers, overdispersed and zero-inflated with respect to a Poisson distribution having the same mean. The negative binomial ($p = 2$), strict arcsine ($p = 3$) and Poisson ($p \rightarrow \infty$) distributions are particular count HD families. Within the generalized linear modelling framework, the effect of the dispersion parameter in the HD regression models is pointed out, among other things, through the double mean parametrization: unit and standard means. In the particular additive model, this effect must be negligible within an adequate HD model for fixed integer $p$. The estimation of the integer $p$ is also examined separately. The results are illustrated and discussed on a horticultural data set.
- Published
- 2006
286. Zero-inflated generalized Poisson models with regression effects on the mean, dispersion and zero-inflation level applied to patent outsourcing rates
- Author
-
Czado, Claudia, Erhardt, Vinzenz, and Min, Aleksey
- Subjects
ddc:519 ,zero-inflation ,zero-inflated generalized Poisson regression ,overdispersion ,Vuong test ,maximum likelihood estimator ,patent outsourcing - Abstract
This paper focuses on an extension of zero-inflated generalized Poisson (ZIGP) regression models for count data. We discuss generalized Poisson (GP) models where dispersion is modelled by an additional model parameter. Moreover, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. In addition to ZIGP regression introduced by Famoye and Singh (2003), we now allow for regression on the overdispersion and zero-inflation parameters. Consequently, we propose tools for an exploratory data analysis on the dispersion and zero-inflation level. An application dealing with outsourcing of patent filing processes will be used to compare these nonnested models. The model parameters are fitted by maximum likelihood. Asymptotic normality of the ML estimates in this non-exponential setting is proven. Standard errors are estimated using the asymptotic normality of the estimates. Appropriate exploratory data analysis tools are developed. Also, a model comparison using AIC statistics and Vuong tests (see Vuong (1989)) is carried out. For the given data, our extended ZIGP regression model will prove to be superior to GP and ZIP models and even to ZIGP models with constant overall dispersion and zero-inflation parameters, demonstrating the usefulness of our proposed extensions.
- Published
- 2006
- Full Text
- View/download PDF
287. Marginalized zero-inflated Poisson models with missing covariates.
- Author
-
Benecha HK, Preisser JS, Divaris K, Herring AH, and Das K
- Subjects
- Analysis of Variance, Child, Dental Caries prevention & control, Fluorides pharmacology, Humans, Monte Carlo Method, Mouthwashes pharmacology, Poisson Distribution, Schools statistics & numerical data, Biometry methods, Models, Statistical
- Abstract
Unlike zero-inflated Poisson regression, marginalized zero-inflated Poisson (MZIP) models for counts with excess zeros provide estimates with direct interpretations for the overall effects of covariates on the marginal mean. In the presence of missing covariates, MZIP and many other count data models are ordinarily fitted using complete case analysis methods due to lack of appropriate statistical methods and software. This article presents an estimation method for MZIP models with missing covariates. The method, which is applicable to other missing data problems, is illustrated and compared with complete case analysis by using simulations and dental data on the caries preventive effects of a school-based fluoride mouthrinse program., (© 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2018
- Full Text
- View/download PDF
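The defining feature of the MZIP model in entry 287 is that covariates act on the overall (marginal) mean rather than on the mean of the latent "susceptible" class. Written out in a standard marginalized-ZIP formulation (not specific to the missing-covariate method of this paper), with mixing probability $\psi_i$ and susceptible-class Poisson mean $\mu_i$:

\[
P(Y_i = 0) = \psi_i + (1-\psi_i)e^{-\mu_i}, \qquad
P(Y_i = y) = (1-\psi_i)\frac{e^{-\mu_i}\mu_i^{y}}{y!}, \quad y > 0,
\]
\[
\operatorname{logit}(\psi_i) = \mathbf{z}_i^{\top}\boldsymbol{\gamma}, \qquad
E(Y_i) = (1-\psi_i)\,\mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\alpha})
\;\Rightarrow\;
\mu_i = \frac{\exp(\mathbf{x}_i^{\top}\boldsymbol{\alpha})}{1-\psi_i},
\]

so that $\exp(\alpha_k)$ is directly an overall rate ratio for the whole sampled population, excess zeros included.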
288. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data.
- Author
-
Chen L, Reeve J, Zhang L, Huang S, Wang X, and Chen J
- Abstract
Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios-a simple but effective normalization method-for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa., Competing Interests: Jun Chen is an Academic Editor for PeerJ.
- Published
- 2018
- Full Text
- View/download PDF
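The normalization in entry 288 can be sketched in a few lines. In outline (based on the abstract's description; details such as a minimum number of shared non-zero taxa per sample pair are simplified away here), the size factor of a sample is the geometric mean, over all other samples, of the median count ratio computed on taxa observed in both samples:

    import numpy as np

    def gmpr_size_factors(counts):
        """counts: taxa x samples matrix of raw counts.
        Returns one size factor per sample (geometric mean of pairwise median ratios)."""
        n = counts.shape[1]
        log_sf = np.zeros(n)
        for i in range(n):
            log_ratios = []
            for j in range(n):
                if i == j:
                    continue
                shared = (counts[:, i] > 0) & (counts[:, j] > 0)   # taxa present in both samples
                if shared.any():
                    log_ratios.append(np.log(np.median(counts[shared, i] / counts[shared, j])))
            log_sf[i] = np.mean(log_ratios)          # geometric mean taken on the log scale
        return np.exp(log_sf)

    # toy example: 5 taxa, 3 samples with many zeros
    counts = np.array([[0, 4, 8], [10, 0, 22], [3, 2, 0], [0, 0, 5], [7, 6, 15]], float)
    sf = gmpr_size_factors(counts)
    print(sf, counts / sf)     # size factors and normalized counts

Because each pairwise ratio only uses taxa that are non-zero in both samples, zeros never enter a denominator, which is the point of the method for zero-inflated data.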
289. Identifiability in N-mixture models: a large-scale screening test with bird data.
- Author
-
Kéry M
- Subjects
- Animals, Poisson Distribution, Probability, Sample Size, Birds, Models, Statistical
- Abstract
Binomial N-mixture models have proven very useful in ecology, conservation, and monitoring: they allow estimation and modeling of abundance separately from detection probability using simple counts. Recently, doubts about parameter identifiability have been voiced. I conducted a large-scale screening test with 137 bird data sets from 2,037 sites. I found virtually no identifiability problems for Poisson and zero-inflated Poisson (ZIP) binomial N-mixture models, but negative-binomial (NB) models had problems in 25% of all data sets. The corresponding multinomial N-mixture models had no problems. Parameter estimates under Poisson and ZIP binomial and multinomial N-mixture models were extremely similar. Identifiability problems became a little more frequent with smaller sample sizes (267 and 50 sites), but were unaffected by whether the models did or did not include covariates. Hence, binomial N-mixture model parameters with Poisson and ZIP mixtures typically appeared identifiable. In contrast, NB mixtures were often unidentifiable, which is worrying since these were often selected by Akaike's information criterion. Identifiability of binomial N-mixture models should always be checked. If problems are found, simpler models, integrated models that combine different observation models or the use of external information via informative priors or penalized likelihoods, may help., (© 2017 by the Ecological Society of America.)
- Published
- 2018
- Full Text
- View/download PDF
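For context on entry 289, the binomial N-mixture likelihood separates abundance from detection using only repeated counts. In its simplest Poisson form (the ZIP and NB variants in the paper replace the abundance distribution accordingly), with counts $y_{ij}$ at site $i$ on visit $j$,

\[
y_{ij}\mid N_i \sim \mathrm{Binomial}(N_i, p), \qquad N_i \sim \mathrm{Poisson}(\lambda),
\]
\[
L(\lambda, p) = \prod_{i} \sum_{N_i = \max_j y_{ij}}^{\infty}
\left[\prod_{j} \binom{N_i}{y_{ij}} p^{y_{ij}} (1-p)^{N_i - y_{ij}}\right]
\frac{e^{-\lambda}\lambda^{N_i}}{N_i!},
\]

with the infinite sum truncated at a large upper bound in practice. The identifiability question screened in the paper is whether $\lambda$ and $p$ can be separated from such data under the different abundance mixtures.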
290. Modeling excess zeros and heterogeneity in count data from a complex survey design with application to the demographic health survey in sub-Saharan Africa.
- Author
-
Dai L, Sweat MD, and Gebregziabher M
- Subjects
- Adolescent, Adult, Africa South of the Sahara epidemiology, Female, HIV Infections epidemiology, Humans, Male, Middle Aged, Models, Statistical, Poisson Distribution, Young Adult, Demography statistics & numerical data, Health Surveys statistics & numerical data
- Abstract
Purpose: To show a novel application of a weighted zero-inflated negative binomial model in modeling count data with excess zeros and heterogeneity to quantify the regional variation in HIV-AIDS prevalence in sub-Saharan African countries. Methods: Data come from the latest round of the Demographic and Health Survey (DHS) conducted in three countries (Ethiopia-2011, Kenya-2009 and Rwanda-2010) using a two-stage cluster sampling design. The outcome is an aggregate count of HIV cases in each census enumeration area of each country. The outcome data are characterized by excess zeros and heterogeneity due to clustering. We compare scale weighted zero-inflated negative binomial models with and without random effects to account for zero-inflation, complex survey design and clustering. Finally, we provide marginalized rate ratio estimates from the best zero-inflated negative binomial model. Results: The best fitting zero-inflated negative binomial model is scale weighted, with a common random intercept for the three countries. Rate ratio estimates from the final model show that HIV prevalence is associated with age and gender distribution, HIV acceptance, HIV knowledge, and its regional variation is associated with divorce rate, burden of sexually transmitted diseases and rural residence. Conclusions: A scale weighted zero-inflated negative binomial model with proper modeling of random effects is shown to be the best model for count data from a complex survey design characterized by excess zeros and extra heterogeneity. In our data example, the final rate ratio estimates show significant regional variation in the factors associated with HIV prevalence, indicating that HIV intervention strategies should be tailored to the unique factors found in each country.
- Published
- 2018
- Full Text
- View/download PDF
291. Modelling bivariate count series with excess zeros
- Author
-
Lee, Andy, Wang, Kui, Carrivick, Philip, Yau, K., Stevenson, M., Lee, Andy, Wang, Kui, Carrivick, Philip, Yau, K., and Stevenson, M.
- Abstract
Bivariate time series of counts with excess zeros relative to the Poisson process are common in many bioscience applications. Failure to account for the extra zeros in the analysis may result in biased parameter estimates and misleading inferences. A class of bivariate zero-inflated Poisson autoregression models is presented to accommodate the zero-inflation and the inherent serial dependency between successive observations. An autoregressive correlation structure is assumed in the random component of the compound regression model. Parameter estimation is achieved via an EM algorithm, by maximizing an appropriate log-likelihood function to obtain residual maximum likelihood estimates. The proposed method is applied to analyze a bivariate series from an occupational health study, in which the zero-inflated injury count events are classified as either musculoskeletal or non-musculoskeletal in nature. The approach enables the evaluation of the effectiveness of a participatory ergonomics intervention at the population level, in terms of reducing the overall incidence of lost-time injury and a simultaneous decline in the two mean injury rates.
- Published
- 2005
292. Direct and flexible marginal inference for semicontinuous data.
- Author
-
Smith VA and Preisser JS
- Subjects
- Biostatistics methods, Data Interpretation, Statistical, Humans, Regression Analysis, Models, Statistical
- Abstract
The marginalized two-part (MTP) model for semicontinuous data proposed by Smith et al. provides direct inference for the effect of covariates on the marginal mean of positively continuous data with zeros. This brief note addresses mischaracterizations of the MTP model by Gebregziabher et al. Additionally, the MTP model is extended to incorporate the three-parameter generalized gamma distribution, which takes many well-known distributions as special cases, including the Weibull, gamma, inverse gamma, and log-normal distributions.
- Published
- 2017
- Full Text
- View/download PDF
293. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.
- Author
-
Goulet JL, Buta E, Bathulapalli H, Gueorguieva R, and Brandt CA
- Subjects
- Adolescent, Adult, Age Factors, Cohort Studies, Female, Humans, Iraq War, 2003-2011, Male, Middle Aged, Musculoskeletal Pain physiopathology, Veterans, Young Adult, Models, Statistical, Musculoskeletal Pain diagnosis, Pain Measurement methods
- Abstract
Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes., Perspective: We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes., (Published by Elsevier Inc.)
- Published
- 2017
- Full Text
- View/download PDF
294. Bayesian inference on quasi-sparse count data.
- Author
-
Datta J and Dunson DB
- Abstract
There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.
- Published
- 2016
- Full Text
- View/download PDF
295. Logistic regression for dichotomized counts.
- Author
-
Preisser JS, Das K, Benecha H, and Stamm JW
- Subjects
- Child, Dental Caries prevention & control, Humans, Odds Ratio, Poisson Distribution, Randomized Controlled Trials as Topic, Scotland, Toothpastes pharmacology, Logistic Models
- Abstract
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren., (© The Author(s) 2014.)
- Published
- 2016
- Full Text
- View/download PDF
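The two model parts referred to in entry 295 can be written out for the Poisson case (a generic hurdle formulation, not necessarily the exact parametrization of the paper): the binary part carries the odds-ratio parameters of primary interest, while the count part for the positives is a zero-truncated Poisson with nuisance parameters,

\[
\operatorname{logit} P(Y_i > 0 \mid \mathbf{x}_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
P(Y_i = y \mid Y_i > 0, \mathbf{x}_i) = \frac{e^{-\mu_i}\mu_i^{y}}{y!\,(1 - e^{-\mu_i})}, \quad y = 1, 2, \ldots,
\]

so $\exp(\beta_k)$ is interpreted exactly as in an ordinary logistic regression on the dichotomized outcome, while the efficiency gain comes from also using the information in the positive counts.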
296. Test of Treatment Effect with Zero-Inflated Over-Dispersed Count Data from Randomized Single Factor Experiments
- Author
-
Fan, Huihao
- Subjects
- Biostatistics, over-dispersion, count data, Poisson distribution, zero-inflation
- Abstract
Real-life count data are frequently characterized by over-dispersion (variance greater than the mean) and excess zeros. Various methods exist in the literature to combat zero-inflation and over-dispersion in count data. Among them, zero-inflated count models provide a parsimonious yet powerful way to model excess zeros in addition to allowing for over-dispersion. Such models assume that the counts are a mixture of two separate data-generating processes: one generates only zeros, and the other is a Poisson-type data-generating process. Among the most discussed models are the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP). However, the performance and application conditions of these models are not thoroughly studied. In this work, these common zero-inflation models are reviewed and compared under specified over-dispersion conditions via simulated data and real-life data in terms of statistical power and type I error rate. The performance of each model will be listed side by side to give a clear view of each model's pros and cons under specific over-dispersion and zero-inflation conditions. Further, the ZIGP model is chosen to extend to a more general situation where a random effect is incorporated to account for within-subject correlation and between-subject heterogeneity. Likelihood-based estimation of the treatment effect will be developed for the analysis of randomized experiments with random effects. The effect of model misspecification on model performance will be investigated in areas such as type I error rate, standard error and empirical statistical power. Case studies will be presented to illustrate the application of these models.
- Published
- 2014
297. Evaluating the performance of two competing models of school suspension under simulation - the zero-inflated negative binomial and the negative binomial hurdle
- Author
-
Desjardins, Christopher David
- Subjects
- Negative binomial hurdle, Overdispersion, School suspensions, Simulation, Zero-inflated negative binomial, Zero-inflation
- Abstract
In many educational settings, count data arise that should not be considered realizations of the Poisson model. School days suspended represents an exemplary case of count data that may be zero-inflated and overdispersed relative to the Poisson model after controlling for explanatory variables. This study examined the performance of two models of school days suspended - the zero-inflated negative binomial and the negative binomial hurdle. This study aimed to understand whether the conditions considered would elicit comparable and/or disparate performance between these models. Additionally, this study aimed to understand the consequences of model misspecification when the data-generating mechanism was improperly specified. This study found that the negative binomial hurdle performed better in both simulation studies. Based on the conditions considered here, it is recommended that researchers consider the negative binomial hurdle model over the zero-inflated negative binomial model, especially if the structural zero/zero parameters are to be treated as nuisance parameters or the presence of structural zeros is unknown. If structural zeros are expected, and interest is in these parameters, then the zero-inflated negative binomial should still be considered. Additionally, if interest is in the non-structural zero/count parameters, the results here suggest model misspecification has little effect on these parameters, and a researcher may select a model based on the parameters they are interested in interpreting.
- Published
- 2013
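The two competing models in entry 297 differ only in how they generate zeros, which is worth writing down (standard formulations; $f_{\mathrm{NB}}$ denotes a negative binomial pmf). The zero-inflated model mixes structural zeros with a full negative binomial, whereas the hurdle model treats all zeros as coming from a single binary process:

\[
\text{ZINB:}\quad P(Y=0) = \pi + (1-\pi)f_{\mathrm{NB}}(0), \qquad
P(Y=y) = (1-\pi)f_{\mathrm{NB}}(y), \; y > 0;
\]
\[
\text{NB hurdle:}\quad P(Y=0) = \pi_0, \qquad
P(Y=y) = (1-\pi_0)\,\frac{f_{\mathrm{NB}}(y)}{1 - f_{\mathrm{NB}}(0)}, \; y > 0,
\]

which is why the hurdle model remains usable when the distinction between structural and sampling zeros is unknown or treated as a nuisance.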
298. A Marginalized Zero-inflated Poisson Regression Model with Random Effects.
- Author
-
Long DL, Preisser JS, Herring AH, and Golin CE
- Abstract
Public health research often concerns relationships between exposures and correlated count outcomes. When counts exhibit more zeros than expected under Poisson sampling, the zero-inflated Poisson (ZIP) model with random effects may be used. However, the latent class formulation of the ZIP model can make marginal inference on the sampled population challenging. This article presents a marginalized ZIP model with random effects to directly model the mean of the mixture distribution consisting of 'susceptible' individuals and excess zeroes, providing straightforward inference for overall exposure effects. Simulations evaluate finite sample properties, and the new methods are applied to a motivational interviewing-based safer sex intervention trial, designed to reduce the number of unprotected sexual acts.
- Published
- 2015
- Full Text
- View/download PDF
299. CAUSAL MEDIATION ANALYSIS FOR NON-LINEAR MODELS
- Author
-
Wang, Wei
- Subjects
- Biostatistics, causal mediation analysis, non-linear models, indirect effect, mediation formula, sensitivity analysis, zero-inflation, multiple-mediator model
- Abstract
Mediators are intermediate variables in the causal pathway between an exposure and an outcome. Mediation analysis investigates the extent to which exposure effects occur through these variables, thus revealing causal mechanisms. One interesting question in the causal inference area is mediation analysis for non-linear models. In the first part of this dissertation, we consider the estimation of mediation effects in zero-inflated (ZI) models intended to accommodate 'extra' zeros in count data. Focusing on the ZI negative binomial (ZINB) models, we provide a mediation formula approach to estimate the (overall) mediation effect in the standard two-stage mediation framework under the key sequential ignorability assumption. We also consider a novel decomposition of the overall mediation effect for the ZI context using a three-stage mediation model. Simulation study results demonstrate low bias of mediation effect estimators and close-to-nominal coverage probability (CP) of confidence intervals. The method is applied to a retrospective cohort study of dental caries in very low birth weight adolescents. For overall mediation effect estimation, sensitivity analysis was conducted to quantify the degree to which the key assumption must be violated to reverse the original conclusion. The second question we focus on is mediation analysis for a dichotomous outcome in multiple-mediator models. We formulate a joint model (probit-normal) using continuous latent variables for any binary mediators to account for correlations among multiple mediators. A mediation formula approach is proposed to estimate the total mediation effect and decomposed mediation effects based on this parametric model. We conduct a simulation study that demonstrates low bias of mediation effect estimators for two-mediator models with various combinations of mediator types. The results also show that the power to detect a non-zero total mediation effect increases as the correlation coefficient between two mediators increases, while power for individual mediation effects reaches a maximum when the mediators are uncorrelated. We illustrate our approach by applying it to a retrospective cohort study of dental caries in adolescents with low and high socioeconomic status. Sensitivity analysis is performed to assess the robustness of conclusions regarding mediation effects when the assumption of no unmeasured mediator-outcome confounders is violated.
- Published
- 2012
300. Extensions of nonnegative matrix factorization for exploratory data analysis
- Author
-
Hiroyasu, Abe
- Subjects
zero-inflation ,orthogonal constraint ,nonnegative matrix factorization ,compound Poisson distribution ,tri-factorization ,Tweedie distribution
Nonnegative matrix factorization (NMF) is a matrix decomposition technique for analyzing nonnegative data matrices, i.e., matrices whose elements are all nonnegative. In this thesis, we discuss extensions of NMF for exploratory data analysis that take into account common features of real nonnegative data matrices as well as ease of interpretation. In particular, we discuss probability distributions and divergences for zero-inflated data matrices and data matrices with outliers, two-factor versus three-factor decompositions, and orthogonal constraints on the factor matrices., Doctor of Culture and Information Science, Doshisha University
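As background for entry 300, the baseline NMF that the thesis extends factorizes a nonnegative matrix $V \approx WH$ by minimizing a divergence under nonnegativity constraints. A minimal Python sketch of the classical multiplicative updates for the generalized Kullback-Leibler divergence (Lee-Seung style; this is the starting point, not the zero-inflated, Tweedie or orthogonally constrained extensions discussed in the thesis):

    import numpy as np

    def nmf_kl(V, rank, n_iter=500, eps=1e-10, seed=0):
        """Basic NMF, V (m x n, nonnegative) ~ W (m x rank) @ H (rank x n),
        fitted by multiplicative updates for the generalized KL divergence."""
        rng = np.random.default_rng(seed)
        m, n = V.shape
        W = rng.random((m, rank)) + eps
        H = rng.random((rank, n)) + eps
        for _ in range(n_iter):
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)   # update H, W fixed
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)   # update W, H fixed
        return W, H

    V = np.random.default_rng(1).poisson(3, size=(20, 12)).astype(float)
    W, H = nmf_kl(V, rank=3)
    print(np.abs(V - W @ H).mean())   # mean reconstruction error of the low-rank fit

Replacing the implicit Poisson likelihood behind the KL divergence with zero-inflated or compound Poisson (Tweedie) alternatives, and adding a third factor or orthogonality constraints, corresponds to the extensions outlined in the abstract.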