3,804 results for "Mixture models"
Search Results
2. A fully decoupled, iteration-free, unconditionally stable fractional-step scheme for dispersed multi-phase flows
- Author
- Pacheco, Douglas R.Q.
- Published
- 2025
- Full Text
- View/download PDF
3. Exploring New Perspectives on the Ukrainian-Russian Conflict Using Text Mining Techniques: An Updated Analysis
- Author
- Scarso, Leonardo, Novelli, Marco, Violante, Francesco Saverio, Pollice, Alessio, editor, and Mariani, Paolo, editor
- Published
- 2025
- Full Text
- View/download PDF
4. Model-Based Clustering of Spatial Time Series Through the BayesMix library
- Author
- Gianella, Matteo, Guglielmi, Alessandra, Pollice, Alessio, editor, and Mariani, Paolo, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Heuristic Clustering Algorithms
- Author
- Bagirov, Adil, Karmitsa, Napsu, Taheri, Sona, Celebi, M. Emre, Series Editor, Bagirov, Adil, Karmitsa, Napsu, and Taheri, Sona
- Published
- 2025
- Full Text
- View/download PDF
6. Errors-in-variables model fitting for partially unpaired data utilizing mixture models.
- Author
- Hoegele, Wolfgang and Brockhaus, Sarah
- Abstract
We introduce a general framework for regression in the errors-in-variables regime, allowing full flexibility regarding the dimensionality of the data, the types of observational error probability densities, the (nonlinear) model type, and the avoidance of ad hoc definitions of loss functions. Within this framework, we introduce model fitting for partially unpaired data, i.e., data groups for which the pairing information between input and output is lost (semi-supervised). This is achieved by constructing mixture model densities that directly model the loss of pairing information, allowing inference. A numerical simulation study illustrates linear and nonlinear model fits, and a real-data study based on World Bank life expectancy data is presented using a multiple linear regression model. These results show that high-quality model fitting is possible with partially unpaired data, which opens the possibility of new applications with unfortunate or deliberate loss of pairing information in data. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
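The core idea of the entry above — modelling the loss of pairing information with a mixture density — can be sketched as follows. This is an illustrative toy, not the authors' implementation: each unpaired response is treated as a draw from a mixture over all candidate inputs, and the regression parameters are found by maximizing that mixture likelihood. The data, the fixed noise scale `sigma`, and the optimizer starts are all assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
a_true, b_true, sigma = 2.0, 1.0, 0.3
x = rng.uniform(0.0, 1.0, 80) ** 2                 # skewed toy inputs
# Simulate responses, then discard the pairing by permuting them
y_unpaired = rng.permutation(a_true * x + b_true + rng.normal(0.0, sigma, 80))

def neg_log_lik(theta):
    a, b = theta
    # Pairing lost: each y could belong to any x, so score it against a
    # mixture over all candidate regression means a*x_i + b
    dens = norm.pdf(y_unpaired[:, None], loc=a * x[None, :] + b, scale=sigma)
    return -np.sum(np.log(dens.mean(axis=1) + 1e-300))

# Multi-start Nelder-Mead to avoid a poor local optimum
fits = [minimize(neg_log_lik, s, method="Nelder-Mead") for s in [(-3, 2), (0, 1), (3, 0)]]
a_hat, b_hat = min(fits, key=lambda f: f.fun).x
```

Despite never seeing a single (x, y) pair, the mixture likelihood recovers the slope and intercept approximately, because the marginal distribution of the responses still carries information about the regression line.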
7. Multi-indication Evidence Synthesis in Oncology Health Technology Assessment: Meta-analysis Methods and Their Application to a Case Study of Bevacizumab.
- Author
- Singh, Janharpreet, Anwer, Sumayya, Palmer, Stephen, Saramago, Pedro, Thomas, Anne, Dias, Sofia, Soares, Marta O, and Bujkiewicz, Sylwia
- Abstract
Background: Multi-indication cancer drugs receive licensing extensions to include additional indications, as trial evidence on treatment effectiveness accumulates. We investigate how sharing information across indications can strengthen the inferences supporting health technology assessment (HTA). Methods: We applied meta-analytic methods to randomized trial data on bevacizumab, to share information across oncology indications on the treatment effect on overall survival (OS) or progression-free survival (PFS) and on the surrogate relationship between effects on PFS and OS. Common or random indication-level parameters were used to facilitate information sharing, and the further flexibility of mixture models was also explored. Results: Treatment effects on OS lacked precision when pooling data available at present day within each indication separately, particularly for indications with few trials. There was no suggestion of heterogeneity across indications. Sharing information across indications provided more precise estimates of treatment effects and surrogacy parameters, with the strength of sharing depending on the model. When a surrogate relationship was used to predict treatment effects on OS, uncertainty was reduced only when sharing effects on PFS in addition to surrogacy parameters. Corresponding analyses using the earlier, sparser (within and across indications) evidence available for particular HTAs showed that sharing on both surrogacy and PFS effects did not notably reduce uncertainty in OS predictions. Little heterogeneity across indications meant limited added value of the mixture models. Conclusions: Meta-analysis methods can be usefully applied to share information on treatment effectiveness across indications in an HTA context, to increase the precision of target indication estimates. 
Sharing on surrogate relationships requires caution, as meaningful precision gains in predictions will likely require a substantial evidence base and clear support for surrogacy from other indications. Highlights: We investigated how sharing information across indications can strengthen inferences on the effectiveness of multi-indication treatments in the context of health technology assessment (HTA). Multi-indication meta-analysis methods can provide more precise estimates of an effect on a final outcome or of the parameters describing the relationship between effects on a surrogate endpoint and a final outcome. Precision of the predicted effect on the final outcome based on an effect on the surrogate endpoint will depend on the precision of the effect on the surrogate endpoint and the strength of evidence of a surrogate relationship across indications. Multi-indication meta-analysis methods can be usefully applied to predict an effect on the final outcome, particularly where there is limited evidence in the indication of interest. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
8. Liu-type shrinkage estimators for mixture of logistic regressions: an osteoporosis study.
- Author
- Ghanem, Elsayed, Hatefi, Armin, and Usefi, Hamid
- Subjects
- REGRESSION analysis, EXPECTATION-maximization algorithms, STATISTICS, DATA analysis, MULTICOLLINEARITY, LOGISTIC regression analysis, OSTEOPOROSIS
- Abstract
The logistic regression model is one of the most powerful statistical methods for the analysis of binary data. Logistic regression allows a set of covariates to explain the binary responses. A mixture of logistic regression models is used to fit heterogeneous populations through an unsupervised learning approach. The multicollinearity problem, where covariates are highly correlated, is one of the most common problems in logistic regression and in mixtures of logistic regressions. It results in unreliable maximum likelihood estimates for the regression coefficients. This research develops shrinkage methods, including ridge and Liu-type estimators, to deal with multicollinearity in a mixture of logistic regression models. Through extensive numerical studies, we show that the developed methods provide more reliable estimates of the mixture coefficients. Finally, we apply the shrinkage methods to analyze the status of bone disorders in women aged 50 and older. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
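The shrinkage idea behind the entry above can be illustrated in its simplest setting. The sketch below shows the classical Liu (1993) estimator for plain linear regression on a collinear design — not the authors' mixture-of-logistic-regressions version, which embeds shrinkage inside an EM algorithm. The design matrix and the choice d = 0.5 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=n)   # near-duplicate column -> multicollinearity
beta_true = np.array([1.0, 0.5, -0.5, 1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

XtX, Xty = X.T @ X, X.T @ y
beta_ols = np.linalg.solve(XtX, Xty)            # unstable under collinearity

def liu_estimator(d):
    # Liu (1993): beta_d = (X'X + I)^(-1) (X'y + d * beta_OLS), 0 < d < 1.
    # Each eigen-component of beta_OLS is shrunk by (lambda + d)/(lambda + 1) < 1.
    return np.linalg.solve(XtX + np.eye(p), Xty + d * beta_ols)

beta_liu = liu_estimator(0.5)
```

Because every shrinkage factor is strictly below one, the Liu estimate always has a smaller norm than the ordinary least squares estimate, which is what stabilizes the coefficients when X'X is nearly singular.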
9. Enhanced Bayesian Gaussian hidden Markov mixture clustering for improved knowledge discovery.
- Author
- Ganesan, Anusha, Paul, Anand, and Kim, Sungho
- Abstract
The hidden Markov model (HMM) is widely utilized in natural language processing, speech recognition, autonomous vehicular systems, and healthcare for tasks such as clustering, pattern recognition, predictive modeling, anomaly detection, and time-series forecasting. However, HMMs can be sensitive to initial states, compromising clustering reliability. To address this issue, we propose an innovative integration of an HMM with hybrid distance metric learning and a modified Bayesian Gaussian mixture model (BGMM) to enhance clustering performance and robustness. A significant challenge in HMM applications is determining the optimal number of hidden states. We address this using a k-fold cross-validation strategy. Implementing our Bayesian Gaussian Hidden Markov Mixture Clustering Model (BGH2MCM) on five diverse datasets, we categorize the observed data sequences according to underlying hidden state sequences. This approach yields superior outcomes to conventional techniques such as K-means, agglomerative clustering, density-based spatial clustering of applications with noise (DBSCAN), and the BGMM. We evaluate the efficiency of our model using silhouette, Davies–Bouldin, and Calinski–Harabasz scores, accuracy metrics, and computation time. Our results demonstrate that the BGH2MCM consistently achieves better clustering quality and computational efficiency, showing an average computation time 23% lower than agglomerative clustering with HMM, 22% less than DBSCAN with HMM, and 14% lower than K-means with the HMM and a BGMM-HMM across all datasets. This study highlights the potential of our BGH2MCM to improve data mining and knowledge discovery practices from complex, real-world datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
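As a point of reference for the entry above, a plain Bayesian Gaussian mixture (without the HMM layer the authors add) can already infer the number of clusters by pruning unused components, and its output can be scored with the same silhouette metric the paper reports. All dataset and prior choices below are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import BayesianGaussianMixture

# Three well-separated synthetic clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

# Generous upper bound on components; the Dirichlet concentration prior
# drives the weights of superfluous components toward zero
bgmm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=0.01,
                               max_iter=500, random_state=0).fit(X)
labels = bgmm.predict(X)
n_used = np.unique(labels).size        # components actually assigned points
score = silhouette_score(X, labels)
```

The number of components that survive pruning, rather than the nominal `n_components=10`, is the model's estimate of the cluster count — the same model-selection problem the paper addresses with k-fold cross-validation for the number of hidden states.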
10. Multiphase Viscoelastic Non‐Newtonian Fluid Simulation.
- Author
- Zhang, Y., Long, S., Xu, Y., Wang, X., Yao, C., Kosinka, J., Frey, S., Telea, A., and Ban, X.
- Subjects
- VISCOELASTIC materials, NEWTONIAN fluids, MANUFACTURING processes, CHEMICAL bonds, RHEOLOGY, PSEUDOPLASTIC fluids
- Abstract
We propose an SPH‐based method for simulating viscoelastic non‐Newtonian fluids within a multiphase framework. For this, we use mixture models to handle component transport and conformation tensor methods to handle the fluid's viscoelastic stresses. In addition, we consider a bonding effects network to handle the impact of microscopic chemical bonds on phase transport. Our method supports the simulation of both steady‐state viscoelastic fluids and discontinuous shear behavior. Compared to previous work on single‐phase viscous non‐Newtonian fluids, our method can capture more complex behavior, including material mixing processes that generate non‐Newtonian fluids. We adopt a uniform set of variables to describe shear thinning, shear thickening, and ordinary Newtonian fluids while automatically calculating local rheology in inhomogeneous solutions. In addition, our method can simulate large viscosity ranges under explicit integration schemes, which typically requires implicit viscosity solvers under earlier single‐phase frameworks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Clusters that are not there: An R tutorial and a Shiny app to quantify a priori inferential risks when using clustering methods.
- Author
- Toffalini, Enrico, Gambarota, Filippo, Perugini, Ambra, Girardi, Paolo, Tobia, Valentina, Altoè, Gianmarco, Giofrè, David, and Feraco, Tommaso
- Subjects
- MACHINE learning, SOCIAL science research, PSYCHOLOGICAL research, RESEARCH personnel, CLUSTER analysis (Statistics)
- Abstract
Clustering methods are increasingly used in social science research. Generally, researchers use them to infer the existence of qualitatively different types of individuals within a larger population, thus unveiling previously "hidden" heterogeneity. Depending on the clustering technique, however, valid inference requires certain conditions and assumptions. Common risks include not only failing to detect existing clusters due to a lack of power, but also revealing clusters that do not exist in the population. Simple data simulations suggest that, under the sample sizes and the numbers, correlations, and skewness of indicators frequently encountered in applied psychological research, commonly used clustering methods are at high risk of detecting clusters that are not there. Generally, this is due to violations of assumptions that are not usually considered critical in psychology. The present article illustrates a simple R tutorial and a Shiny app (for those unfamiliar with R) that allow researchers to quantify a priori inferential risks when applying clustering methods to their own data. Doing so is suggested as a much-needed preliminary sanity check, because conditions that inflate the number of detected clusters are very common in applied psychological research scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
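The phenomenon the entry above warns about is easy to reproduce. In this sketch (Python rather than the authors' R tutorial; all parameter choices are illustrative) a single homogeneous population with correlated indicators is simulated, yet k-means happily returns a partition with a positive silhouette score for every requested cluster count:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# One homogeneous population: 6 correlated indicators, zero true clusters
cov = np.full((6, 6), 0.3) + 0.7 * np.eye(6)
X = rng.multivariate_normal(np.zeros(6), cov, size=500)

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Positive despite the absence of any real cluster structure
    scores[k] = silhouette_score(X, labels)
```

A positive silhouette alone is therefore not evidence of real subgroups; as the paper argues, the null behavior of the chosen method on data like one's own has to be checked first.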
12. Tests of covariate effects under finite Gaussian mixture regression models.
- Author
- Gan, Chong, Chen, Jiahua, and Feng, Zeny
- Subjects
- FALSE positive error, CLUSTER analysis (Statistics), REGRESSION analysis, ERROR rates, BATS, GAUSSIAN mixture models
- Abstract
The mixture of regression models is widely used to cluster subjects from a suspected heterogeneous population, owing to differential relationships between response and covariates across unobserved subpopulations. In such applications, statistical evidence pertaining to the significance of a hypothesis is important yet often missing to substantiate the findings. One may wish to test hypotheses regarding the effect of a covariate, such as its overall significance; if confirmed, a further test of whether its effects differ across subpopulations might be performed. This paper is motivated by the analysis of the Chiroptera dataset, in which we are interested in how the forearm length development of bat species is influenced by precipitation within their habitats and living regions, using a finite Gaussian mixture regression (GMR) model. Since precipitation may have different effects on the evolutionary development of the forearm across the underlying subpopulations of bat species worldwide, we propose several procedures for testing hypotheses regarding the effect of precipitation on forearm length under finite GMR models. In addition to the real analysis of the Chiroptera data, we examine, through simulation studies, the performance of these testing procedures in terms of type I error rate, power, and, consequently, the accuracy of the clustering analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. A model of errors in BMI based on self-reported and measured anthropometrics with evidence from Brazilian data.
- Author
- Davillas, Apostolos, de Oliveira, Victor Hugo, and Jones, Andrew M.
- Subjects
- ERRORS-in-variables models, REGRESSION analysis, MEASUREMENT errors, MEDICAL care use, BODY mass index
- Abstract
The economics of obesity literature implicitly assumes that measured anthropometrics are error-free, and they are often treated as a gold standard when compared to self-reported data. We use factor mixture models to analyse measurement error in both self-reported and measured anthropometrics with nationally representative data from the 2013 National Health Survey in Brazil. A small but statistically significant fraction of measured anthropometrics is attributed to recording errors, while, owing to imprecise recording and to reporting behaviour, only between 10% and 23% of our self-reported anthropometrics are free from any measurement error. Post-estimation analysis allows us to calculate hybrid anthropometric predictions that best approximate the true body weight and height distribution. BMI distributions based on the hybrid measures do not differ between our factor mixture models with and without covariates, and are generally close to those based on measured data, while BMI based on self-reported data under-estimates the true BMI distribution. "Corrected self-reported BMI" measures, based on common methods that mitigate reporting error in self-reports using predictions from corrective equations, do not seem to be a good alternative to our "hybrid" BMI measures. Analysis of regression models for the association between BMI and health care utilization shows only small differences, concentrated at the far-right tail of the BMI distribution, when they are based on our hybrid measure as opposed to measured BMI. However, more pronounced differences are observed at the lower and upper tails of BMI when these are compared to self-reported or "corrected self-reported" BMI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Estimation of a Generalized Treatment Effect in a Control Group Versus Treatment Group Design.
- Author
- Jeske, Daniel R.
- Subjects
- TREATMENT effectiveness, BLOOD pressure, CLINICAL medicine, CLINICAL trials, CONTROL groups
- Abstract
A control group versus treatment group design is considered where the responses in the treatment group are modeled as a two-component mixture that accounts for the possibility that only a fraction of the patients in the treated group will respond to the treatment. In this setting, the treatment effect is generalized to include both the fraction of treated patients that respond to the treatment and the magnitude of the response. Two alternative correlated and biased estimators are combined to yield an estimator that is preferable to either of the estimators individually. The combined estimator is demonstrated on an illustrative blood pressure dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Modeling uncertainty with the truncated zeta distribution in mixture models for ordinal responses.
- Author
- Dai, Dayang and Wang, Dabuxilatu
- Subjects
- ZETA functions, EXPECTATION-maximization algorithms, PARAMETER estimation, HEALTH surveys, DATA modeling
- Abstract
Over the past three decades there has been rapidly increasing interest in mixture models for ordinal responses, and the classical CUB model, as a fundamental one, has been extended to different preference and uncertainty models. In this article, based on a response style supported by Zipf's law, we propose a novel mixture model for ordinal responses by replacing the uncertainty component of the CUB model with a truncated zeta distribution. Parameter estimation with the EM algorithm, inferential issues with respect to the approximation of a truncated Riemann zeta function, and the estimators' variance-covariance information matrix are investigated. The advantages of the proposed model over the CUB model are illustrated with simulations from two sets of Monte Carlo experiments and with practical applications to a health survey and bicycle use. The intention of the article is to distinguish respondents' true preference from the response style of "the higher, the less", so as to understand more reasonably how ordinal responses are formed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
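The ingredients of the model in the entry above are simple to write down. The sketch below builds a truncated zeta pmf over an m-point scale and mixes it with a shifted-binomial "feeling" component in the CUB style; parameter names (`pi_`, `xi`, `s`) and values are illustrative, and this is a sketch of the model form, not the authors' EM estimation code.

```python
import numpy as np
from scipy.stats import binom

def truncated_zeta_pmf(s, m):
    # P(R = r) proportional to r^(-s), r = 1..m: Zipf-like
    # "the higher, the less" uncertainty/response-style component
    r = np.arange(1, m + 1)
    w = r ** (-float(s))
    return w / w.sum()

def cub_zeta_pmf(pi_, xi, s, m):
    # CUB-style mixture: shifted-binomial preference component plus
    # truncated-zeta uncertainty component, with mixing weight pi_
    r = np.arange(1, m + 1)
    feeling = binom.pmf(r - 1, m - 1, 1 - xi)
    return pi_ * feeling + (1 - pi_) * truncated_zeta_pmf(s, m)

pmf = cub_zeta_pmf(pi_=0.7, xi=0.3, s=1.5, m=7)   # e.g. a 7-point rating scale
```

The truncated-zeta component is strictly decreasing in the category index, which is exactly the asymmetric response style the article aims to separate from genuine preference.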
16. Robust Classification via Finite Mixtures of Matrix Variate Skew-t Distributions.
- Author
- Mahdavi, Abbas, Balakrishnan, Narayanaswamy, and Jamalizadeh, Ahad
- Subjects
- DISTRIBUTION (Probability theory), GAUSSIAN distribution, IMAGE segmentation, EARLY detection of cancer, SKIN cancer, SKEWNESS (Probability theory)
- Abstract
Analysis of matrix variate data is becoming increasingly common in the literature, particularly in the field of clustering and classification. It is well known that real data, including real matrix variate data, often exhibit high levels of asymmetry. To address this issue, one common approach is to introduce a tail or skewness parameter into a symmetric distribution. In this regard, we introduce a new distribution called the matrix variate skew-t distribution (MVST), which provides flexibility in terms of heavy tails and skewness. We then conduct a thorough investigation of various characterizations and probabilistic properties of the MVST distribution. We also explore extensions of this distribution to a finite mixture model. To estimate the parameters of the MVST distribution, we develop an EM-type algorithm that computes maximum likelihood (ML) estimates of the model parameters. To validate the effectiveness and usefulness of the developed models and associated methods, we performed empirical experiments using simulated data as well as three real data examples, including an application in skin cancer detection. Our results demonstrate the efficacy of the developed approach in handling asymmetric matrix variate data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Investigating Heterogeneity in Response Strategies: A Mixture Multidimensional IRTree Approach.
- Author
- Alagöz, Ö. Emre C. and Meiser, Thorsten
- Subjects
- STATISTICAL models, COMPUTER simulation, RESEARCH funding, DESCRIPTIVE statistics, DECISION making, PSYCHOMETRICS, CONCEPTUAL structures, DECISION trees, TREATMENT effect heterogeneity, EVIDENCE-based medicine
- Abstract
To improve the validity of self-report measures, researchers should control for response style (RS) effects, which can be achieved with IRTree models. A traditional IRTree model considers a response as a combination of distinct decision-making processes, where the substantive trait affects the decision on response direction, while decisions about choosing the middle category or extreme categories are largely determined by midpoint RS (MRS) and extreme RS (ERS). One limitation of traditional IRTree models is the assumption that all respondents utilize the same set of RS in their response strategies, whereas it can be assumed that the nature and the strength of RS effects can differ between individuals. To address this limitation, we propose a mixture multidimensional IRTree (MM-IRTree) model that detects heterogeneity in response strategies. The MM-IRTree model comprises four latent classes of respondents, each associated with a different set of RS traits in addition to the substantive trait. More specifically, the class-specific response strategies involve (1) only ERS in the "ERS only" class, (2) only MRS in the "MRS only" class, (3) both ERS and MRS in the "2RS" class, and (4) neither ERS nor MRS in the "0RS" class. In a simulation study, we showed that the MM-IRTree model performed well in recovering model parameters and class memberships, whereas the traditional IRTree approach showed poor performance if the population includes a mixture of response strategies. In an application to empirical data, the MM-IRTree model revealed distinct classes with noticeable class sizes, suggesting that respondents indeed utilize different response strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Probabilistic and explainable modeling of Phase–Phase Cross-Frequency Coupling patterns in EEG. Application to dyslexia diagnosis.
- Author
- Castillo-Barnes, Diego, Gallego-Molina, Nicolás J., Formoso, Marco A., Ortiz, Andrés, Figueiredo, Patrícia, and Luque, Juan L.
- Subjects
- MACHINE learning, GAUSSIAN mixture models, AUDITORY perception, SIGNAL processing, COUPLINGS (Gearing)
- Abstract
This work explores the intricate neural dynamics associated with dyslexia through the lens of Cross-Frequency Coupling (CFC) analysis applied to electroencephalography (EEG) signals from 48 seven-year-old Spanish readers from the LEEDUCA research platform. The analysis focuses on CFS (Cross-Frequency phase Synchronization) maps, capturing the interaction between different frequency bands during low-level auditory processing stimuli. Then, making use of Gaussian Mixture Models (GMMs), CFS activations are quantified and classified, offering a compressed representation of EEG activation maps. The study unveils promising results, especially at the Theta-Gamma coupling (Area Under the Curve = 0.821), demonstrating the method's sensitivity to dyslexia-related neural patterns and highlighting potential applications in the early identification of dyslexic individuals. • Novel CFC analysis of EEG detects dyslexia, revealing neural dynamics. • Histogram transformation improves dyslexia detection in CFS maps. • GMM reduces dimensionality and overfitting, preserving key EEG features. • Theta-Gamma coupling discriminates dyslexia with high AUC values (0.82, 0.71, 0.76). • Our method outperforms traditional techniques like PAF and PCA in efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Estimating Higher-Order Mixed Memberships via the l2,∞ Tensor Perturbation Bound.
- Author
- Agterberg, Joshua and Zhang, Anru R.
- Subjects
- SIMPLEX algorithm, SIGNAL-to-noise ratio, MACHINE learning, NOISE, GENERALIZATION
- Abstract
Higher-order multiway data is ubiquitous in machine learning and statistics and often exhibits community-like structures, where each component (node) along each different mode has a community membership associated with it. In this article we propose the sub-Gaussian tensor mixed-membership blockmodel, a generalization of the tensor blockmodel positing that memberships need not be discrete, but instead are convex combinations of latent communities. We establish the identifiability of our model and propose a computationally efficient estimation procedure based on the higher-order orthogonal iteration algorithm (HOOI) for tensor SVD composed with a simplex corner-finding algorithm. We then demonstrate the consistency of our estimation procedure by providing a per-node error bound under sub-Gaussian noise, which showcases the effect of higher-order structures on estimation accuracy. To prove our consistency result, we develop the l2,∞ tensor perturbation bound for HOOI under independent, heteroscedastic, sub-Gaussian noise that may be of independent interest. Our analysis uses a novel leave-one-out construction for the iterates, and our bounds depend only on spectral properties of the underlying low-rank tensor under nearly optimal signal-to-noise ratio conditions such that tensor SVD is computationally feasible. Finally, we apply our methodology to real and simulated data, demonstrating some effects not identifiable from the model with discrete community memberships. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. A Mixture Model Approach to Assessing Measurement Error in Surveys Using Reinterviews.
- Author
- Hoellerbauer, Simon
- Subjects
- DATA quality, MEASUREMENT errors, RESEARCH personnel, ACQUISITION of data, MIXTURES
- Abstract
Researchers are often unsure about the quality of the data collected by third-party actors, such as survey firms. This may be because of the inability to measure data quality effectively at scale and the difficulty with communicating which observations may be the source of measurement error. Researchers rely on survey firms to provide them with estimates of data quality and to identify observations that are problematic, potentially because they have been falsified or poorly collected. To address these issues, I propose the QualMix model, a mixture modeling approach to deriving estimates of survey data quality in situations in which two sets of responses exist for all or certain subsets of respondents. I apply this model to the context of survey reinterviews, a common form of data quality assessment used to detect falsification and data collection problems during enumeration. Through simulation based on real-world data, I demonstrate that the model successfully identifies incorrect observations and recovers latent enumerator and survey data quality. I further demonstrate the model's utility by applying it to reinterview data from a large survey fielded in Malawi, using it to identify significant variation in data quality across observations generated by different enumerators. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Mixtures of Probit Regression Models with Overlapping Clusters.
- Author
- Ranciati, Saverio, Vinciotti, Veronica, Wit, Ernst C., and Galimberti, Giuliano
- Subjects
- REGRESSION analysis, CLUSTER analysis (Statistics), MARKOV chain Monte Carlo, MULTIVARIATE analysis, INFERENTIAL statistics
- Abstract
Studies with binary outcomes on a heterogeneous population are quite common. Typically, the heterogeneity is modelled through varying effect coefficients within some binary regression setting combined with a clustering procedure. Most of the existing methods assign statistical units to distinct and non-overlapping clusters. However, there are scenarios where units exhibit a more complex organization and the clusters can be thought of as partially overlapping. In this case, the standard approach does not work. In this paper, we define a mixture of regression models that allows overlapping clusters. This approach involves an overlap function that maps the regression coefficients, either at the unit or response level, of the parent clusters into the coefficients of the multiple allocation clusters. In order to deal with this intrinsic heterogeneity, regression analyses have to be stratified for different groups of observations or clusters. We present a computationally efficient Markov chain Monte Carlo (MCMC) scheme for the case of a mixture of probit regressions. A simulation study shows the overall performance of the method. We conclude with two illustrative examples of modelling voting behavior, involving United States (US) Supreme Court justices over a number of topics and members of the United Kingdom (UK) parliament over divisions related to Brexit. These applications provide insights into the usefulness of the method in real applications. The method described can be extended to the case of a generic mixture of multivariate generalized linear models under overlapping clusters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Non intrusive load monitoring using additive time series modeling via finite mixture models aggregation.
- Author
- Tabarsaii, Soudabeh, Amayri, Manar, Bouguila, Nizar, and Eicker, Ursula
- Abstract
Energy disaggregation, or Non-Intrusive Load Monitoring (NILM), covers methods that aim to distinguish the individual contribution of appliances, given the aggregated power signal. In this paper, the application of finite Generalized Gaussian and finite Gamma mixtures to energy disaggregation is proposed and investigated. The procedure includes approximating the distribution of the sum of two Generalized Gaussian random variables (RVs) and the distribution of the sum of two Gamma RVs using method-of-moments matching. By adopting this procedure, the probability distribution of each combination of appliance consumptions is acquired to predict and disaggregate the specific device data from the aggregated data. Moreover, to make the models more practical, we propose a deep version, which we call DNN-Mixture: a cascade model combining a deep neural network with each of the proposed mixture models. As part of our extensive evaluation, we apply the proposed models to three different datasets, from different geographical locations and with different sampling rates. The results indicate the superiority of the proposed models compared to the Gaussian mixture model and other widely used approaches. To investigate the applicability of our models in challenging unsupervised settings, we tested them on unseen houses with unlabeled data. The outcomes demonstrate the extensibility and robustness of the proposed approach. Finally, evaluation of the cascade model against the state of the art shows that, by benefiting from the advantages of both neural networks and finite mixtures, the cascade model can produce promising results competitive with RNNs without suffering from their inherent disadvantages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
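The moment-matching step mentioned in the entry above is easy to sketch for the Gamma case: the sum of two independent Gamma variables is approximated by a single Gamma whose shape and scale reproduce the exact mean and variance of the sum. This is a generic moment-matching sketch, not the paper's code; the parameter values are illustrative.

```python
import numpy as np

def gamma_sum_mom(k1, th1, k2, th2):
    # Moment-matched Gamma approximation to X1 + X2, Xi ~ Gamma(shape ki, scale th_i).
    # Exact when th1 == th2; otherwise an approximation.
    mean = k1 * th1 + k2 * th2            # E[X1 + X2]
    var = k1 * th1 ** 2 + k2 * th2 ** 2   # Var[X1 + X2] (independence)
    return mean ** 2 / var, var / mean    # matched (shape, scale)

k, theta = gamma_sum_mom(2.0, 1.0, 3.0, 0.5)

# Monte Carlo check that the matched moments agree with the true sum
rng = np.random.default_rng(0)
s = rng.gamma(2.0, 1.0, 100_000) + rng.gamma(3.0, 0.5, 100_000)
```

By construction, `k * theta` equals the exact mean of the sum and `k * theta**2` its exact variance, so the approximation can only err in higher moments.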
23. Unsupervised Liu-type shrinkage estimators for mixture of regression models.
- Author
- Ghanem, Elsayed, Hatefi, Armin, and Usefi, Hamid
- Subjects
- EXPECTATION-maximization algorithms, REGRESSION analysis, MULTICOLLINEARITY, COMPUTER simulation, HETEROGENEITY
- Abstract
The mixture of probabilistic regression models is one of the most common techniques for incorporating covariate information into the learning of population heterogeneity. Despite its flexibility, it can yield unreliable estimates when there is multicollinearity among covariates. In this paper, we develop Liu-type shrinkage methods through an unsupervised learning approach to estimate the model coefficients in the presence of multicollinearity. We evaluate the performance of our proposed methods via classification and stochastic versions of the expectation-maximization algorithm. We show, using numerical simulations, that the proposed methods outperform their ridge and maximum likelihood counterparts. Finally, we apply our methods to analyze the bone mineral data of women aged 50 and older. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
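The shrinkage idea above can be illustrated with a single component update: in an EM algorithm for mixtures of regressions, each component's coefficients are refit by responsibility-weighted least squares, and a shrinkage estimator replaces the plain weighted OLS solution. The sketch below uses one common form of the Liu estimator, beta_d = (X'WX + I)^(-1)(X'WX + d·I)·beta_OLS; the data and the embedding in the M-step are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def liu_weighted_mstep(X, y, w, d):
    """One M-step update for a mixture-of-regressions component:
    responsibility-weighted OLS followed by Liu-style shrinkage,
    beta_d = (X'WX + I)^{-1} (X'WX + d*I) beta_ols, 0 <= d <= 1."""
    W = np.diag(w)                       # w: E-step responsibilities
    XtWX = X.T @ W @ X
    beta_ols = np.linalg.solve(XtWX, X.T @ W @ y)
    I = np.eye(X.shape[1])
    return np.linalg.solve(XtWX + I, (XtWX + d * I) @ beta_ols)

# Illustrative data; w = 1 is the degenerate one-component case.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])
w = np.ones(len(y))
beta = liu_weighted_mstep(X, y, w, d=0.5)  # shrunk relative to weighted OLS
```

At d = 1 the update reduces to weighted OLS; smaller d shrinks the coefficients, which is what stabilizes the estimates under multicollinearity.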
24. Exploring job seeker profiles through latent profile analysis.
- Author
-
Stremersch, Jolien, Bouckenooghe, Dave, and Kanar, Adam M.
- Subjects
- *
JOB hunting , *SOCIAL pressure , *JOB postings , *JOB offers , *VOCATIONAL guidance counselors , *QUALITY of work life - Abstract
Primarily using a variable-centered approach, job search research explores the connections between antecedents, processes, and outcomes. A person-centered approach, however, categorizes individuals based on personal and contextual elements. This study used CSM as a theoretical framework to identify job seeker profiles by exploring configurations of job search self-efficacy, conscientiousness, financial need, social pressure, and job search quality and intensity. We examined how these profiles correspond with sociodemographic variables and job search outcomes such as rumination, interviews, and job offers. In a sample of 300 job seekers, four profiles emerged: casual job search contemplator, financially burdened job seeker, financially secure job seeker, and multifaceted job search strategist. The contemplator profile correlated with the fewest interviews, while the financially burdened job seeker had the most. These findings suggest career counselors need to recognize distinctive job seeker patterns requiring tailored counseling approaches, underscoring the potential of the person-centered approach for further job search research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Sigmoid allometries generate male dimorphism in secondary sexual traits: a comment on Packard (2023).
- Author
-
Buzatto, Bruno A., Machado, Glauco, and Palaoro, Alexandre V.
- Subjects
BODY size ,SEXUAL dimorphism ,ALLOMETRY ,BEETLES ,MALES - Abstract
The detection of male dimorphism has seen numerous statistical advances. Packard recently criticized a widely used method, reanalyzing data from beetles and harvestmen with an alternative method. We disagree with Packard's conclusions, probably owing to different implicit definitions of male dimorphism. We consider that male dimorphism manifests in a distribution when it is significantly better described by a model with two values of central tendency (bimodality) than by a model with only one (unimodality). Thus, while Packard suggests sigmoid allometries as alternatives to male dimorphism, we argue that such allometries are manifestations of mechanisms that generate bimodal distributions. Instead of focusing on this dichotomy, we propose an approach to test whether bimodality in a trait arises simply from its allometry by: (1) characterizing the trait's static allometry, (2) simulating body size values based on the parameters of the original data, and (3) generating new trait sizes using the static allometries. The percentage of simulations generating equal or greater bimodality than the data represents the likelihood that the bimodality can be explained by the allometry alone. Our method offers a null model linking sigmoid allometries and bimodal distributions, providing a test for mechanisms that accentuate trait bimodality beyond what the trait's allometry generates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. A modified EM-type algorithm to estimate semi-parametric mixtures of non-parametric regressions.
- Author
-
Skhosana, Sphiwe B., Millard, Salomon M., and Kanfer, Frans H. J.
- Abstract
Semi-parametric Gaussian mixtures of non-parametric regressions (SPGMNRs) are a flexible extension of Gaussian mixtures of linear regressions (GMLRs). The model assumes that the component regression functions (CRFs) are non-parametric functions of the covariate(s) whereas the component mixing proportions and variances are constants. Unfortunately, the model cannot be reliably estimated using traditional methods. A local-likelihood approach for estimating the CRFs requires that we maximize a set of local-likelihood functions. Using the Expectation-Maximization (EM) algorithm to separately maximize each local-likelihood function may lead to label-switching. This is because the posterior probabilities calculated at the local E-step are not guaranteed to be aligned. The consequence of this label-switching is wiggly and non-smooth estimates of the CRFs. In this paper, we propose a unified approach to address label-switching and obtain sensible estimates. The proposed approach has two stages. In the first stage, we propose a model-based approach to address the label-switching problem. We first note that each local-likelihood function is a likelihood function of a Gaussian mixture model (GMM). Next, we reformulate the SPGMNRs model as a mixture of these GMMs. Lastly, using a modified version of the Expectation Conditional Maximization (ECM) algorithm, we estimate the mixture of GMMs. In addition, using the mixing weights of the local GMMs, we can automatically choose the local points where local-likelihood estimation takes place. In the second stage, we propose one-step backfitting estimates of the parametric and non-parametric terms. The effectiveness of the proposed approach is demonstrated on simulated data and real data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions.
- Author
-
Browne, Ryan P., Bagnato, Luca, and Punzo, Antonio
- Abstract
Mixtures of multivariate leptokurtic-normal distributions have been recently introduced in the clustering literature based on mixtures of elliptical heavy-tailed distributions. They have the advantage of having parameters directly related to the moments of practical interest. We derive two estimation procedures for these mixtures. The first one is based on the majorization-minimization algorithm, while the second is based on a fixed point approximation. Moreover, we introduce parsimonious forms of the considered mixtures and we use the illustrated estimation procedures to fit them. We use simulated and real data sets to investigate various aspects of the proposed models and algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Row mixture-based clustering with covariates for ordinal responses.
- Author
-
Preedalikit, Kemmawadee, Fernández, Daniel, Liu, Ivy, McMillan, Louise, Nai Ruscone, Marta, and Costilla, Roy
- Subjects
- *
EXPECTATION-maximization algorithms , *LOGISTIC regression analysis , *CLINICAL trials , *CLUSTER analysis (Statistics) , *MIXTURES - Abstract
Existing methods can perform likelihood-based clustering on a multivariate data matrix of ordinal data, using finite mixtures to cluster the rows (observations) of the matrix. These models can incorporate the main effects of individual rows and columns, as well as cluster effects, to model the matrix of responses. However, many real-world applications also include available covariates, which provide insights into the main characteristics of the clusters and determine clustering structures based on both the individuals' similar patterns of responses and the effects of the covariates on the individuals' responses. In our research we have extended the mixture-based models to include covariates and test what effect this has on the resulting clustering structures. We focus on clustering the rows of the data matrix, using the proportional odds cumulative logit model for ordinal data. We fit the models using the Expectation-Maximization algorithm and assess performance using a simulation study. We also illustrate an application of the models to the well-known arthritis clinical trial data set. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
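The proportional odds cumulative logit model referenced in the abstract above maps a single linear predictor and a set of ordered cutpoints to category probabilities via P(Y <= j) = logistic(kappa_j - eta). A minimal sketch with illustrative cutpoints, not the paper's fitted model:

```python
import math

def cumulative_logit_probs(cutpoints, eta):
    # Proportional odds model: P(Y <= j) = logistic(kappa_j - eta),
    # with the same linear predictor eta across all ordinal categories.
    # Cutpoints are illustrative; real models estimate them from data.
    logistic = lambda z: 1.0 / (1.0 + math.exp(-z))
    cum = [logistic(k - eta) for k in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Three ordinal categories from two cutpoints; probabilities sum to 1.
probs = cumulative_logit_probs([-1.0, 1.0], eta=0.0)
```

Because the cluster effect enters only through eta, shifting eta moves probability mass monotonically across the ordered categories, which is what makes row-cluster comparisons interpretable.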
29. Finding Outliers in Gaussian Model-based Clustering.
- Author
-
Clark, Katharine M. and McNicholas, Paul D.
- Subjects
- *
GAUSSIAN mixture models - Abstract
Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and post hoc outlier identification methods, with the former two often requiring pre-specification of the number of outliers. The fact that sample squared Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is then proposed that removes the least plausible points according to the subset log-likelihoods, which are deemed outliers, until the subset log-likelihoods adhere to the reference distribution. This results in a trimming method, called OCLUST, that inherently estimates the number of outliers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
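The trimming idea in the OCLUST abstract can be shown in its simplest form: repeatedly delete the point with the largest sample squared Mahalanobis distance and refit. This sketch fixes the number of removals for brevity; OCLUST itself stops when the subset log-likelihoods adhere to the beta-based reference distribution, so the stopping rule here is a simplification.

```python
import numpy as np

def trim_most_outlying(X, n_remove):
    # Repeatedly drop the point with the largest sample squared
    # Mahalanobis distance, refitting mean and covariance each time.
    # Simplified sketch: n_remove is fixed, whereas OCLUST estimates
    # the number of outliers from a beta-based reference distribution.
    X = np.asarray(X, dtype=float)
    for _ in range(n_remove):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        X = np.delete(X, np.argmax(d2), axis=0)
    return X

# A tight cluster plus one gross outlier: one pass removes the outlier.
data = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [100, 100]]
kept = trim_most_outlying(data, 1)
```

Refitting after each removal matters: a single extreme point inflates the covariance and can mask itself, so distances must be recomputed on the reduced subset.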
30. Regression discontinuity design with principal stratification in the mixed proportional hazard model: an application to the long-run impact of education on longevity.
- Author
-
Bijwaard, Govert E. and Jones, Andrew M.
- Subjects
REGRESSION discontinuity design ,PROPORTIONAL hazards models ,LONGEVITY ,SURVIVAL analysis (Biometry) - Abstract
We investigate the long-run impact of education on longevity using data for England and Wales from the Health and Lifestyle Survey. Longevity is modelled by survival analysis using a mixed proportional hazard model. For identification we propose a Regression Discontinuity Design implied by an increase in the minimum school leaving age in 1947 (from 14 to 15) combined with a principal stratification method for estimation of the mortality hazard rate. This method allows us to derive the causal effect of extended education on longevity. In line with earlier studies we do not find credible evidence of a causal impact of the additional years of schooling that were induced by the reform on longevity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Using fusion effects to decrease uncertainty in distance sampling models when collating data from different surveys.
- Author
-
Plard, Floriane, Araújo, Hélder, Astarloa, Amaia, Louzao, Maite, Saavedra, Camilo, Bonales, José Antonio Vazquez, Pierce, Graham John, and Authier, Matthieu
- Subjects
DATA analysis ,CETACEA ,ACQUISITION of data ,DATA fusion (Statistics) - Abstract
Estimates of population abundance are required to study the impacts of human activities on populations and assess their conservation status. Despite considerable effort to improve data collection, uncertainty around estimates of cetacean densities can remain large. A fundamental concept underlying distance sampling is the detection function. Here we focus on reducing the uncertainty in the estimation of detection function parameters in analyses combining data sets from multiple surveys, with known effects on the precision of density estimates. We developed detection functions using infinite mixture models that can be applied on data collating multiple species and/or surveys. These models enable automatic clustering by fusing the species and surveys with similar detection functions. We present a simulation analysis of a multisurvey data set in a Bayesian framework where we demonstrated that distance sampling models including fusion effects showed lower uncertainty than classical distance sampling models. We illustrated the benefits of this new model using data of line transect surveys from the Bay of Biscay and Iberian Coast. Future estimates of abundance using conventional distance sampling models on large multispecies surveys or on data sets combining multiple surveys could benefit from this new model to provide more precise density estimates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Assessment of Neonatal Mortality and Associated Hospital-Related Factors in Healthcare Facilities Within Sunyani and Sunyani West Municipal Assemblies in Bono Region, Ghana.
- Author
-
Tawiah, Kassim, Asosega, Killian Asampana, Iddi, Samuel, Opoku, Alex Akwasi, Abdul, Iddrisu Wahab, Ansah, Richard Kwame, Bukari, Francis Kwame, Okyere, Eric, and Adebanji, Atinuke Olusola
- Abstract
Objectives: Ghana's quest to reduce neonatal mortality, in hospital facilities and communities, continues to be a struggle. Achieving healthy lives and well-being for neonates, as enshrined in Sustainable Development Goal 3, remains a challenge in hospital facilities and communities, notwithstanding increasing efforts in that direction. This study examines the contributing factors that hinder the fight against neonatal mortality in all hospital facilities in the Sunyani and Sunyani West Municipal Assemblies in Bono Region, Ghana. Methods: The study utilized neonatal mortality data consisting of neonatal deaths, structural facility-related variables, medical human resources, types of hospital facilities and natal care. The data were collected longitudinally from 2014 to 2019. These variables were analysed using the negative binomial hurdle (NBH) regression model to determine the factors that contribute to this problem at the facility level. Cause-specific deaths were obtained to determine the leading causes of neonatal deaths within health facilities in the two municipal assemblies. Results: The study established that the leading causes of neonatal mortality in these districts are birth asphyxia (46%), premature birth (33%), neonatal sepsis (11%) and neonatal jaundice (7%). The NBH model showed that neonatal mortality in hospital facilities depends on the numbers of incubators, monitoring equipment, hand-washing facilities, CPAP machines, radiant warmers, physiotherapy machines, midwives, paediatric doctors and paediatric nurses in the hospital facility. Conclusions: Early management of neonatal sepsis, birth asphyxia, premature birth and neonatal infections is required to reduce neonatal deaths. The government and all stakeholders in the health sector should provide all hospital facilities with the essential equipment and medical human resources necessary to eradicate this problem. This will make the realization of Sustainable Development Goal 3, which calls for healthy lives and well-being for all, a reality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. The return period of heterogeneous climate data with a new invertible distribution.
- Author
-
Simões e Silva, Beatriz L., Otiniano, Cira E. G., and Nakano, Eduardo Y.
- Subjects
- *
DISTRIBUTION (Probability theory) , *MONTE Carlo method , *WEIBULL distribution , *FINITE mixture models (Statistics) , *ENVIRONMENTAL sciences , *ENVIRONMENTAL risk - Abstract
One of the most widely used risk measures in environmental studies is the return period (T) associated with a given return level. Its calculation requires knowledge of the distribution function fitted to the data, so the efficiency of T depends on a distribution that fits the data well. Apart from finite mixtures of distributions, bimodal distributions are scarce, yet they are necessary to adequately model climatic and hydrological data. In this work, we propose a simple method to obtain T based on a new invertible bimodal distribution. To illustrate its applicability, we consider the bimodal invertible Weibull (IBW) distribution. Several properties of the IBW distribution were studied, and the performance of the maximum likelihood estimates of its parameters was tested using Monte Carlo simulation. Additionally, an IBW regression model is introduced. Finally, the proposed methodology is illustrated with temperature data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
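The link between the fitted distribution function and the return period described above is direct: for annual maxima, T(z) = 1/(1 - F(z)). The sketch below uses a two-component Weibull mixture as a stand-in for a bimodal fit; the mixture weights and Weibull parameters are illustrative assumptions, not the IBW distribution itself.

```python
import math

def weibull_cdf(x, shape, scale):
    # Standard two-parameter Weibull CDF.
    return 1.0 - math.exp(-((x / scale) ** shape))

def return_period(level, cdf):
    # For annual maxima, the return period of exceeding `level`
    # is T = 1 / (1 - F(level)), where F is the fitted CDF.
    return 1.0 / (1.0 - cdf(level))

# A bimodal model can be represented by a two-component mixture CDF,
# F(x) = w*F1(x) + (1-w)*F2(x) (illustrative parameters, not an IBW fit).
F = lambda x: 0.5 * weibull_cdf(x, 2.0, 1.0) + 0.5 * weibull_cdf(x, 5.0, 3.0)
T = return_period(2.0, F)
```

Inverting the relationship, the return level for a target T is the quantile F^{-1}(1 - 1/T), which is why an invertible distribution makes the calculation convenient.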
34. On the hazard rate of α-mixture of survival functions.
- Author
-
Shojaee, Omid, Asadi, Majid, and Finkelstein, Maxim
- Subjects
- *
SURVIVAL rate , *HAZARDS , *PROPORTIONAL hazards models , *INVERSE problems - Abstract
The α-mixture model, as a flexible family of distributions, is an effective tool for modeling heterogeneity in a population. This article investigates the hazard rate of the α-mixture in terms of the hazard rates of the mixed baseline distributions. In particular, when the baseline hazard rate follows either an additive or a multiplicative model, an inverse problem to obtain the baseline hazard is solved. We also study the α-mixture hazard rate ordering for ordered mixing distributions in the sense of the likelihood ratio order. Sufficient conditions to order two finite α-mixtures in the sense of dispersive ordering are provided. Finally, it is shown that the hazard rate of the finite α-mixture in the multiplicative model tends to the hazard rate of the strongest (weakest) population as α → +∞ (−∞). Several examples are presented to illustrate the theoretical findings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Improving the study of plant evolution with multi-matrix mixture models.
- Author
-
Tinh, Nguyen Huy and Vinh, Le Sy
- Abstract
Amino acid substitution models are a key component in studying plant evolution from protein sequences. Although single-matrix amino acid substitution models have been estimated for plants (i.e., Q.plant and NQ.plant), they cannot describe the rate heterogeneity among sites. A number of multi-matrix mixture models have been proposed to handle site-rate heterogeneity; however, none have been estimated specifically for plants. To enhance the study of plant evolution, we estimated both time-reversible and time non-reversible multi-matrix mixture models, QPlant.mix and nQPlant.mix, from plant genomes. Experiments showed that the new mixture models were much better than the existing models for plant alignments. We recommend that researchers use the new mixture models when studying plant evolution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. A scaled dirichlet-based predictive model for occupancy estimation in smart buildings.
- Author
-
Guo, Jiaxun, Amayri, Manar, Fan, Wentao, and Bouguila, Nizar
- Subjects
PREDICTION models ,SERVER farms (Computer network management) ,FIX-point estimation ,INTELLIGENT buildings - Abstract
In this study, we introduce a predictive model leveraging the scaled Dirichlet mixture model (SDMM). This data-driven approach offers enhanced accuracy in predictions, especially with a limited training dataset, surpassing traditional point estimation methods. Recent research has highlighted the flexibility of the Dirichlet distribution in modelling multivariate proportional data. Our research extends this by employing a scaled Dirichlet distribution, which incorporates additional parameters, to construct our predictive model. Furthermore, we address the challenge of data imbalance through a novel approach centred on data spread rate, effectively balancing the dataset to optimize model performance. Empirical evaluations demonstrate the model's efficacy with both synthetic and real datasets, particularly in estimating occupancy in smart buildings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Coarsened Mixtures of Hierarchical Skew Normal Kernels for Flow and Mass Cytometry Analyses.
- Author
-
Gorsky, Shai, Cliburn Chan, and Li Ma
- Subjects
CYTOMETRY ,NONPARAMETRIC estimation ,MATHEMATICAL statistics ,HIERARCHICAL Bayes model ,MULTILEVEL models - Abstract
Cytometry is the standard multi-parameter assay for measuring single cell phenotype and functionality. It is commonly used for quantifying the relative frequencies of cell subsets in blood and disaggregated tissues. A typical analysis of cytometry data involves cell classification--that is, the identification of cell subgroups in the sample--and comparisons of the cell subgroups across samples or conditions. While modern experiments often necessitate the collection and processing of samples in multiple batches, analysis of cytometry data across batches is challenging because differences across samples may occur due to either true biological variation or technical reasons such as antibody lot effects or instrument optics across batches. Thus a critical step in comparative analyses of multi-sample cytometry data--yet missing in existing automated methods for analyzing such data--is cross-sample calibration, whose goal is to align corresponding cell subsets across multiple samples in the presence of technical variations, so that biological variations can be meaningfully compared. We introduce a Bayesian nonparametric hierarchical modeling approach for accomplishing both calibration and cell classification simultaneously in a unified probabilistic manner. Three important features of our method make it particularly effective for analyzing multi-sample cytometry data: a nonparametric mixture avoids prespecifying the number of cell clusters; a hierarchical skew normal kernel that allows flexibility in the shapes of the cell subsets and cross-sample variation in their locations; and finally the "coarsening" strategy makes inference robust to departures from the model not captured by the skew normal kernels. We demonstrate the merits of our approach in simulated examples and carry out a case study in the analysis of a multi-sample cytometry data set. We provide an R package for our method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Contaminated Gibbs-Type Priors.
- Author
-
Camerlenghi, Federico, Corradin, Riccardo, and Ongaro, Andrea
- Subjects
BAYESIAN analysis ,FLEXIBILITY (Mechanics) ,NONPARAMETRIC statistics ,MATHEMATICAL statistics ,PARTITIONS (Mathematics) - Abstract
Gibbs-type priors are combinatorial processes widely used as key components in several Bayesian nonparametric models. By virtue of their flexibility and mathematical tractability, they turn out to be predominant priors in species sampling problems and mixture modeling. We introduce a new family of processes which extends the Gibbs-type one, by including a contaminant component in the model to account for an excess of observations with frequency one. We first investigate the induced random partition, the associated predictive distribution, the asymptotic behavior of the total number of blocks and the number of blocks with a given frequency: all the results we obtain are in closed form and easily interpretable. A remarkable aspect of contaminated Gibbs-type priors lies in their predictive structure: compared to that of the standard Gibbs-type family, it depends on the additional sampling information on the number of observations with frequency one in the observed sample. As a noteworthy example we focus on the contaminated version of the Pitman-Yor process, which turns out to be analytically tractable and computationally feasible. Finally we pinpoint the advantage of our construction in different applications: we show how it helps to improve predictive inference in a species-related dataset exhibiting a high number of species with frequency one; we also discuss the use of the proposed construction in mixture models to perform density estimation and outlier detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Bayesian collapsed Gibbs sampling for a stochastic volatility model with a Dirichlet process mixture.
- Author
-
Wu, Frank C. Z.
- Subjects
GIBBS sampling ,STOCHASTIC models ,MARKOV chain Monte Carlo ,STOCHASTIC processes - Abstract
Summary: This paper replicates the results of the stochastic volatility–Dirichlet process mixture (SV‐DPM) models proposed in Jensen and Maheu (2010) in both a narrow and a wide sense. By using a normal‐Wishart prior and the collapsed Gibbs sampling method, our algorithm can be applied in more general settings and is more efficient for sampling the Dirichlet process mixture. For the stochastic volatility component, we adopt the method in Chan (2017) to further increase the overall efficiency of our algorithm. Using the same dataset, we obtain mixed results, some of which differ significantly. If we use a more recent dataset that includes the COVID‐19 pandemic period, the log market portfolio volatility appears to increase in both the number of clusters and the magnitude. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Mixture of experts distributional regression: implementation using robust estimation with adaptive first-order methods.
- Author
-
Rügamer, David, Pfisterer, Florian, Bischl, Bernd, and Grün, Bettina
- Abstract
In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that allows for the definition of mixtures of many different families, estimation in high-dimensional and large sample size settings and robust optimization based on TensorFlow. Numerical experiments with simulated and real-world data applications show that optimization is as reliable as estimation via classical approaches in many different settings and that results may be obtained for complicated scenarios where classical approaches consistently fail. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. A Systematic Review of Factor Mixture Model Applications.
- Author
-
ŞEN, Sedat and COHEN, Allan S.
- Subjects
META-analysis ,SAMPLE size (Statistics) ,LATENT class analysis (Statistics) ,PSYCHOLOGY ,BEHAVIOR disorders in adolescence - Abstract
In this study, a systematic review was conducted on peer-reviewed articles applying factor mixture models (FMMs). A total of 304 studies comprising 334 applications published from 2003 to 2022 were included. FMMs were mostly used in these studies to detect latent classes and model heterogeneity. Most of the studies were conducted in the U.S. with samples of students, adults, and the general population. The average sample size was 3,562, and the average number of items was 17.34. The measurement tools used in these FMM analyses mostly contained Likert-type items and measured constructs in the field of psychology. Most of the reviewed FMM studies used maximum likelihood estimation as implemented in the Mplus software. Multiple fit indices were used, the most common of which were AIC, BIC, and entropy. The mean numbers of classes and factors across the 334 applications were 2.96 and 2.17, respectively. Psychological and behavioral disorders, gender, and age were the most frequent focus of these studies and were often included as covariates in the analyses. As a result of this systematic review, the trends in FMM analyses are better understood. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Recommended Practices in Latent Class Analysis Using the Open-Source R-Package tidySEM.
- Author
-
Van Lissa, C. J., Garnier-Villarreal, M., and Anadria, D.
- Subjects
- *
STRUCTURAL equation modeling , *DISCOURSE analysis , *ACADEMIC discourse , *PARAMETRIC modeling , *GROWTH curves (Statistics) - Abstract
Latent class analysis (LCA) refers to techniques for identifying groups in data based on a parametric model. Examples include mixture models, LCA with ordinal indicators, and latent class growth analysis. Despite its popularity, there is limited guidance with respect to the decisions that must be made when conducting and reporting LCA. Moreover, there is a lack of user-friendly open-source implementations. Based on contemporary academic discourse, this paper introduces recommendations for LCA, summarized in the SMART-LCA checklist: Standards for More Accuracy in Reporting of different Types of Latent Class Analysis. The free open-source R package tidySEM implements the practices recommended here. It is easy for beginners to adopt thanks to user-friendly wrapper functions, yet remains relevant for expert users, as its models are integrated within the OpenMx structural equation modeling framework and remain fully customizable. The Appendices and tidySEM package vignettes include tutorial examples of common applications of LCA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Modeling Fourier expansions using point processes on the complex plane with applications.
- Author
-
Wu, Weichao and Micheas, Athanasios C.
- Subjects
- *
TIME series analysis , *CHARACTERISTIC functions , *POINT processes , *SPECTRAL energy distribution , *LOG-linear models , *POISSON processes - Abstract
In this paper we study point processes on the complex plane and illustrate their uses in several statistical areas, where the quantities of interest requiring estimation involve Fourier expansions. In particular, for any problem where we can describe a quantity in terms of its Fourier expansion, we propose modeling the coefficients of the expansion using a point process on the complex plane. We utilize the Poisson complex point process and model its intensity function using log-linear and mixture models. The proposed models are exemplified via applications to general density approximation, via modeling of the characteristic function, and time series analysis, via modeling of the spectral density. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Learning CHARME models with neural networks.
- Author
-
Gómez-García, José G., Fadili, Jalal, and Chesneau, Christophe
- Subjects
ARTIFICIAL neural networks ,ASYMPTOTIC normality ,TIME series analysis ,PARAMETRIC modeling - Abstract
In this paper, we consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts), a class of generalized mixture of nonlinear (non)parametric AR-ARCH time series. The main objective of this paper is to learn the autoregressive and volatility functions of this model with neural networks (NN). This approach is justified thanks to the universal approximation capacity of neural networks. On the other hand, in order to build the learning theory, it is necessary first to prove the ergodicity of the CHARME model. We therefore show in a general nonparametric framework that under certain Lipschitz-type conditions on the autoregressive and volatility functions, this model is stationary, ergodic and τ-weakly dependent. These conditions are much weaker than those in the existing literature. Moreover, this result forms the theoretical basis for deriving an asymptotic theory of the underlying parametric estimation, which we present for this model in a general parametric framework. Altogether, this allows to develop a learning theory for the NN-based autoregressive and volatility functions of the CHARME model, where strong consistency and asymptotic normality of the considered estimator of the NN weights and biases are guaranteed under weak conditions. Numerical experiments are reported to support our theoretical findings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Seismicity Pattern Recognition in the Sumatra Megathrust Zone Through Mathematical Modeling of the Maximum Earthquake Magnitude Using Gaussian Mixture Models.
- Author
-
Rizal, Jose, Gunawan, Agus Y., Yosmar, Siska, and Nuryaman, Aang
- Subjects
GAUSSIAN mixture models ,HIDDEN Markov models ,PROBABILITY density function ,EARTHQUAKE magnitude measurement ,EARTHQUAKE magnitude ,MATHEMATICAL models ,SUBDUCTION zones ,EARTHQUAKE hazard analysis ,EARTHQUAKES - Abstract
The research area of the present study is the Sumatra megathrust zone, which can be partitioned into five segments based on the sources of large earthquakes: the Aceh Andaman, Nias Simeulue, Mentawai Siberut, Mentawai-Pagai, and Enggano segments. This work presents the recognition of seismicity patterns in the research area from January 1970 to December 2022 through segmental and zonal mathematical modeling of the annual maximum earthquake magnitude. To achieve this, we use two kinds of Gaussian mixture models, G-group Gaussian independent mixture models (G-group GMMs) and N-state Gaussian hidden Markov models (N-state GHMMs), to determine the appropriate probability density function of the seismicity data (ePDF). The best-fitting model is selected based on the smallest Bayesian information criterion. For the segment analysis, the results show that the ePDF of the Mentawai-Pagai segment fits the 2-state GHMM, whereas for the four remaining segments it tends to fit the 2-group GMM. Subsequently, for the zone analysis, the ePDF of the data fits the 2-state GHMM. Thus, from both a segmental and a zonal point of view, seismicity patterns fluctuate between two levels. From a seismic risk management perspective, these findings can be used to evaluate an area's vulnerability to destructive earthquakes: the seismicity sequences in all segments of the Sumatra megathrust zone fluctuate within the range of moderate to strong earthquakes. Furthermore, the seismicity pattern in the Mentawai-Pagai segment and the Sumatra megathrust zone has Markov properties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
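The model-selection step described in this abstract, fitting G-group Gaussian mixtures and choosing the order with the smallest BIC, can be sketched as follows. This is a minimal illustration, not the authors' code: the EM routine, the synthetic "annual maximum magnitude" data, and the candidate orders G = 1, 2 are all assumptions for the demonstration, and the Markov-switching (GHMM) side of the comparison is omitted.

```python
import numpy as np

def fit_gmm_1d(x, g, n_iter=200):
    """Fit a g-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(x)
    w = np.full(g, 1.0 / g)                               # mixing weights
    mu = np.quantile(x, np.linspace(0.25, 0.75, g))       # spread-out initial means
    var = np.full(g, x.var())                             # shared initial variance
    for _ in range(n_iter):
        # E-step: component densities and responsibilities
        dens = (w / np.sqrt(2 * np.pi * var)) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return np.log(dens.sum(axis=1)).sum()

def bic(loglik, n_params, n):
    """Smaller BIC = better trade-off of fit vs. complexity."""
    return n_params * np.log(n) - 2 * loglik

# Hypothetical annual-maximum magnitudes with two seismicity levels
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(5.6, 0.3, 30), rng.normal(7.2, 0.4, 23)])

results = {}
for g in (1, 2):
    ll = fit_gmm_1d(x, g)
    results[g] = bic(ll, 3 * g - 1, len(x))  # (g-1) weights + g means + g variances
best = min(results, key=results.get)
print("BIC:", results, "-> selected G =", best)
```

With clearly bimodal data like this, the 2-group mixture attains the smaller BIC, mirroring the 2-group/2-state selections reported in the abstract; on real catalog data, the same criterion would also be computed for the N-state GHMM candidates before choosing the overall winner.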
46. Stochastic Volatility with Feedback
- Author
-
Stoffer, David S., Chiann, Chang, editor, de Souza Pinheiro, Aluisio, editor, and Castro Toloi, Clélia Maria, editor
- Published
- 2024
- Full Text
- View/download PDF
47. Computational Comparisons of Two-Component Mixtures Using Lindley-Type Models
- Author
-
Heerden, O. van, Makgai, S., Bekker, A., Ferreira, J. T., Chen, Ding-Geng, Editor-in-Chief, Bekker, Andriëtte, Editorial Board Member, Coelho, Carlos A., Editorial Board Member, Finkelstein, Maxim, Editorial Board Member, Wilson, Jeffrey R., Editorial Board Member, Ng, Hon Keung Tony, Series Editor, and Lio, Yuhlong, Editorial Board Member
- Published
- 2024
- Full Text
- View/download PDF
48. Robust Clustering with McDonald’s Beta-Liouville Mixture Models for Proportional Data
- Author
-
Sghaier, Oussama, Amayri, Manar, Bouguila, Nizar, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Suen, Ching Yee, editor, Krzyzak, Adam, editor, Ravanelli, Mirco, editor, Trentin, Edmondo, editor, Subakan, Cem, editor, and Nobile, Nicola, editor
- Published
- 2024
- Full Text
- View/download PDF
49. Probabilistic Modeling: From Mixture Models to Probabilistic Circuits
- Author
-
Tomczak, Jakub M. and Tomczak, Jakub M.
- Published
- 2024
- Full Text
- View/download PDF
50. Statistical Modeling of Univariate Unimodal Data Using -Sigmoid Mixture Models
- Author
-
Chasani, Paraskevi, Likas, Aristidis, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Carette, Jacques, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Maglogiannis, Ilias, editor, Iliadis, Lazaros, editor, Macintyre, John, editor, Avlonitis, Markos, editor, and Papaleonidas, Antonios, editor
- Published
- 2024
- Full Text
- View/download PDF