28 results for "FUZZY statistics"
Search Results
2. Jointly Modeling Rating Responses and Times with Fuzzy Numbers: An Application to Psychometric Data.
- Author
-
Cao, Niccolò and Calcagnì, Antonio
- Subjects
- *
PSYCHOMETRICS , *SOCIAL surveys , *STATISTICAL models , *INFERENTIAL statistics , *FUZZY numbers , *DATA analysis - Abstract
In several research areas, ratings data and response times have been successfully used to unfold the stagewise process through which human raters provide their responses to questionnaires and social surveys. A limitation of the standard approach to analyzing this type of data is that it requires the use of independent statistical models. Although this provides an effective way to simplify the data analysis, it can create difficulties for statistical inference and interpretation. In this sense, a joint analysis could be more effective. In this research article, we describe a way to jointly analyze ratings and response times by means of fuzzy numbers. A probabilistic tree model framework is adopted to fuzzify ratings data, and four-parameter triangular fuzzy numbers are used to integrate crisp responses and times. Finally, a real case study on psychometric data is discussed to illustrate the proposed methodology. Overall, we provide initial findings on the problem of using fuzzy numbers as abstract models for representing ratings data with additional information (i.e., response times). The results indicate that using fuzzy numbers leads to theoretically sound and more parsimonious data analysis methods, which limit some statistical issues that may occur with standard data analysis procedures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
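The integration of a crisp rating and its response time into a triangular fuzzy number, as described in the abstract above, can be sketched informally. The encoding below is a hypothetical illustration, not the authors' model: the rating becomes the mode of a triangular fuzzy number, and slower responses (read as hesitation) widen its spread; the `spread = k * log1p(rt)` scaling rule and the 1-7 scale bounds are assumptions.

```python
# Hypothetical sketch: encode a crisp rating plus its response time as a
# triangular fuzzy number. The mode is the rating; longer response times
# widen the spread. The scaling rule is an illustrative assumption.
import math

def rating_to_tfn(rating, rt_seconds, k=0.5, lo=1, hi=7):
    """Return the (left, mode, right) support of a triangular fuzzy number."""
    spread = k * math.log1p(rt_seconds)   # assumed hesitation -> spread rule
    left = max(lo, rating - spread)
    right = min(hi, rating + spread)
    return left, rating, right

def membership(x, tfn):
    """Piecewise-linear membership function of a triangular fuzzy number."""
    left, mode, right = tfn
    if x == mode:
        return 1.0
    if left < x < mode:
        return (x - left) / (mode - left)
    if mode < x < right:
        return (right - x) / (mode - right) * -1.0 if False else (right - x) / (right - mode)
    return 0.0

tfn = rating_to_tfn(rating=4, rt_seconds=3.0)
print(tfn)                  # support widens with response time
print(membership(4, tfn))   # membership is 1 at the rated value
```

A slower response (larger `rt_seconds`) produces a wider, less committal fuzzy rating, which is the intuition the joint model formalizes.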
3. On the estimation of parameters, survival functions, and hazard rates based on fuzzy life time data.
- Author
-
Shafiq, Muhammad and Viertl, Reinhard
- Subjects
- *
PARAMETERS (Statistics) , *ESTIMATION theory , *FUZZY statistics , *DATA analysis , *HAZARD function (Statistics) - Abstract
Life time data analysis is regarded as one of the significant offshoots of statistics. Classical statistical techniques treat life time observations as precise numbers and account only for the variation among the observations. In fact, there are two types of uncertainty in data: variation among the observations and fuzziness. Consequently, analysis techniques that ignore fuzziness and rely only on precise life time observations use incomplete information and can lead to misleading results. This study aimed at generalizing parameter estimation, survival functions, and hazard rates for fuzzy life time data. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
4. CLASSICAL AND BAYESIAN INFERENCE OF PARETO DISTRIBUTION AND FUZZY LIFE TIMES.
- Author
-
Shafiq, Muhammad
- Subjects
- *
PARETO distribution , *BAYESIAN analysis , *FUZZY statistics , *DATA analysis , *VARIATIONAL principles - Abstract
Life time data are mainly used in medicine, public health, and the engineering sciences. The classical statistics related to these fields are based on precise measurements. However, according to modern measurement science, precise measurement of continuous phenomena is not possible: every observed quantity is only more or less precise. To obtain realistic results, fuzzy number approaches, which model this imprecision directly, are a suitable and realistic complement to classical and Bayesian inference. In this study, I consider classical and Bayesian inference criteria for the Pareto distribution to obtain more realistic results for fuzzy observations of life times. In addition to stochastic variation, the proposed generalized estimators cover the fuzziness of the life times as well. [ABSTRACT FROM AUTHOR]
- Published
- 2017
5. Non-convex fuzzy data and fuzzy statistics: a first descriptive approach to data analysis.
- Author
-
Calcagnì, A., Lombardi, L., and Pascali, E.
- Subjects
- *
FUZZY systems , *FUZZY statistics , *DATA analysis , *DESCRIPTIVE statistics , *DECISION making , *AD hoc computer networks - Abstract
LR-fuzzy numbers are widely used in Fuzzy Set Theory applications based on the standard definition of convex fuzzy sets. However, in some empirical contexts, such as human decision making and ratings, convex representations might not be capable of capturing more complex structures in the data. Moreover, non-convexity seems to arise as a natural property in many applications based on fuzzy systems (e.g., fuzzy scales of measurement). In these contexts, the usage of standard fuzzy statistical techniques could be questionable. A possible way out consists in adopting ad-hoc data manipulation procedures to transform non-convex data into standard convex representations. However, these procedures can artificially mask relevant information carried by the non-convexity property. To overcome this problem, in this article we introduce a novel computational definition of non-convex fuzzy number which extends the traditional definition of LR-fuzzy number. Moreover, we also present a new fuzzy regression model for crisp input/non-convex fuzzy output data based on the fuzzy least squares approach. In order to better highlight some important characteristics of the model, we apply the fuzzy regression model to datasets characterized by convex as well as non-convex features. Finally, some critical points are outlined in the final section of the article, together with suggestions about future extensions of this work. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
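The crisp-input/fuzzy-output regression mentioned in this abstract rests on the fuzzy least squares idea. A minimal sketch in the common Diamond-style decomposition is shown below: the mode and the two spreads of a triangular output are each fit by ordinary least squares. This decomposition is an assumption for illustration; the paper's model additionally handles non-convex outputs, which this sketch does not.

```python
# Minimal fuzzy-least-squares sketch for crisp inputs and triangular fuzzy
# outputs given as (mode, left spread, right spread). Mode and spreads are
# each fit by OLS -- an illustrative simplification, not the paper's model.
import numpy as np

def fit_fuzzy_ls(x, modes, lspread, rspread):
    X = np.column_stack([np.ones_like(x), x])
    coef_m, *_ = np.linalg.lstsq(X, modes, rcond=None)    # fit the modes
    coef_l, *_ = np.linalg.lstsq(X, lspread, rcond=None)  # fit left spreads
    coef_r, *_ = np.linalg.lstsq(X, rspread, rcond=None)  # fit right spreads
    return coef_m, coef_l, coef_r

def predict(coefs, x0):
    """Predict a triangular fuzzy output (left, mode, right) at x0."""
    coef_m, coef_l, coef_r = coefs
    z = np.array([1.0, x0])
    m = z @ coef_m
    return m - max(z @ coef_l, 0.0), m, m + max(z @ coef_r, 0.0)

x = np.array([1.0, 2.0, 3.0, 4.0])
modes = 2 * x + 1                       # noiseless toy modes
coefs = fit_fuzzy_ls(x, modes, lspread=0.5 * np.ones(4), rspread=0.5 * np.ones(4))
print(predict(coefs, 2.5))              # (5.5, 6.0, 6.5) on this toy data
```

Clamping predicted spreads at zero keeps the fuzzy output well formed even where the spread regressions would go negative.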
6. Encoding words into Cloud models from interval-valued data via fuzzy statistics and membership function fitting.
- Author
-
Yang, Xiaojun, Yan, Liaoliao, Peng, Hui, and Gao, Xiangdong
- Subjects
- *
FUZZY statistics , *CODING theory , *CLOUD computing , *DATA analysis , *FUNCTIONAL analysis , *GAUSSIAN function - Abstract
Abstract: When constructing the model of a word by collecting interval-valued data from a group of individuals, both interpersonal and intrapersonal uncertainties coexist. Like the interval type-2 fuzzy set (IT2 FS) used in the enhanced interval approach (EIA), the Cloud model, characterized by only three parameters, can manage both uncertainties. Thus, based on the Cloud model, this paper proposes a new representation model for a word from interval-valued data. In our proposed method, the collected data intervals are first preprocessed to remove the bad ones. Secondly, the fuzzy statistical method is used to compute the histogram of the surviving intervals. Then, the generated histogram is fitted by a Gaussian curve function. Finally, the fitted results are mapped into the parameters of a Cloud model to obtain the parametric model for a word. Compared with the eight or nine parameters needed by an IT2 FS, only three parameters are needed to represent a Cloud model. We therefore obtain a much more parsimonious parametric model for a word, and a simpler representation model with fewer parameters generally means fewer computations and lower memory requirements in applications. Moreover, comparison experiments with the recent EIA show that our proposed method not only obtains much thinner footprints of uncertainty (FOUs) but also captures sufficient uncertainties of words. [Copyright © Elsevier]
- Published
- 2014
- Full Text
- View/download PDF
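The pipeline this abstract describes (preprocess intervals, build a fuzzy-statistics histogram, fit a Gaussian, map the fit to Cloud parameters) can be sketched roughly as follows. Two simplifications are assumptions of this sketch, not the paper's method: moment matching stands in for the explicit Gaussian curve fit, the interval preprocessing step is omitted, and the hyper-entropy rule is a placeholder.

```python
# Sketch of encoding a word's interval-valued data into a Cloud model
# (Ex, En, He). The membership at x is the fraction of intervals covering x
# (the fuzzy-statistics "histogram"); a Gaussian is matched by moments.
import numpy as np

def interval_membership(intervals, xs):
    """Fraction of intervals [a, b] that cover each grid point in xs."""
    ivs = np.asarray(intervals, dtype=float)
    return np.mean((ivs[:, 0][:, None] <= xs) & (xs <= ivs[:, 1][:, None]), axis=0)

def encode_word(intervals, grid_n=500):
    ivs = np.asarray(intervals, dtype=float)
    xs = np.linspace(ivs[:, 0].min(), ivs[:, 1].max(), grid_n)
    mu = interval_membership(ivs, xs)
    w = mu / mu.sum()                                 # normalize to weights
    ex = float(np.sum(w * xs))                        # expectation Ex
    en = float(np.sqrt(np.sum(w * (xs - ex) ** 2)))   # entropy En (Gaussian sigma)
    he = 0.1 * en                                     # hyper-entropy He: placeholder rule
    return ex, en, he

intervals = [(3, 7), (4, 6), (3.5, 6.5), (4.5, 7)]
print(encode_word(intervals))   # Ex lands near the region all intervals cover
```

Only three numbers come out, which is the parsimony argument the abstract makes against the eight or nine parameters of an IT2 FS FOU.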
7. Identifying the distribution difference between two populations of fuzzy data based on a nonparametric statistical method.
- Author
-
Lin, Pei-Chun, Watada, Junzo, and Wu, Berlin
- Subjects
- *
NONPARAMETRIC statistics , *DISTRIBUTION (Probability theory) , *FUZZY numbers , *GOODNESS-of-fit tests , *FUZZY statistics , *DATA analysis - Abstract
Nonparametric statistical tests are a distribution-free method without any assumption that data are drawn from a particular probability distribution. In this paper, to identify the distribution difference between two populations of fuzzy data, we derive a function that can describe continuous fuzzy data. In particular, the Kolmogorov-Smirnov two-sample test is used for distinguishing two populations of fuzzy data. Empirical studies illustrate that the Kolmogorov-Smirnov two-sample test enables us to judge whether two independent samples of continuous fuzzy data are derived from the same population. The results show that the proposed function is successful in distinguishing two populations of continuous fuzzy data and useful in various applications. © 2013 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
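The paper above applies the Kolmogorov-Smirnov two-sample test to continuous fuzzy data via a derived describing function. As a crude stand-in for that construction (an assumption of this sketch, not the authors' method), each triangular fuzzy observation is defuzzified to its centroid and the plain two-sample KS statistic is computed.

```python
# Two-sample KS statistic on defuzzified fuzzy data: the maximum absolute
# gap between the two empirical CDFs, evaluated at all observed points.
import numpy as np

def centroid(tfn):
    left, mode, right = tfn
    return (left + mode + right) / 3.0   # centroid of a triangular fuzzy number

def ks_statistic(a, b):
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    xs = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, xs, side="right") / len(a)
    cdf_b = np.searchsorted(b, xs, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

sample1 = [centroid(t) for t in [(1, 2, 3), (2, 3, 4), (3, 4, 5)]]
sample2 = [centroid(t) for t in [(6, 7, 8), (7, 8, 9), (8, 9, 10)]]
print(ks_statistic(sample1, sample2))  # 1.0: the two samples are fully separated
```

A large statistic relative to the KS critical value suggests the two populations of fuzzy data differ in distribution, which is the judgment the paper's empirical studies make.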
8. Robust fuzzy rough classifiers
- Author
-
Hu, Qinghua, An, Shuang, Yu, Xiao, and Yu, Daren
- Subjects
- *
ROUGH sets , *SET theory , *GENERALIZATION , *DATA analysis , *MATHEMATICAL models , *FUZZY sets , *APPROXIMATION theory , *FUZZY statistics - Abstract
Abstract: Fuzzy rough sets, generalized from Pawlak's rough sets, were introduced for dealing with continuous or fuzzy data. This model has been widely discussed and applied in recent years. It is shown that the model of fuzzy rough sets is sensitive to noisy samples, and especially to mislabeled samples. As data are usually contaminated with noise in practice, a robust model is desirable. We introduce a new fuzzy rough set model, called soft fuzzy rough sets, and design a robust classification algorithm based on it. Experimental results show the effectiveness of the proposed algorithm. [Copyright © Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
9. A method for training finite mixture models under a fuzzy clustering principle
- Author
-
Chatzis, Sotirios
- Subjects
- *
MATHEMATICAL models , *CLUSTER analysis (Statistics) , *FUZZY sets , *METHODOLOGY , *FUZZY statistics , *DATA analysis , *ALGORITHMS - Abstract
Abstract: In this paper, we establish a novel view of fuzzy clustering, showing that it provides a sound framework for fitting finite mixture models. We propose a novel fuzzy clustering-type methodology for finite mixture model fitting, based on a regularized form of the fuzzy c-means (FCM) algorithm with a dissimilarity functional chosen to match the probabilistic properties of the model being treated. We apply the proposed methodology to a number of popular finite mixture models, and the corresponding expressions of the fuzzy model fitting algorithm are derived. We examine the efficacy of our approach in both clustering and classification applications on benchmark data sets, and we demonstrate the advantages of the proposed approach over maximum-likelihood estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
10. Generalized external indexes for comparing data partitions with overlapping categories
- Author
-
Campello, R.J.G.B.
- Subjects
- *
CLUSTER analysis (Statistics) , *INDEXING , *DATA analysis , *PARTITIONS (Mathematics) , *MATHEMATICAL category theory , *SET theory , *FUZZY statistics - Abstract
Abstract: There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833–841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom touched in the literature is also addressed, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in detail. A review of recent related work compiled from the literature is also provided. [Copyright © Elsevier]
- Published
- 2010
- Full Text
- View/download PDF
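The external indexes generalized in this paper build on pair-counting measures such as the Rand index. For reference, the crisp base measure can be computed in a few lines: it is the fraction of object pairs on which two hard partitions agree, counting a pair as agreement when both partitions place it together or both place it apart. (This is the classical index, not the paper's generalization to overlapping categories.)

```python
# Crisp Rand index: pairwise agreement between two hard partitions,
# given as flat label lists over the same objects.
from itertools import combinations

def rand_index(labels_a, labels_b):
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]   # together in partition A?
        same_b = labels_b[i] == labels_b[j]   # together in partition B?
        agree += same_a == same_b             # agreement on this pair
        total += 1
    return agree / total

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to relabeling
```

Because only co-membership of pairs matters, the index is invariant to relabeling the clusters, which is why it can compare partitions with different numbers of categories.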
11. Mining fuzzy association rules from uncertain data.
- Author
-
Cheng-Hsiung Weng and Yen-Liang Chen
- Subjects
DATA mining ,DATA analysis ,MACHINE learning ,FUZZY statistics ,FUZZY mathematics - Abstract
Association rule mining is an important data analysis method that can discover associations within data. There are numerous previous studies that focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a representation scheme to represent uncertain data. This representation is based on possibility distributions because the possibility theory establishes a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions. Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with high certainty. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
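The core quantity behind fuzzy association rule mining, as discussed in this abstract, is a fuzzy support measure. A common convention (assumed here for illustration, not taken from the paper, which works with possibility distributions over uncertain data) takes the degree of an itemset in a transaction as the minimum of its items' membership degrees and averages over transactions:

```python
# Fuzzy support of an itemset: per transaction, combine item membership
# degrees with the min t-norm; then average over all transactions.

def fuzzy_support(transactions, itemset):
    """transactions: list of dicts mapping fuzzy item -> membership in [0, 1]."""
    degrees = [min(t.get(item, 0.0) for item in itemset) for t in transactions]
    return sum(degrees) / len(transactions)

transactions = [
    {"age=young": 0.8, "income=high": 0.6},
    {"age=young": 0.3, "income=high": 0.9},
    {"age=young": 0.0, "income=high": 0.5},
]
print(fuzzy_support(transactions, ["age=young", "income=high"]))  # (0.6+0.3+0.0)/3 = 0.3
```

An itemset is "frequent" when this support exceeds a user-chosen threshold; rules are then scored by a fuzzy confidence defined analogously as a ratio of supports.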
12. Robustness of density-based clustering methods with various neighborhood relations
- Author
-
Nasibov, Efendi N. and Ulutagay, Gözde
- Subjects
- *
CLUSTER analysis (Statistics) , *ROBUST control , *DATA analysis , *FUZZY statistics , *FUZZY algorithms - Abstract
Abstract: Cluster analysis is one of the most crucial techniques in statistical data analysis. Among clustering methods, density-based methods have great importance due to their ability to recognize clusters of arbitrary shape. In this paper, the robustness of such clustering methods is examined. These methods use distance-based neighborhood relations between points. In particular, the DBSCAN (density-based spatial clustering of applications with noise) algorithm and the FN-DBSCAN (fuzzy neighborhood DBSCAN) algorithm are analyzed. The FN-DBSCAN algorithm uses a fuzzy neighborhood relation, whereas DBSCAN uses a crisp one. The main characteristic of the FN-DBSCAN algorithm is that it combines the speed of DBSCAN with the robustness of the NRFJP (noise robust fuzzy joint points) algorithm. It is observed that the FN-DBSCAN algorithm is more robust than the DBSCAN algorithm on datasets with various shapes and densities. [Copyright © Elsevier]
- Published
- 2009
- Full Text
- View/download PDF
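The crisp-versus-fuzzy neighborhood contrast in this abstract can be made concrete with the core-point tests of the two algorithms. DBSCAN counts neighbors inside a hard epsilon-ball; FN-DBSCAN instead sums a fuzzy neighborhood membership over all points and thresholds that fuzzy cardinality. The exponential membership function below is one common choice, assumed here for illustration:

```python
# Core-point tests: crisp neighbor count (DBSCAN-style) vs. fuzzy
# cardinality of the neighborhood (FN-DBSCAN-style).
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def crisp_core(point, data, eps, min_pts):
    """DBSCAN: at least min_pts points within distance eps."""
    return sum(dist(point, q) <= eps for q in data) >= min_pts

def fuzzy_core(point, data, eps, min_card):
    """FN-DBSCAN: fuzzy cardinality of the neighborhood exceeds min_card."""
    card = sum(math.exp(-dist(point, q) / eps) for q in data)
    return card >= min_card

data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5)]
print(crisp_core((0, 0), data, eps=0.5, min_pts=3))    # dense corner is a core point
print(fuzzy_core((0, 0), data, eps=0.5, min_card=2.5)) # fuzzy test agrees here
```

Because every point contributes a graded amount instead of a 0/1 vote, the fuzzy cardinality degrades smoothly as density drops, which is the source of the robustness the abstract reports.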
13. Selective sampling for approximate clustering of very large data sets.
- Author
-
Wang, Liang, Bezdek, James C., Leckie, Christopher, and Kotagiri, Ramamohanarao
- Subjects
PATTERN perception ,CLUSTER set theory ,STATISTICAL sampling ,FUZZY statistics ,SAMPLE size (Statistics) ,DATA analysis - Abstract
A key challenge in pattern recognition is how to scale the computational efficiency of clustering algorithms on large data sets. The extension of non-Euclidean relational fuzzy c-means (NERF) clustering to very large (VL = unloadable) relational data is called the extended NERF (eNERF) clustering algorithm, which comprises four phases: (i) finding distinguished features that monitor progressive sampling; (ii) progressively sampling from an N × N relational matrix R_N to obtain an n × n sample matrix R_n; (iii) clustering R_n with literal NERF; and (iv) extending the clusters in R_n to the remainder of the relational data. Previously published examples on several fairly small data sets suggest that eNERF is feasible for truly large data sets. However, it seems that phases (i) and (ii), i.e., finding R_n, are not very practical, because the sample size n often turns out to be roughly 50% of N, and this over-sampling defeats the whole purpose of eNERF. In this paper, we examine the performance of the sampling scheme of eNERF with respect to different parameters. We propose a modified sampling scheme for use with eNERF that combines simple random sampling with (parts of) the sampling procedures used by eNERF and a related algorithm, sVAT (scalable visual assessment of clustering tendency). We demonstrate that our modified sampling scheme can eliminate the over-sampling of the original progressive sampling scheme, thus enabling the processing of truly VL data. Numerical experiments on a distance matrix of a set of 3,000,000 vectors drawn from a mixture of 5 bivariate normal distributions demonstrate the feasibility and effectiveness of the proposed sampling method. We also find that actually running eNERF on a data set of this size is very costly in terms of computation time. Thus, our results demonstrate that further modification of eNERF, especially the extension stage, will be needed before it is truly practical for VL data. © 2008 Wiley Periodicals, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
14. Correspondence analysis with fuzzy data: The fuzzy eigenvalue problem
- Author
-
Theodorou, Y., Drossos, C., and Alevizos, P.
- Subjects
- *
CORRESPONDENCE analysis (Statistics) , *MULTIVARIATE analysis , *STATISTICS , *DATA , *ALGEBRA , *EIGENVALUES , *MATRICES (Mathematics) , *RESEARCH - Abstract
This paper constitutes a first step towards an extension of correspondence analysis with fuzzy data (FCA). At this stage, our main objective is to lay down the algebraic foundations for this fuzzy extension of the usual correspondence analysis. A two-step method is introduced to convert the fuzzy eigenvalue problem to an ordinary one. We consider a fuzzy matrix as the set of its cuts. Each such cut is an interval-valued matrix viewed as a line-segment in the matrix space. In this way, line-segments of cut-matrices are transformed into intervals of eigenvalues. Therefore, the two-step method is essentially a reduction of the fuzzy eigenvalue problem to an ordinary one. We illustrate the FCA fuzzy eigenvalue problem with a simple numerical example. We hope, upon the completion of this project in the near future, to be able to supply the necessary tools for the end user of correspondence analysis with fuzzy data. [Copyright © Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
15. Soft Transition From Probabilistic to Possibilistic Fuzzy Clustering.
- Author
-
Masulli, Francesco and Rovetta, Stefano
- Subjects
FUZZY systems ,DATA analysis ,FUZZY statistics ,PROBABILISTIC number theory ,UNCERTAINTY (Information theory) ,ALGORITHMS - Abstract
In the fuzzy clustering literature, two main types of membership are usually considered: a relative type, termed probabilistic, and an absolute or possibilistic type, indicating the strength of the attribution to any cluster independently of the rest. There are works addressing the unification of the two schemes. Here, we focus on providing a model for the transition from one scheme to the other, to exploit the dual information given by the two schemes and to add flexibility in the interpretation of results. We apply an uncertainty model based on interval values to memberships in the clustering framework, obtaining a framework that we term graded possibility. We outline a basic example of a graded possibilistic clustering algorithm and add some practical remarks about its implementation. The experimental demonstrations presented highlight the different properties attainable through appropriate implementation of a suitable graded possibilistic model. An interesting application is found in the automated segmentation of diagnostic medical images, where the model provides an interactive visualization tool for this task. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
16. Extracting complex linguistic data summaries from personnel database via simple linguistic aggregations
- Author
-
Pei, Zheng, Xu, Yang, Ruan, Da, and Qin, Keyun
- Subjects
- *
LINGUISTIC models , *DATA analysis , *DATABASES , *GENETIC algorithms , *AGGREGATION operators , *FUZZY statistics - Abstract
Abstract: A linguistic data summary of a given data set is desirable and human-consistent for any personnel department. To extract complex linguistic data summaries, the LOWA operator from fuzzy logic is used, and some numerical examples are provided in this paper. To obtain a complex linguistic data summary with a higher truth degree, genetic algorithms are applied to optimize the number and membership functions of linguistic terms and to select a part of the truth degrees for aggregations, in which linguistic terms are represented by the 2-tuple linguistic representation model. [Copyright © Elsevier]
- Published
- 2009
- Full Text
- View/download PDF
17. Design of local fuzzy models using evolutionary algorithms
- Author
-
Bonissone, Piero P., Varma, Anil, Aggour, Kareem S., and Xue, Feng
- Subjects
- *
FUZZY statistics , *ALGORITHMS , *VEHICLES , *DATA analysis - Abstract
Abstract: The application of local fuzzy models to determine the remaining life of a unit in a fleet of vehicles is described. Instead of developing individual models based on the track history of each unit or developing a global model based on the collective track history of the fleet, local fuzzy models are used based on clusters of peers—similar units with comparable utilization and performance characteristics. A local fuzzy performance model is created for each cluster of peers. This is combined with an evolutionary framework to maintain the models. A process has been defined to generate a collection of competing models, evaluate their performance in light of the currently available data, refine the best models using evolutionary search, and select the best one after a finite number of iterations. This process is repeated periodically to automatically update and improve the overall model. To illustrate this methodology an asset selection problem has been identified: given a fleet of industrial vehicles (diesel electric locomotives), select the best subset for mission-critical utilization. To this end, the remaining life of each unit in the fleet is predicted. The fleet is then sorted using this prediction and the highest ranked units are selected. A series of experiments using data from locomotive operations was conducted and the results from an initial validation exercise are presented. The approach of constructing local predictive models using fuzzy similarity with neighboring points along appropriate dimensions is not specific to any asset type and may be applied to any problem where the premise of similarity along chosen attribute dimensions implies similarity in predicted future behavior. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
18. A comparison of three methods for principal component analysis of fuzzy interval data
- Author
-
Giordani, Paolo and Kiers, Henk A.L.
- Subjects
- *
INTERVAL analysis , *PRINCIPAL components analysis , *FUZZY statistics , *DATA analysis - Abstract
Abstract: Vertices Principal Component Analysis (V-PCA) and Centers Principal Component Analysis (C-PCA) generalize Principal Component Analysis (PCA) in order to summarize interval-valued data. Neural Network Principal Component Analysis (NN-PCA) represents an extension of PCA for fuzzy interval data. The first two methods can also be used for analyzing fuzzy interval data, but they then ignore the spread information. In the literature, the V-PCA method is usually considered computationally cumbersome because it requires the transformation of the interval-valued data matrix into a single-valued data matrix whose number of rows depends exponentially on the number of variables and linearly on the number of observation units. However, it has been shown that this problem can be overcome by considering the cross-products matrix, which is easy to compute. A review of C-PCA, V-PCA (including the computational short-cut to V-PCA), and NN-PCA is provided. Furthermore, the three methods are compared by means of a simulation study and an application to an empirical data set. In the simulation study, fuzzy interval data are generated according to various models, and it is reported under which conditions each method performs best. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
19. Dual models for possibilistic regression analysis
- Author
-
Guo, Peijun and Tanaka, Hideo
- Subjects
- *
REGRESSION analysis , *DATA analysis , *LINEAR programming , *FUZZY statistics - Abstract
Abstract: Upper and lower regression models (dual possibilistic models) are proposed for data analysis with crisp inputs and interval or fuzzy outputs. Based on the given data, the dual possibilistic models can be derived from upper and lower directions, respectively, where the inclusion relationship between these two models holds. Thus, the inherent uncertainty existing in the given phenomenon can be approximated by the dual models. As a core part of possibilistic regression, firstly possibilistic regression for crisp inputs and interval outputs is considered where the basic dual linear models based on linear programming, dual nonlinear models based on linear programming and dual nonlinear models based on quadratic programming are systematically addressed, and similarities between dual possibilistic regression models and rough sets are analyzed in depth. Then, as a natural extension, dual possibilistic regression models for crisp inputs and fuzzy outputs are addressed. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
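The upper/lower (dual) model idea above can be visualized with a deliberately simplified stand-in: shift an ordinary least-squares line up and down by the extreme residuals so that the upper line lies on or above every observed output and the lower line on or below it. This is not the paper's linear-programming formulation (which also minimizes total spread); it only illustrates the inclusion relationship the dual models are built around.

```python
# Upper/lower envelope lines around an OLS fit: a simplified illustration
# of dual models that bracket all observed outputs (not Tanaka-style LP).
import numpy as np

def dual_envelope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # central (OLS) fit
    resid = y - X @ beta
    upper = beta + np.array([resid.max(), 0.0])    # shift intercept up
    lower = beta + np.array([resid.min(), 0.0])    # shift intercept down
    return lower, upper

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.3, 3.8, 5.1])
lower, upper = dual_envelope(x, y)
X = np.column_stack([np.ones_like(x), x])
print(np.all(X @ upper >= y - 1e-9), np.all(X @ lower <= y + 1e-9))  # True True
```

The band between the two lines plays the role of the interval output: every observation is included, mirroring the inclusion constraints of the LP-based dual possibilistic models.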
20. Goodman–Kruskal measure of dependence for fuzzy ordered categorical data
- Author
-
Hryniewicz, Olgierd
- Subjects
- *
FUZZY statistics , *DEPENDENCE (Statistics) , *MATHEMATICAL statistics , *DATA analysis - Abstract
Abstract: A generalisation of the Goodman–Kruskal statistic used for measuring the strength of dependence (association) between two categorical variables with ordered categories is presented. The case is considered in which some data are not precise and observations are described by possibility distributions over the set of categories of one variable. For such data, a fuzzy version of the statistic is defined. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
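The statistic being fuzzified above is the classical Goodman–Kruskal gamma. For reference, the crisp version over pairs of ordered categorical observations is (concordant − discordant) / (concordant + discordant), ignoring ties:

```python
# Crisp Goodman-Kruskal gamma over all pairs of observations;
# tied pairs contribute to neither count.
from itertools import combinations

def gk_gamma(xs, ys):
    conc = disc = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            conc += 1      # concordant pair: same ordering on both variables
        elif s < 0:
            disc += 1      # discordant pair: opposite orderings
    return (conc - disc) / (conc + disc)

print(gk_gamma([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0: perfect positive association
print(gk_gamma([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0: perfect negative association
```

The fuzzy version in the paper replaces the crisp category of an imprecise observation with a possibility distribution, so the concordant/discordant counts become graded quantities.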
21. Fuzzy multidimensional scaling
- Author
-
Hébert, Pierre-Alexandre, Masson, Marie-Hélène, and Denœux, Thierry
- Subjects
- *
MULTIDIMENSIONAL scaling , *FUZZY statistics , *DATA analysis , *ALGORITHMS - Abstract
Abstract: Multidimensional scaling (MDS) is a data analysis technique for representing measurements of (dis)similarity among pairs of objects as distances between points in a low-dimensional space. MDS methods differ mainly according to the distance model used to scale the proximities. The most usual model is the Euclidean one, although a spherical model is often preferred to represent correlation measurements. These two distance models are extended to the case where dissimilarities are expressed as intervals or fuzzy numbers. Each object is then no longer represented by a point but by a crisp or a fuzzy region in the chosen space. To determine these regions, two algorithms are proposed and illustrated using typical data sets. Experiments demonstrate the ability of the methods to represent both the structure and the vagueness of dissimilarity measurements. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
22. Univariate statistical analysis with fuzzy data
- Author
-
Viertl, Reinhard
- Subjects
- *
FUZZY statistics , *STATISTICS , *DATA analysis , *BAYESIAN analysis - Abstract
Abstract: Statistical data are frequently not precise numbers but more or less non-precise, also called fuzzy. Measurements of continuous variables are always fuzzy to a certain degree. Therefore histograms and generalized classical statistical inference methods for univariate fuzzy data have to be considered. Moreover Bayesian inference methods in the situation of fuzzy a priori information and fuzzy data are discussed. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
23. Data analysis with fuzzy clustering methods
- Author
-
Döring, Christian, Lesot, Marie-Jeanne, and Kruse, Rudolf
- Subjects
- *
FUZZY statistics , *DATA analysis , *ALGORITHMS , *STATISTICS - Abstract
Abstract: An encompassing, self-contained introduction to the foundations of the broad field of fuzzy clustering is presented. The fuzzy cluster partitions are introduced with special emphasis on the interpretation of the two most encountered types of gradual cluster assignments: the fuzzy and the possibilistic membership degrees. A systematic overview of present fuzzy clustering methods is provided, highlighting the underlying ideas of the different approaches. The class of objective function-based methods, the family of alternating cluster estimation algorithms, and the fuzzy maximum likelihood estimation scheme are discussed. The latter is a fuzzy relative of the well-known expectation maximization algorithm and it is compared to its counterpart in statistical clustering. Related issues are considered, concluding with references to selected developments in the area. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
24. Random and fuzzy sets in coarse data analysis
- Author
-
Nguyen, Hung T. and Wu, Berlin
- Subjects
- *
DATA analysis , *STATISTICS , *PROBABILITY theory , *FUZZY statistics - Abstract
Abstract: The theoretical aspects of statistical inference with imprecise data, with a focus on random sets, are considered. In the setting of coarse data analysis, imprecision and randomness in observed data are exhibited, and the relationship between probability and other types of uncertainty, such as belief functions and possibility measures, is analyzed. Coarsening schemes are viewed as models for perception-based information-gathering processes in which random fuzzy sets appear naturally. As an implication, fuzzy statistics is statistics with fuzzy data. That is, fuzzy sets are a new type of data and, as such, complementary to statistical analysis in the sense that they enlarge the domain of applications of statistical science. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
25. Extending fuzzy and probabilistic clustering to very large data sets
- Author
-
Hathaway, Richard J. and Bezdek, James C.
- Subjects
- *
FUZZY statistics , *DATA analysis , *PROBABILITY theory , *ALGORITHMS - Abstract
Abstract: Approximating clusters in very large (VL = unloadable) data sets has been considered from many angles. The proposed approach has three basic steps: (i) progressive sampling of the VL data, terminated when a sample passes a statistical goodness-of-fit test; (ii) clustering the sample with a literal (or exact) algorithm; and (iii) non-iterative extension of the literal clusters to the remainder of the data set. Extension accelerates clustering on all (loadable) data sets. More importantly, extension provides feasibility—a way to find (approximate) clusters—for data sets that are too large to be loaded into the primary memory of a single computer. A good generalized sampling and extension scheme should be effective for acceleration and feasibility using any extensible clustering algorithm. A general method for progressive sampling in VL sets of feature vectors is developed, and examples are given that show how to extend the literal fuzzy c-means and probabilistic (expectation-maximization) clustering algorithms onto VL data. The fuzzy extension is called the generalized extensible fast fuzzy c-means (geFFCM) algorithm and is illustrated using several experiments with mixtures of five-dimensional normal distributions. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
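The sample-then-extend scheme in the Hathaway and Bezdek abstract above (cluster a loadable sample with a literal algorithm, then extend the clusters to the remaining data in one non-iterative pass) can be sketched with plain fuzzy c-means. This is an illustrative sketch only, not the authors' geFFCM implementation: the progressive-sampling and goodness-of-fit step is replaced by a single random sample, and the names `fcm` and `extend_memberships` are made up for the example.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Literal fuzzy c-means on a loadable sample X; returns cluster prototypes."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))           # random initial memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)                # each row sums to 1
    return centers

def extend_memberships(X, centers, m=2.0):
    """Non-iterative extension: one membership pass for unseen points
    against the fixed, sample-derived prototypes (no re-clustering)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    U = 1.0 / d ** (2.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

# Toy stand-in for a "very large" set: two well-separated 5-D Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (500, 5)), rng.normal(6.0, 1.0, (500, 5))])
idx = rng.choice(len(X), size=100, replace=False)        # step (i), simplified
centers = fcm(X[idx], c=2)                               # step (ii)
U_full = extend_memberships(X, centers)                  # step (iii)
labels = U_full.argmax(axis=1)
```

The extension step costs exactly one membership computation per unseen point against the fixed prototypes, which is what makes the approach feasible for data that cannot all be held in memory at once.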
26. Tools for fuzzy random variables: Embeddings and measurabilities
- Author
-
López-Díaz, Miguel and Ralescu, Dan A.
- Subjects
- *
FUZZY statistics , *DATA analysis , *PROBABILITY theory , *STATISTICS - Abstract
Abstract: The concept of fuzzy random variable has been shown to be a valuable model for handling fuzzy data in statistical problems. The theory of fuzzy-valued random elements provides a suitable formalization for the management of fuzzy data in the probabilistic setting. A concise overview of fuzzy random variables, focused on the crucial aspects for data analysis, is presented. [Copyright Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
27. Bootstrap approach to the multi-sample test of means with imprecise data
- Author
-
Gil, María Ángeles, Montenegro, Manuel, González-Rodríguez, Gil, Colubi, Ana, and Rosa Casals, María
- Subjects
- *
STATISTICAL bootstrapping , *DATA analysis , *FUZZY statistics , *COMPARATIVE studies - Abstract
Abstract: A bootstrap approach to the multi-sample test of means for imprecisely valued sample data is introduced. For this purpose imprecise data are modelled in terms of fuzzy values. Populations are identified with fuzzy-valued random elements, often referred to in the literature as fuzzy random variables. An example illustrates the use of the suggested method. Finally, the adequacy of the bootstrap approach to test the multi-sample hypothesis of means is discussed through a simulation comparative study. [Copyright Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
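The bootstrap multi-sample test of means in the abstract above operates on fuzzy-valued data; the sketch below shows the same resampling logic for ordinary crisp samples, since fuzzy arithmetic and fuzzy-valued test statistics are beyond a short example. The statistic and the centering-under-H0 scheme are standard bootstrap ANOVA choices, not taken from the paper, and `bootstrap_multisample_pvalue` is a hypothetical name.

```python
import numpy as np

def bootstrap_multisample_pvalue(samples, B=2000, seed=0):
    """Bootstrap p-value for H0: all group means are equal (crisp data).
    Statistic: size-weighted squared deviations of group means from the
    pooled mean; each group is centered at the pooled mean before
    resampling so that the bootstrap world satisfies H0."""
    rng = np.random.default_rng(seed)
    def stat(groups):
        pooled = np.concatenate(groups).mean()
        return sum(len(g) * (g.mean() - pooled) ** 2 for g in groups)
    t_obs = stat(samples)
    pooled = np.concatenate(samples).mean()
    centered = [g - g.mean() + pooled for g in samples]
    t_boot = np.array([stat([rng.choice(g, size=len(g), replace=True)
                             for g in centered]) for _ in range(B)])
    return (np.sum(t_boot >= t_obs) + 1) / (B + 1)       # add-one convention

# Three groups with equal means vs. three groups with one shifted mean.
rng = np.random.default_rng(7)
p_equal = bootstrap_multisample_pvalue([rng.normal(0, 1, 40) for _ in range(3)])
p_shift = bootstrap_multisample_pvalue([rng.normal(0, 1, 40),
                                        rng.normal(0, 1, 40),
                                        rng.normal(2, 1, 40)])
```

With a clearly shifted group the observed statistic falls far in the tail of the bootstrap distribution, so the p-value is small; with equal means it is not.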
28. Two cluster validity indices for the LAMDA clustering method.
- Author
-
Valderrama, Javier Fernando Botía and Valderrama, Diego José Luis Botía
- Subjects
MACHINE learning ,GRANULATION ,DATA analysis - Abstract
The learning algorithm and multivariable data analysis (LAMDA) is an algorithm for grouping quantitative and qualitative data, applying self-learning and/or directed learning. Usually, LAMDA automatically generates classes by assigning the best data partition to a class. To evaluate the data partitions generated by LAMDA, internal evaluation is used to find the optimal number of clusters. For the LAMDA algorithm, the cluster validity (CV) index, which is based on inter-class contrast (ICC), is the most popular. However, other indices have not been defined for LAMDA, and a comparative analysis is required to evaluate their performance. In this paper, two metrics are proposed: the cluster validity index based on granulation error and the ratio of the distance (CVGED) and the cluster validity index based on the ratios of covariance and distance (CVCOD). These indices are compared with the CV and ICC indices in two experiments: one using a database repository, and one using selected open data and experimental laboratory data. According to the main results, CVGED and CVCOD perform better in compactness, separation, and coefficient of variation than ICC and CV for most of the selected repository databases, but accuracy is limited for all four indices. Nevertheless, CVCOD improves the quality of the data partition when the open data and experimental laboratory data are used. • The paper proposes two cluster validity indices for the LAMDA clustering algorithm. • The CVGED index is based on the granulation error and the ratio of the distance. • The CVCOD index is based on the ratio of the distances and the ratio of compactness. • CVGED and CVCOD have a better performance than ICC and CV. • CVCOD can find the best parameter values and increase the quality of the partition. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
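The CVGED and CVCOD formulas are not reproduced in the abstract above, so they are not reimplemented here; the sketch below only illustrates the generic compactness-versus-separation ratio that such internal validity indices build on (lower is better), with a made-up function name and Euclidean distances assumed.

```python
import numpy as np

def compactness_separation(X, labels):
    """Generic internal validity score: mean within-cluster distance to the
    centroid (compactness) divided by the smallest distance between two
    centroids (separation). Lower values indicate a better partition."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    compact = np.mean([np.linalg.norm(X[labels == k] - centroids[i], axis=1).mean()
                       for i, k in enumerate(ks)])
    sep = min(np.linalg.norm(centroids[i] - centroids[j])
              for i in range(len(ks)) for j in range(i + 1, len(ks)))
    return compact / sep

# Two clear 2-D blobs: the true partition scores far lower (better)
# than an arbitrary labeling that mixes the blobs.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(10.0, 1.0, (60, 2))])
score_good = compactness_separation(X, np.repeat([0, 1], 60))
score_bad = compactness_separation(X, np.arange(120) % 2)
```

An internal evaluation loop would compute such a score for each candidate number of clusters and keep the partition that minimizes it.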