Descriptor: "automatic clustering" / Search Limiters: Available in Library Collection - Searchworks@Jio Institute Digital Library Search Results

Your search for "automatic clustering" returned 144 results

1. Cluster validity indices for automatic clustering: A comprehensive review

Author: Ikotun, Abiodun M., Habyarimana, Faustin, and Ezugwu, Absalom E.
Published: 2025
Full Text: View/download PDF

2. Adaptive multi-model predictive control with optimal model bank formation: Consideration of local models uncertainty and stability

Author: Fathi, Mohammad, Bolandi, Hossein, Vaghei, Bahman Ghorbani, and Ebadolahi, Saeid
Published: 2024
Full Text: View/download PDF

3. Dynamic Social Particle Swarm Optimization For Automatic Clustering

Author: Amdouni, Hamida, Manita, Ghaith, Oliva, Diego, Houssein, Essam H., Korbaa, Ouajdi, and Zapotecas-Martínez, Saúl
Published: 2024
Full Text: View/download PDF

4. Ellipsoidal K-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions.

Author: Abdel-Hakim, Alaa E., Ibrahim, Abdel-Monem M., Bouazza, Kheir Eddine, Deabes, Wael, and Hedar, Abdel-Rahman
Subjects: *CLUSTERING algorithms, *DATA distribution, *K-means clustering, *EUCLIDEAN distance, *CLUSTER analysis (Statistics), *SIMULATED annealing, *CENTROID
Abstract: Traditional K-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is proposed to automatically partition data into an optimal number of ellipsoidal clusters, a capability absent in traditional methods. SAELLC transforms each identified cluster into a hyperspherical cluster, where the diameter of the hypersphere equals the minor axis of the original ellipsoid, and the center is encoded to represent the entire cluster. During the assignment of points to clusters, local ellipsoidal properties are independently considered. For objective function evaluation, the method adaptively transforms these ellipsoidal clusters into a variable number of global clusters. Two objective functions are simultaneously optimized: one reflecting partition compactness using the silhouette function (SF) and Euclidean distance, and another addressing cluster connectedness through a nearest-neighbor algorithm. This optimization is achieved using a newly-developed multiobjective simulated annealing approach. SAELLC is designed to automatically determine the optimal number of clusters, achieve precise partitioning, and accommodate a wide range of cluster shapes, including spherical, ellipsoidal, and non-symmetric forms. Extensive experiments conducted on UCI datasets demonstrated SAELLC's superior performance compared to six well-known clustering algorithms. The results highlight its remarkable ability to handle diverse data distributions and automatically identify the optimal number of clusters, making it a robust choice for advanced clustering analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Exploring meta-heuristics for partitional clustering: methods, metrics, datasets, and challenges.

Author: Kaur, Arvinder, Kumar, Yugal, and Sidhu, Jagpreet
Abstract: Partitional clustering is a type of clustering that organizes data into non-overlapping groups or clusters. This technique has diverse applications across various domains, such as image processing, pattern recognition, data mining, rule-based systems, customer segmentation, image segmentation, and anomaly detection. Hence, this survey aims to identify the key concepts and approaches in partitional clustering. Further, it also highlights its widespread applicability including major advantages and challenges. Partitional clustering faces challenges such as selecting the optimal number of clusters, local optima, and sensitivity to initial centroids. Therefore, this survey describes the clustering problems as partitional clustering, dynamic clustering, automatic clustering, and fuzzy clustering. The objective of this survey is to identify the meta-heuristic algorithms for the aforementioned clustering problems. Further, the meta-heuristic algorithms are also categorised into simple meta-heuristic algorithms, improved meta-heuristic algorithms, and hybrid meta-heuristic algorithms. Hence, this work also focuses on the adoption of new meta-heuristic algorithms, improving existing methods and novel techniques that enhance clustering performance and robustness, making partitional clustering a critical tool for data analysis and machine learning. This survey also highlights the different objective functions and benchmark datasets adopted for measuring the effectiveness of clustering algorithms. Before the literature survey, several research questions are formulated to ensure the effectiveness and efficiency of the survey, such as: What are the various meta-heuristic techniques available for clustering problems? How to handle automatic data clustering? What are the main reasons for hybridizing clustering algorithms? The survey identifies shortcomings associated with existing algorithms and clustering problems and highlights the active area of research in the clustering field to overcome these limitations and improve performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Improvement of DBSCAN Algorithm Involving Automatic Parameters Estimation and Curvature Analysis in 3D Point Cloud of Piled Pipe.

Author: Pratama, Alfan Rizaldy, Bayu Dewantara, Bima Sena, Sari, Dewi Mutiara, and Pramadihanto, Dadet
Subjects: PARAMETER estimation, CURVATURE, CLUSTER analysis (Statistics), ALGORITHMS, SIMPLICITY, POINT cloud
Abstract: Bin-picking in industrial settings is a challenging task because the objects are piled in a box. The rapid development of 3D point cloud data for the bin-picking task has not fully addressed the robustness of handling objects under every piling condition. Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which attempts to solve the problem by density, still has disadvantages such as parameter tuning and ignoring the unique shape of an object. This paper proposes a solution that adds curvature analysis at each data point to represent the shape of an object, and is therefore called Curvature-Density-Based Spatial Clustering of Application with Noise (CVRDBSCAN). Our improvement uses curvature to analyze object shapes in different placements and automatically estimates parameters such as Eps and MinPts. It is divided into three algorithms, which we call Auto-DBSCAN, CVR-DBSCAN-Avg, and CVR-DBSCAN-Disc. We use real-scanned Time-of-Flight camera datasets separated into three piling conditions (well separated, well piled, and arbitrarily piled) to analyze all possibilities in placing objects. In the well-separated case, Auto-DBSCAN leads in stability and reaches an accuracy of 99.67%, which ties with DBSCAN using specified parameters. In the well-piled case, CVR-DBSCAN-Avg gives the highest stability, although its accuracy of 98.83% can be matched by DBSCAN with specified parameters. Last, in the arbitrarily piled case, CVR-DBSCAN-Avg has lower accuracy than DBSCAN (73.17% compared to 80.43%), but its stability is slightly higher, with fewer outlier values. Although its computational time is higher than that of the original DBSCAN, our improvement offers simplicity and deeper analysis in scene understanding. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Neighbor-Relationship-Based Adaptive Density Peak Clustering

Author: Zhigang Su, Qian Gao, Jingtang Hao, Yue Wang, and Bing Han
Subjects: Spatial clustering, density peak, uneven density, neighbor relationship, automatic clustering, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The Density Peak Clustering (DPC) algorithm encounters challenges such as difficulty in choosing cluster centers and the chain reaction caused by incorrect assignment of data points when clustering spatial datasets containing clusters with significant density differences or multi-peak clusters. To address these problems, in this paper, starting from enhancing the local density definition, optimizing the selection of cluster centers, and improving the assignment strategy of non-cluster center data points, an Adaptive DPC (NRA-DPC) algorithm is proposed based on the neighbor relationship. The NRA-DPC algorithm utilizes the reverse K-nearest neighbors of data points as the basis for defining the local density of data points and divides the spatial dataset into a core point set and a boundary point set based on the number of elements in the reverse K-nearest neighbor set of data points. The idea of iteration is adopted to select cluster centers from the core point set and assign non-cluster center data points, forming the initial clusters. For each initial cluster formed by the core point set, the corresponding minimum spanning tree (MST) is generated, and based on the average edge length of the MST, the assignment threshold of this cluster is set. The boundary point set completes the corresponding data point assignment task based on this assignment threshold and the mutual K-nearest neighbor relationship. Experimental results indicate that, compared with other typical clustering algorithms, the NRA-DPC algorithm can automatically select cluster centers, reduce the probability of incorrect assignment of non-cluster center data points, and effectively suppress the chain reaction triggered by incorrect assignment of non-cluster center data points, demonstrating more stable clustering performance when dealing with different datasets.
Published: 2024
Full Text: View/download PDF

8. Adaptive clustering algorithm based on improved marine predation algorithm and its application in bearing fault diagnosis

Author: Zhuanzhe Zhao, Mengxian Wang, Yongming Liu, Zhibo Liu, Yuelin Lu, Yu Chen, and Zhijian Tu
Subjects: fault diagnosis, automatic clustering, cluster validity index, marine predator algorithm, k-means clustering, Mathematics, QA1-939, Applied mathematics. Quantitative methods, T57-57.97
Abstract: In cluster analysis, determining the number of clusters is an important issue because there is less information about the most appropriate number of clusters in the real problem. Automatic clustering is a clustering method that automatically finds the most appropriate number of clusters and divides instances into the corresponding clusters. In this paper, a novel automatic clustering algorithm based on the improved marine predator algorithm (IMPA) and K-means algorithm is proposed. The new IMPA utilizes refracted opposition-based learning in population initialization, generates opposite solutions to improve the diversity of the population and produces more accurate solutions. In addition, the sine-cosine algorithm is incorporated to balance global exploration and local development of the algorithm for dynamic updating of the predator and prey population positions. At the same time, the Gaussian-Cauchy mutation is combined to improve the probability of obtaining the globally optimal solution. The proposed IMPA is validated with some benchmark data sets. The calculation results show that IMPA is superior to the original MPA in automatic clustering. In addition, IMPA is also used to solve the problem of fault classification of Xi'an Jiaotong University bearing data. The results show that the IMPA has better and more stable results than other algorithms such as the original MPA, whale optimization algorithm, fuzzy C-means and K-means in automatic clustering.
Published: 2023
Full Text: View/download PDF

9. An automatic density peaks clustering based on a density-distance clustering index

Author: Xiao Xu, Hong Liao, and Xu Yang
Subjects: dpc algorithm, automatic clustering, decision graph, optimal number of clusters, parameter selection, Mathematics, QA1-939
Abstract: The density peaks clustering (DPC) algorithm plays an important role in data mining by quickly identifying cluster centers using decision graphs to identify arbitrary clusters. However, the decision graph introduces uncertainty in determining the cluster centers, which can result in an incorrect number of clusters. In addition, the cut-off distance parameter relies on prior knowledge, which poses a limitation. To address these issues, we propose an improved automatic density peaks clustering (ADPC) algorithm. First, a novel clustering validity index called density-distance clustering (DDC) is introduced. The DDC index draws inspiration from the density and distance characteristics of cluster centers, which is applicable to DPC and aligns with the general definition of clustering. Based on the DDC index, the ADPC algorithm automatically selects the suitable cut-off distance and acquires the optimal number of clusters without additional parameters. Numerical experimental results validate that the introduced ADPC algorithm successfully automatically determines the optimal number of clusters and cut-off distance, significantly outperforming DPC, AP and DBSCAN algorithms.
Published: 2023
Full Text: View/download PDF

10. Unsupervised optimal model bank for multiple model control systems: Genetic-based automatic clustering approach

Author: Mohammad Fathi and Hossein Bolandi
Subjects: Multiple model control, Automatic clustering, Genetic algorithm, Optimal model bank, Science (General), Q1-390, Social sciences (General), H1-99
Abstract: In Multiple Model Control (MMC) strategies, a bank of simple local models is used to describe the behavior of complex systems with a vast operation space. In this approach, the system operation space is divided into several subspaces, and in each subspace, a simple local model is assigned to describe the system behavior. This study addresses the two main challenges in this field, which involve determining the optimal number of required local models to form the model bank and identifying the optimal distribution of the local models across the system operation space. Providing appropriate answers to these questions directly affects the performance of the MMC system. In this paper, a GA-based automatic clustering method is suggested to form an optimal model bank. In this regard, an appropriate mapping is established between the concepts of MMC and automatic clustering, and a novel unsupervised algorithm is designed to determine the optimal model bank. Unlike the existing methods in the literature, the proposed method can form the global optimal model bank without entrapment into local optima regardless of the initial conditions of the used search algorithm. In this paper, the formation of the optimal model bank using the proposed method is investigated by considering the spacecraft attitude dynamics as a complex, MIMO, non-linear case study and its satisfactory and promising performance is demonstrated.
Published: 2024
Full Text: View/download PDF

11. Calibration of hydrological models for ungauged catchments by automatic clustering using a differential evolution algorithm: The Gorganrood river basin case study

Author: Zahra Alizadeh and Jafar Yazdi
Subjects: automatic clustering, differential evolution (de), gorganrood river basin, hydrologic model calibration, swmm, ungauged catchments, Information technology, T58.5-58.64, Environmental technology. Sanitary engineering, TD1-1066
Abstract: Hydrological model calibration is a challenging task, especially in ungauged catchments. Regionalization calibration methods can be used to estimate the parameters of the model in ungauged sub-catchments. In this article, the model of ungauged sub-catchments is calibrated by a regionalization approach based on automatic clustering. Under the clustering procedure, gauged and ungauged sub-catchments are grouped based on their physical characteristics and similarity. The optimal number of clusters is determined using an automatic differential evolution algorithm-based clustering. With the five clusters obtained, the value of the silhouette measure is 0.56, which is an acceptable value for goodness of clustering. The calibration process is conducted by minimizing errors in simulated peak flow and total flow volume. The Storm Water Management Model is applied to calibrate a set of 53 sub-catchments in the Gorganrood river basin. Comparing simulated and observed runoff values graphically and statistically, and calculating the value of the silhouette coefficient, demonstrates that the proposed methodology is a promising approach for hydrological model calibration in ungauged catchments. Highlights: The model of ungauged sub-catchments is calibrated by a regionalization approach based on automatic clustering. The optimal number of clusters is determined using an automatic differential evolution algorithm-based clustering. Comparing simulated and observed runoff values graphically and statistically, and calculating the value of the silhouette coefficient, proved the superiority of automatic clustering differential evolution in clustering.
Published: 2023
Full Text: View/download PDF

12. Globally automatic fuzzy clustering for probability density functions and its application for image data.

Author: Nguyen-Trang, Thao, Nguyen-Thoi, Trung, and Vo-Van, Tai
Subjects: PROBABILITY density function, FUZZY algorithms, DIFFERENTIAL evolution, IMAGE recognition (Computer vision)
Abstract: Clustering for probability density functions (CDF) can be categorized as non-fuzzy and fuzzy approaches. Regarding the second approach, the iterative refinement technique has been used for searching the optimal partition. This method could be easily trapped at a local optimum. In order to find the global optimum, a meta-heuristic optimization (MO) algorithm must be incorporated into the fuzzy CDF problem. However, no research utilizing MO to solve the fuzzy CDF problem has been proposed so far due to the lack of a reasonable encoding for converting a fuzzy clustering solution to a chromosome. To address this shortcoming, a new definition called Gaussian prototype is defined first. This type of prototype is capable of accurately representing the cluster without being overly complex. As a result, prototypes' information can be easily integrated into the chromosome via a novel prototype-based encoding method. Second, a new objective function is introduced to evaluate a fuzzy CDF solution. Finally, Differential Evolution (DE) is used to determine the optimal solution for fuzzy clustering. The proposed method, namely DE-AFCF, is the first to propose a globally automatic fuzzy CDF algorithm, which not only can automatically determine the number of clusters k but also can search for the optimal fuzzy partition matrix by taking into account both clustering compactness and separation. The DE-AFCF is also applied in some image clustering problems, such as processed image detection, and traffic image recognition. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

13. Automatic Clustering for Improved Radio Environment Maps in Distributed Applications.

Author: Ben Chikha, Haithem and Alaerjan, Alaa
Subjects: STANDARD deviations, SMART cities, K-means clustering, WIRELESS communications, TELECOMMUNICATION
Abstract: Wireless communication greatly contributes to the evolution of new technologies, such as the Internet of Things (IoT) and edge computing. The new generation networks, including 5G and 6G, provide several connectivity advantages for multiple applications, such as smart health systems and smart cities. Adopting wireless communication technologies in these applications is still challenging due to factors such as mobility and heterogeneity. Predicting accurate radio environment maps (REMs) is essential to facilitate connectivity and improve resource utilization. The construction of accurate REMs through the prediction of reference signal received power (RSRP) can be useful in densely distributed applications, such as smart cities. However, predicting an accurate RSRP in the applications can be complex due to intervention and mobility aspects. Given the fact that the propagation environments can be different in a specific area of interest, the estimation of a common path loss exponent for the entire area produces errors in the constructed REM. Hence, it is necessary to use automatic clustering to distinguish between different environments by grouping locations that exhibit similar propagation characteristics. This leads to better prediction of the propagation characteristics of other locations within the same cluster. Therefore, in this work, we propose using the Kriging technique, in conjunction with the automatic clustering approach, in order to improve the accuracy of RSRP prediction. In fact, we adopt K-means clustering (KMC) to enhance the path loss exponent estimation. We use a dataset to test the proposed model using a set of comparative studies. The results showed that the proposed approach provides significant RSRP prediction capabilities for constructing REM, with a gain of about 3.3 dB in terms of root mean square error compared to the case without clustering. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

14. A Review of Quantum-Inspired Metaheuristic Algorithms for Automatic Clustering.

Author: Dey, Alokananda, Bhattacharyya, Siddhartha, Dey, Sandip, Konar, Debanjan, Platos, Jan, Snasel, Vaclav, Mrsic, Leo, and Pal, Pankaj
Subjects: *QUANTUM computing, *ALGORITHMS, *QUANTUM computers
Abstract: In real-world scenarios, identifying the optimal number of clusters in a dataset is a difficult task due to insufficient knowledge. Therefore, the indispensability of sophisticated automatic clustering algorithms for this purpose has been contemplated by some researchers. Several automatic clustering algorithms assisted by quantum-inspired metaheuristics have been developed in recent years. However, the literature lacks definitive documentation of the state-of-the-art quantum-inspired metaheuristic algorithms for automatically clustering datasets. This article presents a brief overview of the automatic clustering process to establish the importance of making the clustering process automatic. The fundamental concepts of the quantum computing paradigm are also presented to highlight the utility of quantum-inspired algorithms. This article thoroughly analyses some algorithms employed to address the automatic clustering of various datasets. The reviewed algorithms were classified according to their main sources of inspiration. In addition, some representative works of each classification were chosen from the existing works. Thirty-six such prominent algorithms were further critically analysed based on their aims, used mechanisms, data specifications, merits and demerits. Comparative results based on the performance and optimal computational time are also presented to critically analyse the reviewed algorithms. As such, this article promises to provide a detailed analysis of the state-of-the-art quantum-inspired metaheuristic algorithms, while highlighting their merits and demerits. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

15. Automatic clustering of colour images using quantum inspired meta-heuristic algorithms.

Author: Dey, Alokananda, Bhattacharyya, Siddhartha, Dey, Sandip, Platos, Jan, and Snasel, Vaclav
Subjects: PARTICLE swarm optimization, COLOR image processing, METAHEURISTIC algorithms, QUANTUM computers, EVOLUTIONARY algorithms, QUANTUM computing, DIFFERENTIAL evolution, COLOR
Abstract: This work explores the effectiveness and robustness of quantum computing by conjoining the principles of quantum computing with the conventional computational paradigm for the automatic clustering of colour images. In order to develop such a computationally efficient algorithm, two population-based meta-heuristic algorithms, viz., Particle Swarm Optimization (PSO) algorithm and Enhanced Particle Swarm Optimization (EPSO) algorithm have been consolidated with the quantum computing framework to yield the Quantum Inspired Particle Swarm Optimization (QIPSO) algorithm and the Quantum Inspired Enhanced Particle Swarm Optimization (QIEPSO) algorithm, respectively. This paper also presents a comparison between the proposed quantum inspired algorithms with their corresponding classical counterparts and also with three other evolutionary algorithms, viz., Artificial Bee Colony (ABC), Differential Evolution (DE) and Covariance Matrix Adaption Evolution Strategies (CMA-ES). In this paper, twenty different sized colour images have been used for conducting the experiments. Among these twenty images, ten are Berkeley images and ten are real life colour images. Three cluster validity indices, viz., PBM, CS-Measure (CSM) and Dunn index (DI) have been used as objective functions for measuring the effectiveness of clustering. In addition, in order to improve the performance of the proposed algorithms, some participating parameters have been adjusted using the Sobol's sensitivity analysis test. Four segmentation evaluation metrics have been used for quantitative evaluation of the proposed algorithms. The effectiveness and efficiency of the proposed quantum inspired algorithms have been established over their conventional counterparts and the three other competitive algorithms with regards to optimal computational time, convergence rate and robustness. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

16. CVIK: A Matlab-based cluster validity index toolbox for automatic data clustering

Author: Adán José-García and Wilfrido Gómez-Flores
Subjects: Clustering, Cluster validity index, Automatic clustering, Computer software, QA76.75-76.765
Abstract: We present CVIK, a Matlab-based toolbox for assisting the process of cluster analysis applications. This toolbox aims to implement 28 cluster validity indices (CVIs) for measuring clustering quality available to data scientists, researchers, and practitioners. CVIK facilitates implementing the entire pipeline of automatic clustering in two approaches: (i) evaluating candidate clustering solutions from classical algorithms, in which the number of clusters increases gradually, and (ii) assessing potential solutions in evolutionary clustering algorithms using single- and multi-objective optimization methods. This toolbox also implements distinct proximity measures to estimate data similarity, and the CVIs are capable of processing both feature data and relational data. The source code and examples can be found in this GitHub repository: https://github.com/adanjoga/cvik-toolbox.
Published: 2023
Full Text: View/download PDF

17. Balance-driven automatic clustering for probability density functions using metaheuristic optimization.

Author: Nguyen-Trang, Thao, Nguyen-Thoi, Trung, Nguyen-Thi, Kim-Ngan, and Vo-Van, Tai
Abstract: For solving the clustering for probability density functions (CDF) problem with a given number of clusters, the metaheuristic optimization (MO) algorithms have been widely studied because of their advantages in searching for the global optimum. However, the existing approaches cannot be directly extended to the automatic CDF problem for determining the number of clusters k. Besides, balance-driven clustering, an essential research direction recently developed in the problem of discrete-element clustering, has not been considered in the field of CDF. This paper pioneers a technique to apply an MO algorithm for resolving the balance-driven automatic CDF. The proposed method not only can automatically determine the number of clusters but also can approximate the global optimal solution in which both the clustering compactness and the clusters' size similarity are considered. The experiments on one-dimensional and multidimensional probability density functions demonstrate that the new method possesses higher quality clustering solutions than the other conventional techniques. The proposed method is also applied in analyzing the difficulty levels of entrance exam questions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

18. A novel density deviation multi-peaks automatic clustering algorithm.

Author: Zhou, Wei, Wang, Limin, Han, Xuming, Parmar, Milan, and Li, Mingyang
Subjects: POINT processes, ALGORITHMS
Abstract: The density peaks clustering (DPC) algorithm is a classical and widely used clustering method. However, the DPC algorithm requires manual selection of cluster centers, a single way of density calculation, and cannot effectively handle low-density points. To address the above issues, we propose a novel density deviation multi-peaks automatic clustering method (AmDPC) in this paper. Firstly, we propose a new local-density and use the deviation to measure the relationship between data points and the cut-off distance ($d_c$). Secondly, we divide the density deviation into multiple density levels equally and extract the points with higher distances in each density level. Finally, for the multi-peak points with higher distances at low-density levels, we merge them according to the size difference of the density deviation. We finally achieve the overall automatic clustering by processing the low-density points. To verify the performance of the method, we test the synthetic dataset, the real-world dataset, and the Olivetti Face dataset, respectively. The simulation experimental results indicate that the AmDPC method can handle low-density points more effectively and has certain effectiveness and robustness. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

19. A fault diagnosis framework using unlabeled data based on automatic clustering with meta-learning.

Author: Zhao, Zhiqian, Jiao, Yinghou, Xu, Yeyin, Chen, Zhaobo, and Zio, Enrico
Subjects: *DATA augmentation, *MACHINE learning, *INTERNET of things, *DIAGNOSIS methods, *PROBLEM solving, *CASCADE control
Abstract: With the growth of the industrial internet of things, the poor performance of conventional deep learning models hinders the application of intelligent diagnosis methods in industrial situations such as lack of fault samples and difficulties in data labeling. To solve the above problems, we propose a fault diagnosis framework based on unsupervised meta-learning and contrastive learning, which is called automatic clustering with meta-learning (ACML). First, the amount of data is expanded through data augmentation approaches, and a feature generator is constructed to extract highly discriminative features from the unlabeled dataset using contrastive learning. Then, a cluster generator is used to automatically divide cluster partitions and add pseudo-labels for these. Finally, the classification tasks are derived by taking original samples in the partitions, which are embedded in the meta-learner for fault diagnosis. In the meta-learning stage, we split out two subsets from the task and feed them into the inner and outer loops to maintain the class consistency of the real labels. After training, ACML transfers its prior expertise to the unseen task to efficiently complete the categorization of new faults. ACML is applied to two cases, concerning a public dataset and a self-constructed dataset; the results demonstrate that ACML achieves good diagnostic performance, outperforming popular unsupervised methods. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

20. A novel density deviation multi-peaks automatic clustering algorithm

Author: Wei Zhou, Limin Wang, Xuming Han, Milan Parmar, and Mingyang Li
Subjects: Automatic clustering, Density peaks clustering, Density deviation, Low-density points, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
Abstract: The density peaks clustering (DPC) algorithm is a classical and widely used clustering method. However, the DPC algorithm requires manual selection of cluster centers, relies on a single way of calculating density, and cannot effectively handle low-density points. To address the above issues, we propose a novel density deviation multi-peaks automatic clustering method (AmDPC) in this paper. Firstly, we propose a new local density and use the deviation to measure the relationship between data points and the cut-off distance ($d_c$). Secondly, we divide the density deviation into multiple density levels equally and extract the points with higher distances in each density level. Finally, for the multi-peak points with higher distances at low-density levels, we merge them according to the size difference of the density deviation. We finally achieve the overall automatic clustering by processing the low-density points. To verify the performance of the method, we test it on the synthetic dataset, the real-world dataset, and the Olivetti Face dataset, respectively. The simulation experimental results indicate that the AmDPC method can handle low-density points more effectively and shows a degree of effectiveness and robustness.
Published: 2022
Full Text: View/download PDF
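
The abstract of result 20 builds on classical density peaks clustering (DPC). As a rough illustration of that baseline only (not of AmDPC, whose density deviation and level merging are not specified here), the sketch below computes the two classical DPC quantities: the local density rho (number of neighbours closer than the cut-off distance d_c) and delta (distance to the nearest point of higher density). The function name, the sample points, and the tie-breaking by index order are assumptions made for this example.

```python
import math

def density_peaks(points, d_c):
    """Classical DPC quantities with a cut-off kernel (illustrative sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cut-off distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties keep index order (assumption).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
        else:
            # Distance to the nearest point that ranks higher in density.
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta

# Two well-separated groups of hypothetical 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
rho, delta = density_peaks(points, d_c=0.5)
# gamma = rho * delta is a common score for ranking candidate centers.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

In classical DPC the number of centers (2 in this example) is chosen by inspecting a decision graph; removing that manual step is the problem the abstract addresses.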

21. A hybrid genetic-fuzzy ant colony optimization algorithm for automatic K-means clustering in urban global positioning system.

Author: Ran, Xiaojuan, Suyaroj, Naret, Tepsan, Worawit, Ma, Jianghong, Zhou, Xiangbing, and Deng, Wu
Subjects: *GLOBAL Positioning System, *K-means clustering, *ANT algorithms
Published: 2024
Full Text: View/download PDF

22. Improved SOSK-Means Automatic Clustering Algorithm with a Three-Part Mutualism Phase and Random Weighted Reflection Coefficient for High-Dimensional Datasets.

Author: Ikotun, Abiodun M. and Ezugwu, Absalom E.
Subjects: FUZZY algorithms, CENTROID, METAHEURISTIC algorithms, REFLECTANCE, K-means clustering, CENTRAL limit theorem, OUTLIER detection, SEARCH algorithms
Abstract: Automatic clustering problems require clustering algorithms to automatically estimate the number of clusters in a dataset. However, the classical K-means requires the specification of the required number of clusters a priori. To address this problem, metaheuristic algorithms are hybridized with K-means to extend the capacity of K-means in handling automatic clustering problems. In this study, we proposed an improved version of an existing hybridization of the classical symbiotic organisms search algorithm with the classical K-means algorithm to provide robust and optimum data clustering performance in automatic clustering problems. Moreover, the classical K-means algorithm is sensitive to noisy data and outliers; therefore, we proposed the exclusion of outliers from the centroid update's procedure, using a global threshold of point-to-centroid distance distribution for automatic outlier detection, and subsequent exclusion, in the calculation of new centroids in the K-means phase. Furthermore, a self-adaptive benefit factor with a three-part mutualism phase is incorporated into the symbiotic organism search phase to enhance the performance of the hybrid algorithm. A population size of 40 + 2 g was used for the symbiotic organism search (SOS) algorithm for a well distributed initial solution sample, based on the central limit theorem that the selection of the right sample size produces a sample mean that approximates the true centroid on Gaussian distribution. The effectiveness and robustness of the improved hybrid algorithm were evaluated on 42 datasets. The results were compared with the existing hybrid algorithm, the standard SOS and K-means algorithms, and other hybrid and non-hybrid metaheuristic algorithms. Finally, statistical and convergence analysis tests were conducted to measure the effectiveness of the improved algorithm. 
The results of the extensive computational experiments showed that the proposed improved hybrid algorithm outperformed the existing SOSK-means algorithm and demonstrated superior performance compared to some of the competing hybrid and non-hybrid metaheuristic algorithms. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
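
The abstract of result 22 describes excluding outliers from the K-means centroid update by applying a global threshold to the point-to-centroid distance distribution. The sketch below shows one way such an update could look. The threshold rule used here (mean plus z population standard deviations, with z = 2) is an assumption for illustration; the abstract does not state the rule the authors use, and the function name and data are hypothetical.

```python
import math
import statistics

def update_centroids(points, labels, centroids, z=2.0):
    """One centroid update that ignores points beyond a global distance threshold."""
    # Distance of every point to its currently assigned centroid.
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    # Assumed rule: a single global cut-off over all point-to-centroid distances.
    threshold = statistics.mean(dists) + z * statistics.pstdev(dists)
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, l, d in zip(points, labels, dists)
                   if l == k and d <= threshold]
        if members:
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            updated.append(old)  # keep the old centroid if nothing remains
    return updated

# Hypothetical data: cluster 0 contains one far-away outlier at (30, 30).
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (30.0, 30.0),
          (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
new_centroids = update_centroids(points, labels, [(1.0, 1.0), (11.0, 11.0)])
```

With the outlier included, the first centroid would move to (6.8, 6.8); with the exclusion it stays at the center of the four nearby points.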

23. Consensus Nature Inspired Clustering of Single-Cell RNA-Sequencing Data

Author: Amany H. Abou El-Naga, Sabah Sayed, Akram Salah, and Heba Mohsen
Subjects: Single-cell RNA-seq, automatic clustering, unsupervised learning, swarm intelligence, metaheuristic algorithms, consensus clustering, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Single-cell RNA sequencing (scRNA-seq) enables quantification of mRNA expression at the level of individual cells. scRNA-seq uncovers cellular heterogeneity, giving insights into the expression profiles of distinct cells and revealing cellular differentiation. The rapid advancements in scRNA-seq technologies enable researchers to explore questions regarding cancer heterogeneity and the tumor microenvironment. Analyzing, and mainly clustering, scRNA-seq data is computationally challenging due to its noisy, high-dimensional nature. In this paper, a computational clustering approach is proposed to cluster scRNA-seq data based on consensus clustering using swarm intelligent optimization algorithms to accurately recognize cell subtypes. The proposed approach uses variational auto-encoders to handle the curse of dimensionality, as it operates to create a latent biologically relevant feature space representing the original data. The new latent space is then clustered using the Particle Swarm Optimization Algorithm, the Multi-Verse Optimization Algorithm and the Grey Wolf Optimization Algorithm. A consensus solution is found using the solutions returned by the swarm intelligent algorithms. The proposed approach automatically derives the number of clusters without any prior knowledge. To evaluate the performance of the proposed approach, a total of four datasets were used, and a comparison against the existing methods in the literature was performed. Experimental results show that the proposed approach performs better than the most widely used tools, achieving an adjusted Rand index of 0.95, 0.75, 0.88, and 0.9 for the Biase, Goolam, Melanoma cancer and Lung cancer datasets, respectively.
Published: 2022
Full Text: View/download PDF
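
Result 23 reports its scores as adjusted Rand index (ARI) values. For readers unfamiliar with the measure, the following is a minimal sketch of the standard pair-counting ARI formula; the function name and the toy label vectors are assumptions for this example and are unrelated to the datasets in the abstract.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI; expects two label lists of equal length >= 2."""
    n = len(labels_true)
    # Contingency counts: how many items share each (true, predicted) label pair.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1]
same_up_to_renaming = [1, 1, 1, 0, 0, 0]
over_split = [0, 0, 1, 1, 2, 2]
ari_same = adjusted_rand_index(truth, same_up_to_renaming)
ari_split = adjusted_rand_index(truth, over_split)
```

The index is 1 for partitions that agree up to label renaming and is close to 0 for agreement at chance level, which is why it is preferred over raw accuracy when cluster labels are arbitrary.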

24. Attributed Network Embedding Based on Matrix Factorization and Community Detection

Author: XU Xin-li, XIAO Yun-yue, LONG Hai-xia, YANG Xu-hua, MAO Jian-fei
Subjects: attributed network embedding, matrix factorization, automatic clustering, community detection, curvature, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: An attributed network contains not only the complex topological structure but also the nodes with rich attribute information. It can be used to more effectively model modern information systems than traditional networks. Community detection of the attributed network has important research value in hierarchical analysis of complex systems, control of information propagation in the network, and prediction of group behavior of network users. In order to make better use of topology information and attribute information for community discovery, an attributed network embedding based on matrix factorization and community detection (CDEMF) is proposed. First, an attributed network embedding method based on matrix factorization is proposed to model the attributed proximity and the similarity of adjacent nodes calculated in terms of the local link information of the network, where the low-dimensional embedding vector corresponding to each node can be obtained by a distributed algorithm of matrix decomposition; that is, the network nodes can be mapped into a collection of data points represented by low-dimensional vectors. Then the community detection method based on curvature and modularity is developed to achieve attributed network community division by clustering the data point set, which can automatically determine the number of communities contained in the data point set. CDEMF is compared with 8 other well-known approaches on public real network datasets. The experimental results demonstrate the effectiveness and superiority of CDEMF.
Published: 2021
Full Text: View/download PDF

25. An automatic affinity propagation clustering based on improved equilibrium optimizer and t-SNE for high-dimensional data.

Author: Duan, Yuxian, Liu, Changyun, Li, Song, Guo, Xiangke, and Yang, Chunlin
Subjects: *SWARM intelligence, *MACHINE learning, *EQUILIBRIUM, *DATA distribution, *PROBLEM solving, *ALGORITHMS, *CONSUMER preferences
Abstract: Automatic clustering and dimension reduction are two of the most intriguing topics in the field of clustering. Affinity propagation (AP) is a representative graph-based clustering algorithm in unsupervised learning. However, extracting features from high-dimensional data and providing satisfactory clustering results is a serious challenge for the AP algorithm. Besides, the clustering performance of the AP algorithm is sensitive to preference. In this paper, an improved affinity propagation based on optimization of preference (APBOP) is proposed for automatic clustering on high-dimensional data. This method is optimized to solve the difficult problem of determining the preference of affinity propagation and the poor clustering effect for non-convex data distribution. First, t-distributed stochastic neighbor embedding is introduced to reduce the dimensionality of the original data to solve the redundancy problem caused by excessively high dimensionality. Second, an improved hybrid equilibrium optimizer based on the crisscross strategy (HEOC) is proposed to optimize preference selection. HEOC introduces the crisscross strategy to enhance local search and convergence efficiency. The benchmark function experiments indicate that the HEOC algorithm has better accuracy and convergence rate than other swarm intelligence algorithms. Simulation experiments on high-dimensional and real-world datasets show that APBOP has better effectiveness. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. Automatic Clustering for Improved Radio Environment Maps in Distributed Applications

Author: Haithem Ben Chikha and Alaa Alaerjan
Subjects: automatic clustering, edge computing, K-means clustering, Kriging technique, radio environment map, reference signal received power, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Wireless communication greatly contributes to the evolution of new technologies, such as the Internet of Things (IoT) and edge computing. The new generation networks, including 5G and 6G, provide several connectivity advantages for multiple applications, such as smart health systems and smart cities. Adopting wireless communication technologies in these applications is still challenging due to factors such as mobility and heterogeneity. Predicting accurate radio environment maps (REMs) is essential to facilitate connectivity and improve resource utilization. The construction of accurate REMs through the prediction of reference signal received power (RSRP) can be useful in densely distributed applications, such as smart cities. However, predicting an accurate RSRP in the applications can be complex due to intervention and mobility aspects. Given the fact that the propagation environments can be different in a specific area of interest, the estimation of a common path loss exponent for the entire area produces errors in the constructed REM. Hence, it is necessary to use automatic clustering to distinguish between different environments by grouping locations that exhibit similar propagation characteristics. This leads to better prediction of the propagation characteristics of other locations within the same cluster. Therefore, in this work, we propose using the Kriging technique, in conjunction with the automatic clustering approach, in order to improve the accuracy of RSRP prediction. In fact, we adopt K-means clustering (KMC) to enhance the path loss exponent estimation. We use a dataset to test the proposed model using a set of comparative studies. The results showed that the proposed approach provides significant RSRP prediction capabilities for constructing REM, with a gain of about 3.3 dB in terms of root mean square error compared to the case without clustering.
Published: 2023
Full Text: View/download PDF

27. Automatic Data Clustering Using Hybrid Chaos Game Optimization with Particle Swarm Optimization Algorithm.

Author: Ouertani, Mohamed Wajdi, Manita, Ghaith, and Korbaa, Ouajdi
Subjects: PARTICLE swarm optimization, MATHEMATICAL optimization, IMAGE processing
Abstract: In cluster analysis, classical approaches suffer from the problem of identifying the number of clusters, known as the automatic clustering problem. Therefore, automatic clustering has become a popular research area and offers opportunities in various data analysis applications such as bioinformatics, medicine, image processing and consumer segmentation. It is considered an NP-complete problem for which it is preferable to use approximate approaches. In this study, we propose a hybrid approach between chaos game optimization and particle swarm optimization (CGOPSO). The Davies-Bouldin index (DBI) is used as the main objective of the proposed approach with the purpose of finding the most accurate number of cluster centroids and their positions. To assess its performance, we compared CGOPSO with other existing algorithms in the literature over 12 classical datasets using two different validity indexes: the Davies-Bouldin index (DBI) and the Compact-Separated index (CSI). The experimental results have demonstrated that CGOPSO shows better performance than other algorithms. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
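
Result 27 uses the Davies-Bouldin index (DBI) as the objective that a metaheuristic minimizes while searching over cluster counts and centroid positions. The sketch below shows only the index itself, in its common form (mean distance to the centroid as scatter, Euclidean distance between centroids as separation); it is not the CGOPSO search. Function names and data are assumptions for this example.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def davies_bouldin(clusters):
    """DBI of a partition given as a list of at least two non-empty clusters."""
    cents = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance of the members of cluster i to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical data: two compact squares far apart.
a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
natural = [a, b]                 # k = 2, matches the data
over_split = [a[:2], a[2:], b]   # k = 3, splits one square in half
dbi_natural = davies_bouldin(natural)
dbi_over_split = davies_bouldin(over_split)
```

Lower values indicate more compact, better separated clusters, so a search that minimizes DBI over candidate partitions would prefer the two-cluster solution in this example.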

28. Automatic Clustering of DNA Sequences With Intelligent Techniques

Author: Yasmin A. Badr, Khaled T. Wassif, and Mahmoud Othman
Subjects: DNA sequences, automatic clustering, pulse coupled neural network, genetic algorithm, bat algorithm, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: With the discovery of new DNAs, a fundamental problem arising is how to categorize those DNA sequences into the correct species. Unfortunately, identifying all data groups correctly and assigning a set of DNAs to k clusters, where k must be predefined, is one of the major drawbacks in clustering analysis, especially when the data have many dimensions and the number of clusters is too large and hard to guess. Furthermore, finding a similarity measure that preserves the functionality and represents both the composition and distribution of the bases in a DNA sequence is one of the main challenges in computational biology. In this paper, a new soft computing metaheuristic framework is introduced for automatic clustering to generate the optimal cluster formation and to determine the best estimate for the number of clusters. A pulse coupled neural network (PCNN) is utilized for the calculation of DNA sequence similarity or dissimilarity. The bat algorithm is hybridized with the well-known genetic algorithm to solve the automatic data clustering problem. Extensive computational experiments are conducted on the expanded human oral microbiome database (eHOMD). A comparative study of the experimental results shows that the proposed hybrid algorithm achieved superior performance over the standard genetic algorithm and bat algorithm. Moreover, the hybrid performance was compared with competing algorithms from the literature to ascertain its superiority. The Mann-Whitney-Wilcoxon rank-sum test is conducted to statistically validate the obtained clusters.
Published: 2021
Full Text: View/download PDF

29. A Hybrid Intelligent Model for the Condition Monitoring and Diagnostics of Wind Turbines Gearbox

Author: Azim Heydari, Davide Astiaso Garcia, Afef Fekih, Farshid Keynia, Lina Bertling Tjernberg, and Livio De Santoli
Subjects: Automatic clustering, condition monitoring, forecasting, GMDH neural network, multi-verse optimization, wind turbine assessment, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Wind turbines (WTs) are often operated in harsh and remote environments, thus making them more prone to faults and costly repairs. Additionally, the recent surge in wind farm installations has resulted in a dramatic increase in wind turbine data. Intelligent condition monitoring and fault warning systems are crucial to improving the efficiency and operation of wind farms and reducing maintenance costs. The gearbox is the major component that leads to turbine downtime. Its failures are mainly caused by the gearbox bearings. Devising condition monitoring approaches for the gearbox bearings is an effective predictive maintenance measure that can reduce downtime and cut maintenance cost. In this paper, we propose a hybrid intelligent condition monitoring and fault warning system for wind turbine gearboxes. The proposed framework encompasses the following: a) a clustering filter (based on power, rotor speed, blade pitch angle, and wind speed signals) using the automatic clustering model and the ant bee colony optimization algorithm (ABC), b) prediction of gearbox bearing temperature and lubrication oil temperature signals using variational mode decomposition (VMD), the group method of data handling (GMDH) network, and the multi-verse optimization (MVO) algorithm, and c) anomaly detection based on the Mahalanobis distances and a wavelet transform denoising approach. The proposed condition monitoring system was evaluated using 10 min average SCADA datasets of two 2 MW on-shore wind turbines located in the south of Sweden. The results showed that this strategy can diagnose potential anomalies prior to failure and inhibit reporting alarms in healthy operations.
Published: 2021
Full Text: View/download PDF

30. Automatic Domain Decomposition in Finite Element Method -- A Comparative Study.

Author: Kaveh, Ali, Seddighian, Mohammad Reza, and Hassani, Pouya
Subjects: *FINITE element method, *GRAPH theory, *COMPARATIVE studies
Abstract: In this paper, an automatic data clustering approach is presented using some concepts of graph theory. Several cluster validity indices (CVIs) are mentioned, and the DB index is defined as the objective function of the meta-heuristic algorithms. Six finite element meshes, comprising simple and complex two- and three-dimensional types, are decomposed. Six meta-heuristic algorithms are utilized to determine the optimal number of clusters and solve the decomposition problem. Finally, the corresponding statistical results are compared. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

31. K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions.

Author: Ikotun, Abiodun M., Almutari, Mubarak S., and Ezugwu, Absalom E.
Abstract: K-means clustering algorithm is a partitional clustering algorithm that has been used widely in many applications for traditional clustering due to its simplicity and low computational complexity. This clustering technique depends on the user specification of the number of clusters generated from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers results in its local minimal convergence. Automatic clustering is a recent approach to clustering where the specification of cluster number is not required. In automatic clustering, natural clusters existing in datasets are identified without any background information of the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges of the traditional clustering algorithm in handling automatic data clustering. Some nature-inspired metaheuristics algorithms have been hybridized with the traditional K-means algorithm to boost its performance and capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvements of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. More so, the analysis revealed that although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to solve clustering problems with unlabeled and nonlinearly separable datasets has been clearly observed in the study. 
The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. It is noteworthy to mention that the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

32. A Hybrid Validity Index to Determine K Parameter Value of k-Means Algorithm for Time Series Clustering.

Author: Ozkok, Fatma Ozge and Celik, Mete
Subjects: K-means clustering, TIME series analysis, DATA mining
Abstract: A time series is a set of sequential data points in time order. The sizes and dimensions of time series datasets are increasing day by day. Clustering is an unsupervised data mining technique that groups objects based on their similarities. It is used to analyze various datasets, such as finance, climate, and bioinformatics datasets. k-means is one of the most used clustering algorithms. However, it is challenging to determine the value of the k parameter, which is the number of clusters. One of the most used methods to determine the number of clusters (k) is cluster validity indexes. Several internal and external validity indexes are used to find suitable cluster numbers based on characteristics of datasets. In this study, we propose a hybrid validity index to determine the value of the k parameter of the k-means algorithm. The proposed hybrid validity index comprises four internal validity indexes: the Dunn, Silhouette, C, and Davies–Bouldin indexes. The proposed method was applied to nine real-life finance and benchmark time series datasets. The financial dataset was obtained from Yahoo Finance and consists of daily closing data of stocks. The other eight benchmark datasets were obtained from the UCR time series classification archive. Experimental results showed that the proposed hybrid validity index is promising for finding the suitable number of clusters with respect to the other indexes for clustering time-series datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

33. Improved SOSK-Means Automatic Clustering Algorithm with a Three-Part Mutualism Phase and Random Weighted Reflection Coefficient for High-Dimensional Datasets

Author: Abiodun M. Ikotun and Absalom E. Ezugwu
Subjects: symbiotic organism search, K-means, clustering algorithms, hybrid metaheuristics, automatic clustering, outliers, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Automatic clustering problems require clustering algorithms to automatically estimate the number of clusters in a dataset. However, the classical K-means requires the specification of the required number of clusters a priori. To address this problem, metaheuristic algorithms are hybridized with K-means to extend the capacity of K-means in handling automatic clustering problems. In this study, we proposed an improved version of an existing hybridization of the classical symbiotic organisms search algorithm with the classical K-means algorithm to provide robust and optimum data clustering performance in automatic clustering problems. Moreover, the classical K-means algorithm is sensitive to noisy data and outliers; therefore, we proposed the exclusion of outliers from the centroid update’s procedure, using a global threshold of point-to-centroid distance distribution for automatic outlier detection, and subsequent exclusion, in the calculation of new centroids in the K-means phase. Furthermore, a self-adaptive benefit factor with a three-part mutualism phase is incorporated into the symbiotic organism search phase to enhance the performance of the hybrid algorithm. A population size of 40+2g was used for the symbiotic organism search (SOS) algorithm for a well distributed initial solution sample, based on the central limit theorem that the selection of the right sample size produces a sample mean that approximates the true centroid on Gaussian distribution. The effectiveness and robustness of the improved hybrid algorithm were evaluated on 42 datasets. The results were compared with the existing hybrid algorithm, the standard SOS and K-means algorithms, and other hybrid and non-hybrid metaheuristic algorithms. Finally, statistical and convergence analysis tests were conducted to measure the effectiveness of the improved algorithm. 
The results of the extensive computational experiments showed that the proposed improved hybrid algorithm outperformed the existing SOSK-means algorithm and demonstrated superior performance compared to some of the competing hybrid and non-hybrid metaheuristic algorithms.
Published: 2022
Full Text: View/download PDF

34. Optimized data driven fault detection and diagnosis in chemical processes.

Author: Ardali, Nahid Raeisi, Zarghami, Reza, and Gharebagh, Rahmat Sotudeh
Subjects: *CHEMICAL processes, *FAULT diagnosis, *FEATURE selection, *METAHEURISTIC algorithms, *FEATURE extraction
Abstract: • A novel fault diagnosis scheme was proposed based on optimization methods. • Nonstationary and nonlinear multivariate chemical processes were analyzed. • NSGAII was utilized for feature selection and the t-SNE method was used as the feature extraction and visualization method. • DBSCAN, k-means, and CURE methods were utilized for non-automatic unsupervised learning investigation. • GA, ABC, DE, HS, and PSO, in combination with DB and CS clustering measures, were utilized for automatic unsupervised learning investigation. • The proposed method performed well for fault detection and diagnosis of chemical processes. Fault detection and diagnosis (FDD) is crucial for ensuring process safety and product quality in the chemical industry. Despite the large amounts of process data recorded and stored in chemical plants, most of them are not well-labeled, and their conditions are not adequately specified. In this study, an optimized data-driven FDD model was developed for a chemical process based on automatic clustering algorithms. Because data preprocessing is important, feature selection was performed by a non-dominated sorting genetic algorithm (NSGAII) based on k-means clustering. The optimal subset of features is selected by comparing clustering results for each subset. The performance of the proposed feature selection method was compared with the Fisher discriminant ratio (FDR) and XGBoost methods. The t-distributed stochastic neighbor embedding (t-SNE), Isomap, and KPCA dimension reduction methods were also employed for feature extraction. Finally, automatic clustering was performed based on metaheuristic algorithms for fault detection and diagnosis. Results were compared with non-automatic clustering methods. The performance of the proposed method was evaluated by examining the Tennessee Eastman and four water tank processes as case studies. 
The results showed that the proposed method is reliable and capable of online and offline chemical process fault detection and diagnosis. As a result, the findings of this study can be used to stabilize the operation of chemical processes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. A Comparative Performance Study of Hybrid Firefly Algorithms for Automatic Data Clustering

Author: Absalom El-Shamir Ezugwu, Moyinoluwa B. Agbaje, Nahla Aljojo, Rosanne Els, Haruna Chiroma, and Mohamed Abd Elaziz
Subjects: Automatic clustering, firefly algorithm, firefly-based hybrid algorithms, clustering validity index, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In cluster analysis, the goal has always been to devise the best possible means of automatically determining the number of clusters. However, because of the lack of prior domain knowledge and the uncertainty associated with data object characteristics, it is challenging to choose an appropriate number of clusters, especially when dealing with data objects of high dimensions, varying data sizes, and density. In the last few decades, different researchers have proposed and developed several nature-inspired metaheuristic algorithms to solve data clustering problems. Many studies have shown that the firefly algorithm is a very robust, efficient and effective nature-inspired swarm intelligence global search technique, which has been successfully applied to solve diverse NP-hard optimization problems. However, the diversification search process employed by the firefly algorithm can lead to reduced speed and convergence rate for large-scale optimization problems. Thus, this study investigates the application of four hybrid firefly algorithms to the task of automatic clustering of high-density and large-scale unlabelled datasets. In contrast to most of the existing classical heuristic-based data clustering analysis techniques, the proposed hybrid algorithms do not require any prior knowledge of the data objects to be classified. Instead, the hybrid methods automatically determine the optimal number of clusters empirically during program execution. Two well-known clustering validity indices, namely the Compact-Separated and Davies-Bouldin indices, are employed to evaluate the superiority of the implemented firefly hybrid algorithms. Furthermore, twelve standard ground truth clustering datasets from the UCI Machine Learning Repository are used to evaluate the robustness and effectiveness of the algorithms against those of the classical swarm optimization algorithms and other related clustering results from the literature. 
The experimental results show that the new clustering methods are clearly superior to existing standalone and other hybrid metaheuristic techniques in terms of clustering validity measures.
Published: 2020
Full Text: View/download PDF

36. Stock Price Prediction using Machine Learning and Swarm Intelligence

Author: I. Behravan and S. M. Razavi
Subjects: tehran stock exchange market, automatic clustering, feature selection, particle swarm optimization, support vector regression, Computer engineering. Computer hardware, TK7885-7895, Science
Abstract: Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this paper, a novel machine learning approach, which works in two phases, is introduced to predict the price of a stock on the next day based on the information extracted from the past 26 days. In the first phase of the method, an automatic clustering algorithm groups the data points into different clusters, and in the second phase a hybrid regression model, which is a combination of particle swarm optimization and support vector regression, is trained for each cluster. In this hybrid method, the particle swarm optimization algorithm is used for parameter tuning and feature selection. Results: The accuracy of the proposed method has been measured on the datasets of 5 companies active in the Tehran Stock Exchange market, through 5 different metrics. On average, the proposed method has shown 82.6% accuracy in predicting the stock price 1 day ahead. Conclusion: The achieved results demonstrate the capability of the method in detecting sudden jumps in the price of a stock.
Published: 2020
Full Text: View/download PDF

37. Early warning prediction of external force destruction in transmission lines based on automatic clustering model

Author: Dayan MA
Subjects: external force destruction, automatic clustering, Canopy, K-means, data analysis, Telecommunication, TK5101-6720, Technology
Abstract: The external force destruction has become a major threat to the safe and stable operation of overhead transmission lines, bringing difficulties to the defense and early warning work. In order to solve the problem that the traditional clustering center is difficult to determine accurately and susceptible to abnormal points, an automatic clustering method for data analysis work of transmission lines was presented, and external damage data was analyzed from time and space latitude. Firstly, the cluster center was initialized in this method by using Canopy algorithm. Then, the optimized K-means algorithm was used to perform clustering. Finally, the effectiveness of this method was proved by experimental analysis. This method will be applied to the GIS module in the power information system, which can realize the spatio-temporal visualization of the analysis results and provide powerful decision support for finding the cause of the external force damage of the transmission line.
Published: 2019
Full Text: View/download PDF

38. Automatic Fuzzy Clustering Using Non-Dominated Sorting Particle Swarm Optimization Algorithm for Categorical Data

Author: Thi Phuong Quyen Nguyen and R. J. Kuo
Subjects: Automatic clustering, categorical data, local density, NSPSO, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Categorical data clustering has attracted a lot of attention recently due to its necessity in real-world applications. Many clustering methods have been proposed for categorical data. However, most of the existing algorithms require a predefined number of clusters, which is usually unavailable in real-world problems. Only a few works have focused on automatic clustering, and these mainly handled numerical data. This study develops a novel automatic fuzzy clustering using non-dominated sorting particle swarm optimization (AFC-NSPSO) algorithm for categorical data. The proposed AFC-NSPSO algorithm can automatically identify the optimal number of clusters and exploit the clustering result with the corresponding selected number of clusters. In addition, a new technique is investigated to identify the maximum number of clusters in a dataset based on the local density. To select a final solution in the first Pareto front, some internal validation indices are used. The performance of the proposed AFC-NSPSO on the real-world datasets collected from the UCI machine learning repository exhibits effectiveness compared with some other existing automatic categorical clustering algorithms. Besides, this study also applies the proposed algorithm to analyze a real-world case study with an unknown number of clusters.
Published: 2019
Full Text: View/download PDF

39. Automatic Data Clustering Using Hybrid Firefly Particle Swarm Optimization Algorithm

Author: Moyinoluwa B. Agbaje, Absalom E. Ezugwu, and Rosanne Els
Subjects: Automatic clustering, firefly algorithm, particle swarm optimization, hybrid metaheuristic, compact-separated validity index, Davies-Bouldin validity index, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The firefly algorithm is a nature-inspired metaheuristic optimization algorithm that has become an important tool for solving most of the toughest optimization problems in almost all areas of global optimization and engineering practices. However, as with other metaheuristic algorithms, the performance of the firefly algorithm depends on adequate parameter tuning. In addition, its diversification as a global metaheuristic can lead to reduced speed, as well as an associated decrease in the rate of convergence when applied to solve problems with a large number of variables such as data clustering problems. Clustering is an unsupervised data analysis technique used for identifying homogeneous groups of objects based on the values of their attributes. To mitigate the aforementioned drawbacks, an improved firefly algorithm is hybridized with the well-known particle swarm optimization algorithm to solve automatic data clustering problems. To investigate the performance of the proposed hybrid algorithm, it is compared with four popular metaheuristic methods from literature using twelve standard datasets from the UCI Machine Learning Repository and the two moons dataset. The extensive computational experiments and results analysis carried out show that the proposed algorithm not only achieves superior performance over the standard firefly and particle swarm optimization algorithms, but also exhibits a high level of stability and can be efficiently utilized to solve other clustering problems with high dimensionality.
Published: 2019
Full Text: View/download PDF

40. Clustering by Search in Descending Order and Automatic Find of Density Peaks

Author: Tong Liu, Hangyu Li, and Xudong Zhao
Subjects: Density-based clustering, density peaks clustering, automatic clustering, density categorization, cluster merging, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Clustering by fast search and find of density peaks, published in the journal Science in 2014, is a density-based clustering technique which not only makes it unnecessary to determine the number of clusters in advance, but is also able to recognize clusters of arbitrary shapes. Due to a manual selection of clustering centers on a decision graph, samples which belong to one cluster may be assigned to two or more clusters and vice versa. On the assumption that boundary points which keep comparable densities with cluster centers should be regarded as inner points, we propose a new method which not only can find all possible clusters automatically but also can combine those with similarities simultaneously to obtain the final clusters. Unlike clustering by fast search and find of density peaks, we focus only on densities, discarding the relative metric which measures the minimum distance between a cluster center and a point with a higher density. Qualitative and quantitative experimental results on sufficient datasets demonstrate the effectiveness of our method.
Published: 2019
Full Text: View/download PDF

41. Improvements for Determining the Number of Clusters in k-Means for Innovation Databases in SMEs.

Author: Viloria, Amelec and Pineda Lezama, Omar Bonerge
Subjects: K-means clustering, ACTIVATION energy, QUALITY control charts, SMALL business, DATABASES
Abstract: The Automatic Clustering using Differential Evolution (ACDE) is one of the grouping methods capable of automatically determining the number of clusters. However, ACDE still relies on a manual strategy to determine the activation threshold of k, which affects its performance. In this study, this problem of ACDE is addressed using the U Control Chart (UCC). The performance of the proposed method was tested using five data sets from the National Administrative Department of Statistics (DANE - Departamento Administrativo Nacional de Estadísticas) and the Ministry of Commerce, Industry, and Tourism of Colombia for the innovative capacity of Small and Medium-sized Enterprises (SMEs) and was assessed by the Davies Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that the proposed method yields excellent performance compared to prior research for most datasets, with the optimal cluster number and the lowest DBI and CS measure. It can be concluded that the UCC method is able to determine the k activation threshold in ACDE, which led to effective determination of the cluster number for k-means clustering. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

42. K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions

Author: Abiodun M. Ikotun, Mubarak S. Almutari, and Absalom E. Ezugwu
Subjects: K-means clustering, automatic clustering, nature-inspired metaheuristic algorithms, cluster analysis, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: K-means clustering algorithm is a partitional clustering algorithm that has been used widely in many applications for traditional clustering due to its simplicity and low computational complexity. This clustering technique depends on the user specification of the number of clusters generated from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers results in its local minimal convergence. Automatic clustering is a recent approach to clustering where the specification of cluster number is not required. In automatic clustering, natural clusters existing in datasets are identified without any background information of the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges of the traditional clustering algorithm in handling automatic data clustering. Some nature-inspired metaheuristic algorithms have been hybridized with the traditional K-means algorithm to boost its performance and capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvements of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. More so, the analysis revealed that although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to solve clustering problems with unlabeled and nonlinearly separable datasets has been clearly observed in the study. The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. It is noteworthy to mention that the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community.
Published: 2021
Full Text: View/download PDF

43. Automatic clustering of data from sampling and evaluating of neuro-fuzzy network for estimating the distribution of Bemisia tabaci (Hem.: Aleyrodidae)

Author: Bahram Tafaghodinia and Alireza Shabani Nejad
Subjects: automatic clustering, bemisia tabaci, genetic algorithm, neuro fuzzy network, Veterinary medicine, SF600-1100
Abstract: In this study, a neuro-fuzzy network was used to estimate the spatial distribution of Bemisia tabaci in a cucumber field in Behbahan. Pest density assessments were performed based on a 10 m × 10 m grid pattern with a total of 100 sampling units. In this method, latitude and longitude information was used as the input data, and the output of the method was the number of pests. To determine the sensitivity of this method to different levels of the pest after collecting samples, an automatic clustering method was used to determine the number of clusters, and the Davies and Bouldin index was used as the evaluation criterion. To find the answer, the Clustering Search Space Genetic Algorithm was used. The Davies and Bouldin index (0.46) showed that the data should be divided into three clusters. Results indicated that the average, variance, statistical distribution and coefficient of determination of the observed and the estimated Bemisia tabaci density were not significantly different. Our map showed that the patchy pest distribution offers large potential for using site-specific pest control on this field.
Published: 2017

44. Automatic clustering by multi-objective genetic algorithm with numeric and categorical features.

Author: Dutta, Dipankar, Sil, Jaya, and Dutta, Paramartha
Subjects: *GENETIC algorithms, *CATEGORIES (Mathematics), *PLURALITY voting, *MACHINE learning
Abstract: • We have developed a clustering algorithm for an unknown number of clusters by MOGA. • It works with continuous and categorical featured data sets. • It can work with data sets having missing values. • The final solution is selected by majority vote by all non-dominated solutions. • Context-sensitive and cluster-oriented genetic operators are designed. Many clustering algorithms categorized as K-clustering algorithms require the user to predict the number of clusters (K) to do clustering. Due to lack of domain knowledge an accurate value of K is difficult to predict. The problem becomes critical when the dimensionality of data points is large; clusters differ widely in shape, size, and density; and when clusters are overlapping in nature. Determining the suitable K is an optimization problem. Automatic clustering algorithms can discover the optimal K. This paper presents an automatic clustering algorithm which is superior to K-clustering algorithms as it can discover an optimal value of K. Iterative hill-climbing algorithms like K-Means work on a single solution and converge to a local optimum solution. Here, Genetic Algorithms (GAs) find out near global optimum solutions, i.e. optimal K as well as the optimal cluster centroids. Single-objective clustering algorithms are adequate for efficiently grouping linearly separable clusters. For non-linearly separable clusters they are not so good. So for grouping non-linearly separable clusters, we apply Multi-Objective Genetic Algorithm (MOGA) by minimizing the intra-cluster distance and maximizing inter-cluster distance. Many existing MOGA based clustering algorithms are suitable for either numeric or categorical features. This paper pioneered employing MOGA for automatic clustering with mixed types of features. Statistical testing on experimental results on real-life benchmark data sets from the University of California at Irvine (UCI) machine learning repository proves the superiority of the proposed algorithm. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

45. PSO-based Dynamic Distributed Algorithm for Automatic Task Clustering in a Robotic Swarm.

Author: Asma, Ayari and Sadok, Bouamama
Subjects: PARTICLE swarm optimization, AGGREGATION (Robotics), DISTRIBUTED algorithms, TRAVELING salesman problem, GENETIC algorithms
Abstract: The Multi-Robot Task Allocation (MRTA) problem has recently become a key research topic. Task allocation is the problem of mapping tasks to robots, such that the most appropriate robot is selected to perform the most fitting task, leading to all tasks being optimally accomplished. Expanding the number of tasks and robots may cause the collaboration among the robots to become tougher. Since this process requires high computational time, this paper describes a technique that reduces the size of the explored state space, by partitioning the tasks into clusters. In real-world problems, information regarding the number of clusters is ordinarily absent. Hence, dynamic clustering is promising for partitioning the tasks into an appropriate number of clusters. In this paper, we address the problem of MRTA by putting forward a new simple, automatic and efficient clustering algorithm of the robots' tasks based on a dynamic distributed particle swarm optimization, namely, ACD2PSO. Our approach is made up of two stages: stage I groups the tasks into clusters using the dynamic distributed particle swarm optimization (D2PSO) algorithm and stage II allocates the robots to the clusters. The assignment of robots to the clusters is represented as multiple traveling salesman problems (MTSP). Computational experiments were carried out to prove the effectiveness of our approach in terms of clustering time, cost, and the MRTA time, compared to the distributed particle swarm optimization (dPSO) and genetic algorithm (GA). Thanks to the D2PSO algorithm, stagnation and local optima issues are avoided by adding assorted variety to the population, without losing the fast convergence of PSO. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

46. Real-time Kinect-based air-writing system with a novel analytical classifier.

Author: Mohammadi, Shahram and Maleki, Reza
Abstract: Air-writing is an attractive method of interaction between human and machine due to lack of any interface device on the user side. After removing existing limitations and solving the current challenges, it can be used in many applications in the future. In this paper, using the Kinect depth and color images, an air-writing system is proposed to identify single characters such as digits or letters and connected characters such as numbers or words. In this system, automatic clustering, slope variations detection, and a novel analytical classification are proposed as new approaches to eliminate noise in the trajectory from the depth image and hand segmentation, to extract the feature vector, and to identify the character from the feature vector, respectively. Experimental results show that the proposed system can successfully identify single characters and connected characters with the average recognition rate of 97%. It provides a better result than other similar approaches proposed in the literature. In the proposed system, the character recognition time is quite low, about 3 ms, because of using a novel analytical classifier. Evaluation of 4 classifiers shows that the proposed classifier has a higher speed and precision than the SVM, HMM, and K-nearest neighbors classifiers. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

47. Improvements for Determining the Number of Clusters in k-Means for Innovation Databases in SMEs.

Author: Viloria, Amelec and Pineda Lezama, Omar Bonerge
Subjects: K-means clustering, ACTIVATION energy, QUALITY control charts, SMALL business, DATABASES
Abstract: The Automatic Clustering using Differential Evolution (ACDE) is one of the grouping methods capable of automatically determining the number of clusters. However, ACDE still relies on a manual strategy to determine the activation threshold of k, which affects its performance. In this study, this problem of ACDE is addressed using the U Control Chart (UCC). The performance of the proposed method was tested using five data sets from the National Administrative Department of Statistics (DANE - Departamento Administrativo Nacional de Estadísticas) and the Ministry of Commerce, Industry, and Tourism of Colombia for the innovative capacity of Small and Medium-sized Enterprises (SMEs) and was assessed by the Davies Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that the proposed method yields excellent performance compared to prior research for most datasets, with the optimal cluster number and the lowest DBI and CS measure. It can be concluded that the UCC method is able to determine the k activation threshold in ACDE, which led to effective determination of the cluster number for k-means clustering. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

48. A novel combinatorial merge-split approach for automatic clustering using imperialist competitive algorithm.

Author: Aliniya, Zahra and Mirroshandel, Seyed Abolghasem
Subjects: *COMBINATORIAL optimization, *IMPERIALIST competitive algorithm, *CLASSIFICATION algorithms, *NUMBER theory, *CLUSTER analysis (Statistics), *PROBLEM solving
Abstract: Highlights: • Improving results by combining random and homogeneity based merge-split method. • Reducing number of empty clusters by attention to data density for selecting center. • Avoid falling into local optimum points in the proposed assimilation step. • State-of-the-art accuracies for solving automatic clustering problems. Abstract: Cluster analysis has a wide application in many areas, including pattern recognition, information retrieval, and image processing. In most real-world clustering problems, the number of clusters must be predetermined. Automatic clustering is a promising solution for this challenge which automatically determines the number and structure of clusters in data. In recent years, evolutionary algorithms, due to their search mechanisms, have been popular in solving automatic clustering problems. Imperialist Competitive Algorithm (ICA) is a successful evolutionary algorithm. In this paper, for the first time, Imperialist Competitive Algorithm (ICA) is used for solving automatic clustering problems, called "the automatic clustering using ICA (AC-ICA)". In the proposed algorithm, in order to increase the exploration ability, the movement of colonies toward the imperialist was changed at the assimilation step. A new method has been provided for changing the number of clusters by combining random and homogeneity based merge-split approach. Furthermore, an efficient method based on density has been proposed for reinitializing empty cluster centers. To use AC-ICA in automatic clustering, the initialization and imperialist competition steps were changed. Based on changes in these two steps, a framework was provided for changing different types of ICA to solve automatic clustering problems. Then, the basic ICA and its three recently developed types were changed by this framework and their performances in automatic clustering were compared to AC-ICA. The examinations were done on six synthetic and ten real-world data sets. The comparison of the proposed algorithm's results with basic ICA, its three recently developed types and several state-of-the-art automatic clustering methods shows AC-ICA's superiority in terms of the speed of convergence to the optimal solution and quality of the obtained solution. We also applied our algorithm to a real-world application (i.e., face recognition) and the achieved results were acceptable. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

49. Early warning prediction of external force destruction in transmission lines based on automatic clustering model.

Author: MA Dayan
Abstract: The external force destruction has become a major threat to the safe and stable operation of overhead transmission lines, bringing difficulties to the defense and early warning work. In order to solve the problem that the traditional clustering center is difficult to determine accurately and susceptible to abnormal points, an automatic clustering method for data analysis work of transmission lines was presented, and external damage data was analyzed from time and space latitude. Firstly, the cluster center was initialized in this method by using Canopy algorithm. Then, the optimized K-means algorithm was used to perform clustering. Finally, the effectiveness of this method was proved by experimental analysis. This method will be applied to the GIS module in the power information system, which can realize the spatio-temporal visualization of the analysis results and provide powerful decision support for finding the cause of the external force damage of the transmission line. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

50. DiscontClust: open program for automatic grouping of discontinuity orientations data.

Author: Suarez-Burgoa, Ludger O.
Subjects: *REGRESSION discontinuity design, *DATA analysis, *INFORMATION storage & retrieval systems, *ALGORITHMS, *DATA
Abstract: This article describes the algorithm of the spectral method that was chosen to create an applied computational program, which performs an automatic clustering of oriented data of discontinuity measurements in a rock mass. The differentiation of discontinuity families into clusters is normally done by a heuristic selection of sets based on a density diagram of discontinuities. However, an automatic clustering based on a statistical algorithm may be faster and easier to replicate. In addition, an automatic clustering that has been presented as an open program is useful for implementation in other calculations in the process of data analysis of discontinuity orientations. The article explains in detail the algorithms and their implementation; it includes validations and examples that show that the program can be used. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
