33 results on '"Medoid"'
Search Results
2. Incorporating Fuzziness to CLARANS
- Author
-
Ghosh, Sampreeti, Mitra, Sushmita, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Chaudhury, Santanu, editor, Mitra, Sushmita, editor, Murthy, C. A., editor, Sastry, P. S., editor, and Pal, Sankar K., editor
- Published
- 2009
- Full Text
- View/download PDF
3. MACOC: A medoid-based ACO clustering algorithm
- Author
-
Héctor D. Menéndez, David Camacho, Fernando E. B. Otero, UAM. Departamento de Ingeniería Informática, and Análisis de Datos e Inteligencia Aplicada (ING EPS-012)
- Subjects
Informática ,Computer science ,Centroid ,computer.software_genre ,Partition (database) ,Medoid ,Ant Colony Optimization ,Clustering ,Machine Learning ,ComputingMethodologies_PATTERNRECOGNITION ,Unsupervised learning ,Data Mining ,Data mining ,Q335 ,Cluster analysis ,computer - Abstract
Proceedings of 9th International Conference, ANTS 2014, Brussels, Belgium, September 10-12, 2014., he final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-09952-1_11, The application of ACO-based algorithms in data mining is growing over the last few years and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works concerning unsupervised learning have been focused on clustering, showing great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique to a medoid-based technique, where the properties of the search space are not necessarily known. Instead, it only relies on the information about the distances amongst data. The new algorithm, called MACOC, has been compared against well-known algorithms (K-means and Partition Around Medoids) and with ACOC. The experiments measure the accuracy of the algorithm for both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository., This work has been partly supported by: Spanish Ministry of Science and Education under project TIN2010-19872 and Savier – an Airbus Defense & Space project (FUAM-076914 and FUAM-076915).
- Published
- 2014
- Full Text
- View/download PDF
4. Cluster Analysis by a Class of Splicing P Systems
- Author
-
Junli Xu, Jie Xue, and Xiyu Liu
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Theoretical computer science ,Computer science ,RNA splicing ,Cluster (physics) ,Process (computing) ,Cluster analysis ,Class (biology) ,P system ,Medoid - Abstract
This paper describes a splicing P system for clustering analysis. We use the thought of partitioning around medoids algorithm to design clustering by splicing P systems and implement it in clustering. Then we give an example with 10 points to indicate the feasibility of the provided algorithm. All the processes are conducted in membranes. The process of the proposed algorithm is provided and an instance to prove its feasibility is presented.
- Published
- 2014
- Full Text
- View/download PDF
5. Beyond Kmedoids: Sparse Model Based Medoids Algorithm for Representative Selection
- Author
-
Yalin Zhang, Feidie Liang, Yu Wang, Jintao Li, and Sheng Tang
- Subjects
Computer science ,business.industry ,Boundary (topology) ,Pattern recognition ,computer.software_genre ,Column (database) ,Automatic summarization ,Medoid ,Image (mathematics) ,Data point ,Point (geometry) ,Data mining ,Artificial intelligence ,Linear combination ,business ,computer ,Algorithm - Abstract
We consider the problem of seeking representative subset of dataset, which can efficiently serve as the condensed view of the entire dataset. The Kmedoids algorithm is a commonly used unsupervised method, which selects center points as representatives. Those center points are mainly located in high density areas and surrounded by other data points. However, boundary points in the low density areas, which are useful for classification problem, are usually overlooked. In this paper we propose a sparse model based medoids algorithm (Smedoids) which aims to learn a special dictionary. Each column of this dictionary is a representative data point from the dataset, and each data point of the dataset can be described well by a linear combination of the columns of this dictionary. In this way, center and boundary points are all selected as representatives. Experiments evaluate the performances of our method for finding representatives of real datasets on the image and video summarization problem and the multi-class classification problem, and our method is shown to out-perform state-of-the-art in accuracy.
- Published
- 2013
- Full Text
- View/download PDF
6. Weighting Features for Partition around Medoids Using the Minkowski Metric
- Author
-
Renato Cordeiro de Amorim and Trevor Fenner
- Subjects
Combinatorics ,ComputingMethodologies_PATTERNRECOGNITION ,business.industry ,Minkowski space ,Partition (number theory) ,Pattern recognition ,Artificial intelligence ,business ,Medoid ,Mathematics ,Weighting - Abstract
In this paper we introduce the Minkowski weighted partition around medoids algorithm (MW-PAM). This extends the popular partition around medoids algorithm (PAM) by automatically assigning K weights to each feature in a dataset, where K is the number of clusters. Our approach utilizes the within-cluster variance of features to calculate the weights and uses the Minkowski metric. We show through many experiments that MW-PAM, particularly when initialized with the Build algorithm (also using the Minkowski metric), is superior to other medoid-based algorithms in terms of both accuracy and identification of irrelevant features.
- Published
- 2012
- Full Text
- View/download PDF
7. Sentic Medoids: Organizing Affective Common Sense Knowledge in a Multi-Dimensional Vector Space
- Author
-
Chris Eckl, Thomas Mazzocco, Amir Hussain, and Erik Cambria
- Subjects
Commonsense knowledge ,Computer science ,business.industry ,Sentiment analysis ,Context (language use) ,Space (commercial competition) ,computer.software_genre ,Medoid ,Artificial intelligence ,business ,computer ,Semantic Web ,Natural language ,Natural language processing - Abstract
Existing approaches to opinion mining and sentiment analysis mainly rely on parts of text in which opinions and sentiments are explicitly expressed such as polarity terms and affect words. However, opinions and sentiments are often conveyed implicitly through context and domain dependent concepts, which make purely syntactical approaches ineffective. To overcome this problem, we have recently proposed Sentic Computing, a multi-disciplinary approach to opinion mining and sentiment analysis that exploits both computer and social sciences to better recognize and process opinions and sentiments over the Web. Among other tools, Sentic Computing includes AffectiveSpace, a language visualization system that transforms natural language from a linguistic form into a multi-dimensional space. In this work, we present a new technique to better cluster this vector space and, hence, better organize and reason on the affective common sense knowledge in it contained.
- Published
- 2011
- Full Text
- View/download PDF
8. Face Recognition Using Multi-exemplar Discriminative Power Analysis
- Author
-
K. K. Achary and Ganesh Bhat
- Subjects
business.industry ,Computer science ,Feature vector ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Nonparametric statistics ,Pattern recognition ,computer.software_genre ,Facial recognition system ,Medoid ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Computer Science::Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,Mean-shift ,Cluster analysis ,business ,computer ,Parametric statistics - Abstract
Face recognition systems based on holistic approach consider facial images as high-dimensional data points generated from a multi modal distribution. Recent efforts in methods based on this assumption have been in deriving multi-descriptor nonparametric classification systems. However, for problems such as face recognition, identification of multiple descriptors for class representation is not trivial. Most of the recent nonparametric classification techniques solve this problem in a low dimensional feature space using supervised parametric density based clustering methods. In this paper, we investigate two unsupervised non parametric clustering methods namely, Mean Shift technique and Partition Around Medoid clustering, to obtain multiple descriptors in the feature space derived using Discriminant Power Analysis (DPA). Simulation results of proposed nonparametric multi descriptor extension of DPA on the AR and Yale database indicate improvement in recognition rate of DPA over parametric multi descriptor and single descriptor classifiers.
- Published
- 2011
- Full Text
- View/download PDF
9. The Impact of Triangular Inequality Violations on Medoid-Based Clustering
- Author
-
Saaid Baraty, Dan A. Simovici, and Catalin Zara
- Subjects
Metric space ,ComputingMethodologies_PATTERNRECOGNITION ,Dissimilarity space ,Triangle inequality ,Computer science ,Metric (mathematics) ,Point (geometry) ,Rectifier (neural networks) ,Topology ,Cluster analysis ,Medoid - Abstract
We evaluate the extent to which a dissimilarity space differs from a metric space by introducing the notion of metric point and metricity in a dissimilarity space. The effect of triangular inequality violations on medoid-based clustering of objects in a dissimilarity space is examined and the notion of rectifier is introduced to transform a dissimilarity space into a metric space.
- Published
- 2011
- Full Text
- View/download PDF
10. Non-user-Specific Multivariate Biometric Discretization with Medoid-Based Segmentation
- Author
-
Andrew Beng Jin Teoh and Meng-Hui Lim
- Subjects
Discretization ,Biometrics ,Feature (computer vision) ,Computer science ,business.industry ,Feature extraction ,Univariate ,Segmentation ,Pattern recognition ,Artificial intelligence ,business ,Medoid ,Discretization of continuous features - Abstract
Univariate discretization approach that transforms continuous attributes into discrete elements/binary string based on discrete/binary feature extraction on a single dimensional basis have been attracting much attention in the biometric community mainly to derive biometric-based cryptographic key derivation for security purpose. However, since components of biometric feature are interdependent, univariate approach may destroy important interactions with such attributes and thus very likely to cause features being discretized suboptimally. In this paper, we introduce a multivariate discretization approach encompassing a medoid-based segmentation with effective segmentation encoding technique. Promising empirical results on two benchmark face datasets significantly justify the superiority of our approach with reference to several non-user-specific univariate biometric discretization schemes.
- Published
- 2011
- Full Text
- View/download PDF
11. Data Driven Bandwidth for Medoid Shift Algorithm
- Author
-
Syed Zulqarnain Gilani and Naveed Iqbal Rao
- Subjects
Set (abstract data type) ,Pixel ,Mean squared error ,Computer Science::Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Bandwidth (computing) ,Estimator ,Point (geometry) ,Cluster analysis ,Algorithm ,Medoid ,Mathematics - Abstract
Adaptive data driven bandwidth for medoidshift algorithm has been proposed in this work. The proposed method has made it possible to perform clustering on a variety of high resolution statistically different images. Experiments are performed on natural images as well as daily life images. The images have been chosen such that a comparison analysis between fixed sample point estimator k and adaptive k can be carried out in detail. The results show that a fixed value of k=10 is good for statistically compact images but gives undesirable results in dispersed images. Data driven bandwidth is proposed for each data point/ pixel as well as the complete data set/ image. Experimental results have shown our algorithm to be robust. Performance is evaluated on the basis of root mean square error for the quality of clusters.
- Published
- 2011
- Full Text
- View/download PDF
12. Clustering Categorical Data in a Multiobjective Framework
- Author
-
Ujjwal Maulik, Sanghamitra Bandyopadhyay, and Anirban Mukhopadhyay
- Subjects
business.industry ,Computer science ,Feature vector ,Pattern recognition ,Fuzzy logic ,Medoid ,Euclidean distance ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,Attribute domain ,Artificial intelligence ,business ,Cluster analysis ,Categorical variable - Abstract
Most of the clustering algorithms are designed for such datasets where the dissimilarity between any two points of the dataset can be computed using standard distance measures such as the Euclidean distance. However, many real-life datasets are categorical in nature, where no natural ordering can be found among the elements in the attribute domain. In such situations, the clustering algorithms, such as K-means [238] and fuzzy C-means (FCM) [62], cannot be applied. The K-means algorithm computes the center of a cluster by computing the mean of the set of feature vectors belonging to that cluster. However, as categorical datasets do not have any inherent distance measure, computing the mean of a set of feature vectors is meaningless. A variation of the K-means algorithm, namely Partitioning Around Medoids (PAM) or K-medoids [243], has been proposed for such kinds of datasets.
- Published
- 2011
- Full Text
- View/download PDF
13. Consensus Multiobjective Differential Crisp Clustering for Categorical Data Analysis
- Author
-
Ujjwal Maulik, Soumyendu Sekhar Bandyopadhyay, Dariusz Plewczynski, and Indrajit Saha
- Subjects
Set (abstract data type) ,Euclidean distance ,Differential evolution ,Consensus clustering ,Data mining ,Cluster analysis ,computer.software_genre ,Categorical variable ,computer ,Multi-objective optimization ,Medoid ,Mathematics - Abstract
In this article, an evolutionary crisp clustering technique is described that uses a new consensus multiobjective differential evolution. The algorithm is therefore able to optimize two conflicting cluster validity measures simultaneously and provides resultant Pareto optimal set of nondominated solutions. Thereafter the problem of choosing the best solution from resultant Pareto optimal set is resolved by creation of consensus clusters using voting procedure. The proposed method is used for analyzing the categorical data where no such natural ordering can be found among the elements in categorical domain. Hence no inherent distance measure, like the Euclidean distance, would work to compute the distance between two categorical objects. Index-coded encoding of the cluster medoids (centres) is used for this purpose. The effectiveness of the proposed technique is provided for artificial and real life categorical data sets. Also statistical significance test has been carried out to establish the statistical significance of the clustering results. Matlab version of the software is available at http://bio.icm.edu.pl/~darman/CMODECC.
- Published
- 2010
- Full Text
- View/download PDF
14. Median Topographic Maps for Biomedical Data Sets
- Author
-
Hammer, Barbara, Hasenfuss, A., Rossi, F., Biehl, M., Hammer, B., Verleysen, M., Villmann, T., Institute of Computer Science, Institute of Computer Science-Clausthal University of Technology (TU Clausthal), Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Villmann, Th., Biehl, M., Hammer, B., and Verleysen
- Subjects
Neural gas ,Computer science ,Pairwise data ,Data structure ,computer.software_genre ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Clustering ,Medoid ,Matrix (mathematics) ,Neural Gas ,ComputingMethodologies_PATTERNRECOGNITION ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Biomedical data ,Data analysis ,Data mining ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Cluster analysis ,Focus (optics) ,Dissimilarity data ,computer - Abstract
Median clustering extends popular neural data analysis methods such as the self-organizing map or neural gas to general data structures given by a dissimilarity matrix only. This offers flexible and robust global data inspection methods which are particularly suited for a variety of data as occurs in biomedical domains. In this chapter, we give an overview about median clustering and its properties and extensions, with a particular focus on efficient implementations adapted to large scale data analysis.
- Published
- 2009
- Full Text
- View/download PDF
15. Subgroup Discovery in Data Sets with Multi–dimensional Responses: A Method and a Case Study in Traumatology
- Author
-
Lan Umek, Jean-Hugues Chauchat, Blaž Zupan, Marko Toplak, Dragica Smrke, Annie Morin, and Gregor Makovec
- Subjects
Contingency table ,Dependency (UML) ,Computer science ,business.industry ,Experimental data ,Machine learning ,computer.software_genre ,Outcome (probability) ,Medoid ,Subject-matter expert ,Feature (machine learning) ,Artificial intelligence ,Data mining ,Cluster analysis ,business ,computer - Abstract
Biomedical experimental data sets may often include many features both at input (description of cases, treatments, or experimental parameters) and output (outcome description). State-of-the-art data mining techniques can deal with such data, but would consider only one output feature at the time, disregarding any dependencies among them. In the paper, we propose the technique that can treat many output features simultaneously, aiming at finding subgroups of cases that are similar both in input and output space. The method is based on k -medoids clustering and analysis of contingency tables, and reports on case subgroups with significant dependency in input and output space. We have used this technique in explorative analysis of clinical data on femoral neck fractures. The subgroups discovered in our study were considered meaningful by the participating domain expert, and sparked a number of ideas for hypothesis to be further experimentally tested.
- Published
- 2009
- Full Text
- View/download PDF
16. Partitional Conceptual Clustering of Web Resources Annotated with Ontology Languages
- Author
-
Claudia d'Amato, Floriana Esposito, and Nicola Fanizzi
- Subjects
Information retrieval ,business.industry ,Computer science ,Conceptual clustering ,Ontology language ,Semantics ,computer.software_genre ,Medoid ,Description logic ,Knowledge base ,Artificial intelligence ,business ,Cluster analysis ,Semantic Web ,computer ,Natural language processing - Abstract
The paper deals with the problem of cluster discovery in the context of Semantic Web knowledge bases. A partitional clustering algorithm is presented. It is applied for grouping resources contained in knowledge bases and expressed in the standard ontology languages. The method exploits a language-independent semi-distance measure for individuals that is based on the semantics of the resources w.r.t. a context represented by a set of concept descriptions (discriminating features). The clustering algorithm adapts Bisecting k-Means method to work with medoids. Besides, we propose simple mechanisms to assign each cluster an intensional definition that may suggest new concepts for the knowledge base (vivification). A final experiment demonstrates the validity of the approach through absolute quality indices for clustering results.
- Published
- 2009
- Full Text
- View/download PDF
17. Are Model-Based Clustering and Neural Clustering Consistent? A Case Study from Bioinformatics
- Author
-
Antonina Starita, Elia Biganzoli, Davide Bacciu, and Paulo J. G. Lisboa
- Subjects
Artificial neural network ,Computer science ,business.industry ,Gaussian ,Bayesian probability ,Pattern recognition ,Mixture model ,Medoid ,Data set ,Determining the number of clusters in a data set ,symbols.namesake ,symbols ,Artificial intelligence ,Cluster analysis ,business - Abstract
A novel neural network clustering algorithm, CoRe, is benchmarked against previously published results on a breast cancer data set and applying the method of Partition Around Medoids (PAM). The data serve to compare the samples partitions obtained with the neural network, PAM and model-based algorithms, namely Gaussian Mixture Model (GMM), Variational Bayesian Gaussian Mixture (VBG) and Variational Bayesian Mixtures with Splitting (VBS). It is found that CoRe, on the one hand, agrees with the previously published partitions; on the other hand, it supports the existence of a supplementary cluster that we hypothesize to be an additional tumor subgroup with respect to those previously identified by PAM.
- Published
- 2008
- Full Text
- View/download PDF
18. Evolutionary Clustering in Description Logics: Controlling Concept Formation and Drift in Ontologies
- Author
-
Floriana Esposito, Claudia d'Amato, and Nicola Fanizzi
- Subjects
Fitness function ,Concept drift ,Description logic ,Computer science ,Concept learning ,Metric (mathematics) ,Supervised learning ,Novelty ,Data mining ,Cluster analysis ,computer.software_genre ,computer ,Medoid - Abstract
We present a method based on clustering techniques to detect concept drift or novelty in a knowledge base expressed in Description Logics. The method exploits an effective and language-independent semi-distance measure defined for the space of individuals, that is based on a finite number of dimensions corresponding to a committee of discriminating features (represented by concept descriptions). In the algorithm, the possible clusterings are represented as strings of central elements (medoids, w.r.t. the given metric) of variable length. The number of clusters is not required as a parameter; the method is able to find an optimal choice by means of the evolutionary operators and of a fitness function. An experimentation with some ontologies proves the feasibility of our method and its effectiveness in terms of clustering validity indices. Then, with a supervised learning phase, each cluster can be assigned with a refined or newly constructed intensional definition expressed in the adopted language.
- Published
- 2008
- Full Text
- View/download PDF
19. Evolutionary Rough k-Medoid Clustering
- Author
-
Richard Weber, Martin Lampart, and Georg Peters
- Subjects
Computer science ,business.industry ,Davies–Bouldin index ,Boundary (topology) ,Pattern recognition ,Medoid ,Set (abstract data type) ,Core (graph theory) ,Cluster (physics) ,Rough set ,Artificial intelligence ,Cluster analysis ,business ,Algorithm - Abstract
Recently, clustering algorithms based on rough set theory have gained increasing attention. For example, Lingras et al. introduced a rough k-means that assigns objects to lower and upper approximations of clusters. The objects in the lower approximation surely belong to a cluster while the membership of the objects in an upper approximation is uncertain. Therefore, the core cluster, defined by the objects in the lower approximation is surrounded by a buffer or boundary set with objects with unclear membership status. In this paper, we introduce an evolutionary rough k-medoid clustering algorithm. Evolutionary rough k-medoid clustering belongs to the families of Lingras' rough k-means and classic k-medoids algorithms. We apply the evolutionary rough k-medoids to synthetic as well as to real data sets and compare the results to Lingras' rough k-means. We also introduce a rough version of the Davies-Bouldin-Index as a cluster validity index for the family of rough clustering algorithms.
- Published
- 2008
- Full Text
- View/download PDF
20. Conceptual Clustering and Its Application to Concept Drift and Novelty Detection
- Author
-
Floriana Esposito, Nicola Fanizzi, and Claudia d'Amato
- Subjects
Fuzzy clustering ,Concept drift ,Computer science ,business.industry ,Correlation clustering ,Constrained clustering ,Conceptual clustering ,computer.software_genre ,Machine learning ,Medoid ,Canopy clustering algorithm ,Data mining ,Artificial intelligence ,Cluster analysis ,business ,computer - Abstract
The paper presents a clustering method which can be applied to populated ontologies for discovering interesting groupings of resources therein. The method exploits a simple, yet effective and language-independent, semi-distance measure for individuals, that is based on their underlying semantics along with a number of dimensions corresponding to a set of concept descriptions (discriminating features committee). The clustering algorithm is a partitional method and it is based on the notion of medoids w.r.t. the adopted semi-distance measure. Eventually, it produces a hierarchical organization of groups of individuals. A final experiment demonstrates the validity of the approach using absolute quality indices. We propose two possible exploitations of these clusterings: concept formation and detecting concept drift or novelty.
- Published
- 2008
- Full Text
- View/download PDF
21. Conceptual Clustering Applied to Ontologies
- Author
-
Floriana Esposito, Nicola Fanizzi, and Claudia d'Amato
- Subjects
Brown clustering ,Fuzzy clustering ,business.industry ,Correlation clustering ,Constrained clustering ,Conceptual clustering ,Machine learning ,computer.software_genre ,Medoid ,Canopy clustering algorithm ,Artificial intelligence ,Cluster analysis ,business ,computer ,Mathematics - Abstract
A clustering method is presented which can be applied to semantically annotated resources in the context of ontological knowledge bases. This method can be used to discover emerging groupings of resources expressed in the standard ontology languages. The method exploits a language-independent semi-distance measure over the space of resources, that is based on their semantics w.r.t. a number of dimensions corresponding to a committee of discriminating features represented by concept descriptions. A maximally discriminating group of features can be constructed through a feature construction method based on genetic programming. The evolutionary clustering algorithm proposed is based on the notion of medoids applied to relational representations. It is able to induce a set of clusters by means of a fitness function based on a discernibility criterion. An experimentation with some ontologies proves the feasibility of our method.
- Published
- 2008
- Full Text
- View/download PDF
22. A Multi-relational Hierarchical Clustering Method for Datalog Knowledge Bases
- Author
-
Nicola Fanizzi, Claudia d'Amato, and Floriana Esposito
- Subjects
Theoretical computer science ,Fuzzy clustering ,business.industry ,Computer science ,Correlation clustering ,Semantics ,Machine learning ,computer.software_genre ,Medoid ,Datalog ,Hierarchical clustering ,Inductive logic programming ,Artificial intelligence ,business ,Cluster analysis ,computer ,computer.programming_language - Abstract
A clustering method is presented which can be applied to relational knowledge bases (e.g. DATALOG deductive databases). It can be used to discover interesting groups of resources through their (semantic) annotations expressed in the standard logic programming languages. The method exploits an effective and language-independent semi-distance measure for individuals., that is based on the resource semantics w.r.t. a number of dimensions corresponding to a committee of features represented by a group of concept descriptions (discriminating features). The algorithm is a fusion of the classic BISECTING K-MEANS with approaches based on medoids that are typically applied to relational representations. We discuss its complexity and potential applications to several tasks.
- Published
- 2008
- Full Text
- View/download PDF
23. Quick Shift and Kernel Methods for Mode Seeking
- Author
-
Stefano Soatto and Andrea Vedaldi
- Subjects
Mathematical optimization ,ComputingMethodologies_PATTERNRECOGNITION ,Kernel method ,Variable kernel density estimation ,Kernel embedding of distributions ,Computer Science::Computer Vision and Pattern Recognition ,Radial basis function kernel ,Mean-shift ,Cluster analysis ,Algorithm ,Kernel principal component analysis ,Medoid ,Mathematics - Abstract
We show that the complexity of the recently introduced medoid-shift algorithm in clustering N points is O(N 2), with a small constant, if the underlying distance is Euclidean. This makes medoid shift considerably faster than mean shift, contrarily to what previously believed. We then exploit kernel methods to extend both mean shift and the improved medoid shift to a large family of distances, with complexity bounded by the effective rank of the resulting kernel matrix, and with explicit regularization constraints. Finally, we show that, under certain conditions, medoid shift fails to cluster data points belonging to the same mode, resulting in over-fragmentation. We propose remedies for this problem, by introducing a novel, simple and extremely efficient clustering algorithm, called quick shift, that explicitly trades off under- and over-fragmentation. Like medoid shift, quick shift operates in non-Euclidean spaces in a straightforward manner. We also show that the accelerated medoid shift can be used to initialize mean shift for increased efficiency. We illustrate our algorithms to clustering data on manifolds, image segmentation, and the automatic discovery of visual categories. © 2008 Springer Berlin Heidelberg.
- Published
- 2008
- Full Text
- View/download PDF
24. A Hierarchical Clustering Procedure for Semantically Annotated Resources
- Author
-
Nicola Fanizzi, Floriana Esposito, and Claudia d'Amato
- Subjects
Information retrieval ,Inductive logic programming ,Description logic ,business.industry ,Computer science ,Cluster analysis ,Semantics ,business ,Semantic Web ,Social Semantic Web ,Medoid ,Hierarchical clustering - Abstract
A clustering method is presented which can be applied to relational knowledge bases. It can be used to discover interesting groupings of resources through their (semantic) annotations expressed in the standard languages employed for modeling concepts in the Semantic Web. The method exploits a simple (yet effective and language-independent) semi-distance measure for individuals, that is based on the resource semantics w.r.t. a number of dimensions corresponding to a committee of features represented by a group of concept descriptions (discriminating features). The algorithm is an fusion of the classic Bisecting k-Means with approaches based on medoids since they are intended to be applied to relational representations. We discuss its complexity and the potential applications to a variety of important tasks.
- Published
- 2007
- Full Text
- View/download PDF
25. Continuous Medoid Queries over Moving Objects
- Author
-
Kyriakos Mouratidis, Stavros Papadopoulos, and Dimitris Sacharidis
- Subjects
Euclidean distance ,Set (abstract data type) ,Task (computing) ,ComputingMethodologies_PATTERNRECOGNITION ,Theoretical computer science ,Current (mathematics) ,k-medoids ,Computer science ,Medoid - Abstract
In the k-medoid problem, given a dataset P, we are asked to choose k points in P as the medoids. The optimal medoid set minimizes the average Euclidean distance between the points in P and their closest medoid. Finding the optimal k medoids is NP hard, and existing algorithms aim at approximate answers, i.e., they compute medoids that achieve a small, yet not minimal, average distance. Similarly in this paper, we also aim at approximate solutions. We consider, however, the continuous version of the problem, where the points in P move and our task is to maintain the medoid set on-the-fly (trying to keep the average distance small). To the best of our knowledge, this work constitutes the first attempt on continuous medoid queries. First, we consider centralized monitoring, where the points issue location updates whenever they move. A server processes the stream of generated updates and constantly reports the current medoid set. Next, we address distributed monitoring, where we assume that the data points have some computational capabilities, and they take over part of the monitoring task. In particular, the server installs adaptive filters (i.e., permissible spatial ranges, called safe regions) to the points, which report their location only when they move outside their filters. The distributed techniques reduce the frequency of location updates (and, thus, the network overhead and the server load), at the cost of a slightly higher average distance, compared to the centralized methods. Both our centralized and distributed methods do not make any assumption about the data moving patterns (e.g., velocity vectors, trajectories, etc) and can be applied to an arbitrary number of medoids k. We demonstrate the efficiency and efficacy of our techniques through extensive experiments.
- Published
- 2007
- Full Text
- View/download PDF
26. Hybrid O( $n \sqrt{n}$ ) Clustering for Sequential Web Usage Mining
- Author
-
Ickjai Lee and Jianhua Yang
- Subjects
Clustering high-dimensional data ,Theoretical computer science ,Brown clustering ,Fuzzy clustering ,Computer science ,Correlation clustering ,Single-linkage clustering ,computer.software_genre ,Medoid ,Hierarchical clustering ,Metric space ,ComputingMethodologies_PATTERNRECOGNITION ,Data stream clustering ,CURE data clustering algorithm ,Canopy clustering algorithm ,FLAME clustering ,Data mining ,Cluster analysis ,computer ,k-medians clustering - Abstract
We propose a natural neighbor inspired O($n \sqrt{n}$) hybrid clustering algorithm that combines medoid-based partitioning and agglomerative hierarchial clustering. This algorithm works efficiently by inheriting partitioning clustering strategy and operates effectively by following hierarchial clustering. More importantly, the algorithm is designed by taking into account the specific features of sequential data modeled in metric space. Experimental results demonstrate the virtue of our approach.
- Published
- 2006
- Full Text
- View/download PDF
27. Medoid Queries in Large Spatial Databases
- Author
-
Kyriakos Mouratidis, Dimitris Papadias, and Spiros Papadimitriou
- Subjects
Set (abstract data type) ,Database ,Computer science ,Spatial database ,Fraction (mathematics) ,Node (circuits) ,Data mining ,computer.software_genre ,computer ,Algorithm ,Medoid ,Block (data storage) - Abstract
Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).
- Published
- 2005
- Full Text
- View/download PDF
28. A New and Efficient K-Medoid Algorithm for Spatial Clustering
- Author
-
Isabelle Couloigner and Qiaoping Zhang
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Computer science ,CURE data clustering algorithm ,Multispectral image ,Single-linkage clustering ,Canopy clustering algorithm ,Data mining ,computer.software_genre ,Cluster analysis ,Algorithm ,computer ,k-medians clustering ,Medoid - Abstract
A new k-medoids algorithm is presented for spatial clustering in large applications. The new algorithm utilizes the TIN of medoids to facilitate local computation when searching for the optimal medoids. It is more efficient than most existing k-medoids methods while retaining the exact the same clustering quality of the basic k-medoids algorithm. The application of the new algorithm to road network extraction from classified imagery is also discussed and the preliminary results are encouraging.
- Published
- 2005
- Full Text
- View/download PDF
29. Clustering of Web Sessions Using Levenshtein Metric
- Author
-
Sergey V. Kuznetsov and Andrei Scherbina
- Subjects
Association rule learning ,Computer science ,business.industry ,String searching algorithm ,computer.software_genre ,Fuzzy logic ,Medoid ,Entropy (information theory) ,The Internet ,Edit distance ,Data mining ,Entropy (energy dispersal) ,Cluster analysis ,business ,computer - Abstract
Various commercial and scientific applications require analysis of user behaviour in the Internet. For example, web marketing or network technical support can benefit from web users classification. This is achievable by tracking pages visited by the user during one session (one visit to the particular site). For automated user sessions classification we propose distance that compares sessions judging by the sequence of pages in them and by categories of these pages. Proposed distance is based on Levenshtein metric. Fuzzy C Medoids algorithm was used for clustering, since it has almost linear complexity. Davies-Bouldin, Entropy, and Bezdek validity indices were used to assess the qualities of proposed method. As testing shows, our distance outperforms in this domain both Euclidian and Edit distances.
- Published
- 2004
- Full Text
- View/download PDF
30. An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search
- Author
-
John F. Roddick, Shu-Chuan Chu, and Jeng-Shyang Pan
- Subjects
Information extraction ,ComputingMethodologies_PATTERNRECOGNITION ,Triangle inequality ,k-medoids ,Computational complexity theory ,Computer science ,Computation ,Vector quantization ,Cluster analysis ,computer.software_genre ,Algorithm ,computer ,Medoid - Abstract
Clustering in data mining is a discovery process that groups similar objects into the same cluster. Various clustering algorithms have been designed to fit various requirements and constraints of application. In this paper, we study several k-medoids-based algorithms including the PAM, CLARA and CLARANS algorithms. A novel and efficient approach is proposed to reduce the computational complexity of such k-medoids-based algorithms by using previous medoid index, triangular inequality elimination criteria and partial distance search. Experimental results based on elliptic, curve and Gauss-Markov databases demonstrate that the proposed algorithm applied to CLARANS may reduce the number of distance calculations by 67% to 92% while retaining the same average distance per object. In terms of the running time, the proposed algorithm may reduce computation time by 38% to 65% compared with the CLARANS algorithm.
- Published
- 2002
- Full Text
- View/download PDF
31. Categorizing Visitors Dynamically by Fast and Robust Clustering of Access Logs
- Author
-
Jianhua Yang and Vladimir Estivill-Castro
- Subjects
Iterative method ,Computer science ,business.industry ,Correlation clustering ,computer.software_genre ,Machine learning ,Medoid ,CURE data clustering algorithm ,Outlier ,Expectation–maximization algorithm ,Canopy clustering algorithm ,Artificial intelligence ,Data mining ,Cluster analysis ,business ,Time complexity ,computer - Abstract
Clustering plays a central role in segmenting markets. The identification of categories of visitors to a Web-site is very useful towards improved Web applications. However, the large volume involved in mining visitation paths, demands efficient clustering algorithms that are also resistant to noise and outliers. Also, dissimilarity between visitation paths involves sophisticated evaluation and results in large dimension of attribute-vectors. We present a randomized, iterative algorithm (a la Expectation Maximization or k-means) but based on discrete medoids. We prove that our algorithm converges and that has subquadratic complexity. We compare to the implementation of the fastest version of matrix-based clustering for visitor paths and show that our algorithm outperforms dramatically matrix-based methods.
- Published
- 2001
- Full Text
- View/download PDF
32. Robust Clusterin of Large Geo-referenced Data Sets
- Author
-
Vladimir Estivill-Castro and Michael E. Houle
- Subjects
Computer science ,Delaunay triangulation ,Spatial database ,Cluster (physics) ,Data mining ,computer.software_genre ,Cluster analysis ,Voronoi diagram ,Time complexity ,computer ,Medoid - Abstract
Clustering geo-referenced data with the medoid method is related to k-Means, with the restriction that cluster representatives are chosen from the data. Although the medoid method in general produces clusters of high quality, it is often criticised for the Ω(n2) time that it requires. Our method incorporates both proximity and density information to achieve high-quality clusters in O(n log n) expected time. This is achieved by fast approximation to the medoid objective function using proximity information from Delaunay triangulations.
- Published
- 1999
- Full Text
- View/download PDF
33. Discovering associations in spatial data — An efficient medoid based approach
- Author
-
Vladimir Estivill-Castro and Alan Murray
- Subjects
Geographic information system ,Heuristic ,Computer science ,business.industry ,Spatial database ,computer.software_genre ,Medoid ,Search algorithm ,Data mining ,Cluster analysis ,Heuristics ,business ,Spatial analysis ,Hill climbing ,computer - Abstract
Spatial data mining is the discovery of novel and interesting relationships and characteristics that may exist implicitly in spatial databases. The identification of clusters coupled with Geographical Information System provides a means of information generalization. A variety of clustering approaches exists. A non-hierarchical method in data mining applications is the medoid approach. Many heuristics have been developed for this approach. This paper carefully analyses the complexity of hill-climbing heuristics for medoid based spatial clustering. Improvements to recently suggested heuristics like CLARANS are identified. We propose a novel idea, the stopping early of the heuristic search, and demonstrate that this provides large savings in computational time while the quality of the partition remains unaffected.
- Published
- 1998
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.