Data Compaction Through Simultaneous Selection of Prototypes and Features

Authors :: T. Ravindra Babu
M. Narasimha Murty
S. V. Subrahmanya
Source :: Compression Schemes for Mining Large Datasets ISBN: 9781447156062
Publication Year :: 2013
Publisher :: Springer London, 2013.
Abstract: Efficiency in data mining algorithms can be achieved by identifying representative prototypes or representative features and basing exploratory study only on those subsets. It is interesting to examine whether both can be achieved simultaneously through lossy compression and efficient clustering algorithms on large datasets. We study this aspect in the present chapter. We further examine whether the order of these two activities matters; specifically, we compare clustering followed by compression with compression followed by clustering. We provide a detailed discussion of background material, including definitions of various terms, parameters, and the choice of thresholds for reducing the number of patterns and features. We study eight combinations of lossy compression scenarios. We demonstrate that these lossy compression scenarios, using compressed information, provide better classification accuracy than the original dataset. In this direction, we implement the proposed scheme on two large datasets, one with binary-valued features and the other with floating-point-valued features. At the end of the chapter, we provide bibliographic notes and a list of references.

Subjects :: Scheme (programming language)
Data compaction
Computer science
Feature selection
Data_CODINGANDINFORMATIONTHEORY
Lossy compression
computer.software_genre
Base (topology)
Compression (functional analysis)
Data mining
Cluster analysis
computer
Selection (genetic algorithm)
computer.programming_language

ISBN :: 978-1-4471-5606-2
ISBNs :: 9781447156062
Database :: OpenAIRE
Journal :: Compression Schemes for Mining Large Datasets ISBN: 9781447156062
Accession number :: edsair.doi...........f911eedc59555e26776ce94728dca1fc
Full Text :: https://doi.org/10.1007/978-1-4471-5607-9_5