Data Compaction Through Simultaneous Selection of Prototypes and Features
- Source :
- Compression Schemes for Mining Large Datasets ISBN: 9781447156062
- Publication Year :
- 2013
- Publisher :
- Springer London, 2013.
Abstract
- Data mining algorithms can be made more efficient by identifying representative prototypes or representative features and basing exploratory study only on those subsets. It is interesting to examine whether both can be achieved simultaneously through lossy compression and efficient clustering algorithms on large datasets. We study this aspect in the present chapter. We further examine whether the order of these two activities matters; specifically, we compare clustering followed by compression with compression followed by clustering. We provide a detailed discussion of background material, including definitions of various terms, parameters, and the choice of thresholds for reducing the number of patterns and features. We study eight combinations of lossy compression scenarios. We demonstrate that these lossy compression scenarios, using compressed information, provide better classification accuracy than the original dataset. To this end, we implement the proposed scheme on two large datasets, one with binary-valued features and the other with floating-point-valued features. At the end of the chapter, we provide bibliographic notes and a list of references.
- Subjects :
- Scheme (programming language)
Data compaction
Computer science
Feature selection
Data_CODINGANDINFORMATIONTHEORY
Lossy compression
computer.software_genre
Base (topology)
Compression (functional analysis)
Data mining
Cluster analysis
computer
Selection (genetic algorithm)
computer.programming_language
Details
- ISBN :
- 978-1-4471-5606-2
- ISBNs :
- 9781447156062
- Database :
- OpenAIRE
- Journal :
- Compression Schemes for Mining Large Datasets ISBN: 9781447156062
- Accession number :
- edsair.doi...........f911eedc59555e26776ce94728dca1fc
- Full Text :
- https://doi.org/10.1007/978-1-4471-5607-9_5