Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling
- Source :
- SDM
- Publication Year :
- 2014
Abstract
- Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied, and many interesting algorithms have been developed. However, only a few approaches reported in the literature address the intersection of these two fields, owing to their complex interplay. In this work, we propose an importance-sampling-driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams with imbalanced class distributions. Two components are tightly integrated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e., a single classifier or an ensemble) weighted by its discriminative power and stability level. The uneven class distribution, on the other hand, is handled by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by importance sampling. We derive a theoretical upper bound on the generalization error of the proposed algorithm. We also study the empirical performance of our method on a set of benchmark synthetic and real-world datasets; it achieves significant improvement over competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
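The two ideas named in the abstract, rebalancing a chunk by importance sampling and weighting one sub-classifier per feature group, can be illustrated with a minimal Python sketch. This is not the authors' DFGW-IS implementation, which the record says is available only on request. Everything here is an assumption made for illustration: the uniform target class distribution, the nearest-centroid sub-classifier, the use of balanced accuracy as a stand-in for "discriminative power", and all function and class names. The stability term and concept-drift handling described in the abstract are omitted.

```python
import random
from collections import Counter, defaultdict

def importance_weights(labels):
    """Weight = target class probability / empirical class probability.
    Assumption: the target distribution is uniform over the observed classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [(1.0 / k) / (counts[y] / n) for y in labels]

def rebalance(X, y, rng):
    """Resample a chunk with probability proportional to the importance weights."""
    w = importance_weights(y)
    idx = rng.choices(range(len(y)), weights=w, k=len(y))
    return [X[i] for i in idx], [y[i] for i in idx]

class CentroidClassifier:
    """Hypothetical sub-classifier: nearest centroid on one feature group."""
    def __init__(self, group):
        self.group = group
        self.centroids = {}

    def fit(self, X, y):
        sums = defaultdict(lambda: [0.0] * len(self.group))
        counts = Counter(y)
        for row, label in zip(X, y):
            for j, f in enumerate(self.group):
                sums[label][j] += row[f]
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in counts}
        return self

    def predict_one(self, row):
        def dist(c):
            return sum((row[f] - m) ** 2
                       for f, m in zip(self.group, self.centroids[c]))
        return min(self.centroids, key=dist)

def balanced_accuracy(clf, X, y):
    """Mean per-class recall, so the minority class counts as much as the majority."""
    hits, totals = Counter(), Counter(y)
    for row, label in zip(X, y):
        hits[label] += clf.predict_one(row) == label
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def train_ensemble(X, y, groups, rng):
    """One sub-classifier per feature group, trained on a rebalanced chunk and
    weighted by its balanced accuracy on the original (imbalanced) chunk."""
    members = []
    for g in groups:
        Xb, yb = rebalance(X, y, rng)
        clf = CentroidClassifier(g).fit(Xb, yb)
        members.append((clf, balanced_accuracy(clf, X, y)))
    return members

def predict(members, row):
    """Weighted vote across the feature-group sub-classifiers."""
    votes = defaultdict(float)
    for clf, weight in members:
        votes[clf.predict_one(row)] += weight
    return max(votes, key=votes.get)
```

In a streaming setting, `train_ensemble` would presumably be called again on each incoming chunk so that the weights track the current concept; how the published method combines old and new sub-classifiers is not stated in the abstract and is left out of this sketch.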
- Subjects :
- Data stream
Computer science
Data stream mining
Machine learning
Upper and lower bounds
Article
Weighting
ComputingMethodologies_PATTERNRECOGNITION
Discriminative model
Artificial intelligence
Data mining
Parallel running
Classifier (UML)
Importance sampling
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- SDM
- Accession number :
- edsair.doi.dedup.....c2fc53cf7a2868e07e65dcc3a2b87c1a
- Full Text :
- https://doi.org/10.13140/2.1.1450.8487