Self-Adaptive Framework for Efficient Stream Data Classification on Storm
- Source :
- IEEE Transactions on Systems, Man, and Cybernetics: Systems. 50:123-136
- Publication Year :
- 2020
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2020.
Abstract
- In the era of big data, stream data classification, a typical data stream application, has become increasingly important and challenging. In such applications, classification requests are far more frequent than model training. The arrival rate of the stream data to be classified is high and time-varying, so classifying it efficiently with high throughput is an important problem. In this paper, we first analyze and categorize current data stream machine learning algorithms according to their data structures. We then propose a stream data classification topology (SDC-Topology) on Storm. For matrix-based classification algorithms, we propose a self-adaptive stream data classification framework (SASDC-Framework) for efficient stream data classification on Storm. In SASDC-Framework, all the data arriving within the same unit of time are partitioned into subsets of a near-optimal partition size and processed in parallel. To select the near-optimal partition size efficiently, we adopt a bisection method strategy and an inverse distance weighted strategy. Extreme learning machine, a fast and accurate machine learning method based on matrix computation, is used to test the efficiency of our proposals. According to the evaluation results, the throughput of SASDC-Framework is 8–35 times higher than that of SDC-Topology, and the best throughput exceeds 40000 prediction requests per second in our environment.
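The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Storm. The function names `partition` and `idw_estimate`, the `power` parameter, and the idea of interpolating from previously observed (batch size, good partition size) pairs are assumptions made for illustration only. The abstract names inverse distance weighting but does not specify how it is applied, and the bisection strategy is not shown here.

```python
from typing import List, Sequence, Tuple


def partition(batch: Sequence, size: int) -> List[Sequence]:
    """Split one unit-time batch into consecutive subsets of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def idw_estimate(known: List[Tuple[float, float]], query: float,
                 power: float = 2.0) -> float:
    """Inverse distance weighted estimate at `query`.

    `known` holds hypothetical (batch_size, good_partition_size) pairs
    measured earlier; closer batch sizes get larger weights.
    """
    if not known:
        raise ValueError("known must not be empty")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y in known:
        distance = abs(query - x)
        if distance == 0:
            return y  # exact match: reuse the observed value
        weight = 1.0 / distance ** power
        weighted_sum += weight * y
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical earlier observations, not taken from the paper.
    observations = [(100.0, 10.0), (300.0, 30.0)]
    batch = list(range(200))
    size = max(1, round(idw_estimate(observations, len(batch))))
    subsets = partition(batch, size)
    print(size, len(subsets))
```

In a real deployment each subset would be handed to a parallel worker for matrix-based prediction. Here the subsets are only returned as lists.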
- Subjects :
- Data stream
Artificial neural network
Computer science
business.industry
Big data
Data classification
02 engineering and technology
computer.software_genre
Data structure
Computer Science Applications
Human-Computer Interaction
Data set
Statistical classification
Categorization
Control and Systems Engineering
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
Electrical and Electronic Engineering
business
computer
Throughput (business)
Software
Extreme learning machine
Details
- ISSN :
- 2168-2232 and 2168-2216
- Volume :
- 50
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
- Accession number :
- edsair.doi...........6f897b8866b2c3e6667d8585892f66ac
- Full Text :
- https://doi.org/10.1109/tsmc.2017.2757029