Nearest Neighbor Classification for High-Speed Big Data Streams Using Spark.
- Source :
- IEEE Transactions on Systems, Man & Cybernetics. Systems; Oct2017, Vol. 47 Issue 10, p2727-2739, 13p
- Publication Year :
- 2017
- Abstract :
- Mining massive and high-speed data streams is among the main contemporary challenges in machine learning. This calls for methods displaying a high computational efficacy, with ability to continuously update their structure and handle ever-arriving big number of instances. In this paper, we present a new incremental and distributed classifier based on the popular nearest neighbor algorithm, adapted to such a demanding scenario. This method, implemented in Apache Spark, includes a distributed metric-space ordering to perform faster searches. Additionally, we propose an efficient incremental instance selection method for massive data streams that continuously update and remove outdated examples from the case-base. This alleviates the high computational requirements of the original classifier, thus making it suitable for the considered problem. Experimental study conducted on a set of real-life massive data streams proves the usefulness of the proposed solution and shows that we are able to provide the first efficient nearest neighbor solution for high-speed big and streaming data. [ABSTRACT FROM AUTHOR]
- Subjects :
- BIG data
DATA mining
MACHINE learning
- Language :
- English
- ISSN :
- 21682216
- Volume :
- 47
- Issue :
- 10
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Systems, Man & Cybernetics. Systems
- Publication Type :
- Academic Journal
- Accession number :
- 125206992
- Full Text :
- https://doi.org/10.1109/TSMC.2017.2700889