1. Distance variable improvement of time-series big data stream evaluation
- Author
- Lintang Matahari Hasani, May Iffah Rizki, Jihan Adibah, Petrus Mursanto, Ari Wibisono, Wendy D. W. T. Bayu, and Valian Fil Ahli
- Subjects
Data stream, Information Systems and Management, lcsh:Computer engineering. Computer hardware, Computer Networks and Communications, Computer science, lcsh:TK7885-7895, 02 engineering and technology, Intelligent Systems, Standard deviation, lcsh:QA75.5-76.95, 020204 information systems, Chernoff bound, 0202 electrical engineering, electronic engineering, information engineering, Time series, Hoeffding's inequality, lcsh:T58.5-58.64, lcsh:Information technology, Variable (computer science), Tree (data structure), Mean absolute percentage error, Hardware and Architecture, 020201 artificial intelligence & image processing, lcsh:Electronic computers. Computer science, Distance improvement, Big data regression, Algorithm, Information Systems
- Abstract
Real-time information mining of a big dataset consisting of time-series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion; we additionally use the mean distance and the standard deviation so that the tree is split more accurately than with the standard method. We verify our proposed method on three datasets: the large Traffic Demand Dataset, which consists of 4,000,000 instances; Tennet's big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that our proposed FIMT-DD variant is more accurate than both the standard method and the Chernoff bound approach. The measured errors show that our approach yields a lower mean absolute percentage error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
- Published
- 2020
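The abstract names the Hoeffding bound as the splitting criterion of standard FIMT-DD and reports accuracy as MAPE. The sketch below illustrates only those two standard pieces. It does not implement the authors' mean-distance and standard-deviation criterion, because its exact form is not given in this record. It assumes the standard deviation reduction (SDR) heuristic commonly used to describe FIMT-DD. Under that heuristic, a split is made when the ratio of the second-best to the best SDR, plus the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) with R = 1, stays below 1. All function names (`hoeffding_bound`, `sdr`, `should_split`, `mape`) are hypothetical.

```python
import math
from statistics import pstdev


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def sdr(parent: list[float], left: list[float], right: list[float]) -> float:
    """Standard deviation reduction of splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (pstdev(parent)
            - len(left) / n * pstdev(left)
            - len(right) / n * pstdev(right))


def should_split(best_sdr: float, second_sdr: float, delta: float, n: int) -> bool:
    """Hypothetical FIMT-DD-style check: split if ratio + epsilon < 1.

    The SDR ratio lies in [0, 1], so the range R passed to the bound is 1.
    """
    if best_sdr <= 0:
        return False  # no candidate reduces the standard deviation yet
    ratio = second_sdr / best_sdr
    return ratio + hoeffding_bound(1.0, delta, n) < 1.0


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error in percent; assumes no zero actual values."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


print(round(hoeffding_bound(1.0, 0.05, 100), 4))
print(sdr([1, 1, 5, 5], [1, 1], [5, 5]))
print(should_split(1.0, 0.5, delta=0.05, n=100))
print(mape([100, 200], [110, 180]))
```

As n grows, ε shrinks, so the tree commits to a split once enough instances confirm that the best candidate clearly beats the runner-up. The abstract reports that the proposed criterion tightens this decision further.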