Mim: A Merge Iteration and Its Applications for Big Data
- Source :
- IEEE Access, Vol. 6, pp. 66984-66997 (2018)
- Publication Year :
- 2018
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2018.
Abstract
- With the rapid development of technologies such as the Internet, sensors, and bioinformatics, data has grown explosively. In the big data era, more and more iterative algorithms are being applied in data mining and machine learning. In most situations, these algorithms compute over an entire dataset that is merged from partial ones. Given the iterative results on the partial datasets, it would be efficient to obtain the results on the entire dataset by merging them; otherwise, recomputing over the entire dataset is time-consuming. Unfortunately, current iteration models do not support merging results. In this paper, we propose the merge iteration computing model (Mim). Mim is a solution, not a platform. It specifies how to execute iterative algorithms efficiently by reusing existing results without sacrificing accuracy, and this mechanism is suitable for most iterative algorithms. We explain the four steps of Mim: in-partition iteration, error evaluation, compensation (optional), and merge iteration. The in-partition iteration step is a prerequisite for merge iteration and must be completed before the partial datasets are merged. We also theoretically analyze the accuracy and performance advantages of Mim. As an application, we implement Mim on the Spark framework and apply it to financial data analysis in a city. Finally, through a series of experiments, we demonstrate the efficiency and accuracy of Mim on the PageRank and K-means algorithms. Across various test cases, the maximum optimization ratio of Mim compared with regular iteration is 25% for PageRank and 56% for K-means, and the errors are negligible.
- Subjects :
- 020203 distributed computing
iterative algorithm
General Computer Science
Computer science
Iterative method
business.industry
iterative computing model
Big data
General Engineering
02 engineering and technology
merge iteration
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
lcsh:Electrical engineering. Electronics. Nuclear engineering
business
lcsh:TK1-9971
Merge (version control)
Algorithm
Details
- ISSN :
- 2169-3536
- Volume :
- 6
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....3aed08e3fc925f32a5de4ebb9b4a575c
- Full Text :
- https://doi.org/10.1109/access.2018.2879779