Leveraging Machine Learning for Anticipatory Data Delivery in Extreme Scale In-situ Workflows
- Source :
- CLUSTER
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- Extreme-scale scientific workflows are composed of multiple applications that exchange data at runtime. Several data-related challenges limit the potential impact of such workflows. While data staging and in-situ models of execution have emerged as approaches to address data-related costs at extreme scales, increasing data volumes and complex data exchange patterns impact the effectiveness of such approaches. In this paper, we design and implement DESTINY, an autonomic data delivery mechanism for staging-based in-situ workflows. DESTINY dynamically learns the data access patterns of scientific workflow applications and leverages these patterns to decrease data access costs. Specifically, DESTINY uses machine learning techniques to anticipate future data accesses, proactively packages and delivers the data necessary to satisfy these requests as close to the consumer as possible and, when data staging processes and consumer processes are colocated, removes the need for inter-process communication by making these data available to the consumer as shared-memory objects. When consumer processes reside on nodes other than staging nodes, the data are packaged and stored in the format the client is likely to request in the future. This amortizes the expensive data discovery and assembly operations typically associated with data staging. We experimentally evaluate the performance and scalability of DESTINY on leadership-class platforms using synthetic applications and the S3D combustion workflow. We demonstrate that DESTINY is scalable and can achieve a reduction of up to 75% in read response time compared to an in-memory staging service for production scientific workflows.
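The anticipatory delivery idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which machine learning technique DESTINY uses, so the sketch substitutes a simple first-order frequency model (it predicts the request that most often followed the current one). All names here (`AccessPredictor`, `StagingServer`, the `(variable, start, end)` request tuple) are hypothetical and are not taken from the paper or from any staging library.

```python
from collections import Counter, defaultdict


class AccessPredictor:
    """Hypothetical first-order model: predicts the next request from the last one."""

    def __init__(self):
        # transitions[a][b] counts how often request b directly followed request a.
        self.transitions = defaultdict(Counter)
        self.last = None

    def observe(self, request):
        if self.last is not None:
            self.transitions[self.last][request] += 1
        self.last = request

    def predict(self):
        followers = self.transitions.get(self.last)
        if not followers:
            return None  # no history yet for the current request
        return followers.most_common(1)[0][0]


class StagingServer:
    """Hypothetical staging service that pre-assembles the predicted next read."""

    def __init__(self, store):
        self.store = store          # variable name -> list of values
        self.predictor = AccessPredictor()
        self.prepared = {}          # request -> pre-assembled data
        self.hits = 0
        self.misses = 0

    def _assemble(self, request):
        # Stand-in for the expensive data discovery and assembly step.
        var, start, end = request
        return self.store[var][start:end]

    def read(self, request):
        if request in self.prepared:
            data = self.prepared.pop(request)   # served from pre-packaged data
            self.hits += 1
        else:
            data = self._assemble(request)      # assembled on demand
            self.misses += 1
        self.predictor.observe(request)
        nxt = self.predictor.predict()
        if nxt is not None and nxt not in self.prepared:
            self.prepared[nxt] = self._assemble(nxt)  # anticipate the next read
        return data


server = StagingServer({"temp": list(range(8))})
first_half, second_half = ("temp", 0, 4), ("temp", 4, 8)
results = [server.read(r) for r in [first_half, second_half] * 3]
```

With a repeating two-request pattern, the first three reads are assembled on demand while the model gathers history, and the remaining reads are served from pre-assembled data. The sketch ignores data that changes between time steps and the shared-memory delivery path, both of which a real staging service would have to handle.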
- Subjects :
- Class (computer programming)
Service (systems architecture)
Computer science
Destiny (ISS module)
Data discovery
Networking & telecommunications
Engineering and technology
Machine learning
Data access
Workflow
Scalability
Electrical engineering, electronic engineering, information engineering
Artificial intelligence & image processing
Artificial intelligence
- Database :
- OpenAIRE
- Journal :
- 2019 IEEE International Conference on Cluster Computing (CLUSTER)
- Accession number :
- edsair.doi...........bd904117de3d00199f1b8da7d3b73db0
- Full Text :
- https://doi.org/10.1109/cluster.2019.8891003