34 results for "Philip E. Davis"
Search Results
2. The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science
- Author
- Shuangxi Zhang, Berk Geveci, Matthew Wolf, Kevin Huck, E. Suchyta, Cameron W. Smith, Ruonan Wang, Stephane Ethier, Philip E. Davis, Manish Parashar, Pradeep Subedi, Gabriele Merlo, Abolaji Adesoji, Norbert Podhorszki, Qing Liu, Todd Munson, Shirley Moore, Mark S. Shephard, C.S. Chang, Jeremy Logan, Jong Choi, Lipeng Wan, Kai Germaschewski, David Pugmire, Ian Foster, Scott Klasky, Kshitij Mehta, Chris Harris, and Julien Dominski
- Subjects
- 020203 distributed computing, Fusion, Computer science, 02 engineering and technology, 01 natural sciences, Code coupling, 010305 fluids & plasmas, Theoretical Computer Science, Computational science, High fidelity, Workflow, Hardware and Architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Software
- Abstract
We present the Exascale Framework for High Fidelity coupled Simulations (EFFIS), a workflow and code coupling framework developed as part of the Whole Device Modeling Application (WDMApp) in the Exascale Computing Project. EFFIS consists of a library, command line utilities, and a collection of run-time daemons. Together, these software products enable users to easily compose and execute workflows that include strong or weak coupling, in situ (or offline) analysis/visualization/monitoring, command-and-control actions, remote dashboard integration, and more. We describe WDMApp physics coupling cases and computer science requirements that motivate the design of the EFFIS framework. Furthermore, we explain the essential enabling technology that EFFIS leverages: ADIOS for performant data movement, PerfStubs/TAU for performance monitoring, and an advanced COUPLER for transforming coupling data from its native format to the representation needed by another application. Finally, we demonstrate EFFIS using coupled multi-simulation WDMApp workflows and exemplify how the framework supports the project's needs. We show that EFFIS and its associated services for data movement, visualization, and performance collection do not introduce appreciable overhead to the WDMApp workflow and that the resource-dominant application's idle time while waiting for data is minimal.
- Published
- 2021
- Full Text
- View/download PDF
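To make the workflow-composition idea in the abstract above concrete, here is a minimal Python sketch of describing coupled components and deriving a launch plan; the class names, fields, and ordering heuristic are illustrative assumptions, not EFFIS's actual interface.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    executable: str
    ranks: int
    reads: list = field(default_factory=list)    # coupling streams consumed
    writes: list = field(default_factory=list)   # coupling streams produced

def launch_plan(components):
    # Start components with fewer inputs first so that coupling streams
    # exist by the time their consumers open them.
    return [(c.name, c.executable, c.ranks)
            for c in sorted(components, key=lambda c: len(c.reads))]

core = Component("gene", "./gene", 1024, reads=["edge_density"], writes=["core_density"])
edge = Component("xgc", "./xgc", 4096, reads=["core_density"], writes=["edge_density"])
viz = Component("viz", "./render", 16, reads=["core_density", "edge_density"])
print(launch_plan([core, edge, viz]))

In EFFIS itself, the equivalent description also carries in situ analysis, monitoring, and dashboard hooks, and the data movement underneath is handled by ADIOS.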
3. Assembling Portable In-Situ Workflow from Heterogeneous Components using Data Reorganization
- Author
- Bo Zhang, Pradeep Subedi, Philip E. Davis, Francesco Rizzi, Keita Teranishi, and Manish Parashar
- Published
- 2022
- Full Text
- View/download PDF
4. CoREC
- Author
- Philip E. Davis, Shaohua Duan, Keita Teranishi, Pradeep Subedi, Manish Parashar, Hemanth Kolla, and Marc Gamell
- Subjects
- 020203 distributed computing, Mean time between failures, business.industry, Computer science, Distributed computing, 020207 software engineering, 02 engineering and technology, Load balancing (computing), Storage efficiency, Replication (computing), Computer Science Applications, Data recovery, Dataspaces, Data access, Computational Theory and Mathematics, Hardware and Architecture, Modeling and Simulation, Scalability, 0202 electrical engineering, electronic engineering, information engineering, business, Erasure code, Software
- Abstract
The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing costs of data movement and IO, and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., executing the entire application workflow on the HPC system, have emerged as an attractive approach to address data-related challenges by moving computations closer to the data, and staging-based frameworks have been effectively used to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, and they remain susceptible to expensive data failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure codes can impact latency and/or result in significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replication and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, and a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime, deployed it with the DataSpaces staging service on leadership-class computing machines, and present an experimental evaluation in the article. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.
- Published
- 2020
- Full Text
- View/download PDF
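The hybrid scheme described in the abstract above can be sketched in a few lines: objects that the access history marks as hot are replicated, cold objects are erasure-coded. Thresholds, parameters, and method names here are illustrative assumptions, not CoREC's API.

from collections import Counter

class HybridResiliencePolicy:
    def __init__(self, hot_threshold=10, replicas=3, ec_k=6, ec_m=2):
        self.reads = Counter()               # observed access pattern
        self.hot_threshold = hot_threshold
        self.replicas = replicas             # n-way replication for hot data
        self.ec_k, self.ec_m = ec_k, ec_m    # k data + m parity fragments

    def record_read(self, key):
        self.reads[key] += 1

    def scheme(self, key):
        if self.reads[key] >= self.hot_threshold:
            return ("replication", self.replicas)     # fast reads and recovery
        return ("erasure", (self.ec_k, self.ec_m))    # low storage overhead

    def storage_factor(self, key):
        kind, params = self.scheme(key)
        return params if kind == "replication" else (params[0] + params[1]) / params[0]

policy = HybridResiliencePolicy()
for _ in range(12):
    policy.record_read("checkpoint/T")
print(policy.scheme("checkpoint/T"), policy.storage_factor("rarely/read"))
# -> ('replication', 3) 1.333...

The storage numbers show why the hybrid pays off: replication triples the footprint of hot data, while a (6, 2) erasure code protects cold data for roughly a third of that cost.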
5. Adaptive Elasticity Policies for Staging-Based in Situ Processing
- Author
- Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, and Manish Parashar
- Published
- 2022
- Full Text
- View/download PDF
6. Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing
- Author
- Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, and Manish Parashar
- Published
- 2021
- Full Text
- View/download PDF
7. An Adaptive Elasticity Policy For Staging Based In-Situ Processing
- Author
- Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, and Manish Parashar
- Published
- 2021
- Full Text
- View/download PDF
8. RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows
- Author
- Pradeep Subedi, Philip E. Davis, and Manish Parashar
- Subjects
- business.industry, Computer science, Distributed computing, Data management, computer.software_genre, Dataspaces, Data access, Workflow, Asynchronous communication, Data exchange, Middleware (distributed applications), Resource management (computing), business, computer
- Abstract
While in-situ workflow formulations have addressed some of the data-related challenges associated with extreme-scale scientific workflows, these workflows involve complex interactions and different modes of data exchange. In the context of increasing system complexity, such workflows present significant resource management challenges, requiring complex cost-performance tradeoffs. This paper presents RISE, an intelligent staging-based data management middleware, which builds on the DataSpaces framework and performs intelligent scheduling of data management operations to reduce I/O contention. In RISE, data are always written immediately to local buffers, reducing the impact of data transfers upon application performance. RISE identifies applications' data access patterns and moves data toward data consumers only when the network is expected to be idle, reducing the impact of asynchronous background data movement upon critical data read/write requests. We experimentally demonstrate that RISE can take advantage of staging nodes to offload data during writes without degrading application data movement performance.
- Published
- 2021
- Full Text
- View/download PDF
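A minimal sketch of the deferral mechanism described in the abstract above: writes land in a local buffer immediately, and a mover drains the buffer toward consumers only while the link looks idle. The idle test and transfer hook are stand-ins, not RISE's implementation.

import collections

class DeferredMover:
    def __init__(self, is_network_idle):
        self.buffer = collections.deque()
        self.is_network_idle = is_network_idle

    def write(self, key, data):
        self.buffer.append((key, data))   # returns immediately; no transfer here

    def pump(self, transfer):
        moved = 0
        while self.buffer and self.is_network_idle():
            transfer(*self.buffer.popleft())   # background movement, off the critical path
            moved += 1
        return moved

mover = DeferredMover(is_network_idle=lambda: True)   # stub: always idle
mover.write("step0/temperature", [300.0, 301.5])
print(mover.pump(lambda key, data: print("shipped", key)))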
9. Design and Performance of Kokkos Staging Space toward Scalable Resilient Application Couplings
- Author
- Keita Teranishi, Francesco Rizzi, Nicolas Morales, Pradeep Subedi, Bo Zhang, Philip E. Davis, and Manish Parashar
- Subjects
- Computer science, Scalability, Space (mathematics), Computational science
- Published
- 2021
- Full Text
- View/download PDF
10. Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
- Author
- Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, and Axel Huebl
- Subjects
- high performance computing, FOS: Computer and information sciences, openPMD, ADIOS, Computer Science - Distributed, Parallel, and Cluster Computing, big data, RDMA, streaming, Distributed, Parallel, and Cluster Computing (cs.DC)
- Abstract
This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly coupled plugins, using a unified streaming-aware API and leveraging the high-speed communication infrastructure available in modern compute clusters for massive data exchange. We identify new challenges in resource allocation and the need for strategies for flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks. (18 pages, 9 figures; SMC2021; supplementary material at https://zenodo.org/record/4906276.)
- Published
- 2021
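For reference, a loosely coupled reader in the style of the openPMD-api streaming examples looks roughly like this (Python bindings with the ADIOS2 SST backend; treat the exact calls as approximate and see the openPMD-api documentation for current usage):

import openpmd_api as io

# The .sst suffix selects ADIOS2's streaming engine instead of a file backend.
series = io.Series("simData.sst", io.Access.read_only)
for it in series.read_iterations():      # blocks until the writer publishes a step
    E_x = it.meshes["E"]["x"]
    data = E_x.load_chunk()              # request the whole record component
    it.close()                           # flushes loads; data is valid afterwards
    print(data.shape)

The writer side is symmetric: it opens the same series for output and publishes one iteration at a time, so two standalone codes exchange steps through ADIOS2 without touching the filesystem.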
11. Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows
- Author
- Philip E. Davis, Matthieu Dorier, Manish Parashar, Pradeep Subedi, and Zhe Wang
- Subjects
- Service (systems architecture), Data processing, Workflow, Computer science, Analytics, business.industry, Distributed computing, Transfer (computing), Data analysis, Representation (mathematics), business, Scheduling (computing)
- Abstract
In-situ and in-transit processing alleviate the gap between computing and I/O capabilities by scheduling data analytics close to the data source. Hybrid in-situ processing splits data analytics into two stages: the data processing that runs in-situ aims to extract regions of interest, which are then transferred to staging services for further in-transit analytics. To facilitate this type of hybrid in-situ processing, the data staging service needs to support the complex intermediate data representations generated/consumed by the in-situ tasks. An unstructured (or irregular) mesh is one such derived data representation that is typically used and bridges simulation data and analytics. However, how staging services can efficiently support unstructured mesh transfer and processing remains to be explored. This paper investigates design options for transferring and processing unstructured mesh data using staging services. Using polygonal mesh data as an example, we show that staging-based unstructured mesh processing can effectively support hybrid in-situ workflows and can significantly decrease data movement overheads.
- Published
- 2021
- Full Text
- View/download PDF
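One design option the abstract above alludes to is flattening a polygonal mesh into staging-friendly flat arrays (coordinates, connectivity, and cell offsets), since put/get-style staging APIs move contiguous arrays well. A small self-contained sketch, with illustrative names:

import numpy as np

def pack_mesh(points, polygons):
    # points: (N, 3) vertex coordinates; polygons: list of vertex-index lists
    connectivity = np.concatenate([np.asarray(p, dtype=np.int64) for p in polygons])
    offsets = np.cumsum([len(p) for p in polygons], dtype=np.int64)  # end of each cell
    return np.asarray(points, dtype=np.float64).ravel(), connectivity, offsets

def unpack_mesh(coords, connectivity, offsets):
    points = coords.reshape(-1, 3)
    polygons, start = [], 0
    for end in offsets:
        polygons.append(connectivity[start:end].tolist())
        start = int(end)
    return points, polygons

pts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
cells = [[0, 1, 2], [0, 2, 3]]                       # two triangles of a quad
print(unpack_mesh(*pack_mesh(pts, cells))[1])        # -> [[0, 1, 2], [0, 2, 3]]

Each of the three arrays can then be staged independently, and a consumer can reassemble the mesh (or only the cells it needs) on the other side.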
12. Toward Resilient Heterogeneous Computing Workflow through Kokkos-DataSpaces Integration
- Author
- Bo Zhang, Nicolas Morales, Keita Teranishi, Manish Parashar, and Philip E. Davis
- Subjects
- Dataspaces, Workflow, Computer science, Distributed computing, Symmetric multiprocessor system
- Published
- 2020
- Full Text
- View/download PDF
13. Benesh: a Programming Model for Coupled Scientific Workflows
- Author
- Manish Parashar, Lee Ricketson, Jeffrey Hittinger, Philip E. Davis, Shaohua Duan, and Pradeep Subedi
- Subjects
- Workflow, Computer science, Block (programming), business.industry, Data domain, Formal specification, Component (UML), Programming paradigm, Software engineering, business, Programmer, Abstraction (linguistics)
- Abstract
As scientific applications strive towards increasingly realistic modeling of complex phenomena, they are integrating multiple models and simulations into complex, coupled scientific workflows. As a result, ensuring that existing codes can be combined and recombined correctly and flexibly as part of these workflows is essential. In this paper, we propose Benesh, a programming system for creating in-situ scientific workflows. Benesh provides a domain-specific abstraction that enables a programmer to instrument an existing simulation code to be used as a building block in defining complex workflows. Using Benesh, developers define a workflow-level shared specification of data objects over common or partitioned data domains. This permits dependency-based execution to be specified at the workflow level, distinct from the independent operation of the component simulations. We additionally describe features of a scalable runtime that builds on a distributed data services layer to implement the Benesh programming system.
- Published
- 2020
- Full Text
- View/download PDF
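The abstract's notion of a workflow-level shared specification can be illustrated with a toy: components declare what they produce and consume over a shared data object, and dependencies fall out of the declarations. This is an illustrative sketch only, not Benesh's actual syntax or API.

spec = {
    "objects": {"pressure": {"domain": "interface_mesh"}},
    "components": {
        "fluid": {"produces": ["pressure"]},
        "solid": {"consumes": ["pressure"]},
    },
}

def dependencies(spec):
    # Map each object to its producer, then pair producers with consumers.
    producers = {obj: comp
                 for comp, decl in spec["components"].items()
                 for obj in decl.get("produces", [])}
    return [(producers[obj], comp)
            for comp, decl in spec["components"].items()
            for obj in decl.get("consumes", [])]

print(dependencies(spec))   # [('fluid', 'solid')]: solid waits on fluid's pressure update

A runtime holding such a specification can schedule component execution from the dependency edges alone, which is the workflow-level, dependency-based execution the paper describes.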
14. Staging Based Task Execution for Data-driven, In-Situ Scientific Workflows
- Author
- Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar, and Zhe Wang
- Subjects
- Task (computing), Workflow, Computer science, business.industry, Distributed computing, Data management, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, business, Data-driven
- Abstract
As scientific workflows increasingly use extreme-scale resources, the imbalance between higher computational capabilities, generated data volumes, and available I/O bandwidth is limiting the ability to translate these scales into insights. In-situ workflows (and the in-situ approach) are leveraging storage levels close to the computation in novel ways in order to reduce the required I/O. However, to be effective, it is important that the mapping and execution of such in-situ workflows adopt a data-driven approach, enabling in-situ tasks to be executed flexibly based upon data content. This paper first explores the design space for data-driven in-situ workflows. Specifically, it presents a model that captures different factors that influence the mapping, execution, and performance of data-driven in-situ workflows and experimentally studies the impact of different mapping decisions and execution patterns. The paper then presents the design, implementation, and experimental evaluation of a data-driven in-situ workflow execution framework that leverages in-memory distributed data management and user-defined task triggers to enable efficient and scalable in-situ workflow execution.
- Published
- 2020
- Full Text
- View/download PDF
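The user-defined task triggers mentioned above amount to predicates evaluated against staged data, with an analysis task fired when a predicate holds. A toy registry illustrating the idea (the real framework's mechanism and API differ):

import numpy as np

class TriggerRegistry:
    def __init__(self):
        self.rules = []                      # (predicate, task) pairs

    def register(self, predicate, task):
        self.rules.append((predicate, task))

    def on_put(self, name, array):
        # Called when a variable lands in staging; fire matching tasks.
        for predicate, task in self.rules:
            if predicate(name, array):
                task(name, array)

registry = TriggerRegistry()
registry.register(lambda n, a: n == "temperature" and a.max() > 1000.0,
                  lambda n, a: print(f"hotspot analysis: max={a.max():.1f}"))
registry.on_put("temperature", np.array([300.0, 1250.0]))   # fires
registry.on_put("temperature", np.array([300.0, 400.0]))    # stays quiet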
15. ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management
- Author
- Mark Kim, Seiji Tsutsumi, George Ostrouchov, James Kress, Keichi Takahashi, Lipeng Wan, Kesheng Wu, Norbert Podhorszki, Kshitij Mehta, Kai Germaschewski, Franz Poeschel, Scott Klasky, Ruonan Wang, Chuck Atkins, Jong Choi, Matthew Wolf, Qing Liu, David Pugmire, Jeremy Logan, William F. Godoy, Philip E. Davis, Manish Parashar, Junmin Gu, Nicholas Thompson, E. Suchyta, Kevin Huck, Greg Eisenhauer, Axel Huebl, and Tahsin Kurc
- Subjects
- Staging, Computer science, Fortran, Data management, Scalable I/O, computer.software_genre, 01 natural sciences, Data science, 03 medical and health sciences, Exascale computing, Lustre/GPFS file systems, 0103 physical sciences, 010306 general physics, MATLAB, 030304 developmental biology, computer.programming_language, lcsh:Computer software, 0303 health sciences, Application programming interface, business.industry, Programming language, In-situ, Python (programming language), Supercomputer, Computer Science Applications, lcsh:QA76.75-76.765, Personal computer, RDMA, business, High-performance computing (HPC), computer, Software
- Abstract
We present ADIOS 2, the latest version of the Adaptable Input Output (I/O) System. ADIOS 2 addresses scientific data management needs ranging from scalable I/O in supercomputers to data analysis in personal computers and cloud systems. Version 2 introduces a unified application programming interface (API) that enables seamless data movement through files, wide-area networks, and direct memory access, as well as high-level APIs for data analysis. The internal architecture provides a set of reusable and extendable components for managing data presentation and transport mechanisms for new applications. ADIOS 2 bindings are available in C++11, C, Fortran, Python, and Matlab and are currently used across different scientific communities. ADIOS 2 provides a communal framework to tackle data management challenges as we approach the exascale era of supercomputing.
- Published
- 2020
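A minimal serial write/read pair using ADIOS 2's high-level Python bindings of this paper's era (the adios2.open API; newer releases have since reorganized the Python interface), so treat the calls as approximate:

import numpy as np
import adios2

data = np.arange(8, dtype=np.float64)

# Write one variable: name, array, global shape, local start, local count.
with adios2.open("demo.bp", "w") as fw:
    fw.write("T", data, [8], [0], [8])

# Read it back step by step.
with adios2.open("demo.bp", "r") as fr:
    for step in fr:
        print(step.read("T"))

Selecting the SST engine instead of the default file engine (via the full API or an XML config) turns the same pattern into the streaming and wide-area movement the abstract mentions, which is the point of the unified API.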
16. Addressing data resiliency for staging based scientific workflows
- Author
- Shaohua Duan, Pradeep Subedi, Philip E. Davis, and Manish Parashar
- Subjects
- Cray XK7, 020203 distributed computing, Dataspaces, Workflow, Correctness, Computer science, Dataflow, Distributed computing, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, Anomaly detection, 02 engineering and technology
- Abstract
As applications move towards extreme scales, data-related challenges are becoming significant concerns, and in-situ workflows based on data staging and in-situ/in-transit data processing have been proposed to address these challenges. Increasing scale is also expected to result in an increase in the rate of silent data corruption errors, which will impact both the correctness and performance of applications. Furthermore, this impact is amplified in the case of in-situ workflows due to the dataflow between the component applications of the workflow. While existing research has explored silent error detection at the application level, silent error detection for workflows remains an open challenge. This paper addresses silent error detection for extreme-scale in-situ workflows. The presented approach leverages idle computation resources in data staging to enable timely detection and recovery from silent data corruption, effectively reducing the propagation of corrupted data and the end-to-end workflow execution time in the presence of silent errors. As an illustration of this approach, we use a spatial outlier detection approach in staging to detect errors introduced in data transfer and storage. We also provide a CPU-GPU hybrid staging framework for error detection in order to achieve faster error identification. We have implemented our approach within the DataSpaces staging service and evaluated it using both synthetic and real workflows on a Cray XK7 system (Titan) at different scales. We demonstrate that, in the presence of silent errors, enabling error detection on staged data alongside a checkpoint/restart scheme improves the total in-situ workflow execution time by up to 22% in comparison with using checkpoint/restart alone.
- Published
- 2019
- Full Text
- View/download PDF
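The spatial outlier test used above for silent error detection can be sketched with a robust neighborhood statistic: flag values that sit far outside the spread of their neighbors. The window and threshold below are illustrative, not the paper's tuned settings.

import numpy as np

def spatial_outliers(field, window=2, k=5.0):
    flagged = np.zeros(field.shape, dtype=bool)
    n = field.shape[0]
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neigh = np.delete(field[lo:hi], i - lo)        # neighborhood without the point
        med = np.median(neigh)
        mad = np.median(np.abs(neigh - med))           # robust spread estimate
        flagged[i] = abs(field[i] - med) > k * mad + 1e-12
    return flagged

x = np.sin(np.linspace(0.0, 3.0, 50))
x[20] += 10.0                                          # inject a silent corruption
print(np.where(spatial_outliers(x))[0])                # -> [20]

Because the test only needs each point's neighborhood, it parallelizes naturally across staged data blocks, which is what makes it cheap enough to run on otherwise idle staging (or GPU) resources.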
17. Data Management for Extreme Scale In-situ Workflows
- Author
- Pradeep Subedi, Anthony Simonet, Philip E. Davis, Shaohua Duan, Zhe Wang, and Manish Parashar
- Abstract
End-to-end scientific workflows running on leadership-class systems present significant data management challenges due to the increasing volume of data being produced. Furthermore, the impact of emerging storage architectures (e.g., deep memory hierarchies and burst buffers) and the extreme heterogeneity of these systems are bringing new data management challenges. Together, these data-related challenges are significantly impacting the effective execution of coupled simulations and in-situ workflows on these systems. Increasing system scales are also expected to result in an increase in node failures and silent data corruptions, which adds to these challenges. Data staging techniques are being used to address these data-related challenges and support extreme-scale in-situ workflows. In this paper, we investigate how data staging solutions can leverage deep memory hierarchies via intelligent prefetching and data movement and efficient data placement techniques. Specifically, we present an autonomic data-management framework that leverages system information, data locality, machine learning based approaches, and user hints to capture the data access and movement patterns between components of staging-based in-situ application workflows. It then uses this knowledge to build a more robust data staging platform, which can provide high-performance and resilient/error-free data exchange for in-situ workflows. We also present an overview of various data management techniques used by the DataSpaces data staging service that leverage autonomic data management to deliver the right data at the right time to the right application.
- Published
- 2019
- Full Text
- View/download PDF
18. Leveraging Machine Learning for Anticipatory Data Delivery in Extreme Scale In-situ Workflows
- Author
- Manish Parashar, Philip E. Davis, and Pradeep Subedi
- Subjects
- Class (computer programming), Service (systems architecture), Computer science, business.industry, Destiny (ISS module), Data discovery, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Data access, Workflow, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
- Abstract
Extreme-scale scientific workflows are composed of multiple applications that exchange data at runtime. Several data-related challenges are limiting the potential impact of such workflows. While data staging and in-situ models of execution have emerged as approaches to address data-related costs at extreme scales, increasing data volumes and complex data exchange patterns impact the effectiveness of such approaches. In this paper, we design and implement DESTINY, an autonomic data delivery mechanism for staging-based in-situ workflows. DESTINY dynamically learns the data access patterns of scientific workflow applications and leverages these patterns to decrease data access costs. Specifically, DESTINY uses machine learning techniques to anticipate future data accesses, proactively packages and delivers the data necessary to satisfy these requests as close to the consumer as possible, and, when data staging processes and consumer processes are colocated, removes the need for inter-process communication by making these data available to the consumer as shared-memory objects. When consumer processes reside on nodes other than staging nodes, the data is packaged and stored in a format the client will likely access in the future. This amortizes the expensive data discovery and assembly operations typically associated with data staging. We experimentally evaluate the performance and scalability of DESTINY on leadership-class platforms using synthetic applications and the S3D combustion workflow. We demonstrate that DESTINY is scalable and can achieve a reduction of up to 75% in read response time as compared to an in-memory staging service for production scientific workflows.
- Published
- 2019
- Full Text
- View/download PDF
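The anticipatory delivery above hinges on predicting the next request from the observed history. A first-order Markov table is the simplest stand-in for DESTINY's learned models and shows the shape of the idea:

from collections import Counter, defaultdict

class NextAccessPredictor:
    def __init__(self):
        self.table = defaultdict(Counter)   # previous request -> next-request counts
        self.prev = None

    def observe(self, region):
        if self.prev is not None:
            self.table[self.prev][region] += 1
        self.prev = region

    def predict(self):
        # Most frequent successor of the latest request, if any history exists.
        if self.prev is None or not self.table[self.prev]:
            return None
        return self.table[self.prev].most_common(1)[0][0]

predictor = NextAccessPredictor()
for region in ["blockA", "blockB", "blockA", "blockB", "blockA"]:
    predictor.observe(region)
print("prefetch:", predictor.predict())   # history says blockB follows blockA

A staging server running such a model can package the predicted data ahead of the request, or place it in shared memory when the consumer is colocated, as the abstract describes.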
19. Single-Event Characterization of the 16 nm FinFET Xilinx UltraScale+™ RFSoC Field-Programmable Gate Array under Proton Irradiation
- Author
- Doug Thorpe, Philip E. Davis, Mark Learn, and David S. Lee
- Subjects
- Physics, Hardware_MEMORYSTRUCTURES, Proton, 010308 nuclear & particles physics, Event (computing), business.industry, Hardware_PERFORMANCEANDRELIABILITY, 01 natural sciences, Upset, 0103 physical sciences, Optoelectronics, Irradiation, Hardware_ARITHMETICANDLOGICSTRUCTURES, Field-programmable gate array, business, Hardware_LOGICDESIGN
- Abstract
This study examines the single-event upset (SEU) and single-event latch-up (SEL) susceptibility of the Xilinx 16 nm FinFET Zynq UltraScale+ RFSoC FPGA under proton irradiation. Results are given for SEU in configuration memory and BlockRAM memory and for device SEL.
- Published
- 2019
- Full Text
- View/download PDF
20. Towards a Smart, Internet-Scale Cache Service for Data Intensive Scientific Applications
- Author
- Anthony Simonet, Ivan Rodero, Zhe Wang, Philip E. Davis, Yubo Qin, Azita Nouri, and Manish Parashar
- Subjects
- Service (systems architecture), business.industry, Computer science, Scale (chemistry), Quality of service, 020206 networking & telecommunications, Usability, 02 engineering and technology, Information repository, Data science, Cyberinfrastructure, Ocean Observatories Initiative, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Cache, business
- Abstract
Data and services provided by shared facilities, such as large-scale observing facilities, have become important enablers of scientific insights and discoveries across many science and engineering disciplines. Ensuring satisfactory quality of service can be challenging for facilities, due to their remote locations and the distributed nature of the instruments, observatories, and users, as well as the rapid growth of data volumes and rates. This research explores how knowledge of facilities' usage patterns, coupled with emerging cyberinfrastructure, can be leveraged to improve their performance, usability, and scientific impact. We propose a framework with a smart, internet-scale cache augmented with prefetching and data placement strategies to improve data delivery performance for scientific facilities. Our evaluations, which are based on the NSF Ocean Observatories Initiative, demonstrate that our framework is able to predict user requests and reduce data movements by more than 56% across networks.
- Published
- 2019
- Full Text
- View/download PDF
21. First coupled GENE–XGC microturbulence simulations
- Author
- Junyi Cheng, Amitava Bhattacharjee, Philip E. Davis, Gabriele Merlo, Robert Hager, Salomon Janhunen, Frank Jenko, Kai Germaschewski, Scott Klasky, C.S. Chang, Julien Dominski, E. Suchyta, and Scott Parker
- Subjects
- Physics, Gyrokinetic ElectroMagnetic, Numerical analysis, Context (language use), Condensed Matter Physics, Topology, Grid, 01 natural sciences, 010305 fluids & plasmas, Coupling (physics), Physics::Plasma Physics, Frequency domain, 0103 physical sciences, Microturbulence, Poisson's equation, 010306 general physics
- Abstract
Covering the core and the edge region of a tokamak, respectively, the two gyrokinetic turbulence codes Gyrokinetic Electromagnetic Numerical Experiment (GENE) and X-point Gyrokinetic Code (XGC) have been successfully coupled by exchanging the three-dimensional charge density data needed to solve the gyrokinetic Poisson equation over the entire spatial domain. Certain challenges for the coupling procedure arise from the fact that the two codes employ completely different numerical methods. This includes, in particular, the necessity to introduce mapping procedures for the transfer of data between the unstructured triangular mesh of XGC and the logically rectangular grid (in a combination of real and Fourier space) used by GENE. Constraints on the coupling scheme are also imposed by the use of different time integrators. First coupled simulations are presented. We have considered collisionless ion temperature gradient turbulence, in both circular and fully shaped plasmas. The coupled simulations successfully reproduce both GENE and XGC reference results, confirming the validity of the code coupling approach toward a whole device model. Many lessons learned in the present context, in particular the need for a coupling procedure that is as flexible as possible, should be valuable to our and other efforts to couple different kinds of codes in pursuit of a more comprehensive description of complex real-world systems, and will drive our further developments of a whole device model for fusion plasmas.
- Published
- 2021
- Full Text
- View/download PDF
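The mesh-to-grid mapping the abstract above describes can be prototyped with scattered-data interpolation; SciPy's griddata stands in here for the coupler's actual transfer operators.

import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
tri_pts = rng.uniform(0.0, 1.0, size=(500, 2))       # XGC-like unstructured nodes
charge = np.sin(2 * np.pi * tri_pts[:, 0])           # charge density on the mesh

# GENE-like logically rectangular grid, kept inside the point cloud's hull.
gx, gy = np.meshgrid(np.linspace(0.05, 0.95, 32), np.linspace(0.05, 0.95, 32))
on_grid = griddata(tri_pts, charge, (gx, gy), method="linear")
print(float(np.nanmax(np.abs(on_grid - np.sin(2 * np.pi * gx)))))  # mapping error

The production coupler must also move between real and Fourier representations and respect the two codes' different time integrators, which is where most of the real complexity lies.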
22. Scaling Deep Learning for Cancer with Advanced Workflow Storage Integration
- Author
- Justin M. Wozniak, Ian Foster, Rick Stevens, Jonathan Ozik, Thomas Brettin, Nicholson Collier, Philip E. Davis, Tong Shu, and Manish Parashar
- Subjects
- Petascale computing, Workflow, Artificial neural network, business.industry, Computer science, Distributed computing, Deep learning, Cache, Artificial intelligence, business, Supercomputer, Sketch, Exascale computing
- Abstract
Cancer Deep Learning Environment (CANDLE) benchmarks and workflows will combine the power of exascale computing with neural network-based machine learning to address a range of loosely connected problems in cancer research. This application area poses unique challenges to the exascale computing environment. Here, we identify one challenge in CANDLE workflows, namely, saving neural network model representations to persistent storage. In this paper, we provide background on this problem, describe our solution, the Model Cache, and present performance results from running the system on a test cluster, ANL/LCRC Blues, and the petascale supercomputer NERSC Cori. We also sketch next steps for this promising workflow storage solution.
- Published
- 2018
- Full Text
- View/download PDF
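The storage challenge named above, persisting many neural network model representations, invites content-addressed deduplication: hash the serialized weights and write each distinct model once. This sketch illustrates that general idea only; it is not the CANDLE Model Cache implementation.

import hashlib
import os
import pickle
import tempfile

class ModelCache:
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, weights):
        blob = pickle.dumps(weights)
        key = hashlib.sha256(blob).hexdigest()        # content-addressed name
        path = os.path.join(self.root, key + ".pkl")
        if not os.path.exists(path):                  # write once per distinct model
            with open(path, "wb") as f:
                f.write(blob)
        return key

cache = ModelCache(tempfile.mkdtemp())
weights = {"dense0": [0.1, 0.2, 0.3]}
print(cache.save(weights) == cache.save(weights))     # second save is a no-op -> True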
23. Stacker: An Autonomic Data Movement Engine for Extreme-Scale Data Staging-Based In-Situ Workflows
- Author
- Shaohua Duan, Scott Klasky, Pradeep Subedi, Hemanth Kolla, Manish Parashar, and Philip E. Davis
- Subjects
- 020203 distributed computing, Random access memory, Memory hierarchy, Computer science, Distributed computing, Stacker, 020207 software engineering, 02 engineering and technology, Supercomputer, Persistence (computer science), Data modeling, Dataspaces, Workflow, Server, Scalability, 0202 electrical engineering, electronic engineering, information engineering, Latency (engineering)
- Abstract
Data staging and in-situ workflows are being explored extensively as an approach to address data-related costs at very large scales. However, the impact of emerging storage architectures (e.g., deep memory hierarchies and burst buffers) upon data staging solutions remains a challenge. In this paper, we investigate how burst buffers can be effectively used by data staging solutions, for example, as a persistent storage tier of the memory hierarchy. Furthermore, we use machine learning based prefetching techniques to move data between the storage levels in an autonomous manner. We also present Stacker, a prototype of the proposed solutions implemented within the DataSpaces data staging service, and experimentally evaluate its performance and scalability using the S3D combustion workflow on current leadership-class platforms. Our experiments demonstrate that Stacker achieves low-latency, high-volume data staging with low overheads as compared to in-memory staging services for production scientific workflows.
- Published
- 2018
- Full Text
- View/download PDF
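A compact sketch of the two-tier staging idea above: a small in-memory tier backed by a burst-buffer tier, with spill on pressure, promotion on access, and a prefetch hook for predicted data. Capacities and policies are illustrative, not Stacker's implementation.

import collections

class TieredStage:
    def __init__(self, mem_capacity=2):
        self.mem = collections.OrderedDict()   # fast tier, kept in LRU order
        self.bb = {}                           # burst-buffer / persistent tier
        self.capacity = mem_capacity

    def put(self, key, data):
        self.mem[key] = data
        self.mem.move_to_end(key)
        while len(self.mem) > self.capacity:   # spill least-recently-used data
            old_key, old_data = self.mem.popitem(last=False)
            self.bb[old_key] = old_data

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        data = self.bb.pop(key)                # promote on access
        self.put(key, data)
        return data

    def prefetch(self, key):
        if key in self.bb:                     # move predicted data up early
            self.put(key, self.bb.pop(key))

stage = TieredStage()
for i in range(4):
    stage.put(f"step{i}", i)
stage.prefetch("step0")
print(list(stage.mem), list(stage.bb))   # step0 promoted; older steps spilled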
24. Coupling Exascale Multiphysics Applications: Methods and Lessons Learned
- Author
- Choong-Seock Chang, Mark Ainsworth, Dave Pugmire, Frank Jenko, Greg Eisenhauer, Stephane Ethier, Allen D. Malony, Matthew Wolf, Franck Cappello, Kenneth Moreland, Norbert Podhorszki, Seung-Hoe Ku, Manish Parashar, Mark Kim, Scott Klasky, Sheng Di, Tom Peterka, Berk Geveci, Ozan Tugluk, Ben Whitney, Jong Youl Choi, Philip E. Davis, Julien Dominski, Ian Foster, Kshitij Mehta, Todd Munson, Hanqi Guo, E. Suchyta, Kevin Huck, Bryce Allen, Jeremy Logan, Chad Wood, Gabriele Merlo, James Kress, Qing Liu, Ruonan Wang, and Michael Churchill
- Subjects
- Coupling, Computational complexity theory, Computer science, Multiphysics, 010103 numerical & computational mathematics, 01 natural sciences, 010305 fluids & plasmas, Online analysis, Visualization, Titan (supercomputer), Computer architecture, 0103 physical sciences, Performance monitoring, Workflow scheduling, 0101 mathematics
- Abstract
With the growing computational complexity of science and the complexity of new and emerging hardware, it is time to re-evaluate the traditional monolithic design of computational codes. One new paradigm is constructing larger scientific computational experiments from the coupling of multiple individual scientific applications, each targeting their own physics, characteristic lengths, and/or scales. We present a framework constructed by leveraging capabilities such as in-memory communications, workflow scheduling on HPC resources, and continuous performance monitoring. This code coupling capability is demonstrated by a fusion science scenario, where differences between the plasma at the edges and at the core of a device have different physical descriptions. This infrastructure not only enables the coupling of the physics components, but it also connects in situ or online analysis, compression, and visualization that accelerate the time between a run and the analysis of the science content. Results from runs on Titan and Cori are presented as a demonstration.
- Published
- 2018
- Full Text
- View/download PDF
25. The SFR-M$_*$ Correlation Extends to Low Mass at High Redshift
- Author
- Romeel Davé, Dritan Kodra, Philip E. Davis, Eric Gawiser, Rachel S. Somerville, Anton M. Koekemoer, Steven L. Finkelstein, J. A. Newman, Kartheik Iyer, P. Kurczynski, and Camilla Pacifici
- Subjects
- Physics, FOS: Physical sciences, Astronomy and Astrophysics, Astrophysics::Cosmology and Extragalactic Astrophysics, Astrophysics, 01 natural sciences, Astrophysics - Astrophysics of Galaxies, Redshift, photometric [techniques], Correlation, 010104 statistics & probability, Space and Planetary Science, Astrophysics of Galaxies (astro-ph.GA), 0103 physical sciences, star formation [galaxies], 0101 mathematics, Low Mass, 010303 astronomy & astrophysics, Astrophysics::Galaxy Astrophysics, evolution [galaxies]
- Abstract
To achieve a fuller understanding of galaxy evolution, SED fitting can be used to recover quantities beyond stellar masses (M$_*$) and star formation rates (SFRs). We use Star Formation Histories (SFHs) reconstructed via the Dense Basis method of Iyer & Gawiser (2017) for a sample of $17{,}873$ galaxies at $0.5 < z < 6$ [...]. The evolution of the correlation is well described by $\log \mathrm{SFR} = (0.80 \pm 0.029 - 0.017 \pm 0.010 \times t_{\rm univ}) \log M_* - (6.487 \pm 0.282 - 0.039 \pm 0.008 \times t_{\rm univ})$, where $t_{\rm univ}$ is the age of the universe in Gyr. (22 pages, 10 figures. Accepted for publication in ApJ.)
- Published
- 2018
- Full Text
- View/download PDF
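For convenience, the quoted best-fit relation can be evaluated directly (central values only, uncertainties dropped):

def log_sfr(log_mstar, t_univ_gyr):
    # log SFR = (0.80 - 0.017 t) log M* - (6.487 - 0.039 t), with t in Gyr
    slope = 0.80 - 0.017 * t_univ_gyr
    intercept = 6.487 - 0.039 * t_univ_gyr
    return slope * log_mstar - intercept

# Example: a 10^9 solar-mass galaxy when the universe was ~3.3 Gyr old.
print(round(log_sfr(9.0, 3.3), 3))   # ~0.34, i.e. roughly 2 solar masses per year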
26. Scalable Data Resilience for In-memory Data Staging
- Author
- Manish Parashar, Pradeep Subedi, Keita Teranishi, Hemanth Kolla, Philip E. Davis, Shaohua Duan, and Marc Gamell
- Subjects
- Mean time between failures, business.industry, Computer science, Distributed computing, 020207 software engineering, Fault tolerance, 02 engineering and technology, Storage efficiency, Data recovery, Data modeling, Dataspaces, Data access, Server, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Erasure code, business
- Abstract
The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing costs of data movement and IO, and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., executing the entire application workflow on the HPC system, have emerged as an attractive approach to address data-related challenges by moving computations closer to the data, and staging-based frameworks have been effectively used to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, and they remain susceptible to expensive data failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure codes can impact latency and/or result in significant storage overheads. In this paper, we present CoREC, a scalable, resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. The paper also presents optimizations for load balancing and conflict-avoiding encoding, and a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime, deployed it with the DataSpaces staging service on Titan at ORNL, and present an experimental evaluation in the paper. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.
- Published
- 2018
- Full Text
- View/download PDF
27. Scalable Parallelization of a Markov Coalescent Genealogy Sampler
- Author
- Greg Wolffe, Adam M. Terwilliger, David Zeitler, and Philip E. Davis
- Subjects
- 0301 basic medicine, Theoretical computer science, Markov chain, Computer science, Sampling (statistics), Population genetics, Markov process, Markov chain Monte Carlo, Parallel computing, Scalable parallelism, Genealogy, Coalescent theory, 03 medical and health sciences, CUDA, symbols.namesake, 030104 developmental biology, Scalability, symbols, Leverage (statistics)
- Abstract
Coalescent genealogy samplers are effective tools for the study of population genetics. They are used to estimate the historical parameters of a population based upon the sampling of present-day genetic information. A popular approach employs Markov chain Monte Carlo (MCMC) methods. While effective, these methods are very computationally intensive, often taking weeks to run. Although attempts have been made to leverage parallelism in an effort to reduce runtimes, they have not resulted in scalable solutions. Due to the inherently sequential nature of MCMC methods, their performance has suffered diminishing returns when applied to large-scale computing clusters. In the interests of reduced runtimes and higher quality solutions, a more sophisticated form of parallelism is required. This paper describes a novel way to apply a recently discovered generalization of MCMC for this purpose. The new approach exploits the multiple-proposal mechanism of the generalized method to enable the desired scalable parallelism while maintaining the accuracy of the original technique.
- Published
- 2017
- Full Text
- View/download PDF
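The multiple-proposal mechanism can be illustrated with classic multiple-try Metropolis, which stands in here for the generalized scheme the paper actually uses: each step evaluates many proposals in parallel while preserving the target distribution.

import numpy as np

def mtm_step(x, log_pi, rng, k=8, sigma=0.5):
    # One multiple-try Metropolis step with a symmetric Gaussian proposal;
    # the k trial evaluations are independent and can run in parallel.
    trials = x + sigma * rng.standard_normal(k)
    w_trials = np.exp(log_pi(trials))
    y = rng.choice(trials, p=w_trials / w_trials.sum())          # pi-weighted pick
    refs = np.append(y + sigma * rng.standard_normal(k - 1), x)  # reference set
    w_refs = np.exp(log_pi(refs))
    if rng.random() < min(1.0, w_trials.sum() / w_refs.sum()):   # MTM acceptance
        return y
    return x

rng = np.random.default_rng(1)
log_pi = lambda z: -0.5 * np.asarray(z) ** 2          # standard normal target
x, chain = 0.0, []
for _ in range(5000):
    x = mtm_step(x, log_pi, rng)
    chain.append(x)
print(round(float(np.mean(chain)), 2), round(float(np.var(chain)), 2))  # near 0 and 1

Because all k trial densities are evaluated independently, the per-step work maps naturally onto cluster (or GPU) parallelism, without the diminishing returns of simply running many short independent chains.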
28. Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
- Author
- Ian Foster, Mark Ainsworth, Bryce Allen, Julie Bessac, Franck Cappello, Jong Youl Choi, Emil Constantinescu, Philip E. Davis, Sheng Di, Wendy Di, Hanqi Guo, Scott Klasky, Kerstin Kleese Van Dam, Tahsin Kurc, Qing Liu, Abid Malik, Kshitij Mehta, Klaus Mueller, Todd Munson, George Ostrouchov, Manish Parashar, Tom Peterka, Line Pouchard, Dingwen Tao, Ozan Tugluk, Stefan Wild, Matthew Wolf, Justin M. Wozniak, Wei Xu, and Shinjae Yoo
- Subjects
- Focus (computing), Computer science, business.industry, Computation, 020207 software engineering, 02 engineering and technology, Supercomputer, Data science, Reduction (complexity), Software, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Systems design, 020201 artificial intelligence & image processing, business
- Abstract
A growing disparity between supercomputer computation speeds and I/O rates makes it increasingly infeasible for applications to save all results for offline analysis. Instead, applications must analyze and reduce data online so as to output only those results needed to answer target scientific question(s). This change in focus complicates application and experiment design and introduces algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of supercomputer systems. I review these challenges and describe methods and tools that various groups, including mine, are developing to enable experimental exploration of algorithmic, software, and system design alternatives.
- Published
- 2017
- Full Text
- View/download PDF
29. William James and a New Way of Thinking about Logic
- Author
- Philip E. Davis
- Subjects
- Philosophy, Epistemology
- Published
- 2005
- Full Text
- View/download PDF
30. Democracy and law
- Author
- Philip E. Davis
- Subjects
- Public law, Representative democracy, media_common.quotation_subject, Political science, Law, Direct democracy, Comparative law, Chinese law, Liberal democracy, Municipal law, Democracy, media_common
- Published
- 1964
- Full Text
- View/download PDF
31. THE MORAL CONTENT OF LAW
- Author
- Philip E. Davis
- Subjects
- Philosophy, Law, Sociology, Content (Freudian dream analysis), Social psychology
- Published
- 1971
- Full Text
- View/download PDF
32. THE IS-OUGHT PROBLEM: Its History, Analysis, and Dissolution, by William H. Bruening. Washington, D.C.: University Press of America, 1978
- Author
- Philip E. Davis
- Subjects
- Philosophy, Political science, Economic history, Media studies, Is–ought problem
- Published
- 1978
- Full Text
- View/download PDF
33. 'ACTION' AND 'CAUSE OF ACTION'
- Author
- Philip E. Davis
- Subjects
- Philosophy, Action (philosophy), Cause of action, Psychology, Neuroscience
- Published
- 1962
- Full Text
- View/download PDF
34. Modern Logic in the Service of Law, by Ilmar Tammelo
- Author
- Philip E. Davis
- Subjects
- Service (business), Philosophy, Law, Business
- Published
- 1981
- Full Text
- View/download PDF