233 results for "Scott Klasky"
Search Results
2. zMesh: Theories and Methods to Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh Refinement
- Author
Huizhang Luo, Junqi Wang, Qing Liu, Jieyang Chen, Scott Klasky, and Norbert Podhorszki
- Subjects
Computational Theory and Mathematics, Hardware and Architecture, Signal Processing
- Published
- 2022
- Full Text
- View/download PDF
3. MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction
- Author
Lipeng Wan, David Pugmire, Xin Liang, Matthew Wolf, Dingwen Tao, Jieyang Chen, James Kress, Scott Klasky, Qing Liu, Norbert Podhorszki, and Ben Whitney
- Subjects
FOS: Computer and information sciences, Computer science, Lossy compression, Theoretical Computer Science, Data modeling, Reduction (complexity), Computer Science - Distributed, Parallel, and Cluster Computing, Computational Theory and Mathematics, Computer engineering, Hardware and Architecture, Compression ratio, Decomposition (computer science), Distributed, Parallel, and Cluster Computing (cs.DC), Decomposition method (constraint satisfaction), Error detection and correction, Software, Data compression
- Abstract
Data management is becoming increasingly important in dealing with the large amounts of data produced by large-scale scientific simulations and instruments. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale, but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70X, and the proposed compression method can improve compression ratio by up to 2X compared with other state-of-the-art error-bounded lossy compressors under the same level of data distortion.
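To make the level-wise quantization idea concrete, here is a minimal NumPy sketch, not MGARD+'s actual implementation: each level of a multilevel coefficient hierarchy gets its own uniform quantizer whose bin size is derived from a per-level error tolerance (the level sizes and tolerances below are invented for illustration).

```python
import numpy as np

def quantize_by_level(levels, tolerances):
    """Quantize each level of multilevel coefficients with its own tolerance."""
    codes, bins = [], []
    for coeffs, tol in zip(levels, tolerances):
        bin_size = 2.0 * tol  # round-to-nearest keeps |error| <= tol
        codes.append(np.round(coeffs / bin_size).astype(np.int64))
        bins.append(bin_size)
    return codes, bins

def dequantize(codes, bins):
    return [c * b for c, b in zip(codes, bins)]

# Toy hierarchy: the coarse level gets a tight tolerance, finer levels looser ones.
rng = np.random.default_rng(0)
levels = [rng.standard_normal(n) for n in (8, 64, 512)]
tolerances = [1e-4, 1e-3, 1e-2]
codes, bins = quantize_by_level(levels, tolerances)
for original, restored, tol in zip(levels, dequantize(codes, bins), tolerances):
    assert np.max(np.abs(original - restored)) <= tol
```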
- Published
- 2022
- Full Text
- View/download PDF
4. Identifying challenges and opportunities of in-memory computing on large HPC systems
- Author
Dan Huang, Zhenlu Qin, Qing Liu, Norbert Podhorszki, and Scott Klasky
- Subjects
Artificial Intelligence, Computer Networks and Communications, Hardware and Architecture, Software, Theoretical Computer Science
- 2022
- Full Text
- View/download PDF
5. An Algorithmic and Software Pipeline for Very Large Scale Scientific Data Compression with Error Guarantees
- Author
Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Ruonan Wang, Scott Klasky, Anand Rangarajan, and Sanjay Ranka
- Published
- 2022
- Full Text
- View/download PDF
6. Hybrid Analysis of Fusion Data for Online Understanding of Complex Science on Extreme Scale Computers
- Author
Eric Suchyta, Jong Youl Choi, Seung-Hoe Ku, David Pugmire, Ana Gainaru, Kevin Huck, Ralph Kube, Aaron Scheinberg, Frederic Suter, Choongseock Chang, Todd Munson, Norbert Podhorszki, and Scott Klasky
- Published
- 2022
- Full Text
- View/download PDF
7. Online data analysis and reduction: An important Co-design motif for extreme-scale computers
- Author
Todd Munson, Ian Foster, Shinjae Yoo, Hubertus J. J. van Dam, Igor Yakushin, Zichao Di, Line Pouchard, Manish Parashar, Kerstin Kleese van Dam, Ali Murat Gok, Kevin Huck, Xin Liang, Ozan Tugluk, Lipeng Wan, Justin M. Wozniak, Wei Xu, Kshitij Mehta, Jong Choi, Matthew Wolf, Mark Ainsworth, Julie Bessac, Franck Cappello, Sheng Di, Tom Peterka, Hanqi Guo, Scott Klasky, Christopher Kelly, and Tong Shu
- Subjects
Co-design, Computer science, Computation, Supercomputer, Exascale computing, Theoretical Computer Science, Computational science, Reduction (complexity), Motif (narrative), Hardware and Architecture, Extreme scale, Software
- Abstract
A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze supercomputer application output only after that output has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as interprocess communications. The resulting parallel computing motif, online data analysis and reduction (ODAR), has important implications for both application and HPC systems design. Here we introduce the ODAR motif and its co-design concerns, describe a co-design process for identifying and addressing those concerns, present tools that assist in the co-design process, and present case studies to illustrate the use of the process and tools in practical settings.
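As a concrete, deliberately toy picture of the ODAR motif (not the paper's tooling), the sketch below runs a stand-in "simulation" concurrently with a reduction task, exchanging steps through memory rather than the file system; the process layout and sizes are invented for illustration.

```python
import numpy as np
from multiprocessing import Process, Queue

def simulation(out: Queue, steps: int = 10, n: int = 1_000_000) -> None:
    """Stand-in data producer: one array per simulation step."""
    rng = np.random.default_rng(42)
    for step in range(steps):
        out.put((step, rng.standard_normal(n)))
    out.put(None)  # end-of-stream marker

def analysis(inp: Queue) -> None:
    """Online reduction: consume each step while the producer keeps running."""
    while (item := inp.get()) is not None:
        step, field = item
        print(f"step {step}: mean={field.mean():+.4f} max={field.max():.4f}")

if __name__ == "__main__":
    q = Queue(maxsize=2)  # bounded queue applies back-pressure to the producer
    consumer = Process(target=analysis, args=(q,))
    consumer.start()
    simulation(q)
    consumer.join()
```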
- Published
- 2021
- Full Text
- View/download PDF
8. The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science
- Author
Shuangxi Zhang, Berk Geveci, Matthew Wolf, Kevin Huck, E. Suchyta, Cameron W. Smith, Ruonan Wang, Stephane Ethier, Philip E. Davis, Manish Parashar, Pradeep Subedi, Gabriele Merlo, Abolaji Adesoji, Norbert Podhorszki, Qing Liu, Todd Munson, Shirley Moore, Mark S. Shephard, C.S. Chang, Jeremy Logan, Jong Choi, Lipeng Wan, Kai Germaschewski, David Pugmire, Ian Foster, Scott Klasky, Kshitij Mehta, Chris Harris, and Julien Dominski
- Subjects
Fusion, Computer science, Code coupling, Theoretical Computer Science, Computational science, High fidelity, Workflow, Hardware and Architecture, Software
- Abstract
We present the Exascale Framework for High Fidelity coupled Simulations (EFFIS), a workflow and code coupling framework developed as part of the Whole Device Modeling Application (WDMApp) in the Exascale Computing Project. EFFIS consists of a library, command line utilities, and a collection of run-time daemons. Together, these software products enable users to easily compose and execute workflows that include: strong or weak coupling, in situ (or offline) analysis/visualization/monitoring, command-and-control actions, remote dashboard integration, and more. We describe WDMApp physics coupling cases and computer science requirements that motivate the design of the EFFIS framework. Furthermore, we explain the essential enabling technology that EFFIS leverages: ADIOS for performant data movement, PerfStubs/TAU for performance monitoring, and an advanced COUPLER for transforming coupling data from its native format to the representation needed by another application. Finally, we demonstrate EFFIS using coupled multi-simulation WDMApp workflows and exemplify how the framework supports the project’s needs. We show that EFFIS and its associated services for data movement, visualization, and performance collection do not introduce appreciable overhead to the WDMApp workflow and that the resource-dominant application’s idle time while waiting for data is minimal.
- Published
- 2021
- Full Text
- View/download PDF
9. Exploring Large All-Flash Storage System with Scientific Simulation
- Author
Junmin Gu, Greg Eisenhauer, Scott Klasky, Norbert Podhorszki, Ruonan Wang, and Kesheng Wu
- Published
- 2022
- Full Text
- View/download PDF
10. Region-adaptive, Error-controlled Scientific Data Compression using Multilevel Decomposition
- Author
Qian Gong, Ben Whitney, Chengzhu Zhang, Xin Liang, Anand Rangarajan, Jieyang Chen, Lipeng Wan, Paul Ullrich, Qing Liu, Robert Jacob, Sanjay Ranka, and Scott Klasky
- Published
- 2022
- Full Text
- View/download PDF
11. Error-Bounded Learned Scientific Data Compression with Preservation of Derived Quantities
- Author
Jaemoon Lee, Qian Gong, Jong Choi, Tania Banerjee, Scott Klasky, Sanjay Ranka, and Anand Rangarajan
- Subjects
Fluid Flow and Transfer Processes, Process Chemistry and Technology, General Engineering, General Materials Science, data compression, autoencoders, error guarantees, moment preservation, constraint satisfaction, quantization, fusion application, Instrumentation, Computer Science Applications
- Abstract
Scientific applications continue to grow and produce extremely large amounts of data, which require efficient compression algorithms for long-term storage. Compression errors in scientific applications can have a deleterious impact on downstream processing. Thus, it is crucial to preserve all the “known” Quantities of Interest (QoI) during compression. To address this issue, most existing approaches guarantee the reconstruction error of the original data or primary data (PD), but cannot directly control the error in the QoI. In this work, we propose a physics-informed compression technique that is composed of two parts: (i) reduction of the PD with bounded errors and (ii) preservation of the QoI. In the first step, we combine tensor decompositions, autoencoders, product quantizers, and error-bounded lossy compressors to bound the reconstruction error at high levels of compression. In the second step, we use constraint-satisfaction post-processing followed by quantization to preserve the QoI. To illustrate the challenges of reducing the reconstruction errors of the PD and QoI, we focus on simulation data generated by a large-scale fusion code, XGC, which can produce tens of petabytes in a single day. The results show that our approach can achieve high compression ratios while accurately preserving the QoI within scientifically acceptable bounds.
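For the constraint-satisfaction step, one standard construction (illustrative here, not necessarily the paper's exact procedure) is the minimum-norm correction that makes a set of linear QoIs exact after decompression:

```python
import numpy as np

def preserve_linear_qoi(x_rec, A, q_true):
    """Minimum-norm correction so that the linear QoIs A @ x match q_true.

    x_rec  : lossy reconstruction, shape (n,)
    A      : QoI matrix, shape (m, n); row i defines QoI_i(x) = A[i] @ x
    q_true : QoI values computed from the original data, shape (m,)
    """
    residual = q_true - A @ x_rec
    # Closest point (in the 2-norm) to x_rec on the affine set {x : A x = q_true}.
    return x_rec + A.T @ np.linalg.solve(A @ A.T, residual)

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)                       # original data
x_rec = x + 1e-2 * rng.standard_normal(n)        # stand-in for a lossy reconstruction
A = np.vstack([np.ones(n) / n,                   # QoI 1: mean
               np.linspace(-1.0, 1.0, n) / n])   # QoI 2: a weighted first moment
x_fixed = preserve_linear_qoi(x_rec, A, A @ x)
assert np.allclose(A @ x_fixed, A @ x)           # QoIs now preserved exactly
```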
- Published
- 2022
- Full Text
- View/download PDF
12. P-ckpt: Coordinated Prioritized Checkpointing
- Author
Subhendu Behera, Lipeng Wan, Frank Mueller, Matthew Wolf, and Scott Klasky
- Published
- 2022
- Full Text
- View/download PDF
13. Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization
- Author
Ruonan Wang, Lipeng Wan, Jean-Luc Vay, Scott Klasky, Jieyang Chen, Ian Foster, Todd Munson, Dmitry Ganyushin, Axel Huebl, Ana Gainaru, Xin Liang, Kesheng Wu, Junmin Gu, Norbert Podhorszki, and Franz Poeschel
- Subjects
Large class, FOS: Computer and information sciences, Optimization, Distributed databases, Computer science, Layout, Fidelity, IO performance, data access optimization, Computer Software, Heuristic algorithms, Arrays, Auxiliary memory, File system, Data layout, Communications Technologies, WarpX, Distributed database, Dynamic data, Computational modeling, Parallel IO, Exascale computing, Computational Theory and Mathematics, Computer architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Hardware and Architecture, Signal Processing, Performance evaluation, Distributed, Parallel, and Cluster Computing (cs.DC), Distributed Computing
- Abstract
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient read and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80%.
- Published
- 2022
14. The Adaptable IO System (ADIOS)
- Author
David Pugmire, Norbert Podhorszki, Scott Klasky, Matthew Wolf, James Kress, Mark Kim, Nicholas Thompson, Jeremy Logan, Ruonan Wang, Kshitij Mehta, Eric Suchyta, William Godoy, Jong Choi, George Ostrouchov, Lipeng Wan, Jieyang Chen, Berk Geveci, Chuck Atkins, Caitlin Ross, Greg Eisenhauer, Junmin Gu, John Wu, Axel Huebl, and Seiji Tsutsumi
- Published
- 2022
- Full Text
- View/download PDF
15. The Need for Pervasive In Situ Analysis and Visualization (P-ISAV)
- Author
David Pugmire, Jian Huang, Kenneth Moreland, and Scott Klasky
- Published
- 2022
- Full Text
- View/download PDF
16. Maintaining Trust in Reduction: Preserving the Accuracy of Quantities of Interest for Lossy Compression
- Author
Qian Gong, Xin Liang, Ben Whitney, Jong Youl Choi, Jieyang Chen, Lipeng Wan, Stéphane Ethier, Seung-Hoe Ku, R. Michael Churchill, C. -S. Chang, Mark Ainsworth, Ozan Tugluk, Todd Munson, David Pugmire, Richard Archibald, and Scott Klasky
- Published
- 2022
- Full Text
- View/download PDF
17. Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition
- Author
Lipeng Wan, Jieyang Chen, Qian Gong, Norbert Podhorszki, Scott Klasky, Qing Liu, Ben Whitney, Rick Archibald, David Pugmire, and Xin Liang
- Subjects
Computer science, Reading (computer), Code refactoring, Data retrieval, Computer data storage, Range (statistics), Overhead (computing), Data mining, Error detection and correction, Data compression
- Abstract
Extreme-scale simulations and high-resolution instruments have been generating an increasing amount of data, which poses significant challenges not only to data storage during the run, but also to post-processing, where data will be repeatedly retrieved and analyzed over a long period of time. The challenge of satisfying a wide range of post-hoc analysis needs while minimizing the I/O overhead caused by inappropriate and/or excessive data retrieval cannot be left unmanaged. In this paper, we propose a data refactoring, compressing, and retrieval framework capable of 1) fine-grained data refactoring with regard to precision; 2) incrementally retrieving and recomposing the data in terms of various error bounds; and 3) adaptively retrieving data in multi-precision and multi-resolution with respect to different analyses. With the progressive data re-composition and the adaptable retrieval algorithms, our framework significantly reduces the amount of data retrieved when multiple incremental precisions are requested and/or the downstream analysis time when coarse resolution is used. Experiments show that the amount of data retrieved under the same progressively requested error bound using our framework is 64% less than that using state-of-the-art single-error-bounded approaches. Parallel experiments with up to 1,024 cores and ~600 GB of data in total show that our approach yields 1.36× and 2.52× performance over existing approaches when writing to and reading from persistent storage systems, respectively.
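A minimal sketch of the progressive idea, using a Haar-style hierarchy as a stand-in for the paper's multilevel decomposition (the signal, level count, and error bound below are invented; a real system would consult stored per-level error metadata instead of the reference data):

```python
import numpy as np

def haar_decompose(data, num_levels):
    """Split data into a coarse approximation plus per-level detail coefficients."""
    approx, details = data.astype(float), []
    for _ in range(num_levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / 2.0)  # details[0] is the finest level
        approx = (even + odd) / 2.0
    return approx, details

def refine(recon, detail):
    """Invert one Haar level: recover the finer signal from averages + details."""
    out = np.empty(recon.size * 2)
    out[0::2] = recon + detail
    out[1::2] = recon - detail
    return out

def progressive_retrieve(approx, details, err_bound, reference):
    """Fetch detail levels coarse-to-fine until the requested error bound is met."""
    recon, levels_used = approx, 0
    for detail in reversed(details):  # coarsest detail level first
        # Emulate the stored error estimate by comparing against the reference.
        full = np.repeat(recon, reference.size // recon.size)
        if np.max(np.abs(reference - full)) <= err_bound:
            break
        recon = refine(recon, detail)
        levels_used += 1
    return recon, levels_used

rng = np.random.default_rng(2)
signal = np.cumsum(rng.standard_normal(1024))   # smooth-ish test signal
coarse, details = haar_decompose(signal, num_levels=5)
recon, used = progressive_retrieve(coarse, details, err_bound=1.0, reference=signal)
print(f"retrieved {used}/5 detail levels")
```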
- Published
- 2021
- Full Text
- View/download PDF
18. Unbalanced Parallel I/O: An Often-Neglected Side Effect of Lossy Scientific Data Compression
- Author
Xinying Wang, Lipeng Wan, Jieyang Chen, Qian Gong, Ben Whitney, Jinzhen Wang, Ana Gainaru, Qing Liu, Norbert Podhorszki, Dongfang Zhao, Feng Yan, and Scott Klasky
- Published
- 2021
- Full Text
- View/download PDF
19. A codesign framework for online data analysis and reduction
- Author
Keichi Takahashi, Igor Yakushin, Kevin Huck, Swati Singhal, Bryce Allen, Jeremy Logan, Todd Munson, Kshitij Mehta, Alan Sussman, E. Suchyta, Jong Youl Choi, Matthew Wolf, Ian Foster, and Scott Klasky
- Subjects
Reduction (complexity), Workflow, Computational Theory and Mathematics, Computer Networks and Communications, Computer science, Embedded system, Software, Computer Science Applications, Theoretical Computer Science
- 2021
- Full Text
- View/download PDF
20. Multilevel Techniques for Compression and Reduction of Scientific Data—The Unstructured Case
- Author
Mark Ainsworth, Ozan Tugluk, Ben Whitney, and Scott Klasky
- Subjects
Reduction (complexity), Computational Mathematics, Applied Mathematics, Compression (functional analysis), Polygon mesh, Unstructured data, Lossy compression, Mathematics, Computational science, Data reduction
- Abstract
Previous work on multilevel techniques for compression and reduction of scientific data is extended to the case of data given on unstructured meshes in two and three dimensions. The centerpiece of ...
- Published
- 2020
- Full Text
- View/download PDF
21. Estimating Lossy Compressibility of Scientific Data Using Deep Neural Networks
- Author
Jieyang Chen, Jinzhen Wang, Scott Klasky, Dave Pugmire, Qing Liu, Norbert Podhorszki, and Zhenlu Qin
- Subjects
Artificial neural network, Computer science, Deep learning, Sampling (statistics), Lossy compression, Supercomputer, Overhead (computing), Data mining, Artificial intelligence, Volume (compression), Data compression
- Abstract
Simulation-based scientific applications generate increasingly large amounts of data on high-performance computing (HPC) systems. To allow data to be stored and analyzed efficiently, data compression is often utilized to reduce the volume and velocity of data. However, a question often raised by domain scientists is what level of compression can be expected, so that they can make more informed decisions balancing accuracy and performance. In this letter, we propose a deep neural network based approach for estimating the compressibility of scientific data. To train the neural network, we build both general and compressor-specific features so that the characteristics of both the data and the lossy compressors are captured in training. Our approach is demonstrated to outperform a prior analytical model as well as a sampling-based approach in the case of biased estimation, i.e., for SZ. For unbiased estimation (i.e., ZFP), however, the sampling-based approach yields the best accuracy, despite the high overhead involved in sampling the target dataset.
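A toy version of the estimation pipeline, with invented features and placeholder labels (a real setup would label each training block with the ratio measured by actually running the target compressor offline):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def features(block):
    """Cheap general statistics standing in for the paper's feature sets."""
    d = np.diff(block)
    return [block.mean(), block.std(), np.abs(d).mean(), d.std(),
            np.percentile(np.abs(block), 99)]

rng = np.random.default_rng(3)
blocks = [np.cumsum(rng.standard_normal(4096)) * s
          for s in rng.uniform(0.1, 10.0, size=200)]
X = np.array([features(b) for b in blocks])
y = rng.uniform(2.0, 50.0, size=200)  # placeholder for measured compression ratios

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, y)
predicted_ratio = model.predict(X[:1])[0]  # estimate without running the compressor
```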
- Published
- 2020
- Full Text
- View/download PDF
22. Characterizing Output Bottlenecks of a Production Supercomputer
- Author
David A. Dillow, Scott Klasky, Bing Xie, Jong Youl Choi, Christopher Zimmer, Sarp Oral, Jay Lofstead, Norbert Podhorszki, and Jeffrey S. Chase
- Subjects
File system, Statistical benchmarking, Computer science, Benchmarking, Parallel computing, Supercomputer, Software, Titan (supercomputer), Hardware and Architecture, Lustre (file system), Data striping
- Abstract
This article studies the I/O write behaviors of the Titan supercomputer and its Lustre parallel file stores under production load. The results can inform the design, deployment, and configuration of file systems along with the design of I/O software in the application, operating system, and adaptive I/O libraries. We propose a statistical benchmarking methodology to measure write performance across I/O configurations, hardware settings, and system conditions. Moreover, we introduce two relative measures to quantify the write-performance behaviors of hardware components under production load. In addition to designing experiments and benchmarking on Titan, we verify the experimental results on one real application and one real application I/O kernel, XGC and HACC IO, respectively. These two are representative and widely used to address the typical I/O behaviors of applications. In summary, we find that Titan’s I/O system is variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write sharing of files across clients (compute nodes). I/O parallelism is most effective when the application—or its I/O libraries—distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good locations” in the machine or in the file system: component performance is driven by transient load conditions and past performance is not a useful predictor of future performance. For example, we do not observe diurnal load patterns that are predictable.
- Published
- 2019
- Full Text
- View/download PDF
23. Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?
- Author
Norbert Podhorszki, Jeremy Logan, George Ostrouchov, Qing Liu, Jong Choi, Xubin He, Matthew Wolf, Dan Huang, and Scott Klasky
- Subjects
Input/output, Computer science, Quality of service, Distributed computing, Bandwidth throttling, Theoretical Computer Science, Computational Theory and Mathematics, Hardware and Architecture, Computer data storage, Software
- Abstract
For a production high-performance computing (HPC) system, where storage devices are shared between multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose a balanced messaging-based re-routing scheme in conjunction with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that can scale to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance is improved, while limiting the impact on reads by throttling re-routing. An analytical model is derived to guide the choice of the optimal throttling factor. We thoroughly analyze the virtual messaging layer overhead and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified against a synthetic benchmark and is also used by production applications.
- Published
- 2019
- Full Text
- View/download PDF
24. Harnessing Data Movement in Virtual Clusters for In-Situ Execution
- Author
Dan Huang, Qing Liu, Jun Wang, Norbert Podhorszki, Jeremy Logan, Scott Klasky, and Jong Youl Choi
- Subjects
Computer science, Distributed computing, Big data, Network virtualization, Provisioning, Virtualization, Bottleneck, Data modeling, Computational Theory and Mathematics, Hardware and Architecture, Asynchronous communication, Analytics, Signal Processing, Virtual network
- Abstract
As a result of increasing data volume and velocity, Big Data science at exascale has shifted towards the in-situ paradigm, where large scale simulations run concurrently alongside data analytics. With in-situ, data generated from simulations can be processed while still in memory, thereby avoiding the slow storage bottleneck. However, running simulations and analytics together on shared resources will likely result in substantial contention if left unmanaged, as demonstrated in this work, leading to much reduced efficiency of simulations and analytics. Recently, virtualization technologies such as Linux containers have been widely applied to data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications in virtual clusters. To dynamically allocate network bandwidth where it is needed, we adopt SARIMA-based techniques to analyze and predict MPI traffic issued from simulations. Although this can be an effective technique, naive use of network virtualization can lead to performance degradation for bursty asynchronous transmissions within an MPI job. We analyze and resolve this performance degradation in virtual clusters.
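The SARIMA step might look like the following sketch, using statsmodels' SARIMAX with an invented traffic trace and untuned orders:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for observed per-interval MPI traffic (bytes), with a
# periodic burst every 12 intervals mimicking an iterative solver's cadence.
rng = np.random.default_rng(4)
t = np.arange(240)
traffic = 5e6 + 4e6 * (t % 12 == 0) + 2e5 * rng.standard_normal(t.size)

# Seasonal ARIMA with period 12; the orders here are illustrative, not tuned.
model = SARIMAX(traffic, order=(1, 0, 1), seasonal_order=(1, 0, 1, 12))
fit = model.fit(disp=False)

forecast = fit.forecast(steps=12)  # predicted traffic for the next 12 intervals
bursty = forecast > 7e6            # pre-allocate bandwidth where bursts loom
print(np.nonzero(bursty)[0])
```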
- Published
- 2019
- Full Text
- View/download PDF
25. Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
- Author
Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, and Axel Huebl
- Subjects
high performance computing, FOS: Computer and information sciences, openPMD, ADIOS, Computer Science - Distributed, Parallel, and Cluster Computing, big data, RDMA, streaming, Distributed, Parallel, and Cluster Computing (cs.DC)
- Abstract
This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need of strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.
- Published
- 2021
26. zMesh: Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh Refinement
- Author
Huizhang Luo, Junqi Wang, Norbert Podhorszki, Scott Klasky, Jieyang Chen, and Qing Liu
- Subjects
Tree (data structure), Tree structure, Redundancy (information theory), Adaptive mesh refinement, Computer science, Compression ratio, Overhead (computing), Coding and information theory, Lossy compression, Algorithm, Volume (compression)
- Abstract
Scientific simulations on high-performance computing systems produce vast amounts of data that need to be stored and analyzed efficiently. Lossy compression significantly reduces the data volume by trading accuracy for performance. Despite the recent success of lossy compressors such as ZFP and SZ, compression performance is still far from being able to keep up with the exponential growth of data. This paper aims to further take advantage of application characteristics, an area that is often under-explored, to improve the compression ratios of adaptive mesh refinement (AMR) - a widely used numerical solver that allows for an improved resolution in limited regions. We propose a level reordering technique, zMesh, to reduce the storage footprint of AMR applications. In particular, we group the data points that are mapped to the same or adjacent geometric coordinates such that the dataset is smoother and more compressible. Unlike prior work, where compression performance is affected by the overhead of metadata, this work re-generates the restore recipe using a chained tree structure, thus incurring no extra storage overhead for compressed data, which substantially improves the compression ratios. The results demonstrate that zMesh can improve the smoothness of data by 67.9% and 71.3% for Z-ordering and Hilbert, respectively. Overall, zMesh improves the compression ratios by up to 16.5% and 133.7% for ZFP and SZ, respectively. Although zMesh involves additional compute overhead for tree and restore recipe construction, we show that the cost can be amortized as the number of quantities to be compressed increases.
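The reordering intuition can be seen with a plain Z-order (Morton) sort; note that zMesh's actual scheme groups points across AMR levels via a chained tree, which this sketch does not attempt:

```python
import numpy as np

def morton2d(x, y, bits=16):
    """Interleave the bits of integer coordinates (Z-order / Morton code)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return code

# Toy point set: (i, j) cell coordinates carrying a smooth field value.
rng = np.random.default_rng(5)
ij = rng.integers(0, 256, size=(10_000, 2))
vals = np.sin(ij[:, 0] / 40.0) * np.cos(ij[:, 1] / 40.0)

order = np.argsort([morton2d(i, j) for i, j in ij])
reordered = vals[order]

# Smoothness proxy: total variation of the 1-D stream seen by the compressor.
tv_before = np.abs(np.diff(vals)).sum()
tv_after = np.abs(np.diff(reordered)).sum()
print(f"total variation: {tv_before:.1f} -> {tv_after:.1f}")
```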
- Published
- 2021
- Full Text
- View/download PDF
27. A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to Enable Near-Real-Time Analysis
- Author
Seung-Hoe Ku, Hyeon K. Park, C.S. Chang, Jong Choi, Ralph Kube, Robert Hager, T. Carroll, S. Kampel, Ruonan Wang, M. J. Choi, K. Silber, Scott Klasky, Eli Dart, B. S. Cho, Randy Michael Churchill, Matthew Wolf, and J. S. Park
- Subjects
Nuclear and High Energy Physics, Energy, Test data generation, Computer science, Mechanical Engineering, Real-time computing, Large scale data, Particle and Plasma Physics, Nuclear Energy and Engineering, Networking and Information Technology R&D (NITRD), Transfer (computing), General Materials Science, Civil and Structural Engineering
- Abstract
The global nature of the ITER project along with its projected approximately petabyte-per-day data generation presents not only a unique challenge but also an opportunity for the fusion community to rethink, optimize, and enhance our scientific discovery process. Recognizing this, collaborative research with computational scientists was undertaken over the past several years to create a framework for large-scale data movement across wide-area networks to enable global near-real-time analysis of fusion data. This would broaden the available computational resources for analysis/simulation and increase the number of researchers actively participating in experiments. An official demonstration of this framework for fast, large data transfer and real-time analysis was carried out between the KSTAR tokamak in Daejeon, Korea, and Princeton Plasma Physics Laboratory (PPPL) in Princeton, New Jersey. Streaming large data transfer, with near-real-time movie creation and analysis of the KSTAR electron cyclotron emission imaging data, was performed using the Adaptable Input Output (I/O) System (ADIOS) framework, and comparisons were made at PPPL with simulation results from the XGC1 code. These demonstrations were made possible utilizing an optimized network configuration at PPPL, which achieved over 8.8 Gbps (88% utilization) in throughput tests from the National Fusion Research Institute to PPPL. This demonstration showed the feasibility for large-scale data analysis of KSTAR data and provides a nascent framework to enable use of globally distributed computational and personnel resources in pursuit of scientific knowledge from the ITER experiment.
- Published
- 2021
28. Fides: A General Purpose Data Model Library for Streaming Data
- Author
David Pugmire, Caitlin Ross, Nicholas Thompson, James Kress, Chuck Atkins, Scott Klasky, and Berk Geveci
- Published
- 2021
- Full Text
- View/download PDF
29. Near real-time analysis of big fusion data on HPC systems
- Author
Minjun Choi, C.S. Chang, Ruonan Wang, Jong Choi, Ralph Kube, Scott Klasky, and R. Michael Churchill
- Subjects
Computer science, Real-time computing, Big data, Cloud computing, Virtualization, Visualization, Workflow, Data visualization, Parallel processing (DSP implementation), Benchmark (computing)
- Abstract
We are developing the Delta framework, which aims to tackle big-data challenges specific to fusion energy sciences. Delta can be used to connect fusion experiments to remote supercomputers. Streaming measurements to distributed compute resources allows high-dimensional data analysis to be performed automatically on a cadence that keeps up with experimental schedules. Making data analysis results available before the next experiment allows scientists to make more informed decisions about the configuration of upcoming experiments. Here we describe how Delta uses database and virtualization facilities, as well as high-performance computing, at the National Energy Research Scientific Computing Center to offer vertically integrated near real-time data analysis and visualization. We also report on ongoing efforts to port the data analysis part of Delta to graphics processing units, which show a reduction of the analysis wall-time for a benchmark workflow by about 35% when compared to a serial implementation.
- Published
- 2020
- Full Text
- View/download PDF
30. Taming I/O Variation on QoS-Less HPC Storage: What Can Applications Do?
- Author
Zhenbo Qiao, Qing Liu, Jieyang Chen, Norbert Podhorszki, and Scott Klasky
- Subjects
Input/output, Computer science, Distributed computing, Quality of service, Supercomputer, Plot (graphics), Bottleneck, Modeling and simulation, Load management, Computer data storage
- Abstract
As high-performance computing (HPC) is scaled up to exascale to accommodate new modeling and simulation needs, I/O has remained a major bottleneck in end-to-end scientific processes. Nevertheless, prior work in this area mostly aimed to maximize average performance, and there has been a lack of studies and solutions that can manage I/O performance variation on HPC systems. This work aims to take advantage of storage characteristics and explore application-level solutions that are interference-aware. In particular, we monitor the performance of data analytics and estimate the state of shared storage resources using the discrete Fourier transform (DFT). If heavy I/O interference is predicted to occur at a given timestep, data analytics can dynamically adapt to the environment by lowering the accuracy and performing partial or no augmentation from the shared storage, dictated by an augmentation-bandwidth plot. We evaluate three data analytics workloads, XGC, GenASiS, and Jet, on Chameleon, and quantitatively demonstrate that both the average and the variation of I/O performance can be vastly improved using our dynamic augmentation, with the mean and variance improved by as much as 67% and 96%, respectively, while maintaining acceptable data analysis outcomes.
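A minimal sketch of the DFT-based estimation, with a synthetic bandwidth trace and illustrative thresholds:

```python
import numpy as np

def dominant_period(bandwidth, dt=1.0):
    """Find the strongest periodic component of an observed bandwidth trace."""
    spectrum = np.fft.rfft(bandwidth - bandwidth.mean())
    freqs = np.fft.rfftfreq(bandwidth.size, d=dt)
    k = np.argmax(np.abs(spectrum[1:])) + 1  # skip the DC bin
    return 1.0 / freqs[k]

# Synthetic trace: shared storage slows down every 30 s due to a noisy neighbor.
rng = np.random.default_rng(6)
t = np.arange(0, 600)
bw = 10.0 - 6.0 * (t % 30 < 5) + 0.5 * rng.standard_normal(t.size)

period = dominant_period(bw)
next_slowdown = period * (np.floor(t[-1] / period) + 1)
print(f"estimated interference period ~{period:.1f}s; next slowdown near t={next_slowdown:.0f}s")
# An interference-aware analysis would skip or shrink its augmentation reads
# in the window around next_slowdown.
```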
- Published
- 2020
- Full Text
- View/download PDF
31. A Comprehensive Study of In-Memory Computing on Large HPC Systems
- Author
Qing Liu, Dan Huang, Scott Klasky, Norbert Podhorszki, and Zhenlu Qin
- Subjects
Computer science, Usability, Supercomputer, Data science, Domain (software engineering), Dataspaces, Software portability, Workflow, In-Memory Processing, Scalability
- Abstract
With the increasing fidelity and resolution enabled by high-performance computing systems, simulation-based scientific discovery is able to model and understand microscopic physical phenomena at a level that was not possible in the past. A grand challenge that the HPC community faces is how to handle the large amounts of analysis data generated from simulations. In-memory computing, among others, is recognized as a viable path forward and has experienced tremendous success in the past decade. Nevertheless, there has been a lack of a complete study and understanding of in-memory computing as a whole on HPC systems. This paper presents a comprehensive study that goes well beyond the typical performance metrics. In particular, we assess in-memory computing with regard to its usability, portability, robustness, and internal design trade-offs, the key factors of interest to domain scientists. We use two realistic scientific workflows, LAMMPS and Laplace, to conduct comprehensive studies of state-of-the-art in-memory computing libraries, including DataSpaces, DIMES, Flexpath, and Decaf. We conduct cross-platform experiments at scale on two leading supercomputers, Titan at ORNL and Cori at NERSC, and summarize our key findings in this critical area.
- Published
- 2020
- Full Text
- View/download PDF
32. A terminology for in situ visualization and analysis systems
- Author
Aaron Knoll, Paul A. Navrátil, Steve Petruzza, Venkatram Vishwanath, Michel Rasquin, Silvio Rizzi, Jeremy S. Meredith, Thomas Fogal, Jay Lofstead, Bernd Hentschel, David Rogers, James Kress, Han-Wei Shen, Franz Sauer, Cyrus Harrison, Tom Peterka, David Pugmire, Sudhanshu Sane, Charles Hansen, Kenneth Moreland, Berk Geveci, Matthew Wolf, Kwan-Liu Ma, Janine C. Bennett, Rhonda Vickery, William F. Godoy, Sean B. Ziegeler, Ingo Wald, Eric Brugger, Christoph Garth, Steffen Frey, Joseph A. Insley, Jean M. Favre, Andrew Bauer, Soumya Dutta, Gunther H. Weber, Sean Ahern, Matthieu Dorier, Ruonan Wang, John Patchett, E. Wes Bethel, Chris R. Johnson, Valerio Pascucci, Patrick O'Leary, Preeti Malakar, Norbert Podhorszki, Hongfeng Yu, Brad Whitlock, Matthew Larsen, James Ahrens, Robert Sisneros, Joseph A. Cottam, Scott Klasky, Manish Parashar, Hank Childs, Peer-Timo Bremer, and Will Usher
- Subjects
Computer science, Scientific visualization, Umbrella term, In situ visualization, Data science, Theoretical Computer Science, Visualization, Term (time), Terminology, In situ processing, Hardware and Architecture, Integration Type, Distributed Computing, Software, Confusion
- Abstract
The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over 50 experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.
- Published
- 2020
33. Processing Full-Scale Square Kilometre Array Data on the Summit Supercomputer
- Author
Chen Wu, Norbert Podhorszki, E. Suchyta, Rodrigo Tobar, Fred Dulwich, Baoqiang Lao, Andreas Wicenec, Tao An, Scott Klasky, Markus Dolensky, Valentine G. Anantharaj, and Ruonan Wang
- Subjects
Computer science, Pipeline (computing), Real-time computing, Supercomputer, Radio telescope, Pipeline transport, Telescope, Workflow, Reionization, Radio astronomy
- Abstract
This work presents a workflow for simulating and processing the full-scale low-frequency telescope data of the Square Kilometre Array (SKA) Phase 1. The SKA project will enter the construction phase soon, and once completed, it will be the world’s largest radio telescope and one of the world’s largest data generators. The authors used Summit to mimic an end-to-end SKA workflow, simulating a dataset of a typical 6 hour observation and then processing that dataset with an imaging pipeline. This workflow was deployed and run on 4,560 compute nodes, and used 27,360 GPUs to generate 2.6 PB of data. This was the first time that radio astronomical data were processed at this scale. Results show that the workflow has the capability to process one of the key SKA science cases, an Epoch of Reionization observation. This analysis also helps reveal critical design factors for the next-generation radio telescopes and the required dedicated processing facilities.
- Published
- 2020
- Full Text
- View/download PDF
34. Comparing Time-to-Solution for In Situ Visualization Paradigms at Scale
- Author
Matthew Wolf, Hank Childs, Jong Choi, Matthew Larsen, Norbert Podhorszki, Scott Klasky, Mark Kim, James Kress, and David Pugmire
- Subjects
Scale (ratio), Total cost, Computer science, Volume rendering, Data modeling, Data visualization, Concurrent computing, Node (circuits), Resource management (computing), Algorithm
- Abstract
This short paper considers time-to-solution for two in situ visualization paradigms: in-line and in-transit. It is a follow-on work to two previous studies. The first study [10] considered time-to-solution (wall clock time) and total cost (total node seconds incurred) for a single visualization algorithm (isosurfacing). The second study [11] considered only total cost and added a second algorithm (volume rendering). This short paper completes the evaluation, considering time-to-solution for both algorithms. In particular, it extends the first study by adding additional insights from including a second algorithm at larger scale and by doing more extended and formal analysis regarding time-to-solution. Further, it complements the second study as the best in situ configuration to choose can vary when considering time-to-solution over cost. It also makes use of the same data corpus used in the second study, although that data corpus has been refactored with time-to-solution in mind.
- Published
- 2020
- Full Text
- View/download PDF
35. Feature-preserving Lossy Compression for In Situ Data Analysis
- Author
Scott Klasky, Matthew Wolf, Kshitij Mehta, Igor Yakushin, Jieyang Chen, Ian Foster, and Todd Munson
- Subjects
Computer science, Middleware (distributed applications), Distributed computing, Bandwidth (computing), Lossy compression, Pipeline (software)
- Abstract
The traditional model of having simulations write data to disk for offline analysis can be prohibitively expensive on computers with limited storage capacity or I/O bandwidth. In situ data analysis has emerged as a necessary paradigm to address this issue and is expected to play an important role in exascale computing. We demonstrate the various aspects and challenges involved in setting up a comprehensive in situ data analysis pipeline that consists of a simulation coupled with compression and feature tracking routines, a framework for assessing compression quality, a middleware library for I/O and data management, and a workflow tool for composing and running the pipeline. We perform studies of compression mechanisms and parameters on two supercomputers, Summit at Oak Ridge National Laboratory and Theta at Argonne National Laboratory, for two example application pipelines. We show that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality. Finally, we discuss our perspective on the wider adoption of in situ data analysis and management practices and technologies in the HPC community.
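A compact illustration of the periodic retuning idea, using a toy error-bounded compressor (uniform quantization plus zlib) as a stand-in for the production compressors studied in the paper; tolerances and the PSNR target are invented:

```python
import zlib
import numpy as np

def lossy_compress(data, tol):
    """Toy error-bounded compressor: uniform quantization + zlib."""
    q = np.round(data / (2.0 * tol)).astype(np.int32)
    return zlib.compress(q.tobytes()), data.size

def lossy_decompress(blob, n, tol):
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32, count=n)
    return q * (2.0 * tol)

def retune(data, tolerances, min_psnr):
    """Pick the loosest tolerance whose reconstruction still meets a PSNR target."""
    for tol in sorted(tolerances, reverse=True):  # loosest first
        blob, n = lossy_compress(data, tol)
        recon = lossy_decompress(blob, n, tol)
        mse = np.mean((data - recon) ** 2)
        psnr = 10 * np.log10((data.max() - data.min()) ** 2 / mse)
        if psnr >= min_psnr:
            return tol, len(blob)
    return tol, len(blob)  # fall back to the tightest tolerance

rng = np.random.default_rng(7)
for step in range(0, 100, 20):  # periodic in situ retuning as the data evolves
    field = np.cumsum(rng.standard_normal(65536)) + step
    tol, size = retune(field, [1e-1, 1e-2, 1e-3], min_psnr=60.0)
    print(f"step {step}: tolerance {tol}, compressed {size} bytes")
```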
- Published
- 2020
- Full Text
- View/download PDF
36. ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management
- Author
Mark Kim, Seiji Tsutsumi, George Ostrouchov, James Kress, Keichi Takahashi, Lipeng Wan, Kesheng Wu, Norbert Podhorszki, Kshitij Mehta, Kai Germaschewski, Franz Poeschel, Scott Klasky, Ruonan Wang, Chuck Atkins, Jong Choi, Matthew Wolf, Qing Liu, David Pugmire, Jeremy Logan, William F. Godoy, Philip E. Davis, Manish Parashar, Junmin Gu, Nicholas Thompson, E. Suchyta, Kevin Huck, Greg Eisenhauer, Axel Huebl, and Tahsin Kurc
- Subjects
Staging, Computer science, Fortran, Data management, Scalable I/O, Data science, Exascale computing, Lustre and GPFS file systems, MATLAB, Application programming interface, Programming language, In-situ, Python (programming language), Supercomputer, Computer Science Applications, Personal computer, RDMA, High-performance computing (HPC), Software
- Abstract
We present ADIOS 2, the latest version of the Adaptable Input Output (I/O) System. ADIOS 2 addresses scientific data management needs ranging from scalable I/O in supercomputers, to data analysis in personal computer and cloud systems. Version 2 introduces a unified application programming interface (API) that enables seamless data movement through files, wide-area-networks, and direct memory access, as well as high-level APIs for data analysis. The internal architecture provides a set of reusable and extendable components for managing data presentation and transport mechanisms for new applications. ADIOS 2 bindings are available in C++11, C, Fortran, Python, and Matlab and are currently used across different scientific communities. ADIOS 2 provides a communal framework to tackle data management challenges as we approach the exascale era of supercomputing.
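A minimal serial example in the spirit of the high-level API described above (shown with the `adios2.open` file API of the 2.x Python bindings; the exact calls vary across ADIOS 2 versions, so treat the signatures as version-dependent):

```python
import numpy as np
import adios2

nx = 100
temperature = np.linspace(0.0, 1.0, nx)

# Write one variable over two steps to a BP dataset.
with adios2.open("sim_output.bp", "w") as fw:
    for step in range(2):
        fw.write("temperature", temperature + step,
                 [nx], [0], [nx], end_step=True)  # global shape, start, count

# Read it back step by step.
with adios2.open("sim_output.bp", "r") as fr:
    for fstep in fr:
        print(fstep.current_step(), fstep.read("temperature")[:3])
```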
- Published
- 2020
37. Orchestrating Fault Prediction with Live Migration and Checkpointing
- Author
Lipeng Wan, Frank Mueller, Subhendu Behera, Matthew Wolf, and Scott Klasky
- Subjects
File system, Computer science, Distributed computing, Fault tolerance, Fault (power engineering), Supercomputer, Reduction (complexity), Overhead (business), Live migration
- Abstract
Checkpoint/Restart (C/R) is widely used to provide fault tolerance on High-Performance Computing (HPC) systems. However, Parallel File System (PFS) overhead and failure uncertainty cause significant application overhead. This paper develops an adaptive multi-level C/R model that incorporates a failure prediction and analysis model, which orchestrates failure prediction, checkpointing, checkpoint frequency, and proactive live migration along with the additional benefit of Burst Buffers (BB). It effectively reduces the overheads due to failures, checkpointing, and recovery. Simulation results for the Summit supercomputer yield a reduction of ~20%-86% in application overhead due to BBs, orchestrated failure prediction, and migration. We also observe a ~29% decrease in checkpoint writes to BBs, which can increase the longevity of the BB storage devices.
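For background on the checkpoint-frequency side of this orchestration, the classical Young/Daly estimate (a textbook result, not necessarily the model developed in this paper) balances checkpoint cost against expected rework:

```latex
\[
\tau_{\mathrm{opt}} \approx \sqrt{2\,\delta\,M},
\]
% \delta: time to write one checkpoint; M: mean time between failures.
```

A failure predictor that proactively migrates work effectively raises $M$ for the remaining unpredicted failures, lengthening $\tau_{\mathrm{opt}}$ and reducing checkpoint I/O.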
- Published
- 2020
- Full Text
- View/download PDF
38. Multilevel Techniques for Compression and Reduction of Scientific Data-Quantitative Control of Accuracy in Derived Quantities
- Author
Mark Ainsworth, Scott Klasky, Ozan Tugluk, and Ben Whitney
- Subjects
Pointwise, Applied Mathematics, Big data, Reduction (complexity), Computational Mathematics, Compression (functional analysis), Algorithm, Data compression, Data reduction, Mathematics
- Abstract
Although many compression algorithms are focused on preserving pointwise values of the data, application scientists are generally more concerned with derived quantities. Equally well, the user may ...
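The flavor of the result, in the simplest linear case (an illustrative textbook bound, not the paper's sharper estimates): if the derived quantity $Q$ is a bounded linear functional on the data space $V$, then

```latex
\[
|Q(u) - Q(\tilde u)| = |Q(u - \tilde u)| \le \|Q\|_{V^{*}}\,\|u - \tilde u\|_{V},
\]
```

so controlling the primary-data error $\|u - \tilde u\|_{V}$ during compression yields an explicit, quantitative bound on the error in the derived quantity.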
- Published
- 2019
- Full Text
- View/download PDF
39. Multilevel Techniques for Compression and Reduction of Scientific Data—The Multivariate Case
- Author
Mark Ainsworth, Scott Klasky, Ozan Tugluk, and Ben Whitney
- Subjects
Multivariate statistics, Applied Mathematics, Reduction (complexity), Computational Mathematics, Multigrid method, Tensor product, Compression (functional analysis), Algorithm, Data reduction, Mathematics, Data compression
- Abstract
We develop a technique for multigrid adaptive reduction of data (MGARD). Special attention is given to the case of tensor product grids, where our approach permits the use of nonuniformly spaced gr...
- Published
- 2019
- Full Text
- View/download PDF
40. Organizing Large Data Sets for Efficient Analyses on HPC Systems
- Author
Junmin Gu, Philip Davis, Greg Eisenhauer, William Godoy, Axel Huebl, Scott Klasky, Manish Parashar, Norbert Podhorszki, Franz Poeschel, Jean-Luc Vay, Lipeng Wan, Ruonan Wang, and Kesheng Wu
- Subjects
History, Computer Science Applications, Education
- Abstract
Upcoming exascale applications could introduce significant data management challenges due to their large sizes, dynamic work distribution, and involvement of accelerators such as graphics processing units (GPUs). In this work, we explore the performance of reading and writing operations involving one such scientific application on two different supercomputers. Our tests showed that the Adaptable Input and Output System, ADIOS, was able to achieve speeds over 1 TB/s, a significant fraction of the peak I/O performance on Summit. We also demonstrated that the querying functionality in ADIOS could effectively support common selective data analysis operations, such as conditional histograms. In tests, this query mechanism was able to reduce the execution time by a factor of five. More importantly, the ADIOS data management framework allows us to achieve these performance improvements with only a minimal amount of coding effort.
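The block min/max indexing behind such conditional queries can be sketched in a few lines (a simplified stand-in for the ADIOS query mechanism, with invented sizes and thresholds):

```python
import numpy as np

def build_index(data, block_size):
    """Per-block min/max index, the kind of metadata selective queries consult."""
    blocks = data.reshape(-1, block_size)
    return blocks.min(axis=1), blocks.max(axis=1)

def conditional_histogram(data, block_size, lo, hi, bins=20):
    """Histogram of values in [lo, hi], scanning only blocks that can match."""
    mins, maxs = build_index(data, block_size)
    edges = np.linspace(lo, hi, bins + 1)
    hist = np.zeros(bins, dtype=np.int64)
    touched = 0
    for b in np.nonzero((maxs >= lo) & (mins <= hi))[0]:  # skip non-matching blocks
        vals = data[b * block_size:(b + 1) * block_size]
        vals = vals[(vals >= lo) & (vals <= hi)]
        hist += np.histogram(vals, bins=edges)[0]
        touched += 1
    return hist, touched

rng = np.random.default_rng(8)
field = rng.normal(300.0, 10.0, size=1 << 20)  # e.g., a temperature field
hist, touched = conditional_histogram(field, 4096, 340.0, 360.0)
print(f"scanned {touched}/{field.size // 4096} blocks")
```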
- Published
- 2022
- Full Text
- View/download PDF
41. Multilevel techniques for compression and reduction of scientific data—the univariate case
- Author
Ozan Tugluk, Ben Whitney, Mark Ainsworth, and Scott Klasky
- Subjects
Flexibility (engineering), Scale (ratio), Computer science, General Engineering, Univariate, Theoretical Computer Science, Visualization, Reduction (complexity), Computational Theory and Mathematics, Modeling and Simulation, Compression (functional analysis), Range (statistics), Computer Vision and Pattern Recognition, Representation (mathematics), Algorithm, Software
- Abstract
We present a multilevel technique for the compression and reduction of univariate data and give an optimal complexity algorithm for its implementation. A hierarchical scheme offers the flexibility to produce multiple levels of partial decompression of the data so that each user can work with a reduced representation that requires minimal storage whilst achieving the required level of tolerance. The algorithm is applied to the case of turbulence modelling in which the datasets are traditionally not only extremely large but inherently non-smooth and, as such, rather resistant to compression. We decompress the data for a range of relative errors, carry out the usual analysis procedures for turbulent data, and compare the results of the analysis on the reduced datasets to the results that would be obtained on the full dataset. The results obtained demonstrate the promise of multilevel compression techniques for the reduction of data arising from large scale simulations of complex phenomena such as turbulence modelling.
- Published
- 2018
- Full Text
- View/download PDF
42. SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data
- Author
Tao Lu, Scott Klasky, Norbert Podhorszki, Huizhang Luo, Jinzhen Wang, Zhenbo Qiao, and Qing Liu
- Subjects
Decimation, Computer science, Data management, Feature extraction, Supercomputer, Data modeling, Data model, Hardware and Architecture, Control and Systems Engineering, Computer data storage, Data analysis, Data mining, Information Systems
- Abstract
Scientific simulations on high performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be more efficiently stored and analyzed, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive, JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data, using either topological or uniform decimation, into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy that satisfies data analytics. Thus, SIRIUS provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. This paper further develops algorithms to preserve statistics during data decimation, a common requirement for reducing data. We assess the impact of SIRIUS on unstructured triangular meshes, a pervasive data model used in scientific simulations. In particular, we evaluate two realistic use cases: blob detection in fusion and high-pressure area extraction in computational fluid dynamics.
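A two-tier caricature of the base-plus-delta refactoring (the precision split and tier assignment are invented; SIRIUS also supports topological and uniform decimation, which this sketch omits):

```python
import numpy as np

def refactor(data):
    """Split float64 data into a small float16 base plus a float32 delta."""
    base = data.astype(np.float16)                                # fast tier
    delta = (data - base.astype(np.float64)).astype(np.float32)  # accuracy top-up
    return base, delta

def restore(base, delta=None):
    """Restore at base accuracy, or augment with the delta when analysis needs it."""
    out = base.astype(np.float64)
    if delta is not None:
        out += delta
    return out

rng = np.random.default_rng(9)
field = rng.normal(0.0, 1.0, size=1 << 16)
base, delta = refactor(field)        # base -> burst buffer/SSD, delta -> disk/tape
coarse = restore(base)               # quick-look analysis
precise = restore(base, delta)       # full-accuracy analysis
print(np.max(np.abs(field - coarse)), np.max(np.abs(field - precise)))
```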
- Published
- 2018
- Full Text
- View/download PDF
43. Personalized Search Inspired Fast Interactive Estimation of Distribution Algorithm and Its Application
- Author
-
Dunwei Gong, Yang Chen, Yong Zhang, Jong Choi, Xiaoyan Sun, and Scott Klasky
- Subjects
0209 industrial biotechnology, Computer science, Probabilistic logic, Evolutionary algorithm, Statistical model, 02 engineering and technology, Bayesian inference, Machine learning, Theoretical Computer Science, Personalized search, 020901 industrial engineering & automation, Computational Theory and Mathematics, Estimation of distribution algorithm, 0202 electrical engineering, electronic engineering, information engineering, Domain knowledge, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, Software, Subspace topology - Abstract
Interactive evolutionary algorithms have been applied to personalized search, where reduced user fatigue and efficient search are pursued. Motivated by this, we present a fast interactive estimation of distribution algorithm (IEDA) that exploits the domain knowledge of personalized search. We first induce a Bayesian model describing the distribution of a new user's preferences over the decision variables from the social knowledge of personalized search. We then employ this model to enhance IEDA in two ways: 1) dramatically reducing the initially huge search space to a preferred subspace and 2) using it as the probabilistic model from which the estimation of distribution algorithm (EDA) generates individuals. The Bayesian model is updated as the EDA runs. To evaluate individuals effectively, we further present a method for quantitatively expressing the user's preference based on human-computer interactions, and we train a radial basis function neural network as a fitness surrogate. The proposed algorithm is applied to a laptop search, and its advantages in alleviating user fatigue and accelerating the search are empirically demonstrated.
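As a toy illustration of the preference-seeded EDA loop, the sketch below uses a univariate binary distribution model; the hidden preference vector and the simple agreement score are placeholders standing in for the paper's Bayesian prior and RBF-network fitness surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_attrs = 12                                # binary item attributes
hidden_pref = rng.random(n_attrs) < 0.5     # the user's true (unknown) taste

def surrogate_fitness(pop):
    """Placeholder for the trained RBF surrogate: fraction of
    attributes agreeing with the hidden preference."""
    return (pop == hidden_pref).mean(axis=1)

p = np.full(n_attrs, 0.5)  # prior probabilities (the paper induces these
                           # from social knowledge with a Bayesian model)
for _ in range(30):
    pop = rng.random((50, n_attrs)) < p     # sample individuals from model
    elite = pop[np.argsort(surrogate_fitness(pop))[-10:]]
    p = 0.7 * p + 0.3 * elite.mean(axis=0)  # update the distribution

print("learned preference:", (p > 0.5).astype(int))
```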
- Published
- 2017
- Full Text
- View/download PDF
44. Leading magnetic fusion energy science into the big-and-fast data lane
- Author
-
Minjun Choi, Jong Youl Choi, Ruonan Wang, Choong-Seock Chang, Ralph Kube, Scott Klasky, R Michael Churchill, and Jinseop Park
- Subjects
Nuclear physics, Magnetic fusion, Computer science, Energy (signal processing) - Published
- 2020
- Full Text
- View/download PDF
45. Machine Learning for the Complex, Multi-scale Datasets in Fusion Energy
- Author
-
Jong Choi, R. Michael Churchill, C. S. Chang, Scott Klasky, and Ralph Kube
- Subjects
Sequence, Deep learning, Scale (chemistry), Markov chain Monte Carlo, Fusion power, Machine learning, Acceleration, Range (mathematics), Key (cryptography), Artificial intelligence - Abstract
ML/AI techniques, particularly those based on deep learning, will increasingly be used to accelerate scientific discovery in fusion experiment and simulation. Fusion energy devices have many disparate diagnostic instruments, capturing a broad range of interacting physics phenomena over multiple time and spatial scales. Fusion experiments are also increasingly built to run longer pulses, with the eventual goal of running a reactor continuously. The confluence of these facts leads to large, complex datasets in which phenomena manifest over long sequences. A key challenge is enabling scientists and engineers to utilize these datasets, for example to automatically catalog events of interest, predict the onset of phenomena such as tokamak disruptions, and enable comparisons to models and simulations. Given the size, multiple modalities, and multi-scale nature of fusion data, deep learning models are attractive, but at these scales they require HPC resources. Many ML/AI techniques not yet fully exploited will demand even more HPC resources, such as self-supervised learning, which helps fusion scientists create models with less labelled data, and advanced sequence models, which use less GPU memory at the expense of increased compute. Deep learning will also enable faster, more in-depth analysis than previously available, such as extracting physics model parameters from data with conditional variational autoencoders instead of slower techniques such as Markov chain Monte Carlo (MCMC). Comparison to simulation will likewise be enhanced by directly accelerating simulation kernels with deep learning. These ML/AI techniques will give fusion scientists faster results, allowing more efficient machine use and faster scientific discovery.
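The MCMC baseline that the abstract contrasts with amortized deep-learning inference is easy to sketch. The toy decay model and step size below are assumptions; the point is the thousands of forward-model evaluations each posterior costs, which a trained network would avoid.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 5, 100)                   # toy 'diagnostic' time base
true_rate = 1.3                              # hypothetical physics parameter
data = np.exp(-true_rate * t) + 0.05 * rng.standard_normal(t.size)

def log_likelihood(rate):
    resid = data - np.exp(-rate * t)         # one forward-model evaluation
    return -0.5 * np.sum(resid ** 2) / 0.05 ** 2

# Metropolis-Hastings: accept uphill moves, sometimes accept downhill ones.
rate, ll, samples = 1.0, log_likelihood(1.0), []
for _ in range(20000):
    prop = rate + 0.05 * rng.standard_normal()
    ll_prop = log_likelihood(prop)
    if np.log(rng.random()) < ll_prop - ll:
        rate, ll = prop, ll_prop
    samples.append(rate)

print(f"posterior mean rate: {np.mean(samples[5000:]):.3f}")
```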
- Published
- 2020
- Full Text
- View/download PDF
46. Visualization as a Service for Scientific Data
- Author
-
Scott Klasky, Matthew Wolf, Berk Geveci, Lipeng Wan, Dmitry Ganyushin, Jong Choi, Jeremy Logan, E. Suchyta, Jieyang Chen, Kshitij Mehta, Norbert Podhorszki, Nicholas Thompson, Hank Childs, Steven Walton, Xin Liang, David Pugmire, James Kress, Caitlin Ross, Nicole Marsaglia, and Mark Kim
- Subjects
Flexibility (engineering), Service (systems architecture), Workflow, Process (engineering), Computer science, Interoperability, Scientific visualization, Use case, Software engineering, Visualization - Abstract
One of the primary challenges facing scientists is extracting understanding from the large amounts of data produced by simulations, experiments, and observational facilities. The use of data across its entire lifetime, from real-time to post-hoc analysis, is complex and varied, typically requiring a collaborative effort across multiple teams of scientists. Over time, three sets of tools have emerged: one for analysis, another for visualization, and a third for orchestrating the tasks. This trifurcated tool set often results in manually assembled analysis and visualization workflows, one-off solutions that are fragile and difficult to generalize. To address these challenges, we propose a service-based paradigm and a set of abstractions to guide its design. These abstractions allow for the creation of services that can access and interpret data, and they enable the interoperability needed for intelligent scheduling by workflow systems. This work results from a codesign process across analysis, visualization, and workflow tools to provide the flexibility required for production use. Finally, the paper describes a forward-looking research and development plan centered on the concept of visualization and analysis technology as reusable services, along with several real-world use cases that implement these concepts.
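One way to read the proposed abstractions is as a common service interface that a workflow system can schedule against. The sketch below is hypothetical, with invented names, and is not the paper's actual API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class DataDescriptor:
    """Enough metadata for a service to interpret data it did not produce."""
    source: str            # e.g. a staging endpoint or file path
    mesh_type: str         # e.g. "uniform" or "unstructured"
    variables: list = field(default_factory=list)

class VisService(ABC):
    """A reusable visualization/analysis task a scheduler can compose."""
    @abstractmethod
    def accepts(self, desc: DataDescriptor) -> bool: ...
    @abstractmethod
    def run(self, desc: DataDescriptor) -> DataDescriptor: ...

class ContourService(VisService):
    def accepts(self, desc):
        return desc.mesh_type in ("uniform", "unstructured")
    def run(self, desc):
        # ...extract isosurfaces and publish a derived dataset...
        return DataDescriptor(desc.source + ".contours",
                              desc.mesh_type, desc.variables)

# A workflow engine can now chain any services whose descriptors match,
# rather than hand-wiring one-off analysis/visualization pipelines.
```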
- Published
- 2020
- Full Text
- View/download PDF
47. Spatial core-edge coupling of the particle-in-cell gyrokinetic codes GEM and XGC
- Author
-
Frank Jenko, Gabriele Merlo, Scott Klasky, Haotian Chen, Amitava Bhattacharjee, Junyi Cheng, Sarat Sreepathi, Stephane Ethier, Seung-Hoe Ku, Robert Hager, Choong-Seock Chang, E. Suchyta, Yang Chen, Scott Parker, Eduardo D'Azevedo, and Julien Dominski
- Subjects
Coupling, Physics, Physics::Instrumentation and Detectors, Interface (Java), Edge region, Graphics processing unit, Edge (geometry), Condensed Matter Physics, 01 natural sciences, 010305 fluids & plasmas, Computational science, Core (optical fiber), 0103 physical sciences, Polygon mesh, Particle-in-cell, 010306 general physics - Abstract
Two existing particle-in-cell gyrokinetic codes, GEM for the core region and XGC for the edge region, have been successfully coupled through a spatial coupling scheme at the interface in toroidal geometry. A mapping technique is developed for transferring data between GEM's structured and XGC's unstructured meshes. Two examples of coupled simulations demonstrate the coupling scheme. The optimization of GEM for graphics processing units (GPUs) is also presented.
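For the structured-to-unstructured direction, the transfer can be sketched as interpolation of a gridded field onto arbitrary mesh nodes. The sketch below assumes simple bilinear interpolation on an (R, Z) overlap region; the actual GEM-XGC mapping also handles the reverse direction and the gyrokinetic field geometry.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Structured core grid (GEM-like): a field phi on a regular (R, Z) mesh.
R = np.linspace(1.0, 2.0, 64)
Z = np.linspace(-0.5, 0.5, 64)
RR, ZZ = np.meshgrid(R, Z, indexing="ij")
phi = np.exp(-((RR - 1.5) ** 2 + ZZ ** 2) / 0.02)

# Unstructured edge nodes (XGC-like): arbitrary points in the overlap.
rng = np.random.default_rng(2)
nodes = np.column_stack([rng.uniform(1.0, 2.0, 1000),
                         rng.uniform(-0.5, 0.5, 1000)])

# Bilinear interpolation carries the field across the coupling interface.
interp = RegularGridInterpolator((R, Z), phi)
phi_on_nodes = interp(nodes)
```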
- Published
- 2020
48. Data Federation Challenges in Remote Near-Real-Time Fusion Experiment Data Processing
- Author
-
Ruonan Wang, Kshitij Mehta, Greg Eisenhauer, Jong Choi, Ralph Kube, Minjun Choi, Norbert Podhorszki, Jeremy Logan, Scott Klasky, C. S. Chang, Matthew Wolf, R. Michael Churchill, and Jinseop Park
- Subjects
Range (mathematics), Data processing, Workflow, Computer science, Data stream mining, Real-time computing, Volume (computing), Enhanced Data Rates for GSM Evolution, Fusion power, Variety (cybernetics) - Abstract
Fusion energy experiments and simulations provide critical information needed to plan future fusion reactors. As next-generation devices like ITER move toward long-pulse experiments, analyses, including AI and ML, must be performed under a wide range of time and computing constraints, from near-real-time and between-shot analysis to campaign-wide long-term analysis. However, the data volume, velocity, and variety make such analyses extremely challenging with only local computational resources. Researchers need the ability to compose and execute workflows spanning edge resources to large-scale high-performance computing facilities.
- Published
- 2020
- Full Text
- View/download PDF
49. Opportunities for Cost Savings with In-Transit Visualization
- Author
-
Hank Childs, Jong Choi, James Kress, Norbert Podhorszki, David Pugmire, Matthew Larsen, Matthew Wolf, Mark Kim, and Scott Klasky
- Subjects
Computer engineering, Computer science, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, Cost savings, Visualization - Abstract
We analyze the opportunities for in-transit visualization to provide cost savings compared with in-line visualization. We begin by developing a cost model that includes factors for both approaches, allowing direct comparison between the two methods. We then run a series of studies to create a corpus of data for the model, using two visualization algorithms, one computation-heavy and one communication-heavy, at concurrencies of up to 32,768 cores. Our primary results come from exploring the cost model in the context of this corpus. The findings show that in-transit visualization consistently achieves significant cost efficiencies by running visualization algorithms at lower concurrency, and that in many cases these efficiencies are enough to offset the other costs (transfer, blocking, and additional nodes) and be cost effective overall. Finally, this work informs future studies, which can focus on choosing ideal configurations for in-transit processing that consistently achieve cost efficiencies.
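A simplified version of such a cost model can be written down directly. The formula and numbers below are illustrative assumptions, though the terms mirror the costs the study names (transfer, blocking, and additional nodes):

```python
def node_seconds(n_sim, n_vis, t_sim, t_vis_inline, t_vis_transit,
                 t_transfer, t_block):
    """Total node-seconds per cycle for in-line vs. in-transit.
    In-line ties up all simulation nodes during visualization;
    in-transit pays transfer/blocking plus a small extra allocation."""
    inline = n_sim * (t_sim + t_vis_inline)
    transit = (n_sim * (t_sim + t_transfer + t_block)
               + n_vis * t_vis_transit)
    return inline, transit

# Hypothetical cycle: vis at 1/16 concurrency runs 8x longer per step.
inline, transit = node_seconds(n_sim=32768, n_vis=2048, t_sim=60.0,
                               t_vis_inline=5.0, t_vis_transit=40.0,
                               t_transfer=1.0, t_block=0.5)
print(f"in-line:    {inline:.3e} node-seconds")
print(f"in-transit: {transit:.3e} node-seconds")
```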
- Published
- 2020
- Full Text
- View/download PDF
50. Understanding Performance-Quality Trade-offs in Scientific Visualization Workflows with Lossy Compression
- Author
-
Ben Whitney, Jong Youl Choi, David Pugmire, Nicholas Thompson, Jeremy Logan, Scott Klasky, Kshitij Mehta, Jieyang Chen, Matthew Wolf, and Lipeng Wan
- Subjects
Workflow, Computer science, Trade-offs, Scientific visualization, Lossy compression, Data science, Storage efficiency, Visualization, Performance quality - Abstract
The cost of I/O is a significant challenge on current supercomputers, and the trend is likely to continue for the foreseeable future. This challenge is amplified in scientific visualization by the requirement to consume large amounts of data before processing can begin. Lossy compression has become an important technique for reducing the cost of I/O. In this paper we consider the implications of using compressed data for visualization within a scientific workflow. We apply visualization operations to simulation data that has been reduced using three different state-of-the-art compression techniques, study the storage efficiency and the preservation of visualization features in the resulting compressed data, and draw comparisons between the three techniques. Our contributions can help inform both scientists and researchers in the use and design of compression techniques that preserve important visualization details.
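The storage-versus-error measurement at the heart of such a study is simple to reproduce in miniature. The uniform quantizer plus zlib below is only a stand-in for the actual compressors evaluated, but it shows how compression ratio and maximum error move with the error bound:

```python
import zlib
import numpy as np

def compress_with_bound(field, abs_err):
    """Toy error-bounded compressor: quantize within the bound,
    then entropy-code the integer codes."""
    q = np.round(field / (2 * abs_err)).astype(np.int32)
    return zlib.compress(q.tobytes()), q.shape

def decompress(blob, shape, abs_err):
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return q * (2 * abs_err)

field = np.sin(np.linspace(0, 20, 1 << 20))  # stand-in simulation field
for bound in (1e-2, 1e-4, 1e-6):
    blob, shape = compress_with_bound(field, bound)
    ratio = field.nbytes / len(blob)
    max_err = np.abs(decompress(blob, shape, bound) - field).max()
    print(f"bound={bound:.0e}  ratio={ratio:6.1f}x  max_err={max_err:.2e}")
```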
- Published
- 2019
- Full Text
- View/download PDF