1. SciDP: Support HPC and Big Data Applications via Integrated Scientific Data Processing
- Author
- Kun Feng, Shujia Zhou, Xian-He Sun, and Xi Yang
- Subjects
File system, Data processing, Meteorology & atmospheric sciences, Computer science, Distributed computing, Big data, Software engineering, Engineering and technology, Supercomputer, Natural sciences, Data modeling, Visualization, Data visualization, Electrical engineering, electronic engineering, information engineering, Distributed File System, Earth and related environmental sciences
- Abstract
Modern High Performance Computing (HPC) applications, such as Earth science simulations, produce large amounts of data as computing power surges, while big data applications have become more compute-intensive as analysis algorithms grow more sophisticated. Advanced applications increasingly need both HPC and big data technologies, which creates a demand for integrated system support. In this study, we introduce Scientific Data Processing (SciDP) to support both HPC and big data applications via integrated scientific data processing. SciDP can directly process scientific data stored on a Parallel File System (PFS), which is typically deployed in an HPC environment, from within a big data programming environment running atop the Hadoop Distributed File System (HDFS). SciDP seamlessly integrates PFS, HDFS, and the widely used R data analysis system to support highly efficient processing of scientific data. It exploits the merits of both PFS and HDFS for fast data transfer, overlaps computation with data access, and integrates R into the data transfer process. Experimental results show that SciDP accelerates analysis and visualization of a production NASA Center for Climate Simulation (NCCS) climate and weather application by 6x to 8x compared to existing solutions.
- Published
- 2018