1. Global-scale distributed I/O with ParaMEDIC.
- Author
Balaji, P., Feng, W., Lin, H., Archuleta, J., Matsuoka, S., Warren, A., Setubal, J., Lusk, E., Thakur, R., Foster, I., Katz, D. S., Jha, S., Shinpaugh, K., Coghlan, S., and Reed, D.
- Subjects
GRID computing, DISTRIBUTED computing, BIOINFORMATICS, COMPUTER input-output equipment, COMPUTER software, METADATA, ALGORITHMS, CLUSTER analysis (Statistics)
- Abstract
Achieving high performance for distributed I/O on a wide-area network continues to be an elusive holy grail. Despite enhancements in network hardware as well as software stacks, achieving high performance remains a challenge. In this paper, our worldwide team took a completely new and non-traditional approach to distributed I/O, called ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing, by utilizing application-specific transformation of data to orders of magnitude smaller metadata before performing the actual I/O. Specifically, this paper details our experiences in deploying a large-scale system to facilitate the discovery of missing genes and the construction of a genome similarity tree by encapsulating the mpiBLAST sequence-search algorithm into ParaMEDIC. The overall project involved nine computational sites spread across the U.S. and generated more than a petabyte of data that was 'teleported' to a large-scale facility in Tokyo for storage. Copyright © 2010 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2010