Development of a High-Performance Eigensolver on a Peta-Scale Next-Generation Supercomputer System

Authors :
Susumu Yamada
Masahiko Machida
Toshiyuki Imamura
Source :
Progress in Nuclear Science and Technology. 2:643-650
Publication Year :
2011
Publisher :
The Atomic Energy Society of Japan, 2011.

Abstract

Current supercomputer systems are built from multicore, multisocket processors, and the choice of interconnect is essential. In addition, high-performance, scalable, and reliable numerical software is key to the effective development of new code. ScaLAPACK and PETSc are software packages developed for distributed-memory parallel computer systems. Real computation, however, requires software that is highly tuned for new architectures, such as many-core processors. In the present study, we introduce a high-performance, highly scalable eigenvalue solver developed with the K computer system, a next-generation supercomputer system, as the target. We have developed two versions of this eigenvalue solver, namely, the standard version (eigen_s) and an enhanced-performance version (eigen_sx), both of which were developed on the T2K cluster system housed at the University of Tokyo. Eigen_s uses conventional algorithms: Householder tridiagonalization, the divide-and-conquer (DC) algorithm, and the Householder back-transformation. These algorithms are carefully implemented using a blocking technique to reduce the overhead of memory traffic and a flexible two-dimensional data distribution to reduce the overhead of data transfer. On the T2K system with 4,096 cores (theoretical peak: 37.6 TFLOPS), eigen_s achieves 3.0 TFLOPS with a 200,000-dimensional matrix. The enhanced version, eigen_sx, uses more advanced algorithms: the narrow-band reduction algorithm, DC for band matrices, and the block Householder back-transformation with the WY representation. Even though this version is still in the test stage, eigen_sx has achieved 4.7 TFLOPS with a 200,000-dimensional matrix.
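
The three stages that the abstract attributes to eigen_s (Householder tridiagonalization, a tridiagonal eigensolver, and Householder back-transformation) can be illustrated with a small serial sketch. This is not the authors' code: it assumes NumPy and SciPy are available, the function name three_stage_eigh is hypothetical, and the tridiagonal stage uses whichever LAPACK driver SciPy selects, which is not necessarily the divide-and-conquer algorithm named in the abstract. The blocking and two-dimensional data distribution described above are not modeled.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal


def three_stage_eigh(a):
    """Eigen-decompose a real symmetric matrix in three stages (serial sketch).

    Hypothetical helper for illustration only; it is not part of eigen_s.
    """
    # Stage 1: Householder reduction, A = Q H Q^T. For a symmetric input,
    # the Hessenberg form H is tridiagonal up to rounding error.
    h, q = hessenberg(a, calc_q=True)
    d = np.diag(h).copy()        # main diagonal of the tridiagonal matrix
    e = np.diag(h, k=-1).copy()  # sub-diagonal

    # Stage 2: eigenpairs of the tridiagonal matrix.
    # Assumption: SciPy's default LAPACK driver stands in for the DC algorithm.
    w, z = eigh_tridiagonal(d, e)

    # Stage 3: back-transformation of the eigenvectors to the original basis.
    v = q @ z
    return w, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((200, 200))
    a = (m + m.T) / 2.0  # symmetrize the random test matrix
    w, v = three_stage_eigh(a)
    # Residual of the eigen-equation A V = V diag(w)
    print(np.linalg.norm(a @ v - v * w))
```

Stage 3 is a single dense matrix product in this sketch; the WY representation mentioned for eigen_sx is a way of grouping Householder reflectors so that the back-transformation can be carried out with block matrix operations.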

Details

ISSN :
21854823
Volume :
2
Database :
OpenAIRE
Journal :
Progress in Nuclear Science and Technology
Accession number :
edsair.doi...........4db80283d9589ae0400f0822f78c6c8d