Stream-K++: Adaptive GPU GEMM Kernel Scheduling and Selection using Bloom Filters

Authors :
Sadasivan, Harisankar
Osama, Muhammad
Podkorytov, Maksim
Huang, Carlus
Liu, Jun
Publication Year :
2024

Abstract

General matrix multiplication (GEMM) operations are crucial in various computational fields. As GPU architectures evolve, optimizing GEMM performance becomes increasingly important. This paper introduces Stream-K++, an enhancement to the promising Stream-K GEMM scheduling algorithm. We expand Stream-K's scheduling policies from three to seven and implement an efficient solution selection mechanism using Bloom filters. Our approach rapidly eliminates up to 95.8% of unsuitable configurations while maintaining a 100% true-negative rate. Implemented using the AMD Composable Kernel library and evaluated on AMD Instinct MI250X GPUs, Stream-K++ demonstrates significant performance gains (up to 43%) in select scenarios. It remains competitive (within 20% of optimal) for 60-97.6% of problem sizes. Our flexible framework, implemented in the Opensieve C++ library, allows for easy adaptation to new problem sizes, scheduling policies, or additional tuning parameters, paving the way for future optimizations in GPU-based GEMM operations.
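
As a rough illustration of the selection idea described in the abstract, the sketch below keeps one Bloom filter per scheduling policy and uses it to rule out policies for a given problem size. It is a minimal sketch under stated assumptions, not the Opensieve API: the names `ProblemSize`, `BloomFilter`, and `candidate_policies`, the 1024-bit filter size, the three hash probes, and the hash mixing function are all hypothetical choices made for this example. The property it relies on is the standard Bloom filter guarantee: a key that was inserted is never reported as absent, so a "no" answer safely eliminates a policy, while a "yes" answer may occasionally be a false positive.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical key: one GEMM problem size (M x N x K).
struct ProblemSize {
    std::uint32_t m, n, k;
};

// Minimal Bloom filter over problem sizes.
// The bit count and number of hash probes are illustrative defaults.
class BloomFilter {
public:
    explicit BloomFilter(std::size_t num_bits = 1024, unsigned num_hashes = 3)
        : bits_(num_bits, false), num_hashes_(num_hashes) {}

    // Record that this problem size belongs to the set.
    void insert(const ProblemSize& p) {
        for (unsigned i = 0; i < num_hashes_; ++i) bits_[index(p, i)] = true;
    }

    // false: the key was definitely never inserted.
    // true:  the key was possibly inserted (false positives can occur).
    bool maybe_contains(const ProblemSize& p) const {
        for (unsigned i = 0; i < num_hashes_; ++i)
            if (!bits_[index(p, i)]) return false;
        return true;
    }

private:
    // 64-bit mixing function (splitmix64 finalizer), assumed here for simplicity.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Double hashing: the i-th probe position is (h1 + i * h2) mod size.
    std::size_t index(const ProblemSize& p, unsigned i) const {
        std::uint64_t packed = (static_cast<std::uint64_t>(p.m) << 32) | p.n;
        std::uint64_t h1 = mix(mix(packed) ^ p.k);
        std::uint64_t h2 = mix(h1) | 1ULL;  // force an odd stride
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    unsigned num_hashes_;
};

// One filter per scheduling policy. Returns the indices of the policies
// that the filters could not rule out for problem size p.
std::vector<std::size_t> candidate_policies(const std::vector<BloomFilter>& filters,
                                            const ProblemSize& p) {
    std::vector<std::size_t> survivors;
    for (std::size_t i = 0; i < filters.size(); ++i)
        if (filters[i].maybe_contains(p)) survivors.push_back(i);
    return survivors;
}
```

In this sketch, a tuning pass would insert each benchmarked problem size into the filters of the policies that performed acceptably for it. At dispatch time, `candidate_policies` returns only the policies whose filters answer "possibly present", so the remaining policies can be skipped without further evaluation.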

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2408.11417
Document Type :
Working Paper