Back to Search
Start Over
Stream-K++: Adaptive GPU GEMM Kernel Scheduling and Selection using Bloom Filters
- Publication Year :
- 2024
-
Abstract
- General matrix multiplication (GEMM) operations are crucial in various computational fields. As GPU architectures evolve, optimizing GEMM performance becomes increasingly important. This paper introduces Stream-K++, an enhancement to the promising Stream-K GEMM scheduling algorithm. We expand Stream-K's scheduling policies from three to seven and implement an efficient solution selection mechanism using Bloom filters. Our approach rapidly eliminates up to 95.8% of unsuitable configurations while maintaining a 100% true-negative rate. Implemented using the AMD Composable Kernel library and evaluated on AMD Instinct MI250X GPUs, Stream-K++ demonstrates significant performance gains (up to 43%) in select scenarios. It remains competitive (within 20% of optimal) for 60-97.6% of problem sizes. Our flexible framework, implemented in the Opensieve C++ library, allows for easy adaptation to new problem sizes, scheduling policies, or additional tuning parameters, paving the way for future optimizations in GPU-based GEMM operations.
- Subjects :
- Computer Science - Distributed, Parallel, and Cluster Computing
D.2
I.2
Subjects
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2408.11417
- Document Type :
- Working Paper