Adaptive Algorithms for Shared Cache on Multicore
- Source :
- [Research Report] RR-7256, INRIA. 2010, pp.17
- Publication Year :
- 2010
- Publisher :
- HAL CCSD, 2010.
Abstract
- Reordering instructions and data layout can bring significant performance improvements for memory-bound applications. Parallelizing such applications requires a careful design of the algorithm in order to preserve the locality of the sequential execution. On the one hand, parallel computation tends to create concurrent tasks that work on independent data sets to reduce communication and synchronization. On the other hand, a multicore architecture with a shared cache can bring performance benefits due to high-speed communication between cores if concurrent tasks process data close in memory. In this paper, we aim at finding a good parallelization of memory-bound applications on multicore processors that preserves the advantage of a shared cache. We focus on sequential applications that iterate through a sequence of memory references. Our solution relies on an adaptive parallel algorithm with a dynamic sliding window that constrains cores sharing the same cache to process data close in memory. This parallel algorithm induces the same number of cache misses as the sequential algorithm, at the expense of an increased number of synchronizations. We theoretically analyze the synchronization overhead for both static and dynamic load balancing. Experiments with a memory-bound isosurface extraction application confirm that core collaboration for shared cache access can bring significant performance improvements despite the incurred synchronization costs. On a quad-core Nehalem processor, our algorithms are 10% to 30% faster than algorithms not optimized for shared cache, thanks to a reduced number of last-level cache misses.
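- The sliding-window idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names (`parallel_window_scan`, `window`, `chunk_size`) are hypothetical, Python threads stand in for cores, and the "work" per chunk is a placeholder sum. The key point it demonstrates is the window constraint: no worker may claim a chunk more than `window` chunks ahead of the lowest unfinished chunk, so concurrent workers always process data close in memory.

```python
import threading

def parallel_window_scan(data, num_workers=4, chunk_size=8, window=2):
    """Process `data` in chunks in parallel, constraining workers to a
    sliding window of at most `window` in-flight chunks (hypothetical sketch)."""
    n_chunks = (len(data) + chunk_size - 1) // chunk_size
    cond = threading.Condition()
    state = {"next": 0, "done": [False] * n_chunks, "low": 0}
    results = [0] * n_chunks

    def low_mark():
        # Lowest chunk index not yet completed (call while holding the lock).
        while state["low"] < n_chunks and state["done"][state["low"]]:
            state["low"] += 1
        return state["low"]

    def worker():
        while True:
            with cond:
                # Wait until claiming the next chunk keeps us within the window.
                while (state["next"] < n_chunks
                       and state["next"] - low_mark() >= window):
                    cond.wait()
                if state["next"] >= n_chunks:
                    return
                i = state["next"]
                state["next"] += 1
            lo = i * chunk_size
            results[i] = sum(data[lo:lo + chunk_size])  # stand-in for real work
            with cond:
                state["done"][i] = True
                cond.notify_all()  # may unblock workers waiting on the window

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

  Because the window bounds how far apart concurrently processed chunks can be in memory, cores sharing a cache touch neighboring data, at the cost of the extra condition-variable synchronization the abstract mentions.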
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- [Research Report] RR-7256, INRIA. 2010, pp.17
- Accession number :
- edsair.dedup.wf.001..67dcb3e2a4e90e21a5c841167cc22066