Back to Search Start Over

Implementing Fast, Virtualized Profiling to Eliminate Cache Warming

Authors :
Nikoleris, Nikos
Sandberg, Andreas
Hagersten, Erik
Carlson, Trevor E.
Nikoleris, Nikos
Sandberg, Andreas
Hagersten, Erik
Carlson, Trevor E.
Publication Year :
2016

Abstract

Simulation is an important part of the evaluation of next-generation computing systems. Detailed, cycle-level simulation, however, can be very slow when evaluating realistic workloads on modern microarchitectures. Sampled simulation (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more through the reduction of large workloads into a small but representative sample. Additionally, the execution state just prior to a simulation sample can be stored into checkpoints, allowing for fast restoration and evaluation. Unfortunately, changes in software, architecture or fundamental pieces of the microarchitecture (e.g., hardware-software co-design) require checkpoint regeneration. The end result for co-design degenerates to creating checkpoints for each modification, a task checkpointing was designed to eliminate. Therefore, a solution is needed that allows for fast and accurate simulation, without the need for checkpoints. Virtualized fast-forwarding proposals, like FSA, are an alternative to checkpoints that speed up sampled simulation by advancing the execution at near-native speed between simulation points. They rely, however, on functional simulation to warm the architectural state prior to each simulation point, a costly operation for moderately-sized last-level caches (e.g., above 8MB). Simulating future systems with DRAM caches of many GBs can require warming of billions of instructions, dominating the time for simulation and negating the benefit of virtualized fast-forwarding. This paper proposes CoolSim, an efficient simulation framework that eliminates cache warming. CoolSim advances between simulation points using virtualized fast-forwarding, while collecting sparse memory reuse information (MRI). The MRI is collected more than an order of magnitude faster than functional warming. At the simulation point, detailed simulation is used to evaluate the design while a statistical cache model uses the previously acquired MRI to esti

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1440263533
Document Type :
Electronic Resource