1. Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache
- Authors
- Bruce Jacob, Mehdi Asnaashari, Luyi Kang, Donald Yeung, Devesh Singh, Sylvain Dubois, Candace Walden, Shang Li, and Meenatchi Jagasivamani
- Subjects
- Interconnection, Memory structures, Computer science, Controller (computing), Software engineering, Engineering and technology, Die (integrated circuit), Computer hardware and architecture, Resistive random-access memory, CMOS, Hardware and Architecture, Embedded system, Electrical, electronic, and information engineering, Central processing unit, Cache, DRAM, Software, Information Systems
- Abstract
Many emerging non-volatile memories are compatible with CMOS logic, potentially enabling their integration into a CPU’s die. This article investigates such monolithically integrated CPU–main memory chips. We exploit non-volatile memories employing 3D crosspoint subarrays, such as resistive RAM (ReRAM), and integrate them over the CPU’s last-level cache (LLC). The regular structure of cache arrays enables co-design of the LLC and ReRAM main memory for area efficiency. We also develop a streamlined LLC/main memory interface that employs a single shared internal interconnect for both the cache and main memory arrays, and uses a unified controller to service both LLC and main memory requests. We apply our monolithic design ideas to a many-core CPU by integrating 3D ReRAM over each core’s LLC slice. We find that co-design of the LLC and ReRAM saves 27% of the total LLC–main memory area at the expense of slight increases in delay and energy. The streamlined LLC/main memory interface saves an additional 12% in area. Our simulation results show monolithic integration of CPU and main memory improves performance by 5.3× and 1.7× over HBM2 DRAM for several graph and streaming kernels, respectively. It also reduces the memory system’s energy by 6.0× and 1.7×, respectively. Moreover, we show that the area savings of co-design permit the CPU to have 23% more cores and main memory, and that streamlining the LLC/main memory interface incurs a small 4% performance penalty.
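The streamlined interface described in the abstract, in which a unified controller services both LLC and ReRAM main-memory requests over a single shared internal interconnect, can be illustrated with a small behavioral model. The sketch below is not the authors' design: it is a minimal Python approximation in which the hypothetical `UnifiedController` class, its single shared port, and the latency constants are all illustrative assumptions, used only to show how one controller can arbitrate LLC hits and ReRAM misses over one interconnect.

```python
from collections import OrderedDict

# Illustrative latencies in cycles; hypothetical values, not taken from the paper.
LLC_HIT_LATENCY = 20
RERAM_READ_LATENCY = 100

class UnifiedController:
    """Behavioral sketch: one controller and one shared internal data port
    servicing both LLC lookups and ReRAM main-memory accesses."""

    def __init__(self, llc_lines=1024, line_bytes=64):
        self.line_bytes = line_bytes
        self.llc_lines = llc_lines
        self.llc = OrderedDict()      # LRU cache: line address -> data
        self.reram = {}               # ReRAM main memory: line address -> data
        self.port_busy_until = 0      # the single shared interconnect

    def _use_shared_port(self, now, latency):
        # Cache and main-memory arrays share one interconnect, so a request
        # must wait until the port is free before occupying it.
        start = max(now, self.port_busy_until)
        self.port_busy_until = start + latency
        return start + latency        # completion time

    def read(self, addr, now):
        line = addr // self.line_bytes
        if line in self.llc:          # LLC hit: short access over the shared port
            self.llc.move_to_end(line)
            return self._use_shared_port(now, LLC_HIT_LATENCY)
        # LLC miss: fetch the line from the ReRAM arrays over the same port.
        done = self._use_shared_port(now, RERAM_READ_LATENCY)
        if len(self.llc) >= self.llc_lines:
            self.llc.popitem(last=False)   # evict the LRU line
        self.llc[line] = self.reram.get(line, bytes(self.line_bytes))
        return done

# Usage: a cold read misses to ReRAM; a repeat read hits in the LLC.
ctrl = UnifiedController()
t1 = ctrl.read(0x1000, now=0)    # completes after the ReRAM latency
t2 = ctrl.read(0x1000, now=t1)   # completes after the LLC hit latency
print(t1, t2)
```

In this sketch the design trade-off the paper quantifies shows up as port contention: sharing one interconnect removes a second set of wires and a second controller (the area savings), while back-to-back requests serialize on `port_busy_until` (the small performance penalty).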
- Published
- 2021