Author: "Yağlıkçı, A. Giray" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yağlıkçı, A. Giray"' showing total 90 results

Start Over Author "Yağlıkçı, A. Giray"

90 results on '"Yağlıkçı, A. Giray"'

1. Proteus: Achieving High-Performance Processing-Using-DRAM via Dynamic Precision Bit-Serial Arithmetic

Author: Oliveira, Geraldo F., Kabra, Mayank, Guo, Yuxin, Chen, Kangqi, Yağlıkçı, A. Giray, Soysal, Melina, Sadrosadati, Mohammad, Bueno, Joaquin Olivares, Ghose, Saugata, Gómez-Luna, Juan, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Processing-using-DRAM (PUD) is a paradigm where the analog operational properties of DRAM structures are used to perform bulk logic operations. While PUD promises high throughput at low energy and area cost, we uncover three limitations of existing PUD approaches that lead to significant inefficiencies: (i) static data representation, i.e., 2's complement with fixed bit-precision, leading to unnecessary computation over useless (i.e., inconsequential) data; (ii) support for only throughput-oriented execution, where the high latency of individual PUD operations can only be hidden in the presence of bulk data-level parallelism; and (iii) high latency for high-precision (e.g., 32-bit) operations. To address these issues, we propose Proteus, which builds on two key ideas. First, Proteus parallelizes the execution of independent primitives in a PUD operation by leveraging DRAM's internal parallelism. Second, Proteus reduces the bit-precision for PUD operations by leveraging narrow values (i.e., values with many leading zeros). We compare Proteus to different state-of-the-art computing platforms (CPU, GPU, and the SIMDRAM PUD architecture) for twelve real-world applications. Using a single DRAM bank, Proteus provides (i) 17x, 7.3x, and 10.2x the performance per mm2; and (ii) 90.3x, 21x, and 8.1x lower energy consumption than that of the CPU, GPU, and SIMDRAM, respectively, on average across twelve real-world applications. Proteus incurs low area cost on top of a DRAM chip (1.6%) and CPU die (0.03%).
Published: 2025

2. Enabling Efficient and Scalable DRAM Read Disturbance Mitigation via New Experimental Insights into Modern DRAM Chips

Author: Yağlıkçı, Abdullah Giray
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: Increasing storage density exacerbates DRAM read disturbance, a circuit-level vulnerability exploited by system-level attacks. Unfortunately, existing defenses are either ineffective or prohibitively expensive. Efficient mitigation is critical to ensure robust (reliable, secure, and safe) execution in future DRAM-based systems. This dissertation tackles two problems: 1) protecting DRAM-based systems becomes more expensive as technology scaling increases read disturbance vulnerability, and 2) many existing solutions depend on proprietary knowledge of DRAM internals. First, we build a detailed understanding of DRAM read disturbance by rigorously characterizing off-the-shelf modern DRAM chips under varying 1) temperatures, 2) memory access patterns, 3) in-chip locations, and 4) voltage. Our novel observations demystify the implications of large DRAM read disturbance variation on future DRAM read disturbance attacks and solutions. Second, we propose new mechanisms that mitigate read disturbance bitflips efficiently and scalably by leveraging insights into DRAM chip design: 1) subarray-level parallelism and 2) variation in read disturbance across DRAM rows in off-the-shelf DRAM chips. Third, we propose a novel solution that mitigates DRAM read disturbance by selectively throttling unsafe memory accesses that might otherwise cause read disturbance bitflips without proprietary knowledge of DRAM chip internals. We demonstrate that it is possible to mitigate DRAM read disturbance efficiently and scalably with worsening DRAM read disturbance by 1) building a detailed understanding of DRAM read disturbance, 2) leveraging insights into DRAM chips, and 3) devising novel solutions that do not require proprietary knowledge of DRAM chip internals. Our experimental insights and solutions enable future works targeting robust memory systems., Comment: Doctoral thesis
Published: 2024

3. Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance

Author: Canpolat, Oğuzhan, Yağlıkçı, A. Giray, Oliveira, Geraldo F., Olgun, Ataberk, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: We present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method, Per Row Activation Counting (PRAC), described in JEDEC DDR5 specification's April 2024 update. Unlike prior state-of-the-art that advises the memory controller to periodically issue refresh management (RFM) commands, which provides the DRAM chip with time to perform refreshes, PRAC introduces a new back-off signal. PRAC's back-off signal propagates from the DRAM chip to the memory controller and forces the memory controller to 1) stop serving requests and 2) issue RFM commands. As a result, RFM commands are issued when needed as opposed to periodically, reducing RFM's overheads. We analyze PRAC in four steps. First, we define an adversarial access pattern that represents the worst-case for PRAC's security. Second, we investigate PRAC's configurations and security implications. Our analyses show that PRAC can be configured for secure operation as long as no bitflip occurs before accessing a memory location 10 times. Third, we evaluate the performance impact of PRAC and compare it against prior works using Ramulator 2.0. Our analysis shows that while PRAC incurs less than 13% performance overhead for today's DRAM chips, its performance overheads can reach up to 94% for future DRAM chips that are more vulnerable to read disturbance bitflips. Fourth, we define an availability adversarial access pattern that exacerbates PRAC's performance overhead to perform a memory performance attack, demonstrating that such an adversarial pattern can hog up to 94% of DRAM throughput and degrade system throughput by up to 95%. We discuss PRAC's implications on future systems and foreshadow future research directions. To aid future research, we open-source our implementations and scripts at https://github.com/CMU-SAFARI/ramulator2., Comment: To appear in DRAMSec 2024
Published: 2024

4. RowPress Vulnerability in Modern DRAM Chips

Author: Luo, Haocong, Olgun, Ataberk, Yağlıkçı, A. Giray, Tuğrul, Yahya Can, Rhyner, Steve, Cavlak, Meryem Banu, Lindegger, Joël, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: Memory isolation is a critical property for system reliability, security, and safety. We demonstrate RowPress, a DRAM read disturbance phenomenon different from the well-known RowHammer. RowPress induces bitflips by keeping a DRAM row open for a long period of time instead of repeatedly opening and closing the row. We experimentally characterize RowPress bitflips, showing their widespread existence in commodity off-the-shelf DDR4 DRAM chips. We demonstrate RowPress bitflips in a real system that already has RowHammer protection, and propose effective mitigation techniques that protect DRAM against both RowHammer and RowPress., Comment: To Appear in IEEE MICRO Top Picks Special Issue (July-August 2024). arXiv admin note: substantial text overlap with arXiv:2306.17061
Published: 2024

5. An Experimental Characterization of Combined RowHammer and RowPress Read Disturbance in Modern DRAM Chips

Author: Luo, Haocong, Yüksel, Ismail Emir, Olgun, Ataberk, Yağlıkçı, A. Giray, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: DRAM read disturbance can break memory isolation, a fundamental property to ensure system robustness (i.e., reliability, security, safety). RowHammer and RowPress are two different DRAM read disturbance phenomena. RowHammer induces bitflips in physically adjacent victim DRAM rows by repeatedly opening and closing an aggressor DRAM row, while RowPress induces bitflips by keeping an aggressor DRAM row open for a long period of time. In this study, we characterize a DRAM access pattern that combines RowHammer and RowPress in 84 real DDR4 DRAM chips from all three major DRAM manufacturers. Our key results show that 1) this combined RowHammer and RowPress pattern takes significantly smaller amount of time (up to 46.1% faster) to induce the first bitflip compared to the state-of-the-art RowPress pattern, and 2) at the minimum aggressor row activation count to induce at least one bitflip, the bits that flip are different across RowHammer, RowPress, and the combined patterns. Based on our results, we provide a key hypothesis that the read disturbance effect caused by RowPress from one of the two aggressor rows in a double-sided pattern is much more significant than the other., Comment: To appear at DSN Disrupt 2024 (June 2024)
Published: 2024

6. Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis

Author: Yuksel, Ismail Emir, Tugrul, Yahya Can, Bostanci, F. Nisa, Oliveira, Geraldo F., Yaglikci, A. Giray, Olgun, Ataberk, Soysal, Melina, Luo, Haocong, Gómez-Luna, Juan, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We experimentally analyze the computational capability of commercial off-the-shelf (COTS) DRAM chips and the robustness of these capabilities under various timing delays between DRAM commands, data patterns, temperature, and voltage levels. We extensively characterize 120 COTS DDR4 chips from two major manufacturers. We highlight four key results of our study. First, COTS DRAM chips are capable of 1) simultaneously activating up to 32 rows (i.e., simultaneous many-row activation), 2) executing a majority of X (MAJX) operation where X>3 (i.e., MAJ5, MAJ7, and MAJ9 operations), and 3) copying a DRAM row (concurrently) to up to 31 other DRAM rows, which we call Multi-RowCopy. Second, storing multiple copies of MAJX's input operands on all simultaneously activated rows drastically increases the success rate (i.e., the percentage of DRAM cells that correctly perform the computation) of the MAJX operation. For example, MAJ3 with 32-row activation (i.e., replicating each MAJ3's input operands 10 times) has a 30.81% higher average success rate than MAJ3 with 4-row activation (i.e., no replication). Third, data pattern affects the success rate of MAJX and Multi-RowCopy operations by 11.52% and 0.07% on average. Fourth, simultaneous many-row activation, MAJX, and Multi-RowCopy operations are highly resilient to temperature and voltage changes, with small success rate variations of at most 2.13% among all tested operations. We believe these empirical results demonstrate the promising potential of using DRAM as a computation substrate. To aid future research and development, we open-source our infrastructure at https://github.com/CMU-SAFARI/SiMRA-DRAM., Comment: To appear in DSN 2024
Published: 2024

7. BreakHammer: Enhancing RowHammer Mitigations by Carefully Throttling Suspect Threads

Author: Canpolat, Oğuzhan, Yağlıkçı, A. Giray, Olgun, Ataberk, Yüksel, İsmail Emir, Tuğrul, Yahya Can, Kanellopoulos, Konstantinos, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: RowHammer is a major read disturbance mechanism in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in other physically nearby DRAM rows. RowHammer solutions perform preventive actions (e.g., refresh neighbor rows of the hammered row) that mitigate such bitflips to preserve memory isolation, a fundamental building block of security and privacy in modern computing systems. However, preventive actions induce non-negligible memory request latency and system performance overheads as they interfere with memory requests. As shrinking technology node size over DRAM chip generations exacerbates RowHammer, the overheads of RowHammer solutions become prohibitively expensive. As a result, a malicious program can effectively hog the memory system and deny service to benign applications by causing many RowHammer-preventive actions. In this work, we tackle the performance overheads of RowHammer solutions by tracking and throttling the generators of memory accesses that trigger RowHammer solutions. To this end, we propose BreakHammer. BreakHammer 1) observes the time-consuming RowHammer-preventive actions of existing RowHammer mitigation mechanisms, 2) identifies hardware threads that trigger many of these actions, and 3) reduces the memory bandwidth usage of each identified thread. As such, BreakHammer significantly reduces the number of RowHammer-preventive actions performed, thereby improving 1) system performance and DRAM energy, and 2) reducing the maximum slowdown induced on a benign application, with near-zero area overhead. Our extensive evaluations demonstrate that BreakHammer effectively reduces the negative performance, energy, and fairness effects of eight RowHammer mitigation mechanisms. To foster further research we open-source our BreakHammer implementation and scripts at https://github.com/CMU-SAFARI/BreakHammer., Comment: To appear in MICRO'24
Published: 2024

8. Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations

Author: Kanellopoulos, Konstantinos, Bostanci, F. Nisa, Olgun, Ataberk, Yaglikci, A. Giray, Yuksel, Ismail Emir, Ghiasi, Nika Mansouri, Bingol, Zulal, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: The adoption of processing-in-memory (PiM) architectures has been gaining momentum because they provide high performance and low energy consumption by alleviating the data movement bottleneck. Yet, the security of such architectures has not been thoroughly explored. The adoption of PiM solutions provides a new way to directly access main memory, which malicious user applications can exploit. We show that this new way to access main memory opens opportunities for high-throughput timing attacks that are hard-to-mitigate without significant performance overhead. We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of PiM architectures to establish covert and side channels. IMPACT enables high-throughput communication and private information leakage by exploiting the shared DRAM row buffer. To achieve high throughput, IMPACT (i) eliminates cache bypassing steps required by processor-centric main memory and cache-based timing attacks and (ii) leverages the intrinsic parallelism of PiM operations. We showcase two applications of IMPACT. First, we build two covert-channel attacks that run on the host CPU and leverage different PiM approaches to gain direct and fast access to main memory and establish high-throughput communication covert channels. Second, we showcase a side-channel attack that leaks private information of concurrently running victim applications that are accelerated with PiM. Our results demonstrate that (i) our covert channels achieve 12.87 Mb/s and 14.16 Mb/s communication throughput, respectively, which is up to 4.91x and 5.41x faster than the state-of-the-art main memory-based covert channels, and (ii) our side-channel attack allows the attacker to leak secrets with a low error rate. To avoid such covert and side channels in emerging PiM systems, we propose and evaluate three defenses.
Published: 2024

9. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing

Author: Oliveira, Geraldo F., Olgun, Ataberk, Yağlıkçı, Abdullah Giray, Bostancı, F. Nisa, Gómez-Luna, Juan, Ghose, Saugata, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways. First, since applications have varying degrees of SIMD parallelism, PUD execution often leads to underutilization, throughput loss, and energy waste. Second, most PUD architectures are limited to the execution of parallel map operations. Third, the need to feed the wide DRAM row with tens of thousands of data elements combined with the lack of adequate compiler support for PUD systems create a programmability barrier. Our goal is to design a flexible PUD system that overcomes the limitations caused by the large and rigid granularity of PUD. To this end, we propose MIMDRAM, a hardware/software co-designed PUD system that introduces new mechanisms to allocate and control only the necessary resources for a given PUD operation. The key idea of MIMDRAM is to leverage fine-grained DRAM (i.e., the ability to independently access smaller segments of a large DRAM row) for PUD computation. MIMDRAM exploits this key idea to enable a multiple-instruction multiple-data (MIMD) execution model in each DRAM subarray. We evaluate MIMDRAM using twelve real-world applications and 495 multi-programmed application mixes. Our evaluation shows that MIMDRAM provides 34x the performance, 14.3x the energy efficiency, 1.7x the throughput, and 1.3x the fairness of a state-of-the-art PUD framework, along with 30.6x and 6.8x the energy efficiency of a high-end CPU and GPU, respectively. MIMDRAM adds small area cost to a DRAM chip (1.11%) and CPU die (0.6%)., Comment: Extended version of HPCA 2024 paper. arXiv admin note: text overlap with arXiv:2109.05881 by other authors
Published: 2024

10. Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions

Author: Yağlıkçı, Abdullah Giray, Tuğrul, Yahya Can, Oliveira, Geraldo F., Yüksel, İsmail Emir, Olgun, Ataberk, Luo, Haocong, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: Read disturbance in modern DRAM chips is a widespread phenomenon and is reliably used for breaking memory isolation, a fundamental building block for building robust systems. RowHammer and RowPress are two examples of read disturbance in DRAM where repeatedly accessing (hammering) or keeping active (pressing) a memory location induces bitflips in other memory locations. Unfortunately, shrinking technology node size exacerbates read disturbance in DRAM chips over generations. As a result, existing defense mechanisms suffer from significant performance and energy overheads, limited effectiveness, or prohibitively high hardware complexity. In this paper, we tackle these shortcomings by leveraging the spatial variation in read disturbance across different memory locations in real DRAM chips. To do so, we 1) present the first rigorous real DRAM chip characterization study of spatial variation of read disturbance and 2) propose Sv\"ard, a new mechanism that dynamically adapts the aggressiveness of existing solutions based on the row-level read disturbance profile. Our experimental characterization on 144 real DDR4 DRAM chips representing 10 chip designs demonstrates a large variation in read disturbance vulnerability across different memory locations: in the part of memory with the worst read disturbance vulnerability, 1) up to 2x the number of bitflips can occur and 2) bitflips can occur at an order of magnitude fewer accesses, compared to the memory locations with the least vulnerability to read disturbance. Sv\"ard leverages this variation to reduce the overheads of five state-of-the-art read disturbance solutions, and thus significantly increases system performance., Comment: A shorter version of this work is to appear at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024
Published: 2024

11. CoMeT: Count-Min-Sketch-based Row Tracking to Mitigate RowHammer at Low Cost

Author: Bostanci, F. Nisa, Yuksel, Ismail Emir, Olgun, Ataberk, Kanellopoulos, Konstantinos, Tugrul, Yahya Can, Yaglikci, A. Giray, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: We propose a new RowHammer mitigation mechanism, CoMeT, that prevents RowHammer bitflips with low area, performance, and energy costs in DRAM-based systems at very low RowHammer thresholds. The key idea of CoMeT is to use low-cost and scalable hash-based counters to track DRAM row activations. CoMeT uses the Count-Min Sketch technique that maps each DRAM row to a group of counters, as uniquely as possible, using multiple hash functions. When a DRAM row is activated, CoMeT increments the counters mapped to that DRAM row. Because the mapping from DRAM rows to counters is not completely unique, activating one row can increment one or more counters mapped to another row. Thus, CoMeT may overestimate, but never underestimates, a DRAM row's activation count. This property of CoMeT allows it to securely prevent RowHammer bitflips while properly configuring its hash functions reduces overestimations. As a result, CoMeT 1) implements substantially fewer counters than the number of DRAM rows in a DRAM bank and 2) does not significantly overestimate a DRAM row's activation count. Our comprehensive evaluations show that CoMeT prevents RowHammer bitflips with an average performance overhead of only 4.01% across 61 benign single-core workloads for a very low RowHammer threshold of 125, normalized to a system with no RowHammer mitigation. CoMeT achieves a good trade-off between performance, energy, and area overheads. Compared to the best-performing state-of-the-art mitigation, CoMeT requires 74.2x less area overhead at the RowHammer threshold 125 and incurs a small performance overhead on average for all RowHammer thresholds. Compared to the best-performing low-area-cost mechanism, at a very low RowHammer threshold of 125, CoMeT improves performance by up to 39.1% while incurring a similar area overhead. CoMeT is openly and freely available at https://github.com/CMU-SAFARI/CoMeT., Comment: To appear at HPCA 2024
Published: 2024

12. Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

Author: Yuksel, Ismail Emir, Tugrul, Yahya Can, Olgun, Ataberk, Bostanci, F. Nisa, Yaglikci, A. Giray, Oliveira, Geraldo F., Luo, Haocong, Gómez-Luna, Juan, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to reduce or eliminate costly data movement between processing elements and main memory. Prior works experimentally demonstrate three-input MAJ (MAJ3) and two-input AND and OR operations in commercial off-the-shelf (COTS) DRAM chips. Yet, demonstrations on COTS DRAM chips do not provide a functionally complete set of operations. We experimentally demonstrate that COTS DRAM chips are capable of performing 1) functionally-complete Boolean operations: NOT, NAND, and NOR and 2) many-input (i.e., more than two-input) AND and OR operations. We present an extensive characterization of new bulk bitwise operations in 256 off-the-shelf modern DDR4 DRAM chips. We evaluate the reliability of these operations using a metric called success rate: the fraction of correctly performed bitwise operations. Among our 19 new observations, we highlight four major results. First, we can perform the NOT operation on COTS DRAM chips with a 98.37% success rate on average. Second, we can perform up to 16-input NAND, NOR, AND, and OR operations on COTS DRAM chips with high reliability (e.g., 16-input NAND, NOR, AND, and OR with an average success rate of 94.94%, 95.87%, 94.94%, and 95.85%, respectively). Third, data pattern only slightly affects bitwise operations. Our results show that executing NAND, NOR, AND, and OR operations with random data patterns decreases the success rate compared to all logic-1/logic-0 patterns by 1.39%, 1.97%, 1.43%, and 1.98%, respectively. Fourth, bitwise operations are highly resilient to temperature changes, with small success rate fluctuations of at most 1.66% when the temperature is increased from 50C to 95C. We open-source our infrastructure at https://github.com/CMU-SAFARI/FCDRAM, Comment: A shorter version of this work is to appear at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024
Published: 2024

13. Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems

Author: Patel, Minesh, Shahroodi, Taha, Manglik, Aditya, Yağlıkçı, Abdullah Giray, Olgun, Ataberk, Luo, Haocong, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: Generational improvements to commodity DRAM throughout half a century have long solidified its prevalence as main memory across the computing industry. However, overcoming today's DRAM technology scaling challenges requires new solutions driven by both DRAM producers and consumers. In this paper, we observe that the separation of concerns between producers and consumers specified by industry-wide DRAM standards is becoming a liability to progress in addressing scaling-related concerns. To understand the problem, we study four key directions for overcoming DRAM scaling challenges using system-memory cooperation: (i) improving memory access latencies; (ii) reducing DRAM refresh overheads; (iii) securely defending against the RowHammer vulnerability; and (iv) addressing worsening memory errors. We find that the single most important barrier to advancement in all four cases is the consumer's lack of insight into DRAM reliability. Based on an analysis of DRAM reliability testing, we recommend revising the separation of concerns to incorporate limited information transparency between producers and consumers. Finally, we propose adopting this revision in a two-step plan, starting with immediate information release through crowdsourcing and publication and culminating in widespread modifications to DRAM standards., Comment: arXiv admin note: substantial text overlap with arXiv:2204.10378
Published: 2024

14. PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips

Author: Yuksel, Ismail Emir, Tugrul, Yahya Can, Bostanci, F. Nisa, Yaglikci, Abdullah Giray, Olgun, Ataberk, Oliveira, Geraldo F., Soysal, Melina, Luo, Haocong, Luna, Juan Gomez, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where bulk-bitwise operations are performed leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1) modern off-the-shelf DRAM chips do not officially support PuM operations, and 2) existing techniques of performing PuM operations on off-the-shelf DRAM chips suffer from two key limitations. First, these techniques have low success rates, i.e., only a small fraction of DRAM columns can correctly execute PuM operations because they operate beyond manufacturer-recommended timing constraints, causing these operations to be highly susceptible to noise and process variation. Second, these techniques have limited compute primitives, preventing them from fully leveraging parallelism across DRAM columns and thus hindering their performance benefits. We propose PULSAR, a new technique to enable high-success-rate and high-performance PuM operations in off-the-shelf DRAM chips. PULSAR leverages our new observation that a carefully crafted sequence of DRAM commands simultaneously activates up to 32 DRAM rows. PULSAR overcomes the limitations of existing techniques by 1) replicating the input data to improve the success rate and 2) enabling new bulk bitwise operations (e.g., many-input majority, Multi-RowInit, and Bulk-Write) to improve the performance. Our analysis on 120 off-the-shelf DDR4 chips from two major manufacturers shows that PULSAR achieves a 24.18% higher success rate and 121% higher performance over seven arithmetic-logic operations compared to FracDRAM, a state-of-the-art off-the-shelf DRAM-based PuM technique.
Published: 2023

15. Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips

Author: Olgun, Ataberk, Osseiran, Majd, Yaglikci, Abdullah Giray, Tugrul, Yahya Can, Luo, Haocong, Rhyner, Steve, Salami, Behzad, Luna, Juan Gomez, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: We experimentally demonstrate the effects of read disturbance (RowHammer and RowPress) and uncover the inner workings of undocumented read disturbance defense mechanisms in High Bandwidth Memory (HBM). Detailed characterization of six real HBM2 DRAM chips in two different FPGA boards shows that (1) the read disturbance vulnerability significantly varies between different HBM2 chips and between different components (e.g., 3D-stacked channels) inside a chip, (2) DRAM rows at the end and in the middle of a bank are more resilient to read disturbance, (3) fewer additional activations are sufficient to induce more read disturbance bitflips in a DRAM row if the row exhibits the first bitflip at a relatively high activation count, (4) a modern HBM2 chip implements undocumented read disturbance defenses that track potential aggressor rows based on how many times they are activated. We describe how our findings could be leveraged to develop more powerful read disturbance attacks and more efficient defense mechanisms. We open source all our code and data to facilitate future research at https://github.com/CMU-SAFARI/HBM-Read-Disturbance., Comment: To appear in DSN 2024
Published: 2023

16. ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation

Author: Olgun, Ataberk, Tugrul, Yahya Can, Bostanci, Nisa, Yuksel, Ismail Emir, Luo, Haocong, Rhyner, Steve, Yaglikci, Abdullah Giray, Oliveira, Geraldo F., and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: We introduce ABACuS, a new low-cost hardware-counter-based RowHammer mitigation technique that performance-, energy-, and area-efficiently scales with worsening RowHammer vulnerability. We observe that both benign workloads and RowHammer attacks tend to access DRAM rows with the same row address in multiple DRAM banks at around the same time. Based on this observation, ABACuS's key idea is to use a single shared row activation counter to track activations to the rows with the same row address in all DRAM banks. Unlike state-of-the-art RowHammer mitigation mechanisms that implement a separate row activation counter for each DRAM bank, ABACuS implements fewer counters (e.g., only one) to track an equal number of aggressor rows. Our evaluations show that ABACuS securely prevents RowHammer bitflips at low performance/energy overhead and low area cost. We compare ABACuS to four state-of-the-art mitigation mechanisms. At a near-future RowHammer threshold of 1000, ABACuS incurs only 0.58% (0.77%) performance and 1.66% (2.12%) DRAM energy overheads, averaged across 62 single-core (8-core) workloads, requiring only 9.47 KiB of storage per DRAM rank. At the RowHammer threshold of 1000, the best prior low-area-cost mitigation mechanism incurs 1.80% higher average performance overhead than ABACuS, while ABACuS requires 2.50X smaller chip area to implement. At a future RowHammer threshold of 125, ABACuS performs very similarly to (within 0.38% of the performance of) the best prior performance- and energy-efficient RowHammer mitigation mechanism while requiring 22.72X smaller chip area. ABACuS is freely and openly available at https://github.com/CMU-SAFARI/ABACuS., Comment: To appear in USENIX Security '24
Published: 2023

17. Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator

Author: Luo, Haocong, Tuğrul, Yahya Can, Bostancı, F. Nisa, Olgun, Ataberk, Yağlıkçı, A. Giray, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: We present Ramulator 2.0, a highly modular and extensible DRAM simulator that enables rapid and agile implementation and evaluation of design changes in the memory controller and DRAM to meet the increasing research effort in improving the performance, security, and reliability of memory systems. Ramulator 2.0 abstracts and models key components in a DRAM-based memory system and their interactions into shared interfaces and independent implementations. Doing so enables easy modification and extension of the modeled functions of the memory controller and DRAM in Ramulator 2.0. The DRAM specification syntax of Ramulator 2.0 is concise and human-readable, facilitating easy modifications and extensions. Ramulator 2.0 implements a library of reusable templated lambda functions to model the functionalities of DRAM commands to simplify the implementation of new DRAM standards, including DDR5, LPDDR5, HBM3, and GDDR6. We showcase Ramulator 2.0's modularity and extensibility by implementing and evaluating a wide variety of RowHammer mitigation techniques that require different memory controller design changes. These techniques are added modularly as separate implementations without changing any code in the baseline memory controller implementation. Ramulator 2.0 is rigorously validated and maintains a fast simulation speed compared to existing cycle-accurate DRAM simulators. Ramulator 2.0 is open-sourced under the permissive MIT license at https://github.com/CMU-SAFARI/ramulator2
Published: 2023

18. RowPress: Amplifying Read Disturbance in Modern DRAM Chips

Author: Luo, Haocong, Olgun, Ataberk, Yağlıkçı, A. Giray, Tuğrul, Yahya Can, Rhyner, Steve, Cavlak, Meryem Banu, Lindegger, Joël, Sadrosadati, Mohammad, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: Memory isolation is critical for system reliability, security, and safety. Unfortunately, read disturbance can break memory isolation in modern DRAM chips. For example, RowHammer is a well-studied read-disturb phenomenon where repeatedly opening and closing (i.e., hammering) a DRAM row many times causes bitflips in physically nearby rows. This paper experimentally demonstrates and analyzes another widespread read-disturb phenomenon, RowPress, in real DDR4 DRAM chips. RowPress breaks memory isolation by keeping a DRAM row open for a long period of time, which disturbs physically nearby rows enough to cause bitflips. We show that RowPress amplifies DRAM's vulnerability to read-disturb attacks by significantly reducing the number of row activations needed to induce a bitflip by one to two orders of magnitude under realistic conditions. In extreme cases, RowPress induces bitflips in a DRAM row when an adjacent row is activated only once. Our detailed characterization of 164 real DDR4 DRAM chips shows that RowPress 1) affects chips from all three major DRAM manufacturers, 2) gets worse as DRAM technology scales down to smaller node sizes, and 3) affects a different set of DRAM cells from RowHammer and behaves differently from RowHammer as temperature and access pattern changes. We demonstrate in a real DDR4-based system with RowHammer protection that 1) a user-level program induces bitflips by leveraging RowPress while conventional RowHammer cannot do so, and 2) a memory controller that adaptively keeps the DRAM row open for a longer period of time based on access pattern can facilitate RowPress-based attacks. To prevent bitflips due to RowPress, we describe and evaluate a new methodology that adapts existing RowHammer mitigation techniques to also mitigate RowPress with low additional performance overhead. We open source all our code and data to facilitate future research on RowPress., Comment: Extended version of the paper "RowPress: Amplifying Read Disturbance in Modern DRAM Chips" at the 50th Annual International Symposium on Computer Architecture (ISCA), 2023
Published: 2023

19. TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs

Author: Yüksel, İsmail Emir, Olgun, Ataberk, Salami, Behzad, Bostancı, F. Nisa, Tuğrul, Yahya Can, Yağlıkçı, A. Giray, Ghiasi, Nika Mansouri, Mutlu, Onur, and Ergin, Oğuz
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: Prior works propose SRAM-based TRNGs that extract entropy from SRAM arrays. SRAM arrays are widely used in a majority of specialized or general-purpose chips that perform the computation to store data inside the chip. Thus, SRAM-based TRNGs present a low-cost alternative to dedicated hardware TRNGs. However, existing SRAM-based TRNGs suffer from 1) low TRNG throughput, 2) high energy consumption, 3) high TRNG latency, and 4) the inability to generate true random numbers continuously, which limits the application space of SRAM-based TRNGs. Our goal in this paper is to design an SRAM-based TRNG that overcomes these four key limitations and thus, extends the application space of SRAM-based TRNGs. To this end, we propose TuRaN, a new high-throughput, energy-efficient, and low-latency SRAM-based TRNG that can sustain continuous operation. TuRaN leverages the key observation that accessing SRAM cells results in random access failures when the supply voltage is reduced below the manufacturer-recommended supply voltage. TuRaN generates random numbers at high throughput by repeatedly accessing SRAM cells with reduced supply voltage and post-processing the resulting random faults using the SHA-256 hash function. To demonstrate the feasibility of TuRaN, we conduct SPICE simulations on different process nodes and analyze the potential of access failure for use as an entropy source. We verify and support our simulation results by conducting real-world experiments on two commercial off-the-shelf FPGA boards. We evaluate the quality of the random numbers generated by TuRaN using the widely-adopted NIST standard randomness tests and observe that TuRaN passes all tests. TuRaN generates true random numbers with (i) an average (maximum) throughput of 1.6Gbps (1.812Gbps), (ii) 0.11nJ/bit energy consumption, and (iii) 278.46us latency.
Published: 2022

20. Fundamentally Understanding and Solving RowHammer

Author: Mutlu, Onur, Olgun, Ataberk, and Yağlıkçı, A. Giray
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: We provide an overview of recent developments and future directions in the RowHammer vulnerability that plagues modern DRAM (Dynamic Random Memory Access) chips, which are used in almost all computing systems as main memory. RowHammer is the phenomenon in which repeatedly accessing a row in a real DRAM chip causes bitflips (i.e., data corruption) in physically nearby rows. This phenomenon leads to a serious and widespread system security vulnerability, as many works since the original RowHammer paper in 2014 have shown. Recent analysis of the RowHammer phenomenon reveals that the problem is getting much worse as DRAM technology scaling continues: newer DRAM chips are fundamentally more vulnerable to RowHammer at the device and circuit levels. Deeper analysis of RowHammer shows that there are many dimensions to the problem as the vulnerability is sensitive to many variables, including environmental conditions (temperature \& voltage), process variation, stored data patterns, as well as memory access patterns and memory control policies. As such, it has proven difficult to devise fully-secure and very efficient (i.e., low-overhead in performance, energy, area) protection mechanisms against RowHammer and attempts made by DRAM manufacturers have been shown to lack security guarantees. After reviewing various recent developments in exploiting, understanding, and mitigating RowHammer, we discuss future directions that we believe are critical for solving the RowHammer problem. We argue for two major directions to amplify research and development efforts in: 1) building a much deeper understanding of the problem and its many dimensions, in both cutting-edge DRAM chips and computing systems deployed in the field, and 2) the design and development of extremely efficient and fully-secure solutions via system-memory cooperation., Comment: Invited paper to appear in ASPDAC 2023
Published: 2022
Full Text: View/download PDF

21. DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips

Author: Olgun, Ataberk, Hassan, Hasan, Yağlıkçı, A. Giray, Tuğrul, Yahya Can, Orosa, Lois, Luo, Haocong, Patel, Minesh, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: To understand and improve DRAM performance, reliability, security and energy efficiency, prior works study characteristics of commodity DRAM chips. Unfortunately, state-of-the-art open source infrastructures capable of conducting such studies are obsolete, poorly supported, or difficult to use, or their inflexibility limit the types of studies they can conduct. We propose DRAM Bender, a new FPGA-based infrastructure that enables experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three key features at the same time. First, DRAM Bender enables directly interfacing with a DRAM chip through its low-level interface. This allows users to issue DRAM commands in arbitrary order and with finer-grained time intervals compared to other open source infrastructures. Second, DRAM Bender exposes easy-to-use C++ and Python programming interfaces, allowing users to quickly and easily develop different types of DRAM experiments. Third, DRAM Bender is easily extensible. The modular design of DRAM Bender allows extending it to (i) support existing and emerging DRAM interfaces, and (ii) run on new commercial or custom FPGA boards with little effort. To demonstrate that DRAM Bender is a versatile infrastructure, we conduct three case studies, two of which lead to new observations about the DRAM RowHammer vulnerability. In particular, we show that data patterns supported by DRAM Bender uncovers a larger set of bit-flips on a victim row compared to the data patterns commonly used by prior work. We demonstrate the extensibility of DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3 support. DRAM Bender is freely and openly available at https://github.com/CMU-SAFARI/DRAM-Bender., Comment: Extended version of paper that is to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
Published: 2022

22. SpyHammer: Understanding and Exploiting RowHammer under Fine-Grained Temperature Variations

Author: Orosa, Lois, Rührmair, Ulrich, Yaglikci, A. Giray, Luo, Haocong, Olgun, Ataberk, Jattke, Patrick, Patel, Minesh, Kim, Jeremie, Razavi, Kaveh, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: RowHammer is a DRAM vulnerability that can cause bit errors in a victim DRAM row solely by accessing its neighboring DRAM rows at a high-enough rate. Recent studies demonstrate that new DRAM devices are becoming increasingly vulnerable to RowHammer, and many works demonstrate system-level attacks for privilege escalation or information leakage. In this work, we perform the first rigorous fine-grained characterization and analysis of the correlation between RowHammer and temperature. We show that RowHammer is very sensitive to temperature variations, even if the variations are very small (e.g., $\pm 1$ {\deg}C). We leverage two key observations from our analysis to spy on DRAM temperature: 1) RowHammer-induced bit error rate consistently increases (or decreases) as the temperature increases, and 2) some DRAM cells that are vulnerable to RowHammer exhibit bit errors only at a particular temperature. Based on these observations, we propose a new RowHammer attack, called SpyHammer, that spies on the temperature of DRAM on critical systems such as industrial production lines, vehicles, and medical systems. SpyHammer is the first practical attack that can spy on DRAM temperature. Our evaluation in a controlled environment shows that SpyHammer can infer the temperature of the victim DRAM modules with an error of less than $\pm 2.5$ {\deg}C at the 90th percentile of all tested temperatures, for 12 real DRAM modules (120 DRAM chips) from four main manufacturers., Comment: This work is to appear at IEEE Access, 2024
Published: 2022

23. HiRA: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips

Author: Yağlıkçı, Abdullah Giray, Olgun, Ataberk, Patel, Minesh, Luo, Haocong, Hassan, Hasan, Orosa, Lois, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: DRAM is the building block of modern main memory systems. DRAM cells must be periodically refreshed to prevent data loss. Refresh operations degrade system performance by interfering with memory accesses. As DRAM chip density increases with technology node scaling, refresh operations also increase because: 1) the number of DRAM rows in a chip increases; and 2) DRAM cells need additional refresh operations to mitigate bit failures caused by RowHammer, a failure mechanism that becomes worse with technology node scaling. Thus, it is critical to enable refresh operations at low performance overhead. To this end, we propose a new operation, Hidden Row Activation (HiRA), and the HiRA Memory Controller (HiRA-MC). HiRA hides a refresh operation's latency by refreshing a row concurrently with accessing or refreshing another row within the same bank. Unlike prior works, HiRA achieves this parallelism without any modifications to off-the-shelf DRAM chips. To do so, it leverages the new observation that two rows in the same bank can be activated without data loss if the rows are connected to different charge restoration circuitry. We experimentally demonstrate on 56% real off-the-shelf DRAM chips that HiRA can reliably parallelize a DRAM row's refresh operation with refresh or activation of any of the 32% of the rows within the same bank. By doing so, HiRA reduces the overall latency of two refresh operations by 51.4%. HiRA-MC modifies the memory request scheduler to perform HiRA when a refresh operation can be performed concurrently with a memory access or another refresh. Our system-level evaluations show that HiRA-MC increases system performance by 12.6% and 3.73x as it reduces the performance degradation due to periodic refreshes and refreshes for RowHammer protection (preventive refreshes), respectively, for future DRAM chips with increased density and RowHammer vulnerability., Comment: To appear in the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
Published: 2022

24. Using Approximate DRAM for Enabling Energy-Efficient, High-Performance Deep Neural Network Inference

Author: Orosa, Lois, Koppula, Skanda, Kanellopoulos, Konstantinos, Yağlıkçı, A. Giray, Mutlu, Onur, Pasricha, Sudeep, editor, and Shafique, Muhammad, editor
Published: 2024
Full Text: View/download PDF

25. Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture

Author: Olgun, Ataberk, Bostanci, F. Nisa, Oliveira, Geraldo F., Tugrul, Yahya Can, Bera, Rahul, Yaglikci, A. Giray, Hassan, Hasan, Ergin, Oguz, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: We propose Sectored DRAM, a new, low-overhead DRAM substrate that reduces wasted energy by enabling fine-grained DRAM data transfers and DRAM row activation. Sectored DRAM leverages two key ideas to enable fine-grained data transfers and row activation at low chip area cost. First, a cache block transfer between main memory and the memory controller happens in a fixed number of clock cycles where only a small portion of the cache block (a word) is transferred in each cycle. Sectored DRAM augments the memory controller and the DRAM chip to execute cache block transfers in a variable number of clock cycles based on the workload access pattern with minor modifications to the memory controller's and the DRAM chip's circuitry. Second, a large DRAM row, by design, is already partitioned into smaller independent physically isolated regions. Sectored DRAM provides the memory controller with the ability to activate each such region based on the workload access pattern via small modifications to the DRAM chip's array access circuitry. Activating smaller regions of a large row relaxes DRAM power delivery constraints and allows the memory controller to schedule DRAM accesses faster. Compared to a system with coarse-grained DRAM, Sectored DRAM reduces the DRAM energy consumption of highly-memory-intensive workloads by up to (on average) 33% (20%) while improving their performance by up to (on average) 36% (17%). Sectored DRAM's DRAM energy savings, combined with its system performance improvement, allows system-wide energy savings of up to 23%. Sectored DRAM's DRAM chip area overhead is 1.7% the area of a modern DDR4 chip. We hope and believe that Sectored DRAM's ideas and results will help to enable more efficient and high-performance memory systems. To this end, we open source Sectored DRAM at https://github.com/CMU-SAFARI/Sectored-DRAM., Comment: Extended version of paper that is to appear in ACM Transactions on Architecture and Code Optimization (ACM TACO)
Published: 2022

26. Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient in-DRAM Operations

Author: Hassan, Hasan, Olgun, Ataberk, Yaglikci, A. Giray, Luo, Haocong, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: The memory controller is in charge of managing DRAM maintenance operations (e.g., refresh, RowHammer protection, memory scrubbing) to reliably operate modern DRAM chips. Implementing new maintenance operations often necessitates modifications in the DRAM interface, memory controller, and potentially other system components. Such modifications are only possible with a new DRAM standard, which takes a long time to develop, likely leading to slow progress in the adoption of new architectural techniques in DRAM chips. We propose a new low-cost DRAM architecture, Self-Managing DRAM (SMD), that enables autonomous in-DRAM maintenance operations by transferring the responsibility for controlling maintenance operations from the memory controller to the SMD chip. To enable autonomous maintenance operations, we make a single modification to the DRAM interface, such that an SMD chip rejects memory controller accesses to DRAM regions under maintenance, while allowing memory accesses to others. Thus, SMD enables 1) implementing new in-DRAM maintenance mechanisms (or modifying existing ones) with no further changes in the DRAM interface or other system components, and 2) overlapping the latency of a maintenance operation in one DRAM region with the latency of accessing data in another. We evaluate SMD and show that it 1) can be implemented without adding new pins to the DDRx interface with low latency and area overhead, 2) achieves 4.1% average speedup across 20 four-core memory-intensive workloads over a DDR4-based system/DRAM co-design technique that intelligently parallelizes maintenance operations with memory accesses, and 3) guarantees forward progress for rejected memory accesses. We believe and hope SMD can enable innovations in DRAM architecture to rapidly come to fruition. We open source all SMD source code and data at https://github.com/CMU-SAFARI/SelfManagingDRAM., Comment: Extended version of MICRO 2024 paper titled "Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient DRAM Maintenance Operations''
Published: 2022

27. Understanding RowHammer Under Reduced Wordline Voltage: An Experimental Study Using Real DRAM Devices

Author: Yağlıkçı, A. Giray, Luo, Haocong, de Oliviera, Geraldo F., Olgun, Ataberk, Patel, Minesh, Park, Jisung, Hassan, Hasan, Kim, Jeremie S., Orosa, Lois, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: RowHammer is a circuit-level DRAM vulnerability, where repeatedly activating and precharging a DRAM row, and thus alternating the voltage of a row's wordline between low and high voltage levels, can cause bit flips in physically nearby rows. Recent DRAM chips are more vulnerable to RowHammer: with technology node scaling, the minimum number of activate-precharge cycles to induce a RowHammer bit flip reduces and the RowHammer bit error rate increases. Therefore, it is critical to develop effective and scalable approaches to protect modern DRAM systems against RowHammer. To enable such solutions, it is essential to develop a deeper understanding of the RowHammer vulnerability of modern DRAM chips. However, even though the voltage toggling on a wordline is a key determinant of RowHammer vulnerability, no prior work experimentally demonstrates the effect of wordline voltage (VPP) on the RowHammer vulnerability. Our work closes this gap in understanding. This is the first work to experimentally demonstrate on 272 real DRAM chips that lowering VPP reduces a DRAM chip's RowHammer vulnerability. We show that lowering VPP 1) increases the number of activate-precharge cycles needed to induce a RowHammer bit flip by up to 85.8% with an average of 7.4% across all tested chips and 2) decreases the RowHammer bit error rate by up to 66.9% with an average of 15.2% across all tested chips. At the same time, reducing VPP marginally worsens a DRAM cell's access latency, charge restoration, and data retention time within the guardbands of system-level nominal timing parameters for 208 out of 272 tested chips. We conclude that reducing VPP is a promising strategy for reducing a DRAM chip's RowHammer vulnerability without requiring modifications to DRAM chips., Comment: To appear in DSN 2022
Published: 2022

28. A Case for Transparent Reliability in DRAM Systems

Author: Patel, Minesh, Shahroodi, Taha, Manglik, Aditya, Yaglikci, A. Giray, Olgun, Ataberk, Luo, Haocong, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: Today's systems have diverse needs that are difficult to address using one-size-fits-all commodity DRAM. Unfortunately, although system designers can theoretically adapt commodity DRAM chips to meet their particular design goals (e.g., by reducing access timings to improve performance, implementing system-level RowHammer mitigations), we observe that designers today lack sufficient insight into commodity DRAM chips' reliability characteristics to implement these techniques in practice. In this work, we make a case for DRAM manufacturers to provide increased transparency into key aspects of DRAM reliability (e.g., basic chip design properties, testing strategies). Doing so enables system designers to make informed decisions to better adapt commodity DRAM to meet modern systems' needs while preserving its cost advantages. To support our argument, we study four ways that system designers can adapt commodity DRAM chips to system-specific design goals: (1) improving DRAM reliability; (2) reducing DRAM refresh overheads; (3) reducing DRAM access latency; and (4) mitigating RowHammer attacks. We observe that adopting solutions for any of the four goals requires system designers to make assumptions about a DRAM chip's reliability characteristics. These assumptions discourage system designers from using such solutions in practice due to the difficulty of both making and relying upon the assumption. We identify DRAM standards as the root of the problem: current standards rigidly enforce a fixed operating point with no specifications for how a system designer might explore alternative operating points. To overcome this problem, we introduce a two-step approach that reevaluates DRAM standards with a focus on transparency of DRAM reliability so that system designers are encouraged to make the most of commodity DRAM technology for both current and future DRAM chips.
Published: 2022

29. DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators

Author: Bostancı, F. Nisa, Olgun, Ataberk, Orosa, Lois, Yağlıkçı, A. Giray, Kim, Jeremie S., Hassan, Hasan, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: Random number generation is an important task in a wide variety of critical applications including cryptographic algorithms, scientific simulations, and industrial testing tools. True Random Number Generators (TRNGs) produce truly random data by sampling a physical entropy source that typically requires custom hardware and suffers from long latency. To enable high-bandwidth and low-latency TRNGs on commodity devices, recent works propose TRNGs that use DRAM as an entropy source. Although prior works demonstrate promising DRAM-based TRNGs, integration of such mechanisms into real systems poses challenges. We identify three challenges for using DRAM-based TRNGs in current systems: (1) generating random numbers can degrade system performance by slowing down concurrently-running applications due to the interference between RNG and regular memory operations in the memory controller (i.e., RNG interference), (2) this RNG interference can degrade system fairness by unfairly prioritizing applications that intensively use random numbers (i.e., RNG applications), and (3) RNG applications can experience significant slowdowns due to the high RNG latency. We propose DR-STRaNGe, an end-to-end system design for DRAM-based TRNGs that (1) reduces the RNG interference by separating RNG requests from regular requests in the memory controller, (2) improves the system fairness with an RNG-aware memory request scheduler, and (3) hides the large TRNG latencies using a random number buffering mechanism with a new DRAM idleness predictor that accurately identifies idle DRAM periods. We evaluate DR-STRaNGe using a set of 186 multiprogrammed workloads. Compared to an RNG-oblivious baseline system, DR-STRaNGe improves the average performance of non-RNG and RNG applications by 17.9% and 25.1%, respectively. DR-STRaNGe improves average system fairness by 32.1% and reduces average energy consumption by 21%.
Published: 2022

30. DarkGates: A Hybrid Power-Gating Architecture to Mitigate the Performance Impact of Dark-Silicon in High Performance Processors

Author: Yahya, Jawad Haj, Kim, Jeremie S., Yaglikci, A. Giray, Park, Jisung, Rotem, Efraim, Sazeides, Yanos, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: To reduce the leakage power of inactive (dark) silicon components, modern processor systems shut-off these components' power supply using low-leakage transistors, called power-gates. Unfortunately, power-gates increase the system's power-delivery impedance and voltage guardband, limiting the system's maximum attainable voltage (i.e., Vmax) and, thus, the CPU core's maximum attainable frequency (i.e., Fmax). As a result, systems that are performance constrained by the CPU frequency (i.e., Fmax-constrained), such as high-end desktops, suffer significant performance loss due to power-gates. To mitigate this performance loss, we propose DarkGates, a hybrid system architecture that increases the performance of Fmax-constrained systems while fulfilling their power efficiency requirements. DarkGates is based on three key techniques: i) bypassing on-chip power-gates using package-level resources (called bypass mode), ii) extending power management firmware to support operation either in bypass mode or normal mode, and iii) introducing deeper idle power states. We implement DarkGates on an Intel Skylake microprocessor for client devices and evaluate it using a wide variety of workloads. On a real 4-core Skylake system with integrated graphics, DarkGates improves the average performance of SPEC CPU2006 workloads across all thermal design power (TDP) levels (35W-91W) between 4.2% and 5.3%. DarkGates maintains the performance of 3DMark workloads for desktop systems with TDP greater than 45W while for a 35W-TDP (the lowest TDP) desktop it experiences only a 2% degradation. In addition, DarkGates fulfills the requirements of the ENERGY STAR and the Intel Ready Mode energy efficiency benchmarks of desktop systems., Comment: The paper is accepted to HPCA 2022
Published: 2021

31. A Deeper Look into RowHammer`s Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses

Author: Orosa, Lois, Yağlıkçı, Abdullah Giray, Luo, Haocong, Olgun, Ataberk, Park, Jisung, Hassan, Hasan, Patel, Minesh, Kim, Jeremie S., and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: RowHammer is a circuit-level DRAM vulnerability where repeatedly accessing (i.e., hammering) a DRAM row can cause bit flips in physically nearby rows. The RowHammer vulnerability worsens as DRAM cell size and cell-to-cell spacing shrink. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips such that the required hammer count to cause a bit flip has reduced by more than 10X in the last decade. Therefore, it is essential to develop a better understanding and in-depth insights into the RowHammer vulnerability of modern DRAM chips to more effectively secure current and future systems. Our goal in this paper is to provide insights into fundamental properties of the RowHammer vulnerability that are not yet rigorously studied by prior works, but can potentially be $i$) exploited to develop more effective RowHammer attacks or $ii$) leveraged to design more effective and efficient defense mechanisms. To this end, we present an experimental characterization using 248~DDR4 and 24~DDR3 modern DRAM chips from four major DRAM manufacturers demonstrating how the RowHammer effects vary with three fundamental properties: 1)~DRAM chip temperature, 2)~aggressor row active time, and 3)~victim DRAM cell's physical location. Among our 16 new observations, we highlight that a RowHammer bit flip 1)~is very likely to occur in a bounded range, specific to each DRAM cell (e.g., 5.4% of the vulnerable DRAM cells exhibit errors in the range 70C to 90C), 2)~is more likely to occur if the aggressor row is active for longer time (e.g., RowHammer vulnerability increases by 36% if we keep a DRAM row active for 15 column accesses), and 3)~is more likely to occur in certain physical regions of the DRAM module under attack (e.g., 5% of the rows are 2x more vulnerable than the remaining 95% of the rows)., Comment: A shorter version of this work is to appear at the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-54), 2021
Published: 2021
Full Text: View/download PDF

32. Security Analysis of the Silver Bullet Technique for RowHammer Prevention

Author: Yağlıkçı, Abdullah Giray, Kim, Jeremie S., Devaux, Fabrice, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: The purpose of this document is to study the security properties of the Silver Bullet algorithm against worst-case RowHammer attacks. We mathematically demonstrate that Silver Bullet, when properly configured and implemented in a DRAM chip, can securely prevent RowHammer attacks. The demonstration focuses on the most representative implementation of Silver Bullet, the patent claiming many implementation possibilities not covered in this demonstration. Our study concludes that Silver Bullet is a promising RowHammer prevention mechanism that can be configured to operate securely against RowHammer attacks at various efficiency-area tradeoff points, supporting relatively small hammer count values (e.g., 1000) and Silver Bullet table sizes (e.g., 1.06KB)., Comment: 40 pages
Published: 2021

33. IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors

Author: Haj-Yahya, Jawad, Kim, Jeremie S., Yaglikci, A. Giray, Puddu, Ivan, Orosa, Lois, Luna, Juan Gómez, Alser, Mohammed, and Mutlu, Onur
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: To operate efficiently across a wide range of workloads with varying power requirements, a modern processor applies different current management mechanisms, which briefly throttle instruction execution while they adjust voltage and frequency to accommodate for power-hungry instructions (PHIs) in the instruction stream. Doing so 1) reduces the power consumption of non-PHI instructions in typical workloads and 2) optimizes system voltage regulators' cost and area for the common use case while limiting current consumption when executing PHIs. However, these mechanisms may compromise a system's confidentiality guarantees. In particular, we observe that multilevel side-effects of throttling mechanisms, due to PHI-related current management mechanisms, can be detected by two different software contexts (i.e., sender and receiver) running on 1) the same hardware thread, 2) co-located Simultaneous Multi-Threading (SMT) threads, and 3) different physical cores. Based on these new observations on current management mechanisms, we develop a new set of covert channels, IChannels, and demonstrate them in real modern Intel processors (which span more than 70% of the entire client and server processor market). Our analysis shows that IChannels provides more than 24x the channel capacity of state-of-the-art power management covert channels. We propose practical and effective mitigations to each covert channel in IChannels by leveraging the insights we gain through a rigorous characterization of real systems., Comment: To appear in ISCA 2021
Published: 2021

34. QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips

Author: Olgun, Ataberk, Patel, Minesh, Yağlıkçı, A. Giray, Luo, Haocong, Kim, Jeremie S., Bostancı, Nisa, Vijaykumar, Nandita, Ergin, Oğuz, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: True random number generators (TRNG) sample random physical processes to create large amounts of random numbers for various use cases, including security-critical cryptographic primitives, scientific simulations, machine learning applications, and even recreational entertainment. Unfortunately, not every computing system is equipped with dedicated TRNG hardware, limiting the application space and security guarantees for such systems. To open the application space and enable security guarantees for the overwhelming majority of computing systems that do not necessarily have dedicated TRNG hardware, we develop QUAC-TRNG. QUAC-TRNG exploits the new observation that a carefully-engineered sequence of DRAM commands activates four consecutive DRAM rows in rapid succession. This QUadruple ACtivation (QUAC) causes the bitline sense amplifiers to non-deterministically converge to random values when we activate four rows that store conflicting data because the net deviation in bitline voltage fails to meet reliable sensing margins. We experimentally demonstrate that QUAC reliably generates random values across 136 commodity DDR4 DRAM chips from one major DRAM manufacturer. We describe how to develop an effective TRNG (QUAC-TRNG) based on QUAC. We evaluate the quality of our TRNG using NIST STS and find that QUAC-TRNG successfully passes each test. Our experimental evaluations show that QUAC-TRNG generates true random numbers with a throughput of 3.44 Gb/s (per DRAM channel), outperforming the state-of-the-art DRAM-based TRNG by 15.08x and 1.41x for basic and throughput-optimized versions, respectively. We show that QUAC-TRNG utilizes DRAM bandwidth better than the state-of-the-art, achieving up to 2.03x the throughput of a throughput-optimized baseline when scaling bus frequencies to 12 GT/s., Comment: 15 pages, 14 figures. A shorter version of this work is to appear at the 48th IEEE International Symposium on Computer Architecture (ISCA 2021)
Published: 2021

35. BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows

Author: Yağlıkçı, Abdullah Giray, Patel, Minesh, Kim, Jeremie S., Azizi, Roknoddin, Olgun, Ataberk, Orosa, Lois, Hassan, Hasan, Park, Jisung, Kanellopoulos, Konstantinos, Shahroodi, Taha, Ghose, Saugata, and Mutlu, Onur
Subjects: Computer Science - Cryptography and Security, Computer Science - Hardware Architecture
Abstract: Aggressive memory density scaling causes modern DRAM devices to suffer from RowHammer, a phenomenon where rapidly activating a DRAM row can cause bit-flips in physically-nearby rows. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips. Many works show that attackers can exploit RowHammer bit-flips to reliably mount system-level attacks to escalate privilege and leak private data. Therefore, it is critical to ensure RowHammer-safe operation on all DRAM-based systems. Unfortunately, state-of-the-art RowHammer mitigation mechanisms face two major challenges. First, they incur increasingly higher performance and/or area overheads when applied to more vulnerable DRAM chips. Second, they require either proprietary information about or modifications to the DRAM chip design. In this paper, we show that it is possible to efficiently and scalably prevent RowHammer bit-flips without knowledge of or modification to DRAM internals. We introduce BlockHammer, a low-cost, effective, and easy-to-adopt RowHammer mitigation mechanism that overcomes the two key challenges by selectively throttling memory accesses that could otherwise cause RowHammer bit-flips. The key idea of BlockHammer is to (1) track row activation rates using area-efficient Bloom filters and (2) use the tracking data to ensure that no row is ever activated rapidly enough to induce RowHammer bit-flips. By doing so, BlockHammer (1) makes it impossible for a RowHammer bit-flip to occur and (2) greatly reduces a RowHammer attack's impact on the performance of co-running benign applications. Compared to state-of-the-art RowHammer mitigation mechanisms, BlockHammer provides competitive performance and energy when the system is not under a RowHammer attack and significantly better performance and energy when the system is under attack., Comment: A shorter version of this work is to appear at the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27), 2021
Published: 2021
Full Text: View/download PDF

36. Using Approximate DRAM for Enabling Energy-Efficient, High-Performance Deep Neural Network Inference

Author: Orosa, Lois, primary, Koppula, Skanda, additional, Kanellopoulos, Konstantinos, additional, Yağlıkçı, A. Giray, additional, and Mutlu, Onur, additional
Published: 2023
Full Text: View/download PDF

37. Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques

Author: Kim, Jeremie S., Patel, Minesh, Yaglikci, A. Giray, Hassan, Hasan, Azizi, Roknoddin, Orosa, Lois, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture, Computer Science - Cryptography and Security
Abstract: In order to shed more light on how RowHammer affects modern and future devices at the circuit-level, we first present an experimental characterization of RowHammer on 1580 DRAM chips (408x DDR3, 652x DDR4, and 520x LPDDR4) from 300 DRAM modules (60x DDR3, 110x DDR4, and 130x LPDDR4) with RowHammer protection mechanisms disabled, spanning multiple different technology nodes from across each of the three major DRAM manufacturers. Our studies definitively show that newer DRAM chips are more vulnerable to RowHammer: as device feature size reduces, the number of activations needed to induce a RowHammer bit flip also reduces, to as few as 9.6k (4.8k to two rows each) in the most vulnerable chip we tested. We evaluate five state-of-the-art RowHammer mitigation mechanisms using cycle-accurate simulation in the context of real data taken from our chips to study how the mitigation mechanisms scale with chip vulnerability. We find that existing mechanisms either are not scalable or suffer from prohibitively large performance overheads in projected future devices given our observed trends of RowHammer vulnerability. Thus, it is critical to research more effective solutions to RowHammer.
Published: 2020

38. CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off

Author: Luo, Haocong, Shahroodi, Taha, Hassan, Hasan, Patel, Minesh, Yaglikci, Abdullah Giray, Orosa, Lois, Park, Jisung, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: DRAM is the prevalent main memory technology, but its long access latency can limit the performance of many workloads. Although prior works provide DRAM designs that reduce DRAM access latency, their reduced storage capacities hinder the performance of workloads that need large memory capacity. Because the capacity-latency trade-off is fixed at design time, previous works cannot achieve maximum performance under very different and dynamic workload demands. This paper proposes Capacity-Latency-Reconfigurable DRAM (CLR-DRAM), a new DRAM architecture that enables dynamic capacity-latency trade-off at low cost. CLR-DRAM allows dynamic reconfiguration of any DRAM row to switch between two operating modes: 1) max-capacity mode, where every DRAM cell operates individually to achieve approximately the same storage density as a density-optimized commodity DRAM chip and 2) high-performance mode, where two adjacent DRAM cells in a DRAM row and their sense amplifiers are coupled to operate as a single low-latency logical cell driven by a single logical sense amplifier. We implement CLR-DRAM by adding isolation transistors in each DRAM subarray. Our evaluations show that CLR-DRAM can improve system performance and DRAM energy consumption by 18.6% and 29.7% on average with four-core multiprogrammed workloads. We believe that CLR-DRAM opens new research directions for a system to adapt to the diverse and dynamically changing memory capacity and access latency demands of workloads., Comment: This work is to appear at ISCA 2020
Published: 2020

39. SysScale: Exploiting Multi-domain Dynamic Voltage and Frequency Scaling for Energy Efficient Mobile Processors

Author: Haj-Yahya, Jawad, Alser, Mohammed, Kim, Jeremie, Yaglıkçı, A. Giray, Vijaykumar, Nandita, Rotem, Efraim, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: There are three domains in a modern thermally-constrained mobile system-on-chip (SoC): compute, IO, and memory. We observe that a modern SoC typically allocates a fixed power budget, corresponding to worst-case performance demands, to the IO and memory domains even if they are underutilized. The resulting unfair allocation of the power budget across domains can cause two major issues: 1) the IO and memory domains can operate at a higher frequency and voltage than necessary, increasing power consumption and 2) the unused power budget of the IO and memory domains cannot be used to increase the throughput of the compute domain, hampering performance. To avoid these issues, it is crucial to dynamically orchestrate the distribution of the SoC power budget across the three domains based on their actual performance demands. We propose SysScale, a new multi-domain power management technique to improve the energy efficiency of mobile SoCs. SysScale is based on three key ideas. First, SysScale introduces an accurate algorithm to predict the performance (e.g., bandwidth and latency) demands of the three SoC domains. Second, SysScale uses a new DVFS (dynamic voltage and frequency scaling) mechanism to distribute the SoC power to each domain according to the predicted performance demands. Third, in addition to using a global DVFS mechanism, SysScale uses domain-specialized techniques to optimize the energy efficiency of each domain at different operating points. We implement SysScale on an Intel Skylake microprocessor for mobile devices and evaluate it using a wide variety of SPEC CPU2006, graphics (3DMark), and battery life workloads (e.g., video playback). On a 2-core Skylake, SysScale improves the performance of SPEC CPU2006 and 3DMark workloads by up to 16% and 8.9% (9.2% and 7.9% on average), respectively., Comment: To appear at ISCA 2020
Published: 2020

40. EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM

Author: Koppula, Skanda, Orosa, Lois, Yağlıkçı, Abdullah Giray, Azizi, Roknoddin, Shahroodi, Taha, Kanellopoulos, Konstantinos, and Mutlu, Onur
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: The effectiveness of deep neural networks (DNN) in vision, speech, and language processing has prompted a tremendous demand for energy-efficient high-performance DNN inference systems. Due to the increasing memory intensity of most DNN workloads, main memory can dominate the system's energy consumption and stall time. One effective way to reduce the energy consumption and increase the performance of DNN inference systems is by using approximate memory, which operates with reduced supply voltage and reduced access latency parameters that violate standard specifications. Using approximate memory reduces reliability, leading to higher bit error rates. Fortunately, neural networks have an intrinsic capacity to tolerate increased bit errors. This can enable energy-efficient and high-performance neural network inference using approximate DRAM devices. Based on this observation, we propose EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices, while strictly meeting a user-specified target DNN accuracy. EDEN relies on two key ideas: 1) retraining the DNN for a target approximate DRAM device to increase the DNN's error tolerance, and 2) efficient mapping of the error tolerance of each individual DNN data type to a corresponding approximate DRAM partition in a way that meets the user-specified DNN accuracy requirements. We evaluate EDEN on multi-core CPUs, GPUs, and DNN accelerators with error models obtained from real approximate DRAM devices. For a target accuracy within 1% of the original DNN, our results show that EDEN enables 1) an average DRAM energy reduction of 21%, 37%, 31%, and 32% in CPU, GPU, and two DNN accelerator architectures, respectively, across a variety of DNNs, and 2) an average (maximum) speedup of 8% (17%) and 2.7% (5.5%) in CPU and GPU architectures, respectively, when evaluating latency-bound DNNs., Comment: This work is to appear at MICRO 2019
Published: 2019

41. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study

Author: Ghose, Saugata, Yağlıkçı, Abdullah Giray, Gupta, Raghav, Lee, Donghyuk, Kudrolli, Kais, Liu, William X., Hassan, Hasan, Chang, Kevin K., Chatterjee, Niladrish, Agrawal, Aditya, O'Connor, Mike, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: Main memory (DRAM) consumes as much as half of the total system power in a computer today, resulting in a growing need to develop new DRAM architectures and systems that consume less power. Researchers have long relied on DRAM power models that are based off of standardized current measurements provided by vendors, called IDD values. Unfortunately, we find that these models are highly inaccurate, and do not reflect the actual power consumed by real DRAM devices. We perform the first comprehensive experimental characterization of the power consumed by modern real-world DRAM modules. Our extensive characterization of 50 DDR3L DRAM modules from three major vendors yields four key new observations about DRAM power consumption: (1) across all IDD values that we measure, the current consumed by real DRAM modules varies significantly from the current specified by the vendors; (2) DRAM power consumption strongly depends on the data value that is read or written; (3) there is significant structural variation, where the same banks and rows across multiple DRAM modules from the same model consume more power than other banks or rows; and (4) over successive process technology generations, DRAM power consumption has not decreased by as much as vendor specifications have indicated. Based on our detailed analysis and characterization data, we develop the Variation-Aware model of Memory Power Informed by Real Experiments (VAMPIRE). We show that VAMPIRE has a mean absolute percentage error of only 6.8% compared to actual measured DRAM power. VAMPIRE enables a wide range of studies that were not possible using prior DRAM power models. As an example, we use VAMPIRE to evaluate a new power-aware data encoding mechanism, which can reduce DRAM energy consumption by an average of 12.2%. We plan to open-source both VAMPIRE and our extensive raw data collected during our experimental characterization., Comment: presented at SIGMETRICS 2018
Published: 2018

42. Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency

Author: Chang, Kevin K., Yaglıkçı, Abdullah Giray, Ghose, Saugata, Agrawal, Aditya, Chatterjee, Niladrish, Kashyap, Abhijith, Lee, Donghyuk, O'Connor, Mike, Hassan, Hasan, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: This paper summarizes our work on experimental characterization and analysis of reduced-voltage operation in modern DRAM chips, which was published in SIGMETRICS 2017, and examines the work's significance and future potential. We take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the DRAM supply voltage is lowered below the nominal voltage level specified by DRAM standards. We perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention. Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. Our evaluations show that Voltron reduces the average DRAM and system energy consumption by 10.5% and 7.3%, respectively, while limiting the average system performance loss to only 1.8%, for a variety of memory-intensive quad-core workloads. We also show that Voltron significantly outperforms prior dynamic voltage and frequency scaling mechanisms for DRAM.
Published: 2018

43. Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms

Author: Chang, Kevin K., Yağlıkçı, Abdullah Giray, Ghose, Saugata, Agrawal, Aditya, Chatterjee, Niladrish, Kashyap, Abhijith, Lee, Donghyuk, O'Connor, Mike, Hassan, Hasan, and Mutlu, Onur
Subjects: Computer Science - Hardware Architecture
Abstract: The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy. Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability. In this paper, we take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the supply voltage is lowered below the nominal voltage level specified by DRAM standards. Using an FPGA-based testing platform, we perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention. Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. Voltron reduces the average system energy by 7.3% while limiting the average system performance loss to only 1.8%, for a variety of workloads., Comment: 25 pages, 25 figures, 7 tables, Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS)
Published: 2017
Full Text: View/download PDF

44. Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions

Author: Yağlıkçı, Abdullah Giray, primary, Tuğrul, Yahya Can, additional, Oliveira, Geraldo F., additional, Yüksel, İsmail Emir, additional, Olgun, Ataberk, additional, Luo, Haocong, additional, and Mutlu, Onur, additional
Published: 2024
Full Text: View/download PDF

45. Leveraging Adversarial Detection to Enable Scalable and Low Overhead RowHammer Mitigations

Author: Canpolat, Oğuzhan, Yağlıkçı, A. Giray, Olgun, Ataberk, Yüksel, İsmail Emir, Tuğrul, Yahya Can, Kanellopoulos, Konstantinos, Ergin, Oğuz, Mutlu, Onur, Canpolat, Oğuzhan, Yağlıkçı, A. Giray, Olgun, Ataberk, Yüksel, İsmail Emir, Tuğrul, Yahya Can, Kanellopoulos, Konstantinos, Ergin, Oğuz, and Mutlu, Onur
Abstract: RowHammer is a prime example of read disturbance in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in other physically nearby DRAM rows. RowHammer solutions perform preventive actions (e.g., refresh neighbor rows of the hammered row) that mitigate such bitflips to preserve memory isolation, a fundamental building block of security and privacy in modern computing systems. However, preventive actions induce non-negligible memory request latency and system performance overheads as they interfere with memory requests in the memory controller. As shrinking technology node size over DRAM chip generations exacerbates RowHammer, the overheads of RowHammer solutions become prohibitively large. As a result, a malicious program can effectively hog the memory system and deny service to benign applications by causing many RowHammer preventive actions. In this work, we tackle the performance overheads of RowHammer solutions by tracking the generators of memory accesses that trigger RowHammer solutions. To this end, we propose BreakHammer. BreakHammer cooperates with existing RowHammer solutions to identify hardware threads that trigger preventive actions. To do so, BreakHammer estimates the RowHammer likelihood of a thread, based on how frequently it triggers RowHammer preventive actions. BreakHammer limits the number of on-the-fly requests a thread can inject into the memory system based on the thread's RowHammer likelihood. By doing so, BreakHammer significantly reduces the number of performed counter-measures, improves the system performance by an average (maximum) of 48.7% (105.5%), and reduces the maximum slowdown induced on a benign application by 14.6% with near-zero area overhead (e.g., 0.0002% of a highend processor's chip area).
Published: 2024

46. DRAM Bender: An Extensible and Versatile FPGA-Based Infrastructure to Easily Test State-of-the-Art DRAM Chips

Author: Olgun, Ataberk, primary, Hassan, Hasan, additional, Yağlıkçı, A. Giray, additional, Tuğrul, Yahya Can, additional, Orosa, Lois, additional, Luo, Haocong, additional, Patel, Minesh, additional, Ergin, Oğuz, additional, and Mutlu, Onur, additional
Published: 2023
Full Text: View/download PDF

47. RowPress: Amplifying Read Disturbance in Modern DRAM Chips

Author: Luo, Haocong, primary, Olgun, Ataberk, additional, Yağlıkçı, Abdullah Giray, additional, Tuğrul, Yahya Can, additional, Rhyner, Steve, additional, Cavlak, Meryem Banu, additional, Lindegger, Joël, additional, Sadrosadati, Mohammad, additional, and Mutlu, Onur, additional
Published: 2023
Full Text: View/download PDF

48. An Experimental Analysis of RowHammer in HBM2 DRAM Chips

Author: Olgun, Ataberk, primary, Osseiran, Majd, additional, Yağlıkçı, A. Giray, additional, Tuğrul, Yahya Can, additional, Luo, Haocong, additional, Rhyner, Steve, additional, Salami, Behzad, additional, Luna, Juan Gomez, additional, and Mutlu, Onur, additional
Published: 2023
Full Text: View/download PDF

49. A deeper look into RowHammer's sensitivities: Experimental analysis of real DRAM chips and implications on future attacks and defenses

Author: Yağlıkçı, A. Giray, Orosa, L., Luo, H., Olgun, Ataberk, Park, J., Hassan, H., Patel, M., Yağlıkçı, A. Giray, Orosa, L., Luo, H., Olgun, Ataberk, Park, J., Hassan, H., and Patel, M.
Abstract: ARM;et al.;Huawei;IBM;Intel;Microsoft, 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021 -- 18 October 2021 through 22 October 2021 -- 172825, RowHammer is a circuit-level DRAM vulnerability where repeatedly accessing (i.e., hammering) a DRAM row can cause bit flips in physically nearby rows. The RowHammer vulnerability worsens as DRAM cell size and cell-to-cell spacing shrink. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips such that the required hammer count to cause a bit flip has reduced by more than 10X in the last decade. Therefore, it is essential to develop a better understanding and in-depth insights into the RowHammer vulnerability of modern DRAM chips to more effectively secure current and future systems. Our goal in this paper is to provide insights into fundamental properties of the RowHammer vulnerability that are not yet rigorously studied by prior works, but can potentially be i) exploited to develop more effective RowHammer attacks or ii) leveraged to design more effective and efficient defense mechanisms. To this end, we present an experimental characterization using 248 DDR4 and 24 DDR3 modern DRAM chips from four major DRAM manufacturers demonstrating how the RowHammer effects vary with three fundamental properties: 1) DRAM chip temperature, 2) aggressor row active time, and 3) victim DRAM cell fs physical location. Among our 16 new observations, we highlight that a RowHammer bit flip 1) is very likely to occur in a bounded range, specific to each DRAM cell (e.g., 5.4% of the vulnerable DRAM cells exhibit errors in the range 70 .C to 90 .C), 2) is more likely to occur if the aggressor row is active for longer time (e.g., RowHammer vulnerability increases by 36% if we keep a DRAM row active for 15 column accesses), and 3) is more likely to occur in certain physical regions of the DRAM module under attack (e.g., 5% of the rows are 2x more vulnerable than the remaining 95% of the rows). Our study has important practical implications on future RowHammer attacks and defenses. We d
Published: 2022

50. QUAC-TRNG: High Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips

Author: Bostancı, F. Nisa, Luo, Haocong, Ergin, Oğuz, Patel, Minesh, Yağlıkçı, A. Giray, Vijaykumar, Nandita, Olgun, Ataberk, Bostancı, F. Nisa, Luo, Haocong, Ergin, Oğuz, Patel, Minesh, Yağlıkçı, A. Giray, Vijaykumar, Nandita, and Olgun, Ataberk
Abstract: ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) -- JUN 14-19, 2021 -- ELECTR NETWORK, True random number generators (TRNG) sample random physical processes to create large amounts of random numbers for various use cases, including security-critical cryptographic primitives, scientific simulations, machine learning applications, and even recreational entertainment. Unfortunately, not every computing system is equipped with dedicated TRNG hardware, limiting the application space and security guarantees for such systems. To open the application space and enable security guarantees for the overwhelming majority of computing systems that do not necessarily have dedicated TRNG hardware (e.g., processing-in-memory systems), we develop QUAC-TRNG, a new high-throughput TRNG that can be fully implemented in commodity DRAM chips, which are key components in most modern systems. QUAC-TRNG exploits the new observation that a carefully-engineered sequence of DRAM commands activates four consecutive DRAM rows in rapid succession. This QUadruple ACtivation (QUAC) causes the bitline sense amplifiers to non-deterministically converge to random values when we activate four rows that store conflicting data because the net deviation in bitline voltage fails to meet reliable sensing margins. We experimentally demonstrate that QUAC reliably generates random values across 136 commodity DDR4 DRAM chips from one major DRAM manufacturer. We describe how to develop an effective TRNG (QUAC-TRNG) based on QUAC. We evaluate the quality of our TRNG using the commonly-used NIST statistical test suite for randomness and find that QUAC-TRNG successfully passes each test. Our experimental evaluations show that QUAC-TRNG reliably generates true random numbers with a throughput of 3.44 Gb/s (per DRAM channel), outperforming the state-of-the-art DRAM-based TRNG by 15.08x and 1.41x for basic and throughput-optimized versions, respectively. We show that QUAC-TRNG utilizes DRAM bandwidth better than the state-of-the-art, achieving up to 2.03x the throughput of a throughput-optimized baseline, IEEE, ACM, IEEE Comp Soc
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

90 results on '"Yağlıkçı, A. Giray"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources