4,055 results on '"Dram"'
Search Results
2. FMEA-TSTM-NNGA: A Novel Optimization Framework Integrating Failure Mode and Effect Analysis, the Taguchi Method, a Neural Network, and a Genetic Algorithm for Improving the Resistance in Dynamic Random Access Memory Components.
- Author
-
Lin, Chia-Ming and Chen, Shang-Liang
- Subjects
-
*DYNAMIC random access memory, *FAILURE mode & effects analysis, *TAGUCHI methods, *GENETIC algorithms, *MEDICAL equipment
- Abstract
Dynamic random access memory (DRAM) serves as a critical component in medical equipment. Given the exacting standards demanded by medical equipment products, manufacturers face pressure to improve their product quality. The electrical characteristics of these products are based on the resistance value of the DRAM components. Hence, the purpose of this study is to optimize the resistance value of DRAM components in medical equipment. We proposed a novel FMEA-TSTM-NNGA framework that integrates failure mode and effect analysis (FMEA), the two-stage Taguchi method (TSTM), neural networks (NN), and genetic algorithms (GA) to optimize the manufacturing process. Moreover, the proposed FMEA-TSTM-NNGA framework achieved a substantial reduction in experimental trials, cutting the required number by a factor of 85.3 when compared to the grid search method. Our framework successfully identified optimal manufacturing condition settings for the resistance values of DRAM components: Depo time = 27 s, Depo O2 flow = 151 sccm, ARC-LTO etch time = 43 s, ARC-LTO etch pressure = 97 mTorr, Ox-SiCO etch time = 91 s, Ox-SiCO gas ratio = 22%, and Polish time = 84 s. The results helped the case company improve the resistance value of DRAM components from 191.1 × 10⁻³ Ohm to 176.84 × 10⁻³ Ohm, which is closer to the target value of 176.5 × 10⁻³ Ohm. The proposed FMEA-TSTM-NNGA framework is designed to operate efficiently on resource-constrained systems, facilitating real-time adjustments to production attributes. This capability enables DRAM manufacturers to swiftly optimize product quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Improving Ti Thin Film Resistance Deviations in Physical Vapor Deposition Sputtering for Dynamic Random-Access Memory Using Dynamic Taguchi Method, Artificial Neural Network and Genetic Algorithm.
- Author
-
Lin, Chia-Ming and Chen, Shang-Liang
- Subjects
-
*ARTIFICIAL neural networks, *PHYSICAL vapor deposition, *THIN films, *TAGUCHI methods, *GENETIC algorithms, *DYNAMIC random access memory
- Abstract
Many dynamic random-access memory (DRAM) manufacturing companies encounter significant resistance value deviations during the PVD sputtering process for manufacturing Ti thin films. These resistance values are influenced by the thickness of the thin films. Current mitigation strategies focus on adjusting film thickness to reduce resistance deviations, but this approach affects product structure profile and performance. Additionally, varying Ti thin film thicknesses across different product structures increase manufacturing complexity. This study aims to minimize resistance value deviations across multiple film thicknesses with minimal resource utilization. To achieve this goal, we propose the TSDTM-ANN-GA framework, which integrates the two-stage dynamic Taguchi method (TSDTM), artificial neural networks (ANN), and genetic algorithms (GA). The proposed framework requires significantly fewer experimental resources than traditional full factorial design and grid search method, making it suitable for resource-constrained and low-power computing environments. Our TSDTM-ANN-GA framework successfully identified an optimal production condition configuration for five different Ti thin film thicknesses: Degas temperature = 245 °C, Ar flow = 55 sccm, DC power = 5911 W, and DC power ramp rate = 4009 W/s. The results indicate that the deviation between the resistance values and their design values for the five Ti thin film thicknesses decreased by 86.8%, 94.1%, 95.9%, 98.2%, and 98.8%, respectively. The proposed method effectively reduced resistance deviations for the five Ti thin film thicknesses and simplified manufacturing management, allowing the required design values to be achieved under the same manufacturing conditions. This framework can efficiently operate on resource-limited and low-power computers, achieving the goal of real-time dynamic production parameter adjustments and enabling DRAM manufacturing companies to improve product quality promptly. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. A Novel DNA Synthesis Platform Design with High-Throughput Paralleled Addressability and High-Density Static Droplet Confinement.
- Author
-
Yang, Shijia, Wang, Dayin, Zhao, Zequan, Wang, Ning, Yu, Meng, Zhang, Kaihuan, Luo, Yuan, and Zhao, Jianlong
- Subjects
DATA warehousing, RANDOM access memory, INTEGRATED circuits, DNA synthesis, DNA
- Abstract
Using DNA as the next-generation medium for data storage offers unparalleled advantages in terms of data density, storage duration, and power consumption as compared to existing data storage technologies. To meet the high-speed data writing requirements in DNA data storage, this paper proposes a novel design for an ultra-high-density and high-throughput DNA synthesis platform. The presented design mainly leverages two functional modules: a dynamic random-access memory (DRAM)-like integrated circuit (IC) responsible for electrode addressing and voltage supply, and the static droplet array (SDA)-based microfluidic structure to eliminate any reaction species diffusion concern in electrochemical DNA synthesis. Through theoretical analysis and simulation studies, we validate the effective addressing of 10 million electrodes and stable, adjustable voltage supply by the integrated circuit. We also demonstrate a reaction unit size down to 3.16 × 3.16 μm², equivalent to 10 million/cm², that can rapidly and stably generate static droplets at each site, effectively constraining proton diffusion. Finally, we conducted a synthesis cycle experiment by incorporating fluorescent beacons on a microfabricated electrode array to examine the feasibility of our design. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Novel STI Technology for Enhancing Reliability of High-k/Metal Gate DRAM
- Author
-
Hyojin Park, Gyuhyun Kil, Wonju Sung, Junghoon Han, Jungwoo Song, and Byoungdeog Choi
- Subjects
ALD, DRAM, dielectric, gate first high-k/metal gate, gate oxide reliability, HKMG, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The challenges associated with semiconductors are increasing because of the rapid changes in the semiconductor market and the extreme scaling of semiconductors, with some processes reaching their technological limits. In the case of gate dielectrics, these limitations can be overcome by adopting high-k metal gate (HKMG) architecture instead of the previously used poly silicon/silicon oxy-nitride (PSION) structure. However, implementing the HKMG in a conventional DRAM process degrades the gate oxide. Therefore, in this study, a shallow trench isolation (STI) technology was developed to improve the gate oxide reliability in gate first HKMG DRAM structures. A novel STI process was developed to prevent the reduction in the oxide growth that occurs when the STI seam (or void) generated during the STI gap fill process meets the low temperature gate oxide process of the HKMG with SiGe. With the spacer STI (S-STI) structure, the ALD spacer was formed in the STI space region before the STI gap fill process to control the position of the STI seam (or void). Thus, a favorable environment for the growth of the gate oxide was established under the reduced effect of STI seam, and the oxide reliability was improved while maintaining the original structure and processes of the HKMG DRAM. Various analyses confirmed that the reliability was enhanced without the inherent characteristics of the HKMG being affected. These results revealed that the STI integration technology introduced herein improves the oxide reliability of HKMG DRAM products and maintains their technological excellence for the various complex needs of a rapidly changing market.
- Published
- 2024
- Full Text
- View/download PDF
6. SpyHammer: Understanding and Exploiting RowHammer Under Fine-Grained Temperature Variations
- Author
-
Lois Orosa, Ulrich Ruhrmair, A. Giray Yaglikci, Haocong Luo, Ataberk Olgun, Patrick Jattke, Minesh Patel, Jeremie S. Kim, Kaveh Razavi, and Onur Mutlu
- Subjects
Rowhammer, DRAM, security, temperature, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
RowHammer is a DRAM vulnerability that can cause bit errors in a victim DRAM row solely by accessing its neighboring DRAM rows at a high-enough rate. Recent studies demonstrate that new DRAM devices are becoming increasingly vulnerable to RowHammer, and many works demonstrate system-level attacks for privilege escalation or information leakage. In this work, we perform the first rigorous fine-grained characterization and analysis of the correlation between RowHammer and temperature. We show that RowHammer is very sensitive to temperature variations, even if the variations are very small (e.g., ±1 °C). We leverage two key observations from our analysis to spy on DRAM temperature: 1) RowHammer-induced bit error rate consistently increases (or decreases) as the temperature increases, and 2) some DRAM cells that are vulnerable to RowHammer exhibit bit errors only at a particular temperature. Based on these observations, we propose a new RowHammer attack, called SpyHammer, that spies on the temperature of DRAM on critical systems such as industrial production lines, vehicles, and medical systems. SpyHammer is the first practical attack that can spy on DRAM temperature. Our evaluation in a controlled environment shows that SpyHammer can infer the temperature of the victim DRAM modules with an error of less than ±2.5 °C at the 90th percentile of all tested temperatures, for 12 real DRAM modules (120 DRAM chips) from four main manufacturers.
- Published
- 2024
- Full Text
- View/download PDF
7. Physics-Based Compact Model of Independent Dual-Gate BEOL-Transistors for Reliable Capacitorless Memory
- Author
-
Lihua Xu, Kaifei Chen, Zhi Li, Yue Zhao, Lingfei Wang, and Ling Li
- Subjects
BTI, compact model, contact effects, DRAM, independent dual gate a-IGZO-FET, disorder semiconductor, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Capacitorless DRAM architectures based on Back-End-of-Line (BEOL)-transistors are promising for long-retention, high-density and low-power 3D DRAM solutions due to their low leakage, operational flexibility, and monolithic integration capability. Different from classical silicon-based devices, in-depth studies on the performances of nanoscale multi-gate transistors (e.g., a-InGaZnO-FET) are still barely conducted for physical description, due to the complicated multi-gating principle, finite-size effects on transport, increased variation sources and enlarged parasitic effect. Hence, high-performance multi-nanoscale (down to ~50 nm) dual-gate a-IGZO transistors are fabricated, and a physical compact model is developed based on the surface potential for dual-gated coupling and the disordered transport with finite-size-correction. The short channel behaviors on sub-threshold swing, mobility and threshold voltage are investigated, and contact effects are validated by the transfer-line method (TLM). Regarding the specific challenge of dual-gate alignment, possible misalignment and parasitic effects on multi-device fluctuations are important for large-scale circuit design and are analyzed by TCAD simulations. Besides, the bias-temperature instability (BTI) has been comprehensively investigated. In awareness of the above effects, this model bridges fabrication-based material properties and structural parameters, assisting in a threshold fluctuation-resistant operation scheme for capacitorless multi-bit memory, showing a great potential in future monolithic integration circuit design using BEOL-transistors.
- Published
- 2024
- Full Text
- View/download PDF
8. FMEA-TSTM-NNGA: A Novel Optimization Framework Integrating Failure Mode and Effect Analysis, the Taguchi Method, a Neural Network, and a Genetic Algorithm for Improving the Resistance in Dynamic Random Access Memory Components
- Author
-
Chia-Ming Lin and Shang-Liang Chen
- Subjects
DRAM, FMEA, Taguchi method, neural network, genetic algorithm, Mathematics, QA1-939
- Abstract
Dynamic random access memory (DRAM) serves as a critical component in medical equipment. Given the exacting standards demanded by medical equipment products, manufacturers face pressure to improve their product quality. The electrical characteristics of these products are based on the resistance value of the DRAM components. Hence, the purpose of this study is to optimize the resistance value of DRAM components in medical equipment. We proposed a novel FMEA-TSTM-NNGA framework that integrates failure mode and effect analysis (FMEA), the two-stage Taguchi method (TSTM), neural networks (NN), and genetic algorithms (GA) to optimize the manufacturing process. Moreover, the proposed FMEA-TSTM-NNGA framework achieved a substantial reduction in experimental trials, cutting the required number by a factor of 85.3 when compared to the grid search method. Our framework successfully identified optimal manufacturing condition settings for the resistance values of DRAM components: Depo time = 27 s, Depo O2 flow = 151 sccm, ARC-LTO etch time = 43 s, ARC-LTO etch pressure = 97 mTorr, Ox-SiCO etch time = 91 s, Ox-SiCO gas ratio = 22%, and Polish time = 84 s. The results helped the case company improve the resistance value of DRAM components from 191.1 × 10⁻³ Ohm to 176.84 × 10⁻³ Ohm, which is closer to the target value of 176.5 × 10⁻³ Ohm. The proposed FMEA-TSTM-NNGA framework is designed to operate efficiently on resource-constrained systems, facilitating real-time adjustments to production attributes. This capability enables DRAM manufacturers to swiftly optimize product quality.
- Published
- 2024
- Full Text
- View/download PDF
9. Top DDR4 RAM for Gaming in 2024
- Author
-
Khajuria, Kapish
- Subjects
Dynamic random access memory, Dynamic cell -- Product/service Evaluations, DRAM, Computers
- Abstract
Sure, the buzz around DDR5 RAM is undeniable. But unless you're already equipped with a high-end system or willing to splurge on a new motherboard and CPU [...]
- Published
- 2024
10. Mitigating WL-to-WL Disturbance in Dynamic Random-Access Memory (DRAM) through Adopted Spherical Shallow Trench Isolation with Silicon Nitride Layer in the Buried Channel Array Transistor (BCAT).
- Author
-
Kim, Yeon-Seok, Lim, Chang-Young, and Kwon, Min-Woo
- Subjects
SILICON nitride, TRANSISTORS, TRENCHES, STRUCTURAL optimization, DATA integrity, NITRIDES
- Abstract
The Pass Gate Effect (PGE), often referred to as adjacent cell interference, presents a significant challenge in dynamic random-access memory (DRAM). In this study, we investigate the impact of PGE and propose innovative solutions to address this issue in DRAM technology, employing 10 nm node technology with buried channel array transistors. To evaluate the efficacy of our proposals, we utilized SILVACO for simulating various DRAM configurations. Our approach centers on two key structural optimizations: the introduction of a spherical Shallow Trench Isolation (STI) and the incorporation of a silicon nitride (Si₃N₄) layer within the spherical STI structure. These optimizations were meticulously designed to mitigate the PGE by considering several factors that are highly influential in its manifestation. To validate our approach, we conducted comprehensive simulations, comparing the PGE factors of typical DRAM structures with those of our proposed configurations. The results of our analysis strongly support the effectiveness of our proposed structural enhancements in alleviating the PGE when contrasted with conventional DRAM structures. Remarkably, our optimizations achieved an 82.4% reduction in the PGE, marking a significant breakthrough in the field of DRAM technology. By addressing the PGE challenge and substantially reducing its impact, our research contributes to the advancement of DRAM technology, offering practical solutions to enhance data integrity and reliability in the era of 10 nm node DRAM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Investigating the Association between the Autophagy Markers LC3B, SQSTM1/p62, and DRAM and Autophagy-Related Genes in Glioma.
- Author
-
Danish, Farheen, Qureshi, Muhammad Asif, Mirza, Talat, Amin, Wajiha, Sufiyan, Sufiyan, Naeem, Sana, Arshad, Fatima, and Mughal, Nouman
- Subjects
-
*AUTOPHAGY, *GLIOMAS, *GENE expression, *CELL culture, *GENES, *PROTEIN expression
- Abstract
High-grade gliomas are extremely fatal tumors, marked by severe hypoxia and therapeutic resistance. Autophagy is a cellular degradative process that can be activated by hypoxia, ultimately resulting in tumor advancement and chemo-resistance. Our study aimed to examine the link between autophagy markers' expression in low-grade gliomas (LGGs) and high-grade gliomas (HGGs). In 39 glioma cases, we assessed the protein expression of autophagy markers LC3B, SQSTM1/p62, and DRAM by immunohistochemistry (IHC) and the mRNA expression of the autophagy genes PTEN, PI3K, AKT, mTOR, ULK1, ULK2, UVRAG, Beclin 1, and VPS34 using RT-qPCR. LC3B, SQSTM1/p62, and DRAM expression were positive in 64.1%, 51.3%, and 28.2% of glioma cases, respectively. The expression of LC3B and SQSTM1/p62 was notably higher in HGGs compared to LGGs. VPS34 exhibited a significant differential expression, displaying increased fold change in HGGs compared to LGGs. Additionally, it exhibited robust positive associations with Beclin1 (rs = 0.768), UVRAG (rs = 0.802), and ULK2 (rs = 0.786) in HGGs. This underscores a potential association between autophagy and the progression of gliomas. We provide preliminary data for the functional analysis of autophagy using a cell culture model and to identify potential targets for therapeutic interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Improving Ti Thin Film Resistance Deviations in Physical Vapor Deposition Sputtering for Dynamic Random-Access Memory Using Dynamic Taguchi Method, Artificial Neural Network and Genetic Algorithm
- Author
-
Chia-Ming Lin and Shang-Liang Chen
- Subjects
DRAM, two-stage dynamic Taguchi method, artificial neural network, genetic algorithm, Mathematics, QA1-939
- Abstract
Many dynamic random-access memory (DRAM) manufacturing companies encounter significant resistance value deviations during the PVD sputtering process for manufacturing Ti thin films. These resistance values are influenced by the thickness of the thin films. Current mitigation strategies focus on adjusting film thickness to reduce resistance deviations, but this approach affects product structure profile and performance. Additionally, varying Ti thin film thicknesses across different product structures increase manufacturing complexity. This study aims to minimize resistance value deviations across multiple film thicknesses with minimal resource utilization. To achieve this goal, we propose the TSDTM-ANN-GA framework, which integrates the two-stage dynamic Taguchi method (TSDTM), artificial neural networks (ANN), and genetic algorithms (GA). The proposed framework requires significantly fewer experimental resources than traditional full factorial design and grid search method, making it suitable for resource-constrained and low-power computing environments. Our TSDTM-ANN-GA framework successfully identified an optimal production condition configuration for five different Ti thin film thicknesses: Degas temperature = 245 °C, Ar flow = 55 sccm, DC power = 5911 W, and DC power ramp rate = 4009 W/s. The results indicate that the deviation between the resistance values and their design values for the five Ti thin film thicknesses decreased by 86.8%, 94.1%, 95.9%, 98.2%, and 98.8%, respectively. The proposed method effectively reduced resistance deviations for the five Ti thin film thicknesses and simplified manufacturing management, allowing the required design values to be achieved under the same manufacturing conditions. This framework can efficiently operate on resource-limited and low-power computers, achieving the goal of real-time dynamic production parameter adjustments and enabling DRAM manufacturing companies to improve product quality promptly.
- Published
- 2024
- Full Text
- View/download PDF
13. Low-Power Single Bitline Load Sense Amplifier for DRAM.
- Author
-
Dai, Chenghu, Lu, Yixiao, Lu, Wenjuan, Lin, Zhiting, Wu, Xiulong, and Peng, Chunyu
- Subjects
DYNAMIC random access memory, ENERGY consumption, DETECTOR circuits, INTEGRATED circuits
- Abstract
With the significant growth in modern computing systems, dynamic random access memory (DRAM) has become a power/performance/energy bottleneck in data-intensive applications. Both the power management mechanism and downscaling method face decreasing performance or difficulties in the smaller footprint of the DRAM capacitor. Since optimizing the circuit of sense amplifier (SA) is an efficient method to reduce energy consumption, we propose two single bitline load sense amplifier (SBLSA) circuits, i.e., a redundant voltage discharged SBLSA (RVD-SBLSA) circuit and a bit aware SBLSA (BA-SBLSA) circuit, to improve conventional and single bitline write (SBW) circuits. The RVD-SBLSA circuit utilizes a clamp diode to discharge redundant voltage over VDD/2 with an additional working stage. The BA-SBLSA circuit abandons the single bitline load (SBL) circuit during read and write '1' operations. The RVD-SBLSA circuit can offer the lowest total energy consumption, and the BA-SBLSA circuit can make a better balance between energy consumption and latency. Through the simulation results, the proposed circuits can efficiently reduce energy consumption or balance energy consumption and latency and show huge potentials in very large-scale integrated circuits. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Modeling and design of energy-efficient dependable memory sub-systems
- Author
-
Tovletoglou, Konstantinos, Karakonstantis, Georgios, Nikolopoulos, Dimitrios S., and O'Neill, Maire
- Subjects
621.39, DRAM, memory, energy efficiency, error characterization, real system, relaxed circuit parameters, heterogeneous-reliability memory, dependable systems, modeling, design, refresh rate
- Abstract
The rapid increase of processed data is driving the aggressive scaling of DRAM for meeting the needs of higher memory density and bandwidth. As a result of the high memory demand, projections forecast that the memory sub-system will be responsible for a considerable portion of the overall power consumption within most multicore systems. However, the DRAM scaling is hampered by the adoption of pessimistic circuit parameters that are selected based on the worst-case conditions for reliable operation. Such an approach might guarantee error-free storage of data, but the incurred power and performance overheads raise doubts about its efficiency in the future. This dissertation is focused on characterization and modeling of the DRAM behaviour under non-nominal DRAM circuit parameters, and design of energy-saving techniques in real systems to ensure the reliable operation of the system. Initially, we present the characterization of the DRAM reliability under relaxed circuit parameters and various conditions. We are able to understand the major effects of the workloads on the DRAM error behaviour under realistic conditions. In order to achieve this, we have developed an experimental framework on two server systems and a thermal testbed to control the DRAM temperature. We analyze the correlation between the DRAM error behavior and the circuit parameters, temperature, and workload-dependent features. We apply supervised learning methods to construct a prediction model of the DRAM error behaviour based on the main features identified by our characterization. The prediction allows us to relax the DRAM circuit parameters just enough to avoid errors while having the maximum energy savings possible. We develop a benchmark that is able to stress the system even when the server is in the field.
Furthermore, we propose a heterogeneous-reliability memory framework that enables allocation of critical data in a reliable memory domain, while the rest of the data are allocated on a variably-reliable memory domain. This ensures the reliable operation and storage of critical data in memory that is operating under nominal circuit parameters. While data that can tolerate errors are stored in memory that is operating under relaxed circuit parameters and is more energy-efficient. We introduce a programming interface to expose the capabilities of the framework to the users and a governor for scaling the DRAM circuit parameters dynamically. We extend the system with a checkpoint and restart mechanism to ensure even in the worst-case that data can be restored. We further enable the user of the framework to evaluate the fault tolerance and approximate techniques of their applications by implementing it on a real system. Finally, we devise software techniques to enable the exploitation of the refresh-by-access property of DRAM. We modify the scheduling order of accesses to the memory controller by re-ordering of the tasks in an application to minimize the duration of data residing in memory. This results eventually in decreased probability of errors. We extend our methodology in an application specific compiler to bound the access interval for all application data. We achieve this by controlling the size of each task and the order, while tracing the data accessed for each task. In the process to understand the refresh-by-access property, we develop a simulator for fast measurement of the interval between accesses through binary instrumentation. We use the outputs of the simulator to better understand the refresh-by-access property and to improve the existing DRAM fault injection schemes. 
By taking into consideration of the real duration that the data were stored in memory, we have more representative error fault injection when DRAM is operating under relaxed circuit parameters.
- Published
- 2021
15. A Novel DNA Synthesis Platform Design with High-Throughput Paralleled Addressability and High-Density Static Droplet Confinement
- Author
-
Shijia Yang, Dayin Wang, Zequan Zhao, Ning Wang, Meng Yu, Kaihuan Zhang, Yuan Luo, and Jianlong Zhao
- Subjects
DNA data storage, DNA synthesis, static droplet, microfluidics, DRAM, Biotechnology, TP248.13-248.65
- Abstract
Using DNA as the next-generation medium for data storage offers unparalleled advantages in terms of data density, storage duration, and power consumption as compared to existing data storage technologies. To meet the high-speed data writing requirements in DNA data storage, this paper proposes a novel design for an ultra-high-density and high-throughput DNA synthesis platform. The presented design mainly leverages two functional modules: a dynamic random-access memory (DRAM)-like integrated circuit (IC) responsible for electrode addressing and voltage supply, and the static droplet array (SDA)-based microfluidic structure to eliminate any reaction species diffusion concern in electrochemical DNA synthesis. Through theoretical analysis and simulation studies, we validate the effective addressing of 10 million electrodes and stable, adjustable voltage supply by the integrated circuit. We also demonstrate a reaction unit size down to 3.16 × 3.16 μm², equivalent to 10 million/cm², that can rapidly and stably generate static droplets at each site, effectively constraining proton diffusion. Finally, we conducted a synthesis cycle experiment by incorporating fluorescent beacons on a microfabricated electrode array to examine the feasibility of our design.
- Published
- 2024
- Full Text
- View/download PDF
16. Multivalued DRAM.
- Author
-
Karmakar, Supriyo
- Abstract
Multiple-channel field-effect transistors (MCFETs) switch the current among different channels in the FET based on the applied voltage in its gate terminal. MCFETs can be used to design a multivalued logic circuit with the lowest number of circuit elements. Different MCFET logic circuits and unipolar inverters are now considered to be an option to follow Moore's law. However, many details about the performance of MCFET in different logic circuit applications are in the research phase. In this paper, a circuit model of MCFET based on Verilog-A has been developed and a circuit for multivalued dynamic random-access memory (DRAM) is designed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. A Design of a Dual Delay Line DLL with Wide Input Duty Cycle Range.
- Author
-
Qin, Binyu, Zhao, Leilei, Fang, Chenyu, and Poechmueller, Peter
- Subjects
DELAY lines, FREQUENCY dividers, CLOCKS & watches
- Abstract
This article describes a dual-controller dual-delay line delay lock loop (DC-DL DLL). The proposed DLL adopted a dual delay line structure, each delay line was composed of a coarse adjustment and a fine adjustment unit, and the dual delay lines had corresponding control units to reduce the mismatch between the delay lines, and it avoided the complicated design of duty cycle correction (DCC) circuit. A frequency divider was added to divide the input clock to achieve a wider input clock duty cycle adjustment. Additionally, a simple clock synthesis circuit was proposed to synthesize the required clock. The DLL design used the 25 nm process with a voltage of 1.2 V. The simulation results showed that at a working frequency of 1.6 GHz, the peak-to-peak jitter of the DC-DL DLL after locking was approximately 17.61 ps, the maximum output duty cycle error was about 1.3%, and the input duty cycle ranged from 20% to 80%, with a power consumption of 10.06 mW. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Extending Memory Capacity in Modern Consumer Systems With Emerging Non-Volatile Memory: Experimental Analysis and Characterization Using the Intel Optane SSD
- Author
-
Geraldo F. Oliveira, Saugata Ghose, Juan Gomez-Luna, Amirali Boroumand, Alexis Savery, Sonny Rao, Salman Qazi, Gwendal Grignou, Rahul Thakur, Eric Shiu, and Onur Mutlu
- Subjects
Consumer devices, DRAM, emerging technologies, experimental characterization, I/O systems, memory capacity, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
DRAM scalability is becoming a limiting factor to the available memory capacity in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity of consumer devices by augmenting or replacing DRAM. In this work, we provide the first analysis of the impact of extending the main memory space of consumer devices using off-the-shelf NVMs. We equip real web-based Chromebook computers with the Intel Optane solid-state drive (SSD), which contains state-of-the-art low-latency NVM, and use the NVM as swap space. We analyze the performance and energy consumption of the Optane-equipped Chromebooks, and compare this with (i) a baseline system with double the amount of DRAM than the system with the NVM-based swap space; and (ii) a system where the Intel Optane SSD is naively replaced with a state-of-the-art NAND-flash-based SSD. Our experimental analysis reveals that while Optane-based swap space provides a cost-effective way to alleviate the DRAM capacity bottleneck in consumer devices, naive integration of the Optane SSD leads to several system-level overheads, mostly related to (1) the Linux block I/O layer, which can negatively impact overall performance; and (2) the off-chip traffic to the swap space, which can negatively impact energy consumption. To reduce the Linux block I/O layer overheads, we tailor several system-level mechanisms (i.e., the I/O scheduler and the I/O request completion mechanism) to the currently-running application’s access pattern. To reduce the off-chip traffic overhead, we leverage an operating system feature (called Zswap) that allocates some DRAM space to be used as a compressed in-DRAM cache for data swapped between DRAM and the Intel Optane SSD, significantly reducing energy consumption caused by the off-chip traffic to the swap space. 
We conclude that emerging NVMs are a cost-effective solution to alleviate the DRAM capacity bottleneck in consumer devices, which can be further enhanced by tailoring system-level mechanisms to better leverage the characteristics of our workloads and the NVM.
- Published
- 2023
- Full Text
- View/download PDF
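The Zswap idea described in the abstract above (a compressed in-DRAM cache placed in front of the swap device) can be illustrated with a toy model. This is a minimal sketch and not the Linux implementation: the class name `CompressedSwapCache`, the use of `zlib`, the LRU eviction policy, and the dictionary standing in for the SSD are all assumptions made for illustration.

```python
import zlib
from collections import OrderedDict


class CompressedSwapCache:
    """Toy model of a compressed in-DRAM cache in front of a swap device."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.pool = OrderedDict()  # page number -> compressed bytes, in LRU order
        self.backing = {}          # hypothetical stand-in for the swap SSD
        self.device_writes = 0     # off-chip writes the cache tries to avoid
        self.device_reads = 0

    def swap_out(self, page_no, data):
        blob = zlib.compress(data)
        # Drop any stale copy of this page before storing the new one.
        if page_no in self.pool:
            self.used_bytes -= len(self.pool.pop(page_no))
        self.backing.pop(page_no, None)
        # Evict least recently used pages to the device until the blob fits.
        while self.pool and self.used_bytes + len(blob) > self.capacity_bytes:
            old_no, old_blob = self.pool.popitem(last=False)
            self.used_bytes -= len(old_blob)
            self.backing[old_no] = old_blob
            self.device_writes += 1
        if len(blob) > self.capacity_bytes:
            # Too large even for an empty pool: write straight to the device.
            self.backing[page_no] = blob
            self.device_writes += 1
            return
        self.pool[page_no] = blob
        self.used_bytes += len(blob)

    def swap_in(self, page_no):
        if page_no in self.pool:
            # Hit in the compressed pool: no off-chip traffic.
            blob = self.pool.pop(page_no)
            self.used_bytes -= len(blob)
            return zlib.decompress(blob)
        # Miss: the page has to come back from the swap device.
        self.device_reads += 1
        return zlib.decompress(self.backing.pop(page_no))
```

In this model, pages that compress well stay in the pool and never touch the device, which is the effect the abstract credits with reducing off-chip traffic. How large the saving is on real hardware depends on page compressibility and pool size, neither of which this sketch tries to model.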
19. Investigation Into the Degradation of DDR4 DRAM Owing to Total Ionizing Dose Effects
- Author
-
Gyeongyeop Lee, Minki Suh, Minsang Ryu, Yunjong Lee, Jin-Woo Han, and Jungsik Kim
- Subjects
Annealing ,DDR4 ,dose rate ,DRAM ,gamma ray ,interface trap ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Total ionizing dose (TID) effects of gamma rays were investigated on DDR4 dynamic random access memory (DRAM) and analyzed using TCAD simulations. In this study, we considered the operating states, dose rates, temperatures, and annealing to analyze the impact of TID under different conditions. The worst degradation was observed in the operated state and at a low-dose rate because of the absence of an electrostatic barrier that reduced the possibility of interface trap formation under unbiased and high-dose rate conditions. At lower temperatures, the effects of radiation were mitigated by the reduced production of protons ($\text{H}^{+}$). In addition, the unbiased DRAM and high-temperature conditions are the fastest to recover during post-irradiation annealing. In TCAD simulations, the retention time decreased with increasing temperature because the band-to-band tunneling (BTBT) generation increased. Furthermore, the retention time and row activation latency ($t_{\mathrm{RCD}}$) degraded as the concentration of the interface traps increased. This is because the interface traps caused leakage currents and hindered the flow of electrons.
- Published
- 2023
- Full Text
- View/download PDF
20. Overhang Saddle Fin Sidewall Structure for Highly Reliable DRAM Operation
- Author
-
Jin-Woo Han, Minki Suh, Gyeongyeop Lee, and Jungsik Kim
- Subjects
Overhang saddle fin ,DRAM ,rowhammer ,retention time ,TCAD simulation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
A novel memory cell transistor structure based on a saddle fin-based DRAM is presented for highly reliable operations. The overhang saddle fin (oss-fin) active structure is formed by two steps of etching of the fin; isotropic etching for the short side and anisotropic etching for the long side of the fin. The overhang sidewall fin structure results in the increase of retention time, decrease of isolation leakage current, increase of rowhammering tolerance, and increase of programming efficiency. A Technology Computer-Aided Design (TCAD) simulation study compares the overhang and the conventional saddle fin (s-fin) in terms of those reliability parameters. A lowered electric field underneath the storage node, a lowered passing gate coupling capacitance, and an elongated isolation leakage path are attributed to the reliability enhancements in the overhang saddle fin.
- Published
- 2023
- Full Text
- View/download PDF
21. BL-PIM: Varying the Burst Length to Realize the All-Bank Performance and Minimize the Multi-Workload Interference for in-DRAM PIM
- Author
-
Chang Hyun Kim, Won Jun Lee, Yoonah Paik, Seok Young Kim, and Seon Wook Kim
- Subjects
DRAM ,a memory controller ,a burst length ,processing-in-DRAM ,JEDEC ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
As the demand for transformer applications increases rapidly, technologies to solve memory bottlenecks are attracting attention. One of them is an in-DRAM Processing-In-Memory (PIM) architecture to perform the computation inside DRAM. Major DRAM makers introduce the PIM samples, executing all banks’ computations simultaneously to maximize the internal DRAM bandwidth for achieving the highest performance. However, the realization as a commercial product is problematic since the all-bank execution does not concurrently perform non-PIM applications during the PIM execution with PIM memory, thus separating their memory space. This paper proposes a BL-PIM architecture to increase the burst length (BL) of memory requests inside a bank to maximize internal bandwidth and overlap the computation across banks, thus achieving all-bank performance. On the other hand, outside a bank, it seems not to increase the BL, thus allowing us to preserve the data consistency in memory hierarchy and execute non-PIM and PIM applications together with PIM memory. Also, the memory-intensive PIM computation using larger BL significantly reduces their outstanding memory requests, thus minimizing the performance interference with other applications. We carefully extend the DRAM timing diagram and develop the cooperation mechanism between a memory controller and a PIM device. We implemented the BL-PIM architecture on FPGA and compared the performance with real machines using four transformer models and eight compute and memory-bound SPEC benchmarks. We achieved the BL-PIM performance up to 28.9x and 12.0x faster than the CPU single-thread and multi-threaded execution in the transformer models. Also, when we increased the burst length by 16 times as the maximum, the BL-PIM was 1.2x faster than the ideal all-bank PIM execution. We also experimented with the multi-workload execution using the SPEC benchmarks, showing that our architecture can minimize performance interference. 
To our knowledge, the study of the PIM’s multi-workload execution is the first in public.
- Published
- 2023
- Full Text
- View/download PDF
22. High-Performance and Power-Saving Mechanism for Page Activations Based on Full Independent DRAM Sub-Arrays in Multi-Core Systems
- Author
-
Tareq A. Alawneh, Mutsam M. Jarajreh, Jawdat S. Alkasassbeh, and Ahmed A. M. Sharadqh
- Subjects
DRAM ,fine-grained ,main memory ,page activation ,performance ,power efficiency ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Modern DRAM devices’ performance and energy efficiency are significantly improved when the row-buffer locality is exploited properly. In multi-core architectures, however, the DRAM-based main memory banks used by the processing units, called cores, are shared. Memory interference, also known as memory contention, occurs when many cores contend for simultaneous access to the shared banks. The performance benefits provided by utilizing the available row-buffer locality are diminished by the increased memory contention brought on by the integration of more cores. Large DRAM page sizes are therefore activated in order to access only a tiny amount of data. Poor energy efficiency or wasted opportunity to loosen DRAM power timing restrictions are both downsides to this page over-fetching issue. This study introduces a Fine-Grained Activation (FGA) technique to reduce the number of involved bitlines when accessing DRAM memory. This technique significantly improves the parallelism at the DRAM subarray level to support multiple memory accesses routed to distinct subarrays inside the same memory bank. The FGA technique presented in this research intends to provide large energy savings while simultaneously delivering significant performance gains. Our evaluation findings with 4-core multi-program benchmarks demonstrate that the FGA technique proposed in this paper can significantly improve both DRAM performance and DRAM energy efficiency with a negligible area overhead. In comparison to the baseline, the Half-DRAM page activation mechanism, and the recently suggested FGA mechanism, the proposed technique in this study reduces the average DRAM memory access latency for the evaluated four-core applications by 25.6%, 27.1%, and 14.8%, respectively. Our introduced technique also decreases the DRAM activation power by an average of 46.7%, 27.1%, and 14.7%, respectively, when compared with the baseline, Half-DRAM technique, and the recently proposed FGA mechanism.
- Published
- 2023
- Full Text
- View/download PDF
23. A Custom Hardware Architecture for the Link Assessment Problem
- Author
-
Chinazzo, André, Schryver, Christian De, Zweig, Katharina, Wehn, Norbert, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bast, Hannah, editor, Korzen, Claudius, editor, Meyer, Ulrich, editor, and Penschuck, Manuel, editor
- Published
- 2022
- Full Text
- View/download PDF
24. A True Process-Heterogeneous Stacked Embedded DRAM Structure Based on Wafer-Level Hybrid Bonding.
- Author
-
Wang, Song, Jiang, Xiping, Bai, Fujun, Xiao, Wenwu, Long, Xiaodong, Ren, Qiwei, and Kang, Yi
- Subjects
DYNAMIC random access memory ,HYBRID securities ,THREE-dimensional integrated circuits ,WAFER level packaging ,SEMICONDUCTOR wafer bonding ,ENERGY consumption - Abstract
In response to the increasing manufacturing complexity/cost in maintaining DRAM advancements through traditional scaling, three-dimensional integrated circuits (3D ICs) and 2.5-dimensional ICs with Si interposers are known as promising candidates to overcome these challenges due to their advantages of low power, small form factor, high density, and high bandwidth. In this work, we present a true process-heterogeneous stacked embedded DRAM (SeDRAM) using a hybrid bonding 3D integration process, achieving high bandwidth of 34 GBps/Gbit and high energy efficiency of 0.88 pJ/bit. Moreover, the critical factors of the SeDRAM design are presented (e.g., the low data movement energy, high-density physical interface, simplified protocol definition, process compatibility, density extensibility, and fast testing of hybrid bonding connections by design for test (DFT)). Our results and design methodology have paved the way to realize applications of hybrid bonding to high bandwidth and energy efficiency DRAM. More importantly, the SeDRAM solution can also support the maximum storage density of 48 Gbit and the bandwidth capability of TBps. It can greatly alleviate the "memory wall" problem and thus improve its competitiveness in near-memory computing/computing-in-memory fields. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
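The figures quoted in the SeDRAM abstract above (34 GBps/Gbit, 0.88 pJ/bit, 48 Gbit) can be combined in a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that bandwidth scales linearly with capacity, that 1 GB is 10^9 bytes, and that the energy figure applies to every transferred bit. None of these assumptions is stated in the abstract, and the function names are hypothetical.

```python
BANDWIDTH_GBPS_PER_GBIT = 34.0  # GB/s per Gbit, quoted in the abstract
ENERGY_PJ_PER_BIT = 0.88        # pJ/bit, quoted in the abstract
MAX_DENSITY_GBIT = 48           # Gbit, quoted in the abstract


def aggregate_bandwidth_gbps(density_gbit):
    """Total bandwidth in GB/s, ASSUMING linear scaling with capacity."""
    return BANDWIDTH_GBPS_PER_GBIT * density_gbit


def transfer_power_watts(bandwidth_gbps):
    """Data-movement power at a sustained bandwidth (GB/s -> bit/s -> W)."""
    bits_per_second = bandwidth_gbps * 1e9 * 8  # assumes 1 GB = 1e9 bytes
    return bits_per_second * ENERGY_PJ_PER_BIT * 1e-12


if __name__ == "__main__":
    total = aggregate_bandwidth_gbps(MAX_DENSITY_GBIT)
    print(f"aggregate bandwidth: {total:.0f} GB/s")
    print(f"power at that rate:  {transfer_power_watts(total):.2f} W")
```

Under these assumptions, 48 Gbit corresponds to 1632 GB/s, which is consistent with the "TBps" bandwidth capability the abstract mentions.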
25. Fully bulk CMOS compatible Key Shape Floating Body Memory (KFBM)
- Author
-
Masakazu Kakumu, Yisuo Li, Koji Sakui, and Nozomu Harada
- Subjects
Floating body memory ,KFBM ,CMOS compatibility ,DRAM ,Flash ,4F2 ,Electric apparatus and materials. Electric circuits. Electric networks ,TK452-454.4 ,Computer engineering. Computer hardware ,TK7885-7895 - Abstract
This paper presents a capacitorless memory cell with bulk CMOS compatibility, consisting of a MOSFET with a virtual floating body formed by the trench. The name Key shape Floating Body Memory (KFBM) is derived from the resemblance of the structure to the shape of an antique key. The carrier concentration in the vertical device beneath the MOSFET results in an on–off cell current ratio of more than 5 orders of magnitude, with an off-current of less than 100 pA/cell. The device achieves a retention time of about 1 s at 85 °C and over 10 s at 27 °C, all the while maintaining high density and scalability. On the basis of TCAD simulation, we have demonstrated high tolerance to disturbance (more than 1000 times with all types of signals), which has been an issue with DRAM memories. KFBM can incorporate both dynamic RAM and flash features.
- Published
- 2023
- Full Text
- View/download PDF
26. 2T1C DRAM based on semiconducting MoS2 and semimetallic graphene for in-memory computing
- Author
-
Gou Saifei, Wang Yin, Dong Xiangqi, Xu Zihan, Wang Xinyu, Sun Qicheng, Xie Yufeng, Zhou Peng, and Bao Wenzhong
- Subjects
molybdenum disulfide (MoS2) ,graphene ,DRAM ,in-memory computing ,Science ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
In-memory computing is an alternative method to effectively accelerate the massive data-computing tasks of artificial intelligence (AI) and break the memory wall. In this work, we propose a 2T1C DRAM structure for in-memory computing. It integrates a monolayer graphene transistor, a monolayer MoS2 transistor, and a capacitor in a two-transistor-one-capacitor (2T1C) configuration. In this structure, the storage node is in a similar position to that of one-transistor-one-capacitor (1T1C) dynamic random-access memory (DRAM), while an additional graphene transistor is used to accomplish the non-destructive readout of the stored information. Furthermore, the ultralow leakage current of the MoS2 transistor enables the storage of multi-level voltages on the capacitor with a long retention time. The stored charges can effectually tune the channel conductance of the graphene transistor due to its excellent linearity so that linear analog multiplication can be realized. Because of the almost unlimited cycling endurance of DRAM, our 2T1C DRAM has great potential for in situ training and recognition, which can significantly improve the recognition accuracy of neural networks.
- Published
- 2023
- Full Text
- View/download PDF
27. CYBER SECURITY IN INDUSTRIAL CONTROL SYSTEMS (ICS): A SURVEY OF ROWHAMMER VULNERABILITY
- Author
-
Hakan AYDIN and Ahmet SERTBAŞ
- Subjects
rowhammer ,cyber security ,dram ,Information technology ,T58.5-58.64 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Increasing dependence on Information and Communication Technologies (ICT) and especially on the Internet in Industrial Control Systems (ICS) has made these systems the primary target of cyber-attacks. As ICS are extensively used in Critical Infrastructures (CI), this makes CI more vulnerable to cyber-attacks and their protection becomes an important issue. On the other hand, cyber-attacks can exploit not only software but also physics; that is, they can target the fundamental physical aspects of computation. The newly discovered RowHammer (RH) fault injection attack is a serious vulnerability targeting hardware on reliability and security of DRAM (Dynamic Random Access Memory). Studies on this vulnerability issue raise serious security concerns. The purpose of this study was to overview the RH phenomenon in DRAMs and its possible security risks on ICSs and to discuss a few possible realistic RH attack scenarios for ICSs. The results of the study revealed that RH is a serious security threat to any computer-based system having DRAMs, and this also applies to ICS.
- Published
- 2022
- Full Text
- View/download PDF
28. Atomic layer deposition of molybdenum oxide using (NtBu)2(NMe2)2Mo, hydrogen peroxide (H2O2), and ozone (O3) for DRAM application.
- Author
-
Lee, Seunghwan, Yang, Hae Lin, Kim, Beomseok, Lee, Jinho, Lim, Hanjin, and Park, Jin-Seong
- Subjects
- *
ATOMIC layer deposition , *MOLYBDENUM oxides , *MOLYBDENUM , *OZONE , *HYDROGEN peroxide , *STRAY currents - Abstract
Molybdenum oxide (MoOx) films have the unique characteristics of a number of possible structures and high work functions. In DRAM using high-k dielectrics, MoOx can be used to reduce leakage current that originates from Schottky emission. High quality MoOx can be deposited using atomic layer deposition (ALD) that is advantageous in terms of the resulting film's high uniformity, high conformality, and precise thickness controllability. In this work, MoOx films were deposited using bis(tert-butylimido)-bis(dimethylamido)molybdenum ((NtBu)2(NMe2)2Mo) as a metal precursor, and hydrogen peroxide (H2O2) and ozone (O3) as oxidants. The MoOx films were deposited between 100 and 300 °C growth temperature. MoOx deposited at 200 °C using H2O2 and O3 shows different GPC values of 0.08 and 0.20 Å/cycle, respectively, due to the different reactivities of the oxidants. The O/Mo ratio, atomic concentration of impurities, crystallinity, optical properties, work function, and sheet resistance of TiN altered by MoOx fabricated using H2O2 and O3 were investigated. The reactivity of O3 is higher than that of H2O2, which increases the sheet resistance of TiN by 23.1%. Finally, a cross section of MoOx deposited with H2O2 on a trench wafer was investigated to demonstrate conformal deposition onto a complex structure. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. The Dynamic Random Access Memory Challenge in Embedded Computing Systems
- Author
-
Jung, Matthias, Weis, Christian, Wehn, Norbert, and Chen, Jian-Jia, editor
- Published
- 2021
- Full Text
- View/download PDF
30. On-Die Dynamic Remapping Cache: Strong and Independent Protection Against Intermittent Faults
- Author
-
Sangjae Park and Jungrae Kim
- Subjects
Computer architecture ,memory architecture ,DRAM ,reliability ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
As process scaling continues, DRAM is getting more vulnerable to errors. System companies and DRAM vendors have introduced ECC to protect DRAM against growing errors. ECC, however, should be combined with a repair mechanism to prevent non-transient faults from repeatedly producing errors and to prevent overlapping faults from accumulating into more severe errors. We propose a novel approach to repair memory and improve reliability with minimal overheads. On-die Dynamic Remapping Cache (DRC) minimizes the repair overheads by focusing on active faults. Most intermittent faults (e.g., Variable Retention Time) generate errors occasionally, and the number of active faults at any one time is significantly lower than the total. DRC tracks the activity and severity of faults at run-time and uses a small cache inside a DRAM to remap active faults. This efficiency enables aggressive remapping of bit faults, which eliminates fault accumulations and improves reliability. Our evaluation shows DRC can provide much stronger protection than the state-of-the-art protection schemes with no performance degradation and a negligible chip area overhead.
- Published
- 2022
- Full Text
- View/download PDF
31. SEC-BADAEC: An Efficient ECC With No Vacancy for Strong Memory Protection
- Author
-
Yuseok Song, Sangjae Park, Michael B. Sullivan, and Jungrae Kim
- Subjects
Reliability ,ECC ,on-die ECC ,DRAM ,SEC ,BADAEC ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Shrinking process technology and rising memory densities have made memories increasingly vulnerable to errors. Accordingly, DRAM vendors have introduced On-die Error Correction Code (O-ECC) to protect data against the growing number of errors. Current O-ECC provides weak Single Error Correction (SEC), but future memories will require stronger protection as error rates rise. This paper proposes a novel ECC, called Single Error Correction–Byte-Aligned Double Adjacent Error Correction (SEC-BADAEC), and its construction algorithm to improve memory reliability. SEC-BADAEC requires the same redundancy as SEC O-ECC, but it can also correct some frequent 2-bit error patterns. Our evaluation shows SEC-BADAEC can improve memory reliability by 23.5% and system-level reliability by 29.8% with negligible overheads.
- Published
- 2022
- Full Text
- View/download PDF
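To make the Single Error Correction (SEC) baseline mentioned in the abstract above concrete, here is a minimal sketch using the textbook Hamming(7,4) code. It is not the SEC-BADAEC construction from the paper, and on-die ECC in real DRAM uses much longer codewords; the function names are hypothetical. The sketch only shows why a plain SEC code repairs any single flipped bit but miscorrects two adjacent flips, which is the class of error the paper targets.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword layout by position 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(codeword):
    """Return (corrected codeword, syndrome); syndrome 0 means no error seen."""
    fixed = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if fixed[position - 1]:
            syndrome ^= position  # XOR of the positions of all set bits
    if syndrome:
        fixed[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return fixed, syndrome


def decode(codeword):
    """Correct up to one bit error and return the 4 data bits."""
    fixed, _ = correct(codeword)
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    single = list(word)
    single[4] ^= 1  # one flipped bit: recovered
    double = list(word)
    double[4] ^= 1
    double[5] ^= 1  # two adjacent flipped bits: miscorrected
    print(decode(single) == [1, 0, 1, 1], decode(double) == [1, 0, 1, 1])
```

With two adjacent flips the syndrome is non-zero but points at a third, innocent bit, so the decoder silently returns wrong data. A code that corrects such adjacent pairs with the same redundancy, as the abstract claims for SEC-BADAEC, avoids this failure for the patterns it covers.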
32. A Single-Ended Transmitter With Low Switching Noise Injection and Quadrature Clock Correction Schemes for DRAM Interface
- Author
-
Dong-Wan Ko and Won-Young Lee
- Subjects
DRAM ,transmitter (TX) ,double data rate (DDR) ,clock correction ,switching noise ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper presents a transmitter with a phase controller for low switching noise injection and a quadrature clock corrector (QCC) for correcting both phase error and duty cycle distortion of the divided quadrature clocks. The phase errors and the duty cycle distortions of the quadrature clocks determine the quality of the output DQS. The proposed QCC simultaneously runs phase correction and the duty adjustment of quadrature clocks for fast correction time. In order to reduce power switching noise induced by output drivers, the proposed transmitter transfers the data at different timings using the phase controller which generates the interpolated quadrature clocks for even and odd channels. Since the even channel is synchronized with the reference quadrature clocks and the odd channel is synchronized with the interpolated quadrature clocks, the peak switching currents consumed by output drivers are spread. The proposed circuit has been designed in 180-nm CMOS process using VDD of 1.8-V and VDDQ of 0.6-V and the target data rate is 3.2 Gbps. The corrected quadrature clocks have the duty cycle distortion of 0.2% and the phase error of 1.18° with input clock distortion. The output DQS of the transmitter shows the peak-to-peak jitter of 30.55-ps in the low switching noise injection mode with the phase offset of 122°, which is improved by 28.8% as compared to the normal mode.
- Published
- 2022
- Full Text
- View/download PDF
33. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System
- Author
-
Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu
- Subjects
Processing-in-memory ,near-data processing ,memory systems ,data movement bottleneck ,DRAM ,benchmarking ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip. This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound.
We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their modern CPU and GPU counterparts. Our extensive evaluation conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs provides new insights about suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.
- Published
- 2022
- Full Text
- View/download PDF
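The notion of a "memory-bound" workload in the abstract above can be made concrete with the standard roofline model. The abstract does not say which method the authors use to classify workloads, so this is a swapped-in textbook technique; the function names and the machine numbers in the usage lines are hypothetical.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, mem_bandwidth_gbs):
    """Roofline bound: performance is capped by compute or by memory traffic.

    arithmetic_intensity is in FLOP per byte moved to or from main memory.
    """
    return min(peak_gflops, arithmetic_intensity * mem_bandwidth_gbs)


def is_memory_bound(arithmetic_intensity, peak_gflops, mem_bandwidth_gbs):
    """True when the workload sits left of the roofline ridge point."""
    ridge_point = peak_gflops / mem_bandwidth_gbs  # FLOP/byte
    return arithmetic_intensity < ridge_point


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
    print(attainable_gflops(0.25, 1000.0, 100.0), is_memory_bound(0.25, 1000.0, 100.0))
    print(attainable_gflops(50.0, 1000.0, 100.0), is_memory_bound(50.0, 1000.0, 100.0))
```

A workload with low data reuse has low arithmetic intensity, so its attainable performance is set by memory bandwidth rather than by the cores. That is the regime in which moving computation closer to memory can pay off.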
34. Software-defined significance-driven computing
- Author
-
Chalios, Charalambos, Vandierendonck, Hans, de Supinski, Bronis, and Nikolopoulos, Dimitrios
- Subjects
approximate computing ,significance-based computing ,parallel computing ,reliability ,task-based programming ,DRAM ,scheduling ,linux - Abstract
Approximate computing has been an emerging programming and system design paradigm that has been proposed as a way to overcome the power-wall problem that hinders the scaling of the next generation of both high-end and mobile computing systems. Towards this end, a lot of researchers have been studying the effects of approximation to applications and those hardware modifications that allow increased power benefits for reduced reliability. In this work, we focus on runtime system modifications and task-based programming models that enable software-controlled, user-driven approximate computing. We employ a systematic methodology that allows us to evaluate the potential energy and performance benefits of approximate computing using as building blocks unreliable hardware components. We present a set of extensions to OpenMP 4.0 that enable the programmer to define computations suitable for approximation. We introduce task-significance, a novel concept that describes the contribution of a task to the quality of the result. We use significance as a channel of communication from domain specific knowledge about applications towards the runtime-system, where we can optimise approximate execution depending on user constraints. Finally, we show extensions to the Linux kernel that enable it to operate seamlessly on top of unreliable memory and provide a user-space interface for memory allocation from the unreliable portion of the physical memory. Having this framework in place allowed us to identify what we call the refresh-by-access property of applications that use dynamic random-access memory (DRAM). We use this property to implement techniques for task-based applications that minimise the probability of errors when using unreliable memory enabling increased quality and power efficiency when using unreliable DRAM.
- Published
- 2017
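The abstract above names a "refresh-by-access" property without defining it. One reading, based on the fact that accessing a DRAM row restores the charge in its cells, is that rows touched more often than the cell retention time do not depend on explicit refresh. The sketch below checks that condition for a trace of row accesses. The function name, the trace format, and the 64 ms retention figure used in the example are assumptions for illustration, not details from the thesis.

```python
def rows_at_risk(access_times, retention_s, start_s, end_s):
    """Return the rows whose data may decay if explicit refresh is disabled.

    access_times maps a row id to a list of access timestamps in seconds.
    Each row is ASSUMED to hold valid data at start_s. A row is safe only if
    no gap between consecutive accesses (with the window start and end
    counted as boundaries) exceeds retention_s.
    """
    risky = set()
    for row, times in access_times.items():
        inside = sorted(t for t in times if start_s <= t <= end_s)
        points = [start_s] + inside + [end_s]
        longest_gap = max(b - a for a, b in zip(points, points[1:]))
        if longest_gap > retention_s:
            risky.add(row)
    return risky


if __name__ == "__main__":
    trace = {
        0: [0.01 * k for k in range(1, 100)],  # touched every 10 ms
        1: [0.5],                              # touched once in a second
    }
    print(rows_at_risk(trace, retention_s=0.064, start_s=0.0, end_s=1.0))
```

A runtime with this kind of information could place frequently accessed data in memory that is refreshed less often and keep rarely touched data in reliably refreshed memory. Whether the thesis does exactly this is not stated in the abstract.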
35. DRAMing for autophagy.
- Author
-
Leytens, Alexandre and Dengjel, Jörn
- Subjects
- *
CELL physiology , *CELL death , *LYSOSOMES , *CYTOPROTECTION , *AUTOPHAGY , *APOPTOSIS - Abstract
Autophagy, a catabolic lysosomal recycling pathway, is often found dysregulated in human diseases. Whereas its prime cell stress‐related function is cytoprotection, autophagy has also been linked to the activation of apoptosis and cell death. One group of proteins which participates in the orchestration of autophagy and apoptosis is the family of DRAM proteins. In the current issue of The FEBS Journal, Barthet et al. uncover a compensatory crosstalk between the two newest members of the family, DRAM‐4 and DRAM‐5, the latter one regulating autophagic activity. Comment on https://doi.org/10.1111/febs.16365 [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Improved thermal network modeling of die stacking DRAM and optimization.
- Author
-
Li, Mingtai, Li, Tuanjie, and Tang, Yaqiong
- Subjects
- *
DYNAMIC random access memory , *FINITE element method , *HEAT transfer - Abstract
As the dynamic random access memory (DRAM) chip tends toward larger storage capacity by die stacking, 3D die stacking requires thermal modeling for fast temperature prediction and initial design. This work presents a theoretical model capable of quickly calculating and optimizing the temperature of each die. The improved thermal network defines equivalent shape correction parameters to improve the calculation accuracy of the thermal network. A novel topology of the improved thermal network is derived for 3D die stacking through the division of the heat transfer paths inside the DRAM. The analysis demonstrates that the results calculated with the improved thermal network show good consistency with the finite element method in steady-state and transient thermal analysis. Beyond this, the effect of size and thermal power on the calculation accuracy is discussed. The improved thermal network is used in the optimization design of a DRAM with eight-die vertical stacking. The maximum temperature increment is reduced by 15% after optimization.
• The improved thermal network of DRAM considers all heat transfer paths in the 3D die stacking.
• The shape correction parameters are defined to improve the calculation accuracy of the improved thermal network.
• The effects of size and thermal power on the calculation accuracy of the improved thermal network are analyzed.
• The model effectiveness is verified by an optimization design of DRAM with eight-die stacking. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
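The general idea of a thermal network for a die stack, as discussed in the abstract above, can be sketched with the simplest possible case: a one-dimensional ladder of thermal resistances in which all heat leaves through one end. This is not the paper's improved network, which adds shape correction parameters and multiple heat paths. The function name and all numeric values in the usage lines are hypothetical.

```python
def stack_temperatures(powers_w, resistances_k_per_w, ambient_c):
    """Steady-state die temperatures for a 1D series thermal ladder.

    powers_w[i]             heat dissipated by die i (die 0 is nearest the sink)
    resistances_k_per_w[i]  thermal resistance between die i and die i - 1;
                            for i == 0, between die 0 and ambient
    ASSUMES all heat flows toward the sink side only.
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("need one resistance per die")
    temperatures = []
    temperature = ambient_c
    heat_through = sum(powers_w)  # heat crossing the current resistance
    for power, resistance in zip(powers_w, resistances_k_per_w):
        temperature += heat_through * resistance
        temperatures.append(temperature)
        heat_through -= power  # this die's heat does not flow further up
    return temperatures


if __name__ == "__main__":
    # Hypothetical eight-die stack: 0.5 W per die, 1.5 K/W per layer, 25 °C ambient.
    for index, temp in enumerate(stack_temperatures([0.5] * 8, [1.5] * 8, 25.0)):
        print(f"die {index}: {temp:.2f} °C")
```

Even this crude model shows the qualitative behaviour that makes stacking hard: the die farthest from the heat sink is the hottest, and every added die raises the temperature of all dies below it as well as its own.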
37. The Performance Enhancement of PMOSFETs and Inverter Chains at Low Temperature and Low Voltage by Removing Plasma-Damaged Layers.
- Author
-
Song, Junhwa, Lee, Eunsun, Hong, Seungho, Kim, Jihun, Oh, Jeonghoon, and Choi, Byoungdeog
- Subjects
LOW temperatures ,DYNAMIC random access memory ,TRANSMISSION electron microscopes ,COLD (Temperature) ,LOW voltage systems - Abstract
In this work we report on the improvement in cold temperature characteristics of PMOSFETs and inverter circuits by removing the plasma-damaged layer of the source/drain contacts. We removed the plasma-induced damage on the Si using a simple in situ Si soft treatment technique. We found by transmission electron microscope (TEM) analysis that the damaged amorphous layer reduced from 52 Å to 42 Å and 35 Å with a treatment time of 10 and 20 s, respectively. As a result, the resistances of both the n+ and p+ contacts decreased for all contact sizes and the standard deviations at the cold temperature were suppressed by 45%. At −25 °C, the saturation current of the PMOSFET increased by 3% and the propagation delay time (tPD) decreased by 2%. The tPD increases by 19.3% when the temperature decreases from 85 °C to −25 °C, and the operating voltage decreases from 1.2 V to 0.95 V at the same time. However, this increase can be reduced to 17% by applying the soft treatment for 10 s. This simple and short time process will be considered essential for both mobile applications and automotive applications of dynamic random access memory (DRAM) devices requiring a low-voltage and low-temperature operation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. A Mini Tutorial of Processing in Memory: From Principles, Devices to Prototypes.
- Author
-
Pan, Biao, Wang, Guangyao, Zhang, He, Kang, Wang, and Zhao, Weisheng
- Abstract
Data movement overheads caused by the recent explosion in big data applications have left the traditional von Neumann architecture unable to handle big data workloads. Processing in Memory (PIM), where computational tasks are performed within the memory, has drawn increasing attention. To present meaningful insights to readers, we divide the current PIM paradigm into charge-based and resistance-based categories according to the underlying memory devices. This mini tutorial aims to provide a concise overview of the implementation of PIM schemes, highlighting important macro prototypes in artificial intelligence applications that have been released in the past five years. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
39. Insertion of Hafnium Interlayer to Improve the Thermal Stability of Ultrathin TiSix in TiSix/n+-Si Ohmic Contacts.
- Author
-
Liu, Yaodong, Sun, Xianglie, Xu, Jing, Gao, Jianfeng, Liu, Jinbiao, Zhou, Xuebing, Li, Yongliang, Li, Junfeng, Wang, Wenwu, Ye, Tianchun, and Luo, Jun
- Subjects
- *
OHMIC contacts , *THERMAL stability , *HAFNIUM , *THIN films , *X-ray photoelectron spectroscopy , *TRANSMISSION electron microscopy - Abstract
Serious agglomeration of ultrathin TiSix employed in source–drain (S–D) ohmic contacts after high-temperature annealing was manifested in previous work. This severely restricts its application in state-of-the-art DRAM peripheral 3-D FinFET transistors. In this work, the effects of a hafnium (Hf) interlayer on the thermal stability of ultrathin TiSix in TiSix/n+-Si ohmic contacts were systematically studied. As-prepared ultrathin TiSix and TiSix/n+-Si ohmic contacts with 0-, 1-, 2-, and 3-nm Hf interlayers were characterized by means of specific contact resistivity ($\rho_{\text{c}}$), transmission electron microscopy (TEM), X-ray diffraction (XRD), and X-ray photoelectron spectroscopy (XPS). It is found that, compared to the counterparts with 0-, 2-, and 3-nm Hf, the 1-nm Hf interlayer is effective in significantly enhancing the thermal stability of ultrathin TiSix. With 1-nm Hf, amorphous HfO2 and Hf silicate are readily formed, which hinder the interdiffusion between Ti and Si atoms and the resultant agglomeration of ultrathin TiSix films. This is thought to be responsible for such a remarkably enhanced thermal stability of ultrathin TiSix in TiSix/n+-Si ohmic contacts. [ABSTRACT FROM AUTHOR]
- Published
- 2022
40. KOMEDİ Mİ? DRAM MI? VİŞNE BAHÇESİ (Comedy? Drama? The Cherry Orchard).
- Author
-
GÜNÖR, Tuğba
- Abstract
Copyright of Ankara Üniversitesi Dil ve Tarih-Cografya Fakültesi Dergisi DTCF Dergisi is the property of Ankara Universitesi Dil ve Tarih-Cografya Fakultesi (DTCF Dergisi) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
41. Optimizing cache utilization in modern cache hierarchies
- Author
-
Huang, Cheng-Chieh, Nagarajan, Vijayanand, and Topham, Nigel
- Subjects
004.5 ,cache ,DRAM ,memory hierarchy - Abstract
The memory wall is one of the major performance bottlenecks in modern computer systems. SRAM caches have been used to successfully bridge the performance gap between the processor and the memory. However, an SRAM cache's latency grows with its size. Therefore, simply increasing the size of caches could have a negative impact on performance. To solve this problem, modern processors employ multiple levels of caches, each of a different size, forming the so-called memory hierarchy. Upon a miss, the processor looks up the data level by level, from the highest level (L1 cache) to the lowest level (main memory). Such a design can effectively reduce the negative performance impact of simply using a large cache. However, because SRAM has lower storage density compared to other volatile storage, the size of an SRAM cache is restricted by the available on-chip area. With modern applications requiring more and more memory, researchers are continuing to look at techniques for increasing the effective cache capacity. In general, researchers are approaching this problem from two angles: maximizing the utilization of current SRAM caches or exploiting new technology to support larger capacity in cache hierarchies. The first part of this thesis focuses on how to maximize the utilization of existing SRAM caches. In our first work, we observe that not all words belonging to a cache block are accessed around the same time. In fact, a subset of words is consistently accessed sooner than others. We call this subset of words critical words. In our study, we found that these critical words can be predicted by using the access footprint. Based on this observation, we propose the critical-words-only cache (co-cache). Unlike a conventional cache, which stores all words that belong to a block, the co-cache stores only the words that we predict as critical. In this work, we convert an L2 cache to a co-cache and use the L1's access footprint information to predict critical words. 
Our experiments show that the co-cache can outperform a conventional L2 cache on workloads whose working-set sizes are greater than the L2 cache size. To handle workloads whose working-set sizes fit in the conventional L2, we propose the adaptive co-cache (acocache), which allows the co-cache to be configured back to a conventional cache. The second part of this thesis focuses on how to efficiently enable a large-capacity on-chip cache. In the near future, 3D stacking technology will allow us to stack one or multiple DRAM chip(s) onto the processor. The total size of these chips is expected to be on the order of hundreds of megabytes or even a few gigabytes. Recent works have proposed using this space as an on-chip DRAM cache. However, the tags of the DRAM cache create a classic space/time trade-off. On the one hand, we would like the latency of a tag access to be small, as it contributes to both hit and miss latencies. Accordingly, we would like to store these tags in a faster medium such as SRAM. On the other hand, with hundreds of megabytes of die-stacked DRAM cache, the space overhead of the tags would be huge. For example, it would cost around 12 MB of SRAM space to store all the tags of a 256 MB DRAM cache (if we used conventional 64 B blocks). Clearly this is too large, considering that some current chip multiprocessors have an L3 that is smaller. Prior works have proposed storing these tags along with the data in the stacked DRAM array (tags-in-DRAM). However, this scheme increases the access latency of the DRAM cache. To optimize access latency in the DRAM cache, we propose the aggressive tag cache (ATCache). Similar to a conventional cache, the ATCache caches recently accessed tags to exploit temporal locality; it exploits spatial locality by prefetching tags from nearby cache sets. In addition, we also address the high miss latency and cache pollution caused by excessive prefetching. 
To reduce this overhead, we propose cost-effective prefetching, a combination of dynamic prefetching granularity tuning and hit-prefetching, to throttle the number of sets prefetched. Our proposed ATCache (which consumes 0.4% of the overall tag size) can satisfy over 60% of DRAM cache tag accesses on average. The last proposed work in this thesis is a DRAM-Cache-Aware (DCA) DRAM controller. In this work, we first address the challenge of scheduling requests in the DRAM cache. While many recent DRAM works have built their techniques on a tags-in-DRAM scheme, storing these tags in the DRAM array increases the complexity of a DRAM cache request. In contrast to a conventional request to DRAM main memory, a request to the DRAM cache now translates into multiple DRAM cache accesses (tag and data). In this work, we address the challenge of how to schedule these DRAM cache accesses. We start by exploring whether or not a conventional DRAM controller will work well in this scenario. We introduce two potential designs and study their limitations. From this study, we derive a set of design principles that an ideal DRAM cache controller must satisfy. We then propose a DRAM-cache-aware (DCA) DRAM controller that is based on these design principles. Our experimental results show that DCA can outperform the baseline by over 14%.
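The tag-storage figure quoted in the abstract above can be checked with a back-of-the-envelope calculation. The sketch below assumes 24 bits of tag and state per block; that width is not given in the abstract and is chosen here only because it reproduces the quoted figure of about 12 MB. The function name and parameters are illustrative.

```python
MIB = 1 << 20  # bytes per mebibyte


def tag_store_bytes(cache_bytes: int, block_bytes: int, bits_per_entry: int) -> int:
    """Bytes of tag storage for a cache that keeps one tag entry per block."""
    num_blocks = cache_bytes // block_bytes
    return num_blocks * bits_per_entry // 8


# A 256 MB DRAM cache with 64 B blocks has 4,194,304 blocks.
# ASSUMPTION: 24 bits (tag + state) per entry; not stated in the abstract.
full_tags = tag_store_bytes(256 * MIB, 64, 24)
print(full_tags / MIB)  # 12.0

# A tag cache sized at 0.4% of the full tag store, as quoted for ATCache.
atcache_bytes = full_tags * 0.004
print(round(atcache_bytes / 1024, 1), "KiB")
```

Under this assumption the full tag store is larger than many L3 caches, while a tag cache at 0.4% of that size comes to a few tens of kilobytes.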
- Published
- 2016
42. Optically connected memory for disaggregated data centers.
- Author
-
Gonzalez, Jorge, G. Palma, Mauricio, Hattink, Maarten, Rubio-Noriega, Ruth, Orosa, Lois, Mutlu, Onur, Bergman, Keren, and Azevedo, Rodolfo
- Subjects
- *
MEMORY , *OPTICAL devices , *ENERGY consumption - Abstract
• Optical memory disaggregation for data centers achieves low energy-per-bit consumption.
• Main memory disaggregation using state-of-the-art optical devices.
• First evaluation of both energy-per-bit and system-level performance for main memory disaggregation with optical devices.
Recent advances in integrated photonics enable the implementation of reconfigurable, high-bandwidth, and low energy-per-bit interconnects in next-generation data centers. We propose and evaluate an Optically Connected Memory (OCM) architecture that disaggregates the main memory from the computation nodes in data centers. OCM is based on micro-ring resonators (MRRs), and it does not require any modification to the DRAM memory modules. We calculate energy consumption from real photonic devices and integrate them into a system simulator to evaluate performance. Our results show that (1) OCM is capable of interconnecting four DDR4 memory channels to a computing node using two fibers with 1.02 pJ energy-per-bit consumption and (2) OCM performs up to 5.5× faster than a disaggregated memory with 40G PCIe NIC connectors to computing nodes. [ABSTRACT FROM AUTHOR]
- Published
- 2022
43. DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses.
- Author
-
Steiner, Lukas, Jung, Matthias, Prado, Felipe S., Bykov, Kirill, and Wehn, Norbert
- Subjects
- *
DYNAMIC random access memory , *SOFTWARE architecture , *MATHEMATICAL optimization - Abstract
The simulation of Dynamic Random Access Memories (DRAMs) on system level requires highly accurate models due to their complex timing and power behavior. However, conventional cycle-accurate DRAM subsystem models often become a bottleneck for the overall simulation speed. Simulators based on Transaction Level Modeling are a promising alternative, as they can be fast and accurate at the same time. In this paper we present DRAMSys4.0, which is, to the best of our knowledge, the fastest and most extensive open-source cycle-accurate DRAM simulation framework. DRAMSys4.0 includes a novel software architecture that enables fast adaptation to different hardware controller implementations and new JEDEC standards. In addition, it already supports the latest standards DDR5 and LPDDR5. We explain how to apply optimization techniques for an increased simulation speed while maintaining full temporal accuracy. Furthermore, we demonstrate the simulator's accuracy and analysis tools with two application examples. Finally, we provide a detailed investigation and comparison of the most prominent cycle-accurate open-source DRAM simulators with regard to their supported features, analysis capabilities and simulation speed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
44. Model Updating of a Prestressed Concrete Rigid Frame Bridge Using Multiple Markov Chain Monte Carlo Method and Differential Evolution.
- Author
-
Liu, G. and Jiang, W.
- Subjects
- *
MARKOV chain Monte Carlo , *MARKOV processes , *PRESTRESSED concrete , *DIFFERENTIAL evolution - Abstract
Model updating is a widely adopted method to minimize the error between test results from the real structure and outcomes from the finite element (FE) model for obtaining an accurate and reliable FE model of the target structure. However, uncertainties from the environment, excitation and measurement variability can reduce the accuracy of predictions of the updated FE model. The Bayesian model updating method using multiple Markov chains based on the differential evolution adaptive Metropolis (DREAM) algorithm is explored, which runs multiple chains simultaneously for a global exploration and automatically tunes the scale and orientation of the proposal distribution during the evolution of the posterior distribution. The performance of the proposed method is illustrated numerically with a beam model and a three-span rigid frame bridge. Results show that the DREAM algorithm is capable of updating the FE model in civil engineering. It extends the Bayesian model updating method to the multiple-Markov-chain scenario, which provides higher accuracy than single-chain algorithms such as the delayed rejection adaptive Metropolis-Hastings (DRAM) method. Moreover, results from both examples indicate that the proposed method is insensitive to the values of the initial parameters, which avoids errors resulting from inappropriate prior knowledge of parameters in FE model updating. [ABSTRACT FROM AUTHOR]
- Published
- 2022
45. Longevity of Commodity DRAMs in Harsh Environments Through Thermoelectric Cooling
- Author
-
Deepak M. Mathew, Hammam Kattan, Christian Weis, Jorg Henkel, Norbert Wehn, and Hussam Amrouch
- Subjects
DRAM ,commodity hardware ,thermoelectric ,harsh-environment ,automotive ,temperature ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Today, more and more commodity hardware devices are used in safety-critical applications, such as advanced driver assistance systems in automotive. These applications demand very high reliability of electronic components even in adverse environmental conditions, such as high temperatures. Ensuring the reliability of microelectronic components is a major challenge at these high temperatures. The computing systems of these applications rely on DRAMs as working memory, which are built upon bit cells that store charges in capacitors. These commodity DRAMs are optimized for cost per bit and not for high reliability. Thus, very high temperatures pose an enormous challenge for commodity DRAMs, as the data retention time and reliability decrease substantially, affecting data correctness. Data correctness can be ensured up to certain temperatures by increasing the refresh rate to counterbalance the retention time reduction. However, this severely degrades the access latencies and the usable DRAM bandwidth. To overcome these limitations, we present for the first time a Thermoelectric Cooling (TEC) solution for commodity DRAMs in harsh environments, such as automotive. Our TEC solution enables the use of commodity off-the-shelf DRAMs in safety-critical applications by reducing the temperature to a range where they can operate reliably. This TEC solution is applied a posteriori to the DRAM chips without using high-cost package solutions. Thus, it maintains the low-cost targets of such devices, improves the reliability, and at the same time, counterbalances the adverse effects of increasing the refresh rate. To quantitatively evaluate the benefits of TEC on commodity DRAMs in harsh environments, we performed system-level evaluations with several applications backed up by the measured data on commodity DRAMs. 
Our experimental results, using accurate multi-physics simulations that employ the finite element method, demonstrate that the TEC-based cooling ensures that the maximum temperature of all DRAM chips is always below 85°C, even though the original on-chip temperature (i.e., in the absence of our TEC-based cooling) goes beyond 120°C.
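The bandwidth cost of a higher refresh rate mentioned in the abstract above can be illustrated with a rough estimate: the fraction of time a rank is unavailable is approximately the refresh cycle time divided by the refresh interval. The timing values below are illustrative assumptions in the range typical of DDR4 devices, not figures from the paper, and the function name is hypothetical.

```python
def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Approximate fraction of time spent refreshing (tRFC / tREFI)."""
    return t_rfc_ns / t_refi_ns


# ASSUMPTION: illustrative DDR4-like timings, not taken from the paper.
T_RFC_NS = 350.0           # time one refresh command blocks the rank
T_REFI_NORMAL_NS = 7800.0  # refresh interval at normal temperature
T_REFI_HOT_NS = 3900.0     # interval halved (refresh rate doubled) when hot

normal = refresh_overhead(T_RFC_NS, T_REFI_NORMAL_NS)
hot = refresh_overhead(T_RFC_NS, T_REFI_HOT_NS)
print(f"{normal:.1%} -> {hot:.1%}")  # 4.5% -> 9.0%
```

With these assumed numbers, doubling the refresh rate doubles the time lost to refresh, which is the effect that keeping the chips cooler is meant to avoid.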
- Published
- 2021
46. System-Level Communication Performance Estimation for DMA-Controlled Accelerators
- Author
-
Sunwoo Kim, Sungkyung Park, and Chester Sungchung Park
- Subjects
CNN accelerator ,design space exploration ,direct memory access ,DRAM ,on-chip bus ,system-level performance estimation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The performance of a hardware accelerator is often limited by the communication bandwidth between local on-chip memories and DRAM across the on-chip bus. In this paper, a new system-level performance estimation algorithm is proposed for evaluating the communication performance of direct memory access (DMA) controlled accelerators. The proposed algorithm can estimate the communication performance accurately for both DRAM-limited and bus-limited cases. In detail, the communication performance for the DRAM-limited case is estimated using dynamic prediction of DRAM command patterns, whereas the communication performance for the bus-limited case is estimated based on the maximum bus burst latency. Depending on whether the communication bandwidth is limited by the bus protocol overhead or the DRAM latency, the proposed algorithm estimates the communication bandwidth of a DMA-controlled accelerator according to the performance bottleneck. It is shown that the proposed algorithm significantly improves the estimation accuracy when it is applied to CNNs and wireless communications. Also, when the proposed algorithm together with a full-system simulator is used to explore a design space defined by a set of tile sizes and bus-related parameters, it speeds up conventional algorithms by more than a factor of 100 by filtering out a large number of unpromising design points. It is also shown that the proposed algorithm alone can approach the maximum accelerator performance with a performance degradation of less than 5%. An ablation study is conducted to demonstrate the efficacy of the individual steps of the proposed algorithm.
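The idea of estimating bandwidth according to the performance bottleneck can be sketched with a simple minimum-of-two-limits model. This is a deliberately simplified illustration, not the paper's algorithm, which predicts DRAM command patterns and uses the maximum bus burst latency; the function name, parameters, and numbers below are all hypothetical.

```python
def estimate_transfer(bytes_moved: float, bus_gb_per_s: float, dram_gb_per_s: float):
    """Return (seconds, bottleneck) for a DMA transfer.

    The slower of the two paths bounds the achievable bandwidth.
    """
    if bus_gb_per_s <= dram_gb_per_s:
        bottleneck, effective = "bus", bus_gb_per_s
    else:
        bottleneck, effective = "dram", dram_gb_per_s
    return bytes_moved / (effective * 1e9), bottleneck


# Hypothetical design point: 1 GB moved, 8 GB/s bus, 12.8 GB/s DRAM.
seconds, limit = estimate_transfer(1e9, 8.0, 12.8)
print(seconds, limit)  # 0.125 bus
```

A design-space exploration loop could call such a function for each candidate tile size and discard points whose estimated time is clearly uncompetitive before running a full-system simulation.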
- Published
- 2021
47. Power-Efficient Deep Convolutional Neural Network Design Through Zero-Gating PEs and Partial-Sum Reuse Centric Dataflow
- Author
-
Lin Ye, Jinghao Ye, Masao Yanagisawa, and Youhua Shi
- Subjects
CNNs ,zero-gating ,data reuse ,DRAM ,power consumption ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Convolutional neural networks (CNNs) have shown great success in many areas such as object detection and pattern recognition, at the cost of extremely high computational complexity and significant external memory access, which makes state-of-the-art deep CNNs difficult to implement on resource-constrained portable/wearable devices with limited battery capacity. To address this design challenge, a power-efficient CNN design through zero-gating processing elements (PEs) and a partial-sum reuse centric dataflow is proposed in this paper. Unlike existing works, which either consider only the zeros in activation maps or use an off-chip training process for on-chip computation reduction, a zero-gating PE design is proposed to avoid unnecessary on-chip computation by taking advantage of the large number of zeros in both the filter weights of pre-trained models and the activation maps. Furthermore, a partial-sum reuse centric dataflow is also proposed for off-chip DRAM access reduction. The evaluation results show that the overall power consumption of PE arrays with our proposal can be reduced by 37% and 14% at the cost of 8% and 1% area overhead when compared to the baseline PE design and the existing only-activation-gated design (i.e., that in Eyeriss), respectively. Moreover, the proposed method can achieve 35% and 47% DRAM access reduction with corresponding 14% and 49% energy savings for AlexNet and VGG-16 when compared to that in Eyeriss.
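The zero-gating idea in the abstract above can be illustrated in software: a multiply-accumulate is skipped whenever either the weight or the activation is zero, since the product contributes nothing. The sketch below is only a software analogy of what the paper implements in hardware PEs; the function name and the sample data are hypothetical.

```python
def gated_dot(weights, activations):
    """Dot product that skips (gates) multiplications with a zero operand.

    Returns the accumulated sum and the number of multiplications performed.
    """
    acc = 0
    performed = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # gated: the multiplier would stay idle for this pair
        acc += w * a
        performed += 1
    return acc, performed


# Hypothetical sparse weights and activations.
weights = [0, 3, -2, 0, 5, 1]
activations = [4, 0, 7, 2, 1, 0]
result, macs = gated_dot(weights, activations)
print(result, macs)  # -9 2
```

Here only two of six multiplications are carried out. Gating on activations alone would have performed four, which is the kind of difference the comparison against an activation-only-gated design is about.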
- Published
- 2021
48. CIDAN-XE: Computing in DRAM with Artificial Neurons.
- Author
-
Singh, Gian, Wagle, Ankit, Khatri, Sunil, and Vrudhula, Sarma
- Subjects
CONVOLUTIONAL neural networks ,NEURONS ,ENERGY consumption - Abstract
This paper presents a DRAM-based processing-in-memory (PIM) architecture, called CIDAN-XE. It contains a novel computing unit called the neuron processing element (NPE). Each NPE can perform a variety of operations that include logical, arithmetic, relational, and predicate operations on multi-bit operands. Furthermore, they can be reconfigured to switch operations during run-time without increasing the overall latency or power of the operation. Since NPEs consume a small area and can operate at very high frequencies, they can be integrated inside the DRAM without disrupting its organization or timing constraints. Simulation results on a set of operations such as AND, OR, XOR, addition, multiplication, etc., show that CIDAN-XE achieves an average throughput improvement of 72X/5.4X and energy efficiency improvement of 244X/29X over CPU/GPU. To further demonstrate the benefits of using CIDAN-XE, we implement several convolutional neural networks and show that CIDAN-XE can improve upon the throughput and energy efficiency over the latest PIM architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
49. Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-Based Accelerators.
- Author
-
Lenjani, Marzieh and Skadron, Kevin
- Subjects
- *
ACCESS control , *RANDOM access memory , *GRAPHICS processing units , *WORD problems (Mathematics) - Abstract
Processing in memory (PIM) can alleviate the data movement overhead. However, PIM units built inside memory layers have low frequency, and this requires high parallelism to compensate for the low clock frequency. Single-instruction–multiple-data (SIMD) architectures can provide high parallelism for PIM with low overhead per arithmetic logic unit (ALU) operation. In SIMD, multiple ALUs perform the same instruction, and the processing unit accesses multiple consecutive words at once. Therefore, the control and access overhead is amortized among ALU operations. However, SIMD units cannot fully exploit the available word-level parallelism for 1) operations with data/position dependency or 2) operations with divergence (where different operations are performed on different words). A recent work, Fulcrum, proposes a subarray-level PIM design with high parallelism. This article discusses how Fulcrum alleviates the control and access overhead while exploiting word-level parallelism for operations with data/position dependency and divergence. We evaluate Fulcrum against bank-level SIMD architectures to highlight these benefits. [ABSTRACT FROM AUTHOR]
- Published
- 2022
50. Analytical Model for Memory-Centric High Level Synthesis-Generated Applications.
- Author
-
Davila-Guzman, Maria Angelica, Tejero, Ruben Gran, Villarroya-Gaudo, Maria, and Gracia, Dario Suarez
- Subjects
- *
FIELD programmable gate arrays , *HIGH performance computing , *KERNEL operating systems , *BINARY sequences , *RANDOM access memory , *ENERGY consumption - Abstract
High performance computing (HPC) demands huge memory bandwidth and computing resources to achieve maximum performance and energy efficiency. FPGAs can provide both, and with the help of High Level Synthesis, HPC applications can be easily written in high-level languages. However, the optimization process remains time-consuming, especially when based on trial-and-error bitstream generation. Model-based performance prediction is a practical and fast approach for kernel optimization, especially if done with information from pre-synthesis reports. This article presents an analytical model focused on memory-intensive applications that captures the memory behavior and accurately predicts the kernel execution time within seconds rather than the hours that bitstream generation requires. The model has been validated with two DRAM technologies, DDR4 and HBM2, with a set of microbenchmarks and high performance computing applications, showing an average error of 11% for DDR4 and 10% for HBM2. Compared with previous studies, our predictions at least halve the estimation error. [ABSTRACT FROM AUTHOR]
- Published
- 2021