378 results for "Lee, Donghyuk"
Search Results
2. Experimental study on water circulation performance of combined interstitial permeable retention block pavement
- Author
- Park, Joohoun, Choi, Kyoungyoung, Park, Jaerock, and Lee, Donghyuk
- Published
- 2025
- Full Text
- View/download PDF
3. Graph neural networks for classification and error detection in 2D architectural detail drawings
- Author
- Ko, Jaechang and Lee, Donghyuk
- Published
- 2025
- Full Text
- View/download PDF
4. Purification and Detection of Ubiquitinated Plant Proteins Using Tandem Ubiquitin Binding Entities
- Author
- Lee, DongHyuk and Coaker, Gitta
- Subjects
- Biochemistry and Cell Biology, Biological Sciences, Generic health relevance, Ubiquitinated Proteins, Plant Proteins, Ubiquitin, Ubiquitination, Proteasome Endopeptidase Complex, Receptors, Chimeric Antigen, Plant ubiquitination, TUBE, Tandem ubiquitin binding entities, Other Chemical Sciences, Developmental Biology, Biochemistry and cell biology, Medicinal and biomolecular chemistry
- Abstract
The timing and amplitude of plant signaling are frequently regulated through posttranslational modification of key signaling sectors, which facilitates rapid and flexible responses. Protein ubiquitination can serve as a degradation marker, influence subcellular localization, alter protein-protein interactions, and affect protein activity. Identification of polyubiquitinated proteins has been challenging due to their rapid degradation by the proteasome or removal of modifications by deubiquitination enzymes (DUBs). Tandem ubiquitin binding entities (TUBEs) are based on ubiquitin-associated domains and protect against both proteasomal degradation and DUBs. Here, we provide a protocol for purification of ubiquitinated plant proteins using TUBEs after transient expression in Nicotiana benthamiana. This protocol can also be applied to other plants to purify multiple ubiquitinated proteins or track ubiquitination of a target protein. This methodology provides an effective method for identification of ubiquitin ligase substrates and can be coupled with TUBEs targeting specific ubiquitination linkages.
- Published
- 2023
5. Genomes and epigenomes of matched normal and tumor breast tissue reveal diverse evolutionary trajectories and tumor-host interactions
- Author
- Zhu, Bin, Tapinos, Avraam, Koka, Hela, Yi Lee, Priscilla Ming, Zhang, Tongwu, Zhu, Wei, Wang, Xiaoyu, Klein, Alyssa, Lee, DongHyuk, Tse, Gary M., Tsang, Koon-ho, Wu, Cherry, Hua, Min, Highfill, Chad A., Lenz, Petra, Zhou, Weiyin, Wang, Difei, Luo, Wen, Jones, Kristine, Hutchinson, Amy, Hicks, Belynda, Garcia-Closas, Montserrat, Chanock, Stephen, Tse, Lap Ah, Wedge, David C., and Yang, Xiaohong R.
- Published
- 2024
- Full Text
- View/download PDF
6. CD20/TNFR1 dual-targeting antibody enhances lysosome rupture-mediated cell death in B cell lymphoma
- Author
- Kim, Jeong Ryeol, Lee, Donghyuk, Kim, Yerim, and Kim, Joo Young
- Published
- 2023
- Full Text
- View/download PDF
7. DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis
- Author
- Lym, Sangkug, Lee, Donghyuk, O'Connor, Mike, Chatterjee, Niladrish, and Erez, Mattan
- Subjects
- Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
- Abstract
Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. Optimizing GPU designs for efficient CNN training acceleration requires accurate modeling of how GPU performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.
- Published
- 2019
- Full Text
- View/download PDF
8. Regulation of reactive oxygen species during plant immunity through phosphorylation and ubiquitination of RBOHD.
- Author
- Lee, DongHyuk, Lal, Neeraj K, Lin, Zuh-Jyh Daniel, Ma, Shisong, Liu, Jun, Castro, Bardo, Toruño, Tania, Dinesh-Kumar, Savithramma P, and Coaker, Gitta
- Subjects
- Reactive Oxygen Species, Protein-Serine-Threonine Kinases, Arabidopsis Proteins, Signal Transduction, Plant Diseases, Gene Expression Regulation, Plant, Phosphorylation, Ubiquitination, Plant Immunity, Protein Domains, NADPH Oxidases, Gene Expression Regulation, Plant
- Abstract
Production of reactive oxygen species (ROS) is critical for successful activation of immune responses against pathogen infection. The plant NADPH oxidase RBOHD is a primary player in ROS production during innate immunity. However, how RBOHD is negatively regulated remains elusive. Here we show that RBOHD is regulated by C-terminal phosphorylation and ubiquitination. Genetic and biochemical analyses reveal that the PBL13 receptor-like cytoplasmic kinase phosphorylates RBOHD's C-terminus and two phosphorylated residues (S862 and T912) affect RBOHD activity and stability, respectively. Using protein array technology, we identified an E3 ubiquitin ligase PIRE (PBL13 interacting RING domain E3 ligase) that interacts with both PBL13 and RBOHD. Mimicking phosphorylation of RBOHD (T912D) results in enhanced ubiquitination and decreased protein abundance. PIRE and PBL13 mutants display higher RBOHD protein accumulation, increased ROS production, and are more resistant to bacterial infection. Thus, our study reveals an intricate post-translational network that negatively regulates the abundance of a conserved NADPH oxidase.
- Published
- 2020
9. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study
- Author
- Ghose, Saugata, Yağlıkçı, Abdullah Giray, Gupta, Raghav, Lee, Donghyuk, Kudrolli, Kais, Liu, William X., Hassan, Hasan, Chang, Kevin K., Chatterjee, Niladrish, Agrawal, Aditya, O'Connor, Mike, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
Main memory (DRAM) consumes as much as half of the total system power in a computer today, resulting in a growing need to develop new DRAM architectures and systems that consume less power. Researchers have long relied on DRAM power models that are based on standardized current measurements provided by vendors, called IDD values. Unfortunately, we find that these models are highly inaccurate and do not reflect the actual power consumed by real DRAM devices. We perform the first comprehensive experimental characterization of the power consumed by modern real-world DRAM modules. Our extensive characterization of 50 DDR3L DRAM modules from three major vendors yields four key new observations about DRAM power consumption: (1) across all IDD values that we measure, the current consumed by real DRAM modules varies significantly from the current specified by the vendors; (2) DRAM power consumption strongly depends on the data value that is read or written; (3) there is significant structural variation, in which the same banks and rows across multiple DRAM modules of the same model consume more power than other banks or rows; and (4) over successive process technology generations, DRAM power consumption has not decreased by as much as vendor specifications have indicated. Based on our detailed analysis and characterization data, we develop the Variation-Aware model of Memory Power Informed by Real Experiments (VAMPIRE). We show that VAMPIRE has a mean absolute percentage error of only 6.8% compared to actual measured DRAM power. VAMPIRE enables a wide range of studies that were not possible using prior DRAM power models. As an example, we use VAMPIRE to evaluate a new power-aware data encoding mechanism, which can reduce DRAM energy consumption by an average of 12.2%. We plan to open-source both VAMPIRE and our extensive raw data collected during our experimental characterization.
- Comment: presented at SIGMETRICS 2018
- Published
- 2018
10. Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency
- Author
- Hassan, Hasan, Pekhimenko, Gennady, Vijaykumar, Nandita, Seshadri, Vivek, Lee, Donghyuk, Ergin, Oguz, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the idea of ChargeCache, which was published in HPCA 2016 [51], and examines the work's significance and future potential. DRAM latency continues to be a critical bottleneck for system performance. In this work, we develop a low-cost mechanism, called ChargeCache, that enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips. Our mechanism is based on the key observation that a recently-accessed row has more charge, and thus the following access to the same row can be performed faster. To exploit this observation, we propose to track the addresses of recently-accessed rows in a table in the memory controller. If a later DRAM request hits in that table, the memory controller uses lower timing parameters, leading to reduced DRAM latency. Row addresses are removed from the table after a specified duration to ensure rows that have leaked too much charge are not accessed with lower latency. We evaluate ChargeCache on a wide variety of workloads and show that it provides significant performance and energy benefits for both single-core and multi-core systems.
- Comment: arXiv admin note: substantial text overlap with arXiv:1609.07234
- Published
- 2018
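The ChargeCache mechanism summarized in the abstract above (a memory-controller table of recently-activated rows, with entries expiring after a fixed duration) can be sketched in a few lines. This is a minimal illustrative model only: the table capacity, caching duration, and timing values below are placeholders, not the parameters used in the paper.

```python
class ChargeCache:
    """Toy model of the ChargeCache idea: a hit in the table means the row
    was activated recently, still holds extra charge, and can therefore be
    accessed with lowered timing parameters."""

    def __init__(self, capacity=128, caching_duration=100):
        self.capacity = capacity
        self.duration = caching_duration  # cycles before charge has leaked too much
        self.table = {}                   # row address -> cycle of last activation

    def access(self, row, now):
        # Drop stale entries whose charge can no longer be trusted.
        self.table = {r: t for r, t in self.table.items()
                      if now - t < self.duration}
        hit = row in self.table
        # Simple eviction policy (oldest entry) when the table is full.
        if not hit and len(self.table) >= self.capacity:
            del self.table[min(self.table, key=self.table.get)]
        self.table[row] = now
        # Return (tRCD, tRAS) in cycles -- illustrative values only:
        # lowered timings on a hit, standard timings on a miss.
        return (8, 24) if hit else (11, 28)
```

A hit within the caching duration returns the lowered timings; once the entry expires, the standard timings are used again.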
11. SoftMC: Practical DRAM Characterization Using an FPGA-Based Infrastructure
- Author
- Hassan, Hasan, Vijaykumar, Nandita, Khan, Samira, Ghose, Saugata, Chang, Kevin, Pekhimenko, Gennady, Lee, Donghyuk, Ergin, Oguz, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the SoftMC DRAM characterization infrastructure, which was published in HPCA 2017, and examines the work's significance and future potential. SoftMC (Soft Memory Controller) is the first publicly-available DRAM testing infrastructure that can flexibly and efficiently test DRAM chips in a manner accessible to both software and hardware developers. SoftMC is an FPGA-based testing platform that can control and test memory modules designed for the commonly-used DDR (Double Data Rate) interface. SoftMC has two key properties: (i) it provides flexibility to thoroughly control memory behavior or to implement a wide range of mechanisms using DDR commands; and (ii) it is easy to use as it provides a simple and intuitive high-level programming interface for users, completely hiding the low-level details of the FPGA. We demonstrate the capability, flexibility, and programming ease of SoftMC with two example use cases. First, we implement a test that characterizes the retention time of DRAM cells. Second, we show that the expected latency reduction of two recently-proposed mechanisms, which rely on accessing recently-refreshed or recently-accessed DRAM cells faster than other DRAM cells, is not observable in existing DRAM chips. Various versions of the SoftMC platform have enabled many of our other DRAM characterization studies. We discuss several other use cases of SoftMC, including the ability to characterize emerging non-volatile memory modules that obey the DDR standard. We hope that our open-source release of SoftMC fills a gap in the space of publicly-available experimental memory testing infrastructures and inspires new studies, ideas, and methodologies in memory system design.
- Published
- 2018
12. Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency
- Author
- Chang, Kevin K., Yağlıkçı, Abdullah Giray, Ghose, Saugata, Agrawal, Aditya, Chatterjee, Niladrish, Kashyap, Abhijith, Lee, Donghyuk, O'Connor, Mike, Hassan, Hasan, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes our work on experimental characterization and analysis of reduced-voltage operation in modern DRAM chips, which was published in SIGMETRICS 2017, and examines the work's significance and future potential. We take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the DRAM supply voltage is lowered below the nominal voltage level specified by DRAM standards. We perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention. Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. Our evaluations show that Voltron reduces the average DRAM and system energy consumption by 10.5% and 7.3%, respectively, while limiting the average system performance loss to only 1.8%, for a variety of memory-intensive quad-core workloads. We also show that Voltron significantly outperforms prior dynamic voltage and frequency scaling mechanisms for DRAM.
- Published
- 2018
13. LISA: Increasing Internal Connectivity in DRAM for Fast Data Movement and Low Latency
- Author
- Chang, Kevin K., Nair, Prashant J., Ghose, Saugata, Lee, Donghyuk, Qureshi, Moinuddin K., and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the idea of Low-Cost Interlinked Subarrays (LISA), which was published in HPCA 2016, and examines the work's significance and future potential. Contemporary systems perform bulk data movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel. The use of this narrow channel results in high latency and energy consumption. Prior work proposes to avoid these high costs by exploiting the existing wide internal DRAM bandwidth for bulk data movement, but the limited connectivity of wires within DRAM allows fast data movement within only a single DRAM subarray. Each subarray is only a few megabytes in size, greatly restricting the range over which fast bulk data movement can happen within DRAM. Our HPCA 2016 paper proposes a new DRAM substrate, Low-Cost Inter-Linked Subarrays (LISA), whose goal is to enable fast and efficient data movement across a large range of memory at low cost. LISA adds low-cost connections between adjacent subarrays. By using these connections to interconnect the existing internal wires (bitlines) of adjacent subarrays, LISA enables wide-bandwidth data transfer across multiple subarrays with little (only 0.8%) DRAM area overhead. As a DRAM substrate, LISA is versatile, enabling a variety of new applications. We describe and evaluate three such applications in detail: (1) fast inter-subarray bulk data copy, (2) in-DRAM caching using a DRAM architecture whose rows have heterogeneous access latencies, and (3) accelerated bitline precharging by linking multiple precharge units together. Our extensive evaluations show that each of LISA's three applications significantly improves performance and memory energy efficiency on a variety of workloads and system configurations.
- Published
- 2018
14. Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips
- Author
- Chang, Kevin K., Kashyap, Abhijith, Hassan, Hasan, Ghose, Saugata, Hsieh, Kevin, Lee, Donghyuk, Li, Tianshi, Pekhimenko, Gennady, Khan, Samira, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This article summarizes key results of our work on experimental characterization and analysis of latency variation and latency-reliability trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and examines the work's significance and future potential. The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for the three fundamental DRAM operations (activation, precharge, and restoration), and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make six major new observations about latency variation within DRAM. Notably, we find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced. Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system.
- Published
- 2018
15. RowClone: Accelerating Data Movement and Initialization Using DRAM
- Author
- Seshadri, Vivek, Kim, Yoongu, Fallin, Chris, Lee, Donghyuk, Ausavarungnirun, Rachata, Pekhimenko, Gennady, Luo, Yixin, Mutlu, Onur, Gibbons, Phillip B., Kozuch, Michael A., and Mowry, Todd C.
- Subjects
- Computer Science - Hardware Architecture
- Abstract
In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy- or initialization-intensive workloads.
- Comment: arXiv admin note: text overlap with arXiv:1605.06483
- Published
- 2018
16. Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost
- Author
- Lee, Donghyuk, Kim, Yoongu, Seshadri, Vivek, Liu, Jamie, Subramanian, Lavanya, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the idea of Tiered-Latency DRAM (TL-DRAM), which was published in HPCA 2013, and examines the work's significance and future potential. The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but a trade-off made to decrease the cost per bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense amplifier area overhead. To achieve both low latency and low cost per bit, we introduce Tiered-Latency DRAM (TL-DRAM). In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one of the two segments to be accessed with the latency of a short-bitline DRAM without incurring a high cost per bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Our evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multiprogrammed workloads. Tiered-Latency DRAM has inspired several other works on reducing DRAM latency with little to no architectural modification.
- Comment: arXiv admin note: substantial text overlap with arXiv:1601.06903
- Published
- 2018
17. Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins
- Author
- Lee, Donghyuk, Kim, Yoongu, Pekhimenko, Gennady, Khan, Samira, Seshadri, Vivek, Chang, Kevin, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which was published in HPCA 2015, and examines the work's significance and future potential. AL-DRAM is a mechanism that optimizes DRAM latency based on the DRAM module and the operating temperature, by exploiting the extra margin that is built into the DRAM timing parameters. DRAM manufacturers provide a large margin for the timing parameters as a provision against two worst-case scenarios. First, due to process variation, some outlier DRAM chips are much slower than others. Second, chips become slower at higher temperatures. The timing parameter margin ensures that the slow outlier chips operate reliably at the worst-case temperature, and hence leads to a high access latency. Using an FPGA-based DRAM testing platform, our work first characterizes the extra margin for 115 DRAM modules from three major manufacturers. The experimental results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C while maintaining reliable operation. AL-DRAM uses these observations to adaptively select reliable DRAM timing parameters for each DRAM module based on the module's current operating conditions. AL-DRAM does not require any changes to the DRAM chip or its interface; it only requires multiple different timing parameters to be specified and supported by the memory controller. Our real system evaluations show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors. Our characterization and proposed techniques have inspired several other works on analyzing and/or exploiting different sources of latency and performance variation within DRAM chips.
- Comment: arXiv admin note: substantial text overlap with arXiv:1603.08454
- Published
- 2018
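The adaptive selection described in the abstract above (pick per-module timing parameters according to the current operating temperature, falling back to the standard worst-case timings) can be sketched as follows. The profile format, temperature thresholds, and cycle counts here are hypothetical placeholders, not the characterization data from the paper.

```python
def select_timing(module_profile, temperature_c):
    """Toy model of the AL-DRAM idea: choose the most aggressive timing
    parameters that characterization of *this* module showed to be reliable
    at the current temperature; otherwise use standard timings."""
    # module_profile: list of (max_temp_c, timings) sorted by max_temp_c,
    # where timings maps parameter name -> cycles (values are illustrative).
    for max_temp, timings in module_profile:
        if temperature_c <= max_temp:
            return timings
    # Fall back to the conservative worst-case timings from the standard.
    return {"tRCD": 11, "tRAS": 28, "tRP": 11, "tWR": 12}

# Hypothetical per-module characterization results.
profile = [
    (55, {"tRCD": 8, "tRAS": 20, "tRP": 8, "tWR": 9}),    # reliable up to 55°C
    (70, {"tRCD": 9, "tRAS": 24, "tRP": 9, "tWR": 10}),   # reliable up to 70°C
]
```

At 40°C the module runs with the most aggressive reliable timings; above every characterized threshold, the standard timings apply.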
18. Exploiting the DRAM Microarchitecture to Increase Memory-Level Parallelism
- Author
- Kim, Yoongu, Seshadri, Vivek, Lee, Donghyuk, Liu, Jamie, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
This paper summarizes the idea of Subarray-Level Parallelism (SALP) in DRAM, which was published in ISCA 2012, and examines the work's significance and future potential. Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of on-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low-cost approach. To this end, we propose three new mechanisms, SALP-1, SALP-2, and MASA (Multitude of Activated Subarrays), to reduce the serialization of different requests that go to the same bank. The key observation exploited by our mechanisms is that a modern DRAM bank is implemented as a collection of subarrays that operate largely independently while sharing few global peripheral structures. Our three proposed mechanisms mitigate the negative impact of bank serialization by overlapping different components of the bank access latencies of multiple requests that go to different subarrays within the same bank. SALP-1 requires no changes to the existing DRAM structure, and needs to only reinterpret some of the existing DRAM timing parameters. SALP-2 and MASA require only modest changes (< 0.15% area overhead) to the DRAM peripheral structures, which are much less design constrained than the DRAM core. Our evaluations show that SALP-1, SALP-2 and MASA significantly improve performance for both single-core systems (7%/13%/17%) and multi-core systems (15%/16%/20%), averaged across a wide range of workloads. We also demonstrate that our mechanisms can be combined with application-aware memory request scheduling in multicore systems to further improve performance and fairness.
- Published
- 2018
19. Purification and Detection of Ubiquitinated Plant Proteins Using Tandem Ubiquitin Binding Entities
- Author
- Lee, DongHyuk and Coaker, Gitta
- Published
- 2022
- Full Text
- View/download PDF
20. Improving DRAM Performance by Parallelizing Refreshes with Accesses
- Author
- Chang, Kevin K., Lee, Donghyuk, Chishti, Zeshan, Alameldeen, Alaa R., Wilkerson, Chris, Kim, Yoongu, and Mutlu, Onur
- Subjects
- Computer Science - Hardware Architecture
- Abstract
Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests while being refreshed. DRAM designed for mobile platforms, LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes cells at the bank level. This enables a bank to be accessed while another in the same rank is being refreshed, alleviating part of the negative performance impact of refreshes. However, there are two shortcomings of per-bank refresh. First, the per-bank refresh scheduling scheme does not exploit the full potential of overlapping refreshes with accesses across banks because it restricts the banks to be refreshed in a sequential round-robin order. Second, accesses to a bank that is being refreshed have to wait. To mitigate the negative performance impact of DRAM refresh, we propose two complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and SARP (Subarray Access Refresh Parallelization). The goal is to address the drawbacks of per-bank refresh by building more efficient techniques to parallelize refreshes and accesses within DRAM. First, instead of issuing per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes during intervals when a batch of writes are draining to DRAM. Second, SARP exploits the existence of mostly-independent subarrays within a bank. With minor modifications to DRAM organization, it allows a bank to serve memory accesses to an idle subarray while another subarray is being refreshed. 
Extensive evaluations show that our mechanisms improve system performance and energy efficiency compared to state-of-the-art refresh policies, and that the benefit increases as DRAM density increases.
- Comment: The original paper published in the International Symposium on High-Performance Computer Architecture (HPCA) contains an error. The arXiv version has an erratum that describes the error and the fix for it.
- Published
- 2017
- Full Text
- View/download PDF
21. GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies
- Author
- Kim, Jeremie S., Cali, Damla Senol, Xin, Hongyi, Lee, Donghyuk, Ghose, Saugata, Alser, Mohammed, Hassan, Hasan, Ergin, Oguz, Alkan, Can, and Mutlu, Onur
- Subjects
- Quantitative Biology - Genomics, Computer Science - Computational Engineering, Finance, and Science
- Abstract
Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. Results: We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x--6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x--3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. Availability: The code is available online at: https://github.com/CMU-SAFARI/GRIM
- Comment: arXiv admin note: text overlap with arXiv:1708.04329
- Published
- 2017
- Full Text
- View/download PDF
22. GRIM-filter: fast seed filtering in read mapping using emerging memory technologies
- Author
- Kim, Jeremie S, Senol, Damla, Xin, Hongyi, Lee, Donghyuk, Ghose, Saugata, Alser, Mohammed, Hassan, Hasan, Ergin, Oguz, Alkan, Can, and Mutlu, Onur
- Subjects
- Quantitative Biology - Genomics
- Abstract
Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly generate possible mapping locations (i.e., seeds) for each read, 2) extract reference sequences at each of the mapping locations, and then 3) check similarity between each read and its associated reference sequences with a computationally expensive dynamic programming algorithm (alignment) to determine the origin of the read. Location filters come into play before alignment, discarding seed locations that alignment would have deemed a poor match. The ideal location filter would discard all poor matching locations prior to alignment such that there is no wasted computation on poor alignments. Results: We propose a novel filtering algorithm, GRIM-Filter, optimized to exploit emerging 3D-stacked memory systems that integrate computation within a stacked logic layer, enabling processing-in-memory (PIM). GRIM-Filter quickly filters locations by 1) introducing a new representation of coarse-grained segments of the reference genome and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for 5% error acceptance rates, GRIM-Filter eliminates 5.59x-6.41x more false negatives and exhibits end-to-end speedups of 1.81x-3.65x compared to mappers employing the best previous filtering algorithm.
- Published
- 2017
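The coarse-grained filtering idea this abstract describes can be sketched in a few lines. The sketch below is illustrative only: `bin_size`, `k`, and `min_hits` are invented parameters, and Python sets stand in for the paper's per-bin bitvectors.

```python
# Toy sketch of GRIM-Filter's bin-based seed location filtering (illustrative
# only: bin_size, k, and min_hits are invented parameters, and Python sets
# stand in for the paper's per-bin bitvectors).

def build_bins(reference, bin_size=16, k=4):
    """Record, for each coarse-grained bin of the reference, which k-mers
    occur in it."""
    bins = []
    for start in range(0, len(reference), bin_size):
        # Overlap bins by k-1 bases so k-mers spanning a boundary are kept.
        seg = reference[start:start + bin_size + k - 1]
        bins.append({seg[i:i + k] for i in range(len(seg) - k + 1)})
    return bins

def passes_filter(read, bin_kmers, k=4, min_hits=0.8):
    """Accept a candidate bin only if enough of the read's k-mers occur in
    it; locations in rejected bins are discarded before costly alignment."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    hits = sum(1 for km in kmers if km in bin_kmers)
    return hits >= min_hits * len(kmers)

bins = build_bins("ACGTACGTGGCCAATTACGTACGT")
```

In the paper, the per-bin presence check runs as massively-parallel bitwise operations in the logic layer of 3D-stacked memory; the set membership test above plays the same role sequentially.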
23. Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms
- Author
-
Chang, Kevin K., Yağlıkçı, Abdullah Giray, Ghose, Saugata, Agrawal, Aditya, Chatterjee, Niladrish, Kashyap, Abhijith, Lee, Donghyuk, O'Connor, Mike, Hassan, Hasan, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy. Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability. In this paper, we take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the supply voltage is lowered below the nominal voltage level specified by DRAM standards. Using an FPGA-based testing platform, we perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention. Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. 
Voltron reduces the average system energy by 7.3% while limiting the average system performance loss to only 1.8%, for a variety of workloads., Comment: 25 pages, 25 figures, 7 tables, Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS)
- Published
- 2017
- Full Text
- View/download PDF
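Voltron's key selection step, choosing the lowest supply voltage whose model-predicted performance loss stays under a user-specified threshold, can be sketched as follows. The voltage/loss pairs below are invented for illustration; the paper derives predicted loss from a performance model of the higher latencies each voltage requires.

```python
# Sketch of Voltron's selection step (illustrative: the voltage/loss pairs
# below are invented, not measurements; the paper derives predicted loss
# from a performance model of the higher latencies each voltage requires).

def pick_voltage(candidates, max_perf_loss):
    """candidates: (supply_voltage, predicted_perf_loss) pairs.
    Return the lowest voltage whose predicted loss stays within the
    user-specified threshold, or the nominal (highest) voltage if none do."""
    safe = [(v, loss) for v, loss in candidates if loss <= max_perf_loss]
    return min(safe)[0] if safe else max(candidates)[0]

table = [(1.35, 0.000), (1.25, 0.008), (1.15, 0.017), (1.05, 0.042)]
```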
24. The MAP4 Kinase SIK1 Ensures Robust Extracellular ROS Burst and Antibacterial Immunity in Plants
- Author
-
Zhang, Meixiang, Chiang, Yi-Hsuan, Toruño, Tania Y, Lee, DongHyuk, Ma, Miaomiao, Liang, Xiangxiu, Lal, Neeraj K, Lemos, Mark, Lu, Yi-Ju, Ma, Shisong, Liu, Jun, Day, Brad, Dinesh-Kumar, Savithramma P, Dehesh, Katayoon, Dou, Daolong, Zhou, Jian-Min, and Coaker, Gitta
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,1.1 Normal biological development and functioning ,Arabidopsis ,Arabidopsis Proteins ,Gene Expression Regulation ,Plant ,NADPH Oxidases ,Phosphorylation ,Plant Diseases ,Plant Immunity ,Protein Serine-Threonine Kinases ,Pseudomonas syringae ,Reactive Oxygen Species ,BIK1 ,MAP4K ,PTI ,RBOHD ,ROS ,SIK1 ,autoimmunity ,phosphorylation ,Microbiology ,Medical Microbiology ,Immunology ,Biochemistry and cell biology ,Medical microbiology - Abstract
Microbial patterns are recognized by cell-surface receptors to initiate pattern-triggered immunity (PTI) in plants. Receptor-like cytoplasmic kinases (RLCKs), such as BIK1, and calcium-dependent protein kinases (CPKs) are engaged during PTI to activate the NADPH oxidase RBOHD for reactive oxygen species (ROS) production. It is unknown whether protein kinases besides CPKs and RLCKs participate in RBOHD regulation. We screened mutants in all ten Arabidopsis MAP4 kinases (MAP4Ks) and identified the conserved MAP4K SIK1 as a positive regulator of PTI. sik1 mutants were compromised in their ability to elicit the ROS burst in response to microbial features and exhibited compromised PTI to bacterial infection. SIK1 directly interacts with, phosphorylates, and stabilizes BIK1 in a kinase activity-dependent manner. Furthermore, SIK1 directly interacts with and phosphorylates RBOHD upon flagellin perception. Thus, SIK1 positively regulates immunity by stabilizing BIK1 and activating RBOHD to promote the extracellular ROS burst.
- Published
- 2018
25. Comparison of somatic mutation landscapes in Chinese versus European breast cancer patients
- Author
-
Zhu, Bin, Joo, Lijin, Zhang, Tongwu, Koka, Hela, Lee, DongHyuk, Shi, Jianxin, Lee, Priscilla, Wang, Difei, Wang, Feng, Chan, Wing-cheong, Law, Sze Hong, Tsoi, Yee-kei, Tse, Gary M., Lai, Shui Wun, Wu, Cherry, Yang, Shuyuan, Yang Chan, Emily Ying, Shan Wong, Samuel Yeung, Wang, Mingyi, Song, Lei, Jones, Kristine, Hutchinson, Amy, Hicks, Belynda, Prokunina-Olsson, Ludmila, Garcia-Closas, Montserrat, Chanock, Stephen, Tse, Lap Ah, and Yang, Xiaohong R.
- Published
- 2022
- Full Text
- View/download PDF
26. Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM
- Author
-
Seshadri, Vivek, Lee, Donghyuk, Mullins, Thomas, Hassan, Hasan, Boroumand, Amirali, Kim, Jeremie, Kozuch, Michael A., Mutlu, Onur, Gibbons, Phillip B., and Mowry, Todd C.
- Subjects
Computer Science - Hardware Architecture - Abstract
Bitwise operations are an important component of modern-day programming. Many widely-used data structures (e.g., bitmap indices in databases) rely on fast bitwise operations on large bit vectors to achieve high performance. Unfortunately, in existing systems, regardless of the underlying architecture (e.g., CPU, GPU, FPGA), the throughput of such bulk bitwise operations is limited by the available memory bandwidth. We propose Buddy, a new mechanism that exploits the analog operation of DRAM to perform bulk bitwise operations completely inside the DRAM chip. Buddy consists of two components. First, simultaneous activation of three DRAM rows that are connected to the same set of sense amplifiers enables us to perform bitwise AND and OR operations. Second, the inverters present in each sense amplifier enable us to perform bitwise NOT operations, with modest changes to the DRAM array. These two components make Buddy functionally complete. Our implementation of Buddy largely exploits the existing DRAM structure and interface, and incurs low overhead (1% of DRAM chip area). Our evaluations based on SPICE simulations show that, across seven commonly-used bitwise operations, Buddy provides between 10.9X--25.6X improvement in raw throughput and 25.1X--59.5X reduction in energy consumption. We evaluate three real-world data-intensive applications that exploit bitwise operations: 1) bitmap indices, 2) BitWeaving, and 3) bitvector-based implementation of sets. Our evaluations show that Buddy significantly outperforms the state-of-the-art., Comment: arXiv admin note: text overlap with arXiv:1605.06483
- Published
- 2016
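A software analogue of the bulk bitwise operations Buddy accelerates, applied to the bitmap-index use case the abstract mentions, looks like this. The bitmaps and predicate names are invented for illustration; Buddy computes the same operators on entire DRAM rows at once, AND/OR via triple-row activation and NOT via the sense-amplifier inverters.

```python
# Software analogue of the bulk bitwise operations Buddy performs inside
# DRAM (illustrative: the bitmaps and predicate names are invented; Buddy
# computes these on whole rows at once, AND/OR via triple-row activation
# and NOT via the sense-amplifier inverters).

# A tiny bitmap index: one bit per record, one bitmap per predicate.
is_active  = 0b10110110
is_premium = 0b11010100
WIDTH_MASK = 0xFF  # 8 records

active_premium = is_active & is_premium   # records matching both predicates
either_one     = is_active | is_premium   # records matching either predicate
inactive       = ~is_active & WIDTH_MASK  # complement, masked to record count
```

On a CPU each of these operators streams both operands through the memory bus; Buddy's point is that the same results can be produced without the data ever leaving the DRAM chip.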
27. Understanding and Exploiting Design-Induced Latency Variation in Modern DRAM Chips
- Author
-
Lee, Donghyuk, Khan, Samira, Subramanian, Lavanya, Ghose, Saugata, Ausavarungnirun, Rachata, Pekhimenko, Gennady, Seshadri, Vivek, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
Variation has been shown to exist across the cells within a modern DRAM chip. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM chip. Our goals are to understand design-induced variation that exists in real, state-of-the-art DRAM chips, exploit it to develop low-cost mechanisms that can dynamically find and use the lowest latency at which to operate a DRAM chip reliably, and, thus, improve overall system performance while ensuring reliable system operation. To this end, we first experimentally demonstrate and analyze design-induced variation in modern DRAM devices by testing and characterizing 96 DIMMs (768 DRAM chips). Our characterization identifies DRAM regions that are vulnerable to errors if operated at lower latency, and finds consistency in their locations across a given DRAM chip generation, due to design-induced variation. Based on our extensive experimental analysis, we develop two mechanisms that reliably reduce DRAM latency. First, DIVA Profiling uses runtime profiling to dynamically identify the lowest DRAM latency that does not introduce failures. DIVA Profiling exploits design-induced variation and periodically profiles only the vulnerable regions to determine the lowest DRAM latency at low cost. Our second mechanism, DIVA Shuffling, shuffles data such that values stored in vulnerable regions are mapped to multiple error-correcting code (ECC) codewords. Combined together, our two mechanisms reduce read/write latency by 40.0%/60.5%, which translates to an overall system performance improvement of 14.7%/13.7%/13.8% (in 2-/4-/8-core systems) across a variety of workloads, while ensuring reliable operation., Comment: This paper is a two column version of the paper, D. Lee et al., "Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms", SIGMETRICS 2017
- Published
- 2016
28. Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity
- Author
-
Lee, Donghyuk
- Subjects
Computer Science - Hardware Architecture - Abstract
In modern systems, DRAM-based main memory is significantly slower than the processor. Consequently, processors spend a long time waiting to access data from main memory, making the long main memory access latency one of the most critical bottlenecks to achieving high system performance. Unfortunately, the latency of DRAM has remained almost constant in the past decade. This is mainly because DRAM has been optimized for cost-per-bit, rather than access latency. As a result, DRAM latency is not reducing with technology scaling, and continues to be an important performance bottleneck in modern and future systems. This dissertation seeks to achieve low latency DRAM-based memory systems at low cost in three major directions. First, based on the observation that long bitlines in DRAM are one of the dominant sources of DRAM latency, we propose a new DRAM architecture, Tiered-Latency DRAM (TL-DRAM), which divides the long bitline into two shorter segments using an isolation transistor, allowing one segment to be accessed with reduced latency. Second, we propose a fine-grained DRAM latency reduction mechanism, Adaptive-Latency DRAM, which optimizes DRAM latency for the common operating conditions for individual DRAM module. Third, we propose a new technique, Architectural-Variation-Aware DRAM (AVA-DRAM), which reduces DRAM latency at low cost, by profiling and identifying only the inherently slower regions in DRAM to dynamically determine the lowest latency DRAM can operate at without causing failures. This dissertation provides a detailed analysis of DRAM latency by using both circuit-level simulation with a detailed DRAM model and FPGA-based profiling of real DRAM modules. Our latency analysis shows that our low latency DRAM mechanisms enable significant latency reductions, leading to large improvement in both system performance and energy efficiency., Comment: 159 pages, PhD thesis, CMU 2016
- Published
- 2016
29. Adaptive-Latency DRAM (AL-DRAM)
- Author
-
Lee, Donghyuk, Kim, Yoongu, Pekhimenko, Gennady, Khan, Samira, Seshadri, Vivek, Chang, Kevin, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which was published in HPCA 2015. The key goal of AL-DRAM is to exploit the extra margin that is built into the DRAM timing parameters to reduce DRAM latency. The key observation is that the timing parameters are dictated by the worst-case temperatures and worst-case DRAM cells, both of which lead to a small amount of charge storage and hence high access latency. One can therefore reduce latency by adapting the timing parameters to the current operating temperature and the current DIMM that is being accessed. Using an FPGA-based testing platform, our work first characterizes the extra margin for 115 DRAM modules from three major manufacturers. The experimental results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C while maintaining reliable operation. AL-DRAM adaptively selects between multiple different timing parameters for each DRAM module based on its current operating condition. AL-DRAM does not require any changes to the DRAM chip or its interface; it only requires multiple different timing parameters to be specified and supported by the memory controller. Real system evaluations show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors., Comment: This is a summary of the original paper, entitled "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case" which appears in HPCA 2015
- Published
- 2016
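The adaptive selection the abstract describes amounts to a per-DIMM lookup keyed by operating temperature. The sketch below is illustrative only: the timing values and temperature buckets are invented, whereas AL-DRAM obtains real values from FPGA-based characterization of each module.

```python
# Sketch of AL-DRAM's adaptive timing selection (illustrative: the profiled
# timing values and temperature buckets below are invented; real values come
# from FPGA-based characterization of each DIMM).

profiles = {
    "dimm0": {
        55: {"tRCD": 10.0, "tRAS": 28.0},    # reduced timings, safe up to 55 C
        85: {"tRCD": 13.125, "tRAS": 35.0},  # standard worst-case timings
    },
}

def select_timings(dimm, temp_c):
    """Pick the profiled timings for the smallest temperature bucket that
    covers the current operating temperature."""
    buckets = profiles[dimm]
    bucket = min(b for b in buckets if temp_c <= b)
    return buckets[bucket]
```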
30. Integrative molecular characterisation of gallbladder cancer reveals micro-environment-associated subtypes
- Author
-
Nepal, Chirag, Zhu, Bin, O’Rourke, Colm J., Bhatt, Deepak Kumar, Lee, Donghyuk, Song, Lei, Wang, Difei, Van Dyke, Alison L., Choo-Wosoba, Hyoyoung, Liu, Zhiwei, Hildesheim, Allan, Goldstein, Alisa M., Dean, Michael, LaFuente-Barquero, Juan, Lawrence, Scott, Mutreja, Karun, Olanich, Mary E., Lorenzo Bermejo, Justo, Ferreccio, Catterina, Roa, Juan Carlos, Rashid, Asif, Hsing, Ann W., Gao, Yu-Tang, Chanock, Stephen J., Araya, Juan Carlos, Andersen, Jesper B., and Koshiol, Jill
- Published
- 2021
- Full Text
- View/download PDF
31. Plasma Membrane Localized GCaMP-MS4A12 by Orai1 Co-Expression Shows Thapsigargin- and Ca2+-Dependent Fluorescence Increases
- Author
-
Han, Jung Woo, Heo, Woon, Lee, Donghyuk, Kang, Choeun, Kim, Hye-Yeon, Jun, Ikhyun, So, Insuk, Hur, Hyuk, Lee, Min Goo, Jung, Minkyu, and Kim, Joo Young
- Published
- 2021
- Full Text
- View/download PDF
32. RowHammer: Reliability Analysis and Security Implications
- Author
-
Kim, Yoongu, Daly, Ross, Kim, Jeremie, Fallin, Chris, Lee, Ji Hye, Lee, Donghyuk, Wilkerson, Chris, Lai, Konrad, and Mutlu, Onur
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Cryptography and Security - Abstract
As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to disturbance, a phenomenon in which different DRAM cells interfere with each other's operation. For the first time in academic literature, our ISCA paper exposes the existence of disturbance errors in commodity DRAM chips that are sold and used today. We show that repeatedly reading from the same address could corrupt data in nearby addresses. More specifically: When a DRAM row is opened (i.e., activated) and closed (i.e., precharged) repeatedly (i.e., hammered), it can induce disturbance errors in adjacent DRAM rows. This failure mode is popularly called RowHammer. We tested 129 DRAM modules manufactured within the past six years (2008-2014) and found 110 of them to exhibit RowHammer disturbance errors, the earliest of which dates back to 2010. In particular, all modules from the past two years (2012-2013) were vulnerable, which implies that the errors are a recent phenomenon affecting more advanced generations of process technology. Importantly, disturbance errors pose an easily-exploitable security threat since they are a breach of memory protection, wherein accesses to one page (mapped to one row) modify the data stored in another page (mapped to an adjacent row)., Comment: This is the summary of the paper titled "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" which appeared in ISCA in June 2014
- Published
- 2016
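The activate-and-precharge failure mode can be captured in a toy model. This is purely illustrative: the threshold is an assumed number, and real disturbance depends on circuit-level coupling between neighboring rows rather than a clean cutoff.

```python
# Toy model of the RowHammer failure mode (purely illustrative: the threshold
# is an assumed number, and real disturbance depends on circuit-level
# coupling between neighboring rows, not a clean cutoff).

HAMMER_THRESHOLD = 139_000  # assumed activations per refresh window

def rows_at_risk(activation_counts, neighbor_distance=1):
    """Given per-row activation counts within one refresh window, return the
    rows adjacent to any heavily hammered row (the likely flip victims)."""
    victims = set()
    for row, count in activation_counts.items():
        if count > HAMMER_THRESHOLD:
            victims.add(row - neighbor_distance)
            victims.add(row + neighbor_distance)
    return victims
```

The security implication in the abstract follows directly: the attacker only activates the aggressor row, yet the model flags its neighbors, rows the attacker never accessed, as the ones whose data may change.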
33. Tiered-Latency DRAM (TL-DRAM)
- Author
-
Lee, Donghyuk, Kim, Yoongu, Seshadri, Vivek, Liu, Jamie, Subramanian, Lavanya, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
This paper summarizes the idea of Tiered-Latency DRAM, which was published in HPCA 2013. The key goal of TL-DRAM is to provide low DRAM latency at low cost, a critical problem in modern memory systems. To this end, TL-DRAM introduces heterogeneity into the design of a DRAM subarray by segmenting the bitlines, thereby creating a low-latency, low-energy, low-capacity portion in the subarray (called the near segment), which is close to the sense amplifiers, and a high-latency, high-energy, high-capacity portion, which is farther away from the sense amplifiers. Thus, DRAM becomes heterogeneous with a small portion having lower latency and a large portion having higher latency. Various techniques can be employed to take advantage of the low-latency near segment and this new heterogeneous DRAM substrate, including hardware-based caching and software-based caching and memory allocation of frequently used data in the near segment. Evaluations with such simple techniques show significant performance and energy-efficiency benefits., Comment: This is a summary of the original paper, entitled "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture" which appears in HPCA 2013
- Published
- 2016
34. Reducing Performance Impact of DRAM Refresh by Parallelizing Refreshes with Accesses
- Author
-
Chang, Kevin Kai-Wei, Lee, Donghyuk, Chishti, Zeshan, Alameldeen, Alaa R., Wilkerson, Chris, Kim, Yoongu, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests while being refreshed. DRAM designed for mobile platforms, LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes cells at the bank level. This enables a bank to be accessed while another in the same rank is being refreshed, alleviating part of the negative performance impact of refreshes. However, there are two shortcomings of per-bank refresh. First, the per-bank refresh scheduling scheme does not exploit the full potential of overlapping refreshes with accesses across banks because it restricts the banks to be refreshed in a sequential round-robin order. Second, accesses to a bank that is being refreshed have to wait. To mitigate the negative performance impact of DRAM refresh, we propose two complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and SARP (Subarray Access Refresh Parallelization). The goal is to address the drawbacks of per-bank refresh by building more efficient techniques to parallelize refreshes and accesses within DRAM. First, instead of issuing per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes during intervals when a batch of writes are draining to DRAM. Second, SARP exploits the existence of mostly-independent subarrays within a bank. With minor modifications to DRAM organization, it allows a bank to serve memory accesses to an idle subarray while another subarray is being refreshed. Extensive evaluations show that our mechanisms improve system performance and energy efficiency compared to state-of-the-art refresh policies and the benefit increases as DRAM density increases., Comment: 3 pages, 3 figures
- Published
- 2016
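DARP's first idea, issuing per-bank refreshes to idle banks out of order instead of in a fixed round-robin sequence, reduces to a small scheduling decision. The sketch below is illustrative logic only, not the paper's hardware implementation.

```python
# Sketch of DARP's out-of-order per-bank refresh choice (illustrative
# scheduling logic only, not the paper's hardware implementation).

def pick_refresh_bank(pending, busy):
    """pending: banks still owing a refresh this window, in round-robin
    order; busy: banks currently serving memory requests. Baseline per-bank
    refresh always picks pending[0]; DARP skips ahead to an idle bank so
    refreshes overlap with accesses to the busy banks."""
    for bank in pending:
        if bank not in busy:
            return bank
    return pending[0]  # every pending bank is busy: fall back to round-robin
```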
35. Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface
- Author
-
Lee, Donghyuk, Pekhimenko, Gennady, Khan, Samira, Ghose, Saugata, and Mutlu, Onur
- Subjects
Computer Science - Hardware Architecture - Abstract
Limited memory bandwidth is a critical bottleneck in modern systems. 3D-stacked DRAM enables higher bandwidth by leveraging wider Through-Silicon-Via (TSV) channels, but today's systems cannot fully exploit them due to the limited internal bandwidth of DRAM. DRAM reads a whole row simultaneously from the cell array to a row buffer, but can transfer only a fraction of the data from the row buffer to peripheral IO circuit, through a limited and expensive set of wires referred to as global bitlines. In the presence of wider memory channels, the major bottleneck becomes the limited data transfer capacity through these global bitlines. Our goal in this work is to enable higher bandwidth in 3D-stacked DRAM without the increased cost of adding more global bitlines. We instead exploit otherwise-idle resources, such as global bitlines, already existing within the multiple DRAM layers by accessing the layers simultaneously. Our architecture, Simultaneous Multi Layer Access (SMLA), provides higher bandwidth by aggregating the internal bandwidth of multiple layers and transferring the available data at a higher IO frequency. To implement SMLA, simultaneous data transfer from multiple layers through the same IO TSVs requires coordination between layers to avoid channel conflict. We first study coordination by static partitioning, which we call Dedicated-IO, that assigns groups of TSVs to each layer. We then provide a simple, yet sophisticated mechanism, called Cascaded-IO, which enables simultaneous access to each layer by time-multiplexing the IOs. By operating at a frequency proportional to the number of layers, SMLA provides a higher bandwidth (4X for a four-layer stacked DRAM). Our evaluations show that SMLA provides significant performance improvement and energy reduction (55%/18% on average for multi-programmed workloads, respectively) over a baseline 3D-stacked DRAM with very low area overhead.
- Published
- 2015
36. The Blacklisting Memory Scheduler: Balancing Performance, Fairness and Complexity
- Author
-
Subramanian, Lavanya, Lee, Donghyuk, Seshadri, Vivek, Rastogi, Harsha, and Mutlu, Onur
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
In a multicore system, applications running on different cores interfere at main memory. This inter-application interference degrades overall system performance and unfairly slows down applications. Prior works have developed application-aware memory schedulers to tackle this problem. State-of-the-art application-aware memory schedulers prioritize requests of applications that are vulnerable to interference, by ranking individual applications based on their memory access characteristics and enforcing a total rank order. In this paper, we observe that state-of-the-art application-aware memory schedulers have two major shortcomings. First, such schedulers trade off hardware complexity in order to achieve high performance or fairness, since ranking applications with a total order leads to high hardware complexity. Second, ranking can unfairly slow down applications that are at the bottom of the ranking stack. To overcome these shortcomings, we propose the Blacklisting Memory Scheduler (BLISS), which achieves high system performance and fairness while incurring low hardware complexity, based on two observations. First, we find that, to mitigate interference, it is sufficient to separate applications into only two groups. Second, we show that this grouping can be efficiently performed by simply counting the number of consecutive requests served from each application. We evaluate BLISS across a wide variety of workloads/system configurations and compare its performance and hardware complexity with five state-of-the-art memory schedulers. Our evaluations show that BLISS achieves 5% better system performance and 25% better fairness than the best-performing previous scheduler while greatly reducing critical path latency and hardware area cost of the memory scheduler (by 79% and 43%, respectively), thereby achieving a good trade-off between performance, fairness and hardware complexity.
- Published
- 2015
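The grouping rule in the abstract (count consecutive served requests per application, blacklist past a threshold) is simple enough to sketch directly. The threshold value below is invented; the paper picks it empirically.

```python
# Sketch of the BLISS grouping rule (illustrative: the threshold is an
# invented value; the paper picks it empirically).

BLACKLIST_THRESHOLD = 4  # consecutive served requests before blacklisting

def blacklisted_apps(served_stream):
    """Replay the stream of application IDs whose requests were served and
    return every application that ever had BLACKLIST_THRESHOLD consecutive
    requests served; BLISS deprioritizes these interference-prone apps
    instead of maintaining a full ranking."""
    blacklisted, last, streak = set(), None, 0
    for app in served_stream:
        streak = streak + 1 if app == last else 1
        last = app
        if streak >= BLACKLIST_THRESHOLD:
            blacklisted.add(app)
    return blacklisted
```

A single counter and comparator per channel suffices for this rule, which is why BLISS avoids the hardware cost of enforcing a total rank order.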
37. Contact Scheduling-Based Video Streaming with Real-Time Events for Space Internet Over DTN
- Author
-
Kyung, Donggu, Lee, Donghyuk, Lee, Namhwa, Lim, Ducsun, Joe, Inwhee, Lee, Kyungrak, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Kim, Kuinam J., editor, and Kim, Hye-Young, editor
- Published
- 2020
- Full Text
- View/download PDF
38. DTN-SMTP: A Novel Mail Transfer Protocol with Minimized Interactions for Space Internet
- Author
-
Lee, Donghyuk, Kang, Jinyeong, Dahouda, Mwamba Kasongo, Joe, Inwhee, Lee, Kyungrak, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Gervasi, Osvaldo, editor, Murgante, Beniamino, editor, Misra, Sanjay, editor, Garau, Chiara, editor, Blečić, Ivan, editor, Taniar, David, editor, Apduhan, Bernady O., editor, Rocha, Ana Maria A.C., editor, Tarantino, Eufemia, editor, Torre, Carmelo Maria, editor, and Karaca, Yeliz, editor
- Published
- 2020
- Full Text
- View/download PDF
39. Two serine residues in Pseudomonas syringae effector HopZ1a are required for acetyltransferase activity and association with the host co-factor.
- Author
-
Ma, Ka-Wai, Jiang, Shushu, Hawara, Eva, Lee, DongHyuk, Pan, Songqin, Coaker, Gitta, Song, Jikui, and Ma, Wenbo
- Subjects
Pseudomonas syringae ,Ralstonia ,Arabidopsis ,Phytic Acid ,Acetyltransferases ,Serine ,Bacterial Proteins ,Arabidopsis Proteins ,Virulence Factors ,Virulence ,Plant Diseases ,Protein Processing ,Post-Translational ,Host-Pathogen Interactions ,Arabidopsis thaliana ,YopJ family type III effectors ,acetyltransferase ,bacterial virulence ,inositol hexakisphosphate ,stomatal aperture ,Emerging Infectious Diseases ,Infectious Diseases ,Aetiology ,2.1 Biological and endogenous factors ,Biological Sciences ,Agricultural and Veterinary Sciences ,Plant Biology & Botany - Abstract
Gram-negative bacteria inject type III secreted effectors (T3SEs) into host cells to manipulate the immune response. The YopJ family effector HopZ1a produced by the plant pathogen Pseudomonas syringae possesses acetyltransferase activity and acetylates plant proteins to facilitate infection. Using mass spectrometry, we identified a threonine residue, T346, as the main autoacetylation site of HopZ1a. Two neighboring serine residues, S349 and S351, are required for the acetyltransferase activity of HopZ1a in vitro and are indispensable for the virulence function of HopZ1a in Arabidopsis thaliana. Using proton nuclear magnetic resonance (NMR), we observed a conformational change of HopZ1a in the presence of inositol hexakisphosphate (IP6), which acts as a eukaryotic co-factor and significantly enhances the acetyltransferase activity of several YopJ family effectors. S349 and S351 are required for IP6-binding-mediated conformational change of HopZ1a. S349 and S351 are located in a conserved region in the C-terminal domain of YopJ family effectors. Mutations of the corresponding serine(s) in two other effectors, HopZ3 of P. syringae and PopP2 of Ralstonia solanacearum, also abolished their acetyltransferase activity. These results suggest that, in addition to the highly conserved catalytic residues, YopJ family effectors also require conserved serine(s) in the C-terminal domain for their enzymatic activity.
- Published
- 2015
40. Fast and accurate mapping of Complete Genomics reads
- Author
-
Lee, Donghyuk, Hormozdiari, Farhad, Xin, Hongyi, Hach, Faraz, Mutlu, Onur, and Alkan, Can
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,Bioengineering ,Biotechnology ,Good Health and Well Being ,Algorithms ,Electronic Data Processing ,Genome ,Human ,Genomics ,High-Throughput Nucleotide Sequencing ,Humans ,Sequence Alignment ,Sequence Analysis ,DNA ,Software ,Complete Genomics ,Read mapping ,Gapped reads ,High throughput sequencing - Abstract
Many recent advances in genomics and the expectations of personalized medicine are made possible thanks to the power of high-throughput sequencing (HTS) in sequencing large collections of human genomes. Tens of different sequencing technologies are currently available, and each HTS platform has different strengths and biases. This diversity makes it possible to use different technologies to correct for each other's shortcomings, but it also requires developing different algorithms for each platform due to the differences in data types and error models. The first problem to tackle in analyzing HTS data for resequencing applications is the read mapping stage, where many tools have been developed for the most popular HTS methods, but publicly available and open-source aligners are still lacking for the Complete Genomics (CG) platform. Unfortunately, Burrows-Wheeler-based methods are not practical for CG data due to the gapped nature of the reads generated by this method. Here we provide a sensitive read mapper (sirFAST) for the CG technology based on the seed-and-extend paradigm that can quickly map CG reads to a reference genome. We evaluate the performance and accuracy of sirFAST using both simulated and publicly available real data sets, showing high precision and recall rates.
- Published
- 2015
41. The Pseudomonas syringae Type III Effector HopF2 Suppresses Arabidopsis Stomatal Immunity
- Author
-
Hurley, Brenden, Lee, Donghyuk, Mott, Adam, Wilton, Michael, Liu, Jun, Liu, Yulu C, Angers, Stephane, Coaker, Gitta, Guttman, David S, and Desveaux, Darrell
- Subjects
ADP Ribose Transferases ,Arabidopsis ,Bacterial Proteins ,Host-Pathogen Interactions ,Plant Immunity ,Plant Stomata ,Plants ,Genetically Modified ,Proteomics ,Pseudomonas syringae ,Type III Secretion Systems ,General Science & Technology - Abstract
Pseudomonas syringae subverts plant immune signalling through injection of type III secreted effectors (T3SE) into host cells. The T3SE HopF2 can disable Arabidopsis immunity through its ADP-ribosyltransferase activity. Proteomic analysis of HopF2 interacting proteins identified a protein complex containing ATPases required for regulating stomatal aperture, suggesting HopF2 may manipulate stomatal immunity. Here we report HopF2 can inhibit stomatal immunity independent of its ADP-ribosyltransferase activity. Transgenic expression of HopF2 in Arabidopsis inhibits stomatal closing in response to P. syringae and increases the virulence of surface inoculated P. syringae. Further, transgenic expression of HopF2 inhibits flg22 induced reactive oxygen species production. Intriguingly, ADP-ribosyltransferase activity is dispensable for inhibiting stomatal immunity and flg22 induced reactive oxygen species. Together, this implies HopF2 may be a bifunctional T3SE with ADP-ribosyltransferase activity required for inhibiting apoplastic immunity and an independent function required to inhibit stomatal immunity.
- Published
- 2014
42. Accelerating Read Mapping with FastHASH
- Author
-
Xin, Hongyi, Lee, Donghyuk, Hormozdiari, Farhad, Yedkar, Samihan, Mutlu, Onur, and Alkan, Can
- Abstract
With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS. We propose a new algorithm, FastHASH, which drastically improves the performance of seed-and-extend, hash table based read mapping algorithms while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering and Cheap K-mer Selection. We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
- Published
- 2013
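The two FastHASH techniques named in the abstract can be sketched in simplified form: Cheap K-mer Selection picks the read's least-frequent k-mers from the hash table so that fewer candidate locations need verification, and Adjacency Filtering keeps only candidate locations where the selected k-mers occur at mutually consistent positions. This is a hedged toy illustration under simplifying assumptions (non-overlapping fixed-offset seeds, no edit-distance tolerance); the function names are hypothetical and the real mrFAST/FastHASH code differs.

```python
# Toy sketch of FastHASH's Cheap K-mer Selection and Adjacency
# Filtering (illustrative only; not the actual mrFAST implementation).

def build_kmer_index(reference: str, k: int) -> dict:
    """Hash table: each reference k-mer -> list of occurrence positions."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index

def cheap_kmer_selection(read: str, index: dict, k: int, n_seeds: int):
    """Pick the n_seeds non-overlapping k-mer offsets of the read with the
    fewest hash-table hits, minimizing candidates to verify."""
    seeds = [(len(index.get(read[off:off + k], [])), off)
             for off in range(0, len(read) - k + 1, k)]
    seeds.sort()
    return [off for _, off in seeds[:n_seeds]]

def adjacency_filter(read: str, index: dict, k: int, offsets):
    """Keep only candidate read start positions at which every selected
    k-mer occurs at a consistent (adjacent) reference location."""
    candidate_sets = []
    for off in offsets:
        positions = index.get(read[off:off + k], [])
        candidate_sets.append({pos - off for pos in positions})
    survivors = set.intersection(*candidate_sets) if candidate_sets else set()
    return sorted(s for s in survivors if s >= 0)
```

On a small example (reference `ACGTACGTACGTTTGA`, read `ACGTTTGA`, `k=4`), the frequent seed `ACGT` yields three candidate locations, but the rare seed `TTGA` yields one; Cheap K-mer Selection would prefer the rare seed, and the adjacency filter intersects the two candidate sets down to the single consistent start at position 8.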
43. A Stable Tele-operation of a Mobile Robot with the Haptic Feedback
- Author
-
Lee, Jangmyung, Yoon, Hanuel, and Lee, Donghyuk
- Published
- 2018
- Full Text
- View/download PDF
44. Cooperative control system of the floating cranes for the dual lifting
- Author
-
Nam, Mihee, Kim, Jinbeom, Lee, Jaechang, Kim, Daekyung, Lee, Donghyuk, and Lee, Jangmyung
- Published
- 2018
- Full Text
- View/download PDF
45. Supplemental Table 1 from Nitrated Polycyclic Aromatic Hydrocarbon (Nitro-PAH) Signatures and Somatic Mutations in Diesel Exhaust-Exposed Bladder Tumors
- Author
-
Gonzalez, Nicole, primary, Rao, Nina, primary, Dean, Michael, primary, Lee, Donghyuk, primary, Hurson, Amber N., primary, Baris, Dalsu, primary, Schwenn, Molly, primary, Johnson, Alison, primary, Prokunina-Olsson, Ludmila, primary, Friesen, Melissa C., primary, Zhu, Bin, primary, Rothman, Nathaniel, primary, Silverman, Debra T., primary, and Koutros, Stella, primary
- Published
- 2023
- Full Text
- View/download PDF
46. Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing
- Author
-
Pellauer, Michael, primary, Clemons, Jason, additional, Balaji, Vignesh, additional, Crago, Neal, additional, Jaleel, Aamer, additional, Lee, Donghyuk, additional, O’Connor, Mike, additional, Parashar, Angshuman, additional, Treichler, Sean, additional, Tsai, Po-An, additional, Keckler, Stephen W., additional, and Emer, Joel S., additional
- Published
- 2023
- Full Text
- View/download PDF
47. DTN-SMTP: A Novel Mail Transfer Protocol with Minimized Interactions for Space Internet
- Author
-
Lee, Donghyuk, primary, Kang, Jinyeong, additional, Dahouda, Mwamba Kasongo, additional, Joe, Inwhee, additional, and Lee, Kyungrak, additional
- Published
- 2020
- Full Text
- View/download PDF
48. Contact Scheduling-Based Video Streaming with Real-Time Events for Space Internet Over DTN
- Author
-
Kyung, Donggu, primary, Lee, Donghyuk, additional, Lee, Namhwa, additional, Lim, Ducsun, additional, Joe, Inwhee, additional, and Lee, Kyungrak, additional
- Published
- 2019
- Full Text
- View/download PDF
49. Supplemental Table 5 from Nitrated Polycyclic Aromatic Hydrocarbon (Nitro-PAH) Signatures and Somatic Mutations in Diesel Exhaust-Exposed Bladder Tumors
- Author
-
Gonzalez, Nicole, primary, Rao, Nina, primary, Dean, Michael, primary, Lee, Donghyuk, primary, Hurson, Amber N., primary, Baris, Dalsu, primary, Schwenn, Molly, primary, Johnson, Alison, primary, Prokunina-Olsson, Ludmila, primary, Friesen, Melissa C., primary, Zhu, Bin, primary, Rothman, Nathaniel, primary, Silverman, Debra T., primary, and Koutros, Stella, primary
- Published
- 2023
- Full Text
- View/download PDF
50. Mutations in the HPV16 genome induced by APOBEC3 are associated with viral clearance
- Author
-
Zhu, Bin, Xiao, Yanzi, Yeager, Meredith, Clifford, Gary, Wentzensen, Nicolas, Cullen, Michael, Boland, Joseph F., Bass, Sara, Steinberg, Mia K., Raine-Bennett, Tina, Lee, DongHyuk, Burk, Robert D., Pinheiro, Maisa, Song, Lei, Dean, Michael, Nelson, Chase W., Burdett, Laurie, Yu, Kai, Roberson, David, Lorey, Thomas, Franceschi, Silvia, Castle, Philip E., Walker, Joan, Zuna, Rosemary, Schiffman, Mark, and Mirabello, Lisa
- Published
- 2020
- Full Text
- View/download PDF