768 results for "Translation lookaside buffer"
Search Results
2. Pinning Page Structure Entries to Last-Level Cache for Fast Address Translation
- Author
-
Osang Kwon, Yongho Lee, and Seokin Hong
- Subjects
Address translation, page walk, translation lookaside buffer, virtual memory
- Abstract
As the memory footprint of emerging applications continues to grow, address translation becomes a critical performance bottleneck owing to frequent misses in the Translation Lookaside Buffer (TLB). The TLB miss penalty is also becoming more severe because the hierarchical (radix) page table is gaining levels to extend the address space. To reduce TLB misses, modern high-performance processors employ a multi-level TLB structure with a large last-level TLB. A large last-level TLB reduces misses, but its capacity is still limited and it incurs chip-area overhead. In this paper, we propose a Page Structure Entry (PSE) pinning mechanism that provides a large PSE store by dedicating some space in the last-level cache to page structure entries only. PSE Pinning is based on three key observations. First, memory-intensive applications suffer frequent misses in the last-level cache, so most of its space is poorly utilized. Second, most PSEs are fetched from main memory during the page table walk, meaning the cache lines holding PSEs are frequently evicted from on-chip caches. Finally, a small number of PSEs are accessed frequently while the rest are not. Exploiting these observations, PSE Pinning pins the frequently accessed page structure entries in the last-level cache so that they stay resident. Experimental results show that PSE Pinning improves the performance of memory-intensive workloads suffering frequent L2 TLB misses by 7.8% on average. A minimal sketch of the pinning policy follows this entry.
- Published
- 2022
- Full Text
- View/download PDF
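The abstract above reduces to a policy decision made during page walks: count accesses to page structure entries (PSEs) and keep the hot ones resident in the last-level cache. The sketch below is a minimal software model of that decision only; the slot table, hash, and threshold are invented for illustration and are not the paper's parameters.

```c
/* Toy model of PSE pinning: count page-walk accesses to page-structure
 * entries and pin the hot ones so a modeled LLC never evicts them.
 * All names and thresholds here are illustrative. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define N_PSE         64      /* tracked page-structure entries        */
#define PIN_THRESHOLD 8       /* accesses before an entry gets pinned  */

struct pse_slot {
    uint64_t pa;              /* physical address of the PSE           */
    unsigned hits;            /* access counter from page walks        */
    bool     pinned;          /* excluded from LLC eviction if set     */
};

static struct pse_slot table[N_PSE];

/* Called on every simulated page-walk access to a PSE. */
static void pse_touch(uint64_t pa)
{
    struct pse_slot *s = &table[(pa >> 3) % N_PSE];  /* trivial hash */
    if (s->pa != pa) { s->pa = pa; s->hits = 0; s->pinned = false; }
    if (++s->hits >= PIN_THRESHOLD && !s->pinned) {
        s->pinned = true;     /* a real design would mark the LLC line */
        printf("pinning PSE at %#llx after %u walks\n",
               (unsigned long long)pa, s->hits);
    }
}

int main(void)
{
    /* A hot PSE accessed repeatedly crosses the threshold and is pinned. */
    for (int i = 0; i < 10; i++)
        pse_touch(0x1000);
    return 0;
}
```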
3. Accelerating Address Translation for Virtualization by Leveraging Hardware Mode.
- Author
-
Sha, Sai, Zhang, Yi, Luo, Yingwei, Wang, Xiaolin, and Wang, Zhenlin
- Subjects
- *WALKING speed, *HARDWARE, *VIRTUAL machine systems
- Abstract
The overhead of memory virtualization remains nontrivial. Traditional shadow paging (TSP) uses a shadow page table (SPT) to achieve native page walk speed, but page table updates require hypervisor intervention. Alternatively, nested paging enables low-overhead page table updates but relies on the hardware MMU to perform a long-latency two-dimensional page walk. This paper proposes new memory virtualization solutions based on hardware (machine) mode, the highest CPU privilege level in architectures such as Sunway and RISC-V. A programming interface running in hardware mode enables software implementation of hardware support functions. We first propose Software-based Nested Paging (SNP), which extends the software MMU to perform a two-dimensional page walk in hardware mode. Second, we present Swift Shadow Paging (SSP), which accomplishes page table synchronization by intercepting TLB flushes in hardware mode. Finally, we propose Accelerated Shadow Paging (ASP), which combines SSP and SNP. ASP handles last-level SPT page faults by walking the two-dimensional page tables in hardware mode, eliminating most hypervisor interventions. This paper systematically compares multiple memory virtualization models by analyzing their designs and evaluating their performance on both a real system and a simulator. The experiments show that the virtualization overhead of ASP is less than 4.5% for all workloads. [ABSTRACT FROM AUTHOR] A toy model of the two-dimensional walk follows this entry.
- Published
- 2022
- Full Text
- View/download PDF
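The "two-dimensional page walk" the abstract contrasts with shadow paging is easiest to see in code: every guest page-table access must itself be translated by the host before it can be read. Below is a deliberately tiny model, assuming two guest levels, a host dimension flattened to a single lookup, and "addresses" that are just array indices; all structures are invented for illustration.

```c
/* Toy nested (two-dimensional) page walk. Real hardware walks a
 * multi-level host table per step; here the host dimension is one
 * lookup so the structure stays visible. */
#include <stdint.h>
#include <stdio.h>

static uint64_t pmem[64 * 8];     /* host memory: 64 frames x 8 entries */
static uint64_t host_frame[64];   /* host map: guest frame -> host frame */

/* One step of the second dimension: guest-physical -> host-physical. */
static uint64_t g2h(uint64_t gpa)
{
    return host_frame[gpa / 8] * 8 + gpa % 8;
}

/* Two guest levels, 3 index bits each; every guest PTE read needs a
 * host translation first, and the final guest PA needs one more. */
static uint64_t nested_walk(uint64_t gva, uint64_t guest_root_gpa)
{
    uint64_t gpa = guest_root_gpa;
    for (int level = 0; level < 2; level++) {
        uint64_t idx = (gva >> (3 * (1 - level))) & 7;
        gpa = pmem[g2h(gpa) + idx];          /* guest PTE fetch */
    }
    return g2h(gpa);                         /* final translation */
}

int main(void)
{
    for (uint64_t f = 0; f < 64; f++) host_frame[f] = f; /* identity host map */
    pmem[0] = 8;        /* guest root, index 0 -> level-1 table at GPA 8 */
    pmem[8 + 5] = 21;   /* level-1 table, index 5 -> data at GPA 21      */
    printf("gva 5 -> hpa %llu\n",
           (unsigned long long)nested_walk(5, 0));       /* prints 21 */
    return 0;
}
```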
4. Concurrency Control Algorithms for Translation Lookaside Buffer
- Author
-
Agarwal, Manisha, Jailia, Manisha, Kacprzyk, Janusz, Series Editor, Fong, Simon, editor, Akashe, Shyam, editor, and Mahalle, Parikshit N., editor
- Published
- 2019
- Full Text
- View/download PDF
5. MemCAM: A Hybrid Memristor-CMOS CAM Cell for On-Chip Caches
- Author
-
Zareen Sadiq and Shehzad Hasan
- Subjects
Memristor content-addressable memory, memristor crossbar, translation lookaside buffer, miss rate reduction
- Abstract
Non-volatile nanoscale memory devices such as memristors promise to overcome the scalability and leakage-current challenges of CMOS-based memory devices. These novel memories can be fabricated in the back end of line of any CMOS process. Currently, much research focuses on the benefits of memristors for associative memories, i.e., Content-Addressable Memories (CAM), in which data access is search-based. Searching for a particular bit in a memristor is time-consuming, while search in the CMOS CAM zone is efficient. To combine the speed and ease of search of CMOS memory with the scalability of memristor memory, we present a novel multibit hybrid CMOS-memristor associative memory cell. The benefits of such memory cells manifest in on-chip caches: the instruction and data caches, the Branch Target Buffer, and the Translation Lookaside Buffer. To further exemplify the benefit of the cell, we also simulate MemCAM as the TLB of an ARM processor and obtained up to a 50% decrease in Data TLB miss rates and up to 93% in Instruction TLB miss rates. An average speedup of 1.16 was also achieved on various benchmark applications from the PARSEC and MiBench suites.
- Published
- 2021
- Full Text
- View/download PDF
6. Page Table Compaction for TLB Coalescing
- Author
-
Jae Young Hur and Joonho Kong
- Subjects
Architecture, memory management, page table, performance, translation lookaside buffer
- Abstract
In the traditional page-based memory management scheme, frequent page-table walks degrade performance and memory bandwidth utilization. A translation lookaside buffer (TLB) coalescing scheme mitigates these problems by using the TLB efficiently and exploiting contiguity in physical memory. In modern hardware, a memory transaction usually accesses multiple data concurrently, yet state-of-the-art TLB coalescing schemes do not fully utilize this data-level parallelism, so performance and memory bandwidth utilization can still be degraded by page-table walk overheads. To alleviate these overheads, we propose compaction of allocated memory blocks (CAMB) in the page table. The proposed scheme significantly reduces page-table walks by exploiting the data-level parallelism in hardware and block-level allocation in the operating system. We present a design, an analysis, a case study, an implementation, and an evaluation, with experiments on image processing workloads. The results indicate the presented scheme improves performance and memory bandwidth utilization at modest cost. A small contiguity-check sketch follows this entry.
- Published
- 2020
- Full Text
- View/download PDF
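TLB coalescing, which the entry above builds on, hinges on a simple contiguity test: a group of adjacent virtual pages whose PTEs map adjacent physical frames can share one coalesced TLB entry. A minimal sketch, assuming a fixed aligned group size; the paper's actual compaction scheme is more involved.

```c
/* Contiguity test behind TLB coalescing: K adjacent virtual pages
 * mapping K adjacent physical frames can share one coalesced entry. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define COALESCE 4   /* pages per coalesced entry (aligned group) */

/* pte[vpn] holds the physical frame number for virtual page vpn. */
static bool coalescable(const uint64_t *pte, uint64_t vpn)
{
    uint64_t base = vpn & ~(uint64_t)(COALESCE - 1);  /* group start */
    for (unsigned i = 1; i < COALESCE; i++)
        if (pte[base + i] != pte[base] + i)  /* frames must be contiguous */
            return false;
    return true;
}

int main(void)
{
    uint64_t pte[8] = { 100, 101, 102, 103, 200, 999, 202, 203 };
    printf("group of vpn 1 coalescable: %d\n", coalescable(pte, 1)); /* 1 */
    printf("group of vpn 5 coalescable: %d\n", coalescable(pte, 5)); /* 0 */
    return 0;
}
```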
7. ReconOS
- Author
-
Agne, Andreas, Platzner, Marco, Plessl, Christian, Happe, Markus, Lübbers, Enno, Koch, Dirk, editor, Hannig, Frank, editor, and Ziener, Daniel, editor
- Published
- 2016
- Full Text
- View/download PDF
8. Building Code Randomization Defenses
- Author
-
Davi, Lucas, Sadeghi, Ahmad-Reza, Zdonik, Stan, Series editor, Shekhar, Shashi, Series editor, Katz, Jonathan, Series editor, Wu, Xindong, Series editor, Jain, Lakhmi C., Series editor, Padua, David, Series editor, Shen, Xuemin Sherman, Series editor, Furht, Borko, Series editor, Subrahmanian, V.S., Series editor, Hebert, Martial, Series editor, Ikeuchi, Katsushi, Series editor, Siciliano, Bruno, Series editor, Jajodia, Sushil, Series editor, Lee, Newton, Series editor, Davi, Lucas, and Sadeghi, Ahmad-Reza
- Published
- 2015
- Full Text
- View/download PDF
9. Improved Tool Support for Machine-Code Decompilation in HOL4
- Author
-
Fox, Anthony, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Urban, Christian, editor, and Zhang, Xingyuan, editor
- Published
- 2015
- Full Text
- View/download PDF
10. BIOS and Management Firmware
- Author
-
Gough, Corey, Steiner, Ian, Saunders, Winston, Gough, Corey, Steiner, Ian, and Saunders, Winston
- Published
- 2015
- Full Text
- View/download PDF
11. Making Information Hiding Effective Again
- Author
-
Yueqiang Cheng, Chenggang Wu, Kang Yan, Yinqian Zhang, Bowen Tang, Zhe Wang, Mengyao Xie, Yuanming Lai, Zhiping Shi, and Pen-Chung Yew
- Subjects
Computer science, Information hiding, Code reuse, Translation lookaside buffer, Code (cryptography), Overhead (computing), Cache, Side channel attack, Electrical and Electronic Engineering, Computer security, Block (data storage)
- Abstract
Information hiding (IH) is an important building block for many defenses against code reuse attacks, such as code-pointer integrity (CPI), control-flow integrity (CFI), and fine-grained code (re-)randomization, because of its effectiveness and performance. It employs randomization to probabilistically "hide" sensitive memory areas, called safe areas, from attackers and ensures their addresses are not leaked by any pointer directly. These defenses use safe areas to protect critical data such as jump targets and randomization secrets. However, recent work has shown that IH is vulnerable to various attacks. In this paper, we propose a new IH technique called SafeHidden. It continuously re-randomizes the locations of safe areas, preventing attackers from probing and inferring the memory layout to find them. A new thread-private memory mechanism isolates thread-local safe areas and prevents adversaries from reducing the randomization entropy. SafeHidden also re-randomizes the safe areas after TLB misses to prevent attackers from inferring their addresses through cache side channels. Existing IH-based defenses can adopt SafeHidden directly without any change. Our experiments show that SafeHidden not only prevents existing attacks effectively but also incurs low performance overhead. A user-level sketch of the relocation step follows this entry.
- Published
- 2022
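The relocation step referenced in the abstract can be illustrated at user level with ordinary Linux primitives: allocate a fresh anonymous mapping at a kernel-chosen (randomized) address, copy the safe area over, and unmap the old one. This shows only the visible effect; SafeHidden's actual triggers (TLB-miss monitoring, thread-private memory) are hardware/kernel mechanisms omitted here.

```c
/* Relocating a "safe area" to a fresh random address, sketched with
 * plain mmap/munmap. Linux-specific; error handling kept minimal. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define AREA_SZ 4096

static void *rerandomize(void *old)
{
    /* mmap with addr=NULL lets the kernel pick a randomized location. */
    void *fresh = mmap(NULL, AREA_SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (fresh == MAP_FAILED)
        return old;
    memcpy(fresh, old, AREA_SZ);   /* preserve the secrets it holds */
    munmap(old, AREA_SZ);          /* old location is now unmapped  */
    return fresh;
}

int main(void)
{
    void *area = mmap(NULL, AREA_SZ, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return 1;
    strcpy((char *)area, "randomization secret");
    printf("safe area at %p\n", area);
    area = rerandomize(area);
    printf("safe area moved to %p: %s\n", area, (char *)area);
    return 0;
}
```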
12. Data Layout in Main Memory
- Author
-
Plattner, Hasso and Plattner, Hasso
- Published
- 2014
- Full Text
- View/download PDF
13. Enhancing Instruction TLB Resilience to Soft Errors.
- Author
-
Sanchez-Macian, Alfonso, Aranda, Luis Alberto, Reviriego, Pedro, Kiani, Vahdaneh, and Maestro, Juan Antonio
- Subjects
- *CACHE memory, *SOFT errors, *DATA corruption, *ERROR correction (Information theory), *VIRTUAL private networks
- Abstract
A translation lookaside buffer (TLB) is a type of cache used to speed up virtual-to-physical memory translation. Instruction TLBs store virtual page numbers and their corresponding physical page numbers for the most recently accessed pages of instruction memory. TLBs, like other memories, suffer soft errors that can corrupt their contents. A false positive caused by an error in a virtual page number stored in the TLB may lead to a wrong translation and, consequently, execution of a wrong instruction, which can cause a program hard fault or data corruption. Parity and error correction codes have been proposed to protect the TLB, but they require additional storage. This paper presents schemes that increase instruction TLB resilience to such errors without any extra storage, by taking advantage of the spatial locality that arises when executing a program. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
14. Improving Instruction TLB Reliability with Efficient Multi-bit Soft Error Protection.
- Author
-
Kiani, Vahdaneh and Reviriego, Pedro
- Subjects
- *SOFT errors, *CACHE memory, *VIRTUAL storage (Computer science), *FAULT-tolerant computing, *STATISTICAL reliability
- Abstract
A Translation Lookaside Buffer (TLB) is a memory cache that stores recent virtual-to-physical translations to reduce access latency. Every access to virtual memory must be translated to the corresponding physical address, so the TLB is accessed very frequently. Consequently, soft errors that corrupt TLB contents can lead to hard faults, silent data corruption, and system freezes. Many studies have proposed protection for the Content Addressable Memory (CAM), the part of a TLB that stores the virtual page numbers (VPNs), but these techniques mostly do not cover multiple errors. This paper presents an efficient, fast, high-coverage approach to improving TLB reliability against Multiple Bit Upsets (MBUs) that also improves performance at low cost. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. Diversifying the Software Stack Using Randomized NOP Insertion
- Author
-
Jackson, Todd, Homescu, Andrei, Crane, Stephen, Larsen, Per, Brunthaler, Stefan, Franz, Michael, Jajodia, Sushil, editor, Ghosh, Anup K., editor, Subrahmanian, V.S., editor, Swarup, Vipin, editor, Wang, Cliff, editor, and Wang, X. Sean, editor
- Published
- 2013
- Full Text
- View/download PDF
16. Xeon Phi Core Microarchitecture
- Author
-
Rahman, Rezaur and Rahman, Rezaur
- Published
- 2013
- Full Text
- View/download PDF
17. Superprocessors and Supercomputers
- Author
-
Roth, Peter Hans, Jacobi, Christian, Weber, Kai, and Hoefflinger, Bernd, editor
- Published
- 2012
- Full Text
- View/download PDF
18. Efficient classification of private memory blocks
- Author
-
Bhargavi R. Upadhyay, Alberto Ros, and Jalpa Shah
- Subjects
Multi-core processor, Computer Networks and Communications, Computer science, CPU cache, Translation lookaside buffer, Directory, Theoretical Computer Science, Computer architecture, Shared memory, Artificial Intelligence, Hardware and Architecture, Granularity, Latency (engineering), Software
- Abstract
Shared memory architectures are pervasive in the multicore era, yet sequential and parallel applications use most of their data as private. Recent proposals exploiting this observation, driven by a private/shared classification of memory data, can reduce the coherence directory area or the memory access latency. The effectiveness of these proposals depends on the accuracy of the classification. Existing proposals classify at page granularity, which misclassifies data and reduces the number of detected private memory blocks. We propose a mechanism that accurately classifies memory blocks using the existing translation lookaside buffers (TLBs), increasing the effectiveness of proposals that rely on a private/shared classification. Our experimental results show that the proposed scheme reduces L1 cache misses by 25% compared to a page-grain classification approach, which translates into an 8.0% improvement in system performance over the page-grain approach. A toy block-granularity classifier follows this entry.
- Published
- 2021
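At its simplest, the block-granularity classification referenced above tracks the first core that touches a block and promotes the block to shared when a different core appears. A toy model with invented table sizes; a real design derives this state from TLB contents rather than a standalone table.

```c
/* Toy private/shared classifier at block granularity. */
#include <stdio.h>

#define N_BLOCKS 16
enum state { UNTOUCHED, PRIVATE, SHARED };

static enum state st[N_BLOCKS];
static int owner[N_BLOCKS];

static void access_block(int block, int core)
{
    switch (st[block]) {
    case UNTOUCHED: st[block] = PRIVATE; owner[block] = core; break;
    case PRIVATE:   if (owner[block] != core) st[block] = SHARED; break;
    case SHARED:    break;                 /* stays shared forever */
    }
}

int main(void)
{
    access_block(3, 0); access_block(3, 0);   /* stays private   */
    access_block(7, 0); access_block(7, 1);   /* becomes shared  */
    printf("block 3: %s\n", st[3] == PRIVATE ? "private" : "shared");
    printf("block 7: %s\n", st[7] == PRIVATE ? "private" : "shared");
    return 0;
}
```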
19. Adjusting Switching Granularity of Load Balancing for Heterogeneous Datacenter Traffic
- Author
-
Weihe Li, Lyu Wenjun, Wenchao Jiang, Tian He, Jianxin Wang, Jiawei Huang, Zhaoyi Li, and Jinbin Hu
- Subjects
Computer Networks and Communications, Computer science, Network packet, Distributed computing, Translation lookaside buffer, Bisection bandwidth, Throughput, Load balancing (computing), Computer Science Applications, Load management, Bandwidth (computing), Granularity, Electrical and Electronic Engineering, Software
- Abstract
State-of-the-art datacenter load balancing designs commonly optimize bisection bandwidth with a homogeneous switching granularity. Their performance surprisingly degrades under mixed traffic containing both short and long flows: short flows suffer long-tailed delay, while the throughput of long flows degrades dramatically due to low link utilization and packet reordering. To solve these problems, we design a traffic-aware load balancing (TLB) scheme that adaptively adjusts the switching granularity of long flows according to the load of short ones. Under heavy short-flow load, long flows use a large switching granularity to give short flows more opportunities to choose short queues and complete quickly. Conversely, long flows reroute flexibly with a small switching granularity to achieve high throughput. Furthermore, under extremely bursty scenarios, we apply packet slicing to long flows to release bandwidth for short ones. NS2 simulations and a testbed implementation show that TLB reduces the average flow completion time of short flows by 16%-67% over state-of-the-art load balancers while maintaining high throughput for long flows. For the extreme bursty case, at an acceptable throughput degradation for long flows, TLB with packet slicing reduces the deadline miss ratio of bursty short flows by up to 80%.
- Published
- 2021
20. One Size Fits all, Again! The Architecture of the Hybrid OLTP&OLAP Database Management System HyPer
- Author
-
Kemper, Alfons, Neumann, Thomas, van der Aalst, Wil, Series editor, Mylopoulos, John, Series editor, Rosemann, Michael, Series editor, Shaw, Michael J., Series editor, Szyperski, Clemens, Series editor, Castellanos, Malu, editor, Dayal, Umeshwar, editor, and Markl, Volker, editor
- Published
- 2011
- Full Text
- View/download PDF
21. WCET-Aware Assembly Level Optimizations
- Author
-
Lokuciejewski, Paul, Marwedel, Peter, Lokuciejewski, Paul, and Marwedel, Peter
- Published
- 2011
- Full Text
- View/download PDF
22. The Power Processing Element (PPE)
- Author
-
Koranne, Sandeep and Koranne, Sandeep
- Published
- 2009
- Full Text
- View/download PDF
23. Algorithm Optimizations: Low Computational Complexity
- Author
-
Novak, Miroslav, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
24. TPE: A Hardware-Based TLB Profiling Expert for Workload Reconstruction
- Author
-
Liwei Zhou, Yunjie Zhang, and Yiorgos Makris
- Subjects
Profiling (computer programming), Computer science, Translation lookaside buffer, Hypervisor, Workload, Simics, Software, Benchmark (computing), Instrumentation (computer programming), Electrical and Electronic Engineering, Computer hardware
- Abstract
We propose TPE, a hardware-based framework for workload execution forensics in microprocessors. TPE leverages custom hardware instrumentation to capture the operational profile of the Translation Lookaside Buffer (TLB) and processes this information off-line through machine learning and/or deep learning to identify the executed processes and reconstruct the workload. Unlike software-based forensics implemented at the operating system (OS) or hypervisor level, whose data logging and monitoring mechanisms may be compromised through software attacks, TPE is implemented directly in hardware and therefore provides innate immunity to software tampering. A prototype of TPE is demonstrated in Linux on two representative architectures, 32-bit x86 and 64-bit RISC-V, implemented in the Simics and Spike simulation environments respectively. Experimental results using the MiBench workload suite show favorable process identification accuracy at a low logging rate, corroborating the effectiveness and generalizability of TPE. A minimal event-logging sketch follows this entry.
- Published
- 2021
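The logging half of such a profiler can be modeled as a ring buffer of TLB-miss events that is drained offline for classification. A minimal sketch with an invented event format and a simulated hardware hook; TPE's real instrumentation lives in hardware.

```c
/* Ring buffer of (VPN, cycle) TLB-miss events; oldest entries are
 * overwritten, and the buffer is drained offline. */
#include <stdint.h>
#include <stdio.h>

#define RING 8
struct tlb_event { uint64_t vpn, cycle; };

static struct tlb_event ring[RING];
static unsigned head;

static void on_tlb_miss(uint64_t vpn, uint64_t cycle)
{
    ring[head % RING] = (struct tlb_event){ vpn, cycle };
    head++;
}

int main(void)
{
    for (uint64_t c = 0; c < 12; c++)
        on_tlb_miss(0x400000 + (c % 3) * 0x1000, c);  /* 3 hot pages */
    unsigned n = head < RING ? head : RING;
    for (unsigned i = 0; i < n; i++)                  /* offline drain */
        printf("vpn=%#llx cycle=%llu\n",
               (unsigned long long)ring[i].vpn,
               (unsigned long long)ring[i].cycle);
    return 0;
}
```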
25. Pioneer: Verifying Code Integrity and Enforcing Untampered Code Execution on Legacy Systems
- Author
-
Seshadri, Arvind, Luk, Mark, Perrig, Adrian, van Doorn, Leendert, Khosla, Pradeep, Jajodia, Sushil, editor, Christodorescu, Mihai, editor, Jha, Somesh, editor, Maughan, Douglas, editor, Song, Dawn, editor, and Wang, Cliff, editor
- Published
- 2007
- Full Text
- View/download PDF
26. Energy-Effective Instruction Fetch Unit for Wide Issue Processors
- Author
-
Aragón, Juan L., Veidenbaum, Alexander V., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Srikanthan, Thambipillai, editor, Xue, Jingling, editor, and Chang, Chip-Hong, editor
- Published
- 2005
- Full Text
- View/download PDF
27. A Fetch Policy Maximizing Throughput and Fairness for Two-Context SMT Processors
- Author
-
Sun, Caixia, Tang, Hongwei, Zhang, Minxuan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Cao, Jiannong, editor, Nejdl, Wolfgang, editor, and Xu, Ming, editor
- Published
- 2005
- Full Text
- View/download PDF
28. BabelFish: Fusing Address Translations for Containers
- Author
-
Umur Darbaz, Dimitrios Skarlatos, Bhargava Gopireddy, Nam Sung Kim, and Josep Torrellas
- Subjects
Computer science, Overhead (engineering), Cloud computing, Execution time, Latency (engineering), Electrical and Electronic Engineering, Translation lookaside buffer, Hardware and Architecture, Virtual machine, Virtual memory, Container (abstract data type), Operating system, Page table, Software
- Abstract
Cloud computing has begun a transformation from virtual machines to containers. Containers are attractive because many of them can share a single kernel while adding minimal performance overhead, and cloud providers leverage their lean nature to run hundreds on a few cores. Containers also enable the serverless paradigm, which creates many short-lived processes. In this work, we identify that containerized environments create page translations that are extensively replicated across containers in the TLB and in page tables. The result is high TLB pressure and redundant kernel work during page table management. To remedy this situation, we propose BabelFish, a novel architecture that shares page translations across containers in the TLB and in page tables. We evaluate BabelFish with simulations of an 8-core processor running a set of Docker containers under conservative container co-location. On average, under BabelFish, 53% of the translations in containerized workloads and 93% of the translations in serverless workloads are shared. As a result, BabelFish reduces the mean and tail latency of containerized data-serving workloads by 11% and 18%, respectively, lowers the execution time of containerized compute workloads by 11%, and reduces serverless function bring-up time by 8% and execution time by 10%-55%. A sketch of the shared-entry lookup rule follows this entry.
- Published
- 2021
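The sharing the abstract describes implies a lookup rule along these lines: a translation entry either belongs to one container or is flagged as shared and matches any container, so identical translations replicated across containers collapse into one entry. The entry layout and IDs below are our own illustration, not BabelFish's actual design.

```c
/* TLB entry with a private owner ID or a "shared" flag. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct tlb_entry {
    uint64_t vpn, pfn;
    int      owner;      /* container ID, ignored when shared        */
    bool     shared;     /* translation identical across containers  */
    bool     valid;
};

static bool tlb_hit(const struct tlb_entry *e, uint64_t vpn, int container)
{
    return e->valid && e->vpn == vpn &&
           (e->shared || e->owner == container);
}

int main(void)
{
    struct tlb_entry lib  = { 0x7f00, 0x1234, 0, true,  true };
    struct tlb_entry heap = { 0x5000, 0x2222, 1, false, true };
    printf("%d %d %d\n",
           tlb_hit(&lib,  0x7f00, 0),   /* 1: shared, any container  */
           tlb_hit(&lib,  0x7f00, 9),   /* 1: shared, any container  */
           tlb_hit(&heap, 0x5000, 2));  /* 0: private to container 1 */
    return 0;
}
```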
29. Introduction to High-Performance Memory Systems
- Author
-
Hadimioglu, Haldun, Kaeli, David, Kuskin, Jeffrey, Nanda, Ashwini, Torrellas, Josep, Hadimioglu, Haldun, editor, Kuskin, Jeffrey, editor, Torrellas, Josep, editor, Kaeli, David, editor, and Nanda, Ashwini, editor
- Published
- 2004
- Full Text
- View/download PDF
30. Towards an Asynchronous MIPS Processor
- Author
-
Zhang, Qianyi, Theodoropoulos, Georgios, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Omondi, Amos, editor, and Sedukhin, Stanislav, editor
- Published
- 2003
- Full Text
- View/download PDF
31. T
- Author
-
Kajan, Ejub and Kajan, Ejub
- Published
- 2002
- Full Text
- View/download PDF
32. Estimation of tea leaf blight severity in natural scene images
- Author
-
Yan Zhang, Kang Wei, Dong Liang, Gensheng Hu, and Wenxia Bao
- Subjects
Conditional random field, Spots, Translation lookaside buffer, food and beverages, Pattern recognition, Convolutional neural network, Robustness (computer science), Metric (mathematics), Blight, Segmentation, Artificial intelligence, General Agricultural and Biological Sciences, Mathematics
- Abstract
Tea leaf blight (TLB) is a common tea disease that seriously affects the quality and yield of tea. Accurate estimation of TLB severity can guide tea farmers in spraying pesticides appropriately. This study proposes a method for estimating TLB severity in natural scene images, consisting of four main steps: segmentation of the diseased leaves, area fitting of the diseased leaves, segmentation of the disease spots, and estimation of disease severity. Target leaves with TLB are segmented by combining a U-Net network with a fully connected conditional random field to reduce the influence of complex backgrounds. An ellipse restoration method generates an elliptic mask to fit the full size of occluded or damaged TLB leaves. Disease spot regions are segmented from the TLB leaves by a support vector machine classifier to calculate an Initial Disease Severity (IDS) index. The IDS index, color features, and texture features of the TLB leaves are fed into a metric learning model to estimate the final disease severity. Experimental results show that the proposed method achieves higher estimation accuracy and stronger robustness to occluded and damaged TLB leaves than conventional convolutional neural network methods and classical machine learning techniques.
- Published
- 2021
34. Modeling and Analysis of the Page Sizing Problem for NVM Storage in Virtualized Systems
- Author
-
Yunjoo Park and Hyokyung Bahn
- Subjects
General Computer Science, Page fault, Computer science, Overhead (computing), General Materials Science, address translation, Electrical and Electronic Engineering, memory performance, Translation lookaside buffer, General Engineering, Virtualization, Non-volatile memory, Memory management, Page size, Operating system, NVM, Access time
- Abstract
Recently, NVM (non-volatile memory) has emerged as a fast storage medium, and traditional memory management systems designed for HDD storage should be reconsidered. In this article, we revisit the page sizing problem for NVM storage, focusing on virtualized systems. Page sizing has not attracted attention in traditional systems for two reasons. First, memory performance is insensitive to the page size when HDD is the storage medium; we show this is not the case for NVM storage by analyzing the TLB miss rate and the page fault rate, which trade off against each other as the page size varies. Second, changing the page size in traditional systems is difficult because it incurs significant overhead. With the widespread adoption of virtualization, however, page sizing becomes feasible for virtual machines, which are created to execute specific workloads with fixed hardware resources. In this article, we design a page size model that accurately estimates the TLB miss rate and the page fault rate for NVM storage. We then present a method that estimates the memory access time as the page size varies, which can guide the choice of a suitable page size for a given environment. By considering workload characteristics together with the given memory and storage resources, we show that the memory performance of virtualized systems can be improved by 38.4% when our model is adopted. A toy version of this trade-off model follows this entry.
- Published
- 2021
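The trade-off the abstract models can be made concrete with a toy cost function: TLB misses fall as pages grow (greater TLB reach) while page-fault cost rises, so expected access time is minimized at some intermediate page size. All constants below are made up for illustration and are not the paper's model.

```c
/* Toy expected-access-time model: U-shaped in page size. */
#include <stdio.h>

static double expected_access_ns(double page_kb)
{
    double tlb_miss   = 0.05 / page_kb;      /* misses drop with reach */
    double page_fault = 1e-7 * page_kb;      /* faults grow with size  */
    double walk_ns = 80.0, fault_ns = 2e4;   /* NVM-backed fault cost  */
    return 1.0 + tlb_miss * walk_ns + page_fault * fault_ns;
}

int main(void)
{
    /* Sweep 4 KB .. 1 MB; the minimum lands at an intermediate size. */
    for (double kb = 4; kb <= 1024; kb *= 4)
        printf("%6.0f KB pages -> %.4f ns/access\n",
               kb, expected_access_ns(kb));
    return 0;
}
```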
35. Detecting Hardware-Assisted Virtualization With Inconspicuous Features
- Author
-
Yueqiang Cheng, Yi Zou, Zhi Zhang, Dongxi Liu, Yansong Gao, and Surya Nepal
- Subjects
Computer Networks and Communications, Computer science, Translation lookaside buffer, Hardware-assisted virtualization, Cloud computing, Transparency (human–computer interaction), Virtualization, Virtual machine, Operating system, Malware, Cache, Safety, Risk, Reliability and Quality
- Abstract
Recent years have witnessed the proliferation of virtualization techniques. Virtualization is designed to be transparent: unprivileged users should not be able to detect whether a system is virtualized. Such detection can pose serious security threats, such as evading virtual machine (VM)-based dynamic malware analysis and exploiting vulnerabilities for cross-VM attacks. Traditional software-based virtualization leaves numerous artifacts/fingerprints that can be exploited without much effort to detect it. In contrast, today's mainstream hardware-assisted virtualization significantly enhances transparency, making it much harder to detect. Nonetheless, we showcase three newly identified, low-level, inconspicuous features that an unprivileged adversary can leverage to detect hardware-assisted virtualization effectively and stealthily. All three features come from chipset fingerprints rather than traces of software-based virtualization implementations (e.g., Xen or KVM): i) the Translation Lookaside Buffer (TLB) stores an extra layer of address translations; ii) the Last-Level Cache (LLC) caches one more layer of page-table entries; and iii) the Level-1 Data (L1D) cache is unstable. Based on these features, we develop three corresponding virtualization detection techniques, which we evaluate comprehensively on three native environments and three popular cloud providers: Amazon Elastic Compute Cloud, Google Compute Engine, and Microsoft Azure. Experimental results validate that these three adversarial detection techniques are effective (with no false positives) and stealthy (without triggering suspicious system events, e.g., VM-exit) in detecting the above commodity virtualized environments.
- Published
- 2021
36. Monolithic 3D-Based SRAM/MRAM Hybrid Memory for an Energy-Efficient Unified L2 TLB-Cache Architecture
- Author
-
Young-Ho Gong
- Subjects
General Computer Science, CPU cache, Computer science, General Materials Science, Static random-access memory, energy efficiency, Monolithic 3D, Magnetoresistive random-access memory, Random access memory, Translation lookaside buffer, General Engineering, Cache-only memory architecture, cache memory, SRAM, MRAM, Memory management, Embedded system, Efficient energy use
- Abstract
Monolithic 3D (M3D) integration has emerged as a promising technology for fine-grained 3D stacking. Because M3D integration offers vias with nanometer-scale dimensions, it is well suited to small microarchitectural blocks such as caches, register files, and translation look-aside buffers (TLBs). However, M3D integration requires a low-temperature process for the stacked layers, which lowers the performance of stacked transistors relative to a conventional 2D process. In contrast, non-volatile memory (NVM) such as magnetic RAM (MRAM) is natively fabricated at low temperature, enabling M3D integration without performance degradation. In this paper, we propose an energy-efficient unified L2 TLB-cache architecture exploiting an M3D-based SRAM/MRAM hybrid memory. Since this hybrid memory consumes much less energy than conventional 2D SRAM-only memory and 2D SRAM/MRAM hybrid memory while providing comparable performance, the proposed architecture significantly improves energy efficiency. In particular, because it repartitions the unified L2 TLB-cache according to the L2 cache miss rate, it maximizes energy efficiency for parallel workloads with extremely high L2 cache miss rates. In our analysis with PARSEC benchmark applications, the proposed architecture reduces the energy consumption of the L2 TLB plus L2 cache by up to 97.7% (53.6% on average) compared to a baseline with 2D SRAM-only memory, with negligible impact on performance. It further reduces memory access energy by up to 32.8% (10.9% on average) by cutting memory accesses caused by TLB misses. A sketch of the repartitioning policy follows this entry.
- Published
- 2021
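The repartitioning referenced in the abstract can be sketched as a simple feedback rule: when the L2 cache miss rate is high, cache capacity is paying off poorly, so capacity of the unified structure is shifted to the TLB side, and back again when caching is effective. Thresholds and way counts below are invented for illustration.

```c
/* Miss-rate-driven repartitioning of a unified TLB-cache structure. */
#include <stdio.h>

#define TOTAL_WAYS 16

static int tlb_ways = 4;   /* remaining ways hold cache data */

static void repartition(double l2_miss_rate)
{
    if (l2_miss_rate > 0.60 && tlb_ways < TOTAL_WAYS - 2)
        tlb_ways += 2;     /* cache barely helps: grow TLB coverage */
    else if (l2_miss_rate < 0.20 && tlb_ways > 2)
        tlb_ways -= 2;     /* cache is effective: give ways back    */
}

int main(void)
{
    double samples[] = { 0.75, 0.80, 0.15, 0.10 };
    for (int i = 0; i < 4; i++) {
        repartition(samples[i]);
        printf("miss rate %.2f -> TLB %d / cache %d ways\n",
               samples[i], tlb_ways, TOTAL_WAYS - tlb_ways);
    }
    return 0;
}
```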
37. Improving the Precise Interrupt Mechanism of Software- Managed TLB Miss Handlers
- Author
-
Jaleel, Aamer, Jacob, Bruce, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Monien, Burkhard, editor, Prasanna, Viktor K., editor, and Vajapeyam, Sriram, editor
- Published
- 2001
- Full Text
- View/download PDF
38. Content-Based Prefetching: Initial Results
- Author
-
Cooksey, Robert, Colarelli, Dennis, Grunwald, Dirk, Goos, G., editor, Hartmanis, J., editor, van Leeuwen, J., editor, Chong, Frederic T., editor, Kozyrakis, Christoforos, editor, and Oskin, Mark, editor
- Published
- 2001
- Full Text
- View/download PDF
39. An Architectural and Circuit-Level Approach to Improving the Energy Efficiency of Microprocessor Memory Structures
- Author
-
Albonesi, David H., Silveira, Luis Miguel, editor, Devadas, Srinivas, editor, and Reis, Ricardo, editor
- Published
- 2000
- Full Text
- View/download PDF
40. High-Resolution Weather Forecasting: A Teraflop Sustained on RISC/cache or Vector Processors
- Author
-
Thomas, S. J., Desgagné, M., Valin, M., Pollard, Andrew, editor, Mewhort, Douglas J. K., editor, and Weaver, Donald F., editor
- Published
- 2000
- Full Text
- View/download PDF
41. NWCache: Optimizing disk accesses via an optical network/write cache hybrid
- Author
-
Carrera, Enrique V., Bianchini, Ricardo, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Rolim, José, editor, Mueller, Frank, editor, Zomaya, Albert Y., editor, Ercal, Fikret, editor, Olariu, Stephan, editor, Ravindran, Binoy, editor, Gustafsson, Jan, editor, Takada, Hiroaki, editor, Olsson, Ron, editor, Kale, Laxmikant V., editor, Beckman, Pete, editor, Haines, Matthew, editor, ElGindy, Hossam, editor, Caromel, Denis, editor, Chaumette, Serge, editor, Fox, Geoffrey, editor, Pan, Yi, editor, Li, Keqin, editor, Yang, Tao, editor, Chiola, G., editor, Conte, G., editor, Mancini, L. V., editor, Méry, Domenique, editor, Sanders, Beverly, editor, Bhatt, Devesh, editor, and Prasanna, Viktor, editor
- Published
- 1999
- Full Text
- View/download PDF
42. Summary
- Author
-
Wieferink, Andreas, Meyr, Heinrich, Leupers, Rainer, Wieferink, Andreas, Meyr, Heinrich, and Leupers, Rainer
- Published
- 2008
- Full Text
- View/download PDF
43. When Storage Response Time Catches Up With Overall Context Switch Overhead, What Is Next?
- Author
-
Tei-Wei Kuo, Ming-Chang Yang, Chun-Feng Wu, and Yuan-Hao Chang
- Subjects
Random access memory, Page fault, CPU cache, Computer science, Translation lookaside buffer, Response time, Computer Graphics and Computer-Aided Design, Memory management, Embedded system, Virtual memory, Central processing unit, Electrical and Electronic Engineering, Software, Context switch
- Abstract
The virtual memory technique provides a large and cheap memory space by extending memory with storage devices. It uses context switches to swap pages asynchronously between memory and storage, hiding the long response time of storage devices when a page fault occurs. However, the overall context switch overhead is high: the context switch itself is a complex function, and it further incurs TLB shootdowns/flushes and compulsory CPU cache misses afterwards. Meanwhile, as high-end storage devices rapidly become more responsive, we observe that their response time is catching up with, and gradually dropping below, the overall context switch overhead. At this turning point, to further enhance system responsiveness, we advocate swapping synchronously rather than context switching in response to page faults. We also propose a strategy, called shadow huge page management, that further improves overall system performance by minimizing the time overheads caused by page faults and page swapping. Evaluation results show that the proposed system efficiently reduces total wasted CPU time. A sketch of the turning-point decision follows this entry.
- Published
- 2020
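The turning-point argument above reduces to a one-line comparison a kernel could make per device: if the storage response time is below the overall context-switch overhead (the switch itself plus subsequent TLB shootdowns and cold caches), block synchronously. The two cost figures in this sketch are made-up inputs a kernel would measure.

```c
/* Choose synchronous swapping vs. context switching on a page fault. */
#include <stdbool.h>
#include <stdio.h>

static bool swap_synchronously(double storage_ns, double ctx_switch_ns)
{
    return storage_ns < ctx_switch_ns;   /* blocking beats rescheduling */
}

int main(void)
{
    double ctx = 20000;                  /* overall switch overhead, ns */
    printf("HDD    (10 ms): %s\n",
           swap_synchronously(1e7, ctx) ? "sync" : "context switch");
    printf("fast SSD (8 us): %s\n",
           swap_synchronously(8e3, ctx) ? "sync" : "context switch");
    return 0;
}
```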
44. ecoTLB
- Author
-
Tushar Krishna, Steffen Maass, Taesoo Kim, Mohan Kumar, and Abhishek Bhattacharjee
- Subjects
Computer science, Address space, Translation lookaside buffer, Linux kernel, Asynchrony (computer programming), Hardware and Architecture, Asynchronous communication, Operating system, Isolation (database systems), Interrupt, Page table, Software, Information Systems
- Abstract
We propose ecoTLB, software-based eventual translation lookaside buffer (TLB) coherence, which eliminates the overhead of the synchronous TLB shootdown mechanism in operating systems that use address space identifiers (ASIDs). With eventual TLB coherence, ecoTLB improves the performance of free and page swap operations by removing the inter-processor interrupt (IPI) overhead incurred to invalidate TLB entries. We show that TLB shootdown has particular implications for page swapping in emerging, disaggregated data centers, and demonstrate that ecoTLB's asynchronous mechanism can improve both performance and swapping policy decisions. ecoTLB improves the performance of real-world applications that perform page swapping, such as Memcached and Make, using Infiniswap, a solution for next-generation data centers with disaggregated memory, by up to 17.2%, and improves the 99th percentile tail latency of Memcached by up to 70.8% thanks to its asynchronous scheme and improved policy decisions. Furthermore, recent security features in the Linux kernel, such as kernel page table isolation (KPTI), can impose significant overhead on architectures that lack instructions to clear single entries in tagged TLBs and must fall back to full TLB flushes; in this scenario, ecoTLB recovers the performance lost to supporting KPTI through its asynchronous shootdown scheme and its support for tagged TLBs. Finally, ecoTLB improves the performance of free operations by up to 59.1% on a 120-core machine, and improves the performance of Apache on a 16-core machine by up to 13.7% over baseline Linux and by up to 48.2% over ABIS, a recent state-of-the-art research prototype that reduces the number of IPIs. A minimal rendering of the deferred-shootdown idea follows this entry.
- Published
- 2020
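The deferred-shootdown idea in the abstract can be rendered minimally with a per-address-space generation counter: unmapping bumps the generation instead of sending IPIs, and each CPU flushes lazily when it next switches in and notices a stale generation. This is our own simplification of eventual TLB coherence, not ecoTLB's actual implementation.

```c
/* Lazy TLB coherence via a generation counter instead of IPIs. */
#include <stdio.h>

#define NCPU 4

static unsigned long asid_gen = 1;    /* bumped on unmap/free        */
static unsigned long seen_gen[NCPU];  /* per-CPU last-flushed gen    */

static void unmap_pages(void)
{
    asid_gen++;                       /* no IPI: coherence deferred  */
}

static void context_switch_in(int cpu)
{
    if (seen_gen[cpu] != asid_gen) {
        printf("cpu%d: lazy TLB flush (gen %lu)\n", cpu, asid_gen);
        seen_gen[cpu] = asid_gen;
    }
}

int main(void)
{
    unmap_pages();            /* free() path: just bump the generation */
    context_switch_in(0);     /* flushes lazily                        */
    context_switch_in(0);     /* already coherent: no flush            */
    context_switch_in(2);     /* flushes lazily                        */
    return 0;
}
```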
45. Intel Microprocessor Hardware Virtualization Technologies
- Author
-
Yu. Povstiana, N. Khrystynets, M. Dovgonyuk, N. Cherniashchuk, and O. Miskevych
- Subjects
Computer Networks and Communications, Computer science, Translation lookaside buffer, Memory controller, Hardware and Architecture, Memory virtualization, Virtual memory, Operating system, Cache, Cache hierarchy, Software
- Abstract
This paper examines features of Intel's Nehalem microprocessor architecture: the memory controller, cache hierarchy, TLB, and memory access organization. It presents the technical characteristics of LGA1156-socket processors within one model range, reports test results for this architecture, and investigates its methods of virtual memory organization.
- Published
- 2020
46. Object-Level Memory Allocation and Migration in Hybrid Memory Systems
- Author
-
Bingsheng He, Hai Jin, Liu Renshan, Haikun Liu, Yu Zhang, and Xiaofei Liao
- Subjects
Random access memory, Source code, Computer science, Translation lookaside buffer, Static memory allocation, Theoretical Computer Science, Non-volatile memory, Memory management, Computational Theory and Mathematics, Hardware and Architecture, Embedded system, Overhead (computing), Cache, Software, DRAM, Data migration
- Abstract
Hybrid memory systems composed of emerging non-volatile memory (NVM) and DRAM have drawn increasing attention in recent years. To fully exploit the advantages of both NVM and DRAM, a primary goal is to place application data properly across the hybrid memories. Previous studies have focused on page migration schemes to achieve higher performance and energy efficiency. However, those schemes all rely on costly online page access monitoring, and data migration at page granularity can cause additional overhead due to DRAM bandwidth contention and the maintenance of cache/TLB consistency. In this article, we present Object-level memory Allocation and Migration (OAM) mechanisms for hybrid memory systems. OAM uses a profiling tool to characterize objects' memory access patterns at different execution phases of an application, and applies a performance/energy model to direct the initial static memory allocation and runtime dynamic object migration between NVM and DRAM. Based on our newly developed programming interfaces for hybrid memory systems, application source code can be transformed automatically via static code instrumentation. We evaluate OAM on an emulated hybrid memory system; experimental results show that OAM reduces the system energy-delay product by 61 percent on average compared to a page-interleaving data placement scheme. It also reduces data migration overhead by 83 and 69 percent compared to the state-of-the-art page migration schemes CLOCK-DWF and 2PP, respectively, while improving application performance by up to 22 and 10 percent.
- Published
- 2020
47. Formal Reasoning Under Cached Address Translation
- Author
-
Hira Taqdees Syeda and Gerwin Klein
- Subjects
Programming language, Computer science, Address space, Translation lookaside buffer, Classical logic, Memory management unit, Computational Theory and Mathematics, Artificial Intelligence, Cache, Page table, Software, Context switch, Abstraction (linguistics)
- Abstract
Operating system (OS) kernels achieve isolation between user-level processes using hardware features such as multi-level page tables and translation lookaside buffers (TLBs). The TLB caches address translations, so correctly controlling the TLB is a fundamental security property of OS kernels; yet all large-scale formal OS verification projects we are aware of leave the correct functionality of the TLB as an assumption. In this paper, we present a verified sound abstraction of a detailed concrete model of the memory management unit (MMU) of the ARMv7-A architecture. This MMU abstraction revamps our previous address-space-specific MMU abstraction to include new software-visible TLB features, such as caching of globally-mapped and partial translation entries in a two-stage TLB. We use this abstraction as the underlying model to develop a logic for reasoning about low-level programs in the presence of cached address translation. We extract invariants and necessary conditions for correct TLB operation that mirror the informal reasoning of OS engineers, and we systematically show how these invariants adapt to global and partial translation entries. We show that our program logic reduces to a standard logic for user-level reasoning, reduces to side-condition checks for kernel-level reasoning, and can handle typical OS kernel tasks such as context switching.
- Published
- 2020
48. TLB Coalescing for Multi-Grained Page Migration in Hybrid Memory Systems
- Author
-
Xiaoyuan Wang, Haikun Liu, Xiaofei Liao, Yu Zhang, and Hai Jin
- Subjects
TLB coalescing, General Computer Science, Computer science, Virtual memory, Parallel computing, Memory systems, hybrid memory system, Overhead (computing), General Materials Science, Electrical and Electronic Engineering, page migration, multiple page size, Translation lookaside buffer, General Engineering, Memory management, Cache, DRAM
- Abstract
Superpages have long been proposed to enlarge the coverage of the translation lookaside buffer (TLB). They are extremely beneficial for reducing address translation overhead in big-memory systems, such as hybrid memory systems composed of DRAM and non-volatile memories (NVMs). However, superpages conflict with fine-grained memory migration, one of the key techniques hybrid memory systems use to improve performance and energy efficiency: fine-grained page migration usually requires splintering superpages, negating the benefit of TLB hardware support for them. In this paper, we present Tamp, an efficient memory management mechanism that supports multiple page sizes in hybrid memory systems. We manage large-capacity NVM with superpages and use a relatively small DRAM to cache hot base pages within those superpages. We find that hot base pages within superpages exhibit remarkable contiguity; in response, we bind contiguous hot pages together and migrate them to DRAM as a group. We also propose multi-grained TLBs that coalesce multiple page address translations into a single TLB entry. Our experimental results show that Tamp reduces TLB misses by 62.4% on average and improves application performance (IPC) by 16.2% compared to a page migration policy without TLB coalescing support.
- Published
- 2020
49. System Level Representation
- Author
-
Geuskens, Bibiche, Rose, Kenneth, Geuskens, Bibiche, and Rose, Kenneth
- Published
- 1998
- Full Text
- View/download PDF
50. Translation lookaside buffer management
- Author
-
Y. I. Klimiankou
- Subjects
OS kernel, TLB management, Computer science, Translation lookaside buffer, Information technology, Associative cache, physical memory, virtual memory, Memory management, Physical address, Operating system, Overhead (computing)
- Abstract
This paper focuses on Translation Lookaside Buffer (TLB) management as part of memory management. The TLB is an associative cache in modern processors that reduces the overhead of virtual-to-physical address translation. We consider the challenges of designing the TLB management subsystem of an OS kernel, using the IA-32 platform as an example, and propose a simple model of a complete and consistent TLB management policy. This model can serve as a foundation for the design and verification of memory management subsystems. A kernel-style sketch of the basic consistency rule follows this entry.
- Published
- 2019
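The central consistency rule such a model must capture is that IA-32 hardware does not keep the TLB coherent with the in-memory page tables: after software writes a PTE, it must invalidate the stale entry with invlpg (or reload CR3 for a full flush). The kernel-context sketch below uses the real invlpg instruction; it compiles with GCC on x86 but cannot execute as a user program, and the set_pte wrapper is a hypothetical update path, not any particular kernel's API.

```c
/* IA-32 TLB consistency: write the PTE, then invalidate the stale
 * cached translation. Privileged; kernel context only. */
#include <stdint.h>

static inline void flush_one(void *va)
{
    __asm__ volatile("invlpg (%0)" :: "r"(va) : "memory");
}

/* Hypothetical update path illustrating the required ordering. */
static void set_pte(volatile uint32_t *pte, uint32_t val, void *va)
{
    *pte = val;        /* in-memory page table is now current         */
    flush_one(va);     /* drop any stale TLB entry for this page      */
}
```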